
International Journal of Research in Electrical, Electronics and Communication Engineering
Volume 6 Issue 1

Multiple People Detection and Tracking

Gyanvi Agarwal (1), Gul Afshan (2), Gayatri Sridhar (3)

Students
Department of Electronics and Communication Engineering
Maharaja Surajmal Institute of Technology, New Delhi, India
Corresponding Authors' Email IDs: gyanvi99@gmail.com (1),
gul.afshana@msitjanakpuri.co.in (2), gayatrisridhar04@gmail.com (3)

Abstract
Detecting multiple people in real time remains a challenging task despite the
variety of available techniques. Partially occluded people are still often
missed in heavily populated areas, and Non-Maximum Suppression can discard
correct bounding boxes, which makes the detections imprecise. This paper
reviews modifications made to multiple people detection and tracking
algorithms that improve on the efficiency and accuracy of earlier approaches.

Keywords: YOLO, OpenCV, Faster R-CNN, MOT.

INTRODUCTION
Humans recognize and identify an object at a glance, usually by drawing on
experience. The human visual system is fast and accurate at this. Similarly,
fast and accurate algorithms can help computers detect and classify objects.
Various deep learning algorithms of the R-CNN family, such as R-CNN and
Faster R-CNN, can perform this task, with the earlier variants relying on
selective search, but their computation time is very high. Beyond detection,
Multiple People Tracking is a vital component of real-time tracking systems,
but its detection accuracy is considerably low. In this manuscript, we discuss
a comparative analysis of YOLO object detection with OpenCV and an improved
version of Multiple People Tracking.

The paper is organized as follows: Section 1 explains YOLO object detection
with OpenCV, Section 2 explains multiple people tracking using body and joint
detection, and the final section presents the conclusion.

23 Page 23-28 © MANTECH PUBLICATIONS 2021. All Rights Reserved

SECTION 1
YOLO Object Detection with OpenCV
To understand an image completely, we should not only classify it but also
precisely estimate the categories and locations of the objects it contains.
This task is referred to as object detection. Progress in machine learning and
deep learning has already produced various algorithms for it, including
R-CNN, Fast R-CNN, Faster R-CNN, and Viola-Jones.

Method 1: R-CNN
This method is based on selective search [1]. It divides the image into
multiple regions and then classifies each region into one of several classes.
Bounding boxes are formed using a sliding-window technique. See Figure 1
below.

Method 2: Fast R-CNN
Fast R-CNN uses a single model that extracts features from the regions,
classifies them, and returns the bounding boxes for the identified classes
simultaneously. It resolves two major issues of R-CNN: it makes one ConvNet
pass per image instead of passing 2,000 regions separately, and it uses one
model instead of three. See Figure 2 below.

Figure 1: Selective search process in R-CNN [1]

Figure 2: Fast R-CNN
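
The sliding-window step mentioned for R-CNN can be sketched as follows. This is only a minimal illustration of exhaustively enumerating candidate boxes. It is not selective search itself, which merges similar regions to arrive at roughly 2,000 proposals per image. The function name `sliding_windows` and all sizes are hypothetical and do not come from the paper.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> Iterator[Box]:
    """Yield every win_w x win_h window that fits fully inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Hypothetical image and window sizes, chosen only for illustration.
boxes = list(sliding_windows(img_w=640, img_h=480, win_w=128, win_h=256, stride=64))
print(len(boxes), boxes[0], boxes[-1])
```

Each candidate box would then be passed to a classifier. Even a single window size and a coarse stride already produce dozens of candidates, which is one reason region-based detectors are slow when every region is processed separately.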


Method 3: Faster R-CNN
Faster R-CNN is a modified version of Fast R-CNN. The major difference between
them is that Fast R-CNN uses selective search to generate Regions of
Interest, while Faster R-CNN uses a "Region Proposal Network", which makes the
algorithm much faster.

Figure 3: Faster R-CNN

Although Faster R-CNN solves many problems associated with R-CNN and Fast
R-CNN, generating object proposals still takes considerable time.

Method 4: YOLO
YOLO performs object detection, a computer vision task that involves both
localizing one or more objects within an image and classifying each of them.
It is a single-stage detector. YOLO marks each object with a rectangular
bounding box and a label. Its major advantage is that fast-moving objects are
captured very quickly compared to the other methods. The method is chosen
mainly for its speed: it is faster than the other methods discussed here.

Use of OpenCV in YOLO Object Detection
OpenCV reads the input image from the specified file path and holds its pixel
data in a NumPy array.

YOLO RESULTS

Figure 4: Before YOLO Detection

Figure 5: After YOLO Detection

Comparison of Computation Time with Other Techniques
According to results from previous research work, YOLO is 4 times faster than
Fast R-CNN and 3 times faster than Faster R-CNN.
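
The detection step described under Method 4 ends with turning the raw network output into labelled bounding boxes. The sketch below shows one way that conversion might look. It assumes each output row has the layout `[cx, cy, w, h, objectness, class scores...]` with coordinates normalized to the image size, as is common for Darknet-style YOLO models. That layout, the function name `decode_yolo_rows`, and the threshold are assumptions, not details given in the paper.

```python
from typing import List, Sequence, Tuple

# (class_id, score, (x, y, width, height)) with the box in pixels
Detection = Tuple[int, float, Tuple[int, int, int, int]]


def decode_yolo_rows(rows: Sequence[Sequence[float]], img_w: int, img_h: int,
                     conf_threshold: float = 0.5) -> List[Detection]:
    """Convert normalized YOLO-style rows to pixel boxes, dropping weak ones."""
    detections: List[Detection] = []
    for row in rows:
        cx, cy, w, h = row[0:4]
        class_scores = row[5:]  # assumed layout: index 4 is objectness
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = class_scores[class_id]
        if score < conf_threshold:
            continue  # too uncertain to report
        box_w, box_h = int(w * img_w), int(h * img_h)
        x = int(cx * img_w - box_w / 2)  # top-left corner from the center
        y = int(cy * img_h - box_h / 2)
        detections.append((class_id, score, (x, y, box_w, box_h)))
    return detections


# Hypothetical output rows for a two-class model on a 640x480 image.
rows = [
    [0.5, 0.5, 0.25, 0.5, 0.9, 0.85, 0.05],  # confident detection of class 0
    [0.2, 0.3, 0.10, 0.2, 0.3, 0.10, 0.20],  # weak detection, filtered out
]
detections = decode_yolo_rows(rows, img_w=640, img_h=480)
print(detections)
```

In a full pipeline, the rows would come from the network's forward pass on the image that OpenCV loaded, and the surviving boxes would be drawn on that image with their labels.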


The following bar graph shows the result.

Figure 6: Comparison of Various YOLO Object Detection Techniques

SECTION 2
Multiple People Tracking Using Body and Joint Detections
The goal of Multiple People Tracking is to infer the trajectories of all
targets that appear in a video sequence, to track their joints, and to form
the bounding boxes accurately. To achieve this, the contribution in the field
is threefold:
1. Integration of joint detection into near-online multiple people tracking
   systems.
2. Affinities that fuse people detections and joint detections into the
   tracking system.
3. A tracker that is robust against occlusions and sets a new state of the
   art in tracking.

It is assumed that all necessary input detections are already provided by
external detectors [5]. The changes are made in the data association step,
where minimum cost graph labeling helps in producing all affinities and
accurate body and joint detections. Assigning a positive pairwise cost to
boxes helps remove wrong double detections of a person, which resolves the
problem of Non-Maximum Suppression. In addition, this method helps recover
missed detections via interpolation and extrapolation.

One problem that arises when tracking with body and joint detections is
orientation ambiguity (the right shoulder may appear in the left or right
half of a bounding box depending on the walking direction). It is overcome by
calculating four parameters: barycentric distance, x-y offset, angle in the
reference box, and distance in the reference box [6].

Figure 7: Results of the Deformable Part-based Model (DPM)

Figure 8: Results of MOT after integrating the joint and body detection method
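
The Non-Maximum Suppression problem raised in the abstract and in this section can be illustrated with a small sketch of standard greedy NMS. It is not the pairwise-cost formulation used by the tracker. The boxes, scores, and thresholds are hypothetical values chosen to show how a correct box for a partially occluded person can be discarded along with a true duplicate.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float) -> List[int]:
    """Return indices of kept boxes, visiting boxes from highest score down."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


person_a = (100.0, 100.0, 200.0, 300.0)
duplicate_a = (105.0, 100.0, 205.0, 300.0)  # double detection of person A
person_b = (140.0, 100.0, 240.0, 300.0)     # second person, partly occluded by A
boxes = [person_a, duplicate_a, person_b]
scores = [0.95, 0.80, 0.70]

kept = greedy_nms(boxes, scores, iou_threshold=0.4)
print(kept)
```

With the stricter threshold, the duplicate is removed but so is the box for the second person. Raising the threshold to 0.5 keeps both people in this example, which shows how sensitive the result is to a single global setting in crowded scenes.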
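
Recovering missed detections by interpolation, as mentioned in Section 2, can be sketched as filling the frames between two known boxes of the same track. The paper does not say which interpolation scheme is used, so the linear scheme, the function name `fill_track_gaps`, and the sample track below are assumptions made for illustration.

```python
from typing import Dict, Tuple

Box = Tuple[float, ...]  # (x1, y1, x2, y2)


def fill_track_gaps(track: Dict[int, Box]) -> Dict[int, Box]:
    """Linearly interpolate boxes for frames missing between known detections."""
    filled = dict(track)
    frames = sorted(track)
    for start, end in zip(frames, frames[1:]):
        for frame in range(start + 1, end):
            t = (frame - start) / (end - start)  # fraction of the gap covered
            filled[frame] = tuple(
                a + t * (b - a) for a, b in zip(track[start], track[end])
            )
    return filled


# Hypothetical track: the person is detected in frames 10 and 14 only.
track = {10: (100.0, 50.0, 140.0, 150.0), 14: (120.0, 50.0, 160.0, 150.0)}
filled = fill_track_gaps(track)
print(filled[12])
```

Extrapolation beyond the first or last detection would need a motion assumption, such as constant velocity, and is left out of this sketch.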

CONCLUSION
This paper provides a comprehensive review of YOLO object detection with
OpenCV and of Multiple People Tracking using body and joint detection
techniques. The review starts with generic object detection, where three
earlier methods, i.e., R-CNN, Fast R-CNN, and Faster R-CNN, are discussed.
YOLO object detection with OpenCV is then presented as a way to overcome the
processing time issue. It is followed by the MOT framework, which tracks
people based on body and joint detection and can improve on the efficiency of
previous MOT-based frameworks, for example on the issues of non-maximum
suppression and the recognition of partially occluded people. This review is
also meaningful for developments in neural networks and related learning
systems, for which it provides valuable insights and guidelines for future
progress.

REFERENCES
I. Girshick, R. (2015). Fast R-CNN. Proceedings of the IEEE international
   conference on computer vision (pp. 1440-1448).
   http://openaccess.thecvf.com/menu.py[36]

II. Girshick, R. (2015). Fast R-CNN. Proceedings of the IEEE international
    conference on computer vision (pp. 1440-1448).
    http://openaccess.thecvf.com/menu.py[38

III. Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards
     real-time object detection with region proposal networks. Advances in
     neural information processing systems (pp. 91-99). Neural Information
     Processing Systems Foundation, Inc.[40]

IV. Anonymous. (2017). Nutrition & exercise-timing is everything. Retrieved
    October 12, 2019 from https://blog.nasm.org/workout-and-nutrition-timing
    [61]

V. R. Henschel, L. Leal-Taixé, D. Cremers, and B. Rosenhahn. Fusion of head
   and full-body detectors for multi-object tracking. In CVPR Workshop on
   Joint Detection, Tracking, and Prediction in the Wild (CVPRW), 2018.[61]

VI. S. Tang, B. Andres, M. Andriluka, and B. Schiele. Multiperson tracking by
    multicut and deep matching. In ECCV Workshop on Benchmarking Multi-Target
    Tracking (ECCVW), 2016.[80]

VII. R. Henschel, L. Leal-Taixé, and B. Rosenhahn. Efficient multiple people
     tracking using minimum cost arborescences. In German Conference on
     Pattern Recognition (GCPR), 2014.[70]

VIII. V. Chari, S. Lacoste-Julien, I. Laptev, and J. Sivic. On pairwise costs
      for network flow multi-object tracking. In Proceedings of the IEEE
      Conference on Computer Vision and Pattern Recognition (CVPR), 2015.[81]
