
Traffic analysis of video surveillance using Faster-RCNN
Imad EL MALLAHI
Dept. of Computer Sciences, Sidi Mohammed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz, Fez, Morocco
imade.elmallahi@usmba.ac.ma

Adnane Mohammed Mahrez
Dept. of Computer Sciences, Sidi Mohammed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz, Fez, Morocco
adnane.mehrez@usmba.ac.ma

Mohamed Adnane Mahraz
Dept. of Computer Sciences, Sidi Mohammed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz, Fez, Morocco
adnane_1@yahoo.fr

Hamid TAIRI
Dept. of Computer Sciences, Sidi Mohammed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz, Fez, Morocco
hamid.tairi@usmba.ac.ma

Abstract— This work addresses the traffic analysis of video surveillance using Faster-RCNN. The task is expressed as classifying and counting vehicles in traffic. To address it, we build on Faster R-CNN and vehicle detectors for classification in live traffic movement. The performance of the Faster R-CNN was improved by several refinements: adaptive feature pooling, anchor optimization, focal loss, and an additional mask branch. To train the proposed model, we used the UA-DETRAC dataset: 10 hours of video at 25 frames per second (fps) with a resolution of 960×540 pixels, more than 140k frames, and 8,250 vehicles manually annotated with DarkLabel.

Keywords— Video surveillance, Traffic estimation, Faster R-CNN, Traffic detection, Convolutional neural network, Surveillance Vehicle.

I. INTRODUCTION

Today, most cities in the world operate many video surveillance systems. These systems are growing fast and now include heterogeneous cameras with various resolutions [1]. Closed-circuit television runs around the clock and generates a huge amount of video data. This video data can serve as the foundation for an automated vehicle surveillance system. However, working with big data in traffic surveillance raises many issues: to realize an intelligent traffic vehicle surveillance system, we need an efficient storage system for saving, seeking through, and analyzing videos. In this work, we concentrate on analyzing videos for traffic surveillance; existing research remains limited in terms of real-time data analysis.

Several representative papers in this area used heterogeneous low-resolution data to estimate traffic density and count vehicles. Many efforts have been made to improve vehicle analysis, counting, classification, and detection, with satisfactory results in specific tasks. Most recent studies focus on adapting and improving existing vehicle frameworks such as recurrent convolutional neural networks and the SSD and YOLO detectors. This includes new contributions at the architectural level, improving speed, classification, and vehicle tracking.

In this study, we focus on the gap between recent works and their applications in the real world. We start from the Faster R-CNN model and improve the baseline performance sequentially: we add an auxiliary mask branch, then anchor optimization, then the focal loss and adaptive pooling. The proposed system can be used to analyze and count traffic movement; Figure 1 shows the flowchart of the proposed video surveillance using Faster-RCNN. The results could be used by city officials to improve the overall throughput of an intersection.
[Figure 1 pipeline: input video → frame extraction → convolutional layers → feature maps → region proposal network → region of interest pooling → classification and vehicle counting → final detection]

Figure 1: Flowchart of the proposed video surveillance using Faster-RCNN
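The stages of the flowchart can be sketched as a minimal, framework-free pipeline. This is only a structural sketch: the function bodies below are hypothetical placeholders, and a real system would run the Faster R-CNN detector inside `detect_vehicles` rather than the toy logic used here.

```python
# Minimal sketch of the Figure 1 pipeline: frames in, per-class counts out.
# All stage functions are illustrative placeholders, not the paper's code.

def extract_frames(video, stride=1):
    """Yield every `stride`-th frame of a decoded video (here: a list)."""
    for i, frame in enumerate(video):
        if i % stride == 0:
            yield frame

def detect_vehicles(frame):
    """Placeholder detector. A real system would run Faster R-CNN here and
    return (box, class_name, score) triples; the toy 'frame' already is one."""
    return frame

def count_by_class(detections):
    """Aggregate per-class counts over the detections of one frame."""
    counts = {}
    for _box, cls, _score in detections:
        counts[cls] = counts.get(cls, 0) + 1
    return counts

def run_pipeline(video):
    """Accumulate per-class vehicle counts over the whole video."""
    totals = {}
    for frame in extract_frames(video):
        for cls, n in count_by_class(detect_vehicles(frame)).items():
            totals[cls] = totals.get(cls, 0) + n
    return totals
```

In practice the counting stage would also need tracking to avoid counting the same vehicle once per frame; the sketch only shows how the stages compose.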



II. CLASSIFICATION AND RECOGNITION OF TRAFFIC VIDEO

A. Related Works

The classification and detection of moving objects in videos is a significant issue to resolve in building autonomous video surveillance. Several algorithms have been implemented to help resolve this issue, including methods for classifying and detecting objects in videos such as background subtraction. When choosing among the different methods, anyone designing their own traffic video analytics solution should consider which type of algorithm implementation best fits the task at hand.

B. Objects detection in video surveillance

Fast convolutional neural networks (FCNN) for object detection in video surveillance divide into two principal classes: single-stage detectors and two-stage detectors. Single-stage detectors are usually very fast and can predict objects' bounding boxes with classes within a single, simple network. Conventional examples of single-stage detectors are SSD [2] and YOLO [3]. These methods are particularly useful when the target objects occupy a considerable portion of the whole image. Some recent examples of these methods use the UA-DETRAC vehicle detection dataset [4]. This data was used by Dmitriy Anisimov and Tatiana Khanova [5] to construct an SSD-like detector that runs at 40 frames per second while maintaining favorable precision. Another application with a good speed-precision trade-off is based on version 2 of the YOLO method [6], specialized for vehicle detection via anchor clustering, a multi-layer feature fusion strategy, and loss normalization.

The R-CNN family of detectors is the most representative line of two-stage detectors, which currently occupy leading places on Cityscapes [7], COCO [8], and other benchmark datasets. In contrast to single-stage detectors, two-stage detectors first predict regions and then classify and refine each of them in the second stage. The first R-CNN paper [9] employed a simple approach: regions were generated with a selective search algorithm and then classified with a CNN. The overall speed of R-CNN is low due to the compute time of selective search and the need to execute the classifier for each region. Fast R-CNN [10] resolves this issue by avoiding a separate CNN execution for each region. Faster R-CNN [11] then replaces selective search with a CNN, called the region proposal network, which further boosted both the accuracy and the speed of the detector.

III. GOAL AND METHODOLOGY OF RESEARCH

Our main goal divides the issue into three sub-stages: first car detection, next car tracking, and finally car orientation estimation.

A. Dataset of detection the video surveillance

In this application, there are more than forty surveillance cameras; most of them are fixed above crossroads to give a global overview of the traffic situation on the boulevards of the city. Most cameras in this dataset record twenty-five frames per second at a resolution of 960×540. Moreover, the video streams are not of high quality, due first to hardware faults, then to blurring, and finally to compression artefacts. Figure 2 shows views from some of these cameras, and Figure 3 shows a frame from the dataset.
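Both detector families produce overlapping candidate boxes, and the final-detection step conventionally merges them with intersection-over-union (IoU) based non-maximum suppression. The paper does not give code for this, so the following is a standard, self-contained sketch of the technique.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop
    every box overlapping it by IoU >= thresh, repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep
```

The IoU threshold (0.5 here) trades duplicate suppression against losing genuinely adjacent vehicles, which matters in the crowded scenes this dataset contains.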

Figure 2: Example cameras from the UA-DETRAC dataset


Figure 3: A frame from the UA-DETRAC dataset

Our data resembles public benchmarks such as UA-DETRAC [12] or KITTI [13]. The dissimilarity comes from several aspects: first the number of instances in a single frame, next the high variations in scale, then the viewing angle, and finally the level of occlusion. The challenge of this dataset is that even a human eye is not always able to label every instance in the image; Figure 3 shows a frame from the UA-DETRAC dataset. In this paper, we concentrate on a single camera that monitors one of the many problematic crossroads, a deliberate choice made to maximize detection precision and obtain a minimum viable product. In this work, we annotated these data with polygons: more than sixty thousand instances across nine hundred eighty-two frames from the chosen camera. All annotation stages were carried out with the COCO Annotator tool [14]. Annotating video sequences is very difficult because it is time consuming and the scenes are crowded. We therefore concentrated first on varied traffic situations, weather conditions, and times of day. We also annotated every single vehicle that could be distinguished with high confidence, particularly in crowded traffic jams. Table 1 summarizes the distribution of the dataset.
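COCO Annotator stores each polygon instance as a COCO-format record. The sketch below shows the field layout; the ids, category, and polygon values are invented examples, and the area is computed with the shoelace formula (COCO derives its `area` field from the rasterized mask, which agrees closely for simple polygons).

```python
# Hedged sketch of one COCO-format instance annotation as produced by a
# tool like COCO Annotator. All concrete values are made-up examples.

def polygon_area(poly):
    """Area of a simple polygon given as a flat [x0, y0, x1, y1, ...] list,
    via the shoelace formula."""
    xs, ys = poly[0::2], poly[1::2]
    n = len(xs)
    s = 0.0
    for i in range(n):
        j = (i + 1) % n
        s += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(s) / 2.0

annotation = {
    "id": 1,                    # instance id (example value)
    "image_id": 42,             # the frame this instance belongs to
    "category_id": 3,           # index into the category list, e.g. "car"
    "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 40.0, 10.0, 40.0]],
    "bbox": [10.0, 10.0, 50.0, 30.0],   # [x, y, width, height]
    "iscrowd": 0,
}
annotation["area"] = polygon_area(annotation["segmentation"][0])
```

Keeping the annotations in this standard layout is what lets off-the-shelf tooling (loaders, evaluators) consume the dataset without conversion.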
TABLE 1: DISTRIBUTION OF DATASET

Type of vehicle    Number of instances    Mean instances per frame
Bus                1234                   1.26
Truck              2415                   2.46
Car                53083                  4.06
Trolleybus         611                    0.62
Tram               1298                   1298
Van                2783                   2.83
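A distribution like Table 1 can be tallied directly from the annotation records. The paper does not state over which frames the mean is taken, so the sketch below assumes the mean is over the frames in which the class actually appears; the annotation pairs are invented toy data.

```python
# Illustrative per-class distribution in the style of Table 1.
# Assumption: "mean instances per frame" averages over frames where the
# class occurs (the paper does not specify). Toy data, not the paper's.

from collections import Counter, defaultdict

def class_distribution(annotations):
    """annotations: iterable of (frame_id, class_name) pairs.
    Returns {class: (total_instances, mean_instances_per_frame_seen)}."""
    totals = Counter()
    frames_seen = defaultdict(set)
    for frame_id, cls in annotations:
        totals[cls] += 1
        frames_seen[cls].add(frame_id)
    return {cls: (totals[cls], totals[cls] / len(frames_seen[cls]))
            for cls in totals}
```

For example, three "car" instances spread over two frames give a mean of 1.5 cars per frame seen.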

B. Faster R-CNN detector for the detection module

The base of our detection architecture is the widely adopted two-stage Faster R-CNN [15] detector. There are two main reasons for this choice. First, two-stage detectors are the current state of the art on many detection benchmarks [16-18] and are a good starting point for most detection tasks. Second, future work may require complex multi-task learning, and extending the Faster R-CNN model with additional stages is as easy as adding a new prediction head.

We also improved the Faster R-CNN baseline quality with a feature pyramid network (FPN) backbone [19], an auxiliary mask branch [20], anchor shape optimization, the focal loss [21], and adaptive feature pooling [22].
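Of the refinements listed, the focal loss [21] has a compact closed form: for a predicted probability p_t of the true class, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), which reduces to ordinary cross-entropy when gamma = 0. A minimal per-example sketch:

```python
import math

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for a single example, where p_true is the predicted
    probability of the true class. The (1 - p_true)**gamma factor
    down-weights easy, well-classified examples; gamma=0, alpha=1
    recovers ordinary cross-entropy."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)
```

This is why the loss helps dense detection: the many easy background anchors contribute almost nothing, so training focuses on the hard, rare vehicle examples.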
C. Mask branch of R-CNN

As shown in the Mask R-CNN work [23], supplementary regression of per-instance masks leads to better accuracy. Thus, the first optimization we applied is an additional mask branch. This branch executes in parallel with the existing Faster R-CNN branches and aims to regress a binary mask for every region of interest.

Table 3: Comparison of YOLO and the proposed method

Detection Algorithm    Recall    Precision
Proposed method        0.80      0.79
YOLO                   0.78      0.77
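The Recall and Precision columns reported in Tables 2 and 3 follow the standard definitions from true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch with illustrative counts:

```python
def precision_recall(tp, fp, fn):
    """Standard detection metrics from raw counts.
    precision = TP / (TP + FP); recall = TP / (TP + FN).
    Returns (0.0, 0.0) components when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, 80 true positives with 20 false positives and 20 false negatives give a precision and recall of 0.8 each.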
D. Optimization of anchors shape

This improvement comes from the simple observation that the Faster R-CNN anchor parameterization was designed for the COCO dataset [20], but our dataset is different.

IV. EXPERIMENTATION

A. Numerical Result and experimentations

The experiments were carried out on an HP EliteBook x460 1130 G5 running Windows 10, in a Python 3.8 environment, with an Intel Core i5-8400 2.80 GHz CPU and an Nvidia GeForce RTX 2160 GPU with eight gigabytes of memory, employing Darknet-53 as the convolutional neural network backbone for YOLOv3 object detection in this approach. The parameters of the algorithm start with a learning rate of 0.0001, after which we adopt a step learning strategy. For the optimization, we adjust the parameters of the architecture using Stochastic Gradient Descent (SGD) with a momentum of 0.9. In all experiments, to avoid overfitting the model, we used a weight decay of 0.0005. We train for 40,000 batches with the same top-level parameters, and we reduce the learning rate to 1/10 of the initial value at the 36,000th and 46,000th batches.

B. Comparison of the results before and after detection with YOLO

As shown in Table 2, recall and precision are much higher with video corruption augmentation than with no data augmentation. At the cost of increased computation, we added an additional layer at the output of the model; of the numerical results, recall and precision are the most relevant to the sensitivity calculation. As shown in Table 3, recall and precision are much higher than those of YOLO.

Table 2: Numerical results with video corruption augmentation and with no data augmentation

Augmentation of data         True Positive   False Positive   False Negative   Recall   Precision
Video Corruption             3254            623              624              0.76     0.71
Means No data Augmentation   1423            612              516              0.74     0.69

V. CONCLUSION

In this paper, we focused on the issue of traffic analysis of video surveillance using Faster-RCNN. The task was expressed as classifying and counting vehicles in traffic. To address it, we built on Faster R-CNN and vehicle detectors. The performance of the Faster R-CNN was improved by several refinements such as adaptive feature pooling, anchor optimization, focal loss, and an additional mask branch. To train the proposed model, we used the UA-DETRAC dataset of 10 hours of video at 25 frames per second (fps) with a resolution of 960×540 pixels; the dataset has more than 140k frames, and 8,250 vehicles were manually annotated with DarkLabel.

REFERENCES

[1] Zhang W, et al. Table of contents. IEEE Conference on Computer Vision and Pattern Recognition, vol. 2017-Jan, 2017.
[2] Liu W, et al. SSD: single shot multibox detector. In: Lecture Notes in Computer Science, pp. 21-37, 2016.
[3] Redmon J, et al. You only look once: unified, real-time object detection. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-88.
[4] Wen L, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. 2015.
[5] Anisimov D, et al. Towards lightweight convolutional neural networks for object detection. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2017, pp. 1-8.
[6] Sang J, et al. An improved YOLOv2 for vehicle detection. Sensors. 2018;18(12):4272.
[7] Cordts M, et al. The Cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213-23.
[8] Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. Microsoft COCO: common objects in context. In: Lecture Notes in Computer Science, vol. 8693 LNCS, 2014, pp. 740-55.
[9] Girshick R, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[10] Girshick R, et al. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440-8.
[11] Ren S, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2017;39(6):1137-49.
[12] Wen L, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. 2015.
[13] Geiger A, et al. Vision meets robotics: the KITTI dataset. Int J Rob Res. 2013;32(11):1231-7.
[14] Brooks J, et al. COCO Annotator. 2019, https://github.com/jsbroks/coco-annotator/.
[15] Redmon J, et al. YOLO9000: better, faster, stronger. IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6517-25.
[16] Cordts M, et al. The Cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213-23.
[17] Lin TY, et al. Microsoft COCO: common objects in context. In: Lecture Notes in Computer Science, vol. 8693 LNCS, 2014, pp. 740-55.
[18] Wen L, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. 2015.
[19] Lin TY, et al. Feature pyramid networks for object detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2017 Jan, 2017, pp. 936-44.
[20] He K, et al. Mask R-CNN. IEEE International Conference on Computer Vision (ICCV), vol. 2017 Oct, 2017, pp. 2980-8.
[21] Wang X, et al. Focal loss dense detector for vehicle surveillance. In: 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), vol. 2018 May, 2018, pp. 1-5.
[22] Liu S, et al. Path aggregation network for instance segmentation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759-68.
[23] He K, et al. Mask R-CNN. International Conference on Computer Vision (ICCV), vol. 2017 Oct, 2017, pp. 2980-8.
