Traffic Analysis of Video Surveillance Using Faster R-CNN
Imad EL MALLAHI
dept. of computer sciences
Sidi Mohamed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz
Fez, Morocco
imade.elmallahi@usmba.ac.ma

Adnane Mohammed Mahrez
dept. of computer sciences
Sidi Mohamed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz
Fez, Morocco
adnane.mehrez@usmba.ac.ma

Mohamed Adnane Mahraz
dept. of computer sciences
Sidi Mohamed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz
Fez, Morocco
adnane_1@yahoo.fr

Hamid TAIRI
dept. of computer sciences
Sidi Mohamed Ben Abdellah University, Faculty of Sciences Dhar El Mahraz
Fez, Morocco
hamid.tairi@usmba.ac.ma
Abstract— This work addresses the issue of traffic analysis in video surveillance using Faster R-CNN. The challenge is expressed as classifying and counting vehicles in traffic. To address this problem, we built on related work on Faster R-CNN and vehicle detectors for classification in live traffic movement. The performance of the Faster R-CNN baseline was improved through several refinements: adaptive feature pooling, anchor optimization, the focal loss, and an additional mask branch. To train the proposed model, we used the UA-DETRAC dataset: 10 hours of video at 25 frames per second (fps) with a resolution of 960×540 pixels, containing more than 140k frames and 8,250 vehicles manually annotated with DarkLabel.

Keywords— Video surveillance, Traffic estimation, Faster R-CNN, Traffic detection, Convolutional neural network, Surveillance vehicle.

I. INTRODUCTION

Today, most cities in the world operate many video surveillance systems. These systems are growing fast and now include heterogeneous cameras with various resolutions [1]. Closed-circuit television runs at all times of the week and generates a huge amount of video data. Moreover, this video data can serve as the foundation for an automated vehicle surveillance system. There are many issues when working with big data in traffic surveillance: realizing an intelligent traffic surveillance system requires an efficient storage system for saving, seeking back and forth through, and analyzing videos. In this work, we focus on analyzing videos for traffic surveillance; research in this area remains limited in terms of real-time data analysis.

Some representative papers in this theme used heterogeneous low-resolution data to estimate traffic density and count vehicles. Many efforts have been made to improve vehicle analysis, counting, classification, and detection, with satisfactory results on specific tasks. Most recent studies focus on adapting and improving existing vehicle-detection frameworks such as region-based convolutional neural networks and the SSD and YOLO detectors. This includes new contributions at the architectural level and improvements in speed, classification, and vehicle tracking. In this study, we focus on the gap between recent works and their applications in the real world. We adopt the Faster R-CNN model and improve the baseline performance sequentially: first via an auxiliary mask branch, next by optimizing the anchors, and then through the focal loss and adaptive feature pooling. The proposed system can be used to analyze and count traffic movement; Figure 1 presents the flowchart of the proposed video surveillance pipeline using Faster R-CNN. The experimental results are intended to help city officials improve the overall throughput of intersections.
Fig. 1. Flowchart of the proposed video surveillance pipeline using Faster R-CNN: input video → frame extraction → convolutional layers → feature maps → region proposal network → region-of-interest pooling → classification and vehicle counting → final detection.
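The final classification-and-counting stage of this pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the detection tuples, the class names, and the 0.5 confidence threshold are assumptions made for the example.

```python
from collections import Counter

def count_vehicles(detections, score_threshold=0.5):
    """Count detected vehicles per class, keeping only confident detections.

    `detections` is a list of (class_name, confidence_score, bounding_box)
    tuples, as a final detection stage would produce for one frame.
    """
    counts = Counter()
    for class_name, score, _box in detections:
        if score >= score_threshold:
            counts[class_name] += 1
    return dict(counts)

# Hypothetical detections for one frame: (class, score, (x1, y1, x2, y2)).
frame_detections = [
    ("car", 0.92, (10, 20, 110, 80)),
    ("car", 0.87, (150, 30, 260, 95)),
    ("bus", 0.78, (300, 10, 520, 140)),
    ("car", 0.31, (400, 200, 450, 230)),  # below threshold, ignored
]

print(count_vehicles(frame_detections))  # {'car': 2, 'bus': 1}
```

Per-frame counts like these can then be aggregated over time to estimate traffic flow at an intersection.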
B. Faster R-CNN detector for the detection module

The base of our detection architecture is the widely adopted two-stage Faster R-CNN detector [15]. There are two main reasons for this choice. First, two-stage detectors represent the current state of the art on many detection benchmarks [16-18] and provide a good starting point for any detection task. Second, future work may require complex multi-task learning, and extending the Faster R-CNN model with additional stages is as simple as adding a new prediction head.

We also improved the Faster R-CNN baseline quality with a feature pyramid network (FPN) backbone [19], an additional mask branch [20], anchor shape optimization, the focal loss [21], and adaptive feature pooling [22].
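The focal loss mentioned above can be sketched directly from its standard formulation, FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The α = 0.25 and γ = 2 defaults below are the commonly used values from the focal loss literature, not values reported in this paper.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class; y: label in {0, 1}.
    The (1 - p_t)**gamma factor down-weights easy, well-classified
    examples so training focuses on hard ones.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less loss than a hard one.
easy = focal_loss(0.95, 1)   # well-classified positive
hard = focal_loss(0.30, 1)   # misclassified positive
print(easy < hard)  # True
```

With γ = 0 and α = 1 the expression reduces to the ordinary cross-entropy, which is why the focal loss is often described as a generalization of it.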
C. Mask branch of R-CNN

As shown in the Mask R-CNN work [23], supplementary regression of per-instance masks leads to better accuracy. Thus, the first optimization we applied is an additional mask branch. This branch runs in parallel with the existing Faster R-CNN branches and aims to regress a binary mask for every region of interest.

Table 3: Comparison of YOLO and the proposed method

Detection Algorithm    Recall    Precision
Proposed method        0.80      0.79
YOLO                   0.78      0.77
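The training target of such a mask branch, a per-pixel binary cross-entropy over each region of interest (as in Mask R-CNN), can be illustrated as follows. The tiny 2×2 mask and the predicted probabilities are made-up values for the example, not data from this work.

```python
import math

def mask_loss(pred_probs, gt_mask):
    """Average binary cross-entropy between a predicted per-ROI mask
    (foreground probabilities in (0, 1)) and the ground-truth binary mask."""
    total, n = 0.0, 0
    for pred_row, gt_row in zip(pred_probs, gt_mask):
        for p, y in zip(pred_row, gt_row):
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            n += 1
    return total / n

# Hypothetical 2x2 ROI: predicted foreground probabilities vs. binary truth.
pred = [[0.9, 0.2],
        [0.8, 0.1]]
gt   = [[1,   0],
        [1,   0]]
print(round(mask_loss(pred, gt), 4))  # 0.1643
```

The closer the predicted probabilities are to the binary ground truth, the smaller this loss, which is what drives the branch to regress a clean binary mask per region of interest.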
D. Optimization of anchor shapes

This improvement stems from the simple observation that the default Faster R-CNN anchor parameterization was designed for the COCO dataset [20], whereas our dataset is different.

IV. EXPERIMENTATION

A. Numerical results and experimentation

The experiments were carried out on an HP EliteBook x460 1130 G5 running Windows 10, in a Python 3.8 environment, with an Intel Core i5-8400 2.80 GHz CPU and an Nvidia GeForce RTX 2160 GPU with eight gigabytes of memory, employing Darknet-53 as the convolutional neural network backbone for YOLOv3 object detection in this approach. The parameters of the algorithm were chosen as follows: the learning rate starts at 0.0001, and we then adopt a step learning-rate strategy. For optimization, we adjust the parameters of the architecture using Stochastic Gradient Descent (SGD) with a momentum of 0.9. In all experiments, to avoid overfitting, we use a weight decay of 0.0005. Moreover, we train for 40,000 batches with the same top-level parameters, and we reduce the learning rate to 1/10 of the initial value at the 36,000th and 46,000th batches.

B. Comparison of results before and after detection with YOLO

As shown in Table 2, the recall and precision with video corruption are much higher than with no data augmentation. At the cost of increased computation, we added an additional layer at the output of the model; the numerical results, made up of recall and precision, are of most interest for the sensitivity calculation. As Table 3 shows, the recall and precision of the proposed method are higher than those of YOLO.

Table 2: Numerical results with video corruption and with no data augmentation

Augmentation of data    True Positive    False Positive    False Negative    Recall    Precision
Video Corruption        3254             623               624               0.76      0.71
No data Augmentation    1423             612               516               0.74      0.69

V. CONCLUSION

In this paper, we addressed the issue of traffic analysis in video surveillance using Faster R-CNN. We expressed the problem as classifying and counting vehicles in traffic. To address it, we built on related work on Faster R-CNN and vehicle detectors. The performance of the Faster R-CNN baseline was improved through several refinements: adaptive feature pooling, anchor optimization, the focal loss, and an additional mask branch. To train the proposed model, we used the UA-DETRAC dataset: 10 hours of video at 25 frames per second (fps) with a resolution of 960×540 pixels, containing more than 140k frames and 8,250 vehicles manually annotated with DarkLabel.

REFERENCES

[1] Zhang W, et al. Table of contents. IEEE conference on computer vision and pattern recognition. vol. 2017-Jan, 2017.
[2] Liu W, et al. SSD: single shot multibox detector. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). pp. 21–37, 2016.
[3] Redmon J, et al. You only look once: unified, real-time object detection. IEEE conference on computer vision and pattern recognition. 2016, p. 779–88.
[4] Wen L, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. 2015.
[5] Anisimov D, et al. Towards lightweight convolutional neural networks for object detection. In: 2017 14th IEEE international conference on advanced video and signal based surveillance (AVSS). 2017, p. 1–8.
[6] Sang J, et al. An improved YOLOv2 for vehicle detection. Sensors. 2018;18(12):4272.
[7] Cordts M, et al. The cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). 2016, p. 3213–23.
[8] Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. Microsoft COCO: common objects in context. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). vol. 8693 LNCS, 2014, p. 740–55.
[9] Girshick R, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2012.
[10] Girshick R, et al. Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision. 2015, p. 1440–8.
[11] Ren S, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2017;39(6):1137–49.
[12] Wen L, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. 2015.
[13] Geiger A, et al. Vision meets robotics: the KITTI dataset. Int J Rob Res. 2013;32(11):1231–7.
[14] Brooks J, et al. COCO annotator. 2019, https://github.com/jsbroks/coco-annotator/.
[15] Redmon J, et al. YOLO9000: better, faster, stronger. IEEE conference on computer vision and pattern recognition. 2017, p. 6517–25.
[16] Cordts M, et al. The cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). 2016, p. 3213–23.
[17] Lin TY, et al. Microsoft COCO: common objects in context. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). vol. 8693 LNCS, 2014, p. 740–55.
[18] Wen L, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. 2015.
[19] Lin TY, et al. Feature pyramid networks for object detection. IEEE conference on computer vision and pattern recognition (CVPR). vol. 2017-Jan, 2017, p. 936–44.
[20] He K, et al. Mask R-CNN. IEEE international conference on computer vision (ICCV). vol. 2017-Oct, 2017, p. 2980–8.
[21] Wang X, et al. Focal loss dense detector for vehicle surveillance. In: 2018 international conference on intelligent systems and computer vision (ISCV). vol. 2018-May, 2018, p. 1–5.
[22] Liu S, et al. Path aggregation network for instance segmentation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition. 2018, p. 8759–68.
[23] He K, et al. Mask R-CNN. International conference on computer vision (ICCV). vol. 2017-Oct, 2017, p. 2980–8.