
Recognition of the Rearview Image Pattern of a Motorcycle for a System to Detect Current Violations on a One-Way Street

1st Isma Warniati Rahman
Department of Electrical Engineering, Hasanuddin University, Makassar, Indonesia
rahmaniw17d@student.unhas.ac.id

2nd Zahir Zainuddin
Department of Informatics Engineering, Hasanuddin University, Makassar, Indonesia
zahir@unhas.ac.id

3rd Syafruddin Syarif
Department of Electrical Engineering, Hasanuddin University, Makassar, Indonesia
syafruddin.s@eng.unhas.ac.id

4th Andani Achmad
Department of Electrical Engineering, Hasanuddin University, Makassar, Indonesia
andani@unhas.ac.id

Abstract—This study aims to detect traffic violations, especially motorcycles that go against the flow on a one-way street, by recognizing the pattern of rearview images of vehicles. The method used is Faster R-CNN. The system uses an IP camera on a pole 2.5 m high with a tilt angle of 45 degrees. The rear view of the vehicle is labeled using LabelImg, and TensorFlow with the Faster R-CNN model is used for classification. The resulting system accuracy is 88%.

Keywords—rearview motorcycle recognition, motorcycle violation, faster R-CNN, vehicle detection.

I. INTRODUCTION

The use of motorcycles, especially in urban areas, has grown rapidly, but road development has not kept pace. This triggers traffic congestion, resulting in disrupted travel comfort, fatigue, and wasted time that is detrimental to road users. Good transportation means travel that is fast enough, free of jams, safe (free from possible accidents), and comfortable.

One of the methods used to break such bottlenecks is the use of one-way roads, but new problems arise from road users who violate the one-way rules, for example by going against the flow. This endangers road users and can trigger traffic jams and even accidents.

Based on an article about the Traffic Law, 1,547 vehicles going against the flow were caught in a raid during Operation Zebra on October 23, 2019, in the West Jakarta area [1]. Law No. 22 of 2009 concerning Traffic and Road Transportation (LLAJ), article 287 paragraph 1, explains that motorists who do not comply with road markings will be sanctioned with a maximum of 2 months in prison or a maximum fine of 500,000 rupiah [2]. The lack of direct supervision on one-way streets makes people dare to go against the direction on the grounds that the distance traveled is shorter than by following the appropriate road. Therefore, we need a system that can monitor one-way roads 24 hours a day, so that drivers who go against the flow can be detected and sanctioned, creating a deterrent effect.

There have been various studies on motor vehicle traffic using rapidly developing technology, one of which is computer vision. In 2017, Zhang et al. conducted a study of deep learning applications for monitoring traffic flow using drone technology. The study uses a deep learning framework based on Faster R-CNN to train programs to detect vehicles in video, with an accuracy of 90% [3]. In 2017, Xiaozhu et al. used Faster R-CNN with the ZF model to detect armored vehicles, achieving for the tank category a 97.20% detection rate and an 89.60% recognition rate, and for the wheeled infantry combat vehicle category a 96.70% detection rate and an 88.90% recognition rate [4]. Azam et al. researched object pose estimation; Faster R-CNN was used to estimate vehicle poses for intelligent transportation and automatic driving, using the Stanford cars data set with an accuracy of 99.35% [5].

In 2019, Lahinta et al. conducted a study on counting the number of vehicles at a crossroads and classifying the density level of each road segment, using a Gaussian Mixture Model (GMM) and Morphological Operations (MO) to detect vehicle objects with an accuracy of 90.9% [6]. Research on motor vehicles was also carried out by Jaya et al., who tracked vehicle tax violations by identifying the type and number plate of the vehicle. The algorithm used a Gaussian Mixture Model (GMM) to detect vehicle types and Automatic Number Plate Recognition (ANPR) to detect vehicle plates, with an accuracy of 91% for vehicle types and 70% for plate detection [7].

There is much research on motorized vehicle traffic, but so far, to the best of the authors' knowledge, there has been no research on vehicles that go against the flow on one-way streets. The system currently used to detect such violations is still manual and is not applied at all times, so many motorcycle riders who go against the flow go unrecorded by the authorities.
Therefore, this study aims to detect vehicles going against the current on one-way roads, especially two-wheeled vehicles, by recognizing the rearview pattern of the vehicle using the Faster R-CNN model. This research uses a camera so that it will be more efficient and reduce human work.

II. PROPOSED METHOD

In this study, the authors propose identification of the motorcycle rear view, which can be used to detect motorcycles that are going against the current. The Faster R-CNN model is used to identify the rear view of the motorcycle. Pictures are taken in the form of videos using a Yoosee YYP2P IP camera with 2 MP quality, and the video data is used as input. Data was taken using a pole 2.5 meters high with a camera angle of 45 degrees. An illustration of the proposed system can be seen in Figure 1.

Fig. 1. Illustration of the system

The camera is mounted on the left side of the road. The camera takes pictures of motorcycles passing by on the road and identifies whether a vehicle is going against the current by detecting the rear view of the motorcycle. If the camera detects the rear view of a motorcycle, the system immediately flags a violation of motorcycle traffic against the flow.

A. Rearview Motorcycle Identification

1. Faster-RCNN

The authors use the Faster R-CNN method (a deep learning framework based on Caffe) to detect the rear view of the motorcycle in the video. The identification of the rear view of the motorcycle is obtained by analyzing the object detection results from several consecutive frames. A more specific implementation process is shown in Figure 2.

Fig. 2. The implementation flowchart of motorcycle detection

The deep learning method is divided into two main steps, namely the training process and the detection process. Figure 3 shows frames taken from the video in the training process.

Fig. 3. Two frames from the video captured by the IP camera

The process carried out to identify the rear view of the vehicle is as follows:

1) Feature Extraction: A rearview image of the vehicle is entered into the convolutional neural network to extract features.

2) Region Proposal: The resulting feature map is entered into the RPN, which takes the feature map as input and outputs rectangular object proposals.

3) Bounding Box: The top n region proposals produced by the RPN are entered into the network to produce classification scores and refine the bounding boxes.

4) Result: The results of the program that detects traffic violations against the flow, using the rearview parameter, are displayed.

Faster R-CNN is a development of Fast R-CNN, developed by Shaoqing Ren assisted by Girshick (a pioneer of R-CNN and Fast R-CNN), with the addition of the Region Proposal Network (RPN). Faster R-CNN consists of two modules: a deep convolutional network that proposes regions, and a Fast R-CNN detector that uses the proposed regions [8]. The whole system is one integrated network for detecting objects; the RPN module tells the Fast R-CNN module where to look. Faster R-CNN uses the high-quality region proposals provided by the RPN for target recognition, which increases the speed of detecting objects [9]. The input to Faster R-CNN is an image, and the output is the set of detected objects, marked with bounding boxes [10].
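Step 3 of the pipeline above passes only the top n proposals, ranked by the RPN's objectness score, to the detection network. A minimal sketch of that selection in plain Python (the proposal list and n are illustrative, not from this study):

```python
# Each proposal: (x1, y1, x2, y2, objectness_score), as an RPN would emit.
proposals = [
    (10, 10, 50, 90, 0.30),
    (12, 8, 55, 95, 0.92),
    (200, 40, 260, 120, 0.75),
    (205, 42, 258, 118, 0.55),
]

def top_n_proposals(proposals, n):
    """Sort proposals by objectness score (descending) and keep the first n."""
    ranked = sorted(proposals, key=lambda p: p[4], reverse=True)
    return ranked[:n]

best = top_n_proposals(proposals, n=2)
print([p[4] for p in best])  # the two highest objectness scores
```

In the full detector, this ranking is typically combined with non-maximum suppression so that near-duplicate boxes do not crowd out other objects.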
The architecture of Faster R-CNN is shown in Figure 4.

Fig. 4. Faster-RCNN Architecture

a. Region Proposal Network (RPN)

The RPN is a fully convolutional network that can simultaneously predict target bounding boxes and objectness scores at each position of the input image [11]. The RPN produces several bounding boxes, each with two probability scores for whether or not an object is present at that location. The RPN is used to produce high-quality region proposal boxes, to share convolutional features across the whole image with the detection network, and to solve the speed problems of the original Selective Search, which enhances the object detection effect [9]. Using the RPN to replace Selective Search reduces the computing requirements significantly and makes the whole model trainable end-to-end. Faster R-CNN also achieves faster and more accurate performance than Fast R-CNN. The structure of the RPN is shown in Figure 5.

Fig. 5. Anchors of RPN

b. Anchors

Anchors play an important role in Faster R-CNN. An anchor is a box; in the default configuration, Faster R-CNN places 9 anchors at each image position. At each sliding-window location, the network simultaneously predicts several region proposals, where the maximum number of proposals per location is denoted k. The reg layer therefore has 4k outputs, encoding the coordinates of the k boxes, and the cls layer produces 2k scores, estimating the probability of object versus not-object for each proposal [8].

An example showing nine anchors can be seen in Figure 6. The three anchor colors represent three scales: 128x128, 256x256, and 512x512, with each scale using width-to-height ratios of 1:1, 1:2, and 2:1. These anchors work well for the Pascal VOC and COCO datasets.

Fig. 6. Example Anchors (320,320)

c. Feature Extraction

The original feature extraction contains several Conv + ReLU layers, and the first convolution structures are obtained through transfer learning from networks such as AlexNet, VGG-16, GoogLeNet, etc. After reaching the FC level, additional Conv + ReLU layers are added to the feature output. Nine most-likely candidate windows are considered at each image position; these are called anchors [12].

d. Bounding Box Regression

Bounding box regression is a popular technique for refining or predicting localization boxes in recent object detection approaches. The object proposal bounding boxes are fed into the ROI (Region of Interest) pooling layer, a max pooling layer that divides the input into patches so that the output is always 7x7 in size. The class score layer then gives the probability of each class for the object, and the bounding box regression layer provides a bounding box around the detected object [10].
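The nine anchor shapes described above (scales 128, 256, and 512 with aspect ratios 1:1, 1:2, and 2:1, as in Figure 6) can be generated with the usual area-preserving construction. A sketch, not the authors' code:

```python
import math

SCALES = [128, 256, 512]   # anchor sizes per the text (square root of the anchor area)
RATIOS = [1.0, 2.0, 0.5]   # height/width ratios: 1:1, 2:1, 1:2

def make_anchors(cx, cy, scales=SCALES, ratios=RATIOS):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy).

    For scale s and ratio r = h/w, width = s/sqrt(r) and height = s*sqrt(r),
    so every anchor keeps the same area s*s regardless of its ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(320, 320)  # nine anchors at image position (320, 320), as in Fig. 6
print(len(anchors))  # 9
```

At every sliding-window position the RPN scores each of these k = 9 boxes, which is where the 2k cls outputs and 4k reg outputs come from.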

e. Loss Function

The loss function describes the loss associated with all the predictions generated by a model; it comes into play when the learning model makes errors that must be accounted for, and a good loss function yields a low error value. Following the definition of multi-task loss [11], the objective function to be minimized is defined in equation (1):

$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$   (1)

where $i$ is the index of an anchor in a mini-batch and $p_i$ is the predicted probability that anchor $i$ is an object. The ground-truth label $p_i^*$ is 1 if the anchor is positive and 0 if the anchor is negative. $t_i$ is a vector of the 4 coordinate parameters of the predicted bounding box, and $t_i^*$ is that of the ground-truth box associated with a positive anchor.

The classification loss $L_{cls}$ is the log loss over two classes, object versus not object. For the regression loss, $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$ is used, where $R$ is a robust loss function (smooth L1) defined in [13]. The term $p_i^* L_{reg}$ means that the regression loss is enabled only for positive anchors ($p_i^* = 1$) and disabled otherwise ($p_i^* = 0$).

The four coordinate parameters of the bounding box prediction are defined in equations (2)-(5) [11]:

$t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a$   (2)

$t_w = \log(w/w_a), \quad t_h = \log(h/h_a)$   (3)

$t_x^* = (x^* - x_a)/w_a, \quad t_y^* = (y^* - y_a)/h_a$   (4)

$t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)$   (5)

where $x$ and $y$ denote the center coordinates of the box and $w$ and $h$ its width and height. The variables $x$, $x_a$, and $x^*$ stand for the prediction box, the anchor box, and the ground-truth box respectively (and likewise for $y$, $w$, $h$). This can be viewed as bounding-box regression from an anchor box to the nearest ground-truth box [11]. The loss graph is shown in Figure 7.

Fig. 7. Losses Graph

III. EXPERIMENTS AND RESULTS

First, we used 510 rearview images of vehicles, divided into training and test data with a ratio of 80% training data (408 images) to 20% test data (102 images). The labeling process is then carried out on each image using LabelImg; in this process, each rearview image of a vehicle is labeled "Breaking". This labeling process produces a Pascal VOC data set.

After the image labeling is complete, the XML files obtained from labeling are converted to CSV and then processed into a data set that TensorFlow can read, in TFRecord form. After that, the config file used to configure the training model is arranged; this configuration uses the Faster-RCNN Inception V2 (Pets) model provided by TensorFlow. The training process then takes about 13 hours over 51,000 steps. The trained model is exported as a frozen inference graph so that it can be used to predict the given object, in this case the rear view of the vehicle. After all processes are complete, objects can be detected: vehicles going against the flow are marked by bounding boxes around the offending vehicle. The results obtained from our research can be seen in Table I, which shows the accuracy of the identification of the rear view of the vehicle.

TABLE I. SYSTEM ACCURACY

Scenario | Actual V | Actual NV | System V | System NV | Total | Accuracy (%)
   1     |    1     |     9     |    2     |     8     |  10   |     90
   2     |    2     |     8     |    3     |     7     |  10   |     90
   3     |    3     |     7     |    3     |     7     |  10   |    100
   4     |    4     |     6     |    6     |     4     |  10   |     80
   5     |    5     |     5     |    7     |     3     |  10   |     80
Average Accuracy: 88
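The box parameterization of equations (2)-(5) maps a box's center coordinates and size (x, y, w, h) to offsets (t_x, t_y, t_w, t_h) relative to an anchor (x_a, y_a, w_a, h_a), and inverting the mapping recovers the box. A round-trip sketch with illustrative numbers (not values from this study):

```python
import math

def encode(box, anchor):
    """Equations (2)-(3): both box and anchor are (x, y, w, h) with center coords."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Invert the parameterization to recover (x, y, w, h) from offsets t."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))

anchor = (160.0, 160.0, 128.0, 128.0)   # an anchor box
gt_box = (150.0, 170.0, 96.0, 180.0)    # a ground-truth box
t = encode(gt_box, anchor)              # the regression target t* of equations (4)-(5)
recovered = decode(t, anchor)           # should match gt_box up to float rounding
```

Applying `encode` to the ground-truth box gives the target $t^*$ that the reg layer is trained to predict; at inference time, `decode` applied to the predicted offsets yields the final bounding box.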

In this study, the testing process was carried out using ten motorcycles in five different scenarios, as shown in Table I, where V is a motorcycle going against the current and NV is a motorcycle not going against the current. The results of testing the system in the five scenarios are as follows:

• In the first scenario, one motorcycle was detected violating, eight motorcycles were detected not violating, and one motorcycle that did not violate was detected as violating by the system; the accuracy obtained in this scenario is 90%.
• In the second scenario, two motorbikes were detected violating, seven were detected not violating, and one motorcycle that was on the right track was detected as violating by the system. The resulting accuracy is 90%.

• In the third scenario, the actual data and the system's output agree, so the accuracy obtained is 100%. The system detected three vehicles that violated and seven vehicles that did not.

• In the fourth scenario, the system detected six violating vehicles, of which two were false positives, and four non-violating vehicles; the accuracy of the fourth scenario is 80%.

• In the fifth scenario, seven vehicles were detected violating, of which two were false positives, and three vehicles were detected not violating; the resulting accuracy is 80%.

The average accuracy obtained from the five test scenarios above is 88%. The experimental result is shown in Figure 8.
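The per-scenario accuracies and the 88% average follow directly from the counts above (ten motorcycles per scenario; accuracy = correctly classified / total). A quick check:

```python
# Misclassified motorcycles per scenario, out of 10 each (from the test results above).
errors = [1, 1, 0, 2, 2]
total = 10

accuracies = [100 * (total - e) / total for e in errors]
average = sum(accuracies) / len(accuracies)
print(accuracies, average)  # [90.0, 90.0, 100.0, 80.0, 80.0] 88.0
```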
Fig. 8. Experimental Result

CONCLUSION

This study uses real-time data on the streets. Data was taken using a Yoosee YYP2P camera with 2 MP quality, placed on a pole 2.5 m high with a 45-degree camera angle. The parameter used is the rear view of the vehicle, in this case a two-wheeled vehicle (motorbike). The program was built using a deep learning method utilizing TensorFlow, with the classification process using Faster R-CNN. Comparing manual observation with the system, the accuracy obtained is 88%.

REFERENCES

[1] K. C. Media, "6.686 Kendaraan Terjaring Operasi Zebra Hari Pertama, Paling Banyak karena Lawan Arus," KOMPAS.com. https://megapolitan.kompas.com/read/2019/10/24/13121461/6686-kendaraan-terjaring-operasi-zebra-hari-pertama-paling-banyak-karena (accessed Nov. 24, 2019).

[2] "Undang Undang." http://hubdat.dephub.go.id/uu/288-uu-nomor-22-tahun-2009-tentang-lalu-lintas-dan-angkutan-jalan (accessed Nov. 24, 2019).

[3] J.-S. Zhang, J. Cao, and B. Mao, "Application of deep learning and unmanned aerial vehicle technology in traffic flow monitoring," in 2017 International Conference on Machine Learning and Cybernetics (ICMLC), Ningbo, China, Jul. 2017, pp. 189–194, doi: 10.1109/ICMLC.2017.8107763.

[4] X. Xiaozhu and H. Cheng, "Object Detection of Armored Vehicles Based on Deep Learning in Battlefield Environment," in 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, Jul. 2017, pp. 1568–1570, doi: 10.1109/ICISCE.2017.327.

[5] S. Azam, A. Rafique, and M. Jeon, "Vehicle pose detection using region based convolutional neural network," in 2016 International Conference on Control, Automation and Information Sciences (ICCAIS), Ansan, South Korea, Oct. 2016, pp. 194–198, doi: 10.1109/ICCAIS.2016.7822459.

[6] F. Lahinta, Z. Zainuddin, and S. Syarif, "Vehicle Detection and Counting to Identify Traffic Density in The Intersection of Road Using Image Processing," presented at the 1st International Conference on Science and Technology, ICOST 2019, 2-3 May, Makassar, Indonesia, 2019, doi: 10.4108/eai.2-5-2019.2284706.

[7] A. Jaya, Z. Zainuddin, and S. Syarif, "Tracking of Vehicle Tax Violations Using Vehicle Type and Plate Number Identification," presented at the 1st International Conference on Science and Technology, ICOST 2019, 2-3 May, Makassar, Indonesia, 2019, doi: 10.4108/eai.2-5-2019.2284610.

[8] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017, doi: 10.1109/TPAMI.2016.2577031.

[9] J. Zou and R. Song, "Microarray camera image segmentation with Faster-RCNN," in 2018 IEEE International Conference on Applied System Invention (ICASI), Chiba, Apr. 2018, pp. 86–89, doi: 10.1109/ICASI.2018.8394403.

[10] B. T. Nalla, T. Sharma, N. K. Verma, and S. R. Sahoo, "Image Dehazing for Object Recognition using Faster RCNN," in 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Jul. 2018, pp. 01–07, doi: 10.1109/IJCNN.2018.8489280.

[11] X. Wang and Q. Zhang, "The Building Area Recognition in Image Based on Faster-RCNN," in 2018 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), Xi'an, China, Aug. 2018, pp. 676–680, doi: 10.1109/SDPC.2018.8664773.

[12] J. Zhuang, L. Liu, K. Tang, and J. Li, "Faster RCNN for Printing Nozzle Detection in Complex Scene," in 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, Aug. 2018, pp. 230–234, doi: 10.1109/IHMSC.2018.00060.

[13] R. Girshick, "Fast R-CNN," in 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, Dec. 2015, pp. 1440–1448, doi: 10.1109/ICCV.2015.169.
