B. Proposed Framework
In the proposed framework, the deep CNN model that we utilize is the Faster R-CNN Inception v2 model [3]. A CNN within Faster R-CNN [8][14] extracts the image features. Inception [15] has a layer-by-layer structure, but unlike other deep architectures it goes wider rather than deeper: the Inception module removes the burden of choosing a single filter size by applying several filter sizes in parallel within the same module and concatenating their outputs.
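To make the "wider rather than deeper" idea concrete, the sketch below builds a minimal Inception-style block in tf.keras. It is an illustrative reconstruction only, not the exact module of the pre-trained Faster R-CNN Inception v2 detector, and the filter counts and input size are arbitrary assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, filters=64):
    """Minimal Inception-style block: parallel filter sizes, concatenated."""
    # 1x1 branch
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    # 3x3 branch with a 1x1 reduction first
    b3 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b3)
    # "5x5" branch written as two stacked 3x3 convolutions (Inception v2 factorization)
    b5 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b5)
    b5 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b5)
    # Pooling branch
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(filters, 1, padding="same", activation="relu")(bp)
    # Going "wider": all branches run in parallel and are concatenated channel-wise.
    return layers.Concatenate()([b1, b3, b5, bp])

inputs = layers.Input(shape=(224, 224, 3))
model = tf.keras.Model(inputs, inception_block(inputs))
```

The 5×5 branch is expressed as two stacked 3×3 convolutions, mirroring the Inception v2 factorization illustrated in Fig. 2.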
Fig. 2. Architecture of the Inception network: a) original Inception module, b) Inception module after the 5×5 convolution is replaced by two 3×3 convolutions, and c) Inception module after n×n factorization.

Fig. 3. Flow diagram of crosswalk region detection and localization using Faster R-CNN and Inception v2.
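The flow of Fig. 3, from an input frame through the Faster R-CNN Inception v2 detector to crosswalk bounding boxes, can be sketched roughly as below. This assumes the detector was exported as a TensorFlow 1.x frozen graph with the Object Detection API; the file path is hypothetical and the tensor names are the API's standard export names, not details confirmed in the paper.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

# Hypothetical path to an exported Faster R-CNN Inception v2 crosswalk detector.
FROZEN_GRAPH = "crosswalk_faster_rcnn_inception_v2/frozen_inference_graph.pb"

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(FROZEN_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    # Dummy frame; in practice this is a photo or a video frame.
    frame = np.zeros((600, 800, 3), dtype=np.uint8)
    boxes, scores = sess.run(
        ["detection_boxes:0", "detection_scores:0"],
        feed_dict={"image_tensor:0": frame[np.newaxis]})
    # Keep confident detections; boxes are normalized [ymin, xmin, ymax, xmax].
    crosswalk_boxes = boxes[0][scores[0] > 0.5]
```

The score threshold of 0.5 is an assumption for illustration; the paper does not state the value it uses.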
C. Experimental Results
We evaluated our model on the dataset that we built, using the TensorFlow 1.12 library. The cross-entropy implementation provided by the TensorFlow Faster R-CNN Inception v2 network architecture was used to evaluate the training subset at regular intervals. Training was scheduled to start at a learning rate of 0.0002, dropping to 0.00002 after 90k iterations and to 0.000002 after 120k iterations. We applied early stopping when the model converged, at about 45k iterations. The batch size was one, i.e. the images of the whole dataset were processed one at a time in each epoch. The loss curve of the training model is presented in Fig. 4.
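As a rough illustration of this step schedule (not the actual Object Detection API pipeline configuration used for training), a piecewise-constant learning rate can be written in tf.keras as follows; the optimizer choice and momentum value are assumptions.

```python
import tensorflow as tf

# Step schedule described above: 2e-4, then 2e-5 after 90k steps, 2e-6 after 120k.
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[90_000, 120_000],
    values=[2e-4, 2e-5, 2e-6])

# Assumed optimizer; the detection pipeline may use a different one.
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
```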
The performance of the proposed framework has been tested in different lighting conditions as well as on different types of zebra crossings, to evaluate the accuracy and efficiency of zebra-crossing detection and recognition.
Fig. 4. Loss curve of the training model.

To understand the efficiency and correctness of the proposed system, some processing examples of crosswalks under different illumination conditions and orientations are demonstrated. The detection accuracy tables for the varying crosswalks are given with the calculation of true positive (TP), true negative (TN), false positive (FP), false negative (FN), intersection over union (IoU), precision, recall, F1-score, and accuracy. The average accuracy over the different orientations and lighting conditions of the crosswalk samples is calculated as well.

A true positive refers to the portion that belongs to the original crosswalk region and is correctly detected. A true negative refers to the portion that does not belong to the crosswalk region and is thus not detected. A false positive refers to the portion that is detected but is not part of the original crosswalk region, and a false negative refers to the portion that is not detected but is part of the original crosswalk region. Since we know the coordinates of the original region, we can calculate the true positive by considering the amount of area detected correctly; the other measurements are calculated accordingly.
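As a minimal sketch of this area-based bookkeeping, assuming the ground-truth crosswalk region and the detected region are available as binary masks of equal size (the paper does not spell out the exact representation), the four quantities and the IoU could be computed as:

```python
import numpy as np

def region_metrics(gt_mask, pred_mask):
    """TP/TN/FP/FN pixel counts and IoU from binary ground-truth and predicted masks."""
    gt = np.asarray(gt_mask, dtype=bool)
    pred = np.asarray(pred_mask, dtype=bool)

    tp = np.logical_and(gt, pred).sum()      # crosswalk area detected correctly
    tn = np.logical_and(~gt, ~pred).sum()    # background correctly left undetected
    fp = np.logical_and(~gt, pred).sum()     # detected area outside the crosswalk
    fn = np.logical_and(gt, ~pred).sum()     # crosswalk area that was missed

    union = np.logical_or(gt, pred).sum()
    iou = tp / union if union else 0.0
    return tp, tn, fp, fn, iou
```

Normalizing these counts (for example by the relevant region area) would give the percentage form reported in Table II; the exact normalization is not stated in the paper.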
The precision is calculated by Eq. (2):

$\text{Precision} = \frac{TP}{TP + FP}$  (2)

Recall, the percentage of the total positive cases that the classifier catches correctly, is also known as sensitivity and is calculated by Eq. (3):

$\text{Recall} = \frac{TP}{TP + FN}$  (3)

The F1-score is defined as the harmonic mean of recall and precision, given by Eq. (4):

$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$  (4)

The accuracy is calculated by Eq. (5):

$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$  (5)
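Continuing the sketch above, the counts can then be turned into the four scores of Eqs. (2)-(5); the small epsilon guarding against empty denominators is an implementation detail, not part of the paper.

```python
def detection_scores(tp, tn, fp, fn, eps=1e-9):
    """Precision, recall, F1-score and accuracy from TP/TN/FP/FN (Eqs. 2-5)."""
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    accuracy = (tp + tn) / (tp + fp + tn + fn + eps)
    return precision, recall, f1, accuracy

# Example with the Sample 4 values from Table II (given as percentages of area):
print(detection_scores(tp=96.0, tn=98.0, fp=2.0, fn=2.0))
# -> approximately (0.9796, 0.9796, 0.9796, 0.9798), i.e. the Sample 4 row of Table III.
```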
Fig. 5 demonstrates the detection results for some example crosswalk images under different environmental and lighting conditions. Sample 1 in Fig. 5 shows the detection result for a crosswalk image with normal illumination on a sunny day. Sample 2 presents a low-light condition: the whole crosswalk has low illumination and the surrounding area has little light. Sample 3 shows the experimental result in an environment with uneven illumination or shadows, where different portions of the crosswalk have varying high and low illumination. Samples 4 and 5 represent night and rainy weather respectively; Sample 4 has artificial lights on the crosswalk and very dim illumination, while Sample 5 has surface reflection of light. The experimental result for a crosswalk where a portion of the crosswalk is occluded by a human is shown in Sample 6. Bounding-box rectangles are drawn with thick borders for viewing purposes, although the real detections have thin borders.

The machine learning metrics TP, TN, FP, FN, and IoU for Samples 1 to 6 are shown in Table II. The crosswalk ROI detection and localization accuracy under the different environmental conditions is shown in Table III.
[Fig. 5 panels include, among others: Sample 1: Sunny; Sample 5: Rainy; Sample 6: Occluded crosswalk]
Fig. 5. Crosswalk images in different environmental and lighting conditions: a) Original experimental image, b) Ground-truth image, c) Predicted image, and d) IoU of ground-truth and predicted image.
TABLE II. TP, TN, FP, FN and IoU of the detected crosswalk ROI at different environmental conditions

Samples    TP (%)   TN (%)   FP (%)   FN (%)   IoU (%)
Sample 1   92.00    100      8.00     0.00     87.00
Sample 2   98.00    100      2.00     0.00     86.50
Sample 3   94.00    100      6.00     0.00     83.00
Sample 4   96.00    98.00    2.00     2.00     89.00
Sample 5   99.00    100.00   0.00     0.00     87.00
Sample 6   90.00    100      10.00    0.00     85.00

TABLE III. Crosswalk ROI detection and localization accuracy at different environmental conditions

Samples    Precision (%)   Recall (%)   F1-score (%)   Accuracy (%)
Sample 1   92.00           100          95.83          96.00
Sample 2   98.00           100          98.99          99.00
Sample 3   94.00           100          96.91          97.00
Sample 4   97.96           97.96        97.96          97.98
Sample 5   100             100          100            100
Sample 6   90.00           100          94.74          95.00
Average    95.33           99.66        97.41          97.50

Fig. 6 demonstrates some false-detection examples predicted by our proposed method. False positives happen due to structures that are very similar to crosswalks, such as road markings with alternating patterns, stairs under opposing illumination, shadows that resemble crosswalks, etc.

Our method has the best accuracy of 97.50% over the different environmental situations, which cover the most diverse weather conditions. While [6] only classifies crosswalks using CNNs, and many of its image acquisition and annotation steps do not involve any learning, the proposed method can both classify and detect crosswalks in diverse and complex scenarios with the proposed network, as shown in Fig. 5. These images show the robustness of the model to different viewing angles, scale variation of crosswalks in both horizontal and vertical directions, occlusion, etc. Moreover, [6] achieves its highest accuracy with the very deep, computationally expensive VGG network, while our proposed framework surpasses it with a comparatively cheaper network.

IV. CONCLUSION

We present a method based on a DCNN model that can detect zebra crosswalks in diverse weather and lighting conditions. Moreover, our model can detect multiple crosswalks in different orientations, without the need for any extra processing, with an accuracy of 97.50%. We provided the framework with images of our own as well as images taken from video frames. The proposed method uses Faster R-CNN and Inception v2, where Inception v2 works by going wider to reduce bottlenecks without hurting accuracy. There is certain scope for improvement in our model; future work would implement the system in real-time environments and complex scenarios and improve the accuracy.
References
[1] S. Se, "Zebra-crossing detection for the partially sighted," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000), vol. 2, pp. 211-217, IEEE, 2000.
[2] S. Yu, H. Lee, and J. Kim, "LYTNet: A Convolutional Neural Network for Real-Time Pedestrian Traffic Lights and Zebra Crossing Recognition for the Visually Impaired," in International Conference on Computer Analysis of Images and Patterns, pp. 259-270, Springer, Cham, 2019.
[3] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818-2826, IEEE, 2016.
[4] D. Ahmetovic, C. Bernareggi, A. Gerino, and S. Mascetti, "Zebra Recognizer: Efficient and Precise Localization of Pedestrian Crossings," in 2014 22nd International Conference on Pattern Recognition, pp. 2566-2571, IEEE, 2014.
[5] D. Ahmetovic, C. Bernareggi, and S. Mascetti, "Zebralocalizer: identification and localization of pedestrian crossings," in Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, pp. 275-284, ACM, 2011.
[6] R. F. Berriel, A. T. Lopes, A. F. de Souza, and T. Oliveira-Santos, "Deep Learning-Based Large-Scale Automatic Satellite Crosswalk Classification," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 9, pp. 1513-1517, IEEE, 2017.
[7] P. A., L. D., and B. K. Bhargava, "A Mobile-Cloud Pedestrian Crossing Guide for the Blind," in International Conference on Advances in Computing & Communication, 2011.
[8] R. Girshick, "Fast R-CNN," in 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440-1448, IEEE, 2015.
[9] V. Ivanchenko, J. Coughlan, and S. Huiying, "Detecting and locating crosswalks using a camera phone," in 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-8, IEEE, 2008.
[10] D. Koester, B. Lunt, and R. Stiefelhagen, "Zebra Crossing Detection from Aerial Imagery Across Countries," in International Conference on Computers Helping People with Special Needs, pp. 27-34, Springer, Cham, 2016.
[11] X. Liu, Y. Zhang, and Q. Li, "Automatic Pedestrian Crossing Detection and Impairment Analysis Based on Mobile Mapping System," ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. IV-2/W4, pp. 251-258, 2017.
[12] V. N. Murali and J. M. Coughlan, "Smartphone-based crosswalk detection and localization for visually impaired pedestrians," in 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp. 1-7, IEEE, 2013.
[13] M. Poggi, L. Nanni, and S. Mattoccia, "Crosswalk Recognition Through Point-Cloud Processing and Deep-Learning Suited to a Wearable Mobility Aid for the Visually Impaired," in New Trends in Image Analysis and Processing - ICIAP 2015 Workshops, pp. 282-289, Springer International Publishing, Cham, 2015.
[14] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, IEEE, 2017.
[15] C. Szegedy et al., "Going deeper with convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, IEEE, 2015.
[16] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," Journal of Machine Learning Research, vol. 37, pp. 448-456, 2015.