
Deep Learning Approach for U.S. Traffic Sign Recognition


Emmanuel B. Nuakoh, Kaushik Roy, Xiaohong Yuan, Albert Esterline
Department of Computer Science
North Carolina A&T State University
1601 E Market St
Greensboro, NC 27411
ebnuakoh@aggies.ncat.edu; {kroy, xhyuan, esterlin}@ncat.edu

ABSTRACT
Advanced Driver Assistance Systems (ADAS) have seen massive improvements in recent years, from detecting pedestrians, road lanes, traffic signs and signals, and vehicles to recognizing and tracking traffic signs. Traffic sign recognition systems are used to detect and classify traffic signs. This research focuses on the classification aspect of ADAS, i.e. identifying the class a traffic sign belongs to. Most current ADAS that use U.S. traffic signs are limited to speed limit sign recognition only. This work seeks to expand U.S. traffic sign recognition to cover all the publicly available classes. The research adopts the VGGNet architecture, modified to classify the U.S. traffic signs provided by the LISA TS benchmark. The original VGGNet was used to classify the German Traffic Sign Recognition Benchmark and reported an accuracy of 98.7%. This work recorded a validation accuracy of 99.04%.

CCS Concepts
• Computing methodologies➝Supervised learning by classification; Neural networks

Keywords
Traffic Sign Recognition; Convolutional Neural Network; Deep Learning; Deep Neural Networks; Computer Vision

1. INTRODUCTION
Traffic sign recognition systems (TSRS) are concerned with two activities: detecting and recognizing traffic signs in an image frame [1]. Detecting or localizing a traffic sign involves finding its color and shape in the image frame. Recognizing a sign means classifying the detected sign into a given type of traffic sign. This work is focused on the recognition aspect of the TSRS.

Much research has been conducted in the area of traffic sign recognition using publicly available datasets, such as the Belgian Traffic Sign Classification (BTSC) dataset [2], the German Traffic Sign Recognition and Detection Benchmarks (GTSRB and GTSDB) [3], the Croatian traffic sign dataset (rMASTIF) [4], and the Tsinghua-Tencent 100K benchmark [5]. However, limited research has been conducted on U.S. traffic signs [6].

Several methods have shown success in TSRS. Convolutional Neural Networks (CNN) are among the methods that have yielded high success in classifying traffic signs [5]. These deep learning methodologies have been tested exhaustively on European traffic signs but have not yet been adopted for classifying U.S. traffic signs. This work is an application of the VGGNet proposed by [7] to classify U.S. traffic signs beyond speed limit signs, incorporating all signs contained in the LISA TS dataset [8].

1.1 Problem Statement and Hypothesis
The application of TSRS in ADAS has seen some commercial success with limited functionality [4]. The limitation is linked not only to the number of supported traffic signs, but also to the areas of the road network where they are most effective. The lack of standardized datasets for training and testing models poses another problem in this area of research, making it quite difficult to compare the performance of different models.

Unbalanced traffic sign datasets present another layer of difficulty for developing TSRSs. Machine learning depends on huge amounts of data for training an algorithm to perform well on unseen data; however, most traffic sign datasets are sparse and need to be augmented in other ways to balance them.

1.2 Contribution
The main contribution of this work is to take an existing deep learning architecture and modify it to classify the U.S. traffic signs presented in the LISA TS benchmark. Limited work has been done using U.S. datasets, and no known work has been reported using a deep learning technique such as CNN for classifying this dataset. This research is an initial step toward developing a robust CNN for classifying the whole U.S. traffic sign dataset in the LISA TS benchmark.

2. LITERATURE REVIEW
Traffic sign recognition research has gained a lot of traction in recent years. Various methods with wide applicability for image recognition have proven that they can be used to recognize traffic signs as well [9].

Research into TSRS has been boosted since several of these datasets are commonly used to evaluate the performance of computer vision algorithms for traffic sign detection and recognition. Deep learning for image recognition has seen wide applicability in biometrics, medical diagnostics, text categorization, speech recognition, and traffic sign recognition, to name a few. According to [13], conventional machine learning has limitations in processing raw data. Deep learning methods, in contrast, are representation-learning based classifiers: each layer of representation signifies the presence or absence of parts of objects, such as edges and motifs, and subsequent layers detect objects as combinations of the identified parts.

2.1 Traffic Sign Recognition with Deep Learning
[10] developed a committee of CNNs and a multilayer perceptron (MLP) trained on HOG features (HOG3). They trained the best architecture, initialized with uniformly random weights, with a hyperbolic tangent activation function. The classification had a combined classification rate (CCR) of 98.98%, outperforming humans in some instances, as reported by [11] in the GTSRB competition. In the same report, [12] placed second with 98.97% accuracy using multi-layer ConvNets with more sophisticated non-linearities. An improvement to [10] using a CNN with data augmentation and jittering achieved an accuracy of 99.46% [13].

[14] proposed a Hinge Loss Stochastic Gradient Descent (HLSGD) cost function for training CNNs. This function is similar to the Support Vector Machine (SVM) hinge loss and trained faster than plain Stochastic Gradient Descent (SGD), which is the preferred method for training CNNs. They achieved an accuracy of 99.65%, beating the state-of-the-art approach of 99.46% reported in [13, 15].

[4] developed OneCNN, a convolutional neural network inspired by [12]. They developed a single network that is deeper and more complex but less computationally costly, used it to classify multiple datasets, namely GTSRB and BTSC, and introduced a new dataset called rMASTIF. They achieved an accuracy of 99.11% against the state of the art of 99.65% for GTSRB [14], 98.17% for BTSC against the state of the art of 98.77% [15], and 99.53% for rMASTIF. Their work was related only to the classification aspect of the TSRS, as is this research.

2.2 Traffic Sign Recognition with Traditional Machine Learning
[16] used SVMs with Gaussian kernels to recognize traffic signs from blobs that had been categorized into shape classes. To test the effect of occlusion on recognition, an occlusion mask was placed on the images. Small-, medium-, and large-sized masks yielded 93.24%, 67.85%, and 44.90% probabilities of successfully recognizing the signs, respectively. It was observed that a large-sized occlusion mask placed in the middle of the pictogram's inner area showed the worst recognition performance. [17] applied an SVM for classifying traffic signs on the GTSRB dataset. They combined local binary patterns (LBP), HOG, and Gabor features for feature extraction. LBP alone achieved a performance of 93.36%, Gabor alone 93.90%, and HOG alone 94.56%. A combination of all three yielded the best performance of 97.04%. HOG and Gabor together achieved 97.00%, close to the performance of all three combined. Their proposed algorithm ranked ninth overall compared to the results obtained in the 2011 GTSRB competition; however, it had the best performance for the "Other Prohibitions" and "Mandatory" categories, scoring 99.86% and 99.83%, respectively. EBLearn 2LConvNet and CNN HOG3 were the previous best performers in those categories, scoring 99.80% and 97.89%, respectively.

3. METHODOLOGY AND EXPERIMENTS
3.1 Approach
CNNs have made great strides in image recognition in recent times. They gained recognition in the traffic sign recognition space during "The German Traffic Sign Recognition Benchmark" (GTSRB) competition, where some of the best methods presented used CNNs for classification [18].

There is limited research on traffic sign detection and recognition using U.S. traffic signs. [6] used an R-CNN algorithm to detect U.S. traffic signs and showed good results on the LISA-TS Extension dataset [19]. The classification was based on the speed limit superclass alone.

Our research tries to extend the boundaries of the current research to include all the signs in the dataset by replicating the work done by [12], adapted for classifying the U.S. traffic sign dataset.

3.2 Experimental Setup
The experimental setup consists of two main parts: first, preparing the data, which spans all the tools used to extract and separate traffic signs into their respective classes; second, feeding the extracted data into a deep neural network for classification. The LISA TS dataset is a publicly available traffic sign dataset for the United States [8]. This dataset offers researchers who are interested in classifying U.S. traffic signs an avenue to train and test their models. The LISA TS dataset was selected for this work because limited work has been done on U.S. traffic signs.

3.2.1 Preparing the dataset.
The LISA TS dataset comes with a set of Python tools for extracting traffic sign images from annotated frames. Since this research is concerned with the classification aspect of TSRS, these tools were useful. There are 47 classes of traffic signs, named using the convention "0000", "0001", …, "0045", "0046". An 80/20 split into training and testing sets was obtained using the provided Python script as "split1.csv" and "split2.csv", respectively. Traffic sign images under each split set were cropped and copied to their respective class folders. These class folders were then manually copied into two new folders called "Training" and "Testing" as the training and test sets, respectively.
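As an illustration, a minimal Python sketch of this preparation step is given below. It assumes each split CSV lists one cropped sign per row under "Filename" and "Annotation tag" columns with semicolon delimiters; the exact column names and output layout of the LISA TS tools may differ.

```python
# Hypothetical sketch of the dataset preparation step described above;
# the column names and semicolon delimiter are assumptions about the
# LISA TS split files, not a documented interface.
import csv
import shutil
from pathlib import Path

def organize_split(csv_path: str, out_root: str) -> None:
    """Copy each cropped sign image into a folder named after its class."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            label = row["Annotation tag"]      # e.g. "speedLimit40"
            src = Path(row["Filename"])        # path to the cropped image
            dst_dir = Path(out_root) / label
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst_dir / src.name)

# Mirror the manual copy into "Training" and "Testing" described above.
organize_split("split1.csv", "Training")
organize_split("split2.csv", "Testing")
```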
3.2.2 Exploring the dataset.
The LISA TS dataset consists of 6610 frames with 7855 annotations [8]. The sizes of the traffic sign images range from as little as 6×6 up to 167×168 pixels. The cameras used for image collection had resolutions ranging from 640×480 to 1024×522 pixels, with some images in greyscale and others in color.

The extracted signs were split into training and test sets at an 80/20 ratio. Fig. 1 shows the distribution of traffic signs for each class. The most populated classes are "speedLimit40", "doNotPass", and "rampSpeedAdvisory50", in order of decreasing magnitude. The least occurring traffic signs in the dataset are "rampSpeedAdvisory20", "speedLimitUrdbl", "thruMergeLeft", and "turnLeft".

Fig. 1. Training Data Distribution per Class for Each Traffic Sign

Fig. 2 shows a sample traffic sign image from each class; the images shown are the first images in each class. It should be noted that several of these signs carry numbers indicating speed limits of some sort. This raises concern for the performance of the model, especially with the limited samples in each class.

Figure 2. Sample Image in Each Class
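The per-class counts summarized in Fig. 1 can be recomputed from the prepared folder layout; the following is a small sketch assuming the "Training" directory produced in Section 3.2.1:

```python
# Count images per class folder to reproduce the Fig. 1 distribution;
# assumes one subfolder per class under "Training" (see Section 3.2.1).
from collections import Counter
from pathlib import Path

counts = Counter({cls.name: sum(1 for _ in cls.iterdir())
                  for cls in Path("Training").iterdir() if cls.is_dir()})
for name, n in counts.most_common():
    print(f"{name:>25}: {n}")
```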

3.2.3 Deep Neural Network (DNN) Model Architecture.
The VGG layer has as its basic elements a batch normalization layer for faster and better training; a parametric ReLU (PReLU) layer that addresses the dead-rectifier problem during training; a convolution layer with parametric ReLU activation, using the Xavier scheme for weight and bias initialization; fully connected dense layers, also using the Xavier scheme for weight and bias initialization; and a max pooling operation.

The spatial transformer layer consists of a localization and an affine transformation layer. It is composed of a 5×5 convolution filter, followed by a 3×3 convolution filter, a 1×1 convolution filter, a 128-unit dense layer, a 64-unit dense layer, and 6 identity transformers.
The VGG network (Fig. 3) is made up of two back-to-back convolutions of 2×2 kernel size and a stride of 2. Each VGG layer has one pooling layer followed by a dropout layer, and the model is made up of 4 VGG layers. The first layer extracts 32 feature maps from an input layer of 32×32×3 features, the second extracts 64 feature maps from a 16×16×32 input, the third extracts 128 feature maps from an 8×8×64 input, and the fourth extracts 256 feature maps from an input of 4×4×128. The model also has 3 fully connected layers, with 1024 hidden units that map to 512 hidden units, which further map to the 47 classes in the dataset.
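As a concrete illustration, a minimal tf.keras sketch of a VGG-style stack consistent with the description above follows. It is not the authors' exact implementation (see the repository in the footnote): 3×3 kernels with stride 1 and 2×2 pooling are assumed here so that the stated feature-map sizes (32×32 → 16×16 → 8×8 → 4×4) work out, the dropout rate is an assumption, and the spatial transformer stage is omitted.

```python
# A sketch (not the authors' code) of four VGG-style blocks extracting
# 32/64/128/256 feature maps from a 32x32x3 input, followed by fully
# connected layers of 1024 and 512 units mapping to the 47 classes.
import tensorflow as tf
from tensorflow.keras import layers, initializers

def vgg_block(x, filters):
    for _ in range(2):                       # two back-to-back convolutions
        x = layers.Conv2D(filters, 3, padding="same",  # kernel size assumed
                          kernel_initializer=initializers.GlorotNormal())(x)  # Xavier
        x = layers.BatchNormalization()(x)   # faster, more stable training
        x = layers.PReLU(shared_axes=[1, 2])(x)  # parametric ReLU
    x = layers.MaxPooling2D(2)(x)            # halve the spatial resolution
    return layers.Dropout(0.3)(x)            # dropout rate assumed

inputs = tf.keras.Input(shape=(32, 32, 3))
x = inputs
for f in (32, 64, 128, 256):                 # 32x32 -> 16x16 -> 8x8 -> 4x4
    x = vgg_block(x, f)
x = layers.Flatten()(x)
x = layers.Dense(1024, activation="relu")(x)
x = layers.Dense(512, activation="relu")(x)
outputs = layers.Dense(47, activation="softmax")(x)  # 47 LISA TS classes
model = tf.keras.Model(inputs, outputs)
```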
Figure 3. VGGNet Architecture¹

Model building parameters include Xavier initialization as a tuning methodology; early stopping that restores the previous checkpoint if accuracy gains during testing do not meet the current requirements; a batch size of 500, a learning rate of 5e-5, and a regularization factor of 1e-5, chosen as the best hyperparameters after a search over 500 epochs; and the Adam optimizer. This implementation is a modified version of VGGNet¹. Eight GPUs, each with 4 GB of memory, installed on a single system, were used to allow multiprocessing of the computations during training.

¹ https://github.com/vamsiramakrishnan/TrafficSignRecognition
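A hedged sketch of this training configuration, assuming the model above and batched tf.data pipelines train_ds and val_ds (the L2 factor of 1e-5 would be attached per layer via kernel_regularizer, and the eight GPUs would typically be driven through tf.distribute.MirroredStrategy):

```python
# Training setup per the hyperparameters listed above; train_ds/val_ds
# are assumed tf.data.Dataset objects already batched at 500.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss="sparse_categorical_crossentropy",  # integer class labels assumed
    metrics=["accuracy"],
)
# Early stopping that falls back to the best checkpoint, as described.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=50,     # patience is an assumption
    restore_best_weights=True)
history = model.fit(train_ds, validation_data=val_ds,
                    epochs=600, callbacks=[early_stop])
```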
4. RESULTS
The original research recorded an accuracy of 98.7% on the GTSRB dataset. This work recorded a training accuracy of 99.02% and a validation accuracy of 99.04% on the LISA TS dataset. The model was trained over 600 epochs. The confusion matrices at the 600th epoch are presented in Fig. 4 for normalized and non-normalized samples. 15 of the 1567 validation images were wrongly classified. One image in the "keepRight" class is confused by the model with "zoneAhead45" 1% of the time. One "pedestrianCrossing" sign, amounting to 1% of the class, was misclassified as "stop". "rampSpeedAdvisoryUrdbl" confuses the model, being classified as "school" 100% of the time; only one sign is in this class. One "rightLaneMustTurn" sign, making up 6% of the class, is misclassified as "stop". "speedLimit25" is wrongly classified 2% of the time as "speedLimitUrdbl", which stands for unreadable speed limits. "speedLimit55" is misclassified as "truckSpeedLimit55" 100% of the time. "speedLimit65" is misclassified 6% of the time as "stop". "speedLimitUrdbl" represents speed limit signs that were too unreadable to be correctly classified by a human; this class was classified 8% of the time as "speedLimit35" and 4% of the time as "speedLimit50". Finally, "yield" is misclassified 8% of the time as "stop".

Figure 4. Confusion Matrix of Model after 600 Epochs with and without Normalization

Fig. 5 shows plots of accuracy (left) and loss (right) during training and validation, respectively. It was noted that the model could have performed better if it had been trained a little longer. The model does not overfit, since the validation accuracy is less than the training accuracy throughout the whole process. The model shows a loss of 7.03% during training and 8.8% during validation. The original VGGNet showed a loss of 7.5% during validation; no value was recorded for training.

Figure 5. Plot of Accuracy and Loss during Training and Validation
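The per-class confusions reported above could be reproduced with a row-normalized confusion matrix over the validation set; a brief sketch, assuming an array of true integer labels y_true and the trained model from Section 3.2.3:

```python
# Row-normalized confusion matrix over the validation set, matching the
# percentages quoted above (e.g. "speedLimit65" -> "stop" 6% of the time).
# y_true (integer labels) and val_ds are assumed to be available.
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(val_ds), axis=1)
cm = confusion_matrix(y_true, y_pred)
cm_normalized = cm / cm.sum(axis=1, keepdims=True)
print(f"misclassified: {(y_pred != y_true).sum()} of {len(y_true)}")
```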

5. CONCLUSION AND FUTURE WORK
ADAS helps drivers with perception in a complex environment with a lot of background noise; a computer can be trained to pick up things that may escape the human eye. This research investigates using VGGNet to recognize U.S. traffic signs. Previous work with the same dataset, but with the speed limit signs only, scored an accuracy of 95.7%, a little under the 97% reported by [6]. This research shows an overall accuracy of 99.04% during validation.

With the LISA TS dataset being limited and sparse, only training and validation were performed. In the future, the LISA TS Extension would be employed for testing the model. Future work would be extended to detect traffic signs in video frames in real time. Temporal Convolutional Networks (TCN) would be investigated to determine whether they can be used to improve the speed of the model. TCNs have been used by [20] for detecting human action in video frames, but no known work has been done using TCNs for traffic sign recognition.

The dataset would also be experimented on with modified LeNet, AlexNet, and CUDA ConvNet architectures. The GTSRB and BTSC would be used to test the model's performance and compare the algorithm to other standardized algorithms.
6. REFERENCES
[1] Escalera, A. D. L., Moreno, L., Salichs, M. A., and Armingol, J. M. 1997. "Road Traffic Sign Detection and Classification".
[2] Timofte, R., Zimmermann, K., and Van Gool, L. 2014. "Multi-View Traffic Sign Detection, Recognition, and 3D Localization". Machine Vision and Applications, 25(3), 633-647.
[3] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. 2011. "The German Traffic Sign Recognition Benchmark: A Multi-Class Classification Competition". In Neural Networks (IJCNN), The 2011 International Joint Conference on (pp. 1453-1460). IEEE.
[4] Jurišić, F., Filković, I., and Kalafatić, Z. 2015. "Multiple-Dataset Traffic Sign Classification with OneCNN". In Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on (pp. 614-618). IEEE.
[5] Zhu, Z., Liang, D., Zhang, S., Huang, X., Li, B., and Hu, S. 2016. "Traffic-Sign Detection and Classification in the Wild". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2110-2118).
[6] Li, Y., Møgelmose, A., and Trivedi, M. M. 2016. "Pushing the 'Speed Limit': High-Accuracy US Traffic Sign Recognition with Convolutional Neural Networks". IEEE Transactions on Intelligent Vehicles, 1(2), 167-176.
[7] Simonyan, K., and Zisserman, A. 2015. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In International Conference on Learning Representations (ICLR).
[8] Møgelmose, A., Trivedi, M. M., and Moeslund, T. B. 2012. "Vision-Based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey". IEEE Transactions on Intelligent Transportation Systems.
[9] Mathias, M., Timofte, R., Benenson, R., and Van Gool, L. 2013. "Traffic Sign Recognition—How Far Are We from the Solution?". In Neural Networks (IJCNN), The 2013 International Joint Conference on (pp. 1-8). IEEE.
[10] Cireşan, D., Meier, U., Masci, J., and Schmidhuber, J. 2011. "A Committee of Neural Networks for Traffic Sign Classification". The 2011 International Joint Conference on Neural Networks, San Jose, CA, pp. 1918-1921.
[11] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. 2011. "The German Traffic Sign Recognition Benchmark: A Multi-Class Classification Competition". In Neural Networks (IJCNN), The 2011 International Joint Conference on. IEEE.
[12] Sermanet, P., and LeCun, Y. 2011. "Traffic Sign Recognition with Multi-Scale Convolutional Networks". In Neural Networks (IJCNN), The 2011 International Joint Conference on (pp. 2809-2813). IEEE.
[13] Cireşan, D., Meier, U., Masci, J., and Schmidhuber, J. 2012. "Multi-Column Deep Neural Network for Traffic Sign Classification". Neural Networks, 32, 333-338.
[14] Jin, J., Fu, K., and Zhang, C. 2014. "Traffic Sign Recognition with Hinge Loss Trained Convolutional Neural Networks". IEEE Transactions on Intelligent Transportation Systems, 15(5), 1991-2000.
[15] Zhu, Y., Wang, X., Yao, C., and Bai, X. 2013. "Traffic Sign Classification Using Two-Layer Image Representation". In 2013 IEEE International Conference on Image Processing (pp. 3755-3759). IEEE.
[16] Maldonado-Bascón, S., Lafuente-Arroyo, S., Gil-Jiménez, P., Gómez-Moreno, H., and López-Ferreras, F. 2007. "Road-Sign Detection and Recognition Based on Support Vector Machines". IEEE Transactions on Intelligent Transportation Systems, 8(2), 264-278.
[17] Berkaya, S. K., Gunduz, H., Ozsen, O., Akinlar, C., and Gunal, S. 2016. "On Circular Traffic Sign Detection and Recognition". Expert Systems with Applications, 48, 67-75.
[18] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. 2012. "Man vs. Computer: Benchmarking Machine Learning Algorithms for Traffic Sign Recognition". Neural Networks, 32, 323-332.
[19] Møgelmose, A., Liu, D., and Trivedi, M. M. 2014. "Traffic Sign Detection for U.S. Roads: Remaining Challenges and a Case for Tracking". IEEE Intelligent Transportation Systems Conference (ITSC 2014), Oct. 2014.
[20] Lea, C., Flynn, M. D., Vidal, R., Reiter, A., and Hager, G. D. 2017. "Temporal Convolutional Networks for Action Segmentation and Detection". 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 1003-1012.

