Abstract—In the arena of artificial intelligence, the world is being revolutionized, with many technological applications incorporating artificial intelligence because of its improved efficiency and performance. AI has penetrated deeply into decision-making in many fields, such as agriculture, healthcare, the military, manufacturing, robotics, and transportation. AI does a lot more than improve our lives; in many cases, it saves lives too. Autonomous vehicles, the so-called self-driving cars, are one of the greatest applications of AI and are instrumental in making machines work autonomously by observing and interpreting real-life scenarios in their environment. This paper deals with the deployment of an automatic traffic sign detection system with a voice assistant, one of the applications of autonomous vehicles, which can relieve the driver of puzzling traffic conditions, significantly increasing driving safety and comfort. Such a system requires an appropriate database and algorithm for improved accuracy. This paper, therefore, compares the features, accuracy, and efficiency of various deep learning algorithms and proposes a modified model, thus saving computational resources.
Keywords—Artificial Intelligence; Deep Learning; Algorithms; Voice Assistant; Accuracy; Efficiency.
I. INTRODUCTION
With the accelerated advancement of technology, automation has emerged as a valuable tool for raising living standards. As the number of vehicles increases daily, so does the number of road accidents. According to reports, India loses 1.5 lakh (150,000) lives every year due to road accidents. The major reasons for these accidents are distraction and carelessness, because of which drivers fail to recognize and follow traffic signboards. Traffic signboards are an integral part of the road: they provide important information and instructions to road users, who in turn are required to adhere to road regulations. Traffic signs are among the most significant and integral parts of road infrastructure, cautioning drivers about speed, road conditions, and other obstacles. So, to reduce road accidents, recognizing and understanding traffic symbols is of supreme importance. Intelligent transport systems (ITS) are a starting point for solving these traffic-related problems in ways the public has not encountered before, so these systems are strikingly different from conventional ones. ITS can recognize traffic signs by exploiting the strong performance of convolutional neural networks. Intelligent transport systems are also an element of the mobility mix, serving as a feeder for public transit. Since humans are much more prone to error than computers, technology can perform this task for us.
This paper discusses intelligent traffic sign detection and recognition systems in more depth; such a system is deployed by combining intelligent cameras with a voice bot, with the information processed by the accompanying software. Traditionally, standard computer vision methods were widely deployed to detect and classify traffic signs, but they require more manual work to define the features of the images [1-5]. Instead, we have come up with a deep learning model in which there is no need to define the features of the images by hand, because the neural network learns these features automatically. Deep learning algorithms are reusable and can provide precise results with a smaller amount of training data. Here, the CNN (Convolutional Neural Network, or ConvNet) technique is adopted. A CNN falls under the class of DNNs (Deep Neural Networks); it is a multi-layer neural network that can recognize visual patterns directly from pixel images with minimal preprocessing, and it is very effective in reducing the number of parameters without losing model quality.
Therefore, to obtain better accuracy and efficiency, the architectural working of popular CNN architectures (LeNet-5, AlexNet, VGG, and ResNet) is discussed further, and two efficient models to detect and recognize traffic signs are proposed in this paper.
II. LITERATURE SURVEY
Finding an algorithm that provides better accuracy is crucial for classifying any object. Among traditional methods, Bahlmann et al. proposed detecting traffic signs with Haar wavelet features obtained from AdaBoost training; their system uses color-sensitive wavelet features with a Bayesian classifier, has an error rate of about 15% on a test set containing traffic sign videos, and achieves a low false-positive count of 1 in every 600 frames [6].
Changzhen, X. et al. (2016) proposed an approach to traffic sign recognition for a limited, predefined set of signs, using
978-1-6654-9515-8/$31.00©2022IEEE
Authorized licensed use limited to: VIT University- Chennai Campus. Downloaded on April 18,2022 at 09:28:36 UTC from IEEE Xplore. Restrictions apply.
2022International Conference on Advanced computing Technologies & Applications (ICACTA), Mar. 04 – 05, 2022, Coimbatore, INDIA
existing models defined for Chinese traffic signs with deep convolutional neural networks. After the dataset was divided into training and test sets, the model showed about 99% accuracy in real-time detection on video sequences [7].
Inception models were introduced with GoogLeNet in 2015. Instead of requiring the type of convolution to be decided beforehand, they perform convolutions of several types in parallel and concatenate the resulting feature maps before the next layer. GoogLeNet is a 22-layer CNN with a top-5 error rate of 6.7% [8].
"Traffic sign detection via interest region extraction" by Salti et al. combines robust computer vision algorithms with pattern recognition and solid image analysis. Its major drawback is that it detects only three categories of traffic symbols (prohibitory, danger, and mandatory) [9].
"On circular traffic sign detection and recognition" works with color images, using an RGB thresholding technique together with a circle detection algorithm. This approach may face many challenges in real time because of varying illumination, darkness, and unclear images [10].
"Towards Real-Time Traffic Sign Detection and Classification" uses a color probability model that enhances the specific colors red, blue, and yellow, together with the machine learning algorithms SVM and CNN, to detect and classify traffic signs, and achieves an accuracy of 99.65%; however, the model can only classify images into three categories: prohibitory, danger, and mandatory traffic signs [11].
Md Tarequl Islam proposed a system, "Traffic sign detection and recognition based on convolutional neural networks", consisting of two separate neural networks: the first identifies the shape of the contour, and the second determines whether the image contains a sign and, if so, classifies it. The performance of this model was compared with that of other architectural models [12].
"Road Traffic Sign Detection and Classification" involves two algorithms: the first identifies the traffic sign in the image, and the second recognizes it. However, the model can only categorize images into four categories, i.e., warning, prohibition, obligation, and informative [13].
III. ARCHITECTURES
A. LeNet-5
LeNet-5 is a CNN architecture that learns from the raw pixels of the input image using five layers (convolution and subsampling) followed by two fully connected layers, with a reduced count of trainable parameters. A major disadvantage of a fully connected neural network is that it treats each pixel as a separate input, which is computationally very expensive and complex. LeNet overcomes this problem by exploiting the correlation between nearby (neighborhood) pixels and the distribution of feature motifs over the whole image. Extracting similar features from many regions with shared learning parameters changed the classic perspective of training, in which correlations among neighborhood pixels were not considered and each pixel was interpreted as a separate input feature. Its drawbacks are that it tends to overfit and has no built-in mechanism to prevent this [14][15].
Fig. 1. Architectural Diagram of LeNet-5
B. AlexNet
AlexNet uses multi-GPU training, splitting the model's neurons so that half of them reside on each of two GPUs. Its fully connected layers have a very high parameter count, which makes them computationally expensive. The classifier can be linear (one fully connected layer), in which case performance may fall short of expectations, or nonlinear (more than one fully connected layer), in which case the complexity and number of parameters increase significantly, which may cause overfitting. AlexNet is not deep compared with later models such as VGGNet, GoogLeNet, and ResNet. Shortly after AlexNet, the use of large convolution filters (such as 5×5) became discouraged. Initializing the weights from a normal distribution could not solve the vanishing gradient problem, so this scheme was later replaced by the Xavier method. AlexNet's performance has been surpassed by more complex models such as GoogLeNet (6.7% top-5 error) and ResNet (3.6%) [16][17].
Fig. 2. Architectural Diagram of AlexNet
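Both the LeNet-5 and AlexNet discussions above hinge on how many trainable parameters a convolutional layer needs compared with a fully connected one. The following is a minimal sketch that only counts parameters (no network is built). The "fully connected alternative" to LeNet-5's first layer is a hypothetical comparison, not part of either architecture; the layer shapes used are the standard published ones (LeNet-5: six 5×5 filters on a 32×32 grayscale input; AlexNet: 96 11×11 filters on RGB input, and a first fully connected layer mapping 6×6×256 features to 4096 units).

```python
def conv_params(in_channels, out_channels, kernel_size):
    # Each filter has a k x k x in_channels weight tensor plus one bias,
    # and the same filter is reused (shared) at every image position.
    return out_channels * (kernel_size * kernel_size * in_channels + 1)


def dense_params(in_features, out_features):
    # A fully connected layer has one weight per input/output pair plus one bias per output.
    return out_features * (in_features + 1)


# LeNet-5, first layer: six 5x5 filters on a 32x32 single-channel image -> six 28x28 maps.
side, k, maps = 32, 5, 6
out_side = side - k + 1  # 28 for stride 1 and no padding

lenet_c1_shared = conv_params(1, maps, k)
# Hypothetical alternative: produce the same 6 x 28 x 28 outputs with an unshared
# fully connected layer that treats every pixel as a separate input.
lenet_c1_dense = dense_params(side * side, maps * out_side * out_side)

# AlexNet: first convolution (96 filters, 11x11, RGB) vs. first fully connected layer.
alexnet_conv1 = conv_params(3, 96, 11)
alexnet_fc6 = dense_params(6 * 6 * 256, 4096)

print(f"LeNet-5 C1, shared 5x5 filters : {lenet_c1_shared:,} parameters")
print(f"Same mapping, fully connected  : {lenet_c1_dense:,} parameters")
print(f"AlexNet first conv layer       : {alexnet_conv1:,} parameters")
print(f"AlexNet first FC layer         : {alexnet_fc6:,} parameters")
```

Weight sharing gives LeNet-5's first layer 156 parameters instead of roughly 4.8 million, while a single fully connected layer in AlexNet already holds about 37.7 million, which is why the fully connected part dominates its cost.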
C. VGG NET-16
VGG-16 is one of the most preferred CNN architectures and was developed by Simonyan and Zisserman in 2014. It has 16 weight layers (13 convolutional and 3 fully connected) and a very uniform architecture. The design of the network is similar to that of LeNet and AlexNet: it uses the same traditional stacking of layers as AlexNet, but with different filter sizes and 138 million parameters.
Input layer: a 3-channel RGB image of size 224×224.
1st block: two 3×3 convolutions in succession, followed by a pooling layer. After pooling, the feature map size reduces to 112×112, and each of these 3×3 convolution layers has 64 feature maps.
2nd block: two 3×3 convolutions followed by a pooling layer. The feature map size reduces to 56×56, and each of these 3×3 convolution layers has 128 feature maps.
3rd block: three 3×3 convolutions followed by a pooling layer. The feature map size reduces to 28×28, and each of these 3×3 convolution layers has 256 feature maps.
4th block: three 3×3 convolutions in succession, followed by a pooling layer. The feature map size reduces to 14×14, and each of these 3×3 convolution layers has 512 feature maps.
5th block: three 3×3 convolutions in succession, followed by a pooling layer. The feature map size reduces to 7×7, and each of these 3×3 convolution layers has 512 feature maps.
After the fifth block, the feature maps are flattened and connected to three fully connected layers. Finally, the output is one of 1000 classes [18][19].
D. RESNET 34
Kaiming He et al. developed ResNet in 2015; its 152-layer version (a deep CNN) achieved a top-5 error rate under 3.6%. It has been observed that as the depth of a neural network increases, its accuracy saturates at a particular point and then degrades rapidly. To solve this degradation problem, the residual network architecture was introduced. It uses a novel pathway called the skip connection, which provides alternate paths for data and gradients to flow and thus makes training of very deep networks possible. ResNet is built out of residual blocks. ResNet-34 starts from a 34-layer plain network inspired by VGG-19; skip connections are then added between the plain network layers to form a residual network. The main advantage of adding skip connections is that if any layer decreases the performance of the ResNet architecture, that layer can effectively be skipped through regularization [20][21].
Fig. 4. Architectural Diagram of ResNet-34 [22]
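The claim that a harmful layer "can be skipped" follows from the form of a residual block, y = ReLU(F(x) + x): if regularization drives the weights of the branch F toward zero, the block simply passes its input through. The sketch below is a deliberately simplified illustration under stated assumptions: small dense (matrix-vector) layers stand in for the 3×3 convolutions, batch normalization is omitted, and all weight values are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(weights, bias, v):
    # Matrix-vector product plus bias; stands in for a convolution in this sketch.
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    # Residual branch F(x): two weight layers with a ReLU in between.
    f = dense(w2, b2, relu(dense(w1, b1, x)))
    # Skip connection: add the unchanged input back, then apply the final ReLU.
    return relu([fi + xi for fi, xi in zip(f, x)])


n = 3
zeros = [[0.0] * n for _ in range(n)]
no_bias = [0.0] * n
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
half = [[0.5 if i == j else 0.0 for j in range(n)] for i in range(n)]

x = [0.5, 1.0, 2.0]  # non-negative, e.g. the output of an earlier ReLU

# Branch weights driven to zero: the block behaves as an identity mapping.
print(residual_block(x, zeros, no_bias, zeros, no_bias))   # [0.5, 1.0, 2.0]
# Non-zero branch weights: the block adds a learned correction on top of x.
print(residual_block(x, identity, no_bias, half, no_bias))  # [0.75, 1.5, 3.0]
```

Because the identity path is always present, stacking more blocks cannot make the best achievable mapping worse in principle, which is the intuition behind ResNet avoiding the degradation problem.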
IV. DATASET TABULATION
Accuracy competitive with human performance has been obtained.
B. Preparation:
The first step is to prepare the environment by installing the fastai library and its dependencies. After the environment is prepared, the dataset is split into a training set containing 80% of the images and a validation set containing the remaining 20%.
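The 80/20 split can be sketched with the standard library alone. fastai provides its own splitting utilities, so this is not the code used in the paper. The sketch assumes that each sample is a `(path, label)` pair and, as a design choice the paper does not state, splits each class separately so that rare traffic-sign classes still appear in the validation set. The file names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_train_valid(samples, valid_fraction=0.2, seed=42):
    """Split (path, label) pairs into training and validation lists, class by class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, valid = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_valid = round(len(items) * valid_fraction)
        valid.extend(items[:n_valid])
        train.extend(items[n_valid:])
    return train, valid


# Hypothetical dataset: 3 classes with 10 images each.
samples = [(f"class_{c}/img_{i:03d}.png", c) for c in range(3) for i in range(10)]
train, valid = split_train_valid(samples)
print(len(train), "training images,", len(valid), "validation images")
print("validation images per class:", dict(Counter(label for _, label in valid)))
```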
C. Exploratory Analysis:
Exploratory data analysis is the process of analyzing data to summarize its main characteristics using visualization methods. A complete understanding of the dataset is necessary, so we first explore the number of classes and images in the dataset. Since the images have different sizes, we examine their sizes through a histogram, which gives insight into a suitable input dimension for the network.
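A text-mode version of the image-size histogram can be sketched as follows. The `(width, height)` pairs are hypothetical placeholders; with real files they could be collected with Pillow (for example `Image.open(path).size`). Binning by the larger side and reporting the median are illustrative choices, not necessarily what the authors plotted.

```python
from collections import Counter
from statistics import median


def size_histogram(sizes, bin_width=10):
    # Bin each image by its larger side, e.g. 42 px -> the 40-49 px bin.
    counts = Counter((max(w, h) // bin_width) * bin_width for w, h in sizes)
    return dict(sorted(counts.items()))


def print_histogram(hist, bin_width=10):
    for start, count in hist.items():
        print(f"{start:3d}-{start + bin_width - 1:3d} px | {'#' * count} ({count})")


# Hypothetical (width, height) pairs standing in for the real dataset.
sizes = [(30, 31), (35, 33), (42, 40), (48, 50), (61, 58), (29, 30), (45, 44)]

hist = size_histogram(sizes)
print_histogram(hist)
typical_side = median(max(w, h) for w, h in sizes)
print("median larger side:", typical_side, "px")
```

A summary such as the median side length is one simple way to pick the square input size that the images are resized to before training.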
V. PROPOSED METHODOLOGY-I
A. Why RESNET?
Experimental results show that as the learning rate is increased on a log scale, the loss increases. The overall test accuracy obtained by the ResNet model is the highest among all the models listed in Table I.
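A loss-versus-learning-rate curve on a log scale is what a learning-rate range test produces (fastai exposes one as `Learner.lr_find`; the paper does not say which tool generated its curve). The toy sketch below runs the same idea on a one-parameter quadratic loss (w − 3)² with plain gradient descent instead of a real network, so every number in it is illustrative. The learning rate grows geometrically each step, and the sweep stops once the loss exceeds four times its best value, a common stopping heuristic.

```python
import math


def lr_range_test(start_lr=1e-4, end_lr=10.0, steps=50, w0=0.0, target=3.0):
    """Toy learning-rate range test on the loss (w - target)**2."""
    w = w0
    history = []  # (learning rate, loss after the update)
    best = math.inf
    for i in range(steps):
        # Learning rate grows geometrically, i.e. linearly on a log scale.
        lr = start_lr * (end_lr / start_lr) ** (i / (steps - 1))
        grad = 2.0 * (w - target)   # derivative of (w - target)**2
        w -= lr * grad              # one plain gradient-descent step
        loss = (w - target) ** 2
        history.append((lr, loss))
        best = min(best, loss)
        if loss > 4 * best:         # stop once the loss clearly diverges
            break
    return history


history = lr_range_test()
best_lr, best_loss = min(history, key=lambda item: item[1])
for lr, loss in history[::5]:
    print(f"lr={lr:9.5f}  loss={loss:.3e}")
print(f"lowest loss at lr={best_lr:.3f}; a common pick is about a tenth of that: {best_lr / 10:.3f}")
```

For this quadratic, gradient descent diverges once the learning rate exceeds 1, so the recorded loss falls and then rises sharply, reproducing the qualitative shape of the curve described above.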
TABLE I
ACCURACY OF VARIOUS MODELS
Fig. 7. Histogram Plot of Dataset
E. Training:
For training, we use a ResNet-34 model pre-trained on the ImageNet dataset. We start with a small input size and a short training procedure (2 epochs), and then optimize the batch size. In theory, a larger batch size reduces training time, but in our experiments a larger batch size led to lower validation accuracy and the model started overfitting, so a batch size of 256 is used. After finding a decent set of hyperparameters, we switched to a longer training procedure and a larger input size.
H. Test-time augmentation:
The accuracy of the model was further improved using test-time augmentation (TTA). Similar to the image augmentation technique performed earlier, a few augmented versions of each input image are created, predictions are run on each of them, and the results are averaged. TTA increased the accuracy to 99.612% and reduced the error by 45%.
Fig. 11. Code Snippet of Accuracy of the Model
VI. PROPOSED METHODOLOGY-II
Because high-performance computational resources are required and costs must be minimized, this methodology presents a modified version of the well-known VGGNet, which normally has 16-19 layers, reduced to 12 layers.
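The averaging step of the test-time augmentation described in Section H can be sketched without any framework. fastai ships its own TTA helper; the sketch only shows the idea. The two-class `toy_predict` model and the brightness augmentations are hypothetical. Brightness changes are used instead of horizontal flips because flipping can change a traffic sign's meaning (for example, a left-turn arrow becomes a right-turn arrow).

```python
def adjust_brightness(factor):
    # Returns an augmentation that scales pixel intensities (in [0, 1]) and clips at 1.0.
    return lambda img: [[min(1.0, px * factor) for px in row] for row in img]


def tta_predict(predict, image, augmentations):
    """Average the class probabilities predicted for the image and its augmented copies."""
    views = [image] + [augment(image) for augment in augmentations]
    per_view = [predict(view) for view in views]
    n_classes = len(per_view[0])
    return [sum(p[c] for p in per_view) / len(per_view) for c in range(n_classes)]


def toy_predict(img):
    # Hypothetical two-class "model": probability of class 0 is the mean brightness.
    pixels = [px for row in img for px in row]
    m = sum(pixels) / len(pixels)
    return [m, 1.0 - m]


image = [[0.6, 0.8], [0.7, 0.9]]
augs = [adjust_brightness(1.2), adjust_brightness(0.8)]
probs = tta_predict(toy_predict, image, augs)
label = max(range(len(probs)), key=probs.__getitem__)
print(probs, "-> predicted class", label)
```

Averaging probabilities rather than voting on labels keeps the confidence information from each view; the final class is the argmax of the averaged vector.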
headlamp control system [23]. For a broader analysis of results, Amazon Web Services (AWS) Deep Learning AMIs (Amazon Machine Images) are used to avoid long training times: the infrastructure and tools that the AMIs contain accelerate training and testing, which can be done both in the cloud and at any scale [24].
IX. CONCLUSION
This paper discusses two efficient methods for detecting and recognizing traffic signs, aimed at reducing the problems drivers face in recognizing traffic signs, decreasing road accidents caused by disobeying road signals, and increasing awareness of traffic signs. To achieve this, a pre-trained deep learning transfer learning model, ResNet-34, and a modified 12-layer VGG model are proposed for accurate detection and recognition of 42 classes of traffic signs using around 40,000 images; they obtained accuracies of 99.612% and 97.12%, respectively. The detected output is further passed to a set of algorithms that convert the output text into speech, so that the end result is provided to users as voice.
REFERENCES
[1] J. Crissman and C. E. Thorpe, "UNSCARF, a color vision system for the detection of unstructured roads," in Proc. IEEE Int. Conf. Robotics and Automation, Sacramento, CA, Apr. 1991, pp. 2496–2501.
[2] L. Davis, "Visual navigation at the University of Maryland," in Proc. Int. Conf. Intelligent Autonomous Systems 2, Amsterdam, The Netherlands, 1989, pp. 1–19.
[3] E. Dickmanns, "Machine perception exploiting high-level spatio-temporal models," presented at the AGARD Lecture Series 185, Madrid, Spain, Sept. 17–18, 1992.
[4] D. Pomerleau, "Neural network based autonomous navigation," in Vision and Navigation: The Carnegie Mellon Navlab, C. E. Thorpe, Ed. Norwell, MA: Kluwer, 1990, ch. 5.
[5] I. Masaki, Ed., Vision Based Vehicle Guidance. Berlin, Germany: Springer-Verlag, 1992.
[6] C. Bahlmann, Y. Zhu, V. Ramesh, M. Pellkofer, and T. Koehler, "A system for traffic sign detection, tracking, and recognition using color, shape, and motion information," in Proc. IEEE Intelligent Vehicles Symposium, 2005, pp. 255–260, doi: 10.1109/IVS.2005.1505111.
[7] X. Changzhen, W. Cong, M. Weixin, and S. Yanmei, "A traffic sign detection algorithm based on deep convolutional neural network," in Proc. 2016 IEEE Int. Conf. Signal and Image Processing (ICSIP), 2016, pp. 676–679, doi: 10.1109/SIPROCESS.2016.7888348.
[8] F. Shaikh, "Deep learning vs. machine learning – the essential differences you need to know!," Analytics Vidhya.
[9] S. Salti, A. Petrelli, F. Tombari, N. Fioraio, and L. Di Stefano, "Traffic sign detection via interest region extraction," Pattern Recognition, vol. 48, no. 4, pp. 1039–1049, 2015, ISSN 0031-3203, doi: 10.1016/j.patcog.2014.05.017.
[10] S. K. Berkaya, H. Gunduz, O. Ozsen, C. Akinlar, and S. Gunal, "On circular traffic sign detection and recognition," Expert Systems with Applications, vol. 48, pp. 67–75, 2016, ISSN 0957-4174, doi: 10.1016/j.eswa.2015.11.018.
[11] Y. Yang, H. Luo, H. Xu, and F. Wu, "Towards real-time traffic sign detection and classification," IEEE Trans. Intelligent Transportation Systems, vol. 17, no. 7, pp. 2022–2031, 2016, doi: 10.1109/TITS.2015.2482461.
[12] M. T. Islam, "Traffic sign detection and recognition based on convolutional neural networks," in Proc. 2019 Int. Conf. Advances in Computing, Communication and Control (ICAC3), 2019, doi: 10.1109/ICAC347590.2019.9036784.
[13] A. de la Escalera, L. E. Moreno, M. A. Salichs, and J. M. Armingol, "Road traffic sign detection and classification," IEEE Trans. Industrial Electronics, vol. 44, no. 6, pp. 848–859, 1997, doi: 10.1109/41.649946.
[14] A. El-Sawy, H. El-Bakry, and M. Loey, "CNN for handwritten Arabic digits recognition based on LeNet-5," in Proc. Int. Conf. Advanced Intelligent Systems and Informatics 2016, pp. 566–575, doi: 10.1007/978-3-319-48308-5_54.
[15] M. Kayed, A. Anter, and H. Mohamed, "Classification of garments from Fashion MNIST dataset using CNN LeNet-5 architecture," in Proc. 2020 Int. Conf. Innovative Trends in Communication and Computer Engineering (ITCE), 2020, doi: 10.1109/ITCE48509.2020.9047776.
[16] T. Shanthi and R. S. Sabeenian, "Modified Alexnet architecture for classification of diabetic retinopathy images," Computers & Electrical Engineering, vol. 76, pp. 56–64, 2019, doi: 10.1016/j.compeleceng.2019.03.004.
[17] A. A. Almisreb, N. Jamil, and N. M. Din, "Utilizing AlexNet deep transfer learning for ear recognition," in Proc. 2018 Fourth Int. Conf. Information Retrieval and Knowledge Management (CAMP), 2018, doi: 10.1109/INFRKM.2018.8464769.
[18] M. Mateen, J. Wen, Nasrullah, S. Song, and Z. Huang, "Fundus image classification using VGG-19 architecture with PCA and SVD," Symmetry, vol. 11, no. 1, p. 1, 2019, doi: 10.3390/sym11010001.
[19] Z. Bi, L. Yu, H. Gao, et al., "Improved VGG model-based efficient traffic sign recognition for safe driving in 5G scenarios," Int. J. Mach. Learn. & Cyber., 2020, doi: 10.1007/s13042-020-01185-5.
[20] P. Korfiatis, T. L. Kline, D. H. Lachance, I. F. Parney, J. C. Buckner, and B. J. Erickson, "Residual deep convolutional neural network predicts MGMT methylation status," J. Digit. Imaging, vol. 30, no. 5, pp. 622–628, 2017, doi: 10.1007/s10278-017-0009-z.
[21] A. Wang, M. Wang, H. Wu, K. Jiang, and Y. Iwahori, "A novel LiDAR data classification algorithm combined CapsNet with ResNet," Sensors, vol. 20, p. 1151, 2020, doi: 10.3390/s20041151.
[22] A. Vose, J. Balma, A. Heye, A. Rigazzi, C. Siegel, D. Moise, B. Robbins, and S. R. Sukumar, "Recombination of artificial neural networks," 2019.
[23] P. Fernández Alcantarilla, L. Bergasa, P. Jimenez, D. Fernández-Llorca, M.-A. Sotelo, and S. Sánchez-Mayoral, "Automatic light beam controller for driver assistance," Mach. Vis. Appl., vol. 22, pp. 819–835, 2011, doi: 10.1007/s00138-011-0327-y.
[24] A. Cheremskoy, "AWS Deep Learning AMIs, a secure and scalable environment for deep learning on Amazon EC2," website, cited Nov. 21, 2017. Available: https://aws.amazon.com/amazon-ai/amis/?nc2=h_m1