
SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

CSE3013 – ARTIFICIAL INTELLIGENCE


SLOT – B2+TB2

FINAL REPORT

Title -
Traffic Sign Detection
Using Convolutional Neural Network

Submitted By
16BCE0239 – Pranav Tandon

16BCE0568 – Mayank Mathur

16BCE0844 – Shyamli Singh

Submitted To
Prof Kannan A

Abstract

Traffic sign recognition (TSR) is an important feature of advanced driver assistance systems, contributing to the safety of drivers, pedestrians and vehicles alike. Developing TSR systems requires computer vision techniques, which can be considered fundamental to the field of pattern recognition in general. Despite all the previous work and research, traffic sign detection and recognition still remain very challenging problems, especially if a real-time processing solution is required.

We propose an approach for traffic sign detection based on Convolutional Neural Networks (CNN). We first transform the original image into a grayscale image by using support vector machines, then use convolutional neural networks with fixed and learnable layers for detection and recognition. The fixed layer can reduce the number of areas of interest to detect and crop the boundaries very close to the borders of the traffic signs. The learnable layers can significantly increase the accuracy of detection.

Introduction

Driver assistance systems (DAS) have received ever-increasing attention in both academia and industry. Among the various modules of DAS, traffic sign detection has become one of the most essential, since it alerts drivers and thereby relieves the pressure of driving. Recognition of traffic signs has been a considerable issue in automated vehicles since the mid-1990s, and various strategies have been proposed by researchers.

To address concerns over road and transportation safety, automatic traffic sign detection systems have been introduced. An automatic traffic sign detection system detects traffic signs from and within images captured by cameras or imaging sensors. In adverse traffic conditions, the driver may not notice a traffic sign, which may lead to accidents. In such situations, a traffic sign detection system can play an important role in identifying the sign. The fundamental goal of research on traffic sign detection is to improve the robustness and efficiency of the detection framework. Building an automatic traffic sign recognition system is a tedious job, given the ceaseless changes in the environment and in lighting conditions. Other issues that also need to be addressed are partial occlusion, multiple traffic signs appearing at the same time, and blurring and fading of traffic signs, all of which can create problems for detection. To apply a traffic sign detection framework under real-time conditions, a fast and effective approach is required.

Related works
According to [2], the first work on automated traffic sign detection was reported in Japan in 1984. In [1], it is noted that image pattern recognition had chiefly been researched for individual objects, whereas recognizing a target object within a whole scene image is a more advanced direction. A novel Eigen color space based on the Karhunen-Loeve (KL) transform has been proposed for traffic sign detection [3]. The main disadvantage of these color-based methods is that it is difficult to set the threshold value, because color information is not invariant in real-world environments with different lighting conditions.

Methods based on the shape of traffic signs have also been widely used. In [4], a method using smoothness and a Laplacian filter is proposed to detect round signs. The proprietary algorithms in [5] use specific color filters and the features of specific shapes to distinguish a particular type of traffic sign, but they can detect only stop signs. In [6], edges are tested at different levels of resolution using a so-called Hierarchical Structure Code; it is assumed that closed edge contours are available at one of these resolution levels, and failures happen when the outline of the traffic sign merges with the background. In [7], a detection algorithm using the Hough transform is introduced in order to speed up detection. In [8], images are segmented in the HSI color space, and template matching techniques are then used to find traffic signs. Recently, Convolutional Neural Networks have been adopted in object recognition for their high accuracy [10] [11] [12] [13].

In [10], a multi-layer convolutional network is proposed to boost traffic sign recognition, using a combination of supervised and unsupervised learning. This model can learn multiple stages of invariant image features, with each stage containing a filter bank layer, a non-linear transform layer, and a spatial feature pooling layer. Inspired by the strong results of traffic sign recognition using Convolutional Neural Networks (CNN), we propose a method based on CNN, using fixed and learnable filters to detect traffic signs in scene images.

Existing Method
The existing method develops a framework for traffic sign detection and recognition based on proposals guided by fully convolutional neural networks, which largely reduces the search area for traffic signs while maintaining the detection rate.

Proposed Method

We can adopt the latest architectures for object detection, such as feature pyramid networks and multi-scale training. Neural networks can be trained to recognize patterns containing certain colors; color segmentation networks can be used to reduce the color resolution of an image. Neural networks are also well known for their powerful classification capability: when used for classification, they can be trained to recognize road sign features within a region of interest. The models commonly used in road sign recognition are the Multilayer Perceptron (MLP) network, the Radial Basis Function (RBF) network, etc.

DATASET DESCRIPTION
A) SOURCE LINK

http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset

B) DATASET DESCRIPTION

The pickled data is a dictionary with 4 key/value pairs:

'features' is a 4D array containing raw pixel data of the traffic sign images, shaped (num_examples, width, height, channels).

'labels' is a 1D array containing the label/class id of each traffic sign. The file signnames.csv contains id -> name mappings for each id.

'sizes' is a list of tuples, (width, height), representing the original width and height of each image.

'coords' is a list of tuples, (x1, y1, x2, y2), representing the coordinates of a bounding box around the sign in the image. These coordinates refer to the original image; the pickled data contains resized versions (32 by 32) of these images.
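A minimal sketch of how this pickled structure can be loaded (the file names train.p and test.p are an assumption; the GTSRB pickles are often distributed under those names):

    import pickle

    # File names are assumptions; adjust to the actual pickle files.
    with open('train.p', 'rb') as f:
        train = pickle.load(f)
    with open('test.p', 'rb') as f:
        test = pickle.load(f)

    X_train, y_train = train['features'], train['labels']
    X_test, y_test = test['features'], test['labels']

    print(X_train.shape)      # (num_examples, 32, 32, 3)
    print(len(set(y_train)))  # number of distinct classes (43)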

C) SAMPLE DATASET

D) TRAIN AND TEST DATASET

PROPOSED ARCHITECTURE

The architecture proposed is inspired by Yann LeCun's paper on the classification of traffic signs. We added a few tweaks and created a modular codebase which allows us to try out different filter sizes, depths, and numbers of convolution layers, as well as the dimensions of the fully connected layers. In homage to LeCun, and with a touch of cheekiness, we called the network EdLeNet.

We mainly tried 5x5 and 3x3 filter (aka kernel) sizes, starting with a depth of 32 for our first convolutional layer. EdLeNet's 3x3 architecture is described below.

The network is composed of 3 convolutional layers (kernel size 3x3, with the depth doubling at each subsequent layer), each using ReLU as the activation function and followed by a 2x2 max pooling operation. The last 3 layers are fully connected, with the final layer producing 43 results (the total number of possible labels), computed using the softmax activation function. The network is trained using mini-batch stochastic gradient descent with the Adam optimizer. We built a highly modular coding infrastructure that enables us to dynamically create our models, as in the snippet following the ModelConfig description below.

The ModelConfig contains information about the model, such as:

 The model function (e.g. EdLeNet)
 The model name
 The input format (e.g. [32, 32, 1] for grayscale)
 The convolutional layers config [filter size, start depth, number of layers]
 The fully connected layers' dimensions (e.g. [120, 84])
 The number of classes
 The dropout keep-probability values [p-conv, p-fc]
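A minimal sketch of such a configuration container (the field names mirror the list above; the exact class layout and the keep-probability values in the example instantiation are assumptions for illustration):

    def EdLeNet(config):
        # Model-building function; defined elsewhere in the real codebase,
        # stubbed here only so the sketch is self-contained.
        pass

    class ModelConfig:
        # Plain container bundling every knob we vary between experiments.
        def __init__(self, model_fn, name, input_format,
                     conv_config, fc_dims, n_classes, dropout_keep):
            self.model_fn = model_fn          # e.g. EdLeNet
            self.name = name                  # e.g. "EdLeNet_3x3"
            self.input_format = input_format  # e.g. [32, 32, 1] for grayscale
            self.conv_config = conv_config    # [filter size, start depth, number of layers]
            self.fc_dims = fc_dims            # e.g. [120, 84]
            self.n_classes = n_classes        # 43 possible labels
            self.dropout_keep = dropout_keep  # [p-conv, p-fc]

    # The 3x3 EdLeNet variant described in the next section:
    mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_3x3", [32, 32, 1],
                         [3, 32, 3], [120, 84], 43, [0.9, 0.5])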

Layer               Description

Input               32x32x1 grayscale image
Convolution         1x1 stride, valid padding
ReLU
Max pooling         2x2 stride, same padding
Dropout             0.90 probability to keep units
Convolution         1x1 stride, valid padding
ReLU
Max pooling         2x2 stride, same padding
Dropout             0.80 probability to keep units
Flatten
Fully connected     Multiply by weights, then add bias
ReLU
Dropout             0.70 probability to keep units
Fully connected     Multiply by weights, then add bias
ReLU
Dropout             0.60 probability to keep units
Dense
ReLU
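As a compact restatement of this stack, here is a minimal tf.keras sketch (an illustration, not the original training code; Keras Dropout takes a drop rate, i.e. one minus the keep probabilities listed above, and giving the third convolutional block no dropout of its own is one reading of the table):

    import tensorflow as tf

    def edlenet_3x3_sketch(n_classes=43):
        # 3 convolutional blocks, 3x3 kernels, depth doubling 32 -> 64 -> 128,
        # each followed by 2x2 max pooling; then two fully connected layers
        # and a 43-way softmax output.
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation='relu',
                                   input_shape=(32, 32, 1)),
            tf.keras.layers.MaxPooling2D(2),
            tf.keras.layers.Dropout(0.10),                  # keep 0.90
            tf.keras.layers.Conv2D(64, 3, activation='relu'),
            tf.keras.layers.MaxPooling2D(2),
            tf.keras.layers.Dropout(0.20),                  # keep 0.80
            tf.keras.layers.Conv2D(128, 3, activation='relu'),
            tf.keras.layers.MaxPooling2D(2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(120, activation='relu'),
            tf.keras.layers.Dropout(0.30),                  # keep 0.70
            tf.keras.layers.Dense(84, activation='relu'),
            tf.keras.layers.Dropout(0.40),                  # keep 0.60
            tf.keras.layers.Dense(n_classes, activation='softmax'),
        ])

    model = edlenet_3x3_sketch()
    model.compile(optimizer='adam',   # Adam, as in the report
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])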

While neural networks can be great learning devices, they are often referred to as black boxes. We can better understand what the weights of a neural network look like by plotting their feature maps. After successfully training the network, we can see what its feature maps look like by plotting the output of the network's weight layers in response to a test stimuli image. From these plotted feature maps, it is possible to see which characteristics of an image the network finds interesting. For a sign, the inner feature maps may react with high activation to the sign's boundary outline or to the contrast in the sign's painted symbol.

Layer 1:

We can see that the network focuses a lot on the edges of the circle in this sample image (speed limit 120 km/h). The background is ignored.

Layer 2:

It is rather hard to determine what the network is focusing on in layer 2, but it seems to "activate" around the edges of the circle and in the middle, where the speed limit (120 km/h) of our sample image is printed.

ALGORITHM

Step 0: Load The Data

Step 1: Dataset Summary & Exploration

Step 2: Design and Test a Model Architecture

Step 2.1: Pre-process the Data Set (normalization, grayscale, etc.)

 Minimally, the image data should be normalized so that it has zero mean and equal variance. For image data, (pixel - 128) / 128 is a quick way to approximately normalize the data and can be used in this project (a minimal sketch follows this list).
 Other pre-processing steps are optional; different techniques can be tried to see if they improve performance.
 Use a code cell (or multiple code cells, if necessary) to implement the first step of the project.
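A minimal sketch of this pre-processing step (the grayscale conversion uses standard luminosity weights, which is an assumption; the report does not state the exact conversion used):

    import numpy as np

    def preprocess(images):
        # Grayscale via luminosity weights (an assumed conversion choice),
        # then the quick approximate normalization (pixel - 128) / 128.
        gray = np.dot(images[..., :3], [0.299, 0.587, 0.114])
        gray = gray[..., np.newaxis]      # keep a channel axis: 32x32x1
        return (gray - 128.0) / 128.0

    X_train_norm = preprocess(X_train)    # X_train as loaded earlier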

Step 2.2: Data augmentation

The first thing we tried was to augment the data by replicating examples of the class labels that are rare in the dataset, so as to reduce the high variance (overfitting) of our model, as sketched below.
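A sketch of that replication idea (pure numpy oversampling; the target count of 1000 is an arbitrary illustration, not a value from our experiments):

    import numpy as np

    def oversample_rare_classes(X, y, target_count=1000):
        # Replicate examples of under-represented classes until every
        # class has at least target_count samples.
        X_parts, y_parts = [X], [y]
        for c in np.unique(y):
            idx = np.where(y == c)[0]
            deficit = target_count - len(idx)
            if deficit > 0:
                extra = np.random.choice(idx, deficit, replace=True)
                X_parts.append(X[extra])
                y_parts.append(y[extra])
        return np.concatenate(X_parts), np.concatenate(y_parts)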

Conclusion:

We realized that data augmentation could not make drastic improvements to the performance of our model, and the augmentation step was omitted because it slowed down the entire training procedure.

Step 2.3: Train, Validate and Test the Model

A validation set can be used to assess how well the model is performing. Low accuracy on both the training and validation sets implies underfitting; high accuracy on the training set but low accuracy on the validation set implies overfitting.

Step 3: Test a Model on New Images

Output Top 5 Softmax Probabilities For Each Image Found on the Web
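A sketch of this top-5 readout (TF1-style, matching the session-based code used elsewhere in the project; logits, x, sess and new_images are assumed to come from the trained model):

    import tensorflow as tf

    # `logits` is the model's output tensor, `x` its input placeholder,
    # `sess` a live session and `new_images` the pre-processed web images.
    softmax = tf.nn.softmax(logits)
    top5 = tf.nn.top_k(softmax, k=5)

    values, indices = sess.run(top5, feed_dict={x: new_images})
    for probs, classes in zip(values, indices):
        print(list(zip(classes, probs)))   # (class id, probability) pairs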

Step 4: Visualize the Neural Network's State with Test Images


The function code below allows us to get the visualization output of any TensorFlow weight layer. The inputs to the function are a stimuli image (one used during training, or a new one) and the TensorFlow variable name that represents the layer's state during the training process; for instance, to see what the LeNet lab's feature maps look like for its second convolutional layer, one could pass conv2 as the tf_activation variable.
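Below is a sketch of such a function (TF1-style; the input placeholder x and a live session are assumptions, and the stimuli image is expected with a batch dimension):

    import math
    import matplotlib.pyplot as plt

    def output_feature_map(sess, image_input, tf_activation, plt_num=1):
        # Evaluate the chosen layer's activations for one stimuli image;
        # `x` is assumed to be the network's input placeholder and
        # `image_input` a batch of shape (1, 32, 32, 1).
        activation = tf_activation.eval(session=sess,
                                        feed_dict={x: image_input})
        n_maps = activation.shape[3]
        cols = 8
        rows = math.ceil(n_maps / cols)
        plt.figure(plt_num, figsize=(15, 2 * rows))
        for i in range(n_maps):
            plt.subplot(rows, cols, i + 1)
            plt.title('FeatureMap ' + str(i), fontsize=8)
            plt.imshow(activation[0, :, :, i], cmap='gray')
            plt.axis('off')
        plt.show()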

OUTPUT

Results
Number of training images: 39209

Number of validation images: 4410

Number of testing images: 12630

Number of different traffic signs: 43

Number of detections: 19

Number of misses: 2

No. of Epochs   Batch Size   Dropout1   Dropout2   Dropout3   Dropout4   Validation Accuracy   Testing Accuracy

40              128          -          -          -          -          86.3%                 81.1%

40              128          0.85       0.75       0.65       0.55       87.3%                 82.2%

40              128          0.90       0.80       0.70       0.50       91.7%                 89.67%

100             128          0.90       0.80       0.70       0.50       94.3%                 94.7%

CONCLUSION

We covered how deep learning can be used to classify traffic signs with high accuracy, employing a variety of pre-processing and regularization techniques (e.g. dropout) and trying different model architectures. We built highly configurable code and developed a flexible way of evaluating multiple architectures. Our model reached close to 94% accuracy on the test set, achieving 96% on the validation set.

REFERENCES

[1] Paclík, P., Novovičová, J. and Duin, R.P.W., 2006. Building road-sign classifiers using a trainable similarity measure. IEEE Transactions on Intelligent Transportation Systems, 7(3), pp.309-321.

[2] Hsu, S.H. and Huang, C.L., 2001. Road sign detection and recognition using matching
pursuit method. Image and Vision Computing, 19(3), pp.119-129.

[3] Greenhalgh, J. and Mirmehdi, M., 2012. Real-time detection and recognition of road
traffic signs. IEEE Transactions on Intelligent Transportation Systems, 13(4), pp.1498-1506.

[4] Shustanov, A. and Yakimov, P., 2017. CNN design for real-time traffic sign recognition.
Procedia engineering, 201, pp.718-725.

[5] Zhu, Y., Zhang, C., Zhou, D., Wang, X., Bai, X. and Liu, W., 2016. Traffic sign detection and
recognition using fully convolutional network guided proposals. Neurocomputing, 214, pp.758-
766.

[6] Lee, H.S. and Kim, K., 2018. Simultaneous traffic sign detection and boundary estimation
using convolutional neural network. IEEE Transactions on Intelligent Transportation Systems,
19(5), pp.1652-1663.

[7] Zhu, Z., Liang, D., Zhang, S., Huang, X., Li, B. and Hu, S., 2016. Traffic-sign detection and
classification in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (pp. 2110-2118).

[8] Ruta, A., Porikli, F., Watanabe, S. and Li, Y., 2011. In-vehicle camera traffic sign detection
and recognition. Machine Vision and Applications, 22(2), pp.359-375.

[9] Liang, M., Yuan, M., Hu, X., Li, J. and Liu, H., 2013, August. Traffic sign detection by ROI extraction and histogram features-based recognition. In The 2013 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.

[10] Ellahyani, A., El Ansari, M. and El Jaafari, I., 2016. Traffic sign detection and recognition
based on random forests. Applied Soft Computing, 46, pp.805-815.
