
Bangla Licence Plate Detection & Recognition Using

ANN
Submitted by

Md. Tanveer Ahmed


ID: 152392303

Md. Tanvir Hossain


ID: 152392321

A final year project report submitted to the City University in partial fulfillment
of the requirements of the Degree of Bachelor of Computer Science and
Engineering

Supervised by Co-Supervised by

Md. Ataullah Bhuiyan Zakaria Hossain


Senior Lecturer Lecturer
Department of Computer Science and Department of Computer Science and
Engineering Engineering
City University City University
Dhaka-Bangladesh Dhaka-Bangladesh

CITY UNIVERSITY
DHAKA, BANGLADESH
MARCH 2020
Certification

This is to certify that the work presented in this project entitled “Bangla Licence Plate
Detection & Recognition Using ANN” is the outcome of the work done by Md. Tanveer
Ahmed & Md. Tanvir Hossain under the supervision of Md. Ataullah Bhuiyan, Senior
Lecturer, Department of Computer Science and Engineering, City University, Dhaka,
Bangladesh & Co-supervisor Zakaria Hossain, Lecturer, Department of Computer Science
and Engineering, City University, Dhaka, Bangladesh, during 4th October 2019 to 15th March
2020. It is also declared that neither this project/report nor any part of it has been submitted or is
being currently submitted anywhere else for the award of any degree or diploma.

Approved By:

--------------------------------- ---------------------------------
Supervisor Co-Supervisor
Md. Ataullah Bhuiyan Zakaria Hossain
Senior Lecturer Lecturer
Department of Computer Science & Department of Computer Science &
Engineering Engineering
City University City University
Dhaka, Bangladesh Dhaka, Bangladesh

i
Declaration
This is to certify that the work presented in this project entitled “Bangla Licence Plate
Detection & Recognition Using ANN” is the outcome of the work I have done under the
supervision of Md Ataullah Bhuiyan, Senior Lecturer of Department of Computer Science
and Engineering, City University, Dhaka, Bangladesh & Co-supervisor Zakaria Hossain,
Lecturer, Department of Computer Science and Engineering, City University, Dhaka,
Bangladesh. It is also declared that this project report, or any part of it, is not a copy of a
document produced by any organization, nor is it currently being submitted anywhere else for any
degree or diploma.

……………………………….
Md. Tanveer Ahmed
ID: 152392303
Batch: 39
Department of CSE
City University, City Campus,
Dhaka, Bangladesh

ii
Declaration
This is to certify that the work presented in this project entitled “Bangla Licence Plate
Detection & Recognition Using ANN” is the outcome of the work I have done under the
supervision of Md Ataullah Bhuiyan, Senior Lecturer of Department of Computer Science
and Engineering, City University, Dhaka, Bangladesh & Co-supervisor Zakaria Hossain,
Lecturer, Department of Computer Science and Engineering, City University, Dhaka,
Bangladesh. It is also declared that this project report, or any part of it, is not a copy of a
document produced by any organization, nor is it currently being submitted anywhere else for any
degree or diploma.

……………………………….
Md. Tanvir Hossain
ID: 152392321
Batch: 39
Department of CSE
City University, City Campus,
Dhaka, Bangladesh

iii
Acknowledgement
Project development is not an easy task; it requires the co-operation and help of many people.
Words often run short when we are truly thankful and sincerely wish to express our
gratitude towards those who helped in the completion of this project.

We are deeply indebted to our supervisor Md. Ataullah Bhuiyan, Senior Lecturer,
Department of Computer Science & Engineering, City University, Dhaka, Bangladesh, and
co-supervisor Zakaria Hossain, Lecturer, Department of Computer Science and
Engineering, City University, Dhaka, Bangladesh. Without their help, guidance, sympathetic
co-operation, stimulating suggestions and encouragement, the planning and development of
this project would have been very difficult for us.

Our special thanks go to the Head of the Department of CSE, Md. Safaet Hossain, who
gave us permission and encouraged us to go ahead. We are indebted to the Honorable Dean
of the Science Faculty, Prof. Dr. Md. Shawkut Ali Khan, for his endless support.

I am very grateful to all my faculty members, who gave me their valuable guidance
throughout my graduation. I am also very grateful to all those people who have helped me
to complete my project.

Md. Tanveer Ahmed


ID: 152392303
Batch: 39
Department of CSE
City University, City Campus,
Dhaka, Bangladesh

iv
Acknowledgement
Project development is not an easy task; it requires the co-operation and help of many people.
Words often run short when we are truly thankful and sincerely wish to express our
gratitude towards those who helped in the completion of this project.

We are deeply indebted to our supervisor Md. Ataullah Bhuiyan, Senior Lecturer,
Department of Computer Science & Engineering, City University, Dhaka, Bangladesh, and
co-supervisor Zakaria Hossain, Lecturer, Department of Computer Science and
Engineering, City University, Dhaka, Bangladesh. Without their help, guidance, sympathetic
co-operation, stimulating suggestions and encouragement, the planning and development of
this project would have been very difficult for us.

Our special thanks go to the Head of the Department of CSE, Md. Safaet Hossain, who
gave us permission and encouraged us to go ahead. We are indebted to the Honorable Dean
of the Science Faculty, Prof. Dr. Md. Shawkut Ali Khan, for his endless support.

I am very grateful to all my faculty members, who gave me their valuable guidance
throughout my graduation. I am also very grateful to all those people who have helped me
to complete my project.

Md. Tanvir Hossain


ID: 152392321
Batch: 39
Department of CSE
City University, City Campus,
Dhaka, Bangladesh

v
Abstract
The importance of vehicle number plate detection is increasing day by day, notably in
Bangladesh, as part of a digitalized transportation system. It is particularly useful for finding
lost vehicles, improving security, controlling vehicle access to restricted areas, saving time at
toll gates, and identifying vehicles involved in road accidents or other related
crimes. This project presents a method for Bangla licence plate detection and recognition.
Our project consists of three stages of processing: in stage 1 we try three different object
detection algorithms to detect the licence plate; in stage 2 we segment that LP; and in stage 3
we classify the characters and digits from the segmentation with the help of our proposed
CNN classifier and another CNN-based algorithm, ResNet50. We tested our algorithms on 60
images taken from the road. We achieved almost 100% success in Bangla license plate
detection and digit recognition, and 86% success in character recognition.

vi
Table of Contents

Chapter No. Content Name Page No.


----------- Certification i
----------- Declaration ii
----------- Acknowledgement v
----------- Abstract vi
----------- Table of Contents vii
Chapter 1 1.1 Automatic License Plate Recognition 01
Introduction 1.2 Applications of ALPR 03
1.3 Bangladeshi Licence Plate 04
1.4 Organization of this report 07

Chapter 2 2.1 Detection 08


Literature Review 2.2 Classification 09
2.3 Comparison 10

Chapter 3 3.1 Introduction 11


Algorithms and 3.2 Plate Detection 11
Techniques
3.2.1 Faster RCNN algorithm 11

3.2.2 SSD algorithm 16

3.2.3 YOLOV3 algorithm 19

3.3 Characters and Digits classification algorithms 22

3.3.1 Classical ML algorithm 23

3.3.1.1 Logistic Regression 23

3.3.1.2 Support Vector Machine 24

3.3.1.3 Naïve Bayes 25

3.3.1.4 k-nearest neighbors algorithm 27

3.3.1.5 Random Forest 27

3.3.2 CNN Based Architecture 28

3.3.2.1 VGG16 28

3.3.2.2 VGG19 29

3.3.2.3 ResNet50 30

3.3.3 Proposed CNN Model 31

vii
3.4 Segmentation 33
3.4.1 Binarization 33
3.4.2 Trim and Segmentation 34

Chapter 4 4.1 Introduction 35


Train & Test 4.2 Data Frequency 35
Review
4.3 License Plate Detection algorithms training & 37
testing
4.3.1 Faster RCNN Train and Test 37
4.3.2 SSD Train and Test 38
4.3.3 YOLOV3 Train and Test 38
4.4 Classification algorithms Evaluation 39
4.4.1 Logistic Regression 40
4.4.2 Support Vector Machine 42
4.4.3 Naïve Bayes 44
4.4.4 k-nearest neighbors algorithm 46

4.4.5 Random Forest 48


4.4.6 VGG16 50
4.4.7 VGG19 53
4.4.8 ResNet50 56
4.4.9 Proposed CNN Model 59

Chapter 5 5.1 Introduction 62


Proposed Scheme 5.2 Detection of licence plate region 62
5.2.1 Comparison 62
5.3 Characters and digits classification 65
5.3.1 Comparison 65
5.4 Finalization 69
Chapter 6 6.1 Summary 71
Conclusion 6.2 Suggestions for future work 71
----------- References 72

viii
List of Figures
Fig. 1.1: Stationary ALPR systems: (a) Orange county, USA, and (b) Use of CCTV camera in
ALPR system. Mobile ALPR system: (c) Dubai, and (d) England
………………………………………………………………………………………..02
Fig. 1.2: Vehicle license plates from different countries: (a) USA, (b) Brazil, (c) England, and
(d)India………………………………………………………………...04
Fig. 1.3: Some Bangla license plates with complex background……………………05
Fig. 1.4: Different types of Bangla license plate: (a) Private, (b) Public, (c) Government,
and(d)Military………………………………………………………06
Fig 3.1: Architecture of Faster RCNN ………….………………………………………12
Fig 3.2: Anchor Boxes …………………………………………………………………13
Fig 3.3: Architecture of SSD …………………………………………………………...19
Fig 3.4: working technique of yolov3 ………….……………………………………….20
Fig 3.5: Non-Max Suppression ………………………………………………………...22
Fig 3.6: Architecture of yolov3………………..………………………………………..22
Fig 3.7: plotting graph of logistic regression …..……………………………………….24
Fig 3.8: Hair data plotting (1) ………………..…………………………………………25
Fig 3.9: Hair data plotting (2) ……………….………………………………………….25
Fig 3.10: Data plotting on KNN…………….…………………………………………..27
Fig 3.11: Bunch of Decision trees ………….…………………………………………..28
Fig 3.12: VGG16 architecture ………………….……………………………………....29
Fig 3.13: VGG19 architecture ………………….………………………………………30
Fig 3.14: Architecture of Resnet50 …………….………………………………………31
Fig 3.15: Architecture of Proposed Model ……..………………………………………32
Fig 3.16: Summary of Proposed Model ………..………………………………………33
Fig 3.17: Binarization operation ……………….…………………………………….34
Fig 3.18: (a)Trim the border ; (b)Final plate …..……………………………………..34
Fig 3.19: (a)Row1 (all characters) ; (b)Row2 (all digits)……..…..………………….34
Fig 3.20: Segmentation of characters and digits……………..….…………………..34
Fig 4.1: Data frequency of digits ……………………………...….………………….35
Fig 4.2: Data frequency of characters (1) ……………………..….………………….36
Fig 4.3: Data frequency of characters (2) ……………………...….…………………36
Fig 5.1: Accuracy comparison of FRCNN,SSD,YOLOV3(pascalVOC dataset)..…..63
Fig 5.2: Accuracy comparison of FRCNN,SSD,YOLOV3(COCO dataset) …….…..64
Fig 5.3: Frame rate comparison of FRCNN,SSD,YOLOV3 …………………….….64
Fig 5.4: All ML algorithms comparison………………………………………….….65
Fig 5.5: CNN based algorithms comparison…………………………………….…..66
Fig 5.6: ML vs CNN based ResNet50 comparison……………………………….…67
Fig 5.7: Proposed model vs ResNet50 comparison……………………………….…68
Fig 5.8: Synchronization of entire work………………………………………….….70

ix
List of Tables
Table 3.1: Frequency table of weather data ……………………………..…………..27
Table 4.1: Accuracy table of logistic regression (digits)………………….…………40
Table 4.2: Accuracy table of logistic regression (Characters)…………….…………41
Table 4.3: Accuracy table of SVM (digits)………………………………..…………42
Table 4.4: Accuracy table of SVM (characters)……………………………………..43
Table 4.5: Accuracy table of Naïve bayes (digits)……………………….………….44
Table 4.6: Accuracy table of Naïve bayes (characters)……………….….………….45
Table 4.7: Accuracy table of KNN (digits)………………………….….…………..46
Table 4.8: Accuracy table of KNN (characters)……………………..….…………..47
Table 4.9: Accuracy table of Random forest (digits)………………..….…………...48
Table 4.10: Accuracy table of Random forest (characters)………….…...………….49
Table 4.11: Accuracy table of VGG16 (digits)…………………….……..…………51
Table 4.12: Accuracy table of VGG16 (characters)………………….…...…………52
Table 4.13: Accuracy table of VGG19 (digits)……………………….……………..54
Table 4.14: Accuracy table of VGG19 (characters)………………….……………...55
Table 4.15: Accuracy table of Resnet50 (digits)…………………….………………57
Table 4.16: Accuracy table of Resnet50 (characters)……………………………….58
Table 4.17: Accuracy table of proposed model (digits)…………………………….60
Table 4.18: Accuracy table of proposed model (characters)………………………..61

x
List of Abbreviation
LP Licence plate
ROI Region of interest
FRCNN Faster regional based convolutional neural network
CNN Convolutional neural network
ANN Artificial neural network
NN Neural network
DNN Deep neural network
SSD Single shot detector
YOLOV3 You only look once version 3
SVM Support vector machine
KNN K- nearest neighbor
OCR Optical character recognition
VGG Visual geometry group
ResNet Residual network
RF Random forest
ML Machine learning

Keywords: Convolutional neural network (CNN), Licence Plate, Segmentation, ResNet50, ML.

xi
Chapter 1
Introduction

1.1 Automatic License Plate Recognition


Automatic License Plate Recognition (ALPR) is a technology that uses optical
character recognition (OCR) to automatically read license plate characters. It is one of
the important modules of an Intelligent Transportation System (ITS). A fully functional
ALPR system comprises three major stages:

1. Detection of license plate in the input image containing vehicles,

2. Segmentation of characters and digits in the license plate, and

3. Recognition and classification of these characters and digits.

There are two types of ALPR systems, such as stationary and mobile. In stationary
ALPR systems, stationary cameras can be mounted on road signs, street lights,
highway overpasses or buildings in a cost-effective way to monitor moving and
parked vehicles. Camera software is made capable of identifying the pixel patterns
that form the license plates and translating the letters and numbers on the plate into a
digital format. The plate data is then compared in real-time to a list of plate numbers
that belong to a set of vehicles of interest. If the system detects a match, it sends an
alert to the dispatcher or other designated personnel. On the other hand, in mobile
ALPR systems, multiple cameras are mounted on a vehicle. As the vehicle moves, it
captures snapshots of the license plates of the vehicles around it and transmits the plate
data to a database. In this project, we are only interested in stationary ALPR systems.
Fig. 1.1 shows some existing stationary and mobile ALPR systems. A stationary
ALPR system in Orange County, USA is shown in Fig. 1.1(a), where three cameras are
mounted over three lanes of a road. CCTV cameras beside the road can also be used in
stationary ALPR systems, as shown in Fig. 1.1(b). Two mobile ALPR systems
functioning in Dubai and London are shown in Fig. 1.1(c) and 1.1(d) respectively.
Here, cameras are mounted on top of the cars.

1
Fig. 1.1: Stationary ALPR systems: (a) Orange county, USA, and (b) Use of CCTV
camera in ALPR system. Mobile ALPR system: (c) Dubai, and (d) England.

There are some difficulties that hinder the successful recognition of a license plate in
an ALPR system.

1. Poor image resolution: Poor image resolution may result if the target license plate is
very far away from the camera. The use of a low-resolution camera may also cause
this problem.

2. Blurry images: The snapped image might be blurred because of high vehicle speed.
A camera with a high shutter speed can be used to avoid this problem.

3. Poor lighting and low contrast: Sometimes it might be very difficult to distinguish
the license plate from the background due to overexposure, reflection or shadows.
Image contrast can be adjusted by grey-scale histogram analysis to overcome this
problem.

4. An object obscuring the plate, such as a tow bar, or dirt on the plate.

5. Sometimes different fonts are used on license plates. However, some countries do not
allow such variations in the fonts of the plates, which eliminates the problem.

6. Different circumvention techniques in front of the vehicle near the license plate are a
major difficulty in locating license plates.

7. The patterns of the license plates might differ from one country to another.

1.2 Applications of ALPR

Research works have listed many applications of ALPR, such as electronic payment
systems, freeway and arterial management for traffic surveillance, recovering stolen
cars, identifying cars with an open warrant for arrest, catching speeders by comparing
the average time it takes to get from one stationary camera to another, determining
which cars do or do not belong to a parking garage, and expediting parking by
eliminating the need for human confirmation of parking passes. As an
essential part of an Intelligent Transportation System (ITS), an ALPR system is capable
of providing many benefits. It can be used to bring flexibility and automation to toll
collection systems for highways, flyovers, and bridges. With an ALPR
system, tolls can be collected without manual intervention, which reduces time and traffic
congestion. Traffic pattern analysis during peak and off-peak hours can be done
effectively with an ALPR system, which is an important component of urban
planning. An ALPR system can be used by metropolitan police departments for effective
law enforcement, effective enforcement of traffic rules, and enhanced vehicle theft

prevention. It can also bring high efficiency to border control systems. Other
possible applications include building a comprehensive database of traffic movement,
automation and simplification of airport and harbor logistics, and security monitoring of
roads, checkpoints, etc. Another important application of an ALPR system is vehicle
surveillance. It can be used to prevent non-payment at gas stations, drive-in
restaurants, etc.

3
Fig. 1.2: Vehicle license plates from different countries: (a) USA, (b) Brazil, (c)
England, and (d) India

By integrating a License Plate Recognition software engine into parking management
systems, controlled and automatic vehicle entry and exit in car parks or secure zones
becomes possible. Furthermore, the ability to recognize registration numbers is a
significant added value for comprehensive parking solutions or inventory
management. A parking lot equipped with an ALPR system can provide many
advantages: 1) flexible and automatic vehicle entry to and exit from a car park, 2)
management information about car park usage, 3) improved security for both car park
operators and car park users, and 4) improved traffic flow during peak periods. Other
possible applications include: 1) vehicle recognition through date and time stamping
as well as exact location, 2) inventory management, and 3) a comprehensive database
of traffic movement. State border control is one of the most important applications of
automatic license plate recognition.

1.3 Bangladeshi License Plate

The license plates from different countries in Fig. 1.2 clearly show that they vary
greatly from one country to another. For this reason, it is not effective
to use an ALPR system developed for one country in another country without specific
modifications. According to our study, no successful ALPR system has been
developed so far for Bangladeshi license plates. There are some unique characteristics
of Bangladeshi license plates, which are written in the Bangla language.

Fig. 1.3: Some Bangla license plates with complex background.

Although most other languages have disjoint characters in a word, the
characters in a Bangla word are often joined at the top by a horizontal line called the
Matra. Moreover, some Bangla characters are comprised of two or more separate
regions. There might be some disjoint parts in a character either at the top or at the
bottom. These special characteristics of Bangla characters make them very difficult to
segment. Unlike in other countries, a lot of variation can also be seen among the
license plate patterns in Bangladesh. This is shown in Fig. 1.3. In Bangladesh,
government provides only the registration number to a vehicle and the vehicle owner
individually makes the license plates without following a standard template. As a
result, the pattern of license plate varies from one owner to another owner, which
makes it increasingly difficult to locate license plate regions in an input image. In
addition, a large number of Bangladeshi license plates have two rows. The first row
contains the registration area and its type and the second row contains the registration
number. There might be an extra row at the top or at the bottom of the license plate
containing some extra information. This extra information also makes it more
challenging to extract the registration information from the license plates. In
Bangladesh, license plates can be divided into five categories: private, public,
government, military, and foreign missions. Both private and public license plates have the
same format, which includes three major pieces of information: registration area, type,
and number. The registration area is mainly a set of Bangla

Fig. 1.4: Different types of Bangla license plate: (a) Private, (b) Public, (c)
Government, and (d) Military.

characters indicating the name of the district or city where the vehicle is registered.
The registration type is a single Bangla character expressing the type of vehicle, for
example, small car, motorcycle, van, etc. The registration number is comprised of six
Bangla digits, of which the first two are separated from the rest by a hyphen. The only
difference between private and public license plates is that the foreground and the
background of private license plates are white and black respectively, whereas in public
license plates it is the opposite. This license plate information is mainly written in two
rows, but private or public license plates with only one row containing all the
information can also be found.
Government, military, and foreign mission license plates have only one row. The
background of government-owned vehicles' plates is yellow and the foreground is black.
It contains a Bangla character indicating the registration type followed by a five-digit
registration number. License plates of military vehicles have a special arrow sign or
the abbreviation of the name of the military force at the beginning, which is followed
by a four-digit registration number. The license plate format of foreign missions is the same

as the government license plate format. Four different types of Bangla license plates
are shown in Fig. 1.4. The Bangladesh Road Transport Authority (BRTA) is the only
organization that issues vehicle registration numbers. In our project we work only on
Dhaka Metro "Ga" number plates.

1.4 Organization of This Report

The rest of the report is organized as follows:

 In Chapter 2, we review the related literature on licence plate detection and
recognition.
 In Chapter 3, we describe the algorithms and techniques that we use in our
proposed scheme for the ALPR system in Bangladesh.
 In Chapter 4, we train and test all the algorithms and evaluate them.
 In Chapter 5, we propose the scheme that we use for our final work.
 In Chapter 6, we put some concluding remarks and suggestions for future
work.

7
Chapter 2
Literature review

2.1 Detection
To detect the licence plate, Md. Mahmud Hasan applied a boundary-based algorithm. This
includes horizontal and vertical edge detection, the Canny edge detection
algorithm, the Sauvola image binarization algorithm, the Otsu image binarization algorithm,
symmetry detection, a moving-average filter algorithm, and extrema detection. Detection
accuracy: 92.8% [8].

Birmohan Singh applied several algorithms together to detect the licence plate, namely a
low-pass filter, binarization and candidate region filtering, a Bernsen thresholding
candidate region selection algorithm, morphological closing, and a candidate region
identification technique. Detection accuracy: 97.21% [3].

Md. Rakibul Haque applied the LSO (line segmentation and orientation) algorithm for
licence plate detection.

The algorithms the author includes are Sobel edge detection, highlighting the licence plate
region using image morphology, horizontal and vertical extraction, and the Otsu method.
Detection accuracy: 85.8% [4].
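The Otsu binarization step used in the pipeline above picks the grey-level threshold that maximizes the between-class variance of the image histogram. The following is a minimal from-scratch sketch of that step (real pipelines usually call a library routine; the synthetic test image here is purely illustrative):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for an 8-bit grayscale image by
    maximizing the between-class variance over all candidate thresholds."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # grey-level probabilities
    omega = np.cumsum(p)                   # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))     # cumulative mean up to t
    mu_t = mu[-1]                          # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)       # zero out empty-class cases
    return int(np.argmax(sigma_b))

# Synthetic bimodal image: dark background (~30), bright plate-like patch (~200)
rng = np.random.default_rng(1)
img = np.clip(rng.normal(30, 5, (100, 100)), 0, 255).astype(np.uint8)
img[40:60, 20:80] = np.clip(rng.normal(200, 5, (20, 60)), 0, 255).astype(np.uint8)

t = otsu_threshold(img)
binary = img > t   # True on the bright plate-like region
```

The resulting threshold falls between the two intensity modes, so the bright plate region separates cleanly from the background.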

Mohammad Jabber Hossain applied the Sobel edge operator with morphological dilation
and erosion to find the plate. Detection accuracy: 94% [1].

Sohaib Abdullah applied YOLOv3 (You Only Look Once version 3) to detect the licence
plate; it is a CNN-based object detection algorithm. YOLOv3 contains 53 layers, and it
gives 85% IoU [2].

The main problem with the boundary-based algorithms here is that the licence plate may
not have a distinguishable boundary if the colours of the vehicle and of the licence plate
background are the same.

In our project we apply the CNN-based object detection algorithm YOLOv3. The author of
[5] also used YOLOv3, but here we use the tiny version of YOLOv3, which contains only
15 layers.

8
2.2 Classification
Md. Mahmudul Hasan applied an artificial neural network technique for classification. To
classify digits, the author used a multilayer feed-forward back-propagation neural network
[1]; the structure of that network was:

Input layer => 64
Hidden layer 1 => 32
Hidden layer 2 => 16
Output layer => 10

Accuracy was 97.5%.
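The 64 => 32 => 16 => 10 digit classifier reviewed above can be sketched as a plain NumPy forward pass. This is a hedged illustration only: the activation functions and weight initialization below are assumptions, since the reviewed work does not specify them.

```python
import numpy as np

def relu(x):
    # Hidden-layer activation (assumed; the reviewed work does not specify one)
    return np.maximum(0.0, x)

def softmax(x):
    # Turn the 10 output scores into digit-class probabilities
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def make_digit_mlp(rng):
    # Layer sizes from the reviewed network: 64 -> 32 -> 16 -> 10
    sizes = [64, 32, 16, 10]
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # Feed-forward pass; softmax only on the final (output) layer
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        x = softmax(x) if i == len(params) - 1 else relu(x)
    return x

rng = np.random.default_rng(0)
net = make_digit_mlp(rng)
probs = forward(net, rng.random((5, 64)))   # five flattened 8x8 digit images
print(probs.shape)                          # (5, 10)
```

Training would use back-propagation as in the cited work; only the forward structure is shown here.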

To classify characters, the author used a feed-forward back-propagation neural network [8].
The structure of that network was:

Input layer => 64
Hidden layer 1 => 48
Output layer => 25

Accuracy was 88.7%.

Md. Rakibul Haque proposed a template matching technique for digit and character
classification, and achieved 84.87% overall accuracy [4].

Mohammad Jabber Hossain proposed a template matching technique for digit and character
classification, and achieved 95.74% overall accuracy [1].

Sohaib Abdullah used YOLOv3 for digit recognition, achieving 81% IoU; to classify
characters the author used ResNet20 and achieved 92.79% accuracy [2].

M M Shaifur Rahman proposed a CNN architecture to classify digits and characters;
the author built a 6-layer CNN architecture. The distribution of this architecture is 2
convolutional layers, which create 6 and 12 feature maps respectively, 2 max-pooling
layers, 1 fully connected layer with 300 neurons, and an output layer of 16 neurons. This
network achieves 88.67% accuracy (overall) [6].

9
After reviewing all of these, we propose a CNN-based architecture that is able to classify
digits with 99.8% accuracy. We also use ResNet50 to classify characters, achieving
99.08% accuracy.

2.3 Comparison
Existing algorithms                Our proposed algorithm

1. Boundary-based algorithms       YOLOv3, 15 layers (tiny version)
2. YOLOv3, 53 layers

10
Chapter 3
Algorithms and Techniques

3.1 Introduction
We try a number of existing techniques and algorithms in our project. For
licence plate detection we try three object detection algorithms: FRCNN, SSD and
YOLOV3. We also use several techniques in the classification process: we use a
multi-stage segmentation process, and for classification we try NN architectures as well
as traditional machine learning algorithms such as KNN.

3.2 Plate Detection


In this project, for plate detection we try three different object detection algorithms,
namely:

1. Faster RCNN (FRCNN)

2. Single Shot Detector (SSD)

3. You Only Look Once Version 3 (YOLOV3)

All of these algorithms are based on CNN. They use VGG16, VGG19, Resnet,
inception, or Darknet as their backbone CNN for classifying objects.

3.2.1 Faster RCNN algorithm


Faster R-CNN has two networks: a region proposal network (RPN) for generating region
proposals, and a network that uses these proposals to detect objects. The main difference
from Fast R-CNN is that the latter uses selective search to generate region
proposals. The time cost of generating region proposals is much smaller with an RPN than
with selective search, since the RPN shares most of its computation with the object
detection network. Briefly, the RPN ranks region boxes (called anchors) and proposes the
ones most likely to contain objects. The architecture is as follows.

Fig 3.1: Architecture of Faster RCNN

Anchors

Anchors play an important role in Faster R-CNN. An anchor is a box. In the default
configuration of Faster R-CNN, there are 9 anchors at each position of an image. The
following graph shows the 9 anchors at position (320, 320) of an image of size (600,
800).

12
Fig 3.2: Anchor Boxes

Let's look closer:

1. The three colors represent three scales or sizes: 128x128, 256x256, and 512x512.

2. Let's single out the red boxes/anchors. The three boxes have height-to-width ratios
of 1:1, 1:2 and 2:1 respectively.

If we choose one position at every stride of 16, there will be 1989 (39x51) positions.
This leads to 17901 (1989 x 9) boxes to consider. This sheer size is hardly smaller than
the combination of sliding window and pyramid; alternatively, we can reason that this is
why its coverage is as good as other state-of-the-art methods. The bright side here is that
we can use the region proposal network, as in Fast R-CNN, to significantly reduce the
number of boxes. These anchors work well for the Pascal VOC dataset as well as the
COCO dataset. However, we have the freedom to design different kinds of anchors/boxes.
For example, if we are designing a network to count passengers/pedestrians, we may not
need to consider the very short, very big, or square boxes. A neat set of anchors may
increase the speed as well as the accuracy.
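The 9 anchors and the 17901-box count described above can be reproduced with a short sketch. The area-preserving width/height formulas are the usual Faster R-CNN convention and are assumed here:

```python
import numpy as np

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Generate the 9 default anchors centred at (cx, cy).

    Each ratio r is a height:width ratio; height is scaled by sqrt(r) and
    width by 1/sqrt(r) so every box keeps the area of its scale, giving
    the 1:1, 1:2 and 2:1 boxes described above.
    """
    boxes = []
    for s in scales:
        for r in ratios:
            h, w = s * np.sqrt(r), s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

a = anchors_at(320, 320)          # the 9 anchors at position (320, 320)

# One position per stride of 16 on a 600x800 image gives 39x51 = 1989
# positions (figures from the text), hence 1989 x 9 candidate boxes.
positions = 39 * 51
total_anchors = positions * 9     # 17901
```

Each anchor is returned as (x1, y1, x2, y2); the first anchor (scale 128, ratio 1:1) has an area of exactly 128x128.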

Region Proposal Network

The output of a region proposal network (RPN) is a bunch of boxes/proposals that will
be examined by a classifier and a regressor to eventually check for the occurrence of
objects. To be more precise, the RPN predicts the probability of an anchor being
background or foreground, and refines the anchors.

The Classifier of Background and Foreground

The first step of training a classifier is to make a training dataset. The training data is the
anchors we get from the above process and the ground-truth boxes. The problem we
need to solve here is how we use the ground-truth boxes to label the anchors. The basic
idea is that we want to label the anchors having higher overlaps with ground-truth boxes
as foreground, and the ones with lower overlaps as background. Apparently, it
needs some tweaks and compromises to separate foreground and background. Now we
have labels for the anchors. The second question is what the features of the anchors
are.

Let's say the 600x800 image shrinks 16 times to a 39x51 feature map after applying the
CNNs. Every position in the feature map has 9 anchors, and every anchor has two
possible labels (background, foreground). If we make the depth of the feature map
18 (9 anchors x 2 labels), we will make every anchor have a vector with two values
(normally called a logit) representing foreground and background. If we feed the logit into
a softmax/logistic regression activation function, it will predict the labels. Now the
training data is complete with features and labels. Another thing we may pay attention
to is the receptive field, if we want to re-use a trained network as the CNNs in the process.
Make sure the receptive field of every position on the feature map covers all the
anchors it represents. Otherwise the feature vectors of the anchors won't have enough
information to make predictions. The OverFeat architecture only uses non-overlapping
convolutional and pooling filters, to make sure every position in the feature
map covers its own receptive field without overlapping others. In Faster R-CNN,
the receptive fields of different anchors often overlap each other, as we can see from the
above graph. This leaves the RPN position-aware.
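The foreground/background labelling described above can be sketched with an intersection-over-union (IoU) computation. The 0.7/0.3 thresholds are the common Faster R-CNN choice and are an assumption here, since the text does not give exact values:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, fg_thresh=0.7, bg_thresh=0.3):
    """1 = foreground, 0 = background, -1 = ignored during training."""
    labels = np.full(len(anchors), -1)
    for i, a in enumerate(anchors):
        best = max(iou(a, g) for g in gt_boxes)
        if best >= fg_thresh:
            labels[i] = 1        # high overlap with a ground-truth box
        elif best < bg_thresh:
            labels[i] = 0        # low overlap -> background
    return labels

gt = [(100, 100, 200, 200)]
anchors = [(105, 105, 205, 205),   # high overlap    -> foreground
           (400, 400, 500, 500),   # no overlap      -> background
           (120, 120, 220, 220)]   # partial overlap -> ignored
print(label_anchors(anchors, gt))  # [ 1  0 -1]
```

Anchors in the in-between band are ignored, which is one of the "tweaks and compromises" mentioned above.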

The Regressor of Bounding Box

If we follow the process of labelling anchors, we can also pick out the anchors, based
on similar criteria, for the regressor to refine. One point here is that anchors labelled
as background shouldn't be included in the regression, as we don't have ground-truth
boxes for them. The depth of the feature map is 36 (9 anchors x 4 positions).

The paper uses a smooth-L1 loss on the position (x, y) of the top-left of the box, and on
the logarithm of the heights and widths, the same as in Fast R-CNN.

The overall loss of the RPN is a combination of the classification loss and the
regression loss.

ROI Pooling

After the RPN, we get proposed regions of different sizes. Different-sized regions mean different-sized CNN feature maps, and it's not easy to build an efficient structure that works on features of varying sizes. Region of Interest (ROI) Pooling simplifies the problem by reducing the feature maps to the same size. Unlike max-pooling, which has a fixed window size, ROI Pooling splits the input feature map into a fixed number (let's say k) of roughly equal regions and then applies max-pooling to every region, so the output of ROI Pooling always has size k regardless of the input size.
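A rough NumPy sketch of the idea, assuming (as an interpretation of the text) that k means a k x k output grid:

```python
import numpy as np

def roi_pool(feature_map, k):
    """Split a 2-D feature map into a k x k grid of roughly equal regions
    and max-pool each region, so the output is always k x k regardless of
    the input size."""
    h, w = feature_map.shape
    # np.array_split gives roughly equal regions even when k does not
    # divide the size evenly
    rows = np.array_split(np.arange(h), k)
    cols = np.array_split(np.arange(w), k)
    out = np.empty((k, k))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            out[i, j] = feature_map[np.ix_(r, c)].max()
    return out

a = np.arange(35.0).reshape(5, 7)   # a 5x7 proposed region
b = np.arange(54.0).reshape(6, 9)   # a different-sized region
# both regions pool down to the same fixed output size
assert roi_pool(a, 2).shape == roi_pool(b, 2).shape == (2, 2)
```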

3.2.2 SSD algorithm

SSD is a popular algorithm in object detection, and it's generally faster than Faster R-CNN. A typical CNN network gradually shrinks the feature map size and increases the depth as it goes to the deeper layers. The deep layers cover larger receptive fields and construct more abstract representations, while the shallow layers cover smaller receptive fields. By utilizing this information, we can use shallow layers to predict small objects and deeper layers to predict big objects, as small objects don't need bigger receptive fields, and bigger receptive fields can be confusing for small objects. The following chart shows the architecture of SSD using VGG net as the base network. The middle column shows the feature map sets the network generates from different layers. For example, the first feature map set is generated from VGG net layer 23 and has a size of 38x38 and a depth of 512. Every point in the 38x38 feature map covers a part of the image, and the 512 channels serve as the features for that point. Using the features in the 512 channels, we can do image classification to predict the label and regression to predict the bounding box for small objects at every point. The second feature map set has a size of 19x19, which can be used for slightly larger objects, as its points cover bigger receptive fields. Down at the last layer, there is only one point in the feature map set, which is ideal for big objects. For the Pascal VOC dataset there are 21 classes (20 objects + 1 background), and we notice there are 4x21 outputs for every feature point in the classification results. The number 4 comes from the fact that we predict 4 boxes with different shapes for every point, a common trick used in YOLO and Faster R-CNN. In SSD, the multiple boxes for every feature point are called priors, while in Faster R-CNN they are called anchors. For every prior, we predict a single bounding box shared by all the classes, so there are only 4 box values per prior. Beware that this is different from Faster R-CNN; it may lead to worse bounding box predictions due to the confusion among different classes.
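As a sanity check, the head sizes described above can be computed directly. This is a sketch of the arithmetic, not code from the report:

```python
# SSD head sizes for the Pascal VOC setting described in the text.
num_classes = 21                                  # 20 objects + 1 background
num_priors = 4                                    # boxes predicted per feature-map point

cls_outputs_per_point = num_priors * num_classes  # 4 x 21 = 84 class scores
loc_outputs_per_point = num_priors * 4            # 4 box values per prior, shared by classes

assert cls_outputs_per_point == 84
assert loc_outputs_per_point == 16
# the whole 38x38 feature map set therefore emits 38*38*4 priors
assert 38 * 38 * num_priors == 5776
```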

Fig 3.3: Architecture of SSD

3.2.3 YOLOV3 algorithm


 YOLO stands for "You Only Look Once"
 It's a custom object-detector network
 Originally based on Darknet, a neural network library written in C and CUDA

Features of this algorithm:

i. Detection happens at 3 stages (or scales).
ii. Each grid cell makes 3 predictions, using 3 anchor boxes.
iii. So there are 9 predefined anchor boxes in total, chosen using K-means clustering.
iv. A cell is selected if the center of an object falls in the receptive field of that cell.

Output of the network:

Equation: (#predictions per cell) * (5 + number of classes to be detected)
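A quick evaluation of this formula. The 80-class case matches the COCO-trained YOLOv3; the 1-class case is our assumption for a plate-only detector:

```python
def yolo_output_depth(preds_per_cell, num_classes):
    # each prediction carries 4 box coordinates + 1 objectness score,
    # plus one confidence per class
    return preds_per_cell * (5 + num_classes)

# COCO-trained YOLOv3: 3 predictions per cell, 80 classes
assert yolo_output_depth(3, 80) == 255
# a single-class licence plate detector (assumption: 1 class)
assert yolo_output_depth(3, 1) == 18
```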

Fig 3.4: working technique of yolov3

Next steps:

 Make predictions using the output
 Concatenate the outputs at various scales
 Use Non-Max Suppression to get rid of multiple detections of the same object
 Draw the remaining bounding boxes

Fig 3.5: Non-Max Suppression
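The suppression step listed above can be sketched as follows. The IoU threshold of 0.5 and the (x1, y1, x2, y2) box format are assumptions for illustration, not values from the report:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop overlapping ones, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

# two overlapping detections of the same plate, plus one distinct box
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
assert nms(boxes, scores) == [0, 2]   # the weaker duplicate is suppressed
```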

The YOLOv3 architecture contains residual blocks, detection layers and upsampling layers.

Fig 3.6: Architecture of yolov3

3.3 Characters and Digits classification algorithms

Classification of characters and digits is the most important part of this project. We try several algorithms for classifying characters and digits, then evaluate them and choose the best one.

The algorithms we try:

 Classical ML algorithm
o Logistic Regression
o Support Vector Machine
o Naïve Bayes
o K-nearest neighbor
o Random Forest
 CNN based architecture
o VGG16
o VGG19
o ResNet50
 Proposed CNN Model

3.3.1 Classical ML algorithms

In this section we discuss five classical ML classification algorithms.

3.3.1.1 Logistic Regression

Logistic Regression is a classification algorithm, not a regression algorithm. It estimates discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variables. Simply put, it predicts the probability of occurrence of an event by fitting the data to a logit function; hence it is also known as logit regression. The values obtained always lie between 0 and 1, since it predicts a probability.

odds = p / (1 - p) = probability of event occurrence / probability of event non-occurrence
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk

In the equation above, p is the probability of the presence of the characteristic of interest. The method chooses parameters that maximize the likelihood of observing the sample values, rather than minimizing the sum of squared errors (as in ordinary regression).
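The relationship between the linear combination and the probability can be sketched directly; the coefficients below are invented for illustration:

```python
import math

def sigmoid(z):
    """Inverse of the logit: maps log-odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

# hypothetical coefficients b0, b1 and a single feature x
b0, b1, x = -1.0, 2.0, 0.75
z = b0 + b1 * x          # the linear combination gives the log-odds
p = sigmoid(z)           # ...and the sigmoid turns it into a probability

assert 0.0 < p < 1.0
assert abs(logit(p) - z) < 1e-9   # logit and sigmoid are inverses
```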

Fig 3.7: plotting graph of logistic regression

3.3.1.2 Support Vector Machine

In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features we have), with the value of each feature being the value of a particular coordinate.

For example, if we only had two features, like the height and hair length of an individual, we'd first plot these two variables in two-dimensional space, where each point has two coordinates (these coordinates are known as support vectors).

Fig 3.8: Hair data plotting (1)

Now, we find a line that splits the data between the two differently classified groups. This is the line for which the distance from the closest point in each of the two groups is as large as possible.

Fig 3.9: Hair data plotting (2)

In the example shown above, the blue line splits the data into two differently classified groups, since the two closest points are the farthest from it. This line is our classifier. Then, depending on which side of the line a test point lands, that is the class we assign to the new data.

3.3.1.3 Naïve Bayes

This is a classification technique based on Bayes' theorem, with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered to be an apple if it is red, round, and about 3
inches in diameter. Even if these features depend on each other or upon the existence
of the other features, a Naive Bayes Classifier would consider all of these properties
to independently contribute to the probability that this fruit is an apple.

A Naive Bayes model is simple to build and particularly useful for enormous data sets. Along with simplicity, Naive Bayes is known to outperform even sophisticated classification methods.

Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):

P(c|x) = P(x|c) * P(c) / P(x)

Here,

 P(c|x) is the posterior probability of the class (target) given the predictor (attribute).
 P(c) is the prior probability of the class.
 P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
 P(x) is the prior probability of the predictor.

Example: Let's work through an example to understand this better. Here we have a training data set of weather conditions (sunny, overcast and rainy) and a corresponding binary variable 'Play'. We need to classify whether players will play or not based on the weather condition. Let's follow the steps below.

Step 1: Convert the data set to a frequency table

Step 2: Create a likelihood table by finding the probabilities, e.g. the probability of overcast is 0.29 and the probability of playing is 0.64.

Table 3.1: Frequency table of weather data

Step 3: Now, use the Naive Bayesian equation to calculate the posterior probability
for each class. The class with the highest posterior probability is the outcome of
prediction.
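The three steps collapse into one application of Bayes' theorem. The counts below are the standard 14-day toy weather data set, assumed here since the report's table is not reproduced in the text:

```python
# P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_sunny_given_yes = 3 / 9     # of the 9 "Play = yes" days, 3 were sunny
p_yes = 9 / 14                # prior probability of playing
p_sunny = 5 / 14              # prior probability of a sunny day

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
assert abs(p_yes_given_sunny - 0.6) < 1e-9   # 0.6 > 0.5, so predict "play"
```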

3.3.1.4 KNN (K-Nearest Neighbors)

K-nearest neighbors is a simple algorithm used for both classification and regression problems. It stores all available cases and classifies new cases by a majority vote of their k neighbors. The case is assigned to the class most common amongst its K nearest neighbors, measured by a distance function (Euclidean, Manhattan, Minkowski or Hamming).

While the first three distance functions are used for continuous variables, the Hamming distance is used for categorical variables. If K = 1, the case is simply assigned to the class of its nearest neighbor. At times, choosing K turns out to be a challenge when performing KNN modeling.
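A minimal sketch of the majority-vote rule with Euclidean distance; the toy data below is invented for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest (Euclidean) neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored case
    nearest = np.argsort(dists)[:k]               # indices of the k closest cases
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy 2-D data: class 'a' clustered near the origin, class 'b' near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], float)
y = ['a', 'a', 'a', 'b', 'b', 'b']
assert knn_predict(X, y, np.array([0.5, 0.5])) == 'a'
assert knn_predict(X, y, np.array([5.5, 5.5])) == 'b'
```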

Fig 3.10: Data plotting in KNN

3.3.1.5 Random Forest


Random forests (random decision forests) are an ensemble learning method for classification, regression and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.
The first algorithm for random decision forests was created by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg. An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered "Random Forests" as a trademark (as of 2019, owned by Minitab, Inc.). The extension combines Breiman's "bagging" idea with random selection of features, introduced first by Ho and later independently by Amit and Geman, in order to construct a collection of decision trees with controlled variance.

Fig 3.11: Bunch of Decision trees

3.3.2 CNN-Based Architectures

In this section we discuss the CNN-based architectures VGG16, VGG19 and ResNet50.

3.3.2.1 VGG16

VGG16 is a convolutional neural network (CNN) architecture that was used to win the ILSVRC (ImageNet) competition in 2014. It is considered one of the excellent vision model architectures to date. The most unique thing about VGG16 is that instead of having a large number of hyper-parameters, it focuses on convolution layers with 3x3 filters and stride 1 (always with the same padding) and max-pool layers with 2x2 filters and stride 2. It follows this arrangement of convolution and max-pool layers consistently throughout the whole architecture. At the end it has 2 FC (fully connected) layers followed by a softmax for the output. The 16 in VGG16 refers to its 16 weight layers. It is a pretty large network, with about 138 million parameters.

Fig 3.12: VGG16 architecture

3.3.2.2 VGG19

VGG-19 is a convolutional neural network that is trained on more than a million images from the ImageNet database [1]. The network is 19 layers deep and can classify images into 1000 object categories, such as keyboard, mouse, pencil and many animals. As a result, the network has learned rich feature representations for a wide range of images.

Fig 3.13: VGG19 architecture

3.3.2.3 ResNet50

ResNet-50 is a smaller version of ResNet-152 and is frequently used as a starting point for transfer learning. However, increasing network depth does not work by simply stacking layers together. Deep networks are hard to train because of the notorious vanishing gradient problem: as the gradient is back-propagated to earlier layers, repeated multiplication may make it extremely small. As a result, as the network goes deeper, its performance saturates or even starts degrading rapidly.

Skip Connection — The Strength of ResNet

ResNet first introduced the concept of the skip connection, illustrated in the diagram below. The figure on the left stacks convolution layers one after the other. On the right we still stack convolution layers as before, but we now also add the original input to the output of the convolution block. This is called a skip connection.
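A minimal sketch of the idea, with a stand-in function in place of the convolution stack:

```python
import numpy as np

def residual_block(x, f):
    """Output of a block with a skip connection: F(x) + x.
    f stands in for the convolution stack and must preserve the shape."""
    return f(x) + x

x = np.ones((4, 4))
out = residual_block(x, lambda t: 0.1 * t)   # hypothetical conv stack
assert out.shape == x.shape
# even if the conv stack outputs zeros, the identity still flows through,
# which is what keeps gradients alive in very deep networks
assert np.allclose(residual_block(x, lambda t: 0 * t), x)
```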

Fig 3.14: Architecture of Resnet50

3.3.3 Proposed CNN Model

Advantage of using CNN for image classification problems:

 The usage of CNNs is motivated by the fact that they can capture and learn relevant features from an image or video at different levels, similar to a human brain. This is feature learning! Conventional neural networks cannot do this.

 Another main feature of CNNs is weight sharing. Let's take an example to explain this. Say we have a one-layer CNN with 10 filters of size 5x5. We can simply calculate the parameters of such a CNN: it would be 5*5*10 weights and 10 biases, i.e. 5*5*10 + 10 = 260 parameters.
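The count above can be checked directly (assuming a single input channel, as the example implies):

```python
# one conv layer, 10 filters of 5x5, single input channel assumed;
# each filter shares its weights across the whole image
filters, kh, kw, in_channels = 10, 5, 5, 1

weights = filters * kh * kw * in_channels   # 5*5*10 = 250
biases = filters                            # one bias per filter
assert weights + biases == 260
```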

 In terms of performance, CNNs outperform plain NNs and other classical ML algorithms on conventional image recognition tasks and many other tasks. Look at the Inception model, ResNet50 and many others, for instance.

 High statistical efficiency (needs few labels to learn reliably)

 High computational efficiency (needs fewer operations to be able to learn)

Model architecture

Fig 3.15: Architecture of Proposed Model

Model Summary:

Fig 3.16: Summary of Proposed Model

3.4 Segmentation
Segmentation plays a vital role in classification. The characters and digits inside the licence plate are segmented; both binary and grayscale image-processing techniques are used to segment the characters.

We perform segmentation in two steps:

1. Binarization
2. Trim and segmentation

3.4.1 Binarization
In the binarization stage we convert the plate to a grayscale image and then apply a thresholding operation: if a pixel of the grayscale image has a value higher than a certain threshold, we convert that pixel to white; otherwise we convert it to black.
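The rule above, sketched with NumPy (OpenCV's `cv2.threshold` performs the same operation); the threshold value 127 is an assumption, not the value used in the project:

```python
import numpy as np

def binarize(gray, thresh=127):
    """Pixels above the threshold become white (255), the rest black (0)."""
    return np.where(gray > thresh, 255, 0).astype(np.uint8)

gray = np.array([[10, 200],
                 [127, 128]], dtype=np.uint8)
# 10 and 127 fall at or below the threshold -> black; 200 and 128 -> white
assert binarize(gray).tolist() == [[0, 255], [0, 255]]
```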

Fig 3.17: Binarization operation

3.4.2 Trim and Segmentation


In this stage we trim the border of the licence plate; after trimming we get the final plate.

Fig 3.18: (a)Trim the border ; (b)Final plate

After getting the final plate, we segment it into two rows: the first row contains all the characters and the second row contains all the digits.

Fig 3.19: (a)Row1 (all characters) ; (b)Row2 (all digits)

Then we apply another segmentation operation to separate all the digits and the characters (dha and go) from these rows.

Fig 3.20: Segmentation of characters and digits

After segmenting all of these, we send them to the classifier.

Chapter 4
Train & Test Review

4.1 Introduction
In this chapter we present the training and testing results of all the algorithms. Our training platform is Google Colab: training deep learning models requires a very high-performance computer, and our local PC cannot handle this kind of heavy computation, which is why we chose Google Colab. To train and test, we split our data in an 85:15 ratio: 85% for training and 15% for testing.
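The 85:15 split can be sketched with the standard library (scikit-learn's `train_test_split` does the same job); the seed is arbitrary:

```python
import random

def split(samples, test_ratio=0.15, seed=42):
    """Shuffle the samples and hold out test_ratio of them for testing."""
    data = samples[:]
    random.Random(seed).shuffle(data)   # deterministic shuffle for repeatability
    n_test = round(len(data) * test_ratio)
    return data[n_test:], data[:n_test]  # (train, test)

train, test = split(list(range(1000)))
assert len(train) == 850 and len(test) == 150
assert sorted(train + test) == list(range(1000))   # no sample lost or duplicated
```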

4.2 Data Frequency

Fig 4.1: Data frequency of digits

Fig 4.2: Data frequency of characters (1)

Fig 4.3: Data frequency of characters (2)

4.3 License Plate Detection algorithms training & testing

To detect licence plates we try three object detection algorithms:

1. Faster RCNN

2. Single shot detector

3. You only look once version 3

4.3.1 FRCNN Train and Test

Training

 Total images for training: 120
 Input resolution: 1000x600 (RGB)
 Training steps: 60,000
 Loss: 0.0126
 Total training time: 6 hours

Testing

 We test 100 images and it detects the plate in all of them
 On some far-distance images, however, it is not able to detect every plate

Accuracy and frame rate

According to FRCNN paper:

 mAP: 73.2
 FPS: 7
 Dataset: Pascal VOC 2007 & 2012

Training platform: Google Colab

 So it is clear that this algorithm is not good for real-time object detection

4.3.2 SSD Train and Test


Training

 Total images for training: 130
 Input resolution: 300x300 (RGB)
 Training steps: 52,000
 Loss: 0.0255
 Total training time: 5.3 hours

Testing

 We test 100 images and it detects the plate in all of them
 On some far-distance images, however, it is not able to detect every plate

Accuracy and frame rate

According to SSD paper:

 mAP: 74.3
 FPS: 46
 Dataset: Pascal VOC 2007 & 2012

Training platform: Google Colab

 So it is clear that this algorithm is well balanced for real-time object detection

4.3.3 YOLOV3 Train and Test

Training

 Total images for training: 350
 Input resolution: 416x416 (RGB)
 Training steps: 4,000
 Loss: 0.09
 Total training time: 6 hours

Testing

 We test 100 images and it detects the plate in all of them
 On some far-distance images, however, it is not able to detect every plate

Accuracy and frame rate

According to YOLOV3 paper:

 mAP: 76.8
 FPS: 67
 Dataset: Pascal VOC 2007 & 2012

Training platform: Google Colab

 So it is clear that this algorithm is very good for real-time object detection

4.4 Classification Algorithms Evaluation


For classification we try two types of algorithms: classical ML algorithms and CNN-based architectures. In this section we evaluate all of them.

4.4.1 Logistic Regression

Digits

Min accuracy: 63.8
Max accuracy: 87.5
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

64.9 81.0 87.1 79.9 74.9 64.2 64.6 87.5 76.5 63.8

Table 4.1: Accuracy table of logistic regression (digits)

Characters

Min accuracy: 38.0
Max accuracy: 95.0
Testing data: 15376

ক খ গ ঘ চ জ ঢ ল ম ভ

94.1 64.7 95.0 38.0 47.3 50.8 94.3 63.4 92.8 45.9

Table 4.2: Accuracy table of logistic regression (Characters)

4.4.2 Support Vector machine

Digits

Min accuracy: 92.5
Max accuracy: 98.4
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

95.6 97.9 98.4 94.4 96.6 93.0 93.9 97.7 94.2 92.5

Table 4.3: Accuracy table of SVM (digits)

Characters

Min accuracy: 66.5
Max accuracy: 98.1
Testing data: 15376

ক খ গ ঘ চ জ ঢ ল ম ভ

98.1 80.5 98.0 66.5 77.9 73.2 97.4 80.5 96.3 78.3

Table 4.4: Accuracy table of SVM (characters)

4.4.3 Naïve Bayes

Digits

Min accuracy: 40.6
Max accuracy: 71.1
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

40.6 63.3 59.2 57.2 45.5 44.8 43.2 71.1 62.8 41.5

Table 4.5: Accuracy table of Naïve bayes (digits)

Characters

Min accuracy: 37.3
Max accuracy: 87.6
Testing data: 15376

ক খ গ ঘ চ জ ঢ ল ম ভ

86.1 72.0 81.8 37.3 64.1 40.5 75.5 48.8 87.6 42.8

Table 4.6: Accuracy table of Naïve bayes (characters)

4.4.4 K-nearest neighbor

Digits

Min accuracy: 97.9
Max accuracy: 99.8
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

99.4 98.7 99.7 99.7 99.5 98.3 97.9 99.8 99.4 98.0

Table 4.7: Accuracy table of KNN (digits)

Characters

Min accuracy: 3.9
Max accuracy: 97.5
Testing data: 15376

ক খ গ ঘ চ জ ঢ ল ম ভ

96.1 31.9 97.5 24.5 41.7 3.9 97.0 33.6 93.9 27.1

Table 4.8: Accuracy table of KNN (characters)

4.4.5 Random Forest

Digits

Min accuracy: 98.2
Max accuracy: 99.8
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

99.7 99.6 99.3 99.8 99.7 99.0 99.0 99.6 99.4 98.2

Table 4.9: Accuracy table of Random forest (digits)

Characters

Min accuracy: 53.0
Max accuracy: 98.1
Testing data: 15376

ক খ গ ঘ চ জ ঢ ল ম ভ

98.1 78.3 97.2 53.0 81.5 68.3 97.9 76.3 95.5 73.5

Table 4.10: Accuracy table of Random forest (characters)

4.4.6 VGG16

Training vs validation accuracy and loss curve: [digit classification]

Training vs validation accuracy and loss curve: [characters classification]

VGG16 evaluation:

Digits

Min accuracy: 0.0
Max accuracy: 96.1
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

96.1 22.2 12.8 19.1 7.6 0.9 4.4 1.9 0.0 3.9

Table 4.11: Accuracy table of VGG16 (digits)

Characters

Min accuracy: 19.6
Max accuracy: 88.9
Testing data: 10148

ক খ গ ঘ চ জ ঢ ল ম ভ

88.9 36.4 76.8 19.6 45.0 39.6 77.6 33.8 82.8 46.7

Table 4.12: Accuracy table of VGG16 (characters)

4.4.7 VGG19
Training vs validation accuracy and loss curve: [digit classification]

Training vs validation accuracy and loss curve: [characters classification]

VGG19 evaluation:

Digits

Min accuracy: 1.5
Max accuracy: 95.5
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

95.5 30.5 25.2 3.6 4.7 14.8 5.4 6.7 1.5 2.0

Table 4.13: Accuracy table of VGG19 (digits)

Characters

Min accuracy: 11.6
Max accuracy: 89.5
Testing data: 10148

ক খ গ ঘ চ জ ঢ ল ম ভ

89.5 28.7 84.0 11.6 32.3 45.2 66.8 32.3 63.1 59.8

Table 4.14: Accuracy table of VGG19 (characters)

4.4.8 ResNet50
Training vs validation accuracy and loss curve: [digits classification]

Training vs validation accuracy and loss curve: [characters classification]

ResNet50 evaluation:

Digits

Min accuracy: 99.8
Max accuracy: 100.0
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

100 99.8 99.8 100 99.8 100 99.8 99.8 100 100

Table 4.15: Accuracy table of Resnet50 (digits)

Characters

Min accuracy: 93.6
Max accuracy: 99.7
Testing data: 15376

ক খ গ ঘ চ জ ঢ ল ম ভ

99.7 95.1 99.5 93.6 96.7 94.3 99.6 99.2 98.7 98.9

Table 4.16: Accuracy table of Resnet50 (characters)

4.4.9 Proposed Model
Training vs validation accuracy and loss curve: [digits classification]

Training vs validation accuracy and loss curve: [characters classification]

Proposed Model evaluation:

Digits

Min accuracy: 99.5
Max accuracy: 100.0
Testing data: 7174

0 1 2 3 4 5 6 7 8 9

99.8 99.6 100 99.7 99.5 100 99.8 99.8 100 99.8

Table 4.17: Accuracy table of proposed model (digits)

Characters

Min accuracy: 86.9
Max accuracy: 99.6
Testing data: 15376

ক খ গ ঘ চ জ ঢ ল ম ভ

99.6 88.7 99.5 86.9 87.9 91.6 99.5 97.4 98.8 91.2

Table 4.18: Accuracy table of proposed model (characters)

Chapter 5
Proposed Scheme

5.1 Introduction

In this chapter we compare all the algorithms and give a final scheme, which we then implement in our project. In the finalization section we show how we synchronize the CNN-based object detection algorithm and our own classification CNN to get the final output.

5.2 Detection of the license plate region

To detect licence plates we try different types of object detection algorithms:

 FRCNN
 SSD
 YOLOV3

All of these algorithms are based on CNNs. They use different CNN architectures as the backbone for classification: FRCNN uses VGG16, VGG19 or AlexNet; SSD uses ResNet-, AlexNet- or Inception-style architectures; and YOLOV3 uses a Darknet-style architecture.

5.2.1 Comparison

As discussed previously, different models use different CNN architectures, which makes it very difficult to compare them on the same footing.

Accuracy comparison

mAP measure on pascalVOC dataset

Fig 5.1: Accuracy comparison of FRCNN,SSD,YOLOV3(pascalVOC dataset)

mAP measure on COCO dataset

Fig 5.2: Accuracy comparison of FRCNN,SSD,YOLOV3(COCO dataset)

Frame rate comparison

Fig 5.3: Frame rate comparison of FRCNN,SSD,YOLOV3

5.3 Characters and Digits Classification

To classify characters and digits we try different classical ML algorithms and CNN-based architectures, and we also propose our own CNN model.

5.3.1 Comparison

Here we compare all the ML algorithms and choose the best one; then we compare all the CNN-based algorithms and choose the best one; then we compare the two winners; finally, we compare the overall winner with our proposed model and choose the final scheme.

All ML algorithm comparison:

Fig 5.4: All ML algorithms comparison

Here Random Forest scores the highest for digit classification, and SVM scores the highest for character classification.

CNN based algorithm comparison:

Fig 5.5: CNN based algorithms comparison

Here ResNet50 scores the highest for both digit and character classification.

ML vs CNN based ResNet50 Comparison:

Fig 5.6: ML vs CNN based ResNet50 comparison

Here ResNet50 achieves the highest accuracy.

ResNet50 vs our proposed CNN model Comparison:

Fig 5.7: Proposed model vs ResNet50 comparison

Here we compare our proposed model and ResNet50 for classifying digits and characters. First we look at the accuracy differences, then we decide the final scheme.

Digits: diff = 99.9 - 99.8 = 0.1
Characters: diff = 99.0 - 98.3 = 0.7

The accuracy difference is 0.1 for digits and 0.7 for characters. For characters the difference is large, so we choose ResNet50 for character classification. For digits the difference is only 0.1, so we analyse other factors to make the final decision for digit classification.

Parameters: Proposed Model 1,199,882; ResNet50 24,115,850

Here our proposed model has about 20 times fewer parameters than ResNet50, so it is clear that our model consumes far less memory.

Execution time: Proposed Model 0.00004 s; ResNet50 0.004 s

Here our proposed model is 100 times faster than ResNet50.

So by comparing these factors we can take our proposed model to classify digits.

5.4 Finalization
In the previous sections we compared all the algorithms, and now it is time to make a final decision.

By comparing all the object detection algorithms we choose YOLOV3, because its accuracy is good enough for detecting licence plates and it is also real-time capable.

By comparing our proposed CNN model with the other classification models, we choose our proposed CNN model for digit classification and ResNet50 for character classification.

Final scheme:

 Licence plate detection – YOLOV3


 Segmentation technique
 Classify digits – our proposed CNN model
 Classify characters – ResNet50

Now we draw a diagram of the entire scenario; it shows how the 3-stage process works with our final scheme.

First stage – licence plate detection (algorithm: YOLOv3)

Second stage – segmentation: binarization → trim → final plate → two-row segmentation (1st row: characters, 2nd row: digits) → character and digit segmentation

Third stage – classify characters and digits (our proposed CNN model for digits, ResNet50 for characters)

 When we classify dha, we expand it to "Dhaka Metro"

Fig 5.8: Synchronization of entire work

Chapter 6
Conclusion

6.1 Summary
In this project, we present a set of algorithms for detecting and recognizing license plates written in Bangla, using three stages of processing. In each stage we try techniques and algorithms suitable for recognizing Bangla license plates. First, we detect the licence plate of the vehicle in the image using object detection algorithms, compare them and choose the best one. Then we apply a segmentation technique to separate the characters and digits from the licence plate. We then propose a convolutional neural network to classify the digits, and use the CNN-based architecture ResNet50 to classify the characters. Finally, we join both parts together: on 60 images containing vehicles, our final scheme successfully detected and recognized the licence plates, with 98% accuracy on digits and 86% accuracy on characters.

6.2 Suggestions for future work


In our project, licence plate detection happens with almost 100% accuracy. The main problem we face is in the classification stage: we classify all digits with very high accuracy, but characters produce many misclassifications. We assume this is due to the limited amount and variation of our data, so in the future we will collect more data and evaluate our work further.

References
[1] Mohammad Jaber Hossain, Md. Hasan Uzzaman, A. F. M. Saifuddin Saif, "Bangla Digital Number Plate Recognition using Template Matching", International Journal of Computer Applications (0975-8887), Vol. 181, No. 29, November 2018.

[2] Sohaib Abdullah, "YOLO-based three-stage network for Bangla license plate recognition in Dhaka metropolitan city", in Proc. International Conference on Bangla Speech and Language Processing (ICBSLP), 21-22 September 2018.

[3] Birmohan Singh, "Automatic number plate recognition system by character position method", Int. J. Computational Vision and Robotics, Vol. 6, Nos. 1/2, 2016.

[4] Md. Rakibul Haque, "Line segmentation and orientation algorithm for automatic Bengali license plate localization and recognition", International Journal of

[5] J. Schmidhuber, "Deep learning in neural networks: An overview", Neural Networks 61 (2015): 85-117.

[6] M. M. A. Joarder, K. Mahmud, T. Ahmed, M. Kawser, and B. Ahamed, "Bangla automatic number plate recognition system using artificial neural network", Asian Trans. on Science & Technology, Vol. 02, No. 01, March 2012.

[7] Md. Mahmudul Hasan, "Real Time Detection and Recognition of Vehicle License Plate in Bangla", 2011.

[8] M. Alom, P. Sidike, T. Taha and V. Asari, "Handwritten Bangla Digit Recognition Using Deep Learning".
