
SMART PARKING SYSTEM USING YOLOv3 DEEP LEARNING MODEL

MAJOR PROJECT REPORT

Submitted in partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY

in

ELECTRONICS & COMMUNICATION ENGINEERING


by

Deepanshu Sadhwani (En. No: 41451202817)
Zarqua Neyaz (En. No: 41651202817)
Ashutosh Mishra (En. No: 35151202817)
Jyotir Aditya Kalra (En. No: 35351202817)

Guided by

Dr. Narina Thakur, Dean R&D
Mr. Sourabh Rana, Assistant Professor, ECE

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING


BHARATI VIDYAPEETH’S COLLEGE OF ENGINEERING
(AFFILIATED TO GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY, DELHI)
NEW DELHI – 110063
JUNE 2021
CANDIDATE’S DECLARATION

It is hereby certified that the work presented in the B. Tech Major Project Report entitled
"SMART PARKING SYSTEM USING YOLOv3 DEEP LEARNING MODEL", in partial
fulfilment of the requirements for the award of the degree of Bachelor of Technology and submitted
in the Department of Electronics & Communication Engineering of BHARATI VIDYAPEETH’S
COLLEGE OF ENGINEERING, New Delhi (Affiliated to Guru Gobind Singh Indraprastha
University, Delhi), is an authentic record of our own work carried out from MARCH
2021 to JUNE 2021 under the guidance of Dr. Narina Thakur, Dean R&D, and Mr. Sourabh Rana,
Assistant Professor, ECE.

The matter presented in the B. Tech Major Project Report has not been submitted by us for the award
of any other degree of this or any other Institute.

Deepanshu Sadhwani (En. No: 41451202817)
Zarqua Neyaz (En. No: 41651202817)
Ashutosh Mishra (En. No: 35151202817)
Jyotir Aditya Kalra (En. No: 35351202817)

This is to certify that the above statement made by the candidates is correct to the best of my
knowledge. They are permitted to appear in the External Major Project Examination.

(Dr. Narina Thakur), Dean R&D
(Mr. Sourabh Rana), Assistant Professor, ECE

P a g e 1 | 26
ABSTRACT

The massive integration of information technologies into many aspects of the modern world has led to
vehicles being treated as conceptual resources in information systems. Since an autonomous information
system has no meaning without data, vehicle information must be transferred from the real world into
the information system. This can be done by human agents or by special intelligent equipment that
identifies vehicles by their registration plates in real environments. One such piece of equipment is a
system for detecting and recognizing vehicle number plates. An Automatic Number Plate Recognition
(ANPR) system is a secure system for smart cities that applies image processing and uses Optical
Character Recognition (OCR) to read the image of a vehicle number plate. An automated, fast, reliable
and robust vehicle plate recognition system has become critical for traffic control and traffic law
enforcement, and ANPR addresses this need. This report focuses on an improved OCR-based plate
detection technique using the YOLOv3 deep learning model, a Convolutional Neural Network (CNN)
trained on an object detection dataset. The goal is to extract the alphanumeric data from the detected
license plate. The project produces a DataFrame containing each vehicle’s registration details, entry
time, exit time and fees for the total duration of parking. To boost accuracy, a blended algorithm for
license plate detection and recognition is proposed and compared to current methodologies.

ACKNOWLEDGEMENT

We express our deep gratitude to Dr. Narina Thakur, Dean R&D, and Mr. Sourabh Rana, Assistant
Professor, Department of Electronics and Communication Engineering, for their valuable guidance and
suggestions throughout our project work. We are thankful to Mr. Rajiv Nehra, Project Coordinator, for
his valuable guidance.

We would like to extend our sincere thanks to the Head of the Department, Dr. Kirti Gupta, for her
timely suggestions, which helped us complete our project work.

Deepanshu Sadhwani (En. No: 41451202817)
Zarqua Neyaz (En. No: 41651202817)
Ashutosh Mishra (En. No: 35151202817)
Jyotir Aditya Kalra (En. No: 35351202817)

TABLE OF CONTENTS

CANDIDATE’S DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES

Chapter 1: INTRODUCTION
1.1 OBJECTIVE
1.2 MOTIVATION
1.3 IMPLEMENTATION
1.4 SUMMARY OF THE REPORT

Chapter 2: LITERATURE SURVEY

Chapter 3: TOOLS & METHODOLOGY
3.1 DEEP LEARNING
3.1.1 NEURAL NETWORKS
3.1.2 WORKING OF DEEP LEARNING ALGORITHMS
3.1.3 TYPES OF DEEP LEARNING ALGORITHMS
3.2 CONVOLUTIONAL NEURAL NETWORKS (CNN)
3.2.1 WORKING OF CNN
3.3 YOU ONLY LOOK ONCE (YOLO)
3.3.1 WORKING OF YOLO
3.3.2 BENEFITS OF YOLO
3.3.3 YOLOv1 vs YOLOv2 vs YOLOv3
3.4 OPTICAL CHARACTER RECOGNITION (OCR)
3.5 PYTESSERACT

Chapter 4: IMPLEMENTATION & RESULTS
4.1 SYSTEM WORKFLOW DIAGRAM
4.2 STEPS FOR IMPLEMENTATION
4.2.1 DATASET
4.2.2 TRAINING THE MODEL USING DARKNET FRAMEWORK
4.2.3 IMAGE SEGMENTATION
4.2.4 OPTICAL CHARACTER RECOGNITION USING PYTESSERACT
4.2.5 STORING EXTRACTED DATA IN DATABASE
4.3 EVALUATION MODELS AND RESULTS

Chapter 5: FUTURE SCOPE & CONCLUSION
5.1 CONCLUSION
5.2 FUTURE SCOPE

REFERENCES

LIST OF FIGURES

Figure 1.1 License Plate Detection
Figure 2.1 YOLOv3: An Incremental Improvement
Figure 3.1 Machine learning vs Deep learning
Figure 3.2 Layers of Neural Network
Figure 3.3 Image Processing via CNN
Figure 3.4 YOLO Convolutional Neural Network
Figure 3.5 Optical Character Recognition
Figure 3.6 Extracting Characters using Pytesseract
Figure 4.1 System Workflow Diagram
Figure 4.2 Images from the Dataset
Figure 4.3 Steps of pre-processing the image
Figure 4.4 OCR using Pytesseract
Figure 4.5 Data Entry in Database System
Figure 4.6 Evaluation Scores of Smart Parking System using YOLOv3
Figure 4.7 Evaluation Scores of Smart Parking System using VGG16

CHAPTER 1: INTRODUCTION

This project proposes to build an affordable system for monitoring vehicles in settings such as
residential societies and business parks.

1.1 OBJECTIVE

• To detect vehicle license plates using the YOLOv3 deep learning model.
• To extract text from the detected number plate using Pytesseract.
• To maintain records of vehicle license plates with entry and exit times.

1.2 MOTIVATION

A Smart Parking System is considered essential for vehicle surveillance and is now making its presence
felt in the parking management sector. It removes the errors caused by manual entry of vehicle
registration details. Parking lot operators often enter incomplete or incorrect details into the system,
especially during peak hours, which may later cause problems for the vehicle owner when exiting the
lot and is also a major security issue.
A Smart Parking System makes this process seamless and secure. Not only does it store correct vehicle
registration data in the database, but it also automatically verifies the vehicle at exit points.

Figure 1.1: License Plate Recognition [Google Image].


1.3 IMPLEMENTATION

Number Plate Detection: This problem can be tackled with an object detection approach, in which a
model is trained on images of cars and other vehicles with number plates using the CNN-based
YOLOv1, YOLOv2 or YOLOv3 deep learning architectures [12].

Extracting text from the detected Number Plate: This problem can be solved using OCR (Optical
Character Recognition), which extracts alphanumeric characters from the cropped number plate
images; here Pytesseract is used.

• The solution can be implemented using YOLOv3 for license plate detection, as YOLOv3 is more
accurate than the earlier YOLO versions while remaining fast enough for real-time use.

• For text extraction, the Microsoft Vision API or Pytesseract (a Python wrapper for Google's
Tesseract OCR engine) is considered.

• For certain image processing steps, a combination of PIL and OpenCV can be used.

1.4 SUMMARY OF THE REPORT

Vehicle number plate detection aims to detect the license plate on a vehicle and then extract its
contents. A vehicle’s license plate is commonly known as a ‘number plate’.
It carries a numeric or alphanumeric code that uniquely identifies the vehicle and makes it possible to
maintain records of vehicle data. Number plates can differ in color, font and font size depending on
the country and its regulations.
The project focuses on detecting the license plate and extracting the alphanumeric data from it. The
outcome of the project is JSON data recording the registration details, entry time and exit time of the
vehicle, together with an alarm message if the parking time exceeds 12 hours.
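
As an illustration of the output described above, the following is a minimal sketch of how one parking record could be assembled and serialized. The field names, the example plate string and the hourly rate are hypothetical and are not taken from the project code; only the 12-hour threshold comes from the report.

```python
import json
import math
from datetime import datetime, timedelta

MAX_PARKING = timedelta(hours=12)  # overstay threshold stated in the report
RATE_PER_HOUR = 20                 # hypothetical tariff, not from the report


def build_record(plate, entry_time, exit_time):
    """Return a JSON-serializable record for one parked vehicle."""
    duration = exit_time - entry_time
    hours = duration.total_seconds() / 3600
    billed_hours = max(1, math.ceil(hours))  # assumption: bill every started hour
    return {
        "plate": plate,
        "entry_time": entry_time.isoformat(),
        "exit_time": exit_time.isoformat(),
        "duration_hours": round(hours, 2),
        "fee": billed_hours * RATE_PER_HOUR,
        "overstay_alarm": duration > MAX_PARKING,
    }


# Hypothetical vehicle that stays for 13.5 hours.
record = build_record(
    "DL8CAF5030",
    datetime(2021, 6, 1, 8, 0),
    datetime(2021, 6, 1, 21, 30),
)
print(json.dumps(record, indent=2))
```

Keeping the record as a plain dictionary means the same structure can be appended to a DataFrame or written out as JSON without conversion.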

CHAPTER 2: LITERATURE SURVEY

ANPR comprises three main steps: license plate detection, segmentation and character recognition.
License plate detection has been an active field of research for years. Many researchers have worked on
ANPR with different algorithms, each trying to improve its performance. ANPR approaches can be
broadly divided into two categories: traditional image processing methods and deep learning methods.
We first review the relevant literature on license plate detection and then the various techniques for
segmentation and character recognition. The limitations of some techniques, mostly the traditional
methods, are discussed in the latter part of this chapter. We also discuss how the performance of ANPR
has been improved by various deep learning models in recent years.

For detecting license plates, a CNN-based approach [14], [15] makes it possible to estimate the locations
of the license plates. It models a function that produces a score for each image sub-region, so that the
locations of the detected license plates can be estimated by combining the results obtained from sparse
overlapping regions. The main contributions of [15] were the design of a robust CNN-based license plate
detector, the creation of an output function that combines the results obtained from a subset of image
sub-regions and can be employed for other object detection tasks, and the development of a challenging
image benchmark, freely available for research purposes. Many other researchers have also used CNNs
for license plate detection, such as in [3], [8]. The work in [3] presented a new OKM-CNN technique for
effective detection and recognition of license plates. The OKM-CNN model operates in three main
stages. In the first stage, license plate localization and detection take place using the IBA and CCA
models. Subsequently, an OKM-based clustering technique segments the license plate (LP) image, and
finally the characters in the LP are recognized using a CNN model.

Prior work on object detection repurposes classifiers to perform detection. Instead, object detection can
be framed as a regression problem to spatially separated bounding boxes and associated class
probabilities, with a single neural network predicting bounding boxes and class probabilities directly
from full images in one evaluation. This approach to object detection was introduced as YOLO [16].
YOLO is extremely fast compared to previously introduced algorithms: at test time the neural network is
simply run on a new image to predict detections. It can process streaming video in real time with less
than 25 milliseconds of latency, and it outperforms other detection methods, including DPM and
R-CNN [2]. YOLO has been used as an object detector in [5], [7], [8], [9].

In [8], the authors presented a robust and efficient ALPR system based on the state-of-the-art YOLO
object detector. The Convolutional Neural Networks (CNNs) are trained and fine-tuned for each ALPR
stage. They designed a two-stage approach employing simple data augmentation tricks such as inverted
License Plates (LPs) and flipped characters. In [7], the authors trained a robust end-to-end real-time
ANPR system using the YOLO algorithm for license plate localization as well as character recognition;
the recognized characters are then sorted from left to right so that they appear in the same order as on
the license plate. In [9], a YOLO model based on the Darknet framework was used. The authors proposed
a more adaptable and affordable smart parking system built on distributed cameras, edge computing,
data analytics and advanced deep learning algorithms (YOLO in this case). In [5], a sliding-window
single-class detector based on a tiny YOLO CNN classifier was proposed. That work addressed car
license plate detection with a You Only Look Once (YOLO)-Darknet deep learning framework, in which
YOLO’s 7 convolutional layers were used to detect a single class and detection was performed as a
sliding-window process.
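
The left-to-right ordering step mentioned for [7] can be sketched as follows. This is only an illustration under our own assumptions: the tuple layout (character, x-centre, confidence) and the sample detections are hypothetical and do not reproduce the data format used in [7].

```python
def plate_string(detections):
    """Join detected characters in left-to-right order.

    detections: list of (character, x_center, confidence) tuples, where
    x_center is the horizontal centre of the character's bounding box.
    """
    ordered = sorted(detections, key=lambda det: det[1])  # sort by x_center
    return "".join(char for char, _, _ in ordered)


# Hypothetical detector output in arbitrary order.
detections = [("1", 0.62, 0.90), ("D", 0.10, 0.95), ("L", 0.31, 0.92)]
print(plate_string(detections))
```

A single horizontal sort is only adequate for one-line plates; two-line layouts need the rows to be separated first.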

Since the model learns to predict bounding boxes from data, it struggles to generalize to objects in new
or unusual aspect ratios or configurations. This limitation of YOLO has led researchers to investigate
other object detectors. In [6], a neural network architecture for license plate localization using
bottleneck depth-separable convolutions with inverted residuals was proposed. The network used for
license plate localization is based on the SSD architecture. The original feature extractor used in SSD is
VGG-16, which consists of 13 convolutional layers followed by three fully connected layers and is very
appealing because of its uniform architecture. The authors found that combining the versatility of
depthwise separable convolutions with the underlying ideas of relevant information extraction,
abstraction and accumulation inherent in linear bottlenecks could provide a fast license plate
localization solution with little to no reduction in overall accuracy. However, VGG consists of about
140 million parameters [6], which makes a system using it computationally complex and requires a
powerful GPU to run effectively and within an acceptable timeframe. This is one of the limitations of
VGG-16.

A one-stage improved model for object detection and recognition was proposed in [2]. The paper gives a
fundamental overview of object detection methods, covering two classes of object detectors. The
two-stage detectors covered are RCNN, Fast RCNN and Faster RCNN, whereas the one-stage detectors
covered are YOLOv1, YOLOv2, YOLOv3 and SSD. Two-stage detectors focus more on accuracy,
whereas the primary concern of one-stage detectors is speed. The authors identified a single-stage
methodology for improving speed without sacrificing much accuracy. The comparison results show
that, among the two-stage detectors, Faster RCNN performs best. Among the one-stage detectors,
YOLOv3-Tiny increases the speed of object detection while maintaining the accuracy of the result.

The paper YOLOv3: An Incremental Improvement [10] presents some updates to YOLO. At 320 × 320,
YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. On the older 0.5 IOU
mAP (mean average precision) detection metric, YOLOv3 is quite good: it achieves 57.9 AP50 in 51 ms
on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8 times faster.
As an application of the YOLOv3 model, multinational license plate recognition using generalized
character sequence detection is proposed in [1]. The proposed system is mainly based on You Only Look
Once (YOLO) networks. In particular, tiny YOLOv3 is used for the first step, whereas the second step
uses YOLOv3-SPP, a version of YOLOv3 that includes the spatial pyramid pooling (SPP) block. The
localized license plate is fed into YOLOv3-SPP for character recognition. The character recognition
network returns the bounding boxes of the predicted characters but does not provide information about
the sequence of the license plate number. A license plate number with an incorrect sequence is
considered wrong. Thus, to extract the correct sequence, the authors proposed a layout detection
algorithm that can extract the correct sequence of license plate numbers from multinational license
plates.

Figure 2.1: YOLOv3: An Incremental Improvement [10]

Another algorithm proposed for license plate detection is template matching [12], [13], which matches a
template against the image to identify the vehicle's number plate. First, the car number plate must be
located in the input picture of the car; template matching is applied at this stage, before the morphology
procedure begins, in order to obtain the bounding box of the plate number. Second, each character on
the number plate is recognized using Optical Character Recognition. Over the past years, the use of
neural networks for license plate recognition has been very common, and researchers have made use of
various types of neural networks. During our research, we came across different neural networks such
as ANN [19], PNN [21] and BP Neural Network [20]. In [19], the authors presented a feed-forward
Artificial Neural Network (ANN) based OCR algorithm specifically designed to meet the needs of an
ANPR system. MATLAB was used to implement and validate the algorithm. The primary goal of that
study was to implement the entire ANPR scheme on a single FPGA. In [20], the authors studied vehicle
plate recognition based on a BP neural network. A BP neural network essentially transforms a set of
input and output samples into a nonlinear optimization problem; it is a learning algorithm that uses
gradient descent to solve for the weights. In [21], the plate is localized using Otsu’s thresholding
method, vertical and horizontal histograms are used for character segmentation, and character
recognition is done by Probabilistic Neural Networks.

Some distinctive license plate localization techniques were also encountered during our research,
namely Symmetric Wavelets [18] and the Multi-level Genetic Algorithm [17]. These methods were used
for ANPR before deep learning models. In [18], the authors proposed a novel preprocessing method. The
input image is first converted to grayscale, and a correlation procedure is then carried out using a mask.
Statistical measurements such as root mean square error (RMSE) and peak signal-to-noise ratio (PSNR),
calculated after preprocessing, produced dominant values. After that, symmetric wavelets and
mathematical morphology are used to localize the license plate. In [17], the authors proposed using a
genetic algorithm at multiple levels to locate multiple license plates in a single image. As a result, any
number of license plates in a single image can be identified and located. Symbols on two-dimensional
compound objects can be localized with excellent accuracy using the multi-level genetic technique.
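
For reference, the two statistical measures named above can be computed as in the sketch below. It uses the standard definitions of RMSE and PSNR for 8-bit images and NumPy; it is not the code of [18], and the toy images are made up for illustration.

```python
import numpy as np


def rmse(reference, processed):
    """Root mean square error between two images of equal shape."""
    diff = reference.astype(np.float64) - processed.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(reference, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 20 * log10(max_value / RMSE)."""
    error = rmse(reference, processed)
    if error == 0:
        return float("inf")  # identical images
    return float(20 * np.log10(max_value / error))


# Toy 4x4 grayscale images that differ by a constant offset of 5.
original = np.zeros((4, 4), dtype=np.uint8)
filtered = np.full((4, 4), 5, dtype=np.uint8)
print(rmse(original, filtered), psnr(original, filtered))
```

A lower RMSE and a higher PSNR both indicate that the preprocessed image stays closer to the reference image.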

In [2], the authors discuss in depth the one-stage detectors YOLOv1, YOLOv2, YOLOv3 and SSD. This
research played a major part in our decision to choose the YOLOv3 deep learning model for the proposed
work, so we briefly describe each of these detectors and how each improves on the others. First, YOLOv1
uses the Darknet framework and the ImageNet-1000 dataset to train the model. It divides the given
picture into a grid of S×S cells. The limitations of YOLOv1 relate to the closeness of the objects in the
picture: if small objects appear in a cluster, it may fail to find them. YOLOv2 supersedes YOLOv1 by
offering a better balance between running time and accuracy. For better accuracy, YOLOv2 introduces
batch normalization, which improves mAP by 2 percent when added to each convolutional layer. The
next variant, YOLOv3, uses logistic regression to compute the objectness score, which it gives for each
bounding box. YOLOv3 supports multilabel classification because it uses a logistic classifier for each
class in place of the softmax layer used in YOLOv2. YOLOv3 uses Darknet-53, which has fifty-three
convolutional layers and is deeper than the Darknet-19 used in YOLOv2. The advantages of YOLOv3
over YOLOv2 are changes to the error function and detection at three scales, covering objects from small
to large. The multiclass problem becomes a multilabel problem, and performance on small objects
improves. SSD is a single shot detector that strikes an excellent balance between speed and accuracy. It
applies a CNN-based model to the input picture to compute the feature map. It also employs anchor
boxes at various aspect ratios, similar to Faster RCNN, and learns the offsets instead of determining the
boxes directly. Unlike YOLO, SSD does not divide the image into a grid of arbitrary size. For every
location of the feature map, it predicts the offsets of predefined anchor boxes (default boxes). Each box
has a fixed size, proportion and position relative to its corresponding cell. YOLOv3-Tiny is a
lightweight variant of YOLOv3 that runs faster but is less accurate than YOLOv3.
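
The difference between a softmax layer and per-class logistic classifiers can be seen in a small numerical sketch. The class scores below are made-up values chosen for illustration, and the 0.5 threshold is an assumption rather than a setting taken from [2] or [10].

```python
import numpy as np


def softmax(logits):
    """Softmax: scores compete and always sum to 1 (one label per box)."""
    shifted = logits - np.max(logits)  # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def sigmoid(logits):
    """Independent logistic classifier per class (several labels per box)."""
    return 1.0 / (1.0 + np.exp(-logits))


# Hypothetical raw class scores for one bounding box.
logits = np.array([2.0, 1.5, -3.0])
softmax_scores = softmax(logits)
sigmoid_scores = sigmoid(logits)
labels = sigmoid_scores > 0.5  # every class above the threshold is kept

print(softmax_scores.round(3), sigmoid_scores.round(3), labels)
```

With softmax the first two classes share the probability mass, whereas the logistic outputs allow both to be accepted for the same box, which is what makes the problem multilabel.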

CHAPTER 3: TOOLS & METHODOLOGY

3.1 DEEP LEARNING

Artificial intelligence is the field concerned with machines performing tasks that typically require
human intelligence. It encompasses machine learning, in which machines learn from experience and
acquire skills without human involvement. Deep learning is a subset of machine learning in which
artificial neural networks, algorithms inspired by the human brain, learn from large amounts of data.

Similar to how we learn from experience, a deep learning algorithm performs a task repeatedly, each
time tweaking it a little to improve the outcome.

In simple terms, deep learning uses artificial neural networks to perform sophisticated computations
on large amounts of data. It is a type of machine learning modeled on the structure and function of
the human brain. Deep learning algorithms train machines by learning from examples. Industries such
as health care, eCommerce, entertainment and advertising commonly use deep learning.

Figure 3.1: Machine learning vs Deep learning [Google Image].

3.1.1 NEURAL NETWORKS

A neural network [9] is structured like the human brain and consists of artificial neurons, also known
as nodes. These nodes are arranged in three types of layers:

• The input layer

• The hidden layer(s)

• The output layer

Data provides each node with information in the form of inputs. The node multiplies the inputs by
weights (initialized randomly), sums the products and adds a bias. Finally, a nonlinear function,
known as an activation function, is applied to determine whether the neuron fires.
Figure 3.2: Layers of Neural Network [Google Image].
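
The node computation described in Section 3.1.1 (weighted sum, bias, activation) can be sketched with NumPy as follows. The layer sizes, the input values and the choice of ReLU as activation function are assumptions made for this illustration.

```python
import numpy as np


def relu(z):
    """Activation function: passes positive values, outputs zero otherwise."""
    return np.maximum(0.0, z)


def dense_forward(inputs, weights, bias, activation=relu):
    """One layer of nodes: multiply inputs by weights, add bias, activate."""
    return activation(weights @ inputs + bias)


rng = np.random.default_rng(0)
inputs = np.array([0.5, -1.0, 2.0])  # three input features
weights = rng.normal(size=(4, 3))    # randomly initialized weights, 4 hidden nodes
bias = np.zeros(4)

hidden = dense_forward(inputs, weights, bias)
print(hidden)
```

During training the random weights and the bias are adjusted so that the outputs move closer to the desired targets.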

3.1.2 WORKING OF DEEP LEARNING ALGORITHMS

While deep learning algorithms feature self-learning representations, they depend on artificial neural
networks (ANNs) that mirror the way the brain computes information. During training, the algorithms
use unknown elements in the input distribution to extract features, group objects and discover useful
data patterns. Much like training machines for self-learning, this occurs at multiple levels, using the
algorithms to build the models.

Deep learning models make use of several algorithms. While no single network is considered perfect,
some algorithms are better suited to specific tasks. To choose the right ones, it is good to gain a
solid understanding of all the primary algorithms.

3.1.3 TYPES OF DEEP LEARNING ALGORITHMS

Deep learning algorithms work with almost any kind of data and require large amounts of computing
power and information to solve complicated problems. The top 10 deep learning algorithms
(https://www.simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm) are:

• Convolutional Neural Networks (CNNs)

• Long Short Term Memory Networks (LSTMs)

• Recurrent Neural Networks (RNNs)

• Generative Adversarial Networks (GANs)

• Radial Basis Function Networks (RBFNs)

• Multilayer Perceptrons (MLPs)

• Self-Organizing Maps (SOMs)

• Deep Belief Networks (DBNs)

• Restricted Boltzmann Machines (RBMs)

• Autoencoders

3.2 CONVOLUTIONAL NEURAL NETWORKS (CNN)

CNNs, also known as Convolutional Neural Networks, consist of multiple layers and are mainly used
for image processing and object detection. Early CNNs were used for recognizing characters such as
ZIP codes and digits.

CNNs are widely used to identify satellite images, process medical images, forecast time series and
detect anomalies.

3.2.1 WORKING OF CNN

CNNs have multiple layers that process and extract features from data:

1. Convolution Layer:

A CNN has a convolution layer with several filters that perform the convolution operation.

2. Rectified Linear Unit (ReLU):

CNNs have a ReLU layer that performs an element-wise operation. The output is a rectified feature
map.

3. Pooling Layer:

The rectified feature map next feeds into a pooling layer. Pooling is a down-sampling
operation that reduces the dimensions of the feature map. The resulting two-dimensional
arrays of the pooled feature map are then flattened into a single long, continuous,
linear vector.

4. Fully Connected Layer:

The flattened vector from the pooling layer is fed as input to a fully connected layer,
which classifies and identifies the images.

Below is an example of an image processed via CNN.

Figure 3.3: Image Processing via CNN [Google Image].
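
The four stages listed in Section 3.2.1 can be traced on a tiny example. The sketch below is a plain NumPy illustration with a hand-picked 2×2 filter and fixed fully connected weights, all of which are assumptions; a real CNN learns these values and uses many filters per layer. As in most deep learning frameworks, the "convolution" is computed as a sliding dot product without flipping the filter.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid convolution layer with a single filter (sliding dot product)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Element-wise rectification producing the rectified feature map."""
    return np.maximum(0.0, x)


def max_pool(feature_map, size=2):
    """Down-sample by taking the maximum of each size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]  # drop rows/columns that do not fill a block
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])      # hypothetical diagonal filter

feature_map = relu(conv2d(image, kernel))  # steps 1 and 2: 5x5 rectified map
pooled = max_pool(feature_map)             # step 3: 2x2 pooled map
flat = pooled.flatten()                    # flattened to a vector of length 4
fc_weights = np.full((2, flat.size), 0.5)  # step 4: fully connected layer, 2 outputs
scores = fc_weights @ flat
print(feature_map.shape, pooled.shape, flat.shape, scores)
```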

3.3 YOU ONLY LOOK ONCE (YOLO)

YOLO (You Only Look Once) is a network for object detection. The object detection task consists
of determining the locations in the image where certain objects are present, as well as classifying
those objects. YOLO is a single network trained end to end to perform a regression task that predicts
both the object bounding box and the object class.

3.3.1 WORKING OF YOLO

YOLO takes a different approach from detectors that repurpose classifiers. It is a convolutional neural
network (CNN) for object detection in real time. The algorithm applies a single neural network to the
full image, divides the image into regions, and predicts bounding boxes and probabilities for each
region. These bounding boxes are weighted by the predicted probabilities.

The algorithm “only looks once” at the image in the sense that it requires only one forward
propagation pass through the neural network to make predictions. After non-max suppression (which
ensures that each object is detected only once), it outputs the recognized objects together with
their bounding boxes.

With YOLO, a single CNN simultaneously predicts multiple bounding boxes and class probabilities
for those boxes. YOLO trains on full images and directly optimizes detection performance.

Figure 3.4: YOLO Convolution Neural Network [Google Image].
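
The non-max suppression step mentioned in Section 3.3.1 can be sketched as below. This is a generic greedy formulation in NumPy, assuming boxes in (x1, y1, x2, y2) corner format and an IoU threshold of 0.5; it is not the Darknet implementation, and the sample boxes are made up.

```python
import numpy as np


def iou(box, boxes):
    """Intersection over union of one box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and drop boxes that overlap it too much."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep


# Two overlapping detections of the same plate and one separate detection.
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))
```

The second box overlaps the first with an IoU of about 0.82, so only the first and third detections survive.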

3.3.2 BENEFITS OF YOLO

• YOLO is extremely fast.
• YOLO sees the entire image during training and test time, so it implicitly encodes
contextual information about classes as well as their appearance.
• YOLO learns generalizable representations of objects, so that when trained on natural
images and tested on artwork, the algorithm outperforms other top detection methods.

3.3.3 YOLOv1 vs YOLOv2 vs YOLOv3

• YOLOv1: It uses the Darknet framework and is trained on the ImageNet-1000 dataset. It works as
described above but has several limitations that restrict its use. It cannot find small objects if
they appear in a cluster. The architecture has difficulty generalizing to objects when the image
dimensions differ from those of the training images. The major issue is the localization of objects
in the input image.

• YOLOv2: This version is better, faster and more advanced, so as to compete with Faster R-CNN (an
object detection algorithm that uses a Region Proposal Network to identify objects in the input
image) and SSD (Single Shot MultiBox Detector). The changes from YOLOv1 to YOLOv2 are:
1. Batch Normalization: it normalizes the input to each layer by shifting and scaling the
activations. Adding batch normalization to the convolutional layers of the architecture improved
mAP (mean average precision) by 2%.
2. Higher Resolution Classifier: the input size in YOLOv2 is increased from 224 × 224 to
448 × 448. The larger input size improved mAP by up to 4%.
3. Anchor Boxes: YOLOv2 performs classification and prediction in a single framework. Anchor
boxes are used to predict bounding boxes, and they are designed for a given dataset using
k-means clustering.
4. Fine-Grained Features: YOLOv2 divides the image into a 13 × 13 grid, which is finer than the
grid of the previous version. This enables YOLOv2 to identify and localize smaller objects in
the image while remaining effective on larger objects.
5. Multi-Scale Training: YOLOv2 is trained on images whose dimensions vary randomly from
320 × 320 to 608 × 608. This allows the network to learn and predict objects accurately at
various input dimensions.
6. Darknet-19: YOLOv2 uses the Darknet-19 architecture, with 19 convolutional layers, 5 max
pooling layers and a softmax layer for classifying objects.

• YOLOv3: With many object detection algorithms now available, the competition is about how
accurately and quickly objects are detected. YOLOv3 provides what is needed to detect and classify
objects accurately in real time. The so-called incremental improvements in YOLOv3 are:
1. Bounding Box Predictions: YOLOv3 gives an objectness score for each bounding box, predicted
using logistic regression.
2. Class Predictions: YOLOv3 uses a logistic classifier for every class instead of the softmax
used in YOLOv2. This allows multi-label classification.
3. Feature Pyramid Networks (FPN): YOLOv3 makes predictions in a manner similar to FPN: three
predictions are made for every location of the input image, and features are extracted from each
prediction. This gives YOLOv3 better detection ability at different scales.
4. Darknet-53: the predecessor YOLOv2 used Darknet-19 as its feature extractor, whereas YOLOv3
uses the Darknet-53 network, which has 53 convolutional layers. It is much deeper than the
YOLOv2 network and adds shortcut connections. Darknet-53 is composed mainly of 3x3 and 1x1
filters with shortcut connections.
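
To make the multi-scale prediction of YOLOv3 concrete, the sketch below computes the size of its three output grids. The 416 × 416 input size and the single "license plate" class are assumptions chosen to match a one-class detector; the strides of 32, 16 and 8 and the three boxes per cell follow the standard YOLOv3 configuration.

```python
def yolov3_output_shapes(input_size=416, num_classes=1, boxes_per_cell=3):
    """Return (grid_h, grid_w, channels) for the three YOLOv3 detection scales."""
    strides = (32, 16, 8)  # down-sampling factors of the three scales
    # Each box predicts x, y, w, h, objectness and one score per class.
    channels = boxes_per_cell * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


shapes = yolov3_output_shapes()
total_boxes = sum(h * w * 3 for h, w, _ in shapes)  # 3 boxes per grid cell
print(shapes, total_boxes)
```

For a single class this gives 18 output channels per scale, and the finest grid (52 × 52) is the one that helps with small objects such as distant number plates.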

3.4 OPTICAL CHARACTER RECOGNITION (OCR)

Optical Character Recognition (OCR) is the mechanical or electronic conversion of images of
typewritten or printed text into machine-encoded text. In simple words, OCR is a technology that
converts different types of documents, such as scanned paper documents, PDF files or images
captured by a digital camera, into editable and searchable data.
Automatic number plate recognition with OCR combines integrated hardware and software to read
vehicle license plates without human intervention.

Figure 3.5: Optical Character Recognition [Google Image].

3.5 PYTESSERACT

Python-tesseract (Pytesseract) is an optical character recognition (OCR) tool for Python. That is, it
recognizes and "reads" the text embedded in images. It is our most important tool for number plate
recognition.
It can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg,
png, gif, bmp, tiff, and others. If used as a script, Python-tesseract prints the recognized text instead
of writing it to a file.

Figure 3.6: Extracting Characters using Pytesseract [Google Image].

CHAPTER 4: IMPLEMENTATION & RESULTS

4.1 SYSTEM WORKFLOW DIAGRAM

Figure 4.1: System Workflow Diagram.

4.2 STEPS FOR IMPLEMENTATION

4.2.1 DATASET

First, a dataset [22] comprising 433 images of cars with visible license plates was used. The
dataset provides bounding box annotations of the license plates within each image.
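
As a rough sketch of how such annotations might be read, the following assumes that each image is accompanied by a PASCAL VOC style XML file, which we believe is the format of this dataset. The sample XML, the file name and the helper name parse_annotation are illustrative assumptions and not part of the original project code.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation in PASCAL VOC style (values are made up)
SAMPLE_XML = """<annotation>
  <filename>Cars0.png</filename>
  <size><width>500</width><height>268</height><depth>3</depth></size>
  <object>
    <name>licence</name>
    <bndbox><xmin>226</xmin><ymin>125</ymin><xmax>419</xmax><ymax>173</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return (image_width, image_height, [(xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        box = tuple(int(bndbox.find(tag).text)
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append(box)
    return width, height, boxes


parsed = parse_annotation(SAMPLE_XML)
print(parsed)
```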

To detect license plates we use YOLO (You Only Look Once), a deep learning object detection
architecture based on Convolutional Neural Networks.

The original YOLO network is extremely fast: it is reported to process images in real time at 45
frames per second, and a smaller version of the network, Fast YOLO, processes 155 frames per
second [16].

Figure 4.2: Images from the Car License plate (Dataset).

4.2.2 TRAINING THE MODEL USING DARKNET FRAMEWORK

For number plate detection, the detector was trained using Darknet, an open-source neural network
framework, and the program was written in Python. In the proposed work the detector is the
YOLOv3 deep learning model.
YOLO stands for You Only Look Once. YOLO takes an image as input, runs it through a neural
network, and outputs predicted bounding boxes. Each bounding box prediction consists of five
components: x, y, w, h and confidence. (x, y) is the center of the bounding box, (w, h) are its width
and height, and confidence is the estimated reliability of the prediction. Training consists of fitting
the YOLOv3 model with the images as input and the annotations (x, y, w, h, confidence) as the
target output.
YOLOv3 has a wide range of applications because of its detection speed and precision, and our
model is based on it. Deeper networks with more convolutional layers generally extract richer
features, although at a higher computational cost. Accordingly, the model used in this work has a
more complex structure that is better suited to our dataset and allows targets to be recognized at a
finer level. In the original YOLOv3, Darknet-53 is used as the feature extractor.
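
Darknet expects one text label line per object, holding a class index followed by the normalized box center, width and height. A minimal sketch of converting a pixel-coordinate box (xmin, ymin, xmax, ymax) into such a line is given below. The function name voc_to_yolo and the example numbers are assumptions for illustration; the confidence target is not written to the label file.

```python
def voc_to_yolo(box, img_w, img_h, class_id=0):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to a Darknet label line.

    The output values are the box center (x, y) and size (w, h),
    each normalized by the image width or height.
    """
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical plate box in a 400 x 200 pixel image
label_line = voc_to_yolo((100, 50, 300, 150), img_w=400, img_h=200)
print(label_line)
```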

4.2.3 IMAGE SEGMENTATION

After the number plate has been detected, the next step is to segment it out of the image. This can
be done with OpenCV by cropping the number plate region and saving it as a new image.
Segmentation serves as the link between number plate extraction and character recognition. It is
also referred to as bounding box analysis, and the characters are extracted using this analysis.
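
The cropping step can be sketched as follows. To keep the example free of external dependencies, the image is represented as a plain nested list of rows; with an OpenCV image, which is a NumPy array, the same crop would be written as image[y1:y2, x1:x2]. The helper names and the optional padding are illustrative assumptions.

```python
def clamp_box(box, img_w, img_h, pad=0):
    """Expand a pixel box by `pad` and clip it to the image borders."""
    x1, y1, x2, y2 = box
    x1 = max(0, x1 - pad)
    y1 = max(0, y1 - pad)
    x2 = min(img_w, x2 + pad)
    y2 = min(img_h, y2 + pad)
    return x1, y1, x2, y2


def crop(image, box):
    """Cut the region [y1:y2, x1:x2] out of a row-major image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


# Toy 6 x 8 "image" whose pixels store their own (row, column) position
image = [[(r, c) for c in range(8)] for r in range(6)]

plate = crop(image, clamp_box((2, 1, 5, 4), img_w=8, img_h=6))
edge_box = clamp_box((6, 4, 10, 9), img_w=8, img_h=6, pad=1)
print(len(plate), len(plate[0]), edge_box)
```

Clipping the box to the image borders matters for plates near the edge of the frame, where a predicted box may extend past the image.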

Figure 4.3: Segmented number plate from the original image

4.2.4 OPTICAL CHARACTER RECOGNITION USING PYTESSERACT

The conversion of images of handwritten or printed text into machine-readable text is known as
optical character recognition (OCR). A number of OCR engines are available; the proposed work
uses Python-Tesseract, also called Pytesseract. Python-Tesseract is a Python wrapper for the
Tesseract OCR engine. It can recognize and interpret text embedded in images, and it is our main
tool for recognizing license plates.
Tesseract 4 contains a neural network component that recognizes text lines. It is based on
OCRopus' Python-based LSTM implementation, but it has been rewritten in C++ for Tesseract.
Tesseract's neural network system predates TensorFlow, but it is compatible with it because it uses
the Variable Graph Specification Language (VGSL) as its network description language.
Pytesseract accepts the segmented image as input and recognizes the characters in the image of the
number plate. The recognized text is then saved in a database or a data file.
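
A minimal sketch of this step is shown below, assuming that the pytesseract package and the Tesseract binary are installed. The page segmentation mode "--psm 7" (treat the image as a single text line) and the clean-up rule that keeps only letters and digits are our own assumptions for illustration, not settings reported in this work.

```python
import re


def read_plate_text(plate_image):
    """Run Tesseract on a cropped plate image and return the raw text.

    pytesseract is imported lazily so that the clean-up helper below
    can be used even where Tesseract is not installed.
    """
    import pytesseract
    return pytesseract.image_to_string(plate_image, config="--psm 7")


def clean_plate_text(raw_text):
    """Keep only upper-case letters and digits from the OCR output."""
    return re.sub(r"[^A-Z0-9]", "", raw_text.upper())


# Example raw OCR output with spaces, a hyphen and trailing control characters
cleaned = clean_plate_text(" dl 3c-ay 9324\n\x0c")
print(cleaned)
```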

Figure 4.4: OCR using Pytesseract

4.2.5 STORING EXTRACTED DATA IN DATABASE

The characters of the license plate extracted by OCR are then stored in the database system using a
Pandas DataFrame, together with the Date, Entry time, Exit time and Vehicle No. A unique User ID
is also generated for each vehicle entering the parking lot. The entry time and exit time of the
vehicle are used to calculate the parking duration, and the fee is then generated according to the
total time the vehicle spent in the parking lot.
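
The record keeping and fee calculation can be sketched as follows. The hourly rate, the rule of rounding up to full hours, the sequential User ID and the field names are hypothetical choices made for this example. Records are kept as plain dictionaries here; a list of such records can be passed to pandas.DataFrame to obtain the table described above.

```python
import math
from datetime import datetime
from itertools import count

RATE_PER_HOUR = 20  # hypothetical tariff, in currency units per started hour
_user_ids = count(1)  # simple sequential User ID generator (assumption)


def make_record(vehicle_no, entry, exit_):
    """Build one parking record with duration and fee."""
    duration = exit_ - entry
    # Every started hour is billed, with a minimum of one hour
    billed_hours = max(1, math.ceil(duration.total_seconds() / 3600))
    return {
        "User ID": next(_user_ids),
        "Date": entry.date().isoformat(),
        "Entry time": entry.strftime("%H:%M"),
        "Exit time": exit_.strftime("%H:%M"),
        "Vehicle No": vehicle_no,
        "Duration (min)": int(duration.total_seconds() // 60),
        "Fee": billed_hours * RATE_PER_HOUR,
    }


record = make_record(
    "DL3CAY9324",
    datetime(2021, 6, 1, 9, 15),
    datetime(2021, 6, 1, 11, 40),
)
print(record)
```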

Figure 4.5: Data entry in Database system

4.3 EVALUATION SCORES OF THE MODEL

We conducted our experiments on vehicles of various shapes and dimensions under different
conditions in order to assess the performance and precision of the system.
The algorithm's accuracy was limited because the segmentation approach did not produce the
expected results for plates seen at an angle and for plates at the edge of the image. A proper camera
angle setup is needed for the system to be more efficient and effective.

Evaluation Criteria. Formula for the evaluation of accuracy:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{1}$$

Where TP stands for True Positives, TN stands for True Negatives, FP stands for False Positives, and
FN stands for False Negatives.
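
As a worked example of the accuracy formula, the sketch below computes the percentage from the four counts. The counts used are made-up numbers for illustration and are not results of this work.

```python
def accuracy_percent(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) * 100."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return 100 * (tp + tn) / total


# Hypothetical counts: 80 correct predictions out of 100
example_accuracy = accuracy_percent(tp=40, tn=40, fp=12, fn=8)
print(example_accuracy)
```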

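Formula for Mean squared error:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{2}$$

where n is the number of samples, y_i is the actual value and \hat{y}_i is the predicted value for
sample i.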
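
Mean squared error is conventionally the average of the squared differences between actual and predicted values. A minimal sketch under that standard definition is given below; the sample values are illustrative only.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(actual)


# Hypothetical targets and predictions
example_mse = mean_squared_error([1.0, 0.0, 1.0, 1.0], [0.5, 0.0, 1.0, 0.0])
print(example_mse)
```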

The proposed method yielded the following outcomes:

Figure 4.6: Evaluation Scores of Smart Parking System using YOLOv3

The graph in Figure 4.6 shows the evaluation scores obtained when the model was trained on the
dataset using the YOLOv3 detector. "Score apprentissage" (French for training score) denotes the
score on the training data, whereas "Score validation" denotes the score on the validation data, a
held-out part of the same dataset that is used as test data. The accuracy scores on the training and
validation data were 94.2% and 80% respectively after 50 epochs.

Figure 4.7: Evaluation Scores of Smart Parking System using VGG16

The graph in Figure 4.7 shows the evaluation scores obtained when the model was trained on the
dataset using the VGG16 detector. It yielded an accuracy score of 79.13% after 200 epochs.

CHAPTER 5: CONCLUSION & FUTURE SCOPE

5.1 CONCLUSION
The proposed algorithm for license plate detection is simple and can successfully handle a variety
of license plate layouts. It can bring a number of benefits, such as traffic safety enforcement,
security in vulnerable situations, ease of use and immediate access to information, compared with
manually searching for vehicle registration and ownership details. Lighting, the language used on
the plate, vehicle shadows, non-uniform plate sizes, the characters on the plate, differing fonts and
the background color are factors that affect ANPR results.

Our system was trained using YOLOv3 within the Darknet framework. The license plate detection
model is a YOLOv3 convolutional neural network capable of detecting objects. OCR was then
applied for number plate recognition using Pytesseract, a Python wrapper for the Tesseract engine.
The method yielded accuracy scores of 94.2% on the training data and 80% on the validation data.
Because the ANPR pipeline is complex and each stage depends on the previous one, it is currently
not possible to reach 100 percent overall accuracy. However, when the bounding boxes are
accurate, our algorithm is able to extract the correct license plate numbers from an image.

5.2 FUTURE SCOPE

ANPR can be further used for vehicle model identification, traffic control, speed control and vehicle
location tracking. The system is cost-effective for any country. If a country implements it, the
system should be fed with the official vehicle database, which contains the details of each vehicle
and its owner. It can be extended into a multilingual ANPR that identifies the language of the
characters automatically based on the training data. For low-resolution images, techniques such as
image super-resolution should be applied. To segment multiple vehicle number plates, a
coarse-to-fine strategy could be helpful.
In future research, we will look into applying a noise reduction technique to improve license plate
recognition accuracy without dramatically increasing computation time. The disadvantage of using
single-class classifiers in an ensemble model is that they significantly increase computation time.
We are investigating two options to address this problem. First, a proposal-based method such as
Fast R-CNN can be used to reduce the computation time of the underlying classifier. Second, the
base classifiers can be evaluated in parallel.
Since OCR has become a widely used tool in recent years, ANPR developers are focusing on
increasing OCR accuracy instead of redesigning the entire ANPR pipeline. Some developers are
modifying open-source engines such as Tesseract in an attempt to improve their accuracy.

REFERENCES

[1] C. Henry, S. Y. Ahn and S. -W. Lee, "Multinational License Plate Recognition Using Generalized
Character Sequence Detection," in IEEE Access, vol. 8, pp. 35185-35199, 2020.

[2] P. Adarsh, P. Rathi and M. Kumar, "YOLO v3-Tiny: Object Detection and Recognition using one stage
improved model," 2020 6th International Conference on Advanced Computing and Communication Systems
(ICACCS), Coimbatore, India, pp. 687-694, 2020.

[3] I. V. Pustokhina et al., "Automatic Vehicle License Plate Recognition Using Optimal K-Means With
Convolutional Neural Network for Intelligent Transportation Systems," in IEEE Access, vol. 8, pp. 92907-
92917, 2020.

[4] Siddiqui, Shahan Yamin, et al. "Smart occupancy detection for road traffic parking using deep extreme
learning machine." Journal of King Saud University-Computer and Information Sciences , 2020.

[5] R.-C. Chen, "Automatic license plate recognition via sliding-window darknet-YOLO deep
learning", Image Vis. Comput., vol. 87, pp. 47-56, Jul. 2019.

[6] J. Yépez, R. D. Castro-Zunti and S.-B. Ko, "Deep learning-based embedded license plate localisation
system", IET Intell. Transp. Syst., vol. 13, no. 10, pp. 1569-1578, Oct. 2019.

[7] R. Naren Babu, V. Sowmya and K. P. Soman, "Indian Car Number Plate Recognition using Deep
Learning," 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control
Technologies (ICICICT), Kannur, Kerala, India, pp. 1269-1272, 2019.

[8] R. Laroca, E. Severo, L. A. Zanlorensi, L. S. Oliveira, G. R. Goncalves, W. R. Schwartz, et al., "A robust
real-time automatic license plate recognition based on the YOLO detector", Proc. Int. Joint Conf. Neural
Netw. (IJCNN), pp. 1-10, Jul. 2018.

[9] H. Bura, N. Lin, N. Kumar, S. Malekar, S. Nagaraj and K. Liu, "An Edge Based Smart Parking Solution
Using Camera Networks and Deep Learning," 2018 IEEE International Conference on Cognitive Computing
(ICCC), pp. 17-24, 2018.

[10] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement", 2018.

[11] M. Karakaya and F. C. Akıncı, "Parking space occupancy detection using deep learning methods," 2018
26th Signal Processing and Communications Applications Conference (SIU), pp. 1-4, 2018.

[12] Ghazali, Muhammad Naqiuddin Bin., Mohammad Azam Rusli “Development of Car Plate Number
Recognition using Image Processing and Database System for Domestic Car Park Application” 2018.

[13] Kashyap, Abhishek, et al. "Automatic number plate recognition." 2018 international conference on
advances in computing, communication control and networking (ICACCCN). IEEE, 2018.

[14] S. Zain Masood, G. Shu, A. Dehghan and E. G. Ortiz, "License plate detection and recognition using
deeply learned convolutional neural networks", 2017.

[15] F. Delmar Kurpiel, R. Minetto and B. T. Nassu, "Convolutional neural networks for license plate
detection in images", Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 3395-3399, Sep. 2017.

[16] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified real-time object
detection", Proc. CVPR, pp. 779-788, Jun. 2016.

[17] C. Anantha Reddy, C. Shoba Bindu, “Multi-Level Genetic Algorithm for Recognizing Multiple
License Computer Science and Engineering, 2015.

[18] V. Himani et.al, “Automatic Vehicle Number Plate Localization using Symmetric Wavelets”, ICT and
Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India, Volume
248 of the series Advances in Intelligent Systems and Computing pp 69-76, 2014.

[19] Xiaojun Zhai, Faycal Bensaali and Reza Sotudeh, "OCR-Based Neural Network for ANPR," in IEEE,
p. 1, 2018.

[20] Zhigang Zhang and Cong Wang, "The Research of Vehicle Plate Recognition Technical Based on BP
Neural Network," AASRI Procedia, vol. 1, pp. 74- 81, 2012.

[21] Fikriye Öztürk and Figen Özen, "A New License Plate Recognition System Based on Probabilistic
Neural Networks," Procedia Technology, vol. 1, pp. 124-128, 2012.

[22] Dataset: https://www.kaggle.com/andrewmvd/car-plate-detection
