
SIGN LANGUAGE DETECTION

Project report submitted in partial fulfillment of the requirement for the degree of

BACHELOR OF TECHNOLOGY

IN

ELECTRONICS AND COMMUNICATION ENGINEERING

By

ARCHITA GUPTA (171021)

SHRUTI SHARMA (171051)

UNDER THE GUIDANCE OF

MR. ANUJ MAURYA

JAYPEE UNIVERSITY OF INFORMATION TECHNOLOGY, WAKNAGHAT


DECEMBER 2020
TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGEMENT
LIST OF ACRONYMS AND ABBREVIATIONS
LIST OF SYMBOLS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER-1: INTRODUCTION
CHAPTER-2: MACHINE LEARNING
CHAPTER-3: NEURAL NETWORKS
CHAPTER-4: PROPOSED PROJECT
CHAPTER-5: METHODOLOGY
CHAPTER-6: WORK DONE TILL NOW

REFERENCES
APPENDICES
DECLARATION

We hereby declare that the work reported in the B.Tech Project Report entitled “SIGN LANGUAGE

DETECTION” submitted at Jaypee University of Information Technology, Waknaghat, India is an

authentic record of our work carried out under the supervision of PROF. ANUJ MAURYA. We have

not submitted this work elsewhere for any other degree or diploma.

-------------------------- -------------------------
ARCHITA GUPTA SHRUTI SHARMA
171021 171051

This is to certify that the above statement made by the candidates is correct to the best of my knowledge.

-------------------------
ANUJ MAURYA

Date: 5 DECEMBER 2020

Head of the Department/Project Coordinator


ACKNOWLEDGEMENT

We take this opportunity to express our gratitude to our supervisor, Prof. Anuj Maurya, for his insightful
advice, motivating suggestions, invaluable guidance, and constant encouragement and support
throughout the successful completion of this project.

The in-house facilities provided by the department throughout the project are also gratefully
acknowledged. We would like to convey our thanks to the teaching and non-teaching staff of the
Electronics and Communication Engineering Department for their invaluable help and support.
LIST OF ACRONYMS AND ABBREVIATIONS
LIST OF SYMBOLS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

Sign language is a natural language used by hearing- and speech-impaired people to communicate.
It uses hand gestures instead of sound to convey meaning. More than 2 million people in India
are deaf. They find it difficult to communicate with hearing people because most hearing people
cannot understand sign language. This creates a need for sign language translators who can
translate sign language to spoken language and vice versa.

However, such translators are limited in availability, costly, and cannot accompany a deaf
person throughout life. This has led to the development of sign language recognition systems which can
automatically translate signs into text or voice. In our method, the hand image is first passed through a
filter, and the filtered image is then passed through a classifier which predicts the class
of the hand gesture. In this project we focus on producing a model which can
recognize fingerspelling-based hand gestures in order to form a complete word by combining
each gesture.
CHAPTER 1
INTRODUCTION

Communication is essential in building a nation. Good communication leads to better understanding,


and it encompasses all the members of the community, including the deaf. In the Philippines, 1.23% of
the entire population is either deaf, mute or hearing impaired. Sign language bridges the gap of
communication with other people. However, most hearing people do not understand sign language, and
learning it is not an easy process. As a result, there is still an undeniable barrier between the hearing
impaired and the hearing majority.

Sign language is a form of communication used by people with impaired hearing and speech. People use
sign language gestures as a means of non-verbal communication to express their thoughts and emotions.
But non-signers find it extremely difficult to understand, hence trained sign language interpreters are
needed during medical and legal appointments, educational and training sessions. Over the past five
years, there has been an increasing demand for interpreting services.

The SLR architecture can be categorized into two main classes based on its input: data-glove-based
and vision-based. Chouhan et al. use smart gloves to acquire measurements such as the positions
of the hands, joint orientation, and velocity using microcontrollers and dedicated sensors, e.g.,
accelerometers and flex sensors. There are other approaches to capturing signs by using motion sensors,
such as electromyography (EMG) sensors, RGB cameras, Kinect sensors, leap motion controllers or
their combinations. The advantage of this approach is higher accuracy; the weakness is that it
restricts movement. In recent years, vision-based techniques, whose input comes from a camera
(web camera, stereo camera, or 3D camera), have become more popular. Sandjaja and
Marcos [10] used color-coded gloves to make hand detection easier. A combination of both architectures,
called the hybrid architecture, is also possible. While vision-based systems are more affordable and less
constraining than data gloves, the weakness of this approach is lower accuracy and high computing-power
consumption.

The architecture of these vision-based systems is typically divided into two main parts. The first part is
feature extraction, which extracts the desired features from a video using image-processing
techniques or computer-vision methods. The second part, the recognizer, learns patterns from the
extracted features of the training data and recognizes the testing data using machine learning
algorithms. Most of the studies mentioned above focus on
translating the signs typically made by the hearing-impaired person (the signer) into word(s) that the
hearing majority (non-signers) can understand. Although these studies proved that technology is useful
in many ways, their proponents note that such systems can be intrusive to some hearing-impaired individuals.
Instead, the proponents proposed a system that helps non-signers who want to learn basic static
sign language without being intrusive at the same time. It is also worth mentioning that there are
applications implemented on mobile phones that help non-signers learn sign language through
videos installed in the apps. However, most of these apps require a large amount of storage and a
good internet connection.
The proposed study aims to develop a system that recognizes static sign gestures and converts them
into corresponding words. A vision-based approach using a web camera is introduced to obtain the data
from the signer, and the system can be used offline. The purpose of the system is to serve as a
learning tool for those who want to learn the basics of sign language such as the alphabet,
numbers, and common static signs. The proponents provided a white background and a specific location
for image processing of the hand, thus improving the accuracy of the system, and used a Convolutional
Neural Network (CNN) as the recognizer of the system. The scope of the study includes basic static
signs, numbers and the ASL alphabet (A–Z). One of the main features of this study is the ability of the
system to create words by fingerspelling without the use of sensors or other external technologies.

LITERATURE REVIEW

A review of the literature shows that there have been several approaches to the problem of
gesture recognition in video using several different methods. One of the methods used Hidden Markov
Models (HMM) to recognize facial expressions from video sequences, combined with Bayesian Network
Classifiers and a Gaussian Tree-Augmented Naive Bayes Classifier. Francois also published a paper on
human posture recognition in video sequences using methods based on 2D and 3D appearance. The
work mentions using PCA to recognize silhouettes from a static camera and then using a 3D model of
posture for recognition. This approach has the drawback of intermediary gestures, which may
lead to ambiguity in training and therefore lower accuracy in prediction.

Another approach analyses video segments using neural networks, which involves extracting visual
information in the form of feature vectors. Neural networks do face issues such as tracking of hands,
segmentation of the subject from the background and environment, illumination variation, occlusion,
movement and position. The paper splits the dataset into segments, extracts features and classifies using
Euclidean distance and K-nearest neighbours.

Other work describes how to do continuous Indian Sign Language recognition. The paper proposes
frame extraction from video data, preprocessing the data, extracting key frames from the data followed
by extracting other features, recognition and finally optimization. Preprocessing is done by converting
the video to a sequence of RGB frames, each frame having the same dimensions. Skin-colour
segmentation is used to extract the skin regions, and the resulting images are converted to binary form.
Keyframes are extracted by calculating a gradient between the frames, and features are extracted
from the keyframes using an orientation histogram. Classification is done using Euclidean distance,
Manhattan distance, chessboard distance and Mahalanobis distance.
In a paper by Jie et al. [2], the authors identified problems in SLR such as the difficulty of recognition
when the signs are broken down into individual words and the issues with continuous SLR. They decided
to solve the problem without isolating individual signs, which removes an extra level of preprocessing
(temporal segmentation) and an extra layer of post-processing, because temporal segmentation is
challenging and its errors propagate into subsequent steps. Combined with the strenuous labelling of
individual words, this adds a huge challenge to SLR with temporal segmentation. They addressed this
issue with a new framework called Hierarchical Attention Network with Latent Space (LS-HAN), which
eliminates the preprocessing step of temporal segmentation. The framework consists of a two-stream CNN
for video feature representation, a latent space for bridging the semantic gap, and a hierarchical attention
network for latent-space-based recognition.
CHAPTER 2

MACHINE LEARNING

Machine Learning is an application of artificial intelligence that gives systems the ability to
automatically learn and improve from experience without being explicitly programmed. Machine
learning focuses on the development of computer programs that can access data and use it to learn
for themselves.

The process of learning begins with observations and data, such as examples, direct experience or
instruction, in order to look for patterns in the data and make better decisions in the future based on
the examples that we provide. The primary aim is to allow computers to learn automatically, without
human intervention or assistance, and to adjust their actions accordingly.

The machine learning life cycle is a cyclical three-phase process (pipeline development, training, and
inference) carried out by data scientists and data engineers to develop, train and serve models using
the huge amounts of data involved in various applications, so that organizations can take advantage of
artificial intelligence and machine learning algorithms to derive practical business value.


2.1 TYPES OF MACHINE LEARNING

Machine learning can be classified as:

SUPERVISED LEARNING:

It contains a target or outcome variable (dependent variable) which is to be predicted from a

given set of predictors (independent variables). Using these sets of variables, we can generate a

function that maps inputs to desired outputs.

Based on the type of target variable, supervised learning problems can further be divided into

two groups:

Regression: when the target variable is continuous in nature.

Classification: when the target variable is discrete in nature.


There are different algorithms for regression and classification problems.

UNSUPERVISED LEARNING:

In this learning we do not have any outcome variable or target to predict. It is mainly used for

tasks such as grouping documents, market segmentation, etc.

2.3 MODEL BUILDING CYCLE

Any machine learning model development can broadly be divided into the following steps:

1) PROBLEM DEFINITION involves defining and understanding the problem in a more

comprehensive way. We identify the purpose of the problem and the prediction target variable.

2) HYPOTHESIS GENERATION is the educated-guessing step through which we

identify essential data parameters that are likely to have a significant correlation with the
prediction target.

3) DATA COLLECTION is gathering the data from relevant sources regarding the

analytical problem, after the hypothesis generation.

4) DATA EXPLORATION AND TRANSFORMATION helps in analyzing the data and

converting it into the required form. It helps in detecting outliers and missing values.

There are several sub-steps involved in data exploration:

a. Reading the data: we read the raw data into the analysis system/software.

b. Variable identification: it is the process of identifying each variable as

 dependent or independent

 continuous or discrete

c. Univariate analysis: here we explore one variable at a time and summarize it.

d. Bivariate analysis: here we study the empirical relationship between two variables.

e. Missing value treatment: to identify and treat missing values.

f. Outlier treatment: to detect anomalies and correct them.

g. Variable transformation: a process by which we replace a variable with some function of

that variable.
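
These exploration steps can be illustrated with a minimal Python (pandas) sketch; the file name, columns and clipping thresholds below are hypothetical and only for illustration, not taken from the report:

import pandas as pd

# Hypothetical raw data file, used only for illustration
df = pd.read_csv("data.csv")

# Variable identification and univariate analysis: types and one-variable summaries
print(df.dtypes)
print(df.describe())

# Bivariate analysis: empirical relationship between pairs of numeric variables
print(df.corr(numeric_only=True))

# Missing value treatment: fill numeric gaps with each column's median
df = df.fillna(df.median(numeric_only=True))

# Outlier treatment: clip values outside the 1st-99th percentile range
for col in df.select_dtypes("number").columns:
    df[col] = df[col].clip(df[col].quantile(0.01), df[col].quantile(0.99))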

5) MODEL BUILDING is the process of creating a mathematical model for predicting future

outcomes based on past data.

First, a base or benchmark model is created. To build the dataset for the predictive model, we
divide the data into two groups:

TRAINING DATA
TESTING DATA

TRAINING DATA: The observations in the training set form the experience that the

algorithm uses to learn. In supervised learning problems, each observation consists of an

observed output variable and one or more observed input variables.

This is the part of the data we use to train our model. It is the data which the model actually

sees (both input and output) and learns from.

TESTING DATA: The test set is a set of observations used to evaluate the performance

of the model using some performance metric. It is important that no observations from the

training set are included in the test set. If the test set does contain examples from the

training set, it will be difficult to assess whether the algorithm has learned to generalize

from the training set or has simply memorized it.

Once our model is completely trained, the testing data provides an unbiased evaluation. When we

feed in the inputs of the testing data, our model predicts values without seeing the actual

outputs. After prediction, we evaluate our model by comparing its predictions with the actual outputs

present in the testing data. This is how we evaluate how much our model has learned from the
experience fed in as training data at the time of training.
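
A brief sketch of this split in Python, assuming scikit-learn and a small made-up dataset (the data, split ratio and model are illustrative assumptions, not the project's code):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset: 200 observations, 4 input features, one binary target
X = np.random.rand(200, 4)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Keep 80% of observations for training and hold out 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)    # the model sees only the training data
print("Test accuracy:", model.score(X_test, y_test))  # unbiased evaluation on unseen data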
CHAPTER 3

NEURAL NETWORKS

A neural network is, simply put, a series of algorithms that is extremely good at recognizing underlying

relationships (correlations) in a set of data through a process that mimics the way the human brain

operates.

As humans, we have the exceptional ability to notice patterns in our everyday lives. Think of every time

you solved a puzzle, or when you instantly recognized a song within a few seconds of it playing, or when

you look anywhere and immediately recognize the thing that you are looking at. Or even when you speak.

How were you able to achieve these extraordinary things without even having to think about it? This is

thanks to our powerful brain, which gives us the ability to recognize patterns and notice correlations and

has been the entire inspiration for the research behind Deep learning, with the hopes that we can create

even more powerful machines by trying to replicate and even improve what humans are already able to

do. 

Neural networks have endless applications in today’s world. From solving many business problems such

as sales forecasting, customer research, data validation, and risk management, to image and voice

recognition in the world of medicine, to self-driving cars, the applications are truly endless.
WORKING

An ANN is a model that solves a super complex math problem using a super complex math function. We

give it a problem with a bunch of data describing it (the input layer), and it is able to find out the optimal

solutions (the output layer, it is what you want to predict) by computing a complex function.

The lines connecting the nodes/neurons represented as circles symbolize the connections that happen

between neurons, and are what allows the model to become more accurate over time (by updating

the weights of the connections).

The input layer: What the machine always knows. Ex: The banking behavior of a customer.

The hidden layer: Where the magic happens.

The output layer: What the machine will predict Ex: Whether or not the customer will quit within the
next 6 months.

Node/Neuron: A thing that holds a number. Represented by a circle in the image.

Gradient descent: The algorithm that allows the model to become more and more accurate by

updating the weights of the connections.

Weights: These are the things that get updated by the model to become more accurate after every

iteration. They are represented by the connections formed between each neuron. Each connection has a

different weight.

Import the training set, which serves as the input layer.


Forward-propagate the data from the input layer through the hidden layer to the output layer, where

we get a predicted value y. Forward propagation is the process by which we multiply the input nodes by

(initially random) weights and apply the activation function.

Measure the error between the predicted value and the real value.

Backpropagate the error and use gradient descent to modify the weights of the connections.

Repeat these steps until the error is minimized sufficiently, by finding the optimal weights.
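
As an illustration of these steps (not the project's code), here is a minimal sketch that trains a single sigmoid neuron on a made-up dataset with gradient descent:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # input layer: 100 samples, 3 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy target values

w = rng.normal(size=3)                      # random initial weights
b = 0.0                                     # bias
lr = 0.1                                    # learning rate for gradient descent

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(1000):
    y_pred = sigmoid(X @ w + b)             # forward propagation: weighted sum + activation
    error = y_pred - y                      # measure the error against the real values
    grad = error * y_pred * (1 - y_pred)    # backpropagate the error through the activation
    w -= lr * (X.T @ grad) / len(y)         # gradient descent: update the weights
    b -= lr * grad.mean()                   # ... and the bias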

CONVOLUTIONAL NEURAL NETWORKS

Image classification is the task of taking an input image and outputting a class, or a probability over

classes, that best describes the image. In a CNN, we take an image as input, assign importance to its

various aspects/features, and differentiate one image from another. The pre-processing required in a

CNN is much less than in other classification algorithms.

Unlike regular neural networks, in the layers of a CNN the neurons are arranged in 3 dimensions: width,

height, and depth. The neurons in a layer are connected only to a small region (window) of the layer

before it, instead of to all of the neurons in a fully-connected manner. Moreover, the final output layer

has dimensions equal to the number of classes, because by the end of the CNN architecture we reduce

the full image into a single vector of class scores.


A CNN typically has three layers: a convolutional layer, pooling layer, and fully connected layer.

1. Convolution Layer: The main objective of convolution is to extract features such as edges, colours and
corners from the input. As we go deeper into the network, it starts identifying more complex
features such as shapes, digits and face parts as well.
In the convolution layer we take a small window (typically of size 5*5) that extends through the depth of
the input matrix. The layer consists of learnable filters of this window size. During every iteration we slide
the window by the stride size and compute the dot product of the filter entries and the input values at the
given position. As we continue this process we create a 2-dimensional activation map that gives the
response of that filter at every spatial position. That is, the network learns filters that activate when they
see some type of visual feature such as an edge of some orientation or a blotch of some colour.
At the end of the convolution process, we have a feature matrix which has fewer
parameters (dimensions) than the actual image as well as clearer features. So, we
work with this feature matrix from now on.
2. Pooling Layer: We use the pooling layer to decrease the size of the activation matrix and ultimately
reduce the number of learnable parameters. Its sole purpose is to decrease the computational power
required to process the data, which is done by decreasing the dimensions of the feature matrix even
further. In this layer, we try to extract the dominant features from a restricted neighbourhood.
There are two types of pooling:

a) Max Pooling: In max pooling we take a window (for example of size 2*2) and keep only
the maximum of its 4 values. We slide this window and continue the process, so we finally get an
activation matrix half its original size.

b) Average Pooling: In average pooling we take the average of all values in a window.


So, after the pooling layer, we have a matrix containing the main features of the image with even
smaller dimensions, which helps a lot in the next step.

3. Fully Connected Layer: In the convolution layer neurons are connected only to a local region, while in a
fully connected layer we connect all of the inputs to every neuron.

4. Final Output Layer: After getting values from the fully connected layer, we connect them to a final layer
of neurons (with a count equal to the total number of classes), which predicts the probability of the image
belonging to each class.
1. Provide the input image to the convolution layer.

2. Take the convolution with the chosen kernels/filters.

3. Apply the pooling layer to reduce the dimensions.

4. Repeat these layers multiple times.

5. Flatten the output and feed it into a fully connected layer.

6. Now train the model with backpropagation using logistic regression.
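
A compact sketch of these steps using Keras; the layer sizes, input shape and number of classes are illustrative assumptions, not the exact network described later in this report:

from tensorflow.keras import layers, models

num_classes = 26                                      # e.g. one class per letter (assumption)
model = models.Sequential([
    layers.Input(shape=(50, 50, 1)),                  # step 1: input image
    layers.Conv2D(16, (3, 3), activation="relu"),     # step 2: convolution with filters
    layers.MaxPooling2D((2, 2)),                      # step 3: pooling reduces dimensions
    layers.Conv2D(32, (3, 3), activation="relu"),     # step 4: repeat conv + pool
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                 # step 5: flatten the feature maps
    layers.Dense(128, activation="relu"),             # fully connected layer
    layers.Dense(num_classes, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",                       # step 6: train with backpropagation
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])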


CHAPTER 4

PROPOSED PROJECT

BLOCK DIAGRAM OF SIGN LANGUAGE DETECTION

A sign language recognition (SLR) system takes an input expression from the hearing-

impaired person and gives output to the hearing person in the form of text or voice.

Our project goal is to take a simple step towards building a social and communication

bridge between hearing people and deaf people with the help of sign language.

The basic steps in sign language detection are:

Data acquisition

Data preprocessing

Feature extraction

Gesture classification

DATA ACQUISITION: Data about the hand gesture can be acquired in the following ways:

1. Use of sensory devices: Electromechanical devices provide exact hand configuration

and position. Different glove-based approaches can be used to extract information, but they are

expensive and not user friendly.

2. Vision-based approach: In vision-based methods, a computer camera is the input device for

observing the information of hands or fingers. Vision-based methods require only a camera,
thus realizing natural interaction between humans and computers without the use of any extra

devices. These systems tend to complement biological vision by describing artificial vision systems

that are implemented in software and/or hardware. The main challenge of vision-based hand

detection is coping with the large variability of the human hand's appearance due to the huge number of

hand movements, different skin-colour possibilities, and variations in viewpoint, scale, and the

speed of the camera capture.

DATA PREPROCESSING

As images are not captured in a controlled environment and have different resolutions and

sizes, preprocessing of the image is required. Preprocessing digitizes the image and extracts

useful information, called the region of interest (ROI), from it.

This phase contains three steps: image segmentation (skin masking), skin detection and edge

detection. From the raw image a skin mask is generated by converting the image to the HSV colour space.

Using the skin mask, the skin can be segmented. Finally, the Canny edge technique is used to detect

and recognize the presence of sharp discontinuities in the image, thus detecting the edges of the

image.
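
A rough sketch of this preprocessing in Python with OpenCV; the file name and HSV thresholds are assumptions for illustration, not the report's exact values:

import cv2

frame = cv2.imread("hand.jpg")                              # hypothetical captured image

# Skin masking: convert to HSV and keep pixels inside an assumed skin-colour range
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
skin_mask = cv2.inRange(hsv, (0, 40, 60), (25, 255, 255))   # example thresholds
roi = cv2.bitwise_and(frame, frame, mask=skin_mask)         # segmented skin region (ROI)

# Edge detection: Canny highlights sharp discontinuities in the segmented image
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)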
FEATURE EXTRACTION

Feature extraction is one of the most important steps in sign language recognition, because it outputs

the feature vector that the classifier uses as input. Feature extraction techniques used

to find objects and shapes must be reliable and robust, without depending on the orientation,

illumination level, position or size of the object in the image.

The features can be obtained using different techniques such as texture features or orientation histograms.

In some cases, Principal Component Analysis (PCA) is used to reduce dimensionality and obtain the

feature vector from the ROI.
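
As an example of the PCA step, here is a small sketch with scikit-learn on a made-up feature matrix (the dimensions and variance threshold are illustrative assumptions):

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical ROI features: 1,000 images flattened to 2,500 pixel values (50 x 50)
features = np.random.rand(1000, 2500)

pca = PCA(n_components=0.95)                  # keep enough components for 95% of the variance
feature_vectors = pca.fit_transform(features)
print(feature_vectors.shape)                  # far fewer columns than the raw 2,500 pixels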

CLASSIFICATION

Once the dataset is generated, the next step is classification. Before classification, it is

important to divide the data into training and testing sets.

Once the data is ready, the next step is to feed the training data to the machine learning model. During the

testing phase, the trained model identifies the class corresponding to each sign and gives output in text or
audio format.

Some commonly used classifiers are the Artificial Neural Network (ANN), K-Nearest Neighbour

(KNN) and the Support Vector Machine (SVM).

Artificial Neural Network (ANN)

An artificial neural network involves artificial neurons that show complex behaviour determined by the

connections between elements and their parameters. An ANN is used to infer a function from given inputs

and observations. One basic network belonging to unsupervised learning is the Kohonen Self-

Organizing Map. It has been used to classify sign language gestures of the alphabet.

The two most used supervised-learning networks are the Feed-Forward Back-Propagation Network (BPN)

and the Radial Basis Function Neural Network (RBFNN). RBFNN has been used for static gesture

recognition of sign language.

K-Nearest Neighbor (KNN)

The K-nearest neighbour (KNN) classifier classifies objects in the feature space using a supervised

learning algorithm; it is one of the most popular classification techniques.

An object is assigned to the class most common among its K nearest neighbours. KNN

is a simple algorithm that stores all available cases and classifies new cases based on a

similarity measure.

Support Vector Machine (SVM)


The SVM is a widely known supervised pattern recognition technique. The basic SVM takes

input data and predicts which of two possible classes the input belongs to.

Support Vector Machines are based on decision hyperplanes that define decision boundaries. A

decision plane separates two sets of objects having different class memberships. The Support Vector

Machine aims to maximize the margin around the decision boundary between the classes.
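
A short sketch of both classifiers with scikit-learn, using a made-up feature matrix and gesture labels (the sizes and hyperparameters are illustrative assumptions, not the project's settings):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical feature vectors (e.g. from PCA) and gesture class labels
X = np.random.rand(300, 64)
y = np.random.randint(0, 26, size=300)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)   # class = majority vote of 5 nearest neighbours
svm = SVC(kernel="rbf").fit(X, y)                     # class = side of the maximum-margin hyperplane

print(knn.predict(X[:1]), svm.predict(X[:1]))         # predicted class for one new sample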

After classification, we need to check the accuracy of our output.

TESTING

To verify the accuracy of letter/number gesture recognition, the number of correctly

recognized letters/numbers that appeared on the screen was added up and divided by the product of the

total number of users and the number of trials.

If the system takes more than 15 seconds to generate the equivalent letter/number, it is not included in the

total number of correctly recognized letters/numbers.
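
As a worked example with hypothetical numbers (not the project's results): if 5 users each perform 10 trials and 42 gestures are recognized correctly within 15 seconds, the accuracy is

correct = 42                          # hypothetical count of correct recognitions
users, trials = 5, 10
accuracy = correct / (users * trials) # 42 / 50 = 0.84, i.e. 84% recognition accuracy
print(accuracy)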


CHAPTER 5

METHODOLOGY

The objective of this project is to identify the symbolic expressions through images so that the
communication gap between a normal and hearing impaired person can be easily reduced.

a) To collect the dataset.

b) To segment the skin part from the image, as the remaining part can be regarded as noise w.r.t

the character classification problem.

c) To extract relevant features from the skin segmented images which can prove significant for

the next stage i.e learning and classification.

d) To use the extracted features as input into various supervised learning models for training

and then finally use the trained models for classification.

PREREQUISITES

TensorFlow: TensorFlow is an open-source software library for numerical computation.

First we define the nodes of the computation graph; then, inside a session, the

actual computation takes place. TensorFlow is widely used in machine learning.

Keras: Keras is a high-level neural networks library written in Python that works as a

wrapper to TensorFlow. It is used in cases where we want to quickly build and test a

neural network with minimal lines of code. It contains implementations of commonly used

neural network elements like layers, objectives, activation functions and optimizers, as well as tools

to make working with image and text data easier.

OpenCV: OpenCV (Open Source Computer Vision) is an open-source library of

programming functions used for real-time computer vision. It is mainly used for image

processing, video capture and analysis, with features like face and object recognition. It is
written in C++, which is its primary interface; however, bindings are available for Python,
Java and MATLAB/Octave.

Jupyter Notebook: The Jupyter Notebook is an open-source web application that allows you to
create and share documents that contain live code, equations, visualizations and narrative text.
Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data
visualization, machine learning, and much more.

The system will be implemented on a desktop with a 1080p Full-HD web camera. The camera will
capture images of the hands that will be fed into the system. Note that the signer must adjust to the size of
the frame so that the system can capture the orientation of the signer's hand. When the camera
has captured the gesture from the user, the system classifies the test sample, compares it with the
stored gestures in a dictionary, and displays the corresponding output on the screen for the user.

A. Data collection

Datasets for static SLR were gathered through continuous capture of images using
Python. Images were automatically cropped and converted to 50 × 50 pixel black-and-white samples. Each
class contained 1,200 images, which were then flipped horizontally to account for left-handed signers.
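
A rough sketch of such a capture loop in Python with OpenCV; the crop coordinates, output folder and class label below are hypothetical, added only for illustration:

import cv2

cap = cv2.VideoCapture(0)                       # web camera as the input device
count = 0
while count < 1200:                             # 1,200 samples per class, as in the report
    ok, frame = cap.read()
    if not ok:
        break
    roi = frame[100:400, 100:400]               # assumed fixed region where the hand is shown
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    sample = cv2.resize(gray, (50, 50))         # 50 x 50 black-and-white sample
    cv2.imwrite(f"dataset/A/{count}.png", sample)
    cv2.imwrite(f"dataset/A/{count}_flipped.png", cv2.flip(sample, 1))  # horizontal flip for left-handed signers
    count += 1
cap.release()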

B. Hand Skin Color Detection using Image Processing

For improved skin colour recognition, the signer was advised to have a clear background behind the hands,
which makes it easier for the system to detect skin colours. Skin detection was performed using
cv2.cvtColor: images were converted from RGB to HSV. Through the cv2.inRange function, the HSV frame
was supplied together with the lower and upper skin-colour ranges as arguments, and the mask was its
output. White pixels in the resulting mask were considered the region of the frame weighed as skin, while
black pixels were disregarded. The cv2.erode and cv2.dilate functions remove small regions that may
represent false-positive skin detections; two iterations of erosion and dilation were performed using
a small kernel. Lastly, the resulting masks were smoothed using a Gaussian blur.
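
A sketch of this masking pipeline, extending the earlier preprocessing sketch; the HSV bounds and kernel size are illustrative assumptions rather than the report's exact values:

import cv2
import numpy as np

frame = cv2.imread("gesture.jpg")                     # hypothetical captured frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 40, 60), (25, 255, 255))  # assumed lower and upper skin-colour bounds

kernel = np.ones((3, 3), np.uint8)                    # small structuring element (assumed size)
mask = cv2.erode(mask, kernel, iterations=2)          # two erosions remove small false-positive regions
mask = cv2.dilate(mask, kernel, iterations=2)         # two dilations restore the surviving skin region
mask = cv2.GaussianBlur(mask, (5, 5), 0)              # smooth the resulting mask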

C. Network Layers

The goal of this study is to design a network that can effectively classify an image of a static sign language
gesture into its equivalent text using a CNN. To attain these results, we used Keras and a CNN architecture
containing a set of different layers for processing the training data. The first convolutional layer is composed
of 16 filters, each with a 2 × 2 kernel, followed by 2 × 2 max pooling that reduces the spatial dimensions
to 32 × 32. The number of filters is then increased from 16 to 32, while the max-pooling window is increased
to 5 × 5. Next, the number of filters in the CNN layers is increased to 64, with max pooling still at 5 × 5. A
Dropout(0.2) layer randomly disconnects nodes between the current layer and the next. The model is then
flattened, i.e. converted into a vector, and a dense layer is added; the fully connected layer is specified by the
dense layer together with rectified linear activation. We finished the model with a softmax classifier that gives
the predicted probabilities for each class label.
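
A sketch of this layer stack in Keras, assuming 50 × 50 single-channel inputs, "same" padding on the convolutions, and 26 output classes (all assumptions the report does not state explicitly):

from tensorflow.keras import layers, models

num_classes = 26                                                  # assumed number of gesture classes
model = models.Sequential([
    layers.Input(shape=(50, 50, 1)),
    layers.Conv2D(16, (2, 2), padding="same", activation="relu"), # 16 filters, 2x2 kernel
    layers.MaxPooling2D((2, 2)),                                  # 2x2 max pooling
    layers.Conv2D(32, (2, 2), padding="same", activation="relu"), # filters increased to 32
    layers.MaxPooling2D((5, 5)),                                  # pooling window increased to 5x5
    layers.Conv2D(64, (2, 2), padding="same", activation="relu"), # filters increased to 64
    layers.MaxPooling2D((5, 5)),                                  # pooling still 5x5
    layers.Dropout(0.2),                                          # randomly disconnect nodes
    layers.Flatten(),                                             # convert feature maps into a vector
    layers.Dense(128, activation="relu"),                         # dense layer with ReLU activation
    layers.Dense(num_classes, activation="softmax"),              # softmax gives per-class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])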

D. Training the System

The training for character and SSL recognition was done separately; each dataset was divided into two
parts, training and testing, to evaluate the performance of the algorithm used. The network was
implemented and trained with Keras, using TensorFlow as its backend, on a GT-1030 Graphics
Processing Unit (GPU).
CHAPTER 6
WORK DONE TILL NOW
PUBLICATIONS
PLAGIARISM REPORT
