
REALISTIC HANDWRITING GENERATION

USING RNNs.
Project Report Submitted

In Partial Fulfillment of the Requirements

For the Degree Of

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE AND ENGINEERING

Submitted by

Mohammed Umar 1604-17-733-079


Mohammed Ehtesham Uddin Qureshi 1604-17-733-102
Mohammed Abdul Wase 1604-17-733-112

COMPUTER SCIENCE AND ENGINEERING DEPARTMENT


MUFFAKHAM JAH COLLEGE OF ENGINEERING & TECHNOLOGY
(Affiliated to Osmania University)
Mount Pleasant, 8-2-24, Road No. 3, Banjara Hills, Hyderabad-34

2021
1
Date: 01/06/2021

CERTIFICATE
This is to certify that the project dissertation titled “Realistic Handwriting
Generation Using RNNs” being submitted by

1. MOHAMMED UMAR (1604-17-733-079)

2. MOHAMMED EHTESHAM UDDIN QURESHI (1604-17-733-102)

3. MOHAMMED ABDUL WASE (1604-17-733-112)

in Partial Fulfillment of the requirements for the award of the degree of


BACHELOR OF ENGINEERING IN COMPUTER SCIENCE AND
ENGINEERING in MUFFAKHAM JAH COLLEGE OF ENGINEERING AND
TECHNOLOGY, Hyderabad for the academic year 2020-2021 is the bonafide work
carried out by them. The results embodied in the report have not been submitted to
any other University or Institute for the award of any degree or diploma.

Signatures:

Internal Project Guide Head CSED


Mr. RAJESHAM GAJULA Dr. A.A. MOIZ QYSER
(Assistant Professor)

External Examiner
2
DECLARATION

This is to certify that work reported in the major project entitled


“Realistic Handwriting Generation Using RNNs.”
is a record of the bonafide work done by us in the Department of
Computer Science and Engineering, Muffakham Jah College of
Engineering and Technology, Osmania University. The results
embodied in this report are based on the project work done entirely by
us and not copied from any other source.

1. MOHAMMED UMAR (1604-17-733-079)

2. MOHAMMED EHTESHAM UDDIN QURESHI (1604-17-733-102)

3. MOHAMMED ABDUL WASE (1604-17-733-112)

3
ACKNOWLEDGEMENT

Our hearts are filled with gratitude to the Almighty for empowering us with
courage, wisdom and strength to complete this project successfully. We give him
all the glory, honor and praise.

We thank our Parents for having sacrificed a lot in their lives to impart the
best education to us and make us promising professionals for tomorrow.

We would like to express our sincere gratitude and indebtedness to our project
guide Mr. Rajesham Gajula, Assistant Professor CSED, for his valuable
suggestions and guidance throughout the course of this project.

We are happy to express our profound sense of gratitude and indebtedness to
Prof. Dr. Ahmed Abdul Moiz Qyser, Head of the Computer Science and
Engineering Department, for his valuable and intellectual suggestions, adequate
guidance and constant encouragement throughout our work, all of which helped
make it a success.

With a great sense of pleasure and privilege, we extend our gratitude to Prof.
Dr. Umar Farooq, Associate Professor and Program Coordinator (CSED) and
project in-charge, who offered valuable suggestions and support at every step
of this work.

We are pleased to acknowledge our indebtedness to all those who devoted


themselves directly or indirectly to make this project work a total success.

MOHAMMED UMAR

MOHAMMED EHTESHAM UDDIN QURESHI

MOHAMMED ABDUL WASE

4
ABSTRACT

In this evolving tech world, automation has become the go-to approach in
everyone's life. In keeping with this trend, we introduce a system that further
reduces human effort.

Finding a pattern in a seemingly random event is about how we process the
given information and convert it into something useful. In this project, we try
to connect the dots and recreate the handwriting of a given person.

Handwriting is one of the most important biometric traits and a vital indicator
of people's characteristics such as their personality.

In this project, we demonstrate how Long Short-term Memory


recurrent neural networks can be used to generate complex sequences
with long-range structure, simply by predicting one data point at a time.

The approach is demonstrated for text (where the data are discrete)
and online handwriting (where the data are real-valued). It is then
extended to handwriting synthesis by allowing the network to condition
its predictions on a text sequence. The resulting system is able to generate
highly realistic cursive handwriting in a wide variety of styles.

Keywords: Handwriting, Recurrent Neural Networks, Long Short-Term
Memory, Predictions, Text Sequence, Handwriting Synthesis.

5
CONTENTS

TITLE
CERTIFICATE 2
DECLARATION 3
ACKNOWLEDGEMENT 4
ABSTRACT 5
LIST OF FIGURES 8
LIST OF TABLES 9

1. INTRODUCTION 10
1.1 Problem Statement 11
1.2 Objectives 11

2. LITERATURE SURVEY 12
2.1 Related Work 13

3. PROPOSED SYSTEM 15
3.1 System Requirements 16
3.2 System Architecture 17
3.3 Methodology 17
3.3.1 Recurrent Neural Networks (RNNs) 18
3.3.2 Long-short term Memory 18
3.3.3 Text Prediction 19
3.3.4 Handwriting Prediction 20
3.4 Handwriting Synthesis 21
3.5 Flow Diagram Overview 23

6
4. ALGORITHMS
4.1 Recurrent Neural Networks (RNNs) 24
4.1.1 Definition 24
4.1.2 Functionality 24
4.1.3 Architecture 26
4.1.4 Prediction Network 27
4.2 Long-short Term Memory 29
4.2.1 Definition 29
4.2.2 Functionality 30
4.3 Mixture Density Output 35
4.3.1 Definition 35
4.3.2 Functionality 36
4.3.3 Handwriting Prediction 41
4.3.4 Handwriting Synthesis 44
4.3.5 Synthesis Network 45
4.3.6 Alignment 47

4.4 Experiments 51
4.4.1 Unbiased Sampling 53
4.4.2 Biased Sampling 54

5. MODULES 56
5.1 Pre-Processing Module 56
5.2 Training Module 57
5.3 Generation Module 58

6. EXECUTION and OUTPUT 59


6.1 Pre-Processing Module 59
6.2 Training Module 60
6.3 Generating Module 60
7. FUTURE SCOPE 62
8. CONCLUSION 63
9. REFERENCES 64

LIST OF FIGURES

3.2 System Architecture 17


3.3.4 Training Samples 20
3.4 Online Handwriting Samples 22
3.5 Flow Diagram 23
4.1.2 RNN Architecture 26
4.2 LSTM Basic Architecture 29
4.2 Standard RNN 30
4.2 LSTM Module 31
4.2 LSTM Model 33
4.2.2 LSTM Cell 34
4.3.2 Mixture Density Outputs for Handwriting Prediction 40

4.3.3 Online Handwriting Samples Generated by Prediction Network 43
4.3.4 Synthesis Network 45
4.3.6 Window Weights 47
4.3.6 Soft Windows 48
4.3.6 Mixture Density Outputs for Handwriting Synthesis 50
4.4.1 Unbiased Sampling 53

4.4.2 Biased Sampling 55
5.1 Preprocessing Module 56
5.2 Training Module 57
5.3 Generate Module 58
6.1 Command for Execution 59
6.2 After Execution of Module 59
6.3 Training the Module 60
6.4 Running the Generate Script 61
6.5 Hello World Output 61

LIST OF TABLES
3.1 System Requirements 16
4.3.3 Handwriting Prediction Results 42
4.4 Handwriting Synthesis Results 52

9
1. INTRODUCTION

Handwriting is one of the most important biometric traits and a vital indicator of people's
characteristics such as their personality. In this evolving tech world, automation has become
the go-to approach in everyone's life, and in keeping with this trend we introduce a system
that further reduces human effort. With respect to the above scenario, a system is proposed
that will generate custom handwriting fonts; this is made possible by using Recurrent
Neural Networks (RNNs).

Recurrent neural networks (RNNs) are a rich class of dynamic models that have
been used to generate sequences in domains as diverse as music, text and motion capture
data. RNNs can be trained for sequence generation by processing real data sequences one
step at a time and predicting what comes next.

RNNs are ‘fuzzy’ in the sense that they do not use exact templates from the training
data to make predictions, but rather—like other neural networks— use their internal
representation to perform a high-dimensional interpolation between training examples. In
principle a large enough RNN should be sufficient to generate sequences of arbitrary
complexity. In practice, however, standard RNNs are unable to store information about
past inputs for very long. As well as diminishing their ability to model long-range
structure, this ‘amnesia’ makes them prone to instability when generating sequences.
Having a longer memory has a stabilizing effect, because even if the network cannot
make sense of its recent history, it can look further back in the past to formulate its
predictions. The problem of instability is especially acute with real-valued data, where it
is easy for the predictions to stray from the manifold on which the training data lies.

We believe that a better memory is a more profound and effective solution. Long
Short-term Memory (LSTM) is an RNN architecture designed to be better at storing and
accessing information than standard RNNs. The resulting system is able to generate
highly realistic cursive handwriting in a wide variety of styles.

10
1.1 Problem Statement
It is very important in today's world to make extensive use of automation, so that
people can concentrate on more complicated tasks and the world becomes a better
place.
Handwriting recognition and generation is one task that requires automation in its
entirety. Several indispensable departments require this automation, including
criminal, forensics and psychological departments. Computers are capable of
converting handwritten text into different text styles; these instances are commonly
seen in electronic devices such as mobiles and computers, and the technique is
available globally. However, the existing method can only render the handwritten
text in predefined font styles; it cannot generate a person's handwriting for a given
text.

1.2 Objectives

1. The main objective is to create a stable system that can generate custom
handwriting fonts.

2. To be used for drafting official documents


3. To implement the ideas of neural networks.
4. To provide a new age platform for use in forensics or in psychological
departments where handwriting can be used as a tool to understand an
individual’s traits.
5. To gain a better insight into the internal representation of the data.

11
2. LITERATURE SURVEY
Handwriting synthesis is the generation of handwriting for a given text. Clearly the
prediction networks we have described so far are unable to do this, since there is no way
to constrain which letters the network writes.
This section describes an augmentation that allows a prediction network to generate
data sequences conditioned on some high-level annotation sequence (a character string, in
the case of handwriting synthesis). The resulting sequences are sufficiently convincing
that they often cannot be distinguished from real handwriting. Furthermore, this realism is
achieved without sacrificing the diversity in writing style demonstrated in the previous
section.
The main challenge in conditioning the predictions on the text is that the two
sequences are of very different lengths (the pen trace being on average twenty-five times
as long as the text), and the alignment between them is unknown until the data is generated.
This is because the number of co-ordinates used to write each character varies greatly
according to style, size, pen speed etc. One neural network model able to make sequential
predictions based on two sequences of different length and unknown alignment is the RNN
transducer. However, preliminary experiments on handwriting synthesis with RNN
transducers were not encouraging.

A possible explanation is that the transducer uses two separate RNNs to process the
two sequences, then combines their outputs to make decisions, when it is usually more
desirable to make all the information available to a single network. This work proposes an
alternative model, where a ‘soft window’ is convolved with the text string and fed in as an
extra input to the prediction network.

12
2.1 Related Work
1. Synthesizing and Imitating Handwriting using Deep Recurrent Neural
Networks and Mixture Density Networks
K. Manoj Kumar, Harish Kandala, Dr. N. Sudhakar Reddy
Deep recurrent neural networks specifically Deep LSTM cells can be used along
with a Mixture Density Network to generate artificial handwriting data. But using
this model we can only generate random handwriting styles which are being
hallucinated by the model; mimicking a specific handwriting style is not very effective
with it. Imitating a specific handwriting style has an extensive variety of applications,
such as generating personalized handwritten documents, editing a handwritten
document in the same handwriting style, and comparing handwriting styles to identify
forgery. A web prototype is developed along with the model to test the results: the
user enters the text input and selects the handwriting style to be used, and the
application returns a handwritten document containing the input text mimicking the
selected handwriting style.

2. Neural Networks for Handwritten English Alphabet Recognition


Yusuf Perwej and Ashish Chaturvedi proposed a system based on neural-network
pattern recognition for handwritten character recognition. Here, the goal of a character
recognition system is to transform a handwritten text document on paper into a
digital format that can be manipulated by word-processor software. The system is
required to identify a given input character form by mapping it to a single character
in a given character set. Each hand written character is split into a number of
segments (depending on the complexity of the alphabet involved) and each segment
is handled by a set of purpose-built neural networks. The final output is unified via a
lookup table. Neural network architecture is designed for different values of the
network parameters like the number of layers, number of neurons in each layer, the
initial values of weights, the training coefficient and the tolerance of the correctness.
The optimal selection of these network parameters certainly depends on the
complexity of the alphabet.

3. LEARNING ALGORITHMS FOR CLASSIFICATION:


A COMPARISON ON HANDWRITTEN DIGIT RECOGNITION
Yann LeCun, L. D. Jackel, Leon Bottou, Corinna Cortes, John S. Denker, Harris
Drucker, Isabelle Guyon, Urs A. Muller, Eduard Sackinger, Patrice Simard, and
Vladimir Vapnik.
This paper compares the performance of several classifier algorithms on a standard
database of handwritten digits. We consider not only raw accuracy, but also training
time, recognition time, and memory requirements. When available, we report
measurements of the fraction of patterns that must be rejected so that the remaining
patterns have misclassification rates less than a given threshold.

14
3. PROPOSED SYSTEM
Handwriting is a skill developed by humans from a very early stage in order
to represent their thoughts visually using letters and to form meaningful words
and sentences. Every person improves this skill through practice and develops
his or her own style of writing.
The proposed system helps us to create a stable model that can generate
custom handwriting fonts accurately attributable to an individual. Such a system
can be well used in the forensics department, for drafting official documents, or
in psychological departments where handwriting can be used as a tool to
understand an individual's traits.

For our project we took online handwriting data to train our model. Online
handwriting is an attractive choice for sequence generation due to its low
dimensionality and ease of visualization. The data used for this system were
taken from the IAM online handwriting database (IAM-OnDB).
IAM-OnDB consists of handwritten lines collected from 221 different
writers using a smart whiteboard. The writers were asked to write forms from
the Lancaster-Oslo-Bergen text corpus, and the position of their pen was tracked
using an infra-red device in the corner of the board. IAM-OnDB is divided into
a training set, two validation sets and a test set, containing handwritten lines
taken from several inputs.
For our project, each line was treated as a separate sequence (meaning that
possible dependencies between successive lines were ignored). In order to
maximise the amount of training data, we used the training set, test set and the
larger of the validation sets for training and the smaller validation set for early-
stopping. The main goal was to generate convincing-looking handwriting.

15
As we are dealing with temporal data, it is natural to use recurrent neural
networks. However, a traditional recurrent cell is not very effective at storing
long dependencies. For this purpose, Long Short-Term Memory cells are used
in place of traditional RNN cells. They maintain cell states and have longer
memory by making use of various internal gates. With the help of LSTM cells
we can capture long-term dependencies, but one LSTM cell may not be effective
enough at abstracting the details of a handwriting stroke. In order to give the
network a deeper understanding of handwriting strokes, multiple LSTM cells
are stacked on top of each other to create a deep recurrent neural network. In
our case, three LSTM layers are used. With this architecture, we can produce
any handwritten text in some random style.

3.1 System Requirements

S.No   Hardware                  Software

1.     3 GHz Intel/AMD CPU       Python 3.5.0

2.     NVIDIA GeForce 1650       Visual Studio Code / PyCharm / Jupyter Notebook

3.     20 GB HDD space           TensorFlow v1.6.0

4.     8 GB RAM                  Windows OS

Table 3.1: System Requirements

16
3.2 System Architecture

We synthesize and generate handwriting using an LSTM model and recurrent neural
networks. In addition, we introduce two methods to make the model better at
imitating any handwriting style, provided enough data and time to retrain. The
generated strokes are then fed to the Python libraries, through which the handwriting
is rendered.

Figure 3.2: System Architecture

3.3 Methodology
Compared to other algorithms used in previous systems, the proposed
algorithm is proficient enough to handle large handwriting variations, which
tend to disrupt the efficiency of pre-existing algorithms. To address this, RNNs
and LSTMs are used.

We need to generate sequences:

• To improve classification

• To create synthetic training data

• To simulate situations

• To understand the data

The role of Memory(LSTM):

Having a longer memory has several advantages:

• Need to remember the past to predict the future

• Can store and generate longer range patterns

• Especially ‘disconnected’ patterns like balanced quotes and brackets

• More robust to ‘mistakes’

3.3.1 Recurrent Neural Networks(RNNs)


● The RNN is essentially a multi-layer perceptron with recurrent connections
added to the hidden layer.
● A perceptron is an algorithm for supervised learning of binary classifiers.
This algorithm enables neurons to learn and processes elements in the training
set one at a time.
● In neural networks, a hidden layer is located between the input and output of
the algorithm; it applies weights to the inputs and directs them through an
activation function as the output. In short, the hidden layers perform nonlinear
transformations of the inputs entered into the network.
● That is how the hidden layer receives its input.
● In the architecture we can clearly see that the input is given not only to the
bottom hidden layer but to all the hidden layers simultaneously.

3.3.2 Long Short-Term Memory


● The LSTM basically uses multiplicative gates for various operations.
● These three gate units act like differentiable versions of read, write and reset
operations.

● The cell is the place where the memory is stored in the network.

● This allows the LSTM to store information for longer times, unlike a standard RNN.
● The output gate allows the cell to be read, so it has to be open for reading to happen.

● As long as the forget gate is open, information remains in the cell.

3.3.3 Text Prediction


Text data is discrete, and is typically presented to neural networks using ‘one-hot’
input vectors. That is, if there are K text classes in total, and class k is fed in at time
t, then xt is a length K vector whose entries are all zero except for the kth, which is
one. Pr(xt+1 | yt) is therefore a multinomial distribution, which can be naturally
parameterised by a softmax function at the output layer:
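\Pr(x_{t+1} = k \mid y_t) \;=\; y_t^k \;=\; \frac{\exp(\hat{y}_t^k)}{\sum_{k'=1}^{K} \exp(\hat{y}_t^{k'})}

(written here in LaTeX notation, with \hat{y}_t denoting the raw network output at time t).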

The only thing that remains to be decided is which set of classes to use. In most
cases, text prediction (usually referred to as language modelling) is performed
at the word level, so K is the number of words in the dictionary. This
can be problematic for realistic tasks, where the number of words (including
variant conjugations, proper names, etc.) often exceeds 100,000. As well as
requiring many parameters to model, having so many classes demands a huge
amount of training data to adequately cover the possible contexts for the words.

19
3.3.4 Handwriting Prediction

To test whether the prediction network could also be used to generate convincing
real-valued sequences, we applied it to online handwriting data (online in this
context means that the writing is recorded as a sequence of pen-tip locations, as
opposed to offline handwriting, where only the page images are available). Online
handwriting is an attractive choice for sequence generation due to its low
dimensionality (two real numbers per data point) and ease of visualization.

IAM-OnDB consists of handwritten lines collected from 221 different writers


using a smart whiteboard. The writers were asked to write forms from the
Lancaster-Oslo-Bergen text corpus, and the position of their pen was tracked using
an infra-red device in the corner of the board. IAM-OnDB is divided into a training
set, two validation sets and a test set, containing handwritten lines taken from
several inputs.

20
3.4 Handwriting Synthesis
Handwriting synthesis is the generation of handwriting for a given text. Clearly
the prediction networks we have described so far are unable to do this, since
there is no way to constrain which letters the network writes. This section
describes an augmentation that allows a prediction network to generate data
sequences conditioned on some high-level annotation sequence (a character
string, in the case of handwriting synthesis). The resulting sequences are
sufficiently convincing that they often cannot be distinguished from real
handwriting. Furthermore, this realism is achieved without sacrificing the
diversity in writing style demonstrated in the previous section.

The main challenge in conditioning the predictions on the text is that the two
sequences are of very different lengths (the pen trace being on average twenty-five
times as long as the text), and the alignment between them is unknown until the data
is generated. This is because the number of co-ordinates used to write each character
varies greatly according to style, size, pen speed etc. One neural network model able
to make sequential predictions based on two sequences of different length and
unknown alignment is the RNN transducer. However, preliminary experiments on
handwriting synthesis with RNN transducers were not encouraging.

A possible explanation is that the transducer uses two separate RNNs to process the
two sequences, then combines their outputs to make decisions, when it is usually more
desirable to make all the information available to a single network. This work proposes an
alternative model, where a ‘soft window’ is convolved with the text string and fed in as an
extra input to the prediction network.

21
Figure 3.4: Online handwriting samples generated by the prediction network.
All samples are 700 timesteps long.

The window parameters are determined by the network at the same time as it
makes the predictions, so that it dynamically determines an alignment between
the text and the pen locations. Put simply, it learns to decide which character
to write next.

22
3.5 Flow Diagram Overview

Figure 3.5: Flowchart of System Architecture

23
4. ALGORITHMS

Two different algorithms have been used to develop the system; they serve
different purposes according to the system's needs.

The algorithms used are:

1. Recurrent Neural Networks (RNNs)

2. Long Short-Term Memory (LSTM)

4.1 Recurrent Neural Networks(RNNs)


4.1.1 Definition
Recurrent neural networks (RNNs) are a rich class of dynamic models that have
been used to generate sequences in domains as diverse as music, text and motion
capture data. RNNs can be trained for sequence generation by processing real data
sequences one step at a time and predicting what comes next.

4.1.2 Functionality
RNNs can be trained for sequence generation by processing real data sequences one
step at a time and predicting what comes next. Assuming the predictions are
probabilistic, novel sequences can be generated from a trained network by
iteratively sampling from the network's output distribution, then feeding in the
sample as input at the next step; in other words, by making the network treat its
inventions as if they were real, much like a person dreaming. Although the network
itself is deterministic, the stochasticity injected by picking samples induces a
distribution over sequences. This distribution is conditional, since the internal state
of the network, and hence its predictive distribution, depends on the previous inputs.

24
RNNs are ‘fuzzy’ in the sense that they do not use exact templates from the training
data to make predictions, but rather—like other neural networks— use their internal
representation to perform a high-dimensional interpolation between training examples.
This distinguishes them from n-gram models and compression algorithms such as
Prediction by Partial Matching, whose predictive distributions are determined by counting
exact matches between the recent history and the training set. The result which is
immediately apparent from the samples in this paper is that RNNs (unlike template-based
algorithms) synthesize and reconstitute the training data in a complex way, and rarely
generate the same thing twice. Furthermore, fuzzy predictions do not suffer from the curse
of dimensionality, and are therefore much better at modelling real-valued or multivariate
data than exact matches.

In principle a large enough RNN should be sufficient to generate sequences of


arbitrary complexity. In practice, however, standard RNNs are unable to store information
about past inputs for very long. As well as diminishing their ability to model long-
range structure, this ‘amnesia’ makes them prone to instability when generating sequences.

The problem (common to all conditional generative models) is that if the network's
predictions are only based on the last few inputs, and these inputs were themselves
predicted by the network, it has little opportunity to recover from past mistakes. Having a
longer memory has a stabilizing effect, because even if the network cannot make sense of
its recent history, it can look further back in the past to formulate its predictions. The
problem of instability is especially acute with real-valued data, where it is easy for the
predictions to stray from the manifold on which the training data lies.

25
4.1.3 Architecture

Figure 4.1.3: RNN Architecture


Architecture Explanation:
● The RNN is essentially a multi-layer perceptron with recurrent connections added
to the hidden layer.

● A perceptron is an algorithm for supervised learning of binary classifiers. This
algorithm enables neurons to learn and processes elements in the training set one at
a time.

● In neural networks, a hidden layer is located between the input and output of the
algorithm; it applies weights to the inputs and directs them through an activation
function as the output. In short, the hidden layers perform nonlinear transformations
of the inputs entered into the network.

● That is how the hidden layer receives its input.

● In the architecture we can clearly see that the input is given not only to the bottom
hidden layer but to all the hidden layers simultaneously.

26
4.1.4 Prediction Network
Fig. 4.1.3 illustrates the basic recurrent neural network prediction
architecture used in this paper. An input vector sequence x = (x1, ..., xT) is
passed through weighted connections to a stack of N recurrently connected
hidden layers to compute first the hidden vector sequences h^n = (h^n_1, ..., h^n_T) and
then the output vector sequence y = (y1, ..., yT). Each output vector yt is used to
parameterize a predictive distribution Pr(xt+1 | yt) over the possible next inputs
xt+1. The first element x1 of every input sequence is always a null vector whose
entries are all zero; the network therefore emits a prediction for x2, the first real
input, with no prior information. The network is `deep' in both space and time,
in the sense that every piece of information passing either vertically or
horizontally through the computation graph will be acted on by multiple
successive weight matrices and nonlinearities. Note also the ‘skip connections’ from the
inputs to all hidden layers, and from all hidden layers to the outputs. These make
it easier to train deep networks, by reducing the number of processing steps
between the bottom of the network and the top, and thereby mitigating the
`vanishing gradient' problem [1]. In the special case that N = 1 the architecture
reduces to an ordinary, single layer next step prediction RNN. The hidden layer
activations are computed by iterating the following equations from t = 1 to T and
from n = 2 to N:
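h_t^1 = \mathcal{H}\big(W_{i h^1} x_t + W_{h^1 h^1} h_{t-1}^1 + b_h^1\big)

h_t^n = \mathcal{H}\big(W_{i h^n} x_t + W_{h^{n-1} h^n} h_t^{n-1} + W_{h^n h^n} h_{t-1}^n + b_h^n\big), \qquad 2 \le n \le N

(written in the weight-matrix notation defined just below).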

where the W terms denote weight matrices (e.g. W_{i h^n} is the weight matrix
connecting the inputs to the nth hidden layer, W_{h^1 h^1} is the recurrent connection at
the first hidden layer, and so on), the b terms denote bias vectors (e.g. b_y is the output
bias vector) and H is the hidden layer function.
27
Given the hidden sequences, the output sequence is computed as follows:
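\hat{y}_t = b_y + \sum_{n=1}^{N} W_{h^n y} h_t^n, \qquad y_t = \mathcal{Y}(\hat{y}_t)

(in the same notation as above).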

where Y is the output layer function. The complete network therefore defines a
function, parameterized by the weight matrices, from input histories x1:t to output
vectors yt. The output vectors yt are used to parameterize the predictive distribution
Pr(xt+1 | yt) for the next input. The form of Pr(xt+1 | yt) must be chosen carefully to
match the input data. In particular, finding a good predictive distribution for high-
dimensional, real-valued data (usually referred to as density modelling), can be very
challenging.
The probability given by the network to the input sequence x is
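\Pr(\mathbf{x}) = \prod_{t=1}^{T} \Pr(x_{t+1} \mid y_t)

(the per-timestep predictive distributions defined above are simply multiplied together)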

and the sequence loss L(x) used to train the network is the negative logarithm of
Pr(x):
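\mathcal{L}(\mathbf{x}) = -\sum_{t=1}^{T} \log \Pr(x_{t+1} \mid y_t)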

The partial derivatives of the loss with respect to the network weights can be
efficiently calculated with backpropagation through time applied to the computation
graph shown in Fig. 4.1.3, and the network can then be trained with gradient descent.

28
4.2 Long Short-Term Memory

4.2.1 Definition

Long short-term memory (LSTM) is an artificial recurrent neural network (RNN)


architecture used in the field of deep learning. Unlike standard feedforward neural
networks, LSTM has feedback connections. It can not only process single data
points (such as images), but also entire sequences of data (such as speech or video).
For example, LSTM is applicable to tasks such as unsegmented, connected
handwriting recognition, speech recognition and anomaly detection in network
traffic or IDSs (intrusion detection systems). LSTMs were developed to deal with
the vanishing gradient problem that can be encountered when training traditional
RNNs. Relative insensitivity to gap length is an advantage of LSTM over RNNs,
hidden Markov models and other sequence learning methods in numerous
applications.

29
4.2.2 Functionality

LSTM is an RNN architecture designed to have a better memory. It uses linear


memory cells surrounded by multiplicative gate units to store read, write and reset
information. A common LSTM unit is composed of a cell, an input gate, an output
gate and a forget gate. The cell remembers values over arbitrary time intervals and
the three gates regulate the flow of information into and out of the cell. LSTMs are
explicitly designed to avoid the long-term dependency problem. Remembering
information for long periods of time is practically their default behavior, not
something they struggle to learn! All recurrent neural networks have the form of a
chain of repeating modules of neural network. In standard RNNs, this repeating
module will have a very simple structure, such as a single tanh layer.

The repeating module in a standard RNN contains a single layer

LSTMs also have this chain like structure, but the repeating module has a different
structure. Instead of having a single neural network layer, there are four, interacting
in a very special way.

30
The repeating module in an LSTM contains four interacting layers.

In the above diagram, each line carries an entire vector, from the output of one node
to the inputs of others. The pink circles represent pointwise operations, like vector
addition, while the yellow boxes are learned neural network layers. Lines merging
denote concatenation, while a line forking denote its content being copied and the
copies going to different locations.

The key to LSTMs is the cell state, the horizontal line running through the top of
the diagram. The cell state is kind of like a conveyor belt. It runs straight down the
entire chain, with only some minor linear interactions. It’s very easy for information
to just flow along it unchanged.

31
The LSTM does have the ability to remove or add information to the cell state,
carefully regulated by structures called gates. Gates are a way to optionally let
information through. They are composed out of a sigmoid neural net layer and a
pointwise multiplication operation.

The sigmoid layer outputs numbers between zero and one, describing how much of
each component should be let through. A value of zero means “let nothing through,”
while a value of one means “let everything through!”

An LSTM has three of these gates, to protect and control the cell state.

32
The recurrent structure allows the model to feed information forward from past
iterations. Arrows represent how data flows through the model (gradients flow
backwards)

For the version of LSTM used in our project H is implemented by the following
composite function:
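i_t = \sigma\big(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1}\big)

f_t = \sigma\big(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1}\big)

c_t = f_t\, c_{t-1} + i_t \tanh\big(W_{xc} x_t + W_{hc} h_{t-1}\big)

o_t = \sigma\big(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t\big)

h_t = o_t \tanh(c_t)

(bias terms omitted for clarity, as noted below).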

where σ is the logistic sigmoid function, and i, f, o and c are respectively the input
gate, forget gate, output gate and cell activation vectors, all of which are
the same size as the hidden vector h. The weight matrix subscripts have the obvious
meaning, for example Whi is the hidden-input gate matrix, Wxo is the input-output
gate matrix etc. The weight matrices from the cell to gate vectors (e.g. Wci) are
diagonal, so element m in each gate vector only receives input from element m of
the cell vector. The bias terms (which are added to i, f, c and o) have been omitted
for clarity.

The original LSTM algorithm used a custom designed approximate gradient


calculation that allowed the weights to be updated after every timestep. However
the full gradient can instead be calculated with backpropagation through time, the
method used in this paper. One difficulty when training LSTM with the full gradient
is that the derivatives sometimes become excessively large, leading to numerical
problems.

To prevent this, all the experiments in our project clipped the derivative of the loss
with respect to the network inputs to the LSTM layers (before the sigmoid and tanh
functions are applied) to lie within a predefined range.

Figure 4.2.2: Long Short-Term Memory Cell

34
● We can see in the diagram that the LSTM basically uses multiplicative
gates for various operations.
● These three gate units act like differentiable versions of read, write and reset
operations.
o Input gate: scales input to the cell (write)
o Output gate: scales output from the cell (read)
o Forget gate: scales the old cell value (reset)
● The cell is the place where the memory is stored in the network.
● This allows the LSTM to store information for longer times, unlike a standard RNN.
● The output gate allows the cell to be read, so it has to be open so that reading can happen.
● As long as the forget gate is open, information remains in the cell.

Advantages of LSTM:

● It helps to achieve stable levels of generalization.

● Synthetic training data can be created.

● Situations are simulated.

● Practical tasks can be accomplished.

● It gives us a way to understand data clearly

4.3 Mixture Density Outputs

4.3.1 Definition

The idea of mixture density networks is to use the outputs of a neural network to
parameterise a mixture distribution. A subset of the outputs are used to define the
mixture weights, while the remaining outputs are used to parameterise the
individual mixture components. Mixture density outputs can also be used with
recurrent neural networks. In this case the output distribution is conditioned not

35
only on the current input, but on the history of previous inputs. Intuitively, the
number of components is the number of choices the network has for the next
output given the inputs so far.

4.3.2 Functionality
The mixture weight outputs are normalised with a softmax function to ensure they
form a valid discrete distribution, and the other outputs are passed through suitable
functions to keep their values within meaningful range (for example the
exponential function is typically applied to outputs used as scale parameters, which
must be positive).

Mixture density networks are trained by maximising the log probability density
of the targets under the induced distributions. Note that the densities are normalised
(up to a fixed constant) and are therefore straightforward to differentiate and to draw
unbiased samples from, in contrast with restricted Boltzmann machines and other
undirected models.
For the handwriting experiments in our project, the basic RNN architecture and
update equations remain unchanged. Each input vector xt consists of a real-valued
pair (x1, x2) that defines the pen offset from the previous input, along with a binary
x3 that has value 1 if the vector ends a stroke (that is, if the pen was lifted off the
board before the next vector was recorded) and value 0 otherwise. A mixture of
bivariate Gaussians was used to predict x1 and x2, while a Bernoulli distribution
was used for x3. Each output vector yt therefore consists of the end-of-stroke
probability e, along with a set of means µj, standard deviations σj, correlations ρj
and mixture weights πj for the M mixture components. That is:
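y_t = \Big(e_t,\; \{\pi_t^j, \mu_t^j, \sigma_t^j, \rho_t^j\}_{j=1}^{M}\Big)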

36
Note that the mean and standard deviation are two-dimensional vectors, whereas
the component weight, correlation and end-of-stroke probability are scalars. The
vectors yt are obtained from the raw network outputs ŷt as follows:
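\hat{y}_t = \big(\hat{e}_t, \{\hat{\pi}_t^j, \hat{\mu}_t^j, \hat{\sigma}_t^j, \hat{\rho}_t^j\}_{j=1}^{M}\big) = b_y + \sum_{n=1}^{N} W_{h^n y} h_t^n

e_t = \frac{1}{1 + \exp(\hat{e}_t)}, \qquad \pi_t^j = \frac{\exp(\hat{\pi}_t^j)}{\sum_{j'=1}^{M} \exp(\hat{\pi}_t^{j'})}

\mu_t^j = \hat{\mu}_t^j, \qquad \sigma_t^j = \exp(\hat{\sigma}_t^j), \qquad \rho_t^j = \tanh(\hat{\rho}_t^j)

(a reconstruction: the squashing functions are chosen so that e_t lies in (0, 1), the \pi_t^j form a valid discrete distribution, \sigma_t^j > 0 and \rho_t^j lies in (-1, 1), as described above).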

The probability density Pr(xt+1 | yt) of the next input xt+1 given the output vector
yt is defined as follows:
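\Pr(x_{t+1} \mid y_t) = \sum_{j=1}^{M} \pi_t^j\, \mathcal{N}\big(x_{t+1} \mid \mu_t^j, \sigma_t^j, \rho_t^j\big) \times \begin{cases} e_t & \text{if } (x_{t+1})_3 = 1 \\ 1 - e_t & \text{otherwise} \end{cases}

(reconstructed from the mixture components defined above)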

37
where :
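\mathcal{N}(x \mid \mu, \sigma, \rho) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp\!\left(\frac{-Z}{2(1-\rho^2)}\right)

(the bivariate Gaussian density with correlation \rho, written out for completeness)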

with
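Z = \frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} - \frac{2\rho\,(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1 \sigma_2}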

This can be substituted into the sequence loss defined in Section 4.1.4 to determine
the loss for handwriting prediction (up to a constant that depends only on the
quantisation of the data and does not influence network training):
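\mathcal{L}(\mathbf{x}) = -\sum_{t=1}^{T} \log\!\Big(\sum_{j=1}^{M} \pi_t^j\, \mathcal{N}\big(x_{t+1} \mid \mu_t^j, \sigma_t^j, \rho_t^j\big)\Big) \;-\; \sum_{t=1}^{T} \begin{cases} \log e_t & \text{if } (x_{t+1})_3 = 1 \\ \log(1 - e_t) & \text{otherwise} \end{cases}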

38
The partial derivatives of this loss with respect to the mixture density outputs can then be derived and backpropagated through the rest of the network.

39
The figure below illustrates the operation of a mixture density output layer applied to
online handwriting prediction.

Figure 4.3.2: Mixture density outputs for handwriting prediction.


The top heatmap shows the sequence of probability distributions for the predicted
pen locations as the word `under' is written. The densities for successive
predictions are added together, giving high values where the distributions overlap.

Two types of prediction are visible from the density map: the small blobs
that spell out the letters are the predictions as the strokes are being written, the
three large blobs are the predictions at the ends of the strokes for the first point in
the next stroke. The end-of-stroke predictions have much higher variance because
the pen position was not recorded when it was off the whiteboard, and hence there
may be a large distance between the end of one stroke and the start of the next.

The bottom heatmap shows the mixture component weights during the same
sequence. The stroke ends are also visible here, with the most active components
switching off in three places, and other components switching on: evidently end-
of-stroke predictions use a different set of mixture components from in-stroke
predictions.

40
4.3.3 Handwriting Prediction

To test whether the prediction network could also be used to generate convincing
real-valued sequences, we applied it to online handwriting data (online in this
context means that the writing is recorded as a sequence of pen-tip locations, as
opposed to offline handwriting, where only the page images are available). Online
handwriting is an attractive choice for sequence generation due to its low
dimensionality (two real numbers per data point) and ease of visualization.

IAM-OnDB consists of handwritten lines collected from 221 different writers


using a smart whiteboard. The writers were asked to write forms from the
Lancaster-Oslo-Bergen text corpus, and the position of their pen was tracked using
an infra-red device in the corner of the board. IAM-OnDB is divided into a training
set, two validation sets and a test set, containing handwritten lines taken from
several inputs.

Experiments:
Each point in the data sequences consisted of three numbers: the x and y offset from
the previous point, and the binary end-of-stroke feature. The network input layer
was therefore size 3. The co-ordinate offsets were normalised to mean 0, std. dev. 1
over the training set. 20 mixture components were used to model the offsets, giving
a total of 120 mixture parameters per timestep (20 weights, 40 means, 40 standard
deviations and 20 correlations). A further parameter was used to model the end-of-
stroke probability, giving an output layer of size 121. Two network architectures
were compared for the hidden layers: one with three hidden layers, each consisting
of 400 LSTM cells, and one with a single hidden layer of 900 LSTM cells. Both
networks had around 3.4M weights. The three layer network was retrained with
adaptive weight noise, with all std. devs. initialised to 0.075. Training with fixed
variance weight noise proved ineffective, probably because it prevented the mixture
density layer from using precisely specified weights.

41
The table below shows that the three-layer network had an average per-sequence loss
15.3 nats lower than the one-layer net. However, the sum-squared-error was
slightly lower for the single layer network. The use of adaptive weight noise
reduced the loss by another 16.7 nats relative to the unregularized three-layer
network, but did not significantly change the sum-squared error. The adaptive
weight noise network appeared to generate the best samples.

Network    Regularisation           Log-Loss    SSE

1 layer    None                     -1025.7     0.40

3 layer    None                     -1041.0     0.41

3 layer    Adaptive weight noise    -1057.7     0.41

Table 4.3.3: Handwriting Prediction Results


All results recorded on the validation set. `Log-Loss' is the mean value of L(x) (in
nats). `SSE' is the mean sum-squared-error per data point.

42
Fig. below shows handwriting samples generated by the prediction network. The
network has clearly learned to model strokes, letters and even short words
(especially common ones such as `of' and `the'). It also appears to have learned a
basic character-level language model, since the words it invents (`eald', `bryoes',
`lenrest') look somewhat plausible in English. Given that the average character
occupies more than 25 timesteps, this again demonstrates the network's ability

to generate coherent long-range structures.

Figure 4.3.3: Online handwriting samples generated by the prediction network.
All samples are 700 timesteps long.
43
4.3.4 Handwriting Synthesis

Handwriting synthesis is the generation of handwriting for a given text. Clearly


the prediction networks we have described so far are unable to do this, since
there is no way to constrain which letters the network writes. This section
describes an augmentation that allows a prediction network to generate data
sequences conditioned on some high-level annotation sequence (a character
string, in the case of handwriting synthesis). The resulting sequences are
sufficiently convincing that they often cannot be distinguished from real
handwriting. Furthermore, this realism is achieved without sacrificing the
diversity in writing style demonstrated in the previous section.

The main challenge in conditioning the predictions on the text is that the two
sequences are of very different lengths (the pen trace being on average twenty-five
times as long as the text), and the alignment between them is unknown until the data
is generated. This is because the number of co-ordinates used to write each character
varies greatly according to style, size, pen speed etc. One neural network model able
to make sequential predictions based on two sequences of different length and
unknown alignment is the RNN transducer. However, preliminary experiments on
handwriting synthesis with RNN transducers were not encouraging.

A possible explanation is that the transducer uses two separate RNNs to process the
two sequences, then combines their outputs to make decisions, when it is usually more
desirable to make all the information available to a single network. This work proposes an
alternative model, where a ‘soft window’ is convolved with the text string and fed in as an
extra input to the prediction network.

44
4.3.5 Synthesis Network

Figure: Synthesis Network Architecture

Circles represent layers, solid lines represent connections and dashed lines
represent predictions. The topology is similar to the prediction network in Fig.
4.1.3, except that extra input from the character sequence c is presented to the
hidden layers via the window layer (with a delay in the connection to the first
hidden layer to avoid a cycle in the graph).
Fig above illustrates the network architecture used for handwriting synthesis. As
with the prediction network, the hidden layers are stacked on top of each other,
each feeding up to the layer above, and there are skip connections from the
inputs to all hidden layers and from all hidden layers to the outputs. The difference
is the added input from the character sequence, mediated by the window layer.
Given a length U character sequence c and a length T data sequence x, the soft
window wt into c at timestep t (1≤ t≤ T) is defined by the following discrete
convolution with a mixture of K Gaussian functions
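\phi(t, u) = \sum_{k=1}^{K} \alpha_t^k \exp\!\big(-\beta_t^k\, (\kappa_t^k - u)^2\big), \qquad w_t = \sum_{u=1}^{U} \phi(t, u)\, c_u

(a reconstruction consistent with the window-weight description that follows).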

where ϕ(t, u) is the window weight of cu at timestep t. Intuitively, the κt
parameters control the location of the window, the βt parameters control the width
of the window and the αt parameters control the importance of the window within
the mixture. The size of the soft window vectors is the same as the size of the
character vectors cu (assuming a one-hot encoding, this will be the number of
characters in the alphabet). Note that the window mixture is not normalised and
hence does not determine a probability distribution; however the window weight
ϕ(t; u) can be loosely interpreted as the network's belief that it is writing character
cu at time t. Fig. below shows the alignment implied by the window weights
during a training sequence.
The size 3K vector p of window parameters is determined as follows by the
outputs of the first hidden layer of the network:
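(\hat{\alpha}_t, \hat{\beta}_t, \hat{\kappa}_t) = W_{h^1 p}\, h_t^1 + b_p

\alpha_t = \exp(\hat{\alpha}_t), \qquad \beta_t = \exp(\hat{\beta}_t), \qquad \kappa_t = \kappa_{t-1} + \exp(\hat{\kappa}_t)

(a reconstruction; the use of offsets for \kappa_t is discussed in the next section).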

46
4.3.6 Alignment

Figure: Window weights during a handwriting synthesis sequence

Each point on the map shows the value of ϕ(t; u), where t indexes the pen trace
along the horizontal axis and u indexes the text character along the vertical axis.
The bright line is the alignment chosen by the network between the characters and
the writing. Notice that the line spreads out at the boundaries between characters;
this means the network receives information about next and previous letters as it
makes transitions, which helps guide its predictions.
47
The location parameters κt are defined as offsets from the previous locations κt-1,
and the size of the offset is constrained to be greater than zero. Intuitively,
this means that network learns how far to slide each window at each step, rather
than an absolute location. Using offsets was essential to getting the network to
align the text with the pen trace.

The wt vectors are passed to the second and third hidden layers at time t, and the
first hidden layer at time t+1 (to avoid creating a cycle in the processing graph).
The update equations for the hidden layers are
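h_t^1 = \mathcal{H}\big(W_{i h^1} x_t + W_{h^1 h^1} h_{t-1}^1 + W_{w h^1} w_{t-1} + b_h^1\big)

h_t^n = \mathcal{H}\big(W_{i h^n} x_t + W_{h^{n-1} h^n} h_t^{n-1} + W_{h^n h^n} h_{t-1}^n + W_{w h^n} w_t + b_h^n\big), \qquad 2 \le n \le N

(a reconstruction obtained by adding the window terms to the prediction-network equations of Section 4.1.4).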

48
Note that yt is now a function of c as well as x1:t

49
Fig. below illustrates the operation of a mixture density output layer applied to
handwriting synthesis.

Figure 4.3.6: Mixture density outputs for handwriting synthesis.

The top heatmap shows the predictive distributions for the pen locations, the
bottom heatmap shows the mixture component weights. Comparison with Fig. of
mixture density outputs for handwriting prediction indicates that the synthesis
network makes more precise predictions (with smaller density blobs) than the
prediction-only network, especially at the ends of strokes, where the synthesis
network has the advantage of knowing which letter comes next.

50
4.4 Experiments

The synthesis network was applied to the same input data as the handwriting
prediction network in the previous section. The character-level transcriptions
from the IAM-OnDB were now used to define the character sequences c. The full
transcriptions contain 80 distinct characters (capital letters, lower case letters,
digits, and punctuation). However, we used only a subset of 57, with all the digits
and most of the punctuation characters replaced with a generic `non-letter' label.

The network architecture was as similar as possible to the best prediction network:
three hidden layers of 400 LSTM cells each, 20 bivariate Gaussian mixture
components at the output layer and a size 3 input layer. The character sequence was
encoded with one-hot vectors, and hence the window vectors were size 57. A
mixture of 10 Gaussian functions was used for the window parameters, requiring a
size 30 parameter vector. The total number of weights was increased to
approximately 3.7M. The network was trained with rmsprop, using the same
parameters as in the previous section.
The network was retrained with adaptive weight noise, initial standard deviation
0.075, and the output and LSTM gradients were again clipped in the range [-100;
100] and [-10; 10] respectively. The table below shows that adaptive weight noise gave
a considerable improvement in log-loss (around 31.3 nats) but no significant change
in sum-squared error. The regularised network appears to generate slightly more
realistic sequences, although the difference is hard to discern by eye. Both networks
performed considerably better than the best prediction network. In particular the
sum-squared error was reduced by 44%. This is likely due in large part to the
improved predictions at the ends of strokes, where the error is largest.

51
Table 4.4: Handwriting Synthesis Results. All results recorded on the validation set.
`Log-Loss' is the mean value of L(x) in nats. `SSE' is the mean sum-squared-error
per data point.

Regularisation           Log-Loss    SSE

None                     -1096.9     0.23

Adaptive weight noise    -1128.2     0.23

52
4.4.1 Unbiased Sampling:

Given c, an unbiased sample can be picked from Pr(x|c) by iteratively drawing xt+1
from Pr (xt+1|yt), just as for the prediction network. The only difference is that we
must also decide when the synthesis network has finished writing the text and should
stop making any future decisions. To do this, we use the following heuristic: as soon
as φ(t, U + 1) > φ(t, u) ∀ 1 ≤ u ≤ U the current input xt is defined as the end of the
sequence and sampling ends. Examples of unbiased synthesis samples are shown in
Fig. 4.4.1. These and all subsequent figures were generated using the synthesis network retrained with
adaptive weight noise. Notice how stylistic traits, such as character size, slant,
cursiveness etc. vary widely between the samples, but remain more-or-less
consistent within them. This suggests that the network identifies the traits early on
in the sequence, then remembers them until the end. By looking through enough
samples for a given text, it appears to be possible to find virtually any combination
of stylistic traits, which suggests that the network models them independently both
from each other and from the text.
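A small sketch of this stopping test in Python, assuming phi_t is a NumPy vector holding the window weights φ(t, 1), ..., φ(t, U + 1), i.e. including the extra 'one past the end' position (the function name is illustrative):

import numpy as np

def finished_writing(phi_t):
    # sampling stops once the weight of the position past the last character
    # exceeds the weight of every real character
    return phi_t[-1] > np.max(phi_t[:-1])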

53
4.4.2 Biased Sampling

One problem with unbiased samples is that they tend to be difficult to read (partly
because real handwriting is difficult to read, and partly because the network is an
imperfect model). Intuitively, we would expect the network to give higher
probability to good handwriting because it tends to be smoother and more
predictable than bad handwriting. If this is true, we should aim to output more
probable elements of Pr(x|c) if we want the samples to be easier to read. A principled
search for high probability samples could lead to a difficult inference problem, as
the probability of every output depends on all previous outputs.
However a simple heuristic, where the sampler is biased towards more probable
predictions at each step independently, generally gives good results. Define the
probability bias b as a real number greater than or equal to zero.
Before drawing a sample from Pr(xt+1 | yt), each standard deviation σ^j_t in the
Gaussian mixture is recalculated from its definition in Section 4.3.2 to

\sigma_t^j = \exp\big(\hat{\sigma}_t^j - b\big)    (61)

and each mixture weight is recalculated to

\pi_t^j = \frac{\exp\big(\hat{\pi}_t^j (1 + b)\big)}{\sum_{j'=1}^{M} \exp\big(\hat{\pi}_t^{j'} (1 + b)\big)}    (62)

This artificially reduces the variance in both the choice of component from the
mixture, and in the distribution of the component itself. When b = 0 unbiased
sampling is recovered, and as b → ∞ the variance in the sampling disappears.
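A minimal NumPy sketch of this bias, implementing Eqs. (61) and (62); the argument names are illustrative, and pi_hat and sigma_hat stand for the raw (pre-activation) mixture outputs of the network:

import numpy as np

def apply_bias(pi_hat, sigma_hat, b):
    # Eq. (61): shrink every standard deviation by the bias b
    sigma = np.exp(sigma_hat - b)
    # Eq. (62): sharpen the mixture weights with a (1 + b)-scaled softmax
    logits = pi_hat * (1.0 + b)
    logits = logits - logits.max()      # subtract the maximum for numerical stability
    pi = np.exp(logits)
    pi = pi / pi.sum()
    return pi, sigma

Here b = 0 recovers unbiased sampling, while a large b collapses the choice onto the most likely component.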

54
55
5. MODULES
The system has been developed in three different modules.
Each module has its own unique functionality.
5.1 Pre-Processing Module:
The first module is used to perform essential operations on the raw data, thus
converting it into processed data. The raw data is in XML form, so this
Python module searches the directory for the raw data and, when it finds it,
performs operations on the data such as normalization and splitting of
strokes. After the pre-processing is done, it generates 2 NPY (NumPy) files and
2 PKL (Pickle) files. A simplified sketch of this step follows the module list below.
Python Modules used are:
● NumPy
● Os
● Pickle
● ElementTree
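The following is a much-simplified sketch of this step, assuming the IAM-OnDB lineStrokes XML layout (Stroke elements containing Point elements with x and y attributes); the file names and helper names are illustrative rather than the exact ones used in our module:

import os
import pickle
import numpy as np
import xml.etree.ElementTree as ET

def parse_stroke_file(path):
    # read one XML stroke file and return rows of (dx, dy, end_of_stroke)
    root = ET.parse(path).getroot()
    points = []
    for stroke in root.iter('Stroke'):
        pts = list(stroke.iter('Point'))
        for i, p in enumerate(pts):
            eos = 1.0 if i == len(pts) - 1 else 0.0   # pen lifted after the last point
            points.append([float(p.get('x')), float(p.get('y')), eos])
    points = np.array(points, dtype=np.float32)
    # convert absolute pen positions into offsets from the previous point
    points[1:, :2] = np.diff(points[:, :2], axis=0)
    points[0, :2] = 0.0
    return points

def preprocess(data_dir, out_dir):
    sequences = [parse_stroke_file(os.path.join(root, name))
                 for root, _, files in os.walk(data_dir)
                 for name in files if name.endswith('.xml')]
    # normalise the offsets to zero mean and unit standard deviation over the whole set
    all_points = np.concatenate(sequences, axis=0)
    mean, std = all_points[:, :2].mean(axis=0), all_points[:, :2].std(axis=0)
    for seq in sequences:
        seq[:, :2] = (seq[:, :2] - mean) / std
    np.save(os.path.join(out_dir, 'strokes.npy'), np.array(sequences, dtype=object))
    with open(os.path.join(out_dir, 'norm.pkl'), 'wb') as f:
        pickle.dump({'mean': mean, 'std': std}, f)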

Fig 5.1 Snippet of Preprocessing Module


56
5.2 Training Module:
Train: The data that has been pre-processed can be used to train the model. In the
training module 3 classes are implemented; these are:
● WindowLayer
● MixtureLayer
● RNNModel
The most important class is RNNModel; this is the final end-model class
with which we create our model, and it is an implementation of an LSTM RNN. A
function named create_graph is used to instantiate a TensorFlow graph, and there is
an embedded function within it called create_model which creates the model; the
model is trained on the pre-processed data generated in the previous step. This
data is fed to the model in batches, which are generated through a batch-processing
utility module. Log-loss is the metric measured to check the efficiency of the
model. The model is optimized using AdamOptimizer. Finally, the model is saved. A
much-simplified sketch of the graph construction is shown below.
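The sketch below assumes TensorFlow 1.x (the version listed in Table 3.1). The real RNNModel also wires in the WindowLayer and MixtureLayer and minimises the mixture-density log-loss of Section 4.3; here a stand-in squared-error loss is used only to keep the example self-contained, and all names are illustrative:

import tensorflow as tf

def create_model(seq_len, input_dim=3, num_cells=400, num_layers=3, output_dim=121):
    # placeholders for a batch of stroke sequences and their next-step targets
    inputs = tf.placeholder(tf.float32, [None, seq_len, input_dim], name='inputs')
    targets = tf.placeholder(tf.float32, [None, seq_len, output_dim], name='targets')

    # three stacked LSTM layers, as described in Section 3
    cells = [tf.nn.rnn_cell.LSTMCell(num_cells) for _ in range(num_layers)]
    stacked = tf.nn.rnn_cell.MultiRNNCell(cells)
    outputs, _ = tf.nn.dynamic_rnn(stacked, inputs, dtype=tf.float32)

    # linear layer producing the raw mixture-density parameters (y_hat of Section 4.3.2)
    y_hat = tf.layers.dense(outputs, output_dim)

    # stand-in loss; the real module minimises the mixture-density log-loss instead
    loss = tf.reduce_mean(tf.squared_difference(y_hat, targets))
    train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)
    return inputs, targets, loss, train_op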

Fig 5.2 Snippet of Training Module


57
5.3 Generate Module
Generate: The generate module is used to synthesize handwriting; it can also provide
various information such as the log-loss.
When the model is trained, we can use the generate.py script to test how it works. Without
providing the --text argument, this script will ask you what to generate in a loop (an
example invocation is shown after the options list below).
Additional options for generation:
● --bias (float) - with higher bias generated handwriting is more clear so to
speak
● --noinfo - plots only generated handwriting (without attention window)
● --animation - animation of writing
● --style - style of handwriting, int from 0 to 7
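For example, an invocation might look like the following (the flag values shown are purely illustrative):

python generate.py --text "hello world" --bias 1.0 --style 3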

Fig 5.3 Code Snippet of Generate Module


58
6. Execution and Output
6.1 Pre-Processing Module
The pre-processing module was executed directly from the command line. Before
executing this module, it is essential to extract the IAM-Online dataset into a
folder named data in the directory where this module is present, to avoid errors.

Fig 6.1 Command for Executing this Module

Fig 6.2 After Execution of this Module


59
6.2 Training Module:

This module creates and trains a model. The data used is the data generated by the
pre-processing module. Training can be done on a CPU, GPU or TPU. A GPU
provides highly parallel execution and hence is more advisable to use than a
CPU; TPUs are the fastest among the three.
Command: python -W ignore train.py
The above command is used to execute the training script, which then starts to
train the model on the data. Here, an epoch denotes the number of times the model
has seen the complete data.

Fig 6.3 Training the Model

6.3 Generating Module

In this module, we first load the model along with the Translation.pkl file, which
contains the mapping of the various alphabets to their corresponding written text. After
loading the model, we synthesize the text supplied by the user using the --text flag.
There are various other flags, such as --animation, which animates how the text is
written, and --bias, which adds a bias to the model.

Fig 6.4 Running the generate script

Fig 6.5 Hello world generated from generate module

61
7. FUTURE SCOPE
This project is not limited to handwriting data; it can be applied to any sequential
data with a few tweaks. Moreover, in future, this model can be used in a
much more useful real-time application. It would also be interesting to develop a
mechanism to automatically extract high-level annotations from sequence data. In
the case of handwriting, this could allow for more nuanced annotations than just
text, for example stylistic features, different forms of the same letter, information
about stroke order and so on. Several directions for future work suggest themselves.
One is the application of the network to speech synthesis, which is likely to be more
challenging than handwriting synthesis due to the greater dimensionality of the data
points. Another is to gain a better insight into the internal representation of the data
and to use this to manipulate the sample distribution directly.

62
8. CONCLUSION
This project, “Realistic Handwriting Generation Using RNNs”, tackles a complicated
task, and it is even more difficult to mimic a particular style. With the explained model,
we can achieve satisfying results. The results of the model depend entirely on its
hyperparameters and bias, so tuning them properly is necessary; the model also needs
a larger amount of data for efficient results. We demonstrated the ability of Long
Short-Term Memory recurrent neural networks to generate both discrete and
real-valued sequences with complex, long-range structure using next-step prediction.
The project also introduced a novel convolutional mechanism that allows a recurrent
network to condition its predictions on an auxiliary annotation sequence, and used
this approach to synthesise diverse and realistic samples of online handwriting.

63
REFERENCES

1) Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies
with gradient descent is difficult. IEEE Transactions on Neural Networks,
March 1994.
2) C. Bishop. Neural Networks for Pattern Recognition. Oxford University
Press, Inc., 1995.
3) F. Gers, N. Schraudolph, and J. Schmidhuber. Learning precise timing with
LSTM recurrent networks. Journal of Machine Learning Research, 2002.
4) A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep
recurrent neural networks. In Proc. ICASSP, 2013.
5) T. Mikolov. Statistical Language Models based on Neural Networks. PhD
thesis, Brno University of Technology, 2012.
6) A. Graves. Sequence transduction with recurrent neural networks. In ICML
Representation Learning Workshop, 2012.
7) A. Graves. Practical variational inference for neural networks. In Advances
in Neural Information Processing Systems, volume 24, pages 2348-2356,
2011.
8) A. Graves and J. Schmidhuber. Framewise phoneme classification with
bidirectional LSTM and other neural network architectures. Neural
Networks, 18:602-610, 2005.
9) A. Graves and J. Schmidhuber. Offline handwriting recognition with
multidimensional recurrent neural networks. In Advances in Neural
Information Processing Systems, volume 21, 2008.

64
