
A

Seminar Report
On

Next Word Prediction Using NLG

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology

in

Information Technology

(Session 2022-2023)

Guided By - Submitted by-

Dr. Gajendra Rajawat Sanyam Modi (PCE19IT051)

HOD, Dept. of Information Technology VII Semester

DEPARTMENT OF INFORMATION TECHNOLOGY

POORNIMA COLLEGE OF ENGINEERING, JAIPUR

RAJASTHAN TECHNICAL UNIVERSITY, KOTA

November, 2022
Candidate’s Declaration

I hereby declare that the work which is being presented in the Seminar Report entitled Next
Word Prediction Using NLG, in partial fulfillment for the award of the degree of Bachelor of
Technology in Information Technology, and submitted to the Department of Information
Technology, Poornima College of Engineering, Jaipur, is a record of my own work carried out
under the guidance of Dr. Gajendra Rajawat, Department of Information Technology,
Poornima College of Engineering, Jaipur.

I have not submitted the matter presented in this Seminar Report anywhere else for the award of any other
degree.

[Signature of Student]

Name of Student: Sanyam Modi

[PCE19IT051]

[Counter Signed by Guide]

Name of Guide: Dr. Gajendra Rajawat

HOD

Information Technology

Date:

Place:
DEPARTMENT OF INFORMATION TECHNOLOGY
Date:

CERTIFICATE
This is to certify that the seminar report entitled Next Word Prediction Using NLG has been
submitted by Sanyam Modi (PCE19IT051) in partial fulfillment for the award of the degree of
Bachelor of Technology in Information Technology during the Odd Semester of Session 2022-
23.

The seminar report is found satisfactory and approved for final submission.

1. Ms. Shazia Haque


2. Mrs. Seeta Gupta (Dr. Gajendra Singh Rajawat)

(Seminar Coordinators) HoD, IT


Acknowledgement

I wish to express my sincere thanks and gratitude to Dr. Mahesh Bundele (Principal & Director,
PCE) and Mr. Pankaj Dhemla (Vice-Principal, PCE) for their helping attitude with a keen
interest in completing this dissertation in time.

Nothing concrete can be achieved without an optimum combination of inspiration and
perspiration. The idea of presenting this material without adequate thanks to those who gave it to
me or pointed me in the right direction seems simply indefensible. Preparing this report has been
a time-consuming and arduous task and has involved various contributions. It is my pleasure to
acknowledge the help I have received from different individuals, my Seminar Guide and all the
faculty members during the project. My first sincere appreciation and gratitude goes to respected
Dr. Gajendra Singh Rajawat, Seminar Guide, for his guidance, valuable suggestions and
inspiration. It gives me immense pleasure to express my sincere and whole-hearted thanks to
Dr. Gajendra Singh Rajawat (HoD, IT) for giving me the required guidance. I am also
thankful to my seminar coordinators Mrs. Seeta Gupta and Ms. Shazia Haque for their
affectionate and undaunted guidance as well as for their morale-boosting encouragement,
worthy suggestions and support.

Sanyam Modi

PCE19IT051
Table of Contents

CHAPTER NO. PARTICULARS PAGE NO.
Cover Page & Title Page i
Candidate’s Declaration ii
Certificate by the Department iii
Acknowledgement iv
Table of Contents v
List of Tables vi
List of Figures vii
List of Abbreviations viii
Abstract 1
Chapter 1 Introduction to NLG 2
1.1 Introduction 2
1.2 Stages 3
1.3 Applications 5
1.4 Evaluation 6
1.5 History 8
Chapter 2 Next Word Prediction Model 9
2.1 Introduction 9
2.2 Approach 9
2.3 Pre Processing Dataset 10
2.4 Tokenization 10
2.5 Creating the Model 11
2.6 Model Summary 12
2.7 Callbacks 12
2.8 Compile and Fit 12
2.9 Predictions 14
2.10 Observations 14
Chapter 3 LSTM(Long Short Term Memory) 15
3.1 Introduction 15
3.2 Gates 15
3.3 Applications 17
3.4 LSTM vs RNN 18
Chapter 4 Bi-LSTM 19
4.1 Introduction 19
4.2 Model 19
4.3 Bi-LSTM over LSTM 20
4.4 Bi-LSTM Architecture 21
Chapter 5 Result Analysis 22
5.1 Loss Calculations 22
5.2 Model Analysis 23
Chapter 6 Conclusion and Future scope of work 25
6.1 Conclusion 25
6.2 Future Scope 25
References 26
Appendices (if any) -
Copy of the certificate of technical or review paper. -
Copy of the technical or review paper. 28

LIST OF TABLES

TABLE NO. TITLE PAGE NO.


1 Model Summary 12
2 LSTM Parameters 16

3 Result Analysis 22
LIST OF FIGURES

FIGURE NO. TITLE PAGE NO.

1 Venn-Diagram of NLP 2
2 NLG Application in Research 7

3 History of NLG 8

4 Model Plot 9

5 Output of our Model 13

6 LSTM Architecture 15

7 LSTM Gates 16

8 Bi-LSTM Cell 19

9 Bi-LSTM Architecture 21

10 Model Analysis 23
LIST OF ABBREVIATIONS

ML Machine Learning
AI Artificial Intelligence
RNN Recurrent Neural Network
LSTM Long Short-Term Memory
BI-LSTM Bidirectional Long Short-Term Memory
NLG Natural Language Generation
NLP Natural Language Processing
Abstract

Next Word Prediction means predicting the next word in the sentence which a user is currently
typing. With the help of this model we will be able to predict the next word for the user, which
means the user will not have to type it, thus reducing the number of keystrokes, increasing typing
speed and reducing grammatical errors. To build this model we use two deep learning algorithms,
LSTM (Long Short-Term Memory) and BiLSTM (Bidirectional Long Short-Term Memory).

Deep learning is a subclass of machine learning; it imitates the way the human brain processes
information and finds patterns in data for decision making. Essentially, a deep learning model has
networks capable of learning from data that is unstructured or unlabelled. The next word
prediction is performed on a dataset consisting of texts. Next Word Prediction is an application of
NLP (Natural Language Processing) and is also known as Language Modelling; fundamentally, it
is the process of predicting the following word in a sentence. It has numerous applications used
by most of us, for example auto-correct, which is widely used in e-mails and messages; it is also
used in MS Word and in Google Search, which predicts the next word based on our search
history or on searches performed globally. In this work we have studied NLP and different deep
learning techniques such as LSTM and BiLSTM, and performed a comparative study. We
obtained good results with both BiLSTM and LSTM; the accuracies obtained using BiLSTM and
LSTM are 66.1% and 58.27% respectively.

Chapter 1 Introduction to NLG

1.1 Introduction

Natural language generation is, in a sense, the opposite of NLP applications such as speech
recognition and grammar checking, because it involves converting some form of machine data
into natural language, rather than the other way around.

NLG is to be distinguished from superficially similar techniques, commonly referred to by names
such as "document generation", "report generation", "mail merging", etc. These techniques
involve simply plugging a fixed data structure, such as a table of numbers or a list of names, into
a template in order to produce complete documents. Due to their limited flexibility, they tend to
produce rigid text, frequently containing grammatical errors (for example, "you have one choices
remaining").

Figure 1: Venn Diagram of NLP

NLG, on the other hand, makes use of some level of underlying linguistic representation of the
text, in order to make sure that it is grammatically correct and fluent. Most NLG systems include
a syntactic realiser (our RealPro product is an example), which guarantees that grammatical rules
such as subject-verb agreement are obeyed, and a text planner (such as one created in our
Exemplars framework), which decides how to arrange sentences, paragraphs and other
components of a text coherently. Practically any range or style of text can be produced, from a
few phrases to an entire report, and various output formats can be generated, such as RTF text or
HTML hypertext. Text generators can even produce equivalent texts in multiple languages
simultaneously, making them an excellent alternative to automated translation in many domains.
A text generator can also produce synthesized or word-concatenated speech with a
"concept-to-speech" approach, which uses semantic information to generate correct intonation, in
contrast to most current text-to-speech technology.

1.2 Stages

The process used to generate text can be as simple as keeping a list of canned text that is copied
and pasted, possibly linked with some glue text. The results may be satisfactory in simple
domains such as horoscope machines or generators of personalised business letters. However, a
sophisticated NLG system needs to include stages of planning and merging of information to
enable the generation of text that looks natural and does not become repetitive. The typical stages
of natural language generation, as proposed by Dale and Reiter, are listed below; a short
illustrative sketch of these stages follows the list.

1.2.1 Content determination:

Deciding what information to mention in the text. For example, in a pollen forecast, deciding
whether to explicitly mention that the pollen level is 7 in the south east.

1.2.2 Document structuring:

Overall organisation of the information to convey. For example, deciding to describe the areas
with high pollen levels first, instead of the areas with low pollen levels.

1.2.3 Aggregation:

Merging of similar sentences to improve readability and naturalness. For example, merging the
two following sentences, Grass pollen levels for Friday have increased from the moderate to
high levels of yesterday and Grass pollen levels will be around 6 to 7 across most parts of the
country, into the following single sentence:

Grass pollen levels for Friday have increased from the moderate to high levels of yesterday
with values of around 6 to 7 across most parts of the country.

1.2.4 Lexical choice:

Putting words to the concepts. For example, deciding whether medium or moderate should be
used when describing a pollen level of 4.

1.2.5 Referring expression generation:

Creating referring expressions that identify objects and regions. For example, deciding to use
in the Northern Isles and far northeast of mainland Scotland to refer to a certain region in
Scotland. This task also includes making decisions about pronouns and other types of
anaphora.

1.2.6 Realisation:

Creating the actual text, which should be correct according to the rules of syntax, morphology
and orthography. For example, using will be for the future tense of to be.
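
To make these stages concrete, the following is a minimal, illustrative sketch of a simple
template-based pipeline for a pollen forecast; the data values and function names are hypothetical
and are only meant to mirror the stages listed above.

# Hypothetical staged NLG pipeline for a pollen forecast (illustrative only).
pollen_levels = {"south east": 7, "north": 3, "midlands": 6}

def content_determination(data):
    # Keep only the regions worth mentioning (here: level 4 or above).
    return {region: level for region, level in data.items() if level >= 4}

def document_structuring(selected):
    # Describe the areas with the highest pollen levels first.
    return sorted(selected.items(), key=lambda item: item[1], reverse=True)

def lexical_choice(level):
    # Choose a word for the numeric level.
    return "high" if level >= 6 else "moderate"

def realisation(ordered):
    # Produce grammatically correct sentences from the structured content.
    sentences = ["Grass pollen levels will be " + lexical_choice(level) +
                 " (around " + str(level) + ") in the " + region + "."
                 for region, level in ordered]
    return " ".join(sentences)

print(realisation(document_structuring(content_determination(pollen_levels))))

The aggregation and referring expression generation stages are omitted from this sketch for
brevity; in a real system they would merge related sentences and decide how each region is
referred to.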

An alternative way to approach NLG is to use "end-to-end" machine learning to build a system,
without having separate stages as above. In other words, we build an NLG system by training a
machine learning algorithm (often an LSTM) on a large data set of input data and corresponding
(human-written) output texts. The end-to-end approach has perhaps been most successful in
image captioning, that is, automatically generating a textual caption for an image.

1.3 Applications:

1.3.1 Automatic Report Generation :

From a commercial point of view, the most successful NLG applications have been
data-to-text systems which generate textual summaries of databases and data sets; these systems
usually perform data analysis as well as text generation. Research has shown that textual
summaries can be more effective than graphs and other visuals for decision support, and that
computer-generated texts can be superior (from the reader's point of view) to human-written
texts.

1.3.2 Image Captioning :

Over recent years, there has been increased interest in automatically generating captions for
images, as part of a broader effort to investigate the interface between vision and language. A
case of data-to-text generation, the task of image captioning (or automatic image description)
involves taking an image, analysing its visual content, and generating a textual description
(typically a sentence) that expresses the most prominent aspects of the image.

1.3.3 Chatbots :

Another area where NLG has been widely applied is automated dialogue systems, frequently
in the form of chatbots. A chatbot or chatterbot is a software application used to conduct an
online chat conversation through text or text-to-speech, in lieu of providing direct contact with
a live human agent. While Natural Language Processing (NLP) techniques are applied in
interpreting human input, NLG informs the output part of the chatbot algorithms in facilitating
real-time dialogues.

1.3.4 Creative Writing and Computational Humor :

Creative language generation by NLG has been hypothesised since the field's beginnings. A
recent pioneer in the area is Philip Parker, who has developed an arsenal of algorithms capable
of automatically generating textbooks, crossword puzzles, poems and books on topics ranging
from bookbinding to cataracts. The arrival of large pre-trained transformer-based language
models such as GPT-3 has also enabled breakthroughs, with such models demonstrating a
recognisable ability for creative writing tasks.

1.4 Evaluation :

An ultimate objective is how useful NLG systems are at helping people, which is the first of the
above techniques. However, task-based evaluations are time-consuming and expensive, and can
be difficult to carry out (especially if they require subjects with specialised expertise, such as
doctors). Hence (as in other areas of NLP) task-based evaluations are the exception, not the
norm.

Recently, researchers have been assessing how well human ratings and metrics correlate with
(predict) task-based evaluations. Work is being conducted in the context of Generation
Challenges shared-task events. Initial results suggest that human ratings are much better than
metrics in this regard. In other words, human ratings usually predict task-effectiveness at least to
some degree (although there are exceptions), while ratings produced by metrics often do not
predict task-effectiveness well. These results are preliminary. In any case, human ratings are the
most common evaluation technique in NLG; this is in contrast to machine translation, where
metrics are widely used.

Figure 2: NLG application in Research

1.5 History

NLG has been around since ELIZA was created in the mid-1960s; however, the techniques were
not really put to use until the 1990s. NLG techniques range from straightforward template-based
systems that produce letters and emails via mail merge to sophisticated systems that have a
detailed understanding of human grammar. Another method for achieving NLG is to use machine
learning to train a statistical model, generally on a large corpus of human-written texts.

Figure 3: History of NLG

Chapter 2 : Next Word Prediction Model

2.1 Introduction

This section describes what the next word prediction model we build will do. The model
will take into account the last word of a given sentence and predict the next possible word.
Techniques from deep learning, language modelling and natural language processing will all be
used. We will begin by analysing the data, followed by pre-processing it. After tokenizing the
data, we will proceed to creating the deep learning model. The deep learning model will be built
using LSTMs.

2.2 Approach:

Text data sets are readily available; here we consider Project Gutenberg, a volunteer effort to
digitize and archive literary works in order to "encourage the creation and distribution of
eBooks." The stories, documents and text data we need for our problem definition can all be
found there.

Figure 4: Model Plot


2.3 Pre Processing the Dataset:

The very first step is to remove all unnecessary information from the Metamorphosis dataset.
The beginning and the end of the dataset will be removed, since this information is irrelevant for
us. The text should start with the line:

One morning, when Gregor Samsa woke from troubled dreams, he found.

The dataset's final line should read:

first to get up and stretch out her young body.

After completing this step, the file is saved as Metamorphosis clean.txt. We then read the
Metamorphosis clean.txt file using utf-8 encoding and proceed to remove all superfluous new
lines, carriage returns and unwanted Unicode characters. Lastly, we make sure that each of our
words is unique: every word is taken into account only once and all further repeats are dropped.
With less confusion caused by redundant terms, the model will train more smoothly. A sketch of
the pre-processing code for the text data is provided below.
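
The following is a minimal sketch, consistent with the steps described above; the exact file
handling in the original implementation may differ.

# Sketch of the described pre-processing (file name as given in the text).
file = open("Metamorphosis clean.txt", "r", encoding="utf8")
lines = [line for line in file]
file.close()

data = " ".join(lines)
# Remove superfluous new lines, carriage returns and stray Unicode characters.
data = data.replace("\n", " ").replace("\r", " ").replace("\ufeff", " ")
data = " ".join(data.split())          # collapse repeated whitespace

# Keep each word only once, dropping all further repeats (as described above).
unique_words = []
for word in data.split():
    if word not in unique_words:
        unique_words.append(word)
data = " ".join(unique_words)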

2.4 Tokenization

Tokenization is the process of breaking up larger textual data, essays or corpora into smaller,
more manageable units. These smaller units may take the form of shorter documents or lines of
text data. They might also function as a dictionary of words.

The Keras Tokenizer allows us to vectorize a text corpus by converting each piece of text into
either a sequence of integers (each integer being the index of a token in a dictionary) or a vector
whose coefficient for each token is binary, based on word count or on tf-idf.

The texts are subsequently turned into sequences. This is a way of converting the textual
information into integers so that we can analyse it more effectively. Next, we produce the
training dataset. The input text data goes into "X", and the results for the training examples go
into "y". Therefore, for each input "X", "y" contains the corresponding next-word prediction.

We determine the vocabulary size by taking the length of tokenizer.word_index and adding 1
to it. We add 1 because 0 is reserved for padding and we want to start counting from 1. Finally,
we convert the prediction data "y" into categorical data with vocab_size classes; this converts an
integer class vector into a binary class matrix, which is required for our loss, categorical
crossentropy.
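
A sketch of the tokenization and training-data preparation described above is given below; the
pickle file name is an assumption, chosen only so that the prediction sketch in Section 2.9 can
refer to it.

# Sketch of tokenization and (X, y) preparation with the Keras Tokenizer.
import pickle
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
pickle.dump(tokenizer, open("tokenizer1.pkl", "wb"))   # saved for the prediction notebook

sequence_data = tokenizer.texts_to_sequences([data])[0]
vocab_size = len(tokenizer.word_index) + 1             # +1 because 0 is reserved for padding

# Each word is used to predict the word that follows it (input length of 1).
sequences = np.array([sequence_data[i - 1:i + 1] for i in range(1, len(sequence_data))])
X = sequences[:, :1]                                   # shape (num_samples, 1)
y = to_categorical(sequences[:, 1], num_classes=vocab_size)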

2.5 Creating the Model:

We create a sequential model. Next, we set the input and output parameters of an embedding
layer. Given that each prediction will be based on just one word, and that this word will be the
input, it is crucial to set the input length to 1. After that, we add an LSTM layer to the
architecture. We give it 1000 units and make sure to return the sequences, so that the output can
pass through another LSTM layer. For the subsequent LSTM layer we also use 1000 units, but
we do not need to specify the return sequences option, as it is False by default. We then use a
dense (fully connected) layer to pass this through a hidden layer containing 1000 units with a
ReLU activation. Finally, we pass it through an output layer with a softmax activation and the
chosen vocabulary size. The softmax activation gives us a probability for each possible output,
with the number of outputs equal to the vocabulary size.
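
The description above corresponds to the following Keras sketch; the embedding dimension (10
here) is not stated in the text and is an assumption.

# Sketch of the described architecture.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=1))   # one input word per prediction
model.add(LSTM(1000, return_sequences=True))           # return sequences for the next LSTM layer
model.add(LSTM(1000))
model.add(Dense(1000, activation="relu"))              # hidden dense layer (activation assumed)
model.add(Dense(vocab_size, activation="softmax"))     # probabilities over the vocabulary
model.summary()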

2.6 Model Summary

Table 1: Model Summary

2.7 Callbacks

We import the three callbacks needed to train our model: ModelCheckpoint,
ReduceLROnPlateau and TensorBoard. Let us examine the function that each of these callbacks
serves.

2.7.1 ModelCheckpoint :

The weights of our model are stored with this callback during training. By setting the
save_best_only=True option, we keep only the best weights of the model. We monitor our
progress using the loss metric.

2.7.2 ReduceLROnPlateau :

This callback reduces the optimizer's learning rate after a certain number of epochs. Here the
patience is set to 3: if the monitored loss does not improve after three epochs, the learning rate
is reduced by a factor of 0.2. The metric being monitored here is again the loss.

2.7.3 TensorBoard :

The TensorBoard callback is used for visualising the training curves, specifically the plots of
accuracy and loss. Here we will focus only on the loss graph of the next word prediction
model.

The best model according to the monitored loss will be saved to the file nextword1.h5. This file
will be essential when we use the predict function to guess the next word. We wait three epochs
for the loss to decrease; if it does not improve, we reduce the learning rate. Finally, we use the
TensorBoard callback to visualise the graphs and histograms if required.
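
A sketch of these three callbacks, consistent with the settings described above, is shown below;
the log directory and the minimum learning rate are assumptions.

# Sketch of the three callbacks (monitoring the training loss).
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard

checkpoint = ModelCheckpoint("nextword1.h5", monitor="loss",
                             save_best_only=True, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=3,
                              min_lr=0.0001, verbose=1)     # min_lr is an assumption
tensorboard = TensorBoard(log_dir="logs")                    # log directory is an assumption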

2.8 Compile and Fit:

The last phase involves compiling and fitting our model. Here we train the model and save the
best weights to nextword1.h5, so that we can reuse the saved model instead of training it again
every time. In this case I trained only on the training data; you could, however, choose to train
with both validation and test data. We use categorical crossentropy, a loss function that computes
the cross-entropy loss between the labels and the predictions. We compile the model with the
Adam optimizer at a learning rate of 0.001, monitoring the loss metric.
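
A minimal sketch of the compile and fit step follows; the number of epochs is taken from
Section 2.10 and the batch size is an assumption.

# Sketch of compiling and fitting the model with the callbacks from Section 2.7.
from tensorflow.keras.optimizers import Adam

model.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate=0.001))
model.fit(X, y, epochs=150, batch_size=64,      # batch size is an assumption
          callbacks=[checkpoint, reduce_lr, tensorboard])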

Our outcome is displayed below.

Figure 5: Output of our Model

2.9 Prediction:

For the prediction notebook we import the tokenizer file that we saved in pickle format. The
next word model, which was saved during training, is then loaded. Every input text for which
predictions must be made is tokenized using the same tokenizer. After this, we can use the
previously saved model to generate predictions for the input phrase.

While running the predictions we use try and except statements, because we do not want the
programme to exit the loop when a word in the input sentence cannot be found in the vocabulary.
The script keeps running for as long as the user wants; the user must actively decide to end it
whenever they wish, so they may run the programme for as long as they choose.
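
A sketch of such a prediction loop is given below; the file names and the helper logic are
illustrative and only follow the behaviour described above.

# Sketch of the prediction loop (file names are illustrative).
import pickle
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("nextword1.h5")
tokenizer = pickle.load(open("tokenizer1.pkl", "rb"))

def predict_next_word(model, tokenizer, text):
    sequence = np.array(tokenizer.texts_to_sequences([text]))
    predicted_index = int(np.argmax(model.predict(sequence), axis=-1)[0])
    for word, index in tokenizer.word_index.items():
        if index == predicted_index:
            return word
    return ""

while True:
    text = input("Enter a line (type 'stop the script' to quit): ")
    if text.strip().lower() == "stop the script":
        break
    try:
        last_word = text.split()[-1]     # predictions are based on the last word only
        print("Next word:", predict_next_word(model, tokenizer, last_word))
    except Exception:
        print("Could not make a prediction for this input; please try again.")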

2.10 Observation :

For the Metamorphosis dataset we are able to develop a good next word prediction model. In
roughly 150 epochs we are able to reduce the loss considerably. The next word prediction model
we have created is fairly accurate on the available dataset, and the overall quality of the
predictions is good. Specific pre-processing steps and model modifications could be made to
improve the model's predictions further.

Chapter 3 LSTM(Long Short Term Memory)

3.1 Introduction

LSTM is a kind of RNN, a sequential network that allows information to persist. It fixes the
vanishing gradient problem of the RNN. RNNs are mostly used to remember previous
information. Just as humans remember earlier scenes while watching a film, or earlier chapters
while reading a book, RNNs work in a similar manner by remembering prior knowledge and
using it to interpret the current input. The primary flaw of RNNs is that they are unable to
capture long-term dependencies because of the vanishing gradient, and LSTM is designed to
avoid this issue.

Figure 6 : LSTM Architecture

3.2 Gates :

An LSTM recurrent unit attempts to "remember" all of the knowledge that the network has
encountered so far and to "forget" irrelevant information. To do this, several layers of activation
functions, often referred to as "gates", are added for different purposes. Each LSTM recurrent
unit also maintains an internal cell state vector, which conceptually describes the information
that the previous LSTM recurrent unit chose to retain. A Long Short-Term Memory network is
made up of three different gates, each of which performs a particular purpose, as described in
the following:

Figure 7: LSTM Gates

Table 2: LSTM Parameters

3.2.1 Forget Gate:

The primary task of this gate is to decide how much of the previous information should be
forgotten:

fg_t = σ(W_fg · [h_{t−1}, x_t] + b_fg)

3.2.2 Input Gate:

This gate's main duty is to determine how much new information should be written to the
internal cell state:

ip_t = σ(W_ip · [h_{t−1}, x_t] + b_ip)

q~_t = tanh(W_q · [h_{t−1}, x_t] + b_q)

3.2.3 Output Gate:

This gate's main duty is to generate the output from the current cell state:

op_t = σ(W_op · [h_{t−1}, x_t] + b_op)

h_t = op_t ∗ tanh(q_t)
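
The equations above use the cell state q_t, which is obtained by combining the forget and input
gates through the update q_t = fg_t ∗ q_{t−1} + ip_t ∗ q~_t. A minimal NumPy sketch of a single
LSTM cell step using the same symbols is given below; the weight shapes are left abstract and
are purely illustrative.

# Sketch of one LSTM cell step with the gates defined above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, q_prev, W_fg, b_fg, W_ip, b_ip, W_q, b_q, W_op, b_op):
    concat = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    fg_t = sigmoid(W_fg @ concat + b_fg)       # forget gate
    ip_t = sigmoid(W_ip @ concat + b_ip)       # input gate
    q_tilde = np.tanh(W_q @ concat + b_q)      # candidate cell state
    q_t = fg_t * q_prev + ip_t * q_tilde       # cell state update
    op_t = sigmoid(W_op @ concat + b_op)       # output gate
    h_t = op_t * np.tanh(q_t)                  # new hidden state
    return h_t, q_t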

3.3 LSTM Applications:

Applications of LSTM networks can be found in the following fields:

 Language modelling

 Machine translation

 Handwriting recognition

 Image captioning

 Image generation using attention models

 Question answering

 Video-to-text conversion

 Polyphonic music modelling

 Speech synthesis

 Protein structure prediction

3.4 LSTM VS RNN

Consider a situation where you need to modify some data in a calendar. To do this, an RNN
applies a function to the existing data, completely altering it. LSTM, on the other hand, makes
only small modifications to the data through additions and multiplications on the cell state. This
is how LSTM selectively forgets and remembers information, which is why it outperforms
RNNs.

Now imagine that you want to process data with periodic patterns, such as forecasting sales of
coloured powder, which surge around the Indian festival of Holi. A wise move is to look back at
the sales figures from the previous year. So you need to know which information should be
discarded and which should be kept for later use; otherwise, you would need a very good
memory. Recurrent neural networks appear, in theory, to handle this well. However, they fall
short because of their two drawbacks, exploding gradients and vanishing gradients.

To address this issue, LSTM introduces memory units known as cell states. The designed cells
can be thought of as differentiable memory.

Chapter 4: Bi-LSTM(Bidirectional LSTM)

4.1 Introduction

The cells of unidirectional and bidirectional long short-term memory (BiLSTM) networks are
identical; the difference is that the bidirectional network is connected to both the past and the
future. For instance, by feeding the letters of the word "fish" to a unidirectional LSTM one at a
time, it can be taught to predict the word, with the last value being remembered over time by the
recurrent connections. A BiLSTM, on the reverse pass, will also be fed the subsequent letters in
the sequence, giving it access to future information. This teaches the network to fill in gaps
instead of only passing information forward; in an image, for example, it can patch a hole in the
centre rather than only extending it at the edge.

Figure 8: Bi-LSTM Cell

4.2 Model

Unlike traditional LSTMs, bidirectional LSTMs attach two layers that move simultaneously in
different directions in order to get the desired result. With bidirectional processing, data can
travel in two directions: the first is a past-to-future pass, while the second is a future-to-past
pass. This is where the unidirectional and bidirectional versions differ most. In a bidirectional
model, any embedding layer connected to the training data is inspected simultaneously in both
directions. As a consequence, we propose a bidirectional LSTM recurrent neural network as the
solution. The data is acquired, pre-processed and then passed through a word embedding, which
provides each word with a vector that reflects some of its hidden characteristics. For the word
embeddings we employ Global Vectors (GloVe). GloVe is an unsupervised learning technique
for finding word vector representations in datasets. We employ pre-trained GloVe embeddings
to work with our social media news posts. The embedding layer loads its values directly from
GloVe instead of using random weights, as it otherwise would. GloVe applies globally
aggregated co-occurrence statistics to all terms in the news content, and the resulting
representations exhibit useful linear substructures of the word vector space. The converted word
vector data is divided into train and test sets.

We have now incorporated these word embeddings into our bidirectional LSTM-RNN model. A
global pooling layer is used to extract the most useful features from each filter. The value is then
passed through a series of densely connected hidden layers. Finally, a softmax layer is used to
evaluate the validity of a given social media post. All model parameters are subjected to
regularisation in order to prevent overfitting. The design of the bidirectional LSTM-RNN model
is depicted in Figure 8.
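
A minimal Keras sketch consistent with this description is given below; the GloVe file, sequence
length, embedding dimension, layer widths and the number of output classes are not stated in the
text and are assumptions.

# Sketch of a bidirectional LSTM with pre-trained GloVe embeddings (dimensions assumed).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, GlobalMaxPooling1D, Dense

embedding_dim = 100          # assumed GloVe dimension (e.g. glove.6B.100d.txt)
max_len = 20                 # assumed input sequence length
num_classes = 2              # assumed: valid vs. not valid, per the description above

# embedding_matrix is assumed to be filled from the GloVe file, one row per tokenizer index.
embedding_matrix = np.zeros((vocab_size, embedding_dim))

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, weights=[embedding_matrix],
                    input_length=max_len, trainable=False))   # load values directly from GloVe
model.add(Bidirectional(LSTM(128, return_sequences=True)))    # past-to-future and future-to-past
model.add(GlobalMaxPooling1D())                                # keep the most useful features
model.add(Dense(64, activation="relu"))
model.add(Dense(num_classes, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])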

4.3 Bi-LSTM over LSTM

In practical problems this kind of architecture excels, particularly in NLP. The primary reason is
that every component of the input sequence carries information from both the past and the
present. For this reason, by combining LSTM layers running in both directions, a BiLSTM can
generate a more meaningful output.

The BiLSTM produces a different output for each word of the input sequence (sentence). As a
result, the BiLSTM model is useful for a variety of NLP applications, including entity
recognition, translation and sentence classification. Additionally, it has uses in disciplines such
as handwriting recognition, protein structure prediction and speech recognition.

Finally, when discussing the drawbacks of BiLSTM compared to LSTM, it is important to note
that BiLSTM is a significantly slower model and takes more time to train. As a result, we advise
against using it unless absolutely necessary.

4.4 Bi-LSTM Network Architecture

Figure 9: Bi-LSTM Architecture

Bidirectional long short-term memory (Bi-LSTM) means making a neural network process the
sequence data in both directions: backwards (future to past) and forwards (past to future).

A bidirectional LSTM differs from a conventional LSTM in that the data flows in both
directions. With a standard LSTM we can make the input flow in one direction only, either
backwards or forwards. With a bidirectional network, the input can be made to flow in both
directions, preserving both past and future information. Let us look at an illustration to help with
the explanation.

Chapter 5 Result Analysis

The outcomes demonstrate that BiLSTM models outperform conventional unidirectional
LSTMs. It appears that by traversing the input data twice (from left to right and then from right
to left), BiLSTMs are better able to grasp the underlying context. For some types of data, such
as text parsing and predicting the next word of an input sentence, the superior performance of
BiLSTM over the standard unidirectional LSTM is understandable. It was unclear, however,
whether training on numerical time series data in both directions, learning from both the past
and the future, would improve time series forecasting, because some of the context present in
text parsing might not exist there. Our findings demonstrate that BiLSTMs outperform
conventional LSTMs even when used to forecast financial time series data. There are a number
of intriguing questions that may be asked and empirically answered in order to better understand
the distinctions between LSTM and BiLSTM. By doing this, we can gain more knowledge about
the operation and behaviour of these recurrent neural network variants.

Table 3 : Result Analysis

5.1 Loss Calculation

Deep learning algorithms usually report a "loss" figure. Loss is essentially a penalty for a bad
prediction; more precisely, if the model's prediction is perfect, the loss is zero. The objective is
therefore to find a set of weights and biases that minimises the loss.

The outcomes demonstrate that BiLSTM outperformed LSTM, with accuracies of 66.00% and
58.30% respectively. When comparing losses, BiLSTM also has the lowest loss.
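
As an illustration of the categorical crossentropy loss used here, the loss for a single prediction
is the negative log-probability that the model assigned to the true next word; the numbers below
are made up purely for illustration.

# Worked example of categorical crossentropy for one prediction (illustrative numbers).
import numpy as np

y_true = np.array([0.0, 1.0, 0.0])        # one-hot target: the correct next word is class 1
y_pred = np.array([0.2, 0.7, 0.1])        # predicted probabilities from the softmax layer

loss = -np.sum(y_true * np.log(y_pred))   # = -log(0.7), roughly 0.357
print(loss)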

5.2 Model Analysis

Figure 10: Model Analysis

At its core, an LSTM uses the hidden state to preserve information from inputs that have already
been processed. A unidirectional LSTM can only preserve information from the past, because it
has only ever seen inputs from the past.

When using a bidirectional network, the inputs are processed in two different directions: one
from the past to the future and the other from the future to the past. The difference from the
unidirectional case is that, in the LSTM that runs backwards, information from the future is
preserved, and by combining the two hidden states you can preserve information from both the
past and the future at any given time.

What they are suitable for is a complex question, but BiLSTMs perform quite well because they
comprehend context better, as the following example attempts to demonstrate.

Let us imagine we want to predict the next word in a sentence. At a high level, a unidirectional
LSTM will only see

The boys visited ...

and will attempt to predict the next word purely from this context. With a bidirectional LSTM
you are also able to see information further down the road, for instance:

Forward LSTM: The boys visited ...

Backward LSTM: ... afterwards, they exited the pool.

Using the information from the future, you can see how it could be easier for the network to
understand what the next word should be.

Chapter 6 Conclusion and Future scope of work

6.1 Conclusion

A variation on standard LSTMs, deep bidirectional LSTM (BiLSTM) networks train the desired
model not only from inputs to outputs but also from outputs to inputs. Specifically, a BiLSTM
model first feeds the input data to an LSTM model and then repeats the training with an
additional LSTM model, but with the input data sequence reversed. BiLSTM models are
reported to perform better than standard LSTMs.

In this work we conducted a comparative analysis, and the outcomes are displayed in Table 3
and also reflected in the bar chart in Figure 10. The results show that BiLSTM had the best
performance. Given that BiLSTM was developed to address the shortcomings of the LSTM
model, the higher performance of the BiLSTM model was anticipated. In conclusion, it is clear
that using BiLSTM we obtained both the highest accuracy and the lowest loss.

From the training graphs for LSTM and BiLSTM, respectively, we can also observe that there is
significant overfitting.

6.2 Future Scope

Future work would focus on encapsulating BiLSTM in a hybrid CNN/LSTM version. Applying
the BERT technique to NLP and examining how these models behave on datasets from various
fields, as well as on Medium articles, would be another topic of investigation. In the future, a
sizeable dataset will also be collected in order to predict not only the word that comes next, but
also the words needed to finish sentences and to automatically complete search terms or
sentences. We can further extend our model to other natural language generation tasks such as
automatic poem completion or automatic story completion, and we can also make it more
personalised by predicting words from the user's history.

References

[1] P. P. Barman and A. Boruah, "A RNN based approach for next word prediction in Assamese
phonetic transcription," 8th International Conference on Advances in Computing and
Communication, 2018.
[2] R. Perera and P. Nand, "Recent advances in natural language generation: A survey and
classification of the empirical literature," Computing and Informatics, vol. 36, pp. 1-32, 2017.
[3] C. Aliprandi, N. Carmignani, N. Deha, P. Mancarella, and M. Rubino, "Advances in NLP
applied to word prediction," J. Mol. Biol., vol. 147, pp. 195-197, 2008.
[4] C. McCormick, Latent Semantic Analysis (LSA) for Text Classification Tutorial, 2019
(accessed February 3, 2019). http://mccormickml.com/2016/03/25/lsa-for-text-
classification-tutorial/.
[5] Y. Wang, K. Kim, B. Lee, and H. Y. Youn, "Word clustering based on POS feature for
efficient twitter sentiment analysis," Human-centric Computing and Information Sciences, vol. 8,
p. 17, Jun 2018.
[6] N. N. Shah, N. Bhatt, and A. Ganatra, "A unique word prediction system for text entry in
Hindi," in Proceedings of the Second International Conference on Information and
Communication Technology for Competitive Strategies, p. 118, ACM, 2016.
[7] M. K. Sharma and D. Samanta, "Word prediction system for text entry in Hindi," ACM Trans.
Asian Lang. Inform. Process., 2014.
[8] R. Devi and M. Dua, "Performance evaluation of different similarity functions and
classification methods using web based Hindi language question answering system," Procedia
Computer Science, vol. 92, pp. 520-525, 2016.
[9] S. Hochreiter, "The vanishing gradient problem during learning recurrent neural nets and
problem solutions," International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems, vol. 6, no. 2, pp. 107-116, 1998.
[10] D. Pawade, A. Sakhapara, M. Jain, N. Jain, and K. Gada, "Story Scrambler - automatic text
generation using word level RNN-LSTM," Modern Education and Computer Science, 2018.

Copy of Technical Review:

