
A

Seminar Report
On

Next Word Prediction Using NLG

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology

in

Information Technology

(Session 2022-2023)

Guided By - Submitted by-

Dr. Gajendra Rajawat Sanyam Modi (PCE19IT051)

HOD, Dept. of Information Technology VII Semester

DEPARTMENT OF INFORMATION TECHNOLOGY

POORNIMA COLLEGE OF ENGINEERING, JAIPUR

RAJASTHAN TECHNICAL UNIVERSITY, KOTA

November, 2022
Candidate’s Declaration

I hereby declare that the work which is being presented in the Seminar Report entitled Next
Word Prediction Using NLG, in partial fulfillment for the award of the degree of Bachelor of
Technology in Information Technology, and submitted to the Department of Information
Technology, Poornima College of Engineering, Jaipur, is a record of my own work carried out
under the guidance of Dr. Gajendra Rajawat, Department of Information Technology,
Poornima College of Engineering, Jaipur.

I have not submitted the matter presented in this Seminar Report anywhere else for the award of any other
degree.

[Signature of Student]

Name of Student: Sanyam Modi

[PCE19IT051]

[Counter Signed by Guide]

Name of Guide: Dr. Gajendra Rajawat

HOD

Information Technology

Date:

Place:
DEPARTMENT OF INFORMATION TECHNOLOGY
Date:

CERTIFICATE
This is to certify that the seminar report entitled Next Word Prediction Using NLG has been
submitted by Sanyam Modi (PCE19IT051) in partial fulfillment for the award of the degree of
Bachelor of Technology in Information Technology during the Odd Semester of Session 2022-
23.

The seminar report is found satisfactory and approved for final submission.

1. Ms. Shazia Haque


2. Mrs. Seeta Gupta (Dr. Gajendra Singh Rajawat)

(Seminar Coordinators) HoD, IT


Acknowledgement

I wish to express my sincere thanks and gratitude to Dr. Mahesh Bundele (Principal & Director,
PCE) and Mr. Pankaj Dhemla (Vice-Principal, PCE) for their helping attitude with a keen
interest in completing this dissertation in time.

Nothing concrete can be achieved without an optimum combination of inspiration and
perspiration. The idea of presenting this material without adequate thanks to those who gave it to
me or pointed me in the right direction seems simply indefensible. Preparing this report has been
a time-consuming and arduous task and has involved various contributions. It is my pleasure to
acknowledge the help I have received from different individuals, my Seminar Guide and all the
faculty members during the project. My first sincere appreciation and gratitude goes to respected
Dr. Gajendra Singh Rajawat, Seminar Guide, for his guidance, valuable suggestions and
inspiration. It gives me immense pleasure to express my sincere and whole-hearted thanks to
Dr. Gajendra Singh Rajawat (HoD, IT) for giving me the required guidance. I am also
thankful to my seminar coordinators Mrs. Seeta Gupta and Ms. Shazia Haque for their
affectionate and undaunted guidance as well as for their morale-boosting encouragement,
worthy suggestions and support.

Sanyam Modi

PCE19IT051
Table of Contents

CHAPTER NO. PARTICULARS PAGE NO.
Cover Page & Title Page i
Candidate’s Declaration ii
Certificate by the Department iii
Acknowledgement iv
Table of Contents v
List of Tables vi
List of Figures vii
List of Abbreviations viii
Abstract 1
Chapter 1 Introduction to NLG 2
1.1 Introduction 2
1.2 Stages 3
1.3 Applications 5
1.4 Evaluation 6
1.5 History 8
Chapter 2 Next Word Prediction Model 9
2.1 Introduction 9
2.2 Approach 9
2.3 Pre Processing Dataset 10
2.4 Tokenization 10
2.5 Creating the Model 11
2.6 Model Summary 12
2.7 Callbacks 12
2.8 Compile and Fit 12
2.9 Predictions 14
2.10 Observations 14
Chapter 3 LSTM(Long Short Term Memory) 15
3.1 Introduction 15
3.2 Gates 15
3.3 Applications 17
3.4 LSTM vs RNN 18
Chapter 4 Bi-LSTM 19
4.1 Introduction 19
4.2 Model 19
4.3 Bi-LSTM over LSTM 20
4.4 Bi-LSTM Architecture 21
Chapter 5 Result Analysis 22
5.1 Loss Calculations 22
5.2 Model Analysis 23
Chapter 6 Conclusion and Future scope of work 25
6.1 Conclusion 25
6.2 Future Scope 25
References 26
Appendices (if any) -
Copy of the certificate of technical or review paper. -
Copy of the technical or review paper. 28

LIST OF TABLES

TABLE NO. TITLE PAGE NO.


1 Model Summary 12
2 LSTM Parameters 16

3 Result Analysis 22
LIST OF FIGURES

FIGURE NO. TITLE PAGE NO.

1 Venn-Diagram of NLP 2
2 NLG Application in Research 7

3 History of NLG 8

4 Model Plot 9

5 Output of our Model 13

6 LSTM Architecture 15

7 LSTM Gates 16

8 Bi-LSTM Cell 19

9 Bi-LSTM Architecture 21

10 Model Analysis 23
LIST OF ABBREVIATIONS

ML Machine Learning
AI Artificial Intelligence
RNN Recurrent Neural Network
LSTM Long Short-Term Memory
BI-LSTM Bidirectional Long Short-Term Memory
NLG Natural Language Generation
NLP Natural Language Processing
Abstract

Next Word Prediction means predicting the next word in the sentence which a user is currently
typing. With the help of this model we will be able to predict the next word for the user, which
means the user will not have to type it, thus reducing the number of keystrokes, increasing typing
speed and reducing grammatical errors. To build this model we use two deep learning algorithms,
LSTM (Long Short-Term Memory) and BiLSTM (Bidirectional Long Short-Term Memory).

Deep learning is a subclass of machine learning; it imitates the way the human brain processes
information and finds patterns in data for decision making. Essentially, a deep learning model has
networks capable of learning from data that is unstructured or unlabelled. The next word
prediction is performed on a dataset consisting of texts. Next Word Prediction is an application of
NLP (Natural Language Processing) and is also known as Language Modelling; fundamentally, it
is the process of predicting the following word in a sentence. It has numerous applications used
by most of us, for example auto-correct, which is widely used in e-mails and messages; it is also
used in MS Word and in Google Search, which predicts the next word based on our search
history or on searches performed globally. In this work we have studied NLP and different deep
learning techniques such as LSTM and BiLSTM, and performed a comparative study. We
obtained good results with both BiLSTM and LSTM; the accuracies obtained using BiLSTM and
LSTM are 66.1% and 58.27% respectively.

Chapter 1 Introduction to NLG

1.1 Introduction

Natural language generation is, in a sense, the opposite of NLP applications such as speech
recognition and grammar checking, because it involves converting some form of machine data
into natural language, rather than the other way around.

NLG is to be distinguished from superficially similar techniques, commonly referred to by names
such as "document generation", "report generation", "mail merging", etc. These techniques
involve simply plugging a fixed data structure, such as a table of numbers or a list of names, into
a template in order to produce complete documents. Due to their limited flexibility, they tend to
produce rigid text, frequently containing grammatical errors (for example, "you have one choices
remaining").

Figure 1: Venn Diagram of NLP

NLG, on the other hand, makes use of some level of underlying linguistic representation of the
text, in order to make sure that it is grammatically correct and fluent. Most NLG systems include
a syntactic realiser (our RealPro product is an example), which guarantees that grammatical rules
such as subject-verb agreement are obeyed, and a text planner (such as one created in our
Exemplars framework), which decides how to arrange sentences, paragraphs and other
components of a text coherently. Practically any range or style of text can be produced, from a
few phrases to an entire report, and various output formats can be generated, such as RTF text or
HTML hypertext. Text generators can even produce equivalent texts in multiple languages
simultaneously, making them an excellent alternative to automated translation in many domains.
A text generator can also produce synthesized or word-concatenated speech with a
"concept-to-speech" approach, which uses semantic information to generate correct intonation, in
contrast to most current text-to-speech technology.

1.2 Stages

The process used to generate text can be as simple as keeping a list of canned text that is copied
and pasted, possibly linked with some glue text. The results may be satisfactory in simple
domains such as horoscope machines or generators of personalised business letters. However, a
sophisticated NLG system needs to include stages of planning and merging of information to
enable the generation of text that looks natural and does not become repetitive. The typical stages
of natural language generation, as proposed by Dale and Reiter, are listed below; a short
illustrative sketch of these stages follows the list.

1.2.1 Content determination:

Deciding what information to mention in the text. For example, in a pollen forecast, deciding
whether to explicitly mention that the pollen level is 7 in the south east.

1.2.2 Document structuring:

Overall organisation of the information to convey. For example, deciding to describe the areas
with high pollen levels first, instead of the areas with low pollen levels.

1.2.3 Aggregation:

Merging of similar sentences to improve readability and naturalness. For example, merging the
two following sentences, Grass pollen levels for Friday have increased from the moderate to
high levels of yesterday and Grass pollen levels will be around 6 to 7 across most parts of the
country, into the following single sentence:

Grass pollen levels for Friday have increased from the moderate to high levels of yesterday
with values of around 6 to 7 across most parts of the country.

1.2.4 Lexical choice:

Putting words to the concepts. For example, deciding whether medium or moderate should be
used when describing a pollen level of 4.

1.2.5 Referring expression generation:

Creating referring expressions that identify objects and regions. For example, deciding to use
in the Northern Isles and far northeast of mainland Scotland to refer to a certain region in
Scotland. This task also includes making decisions about pronouns and other types of
anaphora.

1.2.6 Realisation:

Creating the actual text, which should be correct according to the rules of syntax, morphology
and orthography. For example, using will be for the future tense of to be.
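
To make these stages concrete, the following is a minimal, illustrative sketch of a simple
template-based pipeline for a pollen forecast; the data values and function names are hypothetical
and are only meant to mirror the stages listed above.

# Hypothetical staged NLG pipeline for a pollen forecast (illustrative only).
pollen_levels = {"south east": 7, "north": 3, "midlands": 6}

def content_determination(data):
    # Keep only the regions worth mentioning (here: level 4 or above).
    return {region: level for region, level in data.items() if level >= 4}

def document_structuring(selected):
    # Describe the areas with the highest pollen levels first.
    return sorted(selected.items(), key=lambda item: item[1], reverse=True)

def lexical_choice(level):
    # Choose a word for the numeric level.
    return "high" if level >= 6 else "moderate"

def realisation(ordered):
    # Produce grammatically correct sentences from the structured content.
    sentences = ["Grass pollen levels will be " + lexical_choice(level) +
                 " (around " + str(level) + ") in the " + region + "."
                 for region, level in ordered]
    return " ".join(sentences)

print(realisation(document_structuring(content_determination(pollen_levels))))

The aggregation and referring expression generation stages are omitted from this sketch for
brevity; in a real system they would merge related sentences and decide how each region is
referred to.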

An alternative way to approach NLG is to use "end-to-end" machine learning to build a system,
without having separate stages as above. In other words, we build an NLG system by training a
machine learning algorithm (often an LSTM) on a large data set of input data and corresponding
(human-written) output texts. The end-to-end approach has perhaps been most successful in
image captioning, that is, automatically generating a textual caption for an image.

1.3 Applications:

1.3.1 Automatic Report Generation :

From a commercial point of view, the most successful NLG applications have been
data-to-text systems which generate textual summaries of databases and data sets; these systems
usually perform data analysis as well as text generation. Research has shown that textual
summaries can be more effective than graphs and other visuals for decision support, and that
computer-generated texts can be superior (from the reader's point of view) to human-written
texts.

1.3.2 Image Captioning :

Over recent years, there has been increased interest in automatically generating captions for
images, as part of a broader effort to investigate the interface between vision and language. A
case of data-to-text generation, the task of image captioning (or automatic image description)
involves taking an image, analysing its visual content, and generating a textual description
(typically a sentence) that expresses the most prominent aspects of the image.

1.3.3 Chatbots :

Another area where NLG has been widely applied is automated dialogue systems, frequently
in the form of chatbots. A chatbot or chatterbot is a software application used to conduct an
online chat conversation through text or text-to-speech, in lieu of providing direct contact with
a live human agent. While Natural Language Processing (NLP) techniques are applied in
interpreting human input, NLG informs the output part of the chatbot algorithms in facilitating
real-time dialogues.

1.3.4 Creative Writing and Computational Humor :

Creative language generation by NLG has been hypothesised since the field's beginnings. A
recent pioneer in the area is Philip Parker, who has developed an arsenal of algorithms capable
of automatically generating textbooks, crossword puzzles, poems and books on topics ranging
from bookbinding to cataracts. The arrival of large pre-trained transformer-based language
models such as GPT-3 has also enabled breakthroughs, with such models demonstrating a
recognisable ability for creative writing tasks.

1.4 Evaluation :

An ultimate objective is how useful NLG systems are at helping people, which is the first of the
above techniques. However, task-based evaluations are time-consuming and expensive, and can
be difficult to carry out (especially if they require subjects with specialised expertise, such as
doctors). Hence (as in other areas of NLP) task-based evaluations are the exception, not the
norm.

Recently, researchers have been assessing how well human ratings and metrics correlate with
(predict) task-based evaluations. Work is being conducted in the context of Generation
Challenges shared-task events. Initial results suggest that human ratings are much better than
metrics in this regard. In other words, human ratings usually predict task-effectiveness at least to
some degree (although there are exceptions), while ratings produced by metrics often do not
predict task-effectiveness well. These results are preliminary. In any case, human ratings are the
most common evaluation technique in NLG; this is in contrast to machine translation, where
metrics are widely used.

Figure 2: NLG application in Research

1.5 History

NLG has been around since ELIZA was created in the mid-1960s; however, the techniques were
not really put to use until the 1990s. NLG techniques range from straightforward template-based
systems that produce letters and emails via mail merge to sophisticated systems that have a
detailed understanding of human grammar. Another method for achieving NLG is to use machine
learning to train a statistical model, generally on a large corpus of human-written texts.

Figure 3: History of NLG

Chapter 2 : Next Word Prediction Model

2.1 Introduction

This section describes what the next word prediction model we build will do. The model
will take into account the last word of a given sentence and predict the next possible word.
Techniques from deep learning, language modelling and natural language processing will all be
used. We will begin by analysing the data, followed by pre-processing it. After tokenizing the
data, we will proceed to creating the deep learning model. The deep learning model will be built
using LSTMs.

2.2 Approach:

Text data sets are readily available; here we consider Project Gutenberg, a volunteer effort to
digitize and archive literary works in order to "encourage the creation and distribution of
eBooks." The stories, documents and text data we need for our problem definition can all be
found there.

Figure 4: Model Plot


2.3 Pre Processing the Dataset:

The very first step is to remove all unnecessary information from the Metamorphosis dataset.
The beginning and the end of the dataset will be removed, since this information is irrelevant for
us. The text should start with the line:

One morning, when Gregor Samsa woke from troubled dreams, he found.

The dataset's final line should read:

first to get up and stretch out her young body.

After completing this step, the file is saved as Metamorphosis clean.txt. We then read the
Metamorphosis clean.txt file using utf-8 encoding and proceed to remove all superfluous new
lines, carriage returns and unwanted Unicode characters. Lastly, we make sure that each of our
words is unique: every word is taken into account only once and all further repeats are dropped.
With less confusion caused by redundant terms, the model will train more smoothly. A sketch of
the pre-processing code for the text data is provided below.
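
The following is a minimal sketch, consistent with the steps described above; the exact file
handling in the original implementation may differ.

# Sketch of the described pre-processing (file name as given in the text).
file = open("Metamorphosis clean.txt", "r", encoding="utf8")
lines = [line for line in file]
file.close()

data = " ".join(lines)
# Remove superfluous new lines, carriage returns and stray Unicode characters.
data = data.replace("\n", " ").replace("\r", " ").replace("\ufeff", " ")
data = " ".join(data.split())          # collapse repeated whitespace

# Keep each word only once, dropping all further repeats (as described above).
unique_words = []
for word in data.split():
    if word not in unique_words:
        unique_words.append(word)
data = " ".join(unique_words)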

2.4 Tokenization

Tokenization is the process of breaking up larger textual data, essays or corpora into smaller,
more manageable units. These smaller units may take the form of shorter documents or lines of
text data. They might also function as a dictionary of words.

The Keras Tokenizer allows us to vectorize a text corpus by converting each piece of text into
either a sequence of integers (each integer being the index of a token in a dictionary) or a vector
whose coefficient for each token is binary, based on word count or on tf-idf.

The texts are subsequently turned into sequences. This is a way of converting the textual
information into integers so that we can analyse it more effectively. Next, we produce the
training dataset. The input text data goes into "X", and the results for the training examples go
into "y". Therefore, for each input "X", "y" contains the corresponding next-word prediction.

We determine the vocabulary size by taking the length of tokenizer.word_index and adding 1
to it. We add 1 because 0 is reserved for padding and we want to start counting from 1. Finally,
we convert the prediction data "y" into categorical data with vocab_size classes; this converts an
integer class vector into a binary class matrix, which is required for our loss, categorical
crossentropy.
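
A sketch of the tokenization and training-data preparation described above is given below; the
pickle file name is an assumption, chosen only so that the prediction sketch in Section 2.9 can
refer to it.

# Sketch of tokenization and (X, y) preparation with the Keras Tokenizer.
import pickle
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
pickle.dump(tokenizer, open("tokenizer1.pkl", "wb"))   # saved for the prediction notebook

sequence_data = tokenizer.texts_to_sequences([data])[0]
vocab_size = len(tokenizer.word_index) + 1             # +1 because 0 is reserved for padding

# Each word is used to predict the word that follows it (input length of 1).
sequences = np.array([sequence_data[i - 1:i + 1] for i in range(1, len(sequence_data))])
X = sequences[:, :1]                                   # shape (num_samples, 1)
y = to_categorical(sequences[:, 1], num_classes=vocab_size)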

2.5 Creating the Model:

We create a sequential model. Next, we set the input and output parameters of an embedding
layer. Given that each prediction will be based on just one word, and that this word will be the
input, it is crucial to set the input length to 1. After that, we add an LSTM layer to the
architecture. We give it 1000 units and make sure to return the sequences, so that the output can
pass through another LSTM layer. For the subsequent LSTM layer we also use 1000 units, but
we do not need to specify the return sequences option, as it is False by default. We then use a
dense (fully connected) layer to pass this through a hidden layer containing 1000 units with a
ReLU activation. Finally, we pass it through an output layer with a softmax activation and the
chosen vocabulary size. The softmax activation gives us a probability for each possible output,
with the number of outputs equal to the vocabulary size.
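
The description above corresponds to the following Keras sketch; the embedding dimension (10
here) is not stated in the text and is an assumption.

# Sketch of the described architecture.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=1))   # one input word per prediction
model.add(LSTM(1000, return_sequences=True))           # return sequences for the next LSTM layer
model.add(LSTM(1000))
model.add(Dense(1000, activation="relu"))              # hidden dense layer (activation assumed)
model.add(Dense(vocab_size, activation="softmax"))     # probabilities over the vocabulary
model.summary()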

2.6 Model Summary

Table 1: Model Summary

2.7 Callbacks

We import the three callbacks needed to train our model: ModelCheckpoint,
ReduceLROnPlateau and TensorBoard. Let us examine the function that each of these callbacks
serves.

2.7.1 ModelCheckpoint :

The weights of our model are stored with this callback during training. By setting the
save_best_only=True option, we keep only the best weights of the model. We monitor our
progress using the loss metric.

2.7.2 ReduceLROnPlateau :

This callback reduces the optimizer's learning rate after a certain number of epochs. Here the
patience is set to 3: if the monitored loss does not improve after three epochs, the learning rate
is reduced by a factor of 0.2. The metric being monitored here is again the loss.

2.7.3 TensorBoard :

The TensorBoard callback is used for visualising the training curves, specifically the plots of
accuracy and loss. Here we will focus only on the loss graph of the next word prediction
model.

The best model according to the monitored loss will be saved to the file nextword1.h5. This file
will be essential when we use the predict function to guess the next word. We wait three epochs
for the loss to decrease; if it does not improve, we reduce the learning rate. Finally, we use the
TensorBoard callback to visualise the graphs and histograms if required.
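
A sketch of these three callbacks, consistent with the settings described above, is shown below;
the log directory and the minimum learning rate are assumptions.

# Sketch of the three callbacks (monitoring the training loss).
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard

checkpoint = ModelCheckpoint("nextword1.h5", monitor="loss",
                             save_best_only=True, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=3,
                              min_lr=0.0001, verbose=1)     # min_lr is an assumption
tensorboard = TensorBoard(log_dir="logs")                    # log directory is an assumption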

2.8 Compile and Fit:

The last phase involves compiling and fitting our model. Here we train the model and save the
best weights to nextword1.h5, so that we can reuse the saved model instead of training it again
every time. In this case I trained only on the training data; you could, however, choose to train
with both validation and test data. We use categorical crossentropy, a loss function that computes
the cross-entropy loss between the labels and the predictions. We compile the model with the
Adam optimizer at a learning rate of 0.001, monitoring the loss metric.
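
A minimal sketch of the compile and fit step follows; the number of epochs is taken from
Section 2.10 and the batch size is an assumption.

# Sketch of compiling and fitting the model with the callbacks from Section 2.7.
from tensorflow.keras.optimizers import Adam

model.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate=0.001))
model.fit(X, y, epochs=150, batch_size=64,      # batch size is an assumption
          callbacks=[checkpoint, reduce_lr, tensorboard])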

Our outcome is displayed below.

Figure 5: Output of our Model

2.9 Prediction:

For the prediction notebook we import the tokenizer file that we saved in pickle format. The
next word model, which was saved during training, is then loaded. Every input text for which
predictions must be made is tokenized using the same tokenizer. After this, we can use the
previously saved model to generate predictions for the input phrase.

While running the predictions we use try and except statements, because we do not want the
programme to exit the loop when a word in the input sentence cannot be found in the vocabulary.
The script keeps running for as long as the user wants; the user must actively decide to end it
whenever they wish, so they may run the programme for as long as they choose.
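
A sketch of such a prediction loop is given below; the file names and the helper logic are
illustrative and only follow the behaviour described above.

# Sketch of the prediction loop (file names are illustrative).
import pickle
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("nextword1.h5")
tokenizer = pickle.load(open("tokenizer1.pkl", "rb"))

def predict_next_word(model, tokenizer, text):
    sequence = np.array(tokenizer.texts_to_sequences([text]))
    predicted_index = int(np.argmax(model.predict(sequence), axis=-1)[0])
    for word, index in tokenizer.word_index.items():
        if index == predicted_index:
            return word
    return ""

while True:
    text = input("Enter a line (type 'stop the script' to quit): ")
    if text.strip().lower() == "stop the script":
        break
    try:
        last_word = text.split()[-1]     # predictions are based on the last word only
        print("Next word:", predict_next_word(model, tokenizer, last_word))
    except Exception:
        print("Could not make a prediction for this input; please try again.")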

2.10 Observation :

For the Metamorphosis dataset we are able to develop a good next word prediction model. In
roughly 150 epochs we are able to reduce the loss considerably. The next word prediction model
we have created is fairly accurate on the available dataset, and the overall quality of the
predictions is good. Specific pre-processing steps and model modifications could be made to
improve the model's predictions further.

Chapter 3 LSTM(Long Short Term Memory)

3.1 Introduction

LSTM is a kind of RNN, a sequential network that allows information to persist. It fixes the
vanishing gradient problem of the RNN. RNNs are mostly used to remember previous
information. Just as humans remember earlier scenes while watching a film, or earlier chapters
while reading a book, RNNs work in a similar manner by remembering prior knowledge and
using it to interpret the current input. The primary flaw of RNNs is that they are unable to
capture long-term dependencies because of the vanishing gradient, and LSTM is designed to
avoid this issue.

Figure 6 : LSTM Architecture

3.2 Gates :

An LSTM recurrent unit attempts to "remember" all of the knowledge that the network has
encountered so far and to "forget" irrelevant information. To do this, several layers of activation
functions, often referred to as "gates", are added for different purposes. Each LSTM recurrent
unit also maintains an internal cell state vector, which conceptually describes the information
that the previous LSTM recurrent unit chose to retain. A Long Short-Term Memory network is
made up of three different gates, each of which performs a particular purpose, as described in
the following:

Figure 7: LSTM Gates

Table 2: LSTM Parameters

3.2.1 Forget Gate:

The primary task of this gate is to decide how much of the previous information should be
forgotten:

fg_t = σ(W_fg · [h_{t−1}, x_t] + b_fg)

3.2.2 Input Gate:

This gate's main duty is to determine how much new information should be written to the
internal cell state:

ip_t = σ(W_ip · [h_{t−1}, x_t] + b_ip)

q~_t = tanh(W_q · [h_{t−1}, x_t] + b_q)

3.2.3 Output Gate:

This gate's main duty is to generate the output from the current cell state:

op_t = σ(W_op · [h_{t−1}, x_t] + b_op)

h_t = op_t ∗ tanh(q_t)
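
The equations above use the cell state q_t, which is obtained by combining the forget and input
gates through the update q_t = fg_t ∗ q_{t−1} + ip_t ∗ q~_t. A minimal NumPy sketch of a single
LSTM cell step using the same symbols is given below; the weight shapes are left abstract and
are purely illustrative.

# Sketch of one LSTM cell step with the gates defined above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, q_prev, W_fg, b_fg, W_ip, b_ip, W_q, b_q, W_op, b_op):
    concat = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    fg_t = sigmoid(W_fg @ concat + b_fg)       # forget gate
    ip_t = sigmoid(W_ip @ concat + b_ip)       # input gate
    q_tilde = np.tanh(W_q @ concat + b_q)      # candidate cell state
    q_t = fg_t * q_prev + ip_t * q_tilde       # cell state update
    op_t = sigmoid(W_op @ concat + b_op)       # output gate
    h_t = op_t * np.tanh(q_t)                  # new hidden state
    return h_t, q_t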

3.3 LSTM Applications:

Applications of LSTM networks can be found in the following fields:

 Language modelling

 Machine translation

 Handwriting recognition

 Image captioning

 Image generation using attention models

 Question answering

 Video-to-text conversion

 Polyphonic music modelling

 Speech synthesis

 Protein structure prediction

3.4 LSTM VS RNN

Consider a situation where you need to modify some data in a calendar. To do this, an RNN
applies a function to the existing data, completely altering it. LSTM, on the other hand, makes
only small modifications to the data through additions and multiplications on the cell state. This
is how LSTM selectively forgets and remembers information, which is why it outperforms
RNNs.

Now imagine that you want to process data with periodic patterns, such as forecasting sales of
coloured powder, which surge around the Indian festival of Holi. A wise move is to look back at
the sales figures from the previous year. So you need to know which information should be
discarded and which should be kept for later use; otherwise, you would need a very good
memory. Recurrent neural networks appear, in theory, to handle this well. However, they fall
short because of their two drawbacks, exploding gradients and vanishing gradients.

To address this issue, LSTM introduces memory units known as cell states. The designed cells
can be thought of as differentiable memory.

Chapter 4: Bi-LSTM(Bidirectional LSTM)

4.1 Introduction

The cells of unidirectional and bidirectional long short-term memory (BiLSTM) networks are
identical; the difference is that the bidirectional network is connected to both the past and the
future. For instance, by feeding the letters of the word "fish" to a unidirectional LSTM one at a
time, it can be taught to predict the word, with the last value being remembered over time by the
recurrent connections. A BiLSTM, on the reverse pass, will also be fed the subsequent letters in
the sequence, giving it access to future information. This teaches the network to fill in gaps
instead of only passing information forward; in an image, for example, it can patch a hole in the
centre rather than only extending it at the edge.

Figure 8: Bi-LSTM Cell

4.2 Model

Unlike traditional LSTMs, bidirectional LSTMs attach two layers that move simultaneously in
different directions in order to get the desired result. With bidirectional processing, data can
travel in two directions: the first is a past-to-future pass, while the second is a future-to-past
pass. This is where the unidirectional and bidirectional versions differ most. In a bidirectional
model, any embedding layer connected to the training data is inspected simultaneously in both
directions. As a consequence, we propose a bidirectional LSTM recurrent neural network as the
solution. The data is acquired, pre-processed and then passed through a word embedding, which
provides each word with a vector that reflects some of its hidden characteristics. For the word
embeddings we employ Global Vectors (GloVe). GloVe is an unsupervised learning technique
for finding word vector representations in datasets. We employ pre-trained GloVe embeddings
to work with our social media news posts. The embedding layer loads its values directly from
GloVe instead of using random weights, as it otherwise would. GloVe applies globally
aggregated co-occurrence statistics to all terms in the news content, and the resulting
representations exhibit useful linear substructures of the word vector space. The converted word
vector data is divided into train and test sets.

We have now incorporated these word embeddings into our bidirectional LSTM-RNN model. A
global pooling layer is used to extract the most useful features from each filter. The value is then
passed through a series of densely connected hidden layers. Finally, a softmax layer is used to
evaluate the validity of a given social media post. All model parameters are subjected to
regularisation in order to prevent overfitting. The design of the bidirectional LSTM-RNN model
is depicted in Figure 8.
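
A minimal Keras sketch consistent with this description is given below; the GloVe file, sequence
length, embedding dimension, layer widths and the number of output classes are not stated in the
text and are assumptions.

# Sketch of a bidirectional LSTM with pre-trained GloVe embeddings (dimensions assumed).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, GlobalMaxPooling1D, Dense

embedding_dim = 100          # assumed GloVe dimension (e.g. glove.6B.100d.txt)
max_len = 20                 # assumed input sequence length
num_classes = 2              # assumed: valid vs. not valid, per the description above

# embedding_matrix is assumed to be filled from the GloVe file, one row per tokenizer index.
embedding_matrix = np.zeros((vocab_size, embedding_dim))

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, weights=[embedding_matrix],
                    input_length=max_len, trainable=False))   # load values directly from GloVe
model.add(Bidirectional(LSTM(128, return_sequences=True)))    # past-to-future and future-to-past
model.add(GlobalMaxPooling1D())                                # keep the most useful features
model.add(Dense(64, activation="relu"))
model.add(Dense(num_classes, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])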

4.3 Bi-LSTM over LSTM

In practical problems this kind of architecture excels, particularly in NLP. The primary reason is
that every component of the input sequence carries information from both the past and the
present. For this reason, by combining LSTM layers running in both directions, a BiLSTM can
generate a more meaningful output.

The BiLSTM produces a different output for each word of the input sequence (sentence). As a
result, the BiLSTM model is useful for a variety of NLP applications, including entity
recognition, translation and sentence classification. Additionally, it has uses in disciplines such
as handwriting recognition, protein structure prediction and speech recognition.

Finally, when discussing the drawbacks of BiLSTM compared to LSTM, it is important to note
that BiLSTM is a significantly slower model and takes more time to train. As a result, we advise
against using it unless absolutely necessary.

4.4 Bi-LSTM Network Architecture

Figure 9: Bi-LSTM Architecture

Bidirectional long short-term memory (Bi-LSTM) means making a neural network process the
sequence data in both directions: backwards (future to past) and forwards (past to future).

A bidirectional LSTM differs from a conventional LSTM in that the data flows in both
directions. With a standard LSTM we can make the input flow in one direction only, either
backwards or forwards. With a bidirectional network, the input can be made to flow in both
directions, preserving both past and future information. Let us look at an illustration to help with
the explanation.

Chapter 5 Result Analysis

The outcomes demonstrate that BiLSTM models outperform conventional unidirectional
LSTMs. It appears that by traversing the input data twice (from left to right and then from right
to left), BiLSTMs are better able to grasp the underlying context. For some types of data, such
as text parsing and predicting the next word of an input sentence, the superior performance of
BiLSTM over the standard unidirectional LSTM is understandable. It was unclear, however,
whether training on numerical time series data in both directions, learning from both the past
and the future, would improve time series forecasting, because some of the context present in
text parsing might not exist there. Our findings demonstrate that BiLSTMs outperform
conventional LSTMs even when used to forecast financial time series data. There are a number
of intriguing questions that may be asked and empirically answered in order to better understand
the distinctions between LSTM and BiLSTM. By doing this, we can gain more knowledge about
the operation and behaviour of these recurrent neural network variants.

Table 3 : Result Analysis

5.1 Loss Calculation

Deep learning algorithms usually report a "loss" figure. Loss is essentially a penalty for a bad
prediction; more precisely, if the model's prediction is perfect, the loss is zero. The objective is
therefore to find a set of weights and biases that minimises the loss.

The outcomes demonstrate that BiLSTM outperformed LSTM, with accuracies of 66.00% and
58.30% respectively. When comparing losses, BiLSTM also has the lowest loss.
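
As an illustration of the categorical crossentropy loss used here, the loss for a single prediction
is the negative log-probability that the model assigned to the true next word; the numbers below
are made up purely for illustration.

# Worked example of categorical crossentropy for one prediction (illustrative numbers).
import numpy as np

y_true = np.array([0.0, 1.0, 0.0])        # one-hot target: the correct next word is class 1
y_pred = np.array([0.2, 0.7, 0.1])        # predicted probabilities from the softmax layer

loss = -np.sum(y_true * np.log(y_pred))   # = -log(0.7), roughly 0.357
print(loss)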

5.2 Model Analysis

Figure 10: Model Analysis

At its core, an LSTM uses the hidden state to preserve information from inputs that have already
been processed. A unidirectional LSTM can only preserve information from the past, because it
has only ever seen inputs from the past.

When using a bidirectional network, the inputs are processed in two different directions: one
from the past to the future and the other from the future to the past. The difference from the
unidirectional case is that, in the LSTM that runs backwards, information from the future is
preserved, and by combining the two hidden states you can preserve information from both the
past and the future at any given time.

What they are suitable for is a complex question, but BiLSTMs perform quite well because they
comprehend context better, as the following example attempts to demonstrate.

Let us imagine we want to predict the next word in a sentence. At a high level, a unidirectional
LSTM will only see

The boys visited ...

and will attempt to predict the next word purely from this context. With a bidirectional LSTM
you are also able to see information further down the road, for instance:

Forward LSTM: The boys visited ...

Backward LSTM: ... afterwards, they exited the pool.

Using the information from the future, you can see how it could be easier for the network to
understand what the next word should be.

Chapter 6 Conclusion and Future scope of work

6.1 Conclusion

A variation on standard LSTMs, deep bidirectional LSTM (BiLSTM) networks train the desired
model not only from inputs to outputs but also from outputs to inputs. Specifically, a BiLSTM
model first feeds the input data to an LSTM model and then repeats the training with an
additional LSTM model, but with the input data sequence reversed. BiLSTM models are
reported to perform better than standard LSTMs.

In this work we conducted a comparative analysis, and the outcomes are displayed in Table 3
and also reflected in the bar chart in Figure 10. The results show that BiLSTM had the best
performance. Given that BiLSTM was developed to address the shortcomings of the LSTM
model, the higher performance of the BiLSTM model was anticipated. In conclusion, it is clear
that using BiLSTM we obtained both the highest accuracy and the lowest loss.

From the training graphs for LSTM and BiLSTM, respectively, we can also observe that there is
significant overfitting.

6.2 Future Scope

Future work would focus on encapsulating BiLSTM in a hybrid CNN/LSTM version. Applying
the BERT technique to NLP and examining how these models behave on datasets from various
fields, as well as on Medium articles, would be another topic of investigation. In the future, a
sizeable dataset will also be collected in order to predict not only the word that comes next, but
also the words needed to finish sentences and to automatically complete search terms or
sentences. We can further extend our model to other natural language generation tasks such as
automatic poem completion or automatic story completion, and we can also make it more
personalised by predicting words from the user's history.

References

[1] P. P. Barman and A. Boruah, "A RNN based approach for next word prediction in Assamese
phonetic transcription," 8th International Conference on Advances in Computing and
Communication, 2018.
[2] R. Perera and P. Nand, "Recent advances in natural language generation: A survey and
classification of the empirical literature," Computing and Informatics, vol. 36, pp. 1-32, 2017.
[3] C. Aliprandi, N. Carmignani, N. Deha, P. Mancarella, and M. Rubino, "Advances in NLP
applied to word prediction," J. Mol. Biol., vol. 147, pp. 195-197, 2008.
[4] C. McCormick, Latent Semantic Analysis (LSA) for Text Classification Tutorial, 2019
(accessed February 3, 2019). http://mccormickml.com/2016/03/25/lsa-for-text-
classification-tutorial/.
[5] Y. Wang, K. Kim, B. Lee, and H. Y. Youn, "Word clustering based on POS feature for
efficient twitter sentiment analysis," Human-centric Computing and Information Sciences, vol. 8,
p. 17, Jun 2018.
[6] N. N. Shah, N. Bhatt, and A. Ganatra, "A unique word prediction system for text entry in
Hindi," in Proceedings of the Second International Conference on Information and
Communication Technology for Competitive Strategies, p. 118, ACM, 2016.
[7] M. K. Sharma and D. Samanta, "Word prediction system for text entry in Hindi," ACM Trans.
Asian Lang. Inform. Process., 2014.
[8] R. Devi and M. Dua, "Performance evaluation of different similarity functions and
classification methods using web based Hindi language question answering system," Procedia
Computer Science, vol. 92, pp. 520-525, 2016.
[9] S. Hochreiter, "The vanishing gradient problem during learning recurrent neural nets and
problem solutions," International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems, vol. 6, no. 2, pp. 107-116, 1998.
[10] D. Pawade, A. Sakhapara, M. Jain, N. Jain, and K. Gada, "Story Scrambler - automatic text
generation using word level RNN-LSTM," Modern Education and Computer Science, 2018.

Copy of Technical Review:

