Seminar Report
on
Next Word Prediction Using NLG
Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology
in
Information Technology
(Session 2022-2023)
November, 2022
Candidate’s Declaration
I hereby declare that the work, which is being presented in the Seminar Report, entitled Next
Word Prediction Using NLG in partial fulfillment for the award of degree of Bachelor of
Technology in Information Technology, and submitted to the Department of Information
Technology, Poornima College of Engineering, Jaipur, is a record of my own work carried
out under the guidance of Dr. Gajendra Rajawat, Department of Information Technology,
Poornima College of Engineering, Jaipur.
I have not submitted the matter presented in this Seminar Report anywhere else for the award of any other
Degree.
[Signature of Student]
Sanyam Modi (PCE19IT051)
Date:
Place:

HOD
Information Technology
Date:

DEPARTMENT OF INFORMATION TECHNOLOGY
CERTIFICATE
This is to certify that the seminar report entitled Next Word Prediction Using NLG has been
submitted by Sanyam Modi (PCE19IT051) in partial fulfillment of the requirements for the award
of the degree of Bachelor of Technology in Information Technology during the Odd Semester of
Session 2022-23.
The seminar report is found satisfactory and approved for final submission.

ACKNOWLEDGEMENT

I wish to express my sincere thanks and gratitude to Dr. Mahesh Bundele (Principal & Director,
PCE) and Mr. Pankaj Dhemla (Vice-Principal, PCE) for their helping attitude and keen interest
in completing this dissertation in time.
Sanyam Modi
PCE19IT051
Table of Contents
CHAPTER NO.    PARTICULARS    PAGE NO.
Cover Page & Title Page i
Candidate’s Declaration ii
Certificate by the Department iii
Acknowledgement iv
Table of Contents v
List of Tables vi
List of Figures vii
List of Abbreviations viii
Abstract 1
Chapter 1 Introduction to NLG 2
1.1 Introduction 2
1.2 Stages 3
1.3 Applications 5
1.4 Evaluation 6
1.5 History 8
Chapter 2 Next Word Prediction Model 9
2.1 Introduction 9
2.2 Approach 9
2.3 Pre Processing Dataset 10
2.4 Tokenization 10
2.5 Creating the Model 11
2.6 Model Summary 12
2.7 Callbacks 12
2.8 Compile and Fit 12
2.9 Predictions 14
2.10 Observations 14
Chapter 3 LSTM(Long Short Term Memory) 15
3.1 Introduction 15
3.2 Gates 15
3.3 Applications 17
3.4 LSTM vs RNN 18
Chapter 4 Bi-LSTM 19
4.1 Introduction 19
4.2 Model 19
4.3 Bi-LSTM over LSTM 20
4.4 Bi-LSTM Architecture 21
Chapter 5 Result Analysis 22
5.1 Loss Calculations 22
5.2 Model Analysis 23
Chapter 6 Conclusion and Future scope of work 25
6.1 Conclusion 25
6.2 Future Scope 25
References 26
Appendices (if any) -
Copy of the certificate of technical or review paper. -
Copy of the technical or review paper. 28
LIST OF TABLES
3 Result Analysis 22
LIST OF FIGURES
1 Venn-Diagram of NLP 2
2 NLG Application in Research 7
3 History of NLG 8
4 Model Plot 9
6 LSTM Architecture 15
7 LSTM Gates 16
8 Bi-LSTM Cell 19
9 Bi-LSTM Architecture 21
10 Model Analysis 23
LIST OF ABBREVIATIONS
ML Machine Learning
AI Artificial Intelligence
RNN Recurrent Neural Network
LSTM Long Short-Term Memory
BI-LSTM Bidirectional Long Short-Term Memory
NLG Natural Language Generation
NLP Natural Language Processing
Abstract
Next Word Prediction means predicting the next word in the sentence a user is currently
typing. With the help of this model we can suggest the next word to the user, so that the user
does not have to type it; this reduces keystrokes, increases typing speed, and leads to fewer
spelling mistakes. To build this model we use two deep learning algorithms: LSTM (Long
Short-Term Memory) and BiLSTM (Bidirectional Long Short-Term Memory).
Deep learning is a subclass of machine learning that imitates the way the human brain
processes information and finds patterns in data for decision making. Essentially, a deep
learning model consists of networks capable of learning from unlabelled, unstructured data.
The next-word prediction is performed on a dataset consisting of text. Next Word Prediction
is an application of NLP (Natural Language Processing) and is also known as Language
Modelling: the process of predicting the following word in a sentence. It has many
applications used by most of us, such as autocorrect, which is widely used in e-mails and
messages, and suggestions in MS Word or Google Search, where the next word is predicted
based on our search history or the query typed so far. In this work we studied NLP and deep
learning techniques such as LSTM and BiLSTM and performed a comparative study. We
obtained good results with both models: the accuracies obtained using BiLSTM and LSTM
are 66.1% and 58.27% respectively.
Chapter 1 Introduction to NLG
1.1 Introduction
Natural language generation is, in a sense, the opposite of NLP applications such as speech
recognition and grammar checking, because it involves converting some form of automated,
machine-internal representation into natural language text. NLG is to be distinguished from
superficially similar techniques, commonly referred to by names such as "document
generation", "report generation", "mail merging", etc. These techniques involve simply
plugging a fixed data structure, such as a table of numbers or a list of names, into a template
in order to produce complete documents. Due to their limited flexibility, they tend to produce
rigid text, often containing grammatical errors (for example, "you have got one choices
remaining").
NLG, on the other hand, makes use of some level of underlying linguistic representation of
the text, in order to ensure that it is grammatically correct and fluent. Most NLG systems
consist of a syntactic realizer (RealPro is an example), which guarantees that grammatical
rules such as subject-verb agreement are obeyed, and a text planner (such as one created in
the Exemplars framework), which decides how to arrange sentences, paragraphs, and other
components of a text coherently. Practically any length or style of text can be produced,
ranging from a few phrases to a whole document, and various output formats can be
generated, such as RTF text or HTML hypertext. Text generators can even produce
equivalent texts in multiple languages simultaneously, and NLG output can be combined
with speech synthesis that makes use of semantic information to generate accurate intonation.
1.2 Stages
The process of generating text can be as straightforward as keeping a list of canned text that
is copied and pasted, possibly linked with some glue text. More sophisticated systems
incorporate stages of planning and merging of information to enable the generation of text
that looks natural and does not become repetitive. The typical stages of natural language
generation are as follows.
1.2.1 Content determination:
Deciding what information to mention in the text. For example, in a pollen forecast,
deciding whether to explicitly mention that the pollen level is 7 in the south east.
1.2.2 Document structuring:
Overall organisation of the information to convey. For instance, deciding to describe the
areas with high pollen levels first, instead of the areas with low pollen levels.
1.2.3 Aggregation:
Merging similar sentences to improve readability and naturalness. For instance, combining
"Grass pollen levels for Friday have increased from the moderate to high levels of
yesterday" and "Grass pollen levels will be around 6 to 7 across most parts of the country"
into a single sentence.
1.2.4 Lexical choice:
Putting words to the concepts. For instance, deciding whether "medium" or "moderate"
should be used when describing a pollen level of 4.
1.2.5 Referring expression generation:
Creating referring expressions that identify objects and regions. For example, deciding to
use "in the Northern Isles and far northeast of mainland Scotland" to refer to a certain
region in Scotland. This task also includes making decisions about pronouns and other
types of anaphora.
1.2.6 Realisation:
Creating the actual text, which should be correct according to the rules of syntax,
morphology, and orthography. For example, using "will be" for the future tense of "to be".
An alternative approach to NLG is to use "end-to-end" machine learning to build a system,
without having separate stages as above; that is, we build an NLG system by training a
machine learning model on input data and corresponding (human-written) output texts. The
end-to-end approach has perhaps been most successful in image captioning, that is,
automatically generating a textual caption for an image.
1.3 Applications:
According to a business point of view, the best NLG applications have been information
to-message frameworks which produce literary synopses of data sets and informational
message age. Research has demonstrated the way that text based outlines can be more
compelling than diagrams and other visuals for choice help, and that PC created texts can
Throughout recent years, there has been an expanded revenue in naturally producing
connection point among vision and language. An instance of information to-message age,
the calculation of picture inscribing (or programmed picture depiction) includes taking a
5
picture, breaking down its visual substance, and producing a text based portrayal
(commonly a sentence) that expresses the most unmistakable parts of the picture.
1.3.3 Chatbots:
Another area where NLG has been widely applied is automated dialogue systems,
frequently in the form of chatbots, which conduct an online conversation through text or
text-to-speech in lieu of providing direct contact with a live human agent. While Natural
Language Processing (NLP) techniques are applied in interpreting human input, NLG
provides the output part of the conversation.
1.3.4 Creative writing:
Creative language generation by NLG has been hypothesized since the field's beginnings.
A recent pioneer in the area is Philip Parker, who has developed an arsenal of algorithms
capable of automatically generating books on topics ranging from bookbinding to
cataracts. The advent of large pre-trained transformer-based language models such as
GPT-3 has also enabled breakthroughs, with such models demonstrating a recognizable
capability for creative writing tasks.
1.4 Evaluation:
The ultimate question is how useful NLG systems are at helping people, which is assessed
by task-based evaluation. However, task-based evaluations are time-consuming and
expensive, and can be hard to carry out (particularly if they require subjects with specialised
expertise, such as doctors). Hence, as in other areas of NLP, task-based evaluations are
relatively rare.
Recently, researchers have been assessing how well human ratings and automatic metrics
correlate with (that is, predict) task-based evaluations. This work is being conducted in the
context of Generation Challenges shared-task events. Initial results suggest that human
ratings predict task-effectiveness at least to some degree, although there are exceptions.
These results are preliminary; in any case, human ratings remain the most common
evaluation technique in NLG, while automatic metrics are also broadly used.
1.5 History
NLG has been around since ELIZA was created in the mid-1960s, but the techniques were
not really put to commercial use until the 1990s. NLG techniques range from
straightforward template-based systems that produce letters and emails via mail merge to
sophisticated systems that comprehend human grammar in great detail. Another method
for achieving NLG is to use machine learning to train a statistical model, typically on a
large corpus of human-written texts.
Chapter 2 : Next Word Prediction Model
2.1 Introduction
This section covers what the next word prediction model will do. The model will take into
account the final word of a given sentence and forecast the next possible word. Deep learning,
language modelling, and natural language processing techniques will all be used. We will begin
by pre-processing and evaluating the data. After tokenizing the data, we will proceed to creating
the deep learning model using LSTMs, and finally train it and make predictions.
2.2 Approach:
Text data sets are readily available, and we use Project Gutenberg, a volunteer initiative to
scan and archive literary works in order to "encourage the creation and distribution of eBooks."
The stories, documents, and text data we need for our problem definition may be obtained from
there. For this work we use Franz Kafka's Metamorphosis.

2.3 Pre-Processing the Dataset:
The very first step is to purge the Metamorphosis dataset of all extraneous information. The
dataset's beginning and end will be removed, as this information is irrelevant to us. The dataset
then starts with the line:

One morning, when Gregor Samsa woke from troubled dreams, he found.

We save the file as Metamorphosis clean.txt after completing this step, and retrieve it using
utf-8 encoding. We then replace all superfluous new lines, carriage returns, and unwanted
Unicode characters. Lastly, we make sure that all of our words are unique: each word is taken
into account only once, and all further repeats are dropped. The model trains more smoothly as a
result, since redundant terms cause less confusion. A sketch of the pre-processing code is
provided below.
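The cleaning steps above can be sketched as follows. This is a minimal illustration, not the report's original code; the exact character filter is an assumption.

```python
import re

def clean_text(text):
    # Replace carriage returns, new lines and stray Unicode BOM characters
    text = text.replace("\r", " ").replace("\n", " ").replace("\ufeff", "")
    # Keep only letters, digits, apostrophes and spaces (assumed filter)
    text = re.sub(r"[^a-zA-Z0-9' ]", " ", text)
    words = text.lower().split()
    # Keep each word only once, preserving first-occurrence order,
    # as described in the text above
    return list(dict.fromkeys(words))
```

In practice this function would be applied to the contents of Metamorphosis clean.txt read with utf-8 encoding.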
2.4 Tokenization
Tokenization is the process of breaking up larger textual data, essays, or corpora into smaller,
more manageable chunks. These smaller units may take the form of shorter documents or lines
of text.
The Keras Tokenizer enables us to vectorize a text corpus by converting each piece of text into
either a sequence of integers (each integer representing the index of a token in a dictionary) or a
vector with a binary coefficient for each token, based on word count or on tf-idf.
The texts will subsequently be turned into sequences. This is a way of converting the textual
information into integers so that we can analyse it more effectively. Next, we produce the
training dataset. The input words go into 'X', and the corresponding targets go into 'y'; for each
input 'X', 'y' contains the word that follows it.
By using the length extracted from tokenizer.word_index and adding 1 to it, we determine the
vocabulary size. We add 1 because 0 is reserved for padding and we intend to begin counting
from 1. Finally, we transform the prediction data (y) into categorical data of vocab_size classes.
This method changes an integer class vector into a binary class matrix, which matches our
categorical cross-entropy loss.
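The tokenization and training-pair construction can be sketched as follows. The one-line corpus here is purely illustrative; in the report the full cleaned Metamorphosis text is used.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

corpus = ["one morning when gregor samsa woke from troubled dreams"]  # illustrative
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
# +1 because index 0 is reserved for padding
vocab_size = len(tokenizer.word_index) + 1

seq = tokenizer.texts_to_sequences(corpus)[0]
# Each input word in X is paired with the word that follows it in y
X = np.array(seq[:-1]).reshape(-1, 1)
y = to_categorical(np.array(seq[1:]), num_classes=vocab_size)
```

Here `to_categorical` performs the integer-to-binary-class-matrix conversion described above.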
2.5 Creating the Model
We create a sequential model. First, we add an embedding layer and set its input and output
dimensions. It is crucial to set the input length to 1, since each prediction will be based on just
one word and that word will be the subject of the response. After that, we add an LSTM layer
with 1000 units, making sure to set return_sequences to true so that the output can pass through
another LSTM layer. The second LSTM layer also has 1000 units, but we do not need to set the
return sequence there, as it is false by default. We then send the output through a densely
connected hidden layer of 1000 units, and finally through an output layer with a softmax
activation and the chosen vocabulary size, which gives a probability for each word in the
vocabulary as the possible next word.
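A sketch of the model described above. The layer sizes (1000 LSTM units, 1000 dense units) follow the text; the embedding dimension of 10, the ReLU activation of the hidden layer, and the vocabulary size are assumptions for illustration.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

vocab_size = 2000  # assumed; in the report it comes from tokenizer.word_index

model = Sequential([
    Input(shape=(1,)),                        # one input word per sample
    Embedding(vocab_size, 10),                # embedding dimension assumed
    LSTM(1000, return_sequences=True),        # pass sequences to the next LSTM
    LSTM(1000),                               # return_sequences defaults to False
    Dense(1000, activation="relu"),           # densely connected hidden layer
    Dense(vocab_size, activation="softmax"),  # probability for each word
])
```

Calling `model.summary()` on this model prints the layer-by-layer parameter counts of the kind shown in the Model Summary section.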
2.6 Model Summary
2.7 Callbacks
Three callbacks will be imported so that we may train our model: ModelCheckpoint,
ReduceLROnPlateau, and TensorBoard.
2.7.1 ModelCheckpoint:
After training, the parameters of our model are stored using this callback. By setting the
save_best_only=True option, we only keep the best weights found during training, as
measured by the monitored loss.
2.7.2 ReduceLROnPlateau:
After a certain number of epochs without improvement, the optimizer's learning rate is
reduced using this callback. Here, the patience has been set to 3: after three epochs, if the
monitored loss does not improve, our learning rate is lowered by a factor of 0.2.
2.7.3 TensorBoard:
For the purpose of visualising the training curves, specifically the plots for accuracy and
loss, the TensorBoard callback is used. Here, we will focus on the next word prediction
loss graph.
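The three callbacks can be sketched as follows. The monitored metric, file name (nextword1.h5), patience, and factor follow the report; the log directory and minimum learning rate are assumptions.

```python
from tensorflow.keras.callbacks import (ModelCheckpoint, ReduceLROnPlateau,
                                        TensorBoard)

# Save only the best weights, judged by the monitored training loss
checkpoint = ModelCheckpoint("nextword1.h5", monitor="loss",
                             save_best_only=True, verbose=1)
# After 3 epochs without improvement, multiply the learning rate by 0.2
reduce_lr = ReduceLROnPlateau(monitor="loss", patience=3,
                              factor=0.2, min_lr=1e-5, verbose=1)
# Write logs for visualising the loss curve (directory name assumed)
tensorboard = TensorBoard(log_dir="logs")

callbacks = [checkpoint, reduce_lr, tensorboard]
```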
The best model according to the monitored loss will be saved to the file nextword1.h5. When
using the predict function to guess our next word, this file will be essential. We wait three
epochs for the loss to decrease; if it does not improve, we lower the learning rate. Finally, if
necessary, we visualise the graphs.
2.8 Compile and Fit:
The last phase involves compiling and fitting our model. Here we train the model and save
the best parameters to nextword1.h5, so that we may use the saved model instead of repeatedly
training it. In this case, only the training data is used; you can, however, decide to train using
validation and test data as well. We use categorical cross-entropy, a loss function that calculates
the cross-entropy loss between labels and predictions. We build our model using the Adam
optimizer with a 0.001 learning rate, monitoring the loss metric.
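The compile-and-fit step can be sketched with a scaled-down, self-contained example. The loss, optimizer, and learning rate follow the report; the random data, tiny layer sizes, epoch count, and batch size here are assumptions so the sketch runs on its own.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

vocab_size = 50  # tiny illustrative vocabulary
# Random (input word -> next word) pairs standing in for the real dataset
X = np.random.randint(1, vocab_size, size=(200, 1))
y = to_categorical(np.random.randint(1, vocab_size, size=200), vocab_size)

model = Sequential([
    Input(shape=(1,)),
    Embedding(vocab_size, 10),
    LSTM(32),
    Dense(vocab_size, activation="softmax"),
])
# Categorical cross-entropy with Adam at learning rate 0.001, as in the report
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=0.001))
history = model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```

In the actual pipeline, `callbacks=[checkpoint, reduce_lr, tensorboard]` would be passed to `fit`, and training would run for roughly 150 epochs.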
2.9 Prediction:
For the prediction script we import the tokenizer, which we saved in pickle format, and load
the next word model saved as nextword1.h5. Every input sentence for which predictions must
be made is tokenized using the same tokenizer. Following this, we use the previously trained
model to predict the most likely next word.
While running the predictions, we use try and except statements, because we do not want the
programme to exit the loop in the event that an input word cannot be found in the vocabulary.
The script runs for as long as the user requests; the user actively decides when to end it.
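The prediction step can be sketched as follows. The model file name follows the report; the tokenizer file name and the helper-function names are assumptions for illustration.

```python
import pickle
import numpy as np
from tensorflow.keras.models import load_model

def load_artifacts(model_path="nextword1.h5", tok_path="tokenizer.pkl"):
    # Load the trained model and the tokenizer saved during training
    model = load_model(model_path)
    with open(tok_path, "rb") as f:
        tokenizer = pickle.load(f)
    return model, tokenizer

def predict_next(model, tokenizer, text):
    # Tokenize the input with the SAME tokenizer used in training,
    # then return the word with the highest softmax probability
    try:
        seq = np.array(tokenizer.texts_to_sequences([text]))
        pred_id = int(np.argmax(model.predict(seq, verbose=0)))
        return tokenizer.index_word.get(pred_id, "<unknown>")
    except Exception:
        # Out-of-vocabulary or malformed input: keep the loop alive
        return None
```

A driver loop would repeatedly read a word from the user, call `predict_next`, and stop only when the user asks to quit.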
2.10 Observations:
For the Metamorphosis dataset, we can develop a good next word prediction model. In roughly
150 epochs, we are able to reduce the loss considerably. On the available dataset, the next word
prediction model we have created is fairly accurate, and the overall quality of the predictions is
good. To improve the model's predictions further, additional pre-processing procedures and
model modifications could be applied.
Chapter 3 LSTM(Long Short Term Memory)
3.1 Introduction
LSTM is a kind of RNN, a sequential network that allows information to persist. It fixes the
RNN's vanishing gradient problem. RNNs are mostly used where persistent memory is needed.
Similar to how humans recall context while watching a film or reading a book, RNNs function
by remembering prior information and using it to interpret the current input. The primary flaw
of RNNs is that they are unable to capture long-term dependencies due to the vanishing
gradient, and LSTM is built specifically to avoid this issue.
3.2 Gates:
An LSTM recurrent unit attempts to "remember" all of the knowledge that the network has
seen so far and to "forget" irrelevant information. Additionally, an internal cell state vector is
maintained by each LSTM recurrent unit; conceptually, this vector describes the information
that the preceding LSTM recurrent units chose to retain. A Long Short-Term Memory network
is made up of three separate gates, each of which serves a distinct purpose.

Figure 7: LSTM Gates

3.2.1 Forget Gate:
This gate determines how much of the previous cell state should be kept or discarded.

f_t = σ(W_f · [h_(t-1), x_t] + b_f)

3.2.2 Input Gate:
This gate's main duty is to determine how much new data should be written to the internal
cell state.

i_t = σ(W_i · [h_(t-1), x_t] + b_i)
q̃_t = tanh(W_q · [h_(t-1), x_t] + b_q)

The internal cell state is then updated as:

q_t = f_t * q_(t-1) + i_t * q̃_t

3.2.3 Output Gate:
This gate's main duty is to generate the output from the current state.

o_t = σ(W_o · [h_(t-1), x_t] + b_o)
h_t = o_t * tanh(q_t)
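The gate equations above can be illustrated with a toy single-step LSTM cell in NumPy. All weights here are random and the dimensions are arbitrary; this is purely a sketch of the equations, not the Keras implementation used in the model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, q_prev, W, b):
    # Concatenate previous hidden state and current input: [h_(t-1), x_t]
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    q_tilde = np.tanh(W["q"] @ z + b["q"])    # candidate cell state
    q_t = f_t * q_prev + i_t * q_tilde        # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(q_t)                  # new hidden state
    return h_t, q_t

# Tiny usage example with 3 hidden units and 2 input features
rng = np.random.default_rng(0)
n_h, n_x = 3, 2
W = {k: rng.normal(size=(n_h, n_h + n_x)) for k in "fiqo"}
b = {k: np.zeros(n_h) for k in "fiqo"}
h, q = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h), W, b)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every component of the hidden state h_t stays inside (-1, 1).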
3.3 Applications:
Language modelling
Machine translation
Handwriting recognition
Image captioning
Question answering
Speech synthesis
Protein structure prediction
3.4 LSTM vs RNN
Consider the situation where you need to edit some data in a calendar. An RNN applies
a function to the existing data to accomplish this, entirely altering it. In contrast, an LSTM
makes only minor adjustments to the information, through cell-state-based addition and
multiplication. This is how LSTM is able to forget and remember things selectively.
Now imagine that you would like to process data that has periodic patterns, such as
forecasting coloured powder sales that surge around the Indian holiday of Holi. Looking back
at the sales figures from the prior year is a wise move. Therefore, a model must understand
which information should be deleted and which should be kept for future use; otherwise, it
needs a very long memory. Theoretically, recurrent neural networks appear to be effective at
this. However, they fall short due to two drawbacks: the exploding gradient and the vanishing
gradient.
To address this issue, LSTM introduces memory units known as cell states, which carry
information across many time steps with only small, gated modifications.
Chapter 4: Bi-LSTM(Bidirectional LSTM)
4.1 Introduction
In a BiLSTM, two LSTM networks with identical structure process the sequence, one forward
and one backward. The network differs from a unidirectional LSTM because it connects to both
the past and the future. For instance, by feeding the letters of a word to a unidirectional LSTM
one at a time, it can be taught to predict the word "fish", with the last value being remembered
by the recurrent connections over time. On the reverse path, a BiLSTM is also fed the
subsequent letters, providing it with access to future information. This teaches the network to
fill in gaps rather than only forward information, analogous to patching a hole in the centre of
an image rather than extending it at the periphery.
4.2 Model
The model consists of two LSTM layers moving in different directions in order to produce the
desired result. Data can travel in two directions when using a bidirectional network: the first is
a past-to-future pass, while the second is a future-to-past pass. This is where the unidirectional
and bidirectional versions differ most: in a unidirectional model, only past information
contributes to the solution.
The data is acquired, pre-processed, and then applied to a word embedding, which provides
each word with a vector that reflects some of its latent characteristics. We employ Global
Vectors (GloVe) for the word embeddings, using pre-trained GloVe embeddings to work with
our social media news posts. The hidden layer loads its values straight from GloVe instead of
using arbitrary weights, as it would otherwise do. GloVe applies globally aggregated
co-occurrence statistics to all terms in the news content, and the resulting representations help
capture key linear substructures of the word vector space. The converted word vectors are then
divided into train and test sets.
We then feed these word embeddings into our Bidirectional LSTM-RNN model. To extract the
most beneficial features from each filter, a global pooling layer is used. The value is then routed
through a series of densely connected hidden layers. Finally, a softmax layer is employed to
evaluate the validity of a specific social media message. All model parameters are subjected to
regularisation in order to prevent overfitting.
4.3 Bi-LSTM over LSTM
In practical problems this kind of design excels, particularly in NLP. The primary explanation
is that each element of the output then carries information from both the past and the future of
the input sequence. By combining LSTM layers from both directions, BiLSTM can generate a
more meaningful output, and each word in the sequence receives a different result depending
on both of its contexts. BiLSTM also has uses in disciplines including handwriting recognition
and protein structure prediction. Finally, when discussing the drawbacks of BiLSTM in
comparison to LSTM, it is important to note that BiLSTM is a significantly slower model and
requires more training time.
4.4 Bi-LSTM Architecture
Making a neural network process the sequence data in both directions, backwards (future to
past) and forwards (past to future), is known as bidirectional long short-term memory
(Bi-LSTM). A bidirectional LSTM differs from a conventional LSTM in that the data flows in
both directions. With a standard LSTM we can make input flow in one direction only, either
backwards or forwards; with a bidirectional network, input can be made to flow in both
directions, preserving both past and future information.
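In Keras, the two-direction architecture described above is obtained by wrapping an LSTM layer in `Bidirectional`. This is an illustrative sketch; the report does not give exact layer sizes for its BiLSTM model, so the dimensions here are assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense

vocab_size = 2000  # assumed
model = Sequential([
    Input(shape=(1,)),
    Embedding(vocab_size, 10),
    # A forward LSTM and a backward LSTM; their hidden states are
    # concatenated, so the output dimension is 2 x 64 = 128
    Bidirectional(LSTM(64)),
    Dense(vocab_size, activation="softmax"),
])
```

The concatenation of the forward and backward hidden states is what lets each position see both its past and its future context.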
Chapter 5 Result Analysis
5.1 Loss Calculations
A natural question is whether BiLSTMs are preferable to unidirectional LSTMs. It appears that
by traversing the input data twice (from left to right and then from right to left), BiLSTMs are
better able to grasp the underlying context. For some types of data, such as text parsing and
word prediction within a sentence, the superior performance of BiLSTM over the standard
unidirectional LSTM is understandable. It was unclear, nevertheless, whether training on
numerical time series data in both directions and learning from both the past and the future
would improve time series forecasting, because some of the context present in text parsing
might not exist there. Findings in the literature suggest that BiLSTMs can outperform
conventional LSTMs even when used to forecast financial time series data. There are a number
of intriguing questions that may be asked and empirically answered in order to better
understand the distinctions between LSTM and BiLSTM and thereby gain more knowledge
about the operation and behaviour of these models.
Deep learning algorithms often report "loss" figures. Loss is essentially a penalty for making a
bad prediction: if the model's forecast were perfect, the loss would be zero. The objective,
therefore, is to obtain weights and biases that minimise the loss.
The outcomes demonstrate that BiLSTM outperformed LSTM, with accuracies of 66.00% and
58.30% respectively. BiLSTM also has the lowest loss when comparing losses.
5.2 Model Analysis
At its core, LSTM uses the hidden state to preserve information from inputs that have already
been processed. Because it has only ever received inputs from the past, a unidirectional LSTM
can only preserve information about what came before.
When using a bidirectional model, the inputs are processed in two different directions: one
from the past to the future and the other from the future to the past. This method differs from
the unidirectional one in that information from the future is preserved in the LSTM that runs
backwards; by combining the two hidden states, the model can preserve information from both
the past and the future at any point in time.
What such models are best suited for is a complex question, but BiLSTMs perform quite well
because they capture context more fully. Suppose we want to anticipate the next word in a
sentence. A unidirectional LSTM will attempt to predict the next word purely based on the
words seen so far. With a bidirectional LSTM, the model can also view information further
along in the sequence: the forward LSTM reads the words before the gap, and the reverse
LSTM reads the words after it. One can see how it could be simpler for the network to
determine the next word if it can use knowledge from both sides.
Chapter 6 Conclusion and Future scope of work
6.1 Conclusion
A variation on standard LSTMs, deep bidirectional LSTM (BiLSTM) networks train the
desired model not only from inputs to outputs but also from outputs to inputs. Specifically, a
BiLSTM model first feeds the input data to an LSTM model (forward layer), and then repeats
the training using an additional LSTM model with the input data sequence reversed (backward
layer). BiLSTM models are therefore said to perform better than standard LSTMs.
In this work, we conducted a comparative analysis, and the outcomes are displayed in
Table 3 and also reflected in the bar chart in Figure 10. The results show that BiLSTM had the
best performance. Given that BiLSTM was developed to address the LSTM model's
shortcomings, the higher performance of the BiLSTM model was anticipated. In conclusion, it
is clear that using BiLSTM we obtained both the highest accuracy and the lowest loss.
We can also observe the training behaviour from the graphs in Figures 6 and 8 for LSTM
and BiLSTM respectively.
6.2 Future Scope
Future work would focus on encapsulating BiLSTM in a hybrid CNN/LSTM version.
Applying the BERT technique to NLP and examining how these models behave on datasets
from various fields, as well as on Medium articles, would be another topic of investigation. In
the future, a sizeable dataset will also be collected in order to forecast not only the word that
will come next, but also the words needed to finish sentences and to automatically complete
search terms or sentences. We can further extend our model to other natural language
generation tasks such as auto poem completion or auto story completion, and make it more
personalised by predicting words from the user's history.
References
[1] P. P. Barman and A. Boruah, "A RNN based approach for next word prediction in Assamese
phonetic transcription," 8th International Conference on Advances in Computing and
Communication, 2018.
[2] R. Perera and P. Nand, "Recent advances in natural language generation: A survey and
classification of the empirical literature," Computing and Informatics, vol. 36, pp. 1-32, 2017.
[3] C. Aliprandi, N. Carmignani, N. Deha, P. Mancarella, and M. Rubino, "Advances in NLP
applied to word prediction," 2008.
[4] C. McCormick, Latent Semantic Analysis (LSA) for Text Classification Tutorial, 2019
(accessed February 3, 2019). http://mccormickml.com/2016/03/25/lsa-for-text-classification-
tutorial/.
[5] Y. Wang, K. Kim, B. Lee, and H. Y. Youn, "Word clustering based on POS feature for
efficient twitter sentiment analysis," Human-centric Computing and Information Sciences,
vol. 8, p. 17, Jun 2018.
[6] N. N. Shah, N. Bhatt, and A. Ganatra, "A unique word prediction system for text entry in
Hindi," in Proceedings of the Second International Conference on Information and
Communication Technology for Competitive Strategies, p. 118, ACM, 2016.
[7] M. K. Sharma and D. Samanta, "Word prediction system for text entry in Hindi," ACM
Trans. Asian Lang. Inform. Process., 2014.
[8] R. Devi and M. Dua, "Performance evaluation of different similarity functions and
classification methods using web based Hindi language question answering system," Procedia
Computer Science, vol. 92, pp. 520-525, 2016.
[9] S. Hochreiter, "The vanishing gradient problem during learning recurrent neural nets and
problem solutions," International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems, vol. 6, no. 02, pp. 107-116, 1998.
[10] D. Pawade, A. Sakhapara, M. Jain, N. Jain, and K. Gada, "Story scrambler - automatic text
generation using word level RNN-LSTM," Modern Education and Computer Science, 2018.
Copy of Technical Review: