
Why LSTM when we have RNN?

A sentence or phrase only holds meaning when every word in it is associated with the words before and after it. LSTM, short for Long Short-Term Memory, extends the RNN by adding both short-term and long-term memory components so it can learn from sequential data more effectively. Hence, it's great for Machine Translation, Speech Recognition, time-series analysis, etc.

Why does LSTM outperform RNN?

First, let's take a comparative look at an RNN and an LSTM.

[Figure: comparison of an RNN cell and an LSTM cell. Source – ResearchGate]

RNNs have convincingly proven their performance in sequence learning. However, it has been widely observed that RNNs struggle when handling long-term dependencies.


Why so?

Vanishing Gradient Problem

Recurrent Neural Networks use the hyperbolic tangent activation function, what we call the tanh function. The range of this activation function is [-1, 1], and its derivative lies in [0, 1]. An RNN is effectively a deep sequential neural network: as the input sequence grows, so does the number of matrix multiplications chained through the network. When we apply the chain rule of differentiation during backpropagation, the network therefore keeps multiplying the gradient by these small derivative terms. Repeatedly multiplying values whose magnitude is less than 1 makes the product exponentially smaller, squeezing the final gradient to almost 0; the weights are no longer updated, and model training halts. This leads to poor learning, which is what we mean when we say RNNs "cannot handle long-term dependencies".
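
To see this numerically, here is a small NumPy sketch (not from the article; the sizes and weight scale are illustrative assumptions) that repeatedly multiplies a gradient by the recurrent weights and the tanh derivative, the way backpropagation through time does, and prints how quickly its norm collapses:

```python
# Rough illustration: how the gradient norm shrinks when backpropagated
# through a long unrolled RNN with tanh activations.
import numpy as np

rng = np.random.default_rng(0)
hidden_size, seq_len = 32, 100

W_hh = rng.normal(0.0, 0.1, size=(hidden_size, hidden_size))  # recurrent weights
h = rng.normal(size=hidden_size)                               # a hidden state
grad = np.ones(hidden_size)                                    # gradient at the last step

for t in range(seq_len):
    h = np.tanh(W_hh @ h)
    # Backprop through one step multiplies by W_hh^T and by tanh'(x) = 1 - tanh(x)^2,
    # which always lies in [0, 1].
    grad = (W_hh.T @ grad) * (1.0 - h ** 2)
    if t % 20 == 0:
        print(f"step {t:3d}: gradient norm = {np.linalg.norm(grad):.3e}")
```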

Long Term Dependency Issue in RNNs

Let us consider a sentence:

"I am a data science student and I love machine ______."

We know the blank has to be filled with "learning". But what if there were many more words after "I am a data science student", as in "I am a data science student pursuing an MS from the University of…… and I love machine ______"?

This time, however, the RNN fails to fill in the blank. In this case we do not actually need the extra information, "pursuing an MS from the University of……". What LSTMs do is leverage their forget gate to discard such unnecessary information, which is what allows them to handle long-term dependencies.
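
For readers who want to see where the forget gate sits, below is a minimal single-step LSTM cell written in NumPy. It is an illustrative sketch, not the article's code: the shapes, random parameters, and tiny dummy sequence are assumptions, but the gate equations follow the standard LSTM formulation, with the forget gate scaling the previous cell state.

```python
# Minimal single-step LSTM cell in NumPy, to show where the forget gate acts.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params holds a weight matrix and a bias per gate."""
    z = np.concatenate([h_prev, x])                 # combined input to all gates
    f = sigmoid(params["Wf"] @ z + params["bf"])    # forget gate: 0 = drop, 1 = keep
    i = sigmoid(params["Wi"] @ z + params["bi"])    # input gate
    g = np.tanh(params["Wg"] @ z + params["bg"])    # candidate cell update
    o = sigmoid(params["Wo"] @ z + params["bo"])    # output gate
    c = f * c_prev + i * g                          # forget gate scales the old cell state
    h = o * np.tanh(c)
    return h, c

# Tiny example with random parameters and a dummy sequence of 5 inputs.
rng = np.random.default_rng(0)
hidden, inp = 4, 3
params = {name: rng.normal(size=(hidden, hidden + inp)) for name in ("Wf", "Wi", "Wg", "Wo")}
params.update({b: np.zeros(hidden) for b in ("bf", "bi", "bg", "bo")})
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, inp)):
    h, c = lstm_step(x, h, c, params)
print("final hidden state:", h)
```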

Conclusion

Long Short-Term Memory networks are very efficient for solving use cases that involve lengthy textual data, ranging from speech synthesis and speech recognition to machine translation and text summarization. I suggest you solve these use cases with LSTMs before jumping into more complex architectures like Attention Models.

5 NATURAL LANGUAGE PROCESSING MODELS:

https://medium.com/ubiai-nlp/5-natural-language-processing-models-you-should-know-836958303ce3

ELMo (Embeddings from Language Models)


ELMo, short for "Embeddings from Language Models," is used to create word
embeddings, which are numerical representations of words, but what sets ELMo
apart is its keen ability to capture the context and significance of words within
sentences.

Unlike traditional word embeddings, like Word2Vec or GloVe, which assign fixed
vectors to words regardless of context, ELMo takes a more dynamic approach. It
grasps the context of a word by considering the words that precede and follow it
in a sentence, thus delivering a more nuanced understanding of word meanings.
The architecture of ELMo is deep and bidirectional. This means it employs
multiple layers of recurrent neural networks (RNNs) to analyze the input
sentence from both directions – forward and backward. This bidirectional
approach ensures that ELMo comprehends the complete context surrounding
each word, which is crucial for a more accurate representation.

Moreover, ELMo representations can be fine-tuned for specific NLP tasks, such as sentiment
analysis, named entity recognition, or machine translation, to achieve excellent
results.
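
To make the contrast with static embeddings concrete, here is a small self-contained PyTorch sketch of the underlying idea (this is not ELMo itself): a bidirectional LSTM reads a sentence in both directions, so the vector it produces for a word such as "bank" changes with its neighbours. The toy vocabulary, layer sizes, and random weights are all illustrative assumptions.

```python
# Toy illustration of contextual embeddings with a bidirectional LSTM (not real ELMo).
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = {"<pad>": 0, "i": 1, "sat": 2, "on": 3, "the": 4, "river": 5,
         "bank": 6, "robbed": 7, "a": 8}

def encode(tokens):
    return torch.tensor([[vocab[t] for t in tokens]])     # shape: (1, seq_len)

embed = nn.Embedding(len(vocab), 16)                       # static word vectors
bilstm = nn.LSTM(input_size=16, hidden_size=16, bidirectional=True, batch_first=True)

def contextual_vectors(tokens):
    out, _ = bilstm(embed(encode(tokens)))                 # (1, seq_len, 32)
    return out[0]

s1 = ["i", "sat", "on", "the", "river", "bank"]
s2 = ["i", "robbed", "a", "bank"]
v1 = contextual_vectors(s1)[s1.index("bank")]
v2 = contextual_vectors(s2)[s2.index("bank")]

# Unlike Word2Vec/GloVe, the vector for "bank" now depends on its sentence.
print("cosine similarity between the two 'bank' vectors:",
      torch.cosine_similarity(v1, v2, dim=0).item())
```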

UniLM (Unified Language Model)


UniLM, or the Unified Language Model, is an advanced language model
developed by Microsoft Research. What sets it apart is its ability to handle a
variety of language tasks without needing specific fine-tuning for each task. This
unified approach simplifies the use of NLP technology across various business
applications.

Key to UniLM's effectiveness is its bidirectional transformer architecture, which
allows it to understand the context of words in sentences from both directions.
This comprehensive understanding is essential for tasks like text generation,
translation, text classification, and summarization. It can streamline complex
processes such as document categorization and text analysis, making them
more efficient and accurate.
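
As a side note on how one model can cover both understanding and generation tasks, the UniLM paper does this by sharing a single transformer and switching the self-attention mask per objective. The sketch below only illustrates those three mask patterns with NumPy; the sizes are arbitrary, and this is not the released model or its API.

```python
# Conceptual sketch of the three self-attention mask patterns described in the
# UniLM paper. 1 means "may attend", 0 means "masked out".
import numpy as np

def bidirectional_mask(n):
    """Every token attends to every token (understanding tasks such as classification)."""
    return np.ones((n, n), dtype=int)

def left_to_right_mask(n):
    """Each token attends only to itself and earlier tokens (generation tasks)."""
    return np.tril(np.ones((n, n), dtype=int))

def seq2seq_mask(n_src, n_tgt):
    """Source tokens attend to the whole source; target tokens attend to the whole
    source plus earlier target tokens (summarization / translation style tasks)."""
    n = n_src + n_tgt
    mask = np.zeros((n, n), dtype=int)
    mask[:n_src, :n_src] = 1
    mask[n_src:, :n_src] = 1
    mask[n_src:, n_src:] = np.tril(np.ones((n_tgt, n_tgt), dtype=int))
    return mask

print(seq2seq_mask(3, 2))
```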

https://nanonets.com/blog/handwritten-character-recognition/#techniques

https://codelabs.developers.google.com/codelabs/cloud-vision-api-python#1

CODE FOR TEXT IDENTIFICATION
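
The original notes include no code under this heading, so the following is a minimal sketch of text detection with the Google Cloud Vision Python client, in the spirit of the codelab linked above. It assumes the google-cloud-vision package is installed and application credentials are configured; the image filename is a placeholder.

```python
# Minimal text detection (OCR) sketch using the Google Cloud Vision API Python client.
# Assumes: `pip install google-cloud-vision` and GOOGLE_APPLICATION_CREDENTIALS set.
from google.cloud import vision

def detect_text(image_path: str) -> str:
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)          # send the OCR request
    if response.error.message:
        raise RuntimeError(response.error.message)
    # The first annotation holds the full detected text; the rest are per-word boxes.
    return response.text_annotations[0].description if response.text_annotations else ""

if __name__ == "__main__":
    print(detect_text("sample_handwriting.png"))            # placeholder image path
```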
