
MASTER SIDSD – S2

Neural Network

GRU:
GATED RECURRENT UNIT
Introduction
Jaouad FATEH
D132766115 Dr. Essaid El haji
The plan:

01 Introduction to GRU
02 Implementation of GRU
03 GRU vs LSTM

01
Introduction to GRU:
What are GRUs?

A Gated Recurrent Unit (GRU), as its name suggests, is a variant of the RNN architecture that uses gating mechanisms to control and manage the flow of information between cells in the neural network. GRUs were introduced only in 2014 and can be considered a relatively new architecture, especially when compared to the widely adopted LSTM, which was proposed in 1997.

Gated Recurrent Units are a variation of LSTMs: both have similar designs and mostly produce equally good results. GRUs only have two gates, and they do not maintain an internal cell state.
Why is this Useful?

A GRU is a very useful mechanism for fixing the vanishing gradient problem in recurrent neural networks. The vanishing gradient problem occurs in machine learning when the gradient becomes vanishingly small, which prevents the weights from changing their values. GRUs also have better performance than LSTMs when dealing with smaller datasets.
Applications of a Gated Recurrent Unit:

Gated Recurrent Units solve the problem of vanishing gradients faced by traditional RNNs. They also perform better than LSTMs on smaller datasets. There are many applications of GRUs, some of which are:

• Polyphonic Music Modeling
• Speech Signal Modeling
• Natural Language Processing
The structure of the GRU:

The structure of the GRU allows it to adaptively capture dependencies from long sequences of data without discarding information from earlier parts of the sequence. This is achieved through its gating units, which solve the vanishing/exploding gradient problem of traditional RNNs. These gates are responsible for regulating the information to be kept or discarded at each time step.
How Does It Really Work? Inner Workings of the GRU
The ability of the GRU to hold on to long-term dependencies or memory
stems from the computations within the GRU cell to produce the hidden
state. While LSTMs have two different states passed between the cells —
the cell state and hidden state, which carry the long and short-term
memory, respectively — GRUs only have one hidden state transferred
between time steps. This hidden state is able to hold both the long-term
and short-term dependencies at the same time due to the gating mechanisms and computations that the hidden state and input data go through.
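To make this concrete, here is a short PyTorch check (my own sketch, not from the slides, with arbitrary sizes) showing that an LSTM layer passes two states between steps while a GRU layer passes only the hidden state:

```python
import torch
import torch.nn as nn

# Toy dimensions, for illustration only.
batch, seq_len, input_size, hidden_size = 4, 10, 8, 16
x = torch.randn(batch, seq_len, input_size)

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
gru = nn.GRU(input_size, hidden_size, batch_first=True)

# LSTM passes two states between time steps: hidden state and cell state.
out_lstm, (h_n, c_n) = lstm(x)
print(h_n.shape, c_n.shape)   # torch.Size([1, 4, 16]) torch.Size([1, 4, 16])

# GRU passes only one state between time steps: the hidden state.
out_gru, h_n_gru = gru(x)
print(h_n_gru.shape)          # torch.Size([1, 4, 16])
```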
Architecture of the GRU:
The reset gate:

The reset gate is used by the model to decide how much of the past information to neglect or forget; in short, it decides whether the previous hidden state is important or not.

• First, the reset gate comes into action: it stores relevant information from the past time step into the new memory content.
• Then it multiplies the input vector and the previous hidden state with their respective weights.
• Next, it calculates the element-wise multiplication between the reset gate and the previous hidden state.
• After summing up the above steps, the non-linear activation function is applied and the next sequence is generated.

gate_reset = σ(W_input_reset · x_t + W_hidden_reset · h_(t−1))


The reset gate in GRU is basically used to help the model decide how
much of the information to forget from the past.
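As a rough NumPy illustration of the formula above (my own sketch with made-up dimensions and random weights, not code from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: input vector of size 3, hidden state of size 4.
input_size, hidden_size = 3, 4
x_t = np.random.randn(input_size)        # current input x_t
h_prev = np.random.randn(hidden_size)    # previous hidden state h_(t-1)

# Weight matrices of the reset gate (randomly initialised here).
W_input_reset = np.random.randn(hidden_size, input_size)
W_hidden_reset = np.random.randn(hidden_size, hidden_size)

# gate_reset = σ(W_input_reset · x_t + W_hidden_reset · h_(t−1))
gate_reset = sigmoid(W_input_reset @ x_t + W_hidden_reset @ h_prev)
print(gate_reset)  # values between 0 and 1, one per hidden unit
```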
The update gate:

The update gate is responsible for determining how much of the previous information needs to be passed along to the next state. This is really powerful because the model can decide to copy all the information from the past and eliminate the risk of the vanishing gradient.

The update gate plays the role of both the forget gate and the input gate of an LSTM: it decides what information to throw away and what new information to add.
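By analogy with the reset gate, the update gate can be written as gate_update = σ(W_input_update · x_t + W_hidden_update · h_(t−1)). A short continuation of the NumPy sketch above (reusing x_t, h_prev, sigmoid and the dimensions from the reset-gate example; weights are again hypothetical):

```python
# Weight matrices of the update gate (randomly initialised here).
W_input_update = np.random.randn(hidden_size, input_size)
W_hidden_update = np.random.randn(hidden_size, hidden_size)

# gate_update = σ(W_input_update · x_t + W_hidden_update · h_(t−1))
gate_update = sigmoid(W_input_update @ x_t + W_hidden_update @ h_prev)
print(gate_update)  # values between 0 and 1, one per hidden unit
```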
Memory content:
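The memory-content step can be illustrated by putting the two gates together into one complete GRU time step. The following self-contained NumPy sketch is my own reconstruction, following the convention used by PyTorch and the d2l.ai chapter in the references, where the update gate controls how much of the previous hidden state is carried over; all names and sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_step(x_t, h_prev, p):
    """One GRU time step (illustrative reconstruction, not the slides' code)."""
    # Gates: values between 0 and 1 per hidden unit.
    gate_reset = sigmoid(p["W_ir"] @ x_t + p["W_hr"] @ h_prev)
    gate_update = sigmoid(p["W_iz"] @ x_t + p["W_hz"] @ h_prev)
    # New memory content: the reset gate scales how much of the previous
    # hidden state contributes to the candidate.
    memory_content = np.tanh(p["W_in"] @ x_t + p["W_hn"] @ (gate_reset * h_prev))
    # Final hidden state: the update gate decides how much of the previous
    # hidden state is kept versus replaced by the new memory content.
    return gate_update * h_prev + (1 - gate_update) * memory_content

# Hypothetical dimensions and random parameters, for illustration only.
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
params = {
    "W_ir": rng.standard_normal((hidden_size, input_size)),
    "W_hr": rng.standard_normal((hidden_size, hidden_size)),
    "W_iz": rng.standard_normal((hidden_size, input_size)),
    "W_hz": rng.standard_normal((hidden_size, hidden_size)),
    "W_in": rng.standard_normal((hidden_size, input_size)),
    "W_hn": rng.standard_normal((hidden_size, hidden_size)),
}
h_t = gru_cell_step(rng.standard_normal(input_size),
                    np.zeros(hidden_size), params)
print(h_t)
```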
02
Implementation of GRU
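As a minimal implementation sketch in PyTorch (the GRUModel class and the hyperparameters are my own choices, in the spirit of the GRU-with-PyTorch tutorial listed in the references):

```python
import torch
import torch.nn as nn

class GRUModel(nn.Module):
    """Small GRU-based sequence model with illustrative hyperparameters."""
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size,
                          num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        out, h_n = self.gru(x)           # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])    # prediction from the last time step

# Example usage with random data.
model = GRUModel(input_size=8, hidden_size=16, output_size=1)
x = torch.randn(4, 10, 8)                # batch of 4 sequences of length 10
y_hat = model(x)
print(y_hat.shape)                        # torch.Size([4, 1])
```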
03
GRU vs LSTM
What is the difference between GRU & LSTM?

The main differences are as follows:

• The GRU has two gates, the LSTM has three gates.
• The GRU does not possess any internal memory, and it does not have the output gate that is present in the LSTM.
• In the GRU, the LSTM's input and forget gates are coupled into a single update gate, and the reset gate is applied directly to the previous hidden state.
• In the LSTM, the responsibility of the reset gate is taken by two gates, i.e., the input and forget gates.
Conclusion:

GRU uses fewer training parameters and therefore uses less memory and executes faster than LSTM, whereas LSTM is more accurate on larger datasets. Choose LSTM if you are dealing with long sequences and accuracy is the main concern; choose GRU when you want lower memory consumption and faster results.
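The parameter claim is easy to verify; this small PyTorch snippet (my own check, with arbitrary layer sizes) counts the trainable parameters of a GRU layer and an LSTM layer of the same size:

```python
import torch.nn as nn

input_size, hidden_size = 128, 256
gru = nn.GRU(input_size, hidden_size)
lstm = nn.LSTM(input_size, hidden_size)

count = lambda m: sum(p.numel() for p in m.parameters())
print("GRU parameters: ", count(gru))    # 3 weight/bias groups (reset, update, new)
print("LSTM parameters:", count(lstm))   # 4 groups, roughly 33% more parameters
```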
References:

• https://blog.floydhub.com/gru-with-pytorch/
• http://www.d2l.ai/chapter_recurrent-modern/gru.html?highlight=gru
• https://www.kaggle.com/thebrownviking20/intro-to-recurrent-neural-networks-lstm-gru/notebook
ⵜⴰⵏⵎⵉⵔⵜ ⵏⵏⵓⵏ ⵅⴼ ⵓⵡⴳⴳⴹ <3!!
Thank you for your attention <3 !!
