IFAC PapersOnLine 53-2 (2020) 1243–1248
On the vanishing and exploding gradient problem in Gated Recurrent Units

Alexander Rehmer, Andreas Kroll

Department of Measurement and Control, Institute for System Analytics and Control,
Faculty of Mechanical Engineering, University of Kassel, Germany
(e-mail: {alexander.rehmer, andreas.kroll}@mrt.uni-kassel.de)

Abstract: Recurrent Neural Networks are applied in areas such as speech recognition, natural language and video processing, and the identification of nonlinear state space models. Conventional Recurrent Neural Networks, e.g. the Elman Network, are hard to train. A more recently developed class of recurrent neural networks, the so-called Gated Units, outperform their counterparts on virtually every task. This paper aims to provide additional insights into the differences between RNNs and Gated Units in order to explain the superior performance of gated recurrent units. It is argued that Gated Units are easier to optimize not because they solve the vanishing gradient problem, but because they circumvent the emergence of large local gradients.

Copyright © 2020 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0). Peer review under responsibility of International Federation of Automatic Control. doi: 10.1016/j.ifacol.2020.12.1342

Keywords: Nonlinear system identification, Recurrent Neural Networks, Gated Recurrent Units.
1. INTRODUCTION

Gated Units, such as the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), were originally developed to overcome the vanishing gradient problem, which occurs in the Elman Recurrent Neural Network (RNN) (Pascanu et al., 2012). They have since outperformed RNNs on a number of tasks, such as natural language, speech and video processing (Jordan et al., 2019), and recently also on a nonlinear system identification task (Rehmer and Kroll, 2019). However, it will be shown that the gradient still vanishes in Gated Units, so other, unaccounted-for mechanisms have to be responsible for their success. Pascanu et al. (2012) show that small changes in the parameters θ of the RNN can lead to drastic changes in the dynamic behavior of the system when crossing certain critical bifurcation points. This in turn results in a huge change in the evolution of the hidden state x̂_k, which leads to a locally large, or exploding, gradient of the loss function. In this paper the GRU will be examined and compared to the RNN with the purpose of providing an alternative explanation of why GRUs outperform RNNs.

First, it will be shown that the gradient of the GRU is in fact smaller than that of the RNN, at least for the parameterizations considered in this paper, although GRUs were originally designed to solve the vanishing gradient problem. Secondly, it will be shown that GRUs are not only capable of representing highly nonlinear dynamics, but are also able to represent approximately linear dynamics via a number of different parameterizations. Since a linear model is always a good first guess, the easy accessibility of different parameterizations that produce linear dynamics makes the GRU less sensitive to its initial choice of parameters and thus simplifies the optimization problem. In the end the GRU and the RNN will be compared on a simple academic example and on a real nonlinear identification task.

2. RECURRENT NEURAL NETWORKS

In this section the Simple Recurrent Neural Network (RNN), also known as the Elman Network, and the Gated Recurrent Unit (GRU) will be introduced.

2.1 Simple Recurrent Neural Network

Fig. 1. Representation of the Elman Network: Layers of neurons are represented as rectangles, connections between layers represent fully connected layers.

The RNN as depicted in figure 1 is a straightforward realization of a nonlinear state space model (Nelles, 2001). It consists of one hidden recurrent layer with nonlinear activation function f_h, which aims to approximate the state equation, as well as one hidden feedforward layer f_g and one linear output layer, which together aim to approximate the output equation. For simplicity of notation, the linear output layer is omitted in the following equations and figures:

x̂_{k+1} = f_h(W_x x̂_k + W_u u_k + b_h),
ŷ_k = f_g(W_y x̂_k + b_g),                                              (1)

with x̂_k ∈ R^{n×1}, ŷ_k ∈ R^{m×1}, u_k ∈ R^{l×1}, W_x ∈ R^{n×n}, W_u ∈ R^{n×l}, b_h ∈ R^{n×1}, W_y ∈ R^{m×n}, b_g ∈ R^{m×1} and f_h : R^{n×1} → R^{n×1}, f_g : R^{m×1} → R^{m×1}. Usually tanh(·) is employed as nonlinear activation function.
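As a minimal illustration of the state and output recursion (1), the following NumPy sketch simulates an Elman network with tanh activations; the dimensions and randomly drawn weights are hypothetical placeholders, not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, l = 3, 1, 1          # hypothetical state, output and input dimensions

# randomly chosen parameters, for illustration only
W_x, W_u, b_h = rng.normal(size=(n, n)), rng.normal(size=(n, l)), np.zeros((n, 1))
W_y, b_g = rng.normal(size=(m, n)), np.zeros((m, 1))

def rnn_step(x_hat, u_k):
    """One step of the Elman network (1): state update and output."""
    x_next = np.tanh(W_x @ x_hat + W_u @ u_k + b_h)   # f_h = tanh
    y_hat = np.tanh(W_y @ x_hat + b_g)                # f_g = tanh
    return x_next, y_hat

x_hat = np.zeros((n, 1))
for k in range(100):
    u_k = rng.normal(size=(l, 1))
    x_hat, y_hat = rnn_step(x_hat, u_k)
```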
When training an RNN, the recurrent model is unfolded over the whole training sequence of length N, and the gradient of the loss function L with respect to the model parameters θ is calculated. As a consequence of the feedback, the gradient of the error

e_k = ŷ_k − y_k^target                                                  (2)

at time step k with respect to the model parameters θ = {W_x, W_u, W_y, b_h, b_g} depends on the previous state x̂_{k−1}, which depends again on the model parameters:

∂e_k/∂θ = (∂e_k/∂ŷ_k)(∂ŷ_k/∂x̂_k)(∂x̂_k/∂θ) + (∂x̂_k/∂x̂_{k−1})(∂x̂_{k−1}/∂θ).    (3)

For example, the gradient of the hidden state x̂_k with respect to W_x is

∂x̂_k/∂W_x = Σ_{τ=1}^{N} x̂_{k−τ} f_h′^(k−τ+1)(·) Π_β f_h′^(k−β)(·) W_x^(k−β),
β = τ−2, τ−3, … ∀β ≥ 0.                                                 (4)

High indices in brackets indicate the particular time step. The product term in (4), which also appears when computing the gradient with respect to the other parameters, decreases exponentially with τ if |f_h′ · ρ(W_x)| < 1, where ρ(W_x) is the spectral radius of W_x. Essentially, backpropagating an error one time step involves a multiplication of the state with a derivative that is possibly smaller than one and a matrix whose spectral radius is possibly smaller than one. Hence, the gradient vanishes after a certain number of time steps. In the Machine Learning community it is argued that the vanishing gradient prevents learning of so-called long-term dependencies in acceptable time (Hochreiter and Schmidhuber, 1997; Goodfellow et al., 2016), i.e. when huge time lags exist between input u_k and output ŷ_k. Gated recurrent units, like the LSTM and the GRU, were developed to solve this problem and have since then outperformed classical RNNs on virtually any task. However, it can be shown that the gradient also vanishes in gated recurrent units. Additionally, the vanishing of the gradient over time is a desirable property. In most systems, the influence of a previous state x_{k−τ} on the current state x̂_k decreases over time. Unless one wants to design a marginally stable or unstable system, e.g. when performing tasks like unbounded counting, or when dealing with large dead times, the vanishing gradient has no negative effect on the optimization procedure.
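To make the decay of the product term in (4) concrete, the scalar sketch below accumulates the factors f_h′(·) · w_x along a simulated trajectory; the parameter values are arbitrary choices satisfying |f_h′ · w_x| < 1 and serve only as an illustration.

```python
import numpy as np

w_x, b_x = 0.8, 0.0          # arbitrary scalar parameters with |w_x| < 1
x_hat = 0.5
product = 1.0                # running product of f_h'(.) * w_x factors

for step in range(1, 21):
    z = w_x * x_hat + b_x
    product *= (1.0 - np.tanh(z) ** 2) * w_x   # tanh'(z) = 1 - tanh(z)^2
    x_hat = np.tanh(z)
    if step % 5 == 0:
        print(f"after backpropagating {step} steps: |product| = {abs(product):.2e}")
```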
2.2 The Gated Recurrent Unit

The Gated Recurrent Unit (GRU) (Cho et al., 2014) is, besides the LSTM, the most often applied architecture of gated recurrent units. The general concept of gated recurrent units is to manipulate the state x̂_k through the addition or multiplication of the activations of so-called gates, see figure 2. Gates are almost exclusively one-layered neural networks with nonlinear sigmoid activation functions. The state equation of the GRU is

x̂_{k+1} = f_z ⊙ x̂_k + (1 − f_z) ⊙ f_c(x̃_k)                             (5)

with x̃_k = f_r ⊙ x̂_k. The operator ⊙ denotes the Hadamard product. The activations of the so-called reset gate f_r, update gate f_z and output gate f_c are given by

f_r = σ(W_r · [x̂_k, u_k] + b_r),
f_z = σ(W_z · [x̂_k, u_k] + b_z),                                        (6)
f_c = tanh(W_c · [x̃_k, u_k] + b_c),

where W_r, W_z, W_c ∈ R^{n×(n+l)}, b_r, b_z, b_c ∈ R^{n×1} and f_r, f_z, f_c : R^{n×1} → R^{n×1}. σ(·) denotes the logistic function. In order to map the states estimated by the GRU to the output, the GRU has to be equipped either with an output layer, as the RNN, or with an output gate, as the LSTM.

Fig. 2. The Gated Recurrent Unit (GRU). Gates are depicted as rectangles with their respective activation functions.
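A minimal NumPy sketch of one GRU state update according to (5) and (6); the dimensions and weights are hypothetical placeholders, and the additional output mapping mentioned above is omitted.

```python
import numpy as np

def sigma(a):
    """Logistic function used by the gates."""
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
n, l = 3, 1                                   # hypothetical dimensions
W_r, W_z, W_c = (rng.normal(size=(n, n + l)) for _ in range(3))
b_r, b_z, b_c = (np.zeros((n, 1)) for _ in range(3))

def gru_step(x_hat, u_k):
    """One step of the GRU state equation (5) with gates (6)."""
    v = np.vstack([x_hat, u_k])               # [x_hat_k, u_k]
    f_r = sigma(W_r @ v + b_r)                # reset gate
    f_z = sigma(W_z @ v + b_z)                # update gate
    x_tilde = f_r * x_hat                     # Hadamard product
    f_c = np.tanh(W_c @ np.vstack([x_tilde, u_k]) + b_c)   # output gate
    return f_z * x_hat + (1.0 - f_z) * f_c    # state update (5)

x_hat = np.zeros((n, 1))
x_hat = gru_step(x_hat, np.array([[0.5]]))
```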
3. GRADIENT OF THE STATE EQUATIONS

In this section, the gradients of the state equations of the RNN (1) and the GRU (5) w.r.t. their parameters will be compared to each other. In the cases examined the gradient of the GRU is, somewhat surprisingly, at most as large as that of the RNN, but usually smaller.

In order to allow for an easily interpretable visualization, the analysis will be restricted to one-dimensional and autonomous systems. Also, the GRU will be simplified by eliminating the reset gate f_r from (6), such that x̃_k = x̂_k. Taking the gradient of the RNN's state equation in (1) w.r.t. w_x yields

∂x̂_{k+1}/∂w_x = (x̂_k + w_x ∂x̂_k/∂w_x) · tanh′(w_x x̂_k + b_x).          (7)

The gradient of the GRU's state equation (5) with respect to w_z is

∂x̂_{k+1}/∂w_z = (1 − tanh(x̂_k; θ_c)) σ′(x̂_k; θ_z) (x̂_k + w_z ∂x̂_k/∂w_z)
              + (1 − σ(x̂_k; θ_z)) tanh′(x̂_k; θ_c) ∂x̂_k/∂w_z,            (8)

and with respect to w_c

∂x̂_{k+1}/∂w_c = σ′(x̂_k; θ_z) (x̂_k − tanh(x̂_k; θ_c)) ∂x̂_k/∂w_c
              + σ(x̂_k; θ_z) ∂x̂_k/∂w_c
              + (1 − σ(x̂_k; θ_z)) tanh′(x̂_k; θ_c) (x̂_k + w_c ∂x̂_k/∂w_c). (9)

For convenience of notation, θ_z and θ_c denote the parameters of f_z and f_c respectively, i.e. θ_z = [w_z, b_z] and θ_c = [w_c, b_c].
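The recursions above can be checked numerically without writing them out over many time steps. The sketch below simulates the one-dimensional autonomous RNN and the reset-free GRU and estimates the total derivative of the final state w.r.t. a single weight by central finite differences; this is a formula-agnostic check, not the paper's procedure, and the parameter values are arbitrary.

```python
import numpy as np

def simulate_rnn(w_x, b_x=0.0, x0=0.5, steps=20):
    """1-D autonomous RNN: x_{k+1} = tanh(w_x * x_k + b_x)."""
    x = x0
    for _ in range(steps):
        x = np.tanh(w_x * x + b_x)
    return x

def simulate_gru(w_z, b_z, w_c, b_c, x0=0.5, steps=20):
    """1-D autonomous GRU without reset gate: x_{k+1} = f_z*x_k + (1-f_z)*f_c."""
    x = x0
    for _ in range(steps):
        f_z = 1.0 / (1.0 + np.exp(-(w_z * x + b_z)))
        f_c = np.tanh(w_c * x + b_c)
        x = f_z * x + (1.0 - f_z) * f_c
    return x

def num_grad(fun, w, eps=1e-6):
    """Central finite difference as a formula-agnostic gradient estimate."""
    return (fun(w + eps) - fun(w - eps)) / (2.0 * eps)

# arbitrary illustrative parameterization
g_rnn = num_grad(lambda w: simulate_rnn(w_x=w), 1.0)
g_gru = num_grad(lambda w: simulate_gru(w_z=1.0, b_z=0.0, w_c=w, b_c=0.0), 1.0)
print(f"|d x_N / d w_x| (RNN) = {abs(g_rnn):.3f}, |d x_N / d w_c| (GRU) = {abs(g_gru):.3f}")
```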
It is cumbersome to write down the gradients in (8) and (9) for an arbitrary number of time steps, as done for the [...]

[...] and the GRU hence becomes an RNN. For all other configurations, the gradient is in fact smaller.

In contrast to the point made by the vanishing gradient argument, namely that if the gradient vanishes after some time steps optimization might take prohibitively long, we argue that a smaller gradient of the state equation w.r.t. its parameters is beneficial when training recurrent neural networks. If the gradient is large, as is the case with the RNN, a small change in parameters will lead to a huge change in the state trajectory. Since at every time step the previous state is again fed to the recurrent network, these [...]

Fig. 4. Graphical decomposition of the GRU's state equation: (a) f_z · x̂_k, (b) (1 − f_z) f_c, (c) f_z · x̂_k + (1 − f_z) · f_c; the panels also show x̂_k, f_z, f_c and (1 − f_z).
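The preceding argument can be illustrated with the one-step sensitivities of the two state equations: for the same weight and bias, the GRU's explicit sensitivity w.r.t. w_c equals the RNN's sensitivity w.r.t. w_x damped by the factor (1 − f_z). The sketch below evaluates both on a grid of states; the parameter values are arbitrary and only serve as an illustration.

```python
import numpy as np

# One-step sensitivities of the state equations w.r.t. a single weight,
# evaluated on a grid of states; parameters chosen arbitrarily.
x = np.linspace(-1.0, 1.0, 201)
w, b = 1.0, 0.0

dtanh = lambda a: 1.0 - np.tanh(a) ** 2
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

grad_rnn = np.abs(x * dtanh(w * x + b))                           # |d x_{k+1} / d w_x| for the RNN
grad_gru = np.abs((1.0 - sig(w * x + b)) * x * dtanh(w * x + b))  # same term damped by (1 - f_z) in the GRU

print(f"max one-step sensitivity: RNN {grad_rnn.max():.3f}, GRU {grad_gru.max():.3f}")
```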
Fig. 5. Parameterizations for which the GRU converges towards a tanh, leaky binary step or binary step activation function: (a) b_z → −∞, (b) w_z, w_c → 0, (c) w_z, b_c → 0, w_c → ∞, b_z → −∞.

Fig. 7. Parameterizations for which the GRU converges to linear activation functions: (a) b_z → +∞, (b) w_z, w_c → 0.

[...]ing different limits in the parameter space is especially important when identifying technical systems. Even for highly nonlinear processes the one-step prediction surface becomes more and more linear with increasing sampling rate (Nelles, 2001). Therefore even when estimating a model for a nonlinear process, it is important that the [...]

Fig. 8. RNN's loss function and magnitude of the gradient on the linear identification task: (a) RNN loss function, (b) RNN gradient.

Fig. 9. GRU's loss function and magnitude of the gradient on the linear identification task: (a) GRU loss function for w_z = 1, b_z = 0, (b) GRU gradient for w_z = 1, b_z = 0, (c) GRU loss function for w_c = 1, b_c = 0, (d) GRU gradient for w_c = 1, b_c = 0.

[...] available in this case study) by multiplying the output of the output gate f_c with 1 − f_z and thereby diminishing its influence. The effect is a smooth loss function without large gradients.

6. CASE STUDY: ELECTRO-MECHANICAL THROTTLE

To test whether the properties of the GRU also prove beneficial in real-life applications, it was compared to an RNN on a real nonlinear dynamical system.

6.1 The test system

Fig. 10. Technology schematic of the electro-mechanical throttle.

[...] of the system (Gringard and Kroll, 2016): a lower and an upper hard stop, state-dependent friction, and a nonlinear return spring.

6.2 Excitation Signals

One multisine signal and two Amplitude Modulated Pseudo-Random Binary Sequences (APRBS 1 and APRBS 2) have been used to excite the system. The multisine signal has a length of ≈ 10 s or 10^3 instances, and the APRBS signals have a length of ≈ 25 s or 2500 instances each. For the multisine signal, an upper frequency of f_u = 7.5 Hz has been used. For the APRBS signals, the holding time is T_H = 0.1 s. See (Gringard and Kroll, 2016) for more information on the test signal design.
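For orientation, a rough NumPy sketch of how excitation signals with the stated properties could be generated; the 100 Hz sampling rate is inferred from the stated signal lengths, the amplitude range and the random-phase multisine design are assumptions, and the actual test signal design follows Gringard and Kroll (2016).

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 100.0                                   # sampling rate implied by 10 s ~ 10^3 instances

# Multisine: ~10 s, random-phase sum of harmonics up to f_u = 7.5 Hz
t = np.arange(0, 10.0, 1.0 / fs)
freqs = np.arange(0.1, 7.5, 0.1)
multisine = sum(np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi)) for f in freqs)
multisine /= np.max(np.abs(multisine))       # normalize to an assumed unit amplitude

# APRBS: ~25 s, holding time T_H = 0.1 s, random amplitude per hold interval
n_holds = int(25.0 / 0.1)
aprbs = np.repeat(rng.uniform(-1.0, 1.0, size=n_holds), int(0.1 * fs))
```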
6.3 Data preprocessing

APRBS 1 and its response signal were scaled to the interval [−1, 1]; all other signals were scaled accordingly. The data was then divided into training, validation and test datasets in the following way:

• Training dataset: Consists of two batches. The first batch comprises 80 % of all instances of the multisine signal and the corresponding system response. The second batch consists of 70 % of APRBS 1 and its corresponding response signal.
• Validation dataset: Consists of two batches. The first batch comprises the remaining 20 % of the multisine signal and the corresponding system response. The second batch consists of the remaining 30 % of APRBS 1 and the response signal.
• Test dataset: One batch. APRBS 2 and its corresponding system response.

This division was chosen because the multisine's response signal almost exclusively covers the medium operating range, while the APRBS' response signals also cover the lower and upper hard stops.
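A sketch of this preprocessing, assuming the raw signals are available as 1-D NumPy arrays; the array names (u_aprbs1, y_aprbs1, u_multisine, y_multisine) are hypothetical, the random arrays only stand in for the measurements, and the scaling derived from APRBS 1 is reused for all other signals as described above.

```python
import numpy as np

def make_scaler(x):
    """Affine map sending [x.min(), x.max()] to [-1, 1]."""
    lo, hi = x.min(), x.max()
    return lambda s: 2.0 * (s - lo) / (hi - lo) - 1.0

# hypothetical signal arrays; in practice these come from the measurements
u_aprbs1, y_aprbs1 = np.random.randn(2500), np.random.randn(2500)
u_multisine, y_multisine = np.random.randn(1000), np.random.randn(1000)

scale_u, scale_y = make_scaler(u_aprbs1), make_scaler(y_aprbs1)
u_aprbs1, y_aprbs1 = scale_u(u_aprbs1), scale_y(y_aprbs1)
u_multisine, y_multisine = scale_u(u_multisine), scale_y(y_multisine)

n_ms, n_ap = len(u_multisine), len(u_aprbs1)
train = [(u_multisine[: int(0.8 * n_ms)], y_multisine[: int(0.8 * n_ms)]),
         (u_aprbs1[: int(0.7 * n_ap)], y_aprbs1[: int(0.7 * n_ap)])]
val = [(u_multisine[int(0.8 * n_ms):], y_multisine[int(0.8 * n_ms):]),
       (u_aprbs1[int(0.7 * n_ap):], y_aprbs1[int(0.7 * n_ap):])]
# the test dataset would consist of APRBS 2 and its response, scaled with the same maps
```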
Table 1. Model structures.

GRU
  1st Layer (Gated Unit):
    dim(x):      3    4    5    6    8    10
  2nd Layer:
    f_g(·):      tanh
    #(Neurons):  4    5    6    7    8    10
  dim(θ):        66   103  149  201  321  481

RNN
  1st Layer:
    f_h(·):      tanh
    dim(x):      3    4    5    6    8    10
  2nd Layer:
    f_g(·):      tanh
    #(Neurons):  4    5    6    7
  dim(θ):        36   55   78   105  151  205

6.5 Model Training

Each of the models in Table 1 was initialized randomly 20 times and trained for 800 epochs. Between each batch, the models' initial states were set to zero. Parameters were estimated based on the training dataset using the ADAM optimizer (Kingma and Ba, 2015) with its default parameter configuration (α = 0.01, β_1 = 0.9, β_2 = 0.999).

6.6 Results

At the end of the optimization procedure, the model with the parameter configuration which performed best on the validation dataset was selected and evaluated on the test dataset. The performance of the models is measured in terms of their best fit rate (BFR):

BFR = 100% · max(1 − ‖y_k − ŷ_k‖_2 / ‖y_k − ȳ‖_2, 0).                   (16)
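The best fit rate (16) translates directly into a short function; y and y_hat are assumed to be 1-D arrays containing the measured and simulated output over the test sequence.

```python
import numpy as np

def best_fit_rate(y, y_hat):
    """Best fit rate (16) in percent: 100 * max(1 - ||y - y_hat|| / ||y - mean(y)||, 0)."""
    return 100.0 * max(1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y - np.mean(y)), 0.0)

# example with dummy data
y = np.sin(np.linspace(0, 10, 200))
print(f"BFR of a slightly disturbed signal: {best_fit_rate(y, y + 0.05):.1f} %")
```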
Figure 11 shows the BFR of the RNN and the GRU on the test dataset. As expected, nonlinear optimization of the GRU consistently yields high performing models, while the performance of the RNNs fluctuates strongly. It should be noted that there are cases where the RNN's performance matches or even exceeds the performance of the GRU (e.g. for dim(x) = 10). This proves that the RNN is in general able to represent the test system; it just seems very unlikely to arrive at such a parameterization during the optimization process, arguably because of the issues discussed in Section 3.

Fig. 11. Boxplot of the BFR of RNN and GRU on the test dataset. Each model was initialized 20 times and trained for 800 epochs.

7. CONCLUSIONS & OUTLOOK

It was shown that the gradient of the GRU's state equation w.r.t. its parameters is at most as large as, but usually smaller than, that of the RNN, provided the L1 norm of all weights is smaller than or equal to one. This finding contradicts the argument that a vanishing gradient is responsible for the RNN's poor performance on various tasks. The first point made in this paper is that the smaller gradient produced by the GRU helps gradient based optimization, since small changes in the parameter space correspond to small changes in the evolution of the state, which in turn produces a smooth loss function without large gradients. The second argument made is that the GRU's state equation converges to different functions when certain parameters become larger. This corresponds to producing large planes in the loss function along which the solution improves steadily, rather than narrow valleys, as they are produced by the RNN. The analyses provided in this paper have yet to be generalized to state space networks with arbitrary dimensions and the whole parameter space.

REFERENCES

Cho, K. et al. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1724–1734.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Gringard, M. and Kroll, A. (2016). On the systematic analysis of the impact of the parametrization of standard test signals. In IEEE Symposium Series of Computational Intelligence 2016. IEEE, Athens, Greece.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Jordan, I.D., Sokol, P.A., and Park, I.M. (2019). Gated recurrent units viewed through the lens of continuous time dynamical systems. arXiv preprint arXiv:1906.01005.
Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization. In 3rd International Conference for Learning Representations (ICLR 2015).
Nelles, O. (2001). Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models. Springer, Berlin Heidelberg, Germany.
Pascanu, R., Mikolov, T., and Bengio, Y. (2012). Understanding the exploding gradient problem. CoRR, abs/1211.5063.
Rehmer, A. and Kroll, A. (2019). On using gated recurrent units for nonlinear system identification. In Preprints of the 18th European Control Conference (ECC), 2504–2509. IFAC, Naples, Italy.