A Friendly Introduction To RNN PDF
Imagine the perfect roommate: he cooks every day, and he cooks three types
of food: apple pie, burger, and chicken.
But he has a rule for what he cooks. He first looks outside at the weather,
which can be sunny or rainy. If it's sunny, he cooks apple pie because
he's happy, and if it's rainy, he cooks a burger.
This scenario can be easily modelled by a very simple neural network with
one input and one output: if the input is a sunny day, the output is an
apple pie, and if the input is a rainy day, the output is a burger.
Let's do some math, and for the math we're going to introduce vectors.
We will represent the food and the weather as vectors. There are three
possible foods, so we represent them with vectors of length 3, and there
are two weather conditions, so we represent them with vectors of length 2.
This neural network is actually just a very simple matrix, and the matrix
works like this:
1. If we multiply the matrix by the vector [1 0] corresponding to a
   sunny day, we get the vector [1 0 0] corresponding to an apple pie.
2. If we multiply the matrix by the vector [0 1] corresponding to a
   rainy day, we get the vector [0 1 0] corresponding to a burger.
So, this neural network is just a linear map that sends the sunny day to
an apple pie and the rainy day to a burger.
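The multiplication above can be sketched in a few lines of NumPy. One way to fill in the matrix (an assumption consistent with the rule, since the text doesn't spell out its entries) is a 3x2 matrix whose columns are the food vectors for each weather:

```python
import numpy as np

# Weather vectors: sunny = [1, 0], rainy = [0, 1].
# Food vectors: apple pie = [1, 0, 0], burger = [0, 1, 0], chicken = [0, 0, 1].
# The network is a single 3x2 matrix mapping weather to food.
M = np.array([
    [1, 0],   # apple pie row: selected when sunny
    [0, 1],   # burger row: selected when rainy
    [0, 0],   # chicken row: never selected by this rule
])

sunny = np.array([1, 0])
rainy = np.array([0, 1])

print(M @ sunny)  # [1 0 0] -> apple pie
print(M @ rainy)  # [0 1 0] -> burger
```

Multiplying by a one-hot weather vector simply picks out the matching column, which is exactly the "linear map" behaviour described above.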
We are not used to seeing neural networks as matrices, but as a bunch of
nodes connected by arrows. In the figure shown below, the matrix on the
left is turned into the arrows on the right. The dark arrows are labelled
1 for the 1s in the matrix, and the light arrows are labelled 0.
Now suppose our roommate follows a different rule. If one day he cooks an
apple pie, then the next day he cooks a burger, then the next day chicken,
then apple pie again, then burger, then chicken, and so on.
So, we can always tell what he's going to cook based on what he cooked
the day before. For example, if on Monday he cooks an apple pie, then on
Tuesday he cooks a burger, on Wednesday chicken, on Thursday apple pie,
on Friday a burger, on Saturday chicken, and so forth.
Now, this is not a normal neural network anymore and it’s called a
Recurrent Neural Network.
In this case there's no input for the weather, so the bottom arrow doesn't
come from anywhere but the output goes back in as input.
So, if we had an apple pie yesterday, this apple pie comes back as input
and the output is a burger: today he will cook a burger. The burger then
comes back as input and gives chicken as the output, which means tomorrow
he will cook chicken, and so on.
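This feedback loop can be sketched with a cyclic permutation matrix: a matrix that sends each food vector to the next food in the sequence, with yesterday's output fed back in as today's input. The matrix entries are an assumption consistent with the cycle described above:

```python
import numpy as np

# One-hot food vectors, in cycle order.
foods = ["apple pie", "burger", "chicken"]

# Cyclic permutation matrix: sends each food to the next one in the sequence.
C = np.array([
    [0, 0, 1],   # chicken    -> apple pie
    [1, 0, 0],   # apple pie  -> burger
    [0, 1, 0],   # burger     -> chicken
])

dish = np.array([1, 0, 0])   # Monday: apple pie
for day in ["Tuesday", "Wednesday", "Thursday"]:
    dish = C @ dish          # the output loops back in as the next input
    print(day, foods[int(np.argmax(dish))])
```

Running this prints burger, chicken, and then apple pie again, reproducing the cycle from the example.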
Let's look at a more complicated RNN. Now our perfect roommate's cooking
rule is going to be a combination of the two previous rules. He is still
very methodical and he still cooks in the sequence apple pie, burger,
chicken. But his decision of what to cook is also going to depend on the
weather.
· If it's sunny, he's going to go outside and enjoy the day, so he's
  not going to cook. We just get the same thing as yesterday, the
  leftovers.
· If it's raining, he stays home and has nothing to do, so he cooks
  the next dish in the list.
So, if it's sunny we get the same thing as yesterday and if it's raining
we get the next thing in the sequence.
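Before bringing in matrices, the rule itself can be sketched as a tiny function (the function name and string labels are just illustrative choices, not something from the original):

```python
# The roommate's combined rule, written directly as code.
foods = ["apple pie", "burger", "chicken"]

def next_dish(yesterday, weather):
    """Sunny -> same dish as yesterday; rainy -> next dish in the cycle."""
    i = foods.index(yesterday)
    return yesterday if weather == "sunny" else foods[(i + 1) % 3]

print(next_dish("apple pie", "sunny"))  # apple pie (leftovers)
print(next_dish("apple pie", "rainy"))  # burger (next in the list)
```

The rest of the section shows how this same rule emerges from matrices and a nonlinear merge map.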
Here's an example. Let's say on Monday we made an apple pie. On Tuesday
we check the weather, and it's sunny, so our roommate doesn't cook
anything new and we get an apple pie again.
Note: don't be confused by the sunny weather for Tuesday appearing under
Monday; that's just for diagram purposes.
Then, when he checks the weather on Wednesday, if it's rainy our roommate
stays home and makes something different: a burger, the next thing on the
list. On Thursday, if it's rainy, he makes chicken, because he stayed home
and cooked the next meal. If Friday is sunny, he goes outside and doesn't
cook anything new, so on Friday we get chicken again. If it's raining on
Saturday, he cooks the next dish, which is an apple pie, and if Sunday is
sunny we get apple pie again, and so forth.
This Recurrent Neural Network looks as shown in the figure below.
There's an input coming from underneath, which is the weather, and an
output, the food, that comes back as input. So, check this out:
· If yesterday's food was apple pie and today the weather is rainy,
· then these two things feed into a neural network and the output is
  a burger, because on a rainy day our roommate cooks the next food,
  which is a burger.
Again, let's have a look at the vectors. Recall that apple pie is the
vector [1 0 0], burger is [0 1 0], and chicken is [0 0 1], and there are
just two weathers: sunny is the vector [1 0] and rainy is the vector
[0 1].
Now we will see that this neural network is not just a single matrix.
Since it has more layers, it is a collection of matrices together with a
nonlinear map.
There are two matrices, one called the food matrix and the other the
weather matrix. We add their outputs and then combine them with a
non-linear merge operation.
We will go through these steps one by one, starting with the food matrix.
This is how the food matrix works: it takes the vector for today's food
and returns the vector for today's food concatenated with the vector for
tomorrow's food.
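One way to write the food matrix down (the entries are an assumption that matches the description above) is a 6x3 matrix: the top 3x3 block is the identity, copying today's food, and the bottom 3x3 block is the cyclic shift, producing tomorrow's food:

```python
import numpy as np

# Food matrix (a sketch): top half copies today's food,
# bottom half advances it to the next dish in the cycle.
F = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],   # top block: identity -> today's food
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],   # bottom block: cyclic shift -> tomorrow's food
])

apple_pie = np.array([1, 0, 0])
print(F @ apple_pie)  # [1 0 0 0 1 0]: apple pie on top, burger on the bottom
```

Feeding in apple pie yields apple pie in the top three entries and burger in the bottom three, exactly the concatenation described above.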
Now let's have a look at the weather matrix. The weather matrix is also a
concatenation of two blocks, where the top block has three ones in its
first column and the bottom block has three ones in its second column.
· If we multiply the weather matrix by the vector corresponding to a
sunny day then we get three 1’s on the top for the same day and
three 0’s in the bottom for the next day.
So, this output vector is telling us whether we should cook today's food
or tomorrow's food, based on the weather input.
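The weather matrix can be sketched the same way (again, the exact entries are an assumption consistent with the description): a 6x2 matrix whose sunny column lights up the "today" half and whose rainy column lights up the "tomorrow" half:

```python
import numpy as np

# Weather matrix (a sketch): sunny selects the "today" half,
# rainy selects the "tomorrow" half of the concatenated food vector.
W = np.array([
    [1, 0],
    [1, 0],
    [1, 0],   # top block: ones in the sunny column
    [0, 1],
    [0, 1],
    [0, 1],   # bottom block: ones in the rainy column
])

sunny = np.array([1, 0])
rainy = np.array([0, 1])
print(W @ sunny)  # [1 1 1 0 0 0]: keep today's food
print(W @ rainy)  # [0 0 0 1 1 1]: move on to tomorrow's food
```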
Now, the magic happens when we add these two outputs: we get a clear
signal of what to cook the next day, based on:
· what today's food is,
· what tomorrow's food is, and
· whether we should cook today's food or tomorrow's food.
This will be much clearer with an example. Let's say yesterday we cooked
an apple pie and today the weather is rainy. Multiplying the apple pie
vector by the food matrix gives us the food for today and the food for
tomorrow.
Notice that the largest entry in the final output vector is a 2, and this
2 hints at a burger, because the vector for a burger is [0 1 0]. So we're
going to extract that 2, and to extract it we use the merge map.
One very special thing to notice is that the first part tells us what the
food for today and the food for tomorrow are, and the second part tells
us which one to pick: should we go for the same day or for the next day?
These two pieces together form the decision of what to cook the next day.
To wrap this up and put these two together into the burger vector we need
the merge map. The merge map uses a nonlinear function that takes the
vector and turns the largest entry into a 1 and all the other entries
into a 0.
NOTE:
If you have experience with neural networks, you can think of this
non-linear map as one-hot encoding, or as a combination of a linear
map and a sigmoid.
In the next step, the merge map takes the vector formed above, splits it
into the top three entries and the bottom three entries, and adds them to
get the result: the vector [0 1 0] corresponding to a burger. And that's
the answer, because an apple pie followed by a rainy day gives a burger.
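Putting the pieces together, the whole step can be sketched end to end. The matrices and the `merge` helper below are illustrative reconstructions of what the text describes, assuming the signal has a unique largest entry:

```python
import numpy as np

# Food matrix: identity on top, cyclic shift on the bottom (a sketch).
F = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [0, 0, 1], [1, 0, 0], [0, 1, 0]])
# Weather matrix: sunny column on top, rainy column on the bottom (a sketch).
W = np.array([[1, 0], [1, 0], [1, 0],
              [0, 1], [0, 1], [0, 1]])

def merge(v):
    # Nonlinear merge map: 1 at the largest entry, 0 elsewhere
    # (assumes a unique maximum), then add the top and bottom halves.
    one_hot = (v == v.max()).astype(int)
    return one_hot[:3] + one_hot[3:]

apple_pie = np.array([1, 0, 0])
rainy = np.array([0, 1])

signal = F @ apple_pie + W @ rainy   # [1 0 0 1 2 1]
print(merge(signal))                 # [0 1 0] -> burger
```

The 2 in the signal sits exactly where the burger lives in the bottom half, and the merge map extracts it, reproducing the conclusion above.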
Instead of matrices, if we want to see this neural network with all its
nodes and edges, it looks as shown below.
Note: only the edges labelled 1 are drawn; the edges labelled 0 are
omitted.
The last matrix takes the vector and adds the top three entries to the
bottom three entries, giving as the result the vector corresponding to a
burger.
What really happens is that the food coming out feeds back into the
neural network as input. The output vector of length three goes back in
as the input, and that's why it's called a Recurrent Neural Network.
RNNs are useful for many things; in particular, they shine when our data
is sequential. Whenever the data forms a sequence and the next data point
depends heavily on the previous ones, an RNN is a natural choice.