A Beginner’s Guide to Using Attention Layer in Neural Networks
When we talk about the work of the encoder, we can say that it transforms the sequential information into an embedding of fixed length, also called a context vector. A critical disadvantage of this fixed-length design is that the network becomes incapable of remembering long sentences: after processing the whole sequence, the information from its beginning is often forgotten. By providing a proper attention mechanism to the network, we can resolve this issue.
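As a minimal sketch of this bottleneck (a hypothetical Keras LSTM encoder; the layer sizes are illustrative assumptions, not from the article):

import tensorflow as tf
from tensorflow.keras import layers

# A sentence of 30 token ids...
sentence = tf.keras.Input(shape=(30,), dtype='int32')
embedded = layers.Embedding(input_dim=1000, output_dim=64)(sentence)
# ...is squeezed into a single 64-dimensional context vector,
# however long or information-rich the sentence is.
context_vector = layers.LSTM(64)(embedded)

Whatever the decoder later needs about the early words must survive inside that one fixed-size vector, which is exactly the constraint attention relaxes.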
For the output word at position t, the context vector c_t is the sum of the hidden states of the input sequence, weighted by the alignment scores. We can say that {α_{t,i}} are the weights responsible for defining how much of each source hidden state should be taken into consideration for each output.
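Written out (this is the standard Bahdanau-style formulation; s_{t-1} denotes the previous decoder state and score(·) the alignment model, notation assumed here rather than taken from the article):

c_t = \sum_{i=1}^{n} \alpha_{t,i}\, h_i,
\qquad
\alpha_{t,i} = \frac{\exp\big(\mathrm{score}(s_{t-1}, h_i)\big)}{\sum_{j=1}^{n} \exp\big(\mathrm{score}(s_{t-1}, h_j)\big)}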
The commonly used variants of the attention mechanism are:

Self-Attention
Global/Soft Attention
Local/Hard Attention

Self-Attention Mechanism
Here in the image, the red color represents the word that is currently being processed, the blue color represents the memory, and the intensity of the color represents the degree of memory activation.
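As a minimal sketch of self-attention with the Keras layer (passing the same tensor as both query and value is the documented way to obtain self-attention; the sizes here are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

tokens = tf.keras.Input(shape=(None,), dtype='int32')
embedded = layers.Embedding(input_dim=1000, output_dim=64)(tokens)
# Query and value are the same tensor, so every position
# attends over the sequence itself.
self_attended = layers.Attention()([embedded, embedded])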
Implementation
In Keras, the layer is available as tf.keras.layers.Attention(use_scale=False, **kwargs). First, the query and value token sequences are embedded, assuming variable-length integer inputs defined with the standard Keras Input pattern:

import tensorflow as tf
from tensorflow.keras import layers

# Variable-length sequences of integer token ids
query = tf.keras.Input(shape=(None,), dtype='int32')
value = tf.keras.Input(shape=(None,), dtype='int32')

# A shared embedding maps token ids to 64-dimensional vectors
token_embedding = layers.Embedding(input_dim=1000, output_dim=64)
query_embeddings = token_embedding(query)
value_embeddings = token_embedding(value)
A shared CNN layer then encodes both embedded sequences, and the attention layer combines the query and value encodings (the Conv1D definition below assumes typical illustrative sizes):

# Shared 1D convolution over both embedded sequences
layer_cnn = layers.Conv1D(filters=100, kernel_size=4, padding='same')
query_encoding = layer_cnn(query_embeddings)
value_encoding = layer_cnn(value_embeddings)

# Dot-product attention between the query and value encodings
query_attention_seq = layers.Attention()([query_encoding, value_encoding])
So far, we have taken care of the shape of the embeddings so that the attention layer receives the required shape. Now, if required, we can use pooling layers to reduce the shape of the encodings:
query_encoding = layers.GlobalAveragePooling1D()(query_encoding)
query_value_attention = layers.GlobalAveragePooling1D()(query_attention_seq)
After adding the attention layer, we can make a DNN input layer by concatenating the pooled query encoding and the pooled attention output:

input_layer = tf.keras.layers.Concatenate()([query_encoding, query_value_attention])
After that, we can add more layers on top of this input and connect them into a model.
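As a minimal sketch of that final step (the dense-layer sizes and the single sigmoid output are illustrative assumptions, not from the article):

# Hypothetical classifier head on top of the concatenated features
hidden = layers.Dense(32, activation='relu')(input_layer)
output = layers.Dense(1, activation='sigmoid')(hidden)
model = tf.keras.Model(inputs=[query, value], outputs=output)
model.summary()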
Final Words
In this article, we discussed the attention mechanism, the fixed-length context vector problem that motivates it, and how the Keras Attention layer can be used in a simple network.