Sinusoidal Positional Embedding
Outline
Introduction
Implementation
Conclusion
Introduction
Definitions
Periodicity
Intuition
Intuition (Ctd)
How it works
$$
PE(p_i) =
\begin{bmatrix}
\sin(f_0\, p_i) \\
\cos(f_0\, p_i) \\
\sin(f_1\, p_i) \\
\cos(f_1\, p_i) \\
\vdots \\
\sin(f_{d/2-1}\, p_i) \\
\cos(f_{d/2-1}\, p_i)
\end{bmatrix},
\qquad \text{with } f_k = \frac{1}{10000^{2k/d}},
$$
where $p_i$ is the position of the $i$-th token and $d$ is the embedding dimension.
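A minimal NumPy sketch of this formula (the function name sinusoidal_pe and the row-per-position layout are my own choices, not from the slides; even columns hold the sines, odd columns the cosines):

import numpy as np

def sinusoidal_pe(seq_len, d):
    # Frequencies f_k = 1 / 10000**(2k/d) for k = 0 .. d/2 - 1.
    freqs = 1.0 / 10000 ** (2 * np.arange(d // 2) / d)
    # Angles f_k * p for every position p and every frequency f_k.
    angles = np.arange(seq_len)[:, None] * freqs   # shape (seq_len, d/2)
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles)   # even-indexed entries: sin
    pe[:, 1::2] = np.cos(angles)   # odd-indexed entries: cos
    return pe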
Illustration of SPE
Example: input sentence "I am a student".
The positions of the words "I", "am", "a", and "student" are 0, 1, 2, and 3, respectively.
Assuming these words have 4-dimensional word embeddings (Fig. 4), the first two entries of the positional-embedding vector for position 0 are computed as follows:
$$
\begin{aligned}
PE(0)_0 &= \sin\!\left(\frac{0}{10000^{2(0)/4}}\right) && (2.1)\\
        &= 0, && (2.2)\\
PE(0)_1 &= \cos\!\left(\frac{0}{10000^{2(0)/4}}\right) && (2.3)\\
        &= 1. && (2.4)
\end{aligned}
$$
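Both values can be checked directly; a quick sketch using only Python's standard library:

import math

f0 = 1 / 10000 ** (2 * 0 / 4)   # f_0 = 1 for the 4-dimensional example
print(math.sin(f0 * 0))         # 0.0 -> Eq. (2.2)
print(math.cos(f0 * 0))         # 1.0 -> Eq. (2.4)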
Illustration (Ctd)
Implementation
Implementation (Ctd)
Embedding Similarity

Embedding               Similarity
Word Embedding          0.9995
Positional Embedding    0.86
See Notebooks
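The figures in the table come from the accompanying notebooks. A sketch of how the positional-embedding similarity could be reproduced, reusing the sinusoidal_pe function sketched earlier (the exact value depends on the dimension and the positions compared; the word-embedding side would need a trained embedding matrix and is omitted):

import numpy as np

def cosine(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

pe = sinusoidal_pe(seq_len=4, d=4)   # positions of "I am a student"
print(cosine(pe[1], pe[2]))          # similarity of two neighbouring positions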
Implementation (Ctd)
Advantages
Disadvantages
Fixed pattern
Limited sequence length
Lack of contextual information
Limited generalisation to unseen positions.
Conclusion