

SINUSOIDAL POSITIONAL EMBEDDING.

Henock Makumbu Jeremie Nlandu


Tony Chisenga Millicent Omondi
Omer Fotso Dzodzoenyenye Senanou
Leema Hamid Ahmed Abdalla
Asim Mohamed Ignatius Boadi

African Institute for Mathematical Sciences, AIMS-Senegal

April 18, 2024


Outline

1 Introduction
2 Sinusoidal Positional Encoding (SPE)
3 Implementation
4 Variants of SPE: LSPE
5 Pros (advantages) and cons (disadvantages)
6 Conclusion


Introduction

SPE is an absolute positional encoding that uses sinusoidal functions to incorporate positional information smoothly into the input sequences.

Why SPE?
It captures positional information.
It is easy to compute.


Advantages of SPE over other absolute PE

The sinusoidal functions used in sinusoidal positional encoding have the advantage of being continuous and periodic, enabling the model to extrapolate positional information to unseen positions.
Compared to one-hot vectors, sinusoidal positional encoding provides a more compact representation that scales better with sequence length.


Definitions

The sine function (sin) represents a sinusoidal wave that oscillates between -1 and 1.
The cosine function (cos) represents a sinusoidal wave that is shifted by a quarter period relative to the sine function.


Periodicity

The period of a periodic function f is the smallest positive value p such that f(x + p) = f(x) for all x in its domain.
sin(x) and cos(x) have a period of 2π.

Figure: Sine and cosine.


Intuition

Representing positions in binary format is inefficient in terms of memory.
Sine and cosine are continuous functions.
Different positions need to be represented with different encodings; using sine and cosine with different frequencies and phases ensures the uniqueness of the positional embeddings.
Sine and cosine together can capture the alternating sequences seen in binary representations.


Intuition (Ctd)

Figure: Binary representation of integers from 0 to 16.
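A quick sketch of ours (not from the slides) reproduces this figure in a few lines: the lowest bit alternates every step, the next bit every two steps, and so on — square waves whose periods double from right to left. Sine/cosine pairs at decreasing frequencies are the continuous analogue that SPE uses.

```python
# Print the binary representations of 0..16: each bit column is a "wave"
# whose period doubles from right to left -- the discrete analogue of the
# frequency ladder used by sinusoidal positional encoding.
for n in range(17):
    print(f"{n:2d} -> {n:05b}")
```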


How it works

The (absolute) positional encoding has the same dimension as the word-embedding vectors.
For each entry i of the positional embedding vector for the word at position p, we compute the value using:

$$
PE(p)_i =
\begin{cases}
\sin\left(\dfrac{p}{10000^{2k/d}}\right), & \text{if } i = 2k, \\[6pt]
\cos\left(\dfrac{p}{10000^{2k/d}}\right), & \text{if } i = 2k+1,
\end{cases}
$$

where d is the dimension of the embeddings (such that d ≡ 0 (mod 2)).
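As a quick illustration, here is a minimal NumPy sketch of this formula (the function and variable names are ours, not from the slides):

```python
import numpy as np

def sinusoidal_positional_encoding(n_positions: int, d: int) -> np.ndarray:
    """Return an (n_positions, d) matrix whose row p is PE(p).

    Even entries 2k hold sin(p / 10000**(2k/d)); odd entries 2k+1
    hold the matching cosine, as in the formula above.
    """
    assert d % 2 == 0, "the embedding dimension d must be even"
    positions = np.arange(n_positions)[:, None]   # shape (n_positions, 1)
    k = np.arange(d // 2)[None, :]                # shape (1, d//2)
    angles = positions / 10000 ** (2 * k / d)     # shape (n_positions, d//2)
    pe = np.empty((n_positions, d))
    pe[:, 0::2] = np.sin(angles)                  # even entries: sine
    pe[:, 1::2] = np.cos(angles)                  # odd entries: cosine
    return pe

# Example: encodings for a 4-word sentence with d = 4.
print(sinusoidal_positional_encoding(4, 4))
```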


How it works (ctd)

 
$$
PE(p_i) =
\begin{pmatrix}
\sin(f_0\, p_i) \\
\cos(f_0\, p_i) \\
\sin(f_1\, p_i) \\
\cos(f_1\, p_i) \\
\vdots \\
\sin(f_{d/2-1}\, p_i) \\
\cos(f_{d/2-1}\, p_i)
\end{pmatrix},
\qquad \text{with } f_k = \frac{1}{10000^{2k/d}}.
$$

The embedding that is fed to the model is ψ(w_i) + PE(p_i), where ψ(w_i) is the word embedding of the word w_i in a sentence [w_1, ..., w_n].
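A self-contained sketch of that final step (the random word embeddings here are a stand-in of ours for a trained ψ, not the actual model):

```python
import numpy as np

n, d = 4, 4                                  # sentence length and embedding dimension
rng = np.random.default_rng(0)
word_embeddings = rng.normal(size=(n, d))    # stand-in for psi(w_1), ..., psi(w_n)

# Sinusoidal positional encodings for positions 0 .. n-1, as defined above.
pos = np.arange(n)[:, None]
k = np.arange(d // 2)[None, :]
angles = pos / 10000 ** (2 * k / d)
pe = np.empty((n, d))
pe[:, 0::2], pe[:, 1::2] = np.sin(angles), np.cos(angles)

model_inputs = word_embeddings + pe          # psi(w_i) + PE(p_i), fed to the model
```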

Illustration of SPE
Example input sentence: "I am a student".
The positions of the words "I", "am", "a" and "student" are 0, 1, 2 and 3 respectively.
Assuming these words have 4-dimensional word embeddings (see the figures below), the first two entries of the positional embedding vector for position 0 are computed as follows:
 
$$
PE(0)_0 = \sin\left(\frac{0}{10000^{2(0)/4}}\right) = \sin(0) = 0,
$$
$$
PE(0)_1 = \cos\left(\frac{0}{10000^{2(0)/4}}\right) = \cos(0) = 1.
$$
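Continuing with position 1 (here f_0 = 1/10000^{0/4} = 1 and f_1 = 1/10000^{2/4} = 0.01), the same formula gives nonzero values:

$$
PE(1)_0 = \sin(1) \approx 0.841,\quad
PE(1)_1 = \cos(1) \approx 0.540,\quad
PE(1)_2 = \sin(0.01) \approx 0.010,\quad
PE(1)_3 = \cos(0.01) \approx 1.000.
$$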




Illustration (Ctd)

Figure: 4-dimensional word embeddings.
Figure: 4-dimensional sinusoidal positional embeddings.


Illustration (Ctd)

Figure: 4-dimensional word embeddings + sinusoidal positional encoding.

Why Are Both Sine and Cosine Used?

Sinusoidal positional encoding allows the model to attend to relative positions effortlessly (see the rotation identity below).
They provide complementary information across the dimensions of each word.
They ensure the uniqueness of each word's position-encoding vector via the differences in frequencies.
Their periodicity enables the model to pay attention to words at regular intervals in the sequence.
They constrain the input values to lie between -1 and 1, ensuring stability and helping to avoid exploding gradients.
They allow positional information for longer sequences to be predicted from patterns observed in shorter sequences.


Implementation

We employed the skip-gram model within the word2vec framework provided by the gensim package to derive embeddings for the words 'black' and 'brown'.


Implementation (ctd)

We used the following sentence to highlight the significance of integrating positional information into our word embeddings.
Example: "The black cat sat on the couch and the brown dog slept on the rug."

Embedding                    Similarity ('black' vs 'brown')
Word embedding only          0.9995
With positional embedding    0.86

See Notebooks.
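A minimal sketch of this experiment follows (assuming gensim is installed; the toy hyperparameters below are our choices, not the exact notebook settings, so the numbers will differ from the table above):

```python
import numpy as np
from gensim.models import Word2Vec

# The example sentence, tokenized.
sentence = "the black cat sat on the couch and the brown dog slept on the rug".split()

# Train a skip-gram (sg=1) word2vec model on this tiny corpus.
model = Word2Vec([sentence], vector_size=4, window=3, min_count=1, sg=1, seed=0)

def sinusoidal_pe(pos: int, d: int) -> np.ndarray:
    """Sinusoidal positional encoding for a single position."""
    pe = np.zeros(d)
    for k in range(d // 2):
        f = 1.0 / 10000 ** (2 * k / d)
        pe[2 * k], pe[2 * k + 1] = np.sin(f * pos), np.cos(f * pos)
    return pe

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

black, brown = model.wv["black"], model.wv["brown"]
print("word embeddings only:", cosine(black, brown))

# Add the positional encodings at each word's position in the sentence.
pb = black + sinusoidal_pe(sentence.index("black"), 4)
pr = brown + sinusoidal_pe(sentence.index("brown"), 4)
print("with positional encoding:", cosine(pb, pr))
```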


Implementation (ctd)

Figure: Sinusoidal Positional Encoding for Positions


Variants of SPE: LSPE

LSPE introduces a learnable component into the traditional sinusoidal embeddings. This component is represented as one or more parameters, denoted L, which are learned during training.
The learnable component is combined with the traditional sinusoidal embeddings to produce the final positional embeddings. This combination can be achieved through addition, multiplication, or any other suitable operation.


Variants of SPE: LSPE (ctd)

Let TraditionalEmbeddings(pos, i) denote the traditional sinusoidal embeddings. The final LSPE embeddings can then be computed as

LSPEEmbeddings(pos, i) = TraditionalEmbeddings(pos, i) + L(i),

where L(i) represents the learnable component corresponding to dimension i of the embeddings.
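As an illustration, here is one plausible PyTorch sketch of this additive variant (the module structure and names are our assumptions; the slides do not prescribe an implementation):

```python
import torch
import torch.nn as nn

class LSPE(nn.Module):
    """Sinusoidal positional embeddings plus a learnable per-dimension offset L."""

    def __init__(self, max_len: int, d: int):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)       # (max_len, 1)
        k = torch.arange(d // 2).unsqueeze(0)          # (1, d//2)
        angles = pos / 10000 ** (2 * k / d)
        pe = torch.empty(max_len, d)
        pe[:, 0::2] = torch.sin(angles)
        pe[:, 1::2] = torch.cos(angles)
        self.register_buffer("pe", pe)                 # fixed sinusoidal part
        self.L = nn.Parameter(torch.zeros(d))          # learnable component L(i)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d); add the fixed PE plus the learned offset.
        seq_len = x.size(1)
        return x + self.pe[:seq_len] + self.L
```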


Advantages

Deterministic, regardless of the token.
Can handle long input sequences.
The sine and cosine functions take values in [-1, 1], which keeps the values of the positional-encoding matrix in a normalized range.
It measures the similarity between positions, enabling the relative positions of words to be encoded.
Given that the sinusoid for each position is different, each position is encoded in a unique way.


Disadvantages

Fixed pattern.
Limited sequence length.
Lack of contextual information.
Limited generalisation to unseen positions.


Conclusion

SPE is an efficient method for adding positional information to sequence data.
It captures the order and relative positions of the elements in a representation.
It has several advantages over traditional methods such as one-hot vectors, as it eliminates the need for large vector sizes that grow with sequence length.
It scales better to longer sequences, as the encoding size remains constant regardless of the sequence length.


