
BERT uses Transformers (an attention-based architecture) to learn contextual relations and meaning between words in a text. The basic Transformer contains two separate mechanisms: an encoder that reads the text input and a decoder that produces the output (prediction).

Directional models read text in a specific direction (left to right or right to left). The Transformer encoder reads all of the text at once, so we can say Transformers are non-directional. This property allows the model to learn the context of a word from the surrounding words on either side, as the sketch below illustrates.
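As a minimal sketch of this behaviour (assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint, which are not mentioned in the original text), the same word receives a different contextual vector depending on the words around it:

```python
# Minimal sketch: the same word gets different contextual vectors
# depending on its surrounding context. Assumes the Hugging Face
# `transformers` library and the `bert-base-uncased` checkpoint.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["He sat by the river bank.", "She deposited cash at the bank."]

with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        # Locate the position of the token "bank" in this sentence.
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        idx = tokens.index("bank")
        # Contextual embedding of "bank" from the last encoder layer:
        # it differs between the two sentences because the encoder
        # attends to the surrounding words in both directions.
        bank_vector = outputs.last_hidden_state[0, idx]
        print(text, bank_vector[:5])
```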

BERT's input is a combination of three embeddings, depending on the task we are performing:

Position Embeddings: BERT learns the position/location of words in a sentence via positional embeddings. These embeddings help BERT capture the ‘order’ or ‘sequence’ information of a given sentence.

Segment Embeddings: (optional) BERT takes sentence pairs as input for tasks such as question answering. BERT learns a unique embedding for the first and the second sentence to help the model differentiate between them.

Token Embeddings: Token embeddings carry the information of the input text itself. Each unique token is assigned an integer ID, which BERT maps to a learned embedding vector. A sketch combining all three embeddings follows below.
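As an illustrative sketch of how these three embeddings come together (again assuming the Hugging Face `transformers` implementation of `bert-base-uncased`, an assumption not stated in the original text), the tokenizer produces token IDs and segment IDs for a sentence pair, and the model sums token, segment, and position embeddings before the encoder layers:

```python
# Sketch of BERT's input embeddings for a sentence pair. Assumes the
# Hugging Face `transformers` library and `bert-base-uncased`.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Sentence pair input, e.g. for a question-answering style task.
encoded = tokenizer("Where is the Eiffel Tower?",
                    "It is in Paris.",
                    return_tensors="pt")

input_ids = encoded["input_ids"]         # integer token IDs
segment_ids = encoded["token_type_ids"]  # 0 for sentence A, 1 for sentence B
seq_len = input_ids.size(1)
position_ids = torch.arange(seq_len).unsqueeze(0)

emb = model.embeddings
token_emb = emb.word_embeddings(input_ids)             # token embeddings
segment_emb = emb.token_type_embeddings(segment_ids)   # segment embeddings
position_emb = emb.position_embeddings(position_ids)   # position embeddings

# BERT's input representation is the sum of the three embeddings
# (followed internally by LayerNorm and dropout).
combined = token_emb + segment_emb + position_emb
print(combined.shape)  # (1, seq_len, 768) for bert-base
```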
