Professional Documents
Culture Documents
Transformers
▪Context
Context
▪“The animal didn't cross the street because it was
too tired”
▪What is “it”?
2
CDS.IISc.ac.in | Department of Computational and Data Sciences
Context
3
CDS.IISc.ac.in | Department of Computational and Data Sciences
Attention
▪Attention mechanism has an infinite reference
window.
4
CDS.IISc.ac.in | Department of Computational and Data Sciences
Encoder is a cont.
vector representation
of the inputs
5
CDS.IISc.ac.in | Department of Computational and Data Sciences
1. Input Embedding
6
CDS.IISc.ac.in | Department of Computational and Data Sciences
2. Positional Encoding
7
CDS.IISc.ac.in | Department of Computational and Data Sciences
8
CDS.IISc.ac.in | Department of Computational and Data Sciences
Multi-Head Attention
Q K V
X1 X2 X3
9
CDS.IISc.ac.in | Department of Computational and Data Sciences
Self Attention
▪ Associates each individual word in the input to
other words in the input.
10
CDS.IISc.ac.in | Department of Computational and Data Sciences
11
CDS.IISc.ac.in | Department of Computational and Data Sciences
12
CDS.IISc.ac.in | Department of Computational and Data Sciences
13
CDS.IISc.ac.in | Department of Computational and Data Sciences
14
CDS.IISc.ac.in | Department of Computational and Data Sciences
15
CDS.IISc.ac.in | Department of Computational and Data Sciences
▪N
N=2
16
CDS.IISc.ac.in | Department of Computational and Data Sciences
17
CDS.IISc.ac.in | Department of Computational and Data Sciences
18
CDS.IISc.ac.in | Department of Computational and Data Sciences
19
CDS.IISc.ac.in | Department of Computational and Data Sciences
Transformer Encoder
▪NN
Transformer Encoder
N Times
Transformer Encoder
Transformer Encoder
20
CDS.IISc.ac.in | Department of Computational and Data Sciences
Encoder Example
21
Credit: Y Tamura
CDS.IISc.ac.in | Department of Computational and Data Sciences
▪ No of tokens=9
▪ Token encoded dimension =512
▪ Split each token into 8 equal parts of 64 dimension
H1 H2 H3 H4 H5 H6 H7 H8
22
CDS.IISc.ac.in | Department of Computational and Data Sciences
23
CDS.IISc.ac.in | Department of Computational and Data Sciences
Transformer Decoder
▪Decoder is to generate
text
24
CDS.IISc.ac.in | Department of Computational and Data Sciences
25
CDS.IISc.ac.in | Department of Computational and Data Sciences
Decoder MH Attention1
26
CDS.IISc.ac.in | Department of Computational and Data Sciences
27
CDS.IISc.ac.in | Department of Computational and Data Sciences
Linear Classifier
28
CDS.IISc.ac.in | Department of Computational and Data Sciences
29
CDS.IISc.ac.in | Department of Computational and Data Sciences
Training a Transformer
▪Transformers typically use semi-supervised
learning with
‣Unsupervised pretraining over a very large dataset of
general text
‣Followed by supervised fine-tuning over a focused data
set of inputs and outputs for a particular task
▪Tasks for pretraining and fine-tuning commonly include:
‣language modeling
‣next-sentence prediction (aka completion)
‣question answering
‣reading comprehension
‣sentiment analysis
31
‣paraphrasing
CDS.IISc.ac.in | Department of Computational and Data Sciences
Pre-trained Models
▪ Since training a model requires huge datasets of
text and significant computation, researchers often
use common pretrained models
▪ Examples (~ December 2021) include
‣Google’s BERT model
‣Huggingface’s various Transformer models
‣OpenAI’s and GPT-3 models
32
CDS.IISc.ac.in | Department of Computational and Data Sciences
33
CDS.IISc.ac.in | Department of Computational and Data Sciences
GPT - Varients
GPT released June 2018
GPT-2 released Nov. 2019 with 1.5B parameters
GPT-3 released in 2020 with 175B parameters
34
CDS.IISc.ac.in | Department of Computational and Data Sciences
35
CDS.IISc.ac.in | Department of Computational and Data Sciences
37
CDS.IISc.ac.in | Department of Computational and Data Sciences
Patch Embedding
[CLS] Token
39
CDS.IISc.ac.in | Department of Computational and Data Sciences
ViT - Summary
40
CDS.IISc.ac.in | Department of Computational and Data Sciences
ViT Layers
41
CDS.IISc.ac.in | Department of Computational and Data Sciences
42
CDS.IISc.ac.in | Department of Computational and Data Sciences
ViT – in Numbers
43
CDS.IISc.ac.in | Department of Computational and Data Sciences
Critics
▪
•Better results - only with more data
44