
The Transformer Architecture
TheAiEdge.io

[Figure: the overall encoder-decoder architecture. The input sequence 'how' 'are' 'you' 'doing' '?' passes through token and position embeddings into a stack of encoder blocks; the encoder output feeds a stack of decoder blocks, which take the output sequence '[SOS]' 'I' 'am' 'good' 'and' (also token- and position-embedded) and end in a predicting head that emits the next token, 'you'.]

The Overall Architecture
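
A minimal PyTorch sketch of this wiring (not TheAiEdge.io's code; sizes such as vocab_size=10000, d_model=512, and three blocks per stack are illustrative assumptions). Position embeddings and attention masks are left out here and covered in the sections below.

```python
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512, n_heads=8, n_blocks=3):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_blocks, num_decoder_layers=n_blocks,
            batch_first=True)
        self.predicting_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.token_emb(src_ids)   # position embeddings omitted; see next section
        tgt = self.token_emb(tgt_ids)
        hidden = self.transformer(src, tgt)   # encoder stack -> decoder stack
        return self.predicting_head(hidden)   # logits over the vocabulary

model = TinyTransformer()
logits = model(torch.randint(0, 10000, (1, 5)),   # 'how are you doing ?'
               torch.randint(0, 10000, (1, 5)))   # '[SOS] I am good and'
print(logits.shape)  # torch.Size([1, 5, 10000])
```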


[Figure: the sinusoidal position embedding. Even dimensions i use a sine, odd dimensions a cosine:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i + 1) = cos(pos / 10000^(2i / d_model))]

The Position Embedding
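
A minimal sketch of the sinusoidal embedding above, assuming an even d_model so the sine and cosine columns pair up cleanly:

```python
import torch

def sinusoidal_position_embedding(seq_len, d_model):
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angle = pos / torch.pow(10000.0, i / d_model)                  # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # even i: sine
    pe[:, 1::2] = torch.cos(angle)  # odd i: cosine
    return pe

print(sinusoidal_position_embedding(5, 512).shape)  # torch.Size([5, 512])
```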


[Figure: the encoder block: a multi-head attention layer followed by layer normalization, then a feed-forward network followed by a second layer normalization.]

The Encoder Block
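
A minimal sketch of the block above, assuming PyTorch's nn.MultiheadAttention and the residual connections of the original architecture (the figure shows only the four sub-layers):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        attn_out, _ = self.attention(x, x, x)   # self-attention: q = k = v = x
        x = self.norm1(x + attn_out)            # residual connection + layer norm
        x = self.norm2(x + self.feed_forward(x))
        return x

block = EncoderBlock()
print(block(torch.randn(1, 5, 512)).shape)  # torch.Size([1, 5, 512])
```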


[Figure: self-attention. The hidden states are projected by Wq, Wk, and Wv into queries, keys, and values; the query-key scores pass through a softmax, and the weighted values form the new hidden states.]

The Self-Attention Layer
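
A minimal single-head sketch of the figure's data flow (the scaling by sqrt(d) follows the original scaled dot-product attention; multi-head splitting is omitted):

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.Wq = nn.Linear(d_model, d_model, bias=False)
        self.Wk = nn.Linear(d_model, d_model, bias=False)
        self.Wv = nn.Linear(d_model, d_model, bias=False)

    def forward(self, hidden_states):  # (batch, seq, d_model)
        q = self.Wq(hidden_states)     # queries
        k = self.Wk(hidden_states)     # keys
        v = self.Wv(hidden_states)     # values
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = torch.softmax(scores, dim=-1)  # attention weights
        return weights @ v                       # new hidden states
```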


[Figure: layer normalization applied to each hidden state.]

The Layer Normalization
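
A bare-bones sketch of the normalization itself; the full nn.LayerNorm also learns a per-feature gain and bias, which are omitted here for clarity:

```python
import torch

def layer_norm(hidden_state, eps=1e-5):
    # Normalize each hidden state across its feature dimension.
    mean = hidden_state.mean(dim=-1, keepdim=True)
    var = hidden_state.var(dim=-1, unbiased=False, keepdim=True)
    return (hidden_state - mean) / torch.sqrt(var + eps)

x = torch.randn(1, 5, 512)
print(layer_norm(x).mean(dim=-1)[0, 0])  # ~0: each state is re-centered
```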


[Figure: the position-wise feed-forward network: a linear layer maps each hidden state from d_model to d_ff, and a second linear layer maps it back from d_ff to d_model.]

The Position-wise Feed-forward Network
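
A minimal sketch of the two linear layers above; the ReLU between them follows the original paper (the figure shows only the projections), and d_model=512, d_ff=2048 are the paper's default sizes:

```python
import torch.nn as nn

def position_wise_ffn(d_model=512, d_ff=2048):
    # Applied independently at every position of the sequence.
    return nn.Sequential(
        nn.Linear(d_model, d_ff),   # expand: d_model -> d_ff
        nn.ReLU(),                  # nonlinearity between the two linear layers
        nn.Linear(d_ff, d_model),   # project back: d_ff -> d_model
    )
```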


[Figure: the decoder block: a multi-head attention layer with layer normalization, a cross-attention layer with layer normalization that reads the encoder output, and a feed-forward network with layer normalization.]

The Decoder Block
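
A minimal sketch of the block above. The causal mask on the self-attention is standard for decoders but not shown in the figure, so it is left as an optional argument here:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.cross_attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, encoder_output, tgt_mask=None):
        a, _ = self.self_attention(x, x, x, attn_mask=tgt_mask)  # causal mask in practice
        x = self.norm1(x + a)
        a, _ = self.cross_attention(x, encoder_output, encoder_output)
        x = self.norm2(x + a)
        return self.norm3(x + self.feed_forward(x))
```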


[Figure: cross-attention. The decoder hidden states are projected by Wq into queries, while the encoder output is projected by Wk and Wv into keys and values; the softmax-weighted values become the new hidden states.]

The Cross-Attention Layer
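
A minimal single-head sketch: identical to self-attention except for where the queries, keys, and values come from, exactly as in the figure:

```python
import math
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.Wq = nn.Linear(d_model, d_model, bias=False)
        self.Wk = nn.Linear(d_model, d_model, bias=False)
        self.Wv = nn.Linear(d_model, d_model, bias=False)

    def forward(self, decoder_hidden, encoder_output):
        q = self.Wq(decoder_hidden)   # queries come from the decoder hidden states
        k = self.Wk(encoder_output)   # keys come from the encoder output
        v = self.Wv(encoder_output)   # values come from the encoder output
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v  # new decoder hidden states
```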


[Figure: the predicting head. The encoder reads 'How' 'are' 'you' 'doing' '?'; the decoder hidden states (sequence size x d_model) for '[SOS]' 'I' 'am' 'good' 'and' pass through a linear layer mapping d_model to the vocabulary size, and an ArgMax over the vocabulary yields the decoder predictions, e.g. 'you'.]

The Predicting Head
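
A minimal sketch of the head's shape bookkeeping (vocab_size=10000 and seq_len=5 are illustrative assumptions; at training time one would keep the logits and apply a cross-entropy loss instead of the ArgMax):

```python
import torch
import torch.nn as nn

d_model, vocab_size, seq_len = 512, 10000, 5       # illustrative sizes

predicting_head = nn.Linear(d_model, vocab_size)   # d_model -> vocabulary size
decoder_hidden = torch.randn(seq_len, d_model)     # (sequence size, d_model)

logits = predicting_head(decoder_hidden)           # (sequence size, vocabulary size)
predictions = logits.argmax(dim=-1)                # ArgMax over the vocabulary
print(predictions.shape)                           # one predicted token id per position
```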
