
Hierarchical Sentence Level Attention Transformer

[Figure: Architecture overview. Each input sentence (1–3) is prefixed with a memory slot (mem 1–3) and encoded through a stack of N transformer layers, yielding per-sentence memory vectors and token vectors. An attention block for sentence selection computes Q, K, V over the memory outputs together with the encoder token outputs. The decoder mirrors the encoder: memory slots and token vectors pass through its own stack of N transformer layers to produce output segments 1–3.]
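The diagram does not spell out the computation, but it suggests attending over per-sentence memory vectors to select sentences. A minimal sketch under that assumption (all matrix and function names here are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sentence_selection_attention(mem, Wq, Wk, Wv, query):
    """Score sentences via scaled dot-product attention over memory vectors.

    mem:   (num_sentences, d) -- one memory vector per encoded sentence
    query: (d,)               -- e.g. a decoder state asking "which sentence next?"
    """
    q = query @ Wq                          # project the query, shape (d,)
    K = mem @ Wk                            # keys from memory vectors, (num_sentences, d)
    V = mem @ Wv                            # values from memory vectors, (num_sentences, d)
    scores = K @ q / np.sqrt(q.shape[-1])   # one score per sentence
    weights = softmax(scores)               # attention distribution over sentences
    context = weights @ V                   # blended sentence-level summary, (d,)
    return weights, context

# Toy usage: 3 sentences, model dimension 8 (sizes chosen for illustration only).
rng = np.random.default_rng(0)
d = 8
mem = rng.standard_normal((3, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
weights, context = sentence_selection_attention(mem, Wq, Wk, Wv, rng.standard_normal(d))
```

The attention weights form a distribution over the three sentences, so the same mechanism can either select a sentence (argmax) or blend their memory vectors into a context for the decoder.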
