Instruction-Tuned Models
Kalle, Jun, Gerti, Phillip, Noman, Muhammad
Input sentences
- sentence_a = "Generate a positive review for a place."
- sentence_b = "What a great thrift store. Super friendly service. Prices are the same as the East side location (aka: very reasonable). Thrifteriffic!"
Distinction of T5 and Flan-T5 in BertViz - Encoder

T5
- Layers: 6
- Attention heads: 8
- Layers included: 0, 2, 4, 5

Flan-T5
- Layers: 8
- Attention heads: 6
- Layers included: 0, 2, 4, 5, 7

[Side-by-side encoder attention visualizations: T5 vs. Flan-T5]
T5 mostly attends to subsequent tokens in the early layers and narrows its attention with each successive layer. Flan-T5, by contrast, attends to specific tokens early on and broadens its attention with each successive layer.
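What BertViz draws at each layer and head is an attention-weight matrix produced by scaled dot-product attention. A minimal NumPy sketch of one such matrix, using toy dimensions and random values (illustrative only, not the actual T5 or Flan-T5 weights):

```python
import numpy as np

def scaled_dot_product_attention(Q, K):
    # Attention weights: softmax(Q K^T / sqrt(d_k)); each row sums to 1.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8  # toy sizes for illustration
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
# One (seq_len x seq_len) matrix per layer and head is what BertViz
# renders as lines between tokens.
A = scaled_dot_product_attention(Q, K)
```

Sharply peaked rows correspond to the "attends to specific tokens" pattern described above; flatter rows correspond to the more general, spread-out attention.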
Distinction of T5 and Flan-T5 in BertViz - Decoder

T5
- Layers: 6
- Attention heads: 8
- Layers included: 0, 2, 4, 5

Flan-T5
- Layers: 8
- Attention heads: 6
- Layers included: 0, 2, 4, 5, 7

[Side-by-side decoder attention visualizations: T5 vs. Flan-T5]
Distinction of BERT and PromptBERT

[BERT attention visualization]

[PromptBERT attention visualization]

[Side-by-side attention visualizations: BERT vs. PromptBERT]
Training parameters:
- batch size = 14
- epochs = 0.97
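A fractional epoch count like 0.97 simply caps the number of optimizer steps just short of one full pass over the data. A sketch of the step count it implies (the dataset size below is a hypothetical placeholder, not stated in the slides):

```python
import math

def num_training_steps(dataset_size, batch_size, epochs):
    # Batches per full pass, scaled by a possibly fractional epoch count.
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return math.ceil(steps_per_epoch * epochs)

# batch size and epochs from the slides; dataset_size is illustrative only.
steps = num_training_steps(dataset_size=1000, batch_size=14, epochs=0.97)
```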