
Attention in instruction-tuned models
Kalle, Jun, Gerti, Phillip, Noman, Muhammad

Supervisor: M.A. Kai Kugler


Agenda
1. Introduction
1.1. Local Explainable AI
1.2. Introduction to explainable AI and instruction-tuning
1.3. Evolution of LLMs
2. Distinction of Flan-T5 and T5
3. Distinction of BERT and PromptBERT
4. A novel approach: Instruction-tuning on BERT
5. Future Work
Introduction: Local Explainable AI
Introduction: Instruction-tuned Models
- LLMs fine-tuned on input-output pairs consisting of instructions and attempts to follow
these instructions.

Image: Generative AI with Large Language Models (deeplearning.ai)


Introduction: Evolution of LLMs
Attention vs BertViz patterns
Distinction of attention in different models
Model versions
- T5: t5-small
- Flan-T5: flan-t5-small
- BERT: bert-base-uncased (RoBERTa-base)
- PromptBERT: royokong/sup-PromptBERT

Input sentences
- sentence_a = "Generate a positive review for a place."
- sentence_b = "What a great thrift store. Super friendly service. Prices are the same as the East side location (aka: very reasonable). Thrifteriffic!"
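The attention patterns on the following slides are rendered with BertViz. A minimal sketch of the setup, assuming the Hugging Face checkpoints listed above and BertViz's model_view for encoder-decoder models (the notebook used for the slides may differ in detail):

```python
# Minimal sketch (assumed setup, not the exact slide notebook):
# feed the instruction to the encoder and the review to the decoder,
# then render the attention maps with BertViz.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from bertviz import model_view

sentence_a = "Generate a positive review for a place."
sentence_b = ("What a great thrift store. Super friendly service. Prices are the same as "
              "the East side location (aka: very reasonable). Thrifteriffic!")

model_name = "t5-small"  # swap in "google/flan-t5-small" for the Flan-T5 comparison
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, output_attentions=True)

encoder_ids = tokenizer(sentence_a, return_tensors="pt").input_ids
decoder_ids = tokenizer(sentence_b, return_tensors="pt").input_ids
outputs = model(input_ids=encoder_ids, decoder_input_ids=decoder_ids)

model_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=tokenizer.convert_ids_to_tokens(encoder_ids[0]),
    decoder_tokens=tokenizer.convert_ids_to_tokens(decoder_ids[0]),
)
```

The "Layers included" lists on the following slides refer to the subset of layers shown in the visualizations.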
Distinction of T5 and Flan-T5 in BertViz - Encoder
T5

Layers: 6

Attention heads: 8

Layers included: 0, 2, 4, 5
Distinction of T5 and Flan-T5 in BertViz - Encoder
Flan-T5

Layers: 8

Attention heads: 6

Layers included: 0, 2, 4, 5, 7
Distinction of T5 and Flan-T5 in BertViz - Encoder
T5 vs. Flan-T5

T5 mostly attends to subsequent tokens first and specializes its attention with each layer.
Flan-T5 partially attends to specific tokens first and generalizes its attention with each layer.
Distinction of T5 and Flan-T5 in BertViz - Decoder
T5

Layers: 6

Attention heads: 8

Layers included: 0, 2, 4, 5
Distinction of T5 and Flan-T5 in BertViz - Decoder
Flan-T5

Layers: 8

Attention heads: 6

Layers included: 0, 2, 4, 5, 7
Distinction of T5 and Flan-T5 in BertViz - Decoder
T5 vs. Flan-T5
Distinction of BERT and PromptBERT
BERT
Distinction of BERT and PromptBERT
PromptBERT
Distinction of BERT and PromptBERT
BERT vs. PromptBERT

BERT attends only to the end punctuation.
PromptBERT attends to multiple punctuation marks between the sentences.
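A minimal sketch of this comparison, assuming BertViz's head_view over the sentence pair with the bert-base-uncased checkpoint listed above; the PromptBERT side would swap in the royokong/sup-PromptBERT checkpoint:

```python
# Hedged sketch: visualize BERT's attention over the instruction + review pair.
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

sentence_a = "Generate a positive review for a place."
sentence_b = ("What a great thrift store. Super friendly service. Prices are the same as "
              "the East side location (aka: very reasonable). Thrifteriffic!")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")
outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
sentence_b_start = inputs["token_type_ids"][0].tolist().index(1)  # first token of sentence_b
head_view(outputs.attentions, tokens, sentence_b_start=sentence_b_start)
```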
A novel approach: Instruction-tuning on BERT
- Inspired by the T5 / Flan-T5 approach, where T5 is fine-tuned on instruction datasets (Flan), giving us Flan-T5.
- Following the same approach with BERT: transforming it into an encoder-decoder model suitable for text generation tasks (see the sketch after this list).
- Adding a decoder to generate an output text when the instruction is provided as part of the input.
- At the moment, training performance is available only for English.
- Number of parameters: 137M
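A minimal sketch of how BERT can be warm-started as an encoder-decoder model with Hugging Face's EncoderDecoderModel, in the spirit of "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks" (see References); the actual instructionBERT code lives in the linked repository, so the details below are assumptions:

```python
# Hedged sketch: warm-start a BERT2BERT encoder-decoder model for instruction following.
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# tie_encoder_decoder=True is an assumption; sharing encoder and decoder weights
# lands the parameter count near the 137M quoted on the slide.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased", tie_encoder_decoder=True
)

# Special tokens the decoder needs for generation
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```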
A novel approach: Instruction-tuning on BERT
- Training results at the inference level:

- Training parameters (see the sketch below):
  - batch size = 14
  - epochs = 0.97
  - learning rate = 0.00005
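The hyperparameters above map directly onto Hugging Face Seq2SeqTrainingArguments; a minimal sketch, where output_dir and everything not listed on the slide are assumptions:

```python
# Hedged sketch: the training hyperparameters from the slide as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="instructionBERT",     # assumed output directory name
    per_device_train_batch_size=14,   # batch size = 14
    learning_rate=5e-5,               # learning rate = 0.00005
    num_train_epochs=0.97,            # slightly less than one full pass over the data
)
```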


Distinction of BERT and instructionBERT

In BertViz, BERT focuses mostly on [SEP].

instructionBERT shows a heterogeneous focus on the instruction.


Future Work
We have defined the following goals:
- analyse instructionBERT with RoBERTa and PromptRoBERTa respectively
- build instructionBERT with a longer context, a larger batch size, and a training subset of Flan
- pursue a multilingual setup
- measure the importance of attention heads
- run instructionBERT on benchmarks (e.g. InstructEval)
- use syntax probing to further interpret the results
References
- InstructionBERT repository: https://gitlab.com/Bachstelze/instructionbert
- Explainability for Large Language Models: A Survey: https://arxiv.org/abs/2309.01029
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
https://arxiv.org/abs/2304.13712
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
https://arxiv.org/abs/1910.10683
- Scaling Instruction-Finetuned Language Models https://arxiv.org/abs/2210.11416
- PromptBERT: Improving BERT Sentence Embeddings with Prompts
https://arxiv.org/abs/2201.04337
- InstructEval: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
https://arxiv.org/abs/2306.04757
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
https://arxiv.org/abs/1907.12461
