Instruction-Tuned Models
Kalle, Jun, Gerti, Phillip, Noman, Muhammad
Input sentences
- sentence_a = "Generate a positive review for a place."
- sentence_b = "What a great thrift store. Super friendly service. Prices are the same as the East side location (aka: very reasonable). Thrifteriffic!"
Distinction of T5 and Flan-T5 in BertViz - Encoder

T5
- Layers: 6
- Attention heads: 8
- Layers included: 0, 2, 4, 5

Flan-T5
- Layers: 8
- Attention heads: 6
- Layers included: 0, 2, 4, 5, 7

[Side-by-side encoder attention visualizations: T5 vs. Flan-T5]
T5 mostly attends to subsequent tokens in the early layers and narrows its attention with each successive layer. Flan-T5, by contrast, attends to specific tokens early on and broadens its attention with each successive layer.
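What BertViz draws at each layer and head is an attention-weight matrix produced by scaled dot-product attention. A minimal NumPy sketch of one such matrix, using toy dimensions and random values (illustrative only, not the actual T5 or Flan-T5 weights):

```python
import numpy as np

def scaled_dot_product_attention(Q, K):
    # Attention weights: softmax(Q K^T / sqrt(d_k)); each row sums to 1.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8  # toy sizes for illustration
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
# One (seq_len x seq_len) matrix per layer and head is what BertViz
# renders as lines between tokens.
A = scaled_dot_product_attention(Q, K)
```

Sharply peaked rows correspond to the "attends to specific tokens" pattern described above; flatter rows correspond to the more general, spread-out attention.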
Distinction of T5 and Flan-T5 in BertViz - Decoder

T5
- Layers: 6
- Attention heads: 8
- Layers included: 0, 2, 4, 5

Flan-T5
- Layers: 8
- Attention heads: 6
- Layers included: 0, 2, 4, 5, 7

[Side-by-side decoder attention visualizations: T5 vs. Flan-T5]
Distinction of BERT and PromptBERT

[BERT attention visualization]

[PromptBERT attention visualization]

[Side-by-side attention visualizations: BERT vs. PromptBERT]
Training parameters:
- batch size = 14
- epochs = 0.97
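A fractional epoch count like 0.97 simply caps the number of optimizer steps just short of one full pass over the data. A sketch of the step count it implies (the dataset size below is a hypothetical placeholder, not stated in the slides):

```python
import math

def num_training_steps(dataset_size, batch_size, epochs):
    # Batches per full pass, scaled by a possibly fractional epoch count.
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return math.ceil(steps_per_epoch * epochs)

# batch size and epochs from the slides; dataset_size is illustrative only.
steps = num_training_steps(dataset_size=1000, batch_size=14, epochs=0.97)
```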