DeepLearning.AI makes these slides available for educational purposes. You may not use or
distribute these slides for commercial purposes. You may make copies of these slides and
use or distribute them for educational purposes as long as you cite DeepLearning.AI as the
source of the slides.
GenAI project lifecycle
[Diagram: Define the problem → Choose model → Adapt and align model (prompt engineering, fine-tuning, align with human feedback; evaluate) → Optimize model and deploy for inference → Augment model and build LLM-powered applications]
Fine-tuning an LLM with instruction prompts
In-context learning (ICL) - zero-shot inference
[Diagram: a pre-trained LLM is produced by training on GB-TB-PB of unstructured textual data (TEXT[...])]
LLM fine-tuning at a high level
LLM fine-tuning
[Diagram: a pre-trained LLM is fine-tuned on GB-TB of labeled examples for a specific task or set of tasks, producing a fine-tuned LLM. In instruction fine-tuning, these labeled examples are prompt-completion pairs (PROMPT[...], COMPLETION[...]), and training on them yields improved performance]
Using prompts to fine-tune LLMs with instruction
LLM fine-tuning
[Diagram: the instruction dataset demonstrates the desired behavior, e.g.
Classify this review: I loved this DVD! Sentiment: Positive
Classify this review: I don't like this chair. Sentiment: Negative
Fine-tuning the pre-trained LLM on such prompt-completion pairs produces a fine-tuned LLM; the same approach applies to other tasks such as text generation and text summarization]
Source: https://github.com/bigscience-workshop/promptsource/blob/main/promptsource/templates/amazon_polarity/templates.yaml
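To make the template idea concrete, here is a minimal sketch of turning a labeled example into a prompt-completion pair. The template wording and field names are illustrative, not taken from the linked templates.yaml:

```python
# A minimal sketch of converting a labeled example into an instruction
# prompt-completion pair via a template, in the spirit of promptsource.
# Template wording and field names here are illustrative assumptions.

template = "Classify this review:\n{review}\nSentiment:"

example = {"review": "I loved this DVD!", "label": "Positive"}

prompt = template.format(review=example["review"])
completion = example["label"]

print(prompt)      # Classify this review: / I loved this DVD! / Sentiment:
print(completion)  # Positive
```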
LLM fine-tuning process
[Diagram: the prompt-completion dataset is divided into training, validation, and test splits]
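A minimal sketch of such a split; the 80/10/10 fractions are an assumption, as the slides do not specify them:

```python
# A minimal sketch of splitting prompt-completion pairs into training,
# validation, and test sets. The 80/10/10 split is an assumption.
import random

pairs = [(f"PROMPT[{i}]", f"COMPLETION[{i}]") for i in range(1000)]
random.Random(42).shuffle(pairs)  # fixed seed for reproducibility

n = len(pairs)
train = pairs[: int(0.8 * n)]                    # used to update model weights
validation = pairs[int(0.8 * n): int(0.9 * n)]   # for validation_accuracy
test = pairs[int(0.9 * n):]                      # for final test_accuracy
```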
During training, prompts from the training set are fed to the pre-trained LLM, e.g.:
Prompt: Classify this review: I loved this DVD! Sentiment:
LLM completion: Sentiment: Neutral
Label: Sentiment: Positive
The cross-entropy loss between the model's completion and the label is computed and used to update the model weights.
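A minimal sketch of one such training step, assuming the Hugging Face transformers library and a seq2seq model such as FLAN-T5; passing `labels` makes the model compute the cross-entropy loss internally:

```python
# One instruction fine-tuning step: compare the model's completion
# distribution with the label via cross-entropy and update the weights.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Classify this review:\nI loved this DVD!\nSentiment:"
label = "Positive"

inputs = tokenizer(prompt, return_tensors="pt")
targets = tokenizer(label, return_tensors="pt")

outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=targets.input_ids)  # cross-entropy loss computed here
outputs.loss.backward()    # backpropagate
optimizer.step()           # update all weights (full fine-tuning)
optimizer.zero_grad()
```

In practice this runs over many batches of prompt-completion pairs rather than a single example.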
The held-out validation split is used to measure validation_accuracy during fine-tuning, and the test split gives a final test_accuracy.
[Diagram: the end result of the process is a fine-tuned instruct LLM]
Fine-tuning on a single task
[Diagram: a pre-trained LLM is fine-tuned on a single task, e.g. 'Summarize the following text: [EXAMPLE TEXT]' → '[EXAMPLE COMPLETION]', producing an instruct LLM]
Often, only 500-1,000 examples are needed to fine-tune for a single task.
Catastrophic forgetting
● Fine-tuning can significantly increase the performance of a model on a specific task…
Before fine-tuning:
Prompt: Classify this review: I loved this DVD! Sentiment:
Completion: …received a very nice book review (the base model fails to follow the instruction)
After fine-tuning:
Prompt: Classify this review: I loved this DVD! Sentiment:
Completion: Sentiment: POSITIVE
● …but can lead to a reduction in ability on other tasks.
Before fine-tuning:
Prompt: What is the name of the cat? Charlie the cat roamed the garden at night.
Completion: Charlie
After fine-tuning:
Prompt: What is the name of the cat? Charlie the cat roamed the garden at night.
Completion: The garden was positive. (the model has forgotten how to do this task)
How to avoid catastrophic forgetting
● First note that you might not have to!
● Fine-tune on multiple tasks at the same time
● Consider Parameter Efficient Fine-tuning (PEFT)
Multi-task, instruction fine-tuning
[Diagram: the pre-trained LLM is fine-tuned on a mixture of tasks at once, e.g. 'Rate this review:', 'Translate into Python code:', …, each with [EXAMPLE TEXT] and [EXAMPLE COMPLETION] pairs, producing a single instruct LLM. FLAN models are trained this way]
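A minimal sketch of building such a mixed-task dataset; the task names, templates, and examples below are illustrative placeholders, not the FLAN mixture:

```python
# A minimal sketch of mixing examples from several instruction tasks into
# one training set, so each batch sees a mixture (reducing catastrophic
# forgetting). All task names and data here are illustrative.
import random

task_templates = {
    "rate": "Rate this review:\n{text}",
    "translate": "Translate into Python code:\n{text}",
}
raw_examples = [
    ("rate", "I loved this DVD!", "5 stars"),
    ("translate", "add two numbers", "def add(a, b): return a + b"),
]

mixed = [
    (task_templates[task].format(text=text), completion)
    for task, text, completion in raw_examples
]
random.shuffle(mixed)  # interleave tasks so every batch sees a mixture
```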
Instruction fine-tuning with FLAN
● FLAN models refer to a specific set of instructions used to perform instruction fine-tuning.
[Diagram: T5 becomes FLAN-T5 through instruction fine-tuning across many task categories, e.g.:
- Commonsense Reasoning, Question Generation, Closed-book QA, Adversarial QA, Extractive QA, …
- Natural language inference, Code instruction gen, Code repair, Dialog context generation, Summarization (SAMSum), …
- Arithmetic reasoning, Commonsense reasoning, Explanation generation, Sentence composition, Implicit reasoning, …
- Cause effect classification, Named Entity Recognition, Toxic Language Detection, Question answering, …]
Improving FLAN-T5's summarization capabilities
Further fine-tune FLAN-T5 with a domain-specific instruction dataset (dialogsum).
Example: support-dialog summarization for a customer-service chatbot ('ShopBot'), with the prompt created from a template.
Model evaluation metrics
LLM Evaluation - Challenges
For traditional ML models with deterministic outputs, evaluation is straightforward:
Accuracy = Correct Predictions / Total Predictions
LLM output is non-deterministic, and similarity between sentences is hard to score: "Mike does not drink coffee." and "Mike does drink coffee." differ by a single word yet mean the opposite.
LLM Evaluation - Metrics
Two common metric families are ROUGE (used for text summarization) and BLEU score (used for text translation). Terminology: a unigram is a single word, a bigram is two consecutive words, and an n-gram is a group of n consecutive words.
LLM Evaluation - Metrics - ROUGE-1
Reference (human): It is cold outside.
Generated output: It is very cold outside.
ROUGE-1 Recall = unigram matches / unigrams in reference = 4/4 = 1.0
ROUGE-1 Precision = unigram matches / unigrams in output = 4/5 = 0.8
The generated output "It is not cold outside." gets exactly the same scores (recall 4/4 = 1.0, precision 4/5 = 0.8) even though its meaning is the opposite of the reference: unigram matching ignores word order and meaning.
LLM Evaluation - Metrics - ROUGE-2
Reference (human): It is cold outside. → bigrams: "It is", "is cold", "cold outside"
Generated output: It is very cold outside. → bigrams: "It is", "is very", "very cold", "cold outside"
ROUGE-2 Precision = bigram matches / bigrams in output = 2/4 = 0.5
ROUGE-2 Recall = bigram matches / bigrams in reference = 2/3 ≈ 0.67
ROUGE-2 F1 = 2 × (precision × recall) / (precision + recall) = 2 × 0.335 / 1.17 ≈ 0.57
LLM Evaluation - Metrics - ROUGE-L
LCS: longest common subsequence.
Reference (human): It is cold outside.
Generated output: It is very cold outside.
The longest common subsequences are "It is" and "cold outside", each of length 2, so LCS(Gen, Ref) = 2.
ROUGE-L Recall = LCS(Gen, Ref) / unigrams in reference = 2/4 = 0.5
ROUGE-L Precision = LCS(Gen, Ref) / unigrams in output = 2/5 = 0.4
ROUGE-L F1 = 2 × (precision × recall) / (precision + recall) = 2 × 0.2 / 0.9 ≈ 0.44
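A minimal sketch matching the numbers above. Note that the slides compute LCS as the longest common contiguous run of words (length 2 here), while the textbook ROUGE-L definition allows non-contiguous subsequences, which would give length 4 for this pair:

```python
# A minimal sketch of ROUGE-L as computed on the slides, treating LCS as
# the longest common *contiguous* run of words. Tokenization is naive
# whitespace splitting; real implementations differ in detail.
def longest_common_run(a, b):
    # Dynamic programming over word pairs; dp[i+1][j+1] extends a run.
    best = 0
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, wa in enumerate(a):
        for j, wb in enumerate(b):
            if wa == wb:
                dp[i + 1][j + 1] = dp[i][j] + 1
                best = max(best, dp[i + 1][j + 1])
    return best

ref = "it is cold outside".split()
gen = "it is very cold outside".split()

lcs = longest_common_run(gen, ref)                   # 2
recall = lcs / len(ref)                              # 2/4 = 0.5
precision = lcs / len(gen)                           # 2/5 = 0.4
f1 = 2 * precision * recall / (precision + recall)   # ~0.44
print(lcs, recall, precision, round(f1, 2))
```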
LLM Evaluation - Metrics - ROUGE hacking
ROUGE can be gamed by degenerate outputs.
Reference (human): It is cold outside.
Generated output: cold cold cold cold
ROUGE-1 Precision = unigram matches / unigrams in output = 4/4 = 1.0
A perfect score for nonsense. Clipping caps each word's matches at its count in the reference:
Modified precision = clip(unigram matches) / unigrams in output = 1/4 = 0.25
But word order can still fool it. Generated output: outside cold it is
Modified precision = clip(unigram matches) / unigrams in output = 4/4 = 1.0
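A minimal, self-contained sketch of ROUGE-1 with clipping that reproduces these numbers; real projects would typically use a library such as rouge_score or evaluate instead:

```python
# A minimal sketch of ROUGE-1 with clipped matches. Counter intersection
# caps each word's matches at its count in the reference, penalizing
# "cold cold cold cold" but not reordered words.
from collections import Counter

def rouge_1(reference, generated):
    ref = reference.lower().strip(".").split()
    gen = generated.lower().strip(".").split()
    matches = sum((Counter(gen) & Counter(ref)).values())  # clipped counts
    recall = matches / len(ref)
    precision = matches / len(gen)
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return recall, precision, f1

print(rouge_1("It is cold outside.", "It is very cold outside."))  # (1.0, 0.8, ~0.89)
print(rouge_1("It is cold outside.", "cold cold cold cold"))        # precision 0.25
print(rouge_1("It is cold outside.", "outside cold it is"))         # precision 1.0
```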
LLM Evaluation - Metrics - BLEU
BLEU score quantifies translation quality by comparing the generated output with a human reference; it is computed as the average precision across a range of n-gram sizes.
GLUE
Source: Wang et al. 2018, “GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding”
SuperGLUE
Source: Wang et al. 2019, “SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems”
GLUE and SuperGLUE leaderboards
Disclaimer: metrics may not be up-to-date. Check https://super.gluebenchmark.com and https://gluebenchmark.com/leaderboard for the latest.
Benchmarks for massive models
Massive Multitask Language Understanding (MMLU), 2021
Source: Hendrycks et al. 2021, “Measuring Massive Multitask Language Understanding”
BIG-bench (including the BIG-bench Hard and BIG-bench Lite subsets), 2022
Source: Suzgun et al. 2022, “Challenging BIG-Bench tasks and whether chain-of-thought can solve them”
Holistic Evaluation of Language Models (HELM)
Metrics:
1. Accuracy
2. Calibration
3. Robustness
4. Fairness
5. Bias
6. Toxicity
7. Efficiency
Disclaimer: metrics may not be up-to-date. Check https://crfm.stanford.edu/helm/latest for the latest.
Key takeaways
LLM fine-tuning process (recap)
[Diagram: prompts from the training dataset are fed to the pre-trained LLM; the completion (e.g. 'Sentiment: Neutral' for 'Classify this review: I loved this DVD! Sentiment:') is compared with the label ('Sentiment: Positive') using a cross-entropy loss, which updates the model weights to produce an updated LLM]
Parameter-efficient fine-tuning (PEFT)
Full fine-tuning of large LLMs is challenging
Training requires memory not only for the trainable weights but also for optimizer states, gradients, forward activations, and temporary memory, typically 12-20x the memory of the model weights alone.
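A rough back-of-envelope sketch of what that rule of thumb implies, assuming 32-bit (4-byte) parameters and a 1B-parameter model as an example:

```python
# A rough memory estimate for full fine-tuning, using the slide's
# 12-20x rule of thumb. Assumes 32-bit (4-byte) parameters; the 1B
# model size is our example, not from the slides.
params = 1_000_000_000            # a 1B-parameter model
weights_gb = params * 4 / 1e9     # ~4 GB just to store the weights

low, high = 12 * weights_gb, 20 * weights_gb
print(f"weights: ~{weights_gb:.0f} GB")
print(f"full fine-tuning (weights + optimizer states, gradients, "
      f"activations, temp memory): ~{low:.0f}-{high:.0f} GB")
```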
Parameter efficient fine-tuning (PEFT)
[Diagram: with PEFT, most or all of the original LLM weights are frozen (❄); only a small number of trainable layers, added components, or new weights are updated]
[Diagram: full fine-tuning for each task (QA, summarize, generate) produces a separate fine-tuned LLM, each GBs in size]
PEFT fine-tuning saves space and is flexible
[Diagram: a single shared base LLM (GBs) plus small task-specific PEFT weights (MBs each) for QA, summarize, and generate; the appropriate PEFT weights are swapped in for each task]
PEFT Trade-offs
Choosing a PEFT method involves trade-offs, including parameter efficiency, memory efficiency, training speed, model performance, and inference costs.
PEFT method categories:
- Selective: select a subset of initial LLM parameters to fine-tune.
- Reparameterization: reparameterize model weights using a low-rank representation (LoRA).
- Additive: add trainable layers or parameters to the model (soft prompts).
Source: Lialin et al. 2023, “Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning”
PEFT methods covered next:
- LoRA
- Soft prompts (prompt tuning)
Source: Lialin et al. 2023, “Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning”
Low-Rank Adaptation of Large Language Models (LoRA)
Transformers: recap
[Diagram: input text, e.g. 'J'aime l'apprentissage automatique' ('I love machine learning'), is tokenized and embedded, then processed by the encoder and decoder, each containing self-attention and feed-forward network layers, with a softmax over the output vocabulary producing the output]
LoRA: Low-Rank Adaptation of LLMs
[Diagram: the self-attention weights applied to the embedding vectors are frozen (❄). LoRA injects two small rank-decomposition matrices, A and B, alongside them, and only these are trained. The rank r is a small dimension, typically 4, 8, …, 64]
Steps to update the model for inference:
1. Matrix-multiply the low-rank matrices: B × A.
2. Add the result to the original frozen weights: ❄ + B×A.
Concrete example using the base Transformer as reference
The weights applied to the embedding vectors have dimensions 512 × 64 = 32,768 trainable parameters. With LoRA rank r = 8, A has dimensions 8 × 64 = 512 parameters and B has dimensions 512 × 8 = 4,096 parameters: about 86% fewer parameters to train.
LoRA: Low-Rank Adaptation of LLMs
[Diagram: for task A, multiply task A's LoRA matrices and add them to the frozen weights (❄ + B×A for task A); to switch to task B, swap in task B's matrices instead. The base LLM stays unchanged; only the small per-task LoRA matrices are swapped at inference time]
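A minimal sketch of a LoRA-wrapped linear layer in PyTorch, mirroring the 512 × 64 example above; the class name, initialization, and alpha scaling are our assumptions, and real projects typically use a library such as Hugging Face peft:

```python
# A minimal sketch of a LoRA-wrapped linear layer. Only A and B train;
# the base weights stay frozen.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)        # freeze original weights (the ❄ path)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # r x k
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # d x r, starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus low-rank update path: W x + (B A) x
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Matches the slides' concrete example: a 512 -> 64 projection.
layer = LoRALinear(nn.Linear(512, 64), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8*512 + 64*8 = 4608, vs 32768 weights in the base layer
```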
Sample ROUGE metrics for full vs. LoRA fine-tuning
Task: FLAN-T5 on dialog summarization.
Base model (flan_t5_base), baseline score:
{'rouge1': 0.2334, 'rouge2': 0.0760, 'rougeL': 0.2014, 'rougeLsum': 0.2015}
Full fine-tune (flan_t5_base_instruct_full), +80.63% rouge1 over baseline:
{'rouge1': 0.4216, 'rouge2': 0.1804, 'rougeL': 0.3384, 'rougeLsum': 0.3384}
LoRA fine-tune (flan_t5_base_instruct_lora), -3.20% rouge1 vs. the full fine-tune:
{'rouge1': 0.4081, 'rouge2': 0.1633, 'rougeL': 0.3251, 'rougeLsum': 0.3249}
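For reference, a sketch of how a LoRA fine-tune like this is typically configured with the Hugging Face peft library; the hyperparameter values are illustrative assumptions, not the exact ones behind these scores:

```python
# A sketch of setting up LoRA fine-tuning of FLAN-T5 with `peft`.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    r=8,                        # LoRA rank
    lora_alpha=32,              # scaling factor
    target_modules=["q", "v"],  # apply LoRA to T5's query/value projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # small fraction of total parameters
```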
Choosing the LoRA rank
Soft prompt
[Diagram: trainable soft-prompt vectors are prepended to the embedded input tokens (X1 … X8) of the input, e.g. 'The teacher teaches the student with the book.'. Each soft-prompt vector has the same length as the token embedding vectors, and typically 20-100 virtual tokens are used]
Soft prompts
[Diagram: hard tokens such as 'fire', 'book', 'jumps', 'whale' each correspond to a fixed point in the embedding space, whereas soft-prompt vectors can take on any value in the continuous embedding space and are learned during training]
Full fine-tuning vs. prompt tuning
[Diagram: full fine-tuning updates all model weights for the task; in prompt tuning, the LLM weights are frozen (❄) and only the soft prompt for the task (e.g. task B) is trained]
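A minimal sketch of the prompt-tuning mechanism; the class and parameter names are ours for illustration:

```python
# A minimal sketch of prompt tuning: a small matrix of trainable
# virtual-token embeddings is prepended to the frozen model's input
# embeddings. Only the soft prompt receives gradients.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_virtual_tokens: int = 20, embed_dim: int = 768):
        super().__init__()
        # Typically 20-100 virtual tokens, same dimension as token embeddings.
        self.prompt = nn.Parameter(torch.randn(n_virtual_tokens, embed_dim) * 0.01)

    def forward(self, token_embeds):
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)  # prepend soft prompt

# Usage: freeze the LLM, embed the input tokens, prepend the soft prompt,
# and run the model on `inputs_embeds`; only `soft_prompt.prompt` trains.
soft_prompt = SoftPrompt()
dummy = torch.zeros(2, 8, 768)   # a batch of two 8-token embedded inputs
print(soft_prompt(dummy).shape)  # torch.Size([2, 28, 768])
```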
Performance of prompt tuning
[Plot: performance vs. model size for full fine-tuning, multi-task fine-tuning, and prompt tuning; prompt tuning approaches full fine-tuning performance as models grow larger]
The trained soft-prompt tokens do not correspond to any known tokens (such as 'book' or 'jumps'), but their nearest neighbors form a semantic group with similar meanings, e.g. 'completely', 'totally', 'altogether', 'entirely'.
PEFT methods summary
- LoRA
- Soft prompts (prompt tuning)