DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
Question Answering
● Transfer learning
● BERT
● T5
Question Answering
● Context-based model
● Closed book model

Not just the model: both the data and the model matter.
Classical training: Data → Model → Training → Inference
Course Review
Model → Training on "Downstream" Task → Inference

Transfer Learning: Different Tasks
Pre-training on one task, inference on another:
● Pre-training task, sentiment classification: "Watching the movie is like ..."
● Downstream task, question answering: "When is Pi Day?" → "March 14!" But "When's my birthday?" → "Umm…"
BERT: Bi-directional Context
● Uni-directional: "Learning from deeplearning.ai is like watching the sunset with my best friend!" (context flows in one direction only)
● Bi-directional: the same sentence, with context taken from both the left and the right
T5: Single task vs. Multi-task
● Single task: "Studying with deeplearning.ai was ..." goes to a separate model per task (Model 1, Model 2, ...).
● Multi-task: one model handles all tasks.
T5: more data, better performance
● English Wikipedia: ~13 GB
● C4 (Colossal Clean Crawled Corpus): ~800 GB
Transfer Learning in NLP
deeplearning.ai
Desirable Goals
Transfer Learning!
● Improve predictions
● Work with small datasets
Transfer learning options
1. How to transfer: feature-based vs. fine-tuning
2. Pre-train data: labeled vs. unlabeled
3. Pre-training task: language modeling (masked words, next sentence)
Option 1: General-purpose learning
Example: CBOW predicts "happy" from "I am ___ because I am learning". The resulting word embeddings serve as input "features" for downstream models such as translation.
Option 1: Feature-based vs. Fine-tuning
● Feature-based: pre-train, extract features, then train a new model on them to make predictions.
● Fine-tuning: pre-train, then fine-tune the same model on the downstream task to make predictions.
Option 1: Fine-tune by adding a layer
Pre-train on movie reviews, then add a new layer and fine-tune on course reviews, as sketched below.
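To make the two options concrete, here is a minimal PyTorch sketch (not from the slides): the encoder stands in for any pre-trained model, all sizes are illustrative, and `classifier_head` is the added layer. Freezing the encoder gives the feature-based option; leaving it trainable gives fine-tuning.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained model (illustrative sizes).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=2,
)
classifier_head = nn.Linear(128, 2)  # the new layer added for the task

# Option 1a: feature-based -- freeze the encoder, train only the head.
for p in encoder.parameters():
    p.requires_grad = False

# Option 1b: fine-tuning -- leave requires_grad=True everywhere and
# train the whole stack end to end on the downstream task.

x = torch.randn(8, 16, 128)               # batch of 8 sequences, length 16
features = encoder(x)                     # contextual "features"
logits = classifier_head(features[:, 0])  # classify from the first token
```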
Option 2: Data and performance
The more pre-training data available, the better the resulting model.
Option 2: Labeled vs. unlabeled data
Labeled data pairs inputs with targets, e.g. "What day is Pi day?" → "March 14".
Option 3: Self-supervised task (the pre-training task)
From unlabeled data, create inputs (features) and targets (labels).
Option 3: Self-supervised tasks (the pre-training task)
Unlabeled data: "Learning from deeplearning.ai is like watching the sunset with my best friend."
Hide a word to form the input; the hidden word ("friend") becomes the target. Predict, compare against the target, and update the model: this is language modeling.
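A toy sketch of that pair-creation step, assuming whitespace tokenization and a literal "<MASK>" placeholder (both assumptions for illustration):

```python
import random

sentence = ("Learning from deeplearning.ai is like watching "
            "the sunset with my best friend.")
tokens = sentence.split()

position = random.randrange(len(tokens))
target = tokens[position]        # label created from the data itself
inputs = tokens.copy()
inputs[position] = "<MASK>"      # feature: the sentence with a hole in it

print(" ".join(inputs), "->", target)
# e.g. "... with my best <MASK>" -> "friend."
```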
Fine-tune a model for each downstream task
Pre-training produces one model; each downstream task then gets its own fine-tuned copy.
Bi-directional LSTM
A bi-directional LSTM predicts "right" from context on both sides: "on" "the" ___ "side" (one LSTM reads left-to-right, the other right-to-left).
An RNN encoder-decoder, by contrast, is uni-directional.
Why not bi-directional?
Diagram: the Transformer, with attention layers in both the encoder and the decoder.
"The legislators believed that they were on the _____ side of history, so they changed the law." Filling the blank requires bi-directional context.

Transformer + Bi-directional Context
Given Sentence "A", does Sentence "B" follow it?
T5: Encoder vs. Encoder-Decoder
Diagram: an encoder-only stack compared with an encoder-decoder stack.
T5: Multi-task
"Studying with deeplearning.ai was ..." goes to a single model shared across all tasks. How?
T5: Text-to-Text
Every task is text in, text out: "Classify: Learning from deeplearning.ai is like..." → "5 stars"
BERT
● Positional embeddings
● BERT_base:
  12 layers (12 transformer blocks)
  12 attention heads
  110 million parameters
BERT pre-training
● Choose 15% of the tokens at random: mask them 80% of the time, replace them with a random token 10% of the time, or keep them as is 10% of the time.
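A hedged Python sketch of that recipe; `MASK_ID`, `vocab_size`, and the -100 ignore-label convention are illustrative assumptions:

```python
import random

def mask_tokens(token_ids, vocab_size, MASK_ID):
    labels = [-100] * len(token_ids)        # -100 = ignored by the loss
    for i, tok in enumerate(token_ids):
        if random.random() < 0.15:          # choose 15% of tokens
            labels[i] = tok                 # the model must recover this token
            r = random.random()
            if r < 0.8:                     # 80%: replace with [MASK]
                token_ids[i] = MASK_ID
            elif r < 0.9:                   # 10%: replace with a random token
                token_ids[i] = random.randrange(vocab_size)
            # remaining 10%: keep the token as is
    return token_ids, labels
```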
BERT input representation (the three embeddings are summed):

Tokens:    [CLS] my dog is cute [SEP] he likes play ##ing [SEP]
Segments:  A     A  A   A  A    A     B  B     B    B     B
Positions: 0     1  2   3  4    5     6  7     8    9     10
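A minimal PyTorch sketch of how the three embeddings combine (they are summed elementwise; sizes mirror BERT_base, the token ids are toy values):

```python
import torch
import torch.nn as nn

vocab_size, hidden = 30522, 768
token_emb = nn.Embedding(vocab_size, hidden)
segment_emb = nn.Embedding(2, hidden)       # segment A = 0, B = 1
position_emb = nn.Embedding(512, hidden)    # positions 0..511

token_ids = torch.tensor([[101, 2026, 3899, 2003, 10140, 102]])  # toy ids
segments = torch.tensor([[0, 0, 0, 0, 0, 0]])
positions = torch.arange(token_ids.size(1)).unsqueeze(0)

inputs = token_emb(token_ids) + segment_emb(segments) + position_emb(positions)
```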
Visualizing the output
Input: an unlabeled sentence pair, masked, given as "Masked sentence A [SEP] Masked sentence B". BERT produces one output vector per position: C (for [CLS]) feeds the next sentence prediction (NSP) head, and T1 ... TN, T1' ... TM' feed the masked LM head.
● [CLS]: a special classification symbol added in front of every input
● [SEP]: a special separator token
BERT Objective
● Objective 1: Multi-Mask LM
● Objective 2: Next Sentence Prediction
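Both objectives are trained jointly. As one concrete illustration, Hugging Face's `BertForPreTraining` pairs a masked-LM head with an NSP head and returns their combined loss; feeding unmasked ids as MLM labels here is purely illustrative:

```python
import torch
from transformers import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

enc = tokenizer("my dog is cute", "he likes playing", return_tensors="pt")
out = model(**enc,
            labels=enc["input_ids"],                # MLM targets (unmasked here)
            next_sentence_label=torch.tensor([0]))  # 0 = B really follows A
print(out.loss)  # sum of the masked LM and NSP losses
```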
Summary
● BERT objective
● Model inputs/outputs
Fine-tuning BERT
deeplearning.ai
Fine-tuning BERT: Outline
Pre-train BERT once, then fine-tune it for each task by changing what fills the input slots:
● MNLI: Sentence A = hypothesis, Sentence B = premise
● NER: input = sentence, outputs = tags
● SQuAD: input = question, output = answer
Inputs summary:
● Sentence A / Sentence B
● Hypothesis / Premise
● Sentence / Entities
⋮
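A hedged sketch of the MNLI case with Hugging Face transformers; the checkpoint name and label index are illustrative:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # entailment / neutral / contradiction

# Sentence A = hypothesis, Sentence B = premise, as in the outline above.
enc = tokenizer("I hate pigeons.",
                "My feelings towards pigeons are filled with animosity.",
                return_tensors="pt")
out = model(**enc, labels=torch.tensor([0]))  # illustrative label index
out.loss.backward()   # one fine-tuning step would follow
```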
Transformer: T5
deeplearning.ai
Outline
● Classification
● Summarization
● Question Answering (Q&A)
● Sentiment
Raffel et al., "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer," 2020.
Transformer: T5 Model
Pre-training: spans of the original text are masked out to form the inputs, and the model learns to reconstruct the removed text (figure in Raffel et al., 2020).
Model Architecture
Attention patterns (Raffel et al., 2020): in a language model, each position attends only to the past (causal attention over x1 x2 x3 x4, then y1 y2); in a prefix LM, attention is fully visible over the input prefix (x1 x2 x3) and causal over the outputs (y1 y2).
Model Architecture
● Encoder/decoder: T5 uses the standard Transformer encoder-decoder.
Summary
● Prefix LM attention
● Model architecture
● Pre-training T5 (MLM)
Multi-task Training Strategy
deeplearning.ai
Multi-task training strategy
A single T5 model handles all tasks, distinguished by a text prefix:
● "translate English to German: That is good." → "Das ist gut"
● "cola sentence: The course is jumping well." → "not acceptable"
Input and Output Format
● Machine translation:
  "translate English to German: That is good."
● Entailment (predict entailment, contradiction, or neutral):
  "mnli premise: I hate pigeons hypothesis: My feelings towards pigeons are filled with animosity." target: entailment
● Winograd schema:
  "The city councilmen refused the demonstrators a permit because *they* feared violence"
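A short sketch of this text-to-text interface using the transformers library; "t5-small" is an illustrative checkpoint, not the exact model from the paper:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "translate English to German: That is good."
ids = tokenizer(text, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))  # e.g. "Das ist gut."
```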
Multi-task Training Strategy
Table (Raffel et al., 2020): fine-tuning methods compared on GLUE, CNNDM, SQuAD, SGLUE, EnDe, EnFr, and EnRo.
Data Training Strategies
● Examples-proportional mixing: sample from each dataset in proportion to its size.
● Equal mixing: sample from each dataset with equal probability.
● Temperature-scaled mixing: interpolate between the two with a temperature parameter, as sketched below.
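A sketch of the mixing rates as described in the T5 paper: examples-proportional mixing caps each dataset size at K, and temperature scaling raises the rates to 1/T before renormalizing (T = 1 recovers proportional mixing, large T approaches equal mixing). The value of K here is illustrative.

```python
def mixing_rates(sizes, T=1.0, K=2**21):
    """Per-dataset sampling rates for temperature-scaled mixing."""
    capped = [min(s, K) for s in sizes]        # artificial size cap K
    scaled = [c ** (1.0 / T) for c in capped]  # temperature scaling
    total = sum(scaled)
    return [s / total for s in scaled]

print(mixing_rates([1_000_000, 10_000], T=1))    # ~ proportional
print(mixing_rates([1_000_000, 10_000], T=100))  # ~ equal
```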
Gradual unfreezing vs. Adapter layers
Q&A
GLUE Benchmark
deeplearning.ai
General Language Understanding Evaluation
● A collection of resources used to train, evaluate, and analyze natural language understanding systems
● Datasets spanning different genres, sizes, and difficulties
● Leaderboard
Tasks Evaluated on
● Is a sentence grammatical or not?
● Sentiment
● Paraphrase
● Similarity
● Are two questions duplicates?
● Is a question answerable?
● Contradiction
● Entailment
● Winograd (co-reference)
General Language Understanding Evaluation
● Drive research
● Model agnostic
Transformer encoder
(Diagram: Inputs → Input Embedding → Positional Encoding → Multi-Head Attention → Add & Norm → Feed Forward → Add & Norm)

Feedforward block:
[
  LayerNorm,
  dense,
  activation,
  dropout_middle,
  dense,
  dropout_final
]
Transformer encoder
(Same diagram as above.)

Encoder block:
[
  Residual(
    LayerNorm,
    attention,
    dropout_,
  ),
  Residual(
    feed_forward,
  ),
]
Transformer encoder (recap)

Feedforward:
[
  LayerNorm,
  dense,
  activation,
  dropout_middle,
  dense,
]

Encoder block:
[
  Residual(
    LayerNorm,
    attention,
    dropout_,
  ),
]
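A runnable PyTorch sketch of the encoder block the pseudocode above describes (pre-norm residual layout); dimensions and dropout rate are illustrative:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(       # the feedforward block
            nn.Linear(d_model, d_ff),  # dense
            nn.ReLU(),                 # activation
            nn.Dropout(dropout),       # dropout_middle
            nn.Linear(d_ff, d_model),  # dense
            nn.Dropout(dropout),       # dropout_final
        )

    def forward(self, x):
        # Residual(LayerNorm, attention, dropout)
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)
        x = x + self.drop(h)
        # Residual(feed_forward)
        return x + self.ff(self.norm2(x))

x = torch.randn(2, 16, 512)
print(EncoderBlock()(x).shape)  # torch.Size([2, 16, 512])
```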
Data examples
Question: What percentage of the French population today is non-European?
Context: Since the end of the Second World War, France has become an ethnically diverse country. Today, approximately five percent of the French population is non-European and non-white. This does not approach the number of non-white citizens in the United States (roughly 28–37%, depending on how Latinos are classified; see Demographics of the United States). Nevertheless, it amounts to at least three million people, and has forced the issues of ethnic diversity onto the French policy agenda. France has developed an approach to dealing with ethnic problems that stands in contrast to that of many advanced, industrialized countries. Unlike the United States, Britain, or even the Netherlands, France maintains a "color-blind" model of public policy. This means that it targets virtually no policies directly at racial or ethnic groups. Instead, it uses geographic or class criteria to address issues of social inequalities. It has, however, developed an extensive anti-racist policy repertoire since the early 1970s. Until recently, French policies focused primarily on issues of hate speech (going much further than their American counterparts) and relatively less on issues of discrimination in jobs, housing, and in provision of goods and services.
Using T5 on a new task:
● Process the data to get the required inputs and outputs: "question: Q context: C" as input and "A" as target (see the sketch after this list).
● Fine-tune your model on the new task.
● Predict using your own model.

Examples:
● "cola sentence: The course is jumping well." → "not acceptable"
● "stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is grazing in a field." → "3.8"
● "summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…" → "six people hospitalized after a storm in attala county"
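A tiny sketch of the first step, turning the SQuAD-style example above into a T5 input/target pair (the answer string is an illustrative guess, and the context is truncated):

```python
def to_t5_example(question, context, answer):
    """Format a QA example in T5's text-to-text style."""
    source = f"question: {question} context: {context}"
    target = answer
    return source, target

src, tgt = to_t5_example(
    "What percentage of the French population today is non-European?",
    "Since the end of the Second World War, France has become an "
    "ethnically diverse country. [...]",
    "approximately five percent",   # illustrative target
)
```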
Hugging Face: Introduction
deeplearning.ai
Outline
● What is Hugging Face?
● The Transformers library
● Using it for question answering: given a context and questions, get answers
Hugging Face: Fine-Tuning Transformers
● Datasets: about one thousand
● Model checkpoints: more than 14 thousand
● Tokenizer, Trainer, and evaluation metrics
Checkpoint: a set of learned parameters for a model, obtained with a training procedure for some task.
The tokenizer also converts model outputs back into human-readable text.
Hugging Face: Using Transformers
deeplearning.ai
Using Transformers
Pipelines handle the full workflow: pre-processing your inputs, running the model, and post-processing the outputs.
Example task, question answering: given a context and questions, the pipeline returns answers.
To use a pipeline, initialize it with a task (and, optionally, a model checkpoint), then call it on the inputs for the task.
Datasets (about one thousand): load them using just one function.
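For instance, a question-answering pipeline takes only a few lines; transformers selects a default checkpoint for the task when none is specified:

```python
from transformers import pipeline

qa = pipeline("question-answering")
result = qa(question="What day is Pi Day?",
            context="Pi Day is celebrated on March 14 every year.")
print(result["answer"])  # e.g. "March 14"
```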
Trainer hyperparameters:
● Number of epochs
● Warm-up steps
● Weight decay
● ...
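In code, these hyperparameters are passed to the Trainer through `TrainingArguments`; the values below are illustrative:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,   # number of epochs
    warmup_steps=500,     # warm-up steps
    weight_decay=0.01,    # weight decay
)
# Trainer(model=model, args=args, train_dataset=..., ...) would follow.
```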