NTU 2020 Spring Machine Learning
Self-Supervised Learning
Chi-Liang Liu, Hung-Yi Lee
NTU
Some slides refer to UC Berkeley CS294-158-SP20
Previously on this Course
■ Supervised Learning
■ Given: a dataset $\{(x_i, y_i)\}_{i=1}^{N}$ and
a loss function $\ell(f_\theta(x), y)$
■ Goal: find $\theta$ that minimizes $\frac{1}{N}\sum_{i=1}^{N} \ell(f_\theta(x_i), y_i)$
■ Works well when labeled data is abundant.
■ Learn useful representations with supervision.
■ Problem:
Can we learn useful representations without supervision?
Why Self-Supervised Learning?
(figure: the amount of labeled data available is tiny compared to the amount of unlabeled data)
Slide: Thang Luong
Why Self-Supervised Learning?
Yann LeCun’s cake analogy (figure): self-supervised learning is the bulk of the cake, supervised learning the icing, reinforcement learning the cherry on top.
What is Self-Supervised Learning?
■ A version of unsupervised learning where data provides the
supervision.
■ In general, withhold some part of the data and task a
neural network to predict it from the remaining parts.
■ Goal: Learning to represent the world before learning tasks.
Self-Supervised Learning = Filling in the Blanks
Slide: LeCun
Methods of Self-Supervised Learning
■ Reconstruct from corrupted (or partial) data
■ Denoising Autoencoder
■ Bert-Family (Text)
■ In-painting (Image)
■ Visual common sense tasks
■ Jigsaw puzzles
■ Rotation
■ Contrastive Learning
■ word2vec
■ Contrastive Predictive Coding (CPC)
■ SimCLR
Denoising Autoencoder
Slide: CS294-158
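A denoising autoencoder corrupts the input (e.g. with noise) and trains the network to reconstruct the clean original, so the learned features must capture structure rather than simply copy the input. Below is a minimal PyTorch sketch; the flattened 784-dimensional input, layer sizes, and noise level are illustrative assumptions, not the exact setup from the slides.

import torch
import torch.nn as nn

# Minimal denoising autoencoder sketch: add Gaussian noise to the input,
# then train the network to reconstruct the clean version.
class DenoisingAE(nn.Module):
    def __init__(self, dim=784, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
x = torch.rand(32, 784)                             # clean inputs (e.g. flattened images)
x_noisy = x + 0.3 * torch.randn_like(x)             # corrupted inputs
loss = nn.functional.mse_loss(model(x_noisy), x)    # reconstruct the clean x
loss.backward()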
BERT-Family
Language Model
■ A statistical language model is a probability distribution over sequences of words. Given such
a sequence, say of length m, it assigns a probability to the whole sequence.
ex. P("Is it raining now?") > P("Is it raining yesterday?")
■ How to Compute?
■ n-gram model (a toy count-based example follows below)
■ Neural Network
Wiki: Language Model
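To make the count-based option concrete, here is a toy bigram model (the simplest kind of n-gram model); the two-sentence corpus and the add-one smoothing are made up purely for illustration.

from collections import Counter

# Toy bigram language model: P(sentence) = prod_t P(w_t | w_{t-1}),
# with bigram probabilities estimated by counting (add-one smoothing).
corpus = ["is it raining now".split(), "it rained yesterday".split()]
unigrams, bigrams = Counter(), Counter()
vocab = set()
for sent in corpus:
    words = ["<s>"] + sent
    vocab.update(words)
    unigrams.update(words[:-1])
    bigrams.update(zip(words[:-1], words[1:]))

def prob(sentence):
    p = 1.0
    words = ["<s>"] + sentence.split()
    for prev, cur in zip(words[:-1], words[1:]):
        p *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
    return p

print(prob("is it raining now"))        # higher: all bigrams were seen in the corpus
print(prob("is it raining yesterday"))  # lower: the last bigram was never seen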
ELMo & GPT & BERT
Devlin et al. 2018
GPT - Language Model
Radford et al. 2018
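GPT is pretrained as a pure autoregressive language model: maximize the probability of each token given the tokens before it. A minimal sketch of that objective is below; the tiny embedding + LSTM is only a stand-in for GPT's Transformer decoder, and all sizes are illustrative.

import torch
import torch.nn as nn

# Autoregressive LM objective: maximize sum_t log P(x_t | x_<t).
vocab_size, dim = 1000, 64
embed = nn.Embedding(vocab_size, dim)
lstm = nn.LSTM(dim, dim, batch_first=True)
head = nn.Linear(dim, vocab_size)

tokens = torch.randint(0, vocab_size, (8, 20))    # a batch of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one position
hidden, _ = lstm(embed(inputs))
logits = head(hidden)                             # predict the next token at every step
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()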
BERT - Masked LM
Masked LM:
Predict 15% of the tokens in the input. Of the selected tokens:
80% are replaced with the [MASK] token
10% are replaced with a random word
10% are left unchanged
ex.
Input: The [mask] sat on the [mask] .
Output: The cat sat on the mat .
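A sketch of the masking rule described above. The 15% selection and 80/10/10 split follow the slide; the specific ids (mask_id=103, vocab_size=30522) match BERT-base's uncased vocabulary but are otherwise placeholders, and -100 is the usual "ignore" label for the loss.

import torch

# BERT-style masking: pick 15% of positions; of those, 80% become [MASK],
# 10% become a random token, 10% are left unchanged. The model is trained
# to predict the original token at every selected position.
def mask_tokens(tokens, mask_id=103, vocab_size=30522):
    labels = tokens.clone()
    selected = torch.rand(tokens.shape) < 0.15      # choose 15% of the tokens
    labels[~selected] = -100                        # ignore unselected positions in the loss
    tokens = tokens.clone()
    r = torch.rand(tokens.shape)
    tokens[selected & (r < 0.8)] = mask_id          # 80% -> [MASK]
    rand_ids = torch.randint(0, vocab_size, tokens.shape)
    rand_pos = selected & (r >= 0.8) & (r < 0.9)    # 10% -> random word
    tokens[rand_pos] = rand_ids[rand_pos]
    # the remaining 10% keep their original token
    return tokens, labels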
BERT - Pipeline
Devlin et al. 2018
ARLM vs AELM
■ Autoregressive Language Model (ARLM)
■ Pro: Does not rely on data corruption
■ Con: Can only be forward or backward
■ Autoencoding Language Model (AELM, a.k.a. Masked LM)
■ Pro: Can use bidirectional information (context dependency)
■ Con: Pretrain-finetune discrepancy (input noise); independence assumption between masked tokens
Yang et al. 2019
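Written out (following the notation of Yang et al. 2019), the two pretraining objectives above are:

% Autoregressive LM: factorize the sequence probability left to right.
\max_\theta \ \log p_\theta(\mathbf{x}) = \sum_{t=1}^{T} \log p_\theta(x_t \mid \mathbf{x}_{<t})

% Autoencoding (masked) LM: reconstruct the masked tokens from the corrupted
% input \hat{\mathbf{x}}, where m_t = 1 iff x_t is masked; the masked tokens
% are predicted independently of one another.
\max_\theta \ \sum_{t=1}^{T} m_t \log p_\theta(x_t \mid \hat{\mathbf{x}})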
XLNet - Permutation LM
Permutation LM:
Step 1. Permutation
Step 2. Autoregressive
ex.
Given Sequence [x1, x2, x3, x4]
Step 1. [x1, x2, x3, x4] -> [x2, x4, x3, x1]
Step 2. Given [x2], predict x4
-> Given [x2, x4], predict x3
-> Given [x2, x4, x3], predict x1
Yang et al. 2019
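A tiny sketch of the two steps, turning one sentence into autoregressive training pairs under a randomly sampled factorization order; in the real model the order is realized through attention masks rather than by physically reordering the tokens.

import random

# Permutation LM sketch: sample a factorization order, then predict tokens
# autoregressively in that order, where each prediction only sees the tokens
# that come earlier in the sampled order.
tokens = ["x1", "x2", "x3", "x4"]
order = list(range(len(tokens)))
random.shuffle(order)                     # e.g. [1, 3, 2, 0] -> x2, x4, x3, x1
for i, pos in enumerate(order):
    context = [tokens[p] for p in order[:i]]
    target = tokens[pos]
    print(f"given {context} predict {target}")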
BART - Encoder & Decoder
■ Previous: fixed-length to fixed-length (the prediction has the same length as the corrupted input)
■ BART: a Transformer encoder-decoder, so the corrupted input and the reconstructed output can differ in length
■ Tasks (noising transformations): token masking, token deletion, text infilling, sentence permutation, document rotation
Lewis et al. 2019
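A rough sketch of one of these noising transformations, text infilling: a contiguous span is replaced by a single mask token, so the corrupted encoder input is shorter than the clean decoder target, which is exactly why an encoder-decoder is needed. The span position and length below are picked at random purely for illustration (BART samples span lengths from a Poisson distribution).

import random

# BART-style text infilling sketch: replace a contiguous span with a single
# mask token; the decoder must reconstruct the original, longer sequence.
tokens = "the cat sat on the mat".split()
start = random.randrange(len(tokens) - 1)
length = random.randint(1, 3)                        # span length (illustrative)
corrupted = tokens[:start] + ["[MASK]"] + tokens[start + length:]
print("encoder input :", corrupted)                  # e.g. the cat [MASK] the mat
print("decoder target:", tokens)                     # the original sequence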
ELECTRA - Discriminator
Clark et al. 2020
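ELECTRA replaces masked-LM pretraining with replaced token detection: a small generator proposes replacement tokens, and the model being pretrained is a discriminator that predicts, for every position, whether the token is original or was replaced. A minimal sketch of the discriminator side follows; random replacements stand in for the generator, and the linear "discriminator" is only a placeholder.

import torch
import torch.nn as nn

# Replaced token detection sketch: some tokens are swapped out (here at random,
# in place of ELECTRA's small generator) and the discriminator labels each
# position as original (0) or replaced (1).
vocab_size, dim = 1000, 64
tokens = torch.randint(0, vocab_size, (8, 20))
replace = torch.rand(tokens.shape) < 0.15
corrupted = torch.where(replace, torch.randint(0, vocab_size, tokens.shape), tokens)
labels = (corrupted != tokens).float()               # 1 = replaced, 0 = original

embed = nn.Embedding(vocab_size, dim)
discriminator = nn.Linear(dim, 1)                    # per-token binary classifier
logits = discriminator(embed(corrupted)).squeeze(-1)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()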
Others
■ RoBERTa: A Robustly Optimized BERT Pretraining Approach
■ ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Predict missing pieces
Pathak et al. 2016
Slide: CS294-158
Context Encoder
Pathak et al. 2016
Slide: CS294-158
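The context encoder's pretext task in a rough sketch: withhold a central region of each image and train the network to reconstruct it from the surrounding pixels (Pathak et al. also add an adversarial loss, omitted here). The two-layer conv net and the fixed 32x32 hole are placeholder assumptions.

import torch
import torch.nn as nn

# Inpainting sketch: mask the center of each image and reconstruct it from
# the surrounding context with an L2 loss on the missing region.
net = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
images = torch.rand(8, 3, 64, 64)
masked = images.clone()
masked[:, :, 16:48, 16:48] = 0.0                     # withhold the central patch
pred = net(masked)
loss = nn.functional.mse_loss(pred[:, :, 16:48, 16:48],
                              images[:, :, 16:48, 16:48])
loss.backward()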
Predicting one view from another
Slide: Richard Zhang
Solving Jigsaw Puzzles
Slide: CS294-158
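A rough sketch of the jigsaw pretext task: cut the image into a 3x3 grid of tiles, shuffle them with a permutation drawn from a fixed set, and train the network to classify which permutation was applied. The two-element permutation set and tile sizes below are illustrative (the original work uses a much larger permutation set).

import random
import torch

# Jigsaw sketch: shuffle 3x3 tiles of an image according to a permutation
# drawn from a fixed set; the pretext label is the index of that permutation.
permutations = [list(range(9)), [2, 0, 1, 5, 3, 4, 8, 6, 7]]  # tiny illustrative set
image = torch.rand(3, 96, 96)
tiles = [image[:, 32 * (i // 3):32 * (i // 3) + 32,
               32 * (i % 3):32 * (i % 3) + 32] for i in range(9)]
label = random.randrange(len(permutations))
shuffled = [tiles[j] for j in permutations[label]]   # network input: shuffled tiles
# a shared CNN embeds each tile, the embeddings are concatenated, and a
# classifier is trained to predict `label` (the permutation index)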
Rotation
Slide: CS294-158
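A minimal sketch of rotation-based pretraining: rotate each image by 0, 90, 180, or 270 degrees and train a classifier to predict which rotation was applied; the flatten + linear "network" is just a placeholder for a real CNN.

import torch
import torch.nn as nn

# Rotation pretext task: the label is which multiple of 90 degrees was applied.
images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, 4, (8,))                   # 0, 90, 180, 270 degrees
rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                       for img, k in zip(images, labels)])
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))  # placeholder classifier
loss = nn.functional.cross_entropy(net(rotated), labels)
loss.backward()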
Contrastive Predictive Coding
Remember word2vec? CPC is built on almost the same idea.
van den Oord et al. 2018
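The shared idea is a contrastive objective: score the true (context, target) pair higher than negative pairs drawn from elsewhere in the data. Below is a minimal InfoNCE sketch in the spirit of CPC; random vectors stand in for the context representation c_t and the future encoding z_{t+k}, and the bilinear scoring matrix is the only learned part shown.

import torch
import torch.nn as nn

# InfoNCE sketch: each context vector should score highest against its own
# "positive" future representation; the other samples in the batch act as negatives.
batch, dim = 16, 128
context = torch.randn(batch, dim)      # c_t (from the autoregressive context model)
future = torch.randn(batch, dim)       # z_{t+k} (encoding of the future observation)
W = nn.Linear(dim, dim, bias=False)    # bilinear scoring: c_t^T W z_{t+k}
scores = context @ W(future).t()       # [batch, batch]; diagonal = positive pairs
labels = torch.arange(batch)
loss = nn.functional.cross_entropy(scores, labels)
loss.backward()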
SimCLR
Chen et al. 2020
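SimCLR creates two augmented views of every image, encodes and projects them, and trains with a normalized-temperature cross-entropy (NT-Xent) loss in which each view must identify its partner among all other views in the batch. A condensed sketch of that loss is below; the encoder and augmentations are omitted, the projections are random stand-ins, and the batch size and temperature are chosen arbitrarily.

import torch
import torch.nn.functional as F

# NT-Xent sketch: 2N projected views; for view i the positive is its partner
# view, and every other view in the batch acts as a negative.
def nt_xent(z1, z2, temperature=0.5):
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N x d, unit norm
    sim = z @ z.t() / temperature                        # cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float("-inf"))                # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1 = torch.randn(32, 128)   # projections of the first augmented views
z2 = torch.randn(32, 128)   # projections of the second augmented views
loss = nt_xent(z1, z2)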
Reference
■ CS294-158 Deep Unsupervised Learning, Lecture 7
■ AAAI 2020 Keynotes: Turing Award Winners Event
■ Learning from Text - OpenAI
■ Learning from Unlabeled Data - Thang Luong