Deep Learning: Unsupervised Learning
Unsupervised Learning
Russ Salakhutdinov
• Alternating Optimization (sketched below):
  1. Fix the dictionary of bases and solve for the
     activations a (a standard Lasso problem).
  2. Fix the activations a and optimize the dictionary of bases (a convex
     QP problem).
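A minimal sketch of this alternating scheme, assuming patches are stored as rows of X (all names illustrative): step 1 uses scikit-learn's Lasso, and step 2 uses a plain least-squares update with renormalization standing in for the norm-constrained QP.

    import numpy as np
    from sklearn.linear_model import Lasso

    def learn_dictionary(X, K, n_iters=20, alpha=0.1, seed=0):
        """Alternating optimization for sparse coding.
        X: (N, D) matrix of image patches; K: number of bases."""
        rng = np.random.default_rng(seed)
        N, D = X.shape
        B = rng.standard_normal((K, D))
        B /= np.linalg.norm(B, axis=1, keepdims=True)   # unit-norm bases
        for _ in range(n_iters):
            # Step 1: fix the dictionary B, solve for activations a
            # (a standard Lasso problem, solved for every patch at once).
            lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
            lasso.fit(B.T, X.T)            # design: (D, K), targets: (D, N)
            A = lasso.coef_                # (N, K) sparse activations
            # Step 2: fix activations A, update the dictionary
            # (least-squares stand-in for the norm-constrained convex QP).
            B = np.linalg.lstsq(A, X, rcond=None)[0]
            B /= np.linalg.norm(B, axis=1, keepdims=True) + 1e-12
        return B, A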
Sparse Coding: Testing Time
• Input: a new image patch x*, and K learned bases.
• Output: sparse representation a of the image patch x*.
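With the dictionary fixed, test-time encoding is just step 1 of the training loop: a single Lasso solve (continuing the hypothetical sketch above, with Lasso imported from scikit-learn).

    def encode(x_star, B, alpha=0.1):
        """Sparse representation a* of a new patch x_star under learned bases B."""
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(B.T, x_star)         # solve x* ~ B^T a* with an L1 penalty on a*
        return lasso.coef_             # (K,) sparse code a*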
Classification pipeline: Input Image -> Learned bases -> Features (coefficients) -> Classification Algorithm (SVM).

Evaluation: 9K images, 101 classes.

    Algorithm                          Accuracy
    Baseline (Fei-Fei et al., 2004)    16%
    PCA                                37%
    Sparse Coding                      47%
[Figure: interpreting sparse coding as an encoder-decoder over sparse features a. Decoding is explicit and linear: x' = g(a). Encoding is implicit and nonlinear: a = f(x). The decoder is the feed-back, generative, top-down path; the encoder is the feed-forward, bottom-up path. Input: image.]
• Details of what goes inside the encoder and decoder matter!
• Need constraints to avoid learning an identity mapping.
Autoencoder
• An autoencoder with D inputs, D outputs, and K hidden units, with K < D.
[Figure: encoder filters W map the input image x to binary features z = σ(Wx); decoder filters D map z back to a reconstruction Dz.]
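A minimal numpy sketch of this forward pass, assuming K < D and randomly initialized weights (all names and sizes illustrative):

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    D, K = 784, 128                            # K < D: a bottleneck
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((K, D))     # encoder filters W
    Dec = 0.01 * rng.standard_normal((D, K))   # decoder filters ("D" in the slide)

    x = rng.random(D)                          # an input image, flattened
    z = sigmoid(W @ x)                         # encoder: z = sigma(W x)
    x_hat = Dec @ z                            # decoder: x_hat = D z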
[Figure: autoencoder for binary input x. Encoder filters W with sigmoid function: z = σ(Wx); the decoder path applies the sigmoid to tied filters: σ(Wᵀz).]
[Figure: autoencoder for real-valued input x with an L1 sparsity penalty on the features. Encoder filters W with sigmoid function: z = σ(Wx); linear decoder filters D: Dz.]
At training time, both the encoder and decoder paths are used.
(Kavukcuoglu, Ranzato, Fergus, LeCun, 2009)
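A hedged PyTorch sketch of one training step for the real-valued-input variant above: sigmoid encoder z = σ(Wx), linear decoder Dz, squared reconstruction error plus an L1 sparsity penalty on z. The batch, sizes, and hyperparameters are illustrative, not taken from the slides.

    import torch

    D_in, K = 784, 128
    W = (0.01 * torch.randn(K, D_in)).requires_grad_()    # encoder filters W
    Dec = (0.01 * torch.randn(D_in, K)).requires_grad_()  # decoder filters D
    opt = torch.optim.Adam([W, Dec], lr=1e-3)
    lam = 1e-3                                    # weight of the L1 sparsity penalty

    x = torch.rand(32, D_in)                      # a batch of real-valued inputs
    z = torch.sigmoid(x @ W.t())                  # encoder: z = sigma(W x)
    x_hat = z @ Dec.t()                           # linear decoder: x_hat = D z
    recon = ((x_hat - x) ** 2).sum(dim=1).mean()  # squared reconstruction error
    sparsity = z.abs().sum(dim=1).mean()          # L1 penalty on the features
    loss = recon + lam * sparsity
    opt.zero_grad(); loss.backward(); opt.step()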
Stacked Autoencoders
[Figure: a stack of encoder/decoder pairs with sparsity, mapping Input x -> Features -> Features -> Class Labels.]
• Greedy layer-wise learning.
Stacked Autoencoders
• Remove the decoders and use the feed-forward part.
• Standard, or convolutional, neural network architecture.
• Parameters can be fine-tuned using backpropagation, as sketched below.
[Figure: feed-forward stack Input x -> Encoder -> Features -> Encoder -> Features -> Encoder -> Class Labels.]
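A sketch of the greedy layer-wise recipe followed by feed-forward fine-tuning. The layer sizes, data, and number of classes are placeholders, and each layer is trained as the simple sigmoid autoencoder from earlier.

    import torch
    import torch.nn as nn

    sizes = [784, 256, 64]                 # input -> features -> features
    X = torch.rand(512, sizes[0])          # unlabeled data (placeholder)
    y = torch.randint(0, 10, (512,))       # labels for fine-tuning (placeholder)

    encoders = []
    h = X
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
        opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
        for _ in range(100):               # greedily train one autoencoder layer
            z = torch.sigmoid(enc(h))
            loss = ((dec(z) - h) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
        encoders.append(enc)
        h = torch.sigmoid(enc(h)).detach() # freeze and feed features upward

    # Remove the decoders: keep the feed-forward part, add a classifier,
    # and fine-tune all parameters with backpropagation.
    net = nn.Sequential(encoders[0], nn.Sigmoid(),
                        encoders[1], nn.Sigmoid(),
                        nn.Linear(sizes[-1], 10))
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    loss = nn.functional.cross_entropy(net(X), y)
    opt.zero_grad(); loss.backward(); opt.step()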
Deep Autoencoders
[Figure: a stack of RBMs (input - 2000 - 1000 - 500 - 30) is pretrained layer by layer with weights W1...W4, unrolled into a deep autoencoder whose decoder uses the transposed weights W4ᵀ...W1ᵀ, and then fine-tuned end to end. Stages: Pretraining -> Unrolling -> Fine-tuning.]
Deep Autoencoders
• A 25x25 - 2000 - 1000 - 500 - 30 autoencoder to extract 30-D real-valued codes for Olivetti face patches.
[Figure: 2-D codes produced by the autoencoder for text documents, clustered by topic: Energy Markets; Disasters and Accidents; Leading Economic Indicators; Legal/Judicial; Government Borrowings; Accounts/Earnings.]
Pixel CNN
Restricted Boltzmann Machines
• Graphical models are a powerful framework for representing the dependency structure between random variables.
• The hidden variables act as feature detectors; the energy contains pair-wise and unary terms.
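To make the pair-wise/unary structure concrete, a minimal numpy sketch of a binary RBM trained with one step of contrastive divergence (CD-1), a standard training recipe for the usual energy E(v,h) = -vᵀWh - aᵀv - bᵀh; all sizes and data are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

    D, K = 784, 256                          # visible units, hidden feature detectors
    W = 0.01 * rng.standard_normal((D, K))   # pair-wise weights
    a = np.zeros(D)                          # unary visible biases
    b = np.zeros(K)                          # unary hidden biases
    lr = 0.05

    v0 = (rng.random((32, D)) < 0.5).astype(float)   # a batch of binary data
    # Positive phase: sample hidden feature detectors given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase (CD-1): one step of Gibbs sampling.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate gradient step on the log-likelihood.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)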
[Figure: a new image is decomposed into sparse representations over learned features.]
• State-of-the-art performance on the Netflix dataset.
[Figure: RBMs can model different input types: binary vectors, real-valued inputs, and 1-of-K (one-hot) variables.]
Topics "government", "corruption", and "mafia" can combine to give very high probability to the word "Silvio Berlusconi".
[Figure: retrieval performance; x-axis: Recall (%).]
Deep Boltzmann Machines
• Learn simpler representations, then compose more complex ones.
[Figure: feature hierarchy. Higher-level features: combinations of edges. Low-level features: edges. Input: pixels (image).]
(Salakhutdinov 2008; Salakhutdinov & Hinton 2009)
Model Formulation
[Figure: DBM with visible input v and hidden layers h1, h2, h3, connected by weights W1, W2, W3.]
• Same as RBMs: W1, W2, W3 are the model parameters.
• Dependencies between hidden variables.
• All connections are undirected.
• Bottom-up and top-down signals combine, as shown below.
• Permutation-invariant version.
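As a concrete instance of the bottom-up/top-down combination, the standard DBM conditional for a first-layer unit (in the notation of Salakhutdinov & Hinton 2009) is:

    P(h1_j = 1 | v, h2) = σ( Σ_i W1_ij v_i + Σ_m W2_jm h2_m )

so the state of h1 depends both on the layer below (bottom-up) and on the layer above (top-down).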
3-D Object Recognition
NORB Dataset: 24,000 examples

    Learning Algorithm                       Error
    Logistic regression                      22.5%
    K-NN (LeCun 2004)                        18.92%
    SVM (Bengio & LeCun 2007)                11.6%
    Deep Belief Net (Nair & Hinton 2009)     9.0%
    DBM                                      7.2%
Pattern Completion
Data – A Collection of Modalities
• Multimedia content on the web: image + text + audio.
• Product recommendation systems.
• Robotics applications: vision, audio, touch sensors, motor control.
[Figure: example images with tags "car, automobile" and "sunset, pacific ocean, baker beach, seashore, ocean".]
Challenges - I
Very different input representations:
• Images: real-valued, dense.
• Text: discrete, sparse.
It is difficult to learn cross-modal features from such low-level representations.
[Figure: an image with the tags "sunset, pacific ocean, baker beach, seashore, ocean" (dense pixels vs. sparse tags).]
Challenges - II
Noisy and missing data.
[Figure: images with noisy or missing tags: "pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion"; "mickikrimmel, mickipedia, headshot"; <no text>; "unseulpixel, naturey".]
Challenges - II (continued)
[Figure: for the image tagged "pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion", the model generates the text "beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves".]
Modeling the two modalities:
[Figure: a Gaussian model over dense, real-valued image features, and a Replicated Softmax model over sparse word counts.]
[Figure: sample images with generated tags: "nature, flower, red, green"; "blue, green, yellow, colors"; "chocolate, cake".]
MIR-Flickr Dataset
• 1 million images along with user-assigned tags.
[Figure: generative process (top-down: h3 -> h2 -> h1 -> v through weights W3, W2, W1) and approximate inference (bottom-up from the input data v).]
Variational Autoencoders (VAEs)
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers, conditioned on the input data.
• It is trained using the reparameterization trick, described next.
Reparameterization Trick
• Assume that the recognition distribution is Gaussian, q(h | x) = N(μ(x), σ²(x)), with mean and covariance computed from the state of the hidden units at the previous layer.
• Alternatively, we can express this in terms of an auxiliary variable: ε ~ N(0, I), with h = μ(x) + σ(x) · ε.
• The encoder is now deterministic, and the distribution of ε does not depend on the model parameters.
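A minimal PyTorch sketch of this trick; the linear encoder producing μ and the log-variance is a placeholder.

    import torch
    import torch.nn as nn

    enc = nn.Linear(784, 2 * 20)          # placeholder encoder: mu and log-variance
    x = torch.rand(32, 784)
    mu, log_var = enc(x).chunk(2, dim=1)
    # Auxiliary variable: its distribution does not depend on the parameters.
    eps = torch.randn_like(mu)
    # Deterministic transformation: h = mu(x) + sigma(x) * eps
    h = mu + torch.exp(0.5 * log_var) * eps
    # Gradients now flow through mu and log_var to the encoder weights.
    h.sum().backward()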
Computing the Gradients
• The gradient w.r.t. the parameters, both recognition and generative, can now be estimated by sampling ε and backpropagating through the deterministic encoder.
Autoencoder
[Figure: a deep generative model over input data v with stochastic layers h1, h2, h3 and weights W1, W2, W3; the top layer is a stochastic layer.]
• The bound can be tightened using unnormalized importance weights w = p(x, h) / q(h | x), where the samples h are drawn from the recognition network, as sketched below.
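A hedged sketch of how such unnormalized importance weights yield a tighter bound: draw k samples from the recognition network, form log w = log p(x, h) - log q(h | x), and average with a log-sum-exp. The log-density tensors here are placeholders for values computed by the model.

    import math
    import torch

    def importance_weighted_bound(log_p_joint, log_q):
        """log_p_joint, log_q: (batch, k) log-densities at k samples h
        drawn from the recognition network q(h | x)."""
        log_w = log_p_joint - log_q        # log unnormalized importance weights
        k = log_w.shape[1]
        return (torch.logsumexp(log_w, dim=1) - math.log(k)).mean()

    bound = importance_weighted_bound(torch.randn(8, 5), torch.randn(8, 5))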
Ask Google?
(Some) Open Problems
• Reasoning, Attention, and Memory
[Figure: an RNN unrolled over inputs x1, x2, x3 with hidden states h1, h2, h3.]
Gated Attention Mechanism
• Use Recurrent Neural Networks (RNNs) to encode both a document and a query, as sketched below.
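A sketch in the spirit of gated attention (not the exact model from the slides): encode document and query with GRUs, attend from each document token over the query tokens, and gate the token by its query summary with an element-wise product. All dimensions are illustrative.

    import torch
    import torch.nn as nn

    d_emb, d_hid = 64, 64
    doc_rnn = nn.GRU(d_emb, d_hid, batch_first=True)
    qry_rnn = nn.GRU(d_emb, d_hid, batch_first=True)

    doc = torch.rand(8, 100, d_emb)            # document token embeddings
    qry = torch.rand(8, 10, d_emb)             # query token embeddings
    D, _ = doc_rnn(doc)                        # (8, 100, d_hid)
    Q, _ = qry_rnn(qry)                        # (8, 10, d_hid)

    # Attention of every document token over the query tokens.
    scores = torch.softmax(D @ Q.transpose(1, 2), dim=-1)   # (8, 100, 10)
    q_tilde = scores @ Q                       # per-token query summary
    gated = D * q_tilde                        # gate: element-wise product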
Memory as Acyclic Graph Encoding (MAGE) - RNN
[Figure: an RNN cell maps the previous state h_{t-1} and the input x_t to the new state h_t.]
Text Representation
Neural Story Telling
Sample from the generative model (recurrent neural network):
"She was in love with him for the first time in months, so she had no intention of escaping. The sun had risen from the ocean, making her feel more alive than normal. She is beautiful, but the truth is that I do not know what to do. The sun was just starting to fade away, leaving people scattered around the Atlantic Ocean."
[Figure: an agent with external memory maps an observation/state and a reward to a learned action.]
Learning to play a 3-D game without memory.
(Chaplot & Lample, AAAI 2017)
[Figure: the same agent loop, with a structured memory in place of the generic external memory.]
[Figure: the external memory is updated by writes (M_t -> M_{t+1}) and queried by reads with attention; the agent maps observation/state and reward to a learned action through this memory, as sketched below.]
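A sketch of a soft attention read over the memory, under the assumption that the query is a learned encoding of the current state: compare the query against every memory slot and return the weighted sum. Names and sizes are hypothetical.

    import torch

    def read_with_attention(M, query):
        """M: (slots, d) external memory; query: (d,) state encoding."""
        weights = torch.softmax(M @ query, dim=0)   # one weight per memory slot
        return weights @ M                          # attention-weighted read vector

    M = torch.rand(128, 32)      # memory contents after the latest write
    q = torch.rand(32)
    r = read_with_attention(M, q)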
Building Intelligent Agents
[Figure: an agent with external memory and a knowledge base maps an observation/state and a reward to a learned action.]
• Learning from fewer examples, fewer experiences.
Summary
• Efficient learning algorithms for deep unsupervised models.
• Applications: text & image retrieval / image tagging; object recognition; learning a category hierarchy; object detection; speech recognition (HMM decoder); multimodal data.
[Figure: an image tagged "mosque, tower, building, cathedral, dome, castle".]