Tricks of the Trade PPT


Marc'Aurelio Ranzato

Facebook AI Group

Ideal Features

- window, right
- chair, left
- monitor, top of shelf
- carpet, bottom
- drums, corner
- pillows on couch
- ...

[Diagram: an "Ideal Feature Extractor" maps the image to features like these, which make questions such as "What is on the couch?" easy to answer.]

Ranzato

The Manifold of Natural Images

Ideal Feature Extraction

[Diagram: an ideal feature extractor maps pixel space (Pixel 1, Pixel 2, ..., Pixel n) onto meaningful coordinates such as Pose and Expression.]

Learning Non-Linear Features

Proposal #1: linear combination of simpler functions (SHALLOW):

- Kernel learning
- Boosting
- ...

Proposal #2: composition, f(x) ≈ g1(g2(... gn(x) ...)) (DEEP):

- Deep learning
- Scattering networks (wavelet cascade)
- S.C. Zhu & D. Mumford "grammar"
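Proposal #2 can be sketched directly as nested function application. Each stage below is a hypothetical affine-plus-ReLU map; the stage contents and dimensions are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage(W):
    """One non-linear stage g_i: an affine map followed by a ReLU."""
    return lambda x: np.maximum(0.0, W @ x)

def compose(stages):
    """Build f(x) = g1(g2(... gn(x) ...)) from a list [g1, g2, ..., gn]."""
    def f(x):
        for g in reversed(stages):  # apply g_n first, g_1 last
            x = g(x)
        return x
    return f

# Three illustrative stages mapping 8 -> 6 -> 4 -> 2 dimensions.
g1 = stage(rng.normal(size=(2, 4)))
g2 = stage(rng.normal(size=(4, 6)))
g3 = stage(rng.normal(size=(6, 8)))
f = compose([g1, g2, g3])

x = rng.normal(size=8)
print(f(x).shape)  # (2,)
```

The point of the composition is that intermediate outputs are reusable features, unlike the flat sum of Proposal #1.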

Linear Combination

[Diagram: the prediction of the class is a weighted sum (+) of template matchers applied to the input image.]

Problem: this may require an exponential number of templates!

Composition

[Diagram: input image → low-level parts → mid-level parts → high-level parts → prediction of class; the parts are distributed representations.]

GOOD: (exponentially) more efficient.

Lee et al. "Convolutional DBN's ..." ICML 2009; Zeiler & Fergus.

A Potential Problem with Deep Learning

[Diagram: a pipeline of four trainable stages between input and output.]

Solution #1: freeze the first N-1 layers (engineer the features, e.g., SIFT → Pooling → k-Means → Classifier).

But this makes the system shallow!

- How to design features of features?
- How to design features for new imagery?

Solution #2: live with it!

It will converge to a local minimum.
It is much more powerful!!
Given lots of data, engineer less and learn more!!

Deep Learning in Practice

[Diagram: the four-stage pipeline.]

A: No distinction, end-to-end learning!

KEY IDEAS: WHY DEEP LEARNING

- Distributed representations
- Compositionality
- End-to-end learning

What Is Deep Learning?

Buzz Words

(My) Definition

A deep learning method makes predictions by using a sequence of non-linear processing stages. The resulting intermediate representations can be interpreted as feature hierarchies, and the whole system is jointly learned from data.

Some deep learning methods are probabilistic, others are loss-based; some are supervised, others unsupervised... It's a large family!

THE SPACE OF MACHINE LEARNING METHODS

1957: Rosenblatt's Perceptron.
80s: back-propagation & compute power (Neural Net, Autoencoder).
90s: LeCun's CNNs (adds Convolutional Neural Net, Recurrent Neural Net, Sparse Coding, GMM).
00s: SVMs (adds SVM, Boosting, Restricted BM).
2006: Hinton's DBN (adds Deep Belief Net, BayesNP).
2009: ASR (data + GPU) (adds Deep (sparse/denoising) Autoencoder, ΣΠ).
2012: CNNs (data + GPU).

TIME: Convolutional Neural Net 1988 → 1998 → 2012.

A: The main reason for the breakthrough is data and GPUs, but we have also made networks deeper and more non-linear.

ConvNets: History

- Fukushima 1980: designed a network with the same basic structure but did not train it by backpropagation.
- LeCun from the late 80s: figured out backpropagation for CNNs, popularized and deployed CNNs for OCR applications and others.
- Poggio from 1999: same basic structure, but learning is restricted to the top layer (k-means at the second stage).
- LeCun from 2006: unsupervised feature learning.
- DiCarlo from 2008: large-scale experiments, normalization layer.
- LeCun from 2009: harsher non-linearities, normalization layer, unsupervised and supervised learning.
- Mallat from 2011: provides a theory behind the architecture.
- Hinton 2012: use bigger nets, GPUs, more data.

ConvNets: till 2012

Common wisdom: training does not work because we "get stuck in local minima".

[Plot: loss vs. parameter.]

ConvNets: today

Local minima are all similar, there are long plateaus, and it can take a long time to break symmetries.

[Plot: loss vs. parameter, showing long plateaus. Why: saturating units, and the input/output mapping is invariant to permutations of the parameters, so ties between parameters must be broken.]

Like walking on a ridge between valleys.

ConvNets: today

Optimization is not the real problem when:

- the dataset is large
- units do not saturate too much
- there is a normalization layer

Today's belief is that the challenge is about:

- generalization: How many training samples are needed to fit 1B parameters? How many parameters/samples to model spaces with 1M dimensions?
- scalability

[Map: the methods above arranged along two axes: SHALLOW ↔ DEEP and SUPERVISED ↔ UNSUPERVISED.]

Deep Learning is a very rich family! I am going to focus on a few methods...


Deep Gated MRF

Layer 1: start from a pair-wise MRF over pixel pairs x_p, x_q:

E(x, h^c, h^m) = 1/2 x' x,    p(x, h^c, h^m) ∝ e^{-E(x, h^c, h^m)}

Introduce a filter bank C with F filters:

E(x, h^c, h^m) = 1/2 x' C C' x

Gate the filters with latent covariance units h^c_k (gated MRF):

E(x, h^c, h^m) = 1/2 x' C [diag(h^c)] C' x

Add mean units h^m_j through weights W:

E(x, h^c, h^m) = 1/2 x' C [diag(h^c)] C' x + 1/2 x' x - x' W h^m

p(x) ∝ ∫_{h^c} ∫_{h^m} e^{-E(x, h^c, h^m)}

Inference of latent variables: just a forward pass.
Training: requires approximations (here we used MCMC methods).

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013
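The forward-pass inference can be sketched in NumPy. The energy follows the layer-1 formula above; the conditional activations (a sigmoid of the squared filter responses for h^c, of the linear responses for h^m) and the bias terms bc, bm are simplifying assumptions; the published model also includes a pooling matrix and scalings omitted here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(x, hc, hm, C, W):
    """E(x, h^c, h^m) = 1/2 x' C diag(h^c) C' x + 1/2 x' x - x' W h^m."""
    filt = C.T @ x                              # filter responses C' x
    return 0.5 * hc @ (filt ** 2) + 0.5 * x @ x - x @ (W @ hm)

def infer(x, C, W, bc, bm):
    """Forward-pass inference of the latent variables (assumed conditional form):
    h^c gates squared filter responses; h^m behaves like an RBM hidden layer."""
    hc = sigmoid(bc - 0.5 * (C.T @ x) ** 2)     # covariance ("gating") units
    hm = sigmoid(bm + W.T @ x)                  # mean units
    return hc, hm

rng = np.random.default_rng(0)
D, F, M = 16, 8, 4                              # pixels, filters, mean units
C, W = rng.normal(size=(D, F)), rng.normal(size=(D, M))
x = rng.normal(size=D)
hc, hm = infer(x, C, W, np.zeros(F), np.zeros(M))
print(energy(x, hc, hm, C, W))
```

Note the asymmetry the slide emphasizes: inference is a single cheap pass, while training the partition function requires MCMC approximations.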

Deep Gated MRF

[Diagram: Layer 2 hidden units h^2 are stacked on top of Layer 1 over the input; Layer 3 units h^3 stack on top of Layer 2.]

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013

Sampling High-Resolution Images

[Image grids comparing samples from a Gaussian model, a marginal wavelet model, and the gMRF with 1 layer and with 3 layers.]

Sampling After Training on Face Images

[Figure: unconstrained samples, plus columns labeled Original, Input, 1st layer, 2nd layer, 3rd layer, 4th layer, 10 times.]

Expression Recognition Under Occlusion

Ranzato et al. PAMI 2013

Pros:

- Feature extraction is fast
- Unprecedented generation quality
- Advances models of natural images
- Trains without labeled data

Cons:

- Training is inefficient: slow, tricky
- Sampling scales badly with dimensionality
- What's the use case of generative models?

Conclusion: if generation is not required, other feature learning methods are more efficient (e.g., sparse auto-encoders). What's the use case of generative models?


CONV NETS: TYPICAL ARCHITECTURE

[Diagram: whole system: input image → a sequence of stages → fully connected layers → class labels; one stage shown zoomed.]

Lazebnik et al. "...Spatial Pyramid Matching..." CVPR 2006
Sanchez et al. "Image classification with F.V.: Theory and practice" IJCV 2012
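A minimal sketch of one such stage, assuming the common filter-bank → ReLU → max-pooling instantiation (the particular non-linearity and pooling are assumptions, not read off the slide):

```python
import numpy as np

def conv2d_valid(img, k):
    """Valid 2-D correlation of a single-channel image with kernel k."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def stage(img, filters, pool=2):
    """One ConvNet stage: filter bank -> ReLU -> non-overlapping max pooling."""
    maps = []
    for k in filters:
        m = np.maximum(0.0, conv2d_valid(img, k))       # convolution + ReLU
        H = (m.shape[0] // pool) * pool
        W = (m.shape[1] // pool) * pool
        m = m[:H, :W].reshape(H // pool, pool, W // pool, pool).max(axis=(1, 3))
        maps.append(m)
    return np.stack(maps)

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))
filters = rng.normal(size=(4, 3, 3))                    # 4 filters of 3x3
feats = stage(img, filters)
print(feats.shape)                                      # (4, 7, 7)

# Whole system (sketch): stack a few such stages, then flatten into
# fully connected layers producing class scores.
logits = rng.normal(size=(10, feats.size)) @ feats.ravel()  # hypothetical FC layer
print(logits.shape)                                     # (10,)
```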

CONV NETS: EXAMPLES

- OCR / House number & Traffic sign classification
  Wan et al. "Regularization of neural networks using dropconnect" ICML 2013
- Texture classification
  Sifre et al. "Rotation, scaling and deformation invariant scattering..." CVPR 2013
- Pedestrian detection
  Sermanet et al. "Pedestrian detection with unsupervised multi-stage..." CVPR 2013
- Scene parsing
  Farabet et al. "Learning hierarchical features for scene labeling" PAMI 2013
- Segmentation of 3D volumetric images
- Action recognition from videos
- Robotics
  Sermanet et al. "Mapping and planning ...with long range perception" IROS 2008
- Denoising
  Burger et al. "Can plain NNs compete with BM3D?" CVPR 2012
- Dimensionality reduction / learning embeddings
  Hadsell et al. "Dimensionality reduction by learning an invariant mapping" CVPR 2006
- Image classification
  [Diagram: an Object Recognizer labels an input image 'railcar'.]
  Krizhevsky et al. "ImageNet Classification with deep CNNs" NIPS 2012
- Deployed in commercial systems (Google & Baidu, spring 2013)

How To Use ConvNets...(properly)


CHOOSING THE ARCHITECTURE

- Task dependent
- Cross-validation
- The more data: the more layers and the more kernels
- Look at the number of parameters at each layer
- Look at the number of flops at each layer
- Computational cost
- Be creative :)
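The per-layer parameter and flop counts mentioned above reduce to simple arithmetic; this helper and the example layer sizes are illustrative:

```python
def conv_layer_cost(h, w, c_in, c_out, k):
    """Parameters and multiply-accumulate flops of one k x k convolutional
    layer producing a c_out x h x w output from c_in input channels."""
    params = c_out * (c_in * k * k + 1)      # weights plus one bias per output map
    flops = h * w * c_out * c_in * k * k     # one MAC per weight per output location
    return params, flops

# Hypothetical layer: 224x224 output maps, 3 -> 64 channels, 7x7 kernels.
p, f = conv_layer_cost(224, 224, 3, 64, 7)
print(p, f)  # 9472 472055808
```

Comparing these numbers across layers shows where the parameters (usually the fully connected layers) and the flops (usually the early convolutional layers) concentrate.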

HOW TO OPTIMIZE

- SGD (with momentum) usually works very well
  Bottou "Stochastic Gradient Tricks" Neural Networks 2012
- Start with a large learning rate and divide by 2 until the loss does not diverge
- Decay the learning rate by a factor of ~100 or more by the end of training
- Use a ReLU non-linearity
- Initialize parameters so that units have similar variance; avoid units in saturation
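A minimal sketch of SGD with classical momentum and a stepwise learning-rate decay, run on a toy quadratic loss (the schedule and constants are illustrative, not the slide's exact recipe):

```python
import numpy as np

def sgd_momentum(grad, w0, lr=0.1, mu=0.9, steps=200, decay_every=50):
    """Plain SGD with classical momentum and a stepwise learning-rate decay."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for t in range(steps):
        if t > 0 and t % decay_every == 0:
            lr *= 0.1                    # decay the rate as training proceeds
        v = mu * v - lr * grad(w)        # velocity accumulates past gradients
        w = w + v
    return w

# Toy quadratic loss 0.5 * w' A w with gradient A w; the minimum is w = 0.
A = np.diag([1.0, 10.0])
w = sgd_momentum(lambda w: A @ w, [5.0, 5.0])
print(np.linalg.norm(w))  # small: the iterate approaches the minimum
```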

HOW TO IMPROVE GENERALIZATION

- Weight sharing (greatly reduces the number of parameters)
- Dropout
  Hinton et al. "Improving NNs by preventing co-adaptation of feature detectors" arXiv 2012
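Dropout's forward pass can be sketched as follows. This uses the "inverted" variant that rescales surviving units at training time; the cited paper instead rescales the weights at test time, but the two are equivalent in expectation.

```python
import numpy as np

def dropout(h, p_drop, rng, train=True):
    """Inverted dropout: zero each unit with probability p_drop during
    training and rescale the survivors, so test time needs no change."""
    if not train:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones(10000)
out = dropout(h, 0.5, rng)
print(out.mean())  # close to 1.0: the rescaling preserves the expectation
```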

OTHER THINGS GOOD TO KNOW

- Check gradients numerically by finite differences.
- Visualize features (feature maps need to be uncorrelated and have high variance).
  [Figure: hidden-unit activations across samples. Good training: hidden units are sparse across samples and across features. Bad training: many hidden units ignore the input and/or exhibit strong correlations.]
- Visualize parameters.
  [Figure: filter visualizations labeled GOOD, BAD, BAD, BAD.]
- Measure error on both the training and validation sets.
- Test on a small subset of the data and check that the error → 0.
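The finite-difference gradient check can be sketched as a central-difference estimate compared against an analytic gradient:

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference estimate of df/dw, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Check an analytic gradient: f(w) = 0.5 * ||w||^2 has gradient w.
w = np.array([1.0, -2.0, 3.0])
f = lambda v: 0.5 * np.sum(v ** 2)
num = numerical_grad(f, w)
print(np.max(np.abs(num - w)))  # tiny: analytic and numerical gradients agree
```

In practice one runs this check on a small network with a handful of parameters, since each coordinate costs two forward passes.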

WHAT IF IT DOES NOT WORK?

Training diverges:
- Learning rate may be too large → decrease the learning rate
- BPROP is buggy → numerical gradient checking

Check the loss function:
- Is it appropriate for the task you want to solve?
- Does it have degenerate solutions?

Network is underperforming:
- Compute flops and nr. of params. → if too small, make the net larger
- Visualize hidden units/params → fix optimization

Network is too slow:
- Compute flops and nr. of params. → GPU, distributed framework, make the net smaller

SUMMARY

- Deep Learning = learning hierarchical representations.
- Leverage compositionality to gain efficiency.
- Unsupervised learning: active research topic.
- Supervised learning: most successful setup today.
- Optimization: don't we get stuck in local minima? No, they are all the same! In large-scale applications, local minima are even less of an issue.
- Scaling: GPUs; distributed frameworks (Google); better optimization techniques.
- Generalization on small datasets (curse of dimensionality): input distortions, weight decay, dropout.

THANK YOU!

Deadline: 9 Feb. 2014.


SOFTWARE

- Torch7, a learning library that supports neural net training: http://www.torch.ch
- http://code.cogbits.com/wiki/doku.php (tutorial with demos by C. Farabet)
- http://deeplearning.net/software/theano/ (does automatic differentiation)
- http://eblearn.sourceforge.net/
- code.google.com/p/cuda-convnet
- http://www.cs.toronto.edu/~ranzato/publications/ranzato_cvpr13.pdf
