MIT Introduction to Deep Learning Course Lecture 1 Slides

© All Rights Reserved




Nick Locascio

2016: year of deep learning

MIT 6.S191 | Intro to Deep Learning | IAP 2017

Deep Learning Success

- Image Classification: better than humans (AlexNet; Krizhevsky, Sutskever, Hinton 2012)
- Machine Translation
- Speech Recognition
- Speech Synthesis
- Game Playing

and many, many more

6.S191 Goals

1. Fundamentals

2. Practical skills

3. Up to speed on current state of the field

4. Foster an open and collaborative deep learning community within MIT for research and development

Class Information

1 week, 5 sessions

P/F, 3 credits

2 TensorFlow Tutorials

In-class Monday + Tuesday

1 Assignment: (more info in a few slides)

Typical Schedule

10:30am-11:15am Lecture #1

11:15am-12:00pm Lecture #2

12:00pm-12:30pm Coffee Break

12:30pm-1:30pm Tutorial / Proposal Time

Assignment Information

1 Assignment, 2 options:

Present a novel deep learning research idea or application

OR

Write a 1-page review of a deep learning paper

Option 1: Novel Proposal

Proposal Presentation

Groups of 3 or 4

Present a novel deep learning research idea or application

1 slide, 1 minute

List of example proposals on website: introtodeeplearning.com

Presentations on Friday

Submit groups by Wednesday 5pm to be eligible

Submit slide by Thursday 9pm to be eligible

Option 2: Paper Review

Write a 1-page review of a deep learning paper

Suggested papers listed on website introtodeeplearning.com

We will read + grade based on clarity of writing and technical communication of main ideas.

Class Support

Piazza: https://piazza.com/class/iwmlwep2fnd5uu

Course Website: introtodeeplearning.com

Lecture slides: introtodeeplearning.com/schedule

Email us: introtodeeplearning-staff@mit.edu

OH by request

Staff: Lecturers

Staff: TA + Admin

Our Fantastic Sponsors!

Why Deep Learning and why now?

Why Deep Learning?

Why Now?

1. Large Datasets

2. GPU Hardware Advances + Price Decreases

3. Improved Techniques

Fundamentals of Deep Learning

The Perceptron

1. Invented in 1957 by Frank Rosenblatt

2. Inspired by neurobiology

The Perceptron

[Diagram: inputs x0 ... xn are multiplied by weights w0 ... wn, summed together with a bias b, and passed through a non-linearity]
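In code, the perceptron's forward pass is a weighted sum of the inputs plus a bias, passed through a non-linearity. A minimal sketch in plain Python, assuming a sigmoid as the non-linearity:

```python
import math

def perceptron(x, w, b):
    """Forward pass: weighted sum of inputs plus bias, then a sigmoid."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `perceptron([1.0, 2.0], [0.5, -0.25], 0.1)` evaluates sigmoid(0.1), roughly 0.525.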

Perceptron Forward Pass

[Diagram: inputs x0 ... xn flow through weights w0 ... wn into the weighted sum plus bias b, then through the activation function (non-linearity) to the output]

Sigmoid Activation

[Diagram: the same perceptron with a sigmoid as the non-linearity]

Common Activation Functions

Importance of Activation Functions

Activation functions add non-linearity to our network's function

Most real-world problems + data are non-linear
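The usual candidates behind the "Common Activation Functions" slide can be written in a few lines. Sigmoid, tanh, and ReLU are standard choices (the exact set shown on the slide may differ):

```python
import math

def sigmoid(z):
    # Squashes any input into (0, 1); useful for probabilities
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Squashes into (-1, 1); a zero-centered relative of sigmoid
    return math.tanh(z)

def relu(z):
    # Passes positives through, zeroes out negatives; cheap and non-linear
    return max(0.0, z)
```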

Perceptron Forward Pass

[Diagram: inputs 2, 3, -1, 5 with weights 0.1, 0.5, 2.5, 0.2, plus a bias input of 1 with weight 3.0]

Weighted sum: (2*0.1) + (3*0.5) + (-1*2.5) + (5*0.2) + (1*3.0) = 3.2
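The numeric example above works out as follows, assuming sigmoid as the non-linearity to match the earlier slides:

```python
import math

# Inputs, weights, and bias from the slide
# (the bias is drawn as weight 3.0 on a constant input of 1)
x = [2, 3, -1, 5]
w = [0.1, 0.5, 2.5, 0.2]
b = 3.0

z = sum(xi * wi for xi, wi in zip(x, w)) + b   # 0.2 + 1.5 - 2.5 + 1.0 + 3.0 = 3.2
output = 1.0 / (1.0 + math.exp(-z))            # sigmoid(3.2), roughly 0.96
```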

How do we build neural networks

with perceptrons?

Perceptron Diagram Simplified

[Diagram: the weights, sum, bias, and non-linearity are folded into a single node, so inputs x0 ... xn feed one output o0]

Multi-Output Perceptron

[Diagram: input layer x0 ... xn fully connected to an output layer o0, o1]

Multi-Layer Perceptron (MLP)

[Diagram: input layer x0 ... xn, hidden layer h0 ... hn, output layer o0 ... on, each layer fully connected to the next]

Deep Neural Network

[Diagram: input layer x0 ... xn, multiple hidden layers h0 ... hn, output layer o0 ... on]
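A deep network's forward pass is just the perceptron computation repeated layer by layer. A NumPy sketch; the layer sizes, random weights, input values, and sigmoid activation are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # One fully connected layer: affine transform followed by a sigmoid
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

# Hypothetical shapes: 2 inputs -> 3 hidden units -> 1 output
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x = np.array([-0.4, 0.9])   # made-up, already-normalized input
h = layer(x, W1, b1)        # hidden activations
o = layer(h, W2, b2)        # network output, a value in (0, 1)
```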

Applying Neural Networks

Example Problem: Will my Flight be Delayed?

Temperature: -20 F

Input vector: [-20, 45]

[Diagram: inputs x0, x1 take the vector [-20, 45], feed hidden units h0, h1, h2, and produce output o0]

Predicted: 0.05   Actual: 1

Quantifying Loss

[Diagram: the flight-delay network produces Predicted: 0.05 against Actual: 1; the loss is a function of Predicted and Actual]

Total Loss

Input        Predicted   Actual
[-20, 45]    0.05        1
[80, 0]      0.02        0
[4, 15]      0.96        1
[45, 60]     0.35        1

Binary Cross Entropy Loss

(applied to the probability outputs in the table above)

Mean Squared Error (MSE) Loss

Input        Predicted   Actual
[-20, 45]    10          40
[80, 0]      45          42
[4, 15]      100         110
[45, 60]    15          55
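The two losses on these tables can be computed directly. The code below uses the standard definitions of binary cross entropy and mean squared error, averaged over the four examples:

```python
import math

# Classification table: probability outputs vs. binary labels
predicted = [0.05, 0.02, 0.96, 0.35]
actual    = [1,    0,    1,    1]

# Binary cross entropy, averaged over examples
bce = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
           for p, y in zip(predicted, actual)) / len(actual)

# Regression-style table from the MSE slide
pred_reg   = [10, 45, 100, 15]
actual_reg = [40, 42, 110, 55]

# Mean squared error, averaged over examples
mse = sum((p - y) ** 2 for p, y in zip(pred_reg, actual_reg)) / len(actual_reg)
```

The confident miss (0.05 against label 1) dominates the cross entropy, while the 0.96 prediction contributes almost nothing; similarly the (15, 55) row dominates the MSE.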

Training Neural Networks

Training Neural Networks: Objective

Minimize the loss function

Loss is a function of the model's parameters

How to minimize loss?

Compute the gradient of the loss with respect to the weights

Take a small step in the opposite direction of the gradient to a new point

Repeat!

This is called Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (SGD)

Initialize weights randomly
For N Epochs
    For each training example (x, y):
        Compute the gradient of the loss
        Update the weights by a small step against the gradient
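The SGD loop above, written out for the simplest possible model: a single weight fit to a toy dataset (the data, learning rate, and epoch count are illustrative assumptions):

```python
import random

random.seed(0)

# Toy data generated from y = 2x; SGD should recover w close to 2
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w = random.random()                 # initialize randomly
eta = 0.1                           # learning rate

for epoch in range(100):            # for N epochs
    for x, y in data:               # for each training example (x, y)
        grad = 2 * (w * x - y) * x  # gradient of the squared error wrt w
        w -= eta * grad             # step against the gradient
```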


Calculating the Gradient: Backpropagation

[Diagram: x0 -> (W1) -> h0 -> (W2) -> o0 -> J(θ); the chain rule propagates the gradient backward, e.g. ∂J/∂W2 = (∂J/∂o0)(∂o0/∂W2) and ∂J/∂W1 = (∂J/∂o0)(∂o0/∂h0)(∂h0/∂W1)]
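Backpropagation is just the chain rule applied layer by layer. A hand-worked sketch for a two-weight chain x -> h -> o with sigmoid units and squared-error loss (the numeric values are made up), checked against a finite-difference estimate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    h = sigmoid(w1 * x)      # hidden activation
    o = sigmoid(w2 * h)      # output
    return h, o

def loss(x, y, w1, w2):
    _, o = forward(x, w1, w2)
    return (o - y) ** 2      # squared-error loss J

x, y, w1, w2 = 0.5, 1.0, 0.3, -0.8
h, o = forward(x, w1, w2)

# Chain rule, using sigma'(z) = sigma(z) * (1 - sigma(z))
dJ_do  = 2 * (o - y)
dJ_dw2 = dJ_do * o * (1 - o) * h
dJ_dw1 = dJ_do * o * (1 - o) * w2 * h * (1 - h) * x

# Finite-difference check of dJ/dw1
eps = 1e-6
numeric = (loss(x, y, w1 + eps, w2) - loss(x, y, w1 - eps, w2)) / (2 * eps)
```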

Training Neural Networks In Practice

Loss functions can be difficult to optimize

Update Rule: w ← w − η ∇J(w), where the learning rate η sets the step size

Learning Rate & Optimization

Small learning rate: converges slowly, and can get stuck in local minima

Large learning rate: can overshoot the minimum and diverge

How to deal with this?

1. Try lots of different learning rates to see what is just right

2. Do something smarter: Adaptive Learning Rate

Adaptive Learning Rate

Learning rate is no longer fixed

Can be made larger or smaller depending on:

- how large the gradient is
- how fast learning is happening
- the size of particular weights
- etc.

Adaptive Learning Rate Algorithms

ADAM

Momentum

NAG

Adagrad

Adadelta

RMSProp
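As one concrete example, a single Adam update can be sketched in a few lines; the defaults below follow the values commonly quoted for Adam (β1 = 0.9, β2 = 0.999), while the learning rate and test problem are illustrative assumptions:

```python
import math

def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Moving averages of the gradient (m) and squared gradient (v),
    # bias-corrected, give each weight its own adaptive step size.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 (gradient 2w) starting from w = 1.0
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```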

Escaping Saddle Points

Training Neural Networks In Practice 2:

MiniBatches

Why is it Stochastic Gradient Descent?

Initialize weights randomly
For N Epochs
    For each training example (x, y):
        Compute the gradient (only an estimate of the true gradient!)

Minibatches Reduce Gradient Variance

Initialize weights randomly
For N Epochs
    For each training batch {(x0, y0), …, (xB, yB)}:
        Compute the gradient over the batch (a more accurate estimate!)

Advantages of Minibatches

More accurate estimation of gradient

Smoother convergence

Allows for larger learning rates

Minibatches lead to fast training!

Can parallelize computation + achieve significant speed increases on GPUs
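The batching itself is simple: shuffle the dataset each epoch and slice it into chunks of size B. A sketch (the dataset and batch size are placeholders):

```python
import random

random.seed(0)

data = list(range(10))   # stand-in for (x, y) training pairs
B = 4                    # minibatch size

random.shuffle(data)                                       # reshuffle each epoch
batches = [data[i:i + B] for i in range(0, len(data), B)]  # chunks of up to B
```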

Training Neural Networks In Practice 3:

Fighting Overfitting

The Problem of Overfitting

Regularization Techniques

1. Dropout

2. Early Stopping

3. Weight Regularization

4. ...many more

Regularization I: Dropout

During training, randomly set some activations to 0

Typically drop 50% of activations in a layer

Forces network to not rely on any 1 node

[Diagram: the deep network (input layer, hidden layers, output layer) with a random subset of hidden units dropped]
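A NumPy sketch of "inverted" dropout, one common way to implement the slide's idea; the 1/(1-p) rescaling is an implementation choice (not stated on the slide) that keeps expected activations the same at train and test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    """Zero each activation with probability p during training."""
    if not training:
        return h                       # no-op at test time
    mask = rng.random(h.shape) >= p    # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)        # rescale survivors ("inverted" dropout)

out = dropout(np.ones(1000))           # roughly half zeroed, survivors scaled to 2.0
```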

Regularization II: Early Stopping

Don't give the network time to overfit

...
Epoch 15: Train: 85% Validation: 80%
Epoch 16: Train: 87% Validation: 82%
Epoch 17: Train: 90% Validation: 85%   <- Stop here!
Epoch 18: Train: 95% Validation: 83%
Epoch 19: Train: 97% Validation: 78%
Epoch 20: Train: 98% Validation: 75%
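The stopping rule can be sketched directly on the accuracy trace from the slide: keep the epoch with the best validation accuracy, and stop once it has failed to improve for a few epochs (the patience value here is an assumption):

```python
# Validation accuracies from the slide, keyed by epoch
val_acc = {15: 0.80, 16: 0.82, 17: 0.85, 18: 0.83, 19: 0.78, 20: 0.75}

patience = 2                         # hypothetical tolerance for non-improvement
best_epoch, best_acc, bad = None, -1.0, 0
for epoch in sorted(val_acc):
    if val_acc[epoch] > best_acc:
        best_epoch, best_acc, bad = epoch, val_acc[epoch], 0
    else:
        bad += 1
        if bad >= patience:
            break                    # stop; restore the weights saved at best_epoch
```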

Regularization III: Weight Regularization

Large weights typically mean the model is overfitting

Add the size of the weights to our loss function

Perform well on the task + keep weights small
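"Adding the size of the weights to the loss" is usually done with an L2 penalty; the specific norm and the strength `lam` below are assumptions, not fixed by the slide:

```python
def regularized_loss(task_loss, weights, lam=0.01):
    # Total loss = task loss + lam * sum of squared weights (L2 penalty).
    # Large weights now cost extra, pushing the optimizer toward small ones.
    return task_loss + lam * sum(w ** 2 for w in weights)
```

For example, a task loss of 1.0 with weights [3.0, 4.0] and lam = 0.1 gives 1.0 + 0.1 * 25 = 3.5.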

Core Fundamentals Review

Perceptron Classifier

Stacking Perceptrons to form neural networks

How to formulate problems with neural networks

Train neural networks with backpropagation

Techniques for improving training of deep neural networks

Questions?
