Introduction

Lecture slides for Chapter 1 of Deep Learning


www.deeplearningbook.org
Ian Goodfellow
2016-09-26
Representations Matter
[Figure 1.1: Example of different representations: suppose we want to separate two categories of data by drawing a line between them. The left panel plots the data in Cartesian coordinates (x, y), where the task is impossible; the right panel plots the same data in polar coordinates (r, θ), where a vertical line solves it.]
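Figure 1.1's point can be checked in a few lines of code. The sketch below is my own illustration, assuming numpy and scikit-learn (the slides prescribe no library): the same linear classifier is fit to two concentric rings of points, once in Cartesian and once in polar coordinates.

```python
# Minimal sketch of Figure 1.1: the same linear model succeeds or fails
# depending on the representation. numpy/scikit-learn are assumptions here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
theta = rng.uniform(0, 2 * np.pi, 2 * n)
radius = np.concatenate([rng.normal(1.0, 0.1, n),   # inner ring: class 0
                         rng.normal(2.0, 0.1, n)])  # outer ring: class 1
labels = np.concatenate([np.zeros(n), np.ones(n)])

# Cartesian representation: no straight line separates the two rings.
cartesian = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
# Polar representation: the radius coordinate alone separates them.
polar = np.column_stack([np.hypot(cartesian[:, 0], cartesian[:, 1]),
                         np.arctan2(cartesian[:, 1], cartesian[:, 0])])

print(LogisticRegression().fit(cartesian, labels).score(cartesian, labels))  # near 0.5 (chance)
print(LogisticRegression().fit(polar, labels).score(polar, labels))          # near 1.0
```

Changing the representation, not the model, is what makes the task easy.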
Depth: Repeated Composition
[Figure 1.2: Illustration of a deep learning model. It is difficult for a computer to understand the meaning of raw sensory input data, such as an image represented as a collection of pixel values. The model builds up the answer in stages: visible layer (input pixels) → 1st hidden layer (edges) → 2nd hidden layer (corners and contours) → 3rd hidden layer (object parts) → output (object identity: car, person, or animal).]
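The "repeated composition" in Figure 1.2 is literally function composition. The sketch below is illustrative only: the layer sizes are arbitrary assumptions and the weights are random, so the comments mirror the figure's labels without the layers actually learning edges or parts.

```python
# Depth as repeated composition: output = f4(f3(f2(f1(input)))).
# Illustrative sketch with random weights; the sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def layer(in_dim, out_dim):
    """An affine map followed by a ReLU: one simple stage of the composition."""
    W = rng.normal(0.0, in_dim ** -0.5, (out_dim, in_dim))
    return lambda h: np.maximum(0.0, W @ h)

pixels = rng.uniform(0.0, 1.0, 784)  # visible layer (input pixels)
h1 = layer(784, 256)(pixels)         # 1st hidden layer ("edges" in the figure)
h2 = layer(256, 128)(h1)             # 2nd hidden layer ("corners and contours")
h3 = layer(128, 64)(h2)              # 3rd hidden layer ("object parts")
scores = layer(64, 3)(h3)            # output (car / person / animal scores)
print(scores.shape)                  # (3,)
```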
Computational Graphs
[Figure 1.3: Illustration of computational graphs mapping an input to an output, where each node performs an operation; depth is the length of the longest path from input to output. Both graphs compute logistic regression, σ(w1·x1 + w2·x2): with {×, +, σ} as the element set the model has depth three, while with {logistic regression} as the element set it has depth one.]
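Figure 1.3's observation that depth depends on the chosen element set can be made concrete. In the sketch below (my own illustration; the variable names are assumptions), the same value σ(w1·x1 + w2·x2) is computed once as a depth-three graph of {×, +, σ} nodes and once as a depth-one graph whose single element is logistic regression.

```python
# The same computation, sigmoid(w1*x1 + w2*x2), at two graph granularities.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w1, w2, x1, x2 = 0.5, -1.0, 2.0, 3.0

# Element set {*, +, sigmoid}: the longest input-to-output path
# passes through three operation nodes, so the depth is three.
p1 = w1 * x1                 # multiplication node
p2 = w2 * x2                 # multiplication node
s = p1 + p2                  # addition node
depth_three = sigmoid(s)     # sigmoid node

# Element set {logistic regression}: a single node, so the depth is one.
def logistic_regression(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

depth_one = logistic_regression([w1, w2], [x1, x2])
assert math.isclose(depth_three, depth_one)
print(depth_three)
```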
Machine Learning and AI

[Figure 1.4: A Venn diagram showing how deep learning is a kind of representation learning, which is in turn a kind of machine learning, which is one approach to AI. Examples in each region: MLPs (deep learning), shallow autoencoders (representation learning), logistic regression (machine learning), knowledge bases (AI).]

Learning Multiple Components


[Figure 1.5: Flowcharts showing how the different parts of an AI system relate to each other within different AI disciplines. Rule-based systems: input → hand-designed program → output. Classic machine learning: input → hand-designed features → mapping from features → output. Representation learning: input → features → mapping from features → output. Deep learning: input → simple features → additional layers of more abstract features → mapping from features → output.]
Organization of the Book
[Figure 1.6: The high-level organization of the book. An arrow from one chapter to another indicates that the former is prerequisite material for the latter.]

1. Introduction

Part I: Applied Math and Machine Learning Basics
2. Linear Algebra
3. Probability and Information Theory
4. Numerical Computation
5. Machine Learning Basics

Part II: Deep Networks: Modern Practices
6. Deep Feedforward Networks
7. Regularization
8. Optimization
9. CNNs
10. RNNs
11. Practical Methodology
12. Applications

Part III: Deep Learning Research
13. Linear Factor Models
14. Autoencoders
15. Representation Learning
16. Structured Probabilistic Models
17. Monte Carlo Methods
18. Partition Function
19. Inference
20. Deep Generative Models
Historical Waves
[Figure 1.7: Two of the three historical waves of artificial neural networks research, measured by the frequency of the phrases "cybernetics" and "connectionism + neural networks" in print from 1940 to 2000 (y-axis: frequency of word or phrase).]
Historical Trends: Growing Datasets
[Figure 1.8: Dataset size (number of examples, log scale) versus year, from Iris and other early datasets through MNIST and CIFAR-10 up to ImageNet, Sports-1M, WMT, and the Canadian Hansard. Dataset sizes have increased greatly over time. In the early 1900s, statisticians studied datasets of hundreds or thousands of manually compiled measurements (Garson, 1900; Gosset, 1908; Anderson, 1935; Fisher, 1936). In the 1950s through 1980s, the pioneers of biologically inspired machine learning often worked with small, synthetic datasets.]

The MNIST Dataset

[Figure 1.9: Example inputs from the MNIST dataset. The "NIST" stands for National Institute of Standards and Technology, the agency that originally collected the data; the "M" stands for "modified".]
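To look at these inputs directly, one convenient loader is the one bundled with Keras; using it is an assumption of the sketch below, not something the slides prescribe.

```python
# Minimal sketch for inspecting MNIST, assuming TensorFlow/Keras is installed.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)
print(y_train[:10])                 # the first ten digit labels
```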
Connections per Neuron
[Figure 1.10: Connections per neuron (log scale) versus year, with biological reference points: fruit fly near 10^2, mouse near 10^3, cat and human near 10^4. Initially, the number of connections between neurons in artificial neural networks was limited by hardware capabilities; today it is mostly a design consideration, and some artificial neural networks have nearly as many connections per neuron as a cat.]
Number of Neurons
[Figure 1.11: Number of neurons (logarithmic scale) versus year, from the Perceptron (Rosenblatt, 1958, 1962) onward, with biological reference points ranging from sponge and roundworm up through leech, ant, bee, frog, octopus, and human; the x-axis extends to a projected 2056. Since the introduction of hidden units, artificial neural networks have doubled in size roughly every 2.4 years. Biological neural network sizes are from Wikipedia (2015).]


Solving Object Recognition
[Figure 1.12: ILSVRC classification error rate by year, 2010-2015 (y-axis from 0.00 to 0.30). Since deep networks reached the scale necessary to compete in the ImageNet Large Scale Visual Recognition Challenge, they have consistently won the competition every year and yielded lower and lower error rates each time. Data from Russakovsky et al. (2014).]
