
Very Deep Learning

Lecture 04

Dr. Muhammad Zeshan Afzal, Prof. Didier Stricker


MindGarage, University of Kaiserslautern
afzal.tukl@gmail.com



Recap



Recap
◼ Logistic Regression as One Layer Neural Network
◼ Computational Graphs



Computational Graph

◼ AlexNet

[Figure: computational graph of AlexNet, mapping the input image and the weights to the loss]



Computational Graph

◼ Neural Turing Machine

[Figure: computational graph of a Neural Turing Machine, mapping the input image to the loss]



Computational Graph



Computational Graph

◼ A simple example



Computational Graph

◼ Chain Rule

◼ Multivariate Chain Rule
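The formulas on this slide did not survive extraction; for reference, the standard statements of both rules are:

```latex
% Chain rule for a composition z = f(y), y = g(x):
\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}

% Multivariate chain rule, when z depends on x through y_1, ..., y_n:
\frac{\partial z}{\partial x} = \sum_{i=1}^{n} \frac{\partial z}{\partial y_i}\,\frac{\partial y_i}{\partial x}
```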



Computational Graph

◼ A simple example

Forward pass

Backward pass

Backward gradients in magenta, local gradients in blue


Computational Graph

◼ A simple example

Forward pass

Backward pass
[Figure: worked forward/backward pass, e.g. the node 3x² with local gradient 6x; backward gradients shown in magenta, local gradients in blue]
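A minimal numeric sketch of such a forward/backward pass, assuming the example computes y = 3x² as a chain of a squaring node and a scale-by-3 node (the function and variable names are illustrative, not from the slide):

```python
def forward_backward(x):
    # Forward pass through y = 3 * x**2, split into two nodes: a = x**2, y = 3*a.
    a = x ** 2           # square node
    y = 3.0 * a          # scale node

    # Backward pass: multiply upstream gradients by local gradients (chain rule).
    dy = 1.0             # dL/dy, starting gradient at the output
    da = dy * 3.0        # local gradient of y = 3a w.r.t. a is 3
    dx = da * 2.0 * x    # local gradient of a = x**2 w.r.t. x is 2x
    return y, dx

print(forward_backward(2.0))   # (12.0, 12.0); analytically dy/dx = 6x = 12 at x = 2
```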


Computational Graph



Computational Graph (Fan-Out >1)

◼ A slightly complicated example



Logistic Regression



Logistic Regression

◼ As a neural network



Variants of Gradient Descent

◼ Gradient Descent
^ Updates after looking at the complete dataset
◼ Minibatch Gradient Descent
^ Updates after looking at N samples (batch size)
◼ Stochastic Gradient Descent
^ Updates after looking at every single sample (batch size 1)
◼ Related Concept
^ Epoch
• one cycle through the full training dataset
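A schematic sketch of the three variants in one loop; `grad_fn`, the data, and the hyperparameters are placeholders, not from the slides. Setting the batch size to the dataset size gives vanilla gradient descent, a batch size of 1 gives stochastic gradient descent, and each outer iteration is one epoch.

```python
import numpy as np

def train(X, y, w, grad_fn, lr=0.1, batch_size=32, epochs=10):
    """Minibatch gradient descent; batch_size=len(X) recovers vanilla
    gradient descent, batch_size=1 recovers stochastic gradient descent."""
    n = len(X)
    for _ in range(epochs):                       # one epoch = one full pass over the data
        order = np.random.permutation(n)          # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            g = grad_fn(w, X[idx], y[idx])        # gradient estimated on the minibatch
            w = w - lr * g                        # parameter update
    return w
```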



Logistic Regression (Decision Boundary)

◼ The decision boundary

◼ Decide for class 1

◼ Decide for class 0
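A small sketch of this rule, assuming the usual logistic-regression form ŷ = σ(wᵀx + b): deciding for class 1 when ŷ ≥ 0.5 is the same as checking wᵀx + b ≥ 0, so the decision boundary is the hyperplane wᵀx + b = 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Decide class 1 iff sigmoid(w.x + b) >= 0.5, i.e. iff w.x + b >= 0."""
    score = np.dot(w, x) + b
    return int(score >= 0.0)   # same decision as sigmoid(score) >= 0.5
```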



Simple Examples



OR

◼ Linear Classifier
  w = (1, 1)ᵀ; output 1 iff wᵀx = x₁ + x₂ > 0.5

  x₁   x₂   OR(x₁, x₂)
   0    0    0
   0    1    1
   1    0    1
   1    1    1



AND

◼ Linear Classifier
  w = (1, 1)ᵀ; output 1 iff wᵀx = x₁ + x₂ > 1.5

  x₁   x₂   AND(x₁, x₂)
   0    0    0
   0    1    0
   1    0    0
   1    1    1



NAND

◼ Linear Classifier
  w = (−1, −1)ᵀ; output 1 iff wᵀx = −x₁ − x₂ > −1.5

  x₁   x₂   NAND(x₁, x₂)
   0    0    1
   0    1    1
   1    0    1
   1    1    0
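A compact check (as a sketch) that the weight/threshold pairs above reproduce the OR, AND and NAND truth tables; the helper name `threshold_unit` is illustrative.

```python
import itertools

def threshold_unit(w, theta):
    """Linear threshold unit: output 1 iff w1*x1 + w2*x2 > theta."""
    return lambda x1, x2: int(w[0] * x1 + w[1] * x2 > theta)

OR   = threshold_unit(( 1,  1),  0.5)
AND  = threshold_unit(( 1,  1),  1.5)
NAND = threshold_unit((-1, -1), -1.5)

for x1, x2 in itertools.product([0, 1], repeat=2):
    print(x1, x2, OR(x1, x2), AND(x1, x2), NAND(x1, x2))
```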



XOR

◼ Linear Classifier: no single weight vector and threshold reproduces this table (XOR is not linearly separable)

  x₁   x₂   XOR(x₁, x₂)
   0    0    0
   0    1    1
   1    0    1
   1    1    0



A Brief History of Deep Learning

1940 1950 1960 1970 1980 1990 2000 2010 2020

◼ 1958 Perceptron (Rosenblatt)


^ find a separating hyperplane by minimizing the distance of
misclassified points to the decision boundary
^ Code the two classes as yᵢ ∈ {+1, −1}
^ Linear threshold unit

^ It was hyped in the media


• It will solve all the problems

Rosenblatt: The perceptron -a probabilistic model for information storage and organization in the brain. Psychological Review, 1958.
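A sketch of the classic perceptron update rule with the classes coded as yᵢ ∈ {+1, −1}: whenever a point is misclassified, the weights are moved toward classifying it correctly. The data, learning rate and bias handling here are placeholders; a bias can be absorbed by appending a constant 1 feature.

```python
import numpy as np

def perceptron(X, y, lr=1.0, epochs=100):
    """Rosenblatt-style learning rule for labels y_i in {+1, -1}:
    update only on misclassified points (y_i * w.x_i <= 0)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified or on the boundary
                w += lr * yi * xi         # nudge the hyperplane toward x_i
    return w
```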



A Brief History of Deep Learning

1940 1950 1960 1970 1980 1990 2000 2010 2020

◼ 1969 Minsky and Papert


^ Mathematical proof of what the perceptron is capable of
• Discouraging results
^ Simple problems can't be solved
• XOR

Minsky and Papert: Perceptrons: An introduction to computational geometry. MIT Press, 1969.



XOR (Multilayer Perceptron)

◼ A linear classifier cannot represent XOR, but a multilayer perceptron with one hidden layer can

  x₁   x₂   XOR(x₁, x₂)
   0    0    0
   0    1    1
   1    0    1
   1    1    0
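One classic construction makes this concrete by reusing the gates from the previous slides: XOR(x₁, x₂) = AND(OR(x₁, x₂), NAND(x₁, x₂)), i.e. a network with two hidden threshold units and one output unit. A small sketch:

```python
def xor(x1, x2):
    # Hidden layer: OR and NAND threshold units; output layer: AND.
    h1 = int(x1 + x2 > 0.5)        # OR(x1, x2)
    h2 = int(-x1 - x2 > -1.5)      # NAND(x1, x2)
    return int(h1 + h2 > 1.5)      # AND(h1, h2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor(x1, x2))     # prints 0, 1, 1, 0
```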



Representation Matters



Neural Network Playground

◼ https://playground.tensorflow.org/



Multilayer Perceptron



Multilayer Perceptron



Multilayer Perceptron



Multilayer Perceptron



Activation Functions



The Neuron

◼ Neuron
^ Electrically excitable cell that communicates
with other cells via specialized connections
called synapses (the human brain has roughly 100 billion neurons)
◼ Sensory neurons
^ 5 senses
◼ Motor neurons
^ Allow the brain to communicate with other parts
of the body
◼ Interneurons
^ connect neurons to other neurons within the
same region of the brain



The Neuron



The Neuron



Sigmoid

◼ Maps input to range [0, 1]


◼ Historically popular since it has a nice
interpretation as the saturating firing
rate of a neuron

◼ Problems
^ Saturates: The gradients are
killed
^ Outputs are not zero-centred
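A small sketch of the function and its gradient; since σ'(z) = σ(z)(1 − σ(z)) is at most 0.25 and approaches 0 for large |z|, gradients flowing back through saturated sigmoid units are strongly attenuated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # at most 0.25 (at z = 0)

for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    # Gradient is ~0 for large |z|: the unit saturates and "kills" the gradient.
    print(z, sigmoid(z), sigmoid_grad(z))
```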



Sigmoid

◼ Maps input to range [0, 1]

◼ Problems

Non-zero-centred outputs restrict the directions of gradient updates, making optimisation inefficient (minibatches help)



Tanh

◼ Maps input to range [-1, 1]

◼ Zero centred
◼ Antisymmetric
◼ Problem
^ Saturation kills the gradient



ReLU

◼ Does not saturate (for x>0)


◼ Leads to fast convergence
◼ Computationally efficient
◼ Problem
^ Non zero-centred outputs
^ No learning for x < 0, which leads
to dead ReLUs



Leaky ReLU

◼ Does not saturate


◼ Closer to zero-centred
outputs
◼ Fast convergence
◼ Computationally efficient



Parametric ReLU

◼ Does not saturate


◼ Closer to zero-centred
outputs
◼ Fast convergence
◼ Computationally efficient
◼ Parameter α is learned
from data



ELU

◼ All benefits of Leaky ReLU


◼ Adds some robustness to
noise
◼ Default value α = 1
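For reference, a sketch of the piecewise definitions from the last few slides; the α defaults shown (0.01 for Leaky ReLU, 1.0 for ELU) are common choices rather than values from the slides, and in PReLU α would be learned from data.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                       # zero output and zero gradient for x < 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)            # small negative slope keeps gradients alive

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))   # smooth, saturates to -alpha

# PReLU has the same form as leaky_relu, except that alpha is learned from data.
```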



Activation Functions

◼ Choice of activation function depends on the problem


◼ Only the most common ones are discussed; there are many others
◼ The best activation function is often found by trial and error
◼ It is important to ensure good gradient flow during optimization
◼ In practice
^ Use ReLU by default with a small enough learning rate
^ Try Leaky ReLU and ELU for some additional gain



A Brief History of Deep Learning

1940 1950 1960 1970 1980 1990 2000 2010 2020

◼ 1986 Backpropagation Algorithm


^ The backpropagation algorithm made it possible to train a neural
network based on feedback from its errors
^ Allowed the efficient calculation of the gradients with
respect to the weights

Rumelhart, Hinton and Williams: Learning representations by back-propagating errors. Nature, 1986.
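A minimal numpy sketch of what backpropagation computes, here for a one-hidden-layer network with sigmoid units and squared-error loss; the architecture and names are illustrative, not taken from the original paper.

```python
import numpy as np

def mlp_grads(x, t, W1, W2):
    """Gradients of 0.5*||y - t||^2 w.r.t. W1 and W2 via backpropagation,
    for a one-hidden-layer MLP with sigmoid activations."""
    sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

    # forward pass
    h = sigma(W1 @ x)                         # hidden activations
    y = sigma(W2 @ h)                         # network output

    # backward pass: apply the chain rule layer by layer
    delta2 = (y - t) * y * (1 - y)            # error at the output layer
    delta1 = (W2.T @ delta2) * h * (1 - h)    # error propagated to the hidden layer
    return np.outer(delta1, x), np.outer(delta2, h)   # dL/dW1, dL/dW2
```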



Summary

◼ Concepts
^ Gradient Descent
• Vanilla, Minibatch, Stochastic
◼ Simple Functions
^ OR, AND, NAND, XOR
◼ Representation Matters
◼ Neural Network Playground
◼ MLP
◼ Activation Functions



Thanks a lot for your attention

