
Machine Learning Department of Electrical Engineering

Artificial Neural Networks



Neuron



Neuron in the Brain



Artificial Neuron or Perceptron

[Figure: perceptron with inputs x1, x2, ..., xd and weights w1, w2, ..., wd feeding a summation unit Σ; the weighted sum passes through an activation to give the output]


Artificial Neuron or Perceptron



Classification with Perceptron



Learning the weights for Perceptron



Gradient Descent Perceptron
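A minimal sketch of perceptron training: the classic mistake-driven update, which coincides with stochastic gradient descent on the perceptron criterion. The learning rate, epoch count, and all names are illustrative:

```python
def train_perceptron(X, T, lr=0.1, epochs=100):
    """Learn weights w and bias b so that sign(w·x + b) matches each target t in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, T):
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if s >= 0 else -1
            if y != t:  # update only on misclassified examples
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

# Learn the OR function on ±1 inputs/targets
X = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
T = [-1, 1, 1, 1]
w, b = train_perceptron(X, T)
```

Because OR is linearly separable, the perceptron convergence theorem guarantees this loop finds a separating weight vector.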



Example



Example



Perceptron & Logistic Regression
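The relationship can be stated compactly (standard formulation; the perceptron thresholds the same linear score that logistic regression squashes):

```latex
\text{Perceptron:}\quad y = \operatorname{sign}(\mathbf{w}^\top \mathbf{x}) \in \{-1,+1\}
\qquad
\text{Logistic regression:}\quad y = \sigma(\mathbf{w}^\top \mathbf{x}) = \frac{1}{1+e^{-\mathbf{w}^\top \mathbf{x}}} \in (0,1)
```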



Example



Example



Combining Perceptrons

x1  x2  OR          x1  x2  OR
 0   0   0          -1  -1  -1
 0   1   1          -1  +1  +1
 1   0   1          +1  -1  +1
 1   1   1          +1  +1  +1

x1  x2  AND         x1  x2  AND
 0   0   0          -1  -1  -1
 0   1   0          -1  +1  -1
 1   0   0          +1  -1  -1
 1   1   1          +1  +1  +1
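Both tables are realized by single perceptrons with hand-picked weights (one standard choice on ±1 inputs; the specific values are illustrative):

```python
def sign_unit(x1, x2, w1, w2, b):
    """Single perceptron on ±1 inputs with signum activation."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else -1

def OR(x1, x2):   # fires unless both inputs are -1
    return sign_unit(x1, x2, 1, 1, 1)

def AND(x1, x2):  # fires only when both inputs are +1
    return sign_unit(x1, x2, 1, 1, -1)
```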



XOR & XNOR Functions
x1  x2  XOR         x1  x2  XOR
 0   0   0          -1  -1  -1
 0   1   1          -1  +1  +1
 1   0   1          +1  -1  +1
 1   1   0          +1  +1  -1

x1  x2  XNOR        x1  x2  XNOR
 0   0   1          -1  -1  +1
 0   1   0          -1  +1  -1
 1   0   0          +1  -1  -1
 1   1   1          +1  +1  +1

• The data presented by the XOR and XNOR functions is not linearly separable.
• A single perceptron is unable to classify this data.
The Multi-Layer Perceptron for XOR

[Figure: two-layer perceptron computing XOR; inputs x1, x2 (with a bias input 1 at each layer) feed two hidden units, which feed a single output unit. Input Layer, Hidden Layer, Output Layer]
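One standard realization of such a network on ±1 inputs: one hidden unit fires for (+1, −1), the other for (−1, +1), and the output unit ORs them. The weights below are illustrative, not taken from the slides:

```python
def unit(inputs, weights, b):
    """Perceptron unit on ±1 inputs with signum activation."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + b >= 0 else -1

def XOR(x1, x2):
    h1 = unit([x1, x2], [1, -1], -1)   # fires only for (+1, -1)
    h2 = unit([x1, x2], [-1, 1], -1)   # fires only for (-1, +1)
    return unit([h1, h2], [1, 1], 1)   # OR of the two hidden units
```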



The Multi-Layer Perceptron for XNOR

[Figure: two-layer perceptron computing XNOR; inputs x1, x2 (with a bias input 1 at each layer) feed two hidden units, which feed a single output unit. Input Layer, Hidden Layer, Output Layer]



Neural Network Intuition

[Figure: network with Input Layer, Hidden Layer 1, Hidden Layer 2, and Output Layer]



Neural Network Intuition



Multi-Class Classification

[Figure: network with an input layer, Hidden Layer 1, Hidden Layer 2, and an output layer of four units, one per class; Output = Class 1, Class 2, Class 3, or Class 4]



Activation Functions
• Step Function: h(a) = 1 if a ≥ 0, else 0 (jumps from 0 to 1 at the origin)

• Signum Function: h(a) = +1 if a ≥ 0, else −1 (jumps from −1 to +1 at the origin)



Activation Functions
• Sigmoid Function: σ(a) = 1 / (1 + e^(−a)), a smooth squashing of the reals into (0, 1)

• Hyperbolic Tangent Function: tanh(a), a smooth squashing of the reals into (−1, +1)



Activation Functions

• ReLU Function: h(a) = max(0, a)

• Identity Function: h(a) = a
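The activation functions from these slides, side by side (a sketch using only Python's standard `math` module):

```python
import math

def step(a):      return 1.0 if a >= 0 else 0.0          # 0/1 threshold
def signum(a):    return 1.0 if a >= 0 else -1.0         # ±1 threshold
def sigmoid(a):   return 1.0 / (1.0 + math.exp(-a))      # smooth, range (0, 1)
def tanh(a):      return math.tanh(a)                    # smooth, range (-1, 1)
def relu(a):      return max(0.0, a)                     # rectified linear
def identity(a):  return a                               # linear output unit
```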



Renaissance of Neural Networks

• Rebranding/Renaming
• ReLU
• GPUs
• Stochastic Gradient Descent



Non-Linear Function Modelling with ANN

• Feed Forward Multi-Layer Neural Network

[Figure: feed-forward network; inputs x1, x2 (with bias inputs 1) feed hidden units Z1, Z2, which feed output units Y1, Y2]



Adaptive Non-Linear Functions

• Non-Linear Regression
• h1: Non-Linear Function
• h2: Identity

• Non-Linear Classification
• h1: Non-Linear Function
• h2: Sigmoid



Optimization

• Error Minimization
• Back Propagation
• Maximum Likelihood
• Maximum A Posteriori
• Bayesian Learning



Least Square Error

• Error Function

We are optimizing a linear combination of non-linear functions (regression).
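The error function referenced above was lost in extraction; for regression it is presumably the standard sum-of-squares error over N training pairs (x_n, t_n):

```latex
E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\bigl(y(\mathbf{x}_n,\mathbf{w}) - t_n\bigr)^{2}
```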



Gradient Descent

• For each example, adjust the weights as follows:

• How can we compute the gradient efficiently, given an arbitrary network structure?
• Back Propagation
• Automatic Differentiation
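The per-example adjustment elided above is presumably the standard stochastic gradient-descent update with learning rate η:

```latex
w_{ji} \leftarrow w_{ji} - \eta\,\frac{\partial E_n}{\partial w_{ji}}
```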



Back Propagation Algorithm

• Two Phases
• Forward Phase: Compute output Zj of each unit j.

• Backward Phase: Compute δj (error) at unit j.



Forward Phase
2 Input Units, 2 Hidden Units, 2 Output Units

[Figure: inputs x1, x2 (with bias inputs 1) feed hidden units Z1, Z2, which feed output units Z3, Z4]



Backward Phase

• Use the chain rule to recursively compute the gradient.

• For each weight wji:

[Figure: same 2-2-2 network as in the forward phase; x1, x2 feed Z1, Z2, which feed Z3, Z4]
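The two phases can be sketched for the 2-2-2 network above. This sketch assumes sigmoid hidden units, linear output units, and squared error; those choices, and all names, are illustrative:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward_backward(x, t, W1, b1, W2, b2):
    """One forward + backward pass for a 2-2-2 network.
    Returns the gradients dE/dW1 and dE/dW2 for squared error E."""
    # Forward phase: compute the output z_j of each unit j
    a1 = [sum(W1[j][i] * x[i] for i in range(2)) + b1[j] for j in range(2)]
    z  = [sigmoid(a) for a in a1]                                            # hidden Z1, Z2
    y  = [sum(W2[k][j] * z[j] for j in range(2)) + b2[k] for k in range(2)]  # outputs Z3, Z4

    # Backward phase: compute the error delta_j at each unit j
    d_out = [y[k] - t[k] for k in range(2)]                  # linear output units
    d_hid = [z[j] * (1 - z[j]) * sum(W2[k][j] * d_out[k] for k in range(2))
             for j in range(2)]                              # chain rule through sigmoid

    # Gradient w.r.t. each weight w_ji is delta_j times the input feeding it
    gW2 = [[d_out[k] * z[j] for j in range(2)] for k in range(2)]
    gW1 = [[d_hid[j] * x[i] for i in range(2)] for j in range(2)]
    return gW1, gW2
```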



Backward Phase



Example with tanh(.) Activation Function

• Forward Propagation
• Hidden Units aj=
• Output Units ak=

• Backward Propagation
• Output Units δk:
• Hidden Units δj:
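The expressions elided above are presumably the standard ones for a tanh hidden layer (assuming linear output units and sum-of-squares error):

```latex
a_j = \sum_i w_{ji} x_i,\quad z_j = \tanh(a_j),\qquad
a_k = \sum_j w_{kj} z_j,\quad y_k = a_k
\\[4pt]
\delta_k = y_k - t_k,\qquad
\delta_j = \bigl(1 - z_j^{2}\bigr)\sum_k w_{kj}\,\delta_k
```

The factor 1 − z_j² is tanh′(a_j), which is why tanh keeps the backward pass cheap: the derivative reuses the forward activation.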



Deep Neural Networks

• Definition: Neural networks with many hidden layers

• Advantage: High expressivity

• Challenges:
• How do we train a deep neural network?
• How can we avoid overfitting?



Expressiveness

• Neural networks with one hidden layer of sigmoid/hyperbolic units can approximate arbitrarily closely neural networks with several layers of sigmoid/hyperbolic units.

• As we increase the number of layers, the number of units needed may decrease exponentially (with the number of layers).



Example: Parity Function

• Single layer of hidden nodes (2^(n−1) units)

[Figure: inputs x1, x2, x3, x4 each feed every one of the 2^(n−1) AND units, whose outputs feed a single OR unit]


Example: Parity Function

• 2n−2 layers of hidden nodes

[Figure: inputs x1, x2, x3, x4 combined through a chain of alternating AND and OR units (with bias inputs 1), giving 2n−2 hidden layers]



Vanishing Gradients

• Deep neural networks of sigmoid and hyperbolic units often suffer from vanishing gradients.

[Figure: gradient magnitude across layers; small gradient in the early layers, medium gradient in the middle, large gradient near the output]



Sigmoid & Hyperbolic Units

• The derivative is always less than 1 (at most 1/4 for the sigmoid, at most 1 for tanh)



Example

x → h1 → h2 → h3 → y  (weights w1, w2, w3, w4)

• Common weight initialization {-1, 1}
• The sigmoid function and its derivative are always less than 1
• This leads to vanishing gradients
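The shrinkage can be seen numerically: by the chain rule, the gradient through the chain is a product of per-layer factors w · σ′(a), and σ′(a) = σ(a)(1 − σ(a)) never exceeds 1/4. A sketch with illustrative weights:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def chain_gradient(weights, x):
    """Gradient dy/dx through a chain x -> h1 -> h2 -> ... -> y of sigmoid units.
    Each factor w * sigma'(a) has magnitude at most |w| / 4, so with
    weights of magnitude 1 the product shrinks geometrically with depth."""
    grad, h = 1.0, x
    for w in weights:
        a = w * h
        s = sigmoid(a)
        grad *= w * s * (1.0 - s)  # sigma'(a) = s * (1 - s) <= 1/4
        h = s
    return grad

g = chain_gradient([1.0, 1.0, 1.0, 1.0], 0.5)  # 4-layer chain: g is below (1/4)^4
```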



Avoiding Vanishing Gradients

• Several popular solutions

• Skip Connections
• Batch Normalization
• Rectified Linear Units (ReLU as activation function)



Rectified Linear Units

• Rectified Linear Unit: h(a) = max(0, a)

• Its gradient is either 0 or 1, never strictly in between, so active units pass gradients through undiminished

• Soft version, "softplus": h(a) = log(1 + e^a)

• Warning: softplus does not prevent gradient vanishing; its derivative is the sigmoid, which is always below 1
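The contrast is visible in the derivatives themselves (a sketch; note that the softplus derivative is exactly the sigmoid):

```python
import math

def relu_grad(a):
    """d/da of max(0, a): exactly 1 for positive inputs, 0 otherwise."""
    return 1.0 if a > 0 else 0.0

def softplus_grad(a):
    """d/da of log(1 + e^a) = sigmoid(a), always strictly less than 1."""
    return 1.0 / (1.0 + math.exp(-a))

# Through active ReLU units each gradient factor is exactly 1 and never decays;
# through softplus units each factor is below 1, so deep products can still vanish.
```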

