
AI-29: Deep Learning

Layers and Activation Functions

Prof. Dr. Florian Wahl


Faculty of Applied Computer Science
Deggendorf Institute of Technology

Summer Semester 2024

Learning Objectives

What Have We Learned So Far?


We have ...
understood the interaction of inputs, weights, and bias
implemented multiple perceptrons in one layer

Goals for Today


Extending from one layer to multiple layers
Introduction to different activation functions and their purposes
Practical implementation

Layers

So far, only one layer


For deep neural networks, we need two or more hidden layers
The benefit of hidden layers will become clear later
Hidden layers are all layers between the input and output layers
Hidden does not mean that their values are irrelevant
Values in hidden layers are relevant for tuning and debugging

Layers

Connecting Multiple Layers


For a fully connected (dense) layer:
Our first layer had 4 inputs, hence 4 outputs
Therefore, the perceptrons of the following hidden layer each need 4 inputs (the number of outputs of the input layer)

Layers

Connecting Multiple Layers


Dimensions of the matrices
Hidden Layer 1
▶ Input data: #Observations × #Inputs: 3 × 4
▶ Weights: #Inputs × #Perceptrons: 4 × 3
▶ Biases: #Perceptrons: 3
▶ Output: #Observations × #Perceptrons: 3 × 3
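As a rough sketch of these shapes in NumPy (the input values and variable names are illustrative, not taken from the slides), the forward pass of hidden layer 1 could look like this:

```python
import numpy as np

# 3 observations with 4 inputs each: shape (3, 4)
inputs = np.array([[ 1.0, 2.0, 3.0,  2.5],
                   [ 2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])

# 4 inputs x 3 perceptrons: shape (4, 3)
weights = 0.01 * np.random.randn(4, 3)

# one bias per perceptron: shape (3,)
biases = np.zeros(3)

# (3, 4) @ (4, 3) + (3,) -> output shape (3, 3)
layer1_output = inputs @ weights + biases
print(layer1_output.shape)  # (3, 3)
```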

Layers

Hidden Layer 1
Input data: #Observations × #Inputs: 3 × 4
Weights: #Inputs × #Perceptrons: 4 × 3
Biases: #Perceptrons: 3
Output: #Observations × #Perceptrons: 3 × 3

Hidden Layer 2
Input data: #Observations × #Inputs: 3 × 3
Weights: #Inputs × #Perceptrons: 3 × 3
Biases: #Perceptrons: 3
Output: #Observations × #Perceptrons: 3 × 3
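Continuing the sketch from above (values still illustrative), hidden layer 2 consumes the 3 outputs of hidden layer 1, so its weight matrix is 3 × 3:

```python
import numpy as np

inputs = np.random.randn(3, 4)            # 3 observations, 4 inputs

weights1 = 0.01 * np.random.randn(4, 3)   # hidden layer 1: 4 inputs -> 3 perceptrons
biases1 = np.zeros(3)

weights2 = 0.01 * np.random.randn(3, 3)   # hidden layer 2: 3 inputs -> 3 perceptrons
biases2 = np.zeros(3)

layer1_output = inputs @ weights1 + biases1          # shape (3, 3)
layer2_output = layer1_output @ weights2 + biases2   # shape (3, 3)
print(layer1_output.shape, layer2_output.shape)
```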
Layers

Connecting Multiple Layers


Number of inputs must be known for the first layer
Inputs of subsequent layers are defined by the respective previous layer
Output of one layer is input of the next
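One way to express "the output of one layer is the input of the next" in code is a small dense-layer class. This is a minimal sketch; the class and method names are my own and not prescribed by the slides:

```python
import numpy as np

class DenseLayer:
    """Fully connected layer: defined by its number of inputs and perceptrons."""

    def __init__(self, n_inputs, n_perceptrons):
        # small random weights, zero biases (see the initialization slides)
        self.weights = 0.01 * np.random.randn(n_inputs, n_perceptrons)
        self.biases = np.zeros(n_perceptrons)

    def forward(self, inputs):
        return inputs @ self.weights + self.biases

X = np.random.randn(3, 4)      # 3 observations, 4 features
hidden1 = DenseLayer(4, 3)     # first layer: number of inputs must be known
hidden2 = DenseLayer(3, 3)     # later layers: inputs = outputs of previous layer
out = hidden2.forward(hidden1.forward(X))
print(out.shape)               # (3, 3)
```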

Layers

Initialization of Biases
Bias shifts the point at which a neuron activates
Bias initialization ≠ 0 is useful if many inputs are 0
In our example, we initialize biases with 0

Layers

Initialization of Weights
For a neuron to “fire”, its activation function must produce a non-zero output
Each output of a layer is input to the next
If many weights are 0, the neuron likely won’t fire
This disrupts the training process
Results in a so-called dead network
Solution: Initialize weights with small random numbers
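A minimal sketch of this initialization scheme in NumPy (the scaling factor 0.01 is a common choice, not mandated by the slides):

```python
import numpy as np

n_inputs, n_perceptrons = 4, 3

# small random weights avoid a "dead network" of neurons that never fire
weights = 0.01 * np.random.randn(n_inputs, n_perceptrons)

# biases start at 0 here; a non-zero start can help if many inputs are 0
biases = np.zeros(n_perceptrons)
```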

Activation Functions

Step Function
[Plot: step activation function on x ∈ [-10, 10]]

y = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}

Simplest activation function
Neuron “fires” if threshold is reached
Formerly typical in hidden layers
Today, hardly used in practice
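A minimal NumPy sketch of the step function (illustrative, not library code):

```python
import numpy as np

def step(x):
    # 1 where x > 0, else 0
    return np.where(x > 0, 1.0, 0.0)

print(step(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 1.]
```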

Activation Functions

Linear Function

[Plot: linear activation function y = x on x ∈ [-10, 10]]

y = x

Output equals input
Typical in the output layer for regression problems

Activation Functions

Sigmoid Function
[Plot: sigmoid activation function on x ∈ [-10, 10]]

y = \frac{1}{1 + e^{-x}}

Output between 0 and 1
Like the step function, but with more detail
Formerly often used in hidden layers
Typical in output layers of binary classification problems
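A minimal NumPy sketch of the sigmoid function:

```python
import numpy as np

def sigmoid(x):
    # squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approx [0.0000454, 0.5, 0.9999546]
```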

Activation Functions

Hyperbolic Tangent (Tanh)


[Plot: tanh activation function on x ∈ [-10, 10]]

y = \tanh(x)

Output between -1 and 1
Like sigmoid, but with a steeper gradient [1]
Typical in output layers of binary classification problems

[1] http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
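NumPy already ships a tanh implementation, so a sketch only needs to call it:

```python
import numpy as np

# output lies in (-1, 1)
x = np.array([-10.0, 0.0, 10.0])
print(np.tanh(x))  # approx [-1.  0.  1.]
```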

Activation Functions

Rectified Linear Unit (ReLU)


[Plot: ReLU activation function on x ∈ [-10, 10]]

y = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}

Very easy to compute
Linear on the positive half
But nonlinear due to the bend at 0
Standard for hidden layers
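A minimal NumPy sketch of ReLU:

```python
import numpy as np

def relu(x):
    # identity for positive inputs, 0 otherwise
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]
```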

Activation Functions

Motivation
Neural networks typically aim to map nonlinear functions
Nonlinear means these cannot be well approximated by a straight line
To map nonlinearities well, nonlinear activation functions are needed
ReLU is suitable as a nonlinear activation function [1]

[1] Video demo at https://nnfs.io/mvp

Activation Functions
Softmax
S_{i,j} = \frac{e^{z_{i,j}}}{\sum_{l=1}^{L} e^{z_{i,l}}}

[Plot: exponential function e^x on x ∈ [-10, 10]]

z is the value of the output at i, j: i represents the current observation, and j represents the output for the current observation.
Normalizes the results
Makes results comparable to each other
Ensures results are positive
Used in the output layer of multi-class classification problems
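A sketch of softmax for a batch of outputs; subtracting the row-wise maximum is a common numerical-stability trick (not shown on the slide) and does not change the result, because it cancels in the fraction:

```python
import numpy as np

def softmax(z):
    # z: (#observations, #outputs); subtract the row max to avoid overflow in exp
    exp_z = np.exp(z - z.max(axis=1, keepdims=True))
    return exp_z / exp_z.sum(axis=1, keepdims=True)

z = np.array([[1.0, 2.0, 3.0],
              [2.0, 2.0, 2.0]])
print(softmax(z))               # rows are positive and sum to 1
print(softmax(z).sum(axis=1))   # [1. 1.]
```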
