
Widely Used Activation Functions in Neurons

Moin Mostakim

Department of Computer Science and Engineering


Faculty of School of Data Science

October 2023



Contents

1 Sigmoid Activation Function

2 Hyperbolic Tangent (Tanh) Activation Function

3 Rectified Linear Unit (ReLU) Activation Function

4 Leaky Rectified Linear Unit (Leaky ReLU) Activation Function

5 Exponential Linear Unit (ELU) Activation Function

6 Swish Activation Function

7 Gated Linear Unit (GLU) Activation Function

8 Softmax Activation Function


Sigmoid Activation Function

Formula: σ(x) = 1 / (1 + e^(−x))
Range: (0, 1)
First-order Derivative: σ′(x) = σ(x) · (1 − σ(x))

[Figure: plot of σ(x) for x ∈ [−5, 5]]

Output:
• Shape: S-shaped curve.
• Use Cases: Binary classification, sigmoid neurons in the output layer.
• Benefits: Smooth gradient, suitable for converting network outputs to probabilities.
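
Not part of the original slides: a minimal NumPy sketch of the sigmoid and its derivative, to make the formulas above concrete. The function names and sample inputs are illustrative choices.

    import numpy as np

    def sigmoid(x):
        # sigma(x) = 1 / (1 + e^(-x))
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_derivative(x):
        # sigma'(x) = sigma(x) * (1 - sigma(x))
        s = sigmoid(x)
        return s * (1.0 - s)

    x = np.array([-5.0, 0.0, 5.0])
    print(sigmoid(x))             # approx [0.0067, 0.5, 0.9933]
    print(sigmoid_derivative(x))  # peaks at 0.25 at x = 0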



Hyperbolic Tangent (Tanh) Activation Function

Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Range: (−1, 1)
First-order Derivative: tanh′(x) = 1 − tanh²(x)

[Figure: plot of tanh(x) for x ∈ [−2, 2]]

Output:
• Shape: S-shaped curve similar to sigmoid.
• Use Cases: Regression, classification.
• Benefits: Centered around zero, mitigates the vanishing gradient problem, and provides smooth gradients.
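
Not part of the original slides: a minimal NumPy sketch of tanh and its derivative, mirroring the formulas above. np.tanh is used instead of the raw exponential ratio because it is the numerically stable built-in.

    import numpy as np

    def tanh(x):
        # (e^x - e^(-x)) / (e^x + e^(-x)); np.tanh computes the same quantity stably
        return np.tanh(x)

    def tanh_derivative(x):
        # tanh'(x) = 1 - tanh(x)^2
        return 1.0 - np.tanh(x) ** 2

    x = np.array([-2.0, 0.0, 2.0])
    print(tanh(x))             # approx [-0.964, 0.0, 0.964]
    print(tanh_derivative(x))  # approx [0.071, 1.0, 0.071]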



Rectified Linear Unit (ReLU) Activation Function

Formula: ReLU(x) = max(0, x)


Range: [0, ∞)
First-order Derivative:
ReLU′(x) = 0 if x < 0, 1 if x ≥ 0

Output:
• Shape: Linear for positive values, zero for negatives.
• Use Cases: Hidden layers in most neural networks.
• Benefits: Efficient, mitigates vanishing gradient, induces sparsity.
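
Not part of the original slides: a minimal NumPy sketch of ReLU and its derivative. Following the slide's convention, the derivative at x = 0 is taken to be 1.

    import numpy as np

    def relu(x):
        # max(0, x), applied elementwise
        return np.maximum(0.0, x)

    def relu_derivative(x):
        # 0 for x < 0, 1 for x >= 0 (convention at x = 0 as on the slide)
        return np.where(x >= 0, 1.0, 0.0)

    x = np.array([-3.0, 0.0, 3.0])
    print(relu(x))             # [0. 0. 3.]
    print(relu_derivative(x))  # [0. 1. 1.]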



Leaky Rectified Linear Unit (Leaky ReLU) Activation
Function
Formula: LeakyReLU(x, α) = x if x ≥ 0, αx if x < 0
Range: (−∞, ∞)

First-order Derivative:
LeakyReLU′(x, α) = 1 if x ≥ 0, α if x < 0

Output:
• Shape: Linear for positive values, non-zero slope for negatives.
• Use Cases: Alternative to ReLU to prevent the "dying ReLU" problem.
• Benefits: Addresses the "dying ReLU" issue, retains sparsity.
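
Not part of the original slides: a minimal NumPy sketch of Leaky ReLU and its derivative. The slides leave α unspecified; α = 0.01 below is a commonly used default, chosen only for illustration.

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # x for x >= 0, alpha * x for x < 0
        return np.where(x >= 0, x, alpha * x)

    def leaky_relu_derivative(x, alpha=0.01):
        # 1 for x >= 0, alpha for x < 0
        return np.where(x >= 0, 1.0, alpha)

    x = np.array([-3.0, 0.0, 3.0])
    print(leaky_relu(x))             # [-0.03  0.    3.  ]
    print(leaky_relu_derivative(x))  # [0.01 1.   1.  ]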



Exponential Linear Unit (ELU) Activation Function

Formula: ELU(x, α) = x if x ≥ 0, α(e^x − 1) if x < 0
Range: (−α, ∞)

First-order Derivative:
ELU′(x, α) = 1 if x ≥ 0, αe^x if x < 0

Output:
• Shape: Linear for positive values, with a smooth exponential curve that saturates at −α for negative values.
• Use Cases: An alternative to ReLU with smoother gradients.
• Benefits: Smoother gradients, better training on negative values.
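
Not part of the original slides: a minimal NumPy sketch of ELU and its derivative. The slides leave α unspecified; α = 1.0 below is the common default, chosen only for illustration.

    import numpy as np

    def elu(x, alpha=1.0):
        # x for x >= 0, alpha * (e^x - 1) for x < 0
        return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

    def elu_derivative(x, alpha=1.0):
        # 1 for x >= 0, alpha * e^x for x < 0
        return np.where(x >= 0, 1.0, alpha * np.exp(x))

    x = np.array([-3.0, 0.0, 3.0])
    print(elu(x))             # approx [-0.950, 0.0, 3.0]
    print(elu_derivative(x))  # approx [0.0498, 1.0, 1.0]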



Swish Activation Function

Formula: Swish(x) = x · σ(x)


Range: approximately [−0.28, ∞) (Swish has a global minimum of about −0.28 near x ≈ −1.28)
First-order Derivative: Swish′ (x) = Swish(x) + σ(x) · (1 − Swish(x))
Output:
• Shape: Smooth, non-monotonic curve.
• Use Cases: Considered in some architectures as an alternative to
ReLU.
• Benefits: Smoothness, performance improvements observed in
deep networks.
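
Not part of the original slides: a minimal NumPy sketch of Swish and the derivative identity quoted above, Swish′(x) = Swish(x) + σ(x) · (1 − Swish(x)).

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def swish(x):
        # Swish(x) = x * sigma(x)
        return x * sigmoid(x)

    def swish_derivative(x):
        # Swish'(x) = Swish(x) + sigma(x) * (1 - Swish(x))
        s = sigmoid(x)
        return swish(x) + s * (1.0 - swish(x))

    x = np.array([-2.0, 0.0, 2.0])
    print(swish(x))             # approx [-0.238, 0.0, 1.762]
    print(swish_derivative(x))  # approx [-0.091, 0.5, 1.091]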



Gated Linear Unit (GLU) Activation Function

Formula: GLU(x) = x · σ(g(x))


Range: (-∞, ∞)
First-order Derivative:
GLU′(x) = σ(g(x)) + x · g′(x) · σ(g(x)) · (1 − σ(g(x)))
Output:
• Shape: Complex, involving a sigmoid gate.
• Use Cases: Used in architectures like the Transformer and other
sequence-to-sequence models.
• Benefits: The sigmoid gate controls information flow, enabling models that capture dependencies in sequences better than standard RNNs.
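
Not part of the original slides: a minimal NumPy sketch of the gated form x · σ(g(x)). The gate function g is left unspecified on the slide, so the linear map below is purely hypothetical; in practice (e.g., gated convolutional or Transformer-style layers) g is a learned projection of the input.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def glu(x, g):
        # GLU(x) = x * sigma(g(x)); g is any callable standing in for a learned transform
        return x * sigmoid(g(x))

    # Hypothetical gate: g(x) = 2x + 1
    g = lambda x: 2.0 * x + 1.0
    x = np.array([-1.0, 0.0, 1.0])
    print(glu(x, g))  # approx [-0.269, 0.0, 0.953]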



Softmax Activation Function

Formula (for class i): Softmax(x)_i = e^(x_i) / Σ_j e^(x_j)

Range: (0, 1)
First-order Derivative:
∂Softmax(x)_i / ∂x_j = Softmax(x)_i · (δ_ij − Softmax(x)_j)
Output:
• Shape: Probability distribution over classes.
• Use Cases: Used in the output layer of multi-class classification
for probability distribution over classes.
• Benefits: Converts scores to class probabilities, essential for
classification tasks.
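
Not part of the original slides: a minimal NumPy sketch of softmax and its Jacobian, following the formulas above. Subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result.

    import numpy as np

    def softmax(x):
        # e^(x_i) / sum_j e^(x_j), shifted by max(x) for numerical stability
        z = x - np.max(x)
        e = np.exp(z)
        return e / np.sum(e)

    def softmax_jacobian(x):
        # J[i, j] = softmax(x)_i * (delta_ij - softmax(x)_j)
        s = softmax(x)
        return np.diag(s) - np.outer(s, s)

    x = np.array([1.0, 2.0, 3.0])
    print(softmax(x))           # approx [0.090, 0.245, 0.665]
    print(softmax_jacobian(x))  # each row and column sums to 0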

