
ACTIVATION FUNCTIONS

WHICH ONE TO USE

ReLU

ReLU(x): 0 when x <= 0; x when x > 0

Derivative: 0 when x < 0; 1 when x > 0 (not defined at x = 0)

PROs
• Mitigates the vanishing gradient problem
• Computationally efficient
• Faster convergence
• Default choice for hidden layers

CONs
• Neurons with negative inputs can die (dying ReLU)
• Sensitive to weight initialization
• Not differentiable at 0
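
A minimal NumPy sketch of the definition and derivative above (function names are illustrative, not from the original post):

import numpy as np

def relu(x):
    # 0 when x <= 0, x when x > 0
    return np.maximum(0.0, x)

def relu_derivative(x):
    # 0 when x < 0, 1 when x > 0; the value at x = 0 is a convention (0 here)
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))             # [0. 0. 3.]
print(relu_derivative(x))  # [0. 0. 1.]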

Tanh

Range of Tanh : (-1, 1)

Range of Deriv. Tanh: (0,1]

PROs
• Zero-centered range (better for optimization)
• Smooth gradient

CONs
• Vanishing gradient problem
• Computationally expensive
• Not suitable for deeper networks
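
A minimal NumPy sketch of Tanh and its derivative, matching the ranges above (names are illustrative):

import numpy as np

def tanh(x):
    return np.tanh(x)                # outputs in (-1, 1)

def tanh_derivative(x):
    return 1.0 - np.tanh(x) ** 2     # outputs in (0, 1], maximum 1 at x = 0

x = np.array([-3.0, 0.0, 3.0])
print(tanh(x))             # roughly [-0.995  0.     0.995]
print(tanh_derivative(x))  # roughly [0.0099 1.     0.0099]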

Sigmoid

Range of Sigmoid : (0, 1)

Range of Deriv. Sigmoid : (0,0.25]

PROs
• Output suitable for binary classification
• Used for multi-label classification (one sigmoid per label)
• Smooth gradient

CONs
• Vanishing gradient problem
• Computationally expensive
• Compresses large inputs into a narrow output range (saturation)
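
A minimal NumPy sketch of Sigmoid and its derivative, matching the ranges above (names are illustrative):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # outputs in (0, 1)

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)             # outputs in (0, 0.25], maximum 0.25 at x = 0

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))             # roughly [0.0067 0.5    0.9933]
print(sigmoid_derivative(x))  # roughly [0.0066 0.25   0.0066]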

Softmax

Range : (0,1)

PROs
• Outputs interpretable as class likelihoods
• Works well with categorical cross-entropy (CCE) loss
• Standard choice for multi-class classification

CONs
• Doesn't work for multi-label classification
• Vulnerable to imbalanced datasets
• Unstable for large input values (overflow errors)
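
A minimal NumPy sketch of a numerically stable softmax; shifting by the maximum logit is a common workaround for the overflow issue noted above (names are illustrative):

import numpy as np

def softmax(logits):
    # Shifting by the max logit leaves the result unchanged but
    # prevents overflow in exp() for large inputs
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)       # outputs in (0, 1) and sum to 1

logits = np.array([1000.0, 1001.0, 1002.0])  # naive exp() would overflow here
print(softmax(logits))  # roughly [0.09  0.245 0.665]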

GeLU

Range similar to ReLU

PROs
• Smooth gradient
• Dynamic gating makes the network adaptable
• Used in SoTA transformer models (GPT, BERT, SAM)

CONs
• Computationally more expensive than ReLU
• Reduced interpretability
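
A minimal NumPy sketch using the common tanh approximation of GELU (names are illustrative; the exact form uses the standard normal CDF):

import numpy as np

def gelu(x):
    # Tanh approximation of GELU(x) = x * Phi(x), as used in many
    # transformer implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x ** 3)))

x = np.array([-2.0, 0.0, 2.0])
print(gelu(x))  # roughly [-0.0454  0.      1.9546]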

Rules of thumb

• Start with ReLU for hidden layers, then try GeLU and Tanh
• For binary classification, use Sigmoid for the output layer
• For multi-class classification, use Softmax for the output layer
• For transformer-based models, start with GeLU
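
One possible way to apply these rules, as a hedged Keras sketch (layer sizes and class counts are arbitrary placeholders; the "gelu" activation string assumes a recent TensorFlow version):

import tensorflow as tf

# Hidden layers: ReLU as the default starting point (swap in "gelu" or "tanh" to compare)
# Output layer: sigmoid for binary, softmax for multi-class
binary_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),     # binary classification
])

multiclass_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="gelu"),       # transformer-style choice
    tf.keras.layers.Dense(10, activation="softmax"),    # e.g. 10 classes
])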
