CS 229 - Deep Learning Cheatsheet

By Afshine Amidi (https://twitter.com/afshinea) and Shervine Amidi (https://twitter.com/shervinea)
Neural Networks
Neural networks are a class of models that are built with layers. Commonly used types of neural
networks include convolutional and recurrent neural networks.

❐ Architecture ― The vocabulary around neural network architectures is described in the figure below:

By noting i the i-th layer of the network and j the j-th hidden unit of the layer, we have:


z_j^{[i]} = (w_j^{[i]})^T x + b_j^{[i]}

where we note w, b, z the weight, bias and output respectively.
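For concreteness, here is a minimal NumPy sketch of this computation for a single layer (the layer sizes and random values below are illustrative assumptions, not taken from the cheatsheet):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)          # input vector with 3 features (illustrative)
    W = rng.normal(size=(4, 3))     # weights of a layer with 4 hidden units
    b = np.zeros(4)                 # biases

    z = W @ x + b                   # z_j = w_j^T x + b_j for every hidden unit j
    print(z.shape)                  # (4,)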

❐ Activation function ― Activation functions are used at the end of a hidden unit to introduce
non-linear complexities to the model. Here are the most common ones:

Sigmoid:    g(z) = 1 / (1 + e^(−z))
Tanh:       g(z) = (e^z − e^(−z)) / (e^z + e^(−z))
ReLU:       g(z) = max(0, z)
Leaky ReLU: g(z) = max(ϵz, z), with ϵ ≪ 1
❐ Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z, y) is
commonly used and is defined as follows:

L(z, y) = −[y log(z) + (1 − y) log(1 − z)]
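As a sketch, with z the predicted probability and y ∈ {0, 1} the label (the clipping is an added numerical-stability assumption, not part of the formula):

    import numpy as np

    def cross_entropy(z, y, tiny=1e-12):
        z = np.clip(z, tiny, 1.0 - tiny)               # avoid log(0); stability tweak only
        return -(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

    print(cross_entropy(0.9, 1))                       # small loss for a confident correct prediction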

❐ Learning rate ― The learning rate, often noted α or sometimes η, indicates the pace at which the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.
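As an illustration, a rough sketch of one Adam step on a weight vector; the hyperparameter values below are the commonly cited defaults, assumed here rather than taken from the cheatsheet:

    import numpy as np

    def adam_step(w, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Exponentially decaying averages of the gradient and of its square
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias correction, then a step whose effective size adapts per weight
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v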

❐ Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to a weight w is computed using the chain rule and is of the following form:

∂L(z, y)/∂w = (∂L(z, y)/∂a) × (∂a/∂z) × (∂z/∂w)


As a result, the weight is updated as follows:

w ⟵ w − α ∂L(z, y)/∂w
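A minimal sketch of this chain rule for a single sigmoid unit trained with the cross-entropy loss above (the data and learning rate are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=3), 1.0        # one training example (illustrative)
    w, b = np.zeros(3), 0.0
    alpha = 0.1                           # learning rate

    z = w @ x + b                         # pre-activation
    a = 1.0 / (1.0 + np.exp(-z))          # sigmoid activation (the prediction)

    dL_da = -(y / a) + (1 - y) / (1 - a)  # derivative of the cross-entropy loss w.r.t. a
    da_dz = a * (1 - a)                   # derivative of the sigmoid
    dz_dw = x                             # derivative of w^T x + b w.r.t. w
    dL_dw = dL_da * da_dz * dz_dw         # chain rule, as in the formula above

    w = w - alpha * dL_dw                 # gradient-descent weight update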

❐ Updating weights ― In a neural network, weights are updated as follows:

Step 1: Take a batch of training data.
Step 2: Perform forward propagation to obtain the corresponding loss.
Step 3: Backpropagate the loss to get the gradients.
Step 4: Use the gradients to update the weights of the network.
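A toy sketch of these four steps for a single sigmoid unit on synthetic data (batch size, number of epochs and learning rate are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                             # synthetic inputs
    y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)    # synthetic labels
    w, b, alpha = np.zeros(3), 0.0, 0.1

    for epoch in range(20):
        for i in range(0, len(X), 10):
            xb, yb = X[i:i+10], y[i:i+10]              # step 1: take a batch
            a = 1 / (1 + np.exp(-(xb @ w + b)))        # step 2: forward propagation
            grad_z = a - yb                            # step 3: backpropagation (dL/dz for sigmoid + cross-entropy)
            grad_w = xb.T @ grad_z / len(xb)
            grad_b = grad_z.mean()
            w -= alpha * grad_w                        # step 4: update the weights
            b -= alpha * grad_b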

❐ Dropout ― Dropout is a technique meant to prevent overfitting the training data by dropping
out units in a neural network. In practice, neurons are either dropped with probability p or kept with
probability 1 − p.
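A sketch of dropout at training time; the rescaling by 1/(1 − p) is a common "inverted dropout" implementation convention, assumed here rather than stated in the cheatsheet:

    import numpy as np

    def dropout(a, p, rng=None):
        """Drop each unit of `a` with probability p; keep it with probability 1 - p."""
        rng = rng if rng is not None else np.random.default_rng()
        mask = rng.random(a.shape) >= p       # True (kept) with probability 1 - p
        return a * mask / (1.0 - p)           # rescale so the expected activation is unchanged

    a = np.ones((2, 4))                       # a batch of activations (illustrative)
    print(dropout(a, p=0.5))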

Convolutional Neural Networks
❐ Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding and S the stride, the number of neurons N that fit in a given volume is such that:

N = (W − F + 2P)/S + 1
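As a worked example (the sizes below are arbitrary and assume W − F + 2P divides evenly by S):

    def conv_output_size(W, F, P, S=1):
        """Number of neurons that fit along one spatial dimension."""
        return (W - F + 2 * P) // S + 1

    print(conv_output_size(W=32, F=5, P=2, S=1))   # 32: a 5x5 filter with padding 2 preserves the size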

❐ Batch normalization ― It is a step of hyperparameters γ, β that normalizes the batch {x_i}. By noting μ_B, σ_B² the mean and variance of the batch that we want to correct, it is done as follows:

x_i ⟵ γ (x_i − μ_B)/√(σ_B² + ϵ) + β


It is usually applied after a fully connected/convolutional layer and before a non-linearity, and aims at allowing higher learning rates and reducing the strong dependence on initialization.
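A sketch of this normalization over a batch of activations; γ and β would normally be learned, and ϵ is a small constant assumed for numerical stability:

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        mu = x.mean(axis=0)                     # batch mean  mu_B
        var = x.var(axis=0)                     # batch variance  sigma_B^2
        x_hat = (x - mu) / np.sqrt(var + eps)   # normalize
        return gamma * x_hat + beta             # scale and shift

    x = np.random.default_rng(0).normal(size=(8, 4))          # batch of 8 examples, 4 features
    out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))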

Recurrent Neural Networks
❐ Types of gates ― Here are the different types of gates that we encounter in a typical recurrent
neural network:

Input gate:  Write to cell or not?
Forget gate: Erase a cell or not?
Gate:        How much to write to cell?
Output gate: How much to reveal cell?

❐ LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the
vanishing gradient problem by adding 'forget' gates.
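A rough NumPy sketch of a single LSTM step, showing how the four gates above combine; the parameter shapes and initialization are illustrative assumptions, and a real network would learn W and b:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def lstm_step(x, h, c, W, b):
        """One LSTM time step; W maps the concatenation [h; x] to the four gate pre-activations."""
        H = len(h)
        z = W @ np.concatenate([h, x]) + b
        i = sigmoid(z[0*H:1*H])    # input gate: write to cell or not?
        f = sigmoid(z[1*H:2*H])    # forget gate: erase a cell or not?
        g = np.tanh(z[2*H:3*H])    # gate: how much to write to cell?
        o = sigmoid(z[3*H:4*H])    # output gate: how much to reveal cell?
        c = f * c + i * g          # the additive cell path is what mitigates vanishing gradients
        h = o * np.tanh(c)
        return h, c

    H, D = 4, 3                                              # hidden and input sizes (illustrative)
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(4 * H, H + D)) * 0.1, np.zeros(4 * H)
    h, c, x = np.zeros(H), np.zeros(H), rng.normal(size=D)
    h, c = lstm_step(x, h, c, W, b)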

For a more detailed overview of the concepts above, check out the Deep Learning cheatsheets
(teaching/cs-230)!

Reinforcement Learning and Control
The goal of reinforcement learning is for an agent to learn how to evolve in an environment.

Definitions
❐ Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S, A, {P_sa}, γ, R) where:

S is the set of states
A is the set of actions
{P_sa} are the state transition probabilities for s ∈ S and a ∈ A


γ ∈ [0, 1[ is the discount factor
R : S × A ⟶ ℝ or R : S ⟶ ℝ is the reward function that the algorithm wants to maximize

❐ Policy ― A policy π is a function π : S ⟶ A that maps states to actions.

Remark: we say that we execute a given policy π if given a state s we take the action a = π(s).

❐ Value function ― For a given policy π and a given state s, we define the value function V^π as follows:

V^π(s) = E[R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, π]


❐ Bellman equation ― The optimal Bellman equation characterizes the value function V^{π*} of the optimal policy π*:

V^{π*}(s) = R(s) + max_{a∈A} γ Σ_{s′∈S} P_sa(s′) V^{π*}(s′)

Remark: we note that the optimal policy π* for a given state s is such that:

π*(s) = argmax_{a∈A} Σ_{s′∈S} P_sa(s′) V*(s′)

❐ Value iteration algorithm ― The value iteration algorithm proceeds in two steps:

1) We initialize the value:

V_0(s) = 0

2) We iterate the value based on the previous values:


V_{i+1}(s) = R(s) + max_{a∈A} [ γ Σ_{s′∈S} P_sa(s′) V_i(s′) ]
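A small sketch of these two steps on a toy MDP; the states, transition probabilities and rewards below are made up for illustration:

    import numpy as np

    n_states, n_actions, gamma = 3, 2, 0.9
    rng = np.random.default_rng(0)
    P = rng.random((n_states, n_actions, n_states))    # P[s, a, s'] = P_sa(s')
    P /= P.sum(axis=2, keepdims=True)                   # normalize to probability distributions
    R = np.array([0.0, 0.0, 1.0])                       # reward R(s)

    V = np.zeros(n_states)                              # step 1: V_0(s) = 0
    for _ in range(100):                                # step 2: iterate
        V = R + gamma * np.max(P @ V, axis=1)           # V_{i+1}(s) = R(s) + max_a gamma * sum_s' P_sa(s') V_i(s')

    policy = np.argmax(P @ V, axis=1)                   # greedy policy extracted from the converged values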

❐ Maximum likelihood estimate ― The maximum likelihood estimates for the state transition
probabilities are as follows:

P_sa(s′) = (# times took action a in state s and got to s′) / (# times took action a in state s)
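As a counting sketch; the list of observed (s, a, s′) transitions below is made up:

    import numpy as np

    n_states, n_actions = 3, 2
    transitions = [(0, 1, 2), (0, 1, 2), (0, 1, 0), (2, 0, 1)]   # observed (s, a, s') triples

    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1

    totals = counts.sum(axis=2, keepdims=True)
    P_hat = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    print(P_hat[0, 1])   # [1/3, 0, 2/3]: estimated P_sa(s') for s = 0, a = 1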

❐ Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:

Q(s, a) ⟵ Q(s, a) + α[R(s, a, s′) + γ max_{a′} Q(s′, a′) − Q(s, a)]
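A sketch of the tabular update; the learning rate and the single observed transition below are illustrative:

    import numpy as np

    n_states, n_actions = 3, 2
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.9

    def q_update(Q, s, a, r, s_next):
        """Apply one Q-learning update for the observed transition (s, a, r, s')."""
        target = r + gamma * np.max(Q[s_next])        # R(s, a, s') + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (target - Q[s, a])
        return Q

    Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)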

For a more detailed overview of the concepts above, check out the States-based Models
cheatsheets (teaching/cs-221/cheatsheet-states-models)!

