Raffaello Baluyot
AI, ML and DL
https://www.limitlessmobil.com/machine-learning/how-is-artificial-intelligence-different-from-machine-learning-and-deep-learning/
AI, ML and DL
Source: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Artificial Intelligence
Source: https://en.wikipedia.org/wiki/Tic-tac-toe
Artificial Intelligence
Source: https://www.microsoft.com/en-us/p/international-chess-online/
Machine Learning
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
Machine Learning
https://playground.tensorflow.org
Deep Learning as They Define It
Large Neural Networks
Series of Data Representations
Vaguely Inspired by the Brain
Deep Learning as I Define It
A composition of multiple parameterized functions optimized for an objective
Parameterized Function: Layers
Composition: Connection
Objective: Loss
Optimization: Gradient Descent
Learning Example
Data
Objective
Layer
Connection
Optimization
Data
A    B    Z
1    1    3
4    0   20
2    0   10
3    3    9
1    3   -1
4    2   16
0    2   -4
1    4   -3
Objective
$P = Z$
$P - Z = 0$
$|P - Z| = 0$
$\min |P - Z|$
where P is the deep learning prediction
Layer
$l_1 = w_x x + w_y y + b$
where:
l1 is the layer result
wx is the parameter for x
wy is the parameter for y
b is the bias parameter
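A one-line Python sketch of this layer (the function name `layer` is mine, for illustration):

def layer(x, y, wx, wy, b):
    return wx * x + wy * y + b    # l1: weighted sum of the inputs plus bias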
Connection
$P = l_1$
where:
P is the deep learning prediction
l1 is the layer result
Optimization
http://fa.bianp.net/blog/2016/hyperparameter-optimization-with-approximate-gradient/
Optimization
$O = w_x x + w_y y + b - Z$
$(w_x, w_y, b) = \operatorname{argmin} |O|$
where O is the objective (the signed prediction error)
Optimization
$\dfrac{\partial |O|}{\partial w_x} = x \cdot \operatorname{sign}(O)$
$\dfrac{\partial |O|}{\partial w_y} = y \cdot \operatorname{sign}(O)$
$\dfrac{\partial |O|}{\partial b} = \operatorname{sign}(O)$
Because we minimize the absolute value of the error, each gradient carries the sign of the error.
Optimization
$w_x^{\text{new}} = w_x^{\text{old}} - \dfrac{\partial |O|}{\partial w_x}$
$w_y^{\text{new}} = w_y^{\text{old}} - \dfrac{\partial |O|}{\partial w_y}$
$b^{\text{new}} = b^{\text{old}} - \dfrac{\partial |O|}{\partial b}$
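As a worked step (my illustration, using the first data row $x = 1$, $y = 1$, $Z = 3$ with parameters initialized to zero):
$P = 0 \cdot 1 + 0 \cdot 1 + 0 = 0$, so $O = P - Z = -3$ and $\operatorname{sign}(O) = -1$
$w_x \leftarrow 0 - 1 \cdot (-1) = 1 \qquad w_y \leftarrow 0 - 1 \cdot (-1) = 1 \qquad b \leftarrow 0 - (-1) = 1$
The updated parameters give $P = 1 \cdot 1 + 1 \cdot 1 + 1 = 3 = Z$, fitting this row exactly.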
Optimization
(Figure: two cases along a curve - an increase in the parameter decreases the objective, or an increase in the parameter increases the objective.)
https://commons.wikimedia.org/wiki/File:Simple_sine_wave.svg
Learning
from numpy import sign

def perform_gradient_descent(x, y, z, wx, wy, b):
    print('\n===== Gradient Step =====')
    print('Pre-update Parameters wx:{} wy:{} b:{}'.format(wx, wy, b))
    p = wx * x + wy * y + b      # prediction P
    wxg = sign(p - z) * x        # d|O|/dwx
    wyg = sign(p - z) * y        # d|O|/dwy
    bg = sign(p - z)             # d|O|/db
    wx -= wxg
    wy -= wyg
    b -= bg
    return wx, wy, b             # return the updated parameters
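A usage sketch (the loop and epoch count are mine; it relies on the return statement added above). The data table is consistent with $Z = 5x - 2y$, so $w_x = 5$, $w_y = -2$, $b = 0$ fits every row; note that with a fixed unit step and no learning rate, the parameters can keep hopping around that solution rather than settling.

# The data table, as (x, y, z) triples (column A is x, column B is y).
data = [(1, 1, 3), (4, 0, 20), (2, 0, 10), (3, 3, 9),
        (1, 3, -1), (4, 2, 16), (0, 2, -4), (1, 4, -3)]

wx, wy, b = 0, 0, 0
for epoch in range(5):                     # a few passes over the data
    for x, y, z in data:
        wx, wy, b = perform_gradient_descent(x, y, z, wx, wy, b)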
https://corochann.com/mnist-training-with-multi-layer-perceptron-1149.html
Dense Layer
Feature Transformation to N features
Linear Transformation
Dense Layer
Assume we transform 3 features into 5 features
$y_1 = w_{11} x_1 + w_{12} x_2 + w_{13} x_3 + b_1$
$y_2 = w_{21} x_1 + w_{22} x_2 + w_{23} x_3 + b_2$
$y_3 = w_{31} x_1 + w_{32} x_2 + w_{33} x_3 + b_3$
$y_4 = w_{41} x_1 + w_{42} x_2 + w_{43} x_3 + b_4$
$y_5 = w_{51} x_1 + w_{52} x_2 + w_{53} x_3 + b_5$
Dense Layer
Assume we transform 3 features into 5 features
$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \\ w_{51} & w_{52} & w_{53} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{bmatrix}$
$y = Wx + b$
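A minimal NumPy sketch of the same 3-to-5 transformation (random initial weights are my illustrative assumption):

import numpy as np

x = np.array([1.0, 2.0, 3.0])      # 3 input features
W = np.random.randn(5, 3)          # weights: one row per output feature
b = np.random.randn(5)             # bias vector
y = W @ x + b                      # y = Wx + b
print(y.shape)                     # (5,)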
Activation Function
https://medium.com/@krishnakalyan3/introduction-to-exponential-linear-unit-d3e2904b366c
Activation Function
Inspired by biological neuron activation
A non-linear activation function enables approximation of a larger set of functions
The ReLU family of functions is popular (sketched below)
Usually applied after each layer
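A minimal sketch of ReLU, the simplest member of that family:

import numpy as np

def relu(v):
    return np.maximum(v, 0)                       # keep positives, zero out negatives

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))     # -> 0, 0, 0, 1.5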
Convolution
https://blog.saush.com/2011/04/20/edge-detection-with-the-sobel-operator-in-ruby/
Convolution
Fundamental Image Operation
Applies a Kernel or Filter to an image
Different kernels produce different results
Convolution
http://graphics.stanford.edu/courses/cs148-10-summer/docs/04_imgproc.pdf
Convolution
$0 \cdot 0 + (-1) \cdot 0 + 0 \cdot 0 + (-1) \cdot 0 + 5 \cdot 2 + (-1) \cdot 3 + 0 \cdot 0 + (-1) \cdot 0 + 0 \cdot 5 = 7$
$0 \cdot 2 + (-1) \cdot 3 + 0 \cdot 1 + (-1) \cdot 0 + 5 \cdot 5 + (-1) \cdot 1 + 0 \cdot 1 + (-1) \cdot 0 + 0 \cdot 8 = 21$
http://graphics.stanford.edu/courses/cs148-10-summer/docs/04_imgproc.pdf
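A minimal NumPy sketch of the operation (the direct loop is for clarity; with this symmetric kernel, flipping it, as strict convolution requires, changes nothing):

import numpy as np

kernel = np.array([[0, -1,  0],
                   [-1,  5, -1],
                   [0, -1,  0]])                 # the sharpening kernel above

def convolve2d_valid(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # multiply the kernel with the patch element-wise, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

patch = np.array([[2, 3, 1],
                  [0, 5, 1],
                  [1, 0, 8]])                    # the patch from the second sum above
print(convolve2d_valid(patch, kernel))           # [[21.]]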
Convolutional Layer
Transformations using Convolutions
Kernel values are learned
Extracts image features based on the objective
Convolutional Layer
Image Processing
Signal Processing
Text Analysis
Sequences
https://coinmarketcap.com/currencies/bitcoin/
Sequences
How did you spend your week?
What are the places I'll encounter if I travel from Monumento to Adamson through LRT-1?
Sequences
numbers = [5, 7, 10, 1, 20, 3]
total = 0                        # renamed from `sum` to avoid shadowing the built-in
summary = []
for number in numbers:
    total += number
    twice_sum = total * 2
    summary.append(twice_sum)
Sequences
numbers = [5, 7, 10, 1, 20, 3]   # Input Sequence
total = 0                        # State
summary = []
for number in numbers:
    total += number              # State Update
    twice_sum = total * 2        # Output Generation
    summary.append(twice_sum)
Recurrent Layer
https://en.wikipedia.org/wiki/Recurrent_neural_network
Recurrent Layer
Has internal state throughout the sequence
State Update and Output Generation are parameterized linear functions (see the sketch below)
Long Short-Term Memory (LSTM) and its bidirectional variant are the most popular recurrent layers
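A minimal sketch of a vanilla recurrent cell (the sizes and the customary tanh squashing are my choices; the slides describe only the linear parameterization):

import numpy as np

rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.1, size=(4, 3))   # input-to-state weights
Whh = rng.normal(scale=0.1, size=(4, 4))   # state-to-state weights
Why = rng.normal(scale=0.1, size=(1, 4))   # state-to-output weights
bh, by = np.zeros(4), np.zeros(1)

sequence = [rng.normal(size=3) for _ in range(5)]   # a toy input sequence

h = np.zeros(4)                            # internal state, carried through the sequence
for x in sequence:
    h = np.tanh(Wxh @ x + Whh @ h + bh)    # State Update
    y = Why @ h + by                       # Output Generation
    print(y)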
Recurrent Layer
Video Analysis
Text Analysis
Time Series Analysis
Kinds of Learning
Unsupervised Learning
Supervised Learning
Reinforcement Learning
Unsupervised Learning
https://www.quora.com/What-does-the-word-embedding-mean-in-the-context-of-Machine-Learning
Unsupervised Learning
Unlabeled Data
Focuses on Data Representations and Relationships
Unsupervised Learning
Autoencoders
Word Embeddings
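As a minimal sketch of learning a representation from unlabeled data, here using PCA as an illustrative stand-in (the slides feature autoencoders and word embeddings, not PCA):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(100, 10)       # unlabeled samples: no targets at all
pca = PCA(n_components=2)          # learn a 2-dimensional representation
X_2d = pca.fit_transform(X)
print(X_2d.shape)                  # (100, 2)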
Autoencoders
https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
Word Embedding
https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a209c1d3e12c
Supervised Learning
https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html
Supervised Learning
Labeled Data
Creates a mapping from features to labels
Predictive Model
Supervised Learning
Classification
Regression
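A minimal scikit-learn sketch contrasting the two (the tiny data set is my illustration): classification predicts discrete labels, regression predicts continuous values.

from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[0, 0], [1, 1], [2, 2], [3, 3]]       # features
labels = [0, 0, 1, 1]                      # discrete targets (classification)
values = [0.0, 2.0, 4.0, 6.0]              # continuous targets (regression)

clf = LogisticRegression().fit(X, labels)
reg = LinearRegression().fit(X, values)
print(clf.predict([[1.5, 1.5]]))           # a class label, 0 or 1
print(reg.predict([[1.5, 1.5]]))           # a real number, near 3.0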
Classification
http://scikit-learn.org/stable/auto_examples/ensemble/plot_voting_decision_regions.html#sphx-glr-auto-examples-ensemble-plot-voting-decision-regions-py
Regression
https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html
Reinforcement Learning
https://simple.wikipedia.org/wiki/Reinforcement_learning
Reinforcement Learning
An agent acts on an environment
Selects actions based on observations
Maximizes rewards (a toy loop is sketched below)
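A toy sketch of that loop (the environment and the random agent are my inventions for illustration):

import random

class LineWorld:
    """Toy environment: walk a line; reward for reaching position +3."""
    def reset(self):
        self.pos = 0
        return self.pos                          # initial observation

    def step(self, action):                      # action: -1 (left) or +1 (right)
        self.pos += action
        done = abs(self.pos) >= 3                # episode ends at either end
        reward = 1.0 if self.pos >= 3 else 0.0
        return self.pos, reward, done

env = LineWorld()
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = random.choice([-1, 1])              # select an action (here: at random)
    obs, reward, done = env.step(action)         # act on the environment, then observe
    total_reward += reward                       # the quantity the agent should maximize
print('total reward:', total_reward)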