
Deep Learning

Raffaello Baluyot
AI, ML and DL

https://www.limitlessmobil.com/machine-learning/how-is-artificial-intelligence-different-from-machine-learning-and-deep-learning/
AI, ML and DL

Source: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Artificial Intelligence

Source: https://en.wikipedia.org/wiki/Tic-tac-toe
Artificial Intelligence

Source: https://www.microsoft.com/en-us/p/international-chess-online/
Machine Learning

https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
Machine Learning

Color   Shape      Fruit       Count
Red     Round      Apple       7
Red     Round      Tomato      1
Red     Not Round  Strawberry  2
Yellow  Not Round  Banana      6
Yellow  Not Round  Mango       2
Yellow  Round      Lemon       2

Deep Learning

https://playground.tensorflow.org
Deep Learning as Others Define It
Large Neural Networks
Series of Data Representations
Vaguely Inspired by Brain
Deep Learning as I Define It
Composition of multiple parameterized
functions optimized for an objective
Parameterized Function: Layers
Composition: Connection
Objective: Loss
Optimization: Gradient Descent
Learning Example
Data
Objective
Layer
Connection
Optimization
Data
A B Z
1 1 3
4 0 20
2 0 10
3 3 9
1 3 -1
4 2 16
0 2 -4
1 4 -3
Objective
$P = Z$
$P - Z = 0$
$|P - Z| = 0$
$\min |P - Z|$
where P is the deep learning prediction
Layer
$l_1 = w_x x + w_y y + b$
where:
$l_1$ is the layer result
x and y are the inputs (columns A and B of the data)
$w_x$ is the parameter for x
$w_y$ is the parameter for y
b is the bias parameter
Connection
$P = l_1$
where:
P is the deep learning prediction
l1 is the layer result
Optimization

http://fa.bianp.net/blog/2016/hyperparameter-optimization-with-approximate-gradient/
Optimization
$O = |w_x x + w_y y + b - Z|$
$(w_x, w_y, b) = \arg\min O$
where O is the objective
Optimization
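$\frac{\partial O}{\partial w_x} = x \cdot \operatorname{sign}(P - Z)$
$\frac{\partial O}{\partial w_y} = y \cdot \operatorname{sign}(P - Z)$
$\frac{\partial O}{\partial b} = \operatorname{sign}(P - Z)$
where $P = w_x x + w_y y + b$ is the prediction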
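
The gradient formulas can be checked numerically. The sketch below assumes the objective is the absolute error between the prediction and the target, and compares the analytic gradients with a central finite-difference estimate. The helper names (`objective`, `analytic_gradients`, `numeric_gradients`) and the step size `eps` are illustrative choices, not part of the slides.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def objective(x, y, z, wx, wy, b):
    """Absolute error between the prediction wx*x + wy*y + b and the target z."""
    return abs(wx * x + wy * y + b - z)


def analytic_gradients(x, y, z, wx, wy, b):
    """Gradients of the objective with respect to wx, wy and b."""
    s = sign(wx * x + wy * y + b - z)
    return x * s, y * s, s


def numeric_gradients(x, y, z, wx, wy, b, eps=1e-6):
    """Central finite-difference estimate of the same three gradients."""
    d_wx = (objective(x, y, z, wx + eps, wy, b)
            - objective(x, y, z, wx - eps, wy, b)) / (2 * eps)
    d_wy = (objective(x, y, z, wx, wy + eps, b)
            - objective(x, y, z, wx, wy - eps, b)) / (2 * eps)
    d_b = (objective(x, y, z, wx, wy, b + eps)
           - objective(x, y, z, wx, wy, b - eps)) / (2 * eps)
    return d_wx, d_wy, d_b


if __name__ == "__main__":
    # Data row (x=4, y=2, z=16) with all parameters at zero.
    print(analytic_gradients(4, 2, 16, 0, 0, 0))
    print(numeric_gradients(4, 2, 16, 0, 0, 0))
```

The two agree wherever the prediction differs from the target; at exactly zero error the absolute value has a kink and the finite-difference estimate is not meaningful.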
Optimization
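$w_x^{\text{new}} = w_x^{\text{old}} - \frac{\partial O}{\partial w_x}$
$w_y^{\text{new}} = w_y^{\text{old}} - \frac{\partial O}{\partial w_y}$
$b^{\text{new}} = b^{\text{old}} - \frac{\partial O}{\partial b}$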
Optimization

Negative slope: increase in parameter decreases objective
Positive slope: increase in parameter increases objective

https://commons.wikimedia.org/wiki/File:Simple_sine_wave.svg
Learning
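def sign(v):
    # Return -1, 0, or 1 according to the sign of v.
    return (v > 0) - (v < 0)


def perform_gradient_descent(x, y, z, wx, wy, b):
    print('\n===== Gradient Step =====')
    print('Pre-update Parameters wx:{} wy:{} b:{}'.format(wx, wy, b))

    # Forward pass: the single layer is the prediction.
    p = wx * x + wy * y + b

    # Gradients of the objective |p - z| for each parameter.
    wxg = sign(p - z) * x
    wyg = sign(p - z) * y
    bg = sign(p - z)

    # Gradient descent update with a step size of 1.
    wx -= wxg
    wy -= wyg
    b -= bg

    print('Training Inputs x:{} y:{} z:{}'.format(x, y, z))
    print('P:{} Z:{}'.format(p, z))
    print('Gradients wx:{} wy:{} b:{}'.format(wxg, wyg, bg))
    print('Post-update Parameters wx:{} wy:{} b:{}'.format(wx, wy, b))
    return wx, wy, b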
Learning
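
A minimal sketch of how the gradient step might be applied repeatedly over the data table. The zero starting values, the epoch count, and the names `gradient_step`, `train`, and `total_error` are assumptions made for illustration; the step itself follows the update rule from the Optimization slides with a step size of 1.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def gradient_step(x, y, z, wx, wy, b):
    """One gradient descent update for the objective |wx*x + wy*y + b - z|."""
    s = sign(wx * x + wy * y + b - z)
    return wx - s * x, wy - s * y, b - s


# Rows of the data table: (A, B, Z), used as (x, y, z).
DATA = [(1, 1, 3), (4, 0, 20), (2, 0, 10), (3, 3, 9),
        (1, 3, -1), (4, 2, 16), (0, 2, -4), (1, 4, -3)]


def train(data, epochs=20):
    """Sweep over the data repeatedly, updating the parameters after each row."""
    wx, wy, b = 0, 0, 0  # assumed starting point
    for _ in range(epochs):
        for x, y, z in data:
            wx, wy, b = gradient_step(x, y, z, wx, wy, b)
    return wx, wy, b


def total_error(data, wx, wy, b):
    """Sum of absolute errors over every row of the data."""
    return sum(abs(wx * x + wy * y + b - z) for x, y, z in data)


if __name__ == "__main__":
    wx, wy, b = train(DATA)
    print('Learned parameters:', wx, wy, b)
    print('Total error:', total_error(DATA, wx, wy, b))
```

The parameters wx = 5, wy = -2, b = 0 reproduce every row of the table exactly, so `total_error(DATA, 5, -2, 0)` is 0. Whether unit-sized steps settle on exactly those values depends on the starting point and the order of the rows; a smaller or decaying step size is the usual remedy when they oscillate.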
Layers
Dense Layer
Activation Function
Convolutional Layer
Recurrent Layer
Dense Layer

https://corochann.com/mnist-training-with-multi-layer-perceptron-1149.html
Dense Layer
Feature Transformation to N features
Linear Transformation
Dense Layer
Assume we transform 3 features into 5 features
$y_1 = w_{11} x_1 + w_{12} x_2 + w_{13} x_3 + b_1$
$y_2 = w_{21} x_1 + w_{22} x_2 + w_{23} x_3 + b_2$
$y_3 = w_{31} x_1 + w_{32} x_2 + w_{33} x_3 + b_3$
$y_4 = w_{41} x_1 + w_{42} x_2 + w_{43} x_3 + b_4$
$y_5 = w_{51} x_1 + w_{52} x_2 + w_{53} x_3 + b_5$
Dense Layer
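Assume we transform 3 features into 5 features
$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}
=
\begin{bmatrix}
w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23} \\
w_{31} & w_{32} & w_{33} \\
w_{41} & w_{42} & w_{43} \\
w_{51} & w_{52} & w_{53}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
+
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{bmatrix}
$$
$\mathbf{y} = W\mathbf{x} + \mathbf{b}$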
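
The matrix form translates almost directly into code. This is a minimal sketch in plain Python lists; the function name `dense` and the example weights and biases are made up for illustration, and a real layer would learn these values.

```python
def dense(x, weights, biases):
    """Compute y = W x + b for one input vector x.

    weights holds one row per output feature; biases holds one value per row.
    """
    return [
        sum(w * x_j for w, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, biases)
    ]


# Hypothetical parameters for a 3-feature to 5-feature layer.
W = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [2, -1, 0],
]
B = [0, 0, 0, 1, -1]

if __name__ == "__main__":
    print(dense([1, 2, 3], W, B))  # [1, 2, 3, 7, -1]
```

Each output feature is a weighted sum of all three inputs plus its own bias, which is why the layer is called dense.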
Activation Function

https://medium.com/@krishnakalyan3/introduction-to-exponential-linear-unit-d3e2904b366c
Activation Function
Inspired by Brain Neuron Activation
A non-linear activation function enables
approximation of a larger set of functions
The ReLU family of functions is popular
Usually applied after each layer
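
A short sketch of three members of the ReLU family. The default values for `slope` and `alpha` are common choices but are assumptions here, not values given in the slides.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of becoming 0."""
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    """Exponential linear unit: negative inputs decay smoothly toward -alpha."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


if __name__ == "__main__":
    for value in (-2.0, -0.5, 0.0, 1.5):
        print(value, relu(value), leaky_relu(value), elu(value))
```

All three are the identity for positive inputs and differ only in how they treat negative inputs. Without such a non-linearity between them, stacked linear layers would collapse into a single linear transformation.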
Convolution

https://blog.saush.com/2011/04/20/edge-detection-with-the-sobel-operator-in-ruby/
Convolution
Fundamental Image Operation
Applies a Kernel or Filter to an image
Different kernels produce different results
Convolution

http://graphics.stanford.edu/courses/cs148-10-summer/docs/04_imgproc.pdf
Convolution
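First window:
$$
\begin{aligned}
& 0 \cdot 0 + (-1) \cdot 0 + 0 \cdot 0 \\
+\; & (-1) \cdot 0 + 5 \cdot 2 + (-1) \cdot 3 \\
+\; & 0 \cdot 0 + (-1) \cdot 0 + 0 \cdot 5 = 7
\end{aligned}
$$

Second window:
$$
\begin{aligned}
& 0 \cdot 2 + (-1) \cdot 3 + 0 \cdot 1 \\
+\; & (-1) \cdot 0 + 5 \cdot 5 + (-1) \cdot 1 \\
+\; & 0 \cdot 1 + (-1) \cdot 0 + 0 \cdot 8 = 21
\end{aligned}
$$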

http://graphics.stanford.edu/courses/cs148-10-summer/docs/04_imgproc.pdf
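
The arithmetic on the preceding slide can be reproduced in a few lines. This sketch assumes the 3x3 block of pixel values that appears in the sums is the whole image, and that pixels outside the image count as 0; both are inferences from the numbers, not statements from the slides. The function names are illustrative. As in the slide arithmetic, the kernel is applied without flipping, which makes no difference for this symmetric kernel.

```python
# Kernel read off the multipliers in the slide arithmetic.
KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]

# Pixel values read off the slide arithmetic (assumed to be the full image).
IMAGE = [
    [2, 3, 1],
    [0, 5, 1],
    [1, 0, 8],
]


def convolve_at(image, kernel, row, col):
    """Weighted sum of the kernel centred on (row, col); outside pixels are 0."""
    half = len(kernel) // 2
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < len(image) and 0 <= c < len(image[0])
            pixel = image[r][c] if inside else 0
            total += kernel[dr + half][dc + half] * pixel
    return total


def convolve(image, kernel):
    """Apply the kernel at every pixel, producing an image of the same size."""
    return [
        [convolve_at(image, kernel, r, c) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


if __name__ == "__main__":
    print(convolve_at(IMAGE, KERNEL, 0, 0))  # 7
    print(convolve_at(IMAGE, KERNEL, 1, 1))  # 21
```

A convolutional layer uses the same sliding-window sum, except that the kernel values are parameters adjusted by gradient descent rather than fixed by hand.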
Convolutional Layer
Transformations using Convolutions
Kernel values are learned
Extracts image features based on
objective
Convolutional Layer
Image Processing
Signal Processing
Text Analysis
Sequences

https://coinmarketcap.com/currencies/bitcoin/
Sequences
How did you spend your week?
What are the places I’ll encounter if I
travel from Monumento to Adamson
through LRT-1?
Sequences
numbers = [5, 7, 10, 1, 20, 3]
sum = 0
summary = []
for number in numbers:
    sum += number
    twice_sum = sum * 2
    summary.append(twice_sum)
Sequences
numbers = [5, 7, 10, 1, 20, 3]  # Input Sequence
sum = 0                         # State
summary = []
for number in numbers:
    sum += number               # State Update
    twice_sum = sum * 2         # Output Generation
    summary.append(twice_sum)
Recurrent Layer

https://en.wikipedia.org/wiki/Recurrent_neural_network
Recurrent Layer
Keeps an internal state throughout the
sequence
State Update and Output Generation
are parameterized functions (linear in the simplest case)
Long Short-Term Memory (LSTM) and its
bidirectional variant are
popular recurrent layers
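
The running-sum loop from the Sequences slides can be rewritten as a recurrent layer whose state update and output generation are parameterized linear functions. This is a scalar sketch with hypothetical parameter names (`w_state`, `w_input`, `b_state`, `w_output`, `b_output`); practical recurrent layers work on vectors, usually add a non-linearity, and learn these parameters from data.

```python
def recurrent_layer(inputs, w_state, w_input, b_state, w_output, b_output):
    """Run a linear recurrent cell over a sequence and return every output."""
    state = 0
    outputs = []
    for x in inputs:
        # State update: linear in the previous state and the current input.
        state = w_state * state + w_input * x + b_state
        # Output generation: linear in the updated state.
        outputs.append(w_output * state + b_output)
    return outputs


if __name__ == "__main__":
    numbers = [5, 7, 10, 1, 20, 3]
    # These parameter values reproduce the "twice the running sum" loop.
    print(recurrent_layer(numbers, w_state=1, w_input=1, b_state=0,
                          w_output=2, b_output=0))
    # [10, 24, 44, 46, 86, 92]
```

With other parameter values the same structure computes other summaries of the sequence, which is what training a recurrent layer searches for.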
Recurrent Layer
Video Analysis
Text Analysis
Time Series Analysis
Kinds of Learning
Unsupervised Learning
Supervised Learning
Reinforcement Learning
Unsupervised Learning

https://www.quora.com/What-does-the-word-embedding-mean-in-the-context-of-Machine-Learning
Unsupervised Learning
Unlabeled Data
Focuses on Data Representations and
Relationships
Unsupervised Learning
Autoencoders
Word Embeddings
Autoencoders

https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
Word Embedding

https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a209c1d3e12c
Supervised Learning

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html
Supervised Learning
Labeled Data
Creates mapping of features to labels
Predictive Model
Supervised Learning
Classification
Regression
Classification

http://scikit-learn.org/stable/auto_examples/ensemble/plot_voting_decision_regions.html#sphx-glr-auto-examples-ensemble-plot-voting-decision-regions-py
Regression

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html
Reinforcement Learning

https://simple.wikipedia.org/wiki/Reinforcement_learning
Reinforcement Learning
Agent acting on an Environment
Select action based on observation
Maximize Rewards
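
The agent and environment loop can be sketched as follows. Everything here is a toy made up for illustration: the `CorridorEnvironment` class, its reward of 1 at the goal, and the hand-written `select_action` policy, which stands in for a policy that a reinforcement learning algorithm would learn.

```python
class CorridorEnvironment:
    """Toy environment: positions 0..goal on a line, reward 1 on reaching goal."""

    def __init__(self, goal=4):
        self.goal = goal
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 or +1); return (observation, reward, done)."""
        self.position = max(0, min(self.goal, self.position + action))
        done = self.position == self.goal
        reward = 1 if done else 0
        return self.position, reward, done


def select_action(observation, goal):
    """Hand-written policy: move right until the goal is reached."""
    return 1 if observation < goal else -1


def run_episode(env, max_steps=20):
    """Let the agent act until the episode ends; return (total reward, steps)."""
    observation = env.reset()
    total_reward, steps, done = 0, 0, False
    while not done and steps < max_steps:
        action = select_action(observation, env.goal)
        observation, reward, done = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


if __name__ == "__main__":
    print(run_episode(CorridorEnvironment()))  # (1, 4)
```

The loop shows the three ingredients from the slide: the agent observes, selects an action, and accumulates the reward it is trying to maximize.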
