Day 1 S3

PART-II: Deep Learning
PART-III: Trends & Challenges
Timeline
Gradient Descent
Popular ML Algorithm
www.kiit.ac.in
[Figure: DATA (Cloud Computing, WSN, IoT) and Computational Resources (CPU/GPU/TPU/IPU) feed the Machine Learning Algorithm]
Definition
⬡ AI: Intelligence demonstrated by machines. Machines are programmed to interpret and act like human beings.
⬡ ML: Machine learning is a subset of AI that gives systems the ability to learn automatically and improve from experience (training data) without being explicitly programmed.
[Figure: nested layers of the field]
ARTIFICIAL INTELLIGENCE: AUTOMATION USING RULES (sensor & rule based technology)
MACHINE LEARNING (ANN, GA, FL): DATA PATTERN, STATISTICAL METHODS
DEEP LEARNING: FAST COMPUTATION, BEYOND HANDCRAFTED STRUCTURE, PATTERNS ANALYTICS, MIMICKING BNN, LAYERED ARCHITECTURE
[Figure: generations of machine intelligence]
1G (Machine Language), 1950 to 1975: Automation. Engineering of making intelligent devices. Sensors, Actuators.
2G (Assembly Language), 1962 to 1995: Learning of patterns without being explicitly programmed. ANN (Artificial Neural Network), FL (Rule-based Expert Systems), GA (Evolution Theory). Statistics, Probability. Statistics Driven, Pattern Driven.
3G (High Level Language), 1995 to 2020: Deep Learning. Using deep models for automated pattern mining. Data Driven.
4G (Query based processing): Hybridization of Deep Models. Modularity, Processing. Data driven, Query Driven..??
⬡ Trained / learned over instances
⬡ Able to identify meaningful patterns
⬡ Needs memory to remember and utilize appropriately
ML Timeline
Training Data: optimize and tune the model
[Figure: LEARNING, split into SUPERVISED, UNSUPERVISED and REINFORCEMENT]
SUPERVISED
Training: Labeled Data ↔ generates performance feedback
Testing: Predict (regression) / Identify (classification)
UNSUPERVISED
No Labels
Explore Hidden Associations
REINFORCEMENT
Maximize Reward for Series of actions (Environment~Action)
E.g.: Gaming, Autonomous Driving
Learning and applications
SUPERVISED LEARNING
CLASSIFICATION
⬡ Image Classification
⬡ Diagnostics
⬡ Spam email filtering
⬡ Fraud detection
REGRESSION
⬡ Risk assessment
⬡ Score prediction
⬡ Share market price
UNSUPERVISED LEARNING
DIMENSION REDUCTION
⬡ Text Mining
⬡ Image / Face Recognition
⬡ Big data Visualization
⬡ PCA
CLUSTERING
⬡ Targeted Marketing
⬡ City Planning
⬡ Biology
REINFORCEMENT LEARNING
⬡ Gaming
⬡ Inventory Management
⬡ Automation & Robotics
⬡ Navigation
Popular Applications
Supervised: Image Processing & Computer Vision; Speech Processing & Analysis; Medical Diagnosis, Function Approx.
Unsupervised: Learning Association, Target Marketing; Dimension Reduction (PCA, ICA); Fraud / Spam Detection
Sequence: Statistical Arbitrage; Natural Language Processing; Time Series Trend & Analysis
Reinforcement: Autonomous Driving, Robotics; Interactive Gaming; Traffic Monitoring & Resource Scheduling
Image Processing & Computer Vision
“Computer vision is a utility that makes useful decisions about real physical objects and scenes based on sensed images”
Pattern Recognition
Handwritten Char Recognition
Pose Estimation (PoseNet)
Transformation Using GAN
Converting 2D to 3D
Medical Image Analysis
Speech Processing & Analysis
Word Recognition
Speaker Recognition
Emotion Recognition
Speech to Text
Language Translation
Music Synthesis
Medical Diagnosis
Formant synthesis
Articulatory synthesis
Diphone synthesis
Milestones in the history of ML
AI as a concept, no real-world application (1950~1980)
1956: McCarthy introduces the term AI
1964: 1. Joseph W: Chatbot ELIZA 2. Rosenblatt's Perceptron
Other NN Models (MLP, AM, SOFM)
AI Winter. Contributing problems: lack of computing, logic, Moravec paradox
Military and Academia Interest (1980~2005)
1997: Deep Blue defeats chess champion Garry Kasparov
2002: Roomba (robotic vacuum cleaner) learns to navigate homes
Large tech companies invest in commercial applications (2005 onwards)
2011: Apple integrates intelligent virtual assistance into iPhone
2011: IBM's Watson wins TV show Jeopardy
2012-2014: Google Brain recognizes picture, describes a scene
Deep Neural Network Era: AlphaGo, ImageNet, ...
Leaders In The Cloud
⬡ Amazon Web Services
⬡ Google Cloud Platform
⬡ IBM Cloud
⬡ Microsoft Azure
⬡ Alibaba Cloud
Vendors
⬡ Nauto: Learning Platform
⬡ Tempus: Data driven precision medicine
⬡ Phrasee: Natural Language Processing
⬡ Siemens: Energy, electrification, digitalization, automation
⬡ Socure: Banking and Investment
⬡ Blue River Technology: Smart Farming
⬡ Nvidia: CUDA GPU
⬡ Intel: chip maker
⬡ Zebra Medical Vision, Iris.AI, Freenome
⬡ Graphcore (IPU)
Popular Machine Learning Methods
Attention Model Networks
⬡ Multilayer Perceptron
⬡ Associative mapping
⬡ Radial Basis Function
⬡ Convolutional NN
⬡ Recurrent NN
⬡ Long Short Term Memory
Statistics, Probabilistic Inferential Approach
⬡ Decision Tree
⬡ Bayes' Probabilistic
⬡ Support Vector Machine
⬡ Gaussian Mixture Model
⬡ Hidden Markov Model
Performance Augmentation Strategies
⬡ Bagging
⬡ Boosting
⬡ n-fold validation
⬡ Ensemble Learning
⬡ Stacking
⬡ Transfer Learning
⬡ Generative models
Classification Conception
Hyperplane: $y = w_1 x_1 + w_2 x_2 + w_0 = 0$
Radial Basis functions optimization: Cluster Center, Cluster Variance
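As a rough illustration of the hyperplane idea, the sketch below classifies a 2D point by the sign of $w_1 x_1 + w_2 x_2 + w_0$. The function name `classify` and the weight values are hypothetical, chosen only for this example.

```python
def classify(x1, x2, w1, w2, w0):
    """Return +1 or -1 depending on which side of the line
    w1*x1 + w2*x2 + w0 = 0 the point (x1, x2) lies."""
    score = w1 * x1 + w2 * x2 + w0
    return 1 if score >= 0 else -1

# Hypothetical weights: the separating line is x1 + x2 - 1 = 0.
W1, W2, W0 = 1.0, 1.0, -1.0

label_a = classify(2.0, 2.0, W1, W2, W0)  # point on the positive side
label_b = classify(0.0, 0.0, W1, W2, W0)  # point on the negative side
```

Training a linear classifier then amounts to choosing w1, w2 and w0 so that the two classes fall on opposite sides of the line.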
Gradient Descent
⬡ Parametric Optimization method
⬡ How to move in steps: descend the slope (gradient)
⬡ Activation function to deal with non-linear mapping functions.
Gradient descent is an iterative algorithm that starts from a random point on a function and travels down its slope in steps until it reaches a minimum of that function.
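A minimal sketch of that iteration, assuming a one-parameter cost $J(w) = (w - 3)^2$ picked only for illustration. The names `gradient_descent`, `cost_grad` and the step size are assumptions, and the start point is fixed rather than random so the run is repeatable.

```python
def gradient_descent(grad, w_start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w_start."""
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * grad(w)  # move down the slope
    return w

def cost(w):
    """Illustrative cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def cost_grad(w):
    """Derivative of the illustrative cost."""
    return 2.0 * (w - 3.0)

w_min = gradient_descent(cost_grad, w_start=-5.0)
```

Each step shrinks the distance to the minimum by a constant factor here, so `w_min` ends up very close to 3.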
[Figure: layered neural network with inputs, weights and outputs]
1. Model Representation (layers, neurons in layers)
2. Activation Function σ (linear, sigmoid, ReLU...)
3. Bias Node
4. Cost Function (MSE, Cross Entropy,...)
5. Forward Propagation Calculation
6. Backpropagation Algorithm
7. Code Implementation
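Items 5 to 7 of the list can be sketched for a tiny network. This is only an illustration: the 2-2-1 layout, the sigmoid activation, the squared-error cost and all weight values are assumptions, not the network from the slide. The backpropagated gradients are compared against a finite-difference estimate as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward propagation: returns (hidden activations, network output)."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_hidden[j], x)) + b_hidden[j])
              for j in range(len(w_hidden))]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def cost(params, x, target):
    """Squared-error cost for one training example."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2

def backprop(params, x, target):
    """Backpropagation: gradient of the cost w.r.t. every weight and bias."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(params, x)
    # Error signal at the output node (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    grad_w_out = [delta_out * h for h in hidden]
    # Error signal pushed back to each hidden node.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out

# Hypothetical weights and one training example.
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
b_hidden = [0.0, 0.1]
w_out = [0.5, -0.6]
b_out = 0.2
params = (w_hidden, b_hidden, w_out, b_out)
x, target = [1.0, 0.5], 1.0

grad_w_hidden, grad_b_hidden, grad_w_out, grad_b_out = backprop(params, x, target)

def numeric_gradient(shifted_params, h=1e-6):
    """Central difference; shifted_params(d) returns params with one weight moved by d."""
    return (cost(shifted_params(h), x, target)
            - cost(shifted_params(-h), x, target)) / (2 * h)

numeric_w_out0 = numeric_gradient(
    lambda d: (w_hidden, b_hidden, [w_out[0] + d, w_out[1]], b_out))
numeric_w_hidden01 = numeric_gradient(
    lambda d: ([[0.1, -0.2 + d], [0.4, 0.3]], b_hidden, w_out, b_out))
```

A gradient-descent training loop would subtract a learning rate times these gradients from the corresponding weights after each example or batch.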
Back Propagation
Graph-based computation carried out by TensorFlow.
[Figure: Three Layer MLP, showing Forward Propagation and Backward Propagation of the weight updates Δw]
Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$, $\frac{\partial \sigma(x)}{\partial x} = \sigma(x)\,(1 - \sigma(x))$
Hyperbolic tangent (bipolar sigmoid): $\sigma(x) = \frac{1 - e^{-x}}{1 + e^{-x}}$, $\frac{\partial \sigma(x)}{\partial x} = \frac{1}{2}\,(1 - \sigma(x)^2)$
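The two activation derivatives used in backpropagation can be checked numerically. The sketch below assumes the sigmoid $\sigma(x) = 1/(1+e^{-x})$ with derivative $\sigma(1-\sigma)$ and the bipolar form $(1-e^{-x})/(1+e^{-x})$ with derivative $\frac{1}{2}(1-\sigma^2)$; the bipolar form equals $\tanh(x/2)$. Function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def bipolar(x):
    """Bipolar sigmoid, identical to tanh(x / 2)."""
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def bipolar_prime(x):
    s = bipolar(x)
    return 0.5 * (1.0 - s * s)

def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

point = 0.7  # arbitrary test point
sigmoid_gap = abs(sigmoid_prime(point) - numeric_derivative(sigmoid, point))
bipolar_gap = abs(bipolar_prime(point) - numeric_derivative(bipolar, point))
```

Both gaps come out close to zero, which is what makes these activations convenient: the derivative is computed from the already-known activation value.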
Loss Functions
⬡ Mean squared error (MSE): $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$
⬡ Mean absolute error (MAE), Huber Loss, ...
⬡ Binary Cross-Entropy or Log-loss error: aims to reduce the entropy of the predicted probability distribution in binary classification problems
$\mathrm{CE} = \sum_{i=1}^{N}\left[-y_i \ln(\hat{y}_i) - (1 - y_i)\ln(1 - \hat{y}_i)\right]$
⬡ Hinge Loss $= \sum_{i=1}^{N}\max(0,\ 1 - y_i \hat{y}_i)$, KL Divergence, ...
⬡ Multiclass Cross-Entropy for C classes: $-\sum_{c=1}^{C} y_c \ln(\hat{y}_c)$
Source: https://medium.com/ml-cheat-sheet/winning-at-loss-functions-common-loss-functions-that-you-should-know-a72c1802ecb4
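The loss functions above can be written out in a few lines. This sketch assumes the summed (not averaged) form for the cross-entropy and hinge losses, labels in {0, 1} for binary cross-entropy, labels in {-1, +1} for hinge loss, and a one-hot target for the multiclass case; the function names and sample values are illustrative.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred):
    """Summed log-loss; y_true in {0, 1}, p_pred strictly between 0 and 1."""
    return sum(-t * math.log(p) - (1 - t) * math.log(1 - p)
               for t, p in zip(y_true, p_pred))

def hinge(y_true, scores):
    """Summed hinge loss; y_true in {-1, +1}, scores are raw model outputs."""
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y_true, scores))

def multiclass_cross_entropy(one_hot, probs):
    """Cross-entropy for one example over C classes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

mse_value = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
bce_value = binary_cross_entropy([1, 0], [0.5, 0.5])
hinge_value = hinge([1, -1], [0.5, -2.0])
mce_value = multiclass_cross_entropy([0, 1, 0], [0.2, 0.5, 0.3])
```

Which loss to use depends on the task: MSE for regression, the cross-entropy forms for probabilistic classifiers, and hinge loss for margin-based classifiers such as SVMs.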
Variants of Gradient Descent:
⬡ Batch Gradient Descent: Vanilla gradient descent, aka batch gradient descent, computes the gradient of the cost function w.r.t. the parameters W for the entire training dataset.
$W = W - \eta \nabla_W J(W)$
⬡ Mini Batch Gradient Descent: Mini-batch gradient descent takes the best of both worlds and performs an update for every mini-batch of n training examples.
$W = W - \eta \nabla_W J(W; x_{i:i+n}; y_{i:i+n})$
⬡ Stochastic Gradient Descent: Stochastic gradient descent (SGD), in contrast, performs a parameter update for each training example $x_i$ and label $y_i$.
$W = W - \eta \nabla_W J(W; x_i; y_i)$
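The three variants differ only in how many examples feed each update, which a single `batch_size` parameter can express. The sketch below assumes a one-weight model y = w * x with a mean-squared-error cost and noise-free toy data; all names and hyperparameters are illustrative.

```python
import random

def gradient(w, xs, ys):
    """Gradient of the MSE of the model y = w * x over the given examples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, batch_size, learning_rate=0.05, epochs=200, seed=0):
    """batch_size = len(xs): batch GD; 1: SGD; anything between: mini-batch GD."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            w -= learning_rate * gradient(w, [xs[i] for i in batch],
                                          [ys[i] for i in batch])
    return w

# Toy data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w_batch = train(xs, ys, batch_size=len(xs))
w_mini = train(xs, ys, batch_size=2)
w_sgd = train(xs, ys, batch_size=1)
```

On this noise-free data all three settle on w = 2; on real data the smaller batches give noisier but cheaper updates.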
Optimization Techniques
⬡ Momentum
⬡ Nesterov accelerated gradient
⬡ Adagrad
⬡ Adadelta
⬡ RMSprop
⬡ Adam
⬡ AdaMax
⬡ Nadam
⬡ AMSGrad
The softmax function converts its inputs, known as logits or logit scores, to values between 0 and 1, and also normalizes the outputs so that they all sum up to 1.
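A short sketch of the softmax function mentioned on this slide. Subtracting the largest logit before exponentiating is a common numerical-stability step added here as an assumption; it does not change the result. The sample logits are made up.

```python
import math

def softmax(logits):
    """Map raw scores to values in (0, 1) that sum to 1."""
    shift = max(logits)  # stability: keeps exp() from overflowing
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The largest logit receives the largest probability, and the ordering of the inputs is preserved in the outputs.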
Challenges:
⬡ Choosing a proper learning rate can be difficult. A learning rate that is
too small leads to painfully slow convergence, while a learning rate that
is too large can hinder convergence.
⬡ Another key challenge of minimizing highly non-convex error functions
common for neural networks is avoiding getting trapped in their
numerous suboptimal local minima.
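The learning-rate challenge can be seen on even the simplest cost. The sketch below assumes $J(w) = w^2$ (gradient $2w$) and three hand-picked learning rates; the function name `run` and the values are illustrative only.

```python
def run(learning_rate, steps=50, w=1.0):
    """Gradient descent on J(w) = w**2, whose gradient is 2 * w."""
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w

w_small = run(0.001)  # too small: barely moves away from the start
w_good = run(0.1)     # reasonable: ends close to the minimum at 0
w_large = run(1.1)    # too large: each step overshoots and |w| grows
```

With the large rate every update multiplies w by a factor whose magnitude exceeds 1, so the iterates oscillate and diverge instead of converging.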
Performance Evaluation
⬡ Training-Validation-Testing:
Parameter updating ↔ Interim check ↔ Check at the end
Modify Model Architecture
⬡ Bias vs Overfit:
Underfitting ↔ Overfitting
⬡ n-fold validation: Fold-1, Fold-2, Fold-3, Fold-4, ..., Fold-n
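n-fold validation can be sketched as an index split: each fold serves once as the validation set while the remaining folds form the training set. The helper name `k_fold_indices` and the sample sizes are hypothetical, and the split is contiguous without shuffling to keep the example short.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) once per fold."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=5))
```

A model would be trained and scored once per fold, and the n scores averaged to estimate how well it generalizes.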
More Terminologies
Confusion matrix (rows: True, columns: Pred):
True ↓ / Pred →   Y1     Y2     Yc     YN
Y1                Y1Y1   Y1Y2   Y1Yc   Y1YN
Y2                Y2Y1   Y2Y2   Y2Yc   Y2YN
Yc                YcY1   YcY2   YcYc   YcYN
YN                YNY1   YNY2   YNYc   YNYN
Per-class counts:
$TP_c = Y_cY_c$
$TN_c = \sum_{i \neq c,\, j \neq c} Y_iY_j$
$FP_c = \sum_{i \neq c} Y_iY_c$
$FN_c = \sum_{i \neq c} Y_cY_i$
⬡ Performance:
∙ Accuracy: $\frac{1}{N}\sum_{c=1}^{N}\frac{TP_c + TN_c}{TP_c + TN_c + FP_c + FN_c}$
∙ Sensitivity: $\frac{1}{N}\sum_{c=1}^{N}\frac{TP_c}{TP_c + FN_c}$
∙ Specificity: $\frac{1}{N}\sum_{c=1}^{N}\frac{TN_c}{TN_c + FP_c}$
Binary example:
ACTUAL \ TEST   P          N          Total
P               TP = 150   FN = 25    175
N               FP = 25    TN = 800   825
Total           175        825
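The per-class counts and the averaged metrics can be computed directly from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and uses the binary example from the slide (TP = 150, FN = 25, FP = 25, TN = 800); the function names are illustrative.

```python
def per_class_counts(matrix, c):
    """matrix[i][j] = samples of true class i predicted as class j."""
    n = len(matrix)
    tp = matrix[c][c]
    fn = sum(matrix[c][j] for j in range(n) if j != c)
    fp = sum(matrix[i][c] for i in range(n) if i != c)
    tn = sum(matrix[i][j] for i in range(n) for j in range(n)
             if i != c and j != c)
    return tp, tn, fp, fn

def averaged_metrics(matrix):
    """Accuracy, sensitivity and specificity averaged over all classes."""
    n = len(matrix)
    accuracy = sensitivity = specificity = 0.0
    for c in range(n):
        tp, tn, fp, fn = per_class_counts(matrix, c)
        accuracy += (tp + tn) / (tp + tn + fp + fn)
        sensitivity += tp / (tp + fn)
        specificity += tn / (tn + fp)
    return accuracy / n, sensitivity / n, specificity / n

# Binary example: class 0 = P, class 1 = N.
binary = [[150, 25],
          [25, 800]]

counts_positive = per_class_counts(binary, 0)  # (150, 800, 25, 25)
accuracy, sensitivity, specificity = averaged_metrics(binary)
```

For the positive class alone, sensitivity is 150/175 and specificity is 800/825; the averaged values also include the same ratios computed with N treated as the class of interest.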
Explainable and Interpretable model
Visual question answering
Machine Learning and Optimization (Only a few)
Mimicking Biological Learning: Artificial Neural Network
Statistical Learning: Hidden Markov Model
Statistical Learning: GMM & Expectation Maximization
Fuzzy Rule based Systems: Rule Based Learning and Expert Systems
Lagrangian Optimization: Support Vector Machine
Information Gain Model: Decision Tree and Random Forest
Linear Algebra: Linear & polynomial Regression
Statistical Learning: Bayes Classification
Evaluation of a system
⬡ Accuracy: A learning system extracts knowledge from training data such that the learned knowledge is general enough to deal with unknown data. Methods of accuracy and error estimation quantitatively measure this generalization capability.
⬡ Robustness: The machine can perform adequately under all circumstances, including cases where the information is corrupted by noise, is incomplete, or is mixed with irrelevant data. It can be assessed with a series of synthetic datasets representing an increasing degree of inconsistency in the data.
⬡ Computational complexity and speed: How much memory is required and how fast the system can learn.
⬡ Online learning: Whether the system continues to acquire knowledge adaptively from a real-time environment.
⬡ Interpretability: The level of understanding and insight a model offers.
⬡ Scalability: Capability to build the learning machine using a huge amount of data.
Summary (Don't take it as the GOSPEL)
⬡ ML is a field of experiment and experience, not philosophy
Group          Model                         Linear?   Powerful?  Easy?  Scales?   Plug & Play
Linear Models  Linear & Logistic Regression  Y         N          Y      Y         Y
Basic          Bayes, DT, KNN                Possibly  N          Y      Possibly  Y
Ensemble       RF, GMM, HMM                  N         Y          N      Y         Y
SVM            SVM, Kernel Trick             Possibly  Y          N      N         Y
Complex ANN    MLP, Deep NN                  N         Y          N      Y         N
Thanks!
Any questions?
You can find me at:

suprava.patnaik@kiit.ac.in
