
PART-II: Deep Learning
PART-III: Trends & Challenges

Outline: Timeline · Gradient Descent · Popular ML Algorithms

suprava.patnaik@kiit.ac.in
www.kiit.ac.in
[Figure: the ML ecosystem — DATA from Cloud Computing, WSN, and IoT feeds the Machine Learning Algorithm, which runs on Computational Resources (CPU/GPU/TPU/IPU).]
Definition
Ø AI: Intelligence demonstrated by machines; machines are programmed to interpret and act like human beings.
Ø ML: Machine learning is a subset of AI that gives systems the ability to learn automatically and improve from experience (training data) without being explicitly programmed.

[Figure: nested scope — ARTIFICIAL INTELLIGENCE ⊃ MACHINE LEARNING ⊃ DEEP LEARNING]
Ø Artificial Intelligence: automation using rules (sensor & rule-based technology)
Ø Machine Learning: learning data patterns with statistical methods
Ø Deep Learning: layered architecture mimicking BNN; fast computation beyond handcrafted patterns and analytics
[Figure: Timeline of ML generations]
Ø 1950–1975 (1G, machine language): engineering of making intelligent devices — automation; sensors, actuators, statistics, probability. Statistics driven.
Ø 1962–1995 (2G, assembly language): learning of patterns without being explicitly programmed — ANN (Artificial Neural Network), FL (rule-based expert systems), GA (evolution theory). Pattern driven.
Ø 1995–2020 (3G, high-level language): Deep Learning — using deep models for automated pattern mining. Data driven.
Ø 4G (query-based processing): hybridization of deep models; modularity; processing. Query driven..??
⬡ Trained/learned over instances
⬡ Able to identify meaningful patterns
⬡ Needs memory to remember and utilize appropriately

ML Timeline

[Figure: training loop — training data feeds the model; optimize and tune the model iteratively.]
[Figure: the three learning paradigms]
Ø SUPERVISED: training on labeled data, which generates performance feedback; testing predicts (regression) or identifies (classification).
Ø UNSUPERVISED: no labels; explore hidden associations.
Ø REINFORCEMENT: maximize reward for a series of actions (environment ~ action). Examples: gaming, autonomous driving.
Learning and applications

SUPERVISED LEARNING
Ø Classification: image classification, diagnostics, spam email filtering, fraud detection
Ø Regression: risk assessment, score prediction, share market price

UNSUPERVISED LEARNING
Ø Dimension reduction: text mining, image/face recognition, big-data visualization, PCA
Ø Clustering: targeted marketing, city planning, biology

REINFORCEMENT LEARNING
Ø Gaming, inventory management, automation & robotics, navigation
Popular Applications

Ø Supervised: image processing & computer vision; speech processing & analysis; medical diagnosis, function approximation
Ø Unsupervised: learning associations, target marketing; dimension reduction (PCA, ICA); fraud/spam detection
Ø Sequence: statistical arbitrage; natural language processing; time-series trend analysis
Ø Reinforcement: autonomous driving, robotics; interactive gaming; traffic monitoring & resource scheduling
Image processing & Computer Vision
"Computer vision is a utility that makes useful decisions about real physical objects and scenes based on sensed images."

Ø Pattern recognition
Ø Handwritten character recognition
Ø Pose estimation (PoseNet)
Ø Transformation using GAN
Ø Converting 2D to 3D
Ø Medical image analysis
Speech Processing & Analysis

Ø Word recognition
Ø Speaker recognition
Ø Emotion recognition
Ø Speech to text
Ø Language translation
Ø Music synthesis (formant, articulatory, and diphone synthesis)
Ø Medical diagnosis
Milestones in the history of ML
Ø 1950–1980: AI as a concept, no real-world application. Contributing problems: lack of computing, logic, Moravec's paradox.
Ø 1980–2005: military and academia interest.
Ø 2005 onwards: large tech companies invest in commercial applications.

Ø 1956: McCarthy introduces the term AI
Ø 1964: Joseph Weizenbaum's chatbot ELIZA; Rosenblatt's Perceptron
Ø Other NN models (MLP, AM, SOFM); AI Winter
Ø 1997: Deep Blue defeats chess champion Garry Kasparov
Ø 2002: Roomba (robotic vacuum cleaner) learns to navigate homes
Ø 2011: IBM's Watson wins the TV show Jeopardy; Apple integrates an intelligent virtual assistant into the iPhone
Ø 2012–2014: Deep Neural Network era — Google Brain recognizes pictures and describes a scene; AlphaGo, ImageNet, ...
Leaders In The Cloud
⬡ Amazon Web Services
⬡ Google Cloud Platform
⬡ IBM Cloud
⬡ Microsoft Azure
⬡ Alibaba Cloud

Vendors
⬡ Nauto: learning platform
⬡ Tempus: data-driven precision medicine
⬡ Phrasee: natural language processing
⬡ Siemens: energy, electrification, digitalization, automation
⬡ Socure: banking and investment
⬡ Blue River Technology: smart farming
⬡ Nvidia: CUDA GPU
⬡ Intel: chip maker
⬡ Zebra Medical Vision, Iris.AI, Freenome
⬡ Graphcore (IPU)
Popular Machine Learning Methods

Neural Network Models
⬡ Multilayer Perceptron
⬡ Associative mapping
⬡ Radial Basis Function
⬡ Convolutional NN
⬡ Recurrent NN
⬡ Long Short-Term Memory
⬡ Attention Model

Statistics & Probabilistic Inferential Approach
⬡ Decision Tree
⬡ Bayes' Probabilistic Model
⬡ Support Vector Machine
⬡ Gaussian Mixture Model
⬡ Hidden Markov Model

Performance Augmentation Strategies
⬡ Bagging
⬡ Boosting
⬡ n-fold validation
⬡ Ensemble Learning
⬡ Stacking
⬡ Transfer Learning
⬡ Generative models
Classification Conception

Hyperplane: $g(x) = w_1 x_1 + w_2 x_2 + w_0 = 0$

Radial Basis Function optimization: cluster center, cluster variance
Gradient Descent
⬡ Parametric optimization method.
⬡ How to move in steps: descend the slope (gradient), $w \leftarrow w - \eta \frac{\partial E}{\partial w}$
⬡ Activation function to deal with non-linear mapping functions.

Gradient descent is an iterative algorithm that starts from a random point on a function and travels down its slope in steps until it reaches the lowest point of that function.
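A minimal sketch of this iteration in Python, minimizing an assumed one-dimensional cost E(w) = (w − 3)²; the starting point, learning rate, and stopping rule are illustrative choices:

```python
# Gradient descent on E(w) = (w - 3)^2, whose gradient is dE/dw = 2*(w - 3).
def gradient(w):
    return 2.0 * (w - 3.0)

w = -5.0          # random-ish starting point
eta = 0.1         # learning rate (step size)
for step in range(100):
    g = gradient(w)
    if abs(g) < 1e-6:        # stop when the slope is nearly flat
        break
    w = w - eta * g          # move one step down the slope
print(w)                     # converges toward the minimizer w* = 3
```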
Building blocks of a neural network (a minimal sketch follows the list):
1. Model representation (layers, neurons in layers)
2. Activation function σ (linear, sigmoid, ReLU, ...)
3. Bias node
4. Cost function (MSE, cross-entropy, ...)
5. Forward propagation calculation
6. Backpropagation algorithm
7. Code implementation

[Figure: a feed-forward network — each layer applies its weights $w^{[l]}_{i,j}$ and bias, then the activation σ, passing activations $a^{[l]}$ forward to the next layer.]
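A minimal NumPy sketch covering items 1–5 above; the layer sizes, random weights, and input are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# 2 inputs -> 3 hidden neurons -> 1 output; biases as separate vectors
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

def forward(x):
    """Forward propagation through one hidden layer."""
    h = sigmoid(W1 @ x + b1)      # hidden activations
    y = sigmoid(W2 @ h + b2)      # network output
    return y

x = np.array([0.5, -1.0])                        # one training example
y_hat = forward(x)
mse = np.mean((y_hat - np.array([1.0])) ** 2)    # cost for target 1.0
print(y_hat, mse)
```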

Back Propagation

Graph-based computation (as carried out by TensorFlow): forward propagation computes the activations, and backward propagation sends error derivatives back through the graph.

Sigmoid activation and its derivative:
$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\big(1 - \sigma(x)\big)$

Forward propagation through a three-layer MLP with inputs $x_1, x_2$:
$z_1 = \sigma(x_1 w_{1,1} + x_2 w_{2,1}), \quad z_2 = \sigma(x_1 w_{1,2} + x_2 w_{2,2}), \quad y = \sigma(z_1 w_{3,1} + z_2 w_{3,2})$

Backward propagation applies the chain rule to carry the error E from the output back to every weight, e.g.
$\frac{\partial E}{\partial w_{3,1}} = \frac{\partial E}{\partial y}\,\frac{\partial y}{\partial w_{3,1}}, \qquad \frac{\partial E}{\partial x_1} = \frac{\partial E}{\partial z_1}\frac{\partial z_1}{\partial x_1} + \frac{\partial E}{\partial z_2}\frac{\partial z_2}{\partial x_1}$
yielding an update $\Delta w_{i,j}$ for each weight of the three-layer MLP.

Loss functions reference: https://medium.com/ml-cheat-sheet/winning-at-loss-functions-common-loss-functions-that-you-should-know-a72c1802ecb4
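A minimal sketch of one forward/backward pass for such a 2–2–1 sigmoid network with squared-error loss; the input, target, initial weights, and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative input, target, and weights for a 2-2-1 network
x = np.array([0.5, -0.2])            # inputs x1, x2
t = 1.0                              # target output
W = np.array([[0.1, 0.4],            # W[i, j]: input i -> hidden j
              [0.3, -0.2]])
w3 = np.array([0.2, -0.5])           # hidden -> output weights w3,1 and w3,2
eta = 0.5                            # learning rate

# Forward pass
z = sigmoid(x @ W)                   # hidden activations z1, z2
y = sigmoid(z @ w3)                  # network output

# Backward pass: chain rule with sigma'(a) = sigma(a) * (1 - sigma(a))
dE_dy = y - t                        # dE/dy for E = (y - t)^2 / 2
delta_out = dE_dy * y * (1.0 - y)    # error at the output node
grad_w3 = delta_out * z              # dE/dw3 = delta_out * z
delta_hid = delta_out * w3 * z * (1.0 - z)   # error pushed to hidden layer
grad_W = np.outer(x, delta_hid)      # dE/dW[i, j] = x_i * delta_hid_j

# Gradient-descent updates: Delta w = -eta * dE/dw
w3 -= eta * grad_w3
W  -= eta * grad_W
print(y, grad_w3, grad_W)
```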


Loss Functions
⬡ Mean squared error (MSE): $L_{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
⬡ Mean absolute error (MAE), Huber loss, ...
⬡ Binary cross-entropy or log-loss error: aims to reduce the entropy of the predicted probability distribution in binary classification problems.
  $L_{BCE} = \frac{1}{n}\sum_{i=1}^{n} \big[ -y_i \log(\hat{y}_i) - (1 - y_i)\log(1 - \hat{y}_i) \big]$
⬡ Hinge loss: $\sum_{i} \max(0,\, 1 - y_i \hat{y}_i)$; KL divergence, ...
⬡ Multiclass cross-entropy for C classes: $-\sum_{c=1}^{C} y_c \log(\hat{y}_c)$
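The formulas above translate directly into NumPy; this is a minimal sketch (the function names are ours), with probability clipping added to keep log(0) out of the cross-entropy:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

def binary_cross_entropy(y, p, eps=1e-12):
    """Log loss; p are predicted probabilities, clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p))

def hinge(y, s):
    """Hinge loss; labels y in {-1, +1}, s are raw scores."""
    return np.sum(np.maximum(0.0, 1.0 - y * s))

y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.6])
print(mse(y, p), binary_cross_entropy(y, p))
print(hinge(np.array([1.0, -1.0]), np.array([0.8, -0.3])))
```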
Variants of Gradient Descent (see the sketch after this list):
⬡ Batch Gradient Descent: vanilla gradient descent, aka batch gradient descent, computes the gradient of the cost function w.r.t. the parameters $\theta$ for the entire training dataset: $\theta = \theta - \eta\, \nabla_\theta J(\theta)$
⬡ Mini-Batch Gradient Descent: mini-batch gradient descent takes the best of both worlds and performs an update for every mini-batch of n training examples: $\theta = \theta - \eta\, \nabla_\theta J(\theta;\, x^{(i:i+n)};\, y^{(i:i+n)})$
⬡ Stochastic Gradient Descent: stochastic gradient descent (SGD) in contrast performs a parameter update for each training example $x^{(i)}$ and label $y^{(i)}$: $\theta = \theta - \eta\, \nabla_\theta J(\theta;\, x^{(i)};\, y^{(i)})$
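A minimal sketch contrasting the three update schemes on least-squares linear regression; the dataset, learning rate, and batch size of n = 10 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])   # synthetic targets with known weights
theta, eta = np.zeros(3), 0.05

def grad(theta, Xb, yb):
    """Gradient of J(theta) = mean squared error over the batch (Xb, yb)."""
    return 2.0 * Xb.T @ (Xb @ theta - yb) / len(yb)

for epoch in range(50):
    # Batch GD would do one update per epoch over the whole dataset:
    #   theta -= eta * grad(theta, X, y)
    # Mini-batch GD: one update per mini-batch of n = 10 examples
    for i in range(0, len(y), 10):
        theta -= eta * grad(theta, X[i:i+10], y[i:i+10])
    # SGD would use batches of size 1: grad(theta, X[i:i+1], y[i:i+1])

print(theta)   # approaches the true weights [1.0, -2.0, 0.5]
```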
Optimization Techniques
⬡ Momentum
⬡ Nesterov accelerated gradient
⬡ Adagrad
⬡ Adadelta
⬡ RMSprop
⬡ Adam
⬡ AdaMax
⬡ Nadam
⬡ AMSGrad

The softmax function converts its inputs, known as logits or logit scores, to values between 0 and 1, and also normalizes the outputs so that they all sum up to 1 (see the sketch below).
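A minimal, numerically stable softmax sketch; shifting the logits by their maximum is a standard trick and does not change the result:

```python
import numpy as np

def softmax(logits):
    """Map logits to (0, 1) so the outputs sum to 1."""
    z = logits - np.max(logits)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())   # e.g. [0.659 0.242 0.099], sums to 1.0
```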
Challenges:
⬡ Choosing a proper learning rate can be difficult. A learning rate that is
too small leads to painfully slow convergence, while a learning rate that
is too large can hinder convergence.
⬡ Another key challenge of minimizing highly non-convex error functions
common for neural networks is avoiding getting trapped in their
numerous suboptimal local minima.
Performance Evaluation
⬡ Training–Validation–Testing:
  parameter updating ↔ interim check ↔ check at the end; validation results drive modifications of the model architecture.
⬡ Bias vs. overfit: underfitting vs. overfitting.
⬡ n-fold validation: split the data into Fold-1, Fold-2, ..., Fold-n; train on n−1 folds and validate on the held-out fold, rotating through all folds (see the sketch below).
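A minimal n-fold validation sketch with n = 5, using a deliberately crude linear "model" so the example stays library-free; the data and model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels

n = 5
idx = rng.permutation(len(y))             # shuffle, then split into n folds
folds = np.array_split(idx, n)

scores = []
for k in range(n):
    test = folds[k]                       # held-out fold
    train = np.concatenate([folds[j] for j in range(n) if j != k])
    # Crude linear "model": sum of training points weighted by their label sign
    w = X[train].T @ (2 * y[train] - 1)
    pred = (X[test] @ w > 0).astype(int)
    scores.append(np.mean(pred == y[test]))   # accuracy on this fold
print(np.mean(scores))                        # average over all n folds
```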
More Terminologies
⬡ Multiclass confusion matrix (rows ↓True, columns Pred→): entry $Y_iY_j$ counts samples of true class $Y_i$ predicted as class $Y_j$. For class c:
$TP_c = Y_cY_c,\quad TN_c = \sum_{i \ne c,\ j \ne c} Y_iY_j,\quad FP_c = \sum_{i \ne c} Y_iY_c,\quad FN_c = \sum_{i \ne c} Y_cY_i$

⬡ Performance:
∙ Accuracy: $\frac{1}{N}\sum_{c=1,2..N} \frac{TP_c + TN_c}{TP_c + TN_c + FP_c + FN_c}$
∙ Sensitivity: $\frac{1}{N}\sum_{c=1,2..N} \frac{TP_c}{TP_c + FN_c}$
∙ Specificity: $\frac{1}{N}\sum_{c=1,2..N} \frac{TN_c}{TN_c + FP_c}$

Binary example (ACTUAL vs. TEST):
            TEST P    TEST N    total
ACTUAL P    TP = 150  FN = 25   175
ACTUAL N    FP = 25   TN = 800  825
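A sketch that derives TP, TN, FP, FN per class from a confusion matrix and averages the three metrics; the helper name is ours, and the binary matrix reuses the slide's example counts:

```python
import numpy as np

def per_class_metrics(cm):
    """Class-averaged accuracy, sensitivity, and specificity from a
    confusion matrix cm with rows = true class, columns = predicted."""
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp       # true class c, predicted elsewhere
    fp = cm.sum(axis=0) - tp       # predicted c, true class elsewhere
    tn = total - tp - fn - fp
    acc  = np.mean((tp + tn) / (tp + tn + fp + fn))
    sens = np.mean(tp / (tp + fn))
    spec = np.mean(tn / (tn + fp))
    return acc, sens, spec

# Binary example from the slide: TP=150, FN=25, FP=25, TN=800
cm = np.array([[150, 25],
               [25, 800]])
print(per_class_metrics(cm))   # accuracy 0.95; class-averaged sens./spec.
```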
Explainable and Interpretable model

Visual question answering
Machine Learning and Optimization (only a few)

Ø Mimicking biological learning → Artificial Neural Network
Ø Statistical learning → Hidden Markov Model
Ø Statistical learning → GMM & Expectation Maximization
Ø Fuzzy rule-based systems → Rule-based learning and expert systems
Ø Lagrangian optimization model → Support Vector Machine
Ø Information gain → Decision Tree and Random Forest
Ø Linear algebra → Linear & polynomial regression
Ø Statistical learning → Bayes classification
Evaluation of a system
⬡ Accuracy: A learning system extracts knowledge from training data, such that the learned knowledge is general enough to deal with unknown data. Methods of accuracy and error estimation quantitatively measure generalization capability.
⬡ Robustness: The machine can perform adequately under all circumstances, including cases where information is corrupted by noise, is incomplete, or is interfered with by irrelevant data. It can be assessed with a series of synthetic datasets representing increasing degrees of inconsistency in the data.
⬡ Computational complexity and speed: How much memory is required and how fast the system can learn.
⬡ Online learning: Whether the system continues to acquire knowledge from a real-time environment adaptively.
⬡ Interpretability: Level of understanding and insight into a model.
⬡ Scalability: Capability to build the learning machine using huge amounts of data.
Summary (Don't take it as the GOSPEL)
⬡ ML is a field of experiment and experience, not philosophy.

Group          Model                         Linear?   Powerful?  Easy?  Scales?   Plug & Play?
Linear Models  Linear & Logistic Regression  Y         N          Y      Y         Y
Basic          Bayes, DT, KNN                Possibly  N          Y      Possibly  Y
Ensemble       RF, GMM, HMM                  N         Y          N      Y         Y
SVM            SVM, Kernel Trick             Possibly  Y          N      N         Y
Complex ANN    MLP, Deep NN                  N         Y          N      Y         N
Thanks!
Any questions?

You can find me at:
suprava.patnaik@kiit.ac.in
