Gradient Descent: A Popular ML Algorithm
www.kiit.ac.in
[Figure: DATA and machine learning algorithm, linked to cloud computing, WSN, IoT, and computational resources (CPU/GPU/TPU/IPU).]
Definition
Ø AI: Intelligence demonstrated by machines. Machines are programmed to interpret and act like human beings.
Ø ML: Machine learning is a subset of AI that provides systems the ability to learn automatically and improve from experience (training data) without being explicitly programmed.
[Figure: nested fields. Artificial intelligence: automation using rules (sensor & rule-based technology). Machine learning: data pattern analytics, statistical methods. Deep learning: fast computation, beyond handcrafted structure and patterns.]
[Figure: ML Timeline, with years 1950, 1962, 1975, and 1995 marked. Labels: engineering of making intelligent devices; sensors, actuators; statistics, probability; automation (machine language, 1G); ANN (artificial neural network); statistics driven; learning of patterns without being explicitly programmed; FL (rule-based expert systems); GA (evolution theory); pattern driven; 2G (assembly language).]
[Figure: training data is used to optimize and tune the model.]
[Figure: training, supervised vs. reinforcement.]
Learning and Applications
Ø Supervised learning, e.g. regression: risk assessment, score prediction, share market price
Ø Unsupervised learning, e.g. clustering: targeted marketing, city planning, biology
Ø Reinforcement learning
Popular Applications
Ø Image processing & computer vision
Ø Learning association: target marketing
Ø Statistical arbitrage
Ø Autonomous driving, robotics
Ø Medical diagnosis, function approximation
Ø Time series trend & analysis
Ø Fraud/spam detection
Ø Traffic monitoring & resource scheduling

Pattern Recognition
Ø Handwritten character recognition
Ø Pose estimation
Ø Transformation using GAN
Ø Converting 2D to 3D
Ø Medical image analysis
Speech Processing & Analysis
Ø Word recognition
Ø Formant synthesis
Ø Speaker recognition
Ø Articulatory synthesis
Ø Speech to text
Ø Language translation
Ø Music synthesis
Ø Medical diagnosis
Milestones in the history of ML
[Figure: timeline in three eras: AI as a concept, no real-world application (1950~1980); military and academia interest (1980~2005); large tech companies invest in commercial applications (2005~onwards). Events shown: 1956, McCarthy introduces the term AI; other NN models (MLP, AM, SOFM); AI winter; 2002, Roomba (robotic vacuum cleaner) learns to navigate homes; 2011, IBM's Watson wins the TV show Jeopardy. Other years marked: 1964, 1997, 2011, 2012-, 2014.]
Leaders in the Cloud
⬡ Amazon Web Services
⬡ Google Cloud Platform
⬡ IBM Cloud
⬡ Microsoft Azure
⬡ Alibaba Cloud
Vendors
⬡ Nauto: learning platform
⬡ Tempus: data-driven precision medicine
⬡ Phrasee: natural language processing
⬡ Siemens: energy, electrification, digitalization, automation
⬡ Socure: banking and investment
⬡ Blue River Technology: smart farming
⬡ Nvidia: CUDA GPU
⬡ Intel: chip maker
⬡ Zebra Medical Vision, Iris.AI, Freenome
⬡ Graphcore (IPU)
Popular Machine Learning Methods
[Figure: networks (attention model); statistical, probabilistic, and inferential approaches; performance augmentation strategies.]
Gradient descent is an iterative algorithm that starts from a random point on a function and travels down its slope in steps until it reaches the lowest point of that function (for a non-convex function, possibly only a local minimum).
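A minimal sketch of that loop, assuming a hypothetical one-dimensional objective f(x) = (x − 3)², a fixed learning rate of 0.1, and a small-step stopping rule; the function names and tolerance are illustrative choices, not taken from the slides.

```python
import random


def f(x):
    """Hypothetical objective: a convex bowl whose lowest point is x = 3."""
    return (x - 3.0) ** 2


def grad_f(x):
    """Slope of f at x; gradient descent moves against it."""
    return 2.0 * (x - 3.0)


def gradient_descent(grad, x0, learning_rate=0.1, tol=1e-8, max_steps=10_000):
    """Take steps of size learning_rate * slope until a step becomes negligible."""
    x = x0
    for step in range(1, max_steps + 1):
        delta = learning_rate * grad(x)
        x -= delta  # move downhill
        if abs(delta) < tol:
            return x, step
    return x, max_steps


random.seed(0)
start = random.uniform(-10.0, 10.0)  # "starts from a random point"
x_min, steps = gradient_descent(grad_f, start)
print(f"start={start:.3f} -> x={x_min:.6f}, f(x)={f(x_min):.2e}, steps={steps}")
```

On this quadratic each step shrinks the distance to the minimum by a factor of 1 − 2 × 0.1 = 0.8, so the loop stops after a modest number of steps.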
1. Model representation (layers, neurons in each layer)
2. Activation function σ (linear, sigmoid, ReLU, ...)
3. Bias node
4. Cost function (MSE, cross entropy, ...)
5. Forward propagation calculation
6. Backpropagation algorithm
7. Code implementation

[Figure: MLP with weights w_{1,1} … w_{1,4}, w_{2,1}, w_{2,2}, layer weight matrices W^{[1]} and W^{[2]}, and activation σ.]
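As a rough sketch of steps 1 to 7 above (not the author's implementation), the following pure-Python code builds a 2-2-1 MLP with sigmoid activations, a bias node per unit, and an MSE cost, computes forward propagation and the backpropagated gradients, and trains by batch gradient descent. The weight names follow the w_{i,j} pattern used in these slides (x_i reaches hidden unit j through w_{i,j}; hidden unit j reaches the output through w_{3,j}). The OR data set, learning rate, and epoch count are hypothetical. A finite-difference check is included because it is a cheap, independent way to confirm hand-written backpropagation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x1, x2):
    """Forward propagation through a 2-2-1 MLP (sigmoid everywhere, one bias per node)."""
    w11, w21, w12, w22, w31, w32, b1, b2, b3 = params
    h1 = sigmoid(x1 * w11 + x2 * w21 + b1)
    h2 = sigmoid(x1 * w12 + x2 * w22 + b2)
    y_hat = sigmoid(h1 * w31 + h2 * w32 + b3)
    return h1, h2, y_hat


def loss_and_grads(params, data):
    """MSE cost over the data set and its gradient w.r.t. every parameter (backpropagation)."""
    _, _, _, _, w31, w32, _, _, _ = params
    n = len(data)
    loss, grads = 0.0, [0.0] * len(params)
    for x1, x2, y in data:
        h1, h2, y_hat = forward(params, x1, x2)
        loss += (y_hat - y) ** 2 / n
        # Output node: dLoss/dz3 = dLoss/dy_hat * sigma'(z3), with sigma' = s * (1 - s)
        d3 = 2.0 * (y_hat - y) / n * y_hat * (1.0 - y_hat)
        # Hidden nodes: send d3 back through w31 / w32, times each node's own sigma'
        d1 = d3 * w31 * h1 * (1.0 - h1)
        d2 = d3 * w32 * h2 * (1.0 - h2)
        # Same order as params: w11, w21, w12, w22, w31, w32, b1, b2, b3
        for k, g in enumerate((d1 * x1, d1 * x2, d2 * x1, d2 * x2,
                               d3 * h1, d3 * h2, d1, d2, d3)):
            grads[k] += g
    return loss, grads


def numerical_grads(params, data, eps=1e-6):
    """Central finite differences: an independent check on the backpropagated gradient."""
    out = []
    for k in range(len(params)):
        up, down = list(params), list(params)
        up[k] += eps
        down[k] -= eps
        out.append((loss_and_grads(up, data)[0] - loss_and_grads(down, data)[0]) / (2 * eps))
    return out


def train(params, data, learning_rate=2.0, epochs=5000):
    """Batch gradient descent on all nine parameters."""
    for _ in range(epochs):
        _, grads = loss_and_grads(params, data)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params


# Hypothetical toy data set (x1, x2, target): logical OR
DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]

rng = random.Random(0)
initial = [rng.uniform(-1.0, 1.0) for _ in range(9)]
analytic = loss_and_grads(initial, DATA)[1]
numeric = numerical_grads(initial, DATA)
print("max |backprop - finite difference|:",
      max(abs(a - b) for a, b in zip(analytic, numeric)))

trained = train(initial, DATA)
print("MSE before:", loss_and_grads(initial, DATA)[0])
print("MSE after: ", loss_and_grads(trained, DATA)[0])
```

The first print should be tiny (the two gradients agree), and the second pair should show the cost dropping as training proceeds.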
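Forward Propagation and Backpropagation (Three Layer MLP)
[Figure: three layer MLP as a computation graph, as carried out by TensorFlow. Forward propagation runs from the inputs x_1, x_2 through the hidden nodes h_1, h_2 to the output; backward propagation sends the weight updates Δw_{3,1}, Δw_{3,2}, Δw_{1,1}, Δw_{2,1}, Δw_{1,2}, Δw_{2,2} back from the output.]

Sigmoid activation and its derivative:
$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

Forward propagation:
$$h_1 = \sigma(x_1 w_{1,1} + x_2 w_{2,1}), \qquad h_2 = \sigma(x_1 w_{1,2} + x_2 w_{2,2}), \qquad \hat{y} = \sigma(h_1 w_{3,1} + h_2 w_{3,2})$$

Backward propagation by the chain rule (each σ' is evaluated at the weighted input of the node it belongs to):
$$\frac{\partial \hat{y}}{\partial h_1} = w_{3,1}\,\sigma', \qquad \frac{\partial \hat{y}}{\partial h_2} = w_{3,2}\,\sigma'$$
$$\frac{\partial \hat{y}}{\partial x_1} = \frac{\partial \hat{y}}{\partial h_1}\frac{\partial h_1}{\partial x_1} + \frac{\partial \hat{y}}{\partial h_2}\frac{\partial h_2}{\partial x_1}, \qquad \frac{\partial h_1}{\partial x_1} = w_{1,1}\,\sigma', \qquad \frac{\partial h_2}{\partial x_1} = w_{1,2}\,\sigma'$$
$$\frac{\partial \hat{y}}{\partial x_2} = \frac{\partial \hat{y}}{\partial h_1}\frac{\partial h_1}{\partial x_2} + \frac{\partial \hat{y}}{\partial h_2}\frac{\partial h_2}{\partial x_2}, \qquad \frac{\partial h_1}{\partial x_2} = w_{2,1}\,\sigma', \qquad \frac{\partial h_2}{\partial x_2} = w_{2,2}\,\sigma'$$

Further reading on loss functions: https://medium.com/ml-cheat-sheet/winning-at-loss-functions-common-loss-functions-that-you-should-know-a72c1802ecb4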
Loss Functions
⬡ Mean squared error (MSE): $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
⬡ Mean absolute error (MAE), Huber loss, ...
⬡ Binary cross-entropy or log-loss error: aims to reduce the entropy of the predicted probability distribution in binary classification problems.
$$\text{BCE} = \frac{1}{n}\sum_{i=1}^{n}\left[-y_i \log(\hat{y}_i) - (1 - y_i)\log(1 - \hat{y}_i)\right]$$
⬡ Hinge loss $= \sum_{i=1}^{n}\max(0,\, 1 - y_i \hat{y}_i)$, KL divergence, ...
⬡ Multiclass cross-entropy for C classes: $-\sum_{c=1}^{C} y_c \log(\hat{y}_c)$
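The losses listed above can be written directly from their formulas. This sketch assumes plain Python lists; it averages MSE, MAE, and binary cross-entropy over the n samples, sums the hinge loss over the samples, and evaluates multiclass cross-entropy for one one-hot sample. The small `eps` clip that keeps `log()` finite is an added safeguard, not part of the formulas.

```python
import math


def mse(y, y_hat):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y)


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y, y_hat)) / len(y)


def binary_cross_entropy(y, p, eps=1e-12):
    """Log-loss; y in {0, 1}, p = predicted probability of class 1 (clipped so log() stays finite)."""
    total = 0.0
    for t, q in zip(y, p):
        q = min(max(q, eps), 1.0 - eps)
        total += -t * math.log(q) - (1 - t) * math.log(1 - q)
    return total / len(y)


def hinge(y, scores):
    """Summed hinge loss; y in {-1, +1}, scores are raw model outputs."""
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y, scores))


def categorical_cross_entropy(y_one_hot, p, eps=1e-12):
    """Cross-entropy of one sample over C classes: -sum_c y_c * log(p_c)."""
    return -sum(t * math.log(max(q, eps)) for t, q in zip(y_one_hot, p))


print(mse([1, 0, 1], [0.9, 0.2, 0.6]))
print(mae([1, 0, 1], [0.9, 0.2, 0.6]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
print(hinge([1, -1], [2.0, 0.5]))
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```

For these inputs MSE ≈ 0.07, MAE ≈ 0.233, binary cross-entropy = ln 2 ≈ 0.693, hinge loss = 1.5, and multiclass cross-entropy = −ln 0.7 ≈ 0.357.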
Variants of Gradient Descent:
⬡ Batch Gradient Descent: Vanilla gradient descent, aka batch gradient descent, computes the gradient of the cost function w.r.t. the parameters W for the entire training dataset, where η is the learning rate:
$$W = W - \eta \, \nabla_W J(W)$$
⬡ Stochastic Gradient Descent: Stochastic gradient descent (SGD), in contrast, performs a parameter update for each training example $x^{(i)}$ and label $y^{(i)}$:
$$W = W - \eta \, \nabla_W J(W; x^{(i)}; y^{(i)})$$
⬡ Mini-Batch Gradient Descent: Mini-batch gradient descent takes the best of both worlds and performs an update for every mini-batch of n training examples:
$$W = W - \eta \, \nabla_W J(W; x^{(i:i+n)}; y^{(i:i+n)})$$
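The three variants differ only in how many examples feed each gradient estimate, which this sketch makes explicit: one training routine, minimizing MSE for a linear model y ≈ w·x + b, becomes batch, mini-batch, or stochastic gradient descent depending on `batch_size`. The linear model, the noise-free toy data on y = 2x + 1, and the hyperparameters are assumptions chosen for illustration.

```python
import random


def gradient(w, b, batch):
    """Gradient of J(w, b) = mean((w*x + b - y)**2) over one batch of (x, y) pairs."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def train(data, batch_size, learning_rate=0.1, epochs=1000, seed=0):
    """batch_size == len(data): batch GD; batch_size == 1: SGD; in between: mini-batch GD."""
    rng = random.Random(seed)
    data = list(data)
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # fresh example order each epoch
        for i in range(0, len(data), batch_size):
            gw, gb = gradient(w, b, data[i:i + batch_size])
            w -= learning_rate * gw  # W = W - eta * grad_W J
            b -= learning_rate * gb
    return w, b


# Hypothetical noise-free data on the line y = 2x + 1, x in [0, 1)
DATA = [(i / 20, 2 * (i / 20) + 1) for i in range(20)]

for name, size in [("batch", len(DATA)), ("mini-batch", 5), ("stochastic", 1)]:
    w, b = train(DATA, size)
    print(f"{name:>10}: w={w:.4f}, b={b:.4f}, updates/epoch={len(DATA) // size}")
```

Batch GD makes one update per epoch, mini-batch GD four, and SGD twenty, so for the same number of epochs the smaller batches take many more, noisier steps.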
Optimization Techniques
⬡ Momentum
⬡ Nesterov accelerated gradient
⬡ Adagrad
⬡ Adadelta
⬡ RMSprop
⬡ Adam
⬡ AdaMax
⬡ Nadam
⬡ AMSGrad
The softmax function converts its inputs, known as logits or logit scores, to values between 0 and 1, and also normalizes the outputs so that they all sum to 1.
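Two of the ideas above in code, as an illustrative sketch: a softmax that subtracts the largest logit before exponentiating (a standard numerical-stability trick that leaves the result unchanged), and the Adam update applied to a single scalar parameter. β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the commonly cited Adam defaults; the objective (x − 3)² and the learning rate 0.1 are hypothetical.

```python
import math


def softmax(logits):
    """Map logits to values in (0, 1) that sum to 1; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam on one scalar parameter: momentum (m) scaled by the RMS of past gradients (v)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first moment: momentum
        v = beta2 * v + (1 - beta2) * g * g  # second moment: RMSprop-style scaling
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
print(adam_minimize(lambda x: 2 * (x - 3), x0=0.0))  # gradient of (x - 3)**2
```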
Challenges:
⬡ Choosing a proper learning rate can be difficult. A learning rate that is
too small leads to painfully slow convergence, while a learning rate that
is too large can hinder convergence.
⬡ Another key challenge of minimizing highly non-convex error functions
common for neural networks is avoiding getting trapped in their
numerous suboptimal local minima.
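The learning-rate challenge is easy to see on a hypothetical objective f(x) = x², whose gradient is 2x: each gradient-descent step multiplies x by (1 − 2η), so a small η crawls, a moderate η converges quickly, and any η > 1 makes |1 − 2η| > 1, so the iterates grow instead of shrinking. The three rates below are illustrative.

```python
def run(learning_rate, x0=1.0, steps=20):
    """Gradient descent on f(x) = x**2; each step multiplies x by (1 - 2 * learning_rate)."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x


for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr:<5} x after 20 steps = {run(lr):.3g}")
```

Starting from x = 1, twenty steps leave x ≈ 0.67 for η = 0.01 (slow), x ≈ 1e-14 for η = 0.4 (fast), and x ≈ 38 for η = 1.1 (diverging).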
Performance Evaluation
⬡ Training-Validation-Testing: parameter updating ↔ interim check ↔ check at the end; the interim (validation) check feeds back into modifying the model architecture.
⬡ Bias vs. overfitting: underfitting ↔ overfitting
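A minimal sketch of the training-validation-testing split, assuming one random shuffle with a fixed seed and a 70/15/15 split; the fractions and the function name are illustrative, and in practice splits are often stratified by class as well.

```python
import random


def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut into three disjoint parts:
    training (parameter updating), validation (interim check), test (check at the end)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # → 70 15 15
```

The test part should be touched only once, at the end; repeatedly tuning against it turns it into a second validation set.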
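∙ Class-wise counts: in the confusion matrix, entry $Y_iY_j$ counts samples whose ACTUAL class is i and whose TEST (predicted) class is j. For class c: $TP_c = Y_cY_c$, $FN_c = \sum_{i \ne c} Y_cY_i$, $FP_c = \sum_{i \ne c} Y_iY_c$, and $TN_c = \sum_{i \ne c,\, j \ne c} Y_iY_j$.
∙ Accuracy:
$$\frac{1}{N}\sum_{c=1}^{N}\frac{TP_c + TN_c}{TP_c + TN_c + FP_c + FN_c}$$
∙ Sensitivity:
$$\frac{1}{N}\sum_{c=1}^{N}\frac{TP_c}{TP_c + FN_c}$$
∙ Specificity:
$$\frac{1}{N}\sum_{c=1}^{N}\frac{TN_c}{TN_c + FP_c}$$
(N is the number of classes.)

Example (two classes, 1000 samples):
ACTUAL \ TEST | P | N | Total
P | TP = 150 | FN = 25 | 175
N | FP = 25 | TN = 800 | 825
Total | 175 | 825 | 1000

Sensitivity of class P = 150/175 ≈ 0.857; specificity of class P = 800/825 ≈ 0.970.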
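The per-class counts and the class-averaged metrics can be computed straight from a confusion matrix. This sketch assumes rows are actual classes and columns are test (predicted) classes, matching the Y_cY_i notation above, and reuses the two-class example with TP = 150, FN = 25, FP = 25, TN = 800; the function names are hypothetical.

```python
def per_class_counts(cm, c):
    """cm[i][j] = samples whose actual class is i and predicted (test) class is j."""
    n = len(cm)
    tp = cm[c][c]
    fn = sum(cm[c][j] for j in range(n) if j != c)  # actual c, predicted another class
    fp = sum(cm[i][c] for i in range(n) if i != c)  # predicted c, actually another class
    tn = sum(cm[i][j] for i in range(n) for j in range(n) if i != c and j != c)
    return tp, tn, fp, fn


def macro_metrics(cm):
    """Accuracy, sensitivity, and specificity, each averaged over the N classes."""
    n = len(cm)
    acc = sens = spec = 0.0
    for c in range(n):
        tp, tn, fp, fn = per_class_counts(cm, c)
        acc += (tp + tn) / (tp + tn + fp + fn)
        sens += tp / (tp + fn)
        spec += tn / (tn + fp)
    return acc / n, sens / n, spec / n


# Worked example: rows = actual (P, N), columns = test (P, N)
CM = [[150, 25],
      [25, 800]]

tp, tn, fp, fn = per_class_counts(CM, 0)
print(f"class P: sensitivity={tp / (tp + fn):.3f}, specificity={tn / (tn + fp):.3f}")
print("macro (accuracy, sensitivity, specificity):", macro_metrics(CM))
```

For a two-class matrix the macro averages blend both classes' views, so macro sensitivity and macro specificity coincide here (both ≈ 0.913), while accuracy is 950/1000 = 0.95.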
Explainable and Interpretable model
Visual question answering
Machine Learning and Optimization (Only
[Figure: topics linking machine learning and optimization: mimicking biological learning; optimization; Lagrangian; support vector machine; information; random forest; linear algebra; linear & polynomial regression; statistical learning; Bayes classification.]
Evaluation of a system
⬡ Accuracy: A learning system extracts knowledge from training data such that the learned knowledge is general enough to deal with unknown data. Accuracy and error-estimation methods quantitatively measure this generalization capability.
⬡ Robustness: The machine performs adequately under all circumstances, including cases where the information is corrupted by noise, is incomplete, or is interfered with by irrelevant data. Robustness can be assessed with a series of synthetic data sets representing increasing degrees of inconsistency in the data.
⬡ Computational complexity and speed: How much memory is required and how fast the system can learn.
⬡ Online learning: The ability to keep acquiring knowledge adaptively from a real-time environment.
⬡ Interpretability: The level of understanding and insight a model provides.
⬡ Scalability: The capability to build the learning machine using huge amounts of data.
Summary (Don't take it as the GOSPEL)
⬡ ML is a field of experiment and experience, not philosophy

Group | Models | Linear? | Powerful? | Easy? | Scales? | Plug & Play?
Linear Models | Linear & Logistic Regression | Y | N | Y | Y | Y
Basic | Bayes, DT, KNN | Possibly | N | Y | Possibly | Y
Ensemble | RF, GMM, HMM | N | Y | N | Y | Y