Developing A Machining Learning Models From Start To Finish.

Developing a Machining
Learning Models from

Start to Finish..
Dr. A. OBULESU
Assoc. Professor
1
Learn by Doing…
2
A little about me
 Certified By NVDIA and Leadingindia.ai

 In charge- MALAI Club in Association with
Wave Lab
 Convener- MALAI Research Wing
 BoS Member of CSE & AI
... @ Anurag University
1982 – 2020 . . .
3
Agenda
① Key Terms of Machine Learning
② Steps for Designing a Models
③ Machine learning Models
④ Clarifications on Models
4
Key Terms in Machine

Learning
5
What is Machine Learning? P

Ideall
UL B
y I
Field of study that gives "computers the
S H
ability to learn without being explicitly
programmed"
-- Arthur Samuel, 1959

6
Ma
 Machine learning is about predicting the future
chinbased on
the past.
-- Hal Daume III e
Lear
past future
ning ic t
rn e d
lea r
Training model/ Testing
is…
model/
p
Data predictor Data predictor

7
Machine Learning Types

Classification
Classification
Supervised
Regression
Human Unsupervised
Unsupervised
Clustering
Supervision?
Reinforcement Clustering
Machine Learn Online

nline
Learning Incrementally ?
Batch Processing
Instance based
How They Generalize? Modl based
Model Based
Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 9
1
Few Popular Applications: Precision Agriculture, Learner Profiling, Video Captioning, Exploring Patterns from Satellite
images, Image detection in Healthcare, Identifying specific markers in Genomes, Creating Art and Music,
Recommendations, behavior prediction,
1
Three main areas where Deep learning is being prominently applied
1
1. Problems where a) There is no deterministic algorithm (not even of evil complexity) e.g.
Recognizing a 3D object from a given scene, Handwriting recognition, Speech recognition
2. Problems which don’t have a fix solution and goal posts keep changing. System adapts
and learns from experience e.g. SPAM emails, Financial fraud, IT Security Framework
3. Where Solutions are Individual specific or time dependent. e.g. recommendations and
targeted advertisements
4. For prediction based on past and existing patterns (not defined or defined by huge
number of weak rules) e.g. prediction of share prices etc.
For What kind of Applications we use Machine/Deep Learning

1
Why it has taken off now
Availability of Data
has increased due
to explosion in
Smart Mobiles and
devices
More Computing Release/development

Power is of new algorithms,
available due to APIs and Platforms
coming of for Deep Learning
NVIDIA GPUs Applications

1
Ess
Git and Github
enti
Python
Jupyter Notebooks al
Too
Numpy for Matrix and Vector calculations
Pandas for Data handling, curation and manipulation
Matplotlib for plotting of graphs for different analysis on data
So many other Python APIs to make your life easy to develop DL models
ls

1
Deep Learning Magic competing with
Natural Intelligence
Types of Learning Algorithms
Thursday, September 30, 20 leadingindia.ai A nationwide AI skilling and Research Initiative 17

21
Features
The features are the elements of your input vectors. The number of features is
equal to the number of nodes in the input layer of the network
Category Features
Housing Prices No. of Rooms, House Area, Air Pollution, Distance from facilities, Economic Index city, Security
Ranking etc.
Spam Detection presence or absence of certain email headers, the email structure, the language, the
frequency of specific terms, the grammatical correctness of the text etc.
Speech Recognition noise ratios, length of sounds, relative power of sounds, filter matches
Cancer Detection Clump thickness, Uniformity of cell size, Uniformity of cell shape, Marginal adhesion, Single
epithelial cell size, Number of bare nuclei, Bland chromatin, Number of normal nuclei, Mitosis
etc.
Cyber Attacks IP address, Timings, Location, Type of communication, traffic details etc.
Video Text matches, Ranking of the video, Interest overlap, history of seen videos, browsing patterns
Recommendations etc.
Image Classification Pixel values, Curves, Edges etc.
Thursday, September 30, 202 leadingindia.ai A nationwide AI skilling and 18

1
Weights
Weights correspond to each feature.
Weights denote how much the feature matters in the model.
Higher weight of a particular feature means that it is more important in
deciding the outcome of the model.
Weights of a feature represent that how much evidence it gives in favor

or against the current hypothesis in context of the existence or non-
existence of the pattern you are trying to identify in the current input.
Generally weights are initialized randomly.

we try to bring them to near optimal values so that they are able to fit
the model well and can help in prediction of unseen values
Thursday, September 30, 202 leadingindia.ai A nationwide AI skilling and 19
1
8
Supervised learning
label
• Feed The Label Dataset to The Model…
apple
apple
banana
In My Words: Supervised Learning is
banana Learning By Us …With Teacher…
Supervised learning: given labeled examples

Training Data 10
Ma
chin
rn e i ct
lea pr
e d
Lear
model/
predictor
ning
is…
Supervised
Learning
11
Supervised learning Contd..
model/ predicted label

predictor
Test Data
12
Unsupervised learning
In My Words: Un-Supervised
Learning is Learning With Out
Teacher…
• Feed The Dataset With out Label to The Model…
Unsupervised learning: given data, i.e. examples, But NO LABELS

13
Reinforcement learning
•Given a sequence of examples/states and a reward after
completing that sequence, learn to predict the action to take in
for an individual example/state
… WIN! In My Words:
Feedback System
… LOSE!
…
14
Lets us understand with use cases?
P
UL B
I
S H
Collect the data automatically Self driving car on the road Accurate results in Google search
Speech recognition in Mobile
Netflix Movie recommendation Product recommendation

!!! Many…!!!
15
Machine Learning Tools
MLlib - Distributed Simpl

e
16
Steps for Designing a ML

Model
1. Pycharm
2. notebook
3. kaggle note book
4.Google colab note book
5.Microsoft jupyter note book
17
1. Define Appropriately the

Problem
• What is the main objective? What are we trying to predict?
• What are the target features?
• What is the input data? Is it available?
• What kind of problem are we facing? Binary classification?
Clustering?
• What is the expected improvement?
18
* Define Appropriately the Problem : Case Study –

Predicting of Cardiovascular Risk
• Objective : Predicting of risk level !!!
• Input data is available from Kaggle, UCI etc.
• This problem is multi-class classification
• Predicting of Accurate class among:
• 0 – No Risk
• 1- Mild Risk
• 2- Acute Risk
• 3- Serious Risk 4 –Very serious Risk
19
2. Collecting Data
This is the first real step towards the real development of a machine learning
model
The more and better data that we get , The better model we will design
There are many sources calling Web scraping like
 Kaggle
 UCI Machine Learning Repository: Created as an ftp archive in 1987 by
David Aha and fellow graduate students at UC Irvine
20
2. Case study :
 Taken files from Kaggle
 features.csv
 target.csv
21
2. Case study :
 Taken files from Kaggle
 features.csv : Features Names
 target.csv : Features Name
22
3. Choose the Measure of Success
• “If you can’t measure it you can’t improve it”.
•Regression problems use certain evaluation

metrics such as mean squared error (MSE).
•Classification problems use evaluation metrics as

precision, accuracy and recall.
23
4. Setting an Evaluation Protocol

•
 The reason to split data in three parts is to avoid information
leaks.
 The main inconvenient of this method is that if there is little data
available, the validation and test sets will contain so few samples
that the tuning and evaluatation processes of the model will not be
effective
24
4. Setting an Evaluation Protocol
•
25
4. Setting an Evaluation Protocol : Case study- K-fold

validation
•
26
5. Preparing the Data

•
 We should transform our data in a way that can be fed into a Machine
Learning model
 Dealing with missing data
Handling Caterogical Data
 Feature Scaling : Normalization, Standardization
Selecting Meaningful Features : PCA

27
4.1 Over fitting and Under fitting

•
•The model is starting to underfit: it has learned so not well the
training data. Small Dataset, Simple Model
•The model is starting to overfit: it has learned so well the training
data that has learned patterns that are too specific to training data and
irrelevant to new data.

28

•
While training the Model, two issues we should consider
•Optimization is the process of adjusting a model to get the best
performance possible on training data (the learning process).
•Generalization is how well the model performs on unseen data. The
goal is to obtain the best generalization ability.

29

•
30
4.1 Two ways to avoid this overfitting
 •Getting more data is usually the best solution, a model trained on more data
will naturally generalize better.
 Regularization
•L1 regularization: The cost is proportional to the absolute the value of the
weights coefficients (L1 norm of the weights).
•L2 regularization: The cost is proportional to the square of the value of the
31
5. Developing a Benchmark model
•
 To develop a benchamark model that serves us as a baseline
 Benchmarking requires experiments to be comparable,
measurable, and reproducible

32
6. Developing a Better Model & Tunning its Hyper

parameters
•
 Finding Good model
 A number of folds in which we will split our data.
 A scoring method (that will vary depending on the problem’s
nature — regression, classification…).
 Some appropriate algorithms that we want to check .

33
Machine Learning Models

3
Machine Classifiers Models
① Logistic Regression
② KNN Classifiers
③ Stochastic Gradient Descent
④ SVM
34
1. Regression
• Predict future scores on Y based on measured scores on X

 Predictions are based on a correlation from a sample where both
X and Y were measured.
Equation is linear:
y = bx + a
y = predicted score on y
x = measured score on x
b = slope
a = y-intercept
35
Relationship between Variables

1. Relationship Between Variables In a Linear Function
Population Population Slope Random Error

Y-Intercept
Y i   0   1X i  i
Dependent Independent (Explanatory) Variable
(Response) Variable
(e.g., COVID-19.) (e.g., age)
things to know
- You need data labeled with the correct
answers to "train" these algorithms before
they work
- feature = dimension = attribute of the data
- class = category = Harry Po!er house
linear discriminants
draw a line through it
Linear discriminants
draw a line through it
Linear discriminants
!
6 wrong
4 wrong
a map of shittiness
to find the least shitty line
shittine
ss
slo
pe t
rcep
inte
linear discriminants:
probably donot use these
logistic regression
divide it with a log function
Divide it with a log function
logistic regression
!!!!!!!!!!!
+ gives you probabilities
+ the model is a formula
+ can ÒthresholdÓ to make model more or les
conservative
- only works with linear decision boundaries
36
Linear Vs. Logistic
 When the prediction of outcome for the given

input is in a range or continuous – Linear regression
 When the prediction of the outcome is in a
discrete form – Logistic Regression
37
Linear Vs. Logistic Regression

38
Logistic regression
That the probability can not be negative, so we introduce a term called
exponential in our normal regression model to make it logistic regression.
 Since the probability can never be greater than 1, we need to divide our
outcome by something bigger than itself
 Regression formula give us Y using formula Yi = β0 + β1X+ εi.

39
Logistic regression Contd…

 We have to use exponential so that it does not become negative and hence we get
P = exp(β0 + β1X+ εi).
We divide that P by something bigger than itself so that it remains less than one and hence
P = e( β0 + β1X+ εi)/e( β0 + β1X+ εi) +1
After doing some calculations that formula in 3rd step can be re-written as
log(p/(1-p)) = β0 + β1X+ εi.
 log(p/(1-p)) is called the odds of probability.
If you look closely it is the probability of desired outcome being true divided by the
probability of desired outcome not being true and this is called LOGIT FUNCTION.
40
KNN Classifier
• Data set:
• Training (labeled) data: T = {(x i , yi)}
• x i ∈ Rp
• Test (unlabeled) data: x0 ∈ Rp
• Tasks:
• Classification: yi ∈ {1, . . . , J}
• Given new x0 predict y0
• Methods:
• Model-based
• Memory-based
41
Classification
Heart attack risk assessment (source:

Alpaydin ...)
43
KNN Classifier
• 1 NN
• Predict the same value/class as the nearest instance in
the training set
• k NN
• Find the k closest training points (small ǁxi − x0ǁ
according to some metric, for ex. euclidean,
manhattan, etc.)
45
k NN - Example
source: Duda, Hart ...

KNN (k-nearest neighbors)
what do similar cases look like?Ó
Òwhat do similar cases look like?Ó
k=1
k=2
k=1
k=2
k=3
!!!!!!!!!!!
+ no training, adding new data is easy
+ you get to deÞne ÒdistanceÓ!
"""""""""""
- can be outlier-sensitive
- you haveto deÞne ÒdistanceÓ
46
k NN Classification
• Calculate distances of all training vectors to test

vector
• Pick k closest vectors
• Calculate average/majority
47
k NN Pros. & Cons.
• + Easy to understand and program

• + Explicit reject option
• if there is no majority agreement
• + Easy handling of missing values
• restrict distance calculation to
subspace
• + asymptotic misclassification rate (as the number of data
points n → ∞ ) is bounded above by twice the Bayes error
rate. (see Duda, Hart...)
48
k NN Pros. & Cons. Contd…
• - Affected by local structure

• - Sensitive to noise, irrelevant features
• - Computationally expensive O(nd)
• - Large memory requirements
• - More frequent classes dominate result (if distance not weighed in)
• - Curse of dimensionality: high nr. of dimensions and low nr. of
training samples:
• "nearest" neighbor might be very far
• in high dimensions "nearest" becomes meaningless
49
• Choice of k
• smaller k ⇒ higher variance (less stable)
• larger k ⇒ higher bias (less precise)
• Proper choice of k dependends on the data:

• Adaptive methods, heuristics
• Cross-validation
50
3.Stochastic Gradient Descent Classifier
• Gradient descent is an iterative algorithm, that starts from a

random point on a function and travels down its slope in steps
until it reaches the lowest point of that function.
Capable of handling large datasets
Deals with training instances independently
Well suited for online training
3.1 Performance measure - Cross Validation
Here K =10
51
4.Support Vector Machine

•SVM is to find a hyper plane in an N-dimensional space (N-
Number of features) that distinctly classifies the data points.
52
Support Vector Machine Contd..

53
Geometric Margin
Definition: The margin of example 𝑥 w.r.t. a linear sep. 𝑤 is the distance
from 𝑥 to the plane 𝑤 ⋅ 𝑥 = 0.
Definition: The margin 𝛾𝑤 of a set of examples 𝑆 wrt a linear separator 𝑤

is the smallest margin over points 𝑥 ∈ 𝑆.
Definition: The margin 𝛾 of a set of examples 𝑆 is the maximum
𝛾𝑤 over all linear separators 𝑤 . w
𝛾
- 𝛾 +
+
- +
- + +
- -
-
-
-
- better to definition of "Shitty”
-lines can turn into non-linear
shapes if you transform your data
"
"
Òthe kernel trickÓ
SVMs (support vector machines)
Ò*advanced* draw a line through itÓ
!!!!!!!!!!!
• works well on a lot of different shapes of data
thanks to the kernel trick
not super easy to explain to people can

only kinda do probabilities
55
Thank you !
obuleshcse@cvsr.ac.in
7
AI can Achieve
Robotics
Domain Rule Based

Specific Expert
Artificial Intelligence System
Computing
Machine
Learning
DL
12
Classification
Classification : Predicting of Discrete Value .i.e. Apple or Banana or 0 or 1
Applications
• Face recognition
• Character recognition ,
• Spam detection,
• Medical diagnosis: From symptoms to illnesses,
• Biometrics: Recognition/authentication using physical and/or behavioral
characteristics: Face, iris, signature, etc..
14
Supervised learning: Regression

label
Regression : Predicting label is
-4.5
real-valued
10.1
3.2
4.3
Supervised learning: given labeled examples

13
Regression Applications
• Economics/Finance: predict the value of a stock
• Epidemiology
• Car/plane navigation: angle of the steering wheel,

acceleration, …
• Temporal trends: weather over time

•…
16
Unsupervised learning
• Learn clusters/groups without any label
•Customer segmentation (i.e. grouping)
•Image compression
•Bioinformatics: learn motifs,

• …..

Developing A Machining Learning Models From Start To Finish.

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Developing A Machining Learning Models From Start To Finish.

Uploaded by

Copyright:

Available Formats

Developing a Machining

Learning Models from

 Certified By NVDIA and Leadingindia.ai

① Key Terms of Machine Learning

② Steps for Designing a Models

③ Machine learning Models

Key Terms in Machine

What is Machine Learning? P

ability to learn without being explicitly

-- Arthur Samuel, 1959

Data predictor Data predictor

Machine Learning Types

Machine Learn Online

For What kind of Applications we use Machine/Deep Learning

More Computing Release/development

Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 14

Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 15

Thursday, September 30, 20 leadingindia.ai A nationwide AI skilling and Research Initiative 17

Thursday, September 30, 202 leadingindia.ai A nationwide AI skilling and 18

Weights of a feature represent that how much evidence it gives in favor

Generally weights are initialized randomly.

Supervised learning: given labeled examples

Supervised learning Contd..

model/ predicted label

• Feed The Dataset With out Label to The Model…

Unsupervised learning: given data, i.e. examples, But NO LABELS

Speech recognition in Mobile

Netflix Movie recommendation Product recommendation

Machine Learning Tools

MLlib - Distributed Simpl

Steps for Designing a ML

1. Define Appropriately the

* Define Appropriately the Problem : Case Study –

3. Choose the Measure of Success

• “If you can’t measure it you can’t improve it”.

•Regression problems use certain evaluation

•Classification problems use evaluation metrics as

4. Setting an Evaluation Protocol

4. Setting an Evaluation Protocol

4. Setting an Evaluation Protocol : Case study- K-fold

5. Preparing the Data

 Dealing with missing data

Handling Caterogical Data

 Feature Scaling : Normalization, Standardization

Selecting Meaningful Features : PCA

4.1 Over fitting and Under fitting

training data. Small Dataset, Simple Model

•The model is starting to overfit: it has learned so well the training

irrelevant to new data.

4.1 Over fitting and Under fitting

•Optimization is the process of adjusting a model to get the best

performance possible on training data (the learning process).

•Generalization is how well the model performs on unseen data. The

goal is to obtain the best generalization ability.

4.1 Over fitting and Under fitting

4.1 Two ways to avoid this overfitting

will naturally generalize better.

weights coefficients (L1 norm of the weights).

5. Developing a Benchmark model

 Benchmarking requires experiments to be comparable,

measurable, and reproducible

6. Developing a Better Model & Tunning its Hyper

 A number of folds in which we will split our data.