You are on page 1of 103

Developing a Machining

Learning Models from


Start to Finish..

Dr. A. OBULESU
Assoc. Professor
1

Learn by Doing…
2

A little about me

 Certified By NVDIA and Leadingindia.ai


 In charge- MALAI Club in Association with
Wave Lab
 Convener- MALAI Research Wing
 BoS Member of CSE & AI
... @ Anurag University

1982 – 2020 . . .
3

Agenda

① Key Terms of Machine Learning

② Steps for Designing a Models

③ Machine learning Models

④ Clarifications on Models
4

Key Terms in Machine


Learning
5

What is Machine Learning? P


Ideall
UL B
y I
Field of study that gives "computers the
S H

ability to learn without being explicitly

programmed"

-- Arthur Samuel, 1959


6

Ma
 Machine learning is about predicting the future
chinbased on
the past.
-- Hal Daume III e
Lear
past future
ning ic t
rn e d
lea r
Training model/ Testing
is…
model/
p

Data predictor Data predictor


7

Machine Learning Types


Classification
Classification
Supervised
Regression
Human Unsupervised
Unsupervised
Clustering
Supervision?
Reinforcement Clustering

Machine Learn Online


nline
Learning Incrementally ?
Batch Processing

Instance based
How They Generalize? Modl based
Model Based
Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 9
1
Few Popular Applications: Precision Agriculture, Learner Profiling, Video Captioning, Exploring Patterns from Satellite
images, Image detection in Healthcare, Identifying specific markers in Genomes, Creating Art and Music,
Recommendations, behavior prediction,
Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 10
1
Three main areas where Deep learning is being prominently applied
Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 12
1
1. Problems where a) There is no deterministic algorithm (not even of evil complexity) e.g.
Recognizing a 3D object from a given scene, Handwriting recognition, Speech recognition

2. Problems which don’t have a fix solution and goal posts keep changing. System adapts
and learns from experience e.g. SPAM emails, Financial fraud, IT Security Framework

3. Where Solutions are Individual specific or time dependent. e.g. recommendations and
targeted advertisements

4. For prediction based on past and existing patterns (not defined or defined by huge
number of weak rules) e.g. prediction of share prices etc.

For What kind of Applications we use Machine/Deep Learning


Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 13
1
Why it has taken off now
Availability of Data
has increased due
to explosion in
Smart Mobiles and
devices

More Computing Release/development


Power is of new algorithms,
available due to APIs and Platforms
coming of for Deep Learning
NVIDIA GPUs Applications

Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 14


1
Ess
Git and Github
enti
Python
Jupyter Notebooks al
Too
Numpy for Matrix and Vector calculations
Pandas for Data handling, curation and manipulation
Matplotlib for plotting of graphs for different analysis on data
So many other Python APIs to make your life easy to develop DL models
ls

Thursday, September 30, 202 leadingindia.ai A nationwide AI Skilling and 15


1
Deep Learning Magic competing with
Natural Intelligence
Types of Learning Algorithms

Thursday, September 30, 20 leadingindia.ai A nationwide AI skilling and Research Initiative 17


21
Features
The features are the elements of your input vectors. The number of features is
equal to the number of nodes in the input layer of the network
Category Features
Housing Prices No. of Rooms, House Area, Air Pollution, Distance from facilities, Economic Index city, Security
Ranking etc.
Spam Detection presence or absence of certain email headers, the email structure, the language, the
frequency of specific terms, the grammatical correctness of the text etc.
Speech Recognition noise ratios, length of sounds, relative power of sounds, filter matches
Cancer Detection Clump thickness, Uniformity of cell size, Uniformity of cell shape, Marginal adhesion, Single
epithelial cell size, Number of bare nuclei, Bland chromatin, Number of normal nuclei, Mitosis
etc.
Cyber Attacks IP address, Timings, Location, Type of communication, traffic details etc.
Video Text matches, Ranking of the video, Interest overlap, history of seen videos, browsing patterns
Recommendations etc.
Image Classification Pixel values, Curves, Edges etc.

Thursday, September 30, 202 leadingindia.ai A nationwide AI skilling and 18


1
Weights
Weights correspond to each feature.
Weights denote how much the feature matters in the model.
Higher weight of a particular feature means that it is more important in
deciding the outcome of the model.

Weights of a feature represent that how much evidence it gives in favor


or against the current hypothesis in context of the existence or non-
existence of the pattern you are trying to identify in the current input.

Generally weights are initialized randomly.


we try to bring them to near optimal values so that they are able to fit
the model well and can help in prediction of unseen values
Thursday, September 30, 202 leadingindia.ai A nationwide AI skilling and 19
1
8

Supervised learning
label
• Feed The Label Dataset to The Model…

apple

apple

banana
In My Words: Supervised Learning is
banana Learning By Us …With Teacher…

Supervised learning: given labeled examples


Training Data 10

Ma
chin
rn e i ct
lea pr
e d
Lear
model/
predictor
ning
is…
Supervised
Learning
11

Supervised learning Contd..

model/ predicted label


predictor

Test Data
12

Unsupervised learning
In My Words: Un-Supervised
Learning is Learning With Out
Teacher…

• Feed The Dataset With out Label to The Model…

Unsupervised learning: given data, i.e. examples, But NO LABELS


13

Reinforcement learning
•Given a sequence of examples/states and a reward after
completing that sequence, learn to predict the action to take in
for an individual example/state
… WIN! In My Words:
Feedback System
… LOSE!

14
Lets us understand with use cases?
P
UL B
I
S H
Collect the data automatically Self driving car on the road Accurate results in Google search

Speech recognition in Mobile

Netflix Movie recommendation Product recommendation


!!! Many…!!!
15

Machine Learning Tools

MLlib - Distributed Simpl


e
16

Steps for Designing a ML


Model
1. Pycharm

2. notebook
3. kaggle note book
4.Google colab note book
5.Microsoft jupyter note book
17

1. Define Appropriately the


Problem
• What is the main objective? What are we trying to predict?
• What are the target features?
• What is the input data? Is it available?
• What kind of problem are we facing? Binary classification?
Clustering?
• What is the expected improvement?
18

* Define Appropriately the Problem : Case Study –


Predicting of Cardiovascular Risk
• Objective : Predicting of risk level !!!
• Input data is available from Kaggle, UCI etc.
• This problem is multi-class classification
• Predicting of Accurate class among:
• 0 – No Risk
• 1- Mild Risk
• 2- Acute Risk
• 3- Serious Risk 4 –Very serious Risk
19

2. Collecting Data
This is the first real step towards the real development of a machine learning
model
The more and better data that we get , The better model we will design
There are many sources calling Web scraping like
 Kaggle
 UCI Machine Learning Repository: Created as an ftp archive in 1987 by
David Aha and fellow graduate students at UC Irvine
20

2. Case study :
 Taken files from Kaggle
 features.csv
 target.csv
21

2. Case study :
 Taken files from Kaggle
 features.csv : Features Names
 target.csv : Features Name
22

3. Choose the Measure of Success

• “If you can’t measure it you can’t improve it”.

•Regression problems use certain evaluation


metrics such as mean squared error (MSE).

•Classification problems use evaluation metrics as


precision, accuracy and recall.
23

4. Setting an Evaluation Protocol



 The reason to split data in three parts is to avoid information
leaks.
 The main inconvenient of this method is that if there is little data
available, the validation and test sets will contain so few samples
that the tuning and evaluatation processes of the model will not be
effective
24

4. Setting an Evaluation Protocol


25

4. Setting an Evaluation Protocol : Case study- K-fold


validation

26

5. Preparing the Data



 We should transform our data in a way that can be fed into a Machine

Learning model

 Dealing with missing data

Handling Caterogical Data

 Feature Scaling : Normalization, Standardization

Selecting Meaningful Features : PCA


27

4.1 Over fitting and Under fitting



•The model is starting to underfit: it has learned so not well the

training data. Small Dataset, Simple Model

•The model is starting to overfit: it has learned so well the training

data that has learned patterns that are too specific to training data and

irrelevant to new data.


28

4.1 Over fitting and Under fitting



While training the Model, two issues we should consider

•Optimization is the process of adjusting a model to get the best

performance possible on training data (the learning process).

•Generalization is how well the model performs on unseen data. The

goal is to obtain the best generalization ability.


29

4.1 Over fitting and Under fitting



30

4.1 Two ways to avoid this overfitting

 •Getting more data is usually the best solution, a model trained on more data

will naturally generalize better.

 Regularization 

•L1 regularization: The cost is proportional to the absolute the value of the

weights coefficients (L1 norm of the weights).

•L2 regularization: The cost is proportional to the square of the value of the
31

5. Developing a Benchmark model


 To develop a benchamark model that serves us as a baseline

 Benchmarking requires experiments to be comparable,

measurable, and reproducible


32

6. Developing a Better Model & Tunning its Hyper


parameters

 Finding Good model

 A number of folds in which we will split our data.

 A scoring method (that will vary depending on the problem’s

nature — regression, classification…).

 Some appropriate algorithms that we want to check .


33

Machine Learning Models


3

Machine Classifiers Models

① Logistic Regression

② KNN Classifiers

③ Stochastic Gradient Descent

④ SVM
34

1. Regression
• Predict future scores on Y based on measured scores on X

 Predictions are based on a correlation from a sample where both
X and Y were measured.
Equation is linear:
y = bx + a
y = predicted score on y
x = measured score on x
b = slope
a = y-intercept
35

Relationship between Variables


1. Relationship Between Variables In a Linear Function

Population Population Slope Random Error


Y-Intercept

Y i   0   1X i  i
Dependent Independent (Explanatory) Variable
(Response) Variable
(e.g., COVID-19.) (e.g., age)
things to know
- You need data labeled with the correct
answers to "train" these algorithms before
they work
- feature = dimension = attribute of the data
- class = category = Harry Po!er house
linear discriminants
draw a line through it
Linear discriminants
draw a line through it
Linear discriminants

!
6 wrong
4 wrong
a map of shittiness
to find the least shitty line

shittine
ss

slo
pe t
rcep
inte
linear discriminants:
probably donot use these
logistic regression
divide it with a log function
Divide it with a log function
logistic regression
!!!!!!!!!!!
+ gives you probabilities
+ the model is a formula
+ can ÒthresholdÓ to make model more or les
conservative
- only works with linear decision boundaries
36

Linear Vs. Logistic

 When the prediction of outcome for the given


input is in a range or continuous – Linear regression
 When the prediction of the outcome is in a
discrete form – Logistic Regression
37

Linear Vs. Logistic Regression


38

Logistic regression 
That the probability can not be negative, so we introduce a term called

exponential in our normal regression model to make it logistic regression.

 Since the probability can never be greater than 1, we need to divide our

outcome by something bigger than itself

 Regression formula give us Y using formula Yi = β0 + β1X+ εi.


39

Logistic regression Contd…


 We have to use exponential so that it does not become negative and hence we get
P = exp(β0 + β1X+ εi).
We divide that P by something bigger than itself so that it remains less than one and hence
P = e( β0 + β1X+ εi)/e( β0 + β1X+ εi) +1
After doing some calculations that formula in 3rd step can be re-written as
log(p/(1-p)) = β0 + β1X+ εi.
 log(p/(1-p)) is called the odds of probability.
If you look closely it is the probability of desired outcome being true divided by the
probability of desired outcome not being true and this is called LOGIT FUNCTION.
40

KNN Classifier
• Data set:
• Training (labeled) data: T = {(x i , yi)}
• x i ∈ Rp
• Test (unlabeled) data: x0 ∈ Rp
• Tasks:
• Classification: yi ∈ {1, . . . , J}
• Given new x0 predict y0
• Methods:
• Model-based
• Memory-based
41

Classification

Heart attack risk assessment (source:


Alpaydin ...)
43

KNN Classifier
• 1 NN
• Predict the same value/class as the nearest instance in
the training set
• k NN
• Find the k closest training points (small ǁxi − x0ǁ
according to some metric, for ex. euclidean,
manhattan, etc.)
45

k NN - Example

source: Duda, Hart ...


KNN (k-nearest neighbors)
what do similar cases look like?Ó
KNN (k-nearest neighbors)
Òwhat do similar cases look like?Ó
k=1
KNN (k-nearest neighbors)
Òwhat do similar cases look like?Ó
k=2
KNN (k-nearest neighbors)
Òwhat do similar cases look like?Ó
k=1
KNN (k-nearest neighbors)
Òwhat do similar cases look like?Ó
k=2
KNN (k-nearest neighbors)
Òwhat do similar cases look like?Ó
k=3
KNN (k-nearest neighbors)
Òwhat do similar cases look like?Ó
KNN (k-nearest neighbors)
Òwhat do similar cases look like?Ó

!!!!!!!!!!!
+ no training, adding new data is easy
+ you get to deÞne ÒdistanceÓ!

"""""""""""
- can be outlier-sensitive
- you haveto deÞne ÒdistanceÓ
46

k NN Classification

• Calculate distances of all training vectors to test


vector
• Pick k closest vectors
• Calculate average/majority
47

k NN Pros. & Cons.

• + Easy to understand and program


• + Explicit reject option
• if there is no majority agreement
• + Easy handling of missing values
• restrict distance calculation to
subspace
• + asymptotic misclassification rate (as the number of data
points n → ∞ ) is bounded above by twice the Bayes error
rate. (see Duda, Hart...)
48

k NN Pros. & Cons. Contd…

• - Affected by local structure


• - Sensitive to noise, irrelevant features
• - Computationally expensive O(nd)
• - Large memory requirements
• - More frequent classes dominate result (if distance not weighed in)
• - Curse of dimensionality: high nr. of dimensions and low nr. of
training samples:
• "nearest" neighbor might be very far
• in high dimensions "nearest" becomes meaningless
49

• Choice of k

• smaller k ⇒ higher variance (less stable)

• larger k ⇒ higher bias (less precise)

• Proper choice of k dependends on the data:


• Adaptive methods, heuristics
• Cross-validation
50

3.Stochastic Gradient Descent Classifier

• Gradient descent is an iterative algorithm, that starts from a


random point on a function and travels down its slope in steps
until it reaches the lowest point of that function.
Capable of handling large datasets
Deals with training instances independently
Well suited for online training
3.1 Performance measure - Cross Validation
Here K =10
51

4.Support Vector Machine


•SVM is to find a hyper plane in an N-dimensional space (N-
Number of features) that distinctly classifies the data points.
52

Support Vector Machine Contd..


53

Geometric Margin
Definition: The margin of example 𝑥 w.r.t. a linear sep. 𝑤 is the distance
from 𝑥 to the plane 𝑤 ⋅ 𝑥 = 0.

Definition: The margin 𝛾𝑤 of a set of examples 𝑆 wrt a linear separator 𝑤


is the smallest margin over points 𝑥 ∈ 𝑆.
Definition: The margin 𝛾 of a set of examples 𝑆 is the maximum
𝛾𝑤 over all linear separators 𝑤 . w
𝛾
- 𝛾 +
+
- +
- + +
- -
-
-
-
- better to definition of "Shitty”
-lines can turn into non-linear
shapes if you transform your data
"
"
Òthe kernel trickÓ
SVMs (support vector machines)
Ò*advanced* draw a line through itÓ
!!!!!!!!!!!
• works well on a lot of different shapes of data
thanks to the kernel trick

not super easy to explain to people can


only kinda do probabilities
55

Thank you !

obuleshcse@cvsr.ac.in
7

AI can Achieve

Robotics

Domain Rule Based


Specific Expert
Artificial Intelligence System
Computing

Machine
Learning
DL
12

Classification
Classification : Predicting of Discrete Value .i.e. Apple or Banana or 0 or 1

Applications
• Face recognition
• Character recognition ,
• Spam detection,
• Medical diagnosis: From symptoms to illnesses,
• Biometrics: Recognition/authentication using physical and/or behavioral
characteristics: Face, iris, signature, etc..
14

Supervised learning: Regression


label
Regression : Predicting label is
-4.5
real-valued
10.1

3.2

4.3

Supervised learning: given labeled examples


13

Regression Applications
• Economics/Finance: predict the value of a stock

• Epidemiology

• Car/plane navigation: angle of the steering wheel,


acceleration, …

• Temporal trends: weather over time


•…
16

Unsupervised learning
• Learn clusters/groups without any label

•Customer segmentation (i.e. grouping)

•Image compression

•Bioinformatics: learn motifs,


• …..

You might also like