
MACHINE LEARNING

AND
PREDICTIVE MODELING

Prepared and presented by:

Rabi Kulshi
Contents

1. Why Machine Learning?
2. What is Machine Learning?
3. Types of Machine Learning
    Supervised Learning
        Regression
        Classification
    Unsupervised Learning
        Clustering
4. Implementation
    Regression Model
        Using the Normal Equation
        Using Gradient Descent
    Classification Model
        Using Logistic Regression
        Using Naïve Bayesian Conditional Probability
    Clustering
        Using K-Means

Introduction

Why Machine Learning?


 My first thought: Is it because humans can’t learn any more???
 Volume, velocity and variety of information drove the need for machine learning
 Big Data, Hadoop and analytic challenges
 It was easy for a human to estimate a house price based on 3-4 features like location, size, age and interest rate
 Imagine when you have to understand the 30-40 factors that drive a user’s decision to buy an image
 Imagine when you have to detect a fraud attempt that has 100-300 features (attributes) and attempts are arriving at 100 messages/sec

What is Machine Learning?

As defined in Wikipedia (Tom M. Mitchell): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

[Diagram: a data set of successfully labeled observations (Experience E) is used to train the model; the trained model handles incoming new events (Task T); Performance P is measured as the percentage of correct classifications.]
The Learning Model
with continuous knowledge update

[Diagram: successfully labeled observations (Experience E) train and continuously update the knowledge, i.e. the model f(x) mapping X{x1, x2, …, xn} to y; knowledge updates may be based on a sample or a filter, run in real time or in batch, and be supervised; incoming new events (Task T) are classified, and Performance P is measured as the percentage of correct classifications.]
Types of Machine Learning

• Supervised Learning
In supervised learning a supervisor provides a set of labeled data, also known as experience, that is used to train the machine to build knowledge (i.e. a model/function)

Example: (customer age, geo-location, profession, social group) → Purchase Image A (Yes or No)

• Unsupervised Learning
Find some structure or groups in the data; also known as a clustering technique
Example: news groups, cohesive groups in Facebook, customer groups

Examples of Supervised Learning
where outcome is a continuous variable

• Outcome or yield (Y) is a continuous variable, i.e. we are predicting a numeric value
Yeah, this is a regression problem
Example: revenue from on-line sales for next year; concurrent active user sessions

Here y is the outcome, which depends on m features or attributes x1, x2, …, xm:

$$y = f(x_1, x_2, \ldots, x_m)$$

Examples of Supervised Learning
where outcome is a discrete variable

• Outcome or yield is a discrete variable
Got it, this is a classification problem
• Outcome has two classes
Example: here the outcome Y will have one of two values: Yes or No, True or False
• Outcome has multiple classes
Example: classify email tone as Threat, Anger, Appreciation, Unhappy, …; i.e. the value of Y is either Threat, Anger or Appreciation

Regression Model, Visual Analysis
[Scatter plot with fitted regression line: x = passion for photography (feature, independent variable); y = on-line purchases of images per year, 0-4k (dependent variable, yield, result)]

The fitted line:

$$y = \beta_0 + \beta_1 x_1 = \beta_0 x_0 + \beta_1 x_1 = (\beta_0, \beta_1) \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} \;\Rightarrow\; Y = \beta^T X$$
Classification Model, Visual Analysis
[Scatter plot with a decision boundary separating two classes: x = time spent to search and make a decision, 0-30 minutes (feature); y = purchase amount in $, 0-200 (feature); each observation's distance from the decision boundary is indicated]

Clustering Model, Visual Analysis

[Scatter plot showing customer clusters: x = customer age (20-70); y = type of image purchased]

Key considerations and concepts for implementation
and development of Machine learning Models
 Visualize your data to identify features, and to understand and determine the model
 Carefully analyze the standard deviation and the percentage of false positives or false negatives
 Set users’ expectations about the probability of a false conclusion; this is very important
 Keep a balance between bias caused by underfitting and variance caused by overfitting
 May decide to use Training, Validation and Test steps (60%, 20%, 20%; see the split sketch after this list)
 Apply a regularization process and select the regularization parameter
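A minimal sketch (mine, not from the original deck) of the 60%/20%/20% split using NumPy; the function name and seed handling are illustrative:

```python
import numpy as np

def train_val_test_split(X, y, seed=0):
    """Shuffle, then split 60% / 20% / 20% into train / validation / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                 # random order of row indices
    n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```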

A Few Machine Learning Techniques
Machine Learning
 Supervised Learning
     Regression Model
         Normal Equation
         Gradient Descent
     Classification Model
         Logistic Regression (Gradient Descent)
         Naive Bayesian
         Support Vector Machine
 Unsupervised Learning
     Clustering Model
         K-Means
         Bisecting K-Means

Let’s take a break for questions

Linear Regression Model
Introduction to Linear model

y  f ( x)  a  bx  f ( x)  a  b x  f ( x)  a  b x  b x
11 11 2 2
f ( x)b  b x  b x  f ( x) b x  b x  b x
0 11 2 2 0 0 11 2 2
Where b a, x 1
0 0
f ( x)b x  b x  b x ........ b x
0 0 11 2 2 n n

y is the outcome, which depends on n features x1, x2, …, xn, where b_i is the parameter of x_i.

Linear Regression Model
Compact and Matrix representation of Linear model

$$f(x) = \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n = \sum_{i=0}^{n} \beta_i x_i \;\Rightarrow\; f(x) = \beta^T X$$

$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}, \qquad x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$
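As a quick illustration (mine, not the deck's), the compact form βᵀx is a single dot product in NumPy, with x₀ = 1 carrying the intercept; the values below are made up:

```python
import numpy as np

beta = np.array([2.0, 3.0, -1.0])  # [beta_0, beta_1, beta_2]
x = np.array([1.0, 4.0, 5.0])      # [x_0 = 1, x_1, x_2]
y_hat = beta @ x                   # f(x) = beta^T x = 2 + 12 - 5 = 9.0
```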
Linear Regression
Find beta values using the Normal Equation
Solve analytically using the well-known normal equation method:

$$\beta = (X^T X)^{-1} X^T y$$

 Well-known methodology
 Easy to implement
 Computationally inexpensive for a small to medium number of features (<1000)
 No need to select a learning rate, which is needed for gradient descent
 Pay attention to matrix-inversion issues in case of linear dependency among features
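A minimal NumPy sketch of the normal equation (illustrative, not from the deck). A column of ones is prepended so that β₀ acts as the intercept, and `np.linalg.solve` is used instead of an explicit inverse, so linear dependency among features fails loudly:

```python
import numpy as np

def fit_normal_equation(X, y):
    """beta = (X^T X)^{-1} X^T y for a linear model."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend x0 = 1 (intercept)
    # solve() raises LinAlgError if X^T X is singular (linearly dependent features)
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Usage: recover y ~ 2 + 3x from noisy data (names and values are illustrative)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(0, 0.5, size=50)
print(fit_normal_equation(x.reshape(-1, 1), y))  # approximately [2, 3]
```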

Linear Regression
Find beta values using Gradient Descent
Cost function with m observations:

$$C(\beta) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_\beta(x^{(i)}) - y^{(i)} \right)^2$$

Let’s understand the gradient descent technique using the example of an ant that wants to reach the bottom of a hat.

$$\beta_j := \beta_j - \alpha \frac{\partial}{\partial \beta_j} C(\beta) \;\Rightarrow\; \beta_j := \beta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_\beta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1, \ldots, n$$

where α is the learning rate. Iterate the update above to optimize the cost function; stop when the cost function no longer changes significantly between iterations (a sketch follows).
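A hedged sketch of batch gradient descent for the update above (it assumes X already contains the x₀ = 1 bias column and sensibly scaled features; `alpha`, `tol` and `max_iter` values are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, tol=1e-9, max_iter=100_000):
    """Minimize C(beta) = 1/(2m) * sum((X @ beta - y)^2)."""
    m, n = X.shape
    beta = np.zeros(n)
    prev_cost = np.inf
    for _ in range(max_iter):
        error = X @ beta - y               # f_beta(x^(i)) - y^(i), for all i
        beta -= alpha * (X.T @ error) / m  # simultaneous update of every beta_j
        cost = (error @ error) / (2 * m)
        if abs(prev_cost - cost) < tol:    # stop when cost barely changes
            break
        prev_cost = cost
    return beta
```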

Classification Model
Solution using Logistic Regression

 Classification Model
 Y, the outcome, can take one of two values: 0 = negative (No), 1 = positive (Yes)
 Email spam, email threat, fraudulent transaction
 Multi-class classification problem: tone of the message
 The approach
 Compute the decision boundary based on the training data; that boundary is then used to classify an observation as YES or NO, depending on which side of the decision boundary the observation falls
 Depending on the dimension (i.e. the number of features), the decision boundary will be a line, a plane or a hyperplane (one degree less than the number of features)

Logistic Regression Model

[Scatter plot with a decision boundary: x = time spent to search and make a decision, 0-30 minutes (feature); y = purchase amount in $ (feature); observations are classified by their distance from the decision boundary]

Logistic Regression Model
$$f_\beta(x) = \frac{1}{1 + e^{-\beta^T x}}$$

$$f_\beta(x) = g(\beta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

Predict Y = 1 if g(z) ≥ 0.5, i.e. z = βᵀx ≥ 0
Predict Y = 0 if g(z) < 0.5, i.e. z = βᵀx < 0

The Model
 0 ≤ f(x) ≤ 1
 The function g is known as the sigmoid function
 The cost function becomes a convex function
 Decision boundary
 Now we need to optimize the cost function for logistic regression to estimate beta, using gradient descent (a sketch follows)
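A minimal sketch of logistic regression trained with gradient descent (it assumes X includes the x₀ = 1 bias column; hyperparameters are illustrative). The gradient has the same form as in the linear case, with the sigmoid applied to βᵀx:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)), squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iter=5000):
    """Estimate beta by gradient descent on the logistic cost."""
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(n_iter):
        error = sigmoid(X @ beta) - y      # f_beta(x^(i)) - y^(i)
        beta -= alpha * (X.T @ error) / m  # simultaneous update of every beta_j
    return beta

def predict(X, beta):
    """Y = 1 exactly when g(beta^T x) >= 0.5, i.e. beta^T x >= 0."""
    return (X @ beta >= 0).astype(int)
```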

Classification Model
Solution using Bayesian Model
• Based on conditional probability
• It assumes features are statistically independent variables
• We find the probability for a given observation to be classified as class cᵢ, for i = 1, 2, …, n
• The observation is classified as the type/category for which this conditional probability is maximum

$$p(c_i \mid x_1, x_2, \ldots, x_n) = \frac{p(x_1, x_2, \ldots, x_n \mid c_i)\, p(c_i)}{p(x_1, x_2, \ldots, x_n)}$$

$$p(x_1, x_2, \ldots, x_n \mid c_i) = p(x_1 \mid c_i)\, p(x_2 \mid c_i) \cdots p(x_n \mid c_i)$$

$$\Rightarrow\; p(c_i \mid x_1, x_2, \ldots, x_n) = \frac{p(x_1 \mid c_i)\, p(x_2 \mid c_i) \cdots p(x_n \mid c_i)\, p(c_i)}{p(x_1, x_2, \ldots, x_n)}$$
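A sketch of a naive Bayes classifier (mine, not the deck's). It uses Gaussian per-feature likelihoods as one common choice for continuous features, works in log space to avoid underflow, and drops the denominator p(x₁, …, xₙ) since it is the same for every class:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior p(c_i) and per-feature Gaussian p(x_j | c_i) per class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),       # prior p(c_i)
                     Xc.mean(axis=0),        # per-feature means
                     Xc.var(axis=0) + 1e-9)  # per-feature variances (smoothed)
    return params

def predict_nb(X, params):
    """Pick the class maximizing log p(c_i) + sum_j log p(x_j | c_i)."""
    classes = list(params)
    scores = []
    for c in classes:
        prior, mu, var = params[c]
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
        scores.append(np.log(prior) + log_lik.sum(axis=1))
    return np.array(classes)[np.argmax(scores, axis=0)]
```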
Clustering Model with K-Means

[Scatter plot showing clusters found by K-Means: x = customer age (20-70); y = type of image purchased]
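A minimal sketch of the standard K-Means (Lloyd's) algorithm, illustrative only (empty-cluster handling is omitted for brevity): repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its assigned points:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate nearest-centroid assignment and centroid recomputation."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # distance of every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids
```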

Let’s take some questions

