
MACHINE LEARNING

AND
PREDICTIVE MODELING

Prepared and presented by:

Rabi Kulshi
Contents

1. Why Machine Learning?
2. What is Machine Learning?
3. Types of Machine Learning
    Supervised Learning
        Regression
        Classification
    Unsupervised Learning
        Clustering
4. Implementation
    Regression Model
        Using the Normal Equation
        Using Gradient Descent
    Classification Model
        Using Logistic Regression
        Using Naïve Bayesian Conditional Probability
    Clustering
        Using K-Means

Introduction

Why Machine Learning?


 My first thought: Is it because humans can’t learn any more???
 Volume, velocity and variety of information drove the need for machine learning
 Big Data, Hadoop and analytic challenges
 It was easy for a human to estimate a house price based on 3-4 features like location, size, age and interest rate
 Imagine when you have to understand the 30-40 factors that drive a user’s decision to buy an image
 Imagine when you have to detect a fraud attempt that has 100-300 features (attributes) and attempts are arriving at 100 messages/sec

What is Machine Learning?

As defined in Wikipedia (Tom M. Mitchell): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

[Diagram: a data set of successfully labeled observations (Experience E) is used to train the model; the trained model handles incoming new events (Task T); Performance P is measured as the percentage of correct classifications.]
The Learning Model
with continuous knowledge update

[Diagram: successfully labeled observations (Experience E) train and continuously update the knowledge, i.e. the model f(x) mapping X{x1, x2, …, xn} to y; knowledge updates may be based on a sample or a filter, run in real time or in batch, and be supervised; incoming new events (Task T) are classified, and Performance P is measured as the percentage of correct classifications.]
Types of Machine Learning

• Supervised Learning
In supervised learning a supervisor provides a set of labeled data, also known as experience, that is used to train the machine to build knowledge (i.e. a model/function)

Example: (customer age, geo-location, profession, social group) → Purchase Image A (Yes or No)

• Unsupervised Learning
Find some structure or groups in the data; also known as a clustering technique
Example: news groups, cohesive groups in Facebook, customer groups

Examples of Supervised Learning
where outcome is a continuous variable

• Outcome or yield (Y) is a continuous variable, i.e. we are predicting a numeric value
Yeah, this is a regression problem
Example: revenue from on-line sales for next year; concurrent active user sessions

Here y is the outcome, which depends on m features or attributes x1, x2, …, xm:

$$y = f(x_1, x_2, \ldots, x_m)$$

Examples of Supervised Learning
where outcome is a discrete variable

• Outcome or yield is a discrete variable
Got it, this is a classification problem
• Outcome has two classes
Example: here the outcome Y will have one of two values: Yes or No, True or False
• Outcome has multiple classes
Example: classify email tone as Threat, Anger, Appreciation, Unhappy, …; i.e. the value of Y is either Threat, Anger or Appreciation

Regression Model, Visual Analysis
[Scatter plot with fitted regression line: x = passion for photography (feature, independent variable); y = on-line purchases of images per year, 0-4k (dependent variable, yield, result)]

The fitted line:

$$y = \beta_0 + \beta_1 x_1 = \beta_0 x_0 + \beta_1 x_1 = (\beta_0, \beta_1) \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} \;\Rightarrow\; Y = \beta^T X$$
Classification Model, Visual Analysis
[Scatter plot with a decision boundary separating two classes: x = time spent to search and make a decision, 0-30 minutes (feature); y = purchase amount in $, 0-200 (feature); each observation's distance from the decision boundary is indicated]

Clustering Model, Visual Analysis

[Scatter plot showing customer clusters: x = customer age (20-70); y = type of image purchased]

Key considerations and concepts for implementation
and development of Machine learning Models
 Visualize your data to identify features, and to understand and determine the model
 Carefully analyze the standard deviation and the percentage of false positives or false negatives
 Set users’ expectations about the probability of a false conclusion; this is very important
 Keep a balance between bias caused by underfitting and variance caused by overfitting
 May decide to use Training, Validation and Test steps (60%, 20%, 20%; see the split sketch after this list)
 Apply a regularization process and select the regularization parameter
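A minimal sketch (mine, not from the original deck) of the 60%/20%/20% split using NumPy; the function name and seed handling are illustrative:

```python
import numpy as np

def train_val_test_split(X, y, seed=0):
    """Shuffle, then split 60% / 20% / 20% into train / validation / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                 # random order of row indices
    n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```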

A Few Machine Learning Techniques
Machine Learning
 Supervised Learning
     Regression Model
         Normal Equation
         Gradient Descent
     Classification Model
         Logistic Regression (Gradient Descent)
         Naive Bayesian
         Support Vector Machine
 Unsupervised Learning
     Clustering Model
         K-Means
         Bisecting K-Means

Let’s take a break for questions

Linear Regression Model
Introduction to Linear model

y  f ( x)  a  bx  f ( x)  a  b x  f ( x)  a  b x  b x
11 11 2 2
f ( x)b  b x  b x  f ( x) b x  b x  b x
0 11 2 2 0 0 11 2 2
Where b a, x 1
0 0
f ( x)b x  b x  b x ........ b x
0 0 11 2 2 n n

y is the outcome, which depends on n features x1, x2, …, xn, where b_i is the parameter of x_i.

Linear Regression Model
Compact and Matrix representation of Linear model

$$f(x) = \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n = \sum_{i=0}^{n} \beta_i x_i \;\Rightarrow\; f(x) = \beta^T X$$

$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}, \qquad x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$
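As a quick illustration (mine, not the deck's), the compact form βᵀx is a single dot product in NumPy, with x₀ = 1 carrying the intercept; the values below are made up:

```python
import numpy as np

beta = np.array([2.0, 3.0, -1.0])  # [beta_0, beta_1, beta_2]
x = np.array([1.0, 4.0, 5.0])      # [x_0 = 1, x_1, x_2]
y_hat = beta @ x                   # f(x) = beta^T x = 2 + 12 - 5 = 9.0
```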
Linear Regression
Find beta values using the Normal Equation
Solve analytically using the well-known normal equation method:

$$\beta = (X^T X)^{-1} X^T y$$

 Well-known methodology
 Easy to implement
 Computationally inexpensive for a small to medium number of features (<1000)
 No need to select a learning rate, which is needed for gradient descent
 Pay attention to matrix-inversion issues in case of linear dependency among features
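A minimal NumPy sketch of the normal equation (illustrative, not from the deck). A column of ones is prepended so that β₀ acts as the intercept, and `np.linalg.solve` is used instead of an explicit inverse, so linear dependency among features fails loudly:

```python
import numpy as np

def fit_normal_equation(X, y):
    """beta = (X^T X)^{-1} X^T y for a linear model."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend x0 = 1 (intercept)
    # solve() raises LinAlgError if X^T X is singular (linearly dependent features)
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Usage: recover y ~ 2 + 3x from noisy data (names and values are illustrative)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(0, 0.5, size=50)
print(fit_normal_equation(x.reshape(-1, 1), y))  # approximately [2, 3]
```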

Linear Regression
Find beta values using Gradient Descent
Cost function with m observations:

$$C(\beta) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_\beta(x^{(i)}) - y^{(i)} \right)^2$$

Let’s understand the gradient descent technique using the example of an ant that wants to reach the bottom of a hat.

$$\beta_j := \beta_j - \alpha \frac{\partial}{\partial \beta_j} C(\beta) \;\Rightarrow\; \beta_j := \beta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_\beta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1, \ldots, n$$

where α is the learning rate. Iterate the update above to optimize the cost function; stop when the cost function no longer changes significantly between iterations (a sketch follows).
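A hedged sketch of batch gradient descent for the update above (it assumes X already contains the x₀ = 1 bias column and sensibly scaled features; `alpha`, `tol` and `max_iter` values are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, tol=1e-9, max_iter=100_000):
    """Minimize C(beta) = 1/(2m) * sum((X @ beta - y)^2)."""
    m, n = X.shape
    beta = np.zeros(n)
    prev_cost = np.inf
    for _ in range(max_iter):
        error = X @ beta - y               # f_beta(x^(i)) - y^(i), for all i
        beta -= alpha * (X.T @ error) / m  # simultaneous update of every beta_j
        cost = (error @ error) / (2 * m)
        if abs(prev_cost - cost) < tol:    # stop when cost barely changes
            break
        prev_cost = cost
    return beta
```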

Classification Model
Solution using Logistic Regression

 Classification Model
 Y, the outcome, can take one of two values: 0 = negative (No), 1 = positive (Yes)
 Email spam, email threat, fraudulent transaction
 Multi-class classification problem: tone of the message
 The approach
 Compute the decision boundary based on the training data; that boundary is then used to classify an observation as YES or NO, depending on which side of the decision boundary the observation falls
 Depending on the dimension (i.e. the number of features), the decision boundary will be a line, a plane or a hyperplane (one degree less than the number of features)

Logistic Regression Model

[Scatter plot with a decision boundary: x = time spent to search and make a decision, 0-30 minutes (feature); y = purchase amount in $ (feature); observations are classified by their distance from the decision boundary]

Logistic Regression Model
$$f_\beta(x) = \frac{1}{1 + e^{-\beta^T x}}$$

$$f_\beta(x) = g(\beta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

Predict Y = 1 if g(z) ≥ 0.5, i.e. z = βᵀx ≥ 0
Predict Y = 0 if g(z) < 0.5, i.e. z = βᵀx < 0

The Model
 0 ≤ f(x) ≤ 1
 The function g is known as the sigmoid function
 The cost function becomes a convex function
 Decision boundary
 Now we need to optimize the cost function for logistic regression to estimate beta, using gradient descent (a sketch follows)
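A minimal sketch of logistic regression trained with gradient descent (it assumes X includes the x₀ = 1 bias column; hyperparameters are illustrative). The gradient has the same form as in the linear case, with the sigmoid applied to βᵀx:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)), squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iter=5000):
    """Estimate beta by gradient descent on the logistic cost."""
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(n_iter):
        error = sigmoid(X @ beta) - y      # f_beta(x^(i)) - y^(i)
        beta -= alpha * (X.T @ error) / m  # simultaneous update of every beta_j
    return beta

def predict(X, beta):
    """Y = 1 exactly when g(beta^T x) >= 0.5, i.e. beta^T x >= 0."""
    return (X @ beta >= 0).astype(int)
```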

Classification Model
Solution using Bayesian Model
• Based on conditional probability
• It assumes features are statistically independent variables
• We find the probability for a given observation to be classified as class cᵢ, for i = 1, 2, …, n
• The observation is classified as the type/category for which this conditional probability is maximum

$$p(c_i \mid x_1, x_2, \ldots, x_n) = \frac{p(x_1, x_2, \ldots, x_n \mid c_i)\, p(c_i)}{p(x_1, x_2, \ldots, x_n)}$$

$$p(x_1, x_2, \ldots, x_n \mid c_i) = p(x_1 \mid c_i)\, p(x_2 \mid c_i) \cdots p(x_n \mid c_i)$$

$$\Rightarrow\; p(c_i \mid x_1, x_2, \ldots, x_n) = \frac{p(x_1 \mid c_i)\, p(x_2 \mid c_i) \cdots p(x_n \mid c_i)\, p(c_i)}{p(x_1, x_2, \ldots, x_n)}$$
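A sketch of a naive Bayes classifier (mine, not the deck's). It uses Gaussian per-feature likelihoods as one common choice for continuous features, works in log space to avoid underflow, and drops the denominator p(x₁, …, xₙ) since it is the same for every class:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior p(c_i) and per-feature Gaussian p(x_j | c_i) per class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),       # prior p(c_i)
                     Xc.mean(axis=0),        # per-feature means
                     Xc.var(axis=0) + 1e-9)  # per-feature variances (smoothed)
    return params

def predict_nb(X, params):
    """Pick the class maximizing log p(c_i) + sum_j log p(x_j | c_i)."""
    classes = list(params)
    scores = []
    for c in classes:
        prior, mu, var = params[c]
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
        scores.append(np.log(prior) + log_lik.sum(axis=1))
    return np.array(classes)[np.argmax(scores, axis=0)]
```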
Clustering Model with K-Means

[Scatter plot showing clusters found by K-Means: x = customer age (20-70); y = type of image purchased]
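A minimal sketch of the standard K-Means (Lloyd's) algorithm, illustrative only (empty-cluster handling is omitted for brevity): repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its assigned points:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate nearest-centroid assignment and centroid recomputation."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # distance of every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids
```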

Let’s take some questions

