
Introduction to Machine Learning

Supervised Learning
(Naïve Bayes Algorithm)

Dr. Hikmat Ullah Khan

September 27, 2018
Supervised vs. Unsupervised Learning

• Supervised learning (classification)
  – Supervision: the training data (observations, measurements, etc.) are
    accompanied by labels indicating the class of each observation
  – The model learns from the training data
  – The test data evaluates what has been learned
  – New data is then classified based on the training set
• Applications
  – Classification / prediction
  – Detection / recognition
Supervised vs. Unsupervised Learning

• Unsupervised learning (clustering)
  – No concept of a class label: the class labels of the training data are
    unknown
  – Given a set of measurements, observations, etc., the aim is to establish
    groups in the data; these groups are also known as clusters
  – The grouping aims to (see the sketch below):
    • maximize intra-cluster similarity
    • minimize inter-cluster similarity
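A minimal clustering sketch, assuming scikit-learn is available; the toy points and the choice of k-means are illustrative, not from the slides:

```python
# A toy clustering run: no class labels are given, the algorithm only groups
# points so that intra-cluster similarity is high and inter-cluster
# similarity is low.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one natural group
                   [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])  # another natural group

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)  # e.g. [0 0 0 1 1 1] -- cluster ids, not class labels
```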
Prediction Problems: Classification vs. Numeric Prediction
• Classification
  – predicts categorical class labels
  – constructs a model from the training set and the values (class labels) of
    a classifying attribute, and uses it to classify new data
• Numeric prediction (regression)
  – models continuous-valued functions, i.e., predicts unknown or missing
    numeric values (contrast sketched below)
• Typical applications in decision making
  – Credit/loan approval
  – Medical diagnosis: is a tumor cancerous or benign?
  – Fraud detection: is a transaction fraudulent?
  – Sentiment analysis: is a text positive, negative, or neutral?
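A small sketch contrasting the two prediction problems; the credit-score numbers and the scikit-learn estimators are illustrative assumptions:

```python
# Classification outputs a categorical label; numeric prediction (regression)
# outputs a continuous value.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[600], [650], [700], [750]]            # hypothetical credit scores
approved = ["no", "no", "yes", "yes"]       # categorical target -> classification
loan_amount = [0.0, 0.0, 5000.0, 9000.0]    # continuous target  -> regression

clf = DecisionTreeClassifier().fit(X, approved)
reg = DecisionTreeRegressor().fit(X, loan_amount)

print(clf.predict([[720]]))  # a class label, e.g. ['yes']
print(reg.predict([[720]]))  # a number,      e.g. [5000.]
```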
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
  – Each tuple/sample is assumed to belong to a predefined class, as
    determined by the class label attribute
  – The set of tuples used for model construction is the training set
  – The model is represented as classification rules, decision trees, or
    mathematical formulae
• Model usage: classifying future or unknown objects
  – Estimate the accuracy of the model (a minimal sketch follows this list)
    • The known label of each test sample is compared with the label
      predicted by the model
    • The accuracy rate is the percentage of test-set samples that are
      correctly classified by the model
    • The test set must be independent of the training set (otherwise the
      accuracy estimate is overfit)
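A minimal sketch of the two-step process, assuming scikit-learn is installed; the iris data and the decision tree are stand-ins for whatever data and learner are used in the lecture:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Keep the test set independent of the training set, otherwise the accuracy
# estimate is overfit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)   # step 1: model construction
y_pred = model.predict(X_test)                            # step 2: model usage
print("Accuracy on the test set:", accuracy_score(y_test, y_pred))
```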
Process (1): Model Construction
[Diagram: Training Data -> Classification Algorithm -> Classifier (Model)]

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Learned model (as a classification rule):
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the Model in Prediction

[Diagram: the learned Classifier is applied to the Testing Data and to Unseen Data]

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) -> Tenured?
Attributes/Dimensions/Features

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no
Bayesian Theorem: Basics
• Let X be a data sample ("evidence"); its class label is unknown
• Let H be the hypothesis that X belongs to class C
• Classification is to determine P(H|X), the probability that the hypothesis
  holds given the observed data sample X
• P(H) (prior probability): the initial probability
  – E.g., the probability that X will buy a computer, regardless of age,
    income, ...
• P(X) (evidence): the probability that the sample data is observed
• P(X|H) (likelihood): the probability of observing the sample X, given that
  the hypothesis holds
  – E.g., given that X will buy a computer, the probability that X is aged
    31…40 with medium income
Bayesian Theorem
• Given training data X, the posterior probability of a hypothesis H,
  P(H|X), follows Bayes' theorem:

      P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be written as

      posterior = likelihood x prior / evidence
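For completeness (not shown on the slide), the evidence term can be expanded with the law of total probability over the classes Ci:

```latex
P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)},
\qquad
P(X) = \sum_{i} P(X \mid C_i)\,P(C_i)
```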
Towards Naïve Bayesian Classifier
• For each class Ci, Bayes' theorem gives

      P(Ci|X) = P(X|Ci) P(Ci) / P(X)

• Since P(X) is constant for all classes, only

      P(Ci|X) ∝ P(X|Ci) P(Ci)

  needs to be maximized
• With the naïve assumption of class-conditional independence, the likelihood
  factorizes over the attributes x1, ..., xn of X:

      P(X|Ci) = P(x1|Ci) x P(x2|Ci) x ... x P(xn|Ci)

  (a minimal code sketch of this decision rule follows below)
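A minimal Python sketch of this decision rule; the function name and the layout of the probability tables are illustrative assumptions, not from the slides:

```python
def naive_bayes_argmax(priors, conditionals, x):
    """Pick the class Ci maximizing P(Ci) * prod_k P(x_k | Ci).

    priors:       {class: P(class)}
    conditionals: {class: {(attribute, value): P(value | class)}}
    x:            {attribute: value} for the new, unlabeled sample
    """
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for attr, val in x.items():
            # Unseen (attribute, value) pairs get probability 0 here;
            # in practice one would apply Laplace smoothing.
            score *= conditionals[c].get((attr, val), 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score
```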
Naïve Bayesian Classifier: Training Dataset
Classes:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Training tuples: the 14-row buys_computer table shown on the
Attributes/Dimensions/Features slide above.

Data sample to classify:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)


Naïve Bayesian Classifier: An Example

• P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no")  = 5/14 = 0.357

• Compute P(xk|Ci) for each attribute value of X and each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no")  = 3/5 = 0.600
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no")  = 2/5 = 0.400
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no")  = 1/5 = 0.200
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no")  = 2/5 = 0.400

• X = (age <= 30, income = medium, student = yes, credit_rating = fair)

  P(X|Ci):
  P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = "no")  = 0.600 x 0.400 x 0.200 x 0.400 = 0.019

  P(X|Ci) P(Ci):
  P(X | buys_computer = "yes") P(buys_computer = "yes") = 0.044 x 0.643 = 0.028
  P(X | buys_computer = "no")  P(buys_computer = "no")  = 0.019 x 0.357 = 0.007

• Therefore, X belongs to class "buys_computer = yes" (verified in the sketch
  below)
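As a sanity check, a short Python sketch (standard library only) that reproduces these numbers by counting over the 14-row training table:

```python
# Columns: (age, income, student, credit_rating, buys_computer)
rows = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
attrs = ("age", "income", "student", "credit_rating")
x = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}

for c in ("yes", "no"):
    class_rows = [r for r in rows if r[-1] == c]
    prior = len(class_rows) / len(rows)           # P(Ci), e.g. 9/14 for "yes"
    likelihood = 1.0
    for k, attr in enumerate(attrs):
        count = sum(1 for r in class_rows if r[k] == x[attr])
        likelihood *= count / len(class_rows)     # P(x_k | Ci)
    print(c, round(likelihood, 3), round(likelihood * prior, 3))
# Prints: yes 0.044 0.028  /  no 0.019 0.007  -- matching the slide.
```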


Naïve Bayes Classifier: Comments
• Advantages
  – Easy to implement
  – Good results are obtained in most cases
• Disadvantages
  – Assumes class-conditional independence, which causes a loss of accuracy
  – In practice, dependencies exist among variables
    • E.g., hospital patients: profile (age, family history, etc.),
      symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
    • Dependencies among these cannot be modeled by a naïve Bayes classifier
Exercise

Task
• Given a new instance, predict its label (see the sketch below):

  x' = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong)
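A possible sketch for this exercise, using the same counting scheme as above. The rows below are the standard 14-example PlayTennis table from Mitchell's textbook, which this query appears to reference; that table is an assumption here (it did not survive extraction from the preceding slide), so substitute the slide's own rows if they differ:

```python
# ASSUMED data: the standard PlayTennis table (Mitchell, Machine Learning).
# Columns: (Outlook, Temperature, Humidity, Wind, PlayTennis)
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
x_prime = ("Sunny", "Cool", "High", "Strong")        # the query instance x'

scores = {}
for c in ("Yes", "No"):
    class_rows = [r for r in rows if r[-1] == c]
    score = len(class_rows) / len(rows)                       # prior P(c)
    for k, val in enumerate(x_prime):
        score *= sum(1 for r in class_rows if r[k] == val) / len(class_rows)
    scores[c] = score

print(scores)                          # the two unnormalized posteriors
print(max(scores, key=scores.get))     # the predicted label for x'
```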

