
Introduction to Machine Learning

Supervised Learning
(Naïve Bayes Algorithm)

Dr. Hikmat Ullah Khan

September 27, 2018
Supervised vs. Unsupervised Learning

• Supervised learning (classification)
  – Supervision: the training data (observations, measurements, etc.) are
    accompanied by labels indicating the class of each observation
  – The model learns from the training data
  – The test data evaluates what has been learned
  – New data is then classified based on the training set
• Applications
  – Classification / prediction
  – Detection / recognition
Supervised vs. Unsupervised Learning

• Unsupervised learning (clustering)
  – No concept of a class label: the class labels of the training data are
    unknown
  – Given a set of measurements, observations, etc., the aim is to establish
    groups in the data; these groups are also known as clusters
  – The grouping aims to (see the sketch below):
    • maximize intra-cluster similarity
    • minimize inter-cluster similarity
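A minimal clustering sketch, assuming scikit-learn is available; the toy points and the choice of k-means are illustrative, not from the slides:

```python
# A toy clustering run: no class labels are given, the algorithm only groups
# points so that intra-cluster similarity is high and inter-cluster
# similarity is low.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one natural group
                   [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])  # another natural group

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)  # e.g. [0 0 0 1 1 1] -- cluster ids, not class labels
```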
Prediction Problems: Classification vs. Numeric Prediction
• Classification
  – predicts categorical class labels
  – constructs a model from the training set and the values (class labels) of
    a classifying attribute, and uses it to classify new data
• Numeric prediction (regression)
  – models continuous-valued functions, i.e., predicts unknown or missing
    numeric values (contrast sketched below)
• Typical applications in decision making
  – Credit/loan approval
  – Medical diagnosis: is a tumor cancerous or benign?
  – Fraud detection: is a transaction fraudulent?
  – Sentiment analysis: is a text positive, negative, or neutral?
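A small sketch contrasting the two prediction problems; the credit-score numbers and the scikit-learn estimators are illustrative assumptions:

```python
# Classification outputs a categorical label; numeric prediction (regression)
# outputs a continuous value.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[600], [650], [700], [750]]            # hypothetical credit scores
approved = ["no", "no", "yes", "yes"]       # categorical target -> classification
loan_amount = [0.0, 0.0, 5000.0, 9000.0]    # continuous target  -> regression

clf = DecisionTreeClassifier().fit(X, approved)
reg = DecisionTreeRegressor().fit(X, loan_amount)

print(clf.predict([[720]]))  # a class label, e.g. ['yes']
print(reg.predict([[720]]))  # a number,      e.g. [5000.]
```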
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
  – Each tuple/sample is assumed to belong to a predefined class, as
    determined by the class label attribute
  – The set of tuples used for model construction is the training set
  – The model is represented as classification rules, decision trees, or
    mathematical formulae
• Model usage: classifying future or unknown objects
  – Estimate the accuracy of the model (a minimal sketch follows this list)
    • The known label of each test sample is compared with the label
      predicted by the model
    • The accuracy rate is the percentage of test-set samples that are
      correctly classified by the model
    • The test set must be independent of the training set (otherwise the
      accuracy estimate is overfit)
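A minimal sketch of the two-step process, assuming scikit-learn is installed; the iris data and the decision tree are stand-ins for whatever data and learner are used in the lecture:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Keep the test set independent of the training set, otherwise the accuracy
# estimate is overfit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)   # step 1: model construction
y_pred = model.predict(X_test)                            # step 2: model usage
print("Accuracy on the test set:", accuracy_score(y_test, y_pred))
```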
Process (1): Model Construction
[Diagram: Training Data -> Classification Algorithm -> Classifier (Model)]

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Learned model (as a classification rule):
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the Model in Prediction

[Diagram: the learned Classifier is applied to the Testing Data and to Unseen Data]

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) -> Tenured?
Attributes/Dimensions/Features

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no
Bayesian Theorem: Basics
• Let X be a data sample ("evidence"); its class label is unknown
• Let H be the hypothesis that X belongs to class C
• Classification is to determine P(H|X), the probability that the hypothesis
  holds given the observed data sample X
• P(H) (prior probability): the initial probability
  – E.g., the probability that X will buy a computer, regardless of age,
    income, ...
• P(X) (evidence): the probability that the sample data is observed
• P(X|H) (likelihood): the probability of observing the sample X, given that
  the hypothesis holds
  – E.g., given that X will buy a computer, the probability that X is aged
    31…40 with medium income
Bayesian Theorem
• Given training data X, the posterior probability of a hypothesis H,
  P(H|X), follows Bayes' theorem:

      P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be written as

      posterior = likelihood x prior / evidence
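For completeness (not shown on the slide), the evidence term can be expanded with the law of total probability over the classes Ci:

```latex
P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)},
\qquad
P(X) = \sum_{i} P(X \mid C_i)\,P(C_i)
```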
Towards Naïve Bayesian Classifier
• For each class Ci, Bayes' theorem gives

      P(Ci|X) = P(X|Ci) P(Ci) / P(X)

• Since P(X) is constant for all classes, only

      P(Ci|X) ∝ P(X|Ci) P(Ci)

  needs to be maximized
• With the naïve assumption of class-conditional independence, the likelihood
  factorizes over the attributes x1, ..., xn of X:

      P(X|Ci) = P(x1|Ci) x P(x2|Ci) x ... x P(xn|Ci)

  (a minimal code sketch of this decision rule follows below)
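A minimal Python sketch of this decision rule; the function name and the layout of the probability tables are illustrative assumptions, not from the slides:

```python
def naive_bayes_argmax(priors, conditionals, x):
    """Pick the class Ci maximizing P(Ci) * prod_k P(x_k | Ci).

    priors:       {class: P(class)}
    conditionals: {class: {(attribute, value): P(value | class)}}
    x:            {attribute: value} for the new, unlabeled sample
    """
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for attr, val in x.items():
            # Unseen (attribute, value) pairs get probability 0 here;
            # in practice one would apply Laplace smoothing.
            score *= conditionals[c].get((attr, val), 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score
```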
Naïve Bayesian Classifier: Training Dataset
Classes:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Training tuples: the 14-row buys_computer table shown on the
Attributes/Dimensions/Features slide above.

Data sample to classify:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)


Naïve Bayesian Classifier: An Example

• P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no")  = 5/14 = 0.357

• Compute P(xk|Ci) for each attribute value of X and each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no")  = 3/5 = 0.600
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no")  = 2/5 = 0.400
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no")  = 1/5 = 0.200
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no")  = 2/5 = 0.400

• X = (age <= 30, income = medium, student = yes, credit_rating = fair)

  P(X|Ci):
  P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = "no")  = 0.600 x 0.400 x 0.200 x 0.400 = 0.019

  P(X|Ci) P(Ci):
  P(X | buys_computer = "yes") P(buys_computer = "yes") = 0.044 x 0.643 = 0.028
  P(X | buys_computer = "no")  P(buys_computer = "no")  = 0.019 x 0.357 = 0.007

• Therefore, X belongs to class "buys_computer = yes" (verified in the sketch
  below)
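As a sanity check, a short Python sketch (standard library only) that reproduces these numbers by counting over the 14-row training table:

```python
# Columns: (age, income, student, credit_rating, buys_computer)
rows = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
attrs = ("age", "income", "student", "credit_rating")
x = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}

for c in ("yes", "no"):
    class_rows = [r for r in rows if r[-1] == c]
    prior = len(class_rows) / len(rows)           # P(Ci), e.g. 9/14 for "yes"
    likelihood = 1.0
    for k, attr in enumerate(attrs):
        count = sum(1 for r in class_rows if r[k] == x[attr])
        likelihood *= count / len(class_rows)     # P(x_k | Ci)
    print(c, round(likelihood, 3), round(likelihood * prior, 3))
# Prints: yes 0.044 0.028  /  no 0.019 0.007  -- matching the slide.
```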


Naïve Bayes Classifier: Comments
• Advantages
  – Easy to implement
  – Good results are obtained in most cases
• Disadvantages
  – Assumes class-conditional independence, which causes a loss of accuracy
  – In practice, dependencies exist among variables
    • E.g., hospital patients: profile (age, family history, etc.),
      symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
    • Dependencies among these cannot be modeled by a naïve Bayes classifier
Exercise

Task
• Given a new instance, predict its label (see the sketch below):

  x' = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong)
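A possible sketch for this exercise, using the same counting scheme as above. The rows below are the standard 14-example PlayTennis table from Mitchell's textbook, which this query appears to reference; that table is an assumption here (it did not survive extraction from the preceding slide), so substitute the slide's own rows if they differ:

```python
# ASSUMED data: the standard PlayTennis table (Mitchell, Machine Learning).
# Columns: (Outlook, Temperature, Humidity, Wind, PlayTennis)
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
x_prime = ("Sunny", "Cool", "High", "Strong")        # the query instance x'

scores = {}
for c in ("Yes", "No"):
    class_rows = [r for r in rows if r[-1] == c]
    score = len(class_rows) / len(rows)                       # prior P(c)
    for k, val in enumerate(x_prime):
        score *= sum(1 for r in class_rows if r[k] == val) / len(class_rows)
    scores[c] = score

print(scores)                          # the two unnormalized posteriors
print(max(scores, key=scores.get))     # the predicted label for x'
```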

