Gini index at a node t: Gini(t) = 1 − Σⱼ [p(j|t)]², where
● p(j|t) is the relative frequency of class j at node t
● Maximum (1 − 1/n): records equally distributed across n classes
● Minimum 0: all records in one class
Example
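The original worked example is not recoverable from the slide; as a stand-in, here is a minimal Python sketch showing both extremes of the Gini index with hypothetical class counts:

# Gini index of a node, given the class counts at that node
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([10, 0]))   # pure node: 0.0 (the minimum)
print(gini([5, 5]))    # evenly split node: 0.5 (the maximum, 1 - 1/n for n = 2)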
Revision Of Classification
Decision Tree
Naïve Bayes
Inductive Learning (Learning From Examples)
Different kinds of learning…
Supervised learning: learn a model from labeled examples
Semi-supervised learning: learn from a mix of labeled and unlabeled examples
Unsupervised learning: find structure in unlabeled data
Reinforcement learning: learn from rewards through interaction with an environment
Learning Scheme OR Learning Algorithm
Types of Supervised Machine Learning
Classification
Association
Regression / Numeric Prediction
Also Known as Predictive Learning
Classification Algorithms
There are many algorithms for classification. Some of the best known are the following:
Decision trees
Naive Bayes classifier
Linear Regression
Logistic regression
Neural networks
Perceptron
Multi Layer Neural Networks
Support vector machines (SVM)
Quadratic classifiers
Instance Based Learning
k-nearest neighbor
Decision Tree Learning
Classification training data may be used to create a decision tree (the model), which consists of the following parts (illustrated in the sketch after this list):
Nodes:
Represent tests of attributes/features of the training data
Edges:
Correspond to the outcomes of a test and connect to the next node or leaf
Leaves:
Predict the class (y)
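As a minimal sketch (not from the original slides), scikit-learn's DecisionTreeClassifier makes these three parts concrete; export_text prints the attribute tests (nodes), the test outcomes (edges, shown as indentation), and the class predictions (leaves):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

# Each inner line is a node (an attribute test), the indented branches are the
# edges (test outcomes), and the "class: ..." lines are the leaves.
print(export_text(tree, feature_names=list(iris.feature_names)))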
Nominal and numeric attributes
Nominal:
number of children usually equal to number of values
attribute won’t get tested more than once
Other possibility: division into two subsets
Numeric:
test whether value is greater or less than constant
attribute may get tested several times
Other possibility: three-way split (or multi-way split)
Integer: less than, equal to, greater than
Real: below, within, above
Classification using Decision Tree
Example (Restaurant)
Decision Tree of Restaurant Example
Alt | Bar | Fri | Hun | Pat  | Price | Rain | Res | Type   | Est   | WillWait
Yes | No  | Yes | Yes | Full | $     | No   | Yes | French | 30–60 | ?
Criterion for attribute selection
There is no way to efficiently search through all 2^(2^n) possible trees (for n Boolean attributes)
Want to get the smallest tree
Heuristic: choose the attribute that produces the
“purest” nodes
Strategy: choose attribute that gives greatest
information gain
Purity check of an attribute
[Figure: examples of a pure node vs. an impure node]
Measuring Purity
Question is How to Measure Purity?
Two Methods are used
Entropy Based Information Gain
Gini Index
Information gain is used in ID3 and its successors C4.5 and C5.0 (implemented in Weka as J48)
The Gini index is used in the CART algorithm, as implemented in Python's scikit-learn
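A minimal sketch of both measures in plain Python, plus information gain computed as the entropy of the parent node minus the weighted entropy of its children (the example counts are hypothetical, chosen to match the classic weather data's Outlook split):

from math import log2

def entropy(counts):
    # Information (in bits) of a class distribution, e.g. [9, 5]
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gini(counts):
    # Gini impurity of the same distribution
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def information_gain(parent, children):
    # Entropy of the parent minus the weighted entropy of the child nodes
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

# Parent: 9 yes / 5 no; children after a three-way split
print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))   # ≈ 0.247 bits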
Decision Trees
There are four cases to consider while creating a decision tree (a sketch follows the list):
1. If the remaining examples are all positive (or all negative), then answer Yes or No.
2. If there are some positive and some negative examples, then choose the best attribute to split them.
3. No examples left: if there are no examples left, it means that no example has been observed for this combination of attribute values, and we return a default value calculated from the plurality classification.
4. Noise in the data (no features left): if there are no attributes left, but both positive and negative examples remain, these examples have exactly the same description but different classifications. This can happen because there is an error or noise in the data; because the domain is non-deterministic; or because we can't observe an attribute that would distinguish the examples. The best we can do is return the plurality classification of the remaining examples.
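A minimal Python sketch of the recursive learning procedure covering all four cases (AIMA-style; examples are (attribute-dict, label) pairs, and the gain argument is a scoring function such as the information gain sketched earlier):

from collections import Counter

def plurality_value(examples):
    # Most common class label among the examples (the "default value")
    return Counter(label for _, label in examples).most_common(1)[0][0]

def learn_tree(examples, attributes, parent_examples, gain):
    if not examples:                      # case 3: no examples left
        return plurality_value(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:                  # case 1: all positive or all negative
        return labels.pop()
    if not attributes:                    # case 4: noise, no attributes left
        return plurality_value(examples)
    # case 2: split on the attribute with the greatest gain
    best = max(attributes, key=lambda a: gain(a, examples))
    tree = {best: {}}
    for value in {ex[best] for ex, _ in examples}:
        subset = [(ex, lab) for ex, lab in examples if ex[best] == value]
        tree[best][value] = learn_tree(subset, attributes - {best}, examples, gain)
    return tree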
Example
(Weather Data)
Which attribute to select?
Computing information
Measure information in bits:
entropy(p₁, …, pₙ) = −p₁ log₂ p₁ − p₂ log₂ p₂ − … − pₙ log₂ pₙ
For Bayes' rule:
A priori probability of H: Pr(H)
Likelihood of evidence E under H: Pr(E | H)
A posteriori probability of H: Pr(H | E) = Pr(E | H) · Pr(H) / Pr(E)
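A minimal sketch of the resulting Naïve Bayes computation, Pr(H | E) ∝ Pr(H) · Π Pr(eᵢ | H), with hypothetical priors and likelihoods in the style of the classic weather data:

# Hypothetical priors and per-attribute likelihoods for two classes
prior = {"yes": 9/14, "no": 5/14}
likelihood = {
    "yes": {"outlook=sunny": 2/9, "humidity=high": 3/9},
    "no":  {"outlook=sunny": 3/5, "humidity=high": 4/5},
}

evidence = ["outlook=sunny", "humidity=high"]
score = {}
for h in prior:
    p = prior[h]
    for e in evidence:
        p *= likelihood[h][e]      # naive independence assumption
    score[h] = p

total = sum(score.values())
posterior = {h: s / total for h, s in score.items()}   # normalize over classes
print(posterior)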
How Naïve Bayes Deals With…
Zero Frequency Problem
Missing Values
Numeric Values
The “zero-frequency problem”
What if an attribute value doesn’t occur with every class
value?
(e.g. “Humidity = high” for class “yes”)
Probability will be zero!
A posteriori probability will also be zero!
(No matter how likely the other values are!)
Remedy: add 1 to the count for every attribute value-
class combination (Laplace estimator)
Result: probabilities will never be zero!
(also: stabilizes probability estimates)
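A minimal sketch of the Laplace estimator (hypothetical counts): add 1 to each attribute value's count and add the number of possible values to the denominator:

def laplace_prob(count, class_total, n_values):
    # P(attribute=value | class) with add-one (Laplace) smoothing
    return (count + 1) / (class_total + n_values)

# Hypothetical: "humidity = high" never observed with class "yes"
# among 9 "yes" instances; humidity has 2 possible values.
print(laplace_prob(0, 9, 2))   # 1/11 ≈ 0.09 instead of 0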
Missing values
Training: instance is not included in frequency
count for attribute value-class combination
Classification: attribute will be omitted from
calculation
Example:
For numeric attributes, assume a normal distribution and estimate, for each class:
• Sample mean: μ = (1/n) Σᵢ xᵢ
• Standard deviation: σ = √( Σᵢ (xᵢ − μ)² / (n − 1) )
• Density: f(x) = (1 / (√(2π) · σ)) · e^(−(x−μ)² / (2σ²))
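A minimal sketch of the density calculation (the μ and σ values below are hypothetical per-class estimates, in the style of the classic weather data's temperature attribute):

from math import sqrt, pi, exp

def gaussian_density(x, mu, sigma):
    # Probability density of x under a normal distribution N(mu, sigma^2)
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

print(gaussian_density(66, 73, 6.2))   # ≈ 0.034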
Gaussian Naïve Bayes (Iris flower data set)
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

# Load the Iris data: 150 instances, 4 numeric features, 3 classes
iris = datasets.load_iris()
X = iris.data
y = iris.target
print(X.shape)
print(y.shape)

# Fit a Gaussian Naive Bayes model and predict on the same data
nb = GaussianNB()
nb.fit(X, y)
y_pred = nb.predict(X)
print(y_pred)

# Training-set accuracy (the model is evaluated on the data it was fit on)
print("Accuracy :", metrics.accuracy_score(y, y_pred) * 100)