
Classification

What is Classification?
• Classification is a data mining function that assigns items in a collection
to target categories or classes. The goal of classification is to accurately
predict the target class for each case in the data.
--Oracle
For example, a classification model could be used to identify loan applicants
as low, medium, or high credit risks.
General Approach to Classification
The Data Classification process includes two steps:

• Building the Classifier or Model (Learning Step)


• Using Classifier for Classification
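A minimal sketch of these two steps, assuming scikit-learn is available; the credit-risk features, figures, and class labels below are made up for illustration:

# Step 1 - Learning: build the classifier from labelled training data.
from sklearn.tree import DecisionTreeClassifier

X_train = [[25000, 12000],    # each row: [annual income, existing debt] (illustrative)
           [60000, 5000],
           [90000, 1000],
           [30000, 20000]]
y_train = ["high", "medium", "low", "high"]    # known credit-risk classes

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Step 2 - Classification: use the trained classifier to predict the class of new cases.
print(model.predict([[55000, 8000]]))    # e.g. ['medium']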
General Approach to Classification (Step 1)
General Approach to Classification (Step 2)
What is Supervised Learning?
• Supervised learning is when the model is trained on a labelled dataset. A
labelled dataset is one that has both input and output parameters.
What is Supervised Learning?
Supervised learning problems can be further grouped into regression and
classification problems.
• Classification: A classification problem is when the output variable is a
category, such as “red” or “blue” or “disease” and “no disease”.
• Regression: A regression problem is when the output variable is a real
value, such as “dollars” or “weight”.
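A small sketch contrasting the two problem types, assuming scikit-learn; the data points are made up:

from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4], [5], [6]]

# Classification: the output variable is a category.
y_class = ["blue", "blue", "blue", "red", "red", "red"]
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[2.0]]))    # a category, e.g. ['blue']

# Regression: the output variable is a real value (e.g. dollars).
y_real = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
reg = LinearRegression().fit(X, y_real)
print(reg.predict([[2.0]]))    # a real number, e.g. [20.]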
What is Supervised Learning?

Figure: Classification vs. Regression
What is Unsupervised Learning?
• Unsupervised learning is a type of self-organized Hebbian learning that
helps find previously unknown patterns in a data set without pre-existing
labels.
What is Unsupervised Learning?
Unsupervised learning problems can be further grouped into clustering and
association problems.
• Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by
purchasing behavior.
• Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as people
that buy X also tend to buy Y.
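A minimal clustering sketch, assuming scikit-learn; the customer records are synthetic, and no labels are passed to the algorithm (association rule mining is not shown, as it typically relies on a separate library such as mlxtend):

from sklearn.cluster import KMeans

# Each row: [annual spend, number of purchases] for one customer (synthetic data).
customers = [[200, 3], [220, 4], [250, 2],
             [5000, 60], [5200, 55], [4800, 70]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)    # e.g. [0 0 0 1 1 1] - groupings discovered without pre-existing labels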
Classification Techniques
• Decision Trees
• Bayesian Classifiers
• Neural Networks
• K-Nearest Neighbour
• Support Vector Machines
• Linear Regression
• Logistic Regression
Decision Tree
• The decision tree algorithm falls under the category of supervised learning. Decision
trees can be used to solve both regression and classification problems.
• A decision tree uses a tree representation to solve the problem, in which
each leaf node corresponds to a class label and attributes are represented
at the internal nodes of the tree.
Decision Tree
How does the Decision Tree algorithm work?
1. Select the best attribute using an Attribute Selection Measure (ASM) to split the records.
2. Make that attribute a decision node and break the dataset into smaller subsets.
3. Build the tree by repeating this process recursively for each child until one of the
following conditions matches (a minimal sketch of this recursion follows the list):
a) All the tuples belong to the same class.
b) There are no more remaining attributes.
c) There are no more instances.
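A minimal sketch of this recursion; best_attribute() and majority_class() are hypothetical helpers standing in for an Attribute Selection Measure (covered next) and a majority-vote fallback:

def build_tree(records, attributes):
    # records: list of dicts mapping attribute names (and "class") to values.
    classes = {r["class"] for r in records}
    if len(classes) == 1:                          # condition (a): the node is pure
        return classes.pop()
    if not attributes or not records:              # conditions (b) and (c)
        return majority_class(records)             # hypothetical majority-vote helper
    best = best_attribute(records, attributes)     # 1. pick the splitting attribute via an ASM (hypothetical helper)
    node = {"attribute": best, "branches": {}}     # 2. make it a decision node
    for value in {r[best] for r in records}:       # 3. split into subsets and recurse on each child
        subset = [r for r in records if r[best] == value]
        node["branches"][value] = build_tree(subset, [a for a in attributes if a != best])
    return node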
Attribute Selection Measures
• An attribute selection measure is a heuristic for selecting the splitting criterion that partitions
the data in the best possible manner.
• It is also known as a splitting rule because it helps us determine breakpoints for tuples
on a given node.
• An ASM ranks each feature (or attribute) by how well it explains the given dataset; the
attribute with the best score is selected as the splitting attribute.
Attribute Selection Measures
The most popular attribute selection measures are:
• Information Gain
• Gain Ratio
• Gini Index
Attribute Selection Measures (Information Gain)
Information gain is a statistical property that measures how well a given attribute separates
the training examples according to their target classification.
Attribute Selection Measures (Information Gain)
Entropy: Shannon invented the concept of entropy, which measures the impurity of the
input set.
Attribute Selection Measures (Information Gain)
• Entropy is 0 if all the members of S belong to
the same class. For example, if all members
are positive, Entropy(S) = 0.
• Entropy is 1 when the sample contains an
equal number of positive and negative
examples.
• If the sample contains an unequal number of
positive and negative examples, the entropy is
between 0 and 1.
Attribute Selection Measures (Information Gain)
For a node with two classes, entropy can be calculated using the formula:

Entropy = −p·log2(p) − q·log2(q)

Here p and q are the probabilities of success and failure, respectively, in that node.
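A small Python sketch of this formula; the three calls mirror the properties listed above:

import math

def entropy(p, q):
    # p, q: probabilities of success and failure at the node (p + q = 1); 0 * log2(0) is treated as 0.
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

print(entropy(1.0, 0.0))    # 0.0   - all members belong to the same class
print(entropy(0.5, 0.5))    # 1.0   - equal numbers of positive and negative examples
print(entropy(0.8, 0.2))    # ~0.72 - unequal split, so entropy lies between 0 and 1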
Attribute Selection Measures (Information Gain)
Definition: Suppose S is a set of instances, A is an attribute, Sv is the subset of S with A = v,
and Values(A) is the set of all possible values of A. Then

Gain(S, A) = Entropy(S) − Σ v∈Values(A) (|Sv| / |S|) · Entropy(Sv)

Information gain is the difference between the original information requirement and the
expected information required to classify the tuples of the dataset.
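A sketch of Gain(S, A) in Python, assuming each instance is a dict whose "class" key holds its label:

import math
from collections import Counter

def class_entropy(labels):
    # Entropy of a list of class labels.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(instances, attribute):
    # Gain(S, A) = Entropy(S) - sum over v in Values(A) of |Sv|/|S| * Entropy(Sv)
    total = len(instances)
    expected = 0.0
    for v in {inst[attribute] for inst in instances}:
        subset = [inst["class"] for inst in instances if inst[attribute] == v]
        expected += (len(subset) / total) * class_entropy(subset)
    return class_entropy([inst["class"] for inst in instances]) - expected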
Attribute Selection Measures (Information Gain)
Day Outlook Temperature Humidity Wind Play Golf (class)
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
Attribute Selection Measures (Information Gain)
To build a decision tree, we need to calculate two types of entropy using frequency tables,
as follows:
Step 1: Entropy using the frequency table of one attribute:
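Worked out for the table above, where Play Golf is Yes on 9 of the 14 days and No on 5:
Entropy(PlayGolf) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.940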
Attribute Selection Measures (Information Gain)
Step 2: Entropy using the frequency table of two attributes:
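For example, for the Outlook attribute the table gives Sunny (2 Yes / 3 No), Overcast (4 Yes / 0 No) and Rain (3 Yes / 2 No), so:
E(PlayGolf, Outlook) = (5/14)·0.971 + (4/14)·0.000 + (5/14)·0.971 ≈ 0.693
Gain(Outlook) = 0.940 − 0.693 = 0.247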
Attribute Selection Measures (Information Gain)
Step 3: Choose the attribute with the largest information gain as the decision node, divide the
dataset by its branches, and repeat the same process on every branch.
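Computed from the table above, the gains are approximately Gain(Outlook) = 0.247, Gain(Humidity) = 0.152, Gain(Wind) = 0.048 and Gain(Temperature) = 0.029, so Outlook is chosen as the root decision node.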
Attribute Selection Measures (Information Gain)
Step 4a: A branch with entropy of 0 is a leaf node.
Attribute Selection Measures (Information Gain)
Step 4b: A branch with entropy more than 0 needs further splitting.
Attribute Selection Measures (Information Gain)
Decision Tree to Decision Rules
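The tree built for the golf data above can be read off as rules, for example:
IF Outlook = Overcast THEN Play Golf = Yes
IF Outlook = Sunny AND Humidity = High THEN Play Golf = No
IF Outlook = Sunny AND Humidity = Normal THEN Play Golf = Yes
IF Outlook = Rain AND Wind = Weak THEN Play Golf = Yes
IF Outlook = Rain AND Wind = Strong THEN Play Golf = No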
Attribute Selection Measures (Information Gain)
Highly-branching attributes
• Problematic: attributes with a large number of values
• Extreme case: each example has its own value, e.g. an example ID, or the Day attribute in the
weather data
• Subsets are more likely to be pure if there is a large number of different attribute
values
• Information gain is therefore biased towards choosing attributes with a large number of
values (see the worked note after this list).
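Worked note (computed from the golf table above): each of the 14 values of the Day attribute identifies exactly one example, so every subset is pure and Gain(S, Day) = Entropy(S) ≈ 0.940, the largest gain of any attribute, even though Day is useless for predicting new days.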
Attribute Selection Measures (Information Gain)
This may cause several problems:
• Overfitting: selection of an attribute that is non-optimal for prediction
• Fragmentation: data are fragmented into (too) many small sets
Attribute Selection Measures (Information Gain)
Attribute Selection Measures (Gain Ratio)
The information gain ratio is the ratio of the information gain to the intrinsic information of the split.
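In the standard C4.5 formulation (stated here as an assumption, since it is the usual definition matching the description above):

GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)
SplitInfo(S, A) = −Σ v∈Values(A) (|Sv| / |S|) · log2(|Sv| / |S|)

SplitInfo (the intrinsic information) grows with the number and evenness of an attribute's values, so dividing by it penalizes highly-branching attributes such as Day or an example ID.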
