
DMT Session-3

By Kushal Anjaria

The second pattern that we will study is classification. In data mining, one of the most common tasks is to build
models that predict the class of an object based on its attributes. Here the object can be a customer, patient,
transaction, email message, or even a single character. For a patient object, for example, the attributes can be
heart rate, blood pressure, weight, and gender, while the class would most commonly be positive/negative for a
specific disease. This section considers learning a classification tree model from the data we have about such
objects. Figure-1 illustrates the classification task in detail.

Given a collection of records (training set), each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.

Goal: previously unseen records should be assigned a class as accurately as possible.


The test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test
sets, with the training set used to build the model and the test set used to validate it.
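The split into training and test sets, and the accuracy measured on the test set, can be sketched in a few lines of Python. This is only a minimal illustration: the patient records, the `disease` class attribute, the 30% test fraction, and the hand-written `threshold_model` are all assumptions made up for the example, not data or a model from these notes.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, records, class_attribute):
    """Fraction of records whose predicted class equals the recorded class."""
    if not records:
        raise ValueError("cannot measure accuracy on an empty set")
    correct = sum(1 for r in records if model(r) == r[class_attribute])
    return correct / len(records)

# Hypothetical patient records: one attribute plus the class attribute.
records = [
    {"heart_rate": 60, "disease": "negative"},
    {"heart_rate": 65, "disease": "negative"},
    {"heart_rate": 70, "disease": "negative"},
    {"heart_rate": 75, "disease": "negative"},
    {"heart_rate": 80, "disease": "negative"},
    {"heart_rate": 95, "disease": "positive"},   # the stand-in model gets this one wrong
    {"heart_rate": 105, "disease": "positive"},
    {"heart_rate": 110, "disease": "positive"},
    {"heart_rate": 120, "disease": "positive"},
    {"heart_rate": 130, "disease": "positive"},
]

def threshold_model(record):
    """Stand-in for a learned model: a single hand-written rule."""
    return "positive" if record["heart_rate"] > 100 else "negative"

train, test = train_test_split(records)
print("training records:", len(train), "test records:", len(test))
print("accuracy on the test set:", accuracy(threshold_model, test, "disease"))
```

In practice the model would be built from `train` only, and `test` would be kept aside until the model is ready to be validated.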
E.g., deciding whether an email is spam or not, which category a story or tweet falls under, or a doctor's diagnosis.
Multiple data mining techniques can perform the classification task. From this set of techniques, we will start with the
decision tree technique. The advantage of the decision tree technique is that it is highly human-interpretable. As a
result, a learned decision tree hardly requires any transformation before a person can read it.
Decision Tree
It is a classification method in which we take the training data set and build a decision tree from it. The decision tree
then helps us make practical decisions.
The toy training set is as follows:
From the training dataset, the decision tree can be drawn as follows:
[Figure: decision tree with Outlook as the root node]

If you observe the decision tree, you will realize that:

• Each internal node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification
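The three observations above map directly onto a small data structure. The sketch below is one possible representation, not the tree from the figure: only the root attribute Outlook comes from these notes, while the other attributes (`Humidity`, `Wind`), the branch values, and the Yes/No leaves are hypothetical and chosen purely for illustration.

```python
# An internal node is a dict that names the attribute it tests and maps
# each attribute value (branch) to a child. A leaf is a plain string
# holding the assigned class.
# Only the root attribute "Outlook" is taken from the notes; everything
# below it is an assumption made for this example.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {
            "attribute": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
        "Overcast": "Yes",
        "Rain": {
            "attribute": "Wind",
            "branches": {"Strong": "No", "Weak": "Yes"},
        },
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch that matches
    the record's value for the attribute tested at each internal node."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node

example = {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}
print(classify(tree, example))
```

Because the structure is just nested attribute tests, it can also be read aloud as rules such as "if Outlook is Overcast, then Yes", which is what makes this kind of model easy to interpret.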

In data mining, we aim to design the decision tree that best fits our training set. Best fit means that, for each row of
the training set, our decision tree should give us the correct result.
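One way to make "best fit" concrete is to list the training rows for which a tree gives the wrong result; a tree fits the training set when that list is empty. The sketch below assumes a hand-written `predict` function standing in for a decision tree, a hypothetical class attribute named `Play`, and a few invented rows; none of these come from the toy training set in the notes.

```python
def predict(row):
    """Hand-written stand-in for a decision tree rooted at Outlook.
    The rules themselves are assumptions made for this example."""
    if row["Outlook"] == "Overcast":
        return "Yes"
    if row["Outlook"] == "Sunny":
        return "Yes" if row["Humidity"] == "Normal" else "No"
    return "Yes" if row["Wind"] == "Weak" else "No"

def misclassified_rows(predict, rows, class_attribute):
    """Return the rows whose predicted class differs from the recorded class."""
    return [row for row in rows if predict(row) != row[class_attribute]]

# Hypothetical training rows; "Play" is the class attribute.
rows = [
    {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak", "Play": "No"},
    {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Overcast", "Humidity": "High", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Rain", "Humidity": "High", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},
]

errors = misclassified_rows(predict, rows, "Play")
print("tree fits the training set:", not errors)
```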
