
Using Decision Trees to predict customer behaviour

In the latest instalment of his series on Customer Relationship Management, Khalid Sheikh explains
how CRM can be used to predict customer behaviour, a vital need in many businesses

A Decision Tree is a predictive model that makes predictions through a classification
process. The model is represented as an upside-down Tree: the root at the top (or on the
left-hand side) and the leaves at the bottom (or on the right-hand side).

Decision Trees represent rules. By following the Tree, you can decipher the rules and understand
why a record is classified in a certain way. These rules can then be used to retrieve records falling
into a certain category, and the known behaviour of the category is the predicted behaviour of the
entity represented by the record.

In CRM, Decision Trees can be used to classify existing customer records into customer segments
that behave in a particular manner. The process starts with data related to customers whose
behaviour is already known; for example, customers who have responded to a promotional
campaign and those who have not; or customers who have churned (left the service for a
competitor) and those who have not. The Decision Tree developed from this data gives us the
splitting attributes and criteria that divide customers into two categories. Once the rules that
determine the classes to which different customers belong are known, they can be used to classify
existing customers and predict behaviour in future. For example, a customer whose record shows
attributes similar to those customers who have churned in the recent past is more likely to churn,
and that is the prediction that marketers are looking for to plan activities to pre-empt the churn.

Classification classes

A set of classification classes can be defined for a database having a large number of records such
that each record belongs to one of the given classes. The classification process decides the class to
which a given record belongs. The classification process in Decision Trees is also concerned with
generating a description or a (predictive) model for each class from the given data set.

Predictive modelling
Predictive modelling is similar to the human learning experience in using observations to form a
model of the important characteristics of some phenomenon. This approach uses generalisations of
the ‘real world’ and the ability to fit new data into a general framework. Predictive modelling can be
used to analyse an existing database to determine some essential characteristics (model) about the
data set. The model is developed using a supervised learning approach.

This has two phases: training and testing. Training builds a model using a large sample of historical
data called a training set, while testing involves trying out the model on new, previously unseen data
called a test set, to determine its accuracy and performance characteristics. Applications of
predictive modelling include customer retention management, credit approval, cross-selling, and
direct marketing. Supervised classification is one of the techniques associated with predictive
modelling.
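As a minimal sketch of these two phases, assuming the scikit-learn library is available (the article names no tools, so this is only one possible choice) and using made-up customer attributes:

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical customer records: [monthly_spend, tenure_months, complaints]
    X = [[120, 24, 0], [35, 3, 4], [80, 12, 1], [20, 2, 5],
         [150, 36, 0], [40, 4, 3], [90, 18, 1], [25, 1, 6]]
    y = [0, 1, 0, 1, 0, 1, 0, 1]  # 1 = churned, 0 = stayed

    # Training phase: build the model from the historical sample
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                        random_state=1)
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Testing phase: try the model on previously unseen records
    print("Accuracy on the test set:", model.score(X_test, y_test))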

Supervised classification

In supervised classification, a training data set is used to generate the class descriptions
(predictive models). For each record of the training set, the class to which it belongs is
already known. Using this knowledge, the classification process attempts to generate
descriptions of the classes; these descriptions are then used to classify unclassified records.

A test data set is used to measure the effectiveness of a classification method. A set of test
records whose classifications are already known is passed through the classifier, and the
resulting classifications are compared with the known ones. The percentage of matching
classifications is the measure of the effectiveness of the classification method.
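As a small illustration of this measure, assuming the known and predicted classes are held in two parallel lists (the values below are made up):

    known     = ["play", "play", "no play", "play", "no play"]
    predicted = ["play", "no play", "no play", "play", "no play"]

    # Effectiveness: the percentage of matching classifications
    matches = sum(1 for k, p in zip(known, predicted) if k == p)
    print("Effectiveness: {:.0f}%".format(100 * matches / len(known)))  # 80%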

There are several approaches to supervised classification. Decision Trees are especially attractive in
the data-mining environment because they represent rules. Rules can be easily expressed in natural
language, and they can be easily mapped to a database access language like SQL (a sketch follows
this summary). To summarise:

A Decision Tree represents a series of questions; well-chosen questions keep the series short.

Each question determines the best follow-up question to ask next.


Decision Trees are drawn with the root at the top (or on the left-hand side) and the leaves at the
bottom (or on the right). The root represents the most general classification, covering the entire
data set; the leaves represent the most specific classifications. A data record enters the Decision
Tree at the root node and works its way down until it reaches a leaf node, which gives the most
specific classification of the record.
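As a sketch of the mapping to SQL mentioned above, the rule "outlook is sunny and humidity is 75% or less implies play" (taken from the golf example later in this article) becomes a simple WHERE clause; the table and column names here are assumptions made for illustration:

    import sqlite3

    # Hypothetical table of weather records; names are illustrative only
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (outlook TEXT, humidity REAL, windy INTEGER)")
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)",
                     [("sunny", 70, 0), ("sunny", 90, 1), ("rainy", 80, 1)])

    # The root-to-leaf path 'outlook = sunny AND humidity <= 75' as a query
    rows = conn.execute("SELECT * FROM weather "
                        "WHERE outlook = 'sunny' AND humidity <= 75").fetchall()
    print(rows)  # the records this rule classifies as 'play'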

Predictive effectiveness can be enhanced by pruning. Some paths are better than others because
the rules associated with them are better, so the effectiveness of the whole Tree can be improved
by pruning the weaker branches.

Building the Decision Tree: the algorithm

The algorithm attempts to find the test that will split the records in the best possible manner
among the desired classifications.

At each node below the root, whatever test works best to split the subset of records reaching that node is applied.

The process of finding each additional level of the Tree continues. The Tree is allowed to grow until
you cannot find better ways to split the input records.
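A rough sketch of this growth procedure, using a deliberately naive purity gain as the "best" test rather than a production-grade criterion such as information gain; the record format and all names are assumptions:

    from collections import Counter

    def purity(records):
        # fraction of records belonging to the most common class
        counts = Counter(r["class"] for r in records)
        return counts.most_common(1)[0][1] / len(records)

    def best_split(records, attributes):
        # try every attribute = value test; keep the one giving the purest halves
        best = None
        for attr in attributes:
            for value in {r[attr] for r in records}:
                yes = [r for r in records if r[attr] == value]
                no = [r for r in records if r[attr] != value]
                if not yes or not no:
                    continue
                score = (len(yes) * purity(yes) + len(no) * purity(no)) / len(records)
                if best is None or score > best[0]:
                    best = (score, attr, value, yes, no)
        return best

    def grow(records, attributes):
        # stop when the subset is homogeneous or no split improves purity
        if purity(records) == 1.0:
            return {"leaf": records[0]["class"]}
        split = best_split(records, attributes)
        if split is None or split[0] <= purity(records):
            return {"leaf": Counter(r["class"] for r in records).most_common(1)[0][0]}
        _, attr, value, yes, no = split
        return {"test": (attr, value),
                "yes": grow(yes, attributes),
                "no": grow(no, attributes)}

    # Tiny illustration with two records
    data = [{"outlook": "sunny", "class": "no play"},
            {"outlook": "overcast", "class": "play"}]
    print(grow(data, ["outlook"]))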

Process of creating Decision Trees

All Decision Tree construction methods are based on the principle of recursively partitioning the data
set till homogeneity is achieved. The construction of a Decision Tree involves the following phases:

Construction phase: The initial Decision Tree is constructed in this phase, based on the entire
training data set. It requires recursively partitioning the training set into two or more sub-partitions
using a splitting criterion, until a stopping criterion is met.

Pruning phase: The pruning phase involves removing some of the lower branches and nodes to
improve performance. The Tree constructed in the previous phase may not result in the best
possible set of rules due to overfitting. Often the training dataset used for constructing a Decision
Tree may not be a proper representative of the real-life situation and may contain noise. While
building a Decision Tree from a noisy training data set, it is prudent to grow the Tree just deeply
enough, to avoid incorporating unnecessary features that make the Tree difficult to comprehend. A
Decision Tree T is said to overfit the training data if there exists some
other Decision Tree T’, which is a simplification of T, such that T has smaller error over the training
set but T’ has smaller error over the entire distribution of instances. This situation is indicative of
noise in the training set.
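One simple scheme consistent with this definition is reduced-error pruning: replace a subtree with a leaf whenever the leaf does no worse on held-out data. A sketch, reusing the nested-dictionary tree shape of the earlier growth sketch (that representation is my assumption, not the author's):

    from collections import Counter

    def classify(tree, record):
        # walk from the root down to a leaf
        while "leaf" not in tree:
            attr, value = tree["test"]
            tree = tree["yes"] if record[attr] == value else tree["no"]
        return tree["leaf"]

    def errors(tree, records):
        return sum(1 for r in records if classify(tree, r) != r["class"])

    def prune(tree, validation):
        # replace a subtree with its majority-class leaf whenever the leaf
        # does no worse on the validation records reaching this node
        if "leaf" in tree or not validation:
            return tree
        attr, value = tree["test"]
        tree["yes"] = prune(tree["yes"], [r for r in validation if r[attr] == value])
        tree["no"] = prune(tree["no"], [r for r in validation if r[attr] != value])
        majority = Counter(r["class"] for r in validation).most_common(1)[0][0]
        leaf = {"leaf": majority}
        return leaf if errors(leaf, validation) <= errors(tree, validation) else tree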

Processing the pruned Tree: In this step, the Decision Tree is processed to improve
understandability.

Classification process

A record enters the Decision Tree at the root node. At the root, a test is applied to determine which
child node the record will encounter next.

Splitting attribute: Associated with every node of the Decision Tree is an attribute, called the splitting
attribute, whose values determine the partitioning of the data set when the node is expanded. In the
example described next, outlook, humidity, and windy are the splitting attributes.

Splitting criterion: The qualifying condition on the splitting attribute is called the splitting criterion.
For a numeric attribute, the criterion can be an equation or an inequality. For a categorical attribute,
it is a membership condition on a subset of values. In the example, humidity <= 75% or humidity > 75%
are the criteria for the humidity attribute, whereas the outlook being sunny, overcast, or rainy gives
the criteria for the outlook splitting attribute at the root.

This process is repeated until the record arrives at a leaf node. All the records that end up at a given
leaf of the Tree are classified in the same way. There is a unique path from the root to each leaf. The
path is a rule, which is used to classify the records.
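A sketch of this traversal on the golf example that follows, with the Tree hand-built as nested dictionaries (an illustrative multiway representation, not the book's notation):

    # Hand-built Decision Tree for the golf example: internal nodes hold a
    # test on the splitting attribute, leaves hold the classification.
    golf_tree = {
        "test": lambda r: r["outlook"],
        "branches": {
            "sunny": {"test": lambda r: "high" if r["humidity"] > 75 else "low",
                      "branches": {"low": {"leaf": "play"},
                                   "high": {"leaf": "do not play"}}},
            "overcast": {"leaf": "play"},
            "rainy": {"test": lambda r: r["windy"],
                      "branches": {False: {"leaf": "play"},
                                   True: {"leaf": "do not play"}}},
        },
    }

    def classify(tree, record):
        # the record enters at the root and works its way down to a leaf
        while "leaf" not in tree:
            tree = tree["branches"][tree["test"](record)]
        return tree["leaf"]

    print(classify(golf_tree, {"outlook": "sunny", "humidity": 70, "windy": False}))
    # -> 'play', via the unique path: root -> sunny -> humidity <= 75 -> leaf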

Example: The example has been adapted from the book 'Data Mining Techniques' by Arun K Pujari,
published in 2001 by Universities Press, Hyderabad. Based on the training data set shown in
Figure 1, the task of the supervised classification process is to find a set of rules that determine,
from the values of outlook, temperature, humidity, and wind, whether a golf player would choose to
play golf. The training data, which contains the attribute values of golf players who decided to play
and who decided not to play, is used to formulate the rules in Table 1. The rules are tested by making
a prediction about the behaviour depicted in the test data set and then comparing the predicted
behaviour with the actual behaviour that is already known. A match between the predicted and
actual behaviour, shown by a check mark, confirms that the rule is correct, while a mismatch, shown
by a cross, indicates that the rule is incorrect.

The accuracy of the classifier is determined by the percentage of the test data set that is correctly
classified. The last column of the second table in Figure 1 shows the known classification of the
records in the test set; this classification is assumed to be the correct one. The column also shows
whether the classification determined by the Decision Tree matches the known classification. A
check mark indicates that the classification determined by the Tree is the same as that shown in the
test data; a cross indicates that it is the opposite. The accuracy of each rule in Table 1 is computed
on this basis. Once reasonably accurate rules are known, the Decision Tree can be built as shown in
Figure 2. The Tree is then used to find the class to which a data element belongs; the behaviour of
the class is the predicted behaviour of the golf player under the situation described by the data
element.
Table 1—The rules

Rule#  Rule description (If..., and if..., then...)                  Accuracy
1      If it is sunny, and the humidity is 75% or less, play         50%
2      If it is sunny, and the humidity is above 75%, do not play    50%
3      If it is overcast, play                                       66.67%
4      If it is rainy, and not windy, play                           50%
5      If it is rainy and windy, do not play                         0%

Figure 2. The Decision Tree (adapted from Pujari, 2001)
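Expressed as code, the five rules of Table 1 collapse into a single conditional chain (a sketch; the attribute names follow the example):

    def golf_rules(outlook, humidity, windy):
        # Rules 1-5 from Table 1 as one if/else chain
        if outlook == "sunny":
            return "play" if humidity <= 75 else "do not play"  # rules 1 and 2
        if outlook == "overcast":
            return "play"                                       # rule 3
        return "do not play" if windy else "play"               # rules 4 and 5

    print(golf_rules("rainy", 80, True))  # -> 'do not play' (rule 5)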

The author is associate professor of Supply Chain Management at S P Jain Institute of Management
& Research, Mumbai. He can be contacted at khalid_sheikh@hotmail.com
