
Decision Tree

A decision tree is a graph that represents choices and their results in the form of a tree. The nodes of the graph represent an event or choice, and the edges of the graph represent the decision rules or conditions. It is widely used in Machine Learning and Data Mining applications using R.
Examples of the use of decision trees are − predicting whether an email is spam or not spam, predicting whether a tumor is cancerous, or predicting whether a loan is a good or bad credit risk based on the factors in each of these cases.
Decision Trees are useful supervised Machine Learning algorithms that can perform both regression and classification tasks. A decision tree is characterized by nodes and branches: the test on each attribute is represented at a node, the outcomes of that test are represented on the branches, and the class labels are represented at the leaf nodes. Hence it uses a tree-like model of decisions to compute probable outcomes. Tree-based algorithms are among the most widely used algorithms because they are easy to interpret and use. In addition, the predictive models developed by these algorithms have good stability and decent accuracy, which makes them very popular.
Types of Decision Trees
• Decision stump: Used for generating a decision tree with just a single split, hence also known as a one-level decision tree. It is known for its low predictive performance in most cases due to its simplicity.
• M5: Known for its precise classification accuracy and for working well with boosted decision trees and small datasets that contain a lot of noise.
• ID3 (Iterative Dichotomiser 3): One of the core and most widely used decision tree algorithms; it uses a top-down, greedy search through the given dataset and selects the best attribute for classifying the data.
• C4.5: Also known as the statistical classifier, this type of decision tree is derived from its parent ID3. It generates decisions based on a set of predictors.
• C5.0: Being the successor of C4.5, it broadly has two models, namely the basic tree and the rule-based model, and its nodes can only predict categorical targets.
• CHAID: Expanded as Chi-squared Automatic Interaction Detector, this algorithm studies the merging of variables to justify the outcome on the dependent variable by structuring a predictive model.
• MARS: Expanded as Multivariate Adaptive Regression Splines, this algorithm creates a series of piecewise linear models used to model irregularities and interactions among variables. It is known for its ability to handle numerical data with great efficiency.
• Conditional Inference Trees: A type of decision tree that uses a conditional inference framework to recursively partition the response variable. It is known for its flexibility and strong statistical foundations.
• CART: Expanded as Classification and Regression Trees: the values of the target variable are predicted if it is continuous, and the necessary classes are identified if it is categorical.
As can be seen, there are many types of decision trees, but they fall under two main categories based on the kind of target variable (a short sketch contrasting the two cases follows this list):
• Categorical Variable Decision Tree: A decision tree whose target variable takes a limited set of values belonging to particular groups.
• Continuous Variable Decision Tree: A decision tree whose target variable can take values from a continuous range.
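As a minimal sketch of this distinction (using the built-in iris and mtcars data sets, which are not part of the example later in this tutorial), the same ctree() function from the "party" package grows either kind of tree depending on the class of the target variable:

# Load the party package (assumed to be installed).
library(party)

# Categorical target: Species is a factor, so ctree() grows a classification tree.
class_tree <- ctree(Species ~ ., data = iris)

# Continuous target: mpg is numeric, so ctree() grows a regression tree.
reg_tree <- ctree(mpg ~ wt + hp, data = mtcars)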

Working of a Decision Tree in R

• Partitioning: This refers to the process of splitting the data set into subsets. The choice of where to make these strategic splits greatly affects the accuracy of the tree. The tree splits a node into sub-nodes in a way that increases the homogeneity (purity) of the sub-nodes with respect to the target variable. Various criteria, such as the chi-square statistic and the Gini index, are used for this purpose, and the split with the best value of the chosen criterion is selected.
• Pruning: This refers to the process in which branch nodes are turned into leaf nodes, which shortens the branches of the tree. The idea behind this is that simpler trees avoid overfitting, since very complex classification trees may fit the training data well but do an underwhelming job of classifying new values. (A short pruning sketch is given after this list.)
• Selection of the tree: The main goal of this process is to select the smallest tree that fits the data, for the reasons discussed in the pruning section.
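As an illustration of pruning, here is a minimal sketch using the rpart package rather than party, since rpart exposes an explicit pruning step (the variable names fit, best_cp and pruned_fit are hypothetical):

# Load rpart and grow a classification tree on the built-in iris data.
library(rpart)
fit <- rpart(Species ~ ., data = iris, method = "class")

# Show the cross-validated error for each candidate tree size.
printcp(fit)

# Prune back to the complexity parameter with the lowest cross-validated error.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned_fit <- prune(fit, cp = best_cp)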

Important factors to consider while selecting the tree in R

• Entropy:
Mainly used to determine the uniformity of the given sample. If the sample is completely homogeneous the entropy is 0; if it is equally divided between the classes the entropy is 1. The higher the entropy, the more difficult it becomes to draw conclusions from that information.
• Information Gain:
A statistical property which measures how well the training examples are separated based on the target classification. The main idea behind constructing a decision tree is to find an attribute that returns the smallest entropy and the highest information gain. It is basically a measure of the decrease in the total entropy, and it is calculated as the difference between the entropy before the split and the weighted average entropy after the split of the dataset on the given attribute values. (A small sketch that computes both quantities is given after this list.)
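As a minimal sketch (the helper functions entropy and info_gain below are hypothetical and not part of any package used later in this tutorial), both quantities can be computed as follows:

# Entropy of a vector of class labels, using the base-2 logarithm.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # drop empty classes to avoid log2(0)
  -sum(p * log2(p))
}

# Information gain: entropy before the split minus the weighted
# average entropy of the subsets produced by the split.
info_gain <- function(labels, groups) {
  subsets <- split(labels, groups)
  weights <- sapply(subsets, length) / length(labels)
  entropy(labels) - sum(weights * sapply(subsets, entropy))
}

# Toy example: the attribute x separates the labels y perfectly,
# so the information gain equals the total entropy (1 bit).
y <- c("spam", "spam", "ham", "ham", "ham", "spam")
x <- c("A", "A", "B", "B", "B", "A")
info_gain(y, x)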

Generally, a model is created with observed data, also called training data. Then a set of validation data is used to verify and improve the model. R has packages which are used to create and visualize decision trees. For a new set of predictor variables, we use this model to arrive at a decision on the category (yes/no, spam/not spam) of the data.
The R package "party" is used to create decision trees.

Install R Package

Use the below command in R console to install the package. You also have to install the dependent
packages if any.
install.packages("party")
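If you want R to also install any dependent packages automatically, the dependencies argument can be passed as well (a minor variation, not required for the rest of this tutorial):

# Install party together with the packages it depends on.
install.packages("party", dependencies = TRUE)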
The package "party" has the function ctree() which is used to create and analyze decision trees.
Syntax
The basic syntax for creating a decision tree in R is −
ctree(formula, data)
Following is the description of the parameters used −
• formula is a formula describing the predictor and response variables.
• data is the name of the data set used.

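For instance, a call that models nativeSpeaker against all other columns of the readingSkills data set (the same data set used in the example below) might look like this:

# Hypothetical illustration of the syntax; the full example follows later.
# Assumes the party package has already been loaded with library(party).
model <- ctree(nativeSpeaker ~ ., data = readingSkills)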
Input Data
We will use the R built-in data set named readingSkills to create a decision tree. It records, for each person, the variables "age", "shoeSize" and "score", together with whether the person is a native speaker or not; the tree will predict the nativeSpeaker variable from the other variables.
Here is the sample data.

# Load the party package. It will automatically load other
# dependent packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))
When we execute the above code, it produces the following result −
  nativeSpeaker age shoeSize    score
1           yes   5 24.83189 32.29385
2           yes   6 25.95238 36.63105
3            no  11 30.42170 49.60593
4           yes   7 28.66450 40.28456
5           yes  11 31.88207 55.46085
6           yes  10 30.07843 52.83124
Loading required package: methods
Loading required package: grid
...............................
...............................

Example
We will use the ctree() function to create the decision tree and see its graph.

# Load the party package. It will automatically load other
# dependent packages.
library(party)

# Create the input data frame.
input.dat <- readingSkills[c(1:105),]

# Give the chart file a name.
png(file = "decision_tree.png")

# Create the tree.
output.tree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = input.dat)

# Plot the tree.
plot(output.tree)

# Save the file.
dev.off()
When we execute the above code, it produces the following result −
null device
1
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

as.Date, as.Date.numeric

Loading required package: sandwich
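If you are working in an interactive R session, the png() and dev.off() steps can be skipped and the tree can be displayed directly on the screen (a minor variation of the example above):

# Display the fitted tree in the current graphics window.
plot(output.tree)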

Conclusion
From the decision tree shown above we can conclude that anyone whose readingSkills score is less than 38.3 and whose age is more than 6 is not a native speaker.
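The split variables and threshold values referred to above can also be inspected in text form by printing the fitted tree:

# Print the tree structure, including split variables and thresholds.
print(output.tree)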

Making a prediction  
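The prediction code below refers to a test set test_data and a fitted tree ctree_ that are not defined earlier in this text. One possible setup, sketched here under the assumption of a random 80/20 split of readingSkills, is:

# Assumed setup (not part of the original example): split readingSkills
# into training and test sets and fit the tree on the training part.
set.seed(123)
train_rows <- sample(nrow(readingSkills), floor(0.8 * nrow(readingSkills)))
train_data <- readingSkills[train_rows, ]
test_data  <- readingSkills[-train_rows, ]
ctree_ <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train_data)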

# Predict, for the people in the test set, who are native speakers
# and who are not.
predict_model <- predict(ctree_, test_data)

# Create a confusion matrix counting how many people are classified
# as native speakers and how many are not.
m_at <- table(test_data$nativeSpeaker, predict_model)
m_at

Output 

The model has correctly predicted 13 people to be non-native speakers but has classified an additional 13 as non-native, and, correspondingly, the model has misclassified none of the people as native speakers when they actually are not.

Determining the accuracy of the model developed 

# Accuracy: proportion of correct predictions, taken from the diagonal
# of the confusion matrix m_at computed above.
ac_Test <- sum(diag(m_at)) / sum(m_at)

print(paste('Accuracy for test is found to be', ac_Test))

Output: 

Here the test accuracy is calculated from the confusion matrix and is found to be 0.74. Hence this model is found to predict with an accuracy of 74%.

Advantages of Decision Trees

• Easy to understand and interpret.
• Does not require data normalization.
• Does not require scaling of the data.
• The pre-processing stage requires less effort compared to other major algorithms, which in a way simplifies the given problem.

Disadvantages of Decision Trees
• Requires more time to train the model.
• It has considerably high complexity and takes more time to process the data.
• When the decrease in a user input parameter is very small, it leads to the termination of the tree.
• Calculations can get very complex at times.
