
CA- PROJECT

ARYAN
DEVESH
PUJA
SHABNAS
MUDIT
What is a Decision Tree Algorithm?
● Decision tree learning is one of the predictive modelling
approaches used in statistics, data mining and machine
learning.
● It uses a decision tree to go from observations about an item
to conclusions about the item's target value.
● Tree models where the target variable can take a discrete set
of values are called classification trees.
● Decision trees where the target variable can take continuous
values are called regression trees.
● Decision trees are among the most popular machine learning
algorithms because they are intelligible and simple to use.

Types of decision tree -

● Categorical Variable Decision Tree: a decision tree with a
categorical target variable is called a categorical variable
decision tree.

● Continuous Variable Decision Tree: a decision tree with a
continuous target variable is called a continuous variable
decision tree.
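A minimal sketch of the two types, assuming scikit-learn is available; the tiny dataset below is invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[5.1, 3.5], [6.2, 2.9], [4.7, 3.2], [6.9, 3.1]]

# Categorical target -> categorical variable (classification) tree
y_class = ["setosa", "versicolor", "setosa", "versicolor"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
print(clf.predict([[5.0, 3.4]]))  # predicts a class label

# Continuous target -> continuous variable (regression) tree
y_reg = [1.4, 4.3, 1.3, 4.9]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_reg)
print(reg.predict([[5.0, 3.4]]))  # predicts a numeric value
```

The only difference on the user's side is the estimator class and the kind of target variable; the tree-building machinery is the same.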
Different uses of Decision Tree

1. Using demographic data to find prospective clients
Decision trees can be applied to demographic data to find
prospective clients. They can help in streamlining a marketing
budget and in making informed decisions about the target
market the business is focused on.

2. Assessing prospective growth opportunities
One of the applications of decision trees involves evaluating
prospective growth opportunities for businesses based on
historical data. Historical data on sales can be used in
decision trees that may lead to radical changes in the
strategy of a business to help aid expansion and growth.

3. Serving as a support tool in several fields
Lenders also use decision trees to predict the probability of a
customer defaulting on a loan, by building a predictive model
from the client's past data. A decision tree support tool can
help lenders evaluate the creditworthiness of a customer and
prevent losses.
Decision Tree Algorithm

Advantages:
● Easy to explain and perfect for visual representation.
● Can be used for both continuous and categorical data.
● Requires little data preprocessing.
● The data ends up in distinct groups that are often easier to
understand and infer.

Disadvantages:
● Generally gives lower prediction accuracy compared to other
algorithms.
● High probability of overfitting.
● Information gain with categorical variables gives a biased
response for attributes with a greater number of categories.
● Trees fail to capture the linear relationship between input
and output.
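The overfitting risk listed among the disadvantages is usually mitigated by constraining tree growth. A small sketch, assuming scikit-learn and its bundled iris dataset (our choice of example, not the slides'), comparing an unconstrained tree with a depth-limited one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorise the training set (overfitting);
# capping max_depth trades training accuracy for generalisation.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("full tree  - train:", full.score(X_tr, y_tr), "test:", full.score(X_te, y_te))
print("depth <= 3 - train:", pruned.score(X_tr, y_tr), "test:", pruned.score(X_te, y_te))
```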
What is a Random Forest?
● Random forest is a supervised learning algorithm.
The "forest" it builds is an ensemble of decision
trees, usually trained with the "bagging" method.
● Random forest, as its name implies, consists of a
large number of individual decision trees that
operate as an ensemble. Each individual tree in the
random forest spits out a class prediction, and the
class with the most votes becomes the model's
prediction.

How Random forest works -


● Step 1 − First, start with the selection of random
samples from a given dataset.
● Step 2 − Next, this algorithm will construct a decision
tree for every sample. Then it will get the prediction
result from every decision tree.
● Step 3 − In this step, voting will be performed for
every predicted result.
● Step 4 − At last, select the most voted prediction
result as the final prediction result.
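The four steps above can be sketched as a toy bagging loop (assuming scikit-learn and NumPy; in practice scikit-learn's RandomForestClassifier does this internally, with extra per-split feature subsampling):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Steps 1 and 2: draw random (bootstrap) samples and build one tree per sample
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # sampling with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Step 3: collect every tree's prediction for the query points
query = X[:5]
votes = np.array([t.predict(query) for t in trees])  # shape (25, 5)

# Step 4: the most voted class is the final prediction
forest_pred = np.array([np.bincount(col).argmax() for col in votes.T])
print(forest_pred)
```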
What are the uses of Random forest?
1. Random forest algorithm can be used for both classification
and regression tasks.
2. It has the power to handle large data sets with higher
dimensionality.
3. It provides higher accuracy through cross-validation.
4. Random forest classifier will handle missing values and
maintain the accuracy of a large proportion of data.
5. If there are more trees, it won't allow over-fitting trees
in the model.
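The cross-validation point above can be illustrated with a short sketch (scikit-learn assumed; the dataset choice is ours, for illustration only):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy for a single tree vs. a forest
tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
print(f"single tree: {tree_acc:.3f}, random forest: {forest_acc:.3f}")
```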


Random Forest

Advantages:
● Works well for both classification and regression problems.
● Power to handle data sets with higher dimensions, making it
suitable for complicated tasks.
● Has an effective method for estimating missing data and
maintains accuracy.
● The presence of a large number of trees enhances the final
prediction.

Disadvantages:
● Does a good job at classification, but not at regression with
the same effectiveness.
● A large number of trees on large data sets makes the
algorithm too slow for processing.
● Difficult to interpret; acts like a black box.
● Computationally expensive.
Methodology and Analysis

01 Raw Data

02 Data Cleansing
⮚ Surface analysis of the data.
⮚ Finding the relevant and irrelevant variables intuitively.
⮚ The variables which contain more than 15% of their entries as
"NA" are not selected.
⮚ Replaced "NA" entries in the other variables with the average,
median or mode of that variable.
⮚ Only one variable is considered out of two or three variables
with the same data.

03 Final Data
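The cleansing steps above can be sketched with pandas (the column names and data are hypothetical; the 15% threshold is from the slide):

```python
import numpy as np
import pandas as pd

# Toy raw data with missing entries (hypothetical columns)
df = pd.DataFrame({
    "age":    [25, 30, np.nan, 40, 35, 28, 33],              # 1/7 ~ 14% NA -> kept
    "income": [50, np.nan, np.nan, np.nan, 70, np.nan, 60],  # ~57% NA -> dropped
    "city":   ["A", "B", "B", "A", "B", "A", "B"],
    "region": ["A", "B", "B", "A", "B", "A", "B"],           # duplicates "city"
})

# Variables with more than 15% "NA" entries are not selected
df = df.loc[:, df.isna().mean() <= 0.15]

# Replace remaining "NA" entries, e.g. with the median of that variable
df["age"] = df["age"].fillna(df["age"].median())

# Keep only one variable out of columns carrying the same data
df = df.T.drop_duplicates().T
print(df.columns.tolist())  # ['age', 'city']
```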

Analysis

Algorithms
1. Decision tree
2. Random Forest

Results
Results are obtained by coding with the decision tree algorithm
as well as the random forest algorithm.
