
Assignment

On

Decision Tree Analysis

IN PARTIAL FULFILLMENT OF THE DEGREE OF

Master of Business Administration-Intelligent Data Science (MBA-IDS: 2018-2020)

UNDER GUIDANCE OF

Prof. Keerti Jain

Anjali Vats MB18GID244


Contents
Problem Statement
DataSet
Models of Decision Tree & Model Accuracy
Using rpart
Comparing different combinations of models using mean squared error values
Using the tree package
Best Model Generated
Comparing Accuracy
Comparing Means
Problem Statement

1. Build various decision tree models using different combinations of independent variables.
2. Check the accuracy of the models.
3. Find the best model among those generated.

DataSet

The Carseats dataset is a dataframe with 400 observations on the following 11 variables:

1. Sales: unit sales (in thousands) at each location
2. CompPrice: price charged by the competitor at each location
3. Income: community income level (in thousands of dollars)
4. Advertising: local advertising budget at each location (in thousands of dollars)
5. Population: regional population (in thousands)
6. Price: price charged for car seats at each site
7. ShelveLoc: quality of the shelving location (Bad, Good, or Medium)
8. Age: age level of the local population
9. Education: education level at each location
10. Urban: Yes/No
11. US: Yes/No
Models of Decision Tree & Model Accuracy

Using rpart
1) Predicting whether the person belongs to the US or not based on the variables (Income,
Advertising, Population, Price)
############ CODE 1 ##################

install.packages("rpart")

library(rpart)

getwd()

Carseats<-read.csv("C:/Users/intone/Desktop/MBA/T4/MA/After MidTerm/Carseats.csv")

attach(Carseats)

names(Carseats)

Carseats$US <- factor(Carseats$US)   # ensure a factor outcome so rpart builds a classification tree

tree_analysis <- rpart(US ~ Income + Advertising + Population + Price, data = Carseats)

tree_analysis

install.packages("rpart.plot")

library(rpart.plot)

rpart.plot(tree_analysis,extra=1)
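Once the tree is fitted, its in-sample accuracy can be checked by comparing predicted and actual labels. The helper below is only a sketch (`accuracy_from_labels` is a hypothetical name, not part of rpart); with the model above one would pass `predict(tree_analysis, Carseats, type = "class")` and `Carseats$US`.

```r
# Sketch: confusion matrix and accuracy from predicted vs. actual labels.
# accuracy_from_labels is a hypothetical helper, not an rpart function.
accuracy_from_labels <- function(pred, actual) {
  print(table(Predicted = pred, Actual = actual))
  mean(pred == actual)   # fraction of correctly classified observations
}

# With the model above (not run here):
# pred <- predict(tree_analysis, Carseats, type = "class")
# accuracy_from_labels(pred, Carseats$US)

# Toy illustration with four labels, three of them correct:
accuracy_from_labels(c("Yes", "No", "Yes", "No"), c("Yes", "No", "No", "No"))  # 0.75
```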

2) Predicting whether the person belongs to an urban area or not based on the other parameters in
the dataset, i.e. (Income, Advertising, Education, Population, Price, Age, ShelveLoc, US, Sales)
Carseats <- Carseats[,-1]   # drop the first column (assumed to be a row-number index from the CSV)

Carseats$Urban <- factor(Carseats$Urban)   # Urban is already coded Yes/No; just ensure it is a factor

print(summary(Carseats))

set.seed(1234)

ind <- sample(2, nrow(Carseats), replace=TRUE, prob=c(0.7, 0.3))

trainData <- Carseats[ind==1,]

validationData <- Carseats[ind==2,]

tree = rpart(Urban ~ ., data=trainData, method="class")

rpart.plot(tree)

# evaluate on the validation set: confusion matrix and accuracy
pred <- predict(tree, validationData, type = "class")

table(Predicted = pred, Actual = validationData$Urban)

mean(pred == validationData$Urban)


3) Predicting ShelveLoc (Good, Medium, or Bad) based on the other parameters in the dataset
(Income, Advertising, Education, Population, Price, Age, Urban, US, Sales)
############ CODE ##################

tree_analysis <- rpart(ShelveLoc ~ Income + Advertising + Education + Population + Price + Age + Urban + US + Sales,
                       data = Carseats)

rpart.plot(tree_analysis,extra=1)

Comparing different combinations of models using mean squared error values
Different combinations of independent variables are used to create models, and their mean squared
error values are calculated from the difference between the actual and predicted values.

The lowest mean squared error values were obtained for model7 and model8, i.e. with the following
combinations:

tree_model7 = tree(High ~ Advertising + Age + Price + Education + Income + Population + US + ShelveLoc, training_data)
tree_model8 = tree(High ~ Advertising + Age + Price + Education + Income + Population + US + ShelveLoc + Urban, training_data)
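For a binary target coded as 0/1, the mean squared error between actual and predicted labels is simply the misclassification rate, which is how each candidate model can be scored. The helper below is a sketch (`mse_binary` is a hypothetical name, not from the tree package).

```r
# Sketch: MSE of 0/1-coded predictions equals the misclassification rate.
# mse_binary is a hypothetical helper used to score each candidate model.
mse_binary <- function(actual, predicted) mean((actual - predicted)^2)

# For each fitted model (not run here), something like:
# pred <- ifelse(predict(tree_model7, testing_data, type = "class") == "Yes", 1, 0)
# mse_binary(ifelse(testing_High == "Yes", 1, 0), pred)

# Toy illustration: two of four labels wrong gives an MSE of 0.5
mse_binary(c(1, 0, 1, 1), c(1, 1, 1, 0))  # 0.5
```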
Using the tree package
a) Creating a decision tree model by splitting the dataset into training and test sets.

# split the data into training and test sets

library(tree)

# High is assumed to be a binary indicator of high sales (Sales > 8),
# following the standard Carseats lab; adjust if your definition differs
Carseats$High <- factor(ifelse(Carseats$Sales > 8, "Yes", "No"))

set.seed(2)

train = sample(1:nrow(Carseats), nrow(Carseats)/2)

test = -train

training_data = Carseats[train, ]

testing_data = Carseats[test, ]

testing_High = Carseats$High[test]

# fit the tree model using the training data (Sales is excluded because it defines High)

tree_model = tree(High ~ . - Sales, training_data)

plot(tree_model)

text(tree_model, pretty = 0)
# test-set predictions and misclassification error
tree_pred = predict(tree_model, testing_data, type = "class")
mean(tree_pred != testing_High)

#PRUNE the tree


## cross-validation to check where to stop pruning
set.seed(3)

cv_tree=cv.tree(tree_model, FUN=prune.misclass)
names(cv_tree)
plot(cv_tree$size, cv_tree$dev, type="b")
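Instead of reading the best size off the plot, the tree size with the lowest cross-validated deviance can be picked programmatically. `cv_tree$size` and `cv_tree$dev` are the fields returned by `cv.tree()` above; the toy vectors below stand in for them, and `pick_best_size` is a hypothetical helper.

```r
# Sketch: choose the tree size whose cross-validated deviance is lowest.
# pick_best_size is a hypothetical helper, not a tree-package function.
pick_best_size <- function(size, dev) size[which.min(dev)]

# With the fitted object above (not run here):
# pick_best_size(cv_tree$size, cv_tree$dev)

# Toy illustration: deviance is minimised at size 9
pick_best_size(c(1, 5, 9, 14), c(170, 120, 98, 105))  # 9
```

The chosen value can then be passed as the `best` argument of `prune.misclass()`, as done below.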
##prune the tree

pruned_model=prune.misclass(tree_model, best=9)

plot(pruned_model)

text(pruned_model, pretty=0)

## check how the pruned tree performs on the test set

tree_pred=predict(pruned_model, testing_data, type="class")

mean(tree_pred !=testing_High)
Best Model Generated

Comparing Accuracy
1) The best model generated is the one in which the US (Yes/No) label is predicted from the
other variables. This model has an accuracy of around 89%.

Comparing Means
The lowest mean squared error values were obtained for model7 and model8, i.e. with the following
combinations:

tree_model7 = tree(High ~ Advertising + Age + Price + Education + Income + Population + US + ShelveLoc, training_data)
tree_model8 = tree(High ~ Advertising + Age + Price + Education + Income + Population + US + ShelveLoc + Urban, training_data)
