Content Beyond Syllabus and Case Based Program

The document outlines a program demonstrating the decision tree-based ID3 algorithm on the Balance Scale dataset. It includes code for importing data, splitting the dataset, training models using the Gini index and entropy, making predictions, and calculating accuracy. It then presents a case study on the growth of India's used car market, with the objective of building a predictive model for used car prices.

SAGAR INSTITUTE OF RESEARCH & TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Content Beyond Syllabus


Program 1: Write a program to demonstrate the working of the decision tree-based ID3 algorithm.

Importing the required packages

import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
# sklearn.cross_validation no longer exists; train_test_split lives here
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report


# Function to import the dataset
def importdata():
    # The data-file URL is masked as "[Link]" in the source; only its
    # "databases/balance-scale/" segment is legible. Supply the full URL here.
    balance_data = pd.read_csv(
        '[Link]' 'databases/balance-scale/[Link]',
        sep=',', header=None)

    # Printing the dataset shape
    print("Dataset Length: ", len(balance_data))
    print("Dataset Shape: ", balance_data.shape)

    # Printing the first dataset observations
    print("Dataset: ", balance_data.head())
    return balance_data


# Function to split the dataset
def splitdataset(balance_data):
    # Separating the target variable: column 0 is the class label,
    # columns 1-4 are the four numeric attributes
    X = balance_data.values[:, 1:5]
    Y = balance_data.values[:, 0]

    # Splitting the dataset into train (70%) and test (30%) sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)

    return X, Y, X_train, X_test, y_train, y_test


# Function to perform training with the Gini index
def train_using_gini(X_train, X_test, y_train):
    # Creating the classifier object: at most 3 levels deep,
    # at least 5 training samples in every leaf
    clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                      max_depth=3, min_samples_leaf=5)
    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini


# Function to perform training with entropy (information gain)
def train_using_entropy(X_train, X_test, y_train):
    # Decision tree with entropy, same size limits as the Gini tree
    clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                         max_depth=3, min_samples_leaf=5)
    # Performing training
    clf_entropy.fit(X_train, y_train)
    return clf_entropy


# Function to make predictions
def prediction(X_test, clf_object):
    # Prediction on the test set with the given classifier
    y_pred = clf_object.predict(X_test)
    print("Predicted values: ")
    print(y_pred)
    return y_pred


# Function to calculate accuracy
def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ",
          confusion_matrix(y_test, y_pred))
    print("Accuracy : ",
          accuracy_score(y_test, y_pred) * 100)
    print("Report: ", classification_report(y_test, y_pred))


def main():
    # Building phase
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)
    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)

    # Operational phase
    print("Results Using Gini Index:")
    # Prediction using gini
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)

    print("Results Using Entropy:")
    # Prediction using entropy
    y_pred_entropy = prediction(X_test, clf_entropy)
    cal_accuracy(y_test, y_pred_entropy)


# Calling main function
if __name__ == "__main__":
    main()
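The program relies on scikit-learn's DecisionTreeClassifier, which implements an optimized version of CART with binary splits rather than ID3 itself; with criterion="entropy" it chooses splits by information gain, the measure ID3 uses. For illustration, a minimal pure-Python sketch of the quantities behind the two criteria: entropy and ID3-style information gain (one branch per attribute value), plus the Gini impurity. The function names and the four-row dataset are hypothetical, chosen only to make the numbers easy to check.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """ID3 gain of splitting on column `attr`, one branch per distinct value."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted entropy of the branches left after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data: attribute 0 separates the classes perfectly, attribute 1 does not.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["L", "L", "R", "R"]
print(entropy(labels))                    # 1.0
print(gini(labels))                       # 0.5
print(information_gain(rows, labels, 0))  # 1.0
print(information_gain(rows, labels, 1))  # 0.0
```

ID3 would pick attribute 0 here, since it has the larger gain; a CART tree with criterion="entropy" compares the same kind of gain, but only over binary splits.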

OUTPUT:

Dataset Length: 625
Dataset Shape: (625, 5)
Dataset:     0  1  2  3  4
0  B  1  1  1  1
1  R  1  1  1  2
2  R  1  1  1  3
3  R  1  1  1  4
4  R  1  1  1  5
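The importdata() function reads the data from a URL that is masked in this copy. As a self-contained alternative, the 625 rows can be regenerated from the rule documented for the UCI Balance Scale dataset: each of the four attributes (left weight, left distance, right weight, right distance) ranges over 1 to 5, and the class is L, R or B depending on whether left weight × left distance is greater than, less than or equal to right weight × right distance. The helper make_balance_scale is hypothetical, and the generated row order is an assumption that matches the five rows printed above but may differ from the original file, so the train/test split and accuracies need not match the original run.

```python
from itertools import product

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_balance_scale():
    """Rebuild the Balance Scale table: column 0 is the class, 1-4 the attributes."""
    rows = []
    for lw, ld, rw, rd in product(range(1, 6), repeat=4):
        left, right = lw * ld, rw * rd
        label = "L" if left > right else "R" if left < right else "B"
        rows.append([label, lw, ld, rw, rd])
    return pd.DataFrame(rows)


data = make_balance_scale()
print(data.shape)  # (625, 5)
print(data.head())

# Same split and hyperparameters as the program above
X = data.values[:, 1:5]
Y = data.values[:, 0]
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=100,
                                 max_depth=3, min_samples_leaf=5)
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test)) * 100
    print(criterion, round(acc, 2))
```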
SAGAR INSTITUTE OF RESEARCH AND TECHNOLOGY, BHOPAL
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CASE STUDY BASED QUESTION

TITLE: USED CARS PRICE PREDICTION


Riding the digital wave, India's used car market is set to grow at a compound annual growth rate (CAGR) of 11% and is likely to touch sales of up to 8.3 million units by FY26, as more people have been opting for pre-owned cars for personal mobility during the pandemic amid ongoing supply shortages for manufacturing new cars.
The used car market in the country is expected to reach over 70 lakh vehicles by 2025-26, up from 38 lakh in 2020-21, as the Covid-19 pandemic, digitalization, changing demographics and aspirations, first-time buyers and the availability of financing options act as growth drivers, according to a report by OLX Autos and rating agency Crisil.
"MyCars" is a new-age startup laying its foundations in the car resale domain. It is setting up a team of ML experts to build predictive models that determine the price of second-hand cars in order to optimize its revenue. You have joined as a new Data Scientist, and your role is to create a model to determine the selling price of a used car.

Objective:
• Provide the best-performing model to determine the price of a used car.
• Provide the most important features that determine the price.
Data Description
The data provided consists of the following data dictionary:
• Id: Unique ID assigned to a specific car.
• year: Manufacture year of the car.
• brand: Brand of the car.
• full_model_name: Model name including other details such as engine capacity, transmission, etc.; basically a detailed model name.
• model_name: Just the model name of the car.
• price: Selling price of the second-ownership car.
• distance_travelled(kms): Distance travelled by the car.
• fuel_type: Fuel type of the engine.
• city: City where the car is registered.
• car_age: Age of the car.
[Dataset preview: the first and last five rows (Id 0 to 4 and 1720 to 1724) of the 11 columns listed below; most cell values are illegible in the scan]

1725 rows × 11 columns


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1725 entries, 0 to 1724
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype
 0   Id                       1725 non-null   int64
 1   year                     1725 non-null   int64
 2   brand                    1725 non-null   object
 3   full_model_name          1725 non-null   object
 4   model_name               1725 non-null   object
 5   price                    1725 non-null   float64
 6   distance_travelled(kms)  1725 non-null   float64
 7   fuel_type                1725 non-null   object
 8   city                     1725 non-null   object
 9   brand_rank               1725 non-null   int64
 10  car_age                  1725 non-null   float64
dtypes: float64(3), int64(3), object(5)
memory usage: 148.4+ KB
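info() shows six numeric columns and five text (object) columns; the text columns must be encoded before most models can use them. A minimal sketch of separating the two groups with select_dtypes, on a one-row placeholder frame whose column names and dtypes follow the listing above (the values are illustrative, not from the dataset). Selecting the text columns with exclude="number" rather than include="object" keeps the sketch working on pandas versions that store strings in a dedicated string dtype.

```python
import pandas as pd

# One placeholder row with the column names and dtypes reported by info()
df = pd.DataFrame({
    "Id": [0], "year": [2017], "brand": ["BrandX"],
    "full_model_name": ["BrandX ModelY 1.5 MT"], "model_name": ["ModelY"],
    "price": [500000.0], "distance_travelled(kms)": [30000.0],
    "fuel_type": ["Petrol"], "city": ["Pune"],
    "brand_rank": [1], "car_age": [4.0],
})

numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Id is only an identifier and price is the target, so neither is a feature
numeric_features = [c for c in numeric_cols if c not in ("Id", "price")]

print(numeric_features)
print(categorical_cols)
```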

[Link]()
                Id         year         price  distance_travelled(kms)   brand_rank      car_age
count  1725.000000  1725.000000  1.725000e+03              1725.000000  1725.000000  1725.000000
mean    862.000000  2015.390725  1.494837e+06             53848.256232    15.731014     5.609275
std     498.108924     3.207504  1.671658e+06              4725.541963    12.951122     3.207504
min       0.000000  1990.000000  6.250000e+04               350.000000     1.000000     0.000000
25%     431.000000  2013.000000  5.450000e+05             29000.000000     5.000000     3.000000
50%     862.000000  2016.000000  8.750000e+05             49000.000000    14.000000     5.000000
75%    1293.000000  2018.000000  1.825000e+06             70500.000000    24.000000     8.000000
max    1724.000000  2021.000000  1.470000e+07            780000.000000    81.000000    31.000000
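The year and car_age statistics mirror each other: the means add up to 2021, the standard deviations are identical, and the newest car (year 2021) has car_age 0. A small sketch that checks this against the printed values, assuming car_age = 2021 - year (the reference year 2021 is an inference; the source does not say how car_age was computed). The 75% car_age entry is left out because it is hard to read in the scan. If the relation holds, year and car_age carry the same information, so a model gains nothing from keeping both.

```python
REF_YEAR = 2021  # assumed reference year; not stated in the source

# Statistics copied from the describe() output
year = {"mean": 2015.390725, "std": 3.207504, "min": 1990.0,
        "50%": 2016.0, "75%": 2018.0, "max": 2021.0}
car_age = {"mean": 5.609275, "std": 3.207504, "min": 0.0,
           "25%": 3.0, "50%": 5.0, "max": 31.0}

# Under car_age = REF_YEAR - year, each car_age statistic pairs with the
# mirrored year statistic (the youngest car is the newest one).
pairs = {"mean": "mean", "min": "max", "25%": "75%", "50%": "50%", "max": "min"}


def consistent(tol=1e-6):
    shifted_ok = all(abs(car_age[a] - (REF_YEAR - year[y])) < tol
                     for a, y in pairs.items())
    # Subtracting from a constant does not change the standard deviation
    spread_ok = abs(car_age["std"] - year["std"]) < tol
    return shifted_ok and spread_ok


print(consistent())  # True
```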


                               Id      year     price  distance_travelled(kms)  brand_rank   car_age
Id                       1.000000 -0.054391 -0.105696                 0.100282    0.022191  0.054391
year                    -0.054391  1.000000  0.288483                -0.386107    0.134275 -1.000000
price                   -0.105696  0.288483  1.000000                -0.137351   -0.164591 -0.288483
distance_travelled(kms)  0.100282 -0.386107 -0.137351                 1.000000   -0.111406  0.386107
brand_rank               0.022191  0.134275 -0.164591                -0.111406    1.000000 -0.134275
car_age                  0.054391 -1.000000 -0.288483                 0.386107   -0.134275  1.000000
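The price row of the correlation matrix ranks the numeric columns by linear association with price. A minimal sketch using the values from that row: Id is dropped because it is only a row identifier, and year and car_age tie in magnitude because car_age moves exactly opposite to year (their correlation is -1.000000). Correlation covers only the numeric columns and only linear association; text columns such as brand and model_name are not represented at all. The name df in the final comment is a hypothetical name for the loaded DataFrame.

```python
import pandas as pd

# Correlation of each numeric column with price, copied from the matrix
corr_with_price = pd.Series({
    "Id": -0.105696,
    "year": 0.288483,
    "distance_travelled(kms)": -0.137351,
    "brand_rank": -0.164591,
    "car_age": -0.288483,
})

# Rank features by the strength (absolute value) of the correlation
ranking = corr_with_price.drop("Id").abs().sort_values(ascending=False)
print(ranking)

# On the full DataFrame (hypothetically named df) the same Series would be:
# df.corr(numeric_only=True)["price"]
```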

[Link]().sum()

Id
year
brand
full_model_name
model_name
price
distance_travelled(kms)
fuel_type
city
brand_rank
car_age
dtype: int64
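The counts beside these column names are not legible in the scan; since info() reports 1725 non-null entries in every column, each count should be 0. A small sketch of a helper that reports only the columns that do have gaps, with count and percentage (the name missing_report and the sample frame are hypothetical).

```python
import pandas as pd


def missing_report(df):
    """Missing-value count and percentage per column, keeping only columns with gaps."""
    counts = df.isnull().sum()
    report = pd.DataFrame({"missing": counts,
                           "percent": 100 * counts / len(df)})
    return report[report["missing"] > 0]


# Illustrative frame (not the case-study data) with one gap in "city"
sample = pd.DataFrame({"year": [2012, 2017, 2015, 2011],
                       "city": ["Mumbai", None, "Pune", "Pune"]})
print(missing_report(sample))
```

An empty result, as expected for this dataset, means no imputation step is needed before modelling.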

[Link]()
