SAGAR INSTITUTE OF RESEARCH & TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Content Beyond Syllabus
Program 1: Write a program to demonstrate the working of the decision-tree-based ID3 algorithm.
# Importing the required packages
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Function importing the dataset
def importdata():
    balance_data = pd.read_csv('[Link]'
                               'databases/balance-scale/[Link]',
                               sep=',', header=None)

    # Printing the dataset shape
    print("Dataset Length: ", len(balance_data))
    print("Dataset Shape: ", balance_data.shape)

    # Printing the dataset observations
    print("Dataset: ", balance_data.head())
    return balance_data

# Function to split the dataset
def splitdataset(balance_data):
    # Separating the target variable (column 0) from the features (columns 1-4)
    X = balance_data.values[:, 1:5]
    Y = balance_data.values[:, 0]

    # Splitting the dataset into train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)
    return X, Y, X_train, X_test, y_train, y_test

# Function to perform training with the Gini index
def train_using_gini(X_train, X_test, y_train):
    # Creating the classifier object
    clf_gini = DecisionTreeClassifier(criterion="gini",
                                      random_state=100, max_depth=3,
                                      min_samples_leaf=5)

    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini

# Function to perform training with entropy
def train_using_entropy(X_train, X_test, y_train):
    # Decision tree with entropy
    clf_entropy = DecisionTreeClassifier(
        criterion="entropy", random_state=100,
        max_depth=3, min_samples_leaf=5)

    # Performing training
    clf_entropy.fit(X_train, y_train)
    return clf_entropy


# Function to make predictions
def prediction(X_test, clf_object):
    # Prediction on the test set
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred


# Function to calculate accuracy
def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ",
          confusion_matrix(y_test, y_pred))
    print("Accuracy : ",
          accuracy_score(y_test, y_pred) * 100)
    print("Report: ", classification_report(y_test, y_pred))


def main():
    # Building phase
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)
    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)

    # Operational phase
    print("Results Using Gini Index:")
    # Prediction using gini
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)

    print("Results Using Entropy:")
    # Prediction using entropy
    y_pred_entropy = prediction(X_test, clf_entropy)
    cal_accuracy(y_test, y_pred_entropy)


# Calling main function
if __name__ == "__main__":
    main()
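The program relies on scikit-learn's DecisionTreeClassifier, which implements an optimised version of CART rather than ID3 itself; setting criterion="entropy" makes it pick splits with the same entropy-based idea that ID3 uses. The core ID3 step, choosing the attribute with the highest information gain, can be sketched in plain Python as below. The helper names (entropy, information_gain, best_attribute) and the toy rows are illustrative assumptions, not part of the original program.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on column index `attr`."""
    total = len(rows)
    # Group the labels by the value the attribute takes in each row
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_attribute(rows, labels, attrs):
    """ID3 split rule: choose the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: (left_weight, left_distance) -> class
rows = [(1, 1), (1, 2), (2, 1), (2, 2)]
labels = ["B", "B", "L", "L"]

print(entropy(labels))                       # 1.0
print(information_gain(rows, labels, 0))     # 1.0
print(information_gain(rows, labels, 1))     # 0.0
print(best_attribute(rows, labels, [0, 1]))  # 0
```

A full ID3 tree would apply best_attribute recursively to each subgroup until the labels are pure or no attributes remain.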
OUTPUT:
Dataset Length: 625
Dataset Shape: (625, 5)
Dataset:     0  1  2  3  4
0  B  1  1  1  1
1  R  1  1  1  2
2  R  1  1  1  3
3  R  1  1  1  4
4  R  1  1  1  5
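cal_accuracy() prints scikit-learn's confusion matrix and the accuracy as a percentage. What those two numbers mean can be sketched without scikit-learn: rows of the matrix are true classes and columns are predicted classes (the same orientation confusion_matrix uses, with classes in sorted order), and accuracy is the diagonal total divided by the number of samples. The function names and the sample labels below are hypothetical.

```python
def confusion_matrix_simple(y_true, y_pred):
    """Rows = true class, columns = predicted class; classes in sorted order."""
    classes = sorted(set(y_true) | set(y_pred))
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return classes, matrix


def accuracy_percent(y_true, y_pred):
    """Share of correct predictions, scaled to a percentage as in cal_accuracy()."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) * 100


# Hypothetical labels using the balance-scale classes B, L and R
y_true = ["L", "R", "B", "L", "R", "R"]
y_pred = ["L", "R", "L", "L", "R", "L"]

classes, matrix = confusion_matrix_simple(y_true, y_pred)
print(classes)  # ['B', 'L', 'R']
for row in matrix:
    print(row)
print(accuracy_percent(y_true, y_pred))
```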
SAGAR INSTITUTE OF RESEARCH AND TECHNOLOGY, BHOPAL
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CASE STUDY BASED QUESTION
TITLE: USED CAR PRICE PREDICTION
Riding the digital wave, India's used car market is set to grow at a compound annual growth rate of 11% and is likely to touch sales of up to 8.3 million units by FY26, as more people have been opting for pre-owned cars for personal mobility during the pandemic amid ongoing supply shortages in the manufacture of new cars.
The used car market in the country is expected to reach over 70 lakh vehicles by 2025-26, up from 38 lakh in 2020-21, as the Covid-19 pandemic, digitalization, changing demographics and aspirations, first-time buyers and the availability of financing options act as growth drivers, according to a report by OLX Autos and rating agency Crisil.
"MyCars" is a new-age startup laying its foundations in the car resale domain, and it is setting up a team of ML experts to build predictive models that determine the price of second-hand cars in order to optimize its revenue. You have joined as a new Data Scientist, and your role is to create a model to determine the selling price of a used car.
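The case study does not fix a metric for deciding which price model is "best-performing". A common choice for a regression task like this, assumed here, is to compare candidate models on held-out data using root mean squared error (RMSE, lower is better) and the coefficient of determination R² (higher is better). The sketch below uses hypothetical model names and made-up prices in lakh; it is not the case study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the price."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score_simple(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical held-out prices (lakh) and predictions from two candidate models
y_test = [3.0, 5.0, 14.0, 76.0]
candidates = {
    "model_a": [3.5, 4.5, 14.0, 70.0],
    "model_b": [4.0, 6.0, 12.0, 50.0],
}

# Pick the candidate with the lowest RMSE, then report its R²
scores = {name: rmse(y_test, pred) for name, pred in candidates.items()}
best = min(scores, key=scores.get)
print(best)
print(round(r2_score_simple(y_test, candidates[best]), 3))
```

Because RMSE squares the errors, a single large miss on an expensive car dominates the score; that is worth keeping in mind when prices span more than two orders of magnitude.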
Objective:
• Provide the best-performing model to determine the price of a used car.
• Provide the most important features that determine the price.
Data Description
The data provided consists of the following data dictionary:
• Id: Unique ID assigned to a specific car.
• year: Manufacture year of the car.
• brand: Brand of the car.
• full_model_name: Model name including other details such as engine capacity and transmission; basically a detailed model name.
• model_name: Just the model name of the car.
• price: Selling price of the second-ownership car.
• distance_travelled(kms): Distance traveled by the car.
• fuel_type: Fuel engine type.
• city: City where the car is registered.
• car_age: Age of the car.
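The data dictionary does not say how car_age is computed. The summary statistics reported for this dataset (maximum year 2021 with minimum car_age 0, minimum year 1990 with maximum car_age 31, median year 2016 with median car_age 5) are consistent with car_age = 2021 - year, so the sketch below assumes a reference year of 2021. That constant is an inference from the statistics, not something the case study states.

```python
REFERENCE_YEAR = 2021  # assumption inferred from the summary statistics, not stated in the case study


def car_age(year, reference_year=REFERENCE_YEAR):
    """Age of the car in years, relative to the assumed reference year."""
    return float(reference_year - year)


# Year quantiles paired with the mirrored car_age quantiles
checks = [
    (2021, 0.0),   # max year  -> min car_age
    (2018, 3.0),   # year 75%  -> car_age 25%
    (2016, 5.0),   # year 50%  -> car_age 50%
    (2013, 8.0),   # year 25%  -> car_age 75%
    (1990, 31.0),  # min year  -> max car_age
]
print(all(car_age(y) == age for y, age in checks))  # True
```

With pandas the same rule would be a single vectorised expression such as 2021 - df["year"], where df is a hypothetical DataFrame name.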
Dataset preview (first and last five rows, 1725 rows × 11 columns), with columns: Id, year, brand, full_model_name, model_name, price, distance_travelled(kms), fuel_type, city, brand_rank, car_age.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1725 entries, 0 to 1724
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Id                       1725 non-null   int64
 1   year                     1725 non-null   int64
 2   brand                    1725 non-null   object
 3   full_model_name          1725 non-null   object
 4   model_name               1725 non-null   object
 5   price                    1725 non-null   float64
 6   distance_travelled(kms)  1725 non-null   float64
 7   fuel_type                1725 non-null   object
 8   city                     1725 non-null   object
 9   brand_rank               1725 non-null   int64
 10  car_age                  1725 non-null   float64
dtypes: float64(3), int64(3), object(5)
memory usage: 148.4+ KB
[Link]()
                Id         year         price  distance_travelled(kms)   brand_rank      car_age
count  1725.000000  1725.000000  1.725000e+03              1725.000000  1725.000000  1725.000000
mean    862.000000  2015.390725  1.494837e+06             53848.256232    15.731014     5.609275
std     498.108924     3.207504  1.671658e+06              4725.541963    12.951122     3.207504
min       0.000000  1990.000000  6.250000e+04               350.000000     1.000000     0.000000
25%     431.000000  2013.000000  5.450000e+05             29000.000000     5.000000     3.000000
50%     862.000000  2016.000000  8.750000e+05             49000.000000    14.000000     5.000000
75%    1293.000000  2018.000000  1.825000e+06             70500.000000    24.000000     8.000000
max    1724.000000  2021.000000  1.470000e+07            780000.000000    81.000000    31.000000
                               Id      year     price  distance_travelled(kms)  brand_rank   car_age
Id                       1.000000 -0.054391 -0.105696                 0.100282    0.022191  0.054391
year                    -0.054391  1.000000  0.288483                -0.386107    0.134275 -1.000000
price                   -0.105696  0.288483  1.000000                -0.137351   -0.164591 -0.288483
distance_travelled(kms)  0.100282 -0.386107 -0.137351                 1.000000   -0.111406  0.386107
brand_rank               0.022191  0.134275 -0.164591                -0.111406    1.000000 -0.134275
car_age                  0.054391 -1.000000 -0.288483                 0.386107   -0.134275  1.000000
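The correlation matrix holds Pearson's r for every pair of numeric columns. Two of its features are easy to check by hand: year and car_age have r = -1 because one is an exact linear function of the other, and ranking columns by |r| with price gives a first, purely linear view of which features matter (the second objective). The sketch below implements r directly; the toy columns are hypothetical, and linear correlation is only a rough proxy for feature importance, since it misses non-linear effects and categorical columns such as brand.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy columns (not the case-study data)
year = [2010, 2012, 2015, 2017, 2019]
car_age = [2021 - y for y in year]       # exact linear function of year
price = [3.0, 4.5, 6.0, 9.0, 12.0]       # in lakh, made up
distance = [90000, 60000, 80000, 30000, 40000]

# Rank features by the strength of their linear relationship with price
features = {"year": year, "car_age": car_age, "distance": distance}
ranking = sorted(features, key=lambda f: abs(pearson_r(features[f], price)), reverse=True)
print(round(pearson_r(year, car_age), 6))  # -1.0
print(ranking)
```

Because year and car_age carry the same information, a model should usually receive only one of them.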
[Link]().sum()
Id                         0
year                       0
brand                      0
full_model_name            0
model_name                 0
price                      0
distance_travelled(kms)    0
fuel_type                  0
city                       0
brand_rank                 0
car_age                    0
dtype: int64
[Link]()