
Chatbot based Car price prediction tool leveraging generative AI and regression

A Project
Submitted in partial fulfillment of the requirements for
the award of the Degree of
BACHELOR OF COMPUTER APPLICATION
By

Mohammad Aaqib Siddiquei


ROLL NO-12021004006163 AND REGISTRATION NO-213661001210029
Debjit Sarkar
ROLL NO-12021004006162 AND REGISTRATION NO-213661001210030
Pratik Anand
ROLL NO-12021004006167 AND REGISTRATION NO-213661001210025
Tushar Kumar Singh
ROLL NO-12021004006049 AND REGISTRATION NO-213661001210142
Prince Atul
ROLL NO-12021004006160 AND REGISTRATION NO-213661001210032

DEPARTMENT OF COMPUTER SCIENCE/ APPLICATION

INSTITUTE OF ENGINEERING & MANAGEMENT

2023
DECLARATION CERTIFICATE

This is to certify that the work presented in the thesis entitled “Chatbot
based car price prediction tool leveraging generative AI and regression” in
partial fulfillment of the requirement for the award of degree of Bachelor of
Computer Application of Institute of Engineering & Management is an authentic
work carried out under my supervision and guidance.

To the best of my knowledge, the content of this thesis does not form a
basis for the award of any previous Degree to anyone else.

Date: 18/11/2023

Prof. Nabanita Das

Dept. of Computer Application

Institute of Engineering & Management

Dr. Abhishek Bhattacharya

Head of the Department

Dept. of Computer Application and Science

Institute of Engineering & Management


CERTIFICATE OF APPROVAL

The foregoing thesis entitled “Chatbot based car price prediction tool
leveraging generative AI and regression” is hereby approved as a creditable
study of a research topic and has been presented in a satisfactory manner to warrant
its acceptance as a prerequisite to the degree for which it has been submitted.

It is understood that by this approval, the undersigned do not necessarily
endorse any conclusion drawn or opinion expressed therein, but approve the
thesis for the purpose for which it is submitted.

(Internal Examiner) (External Examiner)


Acknowledgements
We would like to express our special thanks and gratitude to our guide, Prof. Nabanita Das, who
helped us a great deal in this project. Her valuable suggestions helped us solve difficult challenges,
and without her help this project could not have been completed on time. Special thanks also go to our
Head of Department, Prof. Abhishek Bhattacharya, who gave us the golden opportunity to carry out
this project on the topic “Chatbot based car price prediction tool leveraging
generative AI and regression”, which helped us gain significant knowledge in these
subjects. We would also like to thank our friends, who helped us a great deal in finalizing
this project within the given time frame.

Name of Student: Mohammad Aaqib Siddiquei


Roll Num: 12021004006163

Name of Student: Debjit Sarkar


Roll Num: 12021004006162

Name of Student: Pratik Anand


Roll Num: 12021004006167

Name of Student: Tushar Kumar Singh


Roll Num: 12021004006049

Name of Student: Prince Atul


Roll Num: 12021004006160
Contents

Abstract
Chapter 1
1.1 Introduction

Chapter 2
2.1 Background Studies
2.2 Literature Survey

Chapter 3
3.1 Proposed Methodology

Chapter 4
4.1 Experimental Dataset

Chapter 5
5.1 Expected Outcome

Chapter 6
6.1 Conclusion
6.2 Future Work

Abstract
This study explores the application of machine learning techniques in predicting
the price of used cars. Leveraging a diverse dataset encompassing various car
attributes, historical pricing information, and market trends, the research aims to
develop accurate predictive models. Feature engineering and selection processes
are employed to identify the most influential factors affecting the resale value of
vehicles. Several machine learning algorithms, including regression models, are
implemented and compared for their predictive performance. The study
emphasizes the importance of data preprocessing in enhancing model accuracy
and generalization. Furthermore, the research investigates the impact of
incorporating advanced features such as mileage, brand reputation, and
maintenance history on model robustness. The results demonstrate the feasibility
of employing machine learning in predicting used car prices, providing valuable
insights for both buyers and sellers in the dynamic automotive market.
Chapter 1

1.1 Introduction
The Indian automobile market, a significant sector for international and local
companies, is witnessing a surge in the demand for used cars. Online platforms
like OLX and Quikr dominate this market, but there is a concern about
manipulation and overpricing. To address this issue, we propose a solution utilizing
artificial intelligence and machine learning. By employing supervised learning
techniques and algorithms, we aim to predict used car prices based on relevant
parameters. In recent years, the Indian automobile industry has experienced a
decline in new vehicle production, with a growing preference for used and
second-hand vehicles. To standardize the used car market and implement a
transparent pricing system, this project explores machine-learning techniques
using historical data and mean values from price lists. This research, focused on
an Indian dataset, aims to provide an accurate and customer-friendly solution to
the challenge of overpricing in the used car market. By leveraging data from
various sources, including Kaggle and web scraping, we intend to train a machine-
learning model that considers key features such as manufacturing year, model
year, fuel type, transmission, mileage, and ownership history. The goal is to
empower buyers with a reliable tool that prevents deception by dealers selling
damaged or overpriced used cars. Through accurate price predictions based on
comprehensive data, customers can make informed decisions and navigate the
complexities of purchasing a used car. The project title encapsulates this effort:
"Used Car Price Predictor using Machine Learning."

1 | Institute of Engineering & Management


Chapter 2

2.1 Background Studies


The main objective of this project is to propose a solution for predicting used car
prices using machine-learning algorithms. The study investigates trends in used
car prices and suggests the best algorithm for predicting car prices. The proposed
solution is intended to be helpful for first-time used car buyers and sellers in
determining the selling cost of the car.

2.2 Literature Survey

Several studies have previously been carried out to predict used car
prices around the world using different methodologies and approaches, with
reported accuracies varying from 50% to 90%. In (Pudaruth, 2014), the researcher
predicted used car prices in Mauritius by applying several machine learning
techniques, namely decision tree, K-nearest neighbors, multiple regression and
Naïve Bayes, to historical data gathered from newspapers.
The achieved accuracy ranged from 60 to 70 percent. The author suggested
using more sophisticated models and algorithms for the evaluation, noting that the
main weakness of the decision tree and Naïve Bayes approaches is that the price
must be discretized into classes, which introduces further inaccuracy. Moreover,
he suggested using a larger dataset to train the models, since the data
gathered was not sufficient.
(Monburinon, et al., 2018) gathered data from a German e-commerce site totaling
304,133 rows and 11 attributes, predicted used car prices using different
techniques, and compared the results using Mean Absolute Error (MAE). The same
training and testing datasets were given to each model. The best result was
achieved by a gradient boosted regression tree, with an MAE of 0.28, compared
with MAEs of 0.35 and 0.55 for random forest and multiple linear regression,
respectively. The authors suggested tuning the model parameters in future work to
yield better results, as well as using one-hot encoding instead of label encoding
for a more realistic interpretation of categorical data.
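The encoding point can be illustrated with a small sketch. Label encoding assigns each category an arbitrary integer, so a regression model may read a spurious order into the codes (here CNG = 2 would look "larger" than Petrol = 0), whereas one-hot encoding gives each category its own 0/1 column. The fuel-type values and the function names label_encode and one_hot_encode are hypothetical examples chosen for illustration, not code from the cited study.

```python
def label_encode(values):
    """Map each distinct category to an integer in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector with one column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # switch on only this category's column
        rows.append(row)
    return rows, categories


# Hypothetical fuel-type column from a used-car listing.
fuel = ["Petrol", "Diesel", "CNG", "Petrol"]
codes, mapping = label_encode(fuel)
print(codes)    # [0, 1, 2, 0]  -> implies an order that does not exist
onehot, cols = one_hot_encode(fuel)
print(cols)     # ['CNG', 'Diesel', 'Petrol']
print(onehot)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The trade-off is width: one-hot encoding adds one column per category, which is usually acceptable for low-cardinality attributes such as fuel type or transmission.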
(Gegic, Isakovic, Keco, Masetic, & Kevric, 2019), from the International Burch
University in Sarajevo, used three different machine learning techniques to
predict used car prices. Using data scraped from a local Bosnian used-car website,
totaling 797 car samples after pre-processing, they applied Support Vector Machine,
Random Forest and Artificial Neural Network methods. The results showed that a
single machine learning algorithm achieved accuracies below 50%, whereas
combining the algorithms, with prices pre-classified using Random Forest, yielded
accuracies of up to 87.38%.

(Noor & Jan, 2017) achieved a high level of accuracy using multiple
linear regression models to predict the prices of cars collected from PakWheels,
a used car website in Pakistan, totaling 1699 records after pre-processing.
They achieved an accuracy of 98%; this was done after
reducing the total number of attributes with a variable selection technique, so that
only significant attributes were included and the complexity of the model was reduced.

(K.Samruddhi & Kumar, 2020) proposed a supervised machine learning model
based on K-Nearest Neighbours to predict used car prices from a Kaggle dataset
containing 14 different attributes. Using this method, accuracy reached up to 85%
after trying different values of K and changing the ratio of training data to
testing data; when the percentage of data used for testing was increased, better
accuracy was achieved. The model was also cross-validated with 5 and 10 folds
using the K-fold method.
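K-nearest-neighbour regression combined with K-fold cross-validation can be sketched in plain Python as follows. This is an illustrative assumption, not the cited authors' code: the helper names, the Euclidean distance, the use of mean absolute error as the fold score and the toy data are all chosen here for demonstration. Note that the k of k-NN (number of neighbours) and the K of K-fold (number of folds) are independent parameters.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Predict a price as the mean target of the k nearest training points (Euclidean)."""
    dists = sorted((math.dist(row, x), target) for row, target in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle row indices once and split them into n_folds roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_val_mae(X, y, k, n_folds):
    """Average mean absolute error of k-NN regression over n_folds held-out folds."""
    fold_maes = []
    for test_idx in k_fold_indices(len(X), n_folds):
        test_set = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in test_set]
        train_y = [y[i] for i in range(len(X)) if i not in test_set]
        errors = [abs(knn_predict(train_X, train_y, X[i], k) - y[i]) for i in test_idx]
        fold_maes.append(sum(errors) / len(errors))
    return sum(fold_maes) / n_folds


# Hypothetical toy data: one feature (car age in years), price falling with age.
X = [[float(age)] for age in range(10)]
y = [100.0 - 5.0 * age for age in range(10)]
for k in (1, 3, 5):
    print(k, cross_val_mae(X, y, k, n_folds=5))
```

With real listings the features should be scaled first, because k-NN distances are otherwise dominated by large-valued attributes such as kilometres driven.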
(Gongqi, Yansong, & Qiang, 2011) proposed an Artificial Neural Network (ANN)
approach that combines a back-propagation (BP) neural network with nonlinear
curve fitting, and achieved accurate value predictions with a feasible model.
(Listiani, 2009) used Support Vector Machines (SVM) to evaluate leased car prices;
the results showed that SVM is far more accurate than multiple linear regression
on large, high-dimensional datasets. However, whereas multiple linear regression
can take several minutes to compute, SVM can take up to a day. Multiple linear
regression may be simpler, but SVM is far more accurate. Moreover, that study
includes samples with up to 178 attributes, far more than the number of variables
proposed in our study; hence, multiple linear regression may be more suitable in
our case.
(Kuiper, 2008) collected data from General Motors on cars produced in
2005 and likewise used a variable selection technique to include only the most
relevant attributes in the model, reducing the complexity of the data. He
proposed using a multivariate regression model, which is more suitable for
values in numeric format.
To predict the price of used cars, researchers (Nabarun Pal, 2018) used a
supervised learning method known as Random Forest, with a Kaggle dataset as
the basis for prediction. Careful exploratory data analysis was performed to
determine the price impact of each feature. A Random Forest of 500 decision
trees was trained. Although Random Forest is most commonly used for
classification, they applied it to this task as a regression model. The
experimental results showed a training accuracy of 95.82% and a testing
accuracy of 83.63%. By selecting the most correlated features, the model can
accurately predict the car price.

Hence, the literature review shows that used car price prediction is an important
topic on which many researchers are currently working. So far, the best reported
accuracy on Kaggle's dataset is 83.63%, achieved using the random forest
technique. The researchers have tested multiple regressors, and the final model is
a regression model using linear regression.

Chapter 3
3.1 Proposed Methodology
The main goal of this method is to give users an accurate estimate of how much
has to be paid for the given vehicle. The model can give the customer a list of
possibilities for various automobiles based on the details of the automobile the
customer wants, providing sufficient data to help them reach a decision. The used
automobile market is expanding at an
exponential rate, and vehicle vendors may profit from this by offering incorrect
prices to capitalize on the demand. As a result, a system that can predict the price
of a car based on its parameters while also taking into consideration the costs of
competing vehicles is necessary. Our system fills in the gaps by providing buyers
and sellers with an estimate of the car's value based on the best-performing
price prediction algorithm available.

Algorithms
DECISION TREE: Decision trees are a type of supervised machine learning (that is,
the training data specifies both the inputs and the corresponding outputs) in which
the data is repeatedly split according to a certain parameter. Decision trees use
various algorithms and splitting criteria, such as Gini impurity or entropy for
classification and variance reduction for regression, to decide how to split a node
into two or more sub-nodes. Creating sub-nodes increases the homogeneity of the
resulting sub-nodes; in other words, the purity of each node increases with respect
to the target variable. In decision analysis, a decision tree can be used to represent
decisions and decision-making visually and explicitly. As the name suggests, it uses
a tree-like model of decisions.
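To make node purity concrete for a price (regression) target, the sketch below searches every feature and threshold for the split that most reduces the weighted variance of the target, which is the criterion used by CART-style regression trees. It finds only a single split; a full tree would apply the same search recursively to each sub-node. The function names and the example feature values and prices are hypothetical.

```python
def variance(values):
    """Population variance of a list of prices; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(X, y):
    """Return (feature_index, threshold, reduction) with the largest variance reduction.

    Each candidate sends rows with x[f] <= threshold to the left sub-node and the
    rest to the right; (None, None, 0.0) means no split improves purity.
    """
    parent = variance(y)
    n = len(y)
    best = (None, None, 0.0)
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must create two non-empty sub-nodes
            child = (len(left) * variance(left) + len(right) * variance(right)) / n
            reduction = parent - child
            if reduction > best[2]:
                best = (f, threshold, reduction)
    return best


# Hypothetical rows: [age in years, km driven in thousands] -> price in lakh rupees.
X = [[2, 20], [3, 35], [8, 90], [10, 120]]
y = [6.0, 5.5, 2.0, 1.5]
print(best_split(X, y))  # (0, 3, 4.0): split on age <= 3
```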
RANDOM FOREST: Random forest is a supervised machine learning algorithm
that is widely used for classification and regression problems. It builds decision
trees on different bootstrap samples of the training data and takes their majority
vote for classification and their average in the case of regression. Random forest
is suitable when we have a large dataset and interpretability is not a major
concern. Decision trees are much easier to interpret and understand; since a
random forest combines multiple decision trees, it is more difficult to interpret.
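A minimal random forest regressor can be sketched as follows. To stay short it uses depth-1 trees (stumps) as base learners, whereas practical forests grow much deeper trees; the parts that make it a random forest are the bootstrap sample per tree, the random subset of features considered at each split (a stump has only one split), and the averaging of the trees' predictions. All function names, parameter defaults and the toy data are assumptions for illustration.

```python
import random


def fit_stump(X, y, features):
    """Fit a depth-1 regression tree using only the given feature indices.

    The stump keeps the split with the lowest sum of squared errors; each leaf
    predicts the mean price of the rows that fall into it.
    """
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split (e.g. all sampled rows identical): predict the mean
        mean = sum(y) / len(y)
        return lambda x: mean
    _, f, threshold, lm, rm = best
    return lambda x: lm if x[f] <= threshold else rm


def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    if max_features is None:
        max_features = max(1, d // 3)  # a common heuristic for regression forests
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(d), max_features)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees


def predict_forest(trees, x):
    """Regression output of the forest: the average of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical toy data: cheap cars for small feature values, expensive for large ones.
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
trees = fit_forest(X, y)
print(predict_forest(trees, [0.0]), predict_forest(trees, [9.0]))
```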

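VOTING CLASSIFIER: A voting classifier is a machine learning estimator that trains
several base models (estimators) and makes predictions by aggregating the outputs
of the base estimators. The aggregation can be a majority vote over the predicted
classes (hard voting) or an average of the predicted class probabilities (soft
voting); for a continuous target such as price, the equivalent ensemble averages
the base models' numeric predictions.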
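These aggregation rules can be sketched as small functions. Because a classifier needs discrete labels, the example assumes prices have been grouped into hypothetical bands ("low", "mid", "high"); the last function shows the averaging used when the base models predict a continuous price directly. The function names, the optional weights and the first-seen tie-breaking rule are assumptions for illustration.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over the class labels predicted by each base model.

    Ties are broken in favour of the label that appears first in the list.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    return next(label for label in predictions if counts[label] == top)


def soft_vote(probabilities):
    """Average per-class probabilities across models and return the most likely class."""
    classes = probabilities[0].keys()
    avg = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(avg, key=avg.get)


def average_vote(price_predictions, weights=None):
    """Regression counterpart: (optionally weighted) mean of the base models' prices."""
    if weights is None:
        weights = [1.0] * len(price_predictions)
    return sum(w * p for w, p in zip(weights, price_predictions)) / sum(weights)


print(hard_vote(["mid", "high", "mid"]))  # mid
print(soft_vote([{"low": 0.2, "high": 0.8},
                 {"low": 0.6, "high": 0.4},
                 {"low": 0.3, "high": 0.7}]))  # high
print(average_vote([4.0, 5.0, 6.0]))  # 5.0
print(average_vote([4.0, 5.0, 6.0], weights=[1, 1, 2]))  # 5.25
```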

Chapter 4
4.1 Experimental Dataset
• Platform used: Kaggle
• The used car price dataset, with the required features and parameters, was
selected from Kaggle, an online machine learning and data science platform that
offers data and notebooks for data scientists and data analysts. The data was
cleaned and pre-processed before any price prediction algorithm was applied.
• IDE used: VS Code
• Python extension: lays the foundation for Python development in Visual Studio
Code.
• Visual Studio Code Tools for AI: this extension provides additional tools for
working with various AI and machine learning frameworks.
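The report does not list its exact cleaning steps, so the following is only a hypothetical sketch of typical pre-processing for a used-car listing dataset: dropping duplicate and incomplete rows, filtering implausible values and converting the manufacturing year into a car age. The column names ('year', 'km_driven', 'fuel', 'price'), the function name clean_rows and the reference year 2023 are assumptions, not confirmed against the dataset actually used.

```python
def clean_rows(rows, reference_year=2023):
    """Drop incomplete or duplicate listings and derive a numeric car age.

    Hypothetical schema: each row is a dict of strings with keys 'year',
    'km_driven', 'fuel' and 'price' (as csv.DictReader would return).
    """
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # skip exact duplicate listings
        seen.add(key)
        try:
            year = int(row["year"])
            km = float(row["km_driven"])
            price = float(row["price"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric fields
        if price <= 0 or km < 0 or year > reference_year:
            continue  # skip implausible values
        cleaned.append({
            "age": reference_year - year,  # newer cars get a smaller age
            "km_driven": km,
            "fuel": (row.get("fuel") or "").strip() or "Unknown",
            "price": price,
        })
    return cleaned


listings = [
    {"year": "2018", "km_driven": "45000", "fuel": "Petrol", "price": "550000"},
    {"year": "2018", "km_driven": "45000", "fuel": "Petrol", "price": "550000"},
    {"year": "2015", "km_driven": "", "fuel": "Diesel", "price": "400000"},
    {"year": "2020", "km_driven": "12000", "fuel": " ", "price": "700000"},
]
for r in clean_rows(listings):
    print(r)
```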

Chapter 5
5.1 Expected Outcome

The expected outcome is a predictive model for estimating the price of a used
car based on features such as engine capacity, distance traveled and year of
manufacture. The study aims to compare the performance of three machine
learning algorithms, Decision Tree, Random Forest and Voting Classifier, in
predicting car prices, to determine which algorithm performs best and to identify
which features are most important for predicting car prices. The study also aims
to discuss its own limitations and to suggest future research directions for
improving the accuracy of car price prediction models.
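Comparing the three algorithms requires a common metric. The studies surveyed in Chapter 2 report MAE and "accuracy", which for regression models is often reported as the R² score. The sketch below computes MAE, RMSE and R² for several models' predictions on the same test set; the function names and the example numbers are hypothetical and are not results of this project.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the price error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large price errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (1.0 is a perfect fit).

    Undefined (division by zero) when every true price is identical.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def compare(models_predictions, y_true):
    """Return (name, MAE, RMSE, R2) rows sorted from lowest to highest MAE."""
    rows = [(name, mae(y_true, p), rmse(y_true, p), r2_score(y_true, p))
            for name, p in models_predictions.items()]
    return sorted(rows, key=lambda r: r[1])


# Made-up test prices and predictions, for illustration only.
y_true = [5.0, 3.0, 8.0, 6.0]
predictions = {"model_a": [4.5, 3.5, 7.0, 6.5], "model_b": [5.0, 3.2, 7.6, 5.8]}
for row in compare(predictions, y_true):
    print(row)
```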

Chapter 6
6.1 Conclusion
With the rising prices of new cars and many customers' financial inability to buy
them, used car sales are increasing globally. There is therefore an urgent need
for a used car price prediction system that effectively determines the worth of a
car from a variety of features. The proposed system will help determine an
accurate price for a used car. This study compares three machine learning
algorithms: Decision Tree, Random Forest and Voting Classifier.

6.2 Future Work
In future iterations, we envision incorporating an AI Chatbot feature into the
Android app, providing users with an interactive and user-friendly interface for
obtaining real-time information, personalized recommendations, and insights
related to car prices and market trends. We may also add a larger body of historical
car price data, which could help improve the accuracy of the machine learning model. For
better performance, we plan to judiciously design deep learning network
structures, use adaptive learning rates and train on clusters of data rather than
the whole dataset.
