A Project
Submitted in partial fulfillment of the requirements for
the award of the Degree of
BACHELOR OF COMPUTER APPLICATION
By
2023
DECLARATION CERTIFICATE
This is to certify that the work presented in the thesis entitled “Chatbot
based car price prediction tool leveraging generative AI and regression” in
partial fulfillment of the requirement for the award of degree of Bachelor of
Computer Application of Institute of Engineering & Management is an authentic
work carried out under my supervision and guidance.
To the best of my knowledge, the content of this thesis does not form a
basis for the award of any previous Degree to anyone else.
The foregoing thesis entitled “Chatbot based car price prediction tool
leveraging generative AI and regression” is hereby approved as a creditable
study of a research topic and has been presented in a satisfactory manner to
warrant its acceptance as a prerequisite to the degree for which it has been
submitted.
Abstract
Chapter 1
1.1 Introduction
Chapter 2
2.1 Background Studies
2.2 Literature Survey
Chapter 3
3.1 Proposed Methodology
Chapter 4
4.1 Experimental Dataset
Chapter 5
5.1 Results and Discussion
Chapter 6
6.1 Conclusions
6.2 Future Work
Abstract
This study explores the application of machine learning techniques in predicting
the price of used cars. Leveraging a diverse dataset encompassing various car
attributes, historical pricing information, and market trends, the research aims to
develop accurate predictive models. Feature engineering and selection processes
are employed to identify the most influential factors affecting the resale value of
vehicles. Several machine learning algorithms, including regression models, are
implemented and compared for their predictive performance. The study
emphasizes the importance of data preprocessing in enhancing model accuracy
and generalization. Furthermore, the research investigates the impact of
incorporating advanced features such as mileage, brand reputation, and
maintenance history on model robustness. The results demonstrate the feasibility
of employing machine learning in predicting used car prices, providing valuable
insights for both buyers and sellers in the dynamic automotive market.
Chapter 1
1.1 Introduction
The Indian automobile market, a significant sector for international and local
companies, is witnessing a surge in the demand for used cars. Online platforms
like OLX and Quikr dominate this market, but there is a concern about
manipulation and overpricing. To address this issue, we propose a solution utilizing
artificial intelligence and machine learning. By employing supervised learning
techniques and algorithms, we aim to predict used car prices based on relevant
parameters. In recent years, the Indian automobile industry has experienced a
decline in new vehicle production, with a growing preference for used and
second-hand vehicles. To standardize the used car market and implement a
transparent pricing system, this project explores machine-learning techniques
using historical data and mean values from price lists. This research, focused on
the Indian dataset, aims to provide an accurate and customer-friendly solution to
the challenge of overpricing in the used car market. By leveraging data from
various sources, including Kaggle and web scraping, we intend to train a machine-
learning model that considers key features such as manufacturing year, model
year, fuel type, transmission, mileage, and ownership history. The goal is to
empower buyers with a reliable tool that prevents deception by dealers selling
damaged or overpriced used cars. Through accurate price predictions based on
comprehensive data, customers can make informed decisions and navigate the
complexities of purchasing a used car. The project title encapsulates this effort:
"Used Car Price Predictor using Machine Learning."
Several studies have previously addressed used car price prediction around the
world using different methodologies and approaches, with reported accuracy
varying from 50% to 90%. (Pudaruth, 2014) predicted used car prices in
Mauritius by applying several machine learning techniques, namely decision
trees, K-nearest neighbors, multiple regression and Naïve Bayes, to historical
data gathered from newspapers.
The achieved accuracy ranged from 60 to 70 percent, and the author suggested
using more sophisticated models and algorithms for the evaluation. The main
weakness of the decision tree and Naïve Bayes approaches is that the price must
be discretized and then classified, which introduces further inaccuracy.
Moreover, (Noor & Jan, 2017) achieved a high level of accuracy using multiple
linear regression models to predict the price of cars listed on a used car
website in Pakistan called Pak Wheels. The dataset totaled 1699 records after
pre-processing, and the authors were able to achieve an accuracy of 98%, this
was done after
(K.Samruddhi & Kumar, 2020) proposed a supervised machine learning model using
K-Nearest Neighbour to predict used car prices from a dataset obtained from
Kaggle containing 14 different attributes. With this method, accuracy reached up
to 85% after trying different values of K and changing the ratio of training
data to testing data; as expected, better accuracy was achieved when the
percentage of data used for testing was increased. The model was also
cross-validated with 5 and 10 folds using the K-fold method.
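The combination of K-Nearest Neighbour prediction and K-fold cross-validation described above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function names, the squared Euclidean distance, the mean absolute error metric and the toy records are all assumptions made here for illustration.

```python
from statistics import mean

def knn_predict(train, query, k):
    """Predict a price as the mean price of the k nearest training cars."""
    # train is a list of (feature_vector, price); distance is squared Euclidean.
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return mean(price for _, price in nearest)

def k_fold_indices(n, folds):
    """Split range(n) into `folds` contiguous, nearly equal test folds."""
    sizes = [n // folds + (1 if i < n % folds else 0) for i in range(folds)]
    start, out = 0, []
    for size in sizes:
        out.append(list(range(start, start + size)))
        start += size
    return out

def cross_validate(data, k, folds):
    """Return the mean absolute error averaged over all folds."""
    errors = []
    for test_idx in k_fold_indices(len(data), folds):
        held_out = set(test_idx)
        train = [row for i, row in enumerate(data) if i not in held_out]
        fold_error = mean(
            abs(knn_predict(train, data[i][0], k) - data[i][1]) for i in test_idx
        )
        errors.append(fold_error)
    return mean(errors)

# Hypothetical toy records: ([age in years, mileage in 10,000 km], price).
cars = [
    ([1, 1.0], 9.0), ([2, 2.5], 8.0), ([3, 3.0], 7.2), ([4, 4.5], 6.1),
    ([5, 5.0], 5.5), ([6, 6.5], 4.8), ([7, 7.0], 4.0), ([8, 8.5], 3.3),
    ([9, 9.0], 2.9), ([10, 10.5], 2.2),
]

# Compare several values of K under 5-fold cross-validation.
for k in (1, 3, 5):
    print(k, round(cross_validate(cars, k=k, folds=5), 3))
```

In practice the features would be scaled before computing distances, and the rows shuffled before the folds are formed; both steps are omitted here to keep the sketch short.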
(Gongqi, Yansong, & Qiang, 2011) proposed using Artificial Neural Network (ANN)
through a combined method of BP neural network and nonlinear curve fit and
have achieved accurate value prediction with a feasible model.
(Listiani, 2009) used Support Vector Machines (SVM) to evaluate leased car
prices. The results showed that SVM is far more accurate than multiple linear
regression on large datasets with high-dimensional data. However, computing a
multiple linear regression can take several minutes, whereas the SVM can take
up to a day to compute its results. Multiple linear regression may be simpler,
but SVM is far more accurate. Moreover, that study includes samples with up to
178 attributes, far more than the variables proposed in our study; hence
multiple linear regression may be more suitable in our case.
(Kuiper, 2008) collected data from General Motors on cars produced in 2005,
and likewise used a variable selection technique to include the most
relevant attributes in his model to reduce the complexity of the data. He
Hence, the literature review shows that used car price prediction is an
important topic and an active area of work for many researchers. So far, the
best achieved accuracy is 83.63% on Kaggle’s dataset using the random forest
technique. The researchers tested multiple regressors, and the final model is a
regression model using linear regression.
Algorithm
DECISION TREE: Decision Trees are a type of Supervised Machine Learning (that
is, the training data specifies both the input and the corresponding output)
where the data is repeatedly split according to a certain parameter. Several
algorithms exist for deciding how to split a node into two or more sub-nodes.
Each split is chosen to increase the homogeneity of the resulting sub-nodes; in
other words, the purity of the nodes increases with respect to the target
variable. In decision analysis, a decision tree can be used to visually and
explicitly represent decisions and decision-making. As the name suggests, it
uses a tree-like model of decisions.
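To make the idea of a split that increases homogeneity concrete, the sketch below chooses one threshold for a regression tree by minimizing the weighted variance of the prices in the two sub-nodes. The choice of variance as the purity measure, the function names and the toy rows are assumptions made for illustration; the text above does not state which split criterion the project uses.

```python
def variance(values):
    """Population variance; 0.0 for an empty or single-element list."""
    if len(values) < 2:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def best_split(rows, feature):
    """Return (threshold, score) minimizing the weighted child variance."""
    values = sorted({row[feature] for row in rows})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r["price"] for r in rows if r[feature] <= threshold]
        right = [r["price"] for r in rows if r[feature] > threshold]
        score = (
            len(left) * variance(left) + len(right) * variance(right)
        ) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy rows: age in years and price in arbitrary units.
toy_rows = [
    {"age": 1, "price": 9.0}, {"age": 2, "price": 8.5}, {"age": 3, "price": 8.0},
    {"age": 8, "price": 3.0}, {"age": 9, "price": 2.5}, {"age": 10, "price": 2.0},
]

threshold, score = best_split(toy_rows, "age")
print(threshold, round(score, 4))
```

A full tree would apply the same search recursively to each sub-node, over every feature, until a stopping rule such as a maximum depth is reached.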
RANDOM FOREST: Random forest is a Supervised Machine Learning Algorithm
that is used widely in Classification and Regression problems. It builds decision
trees on different samples and takes their majority vote for classification and
average in case of regression. Random Forest is suitable for situations when we
have a large dataset, and interpretability is not a major concern. Decision trees
are much easier to interpret and understand. Since a random forest combines
multiple decision trees, it becomes more difficult to interpret.
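The two aggregation rules named above, majority vote for classification and averaging for regression, can be sketched as follows. Each "tree" here is replaced by a hypothetical one-split stump trained on a bootstrap sample, and the random feature selection used by a real random forest is omitted; all names and data are assumptions for illustration only.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one sample per tree)."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(sample):
    """Stand-in for a tree: split on mean age, predict mean price per side."""
    cut = mean(age for age, _ in sample)
    left = [price for age, price in sample if age <= cut]
    right = [price for age, price in sample if age > cut]
    overall = mean(price for _, price in sample)
    left_mean = mean(left) if left else overall
    right_mean = mean(right) if right else overall
    return lambda age: left_mean if age <= cut else right_mean

def aggregate_regression(predictions):
    """Regression: average the per-tree predictions."""
    return mean(predictions)

def aggregate_classification(predictions):
    """Classification: return the most common per-tree label."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical toy data: (age in years, price in arbitrary units).
cars = [(1, 9.0), (2, 8.5), (3, 8.0), (5, 6.0), (8, 3.0), (9, 2.5), (10, 2.0)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
forest = [fit_stump(bootstrap_sample(cars, rng)) for _ in range(25)]
prediction = aggregate_regression([tree(4) for tree in forest])
print(round(prediction, 2))
```

Because every stump sees a different bootstrap sample, the individual predictions differ, and averaging them smooths out the variation of any single tree.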
The expected outcome is a predictive model for estimating the price of a used
car based on features such as engine capacity, distance traveled, and year of
manufacture. The study aims to compare the performance of three machine-learning
algorithms (Decision Tree, Random Forest and Voting Classifier) in predicting
car prices, to determine which algorithm performs better, and to identify which
features are most important for the prediction. The study also aims to provide
insights into its limitations and to suggest future research directions for
improving the accuracy of car price prediction models.
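One way such a comparison could be carried out is to score each model's predictions on the same held-out cars. The sketch below assumes mean absolute error and the coefficient of determination (R²) as metrics, which the text above does not specify; the prices and the model names `model_a` and `model_b` are placeholders, not results from this project.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Placeholder values for illustration only (not measured results).
actual_prices = [9.0, 7.5, 6.0, 4.5, 3.0]
model_predictions = {
    "model_a": [8.5, 7.5, 6.5, 4.0, 3.5],
    "model_b": [8.8, 7.3, 6.1, 4.6, 3.2],
}

for name, predicted in model_predictions.items():
    print(
        name,
        round(mean_absolute_error(actual_prices, predicted), 3),
        round(r2_score(actual_prices, predicted), 3),
    )

# Pick the model with the lowest mean absolute error.
best_model = min(
    model_predictions,
    key=lambda name: mean_absolute_error(actual_prices, model_predictions[name]),
)
print(best_model)
```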
Chapter 6
6.1 Conclusion
Owing to the increased prices of new cars and the financial inability of many
customers to buy them, used car sales are increasing globally. Therefore, there
is an urgent need for a used car price prediction system that effectively
determines the worth of a car using a variety of features. The proposed system
will help to determine an accurate price for a used car. This paper compares 3
different machine learning algorithms: Decision tree, Random forest and
Voting classifier.