Machine Learning Based Car Price Prediction System

Minor Project Report
Submitted to
CHHATTISGARH SWAMI VIVEKANAND TECHNICAL UNIVERSITY

BHILAI (C.G.), INDIA
In partial fulfilment of B.Tech VI Semester

In
Electronics & Telecommunication Engineering

By
Abhishek Raj, BK3571
Adhikarla Shravani, BK3506
Aman Kumar Sahu, BK3510
Ananya Mishra, BK3513
Under the Guidance of

DOLLY GAUTAM
Assistant Professor
Dept. of Elect. & Telecomm. Engg.
DECLARATION
I, the undersigned, solemnly declare that the report of the Project work entitled “Machine
learning based car price prediction system” is based on my own work carried out during my study
under the supervision of Prof. Dolly Gautam, Department of Electronics and Telecommunication
Engineering, Bhilai Institute of Technology, Durg, Chhattisgarh.
I assert that the statements made and conclusions drawn are an outcome of the project work. I
further declare that, to the best of my knowledge and belief, the report does not contain any part of any
work which has been submitted for the award of any other degree/diploma/certificate in this University/
deemed University of India or any other country. All help received and citations used for the preparation of
the Report have been duly acknowledged.
Abhishek Raj
Roll No.: 300102820070
Enrollment No.: BK3571
Department of Electronics & Telecommunication Engg.
BIT, Durg

Adhikarla Shravani
Roll No.: 300102820005
Enrollment No.: BK3506
Department of Electronics & Telecommunication Engg.
BIT, Durg

Aman Kumar Sahu
Roll No.: 300102820009
Enrollment No.: BK3510
Department of Electronics & Telecommunication Engg.
BIT, Durg

Ananya Mishra
Roll No.: 300102820012
Enrollment No.: BK3513
Department of Electronics & Telecommunication Engg.
BIT, Durg
CERTIFICATE BY THE SUPERVISOR
This is to certify that the report of the Project submitted is an outcome of the project work
entitled “Machine learning based car price prediction system”, carried out by
Abhishek Raj bearing Univ. Roll No.: 300102820070 & Enrollment No.: BK3571
Adhikarla Shravani bearing Univ. Roll No.: 300102820090 & Enrollment No.: BK3506
Aman Kumar Sahu bearing Univ. Roll No.: 300102820009 & Enrollment No.: BK3510
Ananya Mishra bearing Univ. Roll No.: 300102820012 & Enrollment No.: BK3513
under my guidance and supervision for completion of the Minor Project-II at the
Department of Electronics and Telecommunication Engineering of Bhilai Institute of Technology,
Durg.
To the best of my knowledge, the Report
i. Embodies the work of the candidates themselves,
ii. Has duly been completed and
iii. Is up to the desired standard for the purpose for which it is submitted.
The project work as mentioned above is hereby being recommended and forwarded for
examination and evaluation.
CERTIFICATE BY THE EXAMINERS
This is to certify that the project work entitled
“Machine learning based car price prediction system”

Submitted by
Abhishek Raj, Roll No.: 300102820092, Enrollment No.: BK3571

Adhikarla Shravani, Roll No.: 300102820090, Enrollment No.: BK3506
Aman Kumar Sahu, Roll No.: 300102820009, Enrollment No.: BK3510
Ananya Mishra, Roll No.: 300102820099, Enrollment No.: BK3513
has been examined by the undersigned as a part of the Minor Project-II examination, B.Tech
6th semester, Department of Electronics & Telecommunication Engineering, Bhilai Institute of
Technology, Durg (C.G.).
Internal Examiner External Examiner

Date: Date:
ACKNOWLEDGEMENT
With deep regards and profound respect, I avail this opportunity to express my deep
sense of gratitude and indebtedness to Prof. Dolly Gautam, Department of Electronics and
Telecommunication Engineering, BIT Durg, for valuable guidance and support. I am deeply
indebted for the valuable discussions at each phase of the project. I consider it my good fortune
to have had the opportunity to work with such a wonderful person.
I express my sincere gratitude to Dr. Arun Arora, Director, Bhilai Institute of
Technology, Durg, for providing adequate infrastructure to carry out the present investigations and
for motivating research work, which has been a constant source of inspiration in
completing this work.
I take immense pleasure in thanking Dr. Mohan Kumar Gupta, Principal, Bhilai
Institute of Technology, Durg, for providing adequate academic facilities to work in my
research area.
I take immense pleasure in thanking Dr. Manisha Sharma, Vice Principal, Bhilai Institute
of Technology, Durg, for motivating us to pursue research and for providing
opportunities to connect with global research.
I take immense pleasure in thanking Dr. Arun Kumar, HOD (ETC), Bhilai Institute of
Technology, Durg, for constant feedback, encouragement, and endless support and help
throughout this project work.
Lastly, I feel immensely moved in expressing my indebtedness to my revered parents,
whose sacrifice, guidance and blessings helped me to complete my work.
Abhishek Raj
Adhikarla Shravani
Aman Kumar Sahu
Ananya Mishra
Table of Contents
Chapter Title
I Introduction
1.1 What is Machine Learning?
1.2 Types of Machine Learning
1.3 Objective & Problem Statement
1.4 Purpose Of Project
1.5 Architecture Diagram
II Methodology
III Result
3.1 Jupyter Notebook Code
3.2 Pycharm Code
3.3 Output
IV Conclusion and Future Scope
4.1 Conclusion
4.2 Future Scope
V References
Chapter 01: Introduction
1.1 What is Machine Learning?
Machine Learning is the field of study that gives computers the
capability to learn without being explicitly programmed. ML is one of
the most exciting technologies that one could come across.
As is evident from the name, it gives the computer the quality that makes it more
similar to humans: the ability to learn. Machine learning is actively
being used today, perhaps in many more places than one would expect.
Figure 1.1 Machine Learning
Figure 1.2 Machine Learning & Traditional Programming
1.2 Types Of Machine Learning
A machine is said to be learning from past experience (the data fed in) with
respect to some class of tasks if its performance in a given task improves with
experience. For example, assume that a machine has to predict whether a
customer will buy a specific product, say an antivirus, this year or not. The
machine does this by looking at previous knowledge or past experience, i.e.
the data on products that the customer has bought every year. If the customer buys
an antivirus every year, then there is a high probability that the customer is going
to buy an antivirus this year as well. This is how machine learning works at the
basic conceptual level.
1.3 Objective & Problem Statement
Objective of the Project - The goal of this project is to create an efficient
and effective model that predicts the price of a used car with good accuracy
using the Linear Regression algorithm. The prediction is based on the following inputs:
• Brand or type of car one prefers, such as Ford or Hyundai
• Model of the car, such as Ford Figo or Hyundai Creta
• Year of manufacture, such as 2020 or 2021
• Type of fuel, such as Petrol or Diesel
• Number of kilometers the car has travelled
Problem Statement - It is easy for a company to price its new cars based on
the manufacturing and marketing costs involved. For a used car, however,
it is quite difficult to define a price because it is influenced by various
parameters such as the car brand, the year of manufacture, etc. The goal of our project is to
predict the best price for a pre-owned car in the Indian market, based on
previous data on sold cars, using Linear Regression.
1.4 Purpose Of Project
The used car market is an ever-growing industry, which has almost
doubled its market value in the last few years. The emergence of online
portals such as CarDekho, Quikr, CarWale, Cars24, and many others
has increased the need for both the customer and the seller to be better
informed about the trends and patterns that determine the value of a
used car in the market. Machine Learning algorithms can be used to
predict the retail value of a car based on a certain set of features. The
purpose of this project is to provide car price prediction using machine
learning without any human interference.
1.5 Architecture Diagram
Chapter 02: Methodology
Data Gathering: The data was sourced from the web portal quikr.com, which
lists vehicles for buying and selling. The dataset provides the
following set of features:
Car Name, Year, Price, Kilometers Driven and Fuel Type (Petrol, Diesel, LPG
(Liquefied Petroleum Gas), etc.).
Creating Environment: An environment is created using the Anaconda Prompt. This
environment separates our project space from the default (base) environment and from any
other environments created previously. All the packages, libraries and modules
that we require can be installed manually in this environment, and changes can be
made to it according to our requirements without affecting other projects.
Data Reading: As the primary step, the CSV file is imported and read. The
dataset is then examined for null values, shape, column names, numerical and
categorical features, unique values of each feature, data info, etc.
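The inspection described above can be sketched with the Python standard library. This is only an illustration: the report's own notebook code is not reproduced here, and the column names and sample rows below are hypothetical stand-ins for the real dataset.

```python
import csv
import io

# Hypothetical sample rows; the real file and its exact column names are assumptions.
SAMPLE_CSV = """name,year,price,kms_driven,fuel_type
Hyundai Creta,2019,850000,32000,Diesel
Ford Figo,2015,250000,61000,Petrol
Maruti Alto,2012,,45000,Petrol
Hyundai Creta,2020,990000,18000,Petrol
"""

def summarize(rows):
    """Return the shape, per-column missing-value counts and unique-value counts."""
    columns = list(rows[0].keys())
    missing = {c: sum(1 for r in rows if r[c] == "") for c in columns}
    unique = {c: len({r[c] for r in rows if r[c] != ""}) for c in columns}
    return {"shape": (len(rows), len(columns)), "missing": missing, "unique": unique}

# Read the CSV text into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
print("shape:", summary["shape"])
print("missing values:", summary["missing"])
print("unique values:", summary["unique"])
```

With pandas, the same checks are usually done with `df.shape`, `df.isnull().sum()`, `df.nunique()` and `df.info()`.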
Data Pre-processing: Some of the features in the data were renamed for better
understanding, and features that were not useful for the analysis were
dropped. Exploratory Data Analysis (EDA) is then carried out, in which statistical
graphics and other visualization methods are used to summarize the main characteristics
of the data. After EDA, One Hot Encoding is employed to
handle the categorical features of the dataset. Thereafter, the correlations
between the features are computed and analyzed with the help of
plots. Finally, the dependent feature (the price) and the independent
features are assigned for the subsequent steps.
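One Hot Encoding, as used above for categorical features, can be illustrated with a small pure-Python sketch. The helper `one_hot_encode` and the sample records are hypothetical and only show the idea: each category of a column becomes its own 0/1 indicator column.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Hypothetical records with one categorical feature.
cars = [
    {"year": 2019, "kms_driven": 32000, "fuel_type": "Diesel"},
    {"year": 2015, "kms_driven": 61000, "fuel_type": "Petrol"},
    {"year": 2012, "kms_driven": 45000, "fuel_type": "LPG"},
]
encoded = one_hot_encode(cars, "fuel_type")
for row in encoded:
    print(row)
```

In a pandas workflow the equivalent step is typically `pandas.get_dummies`.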
Train-Test Split: After the dependent and independent features have been
assigned, the dataset is split into training and
testing data. We use 80% of the data for training our model and 20% for
testing.
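A minimal sketch of an 80/20 split follows. The function name, the fixed seed and the toy sample list are assumptions made for illustration; they are not taken from the report.

```python
import random

def train_test_split_80_20(samples, test_fraction=0.2, seed=42):
    """Shuffle the sample indices and hold out a fraction of them for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(samples) * test_fraction))
    test_indices = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_indices]
    test = [s for i, s in enumerate(samples) if i in test_indices]
    return train, test

# Toy data: 100 records identified by their index.
samples = list(range(100))
train, test = train_test_split_80_20(samples)
print(len(train), len(test))
```

scikit-learn provides the same functionality as `sklearn.model_selection.train_test_split` with `test_size=0.2`.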
Model Building: After the train-test split, the
process of building the model begins. The model is defined along with a few
parameters, and various algorithms
are then applied to obtain their results. The following
algorithms are employed for the predictive analysis.
Linear Regression: In statistics, linear regression is a linear approach for modelling
the relationship between a scalar response (the dependent variable) and one or
more explanatory (independent) variables. The relationship is modelled with a
linear predictor function whose unknown parameters are
estimated from the data.
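As an illustration of estimating the parameters from data, the sketch below fits an ordinary least-squares model with NumPy. The synthetic features and the coefficients 3.0, 2.0 and -1.5 are made up for the example; the report's actual features and fitted values are not shown here.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_k]."""
    X_design = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the linear predictor: intercept + X @ weights."""
    return coef[0] + X @ coef[1:]

# Synthetic, noise-free data generated from a known linear rule.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1]

coef = fit_linear_regression(X, y)
print("estimated parameters:", np.round(coef, 3))
```

Because the data are noise-free, the fit recovers the generating parameters; on real car data the residuals would not vanish. scikit-learn exposes the same model as `sklearn.linear_model.LinearRegression`.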
Lasso Regression: Lasso is a type of linear regression that uses shrinkage:
the coefficient estimates are shrunk towards zero by an L1 penalty, and some of
them can become exactly zero. The lasso procedure therefore favours simple, sparse
models that have fewer parameters. It is well suited to models with a high
level of multicollinearity, and it can be used when parts of model selection,
such as variable selection or parameter elimination, need to be automated.
‘LASSO’ is an acronym for Least Absolute Shrinkage and Selection Operator.
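The shrinkage behaviour can be illustrated with a small coordinate-descent sketch. It assumes centred data with no intercept and minimises (1/2n)·||y − Xw||² + alpha·||w||₁; the data, the `alpha` values and the function names are chosen for illustration only.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0 inside [-threshold, threshold]."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Cyclic coordinate descent for the lasso objective (no intercept)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iters):
        for j in range(n_features):
            # Residual with feature j's current contribution added back.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_samples
            z = X[:, j] @ X[:, j] / n_samples
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Synthetic data in which the second feature is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 4.0 * X[:, 0] - 2.0 * X[:, 2]

w_small = lasso_coordinate_descent(X, y, alpha=0.01)  # mild shrinkage
w_large = lasso_coordinate_descent(X, y, alpha=10.0)  # penalty dominates
print("alpha=0.01:", np.round(w_small, 3))
print("alpha=10.0:", np.round(w_large, 3))
```

A small penalty leaves the relevant coefficients close to their true values, while a large penalty drives every coefficient to zero. scikit-learn's `sklearn.linear_model.Lasso` uses the same objective.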
Ridge Regression: Ridge regression is used for tuning a model and
analyzing data that suffers from multicollinearity. It performs L2 regularization.
When multicollinearity is present, the least-squares estimates remain unbiased but
their variances are large, so the predicted values can be far from the actual values;
the L2 penalty reduces this variance at the cost of a small bias.
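The effect of the L2 penalty under multicollinearity can be sketched with the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy. The nearly collinear features and the value λ = 1.0 below are assumptions made for the example.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge solution (no intercept); lam=0 gives ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Two nearly identical features create strong multicollinearity.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 1e-3 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=100)

w_ols = fit_ridge(X, y, lam=0.0)
w_ridge = fit_ridge(X, y, lam=1.0)
print("least squares:", np.round(w_ols, 3))
print("ridge        :", np.round(w_ridge, 3))
```

The least-squares coefficients are unstable because the two features are almost interchangeable; the ridge coefficients have a smaller norm and still sum to roughly 2, the combined true effect. scikit-learn provides this model as `sklearn.linear_model.Ridge`.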
Bayesian Ridge Regression: This method formulates linear regression as a
probabilistic model, using probability distributions rather than point estimates
for the model parameters. This gives it a natural mechanism for coping with
insufficient or poorly distributed data.
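A simplified sketch of the Bayesian view follows. It assumes a zero-mean Gaussian prior with fixed precision `alpha` on the weights and a known noise precision `beta`; full Bayesian ridge implementations such as `sklearn.linear_model.BayesianRidge` also estimate these two quantities from the data, which this sketch does not do.

```python
import numpy as np

def bayesian_linear_posterior(X, y, alpha, beta):
    """Posterior mean and covariance of the weights for fixed alpha and beta."""
    n_features = X.shape[1]
    precision = alpha * np.eye(n_features) + beta * X.T @ X
    covariance = np.linalg.inv(precision)
    mean = beta * covariance @ X.T @ y
    return mean, covariance

def predictive(x_new, mean, covariance, beta):
    """Predictive mean and variance for a single input vector."""
    pred_mean = x_new @ mean
    pred_var = 1.0 / beta + x_new @ covariance @ x_new
    return pred_mean, pred_var

# Synthetic data: y = 1 + 2x + Gaussian noise with standard deviation 0.5.
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=200)
X = np.column_stack([np.ones_like(x), x])  # bias column plus one feature
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=200)

alpha, beta = 1.0, 4.0  # beta = 1 / 0.5**2
mean, cov = bayesian_linear_posterior(X, y, alpha, beta)
mean_few, cov_few = bayesian_linear_posterior(X[:5], y[:5], alpha, beta)
pred_mean, pred_var = predictive(np.array([1.0, 2.0]), mean, cov, beta)
print("posterior mean:", np.round(mean, 3))
print("uncertainty with 5 points vs 200:", np.trace(cov_few), np.trace(cov))
```

The output is a distribution over the weights rather than a single estimate, and the posterior uncertainty grows when fewer observations are available.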
Random Forest Regression: Random forest is a supervised learning algorithm that
uses an ensemble learning method for classification and regression.
The trees in a random forest are built in parallel and do not interact
while they are being built. Random forest is a meta-estimator that aggregates the
predictions of multiple decision trees, each trained on a random sample of the
data, and, for regression, averages their results.
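The aggregation idea can be sketched as follows. To stay short, the sketch uses one-split trees (stumps) as a simplified stand-in for full decision trees; each stump sees a bootstrap sample of the rows and a random subset of the features. The data and all names are hypothetical.

```python
import numpy as np

def fit_stump(X, y, features):
    """Best single split (feature, threshold, left value, right value) by squared error."""
    best = (np.inf, None, None, y.mean(), y.mean())
    for f in features:
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            lv, rv = y[left].mean(), y[~left].mean()
            err = np.sum((y[left] - lv) ** 2) + np.sum((y[~left] - rv) ** 2)
            if err < best[0]:
                best = (err, f, t, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    f, t, lv, rv = stump
    if f is None:  # no valid split was found
        return np.full(len(X), lv)
    return np.where(X[:, f] <= t, lv, rv)

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump independently on a bootstrap sample and a random feature."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    k = max(1, int(np.sqrt(n_features)))  # number of features tried per stump
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
        feats = rng.choice(n_features, size=k, replace=False)
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    """Average the predictions of all stumps."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)

# Synthetic data: the target depends only on the first feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = np.where(X[:, 0] > 5, 10.0, 0.0)

forest = fit_forest(X, y)
pred = predict_forest(forest, np.array([[2.0, 0.5], [8.0, 0.5]]))
print(np.round(pred, 2))
```

Because the stumps never depend on one another, they could be trained in parallel, which is the property described above. A full implementation is available as `sklearn.ensemble.RandomForestRegressor`.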
Decision Tree Regression: This algorithm builds regression and
classification models in the form of a tree structure. The dataset is broken into
smaller and smaller subsets while the associated decision tree is developed
incrementally. The final tree consists of decision nodes and leaf nodes.
The algorithm used to construct a decision tree employs a top-down
greedy search through the space of possible branches with no
backtracking.
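The top-down greedy construction can be sketched in a few lines. The tree below is built recursively: at each node every feature and threshold is tried, the split with the lowest squared error is kept, and earlier choices are never revisited. The toy data and the dictionary-based tree layout are assumptions made for the example.

```python
import numpy as np

def best_split(X, y):
    """Greedy search over all features and thresholds; None if no split reduces the error."""
    best = None
    best_error = np.sum((y - y.mean()) ** 2)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature])[:-1]:
            left = X[:, feature] <= threshold
            error = (np.sum((y[left] - y[left].mean()) ** 2)
                     + np.sum((y[~left] - y[~left].mean()) ** 2))
            if error < best_error - 1e-12:
                best_error = error
                best = (feature, threshold)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively split the data; leaves store the mean target of their subset."""
    split = best_split(X, y) if depth < max_depth and len(y) > 1 else None
    if split is None:
        return {"leaf": True, "value": float(y.mean())}
    feature, threshold = split
    left = X[:, feature] <= threshold
    return {
        "leaf": False,
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(X[left], y[left], depth + 1, max_depth),
        "right": build_tree(X[~left], y[~left], depth + 1, max_depth),
    }

def predict_one(tree, x):
    """Follow decision nodes from the root until a leaf is reached."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]

# Toy data with two clearly separated groups.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([5.0, 5.0, 5.0, 20.0, 20.0, 20.0])

tree = build_tree(X, y)
print(predict_one(tree, np.array([2.5])), predict_one(tree, np.array([11.0])))
```

The root becomes a decision node that splits at 3.0, and both children are leaf nodes. scikit-learn offers a full implementation as `sklearn.tree.DecisionTreeRegressor`.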
XGBoost Regression: XGBoost is a powerful algorithm for building supervised
regression models. It is an ensemble learning
method, which involves training individual models (base learners) and then
combining them to generate a single prediction.
Gradient Boosting Regression: It is a machine learning technique for regression
and classification problems that produces a prediction model in the form of
an ensemble of weak prediction models, typically decision
trees. This technique often outperforms the random forest method.
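How weak models are combined can be illustrated with a small boosting sketch for squared-error loss. Each round fits a one-split tree (stump) to the current residuals and adds a scaled copy of it to the ensemble. The sine-shaped data, the learning rate and the function names are assumptions made for the example.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single split of a 1-D feature: (threshold, left value, right value)."""
    best = (np.inf, None, residual.mean(), residual.mean())
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = residual[left].mean(), residual[~left].mean()
        err = np.sum((residual[left] - lv) ** 2) + np.sum((residual[~left] - rv) ** 2)
        if err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    t, lv, rv = stump
    if t is None:  # no valid split was found
        return np.full(len(x), lv)
    return np.where(x <= t, lv, rv)

def fit_gradient_boosting(x, y, n_rounds, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = float(y.mean())
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction  # negative gradient of the squared-error loss
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}

def predict_boosting(model, x):
    total = np.full(len(x), model["base"])
    for stump in model["stumps"]:
        total = total + model["learning_rate"] * predict_stump(stump, x)
    return total

# Synthetic non-linear data.
x = np.linspace(0, 10, 40)
y = np.sin(x)

baseline_mse = float(np.mean((y - y.mean()) ** 2))
mse_5 = float(np.mean((y - predict_boosting(fit_gradient_boosting(x, y, 5), x)) ** 2))
mse_50 = float(np.mean((y - predict_boosting(fit_gradient_boosting(x, y, 50), x)) ** 2))
print(baseline_mse, mse_5, mse_50)
```

The training error falls as more weak learners are added. These are training errors only; performance on held-out data has to be checked separately. scikit-learn provides the technique as `sklearn.ensemble.GradientBoostingRegressor`.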
Chapter 03: Result
3.1 Jupyter Notebook Code
3.2 Pycharm Code
3.3 Output
Chapter 04: Conclusion and Future Scope
4.1 Conclusion
India’s used-car market is booming as buyers have a wide range of options,
easy financing, convenient digital sales channels, and a growing preference for
personal mobility in the COVID-19 era. Car price prediction nevertheless remains
a challenging task because of the high number of attributes that must be considered
for accurate prediction. The main weakness of gradient boosting is that it sacrifices
intelligibility and interpretability. The main limitation of this study
is the low number of records that have been used. In future work, we intend to
collect more data related to electric vehicles and combustion vehicles and to use
more advanced techniques.
4.2 Future Scope
Car price prediction has been a high-interest research area, as it requires
noticeable effort and knowledge from a field expert. A considerable number of
distinct attributes are examined for reliable and accurate predictions. The major
step in the prediction process is the collection and pre-processing of the data. In
this project, data was normalized and cleaned to avoid unnecessary noise for the
machine learning algorithms. When a single machine learning algorithm was applied
to the data set, the accuracy was less than 70%. Therefore, an ensemble of multiple
machine learning algorithms was proposed, and this combination of ML methods achieves an
accuracy of 93%. This is a significant improvement compared to the single
machine learning method approach. However, the drawback of the proposed
system is that it consumes much more computational resources than a single
machine learning algorithm. Although this system has achieved strong
performance on the car price prediction problem, it could also be implemented using
more advanced machine learning models and deep learning techniques to
improve its efficiency and accuracy. Moreover, as innovation in automobiles has
increased, electric vehicles have gained public attention and are preferred by many
buyers over conventional cars, so future versions of the system should cover them as well.
References
1. Agencija za statistiku BiH. (n.d.). Retrieved from: http://www.bhas.ba
[accessed: July 18, 2018.]
2. Listiani, M. (2009). Support vector regression analysis for price prediction
in a car leasing application (Doctoral dissertation, Master thesis, TU
Hamburg-Harburg).
3. Richardson, M. S. (2009). Determinants of used car resale value. Retrieved
from: https://digitalcc.coloradocollege.edu/islandora/object
/coccc%3A1346 [accessed: August 1, 2018.]
4. Wu, J. D., Hsu, C. C., & Chen, H. C. (2009). An expert system of price
forecasting for used cars using adaptive neuro-fuzzy inference. Expert
Systems with Applications, 36(4), 7809-7817.
5. Du, J., Xie, L., & Schroeder, S. (2009). Practice Prize Paper—PIN Optimal
Distribution of Auction Vehicles System: Applying Price Forecasting,
Elasticity Estimation, and Genetic Algorithms to Used-Vehicle
Distribution. Marketing Science, 28(4), 637-644.
6. Gongqi, S., Yansong, W., & Qiang, Z. (2011, January). New Model for
Residual Value Prediction of the Used Car Based on BP Neural Network
and Nonlinear Curve Fit. In Measuring Technology and Mechatronics
Automation (ICMTMA), 2011 Third International Conference on (Vol. 2,
pp. 682-685). IEEE.
7. Pudaruth, S. (2014). Predicting the price of used cars using machine
learning techniques. Int. J. Inf. Comput. Technol, 4(7), 753-764.
8. Noor, K., & Jan, S. (2017). Vehicle Price Prediction System using Machine
Learning Techniques. International Journal of Computer Applications,
167(9), 27-31.