
USED CAR PRICE PREDICTION USING

LINEAR REGRESSION MODEL


PROJECT REPORT

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE

of

APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY

DEPARTMENT OF COMPUTER SCIENCE

MARCH 2023
ACKNOWLEDGEMENT

First of all I thank God Almighty for His grace and mercy showered upon me for the
successful completion of this seminar work.

I would like to extend my gratitude to Prof. Dr. K. Sunil kumar, Principal, College
of Engineering, Adoor, for equipping me with all facilities during the presentation of my
seminar. I express my sincere thanks to Prof. Shibu J., Head of the Department, Department
of Electrical and Electronics Engineering, College of Engineering Adoor for permitting me to
do this seminar and supporting me till the completion of this work.

I take this opportunity to express my hearty gratitude to the seminar guide,
Prof. Sreedeepa H. S., Assistant Professor, Department of Electrical and Electronics
Engineering, College of Engineering Adoor, and the seminar co-ordinator, Prof. Sreedeepa H. S.,
Assistant Professor, Department of Electrical and Electronics Engineering, College of
Engineering Adoor, for assisting me whenever needed and giving relevant advice for making this
seminar successful.

I also express my heartfelt thanks to all other faculty in the Department of Electrical
and Electronics Engineering, for their enormous help in the progress of my seminar work.

This acknowledgment would be incomplete without thanking my friends and classmates,
whose constant encouragement and timely criticism helped me to a great extent and
kept me moving toward my goal.

I also express my gratitude to my beloved parents for encouraging and supporting me
throughout the course of this seminar. I take this opportunity to thank all who have
helped me directly or indirectly through this endeavor.
ABSTRACT

The production of cars has risen steadily during the past decade, with almost
92 million cars produced in the year 2019. This has given a big boost to the
used car market, which has emerged as a fast-growing industry. With the recent
arrival of various online portals and websites, customers, clients, dealers
and sellers need to stay updated with current trends in order to know the
actual value of any used car in the current market. Machine learning has
numerous real-life applications, and one of the most prominent is its use in
solving prediction problems, which can be posed on an endless number of
topics. This project focuses on one such application. Using a machine learning
algorithm, Linear Regression, we try to predict the price of a used car by
building a statistical model from the provided data with a given set of
attributes.
CONTENTS
1. INTRODUCTION
1.1 Objective Of The Project
1.2 Motivation And Challenges
1.3 FEATURES
2. LITERATURE REVIEW
3. TECHNOLOGY USED
3.1 SCIPY
3.2 Matplotlib
3.3 Seaborn
3.4 Linear Regression
4. SYSTEM DESIGN
4.1 DATA FLOW DIAGRAM
4.2 SYSTEM ARCHITECTURE
5. METHODOLOGY
6. EXPERIMENT AND RESULTS
7. DISADVANTAGES
8. CONCLUSION
9. FUTURE ENHANCEMENT
10. REFERENCES
LIST OF FIGURES

Fig 4.1: Data Flow Diagram
Fig 4.2: System Architecture
Fig 5.1: Methodology
Fig 6.1: Data Overview
Fig 6.2: Visualizing data with target
Fig 6.3: Visualizing data
Fig 6.4: Correlation between Selling Price and Present Price

1. INTRODUCTION

The exchange of commodities has existed for as long as people have traded.
Earlier, these transactions took the form of a barter system, which was later
replaced by a monetary system. These changes also affected the way items are
re-sold. Re-selling is carried out in two ways: offline and online. In offline
transactions, a mediator sits between the buyer and the seller and may be
prone to corruption, making overly profitable deals for himself. In online
transactions, a platform lets the user find the price he might get if he
decides to sell.

The following factors influence the resale value of a vehicle:

Kilometers traveled – The number of kilometers traveled by a vehicle plays a
huge role when the vehicle is put up for sale. The more the vehicle has
traveled, the older it is.

Fiscal power – It is the power output of the vehicle. More output yields
better value out of a vehicle.

Year of registration – It is the year when the vehicle was registered with
the Road Transport Authority. The newer the vehicle is, the better value it
will yield. With every passing year, the value depreciates.

Fuel Type – The dataset we had contained two fuel types, Petrol and Diesel.
This factor was relatively less dominant.

Because of the above factors, we need a self-learning, machine learning-based
system that can estimate the price of a vehicle. This was the basis on which
the set of objectives was formulated. It was decided at the outset that this
would be a real-time project.


1.1 Objective Of The Project


To build a supervised machine learning model for forecasting the value of a
vehicle based on multiple attributes.

The system being built must be feature based, i.e., feature-wise prediction
must be possible.

To provide graphical comparisons that give a better view of the data.

1.2 Motivation And Challenges


The automotive industry is composed of a few top global multinational players
and several retailers. The multinational players are mainly manufacturers by
trade, whereas the retail market features players who deal in both new and used
vehicles. The used car market has demonstrated significant growth in value,
contributing to a larger share of the overall market. The used car market in
India accounts for nearly 3.4 million vehicles per year.

1.3 FEATURES
The project will provide two major features:

• Re-sale platform: A centralized platform for car resale that will predict
prices.
• Feature selection: Feature-based search and prediction.

Chapter 1 contains the introduction, objective, motivation and features of our
model. Chapter 2 contains the literature review, Chapter 3 describes the
technologies used, Chapter 4 presents the system design, Chapter 5 explains the
methodology, Chapter 6 describes the experiment and results, Chapter 7 lists
the disadvantages, Chapter 8 contains the conclusion and Chapter 9 the future
enhancements.


2. LITERATURE REVIEW

In this chapter, we discuss various applications and methods which inspired us
to build our project. We did a background survey of the basic ideas behind our
project and used it to collect information on the technological stack, the
algorithms, and the shortcomings to avoid, which led us to build a better
project.

CARS24
Cars24 is a web platform where sellers can sell their used cars. It is an
Indian start-up with a simplified user interface that asks the seller for
parameters such as car model, kilometers traveled, year of registration and
vehicle type (petrol, diesel) [1]. These allow the web model to run certain
algorithms on the given parameters and predict the price.

GET VEHICLE PRICE


Get Vehicle Price is an Android app which works on parameters similar to those
of Cars24. It predicts vehicle prices from various parameters such as fiscal
power, horsepower and kilometers traveled [2]. The app uses a machine learning
approach to predict the price of a car, bike, electric vehicle or hybrid
vehicle. It aims to predict the price of any vehicle using its optimized
algorithm.

CARWALE
The CarWale app is one of the top-rated car apps in India for new and used car
research. It provides accurate on-road prices of cars and genuine user and
expert reviews [3]. It also lets users compare different cars with its car
comparison tool. The app also helps users connect with their nearest car
dealers for the best offers available.


CARTRADE
CarTrade is a web and Android platform where users can research new cars in
India by exploring car prices, specifications, images, mileage, reviews and
car comparisons [4]. On this app, one can sell a used car to genuine buyers
with ease. Sellers can list their used car for sale along with details such as
images, model, year of purchase and kilometers driven, so that it is displayed
to lakhs of interested car buyers in their city [5]. Users can read user
reviews and expert car reviews with images, which help in finalizing a new car
buying decision [6].


3. TECHNOLOGY USED

Python was the main technology used to implement the machine learning
concepts, because it offers numerous built-in methods in the form of packaged
libraries. The following are the prominent libraries and tools we used in our
project.

3.1 SCIPY
SciPy is a free and open-source Python library used for scientific computing and
technical computing. SciPy contains modules for optimization, linear algebra,
integration, interpolation, special functions, FFT, signal and image processing,
ODE solvers and other tasks common in science and engineering. SciPy builds
on the NumPy array object and is part of the NumPy stack which includes tools
like Matplotlib, pandas, and SymPy, and an expanding set of scientific
computing libraries. This NumPy stack has similar users to other applications
such as MATLAB, GNU Octave, and Scilab. The NumPy stack is also
sometimes referred to as the SciPy stack[2]. The SciPy library is currently
distributed under the BSD license, and its development is sponsored and
supported by an open community of developers. It is also supported by
NumFOCUS, a community foundation for supporting reproducible and
accessible science.
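
As a minimal sketch of how SciPy can be applied to a regression task, the
snippet below fits a straight line with scipy.stats.linregress. The arrays
`age` and `price` are made-up illustrative values and are not taken from our
dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical toy data: vehicle age in years vs. selling price
# (illustrative values only, not from the report's dataset).
age = np.array([1, 2, 3, 4, 5, 6], dtype=float)
price = np.array([9.0, 8.1, 7.0, 6.2, 5.1, 3.9])

# linregress performs a simple least-squares fit of price on age.
result = stats.linregress(age, price)

r_squared = result.rvalue ** 2
print(f"slope={result.slope:.3f} intercept={result.intercept:.3f} R^2={r_squared:.3f}")
```

The negative slope reflects the depreciation built into the toy values.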

3.2 Matplotlib
Matplotlib is mainly used for basic plotting: bar charts, pie charts, line
plots, scatter plots and so on. Multiple figures can be open at once, but they
have to be closed explicitly. plt.close() closes only the current figure,
while plt.close('all') closes them all. For data visualization in Python,
Matplotlib is a graphics package well integrated with NumPy and Pandas. Its
pyplot module closely mirrors the MATLAB plotting commands, so MATLAB users
can easily transition to plotting with Python. Matplotlib offers a stateful
(pyplot) API as well as an object-oriented API, and works with data frames and
arrays. In the stateful API the current figure and axes are tracked
implicitly, so calls like plot() suffice without passing a figure or axes
object. Matplotlib is extremely customizable and powerful. Pandas also uses
Matplotlib: its plotting methods are a neat wrapper around it.
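
The figure-closing behaviour described above can be sketched as follows. The
plotted values are made up for illustration, and the "Agg" backend is chosen
only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean state

plt.figure()
plt.plot([1, 2, 3], [3, 2, 1])              # stateful call: draws on the current figure
plt.figure()
plt.bar(["Petrol", "Diesel"], [4.0, 6.5])   # hypothetical values

open_before = len(plt.get_fignums())        # both figures are still open

plt.close()                                 # closes only the current (second) figure
open_after_close = len(plt.get_fignums())

plt.close("all")                            # closes every remaining figure
open_after_close_all = len(plt.get_fignums())

print(open_before, open_after_close, open_after_close_all)
```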

3.3 Seaborn
Seaborn provides various visualization patterns. It has attractive default
themes and a concise syntax. Statistical visualization is the speciality of
Seaborn: it is used to summarize data visually and to depict data
distributions. Seaborn can create multiple figures, which may lead to OOM (out
of memory) issues if they are not closed. Seaborn is integrated with Pandas
data frames and extends the Matplotlib library to produce polished graphics
with simple methods. Seaborn is more intuitive than Matplotlib and works with
an entire dataset at once. In Seaborn, relplot() is a figure-level API whose
'kind' parameter selects the type of plot (scatter or line), and catplot()
plays the same role for categorical plots such as bar or box plots. Since
Seaborn is not stateful, the data object has to be passed to each plotting
function. Seaborn avoids plenty of boilerplate by providing commonly used
default themes. It serves more specific use cases than Matplotlib, on which it
is built, and is especially meant for statistical plotting.
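
A minimal sketch of the 'kind' parameter is given below. The data frame and
its column names (Kms_Driven, Selling_Price, Fuel_Type) are assumptions made
for illustration; the values are not from our dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data; column names are assumed, values are illustrative.
df = pd.DataFrame({
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450, 2071],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60, 9.25],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel", "Diesel"],
})

# Seaborn is not stateful: the data frame is passed explicitly to each call.
scatter_grid = sns.relplot(data=df, x="Kms_Driven", y="Selling_Price",
                           hue="Fuel_Type", kind="scatter")
bar_grid = sns.catplot(data=df, x="Fuel_Type", y="Selling_Price", kind="bar")

n_open = len(plt.get_fignums())  # each figure-level call created its own figure
plt.close("all")                 # close them explicitly to free memory
```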

3.4 Linear Regression


Linear regression is a widely used machine learning algorithm for predictive
analysis. It is a statistical method that helps to establish a relationship between
two continuous variables. The primary objective of linear regression is to
determine the relationship between the independent variable(s) and the
dependent variable, which is represented by a linear equation of the form Y = a
+ bX + e, where Y is the dependent variable, X is the independent variable, a is
the intercept, b is the slope of the line, and e is the error term.
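
For reference, the usual ordinary least squares estimates for this model can
be written out as below. The sample notation (x_i, y_i, n, the sample means,
and p predictors X_1, ..., X_p in the multiple-regression form) is an
assumption introduced here and does not appear in the text above.

```latex
Y = a + bX + e, \qquad
\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
               {\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}

% With several independent variables the same form extends to
Y = a + b_1 X_1 + b_2 X_2 + \dots + b_p X_p + e
```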


Linear regression is used in various fields such as economics, finance,
healthcare, social sciences, and engineering, to name a few. Some of the most
common use cases of linear regression are as follows:
• Sales Forecasting: Linear regression is used to predict future sales based
on historical sales data.
• Marketing Analysis: Linear regression is used to determine the impact of
various marketing strategies on sales.
• Financial Analysis: Linear regression is used to analyze the relationship
between financial variables such as interest rates, inflation, and stock prices.
• Healthcare: Linear regression is used to analyze the relationship between
different health factors such as age, weight, and blood pressure.
• Sports Analytics: Linear regression is used to analyze the performance of
sports teams and players based on various factors such as player salary, win-loss
record, and player stats.
Linear regression modeling involves several steps, which are as follows:
• Data Collection: The first step in linear regression modeling is to collect
data. The data can be collected from various sources such as surveys,
experiments, and observations.
• Data Cleaning and Preprocessing: The collected data may contain errors,
missing values, and outliers, which need to be addressed before building a
model. This step involves data cleaning, data transformation, and data scaling.
• Feature Selection: In this step, the relevant features for the analysis are
selected based on their relevance to the dependent variable.
• Model Building: Once the relevant features are identified, a linear
regression model is built using the data. The model is built by fitting a straight
line that best represents the relationship between the independent variable(s) and
the dependent variable.
• Model Evaluation: Once the model is built, it is evaluated on the testing
dataset using various evaluation metrics such as Mean Squared Error (MSE),
Mean Absolute Error (MAE), and R-squared score.
• Model Optimization: In this step, the model is optimized by tuning the
hyperparameters such as learning rate, regularization, and number of iterations
to improve the model's performance.
• Prediction: Finally, the model is used to make predictions on the new
data, and the performance of the model is monitored over time.
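
The evaluation metrics named in the steps above can be computed directly from
their definitions. The following is a minimal sketch; the helper name
`regression_metrics` and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R-squared) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                    # Mean Squared Error
    mae = np.mean(np.abs(residuals))                 # Mean Absolute Error
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # R-squared score
    return mse, mae, r2

# Illustrative values only.
mse, mae, r2 = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.5])
print(mse, mae, r2)
```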


4. SYSTEM DESIGN

4.1 DATA FLOW DIAGRAM


A data flow diagram (DFD) shows the way information flows through a process or
system. It includes data inputs and outputs, data stores, and the various
sub-processes the data moves through. DFDs are built using standardized symbols
and notation to describe various entities and their relationships. In this
project there is one DFD.

Fig 4.1: Data Flow Diagram

4.2 SYSTEM ARCHITECTURE


Figure 4.2 shows the architecture diagram. Initially the dataset is collected.
The collected dataset is then preprocessed, because real-world data is noisy
and must be cleaned before the actual information can be extracted from it.
Different processes, such as manual encoding and one hot encoding, are carried
out to obtain this information. Feature extraction is then performed to
extract the necessary features. Linear regression modelling is used to train
on the dataset. The user can give an input to the prediction model and it will
provide an output.
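
A minimal sketch of the two encoding steps named above, using pandas. The
column names and values are assumptions for illustration and are not taken
from our dataset.

```python
import pandas as pd

# Hypothetical rows; column names are assumed.
df = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
    "Kms_Driven": [27000, 43000, 6900, 5200],
})

# Manual encoding: map a two-valued column to 0/1 by hand.
df["Transmission"] = df["Transmission"].map({"Manual": 0, "Automatic": 1})

# One hot encoding: one 0/1 indicator column per fuel type.
encoded = pd.get_dummies(df, columns=["Fuel_Type"], dtype=int)
print(encoded)
```

After this step every column is numeric, which is what the regression model
requires.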


Fig 4.2: System Architecture

5. METHODOLOGY

In this chapter, we discuss the algorithm and the dataset that were used to
build this module. We used the Linear Regression algorithm to build a
predictive model for used car price prediction. We used the Python programming
language and libraries such as Pandas, Matplotlib and scikit-learn (sklearn)
for data preprocessing, analysis, and model building.

We first imported the dataset into a Pandas dataframe and then performed
the necessary data cleaning and preprocessing, such as handling missing values,
converting categorical features into numerical features, and feature scaling.

We then divided the dataset into two parts, i.e., training and testing, with a
ratio of 70:30, respectively. We then applied the Linear Regression algorithm on
the training dataset to build a predictive model and evaluated its performance
using various evaluation metrics such as Mean Absolute Error (MAE), Mean
Squared Error (MSE), and R2 score.
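
The split, fit and evaluation steps described above can be sketched with
scikit-learn as follows. The data here is synthetic and generated only so the
snippet runs on its own; the column names, coefficients and random seed are
assumptions, and the printed scores say nothing about our actual results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the report's dataset); column names are assumed.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Present_Price": rng.uniform(1.0, 20.0, n),
    "Kms_Driven": rng.uniform(1_000, 100_000, n),
    "Years_Used": rng.integers(1, 15, n),
})
y = (0.6 * X["Present_Price"]
     - 0.00001 * X["Kms_Driven"]
     - 0.2 * X["Years_Used"]
     + rng.normal(0.0, 0.3, n))

# 70:30 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

# Fit on the training set, evaluate on the held-out testing set.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.3f}")
```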

Fig 5.1: Methodology


6. EXPERIMENT AND RESULTS

Reading and Understanding Data: The dataset is imported and read for
understanding.

Fig 6.1: Data Overview

Figure 6.1 above gives an overview of our dataset and shows what it looks
like. It displays all the attributes: Car Name, Year, Selling Price, Present
Price, Kms Driven, Fuel Type, Seller Type, Transmission and Owner (number of
previous owners). It also shows that the dataset does not contain any null
entries. Null entries are missing values in the dataset. It is necessary to
know about them because a null entry affects the homogeneity and continuity of
the data, which can cause problems during data modelling and model building.
To avoid such problems, we have to make sure that the dataset has no missing
values, which means removing from the dataset any data points that contain a
missing value.
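
A minimal sketch of this null check with pandas is shown below. The rows are
hypothetical and contain missing values on purpose so that the removal step
has something to do; the column names are assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with deliberate gaps; not from the report's dataset.
df = pd.DataFrame({
    "Car_Name": ["car_a", "car_b", "car_c", "car_d"],
    "Selling_Price": [3.35, 4.75, np.nan, 4.60],
    "Kms_Driven": [27000, np.nan, 6900, 42450],
})

# Count the null entries in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Remove every data point (row) that has at least one missing value.
cleaned = df.dropna()
print(len(df), "rows before,", len(cleaned), "rows after")
```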

Visualizing Data with Target Variable:

Fig 6.2: Visualizing data with target

From Figure 6.2 it can be concluded that used cars have a higher selling price
when sold by dealers than when sold by individuals. Similarly, Figure 6.2
shows that the selling price of cars with manual transmission is lower than
that of cars with automatic transmission.


Fig 6.3: Visualizing data

From Figure 6.3 it can be concluded that used cars with Diesel as fuel type
have a higher selling price than those with Petrol or CNG as fuel type.
Additionally, Figure 6.3 shows that the selling price of cars with no previous
owners is higher than that of the rest of the cars.

Fig 6.4: Correlation between Selling Price and Present Price


Figure 6.4 shows that a higher present price of the car also results in a
higher selling price of the used car.
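
Comparisons of this kind can be checked numerically with pandas. In the sketch
below the rows are made up to mirror the trends described in this chapter;
they are not our dataset, and the column names are assumed.

```python
import pandas as pd

# Made-up rows that mimic the reported trends; not the report's dataset.
df = pd.DataFrame({
    "Seller_Type": ["Dealer", "Individual", "Dealer",
                    "Individual", "Dealer", "Individual"],
    "Present_Price": [9.5, 4.0, 12.0, 5.5, 7.0, 3.0],
    "Selling_Price": [7.2, 2.5, 9.0, 3.4, 5.1, 1.8],
    "Kms_Driven": [20000, 60000, 15000, 45000, 30000, 80000],
})

# Average selling price for each seller type.
mean_price_by_seller = df.groupby("Seller_Type")["Selling_Price"].mean()

# Pearson correlation of each numeric attribute with the selling price.
corr_with_price = df[["Present_Price", "Kms_Driven", "Selling_Price"]].corr()["Selling_Price"]

print(mean_price_by_seller)
print(corr_with_price)
```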

7. DISADVANTAGES

While Linear Regression Modelling is a popular and effective statistical method
for predicting a continuous response variable, it also has its disadvantages,
including:

• Limited Applicability: Linear Regression assumes a linear relationship
between the predictor variables and the response variable, which may not
always be the case in real-world scenarios.
• Overfitting: Linear Regression may lead to overfitting if the model is too
complex or if the number of predictor variables is large relative to the
sample size.
• Outliers: Linear Regression is sensitive to outliers, which can skew the
model's predictions and reduce its accuracy.
• Multicollinearity: Linear Regression assumes that the predictor variables
are independent of each other, which may not be the case if there is
multicollinearity among the predictor variables.
• Non-normality: Linear Regression assumes that the errors are normally
distributed, which may not be the case in real-world scenarios.
• Interpretability: Linear Regression may not always provide clear
interpretations of the relationship between the predictor variables and the
response variable.
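
The sensitivity to outliers listed above can be illustrated with a small
sketch. The data is synthetic (an exact line y = 2x + 1), and numpy.polyfit is
used here only as a convenient least-squares fitter.

```python
import numpy as np

# Synthetic points lying exactly on y = 2x + 1.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Least-squares line through the clean data.
slope_clean, intercept_clean = np.polyfit(x, y, 1)

# Corrupt a single observation (its true value is 19).
y_outlier = y.copy()
y_outlier[-1] = 100.0

# Refit: one bad point is enough to pull the slope far from 2.
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, 1)

print(f"clean slope={slope_clean:.2f}, slope with one outlier={slope_outlier:.2f}")
```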

8. CONCLUSION

In this study, a Linear Regression model was successfully implemented using
algorithms from Python libraries and modules. After the data was collected, it
was processed further: null entries and missing data points were removed from
the dataset, and the categorical variables were processed using the One Hot
Encoding technique. The results showed a positive correlation between Selling
Price and Present Price, and a negative correlation between Selling Price and
Kms Driven, Years Used and Owner (number of previous owners). A positive
correlation means that the two quantities tend to rise together, while a
negative correlation means that one tends to fall as the other rises. It was
also concluded that the Selling Price of cars was higher when sold by dealers
than when sold by individuals. Similarly, the Selling Price was higher for
cars with automatic transmission. It was also observed that the Selling Price
of cars with Diesel as Fuel Type was higher than that of cars with Petrol or
CNG as Fuel Type. The R2 score of the Linear Regression model was 0.86, which
is good, and the predictions were quite close to the original selling prices.


9. FUTURE ENHANCEMENT

Future work will be done to refine the model and make it more useful in real
world situations. There are several ways to enhance the accuracy and
performance of a Linear Regression model for used car price prediction. Some
of the future enhancements are:

• Feature Engineering: The inclusion of more relevant and useful features,
such as car make, model, and condition, can improve the accuracy of the
model. Additional features like service history, accident history, and car
specifications can help the model to make better predictions.
• Non-linear Relationships: Linear Regression assumes a linear relationship
between the predictor variables and the response variable. However,
including polynomial or interaction terms can capture the non-linear
relationships between the variables and improve the accuracy of the
model.
• Regularization Techniques: Regularization techniques such as Lasso and
Ridge regression can prevent overfitting and improve the accuracy of the
model.
• Ensemble Techniques: Ensemble techniques such as Random Forest and
Gradient Boosting combine many decision-tree models and can be used
alongside, or in place of, a single Linear Regression model to improve
accuracy and performance.
• Handling Outliers: Outliers can significantly affect the accuracy of the
model. Identifying and handling outliers by removing them or
transforming the variables can help to improve the accuracy of the model.
• Model Interpretation: Linear Regression models can be difficult to
interpret. Providing clear interpretations of the relationship between the
predictor variables and the response variable can help to build trust and
confidence in the model's predictions.
• Data Quality: The quality of the data used to build the model can have a
significant impact on its accuracy. Ensuring data cleanliness and accuracy
by validating, cleaning, and normalizing the data can improve the
accuracy and performance of the model.

By incorporating these future enhancements, a Linear Regression model for
used car price prediction can be improved to deliver more accurate predictions
and better performance.
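
As a rough sketch of two of these enhancements, polynomial terms and Ridge
regularization, the snippet below compares a plain linear fit with a
degree-2 Ridge pipeline in scikit-learn. The depreciation curve is synthetic,
and the degree, alpha value and random seeds are assumptions chosen only for
illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic non-linear depreciation curve (NOT the report's dataset).
rng = np.random.default_rng(1)
age = rng.uniform(0.0, 15.0, 150)
price = 10.0 * np.exp(-0.15 * age) + rng.normal(0.0, 0.2, 150)
X = age.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.30, random_state=0)

# Baseline: a straight line in age.
linear = LinearRegression().fit(X_train, y_train)

# Enhancement: add age^2, scale the features, and fit with an L2 penalty.
poly_ridge = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
).fit(X_train, y_train)

# score() returns the R2 score on the held-out data.
r2_linear = linear.score(X_test, y_test)
r2_poly_ridge = poly_ridge.score(X_test, y_test)
print(f"linear R2={r2_linear:.3f}  polynomial+Ridge R2={r2_poly_ridge:.3f}")
```

On this curved toy data the polynomial pipeline fits better; on real data the
degree and alpha would need to be chosen by validation.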


10. REFERENCES

[1] M. G. Pattabiraman Venkatasubbu, “Used Cars Price Prediction using
Supervised Learning Techniques,” International Journal of Engineering
and Advanced Technology (IJEAT), 2019.
[2] B. I. D. K. Z. M. J. K. Enis Gegic, “Car Price Prediction using Machine
Learning Techniques,” International Burch University, Sarajevo, Bosnia
and Herzegovina, TEM Journal, 2019.
[3] A. W. D. A. K. D. M. V. Laveena D’Costa, “Predicting True Value of Used
Car using Multiple Linear Regression Model,” International Journal of
Recent Technology and Engineering, 2020.
[4] S. Peerun, “Predicting the Price of Second-hand Cars using Artificial
Neural Networks,” Proceedings of the Second International Conference,
Nushrah Henna Chummun and Sameerchand Pudaruth, University of
Mauritius, Reduit, Mauritius, 2014.
[5] S. Pudaruth, “Predicting the Price of Used Cars using Machine Learning
Techniques,” International Journal of Information & Computation
Technology, Computer Science and Engineering Department, University of
Mauritius, Reduit, Mauritius, 2014.
[6] D. S. S. S. G. S. K. S.E.Viswapriya, “Vehicle Price Prediction using SVM
Techniques,” International Journal of Innovative Technology and
Exploring Engineering, 2020.
