(University of Mumbai)
(2022-2023)
TERNA ENGINEERING COLLEGE, NERUL,
NAVI MUMBAI
Department of Computer Engineering
Academic Year 2022-23
CERTIFICATE
This is to certify that the mini project 2 A entitled “Car Price Prediction” is a bonafide
work of
Submitted to the University of Mumbai in partial fulfillment of the requirement for the
award of the Bachelor of Engineering (Computer Engineering).
Submitted by:
1.---------------------------------------------------------
2.----------------------------------------------------------
Date: ---------------------------------
Place:---------------------------------
Declaration
We declare that this written submission represents our ideas in our own words and where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic
honesty and integrity and have not misrepresented or fabricated or falsified any
idea/data/fact/source in our submission. We understand that any violation of the above
will be cause for disciplinary action by the Institute and can also evoke penal action from
the sources which have thus not been properly cited or from whom proper permission
has not been taken when needed.
Date: _____________________
Place: _____________________
Acknowledgement
We would like to express our sincere gratitude to our guide, Prof. Dnyanada
Dafale, and our Project Coordinator, Prof. Rohini Palve, for the help, guidance and
encouragement they provided during the project development. This work would not
have been possible without their valuable time, patience and motivation. We thank them
for making our stint thoroughly pleasant and enriching. It was a great learning experience
and an honor to be their students.
We are deeply thankful to Dr. Seema Biday (H.O.D., Computer Department) and the entire
team in the Computer Department. They supported us with scientific guidance, advice
and encouragement; they were always helpful and enthusiastic, and this inspired us in our
work.
We take the privilege to express our sincere thanks to our Principal, Dr. L. K. Ragha, for
providing encouragement and much support throughout our work.
Date: _____________________
Place: _____________________
Table of Contents
Abstract
List of Figures
List of Abbreviations
Chapter 1 Introduction
1.1 Introduction
1.2 Scope
1.3 Organization of The Report
Chapter 2 Literature Survey
2.1 Existing System
2.2 Problem Statement
2.3 Objective of project
Chapter 3 Software Analysis
3.1 Software Model
3.1.1 Phases of Software Model
3.2 Proposed System
3.3 System Requirement Specification (SRS)
3.3.1 Hardware and Software Requirements
Chapter 4 Design
4.1 Data Flow Diagrams
4.2 Use Case Diagram
Chapter 5 Methodology
5.1 Methodology Used To Perform Experiment
Abstract
Car price prediction is a crucial task in the automobile industry, as it helps both buyers and sellers to
make informed decisions. In recent years, machine learning techniques have been widely used for
predicting car prices. In this project, we aim to develop a machine learning model that can accurately
predict the price of a car based on its various features such as make, model, year, mileage, engine size,
fuel type, transmission type, and other relevant features.
The dataset used in this project will be sourced from reliable sources, such as online car marketplaces,
car dealerships, and other automobile-related websites. The dataset will be preprocessed to remove any
missing or irrelevant data, and feature engineering techniques will be used to create new features that
may improve the accuracy of the model.
We will evaluate various machine learning algorithms, including linear regression, decision trees, and
random forests, to determine the most suitable algorithm for our task. We will also use cross-validation
techniques to ensure that the model is not overfitting the training data.
Finally, we will deploy the model in a web application, where users can enter the features of a car and
receive a predicted price based on the model's trained parameters. The web application will be designed
to be user-friendly, and the results will be presented in a clear and easy-to-understand manner. This
project can have significant implications for the automobile industry, as it can help both buyers and
sellers make informed decisions and reduce the information asymmetry in the market.
List Of Figures
Level 0 DFD
Level 1 DFD
6.2.1 Result
List of Abbreviations
ML – Machine Learning
1.1: Introduction
An increasing number of cars are being manufactured and sold around the world. One of the most important
factors that determine the success of a car sale is the price of the vehicle. Buyers and sellers alike need
to have an accurate understanding of a car's market value to make informed decisions. However, the
process of determining the price of a car can be complex, with various factors to consider, such as
make, model, year, mileage, engine size, fuel type, transmission type, and other relevant features.
Machine learning techniques have gained significant traction in recent years for solving complex
prediction problems, and car price prediction is no exception. The use of machine learning algorithms to
predict car prices can significantly reduce the time and effort required to evaluate a car's market value
accurately. Additionally, machine learning models can take into account a wide range of factors that
may affect the price of a car, including those that may be difficult for humans to consider.
In this project, we aim to develop a machine learning model that can accurately predict the price of a car
based on its various features. The dataset used in this project will be sourced from reliable sources, such
as online car marketplaces, car dealerships, and other automobile-related websites. The dataset will be
preprocessed to remove any missing or irrelevant data, and feature engineering techniques will be used
to create new features that may improve the accuracy of the model.
We will evaluate various machine learning algorithms, including linear regression,
decision trees, and random forests, to determine the most suitable algorithm for our task. We will also
use cross-validation techniques to ensure that the model is not overfitting the training data. Finally, we
will deploy the model in a web application, where users can enter the features of a car and receive a
predicted price based on the model's trained parameters. This project can have significant implications
for the automobile industry, as it can help both buyers and sellers make informed decisions and reduce
the information asymmetry in the market.
1.2: Scope
The scope of this project is to develop a machine learning model that can accurately predict the price of
a car based on various features such as make, model, year, mileage, engine size, fuel type, transmission
type, and other relevant features. The project will involve the following steps:
1. Data Collection: Data will be sourced from reliable sources, such as online car marketplaces, car
dealerships, and other automobile-related websites.
2. Data Preprocessing: The collected data will be preprocessed to remove any missing or irrelevant
data, and feature engineering techniques will be used to create new features that may improve
the accuracy of the model.
3. Model Selection: Various machine learning algorithms such as linear regression, decision trees,
and random forests will be evaluated to determine the most suitable algorithm for our task.
4. Model Training: The selected algorithm will be trained on the preprocessed data using cross-
validation techniques to ensure that the model is not overfitting the training data.
5. Model Evaluation: The trained model will be evaluated on a test dataset to measure its
performance in predicting car prices accurately.
6. Model Deployment: The model will be deployed in a web application where users can enter the
features of a car and receive a predicted price based on the model's trained parameters.
The project's scope is limited to predicting car prices based on the given features, and the accuracy of
the model will depend on the quality and quantity of the data collected. The project can be extended by
incorporating additional features and exploring advanced machine learning techniques.
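The model selection and cross-validation steps above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and NumPy are available; the synthetic data, the feature names (age, km, engine) and the price rule are hypothetical stand-ins and are not the dataset used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a car dataset (illustrative values only).
rng = np.random.default_rng(0)
n = 300
age = rng.uniform(0, 15, n)            # years since purchase
km = rng.uniform(5_000, 200_000, n)    # kilometres driven
engine = rng.uniform(0.8, 3.0, n)      # engine size in litres
X = np.column_stack([age, km, engine])

# Hypothetical price rule plus noise, used only to have a target to fit.
y = 20.0 - 0.8 * age - 0.00004 * km + 3.0 * engine + rng.normal(0, 0.5, n)

# The three candidate algorithms named in the scope.
models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

# Pick the candidate with the highest mean score.
best_model = max(cv_scores, key=cv_scores.get)
for name, score in cv_scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
print("selected:", best_model)
```

Because the synthetic target is linear by construction, the ranking here says nothing about which algorithm suits real car listings; the sketch only shows the comparison procedure.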
1.3: Organization Of The Report
Chapter 2 contains the Literature Survey. In this chapter, we review previous work on
topics related to our project, including papers published by their respective authors.
We also state the Problem Statement and Objective of the project.
Chapter 3, Software Analysis, deals with the Software Model and its phases, the proposed system,
the SRS, Hardware and Software Requirements, and the Gantt Chart.
Chapter 4 covers the design of the project, with DFD, Use Case, Flowchart, Sequence and Data Model
Diagrams.
This project aims to develop an application that predicts car prices from various input data using a
machine learning model.
The user receives the predicted values and can use them to decide which car to buy. Today a car is
very important for day-to-day travel, and almost everyone needs one. However, many people who want
to buy a car cannot do so because of financial constraints, so car price prediction can help them
find a used car that matches their requirements and what they can afford. The proposed system can
help customers save money by giving them car price information according to their usage.
CHAPTER 3: SOFTWARE ANALYSIS
The Waterfall model is the SDLC approach that is used for the project development. The waterfall
Model illustrates the software development process in a linear sequential flow. This means that any
phase in the development process begins only if the previous phase is complete.
• Requirement Gathering and Analysis − All possible requirements of the system to be developed are
captured in this phase and documented in a requirement specification document.
• System Design − The requirement specifications from the first phase are studied in this phase and the
system design is prepared. This system design helps in specifying hardware and system requirements
and helps in defining the overall system architecture.
• Implementation − With inputs from the system design, the system is first developed in small
programs called units, which are integrated in the next phase. Each unit is developed and tested for its
functionality, which is referred to as Unit Testing.
• Integration and Testing − All the units developed in the implementation phase are integrated into a
system after testing of each unit. After integration, the entire system is tested for any faults and failures.
• Deployment of System − Once the functional and non-functional testing is done, the product is
deployed in the customer environment or released into the market.
• Maintenance − Some issues come up in the client environment. To fix those issues,
patches are released. Also, to enhance the product, improved versions are released. Maintenance is
done to deliver these changes in the customer environment.
3.1.1 Phases of Software Model
The reason for using waterfall development is that it allows for departmentalization and control.
Because the system can be updated from the last step, the waterfall model is the most suitable software
model for this project. A schedule can be set with deadlines for each stage of development, and the
product can proceed through the phases of the development process model one by one.
The proposed system for car price prediction using machine learning would involve the
development of a machine learning model that can predict the price of a car based on its features.
The system would be user-friendly and interactive, making it easy for users to input
the required information and obtain an estimated price.
3.3 System Requirement Specification (SRS)
Hardware Requirements:
A computer or server with a modern processor (e.g., Intel Core i5 or higher) and at least 8GB of
RAM
A GPU (graphics processing unit) with sufficient memory (at least 2GB) if using deep learning
algorithms
Sufficient storage space to store the dataset and trained models
An internet connection for data collection and deployment
Software Requirements:
An operating system (e.g., Windows or Linux)
Python programming language (version 3.7 or higher)
Machine learning libraries such as Scikit-learn, TensorFlow, or PyTorch
Data manipulation libraries such as Pandas and NumPy
Visualization libraries such as Matplotlib and Seaborn
Development environment such as Jupyter Notebook, Spyder, or Visual Studio Code
CHAPTER 4: DESIGN
Level 0 DFD
Level 1 DFD
4.2: Use Case Diagram
Collecting Data:
As you know, machines initially learn from the data that you give them. It is of the utmost importance
to collect reliable data so that your machine learning model can find the correct patterns. The quality of
the data that you feed to the machine will determine how accurate your model is. If you have incorrect or
outdated data, you will have wrong outcomes or predictions which are not relevant.
After you have your data, you must prepare it. You can do this by:
Putting together all the data you have and randomizing it. This helps make sure that the data is evenly
distributed and that the ordering does not affect the learning process.
Cleaning the data by removing unwanted rows and columns, missing values and duplicate values,
converting data types, and so on. You might even have to restructure the dataset and change the rows
and columns or the index of rows and columns.
Visualizing the data to understand how it is structured and to understand the relationships between
the various variables and classes present.
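The preparation steps above can be sketched with pandas. This is a minimal illustration, assuming pandas is installed; the table and its column names (year, km_driven, fuel_type, price) are made up for the example and are not the project's dataset.

```python
import pandas as pd

# Tiny made-up table; column names and values are assumptions for illustration.
raw = pd.DataFrame(
    {
        "year": ["2015", "2018", "2018", None, "2012"],
        "km_driven": [50000, 30000, 30000, 42000, 120000],
        "fuel_type": ["Petrol", "Diesel", "Diesel", "Petrol", None],
        "price": [4.5, 7.2, 7.2, 5.1, 2.0],
    }
)

clean = (
    raw.drop_duplicates()              # remove repeated rows
    .dropna()                          # drop rows with missing values
    .astype({"year": "int64"})         # convert text years to integers
    .sample(frac=1, random_state=0)    # shuffle the row order
    .reset_index(drop=True)            # rebuild the index after shuffling
)
print(clean)
```

Dropping every row with a missing value is the simplest choice; on a small dataset, filling missing values instead may preserve more data.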
Choosing a Model:
A machine learning model determines the output you get after running a machine learning algorithm on
the collected data. It is important to choose a model which is relevant to the task at hand. Over the years,
scientists and engineers developed various models suited for different tasks like speech recognition,
image recognition, prediction, etc. Apart from this, you also have to see if your model is suited for
numerical or categorical data and choose accordingly.
Training is the most important step in machine learning. In training, you pass the prepared data
to your machine learning model to find patterns and make predictions. It results in the model
learning from the data so that it can accomplish the task set. Over time, with training, the
model gets better at predicting.
Evaluating the Model:
After training your model, you must check how it is performing. This is done by testing the
performance of the model on previously unseen data. The unseen data is the testing set that you
split your data into earlier. If testing were done on the same data that was used for training, you would
not get an accurate measure, because the model has already seen the data and finds the same patterns in
it as it did before. This gives a disproportionately high accuracy. When used on testing data, you get
a more accurate measure of how your model will perform and of its speed.
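The gap between training and testing performance can be seen in a short sketch. It assumes scikit-learn and NumPy; the single feature (car age) and the noisy price rule are synthetic, and a fully grown decision tree is chosen only because it memorizes its training data, which makes the effect easy to see.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: one feature (car age in years) and a noisy price.
rng = np.random.default_rng(1)
X = rng.uniform(0, 15, size=(400, 1))
y = 20.0 - X[:, 0] + rng.normal(0, 2.0, 400)

# Hold out 25% of the rows as unseen testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# An unrestricted tree can fit the training rows almost perfectly.
model = DecisionTreeRegressor(random_state=1).fit(X_train, y_train)

train_r2 = r2_score(y_train, model.predict(X_train))
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on training data: {train_r2:.3f}")
print(f"R^2 on testing data:  {test_r2:.3f}")
```

The training score is the misleadingly high figure the text warns about; the testing score is the one to report.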
Parameter Tuning:
Once you have created and evaluated your model, see if its accuracy can be improved in any way. This
is done by tuning the parameters of your model. These parameters (often called hyperparameters) are
the variables in the model that the programmer generally sets. At particular values of these parameters,
the accuracy is at its maximum. Parameter tuning refers to finding these values.
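One common way to search for such values is a grid search with cross-validation. The sketch below assumes scikit-learn and NumPy; the synthetic data and the small grid over `n_estimators` and `max_depth` are illustrative choices, not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for car features and prices.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(200, 3))
y = 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 0.5, 200)

# Candidate values to try for two random forest parameters.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

# Every combination is scored with 3-fold cross-validation;
# scikit-learn maximizes the score, hence the negated MSE.
search = GridSearchCV(
    RandomForestRegressor(random_state=2),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated MSE:", -search.best_score_)
```

The grid is kept tiny so the example runs quickly; a real search would usually cover more values.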
Making Predictions:
In the end, you can use your model on unseen data to make predictions accurately.
The structure of the web page is created with HTML, and the styling is done with CSS.
This project uses machine learning regression algorithms, so the evaluation metrics are as
follows:
MAE (Mean Absolute Error) is a simple metric that averages the absolute difference between the actual and
predicted values.
MSE (Mean Squared Error) is one of the most widely used metrics. It differs slightly from MAE: it
averages the squared difference between the actual and predicted values.
RMSE (Root Mean Squared Error), as the name suggests, is the square root of the mean squared error.
R Squared (R2)
R squared is also known as the Coefficient of Determination, or sometimes as Goodness of Fit. The R2
score measures how well your model performs relative to a baseline, rather than reporting the loss in an
absolute sense. MAE and MSE depend on the scale of the data, whereas the R2 score is independent of it.
R squared therefore compares the model against a baseline model (one that always predicts the mean),
which none of the other metrics provides.
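The four metrics can be computed directly from their definitions. The sketch below uses only the Python standard library; the function name `regression_metrics` and the sample values are hypothetical and are not results from this project.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # root of the MSE

    # R^2 compares the model's squared error with that of a baseline
    # that always predicts the mean of the actual values.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up prices for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

For these sample values the errors are 0.5, 0, -1 and -0.5, so MAE is 0.5, MSE is 0.375 and R2 is 1 - 1.5/20 = 0.925.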
6.2 Experiment Result and Discussion
Result
CHAPTER 7: CONCLUSION
Car price prediction using machine learning is a useful application of machine learning algorithms that can
help buyers and sellers make informed decisions in the automobile market. By using machine learning
models to predict the prices of cars based on their features, it is possible to obtain accurate estimates of the
value of a car.
The proposed system for car price prediction using machine learning involves data collection, data
preprocessing, machine learning model development, model evaluation, model deployment, user interface
development, and maintenance. The system uses various machine learning algorithms to learn patterns and
relationships in the data, and provides estimated prices for cars based on their features.
The hardware and software requirements for car price prediction using machine learning depend on the
complexity of the machine learning algorithms used and the size of the dataset. However, a modern
computer with sufficient memory and storage, along with commonly used machine learning libraries and
programming languages, should be sufficient for most projects.
In conclusion, car price prediction using machine learning is a promising application of machine learning
that can benefit both buyers and sellers in the automobile market. By providing accurate estimates of car
prices based on their features, it can help buyers make informed decisions and sellers set reasonable prices
for their vehicles.
REFERENCES
https://www.temjournal.com/content/81/TEMJournalFebruary2019_113_118.pdf
https://www.ijeat.org/wp-content/uploads/papers/v9i1s3/A10421291S319.pdf
https://www.ijcaonline.org/archives/volume167/number9/noor-2017-ijca-914373.pdf
http://ripublication.com/irph/ijict_spl/ijictv4n7spl_17.pdf
https://youtu.be/p_tpQSY1aTs
https://www.irjet.net/archives/V8/i4/IRJET-V8I4278.pdf
PUBLICATION
Car Price Prediction
Fahad Khan, Shashank Tiwari, Aman Lakhotra
Computer Science and Technology Department,
Terna Engineering College,
Navi Mumbai, India
fahadkhan@ternaengg.ac.in, shashanktiwari@ternaengg.ac.in, amanlakhotra@ternaengg.ac.in
Abstract— The project aims to develop a web application that predicts the price of a used car based on its features using machine learning algorithms. The Random Forest Regression model is used to train and test the data to predict the price accurately. The Flask framework is used for the backend of the web application, and HTML and CSS are used for the frontend design. The user can input the car's features, such as model year, type and year of purchase, and get the predicted price based on the model. The accuracy of the model is evaluated using various metrics such as Root Mean Square Error (RMSE) and R-squared. The project provides a user-friendly and interactive interface to predict the price of a used car.
Keywords—random forest regression, car, flask, root mean square error
Determining whether the listed price of a used car is reasonable is a challenging task, due to the many factors that drive used vehicle prices on the market. Car price prediction is an interesting and popular problem. According to information from the Agency for Statistics of BIH, 921,456 vehicles were registered in 2014, of which 84% are cars for personal usage. This number increased by 2.7% since 2013, and it is likely that this trend will continue and the number of cars will increase in future. This adds additional significance to the problem of car price prediction. Accurate car price prediction involves expert knowledge, because price usually depends on many distinctive features and factors. Typically, the most significant ones are brand and model, age, horsepower and mileage. The fuel type used in the car, as well as fuel consumption per mile, highly affects the price of a car due to frequent changes in the price of fuel.
This project aims to develop an application that predicts car prices from various input data using a machine learning model.
The user receives the predicted values and can use them to decide which car to buy. Today a car is very important for day-to-day travel, and almost everyone needs one. However, many people who want to buy a car cannot do so because of financial constraints, so car price prediction can help them find a used car that matches their requirements and what they can afford. The proposed system can help customers save money by giving them car price information according to their usage.
As we can see, the car is important in our day-to-day life. The scope of our project is to predict the price of old cars depending on their condition (km traveled, fuel type, etc.). The model predicts the price of a car based on the data it has learned from. It does not always predict perfectly; sometimes there is a small difference between the actual price and the predicted price. Dealers often cheat customers by selling to them at a high price. The project's main focus is to help both individuals and dealers who want to buy.
The proposed system for car price prediction using machine learning would involve the development of a machine learning model that can predict the price of a car based on its features. The system would be user-friendly and interactive, making it easy for users to input the required information and obtain an estimated price.
An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations. The Car Price Prediction system is designed and implemented using Python, HTML & CSS, and Bootstrap.
1. Data Collection:
Data collection is defined as the procedure of collecting, measuring and analyzing accurate insights for research using standard validated techniques. In this case, we have collected the data from the internet.
2. Data Pre-Processing:
Data preprocessing is a data mining technique which is used to transform the raw data into a useful and efficient format.
3. Exploratory data analysis:
It is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods.
4. Training Data:
This type of data builds up the machine learning algorithm. The data scientist feeds the algorithm input data, which corresponds to an expected output. The model evaluates the data repeatedly to learn more about the data’s behavior and then adjusts itself to serve its intended purpose.
5. Test Data:
After the model is built, testing data once again confirms that the ML algorithm was trained effectively.
6. Feature selection:
Feature selection algorithms are categorized as either supervised, which can be used for labeled data; or unsupervised, which can be used for unlabeled data. Unsupervised techniques are classified as filter methods, wrapper methods,