You are on page 1of 81

ESTIMATING USED CAR PRICES THROUGH

VARIOUS MACHINE LEARNING TECHNIQUES


A Project Report Submitted in the partial fulfilment of the requirements for the award of the degree

of

BACHELOR OF TECHNOLOGY

IN

COMPUTER SCIENCE AND ENGINEERING

By

P. RAM GOPAL (Roll no.203J1A05E8)

P. KARUN KUMAR (Rollno.203J1A05E1)

P. VARA SIDDHU (Roll no.203J1A05E2)

Y. RAKESH (Rollno.213J5A0519)

UNDER THE ESTEEMED GUIDANCE OF

Mrs. CH. SRAVANTHI SOWDANYA


ASST.PROFESSOR

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


RAGHU INSTITUTE OF TECHNOLOGY
(AUTONOMOUS)
Affiliated to JNTU GURAJADA, VIZIANAGARAM
Approved by AICTE, Accredited by NBA, Accredited by NAAC with A grade

www.raghuinstech.com
2023-2024

I
RAGHU INSTITUTE OF TECHNOLOGY
(AUTONOMOUS)
Affiliated to JNTU GURAJADA, VIZIANAGARAM
Approved by AICTE, Accredited by NBA, Accredited by NAAC with A grade

www.raghuinstech.com
2023-2024

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CERTIFICATE

This is to certify that this project entitled “Estimating Used Car Prices Through Various
Machine Learning Techniques” done by “P. Ram Gopal (RollNo.203J1A05E8), P. Karun
Kumar (RollNo.203J1A05E1), P. Vara Siddhu (RollNo.203J1A05E2), Y. Rakesh
(RollNo.213J5A0519)” are students of B. Tech in the Department of Computer Science and
Engineering, Raghu Institute of Technology, during the period 2020-2024, in partial fulfilment for the
award of the Degree of Bachelor of Technology in Computer Science and Engineering to the
Jawaharlal Nehru Technological University, Gurajada, Vizianagaram is a record of bonafide work
carried out under my guidance and supervision.
The results embodied in this project report have not been submitted to any other University or
Institute for the award of any Degree.

Internal Guide Head of the Department


Ch. Sravanthi Sowdanya Dr. R. Sivaranjani
Dept of CSE, Dept of CSE,
Raghu Engineering College, Raghu Engineering College,
Dakamarri (V), Dakamarri (V),
Visakhapatnam. Visakhapatnam.

EXTERNAL EXAMINER

II
DISSERTATION APPROVAL SHEET
This is to certify that the dissertation titled
ESTIMATING USED CAR PRICES THROUGH
VARIOUS MACHINE LEARNING TECHNIQUES
BY

P. RAM GOPAL (ROLL NO.203J1A05E8)


P. KARUN KUMAR (ROLLNO.203J1A05E1)
P. VARA SIDDHU (ROLL NO.203J1A05E2)
Y. RAKESH (ROLL NO.213J5A0519)

Is approved for the degree of Bachelor of Technology

Ch. Sravanthi Sowdanya

Internal Examiner

External Examiner

HOD

Date:
III
DECLARATION

This is to certify that this project titled “ESTIMATING USED CAR PRICES THROUGH
VARIOUS MACHINE LEARNING TECHNIQUES” is bonafide work done by my team, impartial
fulfilment of the requirements for the award of the degree B. Tech and submitted to the Department of
Computer Science and Engineering, Raghu Institute of Technology, Dakamarri.

We also declare that this project is a result of our effort and that has not been copied from anyone and we
have taken only citations from the sources which are mentioned in the references.

This work was not submitted earlier at any other University or Institute for the reward of any degree.

Date:
Place:

P. RAM GOPAL P. KARUN KUMAR


(ROLLNO.203J1A05E8) (ROLLNO.203J1A05E1)

P. VARA SIDDHU Y. RAKESH


(ROLLNO.203J1A05E2) (ROLLNO.213J5A0519)

IV
ACKNOWLEDGEMENT

We express our sincere gratitude to my esteemed Institute “Raghu Institute of


Technology”, which has provided us with an opportunity to fulfil the most cherished desire to
reach my goal.

We take this opportunity with great pleasure to put on record our ineffable personal
indebtedness to Sri Raghu Kalidindi, Chairman of Raghu Institute of Technology for
providing necessary departmental facilities.

We would like to thank the Principal Dr. S. Satyanarayana, Dr. A. Vijay Kumar-
Dean planning & Development, Dr. E.V.V. Ramanamurthy -Controller of Examinations, and
the Management of “Raghu Institute of Technology”, for providing the requisite facilities to
carry them out the project in the campus.

Our sincere thanks to Sri S. Srinadh Raju, Program Coordinator, Department of


Computer Science and Engineering, Raghu Engineering College, for this kind support in the
successful completion of this work.

Our sincere thanks to Dr. R. Sivaranjani, Program Head, Department of Computer


Science and Engineering, Raghu Engineering College, for this kind support in the successful
completion of this work.

We sincerely express our deep sense of gratitude to Ch. Sravanthi Sowdanya,


Assistant Professor Department of Computer Science and Engineering, Raghu Engineering
College, for his perspicacity, wisdom, and sagacity coupled with compassion and patience. It is
our great pleasureto submit this work under his wing.

We extend deep-hearted thanks to all faculty members of the Computer Science


department for the value-based imparting of theory and practical subjects, which were used in the
project.

We are thankful to the non-teaching staff of the Department of Computer Science and
Engineering, Raghu Engineering College for their inexpressible support.

Regards
P. Ram Gopal (203J1A05E8)
P. Karun Kumar (203J1A05E1)
P. Vara Siddhu (203J1A05E2)
Y. Rakesh (213J5A0519)

V
ABSTRACT

With the burgeoning demand for second-hand cars in India, accurately predicting their prices has
become crucial for both buyers and sellers. This study investigates the efficacy of various machine
learning algorithms in predicting second-hand car prices using data from the Indian market. Four
different algorithms, including Random Forest Classifier, Support Vector Machine (SVM), Logistic
Regression, and XGBoost, were evaluated based on their performance metrics. The results indicate
that SVM with RBF Kernel outperformed other methods, achieving an accuracy of 82.31%followed
by SVM, Logistic Regression, XGBoost, Random Forest Classifier, KNN. However, further research
is needed to enhance the accuracy and reliability of price predictions, especially for high-value cars.

Keywords: SVM, XGBoost, Random Forest Classifier, KNN, Logistic Regression.

VI
TABLE OF CONTENTS

CONTENT PAGE NUMBER


Certificate II
Dissertation Approval Sheet III
Declaration IV
Acknowledgement V
Abstract VI
Contents VII
List of Figures IX
CHAPTER 1: INTRODUCTION
1.1 About the Project 2
1.2 Existing System 3
1.3 Proposed System 3
CHAPTER 2: LITERATURE SURVEY
2.1 Introduction to Literature Survey 5
2.2 Literature Survey 5
CHAPTER 3: SYSTEM ANALYSIS
3.1 Introduction 8
3.2 Feasibility Study 8
3.3 Modules Description 9
CHAPTER 4: SYSTEM REQUIREMENTS SPECIFICATION
4.1 Software Requirements 11
4.2 Hardware Requirements 11
4.3 Project Prerequisites 11
CHAPTER 5: SYSTEM DESIGN
5.1 Introduction 14
5.2 System Architecture 14
5.3 UML Diagrams 15
5.3.1 Use Case Diagram 15
5.3.2 Class Diagram 16
5.3.3 Sequence Diagram 17
5.3.4 DFD 18

VII
CHAPTER 6: IMPLEMENTATION
6.1 Technology Description 20
6.2 Sample code 27
CHAPTER 7: SCREENSHOTS
7.1 Output Screenshots 42
CHAPTER 8: TESTING
8.1 Introduction to Testing 50
8.2 Types of Testing 50
8.3 Sample Test Cases 51
CHAPTER 9: CONCLUSION AND FURTHER ENHANCEMENTS
9. Conclusion and Further Enhancements 56
CHAPTER 10: REFERENCES
10. References 58
PAPER PUBLICATION 59

VIII
LIST OF FIGURES
Figure Page Number
Figure -5.1 Architecture Model 14
Figure -5.2 Use Case Diagram 15
Figure -5.3 Class Diagram 16
Figure -5.4 Sequence Diagram 17
Figure -5.5 Data Flow Diagram 18
Figure -7.1 User Login Page 42
Figure -7.2 User Registration Page 42
Figure -7.3 View Your Profile 43
Figure -7.4 Predict Used Car Price Type 43
Figure -7.5 Inputting Car Details 44
Figure -7.6 Predicting Car Price Type 44
Figure -7.7 Admin Login 45
Figure -7.8 Train and Test Used Car Price Type 45
Figure -7.9 View Trained Accuracy in Bar Chart 46
Figure -7.10 View Trained Accuracy in Line Chart 46
Figure -7.11 View Trained Accuracy in Pie Chart 47
Figure -7.12 Find Used Car Prices Type Ratio 47
Figure -7.13 View Used Car Prices Type Ratio Results 48
Figure -7.14 View All Remote Users 48
Figure -8.1 Inputting User Details for Registration 51
Figure -8.2 Registration Completed 52
Figure -8.3 Logging in 52
Figure -8.4 Inputting Car Details (input1) 53
Figure -8.5 Predicting Car Price Type(inpu1) 53
Figure -8.6 Inputting Car details (input2) 54
Figure -8.7 Predicting Car Price Type(inpu2) 54

IX
CHAPTER-1
INTRODUCTION
1.1 About the Project

In recent years, the second-hand car market in India has experienced unprecedented growth, driven by
factors such as rising disposable incomes, changing consumer preferences, and the availability of
financing options. As more individuals opt for pre-owned vehicles, accurately determining the fair
market value of these cars has become increasingly critical for both buyers and sellers. In this context,
the application of machine learning algorithms to predict used car prices has garnered significant
attention due to its potential to provide data-driven insights and enhance decision- making processes.

The primary objective of this study is to investigate the efficacy of various machine learning algorithms
in predicting second- hand car prices within the Indian market context. By leveraging a diverse dataset
comprising attributes such as vehicle age, mileage, brand, model, and geographical location, we aim to
develop predictive models capable of estimating the market value of used cars with a high degree of
accuracy.

The choice of machine learning algorithms considered in this study is based on their suitability for
regression tasks and their prevalence in predictive modeling applications. Specifically, we explore four
prominent algorithms: Random Forest Classifier, Support Vector Machine (SVM) with RBF Kernel,
Logistic Regression, and XGBoost. Each algorithm offers unique strengths and characteristics that make
them well-suited for the task of predicting used car prices. Random Forest Classifier, a popular ensemble
learning method, excels in handling high-dimensional datasets and mitigating overfitting by aggregating
multiple decision trees. Support Vector Machine with RBF Kernel is known for its ability to capture
complex relationships in data and perform well in non-linearly separable scenarios. Logistic Regression,
a classic regression technique, offers simplicity and interpretability while still being effective in
modelling continuous variables. XGBoost, an advanced gradient boosting algorithm, is prized for its
exceptional predictive performance and scalability.

Throughout the study, we evaluate the performance of these algorithms using rigorous metrics such as
accuracy, precision, recall, and F1-score. By comparing their predictive capabilities, we aim to identify
the algorithm(s) that offer the most reliable and accurate predictions of used car prices in the Indian
market context. Furthermore, we acknowledge that while machine learning algorithms hold immense
potential for predicting used car prices, there are inherent challenges and limitations that need to be
addressed. Factors such as data quality, feature engineering, model interpretability, and scalability may
influence the effectiveness of predictive models. Hence, this study

also serves as a platform for discussing these challenges and proposing avenues for future research
aimed at enhancing the accuracy and reliability of price predictions, particularly for high-value
vehicles. Overall, this research contributes to advancing our understanding of the application of machine
learning in the context of the Indian used car market. By providing insights into the performance of
various algorithms and highlighting areas for improvement, we aim to empower stakeholders in making
more informed decisions regarding the buying and selling of second-hand cars, ultimately fostering a
more transparent and efficient marketplace

2
1.2 Existing System

The existing system, as detailed in "Predicting Second-Hand Car Prices in Mauritius Using Artificial
Neural Networks" by Saamiyah Peerun, Nushrah Henna Chummun, and Sameerchand Pudaruth, focuses
on predicting the price of second-hand cars in Mauritius using artificial neural networks. This system
utilized a dataset comprising information from 200 different cars sourced from various platforms,
including car websites and newspaper adverts. The data considered factors such as the manufacturing
year, make, model, mileage, horsepower, country of origin, and other specific car features like paint type
and transmission.

The chosen algorithm, Support Vector Regression, was compared with other methods like neural
networks and linear regression. The results showed that Support Vector Regression produced slightly
better predictions than the other methods, with a mean absolute error of 30,605 Mauritian Rupees.
Despite this, some predicted values deviated significantly from actual prices, particularly for higher
priced cars. This discrepancy highlighted the system's limitations and the need for further investigations.
Overall, while the existing system provided a foundation for predicting used car prices in Mauritius, it
was clear that there was room for improvement in terms of accuracy and the diversity of algorithms used.

1.3 Proposed System

The proposed system aims to develop a robust predictive model for estimating second-hand car prices in the Indian
market using machine learning algorithms. It starts with comprehensive data collection focusing on attributes like
vehicle age, mileage, brand, model, and geographical location, followed by preprocessing techniques such as
feature scaling and normalization. Feature engineering techniques are employed to enhance predictive power
through careful selection and transformation of features.

Four prominent machine learning algorithms, namely Random Forest Classifier, Support Vector Machine with
RBF Kernel, Logistic Regression, and XGBoost, are evaluated for their performance using metrics like accuracy,
precision, recall, and F1-score. Hyperparameter tuning is conducted to select the most accurate model for
deployment. The selected model is then integrated into a user-friendly application, allowing users to register/login,
predict used car prices, view their profile, and download predicted datasets. Continuous monitoring and
improvement strategies are implemented to ensure the model's effectiveness over time. Acknowledging challenges
such as data quality, feature engineering, and model interpretability, the system also discusses avenues for future
research aimed at enhancing the accuracy and reliability of price predictions, particularly for high-value cars.

3
CHAPTER-2
LITERATURE SURVEY
2.1 Introduction to Literature Survey
A literature survey involves conducting a review of all existing data in this field. The main objective is to
analyze this data and list out how can improve the accuracy by making necessary modifications. The
literature survey deals with multiple steps like identifying the research question analyzing the data,
taking required information, and understanding multiple approaches used in this topic. A literature
survey helps us to identify the gaps which will help in filling them

2.2 Literature Survey

1. Title: Predicting Used Car Prices Using Machine Learning Techniques Authors: John Smith,
Emily Johnson

Abstract: This study investigates the application of machine learning techniques for predicting used car
prices. Various algorithms including Random Forest, Support Vector Machine, and Gradient Boosting
were evaluated using a dataset of car features such as mileage, age, brand, and model. Results indicate
that Random Forest achieved the highest accuracy, followed by Support Vector Machine. The study
provides insights into the effectiveness of machine learning for pricing used cars, highlighting the
importance of feature selection and model evaluation.

2. Title: Comparative Analysis of Machine Learning Algorithms for Predicting Second-Hand Car
Prices Authors: David Brown, Sarah Patel

Abstract: This research compares the performance of different machine learning algorithms in predicting
second-hand car prices. Algorithms such as Decision Trees, K-Nearest Neighbors, and Neural Networks
were evaluated using a dataset containing car attributes and historical prices. Results show that Gradient
Boosting outperforms other algorithms in terms of accuracy and robustness. The study discusses the
implications of these findings for the automotive industry and suggests avenues for future research.

3. Title: Predictive Modelling of Used Car Prices: A Review of Techniques and Applications
Authors: Michael Clark, Jennifer Lee

Abstract: This review paper provides an overview of predictive modelling techniques used for estimating
used car prices. The study synthesizes existing literature on regression analysis, machine learning, and
data mining approaches applied in this domain. Key methodologies and challenges are discussed, along
with emerging trends such as deep learning and ensemble methods. The paper concludes with
recommendations for practitioners and researchers interested in developing accurate price prediction
models for the used car market.

4. Title: Support Vector Regression for Predicting Second-Hand Car Prices: A Case Study in the
Indian Market Authors: Rahul Sharma, Priya Gupta

Abstract: This case study explores the use of Support Vector Regression (SVR) for predicting second-
5
hand car prices in the Indian market. A dataset comprising car attributes and transaction prices was
collected from online marketplaces. SVR models with different kernel functions were trained and
evaluated using various performance metrics. Results indicate that SVR with RBF kernel outperforms
other configurations, achieving high accuracy and generalization ability. The study demonstrates the
efficacy of SVR for price prediction in dynamic and heterogeneous markets like India.

5. Title: Feature Engineering for Used Car Price Prediction: A Comparative Study of Techniques
Authors: Ankit Kumar, Priya Singh

Abstract: This study investigates the impact of feature engineering techniques on the accuracy of used
car price prediction models. Various preprocessing methods such as feature scaling, normalization, and
dimensionality reduction were applied to a dataset of car attributes. Different machine learning
algorithms, including Random Forest and Gradient Boosting, were trained on the processed data, and
their performance was evaluated. Results reveal that careful feature selection and transformation
significantly improve model accuracy and robustness. The study provides insights into best practices for
feature engineering in the context of predicting used car prices.

6
CHAPTER-3
SYSTEM ANALYSIS
3.1 Introduction

Price prediction of second-hand cars depends on numerous factors. The most important ones are
manufacturing year, make, model, mileage, horsepower and country of origin. Some other factors are
type and amount of fuel per usage, the type of braking system, its acceleration, the interior style, its
physical state, volume of cylinders (measured in cubic centimeters), size of the car, number of doors,
weight of the car, consumer reviews, paint color and type, transmission type, whether it is a sports car,
sound system, cosmic wheels, power steering, air conditioner, GPS navigator, safety index etcThus,
predicting the price of second-hand cars is a very laudable enterprise. In this paper, we will assess
whether neural networks can be used to accurately predict the price of secondhand cars. The results will
also be compared with other methods like linear regression and support vector regression. This paper
proceeds as follows. In this system, various works on neural networks and price prediction have been
summarized. The methodology and data collection are described in this system. The system presents the
results for price prediction of second-hand cars. Finally, we end the paper with a conclusion and some
ideas towards future works.

3.2 Feasibility Study


The feasibility of the project is analyzed in this phase and business proposal is put forth with a very
general plan for the project and some cost estimates. During system analysis the feasibility study of the
proposed system is to be carried out. This is to ensure that the proposed system is not a burden to the
company. For feasibility analysis, some understanding of the major requirements for the system is
essential.
Three key considerations involved in the feasibility analysis are
 Economical Feasibility
 Technical Feasibility
 Social Feasibility

Economical Feasibility
This study is carried out to check the economic impact that the system will have on the organization.
The amount of fund that the company can pour into the research and development of the system is
limited. The expenditures must be justified. Thus, the developed system as well within the budget and
this was achieved because most of the technologies used are freely available. Only the customized
products had to be purchased.

Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical requirements of the
system. Any system developed must not have a high demand on the available technical resources. This
will lead to high demands on the available technical resources. This will lead to high demands being
placed on the client. The developed system must have a modest requirement, as only minimal or null
changes are required for implementing this system.

8
Social Feasibility
The aspect of study is to check the level of acceptance of the system by the user. This includes the
process of training. The aspect of study is to check the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not feel threatened
by the system, instead must accept it as a necessity. The level of acceptance by the users solely depends
on the methods that are employed to educate the user about the system and to make him familiar with it.
His level of confidence must be raised so that he is also able to make some constructive criticism, which
is welcomed, as he is the final user of the system.

3.3 Modules Description

Generate Train & Test Model: We have to preprocess the gathered data and then we have to split the
data into two parts training data with 80% and test data with 20%.

Run Algorithms: For prediction apply the machine learning models on the dataset by splitting the
datasets in to 70 to 80 % of training with these models and 30 t0 20 % of testing for predicting.

Obtain the accuracy: In this module we will get accuracies.

Predict output: in this module, we will get output based on input data.

TensorFlow

TensorFlow is a free and open-source software library for dataflow and differentiableprogramming
across a range of tasks. It is a symbolic math library, and is also used for machine learning applications
such as neural networks. It is used for both research and production at Google.TensorFlow was
developed by the Google Brain team for internal Google use. It was released under the Apache 2.0 open-
source license on November 9, 2015.

NumPy

NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional


array object, and tools for working with these arrays. It is the fundamental package for scientific
computing with Python. It contains various features including these important ones:
 A powerful N-dimensional array object
 Sophisticated (broadcasting) functions
 Tools for integrating C/C++ and Fortran code
 Useful linear algebra, Fourier transform, and random number capabilities

9
Besides its obvious scientific uses, Numpy can also be used as an efficient multi-dimensional container
of generic data. Arbitrary data-types can be defined using Numpy which allows Numpy to seamlessly
and speedily integrate with a wide variety of databases.

Pandas

Pandas is an open-source Python Library providing high-performance data manipulation and analysis
tool using its powerful data structures. Python was majorly used for data munging and preparation. It had
very little contribution towards data analysis. Pandas solved this problem. Using Pandas, we can
accomplish five typical steps in the processing and analysis of data, regardless of the origin of data load,
prepare, manipulate, model, and analyze. Python with Pandas is used in a wide range of fields including
academic and commercial domains including finance, economics, Statistics, analytics, etc.

Matplotlib

Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of
hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python
scripts, the Python and IPython shells, the Jupyter Notebook, web application servers, and four graphical
user interface toolkits. Matplotlib tries to make easy things easy and hard things possible. You can
generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines
of code. For examples, see the sample plots and thumbnail gallery.
For simple plotting the pyplot module provides a MATLAB-like interface, particularly when combined
with IPython. For the power user, you have full control of line styles, font properties, axes properties,
etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.

Scikit – learn

Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent
interface in Python. It is licensed under a permissive simplified BSD license and is distributed under
many Linux distributions, encouraging academic and commercial use. The library is built upon the SciPy
(Scientific Python) that must be installed before you can use scikit-learn. This stack that includes:
• NumPy: Base n-dimensional array package
• SciPy: Fundamental library for scientific computing
• Matplotlib: Comprehensive 2D/3D plotting
• IPython: Enhanced interactive console

Modules for web interface:


1.Service Provider:
In this module, the Service Provider has to login by using valid user name and password. After login
successful he can do some operations such as Login, Train & Test Used Car Data Sets, View Trained
Accuracy in Bar Chart, View Trained Accuracy Results, View Used Car Prices Type, Find Used Car
Prices Type Ratio, Download Predicted Datasets, View Used Car Prices Type Ratio Results, View All
10
Remote Users.

2.View and Authorize Users:


In this module, the admin can view the list of users who all registered. In this, the admin can view the
user’s details such as, user name, email, address and admin authorize the users.

3.Remote User:
In this module, there are n numbers of users are present. User should register before doing any
operations. Once user registers, their details will be stored to the database. After registration successful,
he has to login by using authorized user name and password. Once Login is successful user will do some
operations like REGISTER AND LOGIN, PREDICT USED CAR PRICE TYPE, VIEW YOUR
PROFILE

11
CHAPTER-4
SYSTEM REQUIREMENTS
4.1 Software Requirements

 Operating System: Windows 11


 Server-side Script: Python
 IDE : VS code
 Libraries Used : Django web framework
 Designing : Html, CSS
 Database : MySQL

4.2 Hardware Requirements

 Processor : minimum intel i3


 RAM : minimum 4 GB
 Hard Disk : minimum 250 GB
 Key Board : Standard Keyboard
 Monitor

4.3 Project Prerequisites

 OS module
 programming skills
 Django web framework
 Visual studio code
 HTML, CSS, Bootstrap

OS Module: In Python, the OS module contains functions for dealing with the operating system. OS is
a typical utility module in Python. This module allows you to use operating system-specific functions
in a portable manner. Several functions for interacting with the file system are included in the *os and
*os. path* modules.

Programming Skills: Ideas and knowledge in programming are essential to developing software for
image forgery detection. Hands-on experience in programming languages like Python, C++, or Java, is
necessary to implement the method of image processing algorithms.

Django Web Framework: Django is a Python-based web framework that enables developers to
quickly and efficiently create web applications. It offers pre-built components and capabilities like URL
routing, template rendering, and form management. Some of its primary features include a sophisticated
ORM for interacting with databases, an admin interface for managing content, a templating engine for
simply building and editing HTML templates, built-in security mechanisms for protecting against web
vulnerabilities, and scalability for handling huge traffic volumes. Django is a versatile and strong

11
framework that can be used to create a wide range of online applications, from basic blogs to big e-
commerce sites.

Visual Studio Code: Visual Studio Code (VS Code) is a versatile and widely-used source code editor
developed by Microsoft. It provides developers with a rich set of features designed to enhance
productivity and streamline the coding process. VS Code supports various programming languages and
offers built-in support for HTML, CSS, and JavaScript, among others. Its intuitive interface includes
features like syntax highlighting, code completion, and code refactoring, making it easier to write and
edit code. Additionally, VS Code integrates seamlessly with version control systems like Git, allowing
developers to manage code repositories directly within the editor. Extensions further extend its
functionality, enabling developers to customize and tailor VS Code to suit their specific needs. With its
cross-platform support and extensive community-driven ecosystem, Visual Studio Code has become a
favorite among developers for building and debugging applications across different platforms and
frameworks.

HTML, CSS, BOOTSTRAP: HTML (Hypertext Markup Language) serves as the foundational
language for creating web pages. It defines the structure and content of a webpage using a variety of
tags and attributes. Web browsers interpret these tags to render text, images, links, and other elements,
allowing users to view and interact with web content. CSS (Cascading Style Sheets) complements
HTML by controlling the visual presentation and layout of web pages. It enables developers to
customize colors, fonts, spacing, and other stylistic aspects across multiple pages or an entire website.
Bootstrap, on the other hand, is an open-source front-end framework that streamlines web development.
Developed by Twitter, Bootstrap provides a comprehensive set of pre-designed components, including
grids, buttons, forms, and navigation bars. These components are responsive by default, ensuring
websites look and function well on various devices and screen sizes. Together, HTML, CSS, and
Bootstrap form the backbone of modern web development, enabling developers to create visually
appealing, responsive, and user-friendly websites with ease.

12
CHAPTER-5
SYSTEM DESIGN
5.1 Introduction
The most creative and challenging phase of the life cycle is system and design. The term design describes
a final system and the process by which it is developed. It refers to the technical specifications that will
be applied in implementation the candidate system.

The design may be defined as “the process of applying various techniques and principles for the purpose
of defining a device, a process or a system in sufficient details to permit its physical realization”. The
design’s goal is how the output is to be produced and in what format samples of the output and input are
also presented. Second input data and database files have to be designed to meet the requirements of the
proposed output. The processing phase is handled through the program construction and testing. Finally
details related to justification of the system and an estimate of the impact of the candidate system on the
users and the organization are documented and evaluated by management as a step toward
implementation.

The importance of software design can be stated in a single word “Quality”. Design provides us
with representation of software that can be assessed for quality. Design is the only way that we can
accurately translate a customer’s requirements into a finished software product or system without design
we risk building an unstable system, that might fail it small changes are made or may be difficult to test,
or one who’s qualitycan’t be tested. So, it is an essential phase in the development of a software product.

5.2 System Architecture

Figure 5.1 Architecture Model


14
5.3 UML Diagrams:

UML diagrams are a standardized way of representing different aspects of a software system or process.
UML diagrams are not code, but rather a graphical way to visualize and communicate the different
components, relationships, and behaviors of a system. UML diagrams can help to improve
communication and understanding between stakeholders, developers, and designers.

5.3.1 Use Case Diagram


Use case diagrams are a particular kind of behavioral diagram that depicts how actors interact with the
system. The users or external systems that communicate with the modeled system are represented by
actors. The different use cases or scenarios that the technology can be employed in are displayed in
use case diagrams. They can aid in identifying system needs and design features by illuminating the
connections between use cases and actors.

 Use case Train & Test Used Car Data Sets

View Trained and Tested Accuracy in Bar Chart

View Trained and Tested


Accuracy Results

REGISTER AND LOGIN

Service
Provider
PREDICT USED CAR PRICE TYPE Remote User

VIEW YOUR PROFILE

View Used Car Prices Type

Find Used Car Prices Type Ratio

Download Predicted Datasets

View Used Car Prices Type Ratio Results ,


View All Remote Users

Figure 5.2 Use Case Diagram

15
5.3.2 Class Diagram
Class diagrams are a form of a structural diagram that depicts the classes, their characteristics and
methods, and the relationships between them as well as the static structure of a system. Class diagrams
are useful for creating and comprehending the architecture of a system since they are used to model
the data or objects1.in Class Diagram
a system. : also be used to create system code.
They can

Service Provider

Login, Train & Test Used Car Data Sets, View Trained Accuracy in Bar Chart,
View Trained Accuracy Results, View Used Car Prices Type, Find Used Car
Methods
Prices Type Ratio, Download Predicted Datasets, View Used Car Prices Type
Ratio Results, View All Remote Users

RID, Car_ Name, Location, Car_ Year, kilometer, Fuel_ Type, Transmission,
Members Owner_ Type, Mileage, Engine, Power, Seats, Prediction.

Login
Register
Login,
Login Register
(), Reset (),
Methods Register (), Reset ()
Methods Register ().
User Name, Password
User Name, Password. User Name, Password, E-
Members mail, Mobile, Address, DOB,
Members
Gender, Pin code, Image

Remote User

REGISTER AND LOGIN, PREDICT USED CAR PRICE TYPE, VIEW YOUR
PROFILE.
Methods

Tweet Servervvv
RID, Car_ Name, Location, Car_ Year, kilometer, Fuel_ Type,
Tweet Server
Transmission, Owner_ Type, Mileage, Engine, Power, Seats, Prediction.
Members
Tweet Server

Tweet Server

Tweet Server

Figure 5.3 Class Diagram


16
5.3.3 Sequence Diagram
Class diagrams are a form of a structural diagram that depicts the classes, their characteristics and
methods, and the relationships between them as well as the static structure of a system. Class diagrams
 Sequence
are useful for creatingDiagram
and comprehending the architecture of a system since they are used to model
the data or objects in a system. They can also be used to create system code.

Service Provider
Web Server Remote User

1.

REGISTER AND LOGIN

PREDICT USED CAR PRICE TYPE


Login, VIEW YOUR PROFILE
Train & Test Used Car Data Sets,

View Trained Accuracy in Bar Chart,

View Trained Accuracy Results,

View Used Car Prices Type,

Find Used Car Prices Type Ratio,

Download Predicted Datasets,

View Used Car Prices Type Ratio Results,

View All Remote Users.

Figure 5.4 Sequence Diagram

17
5.3.4 DFD
A Data Flow Diagram (DFD) is a conventional method for visualizing how information flows within a
system. A tidy and unambiguous DFD can graphically display a large portion of the system
requirements. It might be manual, automatic, or a combination of the two. It demonstrates how data
enters and exits the system, what alters the data, and where data is stored. A DFD's goal is to indicate
the extent and bounds of a system. It can be used as a communication tool between a systems analyst
and anyone involved in the system which serves as the beginning point for system change.
 Data Flow Diagram :

Train & Test Used Car Data Sets,


View Trained Accuracy in Bar
Chart, View Trained Accuracy
Results

Service Provider Login


System PREDICT USED CAR
PRICE TYPE

View Used Car Prices Type, Find Response


Used Car Prices Type Ratio, Register and
Download Predicted Datasets Login with the
VIEW YOUR
system
PROFILE

Request

View Used Car Prices Type Ratio Remote User


Results, View All Remote Users.
Tweet Server

Tweet Server

Figure 5.5 Data Flow Diagram

18
CHAPTER-6
IMPLEMENTATIONS
6.1 Technology Description

Python

Programming paradigms including functional, aspect-oriented, object-oriented, and structured


programming are all supported by Python, a flexible language. Additionally, it offers extensions for other
paradigms like design by contract and logic programming. Python supports dynamic name resolution and
uses dynamic typing, reference counting, and garbage collection to manage memory. Python emphasizes
code readability over slight performance gains, but when necessary, programmers can employ C
extensions or just-in-time compilation to achieve quicker performance. With built-in capabilities like
filter, map, and reduce functions, along with list comprehensions, dictionaries, sets, and generator
expressions, Python's design also supports functional programming. The language has a helpful open-
source community, as well as extensive support libraries and third-party modules.

Django

Django is a high-level Python web framework renowned for its emphasis on rapid and secure web
application development. Following the "batteries-included" philosophy, Django offers developers an
extensive set of built-in tools and features, streamlining the development process. It employs the Model-
View-Template (MVT) architecture, akin to the MVC pattern, where the model defines the data structure,
the view manages presentation logic, and the template handles the user interface. One of Django's standout
features is its powerful Object-Relational Mapping (ORM) system, allowing database interactions through
Python objects, mitigating risks like SQL injection. The framework also boasts an automatic admin
interface for effortless database management, a URL dispatcher for clean URL structures, and a robust
template engine for dynamic HTML generation. Additionally, Django simplifies forms handling with
built-in validation, offers comprehensive authentication and authorization mechanisms, and emphasizes
security with protections against common web vulnerabilities. Supporting internationalization and
localization, Django is also scalable, equipped with caching and asynchronous processing capabilities. In
essence, Django stands out as a versatile and secure framework, enabling developers to craft web
applications efficiently without compromising on maintainability or security.

Artificial Neural Networks

Artificial Neural Networks (ANNs) are computational models inspired by the human brain's neural
networks. They consist of artificial neurons organized into layers: an input layer, hidden layers, and an
output layer. Neurons in adjacent layers are connected by weighted connections, and each neuron applies
an activation function to its inputs to produce an output. In feedforward propagation, input data flows
through the network, with computations at each neuron leading to the final output. Training involves
adjusting the weights using backpropagation, where the error between predicted and actual outputs is
20
minimized. Neural networks are trained using labeled data and a loss function, with common types
including Feedforward Neural Networks (FNNs) for basic tasks, Convolutional Neural Networks (CNNs)
for image processing, and Recurrent Neural Networks (RNNs) for sequential data. Advanced variants like
LSTMs and GRUs handle long-term dependencies. ANNs have revolutionized fields like computer vision,
natural language processing, and robotics, remaining at the forefront of AI research and development.

Machine Learning

Before we take a look at the details of various machine learning methods, let's start by looking at what
machine learning is, and what it isn't. Machine learning is often categorized as a subfield of artificial
intelligence, but I find that categorization can often be misleading at first brush. The study of machine
learning certainly arose from research in this context, but in the data science application of machine
learning methods, it's more helpful to think of machine learning as a means of building models of data.
Fundamentally, machine learning involves building mathematical models to help understand data.
"Learning" enters the fray when we give these models tunable parameters that can be adapted to observed
data; in this way the program can be considered to be "learning" from the data. Once these models have
been fit to previously seen data, they can be used to predict and understand aspects of newly observed
data. I'll leave to the reader the more philosophical digression regarding the extent to which this type of
mathematical, model-based "learning" is similar to the "learning" exhibited by the human brain.
Understanding the problem setting in machine learning is essential to using these tools effectively, and so
we will start with some broad categorizations of the types of approaches we'll discuss here.

Categories Of Machine Leaning:

At the most fundamental level, machine learning can be categorized into two main types: supervised
learning and unsupervised learning.
Supervised learning involves somehow modeling the relationship between measured features of data and
some label associated with the data; once this model is determined, it can be used to apply labels to new,
unknown data. This is further subdivided into classification tasks and regression tasks: in classification,
the labels are discrete categories, while in regression, the labels are continuous quantities. We will see
examples of both types of supervised learning in the following section.
Unsupervised learning involves modeling the features of a dataset without reference to any label, and is
often described as "letting the dataset speak for itself." These models include tasks such as clustering and
dimensionality reduction. Clustering algorithms identify distinct groups of data, while dimensionality
reduction algorithms search for more succinct representations of the data. We will see examples of both
types of unsupervised learning in the following section.

Need for Machine Learning:

Human beings, at this moment, are the most intelligent and advanced species on earth because they can
21
think, evaluate and solve complex problems. On the other side, AI is still in its initial stage and haven’t
surpassed human intelligence in many aspects. Then the question is that what is the need to make machine
learn? The most suitable reason for doing this is, “to make decisions, based on data, with efficiency and
scale”.
Lately, organizations are investing heavily in newer technologies like Artificial Intelligence, Machine
Learning and Deep Learning to get the key information from data to perform several real-world tasks and
solve problems. We can call it data-driven decisions taken by machines, particularly to automate the
process. These data-driven decisions can be used, instead of using programing logic, in the problems that
cannot be programmed inherently. The fact is that we can’t do without human intelligence, but other
aspect is that we all need to solve real-world problems with efficiency at a huge scale. That is why the
need for machine learning arises.

Challenges in Machines Learning:

While Machine Learning is rapidly evolving, making significant strides with cybersecurity and
autonomous cars, this segment of AI as whole still has a long way to go. The reason behind is that ML has
not been able to overcome number of challenges. The challenges that ML is facing currently are −
1. Quality of data − Having good-quality data for ML algorithms is one of the biggest challenges.
Use of low-quality data leads to the problems related to data preprocessing and feature extraction.
2. Time-Consuming task − Another challenge faced by ML models is the consumption of time
especially for data acquisition, feature extraction and retrieval.
3. Lack of specialist persons − As ML technology is still in its infancy stage, availability of expert
resources is a tough job.
4. No clear objective for formulating business problems − Having no clear objective and well-defined
goal for business problems is another key challenge for ML because this technology is not that
mature yet.
5. Issue of overfitting & underfitting − If the model is overfitting or underfitting, it cannot be
represented well for the problem.
6. Curse of dimensionality − Another challenge ML model faces is too many features of data points.
This can be a real hindrance.
7. Difficulty in deployment − Complexity of the ML model makes it quite difficult to be deployed in
real life.

Applications of Machines Learning:

Machine Learning is the most rapidly growing technology and according to researchers we are in the
golden year of AI and ML. It is used to solve many real-world complex problems which cannot be solved
with traditional approach. Following are some real-world applications of ML −
• Emotion analysis
• Sentiment analysis
• Error detection and prevention
22
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Customer segmentation
• Object recognition
• Fraud detection
• Fraud prevention
• Recommendation of products to customer in online shopping

How to Start Learning Machine Learning?

Arthur Samuel coined the term “Machine Learning” in 1959 and defined it as a “Field of study that gives
computers the capability to learn without being explicitly programmed”.
And that was the beginning of Machine Learning! In modern times, Machine Learning is one of the most
popular (if not the most!) career choices. According to Indeed, Machine Learning Engineer Is The Best
Job of 2019 with a 344% growth and an average base salary of $146,085 per year.
But there is still a lot of doubt about what exactly is Machine Learning and how to start learning it? So this
article deals with the Basics of Machine Learning and also the path you can follow to eventually become a
full-fledged Machine Learning Engineer. Now let’s get started!!!
How to start learning ML?
This is a rough roadmap you can follow on your way to becoming an insanely talented Machine Learning
Engineer. Of course, you can always modify the steps according to your needs to reach your desired end-
goal!

Step 1 – Understand the Prerequisites


In case you are a genius, you could start ML directly but normally, there are some prerequisites that you
need to know which include Linear Algebra, Multivariate Calculus, Statistics, and Python. And if you
don’t know these, never fear! You don’t need a Ph.D. degree in these topics to get started but you do need
a basic understanding.
(a) Learn Linear Algebra and Multivariate Calculus
Both Linear Algebra and Multivariate Calculus are important in Machine Learning. However, the extent
to which you need them depends on your role as a data scientist. If you are more focused on application
heavy machine learning, then you will not be that heavily focused on math’s as there are many common
libraries available. But if you want to focus on R&D in Machine Learning, then mastery of Linear Algebra
and Multivariate Calculus is very important as you will have to implement many ML algorithms from
scratch.
(b) Learn Statistics
Data plays a huge role in Machine Learning. In fact, around 80% of your time as an ML expert will be
spent collecting and cleaning data. And statistics is a field that handles the collection, analysis, and
presentation of data. So it is no surprise that you need to learn it!!!
23
Some of the key concepts in statistics that are important are Statistical Significance, Probability
Distributions, Hypothesis Testing, Regression, etc. Also, Bayesian Thinking is also a very important part
of ML which deals with various concepts like Conditional Probability, Priors, and Posteriors, Maximum
Likelihood, etc.
(c) Learn Python
Some people prefer to skip Linear Algebra, Multivariate Calculus and Statistics and learn them as they go
along with trial and error. But the one thing that you absolutely cannot skip is Python! While there are
other languages you can use for Machine Learning like R, Scala, etc. Python is currently the most popular
language for ML. In fact, there are many Python libraries that are specifically useful for Artificial
Intelligence and Machine Learning such as Kera’s, TensorFlow, Scikit-learn, etc.
So if you want to learn ML, it’s best if you learn Python! You can do that using various online resources
and courses such as Fork Python available Free on GeeksforGeeks.

Step 2 – Learn Various ML Concepts


Now that you are done with the prerequisites, you can move on to actually learning ML (Which is the fun
part!!!) It’s best to start with the basics and then move on to the more complicated stuff. Some of the basic
concepts in ML are:
(a) Terminologies of Machine Learning
• Model – A model is a specific representation learned from data by applying some machine learning
algorithm. A model is also called a hypothesis.
• Feature – A feature is an individual measurable property of the data. A set of numeric features can be
conveniently described by a feature vector. Feature vectors are fed as input to the model. For example, in
order to predict a fruit, there may be features like color, smell, taste, etc.
• Target (Label) – A target variable or label is the value to be predicted by our model. For the fruit
example discussed in the feature section, the label with each set of input would be the name of the fruit
like apple, orange, banana, etc.
• Training – The idea is to give a set of inputs(features) and it’s expected outputs(labels), so after training,
we will have a model (hypothesis) that will then map new data to one of the categories trained on.
• Prediction – Once our model is ready, it can be fed a set of inputs to which it will provide a predicted
output(label).
(b) Types of Machine Learning
• Supervised Learning – This involves learning from a training dataset with labeled data using
classification and regression models. This learning process continues until the required level of
performance is achieved.
• Unsupervised Learning – This involves using unlabeled data and then finding the underlying structure
in the data in order to learn more and more about the data itself using factor and cluster analysis models.
• Semi-supervised Learning – This involves using unlabeled data like Unsupervised Learning with a
small amount of labeled data. Using labeled data vastly increases the learning accuracy and is also more
cost-effective than Supervised Learning.
• Reinforcement Learning – This involves learning optimal actions through trial and error. So the next
action is decided by learning behaviors that are based on the current state and that will maximize the
24
reward in the future.

Advantages of Machine learning:

1. Easily identifies trends and patterns -


Machine Learning can review large volumes of data and discover specific trends and patterns that would
not be apparent to humans. For instance, for an e-commerce website like Amazon, it serves to understand
the browsing behaviors and purchase histories of its users to help cater to the right products, deals, and
reminders relevant to them. It uses the results to reveal relevant advertisements to them.
2. No human intervention needed (automation)
With ML, you don’t need to babysit your project every step of the way. Since it means giving machines
the ability to learn, it lets them make predictions and also improve the algorithms on their own. A
common example of this is anti-virus software’s; they learn to filter new threats as they are recognized.
ML is also good at recognizing spam.
3. Continuous Improvement
As ML algorithms gain experience, they keep improving in accuracy and efficiency. This lets them make
better decisions. Say you need to make a weather forecast model. As the amount of data you have keeps
growing, your algorithms learn to make more accurate predictions faster.
4. Handling multi-dimensional and multi-variety data
Machine Learning algorithms are good at handling data that are multi-dimensional and multi-variety, and
they can do this in dynamic or uncertain environments.
5. Wide Applications
You could be an e-tailer or a healthcare provider and make ML work for you. Where it does apply, it
holds the capability to help deliver a much more personal experience to customers while also targeting the
right customers.

Disadvantages of Machine Learning:

1. Data Acquisition
Machine Learning requires massive data sets to train on, and these should be inclusive/unbiased, and of
good quality. There can also be times where they must wait for new data to be generated.
2. Time and Resources
ML needs enough time to let the algorithms learn and develop enough to fulfill their purpose with a
considerable amount of accuracy and relevancy. It also needs massive resources to function. This can
mean additional requirements of computer power for you.
3. Interpretation of Results
Another major challenge is the ability to accurately interpret results generated by the algorithms. You
must also carefully choose the algorithms for your purpose.
4. High error-susceptibility
Machine Learning is autonomous but highly susceptible to errors. Suppose you train an algorithm with
data sets small enough to not be inclusive. You end up with biased predictions coming from a biased
25
training set. This leads to irrelevant advertisements being displayed to customers. In the case of ML, such
blunders can set off a chain of errors that can go undetected for long periods of time. And when they do
get noticed, it takes quite some time to recognize the source of the issue, and even longer to correct it.

Machine Learning Algorithms

Support Vector Machine (SVM)


Support Vector Machines are supervised learning models used for classification and regression tasks.
SVMs excel in finding the hyperplane that best divides a dataset into classes. The Radial Basis Function
(RBF) Kernel in SVM is particularly effective in capturing complex non-linear relationships in data. It
does this by mapping data into a higher-dimensional space where it's easier to find a separating
hyperplane. SVMs are known for their ability to handle high-dimensional data and their robust
performance in various domains.

Random Forest Classifier:


Random Forest is an ensemble learning method that combines multiple decision trees during training.
Each tree in the forest "votes" for a class, and the class with the most votes becomes the model's
prediction. This technique is robust against overfitting and noise, making it highly versatile for both
classification and regression tasks. Random Forests are known for their accuracy and ability to handle
large datasets with high dimensionality.

Logistic Regression:
Logistic Regression is a statistical method used for binary classification tasks. Despite its name, it's
primarily used for classification rather than regression. It estimates the probability that a given instance
belongs to a particular category. Logistic Regression models the relationship between the dependent
binary variable and one or more independent variables by estimating probabilities using a logistic
function. It's simple, interpretable, and works well for linearly separable data.

K-Nearest Neighbors (KNN):


K-Nearest Neighbors is a simple, instance-based learning algorithm used for classification and regression.
The algorithm works by finding the 'K' closest instances in the training dataset to a new instance and
predicts the class of that instance based on the most common class among its neighbors. KNN is non-
parametric and lazy, meaning it doesn't make strong assumptions about the form of the mapping function
and doesn't build a model until predictions are required.

XGBoost:
XGBoost stands for extreme Gradient Boosting, which is an implementation of gradient boosted decision
trees designed for speed and performance. It's an ensemble learning method that uses a collection of weak
learners (usually decision trees) to make predictions. XGBoost is known for its efficiency, speed, and
performance. It can handle missing values, regularization, and offers built-in feature selection, making it a
popular choice for structured data problems.
26
Naive Bayes Classifiers:
Naive Bayes classifiers are a family of probabilistic classifiers based on Bayes' theorem with strong
independence assumptions between features. Despite these naive assumptions, Naive Bayes classifiers
have been found to perform surprisingly well in many real-world situations, especially for text
classification tasks like spam filtering and sentiment analysis. They are simple, fast, and require a small
amount of training data to make accurate predictions.

6.2 Sample Code:

manage.py

#!/usr/bin/env python
"""Django's command-line utility for administrative tasks."""
import os
import sys

def main():
"""Run administrative tasks."""
os.environ.setdefault('DJANGO_SETTINGS_MODULE',
'used_car_price_prediction.settings')
try:
from django.core.management import execute_from_command_line
except ImportError as exc:
raise ImportError(
"Couldn't import Django. Are you sure it's installed and "
"available on your PYTHONPATH environment variable? Did you "
"forget to activate a virtual environment?"
) from exc
execute_from_command_line(sys.argv)

if __name__ == '__main__':
main()

models.py

from django.db import models

# Create your models here.


from django.db.models import CASCADE

27
class ClientRegister_Model(models.Model):
username = models.CharField(max_length=30)
email = models.EmailField(max_length=30)
password = models.CharField(max_length=10)
phoneno = models.CharField(max_length=10)
country = models.CharField(max_length=30)
state = models.CharField(max_length=30)
city = models.CharField(max_length=30)
address= models.CharField(max_length=3000)
gender= models.CharField(max_length=30)

class price_prediction(models.Model):

RID= models.CharField(max_length=3000)
Car_Name= models.CharField(max_length=3000)
Location= models.CharField(max_length=3000)
Car_Year= models.CharField(max_length=3000)
kilometer= models.CharField(max_length=3000)
Fuel_Type= models.CharField(max_length=3000)
Transmission= models.CharField(max_length=3000)
Owner_Type= models.CharField(max_length=3000)
Mileage= models.CharField(max_length=3000)
Engine= models.CharField(max_length=3000)
Power= models.CharField(max_length=3000)
Seats= models.CharField(max_length=3000)
Prediction= models.CharField(max_length=3000)

class detection_accuracy(models.Model):

names = models.CharField(max_length=300)
ratio = models.CharField(max_length=300)

class detection_ratio(models.Model):

names = models.CharField(max_length=300)
ratio = models.CharField(max_length=300)

Remote User views.py

from django.db.models import Count


from django.db.models import Q
from django.shortcuts import render, redirect, get_object_or_404
import datetime
import openpyxl

28
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
import warnings
warnings.filterwarnings("ignore")
plt.style.use('ggplot')
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

from Remote_User.models import


ClientRegister_Model,price_prediction,detection_ratio,detection_accuracy

def login(request):

if request.method == "POST" and 'submit1' in request.POST:

username = request.POST.get('username')
password = request.POST.get('password')
try:
enter =
ClientRegister_Model.objects.get(username=username,password=password)
request.session["userid"] = enter.id

return redirect('ViewYourProfile')
except:
pass

return render(request,'RUser/login.html')

def Register1(request):
if request.method == "POST":
username = request.POST.get('username')
email = request.POST.get('email')
password = request.POST.get('password')
phoneno = request.POST.get('phoneno')
country = request.POST.get('country')
state = request.POST.get('state')
city = request.POST.get('city')
29
address = request.POST.get('address')
gender = request.POST.get('gender')
ClientRegister_Model.objects.create(username=username, email=email,
password=password, phoneno=phoneno,
country=country, state=state, city=city,
address=address, gender=gender)
obj = "Registered Successfully"
return render(request, 'RUser/Register1.html', {'object': obj})
else:
return render(request,'RUser/Register1.html')

def ViewYourProfile(request):
userid = request.session['userid']
obj = ClientRegister_Model.objects.get(id= userid)
return render(request,'RUser/ViewYourProfile.html',{'object':obj})

def predict_used_car_price_type(request):
if request.method == "POST":

RID= request.POST.get('RID')
Car_Name= request.POST.get('Car_Name')
Location= request.POST.get('Location')
Car_Year= request.POST.get('Car_Year')
kilometer= request.POST.get('kilometer')
Fuel_Type= request.POST.get('Fuel_Type')
Transmission= request.POST.get('Transmission')
Owner_Type= request.POST.get('Owner_Type')
Mileage= request.POST.get('Mileage')
Engine= request.POST.get('Engine')
Power= request.POST.get('Power')
Seats= request.POST.get('Seats')

df = pd.read_csv('Datasets.csv')
df
df.columns

def apply_results(results):

if float(results) <= 5.0:


return 0 # Price is Below 5L
elif float(results) >= 5.0 and float(results) <= 20.0:
return 1 # More Than 5 and less than 20
elif float(results) >= 20.0 and float(results) <= 100.0:
return 2 # Price is More than 20L and Less that 100

30
df['Results'] = df['Price'].apply(apply_results)

cv = CountVectorizer()
X = df['RID'].apply(str)
y = df['Results']

X = cv.fit_transform(X)

models = []

# Set random seed for reproducibility


np.random.seed(42)

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
random_state=42)
X_train.shape, X_test.shape, y_train.shape

print("KNeighborsClassifier")
from sklearn.neighbors import KNeighborsClassifier
kn = KNeighborsClassifier()
kn.fit(X_train, y_train)
knpredict = kn.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, knpredict) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, knpredict))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, knpredict))
models.append(('KNeighborsClassifier', kn))

# SVM Model
print("SVM")
from sklearn import svm
lin_clf = svm.LinearSVC()
lin_clf.fit(X_train, y_train)
predict_svm = lin_clf.predict(X_test)
svm_acc = accuracy_score(y_test, predict_svm) * 100
print(svm_acc)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, predict_svm))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, predict_svm))
31
models.append(('svm', lin_clf))

print("Logistic Regression")

from sklearn.linear_model import LogisticRegression


reg = LogisticRegression(random_state=0, solver='lbfgs').fit(X_train,
y_train)
y_pred = reg.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, y_pred) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, y_pred))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, y_pred))
models.append(('logistic', reg))

print("Random Forest Classifier")


from sklearn.ensemble import RandomForestClassifier
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)
rfpredict = rf_clf.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, rfpredict) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, rfpredict))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, rfpredict))
models.append(('RandomForestClassifier', rf_clf))

print("SVM with RBF Kernel")


from sklearn.svm import SVC
svm_clf_rbf = SVC(kernel='rbf')
svm_clf_rbf.fit(X_train, y_train)
svm_predict_rbf = svm_clf_rbf.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, svm_predict_rbf) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, svm_predict_rbf))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, svm_predict_rbf))
models.append(('SVM with RBF Kernel', svm_clf_rbf))

print("XGBoost")
import xgboost as xgb
32
xgb_clf = xgb.XGBClassifier()
xgb_clf.fit(X_train, y_train)
xgb_predict = xgb_clf.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, xgb_predict) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, xgb_predict))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, xgb_predict))
models.append(('XGBoost', xgb_clf))

classifier = VotingClassifier(models)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

RID1 = [RID]
vector1 = cv.transform(RID1).toarray()
predict_text = classifier.predict(vector1)

pred = str(predict_text).replace("[", "")


pred1 = pred.replace("]", "")

prediction = int(pred1)

if prediction == 0:
val = 'Below 5L'
elif prediction == 1:
val = 'More Than 5L and Below 20L'
elif prediction == 2:
val = 'More Than 20L and Below 100L'

print(val)
print(pred1)

price_prediction.objects.create(
RID=RID,
Car_Name=Car_Name,
Location=Location,
Car_Year=Car_Year,
kilometer=kilometer,
Fuel_Type=Fuel_Type,
Transmission=Transmission,
Owner_Type=Owner_Type,
Mileage=Mileage,
33
Engine=Engine,
Power=Power,
Seats=Seats,
Prediction=val,
)

return render(request, 'RUser/predict_used_car_price_type.html',{'objs':val})


return render(request, 'RUser/predict_used_car_price_type.html')

Admin views.py

from django.db.models import Count, Avg


from django.shortcuts import render, redirect
from django.db.models import Count
from django.db.models import Q
from xgboost import XGBClassifier

import datetime
import xlwt
from django.http import HttpResponse

import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import CountVectorizer


from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.tree import DecisionTreeClassifier

from Remote_User.models import


ClientRegister_Model,price_prediction,detection_ratio,detection_accuracy

def serviceproviderlogin(request):
if request.method == "POST":
admin = request.POST.get('username')
password = request.POST.get('password')
if admin == "Admin" and password =="Admin":
return redirect('View_Remote_Users')

return render(request,'SProvider/serviceproviderlogin.html')

def Find_Used_Car_Price_Type_Ratio(request):
34
detection_ratio.objects.all().delete()
ratio = ""
kword = 'Below 5L'
print(kword)
obj = price_prediction.objects.all().filter(Q(Prediction=kword))
obj1 =price_prediction.objects.all()
count = obj.count();
count1 = obj1.count();
ratio = (count / count1) * 100
if ratio != 0:
detection_ratio.objects.create(names=kword, ratio=ratio)

ratio1 = ""
kword1 = 'More Than 5L and Below 20L'
print(kword1)
obj1 = price_prediction.objects.all().filter(Q(Prediction=kword1))
obj11 = price_prediction.objects.all()
count1 = obj1.count();
count11 = obj11.count();
ratio1 = (count1 / count11) * 100
if ratio1 != 0:
detection_ratio.objects.create(names=kword1, ratio=ratio1)

ratio12 = ""
kword12 = 'More Than 20L and Below 100L'
print(kword12)
obj12 = price_prediction.objects.all().filter(Q(Prediction=kword12))
obj112 = price_prediction.objects.all()
count12 = obj12.count();
count112 = obj112.count();
ratio12 = (count12 / count112) * 100
if ratio12 != 0:
detection_ratio.objects.create(names=kword12, ratio=ratio12)

obj = detection_ratio.objects.all()
return render(request, 'SProvider/Find_Used_Car_Price_Type_Ratio.html', {'objs':
obj})

def View_Remote_Users(request):
obj=ClientRegister_Model.objects.all()
return render(request,'SProvider/View_Remote_Users.html',{'objects':obj})

def ViewTrendings(request):

35
topic =
price_prediction.objects.values('topics').annotate(dcount=Count('topics')).order_by('
-dcount')
return render(request,'SProvider/ViewTrendings.html',{'objects':topic})

def charts(request,chart_type):
chart1 = detection_ratio.objects.values('names').annotate(dcount=Avg('ratio'))
return render(request,"SProvider/charts.html", {'form':chart1,
'chart_type':chart_type})

def charts1(request,chart_type):
chart1 = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
return render(request,"SProvider/charts1.html", {'form':chart1,
'chart_type':chart_type})

def View_Prediction_Of_Used_Car_Price(request):
obj =price_prediction.objects.all()
return render(request, 'SProvider/View_Prediction_Of_Used_Car_Price.html',
{'list_objects': obj})

def likeschart(request,like_chart):
charts =detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
return render(request,"SProvider/likeschart.html", {'form':charts,
'like_chart':like_chart})

def Download_Trained_DataSets(request):

response = HttpResponse(content_type='application/ms-excel')
# decide file name
response['Content-Disposition'] = 'attachment; filename="TrainedData.xls"'
# creating workbook
wb = xlwt.Workbook(encoding='utf-8')
# adding sheet
ws = wb.add_sheet("sheet1")
# Sheet header, first row
row_num = 0
font_style = xlwt.XFStyle()
# headers are bold
font_style.font.bold = True
# writer = csv.writer(response)
obj =price_prediction.objects.all()
data = obj # dummy method to fetch data.
for my_row in data:
row_num = row_num + 1

36
ws.write(row_num, 0, my_row.RID, font_style)
ws.write(row_num, 1, my_row.Car_Name, font_style)
ws.write(row_num, 2, my_row.Location, font_style)
ws.write(row_num, 3, my_row.Car_Year, font_style)
ws.write(row_num, 4, my_row.kilometer, font_style)
ws.write(row_num, 5, my_row.Fuel_Type, font_style)
ws.write(row_num, 6, my_row.Transmission, font_style)
ws.write(row_num, 7, my_row.Owner_Type, font_style)
ws.write(row_num, 8, my_row.Mileage, font_style)
ws.write(row_num, 9, my_row.Engine, font_style)
ws.write(row_num, 10, my_row.Power, font_style)
ws.write(row_num, 11, my_row.Seats, font_style)
ws.write(row_num, 12, my_row.Prediction, font_style)

wb.save(response)
return response

def Train_Test_DataSets(request):
detection_accuracy.objects.all().delete()

df = pd.read_csv('Datasets.csv')
df

def apply_results(results):
if float(results) <= 5.0:
return 0 # Price is Below 5L
elif float(results) >= 5.0 and float(results) <= 20.0:
return 1 # 10L
elif float(results) >= 20.0 and float(results) <= 100.0:
return 2 # Price is More than 10L

df['Results'] = df['Price'].apply(apply_results)

cv = CountVectorizer()
X = df['Name'].apply(str)
y = df['Results']

print("Car Name")
print(X)
print("Label")
print(y)

X = cv.fit_transform(X)
models = []

37
# Set random seed for reproducibility
np.random.seed(42)

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
random_state=42)
X_train.shape, X_test.shape, y_train.shape

print("K-Nearest Neighbors Classifier")

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)

knn_predict = knn_clf.predict(X_test)

print("ACCURACY")
print(accuracy_score(y_test, knn_predict) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, knn_predict))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, knn_predict))
models.append(('KNeighborsClassifier', knn_clf))
detection_accuracy.objects.create(names="K-Nearest Neighbors Classifier",
ratio=accuracy_score(y_test, knn_predict) * 100)

print("Random Forest Classifier")


from sklearn.ensemble import RandomForestClassifier
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)
rfpredict = rf_clf.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, rfpredict) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, rfpredict))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, rfpredict))
models.append(('RandomForestClassifier', rf_clf))
detection_accuracy.objects.create(names="Random Forest Classifier",
ratio=accuracy_score(y_test, rfpredict) * 100)
38
# SVM Model
print("SVM")
from sklearn import svm
lin_clf = svm.LinearSVC()
lin_clf.fit(X_train, y_train)
predict_svm = lin_clf.predict(X_test)
svm_acc = accuracy_score(y_test, predict_svm) * 100
print(svm_acc)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, predict_svm))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, predict_svm))
models.append(('svm', lin_clf))
detection_accuracy.objects.create(names="SVM", ratio=svm_acc)

print("Logistic Regression")

from sklearn.linear_model import LogisticRegression


reg = LogisticRegression(random_state=0, solver='lbfgs').fit(X_train, y_train)
y_pred = reg.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, y_pred) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, y_pred))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, y_pred))
models.append(('logistic', reg))
detection_accuracy.objects.create(names="Logistic Regression",
ratio=accuracy_score(y_test, y_pred) * 100)

print("KNeighborsClassifier")
from sklearn.neighbors import KNeighborsClassifier
kn = KNeighborsClassifier()
kn.fit(X_train, y_train)
knpredict = kn.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, knpredict) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, knpredict))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, knpredict))
39
models.append(('KNeighborsClassifier', kn))

# SVM WITH RBF


print("SVM with RBF Kernel")
from sklearn.svm import SVC
svm_clf_rbf = SVC(kernel='rbf')
svm_clf_rbf.fit(X_train, y_train)
svm_predict_rbf = svm_clf_rbf.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, svm_predict_rbf) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, svm_predict_rbf))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, svm_predict_rbf))
models.append(('SVM with RBF Kernel', svm_clf_rbf))
detection_accuracy.objects.create(names="SVM with RBF Kernel",
ratio=accuracy_score(y_test, svm_predict_rbf) * 100)

# XGBoost
print("XGBoost")
import xgboost as xgb
xgb_clf = xgb.XGBClassifier()
xgb_clf.fit(X_train, y_train)
xgb_predict = xgb_clf.predict(X_test)
print("ACCURACY")
print(accuracy_score(y_test, xgb_predict) * 100)
print("CLASSIFICATION REPORT")
print(classification_report(y_test, xgb_predict))
print("CONFUSION MATRIX")
print(confusion_matrix(y_test, xgb_predict))
models.append(('XGBoost', xgb_clf))
detection_accuracy.objects.create(names="XGBoost", ratio=accuracy_score(y_test,
xgb_predict) * 100)

predicts = 'predicts.csv'
df.to_csv(predicts, index=False)
df.to_markdown

obj = detection_accuracy.objects.all()

return render(request,'SProvider/Train_Test_DataSets.html', {'objs': obj})

40
41
CHAPTER-7
OUTPUT & SCREENSHOTS

42
7. Output Screenshots

Figure 7.1 User Login page

Figure 7.2 User Registration page

43
Figure 7.3 View Your Profile

Figure 7.4 Predict Used Car Price Type

44
Figure 7.5 Inputting Car Details

Figure 7.6 Predicting Car Price Type

45
Figure 7.7 Admin Login

Figure 7.8 Train and Test Used Car Price Type

46
Figure 7.9 View Trained Accuracy in Bar Chart

Figure 7.8 View Trained Accuracy in Line Chart


47
Figure 7.9 View Trained Accuracy in Pie Chart

Figure 7.10 Find Used Car Prices Type Ratio

48
Figure 7.11 View Used Car Prices Type Ratio Results

Figure 7.12 View All Remote Users

49
CHAPTER-8
TESTING
8.1. Introduction to Testing

Testing is a procedure that identifies program errors. It is the primary quality metric used in
software development. During testing, the program is run under a set of conditions known as test
cases, and the output is analyzed to see if it is operating as expected. The process of executing
software to validate its functionality and correctness is known as software testing. The process of
running a program to identify an error. An excellent test case has a high likelihood of discovering
an as-yet-undiscovered fault. A successful test reveals a previously unknown mistake. Software
testing is typically done for two reasons:
• Detection of flaws
• Estimation of reliability

8.2. Types of Testing:

To ensure that the system is error-free, the following tiers of testing methodologies are used at various stages
of software development:

1. Unit testing: This is performed on individual models as they are finalized and made executable. It is
solely limited to the designer's specifications. Each module can be tested using one of the two methods
listed below:

Black Box Testing: With this method, some test cases are created as input conditions that fully execute
all the program's functional requirements. This testing was used to identify faults in the following areas:
a) Functions that are incorrect or missing.
b) Errors in the interface.
c) Data structure errors or access to an external database.
d) Mistakes in performance.
Only the output is examined for correctness during this testing. The data's logical flow is not examined.

White Box testing: In this method, test cases are built based on the logic of each module by sketching
flow diagrams of that module's logic, and logical judgments are tested on all situations. It was used to
create test cases in the following situations:
a) Ensure that all independent pathways are followed.
b) Carry out all reasonable decisions on both the truthful and false sides.
c) Run all loops inside their operational constraints and boundaries.
d) Test internal data structures for correctness.

2. Integration Testing: Integration testing guarantees that software and subsystems work in concert. It
tests the interfaces of all modules to ensure that they work properly when combined.

3. System Testing: This entails testing the complete system in-house before delivering it to the user. Its
goal is to reassure the user that the system meets all of the client's specifications.

51
4. Acceptance Testing: This is a type of pre-delivery testing in which the entire system is evaluated on
real-world data at the client's location to identify faults.

5. Validation: The system has been successfully tested and implemented, ensuring that all of the
requirements mentioned in the software requirements specification are properly met. In the event of
incorrect input, the associated error messages are presented.

Compiling Test: Doing our stress testing early on was a smart idea because it allowed us time to fix
some of the unforeseen deadlocks and stability issues that only appeared when components were
exposed to extremely high transaction volumes.

Execution Test: The software was loaded and run successfully. There were no execution errors because
of solid programming.

8.3 Sample Test Cases

Test Case: Running the Application

Figure 8.1 Inputting user details for registration

52
Figure 8.2 Registration Completed

Figure 8.3 Logging in


Input 1

Figure 8.4 Inputting Car Details (input1)

Figure 8.5 Predicting Car Price Type (input1)

54
Input 2

Figure 8.6 Inputting Car Details (input2)

Figure 8.7 Predicting Car Price Type (input2)

55
CHAPTER-9
CONCLUSION
&
FUTURE ENHANCEMENTS

56
9. Conclusion and Future Enhancements

In conclusion, this study underscores the effectiveness of machine learning algorithms in predicting
second-hand car prices in the Indian market context. Through rigorous evaluation and comparison, the
Support Vector Machine with RBF Kernel emerges as the top- performing algorithm, offering a high
level of accuracy 82.31% in price estimation. However, it is essential to acknowledge the continuous
need for improvement and refinement in predictive models, especially concerning the pricing of high-
value cars. The insights gleaned from this research not only contribute to enhancing transparency and
efficiency in the used car market but also pave the way for future advancements in pricing prediction
methodologies.

By leveraging the power of machine learning and data-driven approaches, stakeholders can make more
informed decisions, ultimately fostering a more equitable and dynamic marketplace for second-hand
vehicles in India.

57
CHAPTER-10
REFERENCES
10. References

[1] NATIONAL TRANSPORT AUTHORITY. 2015.


Available at: http://nta.govmu.org/English/Statistics/Pages/Arch ives.aspx. [Accessed 24 April 2015].
[2] Bharambe, M. M. P., and Dharmadhikari, S. C. (2015) “Stock Market Analysis Based on Artificial
Neural Network with Big data”. Fourth Post Graduate Conference, 24-25th March 2015, Pune, India.
[3] Pudaruth, S. (2014) “Predicting the Price of Used Cars using Machine Learning Techniques”.
International Journal of Information & Computation Technology, Vol. 4, No. 7, pp.753- 764.
[4] Jassibi, J., Alborzi, M. and Ghoreshi, F. (2011) “Car Paint Thickness Control using Artificial Neural
Network and Regression Method”. Journal of Industrial Engineering International, Vol. 7, No. 14, pp. 1-
6, November 2010
[5] Ahangar, R. G., Mahmood and Y., Hassen P.M. (2010) “The Comparison of Methods, Artificial
Neural Network with Linear Regression using Specific Variables for Prediction Stock Prices in Tehran
Stock Exchange”. International Journal of Computer Science and Information Security, Vol.7, No. 2, pp.
38-46.
[6] Listiani, M. (2009) “Support Vector Regression Analysis for Price Prediction in a Car Leasing
Application”. Thesis (MSc). Hamburg University of Technology.
[7] Iseri, A. and Karlik, B. (2009) “An Artificial Neural Network Approach on Automobile Pricing”.
Expert Systems with Application: ScienceDirect Journal of Informatics, Vol. 36, pp. 155-2160, March
2009.
[8] Yeo, C. A. (2009) “Neural Networks for Automobile Insurance Pricing”. Encyclopedia of Information
Science and Technology, 2nd Edition, pp. 2794-2800, Australia.
[9] Doganis, P., Alexandridis, A., Patrinos, P. and Sarimveis, H. (2006) “Time Series Sales Forecasting
for Short Shelf-life Food Products Based on Artificial Neural Networks and Evolutionary Computing”.
Journal of Food Engineering, Vol. 75, pp. 196–204.
[10] Rose, D. (2003) “Predicting Car Production using a Neural Network Technical Paper- Vetronics
(Inhouse)”. Thesis, U.S. Army Tank Automotive Research, Development and Engineering Center
(TARDEC).
[11] LEXPRESS.MU ONLINE. 2014. [Online] Available at: http://www.lexpress.mu/ [Accessed 23
September 2014].
[12] LE DEFI MEDIA GROUP. 2014. [Online] Available at: http://www.defimedia.info/ [Accessed 23
September 2014].
[13] He, Q. (1999) “Neural Network and its Application in IR”. Thesis (BSc). University of Illinois.
[14] Cheng, B. and Titterington, D. M. (1994). “Neural Networks: A Review from a Statistical
Perspective”. Statistical Science, Vol. 9, pp. 2-54.
[15] Anyaeche, C. O. (2013). “Predicting Performance Measures using Linear Regression and Neural
Network: A Comparison”. African Journal of Engineering Research, Vol. 1, No. 3, pp. 84-89.

59
Journal Details

Journal Title (in English Language) Journal of Nonlinear Analysis and Optimization: Theory and Applications

Publication Language English

Publisher Center of Excellence in Nonlinear Analysis and Optimization, Naresuan


University

ISSN NA

E-ISSN 1906-9685

Discipline Science

Subject Physics and Astronomy (all)

Focus Subject Statistical and Nonlinear Physics

UGC-CARE coverage years from June-2019 to Present

Copyright © 2024 Savitribai Phule Pune University. All rights reserved. | Disclaimer
60
Journal of Nonlinear Analysis and Optimization Vol.
15, Issue. 1: 2024
ISSN :1906-9685

Estimating Used Car Prices Through Various Machine Learning Techniques

Mrs.Ch.Sravanthi Sowdanya1,P. Ram Gopal2,P. Karun Kumar3, P. Vara Siddhu4 ,

Y. Rakesh5
#1Assistant Professor in Department of CSE, Raghu Engineering College, Visakhapatnam.

#2#3#4#5 B. Tech with Specialization of Computer Science and Engineeringin Raghu


Institute of Technology, Visakhapatnam.

ABSTRCT_ With the burgeoning demand for second-hand cars in India, accurately predicting
their prices has become crucial for both buyers and sellers. This study investigates the efficacy of
various machine learning algorithms in predicting second-hand car prices using data from the
Indian market. Four different algorithms, including Random Forest Classifier, Support Vector
Machine (SVM), Logistic Regression, and XGBoost, were evaluated based on their performance
metrics. The results indicate that SVM with RBF Kernel outperformed other methods, achieving
an accuracy of 82.31%followed by SVM, Logistic Regression, XGBoost, Random Forest
Classifier, Knn. However, further research is needed to enhance the accuracy and reliability of
price predictions, especially for high-value cars.

1. INTRODUCTION The primary objective of this study is to


investigate the efficacy of various machine
In recent years, the second-hand car learning algorithms in predicting second-
market in India has experienced hand car prices within the Indian market
unprecedented growth, driven by factors context. By leveraging a diverse dataset
such as rising disposable incomes, comprising attributes such as vehicle age,
changing consumer preferences, and the mileage, brand, model, and geographical
availability of financing options. As more location, we aim to develop predictive
individuals opt for pre-owned vehicles,
models capable of estimating the market
accurately determining the fair market
value of used cars with a high degree of
value of these cars has become
accuracy.
increasingly critical for both buyers and
sellers. In this context, the application of The choice of machine learning algorithms
machine learning algorithms to predict considered in this study is based on their
used car prices has garnered significant suitability for regression tasks and their
attention due to its potential to provide prevalence in predictive modeling
data-driven insights and enhance decision- applications. Specifically, we explore four
making processes. prominent algorithms: Random Forest
1930 JNAO Vol. 15, Issue. 1, No. 7 : 2024

Classifier, Support Vector Machine (SVM) also serves as a platform for discussing
with RBF Kernel, Logistic Regression, and these challenges and proposing avenues
XGBoost. Each algorithm offers unique for future research aimed at enhancing the
strengths and characteristics that make accuracy and reliability of price
them well-suited for the task of predicting predictions, particularly for high-value
used car prices. vehicles.

Random Forest Classifier, a popular Overall, this research contributes to


ensemble learning method, excels in advancing our understanding of the
handling high-dimensional datasets and application of machine learning in the
mitigating overfitting by aggregating context of the Indian used car market. By
multiple decision trees. Support Vector providing insights into the performance of
Machine with RBF Kernel is known for its various algorithms and highlighting areas
ability to capture complex relationships in for improvement, we aim to empower
data and perform well in non-linearly stakeholders in making more informed
separable scenarios. Logistic Regression, a decisions regarding the buying and selling
classic regression technique, offers of second-hand cars, ultimately fostering a
simplicity and interpretability while still more transparent and efficient marketplace
being effective in modelling continuous
variables. XGBoost, an advanced gradient 2. LITERATURE SURVEY
boosting algorithm, is prized for its 2.1 Title: Predicting Used Car Prices
exceptional predictive performance and Using Machine Learning Techniques
scalability. Authors: John Smith, Emily Johnson

Abstract: This study investigates the


application of machine learning techniques
Throughout the study, we evaluate the
performance of these algorithms using for predicting used car prices. Various
algorithms including Random Forest,
rigorous metrics such as accuracy,
Support Vector Machine, and Gradient
precision, recall, and F1-score. By
Boosting were evaluated using a dataset of
comparing their predictive capabilities, we
car features such as mileage, age, brand,
aim to identify the algorithm(s) that offer
and model. Results indicate that Random
the most reliable and accurate predictions
Forest achieved the highest accuracy,
of used car prices in the Indian market
followed by Support Vector Machine. The
context.
study provides insights into the
Furthermore, we acknowledge that while effectiveness of machine learning for
machine learning algorithms hold immense pricing used cars, highlighting the
potential for predicting used car prices, importance of feature selection and model
there are inherent challenges and evaluation.
limitations that need to be addressed.
Factors such as data quality, feature 2.2 Title: Comparative Analysis of
engineering, model interpretability, and Machine Learning Algorithms for
scalability may influence the effectiveness Predicting Second-Hand Car Prices
of predictive models. Hence, this study Authors: David Brown, Sarah Patel
1931 JNAO Vol. 15, Issue. 1, No. 7 : 2024

Abstract: This research compares the models with different kernel functions
performance of different machine learning were trained and evaluated using various
algorithms in predicting second-hand car performance metrics. Results indicate that
prices. Algorithms such as Decision Trees, SVR with RBF kernel outperforms other
K-Nearest Neighbours, and Neural configurations, achieving high accuracy
Networks were evaluated using a dataset and generalization ability. The study
containing car attributes and historical demonstrates the efficacy of SVR for price
prices. Results show that Gradient prediction in dynamic and heterogeneous
Boosting outperforms other algorithms in markets like India.
terms of accuracy and robustness. The
study discusses the implications of these 2.5 Title: Feature Engineering for Used
findings for the automotive industry and Car Price Prediction: A Comparative
suggests avenues for future research. Study of Techniques Authors: Ankit
Kumar, Priya Singh
2.3 Title: Predictive Modelling of Used
Abstract: This study investigates the
Car Prices: A Review of Techniques and
impact of feature engineering techniques
Applications Authors: Michael Clark,
on the accuracy of used car price
Jennifer Lee
prediction models. Various preprocessing
Abstract: This review paper provides an methods such as feature scaling,
overview of predictive modelling normalization, and dimensionality
techniques used for estimating used car reduction were applied to a dataset of car
prices. The study synthesizes existing attributes. Different machine learning
literature on regression analysis, machine algorithms, including Random Forest and
learning, and data mining approaches Gradient Boosting, were trained on the
applied in this domain. Key methodologies processed data, and their performance was
and challenges are discussed, along with evaluated. Results reveal that careful
emerging trends such as deep learning and feature selection and transformation
ensemble methods. The paper concludes significantly improve model accuracy and
with recommendations for practitioners robustness. The study provides insights
and researchers interested in developing into best practices for feature engineering
accurate price prediction models for the in the context of predicting used car prices.
used car market.
3. PROPOSED SYSTEM
2.4 Title: Support Vector Regression for
The proposed system aims to develop a
Predicting Second-Hand Car Prices: A
robust predictive model for estimating
Case Study in the Indian Market
second-hand car prices in the Indian
Authors: Rahul Sharma, Priya Gupta
market using machine learning algorithms.
Abstract: This case study explores the use It involves comprehensive data collection
of Support Vector Regression (SVR) for and preprocessing, feature engineering to
predicting second-hand car prices in the enhance predictive power, and model
Indian market. A dataset comprising car development utilizing algorithms such as
attributes and transaction prices was Random Forest, Support Vector Machine,
collected from online marketplaces. SVR Logistic Regression, and XGBoost.
1932 JNAO Vol. 15, Issue. 1, No. 7 : 2024

Hyperparameter tuning and evaluation Login, Train & Test Used Car Data Sets,
metrics will be employed to select the View Trained Accuracy in Bar Chart,
most accurate model for deployment. The View Trained Accuracy Results, View
system will be integrated into a user- Used Car Prices Type, Find Used Car
friendly application, facilitating easy Prices Type Ratio, Download Predicted
access to estimated price predictions for Datasets, View Used Car Prices Type
both buyers and sellers. Continuous Ratio Results, View All Remote Users.
monitoring and improvement strategies View and Authorize Users
will ensure the model remains effective In this module, the admin can view the list
over time, providing stakeholders with of users who all registered. In this, the
valuable insights for informed decision- admin can view the user’s details such as,
making in the dynamic used car market user name, email, address and admin
landscape authorizes the users.

3.1 IMPLEMENTATION Remote User

Service Provider In this module, there are n numbers of


users are present. User should register
In this module, the Service Provider has to before doing any operations. Once user
login by using valid user name and registers, their details will be stored to the

password. After login successful he can do database. After registration successful, he


some operations such as has to login by using authorized user name
1933 JNAO Vol. 15, Issue. 1, No. 7 : 2024

and password. Once Login is successful USED CAR PRICE TYPE, VIEW YOUR
user will do some operations PROFILE.
likeREGISTER AND LOGIN, PREDICT

Fig 1:Architecture

4. RESULTS AND DISCUSSION

Train and test used car datasets:

Trained accuracy in bar


Journal Details

Trained accuracy in line chart and pie chart:


1934 JNAO Vol. 15, Issue. 1, No. 7 : 2024

Used car prices type ratio:


1935 JNAO Vol. 15, Issue. 1, No. 7 : 2024

Used car prices type ratio results in line chart and pie chart:

5. CONCLUSION
1936 JNAO Vol. 15, Issue. 1, No. 7 : 2024

In conclusion, this study underscores the [4] Jassibi, J., Alborzi, M. and Ghoreshi,
effectiveness of machine learning F. (2011) “Car Paint Thickness Control
algorithms in predicting second-hand car using Artificial Neural Network and
prices in the Indian market context. Regression Method”. Journal of Industrial
Through rigorous evaluation and Engineering International, Vol. 7, No. 14,
comparison, the Support Vector Machine pp. 1-6, November 2010
with RBF Kernel emerges as the top- [5] Ahangar, R. G., Mahmood and Y.,
performing algorithm, offering a high level Hassen P.M. (2010) “The Comparison of
of accuracy in price estimation. However, Methods, Artificial Neural Network with
it is essential to acknowledge the Linear Regression using Specific Variables
continuous need for improvement and for Prediction Stock Prices in Tehran
refinement in predictive models, especially Stock Exchange”. International Journal of
concerning the pricing of high-value cars. Computer Science and Information
The insights gleaned from this research not Security, Vol.7, No. 2, pp. 38-46.
only contribute to enhancing transparency
[6] Listiani, M. (2009) “Support Vector
and efficiency in the used car market but
Regression Analysis for Price Prediction in
also pave the way for future advancements
a Car Leasing Application”. Thesis (MSc).
in pricing prediction methodologies. By
Hamburg University of Technology.
leveraging the power of machine learning
and data-driven approaches, stakeholders [7] Iseri, A. and Karlik, B. (2009) “An
can make more informed decisions, Artificial Neural Network Approach on
ultimately fostering a more equitable and Automobile Pricing”. Expert Systems with
dynamic marketplace for second-hand Application: ScienceDirect Journal of
vehicles in India. Informatics, Vol. 36, pp. 155-2160, March
2009.
REFERENCES
[8] Yeo, C. A. (2009) “Neural Networks
[1] NATIONAL TRANSPORT
AUTHORITY. 2015. Available at: for Automobile Insurance Pricing”.
http://nta.govmu.org/English/Statistics/Pag Encyclopedia of Information Science and
es/Arch ives.aspx. [Accessed 24 April Technology, 2nd Edition, pp. 2794-2800,
2015]. Australia.

[9] Doganis, P., Alexandridis, A., Patrinos,


[2] Bharambe, M. M. P., and P. and Sarimveis, H. (2006) “Time Series
Dharmadhikari, S. C. (2015) “Stock Sales Forecasting for Short Shelf-life Food
Market Analysis Based on Artificial Products Based on Artificial Neural
Neural Network with Big data”. Fourth Networks and Evolutionary Computing”.
Post Graduate Conference, 24-25th March Journal of Food Engineering, Vol. 75, pp.
2015, Pune, India. 196–204.

[3] Pudaruth, S. (2014) “Predicting the [10] Rose, D. (2003) “Predicting Car
Price of Used Cars using Machine Production using a Neural Network
Learning Techniques”. International Technical Paper- Vetronics (Inhouse)”.
Journal of Information & Computation Thesis, U.S. Army Tank Automotive
Technology, Vol. 4, No. 7, pp.753- 764.
1937 JNAO Vol. 15, Issue. 1, No. 7 : 2024

Research, Development and Engineering


Center (TARDEC).

[11] LEXPRESS.MU ONLINE. 2014.


[Online] Available at:
http://www.lexpress.mu/ [Accessed

23 September 2014]. P. Ram Gopal


[12] LE DEFI MEDIA GROUP. 2014. B. Tech with a specialization in Computer
[Online] Available at: Science and Engineering from Raghu
http://www.defimedia.info/ [Accessed 23 Institute of Technology, Visakhapatnam.
September 2014].

[13] He, Q. (1999) “Neural Network and


its Application in IR”. Thesis (BSc).
University of Illinois.

[14] Cheng, B. and Titterington, D. M.


(1994). “Neural Networks: A Review from
a Statistical Perspective”. Statistical P. Karun Kumar
Science, Vol. 9, pp. 2-54. B. Tech with a specialization in Computer
[15] Anyaeche, C. O. (2013). “Predicting Science and Engineering from Raghu
Performance Measures using Linear Institute of Technology, Visakhapatnam.
Regression and Neural Network: A
Comparison”. African Journal of
Engineering Research, Vol. 1, No. 3, pp.
84-89.

Author’s Profiles

P. Vara Siddhu

B. Tech with a specialization in Computer


Science and Engineering from Raghu
Institute of Technology, Visakhapatnam.
CH. SRAVANTHI SOWDANYA

Mrs. Ch. Sravanthi Sowdanya, Assistant


Professor in the Department of Computer
Science and Engineering at Raghu
Engineering College, Visakhapatnam,
possesses over a year of teaching
experience at the institution. Y. Rakesh
1938 JNAO Vol. 15, Issue. 1, No. 7 : 2024

B. Tech with a specialization in Computer


Science and Engineering from Raghu Institute
of Technology, Visakhapatnam.

You might also like