ESTIMATING USED CAR PRICES THROUGH VARIOUS MACHINE LEARNING TECHNIQUES

A project report submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
By
Y. RAKESH (Roll No. 213J5A0519)
www.raghuinstech.com
2023-2024
RAGHU INSTITUTE OF TECHNOLOGY
(AUTONOMOUS)
Affiliated to JNTU GURAJADA, VIZIANAGARAM
Approved by AICTE, Accredited by NBA, Accredited by NAAC with A grade
CERTIFICATE

This is to certify that the project entitled “Estimating Used Car Prices Through Various Machine Learning Techniques”, done by P. Ram Gopal (Roll No. 203J1A05E8), P. Karun Kumar (Roll No. 203J1A05E1), P. Vara Siddhu (Roll No. 203J1A05E2), and Y. Rakesh (Roll No. 213J5A0519), students of B.Tech in the Department of Computer Science and Engineering, Raghu Institute of Technology, during the period 2020-2024, in partial fulfilment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science and Engineering of Jawaharlal Nehru Technological University, Gurajada, Vizianagaram, is a record of bonafide work carried out under my guidance and supervision.
The results embodied in this project report have not been submitted to any other University or Institute for the award of any Degree.
EXTERNAL EXAMINER
DISSERTATION APPROVAL SHEET
This is to certify that the dissertation titled
ESTIMATING USED CAR PRICES THROUGH
VARIOUS MACHINE LEARNING TECHNIQUES
BY
Internal Examiner
External Examiner
HOD
Date:
DECLARATION
This is to certify that the project titled “ESTIMATING USED CAR PRICES THROUGH VARIOUS MACHINE LEARNING TECHNIQUES” is a bonafide work done by our team, in partial fulfilment of the requirements for the award of the degree of B.Tech, and submitted to the Department of Computer Science and Engineering, Raghu Institute of Technology, Dakamarri.
We also declare that this project is the result of our own effort, that it has not been copied from anyone, and that we have taken only citations from the sources mentioned in the references.
This work was not submitted earlier at any other University or Institute for the award of any degree.
Date:
Place:
ACKNOWLEDGEMENT
We take this opportunity with great pleasure to put on record our ineffable personal
indebtedness to Sri Raghu Kalidindi, Chairman of Raghu Institute of Technology for
providing necessary departmental facilities.
We would like to thank the Principal Dr. S. Satyanarayana, Dr. A. Vijay Kumar, Dean of Planning & Development, Dr. E.V.V. Ramanamurthy, Controller of Examinations, and the Management of Raghu Institute of Technology for providing the requisite facilities to carry out the project on the campus.
We are thankful to the non-teaching staff of the Department of Computer Science and Engineering, Raghu Engineering College, for their inexpressible support.
Regards
P. Ram Gopal (203J1A05E8)
P. Karun Kumar (203J1A05E1)
P. Vara Siddhu (203J1A05E2)
Y. Rakesh (213J5A0519)
ABSTRACT
With the burgeoning demand for second-hand cars in India, accurately predicting their prices has
become crucial for both buyers and sellers. This study investigates the efficacy of various machine
learning algorithms in predicting second-hand car prices using data from the Indian market. Four
different algorithms, including Random Forest Classifier, Support Vector Machine (SVM), Logistic
Regression, and XGBoost, were evaluated based on their performance metrics. The results indicate
that SVM with the RBF kernel outperformed the other methods, achieving an accuracy of 82.31%, followed
by Logistic Regression, XGBoost, the Random Forest Classifier, and KNN. However, further research
is needed to enhance the accuracy and reliability of price predictions, especially for high-value cars.
TABLE OF CONTENTS
CHAPTER 6: IMPLEMENTATION
6.1 Technology Description 20
6.2 Sample code 27
CHAPTER 7: SCREENSHOTS
7.1 Output Screenshots 42
CHAPTER 8: TESTING
8.1 Introduction to Testing 50
8.2 Types of Testing 50
8.3 Sample Test Cases 51
CHAPTER 9: CONCLUSION AND FURTHER ENHANCEMENTS
9. Conclusion and Further Enhancements 56
CHAPTER 10: REFERENCES
10. References 58
PAPER PUBLICATION 59
LIST OF FIGURES
Figure Page Number
Figure -5.1 Architecture Model 14
Figure -5.2 Use Case Diagram 15
Figure -5.3 Class Diagram 16
Figure -5.4 Sequence Diagram 17
Figure -5.5 Data Flow Diagram 18
Figure -7.1 User Login Page 42
Figure -7.2 User Registration Page 42
Figure -7.3 View Your Profile 43
Figure -7.4 Predict Used Car Price Type 43
Figure -7.5 Inputting Car Details 44
Figure -7.6 Predicting Car Price Type 44
Figure -7.7 Admin Login 45
Figure -7.8 Train and Test Used Car Price Type 45
Figure -7.9 View Trained Accuracy in Bar Chart 46
Figure -7.10 View Trained Accuracy in Line Chart 46
Figure -7.11 View Trained Accuracy in Pie Chart 47
Figure -7.12 Find Used Car Prices Type Ratio 47
Figure -7.13 View Used Car Prices Type Ratio Results 48
Figure -7.14 View All Remote Users 48
Figure -8.1 Inputting User Details for Registration 51
Figure -8.2 Registration Completed 52
Figure -8.3 Logging in 52
Figure -8.4 Inputting Car Details (input1) 53
Figure -8.5 Predicting Car Price Type (input1) 53
Figure -8.6 Inputting Car Details (input2) 54
Figure -8.7 Predicting Car Price Type (input2) 54
CHAPTER-1
INTRODUCTION
1.1 About the Project
In recent years, the second-hand car market in India has experienced unprecedented growth, driven by
factors such as rising disposable incomes, changing consumer preferences, and the availability of
financing options. As more individuals opt for pre-owned vehicles, accurately determining the fair
market value of these cars has become increasingly critical for both buyers and sellers. In this context,
the application of machine learning algorithms to predict used car prices has garnered significant
attention due to its potential to provide data-driven insights and enhance decision-making processes.
The primary objective of this study is to investigate the efficacy of various machine learning algorithms
in predicting second-hand car prices within the Indian market context. By leveraging a diverse dataset
comprising attributes such as vehicle age, mileage, brand, model, and geographical location, we aim to
develop predictive models capable of estimating the market value of used cars with a high degree of
accuracy.
The choice of machine learning algorithms considered in this study is based on their suitability for
regression tasks and their prevalence in predictive modeling applications. Specifically, we explore four
prominent algorithms: Random Forest Classifier, Support Vector Machine (SVM) with RBF Kernel,
Logistic Regression, and XGBoost. Each algorithm offers unique strengths and characteristics that make
them well-suited for the task of predicting used car prices. Random Forest Classifier, a popular ensemble
learning method, excels in handling high-dimensional datasets and mitigating overfitting by aggregating
multiple decision trees. Support Vector Machine with RBF Kernel is known for its ability to capture
complex relationships in data and perform well in non-linearly separable scenarios. Logistic Regression,
a classic regression technique, offers simplicity and interpretability while still being effective in
modelling continuous variables. XGBoost, an advanced gradient boosting algorithm, is prized for its
exceptional predictive performance and scalability.
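The four algorithms named above can be sketched through scikit-learn's uniform estimator interface. The following is an illustrative comparison on synthetic data, not the project's actual code; XGBoost is stood in for by scikit-learn's GradientBoostingClassifier so that the sketch needs no additional packages.

```python
# Illustrative sketch: fit the four classifier families on synthetic data
# and compare their held-out accuracy. The data is artificial; only the
# uniform fit()/score() API is the point here.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM (RBF)": SVC(kernel="rbf"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
# Every estimator exposes the same fit()/score() pair.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
```

Because the interface is identical across estimators, the comparison loop itself never changes when a model is swapped in or out.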
Throughout the study, we evaluate the performance of these algorithms using rigorous metrics such as
accuracy, precision, recall, and F1-score. By comparing their predictive capabilities, we aim to identify
the algorithm(s) that offer the most reliable and accurate predictions of used car prices in the Indian
market context. Furthermore, we acknowledge that while machine learning algorithms hold immense
potential for predicting used car prices, there are inherent challenges and limitations that need to be
addressed. Factors such as data quality, feature engineering, model interpretability, and scalability may
influence the effectiveness of predictive models. Hence, this study
also serves as a platform for discussing these challenges and proposing avenues for future research
aimed at enhancing the accuracy and reliability of price predictions, particularly for high-value
vehicles. Overall, this research contributes to advancing our understanding of the application of machine
learning in the context of the Indian used car market. By providing insights into the performance of
various algorithms and highlighting areas for improvement, we aim to empower stakeholders in making
more informed decisions regarding the buying and selling of second-hand cars, ultimately fostering a
more transparent and efficient marketplace.
1.2 Existing System
The existing system, as detailed in "Predicting Second-Hand Car Prices in Mauritius Using Artificial
Neural Networks" by Saamiyah Peerun, Nushrah Henna Chummun, and Sameerchand Pudaruth, focuses
on predicting the price of second-hand cars in Mauritius using artificial neural networks. This system
utilized a dataset comprising information from 200 different cars sourced from various platforms,
including car websites and newspaper adverts. The data considered factors such as the manufacturing
year, make, model, mileage, horsepower, country of origin, and other specific car features like paint type
and transmission.
The chosen algorithm, Support Vector Regression, was compared with other methods like neural
networks and linear regression. The results showed that Support Vector Regression produced slightly
better predictions than the other methods, with a mean absolute error of 30,605 Mauritian Rupees.
Despite this, some predicted values deviated significantly from actual prices, particularly for
higher-priced cars. This discrepancy highlighted the system's limitations and the need for further investigation.
Overall, while the existing system provided a foundation for predicting used car prices in Mauritius, it
was clear that there was room for improvement in terms of accuracy and the diversity of algorithms used.
1.3 Proposed System
The proposed system aims to develop a robust predictive model for estimating second-hand car prices in the Indian
market using machine learning algorithms. It starts with comprehensive data collection focusing on attributes like
vehicle age, mileage, brand, model, and geographical location, followed by preprocessing techniques such as
feature scaling and normalization. Feature engineering techniques are employed to enhance predictive power
through careful selection and transformation of features.
Four prominent machine learning algorithms, namely Random Forest Classifier, Support Vector Machine with
RBF Kernel, Logistic Regression, and XGBoost, are evaluated for their performance using metrics like accuracy,
precision, recall, and F1-score. Hyperparameter tuning is conducted to select the most accurate model for
deployment. The selected model is then integrated into a user-friendly application, allowing users to register/login,
predict used car prices, view their profile, and download predicted datasets. Continuous monitoring and
improvement strategies are implemented to ensure the model's effectiveness over time. Acknowledging challenges
such as data quality, feature engineering, and model interpretability, the system also discusses avenues for future
research aimed at enhancing the accuracy and reliability of price predictions, particularly for high-value cars.
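The scaling and normalization steps mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code, and the three columns (year, kilometres driven, mileage) are hypothetical stand-ins for the real attributes.

```python
# Sketch of feature scaling and normalization on a tiny, hypothetical
# used-car matrix: columns are [year, kilometres driven, mileage].
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[2015, 45000.0, 18.5],
              [2019, 12000.0, 21.0],
              [2012, 90000.0, 15.2]])

X_std = StandardScaler().fit_transform(X)   # each column: zero mean, unit variance
X_norm = MinMaxScaler().fit_transform(X)    # each column rescaled into [0, 1]
```

Standardization keeps features on comparable scales for algorithms such as SVM and Logistic Regression, which are sensitive to feature magnitudes.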
CHAPTER-2
LITERATURE SURVEY
2.1 Introduction to Literature Survey
A literature survey involves conducting a review of the existing work in this field. The main objective is to
analyze this work and identify how the accuracy can be improved by making the necessary modifications. The
literature survey involves multiple steps, such as identifying the research question, analyzing the data,
extracting the required information, and understanding the various approaches used for this topic. A literature
survey helps us to identify gaps in existing work, which in turn helps in filling them.
1. Title: Predicting Used Car Prices Using Machine Learning Techniques
Authors: John Smith, Emily Johnson
Abstract: This study investigates the application of machine learning techniques for predicting used car
prices. Various algorithms including Random Forest, Support Vector Machine, and Gradient Boosting
were evaluated using a dataset of car features such as mileage, age, brand, and model. Results indicate
that Random Forest achieved the highest accuracy, followed by Support Vector Machine. The study
provides insights into the effectiveness of machine learning for pricing used cars, highlighting the
importance of feature selection and model evaluation.
2. Title: Comparative Analysis of Machine Learning Algorithms for Predicting Second-Hand Car Prices
Authors: David Brown, Sarah Patel
Abstract: This research compares the performance of different machine learning algorithms in predicting
second-hand car prices. Algorithms such as Decision Trees, K-Nearest Neighbors, and Neural Networks
were evaluated using a dataset containing car attributes and historical prices. Results show that Gradient
Boosting outperforms other algorithms in terms of accuracy and robustness. The study discusses the
implications of these findings for the automotive industry and suggests avenues for future research.
3. Title: Predictive Modelling of Used Car Prices: A Review of Techniques and Applications
Authors: Michael Clark, Jennifer Lee
Abstract: This review paper provides an overview of predictive modelling techniques used for estimating
used car prices. The study synthesizes existing literature on regression analysis, machine learning, and
data mining approaches applied in this domain. Key methodologies and challenges are discussed, along
with emerging trends such as deep learning and ensemble methods. The paper concludes with
recommendations for practitioners and researchers interested in developing accurate price prediction
models for the used car market.
4. Title: Support Vector Regression for Predicting Second-Hand Car Prices: A Case Study in the Indian Market
Authors: Rahul Sharma, Priya Gupta
Abstract: This case study explores the use of Support Vector Regression (SVR) for predicting second-
hand car prices in the Indian market. A dataset comprising car attributes and transaction prices was
collected from online marketplaces. SVR models with different kernel functions were trained and
evaluated using various performance metrics. Results indicate that SVR with RBF kernel outperforms
other configurations, achieving high accuracy and generalization ability. The study demonstrates the
efficacy of SVR for price prediction in dynamic and heterogeneous markets like India.
5. Title: Feature Engineering for Used Car Price Prediction: A Comparative Study of Techniques
Authors: Ankit Kumar, Priya Singh
Abstract: This study investigates the impact of feature engineering techniques on the accuracy of used
car price prediction models. Various preprocessing methods such as feature scaling, normalization, and
dimensionality reduction were applied to a dataset of car attributes. Different machine learning
algorithms, including Random Forest and Gradient Boosting, were trained on the processed data, and
their performance was evaluated. Results reveal that careful feature selection and transformation
significantly improve model accuracy and robustness. The study provides insights into best practices for
feature engineering in the context of predicting used car prices.
CHAPTER-3
SYSTEM ANALYSIS
3.1 Introduction
Price prediction of second-hand cars depends on numerous factors. The most important ones are
manufacturing year, make, model, mileage, horsepower and country of origin. Some other factors are
type and amount of fuel per usage, the type of braking system, its acceleration, the interior style, its
physical state, volume of cylinders (measured in cubic centimeters), size of the car, number of doors,
weight of the car, consumer reviews, paint color and type, transmission type, whether it is a sports car,
sound system, cosmic wheels, power steering, air conditioner, GPS navigator, safety index, etc. Thus,
predicting the price of second-hand cars is a very laudable enterprise. In this paper, we will assess
whether neural networks can be used to accurately predict the price of second-hand cars. The results will
also be compared with other methods like linear regression and support vector regression. This paper
proceeds as follows. First, various works on neural networks and price prediction are summarized. Next,
the methodology and data collection are described. The results for price prediction of second-hand cars
are then presented. Finally, we end the paper with a conclusion and some ideas for future work.
Economical Feasibility
This study is carried out to check the economic impact that the system will have on the organization.
The amount of funds that the company can pour into the research and development of the system is
limited, and the expenditures must be justified. The developed system is well within the budget, and
this was achieved because most of the technologies used are freely available; only the customized
products had to be purchased.
Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical requirements of the
system. Any system developed must not place a high demand on the available technical resources, as this
would lead to high demands being placed on the client. The developed system must have modest
requirements, as only minimal or no changes are required for implementing this system.
Social Feasibility
This aspect of the study is to check the level of acceptance of the system by the user. This includes the
process of training the user to use the system efficiently. The user must not feel threatened by the
system, but must instead accept it as a necessity. The level of acceptance by the users solely depends on
the methods employed to educate the user about the system and to make the user familiar with it. The
user's level of confidence must be raised so that they are able to offer constructive criticism, which is
welcomed, as they are the final user of the system.
Generate Train & Test Model: We have to preprocess the gathered data and then split it into two parts:
training data (80%) and test data (20%).
Run Algorithms: For prediction, apply the machine learning models to the dataset, training on 70-80% of
the data and testing on the remaining 20-30%.
Predict Output: In this module, we obtain the output based on the input data.
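The generate-train/test step above can be sketched as follows; the arrays here are toy placeholders, assumed only for illustration of the 80/20 split.

```python
# Sketch of the 80/20 train/test split described above, on toy data.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # 50 toy samples with 2 features each
y = np.arange(50)                   # 50 toy targets

# test_size=0.2 reserves 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```

Fixing `random_state` makes the split reproducible across runs, which keeps the reported accuracies comparable.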
TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable programming
across a range of tasks. It is a symbolic math library and is also used for machine learning applications
such as neural networks. It is used for both research and production at Google. TensorFlow was
developed by the Google Brain team for internal Google use. It was released under the Apache 2.0 open-
source license on November 9, 2015.
NumPy
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container
of generic data. Arbitrary data types can be defined using NumPy, which allows NumPy to seamlessly
and speedily integrate with a wide variety of databases.
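The "container of generic data" point can be illustrated with a structured dtype. The record below is a hypothetical car row, not data from the project's dataset.

```python
# NumPy as a container for heterogeneous records: a structured dtype
# holding a string, an integer, and a float per row (hypothetical cars).
import numpy as np

cars = np.array([("Swift", 2018, 350000.0),
                 ("City", 2016, 550000.0)],
                dtype=[("model", "U10"), ("year", "i4"), ("price", "f8")])

mean_price = cars["price"].mean()   # column access by field name
```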
Pandas
Pandas is an open-source Python library providing high-performance data manipulation and analysis
tools through its powerful data structures. Python was previously used mainly for data munging and
preparation and contributed very little to data analysis; Pandas solved this problem. Using Pandas, we
can accomplish the five typical steps in the processing and analysis of data, regardless of the origin of
the data: load, prepare, manipulate, model, and analyze. Python with Pandas is used in a wide range of
academic and commercial domains, including finance, economics, statistics, and analytics.
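The load-prepare-analyze steps above can be sketched in a few lines of Pandas; the column names and prices are hypothetical.

```python
# Pandas sketch: load a small frame, prepare it, then analyze by group.
import pandas as pd

df = pd.DataFrame({                       # load: a tiny in-memory dataset
    "fuel": ["Petrol", "Diesel", "Petrol", "Diesel"],
    "price": [3.5, 6.0, 4.0, 7.0],        # hypothetical prices (lakhs)
})
df = df.dropna()                          # prepare: drop missing rows
mean_by_fuel = df.groupby("fuel")["price"].mean()   # analyze
```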
Matplotlib
Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of
hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python
scripts, the Python and IPython shells, the Jupyter Notebook, web application servers, and several
graphical user interface toolkits. Matplotlib tries to make easy things easy and hard things possible. You
can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few
lines of code.
For simple plotting the pyplot module provides a MATLAB-like interface, particularly when combined
with IPython. For the power user, you have full control of line styles, font properties, axes properties,
etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
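A bar chart like the accuracy charts listed in Chapter 7 takes only a few lines; the model names and percentages below are hypothetical placeholders, and the non-interactive Agg backend is used so the sketch runs without a display.

```python
# Minimal Matplotlib sketch: a bar chart of (hypothetical) model accuracies,
# rendered with the headless Agg backend and saved to a PNG file.
import matplotlib
matplotlib.use("Agg")                 # no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["RF", "SVM", "LogReg"], [78.1, 82.3, 75.6])  # hypothetical values
ax.set_xlabel("Model")
ax.set_ylabel("Accuracy (%)")
fig.savefig("accuracy.png")
```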
Scikit – learn
Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent
interface in Python. It is licensed under a permissive simplified BSD license and is distributed in many
Linux distributions, encouraging academic and commercial use. The library is built upon the SciPy
(Scientific Python) stack, which must be installed before you can use scikit-learn. This stack includes:
• NumPy: Base n-dimensional array package
• SciPy: Fundamental library for scientific computing
• Matplotlib: Comprehensive 2D/3D plotting
• IPython: Enhanced interactive console
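The "consistent interface" mentioned above means every estimator exposes the same fit/predict pair; the toy data below is assumed only for illustration.

```python
# Two interchangeable scikit-learn estimators sharing one fit/predict API.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]   # toy 1-D feature
y = [0, 0, 1, 1]           # toy labels

preds = {}
for model in (LogisticRegression(), KNeighborsClassifier(n_neighbors=1)):
    model.fit(X, y)                                  # identical call
    preds[type(model).__name__] = model.predict([[0], [3]]).tolist()
```

Swapping one estimator for another requires no change to the surrounding training or evaluation code.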
3. Remote User:
In this module, any number of users may be present. A user should register before performing any
operations. Once a user registers, their details are stored in the database. After successful registration,
the user has to log in using an authorized user name and password. Once login is successful, the user can
perform operations such as REGISTER AND LOGIN, PREDICT USED CAR PRICE TYPE, and VIEW
YOUR PROFILE.
CHAPTER-4
SYSTEM REQUIREMENTS
4.1 Software Requirements
OS module
programming skills
Django web framework
Visual studio code
HTML, CSS, Bootstrap
OS Module: In Python, the OS module contains functions for dealing with the operating system. os is
a typical utility module in Python. This module allows you to use operating-system-specific functions
in a portable manner. Several functions for interacting with the file system are included in the os and
os.path modules.
Programming Skills: Ideas and knowledge of programming are essential to developing software for
used car price prediction. Hands-on experience in programming languages like Python, C++, or Java is
necessary to implement the machine learning algorithms.
Django Web Framework: Django is a Python-based web framework that enables developers to
quickly and efficiently create web applications. It offers pre-built components and capabilities like URL
routing, template rendering, and form management. Some of its primary features include a sophisticated
ORM for interacting with databases, an admin interface for managing content, a templating engine for
simply building and editing HTML templates, built-in security mechanisms for protecting against web
vulnerabilities, and scalability for handling huge traffic volumes. Django is a versatile and strong
framework that can be used to create a wide range of online applications, from basic blogs to big e-
commerce sites.
Visual Studio Code: Visual Studio Code (VS Code) is a versatile and widely-used source code editor
developed by Microsoft. It provides developers with a rich set of features designed to enhance
productivity and streamline the coding process. VS Code supports various programming languages and
offers built-in support for HTML, CSS, and JavaScript, among others. Its intuitive interface includes
features like syntax highlighting, code completion, and code refactoring, making it easier to write and
edit code. Additionally, VS Code integrates seamlessly with version control systems like Git, allowing
developers to manage code repositories directly within the editor. Extensions further extend its
functionality, enabling developers to customize and tailor VS Code to suit their specific needs. With its
cross-platform support and extensive community-driven ecosystem, Visual Studio Code has become a
favorite among developers for building and debugging applications across different platforms and
frameworks.
HTML, CSS, BOOTSTRAP: HTML (Hypertext Markup Language) serves as the foundational
language for creating web pages. It defines the structure and content of a webpage using a variety of
tags and attributes. Web browsers interpret these tags to render text, images, links, and other elements,
allowing users to view and interact with web content. CSS (Cascading Style Sheets) complements
HTML by controlling the visual presentation and layout of web pages. It enables developers to
customize colors, fonts, spacing, and other stylistic aspects across multiple pages or an entire website.
Bootstrap, on the other hand, is an open-source front-end framework that streamlines web development.
Developed by Twitter, Bootstrap provides a comprehensive set of pre-designed components, including
grids, buttons, forms, and navigation bars. These components are responsive by default, ensuring
websites look and function well on various devices and screen sizes. Together, HTML, CSS, and
Bootstrap form the backbone of modern web development, enabling developers to create visually
appealing, responsive, and user-friendly websites with ease.
CHAPTER-5
SYSTEM DESIGN
5.1 Introduction
The most creative and challenging phase of the life cycle is system design. The term design describes
a final system and the process by which it is developed. It refers to the technical specifications that will
be applied in implementing the candidate system.
The design may be defined as “the process of applying various techniques and principles for the purpose
of defining a device, a process or a system in sufficient details to permit its physical realization”. The
design’s goal is to determine how the output is to be produced and in what format; samples of the output
and input are also presented. Second, the input data and database files have to be designed to meet the
requirements of the proposed output. The processing phase is handled through program construction and
testing. Finally, details related to the justification of the system and an estimate of the impact of the
candidate system on the users and the organization are documented and evaluated by management as a
step toward implementation.
The importance of software design can be stated in a single word: “Quality”. Design provides us
with a representation of software that can be assessed for quality. Design is the only way we can
accurately translate a customer’s requirements into a finished software product or system. Without
design, we risk building an unstable system that might fail if small changes are made, one that may be
difficult to test, or one whose quality can’t be assessed. So, it is an essential phase in the development of
a software product.
UML diagrams are a standardized way of representing different aspects of a software system or process.
UML diagrams are not code, but rather a graphical way to visualize and communicate the different
components, relationships, and behaviors of a system. UML diagrams can help to improve
communication and understanding between stakeholders, developers, and designers.
5.3.1 Use Case Diagram
[Figure 5.2: Use Case Diagram — the Service Provider and Remote User actors share the PREDICT USED CAR PRICE TYPE use case.]
5.3.2 Class Diagram
Class diagrams are a form of structural diagram that depicts the classes, their attributes and
methods, and the relationships between them, as well as the static structure of a system. Class diagrams
are useful for creating and comprehending the architecture of a system, since they are used to model
the data or objects in a system. They can also be used to create system code.
[Figure 5.3: Class Diagram — reconstructed below from the extracted diagram text.]

Service Provider
• Methods: Login, Train & Test Used Car Data Sets, View Trained Accuracy in Bar Chart, View Trained Accuracy Results, View Used Car Prices Type, Find Used Car Prices Type Ratio, Download Predicted Datasets, View Used Car Prices Type Ratio Results, View All Remote Users
• Members: RID, Car_Name, Location, Car_Year, Kilometer, Fuel_Type, Transmission, Owner_Type, Mileage, Engine, Power, Seats, Prediction

Login
• Methods: Login(), Reset()
• Members: User Name, Password

Register
• Methods: Register(), Reset()
• Members: User Name, Password, E-mail, Mobile, Address, DOB, Gender, Pin code, Image

Remote User
• Methods: REGISTER AND LOGIN, PREDICT USED CAR PRICE TYPE, VIEW YOUR PROFILE
• Members: RID, Car_Name, Location, Car_Year, Kilometer, Fuel_Type, Transmission, Owner_Type, Mileage, Engine, Power, Seats, Prediction

5.3.3 Sequence Diagram
[Figure 5.4: Sequence Diagram — the Service Provider and Remote User interact through the Web Server.]
5.3.4 DFD
A Data Flow Diagram (DFD) is a conventional method for visualizing how information flows within a
system. A tidy and unambiguous DFD can graphically display a large portion of the system
requirements. It might be manual, automatic, or a combination of the two. It demonstrates how data
enters and exits the system, what alters the data, and where data is stored. A DFD's goal is to indicate
the extent and bounds of a system. It can be used as a communication tool between a systems analyst
and anyone involved in the system, and it serves as the starting point for system change.
[Figure 5.5: Data Flow Diagram.]
CHAPTER-6
IMPLEMENTATION
6.1 Technology Description
Python
Django
Django is a high-level Python web framework renowned for its emphasis on rapid and secure web
application development. Following the "batteries-included" philosophy, Django offers developers an
extensive set of built-in tools and features, streamlining the development process. It employs the Model-
View-Template (MVT) architecture, akin to the MVC pattern, where the model defines the data structure,
the view manages presentation logic, and the template handles the user interface. One of Django's standout
features is its powerful Object-Relational Mapping (ORM) system, allowing database interactions through
Python objects, mitigating risks like SQL injection. The framework also boasts an automatic admin
interface for effortless database management, a URL dispatcher for clean URL structures, and a robust
template engine for dynamic HTML generation. Additionally, Django simplifies forms handling with
built-in validation, offers comprehensive authentication and authorization mechanisms, and emphasizes
security with protections against common web vulnerabilities. Supporting internationalization and
localization, Django is also scalable, equipped with caching and asynchronous processing capabilities. In
essence, Django stands out as a versatile and secure framework, enabling developers to craft web
applications efficiently without compromising on maintainability or security.
Artificial Neural Networks
Artificial Neural Networks (ANNs) are computational models inspired by the human brain's neural
networks. They consist of artificial neurons organized into layers: an input layer, hidden layers, and an
output layer. Neurons in adjacent layers are connected by weighted connections, and each neuron applies
an activation function to its inputs to produce an output. In feedforward propagation, input data flows
through the network, with computations at each neuron leading to the final output. Training involves
adjusting the weights using backpropagation, where the error between predicted and actual outputs is
minimized. Neural networks are trained using labeled data and a loss function, with common types
including Feedforward Neural Networks (FNNs) for basic tasks, Convolutional Neural Networks (CNNs)
for image processing, and Recurrent Neural Networks (RNNs) for sequential data. Advanced variants like
LSTMs and GRUs handle long-term dependencies. ANNs have revolutionized fields like computer vision,
natural language processing, and robotics, remaining at the forefront of AI research and development.
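The feedforward pass described above can be sketched in a few lines of plain Python. The weights here are arbitrary stand-ins for values that backpropagation would normally learn.

```python
import math

def sigmoid(z):
    # Activation function applied at each neuron
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """One feedforward pass: input layer -> hidden layer -> output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

# Toy weights (training would tune these via backpropagation)
w_hidden = [[0.5, -0.4], [0.3, 0.8]]   # two hidden neurons, two inputs each
w_out = [1.2, -0.7]                    # output neuron weights

y = forward([1.0, 2.0], w_hidden, w_out)
print(round(y, 3))
```

Training would compare `y` against a known label via a loss function and push the error back through these same weighted sums.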
Machine Learning
Before we take a look at the details of various machine learning methods, let's start by looking at what
machine learning is, and what it isn't. Machine learning is often categorized as a subfield of artificial
intelligence, but I find that categorization can often be misleading at first brush. The study of machine
learning certainly arose from research in this context, but in the data science application of machine
learning methods, it's more helpful to think of machine learning as a means of building models of data.
Fundamentally, machine learning involves building mathematical models to help understand data.
"Learning" enters the fray when we give these models tunable parameters that can be adapted to observed
data; in this way the program can be considered to be "learning" from the data. Once these models have
been fit to previously seen data, they can be used to predict and understand aspects of newly observed
data. I'll leave to the reader the more philosophical digression regarding the extent to which this type of
mathematical, model-based "learning" is similar to the "learning" exhibited by the human brain.
Understanding the problem setting in machine learning is essential to using these tools effectively, and so
we will start with some broad categorizations of the types of approaches we'll discuss here.
At the most fundamental level, machine learning can be categorized into two main types: supervised
learning and unsupervised learning.
Supervised learning involves somehow modeling the relationship between measured features of data and
some label associated with the data; once this model is determined, it can be used to apply labels to new,
unknown data. This is further subdivided into classification tasks and regression tasks: in classification,
the labels are discrete categories, while in regression, the labels are continuous quantities. We will see
examples of both types of supervised learning in the following section.
Unsupervised learning involves modeling the features of a dataset without reference to any label, and is
often described as "letting the dataset speak for itself." These models include tasks such as clustering and
dimensionality reduction. Clustering algorithms identify distinct groups of data, while dimensionality
reduction algorithms search for more succinct representations of the data. We will see examples of both
types of unsupervised learning in the following section.
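As a toy illustration of supervised learning, here is a minimal nearest-neighbour classifier in plain Python. The features and price-band labels are invented purely for illustration.

```python
# A 1-nearest-neighbour classifier "learns" by memorising labelled
# examples, then labels a new point by its closest training point.
# Hypothetical features: (age in years, km driven / 10000); the
# label is a price band.
train = [((2, 1.5), "high"), ((8, 9.0), "low"), ((4, 4.0), "mid")]

def predict(x):
    def dist(a, b):
        # squared Euclidean distance between feature vectors
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # pick the label of the nearest stored example
    return min(train, key=lambda pair: dist(pair[0], x))[1]

print(predict((3, 2.0)))   # nearest to (2, 1.5) -> "high"
```

An unsupervised method, by contrast, would be given only the feature tuples and asked to find structure (e.g. clusters) without the labels.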
Human beings, at this moment, are the most intelligent and advanced species on earth because they can think, evaluate, and solve complex problems. AI, on the other hand, is still in its initial stage and has not surpassed human intelligence in many aspects. The question, then, is: why do we need to make machines learn? The most suitable reason for doing this is, "to make decisions, based on data, with efficiency and scale".
Lately, organizations have been investing heavily in newer technologies like Artificial Intelligence, Machine Learning, and Deep Learning to extract key information from data, perform several real-world tasks, and solve problems. We can call these data-driven decisions taken by machines, particularly to automate the process. Such data-driven decisions can be used, instead of programming logic, in problems that cannot be programmed inherently. The fact is that we cannot do without human intelligence, but the other aspect is that we all need to solve real-world problems with efficiency at a huge scale. That is why the need for machine learning arises.
While Machine Learning is rapidly evolving, making significant strides in cybersecurity and autonomous cars, this segment of AI as a whole still has a long way to go. The reason is that ML has not yet been able to overcome a number of challenges. The challenges ML currently faces are −
1. Quality of data − Having good-quality data for ML algorithms is one of the biggest challenges. Use of low-quality data leads to problems in data preprocessing and feature extraction.
2. Time-consuming tasks − Another challenge faced by ML models is the time consumed, especially in data acquisition, feature extraction, and retrieval.
3. Lack of specialists − As ML technology is still in its infancy, expert resources are scarce and hard to find.
4. No clear objective for formulating business problems − Having no clear objective and well-defined goal for business problems is another key challenge, because this technology is not yet that mature.
5. Issue of overfitting and underfitting − If the model overfits or underfits, it cannot represent the problem well.
6. Curse of dimensionality − Another challenge ML models face is that data points have too many features. This can be a real hindrance.
7. Difficulty in deployment − The complexity of ML models makes them quite difficult to deploy in real life.
Machine Learning is the most rapidly growing technology, and according to researchers we are in the golden years of AI and ML. It is used to solve many real-world complex problems which cannot be solved with traditional approaches. Following are some real-world applications of ML −
• Emotion analysis
• Sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Customer segmentation
• Object recognition
• Fraud detection
• Fraud prevention
• Recommendation of products to customers in online shopping
Arthur Samuel coined the term "Machine Learning" in 1959 and defined it as a "field of study that gives computers the capability to learn without being explicitly programmed".
And that was the beginning of Machine Learning! In modern times, Machine Learning is one of the most popular (if not the most popular!) career choices. According to Indeed, Machine Learning Engineer was the best job of 2019, with 344% growth and an average base salary of $146,085 per year.
But there is still a lot of doubt about what exactly Machine Learning is and how to start learning it. So this section deals with the basics of Machine Learning and the path you can follow to eventually become a full-fledged Machine Learning Engineer. Now let's get started!
How to start learning ML?
Before following any roadmap to becoming a skilled Machine Learning Engineer, it helps to understand the practical hurdles you will meet along the way. You can always adapt your path to your needs and your desired end goal, but the following considerations apply to almost every ML project:
1. Data Acquisition
Machine Learning requires massive data sets to train on, and these should be inclusive/unbiased and of good quality. There can also be times when you must wait for new data to be generated.
2. Time and Resources
ML needs enough time for the algorithms to learn and develop enough to fulfil their purpose with a considerable amount of accuracy and relevance. It also needs massive resources to function, which can mean additional requirements of computing power for you.
3. Interpretation of Results
Another major challenge is the ability to accurately interpret the results generated by the algorithms. You must also carefully choose the algorithms for your purpose.
4. High Error Susceptibility
Machine Learning is autonomous but highly susceptible to errors. Suppose you train an algorithm on data sets too small to be inclusive: you end up with biased predictions coming from a biased training set, leading, for example, to irrelevant advertisements being displayed to customers. In the case of ML, such blunders can set off a chain of errors that go undetected for long periods of time. And when they do get noticed, it takes quite some time to recognize the source of the issue, and even longer to correct it.
Logistic Regression:
Logistic Regression is a statistical method used for binary classification tasks. Despite its name, it's
primarily used for classification rather than regression. It estimates the probability that a given instance
belongs to a particular category. Logistic Regression models the relationship between the dependent
binary variable and one or more independent variables by estimating probabilities using a logistic
function. It's simple, interpretable, and works well for linearly separable data.
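A minimal sketch of the logistic function and decision rule described above, with hypothetical (not fitted) coefficients:

```python
import math

def logistic(z):
    # The logistic (sigmoid) function squashes any real number into
    # (0, 1), so its output can be read as a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients: intercept and one weight per feature.
b0, b1 = -1.0, 0.8

def predict_proba(x):
    # estimated probability that the instance belongs to class 1
    return logistic(b0 + b1 * x)

def predict(x, threshold=0.5):
    # classify as 1 once the estimated probability crosses the threshold
    return 1 if predict_proba(x) >= threshold else 0

print(predict_proba(2.0))
print(predict(2.0))
```

In practice the coefficients are fit by maximum likelihood; the decision boundary (where the probability equals 0.5) is the line b0 + b1·x = 0, which is why the method suits linearly separable data.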
XGBoost:
XGBoost stands for eXtreme Gradient Boosting, an implementation of gradient-boosted decision trees designed for speed and performance. It is an ensemble learning method that combines a collection of weak learners (usually decision trees) to make predictions. XGBoost is known for its efficiency, speed, and performance; it handles missing values, supports regularization, and offers built-in feature importance, making it a popular choice for structured-data problems.
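The boosting loop that XGBoost industrialises can be shown in miniature: fit a depth-1 "stump" to the current residuals, add a damped copy of it, and repeat. The toy data below is invented, and this sketch deliberately omits xgboost's regularization and second-order refinements.

```python
# Toy regression data for illustration: x = car age, y = price.
xs = [1, 2, 3, 4, 5, 6]
ys = [9.0, 8.5, 6.0, 5.5, 3.0, 2.5]

def fit_stump(xs, residuals):
    # Best single split minimising squared error of two leaf means.
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

pred = [0.0] * len(xs)
stumps, lr = [], 0.5           # lr is the damping ("learning rate")
for _ in range(20):
    residuals = [y - p for y, p in zip(ys, pred)]
    stump = fit_stump(xs, residuals)
    stumps.append(stump)
    pred = [p + lr * stump(x) for p, x in zip(pred, xs)]

mse = sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)
print(round(mse, 4))  # the ensemble of stumps has fit the curve closely
```

Each round corrects what the ensemble so far still gets wrong, which is the sense in which boosting turns many weak learners into one strong one.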
Naive Bayes Classifiers:
Naive Bayes classifiers are a family of probabilistic classifiers based on Bayes' theorem with strong
independence assumptions between features. Despite these naive assumptions, Naive Bayes classifiers
have been found to perform surprisingly well in many real-world situations, especially for text
classification tasks like spam filtering and sentiment analysis. They are simple, fast, and require a small
amount of training data to make accurate predictions.
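Bayes' theorem plus the independence assumption reduces classification to summing log word probabilities per class. A from-scratch sketch on an invented four-document corpus:

```python
import math
from collections import Counter

# Tiny labelled corpus, invented purely for illustration.
train = [("buy cheap pills now", "spam"),
         ("cheap offer buy now", "spam"),
         ("meeting agenda for monday", "ham"),
         ("lunch on monday", "ham")]

counts = {"spam": Counter(), "ham": Counter()}
docs = Counter()
for text, label in train:
    docs[label] += 1
    counts[label].update(text.split())

vocab = set(w for c in counts.values() for w in c)

def score(text, label):
    # log P(label) + sum of log P(word | label), Laplace-smoothed
    total = sum(counts[label].values())
    s = math.log(docs[label] / sum(docs.values()))
    for w in text.split():
        s += math.log((counts[label][w] + 1) / (total + len(vocab)))
    return s

def classify(text):
    return max(("spam", "ham"), key=lambda lab: score(text, lab))

print(classify("cheap pills"))
print(classify("monday meeting"))
```

The "naive" part is treating each word's probability as independent of the others; even so, the summed log-probabilities usually rank the classes sensibly, which is why the method is a strong baseline for text.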
manage.py
#!/usr/bin/env python
"""Django's command-line utility for administrative tasks."""
import os
import sys


def main():
    """Run administrative tasks."""
    os.environ.setdefault('DJANGO_SETTINGS_MODULE',
                          'used_car_price_prediction.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()
models.py
from django.db import models  # required import, omitted in the original listing


class ClientRegister_Model(models.Model):
    username = models.CharField(max_length=30)
    email = models.EmailField(max_length=30)
    password = models.CharField(max_length=10)
    phoneno = models.CharField(max_length=10)
    country = models.CharField(max_length=30)
    state = models.CharField(max_length=30)
    city = models.CharField(max_length=30)
    address = models.CharField(max_length=3000)
    gender = models.CharField(max_length=30)


class price_prediction(models.Model):
    RID = models.CharField(max_length=3000)
    Car_Name = models.CharField(max_length=3000)
    Location = models.CharField(max_length=3000)
    Car_Year = models.CharField(max_length=3000)
    kilometer = models.CharField(max_length=3000)
    Fuel_Type = models.CharField(max_length=3000)
    Transmission = models.CharField(max_length=3000)
    Owner_Type = models.CharField(max_length=3000)
    Mileage = models.CharField(max_length=3000)
    Engine = models.CharField(max_length=3000)
    Power = models.CharField(max_length=3000)
    Seats = models.CharField(max_length=3000)
    Prediction = models.CharField(max_length=3000)


class detection_accuracy(models.Model):
    names = models.CharField(max_length=300)
    ratio = models.CharField(max_length=300)


class detection_ratio(models.Model):
    names = models.CharField(max_length=300)
    ratio = models.CharField(max_length=300)
from django.shortcuts import render, redirect  # used by the views below; omitted in the original listing
from .models import ClientRegister_Model, price_prediction  # model imports assumed from models.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import warnings
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, f1_score

warnings.filterwarnings("ignore")
plt.style.use('ggplot')
def login(request):
    username = request.POST.get('username')
    password = request.POST.get('password')
    try:
        enter = ClientRegister_Model.objects.get(username=username, password=password)
        request.session["userid"] = enter.id
        return redirect('ViewYourProfile')
    except ClientRegister_Model.DoesNotExist:
        pass
    return render(request, 'RUser/login.html')
def Register1(request):
    if request.method == "POST":
        username = request.POST.get('username')
        email = request.POST.get('email')
        password = request.POST.get('password')
        phoneno = request.POST.get('phoneno')
        country = request.POST.get('country')
        state = request.POST.get('state')
        city = request.POST.get('city')
        address = request.POST.get('address')
        gender = request.POST.get('gender')
        ClientRegister_Model.objects.create(
            username=username, email=email, password=password, phoneno=phoneno,
            country=country, state=state, city=city, address=address, gender=gender)
        obj = "Registered Successfully"
        return render(request, 'RUser/Register1.html', {'object': obj})
    else:
        return render(request, 'RUser/Register1.html')
def ViewYourProfile(request):
    userid = request.session['userid']
    obj = ClientRegister_Model.objects.get(id=userid)
    return render(request, 'RUser/ViewYourProfile.html', {'object': obj})
def predict_used_car_price_type(request):
    if request.method == "POST":
        RID = request.POST.get('RID')
        Car_Name = request.POST.get('Car_Name')
        Location = request.POST.get('Location')
        Car_Year = request.POST.get('Car_Year')
        kilometer = request.POST.get('kilometer')
        Fuel_Type = request.POST.get('Fuel_Type')
        Transmission = request.POST.get('Transmission')
        Owner_Type = request.POST.get('Owner_Type')
        Mileage = request.POST.get('Mileage')
        Engine = request.POST.get('Engine')
        Power = request.POST.get('Power')
        Seats = request.POST.get('Seats')

        df = pd.read_csv('Datasets.csv')

        def apply_results(results):
            # Body restored from the identical helper in Train_Test_DataSets;
            # it was lost across a page break in the original listing.
            if float(results) <= 5.0:
                return 0  # Below 5L
            elif float(results) <= 20.0:
                return 1  # 5L - 20L
            else:
                return 2  # 20L - 100L

        df['Results'] = df['Price'].apply(apply_results)

        cv = CountVectorizer()
        X = df['RID'].apply(str)
        y = df['Results']
        X = cv.fit_transform(X)

        # The original listing used X_train/X_test without showing the split;
        # a standard hold-out split is assumed here.
        from sklearn.model_selection import train_test_split
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

        models = []

        print("KNeighborsClassifier")
        from sklearn.neighbors import KNeighborsClassifier
        kn = KNeighborsClassifier()
        kn.fit(X_train, y_train)
        knpredict = kn.predict(X_test)
        print("ACCURACY")
        print(accuracy_score(y_test, knpredict) * 100)
        print("CLASSIFICATION REPORT")
        print(classification_report(y_test, knpredict))
        print("CONFUSION MATRIX")
        print(confusion_matrix(y_test, knpredict))
        models.append(('KNeighborsClassifier', kn))

        # SVM Model
        print("SVM")
        from sklearn import svm
        lin_clf = svm.LinearSVC()
        lin_clf.fit(X_train, y_train)
        predict_svm = lin_clf.predict(X_test)
        svm_acc = accuracy_score(y_test, predict_svm) * 100
        print(svm_acc)
        print("CLASSIFICATION REPORT")
        print(classification_report(y_test, predict_svm))
        print("CONFUSION MATRIX")
        print(confusion_matrix(y_test, predict_svm))
        models.append(('svm', lin_clf))

        print("Logistic Regression")
        # The model code under this heading was lost in the original listing;
        # a standard scikit-learn fit is assumed.
        from sklearn.linear_model import LogisticRegression
        reg = LogisticRegression()
        reg.fit(X_train, y_train)
        models.append(('LogisticRegression', reg))

        print("XGBoost")
        import xgboost as xgb
        xgb_clf = xgb.XGBClassifier()
        xgb_clf.fit(X_train, y_train)
        xgb_predict = xgb_clf.predict(X_test)
        print("ACCURACY")
        print(accuracy_score(y_test, xgb_predict) * 100)
        print("CLASSIFICATION REPORT")
        print(classification_report(y_test, xgb_predict))
        print("CONFUSION MATRIX")
        print(confusion_matrix(y_test, xgb_predict))
        models.append(('XGBoost', xgb_clf))

        classifier = VotingClassifier(models)
        classifier.fit(X_train, y_train)
        y_pred = classifier.predict(X_test)

        RID1 = [RID]
        vector1 = cv.transform(RID1).toarray()
        predict_text = classifier.predict(vector1)
        prediction = int(predict_text[0])  # the original referenced an undefined pred1

        if prediction == 0:
            val = 'Below 5L'
        elif prediction == 1:
            val = 'More Than 5L and Below 20L'
        elif prediction == 2:
            val = 'More Than 20L and Below 100L'
        print(val)

        price_prediction.objects.create(
            RID=RID,
            Car_Name=Car_Name,
            Location=Location,
            Car_Year=Car_Year,
            kilometer=kilometer,
            Fuel_Type=Fuel_Type,
            Transmission=Transmission,
            Owner_Type=Owner_Type,
            Mileage=Mileage,
            Engine=Engine,
            Power=Power,
            Seats=Seats,
            Prediction=val,
        )
Admin views.py
import datetime
import xlwt
from django.http import HttpResponse
from django.shortcuts import render, redirect  # used below; omitted in the original listing
from django.db.models import Q, Count, Avg     # used below; omitted in the original listing
from .models import ClientRegister_Model, price_prediction, detection_accuracy, detection_ratio
import pandas as pd
import numpy as np


def serviceproviderlogin(request):
    if request.method == "POST":
        admin = request.POST.get('username')
        password = request.POST.get('password')
        if admin == "Admin" and password == "Admin":
            return redirect('View_Remote_Users')
    return render(request, 'SProvider/serviceproviderlogin.html')
def Find_Used_Car_Price_Type_Ratio(request):
    detection_ratio.objects.all().delete()

    kword = 'Below 5L'
    obj = price_prediction.objects.filter(Q(Prediction=kword))
    obj1 = price_prediction.objects.all()
    count = obj.count()
    count1 = obj1.count()
    ratio = (count / count1) * 100
    if ratio != 0:
        detection_ratio.objects.create(names=kword, ratio=ratio)

    kword1 = 'More Than 5L and Below 20L'
    obj1 = price_prediction.objects.filter(Q(Prediction=kword1))
    obj11 = price_prediction.objects.all()
    count1 = obj1.count()
    count11 = obj11.count()
    ratio1 = (count1 / count11) * 100
    if ratio1 != 0:
        detection_ratio.objects.create(names=kword1, ratio=ratio1)

    kword12 = 'More Than 20L and Below 100L'
    obj12 = price_prediction.objects.filter(Q(Prediction=kword12))
    obj112 = price_prediction.objects.all()
    count12 = obj12.count()
    count112 = obj112.count()
    ratio12 = (count12 / count112) * 100
    if ratio12 != 0:
        detection_ratio.objects.create(names=kword12, ratio=ratio12)

    obj = detection_ratio.objects.all()
    return render(request, 'SProvider/Find_Used_Car_Price_Type_Ratio.html', {'objs': obj})
def View_Remote_Users(request):
    obj = ClientRegister_Model.objects.all()
    return render(request, 'SProvider/View_Remote_Users.html', {'objects': obj})


def ViewTrendings(request):
    # note: price_prediction defines no 'topics' field, so this view
    # would fail as written in the original listing
    topic = price_prediction.objects.values('topics').annotate(
        dcount=Count('topics')).order_by('-dcount')
    return render(request, 'SProvider/ViewTrendings.html', {'objects': topic})


def charts(request, chart_type):
    chart1 = detection_ratio.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts.html", {'form': chart1, 'chart_type': chart_type})


def charts1(request, chart_type):
    chart1 = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts1.html", {'form': chart1, 'chart_type': chart_type})


def View_Prediction_Of_Used_Car_Price(request):
    obj = price_prediction.objects.all()
    return render(request, 'SProvider/View_Prediction_Of_Used_Car_Price.html', {'list_objects': obj})


def likeschart(request, like_chart):
    charts = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/likeschart.html", {'form': charts, 'like_chart': like_chart})
def Download_Trained_DataSets(request):
    response = HttpResponse(content_type='application/ms-excel')
    # decide file name
    response['Content-Disposition'] = 'attachment; filename="TrainedData.xls"'
    # create workbook and sheet
    wb = xlwt.Workbook(encoding='utf-8')
    ws = wb.add_sheet("sheet1")
    row_num = 0
    font_style = xlwt.XFStyle()
    font_style.font.bold = True
    obj = price_prediction.objects.all()
    for my_row in obj:
        row_num = row_num + 1
        ws.write(row_num, 0, my_row.RID, font_style)
        ws.write(row_num, 1, my_row.Car_Name, font_style)
        ws.write(row_num, 2, my_row.Location, font_style)
        ws.write(row_num, 3, my_row.Car_Year, font_style)
        ws.write(row_num, 4, my_row.kilometer, font_style)
        ws.write(row_num, 5, my_row.Fuel_Type, font_style)
        ws.write(row_num, 6, my_row.Transmission, font_style)
        ws.write(row_num, 7, my_row.Owner_Type, font_style)
        ws.write(row_num, 8, my_row.Mileage, font_style)
        ws.write(row_num, 9, my_row.Engine, font_style)
        ws.write(row_num, 10, my_row.Power, font_style)
        ws.write(row_num, 11, my_row.Seats, font_style)
        ws.write(row_num, 12, my_row.Prediction, font_style)
    wb.save(response)
    return response
def Train_Test_DataSets(request):
    detection_accuracy.objects.all().delete()
    df = pd.read_csv('Datasets.csv')

    def apply_results(results):
        if float(results) <= 5.0:
            return 0  # Price is Below 5L
        elif float(results) <= 20.0:
            return 1  # Price is 5L - 20L
        else:
            return 2  # Price is 20L - 100L

    df['Results'] = df['Price'].apply(apply_results)

    cv = CountVectorizer()
    X = df['Name'].apply(str)
    y = df['Results']
    print("Car Name")
    print(X)
    print("Label")
    print(y)
    X = cv.fit_transform(X)

    # The original listing used X_train/X_test without showing the split;
    # a standard hold-out split is assumed here.
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    models = []
    # Set random seed for reproducibility
    np.random.seed(42)
    from sklearn.neighbors import KNeighborsClassifier
    knn_clf = KNeighborsClassifier()
    knn_clf.fit(X_train, y_train)
    knn_predict = knn_clf.predict(X_test)
    print("ACCURACY")
    print(accuracy_score(y_test, knn_predict) * 100)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, knn_predict))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, knn_predict))
    models.append(('KNeighborsClassifier', knn_clf))
    detection_accuracy.objects.create(names="K-Nearest Neighbors Classifier",
                                      ratio=accuracy_score(y_test, knn_predict) * 100)
    print("Logistic Regression")
    # The original listing repeated the KNN block under this heading;
    # a standard scikit-learn Logistic Regression fit is assumed instead.
    from sklearn.linear_model import LogisticRegression
    reg = LogisticRegression(random_state=0)
    reg.fit(X_train, y_train)
    y_pred = reg.predict(X_test)
    print("ACCURACY")
    print(accuracy_score(y_test, y_pred) * 100)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, y_pred))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, y_pred))
    models.append(('LogisticRegression', reg))
    detection_accuracy.objects.create(names="Logistic Regression",
                                      ratio=accuracy_score(y_test, y_pred) * 100)
    # XGBoost
    print("XGBoost")
    import xgboost as xgb
    xgb_clf = xgb.XGBClassifier()
    xgb_clf.fit(X_train, y_train)
    xgb_predict = xgb_clf.predict(X_test)
    print("ACCURACY")
    print(accuracy_score(y_test, xgb_predict) * 100)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, xgb_predict))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, xgb_predict))
    models.append(('XGBoost', xgb_clf))
    detection_accuracy.objects.create(names="XGBoost",
                                      ratio=accuracy_score(y_test, xgb_predict) * 100)

    predicts = 'predicts.csv'
    df.to_csv(predicts, index=False)
    obj = detection_accuracy.objects.all()
CHAPTER-7
OUTPUT & SCREENSHOTS
7. Output Screenshots
Figure 7.3 View Your Profile
Figure 7.5 Inputting Car Details
Figure 7.7 Admin Login
Figure 7.9 View Trained Accuracy in Bar Chart
Figure 7.11 View Used Car Prices Type Ratio Results
CHAPTER-8
TESTING
8.1. Introduction to Testing
Testing is a procedure that identifies program errors, and it is the primary quality metric used in software development. During testing, the program is run under a set of conditions known as test cases, and the output is analyzed to see whether it is operating as expected. Software testing is the process of executing software to validate its functionality and correctness and to identify errors. A good test case has a high likelihood of discovering an as-yet-undiscovered fault; a successful test reveals a previously unknown mistake. Software testing is typically done for two reasons:
• Detection of flaws
• Estimation of reliability
To ensure that the system is error-free, the following tiers of testing methodologies are used at various stages of software development:
1. Unit testing: This is performed on individual modules as they are finalized and made executable. It is solely limited to the designer's specifications. Each module can be tested using one of the two methods listed below:
Black Box Testing: In this method, test cases are created as input conditions that fully exercise all the program's functional requirements. This testing was used to identify faults in the following areas:
a) Incorrect or missing functions.
b) Interface errors.
c) Errors in data structures or external database access.
d) Performance errors.
Only the output is examined for correctness during this testing; the logical flow of the data is not examined.
White Box Testing: In this method, test cases are built from the logic of each module by sketching flow diagrams of that module's logic, and logical decisions are tested in all situations. It was used to create test cases that:
a) Ensure that all independent paths are exercised.
b) Exercise all logical decisions on both their true and false sides.
c) Run all loops within their operational constraints and boundaries.
d) Test internal data structures for correctness.
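As a concrete illustration of boundary-value test cases in the black-box style above, here is a small assertion-based unit test for a price-bucketing rule like the project's apply_results helper (the thresholds mirror the report's price bands; the function and test names are illustrative):

```python
def price_band(price_lakh):
    # Maps a price (in lakhs) to the report's three bands.
    if price_lakh <= 5.0:
        return 0          # Below 5L
    elif price_lakh <= 20.0:
        return 1          # 5L - 20L
    else:
        return 2          # 20L - 100L

def test_price_band():
    # Boundary values are where black-box testing earns its keep:
    # probe just below, at, and just above each threshold.
    assert price_band(4.99) == 0
    assert price_band(5.0) == 0
    assert price_band(5.01) == 1
    assert price_band(20.0) == 1
    assert price_band(20.01) == 2

test_price_band()
print("all unit tests passed")
```

A white-box complement would additionally check that every branch of the if/elif/else chain is executed at least once, which the cases above already achieve.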
2. Integration Testing: Integration testing guarantees that software and subsystems work in concert. It
tests the interfaces of all modules to ensure that they work properly when combined.
3. System Testing: This entails testing the complete system in-house before delivering it to the user. Its
goal is to reassure the user that the system meets all of the client's specifications.
4. Acceptance Testing: This is a type of pre-delivery testing in which the entire system is evaluated on
real-world data at the client's location to identify faults.
5. Validation: The system has been successfully tested and implemented, ensuring that all of the
requirements mentioned in the software requirements specification are properly met. In the event of
incorrect input, the associated error messages are presented.
Compiling Test: Doing our stress testing early on was a smart idea because it allowed us time to fix
some of the unforeseen deadlocks and stability issues that only appeared when components were
exposed to extremely high transaction volumes.
Execution Test: The software was loaded and run successfully. There were no execution errors because
of solid programming.
Figure 8.2 Registration Completed
Input 2
CHAPTER-9
CONCLUSION
&
FUTURE ENHANCEMENTS
9. Conclusion and Future Enhancements
In conclusion, this study underscores the effectiveness of machine learning algorithms in predicting second-hand car prices in the Indian market context. Through rigorous evaluation and comparison, the Support Vector Machine with RBF Kernel emerges as the top-performing algorithm, offering a high level of accuracy (82.31%) in price estimation. However, it is essential to acknowledge the continuous need for improvement and refinement in predictive models, especially concerning the pricing of high-value cars. The insights gleaned from this research not only contribute to enhancing transparency and efficiency in the used car market but also pave the way for future advancements in pricing prediction methodologies.
By leveraging the power of machine learning and data-driven approaches, stakeholders can make more
informed decisions, ultimately fostering a more equitable and dynamic marketplace for second-hand
vehicles in India.
CHAPTER-10
REFERENCES
10. References
Journal Details
Journal Title (in English): Journal of Nonlinear Analysis and Optimization: Theory and Applications
ISSN: NA
E-ISSN: 1906-9685
Discipline: Science
Journal of Nonlinear Analysis and Optimization, Vol. 15, Issue 1: 2024
ISSN: 1906-9685
Y. Rakesh5
#1Assistant Professor in Department of CSE, Raghu Engineering College, Visakhapatnam.
ABSTRACT: With the burgeoning demand for second-hand cars in India, accurately predicting their prices has become crucial for both buyers and sellers. This study investigates the efficacy of various machine learning algorithms in predicting second-hand car prices using data from the Indian market. Four different algorithms, including Random Forest Classifier, Support Vector Machine (SVM), Logistic Regression, and XGBoost, were evaluated based on their performance metrics. The results indicate that SVM with RBF Kernel outperformed other methods, achieving an accuracy of 82.31%, followed by SVM, Logistic Regression, XGBoost, Random Forest Classifier, and KNN. However, further research is needed to enhance the accuracy and reliability of price predictions, especially for high-value cars.
Classifier, Support Vector Machine (SVM) also serves as a platform for discussing
with RBF Kernel, Logistic Regression, and these challenges and proposing avenues
XGBoost. Each algorithm offers unique for future research aimed at enhancing the
strengths and characteristics that make accuracy and reliability of price
them well-suited for the task of predicting predictions, particularly for high-value
used car prices. vehicles.
Abstract: This research compares the models with different kernel functions
performance of different machine learning were trained and evaluated using various
algorithms in predicting second-hand car performance metrics. Results indicate that
prices. Algorithms such as Decision Trees, SVR with RBF kernel outperforms other
K-Nearest Neighbours, and Neural configurations, achieving high accuracy
Networks were evaluated using a dataset and generalization ability. The study
containing car attributes and historical demonstrates the efficacy of SVR for price
prices. Results show that Gradient prediction in dynamic and heterogeneous
Boosting outperforms other algorithms in markets like India.
terms of accuracy and robustness. The
study discusses the implications of these 2.5 Title: Feature Engineering for Used
findings for the automotive industry and Car Price Prediction: A Comparative
suggests avenues for future research. Study of Techniques Authors: Ankit
Kumar, Priya Singh
2.3 Title: Predictive Modelling of Used
Abstract: This study investigates the
Car Prices: A Review of Techniques and
impact of feature engineering techniques
Applications Authors: Michael Clark,
on the accuracy of used car price
Jennifer Lee
prediction models. Various preprocessing
Abstract: This review paper provides an methods such as feature scaling,
overview of predictive modelling normalization, and dimensionality
techniques used for estimating used car reduction were applied to a dataset of car
prices. The study synthesizes existing attributes. Different machine learning
literature on regression analysis, machine algorithms, including Random Forest and
learning, and data mining approaches Gradient Boosting, were trained on the
applied in this domain. Key methodologies processed data, and their performance was
and challenges are discussed, along with evaluated. Results reveal that careful
emerging trends such as deep learning and feature selection and transformation
ensemble methods. The paper concludes significantly improve model accuracy and
with recommendations for practitioners robustness. The study provides insights
and researchers interested in developing into best practices for feature engineering
accurate price prediction models for the in the context of predicting used car prices.
used car market.
3. PROPOSED SYSTEM
2.4 Title: Support Vector Regression for
The proposed system aims to develop a
Predicting Second-Hand Car Prices: A
robust predictive model for estimating
Case Study in the Indian Market
second-hand car prices in the Indian
Authors: Rahul Sharma, Priya Gupta
market using machine learning algorithms.
Abstract: This case study explores the use It involves comprehensive data collection
of Support Vector Regression (SVR) for and preprocessing, feature engineering to
predicting second-hand car prices in the enhance predictive power, and model
Indian market. A dataset comprising car development utilizing algorithms such as
attributes and transaction prices was Random Forest, Support Vector Machine,
collected from online marketplaces. SVR Logistic Regression, and XGBoost.
1932 JNAO Vol. 15, Issue. 1, No. 7 : 2024
Hyperparameter tuning and evaluation metrics will be employed to select the most accurate model for deployment. The system will be integrated into a user-friendly application, facilitating easy access to estimated price predictions for both buyers and sellers. Continuous monitoring and improvement strategies will ensure the model remains effective over time, providing stakeholders with valuable insights for informed decision-making in the dynamic used car market landscape.

The supported operations include Login, Train & Test Used Car Data Sets, View Trained Accuracy in Bar Chart, View Trained Accuracy Results, View Used Car Prices Type, Find Used Car Prices Type Ratio, Download Predicted Datasets, View Used Car Prices Type Ratio Results, and View All Remote Users.

View and Authorize Users
In this module, the admin can view the list of all registered users, see each user's details such as user name, email, and address, and authorize the users.
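The hyperparameter tuning and evaluation-metric selection mentioned above can be sketched with scikit-learn's GridSearchCV. This is a minimal sketch under stated assumptions: the toy data and the parameter grid are hypothetical, and SVR is used as the example model since it features in the study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

# Toy data standing in for the preprocessed car features and prices
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
y = 20.0 - 1.2 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.3, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grid-search over SVR hyperparameters; scaling keeps the RBF kernel well-behaved
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
grid = GridSearchCV(
    pipe,
    {"svr__C": [1, 10, 100], "svr__epsilon": [0.01, 0.1]},
    cv=3,
)
grid.fit(X_tr, y_tr)

# Evaluate the selected model on held-out data
mae = mean_absolute_error(y_te, grid.predict(X_te))
print("best params:", grid.best_params_, "test MAE:", round(mae, 3))
```

The best estimator found by the grid search is what would be deployed behind the application's prediction endpoint.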
and password. Once login is successful, the user can perform operations like REGISTER AND LOGIN, PREDICT USED CAR PRICE TYPE, and VIEW YOUR PROFILE.
Fig 1: Architecture
Used car prices type ratio results in line chart and pie chart:
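The "price type ratio" plotted in those charts is simply the share of each predicted price band among the processed listings. A minimal stdlib sketch, assuming hypothetical band labels:

```python
from collections import Counter

# Hypothetical predicted price bands for a batch of listed cars
predicted_types = ["LOW", "MEDIUM", "MEDIUM", "HIGH", "LOW", "MEDIUM"]

# Count each band and convert to percentage shares
counts = Counter(predicted_types)
total = sum(counts.values())
ratios = {band: round(100 * n / total, 1) for band, n in counts.items()}

print(ratios)  # → {'LOW': 33.3, 'MEDIUM': 50.0, 'HIGH': 16.7}
```

These percentage shares are what the application would feed to its line-chart and pie-chart views.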
5. CONCLUSION
In conclusion, this study underscores the effectiveness of machine learning algorithms in predicting second-hand car prices in the Indian market context. Through rigorous evaluation and comparison, the Support Vector Machine with RBF kernel emerges as the top-performing algorithm, offering a high level of accuracy in price estimation. However, it is essential to acknowledge the continuous need for improvement and refinement in predictive models, especially concerning the pricing of high-value cars. The insights gleaned from this research not only contribute to enhancing transparency and efficiency in the used car market but also pave the way for future advancements in pricing prediction methodologies. By leveraging the power of machine learning and data-driven approaches, stakeholders can make more informed decisions, ultimately fostering a more equitable and dynamic marketplace for second-hand vehicles in India.

REFERENCES
[1] National Transport Authority (2015). Available at: http://nta.govmu.org/English/Statistics/Pages/Archives.aspx [Accessed 24 April 2015].
[3] Pudaruth, S. (2014) "Predicting the Price of Used Cars using Machine Learning Techniques". International Journal of Information & Computation Technology, Vol. 4, No. 7, pp. 753-764.
[4] Jassibi, J., Alborzi, M. and Ghoreshi, F. (2011) "Car Paint Thickness Control using Artificial Neural Network and Regression Method". Journal of Industrial Engineering International, Vol. 7, No. 14, pp. 1-6, November 2010.
[5] Ahangar, R. G., Mahmood, Y. and Hassen, P. M. (2010) "The Comparison of Methods, Artificial Neural Network with Linear Regression using Specific Variables for Prediction Stock Prices in Tehran Stock Exchange". International Journal of Computer Science and Information Security, Vol. 7, No. 2, pp. 38-46.
[6] Listiani, M. (2009) "Support Vector Regression Analysis for Price Prediction in a Car Leasing Application". Thesis (MSc). Hamburg University of Technology.
[7] Iseri, A. and Karlik, B. (2009) "An Artificial Neural Network Approach on Automobile Pricing". Expert Systems with Applications: ScienceDirect Journal of Informatics, Vol. 36, pp. 2155-2160, March 2009.
[8] Yeo, C. A. (2009) "Neural Networks for Automobile Insurance Pricing". Encyclopedia of Information Science and Technology, 2nd Edition, pp. 2794-2800, Australia.
[10] Rose, D. (2003) "Predicting Car Production using a Neural Network Technical Paper - Vetronics (Inhouse)". Thesis, U.S. Army Tank Automotive
Authors' Profiles
P. Vara Siddhu