
DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY

GURU GHASIDAS VISHWAVIDYALAYA, KONI, BILASPUR (C.G.)


(A Central University Established by the Central Universities Act, 2009 No.25 of 2009)

A
Major Project Report
on
“LAPTOP PRICE PREDICTOR”
Master of Computer Application
(Session: 2021-22)

SUBMITTED BY:                              SUBMITTED TO:

Ajit Tiwari                                Prof. A. K. Saxena
Roll no.: 20606005                         (HOD of CSIT Dept)
Enrollment no: GGV/20/05005                Guru Ghasidas Central University
MCA IVth Semester                          Koni, Bilaspur (C.G.)
DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY
GURU GHASIDAS CENTRAL UNIVERSITY, BILASPUR
(A Central University Established by the Central Universities Act, 2009 No.25 of 2009)

ACKNOWLEDGEMENT
At the very outset, we express our gratitude to the Almighty Lord for showering his grace
and blessings upon us to complete this project.

Although our name appears on the cover of this report, many people contributed in one
form or another to its development. We could not have completed this project without the
assistance and support of each of the following, and we thank them all.

I wish to place on record my deep sense of gratitude to my project guide and project
in-charge, Prof. A. K. Saxena (H.O.D. of CSIT Dept, GGU), for his constant motivation and
valuable help throughout the project work, and for his valuable suggestions and advice
throughout the course. We also extend our thanks to the other faculty members for their
cooperation during our course.

Finally, I would like to thank our friends for their cooperation in completing this project.

Ajit Tiwari
(Signature of the Candidate) Roll no. 20606005
Enroll no. GGV/20/05005
DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY
GURU GHASIDAS CENTRAL UNIVERSITY, BILASPUR
(A Central University Established by the Central Universities Act, 2009 No.25 of 2009)

CERTIFICATE OF THE GUIDE

This is to certify that the project entitled “LAPTOP PRICE PREDICTOR” is a


record of work carried out by Ajit Tiwari under my guidance and supervision for the
award of the Degree of MCA at Guru Ghasidas Central University Bilaspur (C.G.).

To the best of my knowledge and belief, the project:


i) Embodies the work of the candidate himself/herself, and has not been submitted for
the award of any degree.
ii) Has duly been completed.
iii) Fulfils the requirements of the Ordinance relating to the MCA degree of the
University.
iv) Is up to the desired standard in respect of contents and is being referred to the
examiners.

(Signature of the guide)

Recommendation of the Department


The project work mentioned above is hereby recommended and forwarded
for examination and evaluation.

(Signature of the head of Department with seal)


Prof. A. K. Saxena
(H.O.D. of CSIT dept.)
GURU GHASIDAS CENTRAL UNIVERSITY BILASPUR (C.G.)
DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY
GURU GHASIDAS CENTRAL UNIVERSITY, BILASPUR
(A Central University Established by the Central Universities Act, 2009 No.25 of 2009)

DECLARATION OF THE CANDIDATE

This is to certify that I, Ajit Tiwari, a student of MCA (Master of Computer
Application) in the Department of Computer Science & Information Technology, session
2020-2022, Enrollment No. GGV/20/05005, have carried out a project entitled "LAPTOP
PRICE PREDICTOR" under the guidance of Prof. A. K. Saxena (H.O.D. of CSIT Dept, GGU).
This is an original work carried out by me, and the report has not been submitted to any
other University for the award of any degree or diploma.

Place:- Bilaspur Ajit Tiwari


Date:- MCA IVth Sem.

Dept. of CSIT
Guru Ghasidas University Bilaspur (C.G.)
(A Central University)
DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY
GURU GHASIDAS CENTRAL UNIVERSITY, BILASPUR
(A Central University Established by the Central Universities Act, 2009 No.25 of 2009)

CERTIFICATE BY THE EXAMINER

This is to certify that the project work entitled "LAPTOP PRICE


PREDICTOR", submitted by Ajit Tiwari and completed under the guidance of Prof.
A. K. Saxena (H.O.D. of CSIT Dept, GGU), has been examined by the undersigned as
a part of the examination for the award of the MCA (Master of Computer Application)
degree in the Department of "COMPUTER SCIENCE & INFORMATION TECHNOLOGY" at
GURU GHASIDAS UNIVERSITY, BILASPUR (C.G.).

“Project Examined & Approved”

Internal Examiner External Examiner

Date: Date:

Prof A. K Saxena

(Signature of H.O.D. (CSIT))


DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY

GURU GHASIDAS CENTRAL UNIVERSITY, BILASPUR


(A Central University Established by the Central Universities Act, 2009 No.25 of 2009)

ABSTRACT

When an item is missing and has to be replaced, the difference in quality between
the disappearing product and the new one must be taken into account in the
consumer price index, in order to measure comparable prices. Hedonic regressions
can be used to estimate this difference, using product characteristics as explanatory
variables for the price. However, the quality of the models can be insufficient due
to the small size of samples. This paper explores the use of web scraping in order
to gather bigger volumes of information on prices and characteristics, in particular
for electronic goods. Traditional hedonic regressions will be compared with other
predictive methods, including machine learning algorithms, in terms of predictive
power.
This paper presents a Laptop price prediction system by using the supervised
machine learning technique. The research uses multiple linear regression as the
machine learning prediction method which offered 81% prediction precision.
Using multiple linear regression, there are multiple independent variables but one
and only one dependent variable whose actual and predicted values are compared
to find precision of results. This paper proposes a system where price is dependent
variable which is predicted, and this price is derived from factors like Laptop’s
model, RAM, ROM (HDD/SSD), GPU, CPU, IPS Display, and Touch Screen.

Keywords: consumer price index · hedonic regression · quality adjustment · web scraping.
INDEX
1. INTRODUCTION
1.1 Scope of work
1.2 Problem Definition
1.3 Existing System And Need For System
1.4 Price Prediction Work
1.5 Related Work

2. LITERATURE SURVEY
2.1 Data Set
2.2 Pre – Processing and Enhancement
2.3 Feature Engineering
2.4 Classification

3. PROJECT DESIGN
3.1 Feasibility Analysis
3.2 Feasibility Studies
3.3 Life cycle Model
3.4 Project Cost and Time Estimation
3.5 Software Architecture Diagram
3.6 Architectural Style and Justification
3.7 Flow Chart
3.8 Hardware and Software Platform Requirements
3.9 Software Design/Diagram Document
3.10 Software Description

4. PROJECT IMPLEMENTATION
4.1 Methodology
4.2 Screenshots
4.3 Programming Language Used for Implementation
4.4 Tool Used
4.5 Testing Approach
4.6 Testing Plan

5. CONCLUSION AND FUTURE SCOPE


5.1 Conclusion
5.2 Future Scope
References –
1. INTRODUCTION

This chapter discusses the main concepts that the project is based on. It identifies what the project is
actually meant to accomplish.

Laptop price prediction, especially when the laptop is coming directly from the
factory to electronic markets/stores, is both a critical and important task. The mad
rush that we saw in 2020 for laptops to support remote work and learning is no
longer there. In India, demand for laptops soared after the nationwide lockdown,
leading to 4.1-million-unit shipments in the June quarter of 2021, the highest in
five years. Accurate laptop price prediction requires expert knowledge, because the
price usually depends on many distinctive features and factors; typically, the most
significant ones are brand and model, RAM, ROM, GPU, CPU, etc. In this paper,
we apply different methods and techniques in order to achieve higher precision in
laptop price prediction.
1.1 Scope of work-

When shopping for a new laptop, a consumer may look for certain specifications
and features on a budget. College students, who are generally financially constrained,
have a limited budget and often cannot afford high-end laptops. There are several
factors influencing the price of a laptop; usually, higher specifications and more
features mean more money. The purpose of this work is to identify the most significant
factors that drive laptop prices by developing a regression model to forecast
them. Using our regression model and analysis, one may be able to identify the
correct price of a laptop instead of performing a competitive analysis. A consumer
who lacks knowledge about laptops may find our model useful. In particular, our
model may be helpful to consumers with a limited budget, such as students, since
they can predict the price of a laptop given the features and specifications that
they want.

1.2 Problem Definition-

We will make a project for laptop price prediction. The problem statement is that
if any user wants to buy a laptop, our application should be able to provide a
tentative price of the laptop according to the user's configuration. Although it
looks like a simple project of just developing a model, the dataset we have is noisy
and needs a lot of feature engineering and preprocessing, which is what makes this
project interesting to develop.

1.3 Existing System And Need For System-

The accuracy of traditional pricing methods leaves much to be desired. In truth, most
conventional methods value intuition and subjective opinion over hard data, and
that is why decisions based on such processes often lead businesses down a rabbit
hole. Whereas if you use AI to set your prices, you will not only work faster and
more economically, you will also price more accurately, no matter what the market
throws your way. When you use machine learning, you also get an excellent grasp
of how industry prices evolve over the course of a year, and this leads to a final,
more subtle benefit.
Suppose you spot that a supplier often increases their prices
in October. You can make a note to stock up on certain goods in September,
avoiding the upcoming increase, saving money, and boosting your overall
profit margin.

1.4 Price Prediction Work-

Now, let us dive into how price prediction with machine learning works.
Machine learning models use both technical and fundamental analysis in the price
forecasting process. Technical analysis looks at historical prices, economic growth
rates, and other related factors, formulating an approximate price. Then, to get a
more accurate picture of the market, the process turns to fundamental analysis.
This step looks at various external and internal factors, including macro-factors
like the season and micro-influencers like the time of day, trying to figure out
when a consumer is most likely to buy. In mathematical terms, these processes are
known as regression analysis, which is a statistical way to model the relationship
between variables (one dependent variable and one or more independent variables).
In price prediction, price is the dependent variable, and it is affected by several
independent variables. Suppose we were trying to price a pizza: the answer would
depend on the size of the pizza and the cost of its ingredients. Beyond regression,
price prediction uses descriptive and predictive analytics, but these are just
another way to describe the discrete steps of regression analysis.

This is what the two processes entail:

 Descriptive Analytics: This step uses statistical methods of data collection,
analysis, interpretation, and data visualization to look at what has happened in
the past. The historical analysis forms the basis of the predictions.
 Predictive Analytics: This step analyzes the data to predict the possibility of
future events, forecasting aspects like customer behavior.
Automated price prediction aims to develop a model capable of finding the optimal
price point at any point in time without any human input. If you’re looking for a
reliable pricing strategy that’s rooted in data, this is the approach for you.
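
To make the regression idea above concrete, the following is a minimal sketch of regression-based price prediction in Python; the feature values and prices are invented purely for illustration and are not taken from the project dataset.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each row is [RAM in GB, SSD capacity in GB]
X = np.array([[4, 256], [8, 256], [8, 512], [16, 512], [16, 1024]])
# Hypothetical prices (the dependent variable) for those configurations
y = np.array([32000, 42000, 48000, 62000, 75000])

model = LinearRegression()
model.fit(X, y)

# Predict the price of an unseen (hypothetical) 8 GB RAM / 1024 GB SSD laptop
print(model.predict([[8, 1024]]))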

1.5 Related Work -

Predicting the price of laptops has been studied extensively in various research works.


Listiani discussed, in her Master's thesis, that a regression model built using
Decision Tree & Random Forest Regressors can predict the price of a leased laptop
with better precision than multivariate regression or simple multiple regression.
This is on the grounds that the Decision Tree algorithm is better at dealing with
datasets with more dimensions and is less prone to overfitting and underfitting.
The weakness of this research is that the improvement of the more advanced
Decision Tree regression over simple regression was not shown in basic indicators
like mean, variance or standard deviation.
2. LITERATURE SURVEY

This chapter discusses the papers referenced in preparation for undertaking this project. These papers
serve as a benchmark to enable this project to be undertaken.

Machine learning is a branch of artificial intelligence that deals with implementing
applications that can make future predictions based on past data. If you are a data
science enthusiast or a practitioner, then this article will help you build your own
end-to-end machine learning project from scratch. There are various steps involved in
building a machine learning project, but not all the steps are mandatory for a
single project; it all depends on the data. In this article, we will build a laptop
price prediction project and learn about the machine learning project lifecycle.
Amazon, a very popular and reliable online retailer, offers competitive pricing and
frequent price updates. Thus, the team initially decided to collect data from Amazon.
However, after running regression analysis on the data collected from Amazon, we
found that the data did not accurately reflect the price because there are too many
individual sellers. Therefore, the team recollected the data from BestBuy instead.
According to pcworlds.com, BestBuy came in second after Amazon among online
laptop retailers.

2.1 Data Set-


The dataset can be downloaded from Kaggle. Most of the columns in the dataset are
noisy and contain a lot of information, but with the feature engineering you do,
you will get better results. The only problem is that we have relatively little data,
yet we will still obtain good accuracy over it; in general, it is better to have a
large dataset. We will develop a website that can predict a tentative price of a
laptop based on the user's configuration.
Now let us start working on the dataset in our Jupyter Notebook. The first step is to
import the libraries and load the data. After that, we take a basic look at the data,
such as its shape, a sample of rows, and whether there are any NULL values present
in the dataset. Understanding the data is an important step for prediction or any
machine learning project. A minimal loading sketch is shown below.
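
The sketch below illustrates these first steps; the file name laptop_data.csv is an assumption and depends on how the Kaggle file was saved.

import pandas as pd

data = pd.read_csv('laptop_data.csv')

print(data.shape)            # number of rows and columns
print(data.sample(5))        # a few random rows to get a feel for the data
print(data.isnull().sum())   # count of NULL values in each column
data.info()                  # column types and non-null counts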

2.2 Pre-Processing and Enhancement-

In this section, we relabel and convert categorical features into numerical
features. This is essential for training our ML models, as ML models only accept
numerical values as inputs. Starting off, we identify features that are non-numerical
(object type) and compute their cardinalities (the categories present in each
feature). Knowing that the Touchscreen feature only has 2 categories, we can use
label encoding to encode this feature (one-hot encoding could be used too). Using
Scikit-learn's label encoding function, the values present in Touchscreen ('No',
'Yes') are encoded into 0s and 1s. Label encoding also handles features with
high cardinalities: applying label encoding to the CPU feature, the label-encoded
values (associated with their pre-encoded categories) are recorded for prediction
purposes later. Other features with slightly lower cardinality are encoded via the
one-hot encoding method; through the use of pandas' get_dummies() method, a new
column is created to indicate the presence of each categorical value. After
applying one-hot encoding to the TypeName and OpSys features, we use manual
encoding to deal with features of high cardinality where we know the order of the
values. We can use Python's dictionary and mapping methods to specify and
encode each category based on its magnitude/order. The code sketch shown
below encodes the Screen Resolution feature based on the pixel count.
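
The sketch below illustrates these encoding steps; the file name, the column names and the resolution ordering used in the dictionary are assumptions based on the description above.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.read_csv('laptop_data.csv')  # hypothetical file name

# Label-encode the binary Touchscreen feature ('No' -> 0, 'Yes' -> 1)
le = LabelEncoder()
data['Touchscreen'] = le.fit_transform(data['Touchscreen'])

# One-hot-encode lower-cardinality features such as TypeName and OpSys
data = pd.get_dummies(data, columns=['TypeName', 'OpSys'])

# Manual (ordinal) encoding via a dictionary mapping, e.g. ranking screen
# resolutions by pixel count (the values below are illustrative only)
resolution_order = {'1366x768': 0, '1600x900': 1, '1920x1080': 2, '3840x2160': 3}
data['ScreenResolution'] = data['ScreenResolution'].map(resolution_order)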

2.3 Feature Engineering-

We now extract and reorganize our data to better understand the underlying
factors that contribute to the price of laptops.

If we take a look at the Screen Resolution column, there appear to be laptops with
touchscreen capabilities. Since touchscreen laptops are known to be more
expensive than those without, a Touchscreen feature is added to mark laptops with
such capabilities. We then extract and replace the screen resolution column with
the respective pixel counts using regular expressions; regular expressions are
incredibly useful when it comes to extracting and filtering alphanumeric values.
We then apply the same process to engineer the CPU, Ram, and Weight features.
Our goal is to remove any units and words that are not essential for the later
analysis. Now comes the most tiring part of feature engineering: dealing with the
Memory feature. Upon closer inspection, the memory column contains various types
of memory (SSD, HDD, SSHD, and Flash Storage). We need to create 4 additional
columns representing the different memory types and extract their memory
capacities individually. (Additional processing needs to be done for laptops
having a double memory configuration that uses the same memory type, e.g.
256GB SSD + 512GB SSD.) This can be done using a process similar to the one
shown above.
It is good that there are no NULL values. We need small changes in the Weight and
Ram columns to convert them to numeric by removing the unit written after the
value, so we perform data cleaning here to get the correct column types. A sketch
of these clean-up steps is shown below.
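
The sketch assumes Ram values look like '8GB', Weight values like '1.37kg', and ScreenResolution values end with a pattern such as '1920x1080'; the file name is again hypothetical.

import pandas as pd

data = pd.read_csv('laptop_data.csv')  # hypothetical file name

# Flag touchscreen laptops from the ScreenResolution text
data['Touchscreen'] = data['ScreenResolution'].str.contains('Touchscreen').astype(int)

# Strip the units so the columns can be converted to numeric types
data['Ram'] = data['Ram'].str.replace('GB', '', regex=False).astype(int)
data['Weight'] = data['Weight'].str.replace('kg', '', regex=False).astype(float)

# Pull the pixel dimensions (e.g. '1920x1080') out of ScreenResolution
res = data['ScreenResolution'].str.extract(r'(\d+)x(\d+)')
data['X_res'] = res[0].astype(int)
data['Y_res'] = res[1].astype(int)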
2.4 Classification-
In classification tasks, there are often several candidate feature extraction methods
available. The most suitable method can be chosen by training neural networks to
perform the required classification task using different input features (derived
using different methods). The error in the neural network's response to test
examples provides an indication of the suitability of the corresponding input
features (and thus of the method used to derive them) to the considered classification
task. The following classification algorithms have been implemented:

 Linear Models
o Logistic Regression
o Support Vector Machines
 Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification

The classification algorithm is a supervised learning technique that is used to


identify the category of new observations on the basis of training data. In
classification, a program learns from the given dataset or observations and then
classifies a new observation into one of a number of classes or groups, such as Yes
or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called
targets/labels or categories. Unlike regression, the output variable of classification
is a category, not a value, such as "Green or Blue", "fruit or animal", etc. Since the
classification algorithm is a supervised learning technique, it takes labeled input
data, which means the data contains inputs with the corresponding outputs.

3. PROJECT DESIGN

This chapter contains a fully developed Software Project Management Plan for the project. The plan
highlights the deliverables, roles, tasks and schedule for the project.

Our study deals with building an automated laptop price predictor. A laptop,
laptop computer, or notebook computer is a small, portable personal computer (PC)
with a screen and an alphanumeric keyboard. Laptops typically have a clamshell
form factor, with the screen mounted on the inside of the upper lid and the keyboard
on the inside of the lower lid, although 2-in-1 PCs with a detachable keyboard are
often marketed as laptops or as having a laptop mode. Laptops are folded shut for
transportation, and thus are suitable for mobile use. The name comes from the lap,
as the device was deemed practical to be placed on a person's lap when being used.
Today, laptops are used in a variety of settings, such as at work, in education, for
playing games, web browsing, personal multimedia, and general home computer use.

3.1 Feasibility Analysis-

 Economic feasibility: Whether the firm can afford to build the software, and
whether its benefits substantially exceed its cost. Our project is
economically feasible. Our system uses the academic version of Jupyter,
which is very feasible economically since it can be viewed as a one-time
investment.
 Technical feasibility: Whether the technology needed for the system exists,
and how difficult it is to build. Our project is a technically versatile system
that can work on most platforms, making it technically feasible to build and
requiring only a few specifications. The software used for the project
implementation is Jupyter. Basic technical knowledge of operating the Jupyter
software along with the classification toolbox is required of the developers.
 Schedule feasibility: How much time is available to build the new system,
and when can it be built? The project is built entirely from scratch to completion
in a span of eight to nine months.
 Ecological feasibility: Whether the system has an impact on its
environment. There are no adverse effects on the environment.
 Operational feasibility: The system is easy to use and user-friendly. All
maintenance issues will be handled efficiently. The system is adaptable to most
environments. Hence our system is operationally feasible.

3.2 Feasibility Studies-

A feasibility study will help you determine the specific factors that will affect your
project before committing resources, time, or budget. So while it’s tempting to
brush it aside as another exercise delaying getting to work, remember that it’s
easier to address issues before you jump in than it is after. Say, for example, you’re
launching a new app. You’ll want to know if you physically have the resources and
technology needed to produce it, as well as whether or not it’ll give you an
acceptable return on investment (ROI). If you proceed without conducting a full
analysis, you’re opening yourself up to unnecessary risk. A feasibility study
mitigates that risk.
What are the benefits of a feasibility study?
 It’s flexible and scalable, which means it can be applied to any kind of
project – whether that’s a software development project, a new product
launch, or a new team process. Although the bigger the project, the more
important it becomes because the investment stakes are that much higher.
 It helps you avoid project failure through logical assessment.
 It gives stakeholders a clearer picture of the project, which, in turn, helps
improve focus and commitment.
 Comparing and analyzing the different options helps you narrow business
alternatives while helping simplify the decision-making process.
 It outlines a valid reason for your project to exist.
 Evaluating multiple options enhances your project’s success rate.

3.3 Life cycle Model-

The waterfall model is a non-iterative design process where system requirements are


known initially and the final outcome is determined up front. It progresses steadily
downwards through the phases given below. When to use the waterfall model:

 This model is used only when the requirements are very well known, clear and
fixed.
 Product definition is stable.
 Technology is understood.
 There are no ambiguous requirements.
 Ample resources with the required expertise are available freely.

Figure 1: Waterfall Model

Functionality 1: Requirements Gathering

 Jupyter/Google Colab software and laptop price datasets.
 At least 100 laptop records; the larger the dataset, the more accurate the
results will be.
 Gathering information about different predictive filters (a family of
estimation techniques) to identify which gives the best results for our study.
 Clustering algorithms for segmentation (k-means, scaling, Particle Swarm
Optimization, etc.) and feature extraction techniques such as EDA.
Functionality 2: Design-
 Designing the process overview from applying filters to segmentation,
feature extraction and classification of laptop prices.
Functionality 3: Implementation-

 Implementing all algorithms in Jupyter/Google Colab.

Functionality 4: Verification-

 Verifying the system by testing it on at least 50 records.

Functionality 5: Maintenance-

 Maintaining the system from time to time for its efficiency.

Project deliverables-

 Software Project Management Plan
 Software Requirements Specification
 Software Design Description
 System Test Document
 User Interface Module
 Final Product

3.4 Project Cost and Time Estimation-


Only the academic version of Anaconda Jupyter or Google Colab was used in the
project; there was no other additional cost.
3.5 Software Architecture Diagram-

Figure 2: Software Architecture Diagram

3.6 Architectural Style and Justification-

The diagram (Figure 2) shows the various stages in the development of our system.
The diagram shows the interaction between the various components of the application
and their position in the development hierarchy. This style is hence appropriate for
the selected problem, because all the modules in the selected problem function
independently. Communication is strictly through message-passing connectors.
The flow of the system is from left to right.
3.7 Flow Chart-

Start → Data Pre-Processing → Segments → Data Cleaning → Classification → Input Value → Predict Price → Stop
3.8 Hardware and Software Platform Requirements-

Hardware Requirements-

 CPU configuration
o Processor : Intel Pentium or later
o RAM : 512 MB or more
o Hard Disk : 1 GB of hard disk space or more
o Monitor : Any monitor

Software Requirements-
 Languages : Python 3
 Editor : Python IDE
 Dataset : MS Excel
 Operating System : Windows XP/7/8/10/11

Hardware Used-

o Processor : Intel i3
o RAM : 4 GB
o Hard Disk : 1 TB Hard Disk
o Monitor : Laptop display

Software Used-
 Languages : Python 3
 Editor : Jupyter / Google Colab and PyCharm
 Dataset : MS Excel
 Operating System : Windows 11
3.9 Software Design/Diagram Document-

1. DFD (Data Flow Diagram) –

The data used in the study are laptop prices. The flow is:

Data Collection → Feature Extraction → Split data into training data and test data →
ML classifiers (Support Vector Machine; KNN, Decision Tree, Random Forest and Stacking can also be used) →
Various performance measures are computed for each of the models →
The results are analysed and the best model is chosen → Result

2. Work Flow Diagram –

Data → Data Pre-Processing → Feature Extraction → User Input → Classification → Predict Price

3. Use Case Diagram-

 The use case diagram consists of two actors, who interact with the software.
 The User: The user provides the input configuration and sees the final output.
 The System: The system performs all clustering, feature extraction,
classification and training algorithms.
4. Flow of Design and Analysis-

(Figure: flow of design and analysis)

5. Price Prediction Control Flow-

Input (user queries and training data) → Laptop Price Prediction (handling of missing data) → Test Result → Prediction Services
3.10 Software And Library Description-

I. Python: Python is an interpreted, high-level programming language for


general-purpose programming. Created by Guido van Rossum and first
released in 1991, Python has a design philosophy that emphasizes code
readability, notably using significant whitespace. It provides constructs that
enable clear programming on both small and large scales. Python features a
dynamic type system and automatic memory management. It supports
multiple programming paradigms, including object-oriented, imperative,
functional and procedural, and has a large and comprehensive standard
library. Python interpreters are available for many operating systems.
CPython, the reference implementation of Python, is open-source software and
has a community-based development model, as do nearly all of its variant
implementations. CPython is managed by the non-profit Python Software
Foundation. We use Python 3.

 NumPy: NumPy is a general-purpose array-processing package. It provides
a high-performance multidimensional array object, and tools for working
with these arrays. It is the fundamental package for scientific computing
with Python. It contains various features, including these important ones:
a. A powerful N-dimensional array object
b. Sophisticated (broadcasting) functions
c. Tools for integrating C/C++ and Fortran code
d. Useful linear algebra, Fourier transform, and random number
capabilities
Besides its obvious scientific uses, NumPy can also be used as an
efficient multidimensional container of generic data. Arbitrary data
types can be defined using NumPy, which allows NumPy to
seamlessly and speedily integrate with a wide variety of databases.
 Pandas: Pandas is an open-source Python library providing high-
performance data manipulation and analysis tools using its powerful data
structures. The name Pandas is derived from the term Panel Data, an
econometrics term for multidimensional data. In 2008, developer Wes
McKinney started developing pandas when in need of a high-performance,
flexible tool for data analysis. Prior to Pandas, Python was majorly used
for data munging and preparation; it had very little to offer for data
analysis. Pandas solved this problem. Using Pandas, we can accomplish five
typical steps in the processing and analysis of data, regardless of the origin
of the data: load, prepare, manipulate, model, and analyse. Python with
Pandas is used in a wide range of fields, including academic and commercial
domains such as finance, economics, statistics and analytics.
 Matplotlib- Matplotlib is a Python library used to create 2D graphs and
plots by using Python scripts. It has a module named pyplot which makes
things easy for plotting by providing features to control line styles, font
properties, formatting axes, etc. It supports a very wide variety of graphs and
plots, namely histograms, bar charts, power spectra, error charts, etc. It is
used along with NumPy to provide an environment that is an effective open-
source alternative for MATLAB. It can also be used with graphics toolkits like
PyQt and wxPython.
 Seaborn- Seaborn is an amazing visualization library for statistical graphics
plotting in Python. It provides beautiful default styles and color palettes to
make statistical plots more attractive. It is built on top of the matplotlib
library and is also closely integrated with the data structures from pandas.
Seaborn aims to make visualization a central part of exploring and
understanding data. It provides dataset-oriented APIs, so that we can switch
between different visual representations of the same variables for a better
understanding of the dataset.

II. Jupyter Notebook-


a. The Jupyter Notebook is an incredibly powerful tool for interactively
developing and presenting data science projects.
b. A notebook integrates code and its output into a single document that
combines visualizations, narrative text, mathematical equations, and
other rich media.
c. The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations,
visualizations and narrative text.
d. Uses include: data cleaning and transformation, numerical simulation,
statistical modeling, data visualization, machine learning, and much
more.
e. The Notebook has support for over 40 programming languages,
including Python, R, Julia, and Scala.
f. Notebooks can be shared with others using email, Dropbox, GitHub
and the Jupyter Notebook Viewer.
g. Your code can produce rich, interactive output: HTML, images,
videos, LaTeX, and custom MIME types.
h. Leverage big data tools, such as Apache Spark, from Python, R and
Scala. Explore that same data with pandas, scikit-learn, ggplot2, and
TensorFlow.
III. PyCharm- PyCharm is the most popular IDE for Python, and includes great
features such as excellent code completion and inspection, an advanced
debugger, and support for web programming and various frameworks.
PyCharm is created by the Czech company JetBrains, which focuses on
creating integrated development environments for various web development
languages like JavaScript and PHP.
PyCharm is the most popular IDE used for the Python scripting language.
This chapter gives an introduction to PyCharm and explains its features.
PyCharm offers some of the best features to its users and developers in the
following aspects −
 Code completion and inspection
 Advanced debugging
 Support for web programming and frameworks such as Django and
Flask

 Streamlit- Streamlit is an open-source app framework in the Python language.


It helps us create beautiful web apps for data science and machine learning in
a short time. It is compatible with major Python libraries such as scikit-learn,
Keras, PyTorch, LaTeX, NumPy, pandas, Matplotlib, etc. The syntax for
installing this library is shown below.
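
Assuming a standard Python environment with pip available, the installation command is:

pip install streamlit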

 Pickle- Pickle is a module in Python used for serializing and de-serializing


Python objects. It converts Python objects like lists, dictionaries, etc. into
byte streams (zeroes and ones). You can convert the byte streams back into
Python objects through a process called unpickling. Pickling is also known
as serialization, flattening, or marshalling.
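
As a brief illustration in this project's context, the sketch below dumps and reloads a hypothetical trained pipeline (pipe) and DataFrame (df), matching the file names the web application loads later.

import pickle

# Serialize the (hypothetical) objects to disk
pickle.dump(df, open('df.pkl', 'wb'))
pickle.dump(pipe, open('pipe.pkl', 'wb'))

# Unpickle them back into Python objects
df = pickle.load(open('df.pkl', 'rb'))
pipe = pickle.load(open('pipe.pkl', 'rb'))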
4. PROJECT IMPLEMENTATION

This chapter discusses the implementation of the project – the various algorithms, testing approaches
and the results.
4.1 Methodology-
We have implemented several regression algorithms in this project; a detailed
explanation and the various outputs are shown below. To support the application of
machine learning, sample data is of course needed: the dataset contains data about
various laptops and their prices depending on their configuration. The sample data
were obtained from Kaggle.com.

 Dataset Used for Analysis- The key to success in the field of machine
learning or to become a great data scientist is to practice with different types
of datasets. But discovering a suitable dataset for each kind of machine
learning project is a difficult task. So, in this topic, we will provide the detail
of the sources from where you can easily get the dataset according to your
project. After loading the dataset via Pandas, we can see a list of laptops and
specs that are associated with each laptop.

 Exploratory Data Analysis (EDA)-


1. Distribution of target column- When working with a regression problem
statement, understanding the distribution of the target column is important.
The distribution of the target variable is skewed, and it is apparent that
commodities with low prices are bought and sold more than the branded,
expensive ones.

2. Company Column- We want to understand how the brand name


impacts the laptop price, or what the average price of each laptop
brand is. If you plot a count plot (frequency plot) of the Company
column, the major categories present are Lenovo, Dell, HP, Asus, etc.
If we then plot the company against price, you can observe how the
price varies across brands: Razer, Apple, LG, Microsoft, Google and
MSI laptops are expensive, and the others are in the budget range.

3. Types of laptops- Which type of laptop are you looking for, such as a
gaming laptop, workstation, or notebook? Most people prefer
notebooks because they are within the budget range, and the same can
be concluded from our data.
4. Does the price vary with laptop size in inches?- A scatter plot is
used when both columns are numerical, and it answers our
question well. From the plot we can conclude that there is a
relationship between the price and size columns, but not a strong
one.
5. Screen Resolution- The screen resolution column contains a lot of
information. Before any analysis, we first need to perform feature
engineering on it. If you observe the unique values of the column, you
can see that each value gives information about the presence of an IPS
panel, whether the laptop is a touch screen or not, and the X-axis and
Y-axis screen resolution. So, we will extract the column into 3 new
columns in the dataset.

Extract touch screen information- It is a binary variable, so we can
encode it as 0 and 1; one means the laptop is a touch screen and zero
indicates it is not.

data['Touchscreen'] = data['ScreenResolution'].apply(lambda x: 1 if 'Touchscreen' in x else 0)

# how many laptops in the data are touchscreen
sns.countplot(data['Touchscreen'])

# plot against price
sns.barplot(x=data['Touchscreen'], y=data['Price'])

If we plot the touch screen column against price, laptops with touch
screens are more expensive, which is true in real life.
Extract IPS panel presence information-

It is a binary variable, and the code is the same as the one used above.
Laptops with an IPS panel are less common in our data, but by observing
the relationship with price, IPS panel laptops are priced higher.

# extract IPS column
data['Ips'] = data['ScreenResolution'].apply(lambda x: 1 if 'IPS' in x else 0)

sns.barplot(x=data['Ips'], y=data['Price'])

Extract X-axis and Y-axis screen resolution dimensions

Both dimensions are present at the end of the string, separated by a
cross sign. So we first split the string on spaces and take the last token,
then split that token on the cross sign and take the zeroth and first
elements as the X and Y-axis dimensions.

def findXresolution(s):
    return s.split()[-1].split("x")[0]

def findYresolution(s):
    return s.split()[-1].split("x")[1]

# finding the x_res and y_res from screen resolution
data['X_res'] = data['ScreenResolution'].apply(lambda x: findXresolution(x))
data['Y_res'] = data['ScreenResolution'].apply(lambda y: findYresolution(y))

# convert to numeric
data['X_res'] = data['X_res'].astype('int')
data['Y_res'] = data['Y_res'].astype('int')

Replacing Inches, X and Y resolution with PPI

If you find the correlation of the columns with price using the corr method,
you can see that Inches does not have a strong correlation, but the X and
Y-axis resolutions have a very strong correlation. We can take advantage
of this and convert these three columns into a single column known as
pixels per inch (PPI). In the end, our goal is to improve performance by
having fewer features.

data['ppi'] = (((data['X_res']**2) + (data['Y_res']**2))**0.5 / data['Inches']).astype('float')

data.corr()['Price'].sort_values(ascending=False)

Now when you look at the correlation with price, PPI has a strong
correlation. So we can drop the extra columns which are no longer of use;
from this point on we keep only the important columns in our dataset.

data.drop(columns=['ScreenResolution', 'Inches', 'X_res', 'Y_res'], inplace=True)

6. CPU column- If you observe the CPU column, it also contains a lot of
information. If you use the unique function or value counts function on
the CPU column, there are 118 different categories. The information it
gives is about the processors in laptops and their speed.

How does the price vary with processors?

We can again use a bar plot to answer this question, and as expected
the price of i7 processors is highest, then i5; i3 and AMD processors
lie in almost the same range. Hence the price depends on the processor.
A sketch of one way to group this column by brand is shown below.
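
The helper name and grouping rules below are assumptions and can be adapted to the categories actually present in the data.

def fetch_processor(text):
    # Keep the first three words, e.g. 'Intel Core i7'
    cpu_name = " ".join(text.split()[0:3])
    if cpu_name in ('Intel Core i7', 'Intel Core i5', 'Intel Core i3'):
        return cpu_name
    elif text.split()[0] == 'Intel':
        return 'Other Intel Processor'
    else:
        return 'AMD Processor'

# 'data' is the DataFrame used throughout this section
data['Cpu brand'] = data['Cpu'].apply(fetch_processor)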
7. Price with RAM- Again, a bivariate analysis of price with RAM. If you
observe the plot, price has a very strong positive correlation with RAM,
i.e. a roughly linear relationship.

8. Memory column- The memory column is another noisy column


describing the storage drives. Many laptops come with both an HDD and
an SSD, and some have an external slot to add storage after purchase.
This column can disturb your analysis if it is not feature-engineered
properly. If you use value counts on the column, there are 4 different
categories of memory: HDD, SSD, Flash Storage, and Hybrid.
First, we clean the memory column and then make 4 new binary columns,
where each column contains 1 or 0 indicating whether that memory type
is present or not. Any laptop has either a single type of memory or a
combination of two, so the first column holds the first memory size, and
if a second slot is present in the laptop, the second column contains its
size; otherwise, we fill the null values with zero. After that, in each
type-specific column, we multiply the capacity values by the corresponding
binary flag: if a particular memory type is present, its flag is one and the
capacity is kept, and the same applies to the second slot. For a laptop
that does not have a second slot, the value is zero (zero multiplied by
anything is zero). When we then look at the correlation with price, Hybrid
and Flash Storage have very little or no correlation with price, so we drop
these columns along with the original CPU and Memory columns, which
are no longer required. A simplified sketch of this clean-up is shown below.
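
The sketch only derives the HDD and SSD columns; handling of values such as '1.0TB' and of the Hybrid and Flash Storage types is omitted, and the column names are assumptions.

# Normalise units and split a value like '128GB SSD + 1TB HDD' into two slots
data['Memory'] = (data['Memory'].astype(str)
                  .str.replace('GB', '', regex=False)
                  .str.replace('TB', '000', regex=False))
parts = data['Memory'].str.split('+', n=1, expand=True).reindex(columns=[0, 1])
first = parts[0].str.strip()
second = parts[1].fillna('0').str.strip()

for layer, name in [(first, 'first'), (second, 'second')]:
    data[name + '_HDD'] = layer.str.contains('HDD').astype(int)
    data[name + '_SSD'] = layer.str.contains('SSD').astype(int)
    data[name + '_cap'] = (layer.str.replace(r'\D', '', regex=True)
                           .replace('', '0').astype(int))

# Capacity per memory type = capacity of each slot times its presence flag
data['HDD'] = data['first_cap'] * data['first_HDD'] + data['second_cap'] * data['second_HDD']
data['SSD'] = data['first_cap'] * data['first_SSD'] + data['second_cap'] * data['second_SSD']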

9. GPU Variable- The GPU (Graphical Processing Unit) column has many


categories in the data. It tells us which brand of graphics card is in a
laptop, but not its capacity (e.g. 6 GB, 12 GB), so we simply extract
the brand name.
10. Operating System Column- There are many categories of operating
systems. We keep all Windows categories in one group, Mac in another,
and the rest in an "others" group. This is a simple and commonly used
feature engineering approach; you can try something else if you find
more correlation with price. When you plot price against the operating
system, Mac is, as usual, the most expensive. A sketch of this grouping
is shown below.
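
The exact category strings below are assumptions and depend on the values actually present in the OpSys column.

def cat_os(name):
    if name in ('Windows 10', 'Windows 7', 'Windows 10 S'):
        return 'Windows'
    elif name in ('macOS', 'Mac OS X'):
        return 'Mac'
    else:
        return 'Others/No OS/Linux'

data['os'] = data['OpSys'].apply(cat_os)
data.drop(columns=['OpSys'], inplace=True)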
 Log-Normal Transformation- We saw above that the distribution of the target
variable is right-skewed. Transforming it towards a normal distribution improves
the performance of the algorithms, so we take the log of the values, which
transforms the distribution towards normal. Therefore, while separating the
dependent and independent variables, we take the log of the price, and when
displaying the result we take its exponent.
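
In code, the transformation and the dependent/independent split amount to the following sketch:

import numpy as np

X = data.drop(columns=['Price'])   # independent variables (the engineered features)
y = np.log(data['Price'])          # dependent variable, log-transformed

# At display time the prediction is converted back with np.exp(prediction)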

 Machine Learning Modelling for Laptop Price Prediction- Now that we have


prepared our data and hold a better understanding of the dataset, let us get
started with machine learning modelling and find the best algorithm with the
best hyperparameters to achieve maximum accuracy.

Import Libraries
We import libraries to split the data, along with the algorithms you want to try.
We do not know in advance which is best, so you can try all the imported
algorithms.

Split into train and test sets- As discussed, we have taken the log of the
dependent variable, and the training data then looks like the data frame
described above.
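
The sketch below shows the split plus one candidate model wrapped in a pipeline, assuming the X and y defined earlier; the column indices passed to the encoder are assumptions and depend on the final column order of the engineered DataFrame.

from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=2)

# One-hot-encode the categorical columns (indices are illustrative) and pass
# the numeric columns through unchanged
step1 = ColumnTransformer(
    transformers=[('onehot', OneHotEncoder(drop='first'), [0, 1, 7, 10, 11])],
    remainder='passthrough')
step2 = LinearRegression()

pipe = Pipeline([('step1', step1), ('step2', step2)])
pipe.fit(X_train, y_train)

y_pred = pipe.predict(X_test)
print('R2 score :', r2_score(y_test, y_pred))
print('MAE      :', mean_absolute_error(y_test, y_pred))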

 Classification-

1. Linear Regression- Linear regression is one of the easiest and most


popular machine learning algorithms. It is a statistical method that is
used for predictive analysis. Linear regression makes predictions for
continuous/real or numeric variables such as sales, salary, age, product
price, etc. The linear regression algorithm models a linear relationship
between a dependent (y) variable and one or more independent (x)
variables, hence the name linear regression. Since linear regression
models a linear relationship, it finds how the value of the dependent
variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing
the relationship between the variables.

2. Ridge Regression- Ridge regression is a regularization technique


which is used to reduce the complexity of the model. It is also called
L2 regularization. In this technique, the cost function is altered by
adding a penalty term to it. The amount of bias added to the model is
called the ridge regression penalty.
3. KNN- The K-nearest neighbours (KNN) algorithm is a type of supervised
ML algorithm which can be used for both classification and
regression predictive problems. However, it is mainly used for
classification problems in industry. The following two
properties describe KNN well −

 Lazy learning algorithm − KNN is a lazy learning algorithm


because it does not have a specialized training phase and uses all
the data for training while classifying.

 Non-parametric learning algorithm − KNN is also a non-


parametric learning algorithm because it does not assume
anything about the underlying data.
4. Decision Tree- Decision Tree Analysis is a general, predictive
modelling tool that has applications spanning a number of different
areas. In general, decision trees are constructed via an algorithmic
approach that identifies ways to split a data set based on different
conditions. It is one of the most widely used and practical methods for
supervised learning. Decision Trees are a non-parametric supervised
learning method used for both classification and regression tasks. The
goal is to create a model that predicts the value of a target variable by
learning simple decision rules inferred from the data features.

The decision rules are generally in the form of if-then-else statements.


The deeper the tree, the more complex the rules and the more closely the model fits the data.
5. SVM- Support vector machines (SVMs) are supervised learning models,
with associated learning algorithms, that analyze data and recognize
patterns, and are used for classification and regression analysis. The basic
SVM takes a set of input data and predicts, for each given input, which of
two possible classes forms the output, making it a non-probabilistic binary
linear classifier. Given a set of training examples, each marked as belonging
to one of two categories, an SVM training algorithm builds a model that
assigns new examples to one category or the other. An SVM model is a
representation of the examples as points in space, mapped so that the
examples of the separate categories are divided by a clear gap that is as
wide as possible. New examples are then mapped into the same space and
predicted to belong to a category based on which side of the gap they fall
on. More formally, a support vector machine constructs a hyperplane or set
of hyperplanes in a high- or infinite-dimensional space, which can be used
for classification, regression, or other tasks. Intuitively, a good separation
is achieved by the hyperplane that has the largest distance to the nearest
training data points of any class (the so-called functional margin), since in
general the larger the margin, the lower the generalization error of the
classifier. An SVM takes a set of feature vectors as input and, after scaling,
selection and validation, produces a trained model as the output.
6. Random Forest- Random Forest is a popular machine learning
algorithm that belongs to the supervised learning technique. It can be
used for both classification and regression problems in ML. It is
based on the concept of ensemble learning, which is the process of
combining multiple classifiers to solve a complex problem and to
improve the performance of the model. As the name suggests,
"Random Forest is a classifier that contains a number of decision trees
on various subsets of the given dataset and takes the average to
improve the predictive accuracy of that dataset." Instead of relying on
one decision tree, the random forest takes the prediction from each tree
and, based on the majority vote of predictions, predicts the final
output. A greater number of trees in the forest leads to higher
accuracy and prevents the problem of overfitting.
7. Extra Trees- Extra Trees is an ensemble machine learning algorithm
that combines the predictions from many decision trees. It is related to
the widely used random forest algorithm
8. AdaBoost- AdaBoost, also called Adaptive Boosting, is a technique in
machine learning used as an ensemble method. The most common
algorithm used with AdaBoost is a decision tree with one level, i.e.
a decision tree with only one split. These trees are also called
decision stumps.
9. Gradient Boost- Gradient Boosting is a popular boosting algorithm. In
gradient boosting, each predictor corrects its predecessor’s error. In
contrast to Adaboost, the weights of the training instances are not
tweaked, instead, each predictor is trained using the residual errors of
predecessor as labels. There is a technique called the Gradient Boosted
Trees whose base learner is CART (Classification and Regression
Trees).
10. Voting Regressor- A voting regressor is an ensemble meta-estimator
that fits several base regressors, each on the whole dataset, and then
averages the individual predictions to form a final prediction. A sketch
is shown after this list.
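
As a concrete example of the last entry, the sketch below combines several of the models listed above in a voting regressor; the choice of base models and their parameters is illustrative only, and step1, X_train, y_train and y_test refer to the objects defined in the earlier modelling sketch.

from sklearn.ensemble import VotingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score

voting = VotingRegressor([
    ('rf', RandomForestRegressor(n_estimators=100, random_state=3)),
    ('gb', GradientBoostingRegressor(n_estimators=100)),
    ('ridge', Ridge(alpha=10)),
])

# Reuse the same column transformer and train/test split as before
pipe = Pipeline([('step1', step1), ('step2', voting)])
pipe.fit(X_train, y_train)
print('R2 score:', r2_score(y_test, pipe.predict(X_test)))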
 Create Web Application for Deployment of the Laptop Price Prediction
Model- Now we will use Streamlit to create a web app to predict laptop
prices. In the web application, we need to implement a form that takes from
the user all the inputs that we have used in the dataset, and by using the
dumped model we predict the output and display it to the user.

Streamlit- Streamlit is an open-source web framework written in Python. It is


the fastest way to create data apps and is widely used by data science
practitioners to deploy machine learning models. To work with it, it is not
necessary to know any frontend languages. Streamlit contains a wide variety
of functionality and built-in functions to meet your requirements: it provides
plots, maps, flowcharts, sliders, selection boxes, input fields, a concept of
caching, etc. Install Streamlit using the pip command below.
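
Assuming a standard Python environment, the command is:

pip install streamlit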
import streamlit as st
import pickle
import numpy as np

# import the model and the training data frame
pipe = pickle.load(open('pipe.pkl', 'rb'))
df = pickle.load(open('df.pkl', 'rb'))

st.title("Laptop Predictor")

# brand
company = st.selectbox('Brand', df['Company'].unique())

# type of laptop
type = st.selectbox('Type', df['TypeName'].unique())

# Ram
ram = st.selectbox('RAM(in GB)', [2, 4, 6, 8, 12, 16, 24, 32, 64])

# weight
weight = st.number_input('Weight of the Laptop')

# Touchscreen
touchscreen = st.selectbox('Touchscreen', ['No', 'Yes'])

# IPS
ips = st.selectbox('IPS', ['No', 'Yes'])

# screen size
screen_size = st.number_input('Screen Size')

# resolution
resolution = st.selectbox('Screen Resolution', ['1920x1080', '1366x768', '1600x900', '3840x2160', '3200x1800', '2880x1800', '2560x1600', '2560x1440', '2304x1440'])

# cpu
cpu = st.selectbox('CPU', df['Cpu brand'].unique())

hdd = st.selectbox('HDD(in GB)', [0, 128, 256, 512, 1024, 2048])

ssd = st.selectbox('SSD(in GB)', [0, 8, 128, 256, 512, 1024])

gpu = st.selectbox('GPU', df['Gpu brand'].unique())

os = st.selectbox('OS', df['os'].unique())

if st.button('Predict Price'):
    # encode the form inputs the same way as the training data
    if touchscreen == 'Yes':
        touchscreen = 1
    else:
        touchscreen = 0

    if ips == 'Yes':
        ips = 1
    else:
        ips = 0

    X_res = int(resolution.split('x')[0])
    Y_res = int(resolution.split('x')[1])
    ppi = ((X_res**2) + (Y_res**2))**0.5 / screen_size

    query = np.array([company, type, ram, weight, touchscreen, ips, ppi, cpu, hdd, ssd, gpu, os])
    query = query.reshape(1, 12)

    st.title("The predicted price of this configuration is " + str(int(np.exp(pipe.predict(query)[0]))))

Explanation – First we load the data frame and the model that we saved earlier.
After that, we create a form field for each input, based on the training data
columns, to take input from the user. For categorical columns, we provide the
field name as the first parameter and the selectable options, which are simply
the unique categories in the dataset, as the second. For numerical fields, we
give the user a widget to increase or decrease the value.
After that, we create the prediction button; whenever it is triggered, it
encodes the relevant variables, prepares a two-dimensional array of inputs, and
passes it to the model to get the prediction, which we display on the screen.
We take the exponential of the predicted output because we took the log of the
output variable during training.
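
The app itself is launched from the command line; assuming the script above is saved as app.py (the file name is hypothetical), the command would be:

streamlit run app.py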

Now when you run the app file using the above command, you will get two
URLs, and the web application will automatically open in your default
browser (or you can copy the URL and open it yourself). The application will
look something like the figure below.

4.2 Screenshots-
Input -
Result Prediction-
4.3 Programming Language Used for Implementation-
Jupyter Notebooks basically provide an interactive computational environment for
developing Python-based data science applications. They were formerly known as
IPython Notebooks. The following are some of the features of Jupyter Notebooks
that make it one of the best components of the Python ML ecosystem −

 Jupyter Notebooks can illustrate the analysis process step by step by


arranging code, images, text, output, etc. in a stepwise manner.
 They help a data scientist to document the thought process while developing
the analysis.
 One can also capture the results as part of the notebook.
 With the help of Jupyter Notebooks, we can also share our work with peers.

4.4 Tool Used-


 Languages: Python.
 Documentation: Microsoft Word, Google Docs.
 Software: Jupyter/Google Colab or PyCharm, Excel.
 Hardware: Intel Dual Core processor or newer; minimum 8 GB of RAM;
minimum 20 GB of hard disk space.
4.5 Testing Approach-
The testing of the system is done through a series of different tests to ensure the
correct working of the software and to measure its capabilities and limitations.
A brief explanation of the proposed types of tests to be conducted is given below:

1. Unit Testing-

 It concentrates on the effort required to verify the smallest unit of
software design, namely the software module.

 We use the component-level design description as a guide. Important
control paths are tested to uncover errors within the boundary of the
module.

 The unit test is white-box oriented, and the steps can be conducted in
parallel for multiple modules.

2. Integration Testing-

 Interfacing of different modules can be problematic.

 Data may be lost across an interface, one module may adversely affect
another, and imprecision that is individually acceptable may be magnified
to unacceptable levels when modules are combined.

 Integration testing is thus used both for building the program structure
and for revealing interface-related errors.

3. Stress Testing-
 Stress tests are designed to confront programs with abnormal
conditions.

 Stress testing forces a system to operate under unusual demands on
resources.

 The quantity, frequency or volume of those demands allows us to measure
the limits of the system.

4. Performance Testing –

 Performance testing is conducted through all the steps in the testing


process to test runtime performance of software within the context of
an integrated system.

5. Security Testing-

 This system manages potentially sensitive information. There may be
causes and actions that can harm individuals, making the system a
target for improper or illegal penetration.

 Security testing attempts to verify that the protection mechanisms built
into the system will, in fact, protect it from improper penetration.

 During security testing, the tester plays the role of the hacker who
desires to penetrate the system.

 Given enough time and resources, good security testing will ultimately
penetrate the system; the role of the system designer is to make the cost
of penetration greater than the value of the information that would be
obtained.
4.6 Testing Plan-
1. Unit Testing- Each module will be tested to check whether it gives the
desired output. For feature extraction, it will be ensured that features are
stored in the Excel sheet and do not result in NaN values. The
classification algorithm will be trained on the dataset and then tested
to get the accuracy of the classifier.

2. Integration Testing- All possible combinations of the individual
modules will be integrated and tested to ensure that they produce the
correct output. Each segmentation module will be combined with the
feature extraction module and then with the classification module for
testing.

Test Schedule-

Test Title                                                                    Date
Test the selected data set with the segmentation algorithm.                  June 2022
Test the selected data set with the feature extraction algorithm.            June 2022
Test the selected data set with the classification algorithm.                June 2022
Test the selected data set with all combinations of segmentation,
feature extraction and classification algorithms.                            July 2022

Unit Test Cases-

Test Case                      Average Accuracy
Linear Regression              80%
Ridge Regression               81%
KNN                            80%
Decision Tree                  84%
SVM                            80%
Random Forest                  88%
Extra Trees                    87%
AdaBoost                       79%
Gradient Boost                 88%

Integration Testing Cases-

Test Case                      Average Accuracy
Voting Regressor               89%
Stacking                       88%



5. CONCLUSION AND FUTURE SCOPE

This chapter discusses the lessons learned and the knowledge gained after the completion of our
project and the possible future scope of our project.

5.1 Conclusion-
Predicting prices through the application of machine learning using the
Decision Tree algorithm makes it easy for students to determine the
laptop specifications that are most desirable to them, meeting their needs
and matching their purchasing power. Students no longer need to search
various sources to find the laptop specifications they require, because the
machine learning application provides the most desirable specifications
together with their laptop prices.

Price optimization using artificial intelligence and machine learning will
help you succeed in an unpredictable market. The practice saves both time
and money, because the software can automate most pricing analytics
tasks, leaving you with little more to do than watch your sales grow. But
that is not to say you will not have the freedom to test new hypotheses.
Automated price prediction is here to help you see what works and what
does not, because pricing is a delicate matter, and flexibility and real-world
trials are still key. That said, predictions make the process easier, and they
will help you get your pricing on point.

5.2 Future Scope –

Encouraged by these results, future work will involve improving the


classification result and the overall accuracy. The number of output classes
can also be increased if more data is available; with a more extensive and
diverse dataset, the overall classification accuracy can be increased
dramatically. Another approach to improve the result would be to increase
the number of hidden layers of a neural network: by increasing the number
of hidden layers, the weights will be better adjusted, thus improving the
classification. One can also use fine-tuning and transfer learning approaches
to better tune the model on the basis of already trained models.
References-
[1]. Sorower MS. A literature survey on algorithms for multi-label learning.
Oregon State University, Corvallis. 2010 Dec;18.
[2]. Pandey M, Sharma VK. A decision tree algorithm pertaining to the student
performance analysis and prediction. International Journal of Computer
Applications. 2013 Jan 1;61(13).
[3]. Priyama A, Abhijeeta RG, Ratheeb A, Srivastavab S. Comparative analysis of
decision tree classification algorithms. International Journal of Current
Engineering and Technology. 2013 Jun;3(2):334-7
[4]. Streamlit.io, Kaggle.com, Wikipedia.com
[5]. Ho, T. K. (1995, August). Random decision forests. In Document analysis and
recognition, 1995., proceedings of the third international conference on (Vol. 1, pp.
278-282).
[6]. Weka 3 - Data Mining with Open Source Machine Learning Software in Java.
(n.d.), Retrieved from: https://www.cs.waikato.ac.nz/ml/weka/. [August 04, 2018].
[7]. Noor, K., & Jan, S. (2017). Vehicle Price Prediction System using Machine
Learning Techniques. International Journal of Computer Applications, 167(9), 27-
31.
[8]. Pudaruth, S. (2014). Predicting the price of used cars using machine learning
techniques. Int. J. Inf. Comput. Technol, 4(7), 753-764.
[9]. Listiani, M. (2009). Support vector regression analysis for price prediction in a
car leasing application (Doctoral dissertation, Master thesis, TU Hamburg-
Harburg).
[10]. Agencija za statistiku BiH. (n.d.), retrieved from: http://www.bhas.ba .
[accessed July 18, 2018.]
[11]. Utku A, Hacer (Uke) Karacan, Yildiz O, Akcayol MA. Implementation of a
New Recommendation System Based on Decision Tree Using Implicit Relevance
Feedback. JSW. 2015 Dec.

BIBLIOGRAPHY-
1. https://www.python.org
2. https://www.kaggle.com
3. https://github.com
4. GeeksforGeeks
5. https://www.upgrad.com/
6. https://www.researchgate.net/
