A
Major Project Report
on
“LAPTOP PRICE PREDICTOR”
Master of Computer Application
(Session: 2021-22)
ACKNOWLEDGEMENT
At the very outset we express our gratitude to the almighty Lord for showering his grace
and blessings upon us to complete this project.
Although our names appear on the cover of this report, many people contributed in
one form or another to this project's development. We could not have completed this project
without the assistance and support of each of the following, and we thank you all.
I wish to place on record my deep sense of gratitude to my project guide and
project in-charge, Prof. A. K. Saxena (H.O.D. of CSIT Dept., GGU), for his constant
motivation and valuable help throughout the project work, and for his valuable suggestions
and advice throughout the course. We also extend our thanks to the other faculty members
for their cooperation during our course.
Finally, I would like to thank our friends for their cooperation in completing this project.
Ajit Tiwari
(Signature of the Candidate) Roll no. 20606005
Enroll no. GGV/20/05005
DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY
GURU GHASIDAS CENTRAL UNIVERSITY, BILASPUR
(A Central University Established by the Central Universities Act, 2009 No.25 of 2009)
ABSTRACT
When an item disappears from the market and has to be replaced, the difference in
quality between the disappearing product and the new one must be taken into
account in the consumer price index, in order to measure comparable prices.
Hedonic regressions can be used to estimate this difference, using product
characteristics as explanatory variables for the price. However, the quality of the
models can be insufficient due to the small size of samples. This paper explores the
use of web scraping to gather larger volumes of information on prices and
characteristics, in particular for electronic goods. Traditional hedonic regressions
are compared with other predictive methods, including machine learning
algorithms, in terms of predictive power.
This paper presents a laptop price prediction system built with a supervised
machine learning technique. It uses multiple linear regression as the prediction
method, which achieved 81% prediction precision. In multiple linear regression
there are several independent variables but one and only one dependent variable,
whose actual and predicted values are compared to measure the precision of the
results. The proposed system treats price as the dependent variable to be predicted,
derived from factors such as the laptop's model, RAM, storage (HDD/SSD), GPU,
CPU, IPS display, and touch screen.
2. LITERATURE SURVEY
2.1 Data Set
2.2 Pre – Processing and Enhancement
2.3 Feature Engineering
2.4 Classification
3. PROJECT DESIGN
3.1 Feasibility Analysis
3.2 Feasibility Studies
3.3 Life cycle Model
3.4 Project Cost and Time Estimation
3.5 Software Architecture Diagram
3.6 Architectural Style and Justification
3.7 Flow Chart
3.8 Hardware and Software Platform Requirements
3.9 Software Design/Diagram Document
3.10 Software Description
4. PROJECT IMPLEMENTATION
4.1 Methodology
4.2 Screenshots
4.3 Programming Language Used for Implementation
4.4 Tool Used
4.5 Testing Approach
4.6 Testing Plan
This chapter discusses the main concepts the project is based on and identifies what the project is
actually meant to accomplish.
Laptop price prediction, especially when a laptop comes straight from the factory
to electronic markets and stores, is both a critical and important task. The mad
rush that we saw in 2020 for laptops to support remote work and learning is no
longer there. In India, demand for laptops soared after the nationwide lockdown,
leading to 4.1 million unit shipments in the June quarter of 2021, the highest in
five years. Accurate laptop price prediction requires expert knowledge, because
price usually depends on many distinctive features and factors; typically the most
significant ones are brand and model, RAM, storage, GPU, and CPU. In this paper,
we applied different methods and techniques in order to achieve higher precision
in used laptop price prediction.
Predicting the price of laptops has been studied extensively in prior work.
Listen discussed, in her Master's thesis, that a regression model built using
Decision Tree and Random Forest regressors can predict the price of a leased
laptop with better precision than multivariate regression or simple multiple
regression. This is on the grounds that the decision tree algorithm deals better
with datasets of higher dimensionality and is less prone to overfitting and
underfitting. The weakness of this research is that the improvement of the more
advanced decision tree regression over simple regression was not shown in basic
indicators such as mean, variance, or standard deviation.
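The contrast drawn above can be illustrated with a small sketch. This is synthetic data, not the thesis author's dataset, and it assumes scikit-learn is available; it only shows why a tree-based regressor can fit a non-linear pricing rule that a plain linear model cannot.

```python
# Illustrative only: a decision tree vs. linear regression on a synthetic,
# non-linear "price" rule (made-up features, not real laptop data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))                   # three fake spec features
y = 500 + 80 * X[:, 0] ** 2 + 40 * X[:, 1] * X[:, 2]    # non-linear price rule

linear = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# The tree captures the squared and interaction terms the linear model misses
# (a high training score alone can, of course, also indicate overfitting).
print(round(linear.score(X, y), 3), round(tree.score(X, y), 3))
```

On data like this the tree's training R² is far higher than the linear model's, which mirrors the claim that trees handle higher-dimensional, non-linear structure better; it does not by itself demonstrate better generalization.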
1.1 Scope of work-
When shopping for a new laptop, a consumer may look for certain specifications
and features on a budget. College students, who generally are financially
constrained, have a limited budget for high-end laptops. There are several factors
influencing the price of a laptop; usually higher specifications and more features
mean more money. The purpose of this paper is to identify the most significant
factors driving laptop prices by developing a regression model to forecast
those prices. Using our regression model and analysis, one may be able to
identify the correct price of a laptop instead of performing a competitive
analysis. A consumer who lacks knowledge about laptops may find our model
useful. Our model may be especially helpful to consumers with a limited
budget, such as students, since they can predict the price of a laptop
given the features and specifications that they want.
We will build a project for laptop price prediction. The problem statement is that
if a user wants to buy a laptop, our application should be able to provide a
tentative price according to the user's configuration. Although it looks like a
simple project, the dataset we have is noisy and needs a lot of feature
engineering and preprocessing, which is what makes developing this project
interesting.
Now, let's deep-dive into how price prediction with machine learning works.
Machine learning models use both technical and fundamental analysis in the price
forecasting process. Technical analysis looks at historical prices, economic growth
rates, and other related factors, formulating an approximate price. Then, to get a
more accurate picture of the market, the process turns to fundamental analysis.
This step looks at various external and internal factors, including macro-factors
like the season and micro-influencers like the time of day, trying to figure out
when a consumer is most likely to buy. In mathematical terms, these processes are
known as regression analysis, a statistical way to model the relationship
between variables (one dependent variable and one or more independent
variables). In price prediction, price is the dependent variable, and it is affected by
several independent variables. Suppose we were trying to price a pizza: the answer
would depend on the size of the pizza and the cost of its ingredients. Beyond
regression, price prediction uses descriptive and predictive analytics, but these are
just other names for the discrete steps of regression analysis.
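The pizza illustration above can be written out as a tiny multiple regression. All numbers are made up, and the sketch uses NumPy's least-squares solver directly rather than any particular library from this report.

```python
# Toy regression: pizza price (dependent) predicted from size and
# ingredient cost (independent variables). All values are illustrative.
import numpy as np

sizes = np.array([8.0, 10.0, 12.0, 14.0, 16.0])          # inches
ingredient_cost = np.array([2.0, 2.5, 3.0, 4.0, 5.0])    # dollars
prices = np.array([6.0, 7.5, 9.0, 11.5, 14.0])           # dollars

# Design matrix with an intercept column: price ~ b0 + b1*size + b2*cost
X = np.column_stack([np.ones_like(sizes), sizes, ingredient_cost])
coef, *_ = np.linalg.lstsq(X, prices, rcond=None)

predicted = X @ coef
print(np.round(coef, 3), np.round(predicted, 2))
```

Here the fitted coefficients recover the rule hidden in the toy data (price = 0.25 × size + 2 × ingredient cost), which is exactly the relationship a multiple linear regression model estimates from real laptop features.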
This chapter discusses the papers referenced in preparation for undertaking this project. These papers
serve as a benchmark to enable this project to be undertaken.
In this section, we relabel and convert categorical features into numerical
features. This is essential for training our ML models, as they only accept
numerical values as inputs. Starting off, we identify features that are non-numerical
(object type) and compute their cardinalities (the number of categories present in
each feature). Knowing that the Touchscreen feature has only 2 categories, we can
use label encoding to encode it (one-hot encoding could be used too). Using
scikit-learn's label encoding function, the values present in Touchscreen ('No',
'Yes') are encoded into 0s and 1s. Label encoding also handles features with
high cardinalities; applying label encoding to the CPU feature, the label-encoded
values (associated with their pre-encoded categories) are recorded for prediction
purposes later. Other features with slightly lower cardinality were encoded via
one-hot encoding: through pandas' .get_dummies() method, a new column is
created to indicate the presence of each categorical value. After applying
one-hot encoding to the TypeName and OpSys features, we use manual
encoding to deal with high-cardinality features whose categories have a known
order. We can use Python's dictionary and mapping methods to encode each
category based on its magnitude/order; for example, the ScreenResolution
feature can be encoded based on pixel count.
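The three encoding strategies described above can be sketched on a tiny hypothetical frame. The column names follow the dataset used in this report, but the rows and the resolution ranking are illustrative only.

```python
# Sketch of label encoding, one-hot encoding, and manual ordinal encoding.
import pandas as pd

df = pd.DataFrame({
    'Touchscreen': ['No', 'Yes', 'No'],
    'TypeName': ['Notebook', 'Gaming', 'Notebook'],
    'ScreenResolution': ['1366x768', '1920x1080', '3840x2160'],
})

# 1. Label encoding for a binary feature (No -> 0, Yes -> 1).
df['Touchscreen'] = df['Touchscreen'].map({'No': 0, 'Yes': 1})

# 2. One-hot encoding for a low-cardinality categorical feature.
df = pd.get_dummies(df, columns=['TypeName'])

# 3. Manual ordinal encoding via a dictionary, ordered by pixel count.
resolution_rank = {'1366x768': 0, '1920x1080': 1, '3840x2160': 2}
df['ScreenResolution'] = df['ScreenResolution'].map(resolution_rank)
print(df)
```

The same `.map()` idiom used for Touchscreen handles the manual ordering case; only the dictionary changes.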
We now extract and reorganize our data to better understand the underlying
factors that contribute to the price of laptops.
If we take a look at the ScreenResolution column, some laptops have
touchscreen capabilities. Since touchscreen laptops are known to be more
expensive than those without, a Touchscreen feature is added to mark
laptops with such capabilities. We then extract and replace the screen
resolution column with the respective pixel counts using regular expressions; I
find regular expressions incredibly useful when it comes to extracting and
filtering alphanumeric values. We apply the same process to engineer the
CPU, Ram, and Weight features: the goal is to remove any units and
words that are not essential for later analysis. Now comes the most tiring part of
feature engineering: dealing with the Memory feature. Upon closer inspection, the
Memory column contains various types of memory (SSD, HDD, SSHD, and Flash
Storage). We need to create 4 additional columns representing the different
memory types and extract their capacities individually. (Additional
processing is needed for laptops with a double memory configuration that
uses the same memory type, e.g. 256GB SSD + 512GB SSD.) This can be
done using a process similar to the one shown above.
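The Memory-column split described above can be sketched as follows. This is an assumed implementation, not the report's exact code; the sample strings mimic the dataset's format, including the double-configuration case.

```python
# Hypothetical sketch: split the Memory column into per-type capacities (GB),
# summing capacities when a configuration repeats a type ('256GB SSD + 512GB SSD').
import re
import pandas as pd

df = pd.DataFrame({'Memory': ['256GB SSD', '1TB HDD',
                              '256GB SSD + 512GB SSD', '128GB SSD + 1TB HDD']})

def capacity_gb(memory, kind):
    total = 0
    for part in memory.split('+'):          # handle double configurations
        if kind in part:
            m = re.search(r'(\d+(?:\.\d+)?)(TB|GB)', part)
            if m:
                value = float(m.group(1))
                total += int(value * 1024) if m.group(2) == 'TB' else int(value)
    return total

for kind in ['HDD', 'SSD', 'SSHD', 'Flash Storage']:
    df[kind] = df['Memory'].apply(lambda s, k=kind: capacity_gb(s, k))
print(df)
```

Capacities in TB are converted to GB so all four new columns share one unit.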
It is good that there are no NULL values. We need small changes in the Weight and
Ram columns to convert them to numeric types by removing the unit written after
each value, so we perform data cleaning here to get the correct column types.
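The unit-stripping step above amounts to two string replacements and casts; a minimal sketch, with sample rows invented for illustration:

```python
# Strip the units from Ram ('GB') and Weight ('kg') and cast to numeric types.
import pandas as pd

df = pd.DataFrame({'Ram': ['8GB', '16GB', '4GB'],
                   'Weight': ['1.37kg', '2.5kg', '1.86kg']})

df['Ram'] = df['Ram'].str.replace('GB', '', regex=False).astype('int32')
df['Weight'] = df['Weight'].str.replace('kg', '', regex=False).astype('float32')
print(df.dtypes)
```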
2.4 Classification-
In classification tasks, there are often several candidate feature extraction methods
available. The most suitable method can be chosen by training neural networks to
perform the required classification task using different input features (derived
using different methods). The error in the network's response to test
examples provides an indication of the suitability of the corresponding input
features (and thus of the method used to derive them) to the considered
classification task. The following classification algorithms have been implemented:
Linear Models
o Logistic Regression
o Support Vector Machines
Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
3. PROJECT DESIGN
This chapter contains a fully developed Software Project Management Plan for the project. The plan
highlights the deliverables roles tasks and schedule for the project
Our study deals with the segmentation and classification needed for automated
laptop price prediction. A laptop, laptop computer, or notebook computer is a small,
portable personal computer (PC) with a screen and alphanumeric keyboard. These
typically have a clamshell form factor, with the screen mounted on the inside of
the upper lid and the keyboard on the inside of the lower lid, although 2-in-1 PCs
with a detachable keyboard are often marketed as laptops or as having a laptop
mode. Laptops are folded shut for transportation and are thus suitable for mobile
use. The name comes from the lap, as the device was deemed practical to place on
a person's lap when in use. Today, laptops are used in a variety of settings, such
as at work, in education, for playing games, web browsing, personal
multimedia, and general home computer use.
Economic feasibility: whether the firm can afford to build the software, and
whether its benefits will substantially exceed its cost. Our project is
economically feasible. Our system uses the academic version of Jupyter,
which was very economical since it can be viewed as a one-time
investment.
Technical feasibility: whether the technology needed for the system exists, and
how difficult it is to build. Our project is a technically versatile system that
can work on most platforms, making it technically feasible to build with
only a few requirements. The software used for the project implementation is
Jupyter; basic technical knowledge of operating Jupyter, along with the
classification toolbox, is required of the developers.
Schedule feasibility: how much time is available to build the new system, and
when it can be built. The project was built entirely from scratch to completion
in a span of eight to nine months.
Ecological Feasibility: Whether the system has an impact on its
environment. There are no adverse effects on the environment.
Operational feasibility: The system is easy to use and user-friendly. All
maintenance issues will be handled efficiently. System is adaptable to most
environments. Hence our system is operationally feasible.
A feasibility study will help you determine the specific factors that will affect your
project before committing resources, time, or budget. So while it’s tempting to
brush it aside as another exercise delaying getting to work, remember that it’s
easier to address issues before you jump in than it is after. Say, for example, you’re
launching a new app. You’ll want to know if you physically have the resources and
technology needed to produce it, as well as whether or not it’ll give you an
acceptable return on investment (ROI). If you proceed without conducting a full
analysis, you’re opening yourself up to unnecessary risk. A feasibility study
mitigates that risk.
What are the benefits of a feasibility study?
It’s flexible and scalable, which means it can be applied to any kind of
project – whether that’s a software development project, a new product
launch, or a new team process. Although the bigger the project, the more
important it becomes because the investment stakes are that much higher.
It helps you avoid project failure through logical assessment.
It gives stakeholders a clearer picture of the project, which, in turn, helps
improve focus and commitment.
Comparing and analyzing the different options helps you narrow business
alternatives while helping simplify the decision-making process.
It outlines a valid reason for your project to exist.
Evaluating multiple options enhances your project’s success rate.
This model is used only when the requirements are very well known, clear, and
fixed:
Product definition is stable.
Technology is understood.
There are no ambiguous requirements.
Ample resources with the required expertise are freely available.
Functionality 4: Verification-
Functionality 5: Maintenance-
Project deliverables-
The diagram (Figure 2) shows the various stages in the development of our system.
The diagram shows interaction between the various components of the application
and their position in the development hierarchy. This style hence, is appropriate for
the selected problem because all the modules in the selected problem function
independently. The communication is strictly through message passing connectors.
The flow of the system is from the left to right.
3.7 Flow Chart-
Start → Segments → Data Cleaning → Classification → Input value → Predict Price → Stop
3.8 Hardware and Software Platform Requirements-
Hardware Requirements-
CPU configuration
Software Requirements-
Languages : Python 3
Editor : Python IDE
Dataset : MS Excel
Operating System : Windows XP/7/8/10/11
Hardware Used-
o Processor : Intel i3
o RAM : 4 GB
o Hard Disk : 1 TB
o Monitor : Laptop display
Software Used-
Languages : Python 3
Editor : Jupyter / Google Colab and PyCharm
Dataset : MS Excel
Operating System : Windows 11
3.9 Software Design/Diagram Document-
The use case diagram consists of two actors, who interact with the software.
The User: the user provides the laptop configuration as input and sees the final output.
The System: the system performs all clustering, feature extraction,
classification, and training algorithms.
4. Flow of design and analysis-
Input → Missing Data → Test → Result → Prediction → Services
3.10 Software And Library Description-
This chapter discusses the implementation of the project – the various algorithms, testing approaches
and the results.
4.1 Methodology-
We have implemented nine algorithms in this project; a detailed explanation and
the various outputs are shown below. To support the application of machine
learning using the decision tree algorithm, sample data is of course needed. The
table below contains data about various laptops and their prices depending on
their configuration. The sample data was obtained from Kaggle.com.
Dataset used for analysis- The key to success in the field of machine
learning, or to becoming a great data scientist, is to practice with different
types of datasets. But discovering a suitable dataset for each kind of machine
learning project is a difficult task, so in this topic we will detail the sources
from which you can easily get a dataset suited to your project. After loading
the dataset via pandas, we can see a list of laptops and the specs that are
associated with each laptop.
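Loading the Kaggle dataset with pandas is a one-liner. The filename `laptop_data.csv` is an assumption (use whatever your download is called); the sketch below substitutes a two-row in-memory CSV so it is self-contained.

```python
# Stand-in for: data = pd.read_csv('laptop_data.csv')  -- filename assumed.
import io
import pandas as pd

csv_text = """Company,TypeName,Inches,Ram,Price
Apple,Ultrabook,13.3,8GB,71378
HP,Notebook,15.6,8GB,47895
"""
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape, list(data.columns))
```

`data.head()` then shows the first laptops with their specs, as described above.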
3. Types of laptops- Which type of laptop are you looking for: a
gaming laptop, workstation, or notebook? Most people prefer
notebooks because they fall within a budget range, and the same can
be concluded from our data.
4. Does the price vary with laptop size in inches?- A scatter plot is
used when both columns are numerical, and it answers this
question well. From the plot we can conclude that there is a
relationship, but not a strong one, between the price and size
columns.
5. Screen Resolution- The screen resolution column contains lots of
information, so before any analysis we first need to perform feature
engineering on it. If you observe the unique values of the column,
every value gives information about the presence of an IPS panel,
whether the laptop has a touch screen, and the X-axis and Y-axis
screen resolution. So we will extract the column into 3 new columns
in the dataset.
import seaborn as sns  # plotting library used throughout
sns.countplot(x=data['Touchscreen'])
sns.barplot(x=data['Touchscreen'], y=data['Price'])
If we plot the touch screen column against price, laptops with
touch screens are more expensive, which matches real life.
Extract IPS panel presence information-
This is a binary variable and the code is the same as used above.
Laptops with an IPS panel are less common in our data, but observing
the relationship against price, IPS panel laptops are priced higher.
sns.barplot(x=data['Ips'], y=data['Price'])
Now both dimensions are present at the end of the string, separated
by a cross sign. So first we split the string on spaces and take
the last token, then split that token on the cross sign and take the
zeroth and first indices for the X- and Y-axis dimensions.
def findXresolution(s):
    return s.split()[-1].split("x")[0]

def findYresolution(s):
    return s.split()[-1].split("x")[1]

data['X_res'] = data['ScreenResolution'].apply(findXresolution)
data['Y_res'] = data['ScreenResolution'].apply(findYresolution)

# convert to numeric
data['X_res'] = data['X_res'].astype('int')
data['Y_res'] = data['Y_res'].astype('int')
If you compute the correlation of the columns with price using the corr
method, you can see that Inches does not have a strong correlation, but the
X- and Y-axis resolutions have a very strong one, so we can take
advantage of this and combine the three columns into a single column
known as pixels per inch (PPI). In the end, our goal is to improve
performance by having fewer features.
data['ppi'] = (((data['X_res']**2) + (data['Y_res']**2))**0.5 / data['Inches']).astype('float')
data.corr()['Price'].sort_values(ascending=False)
Now when you look at the correlation with price, PPI has a strong
correlation.
6. CPU column- If you observe the CPU column, it also contains
lots of information. Using the unique or value_counts function on it,
we find 118 different categories. The information it gives is about the
processor in the laptop and its speed. We can again use a bar plot to
answer this question, and as expected the price of i7 processors is
highest, then i5, while i3 and AMD processors lie in almost the same
range. Hence price depends on the processor.
7. Price with Ram- Again, a bivariate analysis of price with Ram. If you
observe the plot, price has a very strong positive correlation
with Ram; you could call it a linear relationship.
When you plot price against operating system, as usual Mac is the
most expensive.
Log-normal transformation- We saw above that the distribution of the target
variable is right-skewed. Transforming it toward a normal distribution will
improve the algorithm's performance, so we take the log of the values, which
transforms them toward a normal distribution, as you can observe below.
Thus, while separating the dependent and independent variables we take the
log of price, and when displaying results we take its exponent.
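The transform-and-invert step described above is just np.log on the target and np.exp on the predictions; a minimal sketch with illustrative right-skewed prices:

```python
# Log-transform a right-skewed target; exponentiate to recover prices.
import numpy as np

prices = np.array([20000.0, 35000.0, 50000.0, 90000.0, 250000.0])  # right-skewed
y = np.log(prices)            # the model is trained on log(price)
recovered = np.exp(y)         # predictions are exponentiated for display

print(np.round(y, 3))
```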
Import libraries- We import the libraries to split the data, along with the
algorithms to try. Since we do not know in advance which is best, you can
try all the imported algorithms.
Split into train and test sets- As discussed, we take the log of the
dependent variable, and the training data then looks like the data
frame below.
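The split-and-fit step can be sketched as follows. The feature matrix here is synthetic (the report's actual pipeline uses the engineered laptop features), and scikit-learn is assumed available.

```python
# Minimal sketch: split into train/test, fit multiple linear regression
# on log(price), and score on the held-out set. Data is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(100, 4))                   # stand-in feature matrix
price = 20000 + 60000 * X[:, 0] + 30000 * X[:, 1] + rng.normal(0, 2000, 100)
y = np.log(price)                                      # log of the target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=2)

model = LinearRegression().fit(X_train, y_train)
print(round(model.score(X_test, y_test), 3))           # R^2 on held-out data
```

The held-out R² plays the role of the "prediction precision" reported for the real model; predictions from `model.predict` must be passed through `np.exp` before being shown as prices.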
Building the prediction web app-
import streamlit as st
import numpy as np

# df (training data frame) and pipe (fitted pipeline) are assumed to be
# loaded beforehand, e.g. from pickle files.

st.title("Laptop Predictor")
# brand
company = st.selectbox('Brand', df['Company'].unique())
# type of laptop
type = st.selectbox('Type', df['TypeName'].unique())
# Ram
ram = st.selectbox('RAM(in GB)', [2, 4, 6, 8, 12, 16, 24, 32, 64])
# weight
weight = st.number_input('Weight of the Laptop')
# Touchscreen
touchscreen = st.selectbox('Touchscreen', ['No', 'Yes'])
# IPS
ips = st.selectbox('IPS', ['No', 'Yes'])
# screen size
screen_size = st.number_input('Screen Size')
# resolution
resolution = st.selectbox('Screen Resolution',
    ['1920x1080', '1366x768', '1600x900', '3840x2160', '3200x1800',
     '2880x1800', '2560x1600', '2560x1440', '2304x1440'])
# cpu
cpu = st.selectbox('CPU', df['Cpu brand'].unique())
# hdd / ssd / gpu inputs -- used in the query below but missing from the
# original snippet; restored here following the same pattern (the column
# name 'Gpu brand' is an assumption)
hdd = st.selectbox('HDD(in GB)', [0, 128, 256, 512, 1024, 2048])
ssd = st.selectbox('SSD(in GB)', [0, 8, 128, 256, 512, 1024])
gpu = st.selectbox('GPU', df['Gpu brand'].unique())
os = st.selectbox('OS', df['os'].unique())

if st.button('Predict Price'):
    # encode the binary inputs and compute PPI for the query
    touchscreen = 1 if touchscreen == 'Yes' else 0
    ips = 1 if ips == 'Yes' else 0
    X_res = int(resolution.split('x')[0])
    Y_res = int(resolution.split('x')[1])
    ppi = ((X_res**2) + (Y_res**2))**0.5 / screen_size
    query = np.array([company, type, ram, weight, touchscreen, ips, ppi,
                      cpu, hdd, ssd, gpu, os])
    query = query.reshape(1, 12)
    st.title("The predicted price of this configuration is " +
             str(int(np.exp(pipe.predict(query)[0]))))
Explanation- First we load the data frame and the model that we saved.
After that, we create a form field for each feature, based on the training data
columns, to take input from users. For categorical columns, we provide the
field name as the first parameter and the select options, which are simply the
unique categories in the dataset, as the second. For numerical fields, we let
users increase or decrease the value.
After that, we create the prediction button; whenever it is triggered, it
encodes the variables, prepares a two-dimensional array of inputs, and
passes it to the model to get the prediction that we display on the screen. We
take the exponential of the predicted output because we took the log of the
output variable.
Now when you run the app file with the streamlit run command, you will get
two URLs, and the web application will automatically open in your default
browser (or you can copy a URL and open it). The application will look like
the figure below.
4.2 Screenshots-
Input -
Result Prediction-
4.3 Programming Language Used for Implementation-
Jupyter notebooks basically provides an interactive computational environment for
developing Python based Data Science applications. They are formerly known as
python notebooks. The following are some of the features of Jupyter notebooks
that makes it one of the best components of Python ML ecosystem −
1. Unit Testing-
Unit testing is white-box oriented, and the steps can be conducted in
parallel for multiple modules.
2. Integration testing-
Data loss may occur at an interface, one module may affect another,
and imprecision that is acceptable in modules individually may be
magnified when they are combined.
Integration testing is thus used for building the program structure and
for revealing interface-related errors.
3. Stress testing-
Stress tests are conducted to confront programs with abnormal
conditions.
4. Performance Testing-
5. Security Testing-
This system manages sensitive information. There may be causes and
actions that can harm individuals, making the system a target for
improper or illegal penetration.
During security testing, the tester plays the role of a hacker who
desires to penetrate the system.
Test Schedule-
Test the selected data set with the segmentation algorithm. June 2022
Test the selected data set with the feature extraction algorithm. June 2022
Test the selected data set with the classification algorithm. June 2022
This chapter discusses the lessons learned and the knowledge gained after the completion of our
project and the possible future scope of our project.
5.1 Conclusion-
Predicting something through the application of machine learning using the
Decision Tree algorithm makes it easy for students, especially in
determining the choice of laptop specifications that are most desirable for
students to meet student needs and in accordance with the purchasing power
of students. Students no longer need to look for various sources to find
laptop specifications that are needed by students in meeting the needs of
students, because the laptop specifications from the results of the machine
learning application have provided the most desirable specifications with
their prices of laptops.
BIBLIOGRAPHY-
1. https://www.python.org
2. https://www.kaggle.com
3. https://github.com
4. https://www.geeksforgeeks.org
5. https://www.upgrad.com/
6. https://www.researchgate.net/