
2023 International Conference on Computer Communication and Informatics (ICCCI), Jan 23-25, 2023,

Coimbatore, India

Automated E-Commerce Price Comparison Website using PHP, XAMPP, MongoDB, Django, and Web Scrapping
2023 International Conference on Computer Communication and Informatics (ICCCI) | 979-8-3503-4821-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCCI56745.2023.10128573

Nagaraj P
Department of Computer Science and Engineering
Kalasalingam Academy of Research and Education
Krishnankoil, Virudhunagar, India
nagaraj.p@klu.ac.in

Muneeswaran V
Department of Electronics and Communication Engineering
Kalasalingam Academy of Research and Education
Krishnankoil, Virudhunagar, India
munees.klu@gmail.com

A V S R Pavan Naidu
Department of Computer Science and Engineering
Kalasalingam Academy of Research and Education
Krishnankoil, Virudhunagar, India
9920004672@klu.ac.in

N Shanmukh
Department of Computer Science and Engineering
Kalasalingam Academy of Research and Education
Krishnankoil, Virudhunagar, India
9920004626@klu.ac.in

P Vinod Kumar
Department of Computer Science and Engineering
Kalasalingam Academy of Research and Education
Krishnankoil, Virudhunagar, India
9920004709@klu.ac.in

G Sri Satyanarayana
Department of Computer Science and Engineering
Kalasalingam Academy of Research and Education
Krishnankoil, Virudhunagar, India
992004648@klu.ac.in

Abstract—

Background:
Every online user wants a detailed review of the product they buy or are interested in. Buyers research various websites on the internet to get the best offers on their desired products.
Purpose:
We need a model that can give us the details of products from different websites.
Method:
In this paper, we use PHP, XAMPP, and MongoDB to build a website that retrieves the prices of any desired product from Amazon and Flipkart.
Result:
Experimental evaluations showed that the proposed website shows the prices of the desired product from Amazon and Flipkart with an accuracy of 96%.
Keywords—Price Comparison, Products, Amazon, Flipkart, Web Scraping, Data Extraction, XAMPP, PHP, MariaDB, MongoDB, Django Web Framework.

I. INTRODUCTION
Nowadays, every online user looks for the best deals before buying their desired product. One of the main factors that lead to the purchase of a product is its price. Buyers tend to compare prices before buying a product. Since it is not easy to search for the product on every website, there is a need for a solution that automates this whole process [1].
A web scraper is an automated program that retrieves information from various websites. This increases web traffic by up to 55% because of security issues. Price scraping or price comparison is a method that compares the prices of a product from different websites [2]. Considering that the normal technique for price comparison accounts for 21% of e-commerce traffic, we decided to implement some techniques. This website works dynamically, so there are no cost adjustment issues. The model achieves an overall accuracy of 78% for price comparison. Using 100 categories from Amazon and Flipkart, we obtained a precision of 96%.
This website helps to compare the prices from different e-commerce websites. It is very useful for online shoppers who shop frequently and want to check the prices of various online stores in one place [3]. The system gives the prices of a product from various sellers to show where to buy it at an affordable price. Two classes of web pages are analyzed to get the price details. To get the required price details of the desired product, the system visits the website based on the user's search and downloads the search page. Once the prices from the websites are loaded, they are displayed on the website in the form of a price comparison.
E-commerce applications have different parts, for example, a database server, a web application server, and the PGI for online transactions. The internet has changed the way many people and organizations think and work [4].
For this study, a smart product search system was developed that enables PCSs to help novice shoppers specifically by accommodating user-defined price ranges. Here, a "novice shopper" is defined as an online shopper who is interested in a specific product category and wishes to make a purchase within an approximate budget, but who has difficulty choosing a particular product owing to a lack of prior knowledge of the target product category.
Automated e-commerce creates new economic value by making business processes easier and by opening possibilities for market interactions. These technologies provide e-commerce sellers with unprecedented opportunities to price discriminate. We show that this can be good for consumers and that the focus of regulations should not be on disallowing such practices [5].
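As a rough illustration of the comparison step described in this introduction (prices collected from several sellers, then shown so the shopper can see where the product is cheapest), the following Python sketch picks the lowest of a set of already-collected prices. The `Offer` type, the seller list, and the price values are hypothetical placeholders and are not data from this work.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """One scraped price for a product (hypothetical structure)."""
    seller: str
    price: float


def cheapest_offer(offers):
    """Return the offer with the lowest price, or None if the list is empty."""
    if not offers:
        return None
    return min(offers, key=lambda offer: offer.price)


# Illustrative values only; real prices would come from the scraper.
offers = [Offer("Amazon", 54990.0), Offer("Flipkart", 53499.0)]
best = cheapest_offer(offers)
print(f"Best price: {best.price:.2f} at {best.seller}")
```

Returning `None` for an empty list lets the caller show a "no results" message when neither site returned a price.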


II. RELATED WORKS

Ref. No. | Author Details | Type of the website | Features of the website | Results
[6] | Nimbalkar et al. | Spam bot | Post-comment similarity, the ratio of stop words, and redundancy | 86% accuracy has been achieved by this model using the Decision tree classifier
[7] | Khatter et al. | Real-time bot | HTTP response code and inter-arrival time | Bots were discovered before the session expired
[8] | Dharmik et al. | Web Crawler | Percentage of requested images, duration of the session, and the response code from the web | 95% accuracy has been achieved by this model
[9] | Sharma et al. | Chatbot | Size of the message and the delay time between messages | 100% accuracy has been achieved by this model
[10] | Sarker et al. | Web Scraper | The entropy of inter-request time and requested bytes | 91% accuracy has been achieved by this model
[11] | Raza et al. | Social Media bot | Probability of transition features between the user and click streams | The accuracy of the model has increased by an average of 91% in total
[12] | Chee et al. | Price Scraping bot | Twenty-five features such as total requested pages, session time, and the standard deviation of time between the requests | 90% accuracy has been achieved using Cascade neural network


III. MATERIALS AND METHODS

1. MongoDB
It is a document-oriented database classified under NoSQL. The system has to deal with large amounts of unstructured data, hence it is easier to use MongoDB as the database. The data extracted by the scraper is stored in the MongoDB database [13].

2. Django Web Framework
Django is a Python web framework. E-commerce product comparison using web scraping is a product price comparison website that is made using the Django web framework. Products requested by users are queried in the database using MongoEngine, an object-document mapper [14-18].

PHASE 1:
Data Collection and Analysis - A series of investigations yielded useful information on power and energy consumption. The results also gave a better understanding of what a rating website is, how it helps people solve problems before buying home groceries, and which current rating websites can serve as competitors to test against. Previous research data were obtained from student-written term papers and were described earlier in the literature review phase. Meanwhile, customer data were obtained through a survey and interviews with online respondents, in addition to meetings [19-22].

PHASE 2:
Survey of any existing similar system - The next step is to look at the results achieved and check whether there is any similar or comparable system. The main aim of studying a comparable existing system is to know how it works, what idea it is built on, what it computes, and how it handles difficulties [23-27].

PHASE 3:
Design of the main components of the system - After studying existing comparable systems, the next step is to identify the main features that will make the system more advanced. To use the services of this website, customers must log in with their basic details, i.e., name, e-mail, etc. Registered customers can automatically be given access to the website's newsletter. Customers will be able to select a product, and related listings can be displayed. In addition, customers can add further information, such as their preferred goods, to their profile, so that the current price of favorite items can be sent to each customer individually in addition to the day's merchandise mail. Thus, the customer can obtain the information he is interested in without delay. The main additions of the system are a flexible database for storing goods and customer records, and a search through which a user can find the product he is interested in [28-33].

PHASE 4:
Develop the model architecture - The following section explains the structure and working of the system. This provides a clear picture of the system and how it works, and prevents it from becoming a system that does not solve the problems.

WEB CRAWLER :
Web crawlers, or web spiders, are used to automatically search websites and gather information over the internet. First, the re-fetched URLs are sent to the scraper for scraping.

WEB SCRAPER :
It is used to extract the HTML data from the URLs sent by the crawler and use it for purposes that are not public. In this system, Python libraries such as requests and beautifulsoup4 are used to perform scraping. Beautifulsoup4 is a Python library used for HTML page parsing. Using this data, product information from various e-commerce websites is extracted and stored in a database.

IV. PROPOSED METHODOLOGY

Fig.1 Proposed Architecture

In this paper, we develop a website where the prices of the desired products from Amazon and Flipkart are displayed using PHP, XAMPP, MongoDB, and web scraping. Fig.1 describes the architecture of the proposed system.
We've used 'XAMPP' for the implementation of our code. This package includes MySQL, Apache web server, Perl, FTP server, PHP, and also phpMyAdmin.
PHP is a server-side scripting language and a very useful tool for making interactive web pages. No dataset is present in the project; the data is scraped on demand by web scraping and the results are displayed. This software collection also includes many other components, which are explained below.
XAMPP Control Panel:
It helps to control and regulate the various components present in XAMPP. The latest update of this software is Version 3.2.1.
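The hand-off described under WEB CRAWLER earlier in this section (the crawler gathers URLs and passes them to the scraper) can be sketched as follows. This is only a minimal sketch: the section names requests and beautifulsoup4, but the example uses the Python standard library so that it runs without extra packages. The sample listing HTML, the `shop.example.com` base URL, and the `/product/` path filter are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute product URLs from anchor tags on a listing page."""

    def __init__(self, base_url, path_marker="/product/"):
        super().__init__()
        self.base_url = base_url
        self.path_marker = path_marker  # hypothetical pattern for product pages
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.path_marker in href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:           # skip duplicates
                self.links.append(url)


def crawl_listing(html, base_url):
    """Return the product URLs the crawler would hand to the scraper."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Made-up listing page standing in for a downloaded search result.
SAMPLE_LISTING = """
<ul>
  <li><a href="/product/101">Phone A</a></li>
  <li><a href="/product/102">Phone B</a></li>
  <li><a href="/help">Help</a></li>
  <li><a href="/product/101">Phone A (duplicate link)</a></li>
</ul>
"""

urls = crawl_listing(SAMPLE_LISTING, "https://shop.example.com/search?q=phone")
print(urls)
```

Filtering and de-duplicating at the crawler keeps the scraper from fetching help pages or the same product twice.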


Cross-platform:
Different local systems have different configurations of installed operating systems. A cross-platform component has been included to widen the usage and user base of this Apache distribution package. It supports various platforms such as Windows, macOS, and Linux.
Apache:
This is a cross-platform HTTP web server. It is used to deliver web content worldwide. The server application is free to install and is maintained by the developer community under the auspices of the Apache Software Foundation. The remote Apache server delivers the requested files, images, and other documents to the user.
PHP:
It is a backend scripting language used primarily for web application development. It is easily installable on any platform and supports various database management systems. It is implemented in the C language. PHP stands for "PHP: Hypertext Preprocessor" and was originally derived from a set of tools called Personal Home Page Tools, which explains its functionality and simplicity.
MariaDB:
MySQL DBMS used to be a part of XAMPP, but it has now been replaced by MariaDB. MariaDB is a fork of MySQL created by MySQL's original developers and is one of the most extensively used relational DBMSs. It offers various services for storing, manipulating, searching, organizing, and deleting data.

Implementing the project using PHP:
PHP is a very widely used server-side scripting language for web development. Various web server packages are available for setting up a local web server. Amongst them, XAMPP and WampServer are the most popular. XAMPP is a cross-platform application that can run on Windows, Linux, and macOS. Hence, we are using PHP in XAMPP for this project.

Fig 2. XAMPP Control Panel

Why PHP?
Like MySQL, PHP is free to use and open source. Packages such as XAMPP already bundle a web server, MySQL, and PHP, among others. This makes PHP a cost-effective scripting language when compared to languages such as CFML or ASP. Another advantage of PHP is that it is a server-side scripting language, i.e., you only need to install it on the server and not on the client computers. It is not necessary to have PHP installed when requesting resources from the server; a browser is enough.

What are the requirements to Build a PHP Website?
Before starting, make sure that a plain-text or PHP editor is installed in the development environment. To upload the files, we need a PHP web server. It can be a local computer with LAMP or a remote server.

Creating a website using PHP:
Our basic PHP website will include a home page, including the pricing of the product from the search results and some images from the e-commerce website. For this PHP website, you need to create a PHP page filled with content based on three HTML pages. The file that you've created can be edited by editing text and images from the base HTML files.

Fig 3. Web Scraping using PHP

Fig 4. Web Scraping using PHP

Using Simple HTML Dom
It is a library developed for PHP that helps us access the page's content much more easily with selectors. You only need the simple_html_dom.php file from the zip file. This file should be placed in the same
folder where you wrote your scraper code.
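Simple HTML DOM is a PHP library, so the snippet below is not its API. It is a small Python sketch, using only the standard library, of the idea behind selector-based access: pick out elements by class name and read their text. The class names `product-title` and `product-price`, the sample markup, and the `parse_price` helper are all hypothetical; real pages from Amazon or Flipkart use different markup.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains a target class.

    Sketch only: void tags such as <br> inside a captured element are not handled.
    """

    def __init__(self, target_classes):
        super().__init__()
        self.target_classes = set(target_classes)
        self.results = {name: [] for name in target_classes}
        self._current = None  # class currently being captured, if any
        self._depth = 0       # nesting depth inside the captured element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.target_classes:
                self._current, self._depth, self._buffer = name, 1, []
                break

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results[self._current].append("".join(self._buffer).strip())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def parse_price(text):
    """Turn a price string such as 'Rs. 53,499' into a float."""
    digits = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    return float(digits.strip("."))


# Made-up markup standing in for a downloaded search page.
SAMPLE_PAGE = """
<div class="result">
  <span class="product-title">Phone A</span>
  <span class="product-price">Rs. 53,499</span>
</div>
<div class="result">
  <span class="product-title">Phone B</span>
  <span class="product-price">Rs. 12,999</span>
</div>
"""

extractor = ClassTextExtractor(["product-title", "product-price"])
extractor.feed(SAMPLE_PAGE)
titles = extractor.results["product-title"]
prices = [parse_price(p) for p in extractor.results["product-price"]]
print(list(zip(titles, prices)))
```

Converting the price text to a number at extraction time makes the later comparison between sites a plain numeric comparison.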

Installing PHP-CURL
It is not always necessary, but for more advanced requests you may need to send different headers, and the PHP-CURL library helps with that. Don't forget to restart the Apache server after installing the library. Fig. 2, 3, and 4 show the XAMPP control panel and web scraping using PHP.
Scraping the content from various websites :

1. Check the website content
Most web content is displayed using HTML. Since we need to extract specific content from the HTML source, it is necessary to understand it. First, we need to check what the source of the page looks like to know which elements to extract from the page.
In Google Chrome, you can do this by right-clicking on the element you want to extract and selecting "Inspect". This opens a panel in your browser with the page source and the rendered element styles. In this panel, we only need to check the "Elements" tab, which shows us how the HTML of the page is structured.

Fig 5. The interface of the Price Comparison Website

Fig 6. Prices of the Desired Product from Amazon and Flipkart

2. Send the request from PHP
Sending a request in this case means accessing the HTML page directly using PHP code. This can be achieved in two ways. First, we can use the PHP-CURL library, which also allows us to modify the headers and body that we send in our request.
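The paper sends this request from PHP with PHP-CURL. As a hedged illustration of the same idea (a page request with custom headers), here is a Python standard-library sketch. The URL, the header values, and the helper names `build_request` and `fetch_html` are assumptions for this example; the network call sits in `fetch_html` and is not executed by the example itself.

```python
from urllib.request import Request, urlopen

# Placeholder header values; a real scraper would choose its own.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; PriceComparisonSketch/0.1)",
    "Accept-Language": "en-IN,en;q=0.9",
}


def build_request(url, extra_headers=None):
    """Create a GET request carrying the default headers plus any overrides."""
    headers = dict(DEFAULT_HEADERS)
    headers.update(extra_headers or {})
    return Request(url, headers=headers, method="GET")


def fetch_html(url, timeout=10):
    """Send the request and return the response body as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Build (but do not send) a request to a placeholder search URL.
request = build_request("https://shop.example.com/search?q=phone")
print(request.get_method(), request.full_url)
print(request.header_items())
```

Keeping request construction separate from sending makes the headers easy to inspect or adjust per site before any traffic is generated.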
3. Extract the data
We will only extract the title of the films and the rating associated with each of them from the page we have chosen. As we saw earlier, the content is displayed in a table where each cell has its own class. Using this, we can extract all rows of the table and then look at the cells of interest in each row.
After retrieving the rows, we loop through them looking for elements with the class name Column or rating column. It is important to note that in this example we used file_get_html instead of file_get_contents. This is because this function comes from the simple_html_dom library and acts as a wrapper for the file_get_contents function.

4. Export the data
In the examples above, we collected the site data and displayed it directly on the screen. However, you can also save data in PHP quite easily. You can save the scraped data to a .txt file, as JSON, as CSV, or even send it directly to a database. PHP handles this very well. We need to save the data in an array and write the contents of the array to a new file.
What is required is to collect a large amount of data from different e-commerce websites. Manually retrieving data from websites is not feasible. So, a good way is to create a web crawler that visits these e-commerce websites. Fig. 5 and 6 depict the interface models.

VI. CONCLUSION AND FUTURE WORK
We have completed an e-commerce price comparison website based on web scraping. The website provides price comparison across different online shopping websites such as Flipkart and Amazon, to find the best deal for the same product with the help of web scraping (PHP Simple HTML DOM), and it is based on PHP.
Prices from different e-commerce websites can be compared using this product price comparison website. It is very useful for online shoppers who shop frequently and want to check the prices of various online stores in one place. The system shows the prices of products from various sellers to show you the best site to buy the product at a reasonable price. The best two classes of web pages are analyzed to get the price information. To get the price details, the system crawls the website based on the user's search and then downloads the HTML search page of that particular website. After both websites' product prices are loaded, they are displayed on the interface of the website in the form of a price comparison.

REFERENCES
1. Shalini, A., & Ambikapathy, M. R. (2022). E-Commerce Analysis and Product Price Comparison Using Web Mining. Journal homepage: www.ijrpr.com ISSN, 2582, 7421.
2. Avron, U., Gershtein, S., Guy, I., Milo, T., & Novgorodov, S. (2022). Automated Category Tree Construction in E-Commerce.


3. Al-Mushayt, O. S., Gharibi, W., & Armi, N. (2022). An E-Commerce Control Unit for Addressing Online Transactions in Developing Countries: Saudi Arabia—Case Study. IEEE Access, 10, 64283-64291.
4. Zhang, X., Shen, K., Zhang, C., Fan, X., Xiao, Y., He, Z., ... & Wu, L. (2022). Scenario-based Multi-product Advertising Copywriting Generation for E-Commerce. arXiv preprint arXiv:2205.10530.
5. Niemir, M., & Mrugalska, B. (2022). Product Data Quality in e-Commerce: Key Success Factors and Challenges. Production Management and Process Control, 36, 1-12.
6. Nimbalkar, T. R., Bhadane, L. G., Kharatmal, D. B., & Borase, J. P. E-COMMERCE PORTAL WITH RECOMMENDATION SYSTEM FOR SURGICAL EQUIPMENT. Journal homepage: www.ijrpr.com ISSN, 2582, 7421.
7. Khatter, H., Sharma, A., & Kushwaha, A. K. (2022, July). Web Scraping based Product Comparison Model for E-Commerce Websites. In 2022 IEEE International Conference on Data Science and Information System (ICDSIS) (pp. 1-6). IEEE.
8. Dharmik, H., Padmane, P., Dhoke, K., Chambhare, S., & Kohad, D. A Review on E-commerce Price Evaluation System.
9. Sharma, D. K., Lohana, S., Arora, S., Dixit, A., Tiwari, M., & Tiwari, T. (2022). E-Commerce product comparison portal for classification of customer data based on data mining. Materials Today: Proceedings, 51, 166-171.
10. Sarker, K. U., Saqib, M., Hasan, R., Mahmood, S., Hussain, S., Abbas, A., & Deraman, A. (2022). A Ranking Learning Model by K-Means Clustering Technique for Web Scraped Movie Data. Computers, 11(11), 158.
11. Raza, M. Z., Verma, P., Abirami, G., & Girija, R. (2022). Prediction of Consumer Purchase Intention Using E-Commerce Web Data. Telematique, 6679-6690.
12. Chee, C. C. F. C., Chiew, K. L., Sarbini, I. N., & Jing, E. K. H. (2022). Data Analytics Approach for Short-term Sales Forecasts Using Limited Information in E-commerce Marketplace. Acta Informatica Pragensia, 11(3).
13. Nagaraj, P., & Deepalakshmi, P. (2020). A framework for e-healthcare management service using recommender system. Electronic Government, an International Journal, 16(1-2), 84-100.
14. Nagaraj, P., Deepalakshmi, P., Mansour, R. F., & Almazroa, A. (2021). Artificial flora algorithm-based feature selection with gradient boosted tree model for diabetes classification. Diabetes, Metabolic Syndrome and Obesity: Targets and Therapy, 14, 2789.
15. Pa, N., Mb, A., Kb, B., & Ab, D. (2020). Analysis of data mining techniques in diagnalising heart disease. Intelligent Systems and Computer Technology, 37, 257.
16. Vb, S. K. (2020). Perceptual image super resolution using deep learning and super resolution convolution neural networks (SRCNN). Intelligent Systems and Computer Technology, 37(3).
17. Nagaraj, P., & Deepalakshmi, P. (2021). Diabetes Prediction Using Enhanced SVM and Deep Neural Network Learning Techniques: An Algorithmic Approach for Early Screening of Diabetes. International Journal of Healthcare Information Systems and Informatics (IJHISI), 16(4), 1-20.
18. Nagaraj, P., & Deepalakshmi, P. (2022). An intelligent fuzzy inference rule-based expert recommendation system for predictive diabetes diagnosis. International Journal of Imaging Systems and Technology.
19. Nagaraj, P., Deepalakshmi, P., & Ijaz, M. F. (2022). Optimized adaptive tree seed Kalman filter for a diabetes recommendation system—bilevel performance improvement strategy for healthcare applications. In Cognitive and Soft Computing Techniques for the Analysis of Healthcare Data (pp. 191-202). Academic Press.
20. Vignesh, K., & Nagaraj, P. (2022). Analysing the Nutritional Facts in Mc. Donald's Menu Items Using Exploratory Data Analysis in R. In International Conference on Emerging Technologies in Computer Engineering (pp. 573-583). Springer, Cham.
21. Brintha, N. C., Nagaraj, P., Tejasri, A., Durga, B. V., Teja, M. T., & Kumar, M. N. V. P. (2022, June). A Food Recommendation System for Predictive Diabetic Patients using ANN and CNN. In 2022 7th International Conference on Communication and Electronics Systems (ICCES) (pp. 1364-1371). IEEE.
22. Nagaraj, P., Deepalakshmi, P., Muneeswaran, V., & Muthamil Sudar, K. (2022). Sentiment Analysis on Diabetes Diagnosis Health Care Using Machine Learning Technique. In Congress on Intelligent Systems (pp. 491-502). Springer, Singapore.
23. Nagaraj, P., Muneeswaran, V., Reddy, L. V., Upendra, P., & Reddy, M. V. V. (2020, May). Programmed multi-classification of brain tumor images using deep neural network. In 2020 4th international conference on intelligent computing and control systems (ICICCS) (pp. 865-870). IEEE.
24. Nagaraj, P., Rajasekaran, M. P., Muneeswaran, V., Sudar, K. M., & Gokul, K. (2020, August). VLSI implementation of image compression using TSA optimized discrete wavelet transform techniques. In 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT) (pp. 667-670). IEEE.
25. Vamsi, A. M., Deepalakshmi, P., Nagaraj, P., Awasthi, A., & Raj, A. (2020). IOT based autonomous inventory management for warehouses. In EAI International Conference on Big Data Innovation for Sustainable Cognitive Computing (pp. 371-376). Springer, Cham.
26. Muneeswaran, V., Nagaraj, P., Dhannushree, U., Ishwarya Lakshmi, S., Aishwarya, R., & Sunethra, B. (2021). A Framework for Data Analytics-Based Healthcare Systems. In Innovative Data Communication Technologies and Application (pp. 83-96). Springer, Singapore.
27. Nagaraj, P., Rao, J. S., Muneeswaran, V., & Kumar, A. S. (2020, May). Competent ultra data compression by enhanced features excerption using deep learning techniques. In 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS) (pp. 1061-1066). IEEE.
28. Muneeswaran, V., Nagaraj, M. P., Rajasekaran, M. P., Chaithanya, N. S., Babajan, S., & Reddy, S. U. (2021, July). Indigenous Health Tracking Analyzer Using IoT. In 2021 6th International Conference on Communication and Electronics Systems (ICCES) (pp. 530-533). IEEE.
29. Muneeswaran, V., BenSujitha, B., Sujin, B., & Nagaraj, P. (2020). A compendious study on security challenges in big data and approaches of feature selection. International Journal of Control and Automation, 13(3), 23-31.
30. Varma, C. G., Nagaraj, P., Muneeswaran, V., Mokshagni, M., & Jaswanth, M. (2021, May). Astute Segmentation and Classification of leucocytes in blood microscopic smear images using titivated K-means clustering and robust SVM techniques. In 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS) (pp. 818-824). IEEE.
31. Sudar, K. M., Nagaraj, P., Deepalakshmi, P., & Chinnasamy, P. (2021, January). Analysis of Intruder Detection in Big Data Analytics. In 2021 International Conference on Computer Communication and Informatics (ICCCI) (pp. 1-5). IEEE.
32. Sudar, K. M., Deepalakshmi, P., Nagaraj, P., & Muneeswaran, V. (2020, November). Analysis of Cyberattacks and its Detection Mechanisms. In 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 12-16). IEEE.
33. Sudar, K. M., Deepalakshmi, P., Ponmozhi, K., & Nagaraj, P. (2019, December). Analysis of Security Threats and Countermeasures for various Biometric Techniques. In 2019 IEEE International Conference on Clean Energy and Energy Efficient Electronics Circuit for Sustainable Development (INCCES) (pp. 1-6). IEEE.
