Product Recommender - Updated
CERTIFICATE
This is to certify that this is a bona fide record of the final year project work “A product recommender
system using frequent pattern mining algorithms” done satisfactorily at DR SUDHIR CHANDRA SUR
INSTITUTE OF TECHNOLOGY AND SPORTS COMPLEX by AISHIK SETT, NILANJAN
MUKHERJEE, NIDHI SHAH and BAISAKHI MUKHERJEE of 8th semester, CSE.
This report, or any similar report on this topic, has not been submitted for any other examination and does not
form part of any other course undergone by the candidates. I have no doubt that they have very good research
potential.
I would like to wish them a very bright future.
Date:__________________ ____________________
Ms.Madhusmita Mishra
(Asst.Prof,CSE,DSCSITSC)
_____________________ _______________________
Dr.OM PRAKASH SHARMA Ms. Rinku Supakar
(PRINCIPAL,DSCSITSC) (HOD,CSE,DSCSITSC)
ACKNOWLEDGEMENT
We would like to take this opportunity to express our gratitude towards all the people who have, in various
ways, helped us in the successful completion of our final year project work “A Product Recommender
System using Frequent Pattern Mining Algorithms”, done satisfactorily at DR. SUDHIR CHANDRA SUR
DEGREE ENGINEERING COLLEGE. We must convey our gratitude to Mrs Madhusmita Mishra,
Assistant Professor of the Computer Science and Engineering Department, for being a constant source of
inspiration, helping us prepare the project, personally correcting our work and providing encouragement
throughout the project. In this regard we are especially thankful to our Head of the Department, Mrs Rinku
Supakar, for steering us through the tough as well as the easy phases of the project in a result-oriented manner
with concerned attention. Special thanks go to all the faculty members of our Computer Science Department.
We would also love to take this opportunity to tell the readers of this material that their comments, be it
appreciation or criticism, would be most valuable to us. That is something we will always be
thankful for.
Date: 24/06/2021
ABSTRACT
In this competitive world, time plays an important role in our lives. Everybody should use their time in an
optimized way for better productivity. A good recommender system helps us find relevant products
and reduces searching time. To this end, machine learning is used to develop the recommender system, which
is used to find the most relevant products. In this work, the Apriori, FP-Growth and Eclat algorithms
are applied to design a recommendation system for users based on the relevancy of the
product ranking. In the latter part of the work, a performance analysis and comparison study of these
algorithms is carried out to find the better algorithm for product recommendation. Recommendation
systems in e-commerce have now become essential tools to help businesses increase their sales. [1]
Keywords: Recommender, Apriori, FP-Growth, Machine learning, Eclat, Recommendation
CHAPTER: 1. INTRODUCTION
1.1 Introduction
1.2 Objective
4.15 Required Run Time for Frequent Pattern Growth (FP-Growth) algorithm
CHAPTER: 5. RESULTS
5 Comparisons
CHAPTER: 7. CONCLUSIONS
CHAPTER: 8. REFERENCES
CHAPTER: 1. INTRODUCTION
1.1 Introduction
In today’s world, many methods are used to analyse data, such as clustering, regression, neural networks,
random forests and SVMs. Data mining is one of them.
Retailers have access to an unprecedented amount of shopper transaction data. As shopping habits have become
more electronic, records of every purchase are neatly stored in databases, ready to be read and analysed.
With such an arsenal of data at their disposal, retailers can uncover patterns of consumer behaviour. [1]
The challenge with many of these approaches is that they can be difficult to tune, challenging to interpret and
require quite a bit of data preparation and feature engineering to get good results. In other words, they can be
very powerful but require a lot of knowledge to implement properly.
Data mining is important because lots of data is being collected and warehoused:
Web data and e-commerce.
Purchases at department stores and grocery stores.
Bank and credit card transactions.
Data mining is also important because computers have become cheaper and more powerful. Competitive
pressure is also strong, which makes good customer relationship management more valuable. [2]
1.2 Objective
The objective of Market Basket Analysis models is to recognize and predict the products that a customer may
buy from the store. The primary objective is to improve the effectiveness of marketing and sales tactics
using the customer data that the enterprise accumulates during sales transactions.
With it, the marketing and sales teams can develop more effective pricing, product placement and selling strategies.
It also helps predict product deals in different geographic locations, which improves trading time
and reduces operating costs. This translates into increased revenues and higher profit margins.
The purpose of this study was to analyze the data of different groceries present in a grocery store by
comparing the Apriori algorithm and the FP-Growth algorithm, and to determine association rules based
on consumer purchase patterns, by first finding the frequent itemsets and then proceeding
to the establishment of association rules.
Given a set of transactions, this process aims to find the rules that enable us to predict the occurrence of a
specific item based on the occurrence of other items in the transaction. Let’s look at an example of Frequent
Pattern Mining.
In the table below, Support (milk->bread) = 0.4 means that milk and bread are purchased together in 40% of
all transactions. Confidence (milk->bread) = 0.5 means that out of every 100 transactions containing milk,
50 also contain bread. [6]
Fig 10: Example of Frequent Pattern Mining
Association rules are "if-then" statements that help to show the probability of relationships between data
items, within large data sets in various types of databases. Association rule mining has a number of
applications and is widely used to help discover sales correlations in transactional data or in medical data sets.
[7]
Given a set of transactions, we can find rules that will predict the occurrence of an item based on the
occurrences of other items in the transaction. [8]
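An association rule is an implication expression of the form X -> Y, where X and Y are any two itemsets.
Example: {Milk, Diaper} -> {Beer}.
From the above table, for the rule {Milk, Diaper} -> {Beer}:
$$\text{Support} = \frac{\text{number of transactions containing } \{\text{Milk, Diaper, Beer}\}}{|T|} = \frac{2}{5} = 0.4$$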
1.8.1 Support
The number of transactions that include the items in both the {X} and {Y} parts of the rule, as a percentage of the
total number of transactions. It is a measure of how frequently the collection of items occurs together as a
percentage of all transactions. [9]
1.8.2 Confidence:
Confidence can be interpreted as the likelihood of purchasing product B given that product A is purchased.
Confidence is calculated as the number of transactions that include both A and B divided by the number of
transactions that include product A. [9]
The lift of the rule X=>Y is the confidence of the rule divided by the expected confidence, assuming that the
itemsets X and Y are independent of each other. Under that assumption the expected confidence is simply the
frequency (support) of {Y}, so lift = confidence(X=>Y) / support({Y}). [10]
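The three measures defined above can be checked with a few lines of Python. This is only a minimal sketch: the five transactions below are a hypothetical table (the report's own table is not reproduced in this text), chosen so that the support of {Milk, Diaper, Beer} comes out as 2/5, and the helper names support, confidence and lift are illustrative.

```python
# Hypothetical five-transaction table (an assumption, not the report's data).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """support(X and Y together) divided by support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Confidence of X => Y divided by the support of Y."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"Milk", "Diaper"}, {"Beer"}
print("support   :", support(x | y, transactions))
print("confidence:", confidence(x, y, transactions))
print("lift      :", lift(x, y, transactions))
```

A lift above 1 suggests that X and Y occur together more often than independence would predict; a lift below 1 suggests the opposite.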
range() is a built-in function of Python. It is used when a user needs to perform an action a specific number
of times. range() in Python (3.x) is essentially a renamed version of a function called xrange in Python (2.x).
The range() function is used to generate a sequence of numbers.
range() is commonly used in for loops, so knowledge of it is a key aspect of dealing with any kind
of Python code. The most common use of the range() function is to iterate over the indices of a sequence type
(list, string, etc.) in a for loop.
Python range() Basics :
In simple terms, range() allows the user to generate a series of numbers within a given range. Depending on how
many arguments the user passes to the function, the user can decide where that series of numbers will begin and
end, as well as how big the difference will be between one number and the next. range() takes up to three
arguments: start, stop and step.
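The three calling forms described above can be illustrated as follows. The variable names and the small items list are hypothetical and only serve the example.

```python
# One argument: stop (start defaults to 0, step defaults to 1).
first_five = list(range(5))        # 0, 1, 2, 3, 4

# Two arguments: start and stop (stop itself is excluded).
two_to_five = list(range(2, 6))    # 2, 3, 4, 5

# Three arguments: start, stop and step.
evens = list(range(0, 10, 2))      # 0, 2, 4, 6, 8
countdown = list(range(5, 0, -1))  # 5, 4, 3, 2, 1

# Typical use: iterating over the indices of a list in a for loop.
items = ["milk", "bread", "eggs"]  # hypothetical grocery items
for i in range(len(items)):
    print(i, items[i])
```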
2.1.1 Definition:
Content-based Filtering is a Machine Learning technique that uses similarities in features to make decisions.
This technique is often used in recommender systems, which use algorithms for designing advertisements or
recommending things to users based on knowledge accumulated about the user. [12]
2.1.2 Method/Algorithm:
The method revolves completely around comparing user interests to product features. The products that have
the most overlapping features with user interests are what’s recommended.
Here, two methods can be used (possibly in combination). Firstly, users are given a list of features from
which they choose the ones they like the most. Secondly, the algorithm keeps track of the products
the user has chosen before and adds those features to the user’s data.
Similarly, product features can be identified by the developers of the product themselves. Moreover, users can
be asked which features they believe identify the products best. [12]
2.1.3 Example:
Assume Nikhil has given good ratings to movies like Mission Impossible and James Bond, which are
tagged with the “Action” genre, and a bad rating to the movie “Toy Story”, which is tagged with the “Children”
genre.
Now we will create a User Vector for Nikhil based on his 3 ratings :
The item vector for the movie “Toy Story” is (0,1,1) and for the movie “Star Wars” is (1,0,0), in the order (Action,
Animation, Children).
We now need to take the dot product of the two vectors, the item vector and the user vector.
Accordingly, the dot product for “Toy Story” is -6 and that for “Star Wars” is 9.
Hence “Star Wars” will be recommended to Nikhil, which also matches our intuition that Nikhil likes
Action movies and dislikes Children movies.
In a similar manner, we can calculate the dot products of the item vectors of all the movies in store and
recommend the top 10 movies to Nikhil. [13]
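The scoring step can be sketched in Python. Nikhil's user vector is not reproduced in this text, so the vector (9, 0, -6) used below is an assumption, chosen only because it is consistent with the two dot products quoted above; the item vectors are the ones given in the example.

```python
GENRES = ("Action", "Animation", "Children")

# Assumed user vector: consistent with the quoted scores, but not taken from the report.
user_vector = (9, 0, -6)

# Item vectors from the example, in the order of GENRES.
item_vectors = {
    "Toy Story": (0, 1, 1),
    "Star Wars": (1, 0, 0),
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Score every item against the user profile and rank from best to worst.
scores = {title: dot(user_vector, vec) for title, vec in item_vectors.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for title in ranked:
    print(title, scores[title])
```

With a larger catalogue, the same ranking would simply be truncated to the first 10 titles.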
2.1.4 Advantages of Content-Based Filtering:
User independence: The content-based method only has to analyze the items and a single user’s
profile for the recommendation, which makes the process less cumbersome. Content-based
filtering would thus produce more reliable results with fewer users in the system.
Transparency: Collaborative filtering gives recommendations based on other unknown users
who have the same taste as a given user, but with content-based filtering, items are recommended
on a feature-level basis.
No cold start: As opposed to collaborative filtering, new items can be suggested before being
rated by a substantial number of users. [14]
2.1.5 Disadvantages of Content-Based Filtering:
Limited content analysis: If the content doesn’t contain enough information to discriminate the
items precisely, the recommendation itself risks being imprecise.
Over-specialization: Content-based filtering provides a limited degree of novelty since it has to
match up the features of a user’s profile with available items. In the case of item-based filtering,
only item profiles are created and users are suggested items similar to what they rate or search
for, instead of their past history. A perfect content-based filtering system may suggest nothing
unexpected or surprising. [14]
The second step is to calculate the set of users similar to the target user, using cosine similarity
over the user-product matrix. Finally, using the N nearest neighbours, the User CF algorithm
predicts the target user's interest scores for unknown products to
generate recommendation results [18].
Here is the flowchart of the user-based collaborative filtering algorithm recommendation model:-
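The similarity and prediction steps described above can be sketched as follows. This is a minimal illustration, not the implementation from [18]: the rating matrix, the user names, the choice of a similarity-weighted average for the prediction and the helper names are all assumptions.

```python
import math

# Hypothetical user-product rating matrix; 0 means "not rated yet".
ratings = {
    "alice": {"milk": 5, "bread": 4, "eggs": 0, "tea": 1},
    "bob":   {"milk": 4, "bread": 5, "eggs": 4, "tea": 0},
    "carol": {"milk": 1, "bread": 0, "eggs": 2, "tea": 5},
    "dave":  {"milk": 5, "bread": 5, "eggs": 5, "tea": 1},
}


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors stored as dicts."""
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(target, ratings, n):
    """The n users most similar to `target`, as (similarity, user) pairs."""
    sims = [
        (cosine_similarity(ratings[target], ratings[user]), user)
        for user in ratings
        if user != target
    ]
    sims.sort(reverse=True)
    return sims[:n]


def predict(target, product, ratings, n=2):
    """Similarity-weighted average of the neighbours' ratings for `product`."""
    rated = [
        (sim, user)
        for sim, user in nearest_neighbours(target, ratings, n)
        if ratings[user].get(product, 0) > 0
    ]
    total = sum(sim for sim, _ in rated)
    if total == 0:
        return 0.0
    return sum(sim * ratings[user][product] for sim, user in rated) / total


neighbours = nearest_neighbours("alice", ratings, n=2)
predicted = predict("alice", "eggs", ratings, n=2)
print(neighbours)
print(predicted)
```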
The algorithm starts working by listing the top frequent items bought by the customer.
Now we need to generate the association rules for Apriori algorithm. On generating the association rules, we observed:
22% of transactions containing mineral water also contain chocolate
32% of transactions containing chocolate also contain mineral water
34% of transactions containing spaghetti also contain mineral water
25% of transactions containing mineral water also contain spaghetti
A rule is said to be interesting if it is unexpected or actionable to the user; this is a
subjective measure. The transaction {spaghetti, mineral water} is more likely than {chocolate, mineral water},
and we can assess how interesting each rule is by comparing the lift, leverage and conviction of
{spaghetti, mineral water} and {chocolate, mineral water}.
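The comparison of lift, leverage and conviction can be sketched with the standard definitions of these measures. The support values below are illustrative assumptions, chosen only so that the resulting confidences roughly match the percentages quoted above; they are not figures reported in this text, and the function name rule_metrics is hypothetical.

```python
def rule_metrics(support_x, support_y, support_xy):
    """Standard interestingness measures for the rule X => Y."""
    confidence = support_xy / support_x
    lift = confidence / support_y
    leverage = support_xy - support_x * support_y
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_y) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Illustrative supports (assumptions, see the note above).
support = {"mineral water": 0.238, "chocolate": 0.164, "spaghetti": 0.174}
pair_support = {"spaghetti": 0.060, "chocolate": 0.053}  # together with mineral water

spaghetti_rule = rule_metrics(
    support["spaghetti"], support["mineral water"], pair_support["spaghetti"]
)
chocolate_rule = rule_metrics(
    support["chocolate"], support["mineral water"], pair_support["chocolate"]
)

for name, metrics in (("spaghetti", spaghetti_rule), ("chocolate", chocolate_rule)):
    print(name, "=> mineral water", {k: round(v, 3) for k, v in metrics.items()})
```

Under these assumed supports the spaghetti rule scores higher than the chocolate rule on all three measures.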
Fig 52 : Getting itemsets With length=1 and Support More than 10%
The data is fed into the frequent pattern growth algorithm along with its properties and attributes for the machine to predict
grocery items in future.
Fig 53: Feeding the data in FP-Growth Algorithm
Now we need to generate the association rules for Frequent Pattern Growth (FP-Growth) algorithm.
Next, we invoke the ECLAT algorithm on the grocery dataset to study its effectiveness.
A binary data frame is generated from the grocery dataset for the analysis with the ECLAT algorithm, and the unique
items present in the dataset are displayed.
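The binary encoding step can be sketched in plain Python. The four transactions below are hypothetical, and lists are used here in place of a data frame to keep the example self-contained; this is not the report's own code.

```python
# Hypothetical grocery transactions (an assumption for illustration).
transactions = [
    ["mineral water", "chocolate"],
    ["spaghetti", "mineral water", "eggs"],
    ["chocolate"],
    ["spaghetti", "mineral water"],
]

# Unique items present in the dataset, in a stable (sorted) column order.
unique_items = sorted({item for t in transactions for item in t})

# One row per transaction, one 0/1 column per unique item.
binary_rows = [
    [1 if item in t else 0 for item in unique_items]
    for t in transactions
]

# Column sums divided by the number of rows give each item's support.
item_support = {
    item: sum(row[i] for row in binary_rows) / len(binary_rows)
    for i, item in enumerate(unique_items)
}

print(unique_items)
for row in binary_rows:
    print(row)
print(item_support)
```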
EXPLANATION
Confidence expresses how often a particular relationship has been
found to be true. Lift is used to measure the performance of a target model for prediction or classification
of different cases having an enhanced response, measured against a random-choice targeting model.
From FIGURE 2, we find that the confidence value decreases rapidly at a fixed lift value. Then, on
reaching the minimum value, the confidence value increases with the increase in lift value. Then, it
decreases rapidly again at a fixed lift value.
4.13 Required Run Time for Apriori Algorithm
The code below is used to display and study the effect of the ECLAT algorithm on the dataset.
CHAPTER: 5. RESULTS
5. Comparisons:
Apriori | FP-Growth
1) It is an array-based algorithm. | 1) It is a tree-based algorithm.
7) It requires large memory space due to the large number of candidates generated. | 7) It requires less memory space due to its compact structure and no candidate generation.
8) It scans the database multiple times to generate candidate sets. | 8) It scans the database only twice to construct the frequent pattern tree.
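The candidate generation and repeated database scans listed for Apriori in the comparison above can be sketched in a few lines of Python. This is a simplified, unoptimized illustration and not the code used in the project; the transaction table is hypothetical and the function name apriori is chosen for the example only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # One full scan of the database per level.
        level = {}
        for candidate in current:
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                level[candidate] = count / n
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transactions (an assumption for illustration).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

frequent_itemsets = apriori(transactions, min_support=0.6)
for itemset, sup in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Each pass of the while loop rescans every transaction, which is the cost that the FP-Growth column of the comparison avoids.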
5.1.1 Apriori vs FP-Growth Codes:
Apriori | Eclat
1) Apriori is usable with large datasets. | 1) Eclat is better suited to small and medium datasets.
2) The Apriori algorithm scans the original (real) dataset. | 2) The Eclat algorithm scans the currently generated dataset.
3) The Apriori algorithm is a classical algorithm used to mine the frequent itemsets in a given dataset. | 3) The Eclat algorithm also mines the frequent itemsets, but in a vertical manner, and it follows a depth-first search of a graph.
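The vertical, depth-first mining attributed to Eclat in the comparison above can be sketched as follows. Each item is mapped to the set of transaction ids that contain it, and larger itemsets are found by intersecting those sets. This is a simplified illustration on a hypothetical transaction table, not the project's code.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} using tid-set intersections."""
    n = len(transactions)
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that are frequent with `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids) / n
            # Intersect with the remaining items, then go one level deeper.
            deeper = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) / n >= min_support:
                    deeper.append((other, shared))
            extend(itemset, deeper)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) / n >= min_support),
        key=lambda pair: pair[0],
    )
    extend(set(), start)
    return frequent


# Hypothetical transactions (an assumption for illustration).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

eclat_itemsets = eclat(transactions, min_support=0.6)
for itemset, sup in sorted(eclat_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

After the vertical layout is built, the original transactions are never rescanned; all further counting is done on the tid-sets.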
FP-Growth | Eclat
1) FP-Growth is usable with large datasets. | 1) Eclat is better suited to small and medium datasets.
We have developed a portal that deals with a real-life situation: the billing
of different products bought by a customer. The system recommends other products to
the customer based on their purchased items.
To develop the portal, we have used the Python programming language for the backend script. We have
used the Frequent Pattern (FP) Growth algorithm for product recommendation to the buyers and/or
users.
We have used a dataset and a database for the working of the portal. The dataset consists of the list of
different grocery items present in a grocery store. The database consists of the list of customer names
along with their purchase date, purchase time and mobile number. It also includes the lists of products
bought by the specific customers along with the total amount paid by each customer.
The table below lists all the items present in the grocery store.
The image below shows a demo of the Product Recommender System. The system records the
customer name, purchase date, purchase time and mobile number of the customer. It also includes the
product name, price and quantity of the products bought by the customer. The system has
four buttons. They are:
Add: The Add button is used to add items to the billing system.
Submit: The Submit button is used to store the items bought by the customer in the
database of the grocery store.
Print Bill: This button is used to print the bill of the customer. The customer keeps the bill
for future reference and for exchanging products.
Recommend: This button recommends other products to the customer based on
their purchased items. It creates a notepad file that can be printed and provided to the
customer, with suggestions of different products and the availability of items in the store
based on previous purchases.
The figure below shows the working of the Add button. The Add button continues to work until the
list of items purchased is submitted to the grocery database. Once the list is submitted, the Add button stops
working.
The figure below shows the working of the Submit button. The Submit button stores the total
amount that the customer needs to pay. The list of items purchased is submitted to the grocery
database. Finally, it prints the message “Submitted Successfully!!!”.
The figure below shows the working of the Print button. The Print button prints the total
amount that the customer needs to pay.
Fig 72: Using Print Button
The figure below shows the working of the Recommend button. The Recommend button
recommends the list of items that the customer might buy in future. It bases its
recommendations on the customer's purchased items and previous history.
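The logic behind such a Recommend button can be sketched as a lookup over mined association rules. This is only a guess at the general approach, not the portal's actual code: the rule list, the function names recommend and format_recommendations and the output layout are hypothetical, and the confidence values are illustrative.

```python
# Hypothetical rules as (antecedent, consequent, confidence); values are illustrative.
rules = [
    (frozenset({"spaghetti"}), "mineral water", 0.34),
    (frozenset({"chocolate"}), "mineral water", 0.32),
    (frozenset({"mineral water"}), "spaghetti", 0.25),
    (frozenset({"mineral water"}), "chocolate", 0.22),
]


def recommend(basket, rules, top_n=3):
    """Products suggested by rules whose antecedent is fully in the basket."""
    basket = set(basket)
    best = {}
    for antecedent, consequent, confidence in rules:
        if antecedent <= basket and consequent not in basket:
            # Keep the strongest rule seen for each suggested product.
            best[consequent] = max(confidence, best.get(consequent, 0.0))
    return sorted(best, key=best.get, reverse=True)[:top_n]


def format_recommendations(customer, products):
    """Plain-text block that could be saved to a text file and printed."""
    lines = ["Recommendations for " + customer + ":"]
    lines += ["- " + product for product in products] or ["(no suggestions)"]
    return "\n".join(lines)


suggestions = recommend({"mineral water"}, rules)
print(format_recommendations("Sample Customer", suggestions))
```

Ranking by confidence is one possible design choice; lift or support could be used as the sort key instead.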
During this project we compared the Apriori algorithm, the Frequent Pattern (FP)
Growth algorithm and the ECLAT algorithm. The Apriori algorithm and the FP-Growth algorithm were both applied
to this dataset. Both algorithms generate the same number of rules, but their execution times
differ. The FP-Growth algorithm is reported to be superior in performance to the existing
Apriori algorithm for association rule mining on spatial data [37]. Apriori finds the frequent itemsets
by generating candidate itemsets, whereas FP-Growth finds the frequent itemsets without candidate itemset
generation. Apart from the Apriori algorithm we have also used the ECLAT algorithm. ECLAT is superior
to the Apriori algorithm but inferior to FP-Growth. In the future, we can try to apply a new
technique of pattern generation that is more efficient than FP-Growth. We can also apply
a fuzzification algorithm to a dataset, and use other spatial datasets for new
research purposes. [38]
CHAPTER 8 REFERENCES
[1] https://towardsdatascience.com/what-are-product-recommendation-engines-and-the-various-versions-of-them-9dcab4ee26d5
[2] C. S. Fatoni et al., “Online Store Product Recommendation System Uses Apriori Method”, 2018 J. Phys.: Conf. Ser. 1140 012034
[3] https://labs.sogeti.com/recommender-systems-using-apriori/
[4] https://databricks.com/blog/2018/09/18/simplify-market-basket-analysis-using-fp-growth-on-databricks.html
[5] Prem Melville, Vikas Sindhwani, “Recommender Systems”, Machine Learning, IBM T. J. Watson
Research Center, Route /P.O. Box , Kitchawan Rd, Yorktown Heights, USA
[6] https://www.dataversity.net/frequent-pattern-mining-association-support-business-analysis/
[7] https://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining
[8] https://www.geeksforgeeks.org/association-rule/
[9] Support vs Confidence in Association Rule Algorithms by Lai, Kenneth & Cerpa, Narciso. (2001).
[10] https://www.dataminingapps.com/2017/04/what-is-the-lift-value-in-association-rule-mining/
[11] http://webpages.iust.ac.ir/yaghini/Courses/Data_Mining_882/DM_02_01_Data%20Undrestanding.pdf
[12] (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 5, No.
10, 2014 Application of Content-Based Approach in Research Paper Recommendation System for a
Digital Library
[13] Content-based Recommender System for Movie Website KE MA Master’s Thesis at VionLabs
Supervisor: Chang Gao Examiner: Mihhail Matskin
[14] International Journal of Innovations in Engineering and Technology (IJIET) Content Based
Filtering Techniques in Recommendation System using user preferences R.Manjula Research
Scholar, Anna University, Chennai, India A. Chilambuchelvan Professor, Department of CSE,
R.M.D Engineering College, Chennai, India
[15] https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada
[16] Y. Gao and L. Ran, "Collaborative Filtering Recommendation Algorithm for Heterogeneous Data
Mining in the Internet of Things," in IEEE Access, vol. 7, pp. 123583-123591, 2019, doi:
10.1109/ACCESS.2019.2935224.
[17] Yunkyoung Lee. 2015. “Recommendation system using collaborative filtering.” Approved for the
Department of Computer Science, San Jose State University, December 2015
[18] Hulong Wang et al., “User-based Collaborative Filtering Algorithm Design and Implementation”,
2021 J. Phys.: Conf. Ser. 1757 012168
[19] https://en.wikipedia.org/wiki/Item-item_collaborative_filtering
[20] Sarwar, George Kaypi, Joseph Konstan, John Riedl.2001. “Item-based Collaborative Filtering
Recommendation Algorithms.” In the 10th International World Wide Web Conference, 285-295.
[21] https://en.wikipedia.org/wiki/Cosine_similarity
[22] Owen, Sean, Anil, Robin, Dunning, Ted, Friedman, Ellen. 2011 . “ Mahout in action “. Shelter
Island NY: Manning.
[23] Su, Xiaoyuan, and Taghi M. Khoshgoftaar. 2009. “A Survey Of Collaborative Filtering
Techniques.” Advances In Artificial Intelligence 2009: 1-19. doi:10.1155/2009/421425
[25] Resnick, Paul, Iacovou, Neophytos, Suchak, Mitesh, Bergstrom, Peter, Riedl, John.1994. “
GroupLens: an open architecture for collaborative filtering of netnews.” CSCW conference, ACM
(1994)
[26] Y. Dong, S. Liu and J. Chai, "Research of hybrid collaborative filtering algorithm based on news
recommendation," 2016 9th International Congress on Image and Signal Processing, BioMedical
Engineering and Informatics (CISP-BMEI), 2016, pp. 898-902, doi: 10.1109/CISP-BMEI.2016.7852838.
[27] Nitin Pradeep Kumar, Zhenzhen Fan, “Hybrid User-Item Based Collaborative Filtering”, Procedia
Computer Science, Volume 60, 2015, Pages 1453-1461, ISSN 1877-0509,
[28] Mohammed Al-Maolegi, Bassam Arkok (Computer Science, Jordan University of Science and Technology,
Irbid, Jordan), “An Improved Apriori Algorithm for Association Rules”, International Journal on Natural
Language Computing (IJNLC) Vol. 3, No.1, February 2014.
[29] https://www.geeksforgeeks.org/apriori-algorithm/
[30] C. Zhang, X. Zhang and P. Tian, "An Approximate Approach to Frequent Itemset Mining," 2017
IEEE Second International Conference on Data Science in Cyberspace (DSC), 2017, pp. 68-73, doi:
10.1109/DSC.2017.60.
[31] https://www.geeksforgeeks.org/ml-eclat-algorithm/
[32] Yuan J., Ding S. (2012) Research and Improvement on Association Rule Algorithm
Based on FP-Growth. In: Wang F.L., Lei J., Gong Z., Luo X. (eds) Web Information Systems
and Mining. WISM 2012. Lecture Notes in Computer Science, vol 7529. Springer, Berlin,
Heidelberg.
[33] https://www.softwaretestinghelp.com/fp-growth-algorithm-data-mining/
[34] W. Zhang, H. Liao and N. Zhao, "Research on the FP Growth Algorithm about Association Rule
Mining," 2008 International Seminar on Business and Information Management, 2008, pp. 315-318,
doi: 10.1109/ISBIM.2008.177.
[35] Manpreet Kaur, Shivani Kang, Market Basket Analysis: Identify the Changing Trends of Market
Data Using Association Rule Mining, Procedia Computer Science, Volume 85, 2016, Pages 78-85, ISSN
1877-0509
[36] W. Zhang, H. Liao and N. Zhao, "Research on the FP Growth Algorithm about Association Rule
Mining," 2008 International Seminar on Business and Information Management, 2008, pp. 315-318,
doi: 10.1109/ISBIM.2008.177.
[37] Puneet Matapurkar, Saurabh Shrivastava, “Applications of FP-Growth and Apriori Algorithm for Mining
Fuzzified Spatial Dataset”
[38] Pranali Foley (M Tech Scholar, CSE, RSRRCET, Bhilai (CG), India), Mohd. Shajid Ansari (Assistant
Professor, CSE Department, RSRRCET, Bhilai (CG), India), “Comparative Analysis of FP-Tree and Apriori
Algorithm”