
A PRODUCT RECOMMENDER

SYSTEM USING FREQUENT PATTERN MINING ALGORITHMS


By
AISHIK SETT
Roll No: 25500117063
Registration No: 172550110004
NILANJAN MUKHERJEE
Roll No:25500118003
Registration No:182550120002
NIDHI SHAH
Roll No: 25500117035
Registration No: 172550110032
BAISAKHI MUKHERJEE
Roll No: 25500117046
Registration No: 172550110021

Under the Guidance of


Ms. Madhusmita Mishra

Bachelor of Technology (Computer Science and Engineering)


Department of Computer Science and Engineering
Dr Sudhir Chandra Sur Institute Of Technology & Sports Complex
Maulana Abul Kalam Azad University of Technology
Kolkata, West Bengal, India

CERTIFICATE

This is to certify that this is a bona fide record of the FINAL YEAR project work "A product recommender
system using frequent pattern mining algorithms" done satisfactorily at DR SUDHIR CHANDRA SUR
INSTITUTE OF TECHNOLOGY AND SPORTS COMPLEX by AISHIK SETT, NILANJAN
MUKHERJEE, NIDHI SHAH and BAISAKHI MUKHERJEE OF 8TH SEMESTER, CSE.
This report, or a similar report on this topic, has not been submitted for any other examination and does not
form part of any other course undergone by the candidates. I have no doubt that they have very good research
potential.
I would like to wish them a very bright future.

Date:__________________ ____________________
Ms.Madhusmita Mishra
(Asst.Prof,CSE,DSCSITSC)

_____________________ _______________________
Dr.OM PRAKASH SHARMA Ms. Rinku Supakar
(PRINCIPAL,DSCSITSC) (HOD,CSE,DSCSITSC)
ACKNOWLEDGEMENT

We would like to take this opportunity to express our gratitude towards all the people who have, in various
ways, helped us in the successful completion of our final year project work "A Product Recommender
System using Frequent Pattern Mining Algorithms" done satisfactorily at DR. SUDHIR CHANDRA SUR
DEGREE ENGINEERING COLLEGE. We must convey our gratitude to Mrs Madhusmita Mishra,
Assistant Professor of the Computer Science and Engineering Department, for being a constant source of
inspiration, helping us in preparing the project, personally correcting our work and providing encouragement
throughout the project. In this regard we are especially thankful to our Head of the Department, Mrs Rinku
Supakar, for steering us through the tough as well as the easy phases of the project in a result-oriented manner
with concerned attention. A special thanks to all the faculty members of our Computer Science Department.

We would also love to take this opportunity to tell the readers of this material that their comments, be it
appreciation or criticism, would be the most valuable thing to us, and something we will always be
thankful for.

Date: 24/06/2021
ABSTRACT
In this competitive world, time plays an important role in our lives. Everybody should utilize their time in an
optimized way for better productivity. A good recommender system helps us to get relevant products and
reduces searching time. To this end, machine learning is used to develop the recommender system. The
recommender system is used for finding the most relevant products. In this work the Apriori, FP-Growth and
Eclat algorithms are applied to design a recommendation system for users based on the relevancy of the
product ranking. In the latter part of the work, a performance analysis and comparison study is done between
the above algorithms to find the better algorithm for product recommendation. Recommendation systems in
e-commerce have now become essential tools to help businesses increase their sales. [1]
Keywords: Recommender, Apriori, FP-Growth, Machine learning, Eclat, Recommendation

Fig 1: Data Flow of Product Recommendation


CONTENTS

CHAPTER: 1. INTRODUCTION

1.1 Introduction

1.2 Objective

1.3 Data Mining

1.4 Association Mining

1.5 Recommender System

1.6 Frequent Pattern Mining

1.7 Association Rules

1.8 Terms Related to Recommender System

CHAPTER: 2. RELATED WORKS ON RECOMMENDER SYSTEM
2.1 Content Based Filtering

2.2 Collaborative Filtering


2.2.1 User Based Collaborative Filtering
2.2.2 Item Based Collaborative Filtering

2.3 Hybrid Filtering

CHAPTER: 3. ALGORITHMS USED FOR PREDICTION


3.1 Apriori Algorithm

3.2 Eclat Algorithm

3.3 FP Growth Algorithm

CHAPTER: 4. PRESENT WORK

4.1 Market Basket Analysis

4.2 Working of Market Basket Analysis

4.3 Applications of Market Basket Analysis

4.4 Dataset used for the algorithms

4.5 Importing the libraries

4.6 Import of Dataset

4.7 Pre-processing of Data

4.8 Analysis of Data


4.9 Apriori Algorithm
4.9.1 Prediction using the Apriori algorithm

4.10 Frequent Pattern Growth Algorithm (FP-Growth)

4.11 ECLAT Algorithm

4.12 Analysis of Apriori algorithm using graphs

4.13 Required Run Time for Apriori algorithm

4.14 Analysis of Frequent Pattern Growth algorithm (FP-Growth) using graphs

4.15 Required Run Time for Frequent Pattern Growth (FP-Growth) algorithm

4.16 Analysis of ECLAT algorithm using graphs

4.17 Required Run Time for ECLAT algorithm

4.18 Software Requirements

CHAPTER: 5. RESULTS

5 Comparisons

5.1 Apriori algorithm vs. FP Growth algorithm


5.1.1 Apriori vs. FP Growth Code
5.1.2 Apriori vs. FP Growth Graphs

5.2 Apriori algorithm vs. ECLAT algorithm


5.2.1 Apriori vs. ECLAT Code
5.2.2 Apriori vs. ECLAT Graphs
5.3 FP Growth algorithm vs. ECLAT algorithm
5.3.1 FP Growth vs. ECLAT Code
5.3.2 FP Growth vs. ECLAT Graphs

CHAPTER: 6. DEVELOPMENT OF A PORTAL TO SHOW THE WORKING OF THE ALGORITHM IN THE BACKEND FOR REAL LIFE SITUATIONS

6.1 Use of Add button

6.2 Use of Submit button

6.3 Use of Print button

6.4 Use of Recommend button

CHAPTER: 7. CONCLUSIONS

CHAPTER: 8. REFERENCES
CHAPTER: 1. INTRODUCTION

1.1 Introduction
In today's world, different methods are used to analyse data, such as clustering, regression, neural networks,
random forests, SVM, etc. Data Mining is one among them.
Retailers have access to an unprecedented amount of shopper transaction data. As shopping habits have become
more electronic, records of every purchase are neatly stored in databases, ready to be read and analysed.
With such an arsenal of data at their disposal, retailers can uncover patterns of consumer behaviour. [1]
The challenge with many of these approaches is that they can be difficult to tune, challenging to interpret and
require quite a bit of data preparation and feature engineering to get good results. In other words, they can be
very powerful but require a lot of knowledge to implement properly.
Data Mining is important because lots of data is being collected and warehoused:
- Web data and e-commerce.
- Purchases at departmental stores and grocery stores.
- Bank and credit card transactions.
Data Mining is also important because computers have become cheaper and more powerful. Competitive
pressure is also strong, since data mining helps improve customer relationship management. [2]

1.2 Objective
The objective of Market Basket Analysis models is to recognize and predict the products that a customer may
buy from the store. The primary objective is to improve the effectiveness of marketing and sales tactics
using the customer data that is accumulated by the enterprise during sales transactions.
The marketing and sales teams can develop more effective pricing, product placement and selling strategies.
This helps in predicting product demand in different geographic locations, which improves trading time
and decreases operating costs, and translates into increased revenues and higher profit margins.

The purpose of this study was to analyse the data of different groceries present in a grocery store by
comparing the Apriori algorithm and the FP-Growth algorithm, and to determine association rules based
on consumer purchase patterns using association techniques that find frequent item-sets and then
establish the Association Rules.

1.3 Data Mining


Data Mining is a process of discovering patterns in large data sets involving methods at the intersection of
machine learning, statistics and database systems.
Data Mining involves several classes of tasks:
Regression - attempts to find a function that models the data with the least error, i.e., for estimating the
relationships among data or datasets.
Classification and prediction - the task of generalizing known structure to apply to new data.
Clustering - the task of discovering groups and structures in the data that are in some way or another
'similar', without using known structures in the data.
Association rule learning - searches for relationships between variables. This is sometimes referred to as
market basket analysis.
Data mining has several types, including pictorial data mining, text mining, social media mining, web mining,
and audio and video mining, among others.
There are a few Data Mining tools that help us to perform these tasks easily, such as Oracle Data Mining, IBM
SPSS Modeler, KNIME, Python and Orange. [2]

Fig 2:Data Mining Phases


There are some alternative names for Data Mining, such as Knowledge Discovery (Mining) in Databases (KDD),
knowledge extraction, data/pattern analysis, data archaeology, data dredging, information harvesting and
business intelligence. [2]

Fig 3: Data Mining Alternative Definition


1.3.1 Flow Chart of Data Mining:-
Fig 4: Flowchart of Data Mining
1.3.2 Flow Chart in Tele-Communication Industry
Fig 5: Flowchart of Data Mining in Telecommunication Industry

1.4 Association Mining


Association is a data mining function that discovers the probability of the co-occurrence of items in a
collection.
The relationships between co-occurring items are expressed as association rules. Some applications of
association rule mining are the analysis of sales transactions, stock analysis, web log mining, medical diagnosis,
customer market analysis, bioinformatics etc. There are two necessary steps for association mining:
Frequent Itemset Generation - finds all itemsets whose support is greater than or equal to the minimum
support threshold.
Rule Generation - generates strong association rules from the frequent itemsets, i.e., rules whose confidence is
greater than or equal to the minimum confidence threshold. [3]

Fig 6: Association Mining Terms with their Formulas

1.4.1 Steps involved in Association Rule Mining:-


Step 1: First find all frequent item sets.
Step 2: Then generate strong association rules from the frequent item sets. Association rules are generated by
building associations from frequent item sets generated in step 1.[3]
Fig 7: Steps of Association Rule Mining

1.4.2 Examples of association rules in data mining:-


One of the classic examples of association rule mining refers to a relationship between diapers and beers. The
example, which may well be fictional, claims that men who go to a store to buy diapers are also likely to buy
beer. Data that would point to that might look like this:
A supermarket has 200,000 customer transactions. About 4,000 transactions, or about 2% of the total number
of transactions, include the purchase of diapers. About 5,500 transactions (2.75%) include the purchase of
beer. Of those, about 3,500 transactions (1.75%) include both the purchase of diapers and beer. If diaper and
beer purchases were independent, only about 2% x 2.75% ≈ 0.055% of transactions (roughly 110) would be
expected to contain both, far fewer than observed. The fact that about 87.5% of diaper purchases also
include the purchase of beer indicates a strong link between diapers and beer. [4]
Fig 8: Example of Association Rule Mining

1.4.3 Disadvantages of Association Mining:-


One of the main disadvantages of association rule algorithms appears in e-learning: the algorithms used have too
many parameters for somebody who is not an expert in data mining, and the rules obtained are far too many, most of
them non-interesting and with low comprehensibility. [4]
Fig 9: Advantages and Disadvantages of Association Rule Mining Algorithms

1.5 Recommender System:


A recommender system is an information filtering system that seeks to predict user preferences. The motive
of a recommender system is to generate meaningful and similar recommendations to a collection of users for
products that might interest them [2]. Recommender systems are used in many areas, such as video and music
services, product recommenders for online stores such as Amazon and Flipkart, content recommenders for
social media platforms and open web content recommenders. To buy a product from any kind of online store,
we first search for the item and then get some similar kinds of items; the recommender system recommends
these items to us. By giving one single input we get many similar kinds of results. [5]
1.6 Frequent Pattern Mining
A frequent pattern is defined as a pattern (a set of items, subsequences, substructures etc.) that occurs
frequently in a dataset. In association rule mining, finding frequent patterns from databases is a time-consuming
process. Frequent Pattern Mining (also known as Association Rule Mining) is an analytical process that finds frequent
patterns, associations, or causal structures in data sets found in various kinds of databases such as relational
databases, transactional databases and other data repositories. [6]

1.6.1 Example of Frequent Pattern Mining

Given a set of transactions, this process aims to find the rules that enable us to predict the occurrence of a
specific item based on the occurrence of other items in the transaction. Let's look at an example of Frequent
Pattern Mining.

In the table below, Support(milk -> bread) = 0.4 means that milk and bread are purchased together in 40% of
all transactions. Confidence(milk -> bread) = 0.5 means that if there are 100 transactions containing milk, then
50 of them will also contain bread. [6]
Fig 10: Example of Frequent Pattern Mining

1.7 Association Rules

Association rules are "if-then" statements that help to show the probability of relationships between data
items, within large data sets in various types of databases. Association rule mining has a number of
applications and is widely used to help discover sales correlations in transactional data or in medical data sets.
[7]

1.7.1 Example of Association Rules

Given a set of transactions, we can find rules that will predict the occurrence of an item based on the
occurrences of other items in the transaction. [8]

Fig 11:Example of Association Rules

An association rule is an implication expression of the form X -> Y, where X and Y are any two itemsets.
Example: {Milk, Diaper} -> {Beer}.
From the above table, for {Milk, Diaper} => {Beer}:

Support = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4

Confidence = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67

Lift = Supp({Milk, Diaper, Beer}) / (Supp({Milk, Diaper}) × Supp({Beer})) = 0.4 / (0.6 × 0.6) ≈ 1.11
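As a quick cross-check of these numbers, the following is a minimal Python sketch, assuming the classic five-transaction basket table that these values correspond to (presumably the table shown in Fig 11):

# Minimal sketch: support, confidence and lift for {Milk, Diaper} -> {Beer},
# assuming the classic five-transaction example whose counts match the values above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
supp = support(X | Y)                               # 0.4
conf = support(X | Y) / support(X)                  # about 0.67
lift = support(X | Y) / (support(X) * support(Y))   # about 1.11
print(supp, conf, lift)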

1.8:Terms Related to Recommender System

1.8.1 Support

The support is the number of transactions that include items in both the {X} and {Y} parts of the rule, as a percentage of the total
number of transactions. It is a measure of how frequently the collection of items occurs together as a percentage
of all transactions. [9]

Fig 12:Support Formula

Consider itemset1 = {bread, butter} and itemset2 = {bread, shampoo}. Many transactions will have both bread
and butter in the cart, but not so many will have bread and shampoo. So in this case, itemset1 will generally have
a higher support than itemset2.
Mathematically, support is the fraction of the total number of transactions in which the itemset occurs.

1.8.2 Confidence:

Confidence can be interpreted as the likelihood of purchasing both the products A and B. Confidence is
calculated as the number of transactions that include both A and B divided by the number of transactions
that include product A. [9]

Fig 13 :Confidence Formula


1.8.3 Lift:

The lift of the rule X => Y is the confidence of the rule divided by the expected confidence, assuming that the
itemsets X and Y are independent of each other. The expected confidence is simply the frequency (support)
of {Y}. [10]

Fig 14:Lift Formula


1.8.4 Range:

range() is a built-in function of Python. It is used when a user needs to perform an action a specific number
of times. range() in Python 3.x is just a renamed version of the function called xrange() in Python 2.x.
The range() function is used to generate a sequence of numbers.

range() is commonly used in for loops; hence, knowledge of it is a key aspect when dealing with any kind
of Python code. The most common use of the range() function in Python is to iterate over sequence types
(list, string etc.) with for and while loops.

Python range() basics:
In simple terms, range() allows the user to generate a series of numbers within a given range. Depending on how
many arguments the user passes to the function, the user can decide where that series of numbers will begin and
end, as well as how big the difference will be between one number and the next. range() takes mainly three
arguments:

start: integer starting from which the sequence of integers is to be returned.
stop: integer before which the sequence of integers is to be returned; the range of integers ends at stop - 1.
step: integer value which determines the increment between each integer in the sequence. [11]

FIG 15 : Example Of Range
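As a small illustration complementing the figure above, a minimal sketch of the three arguments:

# Generating sequences with range(start, stop, step)
print(list(range(5)))         # [0, 1, 2, 3, 4]   -> stop only
print(list(range(2, 10)))     # [2, 3, ..., 9]    -> start and stop
print(list(range(1, 10, 3)))  # [1, 4, 7]         -> start, stop and step

# Typical use: repeating an action a fixed number of times
for i in range(3):
    print("iteration", i)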


CHAPTER: 2. RELATED WORKS ON RECOMMENDER SYSTEM

2.1 Content Based Filtering

2.1.1 Definition:

Content-based Filtering is a Machine Learning technique that uses similarities in features to make decisions.
This technique is often used in recommender systems, which use algorithms for designing advertisements or
recommending things to users based on knowledge accumulated about the user. [12]

2.1.2 Method/Algorithm:

The method revolves completely around comparing user interests to product features. The products that have
the most overlapping features with user interests are what’s recommended.
Here, two methods can be used (possibly in combination). Firstly, users are given a list of features out of
which they will choose anything they like the most. Secondly, the algorithm will keep track of the products
the user has chosen before and add those features to the users’ data.
Similarly, product features can be identified by the developers of the product themselves. Moreover, users can
be asked what features they believe identify with the products the most. [12]

FIG 16:Implementation Of The Algorithm


Once a numerical value, either binary (a 1 or 0 value) or an arbitrary number, has been assigned to product
features and user interests, a method to identify similarities between products and user interests needs to be
chosen. [12]
A very basic formula would be the dot product. To calculate the dot product the following formula should be
used:

FIG 17 : Dot Product Formula


(where pi is the product feature value and ui is the user interest value in column i).
In the figure above, the user's interest level in Product 1 can be estimated as 2*1 + 1*1 + 1*2, which equals 5.
Similarly, interest in Product 2 will be 1*4 = 4, and interest in Product 3 will be 2*3 + 1*1 = 7.
Hence, Product 3 will be the algorithm's top recommendation to the user. [12]

2.1.3 Example:

Movie Recommendation System to a Netflix user with profile named Nikhil:

FIG 18 :Example Of Content Based Filtering

Assume Nikhil has given good ratings to movies like Mission Impossible and James Bond, which are
tagged with the "Action" genre, and gave a bad rating to the movie "Toy Story", which is tagged with the
"Children" genre.
Now we will create a User Vector for Nikhil based on his 3 ratings :

FIG 19 :Example Of Content Based Filtering Contd..(User Vector)


On a rating scale of -10 to 10: since Nikhil loves Action movies, a value of 9 is assigned to "Action"; Nikhil
hasn't watched any Animation movies, so we assign 0 to "Animation"; and since Nikhil has given bad reviews to
movies of the Children genre, we assign -6 to "Children".
So the user vector for Nikhil is (9, 0, -6) in the order (Action, Animation, Children).
FIG 20 : Example Of Content Based Filtering Contd..(User Vector)

The item vector for the movie "Toy Story" is (0, 1, 1) and for the movie "Star Wars" is (1, 0, 0), in the order
(Action, Animation, Children).
We now take the dot product of each item vector with the user vector.

FIG 21 : Example Of Content Based Filtering Contd..(Dot Product)

Accordingly, the dot product for "Toy Story" is -6 and that for "Star Wars" is 9.

Hence "Star Wars" will be recommended to Nikhil, which also matches our intuition that Nikhil likes
Action movies and dislikes Children movies.

In a similar manner, we can calculate the dot products of all the item vectors of all the movies in the store and
recommend the top 10 movies to Nikhil. [13]
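A minimal sketch of this scoring step, using the vectors above in the order (Action, Animation, Children); the movie list and helper names here are illustrative only:

# Content-based scoring: dot product of the user-interest vector with each item-feature vector
user_vector = (9, 0, -6)            # Nikhil's interests

item_vectors = {                    # illustrative genre vectors
    "Toy Story": (0, 1, 1),
    "Star Wars": (1, 0, 0),
    "Mission Impossible": (1, 0, 0),
}

def dot(u, v):
    # Dot product of two equal-length vectors
    return sum(ui * vi for ui, vi in zip(u, v))

scores = {title: dot(user_vector, vec) for title, vec in item_vectors.items()}
# Sort from most to least relevant and recommend the top titles
for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(title, score)             # Star Wars 9, Mission Impossible 9, Toy Story -6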
2.1.4 Advantages of Content-Based Filtering:

- User independence: The content-based method only has to analyze the items and a single user's
profile for the recommendation, which makes the process less cumbersome. Content-based
filtering would thus produce more reliable results with fewer users in the system.
- Transparency: Collaborative filtering gives recommendations based on other unknown users
who have the same taste as a given user, but with content-based filtering, items are recommended
on a feature-level basis.
- No cold start: As opposed to collaborative filtering, new items can be suggested before being
rated by a substantial number of users. [14]

2.1.5 Disadvantages of Content-Based Filtering:

- Limited content analysis: If the content doesn't contain enough information to discriminate the
items precisely, the recommendation itself risks being imprecise.
- Over-specialization: Content-based filtering provides a limited degree of novelty since it has to
match up the features of a user's profile with available items. In the case of item-based filtering,
only item profiles are created, and users are suggested items similar to what they rate or search
for, rather than drawing on their past history. A perfect content-based filtering system may suggest
nothing unexpected or surprising. [14]

2.2 Collaborative filtering recommender system (CFRS):


Collaborative filtering is an effective and well known technology in recommendation systems. Nowadays,
users often need to spend a lot of time to select an item in the face of massive amounts of information [4]. To
resolve this problem we need a recommender system, and here a CFRS helps us to choose an item using its past
ratings. Collaborative filtering basically recommends products based on the past ratings of all buyers
collectively [6]. The main purpose of a collaborative filtering recommender system is to suggest relevant
products to users [15].
2.2.1 User Based Collaborative Filtering
User-based collaborative filtering finds the nearest-neighbour set of the target user according to the similarity
between users, and then decides the recommendation result for the target user according to the users' ratings in
that set [16]. The basic idea of the user-based collaborative filtering algorithm is that, first, a user-product
matrix is created, based on whether each user is interested in a product.
Let's take an example. As we can see in the figure below, there are 3 items and 3 users. If User 1 and User 3
have similar tastes and User 1 likes Item 1, UBCF can recommend Item 1 to User 3 as well [17].
FIG 22 :User Based Collaborative Filtering

The second step is to calculate the set of users similar to the target user by cosine similarity on the
user-product matrix; finally, using the N nearest neighbours, the user-based CF algorithm predicts the
target user's interest scores for unknown products to generate the recommendation results [18].
Here is the flowchart of the user-based collaborative filtering algorithm recommendation model:

Fig 23 : Flowchart Of User Based Collaborative Filtering Algorithm
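A minimal sketch of these two steps (cosine similarity between users, then a neighbour-weighted prediction of unrated products); the small rating matrix and helper names are illustrative assumptions, not the report's actual code:

import numpy as np

# Illustrative user-product matrix: rows = users, columns = items, 0 = not rated
R = np.array([
    [5.0, 3.0, 0.0],   # User 1
    [4.0, 0.0, 2.0],   # User 2
    [5.0, 4.0, 0.0],   # User 3 (target user)
])

def cosine(u, v):
    # Cosine similarity between two rating vectors
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

target = 2
sims = np.array([cosine(R[target], R[u]) for u in range(len(R))])
sims[target] = 0.0   # exclude the target user from their own neighbourhood

# Predict the target user's score for each unrated item as a
# similarity-weighted average of the neighbours' ratings
predictions = {}
for item in range(R.shape[1]):
    if R[target, item] == 0:
        rated = R[:, item] > 0
        weight = sims[rated].sum()
        if weight > 0:
            predictions[item] = float(sims[rated] @ R[rated, item] / weight)
print(predictions)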

2.2.2 Item Based Collaborative Filtering




4.9 Apriori Algorithm

The algorithm starts working by listing the top frequent items bought by the customers.

Fig 49 : Listing Top Frequent Items bought by the Customer

Now we need to generate the association rules for the Apriori algorithm. On generating the association rules, we observed:
- 22% of transactions containing mineral water also contain chocolate
- 32% of transactions containing chocolate also contain mineral water
- 34% of transactions containing spaghetti also contain mineral water
- 25% of transactions containing mineral water also contain spaghetti
A rule is said to be interesting if it is unexpected or actionable to the user; this is a subjective measure. The
itemset {spaghetti, mineral water} is more promising than {chocolate, mineral water}, and we can judge how
interesting a rule is by comparing the lift, leverage and conviction of {spaghetti, mineral water} and
{chocolate, mineral water}.

Fig 50 : Generating Association Rules
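The report's actual code is shown in the figures above; as a representative sketch of this step (assuming the grocery baskets are one-hot encoded and mlxtend's implementation is used), rule generation could look roughly like this:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# One basket per row; a tiny illustrative placeholder for the full grocery dataset
transactions = [
    ["mineral water", "chocolate"],
    ["spaghetti", "mineral water"],
    ["chocolate", "eggs"],
]

# One-hot encode the baskets into a boolean item matrix
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Mine frequent itemsets with Apriori and derive association rules from them
frequent_itemsets = apriori(onehot, min_support=0.03, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])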


4.9.1 Prediction using the above algorithm

Fig 51 : Getting itemsets With length=2

Fig 52 : Getting itemsets With length=1 and Support More than 10%
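The two filtering steps captioned above can be expressed, as a sketch over the frequent_itemsets frame from the previous snippet, roughly as follows:

# Add an explicit length column, then filter by itemset size and support
frequent_itemsets["length"] = frequent_itemsets["itemsets"].apply(len)

pairs = frequent_itemsets[frequent_itemsets["length"] == 2]
singles_over_10pct = frequent_itemsets[
    (frequent_itemsets["length"] == 1) & (frequent_itemsets["support"] >= 0.10)
]
print(pairs)
print(singles_over_10pct)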

4.10 Frequent Pattern Growth Algorithm (FP-Growth)

The data is fed into the frequent pattern growth algorithm along with its properties and attributes so that the machine can
predict grocery items in the future.
Fig 53 : Feeding the data into the FP-Growth Algorithm
Now we need to generate the association rules for the Frequent Pattern Growth (FP-Growth) algorithm.

Fig 54 : Generating Association Rules for FP-Growth Algorithm
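As with Apriori, the exact code is in the figures; a comparable sketch using mlxtend's FP-Growth implementation on the same one-hot matrix (onehot, from the Apriori sketch above) would be:

from mlxtend.frequent_patterns import fpgrowth, association_rules

# FP-Growth mines the same frequent itemsets without candidate generation
fp_itemsets = fpgrowth(onehot, min_support=0.03, use_colnames=True)
fp_rules = association_rules(fp_itemsets, metric="confidence", min_threshold=0.2)
print(fp_rules[["antecedents", "consequents", "support", "confidence", "lift"]])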


4.11 ECLAT Algorithm

We invoke the ECLAT algorithm on the grocery data set to study its effectiveness.

Fig 55 : Invoking Eclat Algorithm to Grocery Store Data

Generation of a binary data frame from the grocery dataset for the analysis of the ECLAT algorithm, and display of the
unique items present in the dataset.

Fig 54 : Generation of a Binary data Frame from the Grocery Dataset


Calculation of the ECLAT indexes and ECLAT support. Here we have taken the following assumptions:
- Minimum support count = 0.08
- Minimum combination = 1
- Maximum combination = 3

Fig 55 : Calculation of ECLAT Indexes and ECLAT Support.
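The figures hold the actual code (the "indexes" and "supports" terminology suggests a library such as pyECLAT was used); as a minimal, self-contained illustration of the vertical tidset-intersection idea behind ECLAT, using illustrative baskets and the thresholds listed above:

from itertools import combinations

# Tiny illustrative baskets; the report applies this to the full grocery dataset
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
]

# Vertical representation: item -> set of transaction ids (the "ECLAT indexes")
tidsets = {}
for tid, basket in enumerate(transactions):
    for item in basket:
        tidsets.setdefault(item, set()).add(tid)

min_support = 0.08      # minimum support threshold assumed above
max_combination = 3     # largest itemset size considered
n = len(transactions)

eclat_indexes, eclat_supports = {}, {}
for size in range(1, max_combination + 1):
    for itemset in combinations(sorted(tidsets), size):
        # Support of an itemset = size of the intersection of its tidsets
        common = set.intersection(*(tidsets[i] for i in itemset))
        sup = len(common) / n
        if sup >= min_support:
            key = " & ".join(itemset)
            eclat_indexes[key] = common
            eclat_supports[key] = sup
print(eclat_supports)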


Display of ECLAT indexes with respect to the dataset.

Fig 56 : Generation of a Binary data Frame from the Grocery Dataset


Display of ECLAT support values with respect to the dataset.

Fig 57 : Displaying ECLAT support values with respect to the dataset .

4.12 Analysis of Apriori Algorithm using Graphs

Graph 5 : Support vs Confidence


EXPLANATION:
Support defines how often the if/then relationship appears in the database. Confidence expresses how often a
particular relationship has been found to be true.
From Graph 5, we find that the support value initially remains constant as the confidence value increases.
It then decreases linearly. Finally, it again remains constant as the confidence value increases.

Graph 6 : Lift vs Confidence

EXPLANATION:
Confidence expresses how often a particular relationship has been found to be true. Lift measures the
performance of a targeting model for prediction or classification of cases having an enhanced response,
measured against a random-choice targeting model.
From Graph 6, we find that the confidence value decreases rapidly at a fixed lift value. Then, on reaching its
minimum, the confidence value increases with the increase in lift value. Finally, it decreases rapidly again at a
fixed lift value.
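These scatter plots can be produced directly from the mined rules; a minimal sketch, assuming the mlxtend rules DataFrame from the Apriori snippet in Chapter 4:

import matplotlib.pyplot as plt

# Support vs Confidence and Lift vs Confidence for the mined rules
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(rules["confidence"], rules["support"], alpha=0.6)
ax1.set_xlabel("Confidence")
ax1.set_ylabel("Support")
ax2.scatter(rules["lift"], rules["confidence"], alpha=0.6)
ax2.set_xlabel("Lift")
ax2.set_ylabel("Confidence")
plt.tight_layout()
plt.show()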
4.13 Required Run Time for Apriori Algorithm

Fig 58 : Required Run Time for Apriori Algorithm

Graph 7 : LinePlot Graph of Apriori Algorithm


4.14 Analysis of Frequent Pattern Growth Algorithm (FP-Growth) using
Graphs

Graph 8 : Support vs Confidence



We observed that, on increasing the confidence value up to 0.28, the support value remains constant.
From 0.28 the support value increases as the confidence value decreases. On reaching its maximum, the
support value remains constant although the confidence value increases. When both the support and
confidence values are at their maximum, they decrease and reach the coordinate (0.22, 0.053). From 0.053
the confidence value increases while the support value remains constant.

Graph 8 : Support Vs Lift


We observed that there is a linear increase in the lift and support values. After reaching the coordinate
(1.43, 0.058), both the support and lift values start decreasing. Finally they reach the end point at
(1.35, 0.053).

Graph 9 : Lift Vs Confidence



We observed that at a fixed lift value of 1.17 there is a constant increase in confidence. The confidence
reaches the value of 0.29 and then decreases with the increase in lift. From the lift value of 1.45 the
confidence increases while the lift remains constant. Then, from the coordinate (1.45, 0.36), the values of
confidence and lift decrease, reaching the coordinate (1.35, 0.23). From 1.35 the confidence value
increases while the lift value remains constant.

4.15 Required Run Time for Frequent Pattern Growth Algorithm

Fig 59 : Required Run Time for FP-Growth Algorithm


Graph 10 : LinePlot Graph of FP-Growth Algorithm

4.16 Analysis of ECLAT Algorithm Using Graphs

The code below is used to display and study the effect of the ECLAT algorithm on the dataset.

Graph 11 : Eclat Graph for Eclat_Support Vs Eclat_Indexes


The above graph plots two attributes, Support and Indexes, for the ECLAT algorithm. At first we
find that the ECLAT support value of 0.05 remains constant as the ECLAT index decreases until
it reaches 0.22. From the coordinate (0.22, 0.05) we find that the ECLAT support increases more than the
ECLAT index does. It stops at the coordinate (0.25, 0.06). After that there is a constant increase in the
ECLAT index while the ECLAT support value remains constant at 0.06.

4.17 Required Run Time for ECLAT Algorithm

The run time for the ECLAT algorithm is computed in the code below.

Fig 62 : Required Run Time for Eclat Algorithm


Now we obtain the ECLAT algorithm graph using the code below.

Graph 12 : LinePlot Graph of Eclat Algorithm


4.18 Software Requirements
The software required for building the project is:
- Windows 10
- Python programming language
- Jupyter Notebook

CHAPTER: 5. RESULTS

5. Comparisons:

5.1 Apriori vs FP-Growth:

1) Apriori is an array-based algorithm; FP-Growth is a tree-based algorithm.
2) Apriori uses the join and prune technique; FP-Growth constructs a conditional frequent pattern tree and a conditional pattern base from the database which satisfy the minimum support.
3) Apriori uses a breadth-first search; FP-Growth uses a depth-first search.
4) Apriori utilizes a level-wise approach, generating patterns containing 1 item, then 2 items, then 3 items, and so on; FP-Growth utilizes a pattern-growth approach, meaning that it only considers patterns actually existing in the database.
5) In Apriori, candidate generation is extremely slow and the runtime increases exponentially with the number of different items; in FP-Growth, the runtime increases linearly with the number of transactions and items.
6) In Apriori, candidate generation is very parallelizable; in FP-Growth, the data are very interdependent, as each node needs the root.
7) Apriori requires a large memory space due to the large number of candidates generated; FP-Growth requires less memory space due to its compact structure and the absence of candidate generation.
8) Apriori scans the database multiple times to generate candidate sets; FP-Growth scans the database only twice to construct the frequent pattern tree.
5.1.1 Apriori vs FP-Growth Codes:

Fig 60 :Apriori Code

Fig 61 :FP-Growth Code
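The side-by-side code in Figs 60 and 61 essentially times the two miners on the same data; a hedged sketch of that comparison (reusing the onehot matrix from Chapter 4 and mlxtend's implementations):

import time
from mlxtend.frequent_patterns import apriori, fpgrowth

def time_miner(miner, data, min_support=0.03):
    # Return the run time (in seconds) of a frequent-itemset miner
    start = time.perf_counter()
    miner(data, min_support=min_support, use_colnames=True)
    return time.perf_counter() - start

apriori_time = time_miner(apriori, onehot)
fpgrowth_time = time_miner(fpgrowth, onehot)
print("Apriori:  ", round(apriori_time, 4), "s")
print("FP-Growth:", round(fpgrowth_time, 4), "s")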

5.1.2 Apriori vs FP-Growth Graphs:

Graph 13 :Apriori Vs FP-Growth Graph


5.2 Apriori vs Eclat:

1) Apriori is usable with large datasets; Eclat is better suited to small and medium datasets.
2) The Apriori algorithm scans the original (real) dataset; the Eclat algorithm scans the currently generated dataset.
3) The Apriori algorithm is a classical algorithm used to mine the frequent itemsets in a given dataset; the Eclat algorithm also mines the frequent itemsets, but in a vertical manner, and it follows the depth-first search of a graph.

5.2.1 Apriori vs ECLAT Codes:

Fig 63 :Apriori Code

Fig 64 :Eclat Code


5.2.2 Apriori vs ECLAT Graphs:

Graph 14 :Apriori Vs Eclat Graph


5.3 FP-Growth vs ECLAT:

1) FP-Growth is usable with large datasets; Eclat is better suited to small and medium datasets.
2) The FP-Growth algorithm scans the original (real) dataset; the Eclat algorithm scans the currently generated dataset.
3) The FP-Growth algorithm is a classical algorithm used to mine the frequent itemsets in a given dataset; the Eclat algorithm also mines the frequent itemsets, but in a vertical manner, and it follows the depth-first search of a graph.
5.3.1 FP GROWTH VS ECLAT Codes

Fig 66 :FP-Growth Code

Fig 67 :Eclat Code

5.3.2 FP GROWTH VS ECLAT Graphs:

Graph 15 :FP-Growth Vs Eclat Graph


CHAPTER 6 DEVELOPMENT OF A PORTAL TO SHOW THE
WORKING OF THE ALGORITHM IN THE BACKEND FOR REAL
LIFE SITUATIONS

We have developed a portal that deals with a real-life situation: the billing of different products bought by a
customer. The system recommends other products to the customer based on their purchased items.

To develop the portal, we have used the Python programming language as the backend script. We have
used the Frequent Pattern (FP) Growth algorithm for product recommendation to the buyers and/or
users.

We have used a dataset and a database for the working of the portal. The dataset consists of the list of
different grocery items present in the grocery store. The database consists of the list of customer names
along with their purchase date, purchase time and mobile number. It also includes the lists of products
bought by the specific customers along with the total amount paid by each customer.

The below table lists all the items present in the grocery store.

Table 19 :Displaying All items Present in the Grocery Store

The image below shows a demo of the Product Recommender System. The system records the customer
name, purchase date, purchase time and mobile number of the customer. It also includes the product name,
price and quantity of the products bought by the customer. The system consists of four buttons:
- Add: used to add items to the billing system.
- Submit: used to store the different items bought by the customer in the database of the grocery store.
- Print Bill: used to print the customer's bill. The customer keeps the bill for future reference and for
exchanging products.
- Recommend: recommends other products to the customer based on their purchased items. It creates a
notepad file that can be printed and provided to the customer with further suggestions of different products
and the availability of items in the store based on previous purchases.

Fig 68 : Demo View of our Product Recommender System
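The portal's source is not reproduced in this report; as a rough sketch of how the four buttons could be wired up in Python (assuming a Tkinter front end, which is a common choice for such a billing window; widget and function names here are illustrative):

import tkinter as tk

cart = []  # (product, price, quantity) tuples for the current bill

def add_item():
    cart.append((product_var.get(), float(price_var.get()), int(qty_var.get())))

def submit_bill():
    total = sum(price * qty for _, price, qty in cart)
    # In the real portal this record is written to the grocery database
    print("Submitted Successfully!!!", "Total:", total)

def print_bill():
    for product, price, qty in cart:
        print(product, price, qty)

def recommend():
    # In the real portal, FP-Growth rules are looked up here and the
    # suggestions are written to a notepad (.txt) file for the customer
    pass

root = tk.Tk()
product_var, price_var, qty_var = tk.StringVar(), tk.StringVar(), tk.StringVar()
tk.Entry(root, textvariable=product_var).pack()
tk.Entry(root, textvariable=price_var).pack()
tk.Entry(root, textvariable=qty_var).pack()
for label, command in [("Add", add_item), ("Submit", submit_bill),
                       ("Print Bill", print_bill), ("Recommend", recommend)]:
    tk.Button(root, text=label, command=command).pack(side=tk.LEFT)
root.mainloop()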

6.1 Use of Add Button

The figure below shows the working of the Add button. The Add button continues to work until the
list of items purchased is submitted to the grocery database. Once it is submitted, the Add button stops
working.

Fig 69 :Using Add Button


Fig 70 :Using Add Button Contd..(Adding More Items)

6.2 Use of Submit Button

The figure below shows the working of the Submit button. The Submit button helps to store the total
amount that the customer needs to pay. The list of items purchased is submitted to the grocery
database. Finally, it prints the message "Submitted Successfully!!!".

Fig 71 :Using Submit Button

6.3 Use of Print Button

The figure below shows the working of the Print button. The Print button helps to print the total
amount that the customer needs to pay.
Fig 72 :Using Print Button

6.4 Use of Recommend Button

The figure below shows the working of the Recommend button. The Recommend button recommends a
list of items that the customer might buy in the future, based on their purchased items and previous
history.

Fig 73 :Using Recommend Button
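A hedged sketch of the recommendation step behind this button, assuming the FP-Growth rules mined in Chapter 4 (fp_rules) and an illustrative write_suggestions helper that produces the notepad file:

def recommend_products(basket, rules, top_n=5):
    # Suggest consequents of rules whose antecedents are already in the basket
    basket = set(basket)
    suggestions = {}
    for _, rule in rules.iterrows():
        if set(rule["antecedents"]) <= basket:
            for item in rule["consequents"]:
                if item not in basket:
                    suggestions[item] = max(suggestions.get(item, 0.0), rule["confidence"])
    return sorted(suggestions, key=suggestions.get, reverse=True)[:top_n]

def write_suggestions(customer_name, basket, rules, path="recommendations.txt"):
    # Write the suggestions to a text (notepad) file handed to the customer
    with open(path, "w") as f:
        f.write("Suggestions for " + customer_name + ":\n")
        for item in recommend_products(basket, rules):
            f.write("- " + item + "\n")

write_suggestions("Customer", ["mineral water", "spaghetti"], fp_rules)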


CHAPTER 7 CONCLUSION

During this project we studied the correlation between the Apriori algorithm, the Frequent Pattern (FP)
Growth algorithm and the ECLAT algorithm. The Apriori algorithm and the FP-Growth algorithm have been
used on this dataset. Both algorithms generate the same number of rules, but their execution times differ. The
FP-Growth algorithm is superior in performance to the existing Apriori algorithm for association rule mining
on spatial data [37]. Apriori finds the frequent itemsets by generating candidate itemsets, whereas FP-Growth
finds the frequent itemsets without candidate itemset generation. Apart from the Apriori algorithm we have
used the ECLAT algorithm. The ECLAT algorithm is a superior version of the Apriori algorithm but inferior
to FP-Growth. In the future, we can try to apply a new technique of pattern generation which is more efficient
than FP-Growth. Also, we can apply a fuzzification algorithm on the dataset and we can also use any other
spatial dataset for new research purposes. [38]
CHAPTER 8 REFERENCES

[1] https://towardsdatascience.com/what-are-product-recommendation-engines-and-the-various-
versions-of-them-9dcab4ee26d5

[2] C. S. Fatoni et al., "Online Store Product Recommendation System Uses Apriori Method," 2018
J. Phys.: Conf. Ser. 1140 012034.

[3] https://labs.sogeti.com/recommender-systems-using-apriori/

[4] https://databricks.com/blog/2018/09/18/simplify-market-basket-analysis-using-fp-growth-on-
databricks.html

[5] Prem Melville and Vikas Sindhwani, "Recommender Systems," Machine Learning, IBM T. J. Watson
Research Center, Route/P.O. Box, Kitchawan Rd, Yorktown Heights, USA.

[6] https://www.dataversity.net/frequent-pattern-mining-association-support-business-analysis/

[7] https://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining

[8] https://www.geeksforgeeks.org/association-rule/

[9] Support vs Confidence in Association Rule Algorithms by Lai, Kenneth & Cerpa, Narciso. (2001).

[10] https://www.dataminingapps.com/2017/04/what-is-the-lift-value-in-association-rule-mining/

[11]http://webpages.iust.ac.ir/yaghini/Courses/Data_Mining_882/DM_02_01_Data
%20Undrestanding.pdf

[12] (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 5, No.
10, 2014 Application of Content-Based Approach in Research Paper Recommendation System for a
Digital Library

[13] Content-based Recommender System for Movie Website KE MA Master’s Thesis at VionLabs
Supervisor: Chang Gao Examiner: Mihhail Matskin

[14] International Journal of Innovations in Engineering and Technology (IJIET) Content Based
Filtering Techniques in Recommendation System using user preferences R.Manjula Research
Scholar, Anna University, Chennai, India A. Chilambuchelvan Professor, Department of CSE,
R.M.D Engineering College, Chennai, India

[15] https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada

[16] Y. Gao and L. Ran, "Collaborative Filtering Recommendation Algorithm for Heterogeneous Data
Mining in the Internet of Things," in IEEE Access, vol. 7, pp. 123583-123591, 2019, doi:
10.1109/ACCESS.2019.2935224.

[17] Yunkyoung Lee. 2015. "Recommendation system using collaborative filtering." Approved for the
Department of Computer Science, San Jose State University, December 2015.

[18] User-based Collaborative Filtering Algorithm Design and Implementation by Hulong Wang et al
2021 J. Phys.: Conf. Ser. 1757 012168

[19] https://en.wikipedia.org/wiki/Item-item_collaborative_filtering
[20] Sarwar, George Kaypi, Joseph Konstan, John Riedl.2001. “Item-based Collaborative Filtering
Recommendation Algorithms.” In the 10th International World Wide Web Conference, 285-295.

[21] https://en.wikipedia.org/wiki/Cosine_similarity

[22] Owen, Sean, Anil, Robin, Dunning, Ted, Friedman, Ellen. 2011 . “ Mahout in action “. Shelter
Island NY: Manning.

[23] Su, Xiaoyuan, and Taghi M. Khoshgoftaar. 2009. “A Survey Of Collaborative Filtering
Techniques.” Advances In Artificial Intelligence 2009: 1-19. doi:10.1155/2009/421425

[24] Xingyuan Li.2011 “Collaborative Filtering Recommendation Algorithm Based on Cluster”,


International Conference on Computer Science and network Technology(ICCSNT), IEEE, 4: 2682-2685

[25] Resnick, Paul, Iacovou, Neophytos, Suchak, Mitesh, Bergstrom, Peter, Riedl, John.1994. “
GroupLens: an open architecture for collaborative filtering of netnews.” CSCW conference, ACM
(1994)

[26] Y. Dong, S. Liu and J. Chai, "Research of hybrid collaborative filtering algorithm based on news
recommendation," 2016 9th International Congress on Image and Signal Processing, BioMedical
Engineering and Informatics (CISP-BMEI), 2016, pp. 898-902, doi: 10.1109/CISP-BMEI.2016.7852838.

[27] Nitin Pradeep Kumar, Zhenzhen Fan, “Hybrid User-Item Based Collaborative Filtering”, Procedia
Computer Science, Volume 60, 2015, Pages 1453-1461, ISSN 1877-0509,

[28] Mohammed Al-Maolegi and Bassam Arkok, "An Improved Apriori Algorithm for Association Rules,"
International Journal on Natural Language Computing (IJNLC), Vol. 3, No. 1, February 2014. Computer
Science, Jordan University of Science and Technology, Irbid, Jordan.

[29] https://www.geeksforgeeks.org/apriori-algorithm/

[30] C. Zhang, X. Zhang and P. Tian, "An Approximate Approach to Frequent Itemset Mining," 2017
IEEE Second International Conference on Data Science in Cyberspace (DSC), 2017, pp. 68-73, doi:
10.1109/DSC.2017.60.

[31] https://www.geeksforgeeks.org/ml-eclat-algorithm/

[32] Yuan J., Ding S. (2012) Research and Improvement on Association Rule Algorithm
Based on FP-Growth. In: Wang F.L., Lei J., Gong Z., Luo X. (eds) Web Information Systems
and Mining. WISM 2012. Lecture Notes in Computer Science, vol 7529. Springer, Berlin,
Heidelberg.

[33] https://www.softwaretestinghelp.com/fp-growth-algorithm-data-mining/

[34] W. Zhang, H. Liao and N. Zhao, "Research on the FP Growth Algorithm about Association Rule
Mining," 2008 International Seminar on Business and Information Management, 2008, pp. 315-318,
doi: 10.1109/ISBIM.2008.177.

[35] Manpreet Kaur, Shivani Kang, Market Basket Analysis: Identify the Changing Trends of Market
Data Using Association Rule Mining, Procedia Computer Science, Volume 85, 2016, Pages 78-85, ISSN
1877-0509

[36] W. Zhang, H. Liao and N. Zhao, "Research on the FP Growth Algorithm about Association Rule
Mining," 2008 International Seminar on Business and Information Management, 2008, pp. 315-318,
doi: 10.1109/ISBIM.2008.177.
[37] Applications of FP-Growth and Apriori Algorithm for Mining Fuzzified Spatial Dataset Puneet
Matapurkar, Saurabh Shrivastava

[38] Pranali Foley and Mohd. Shajid Ansari, "Comparative Analysis of FP-Tree and Apriori Algorithm,"
M Tech Scholar and Assistant Professor, CSE Department, RSRRCET, Bhilai (CG), India.
