
Optimizing Retail Strategies: Comparative Analysis of Algorithms for Association Rule Mining
Bhavana N S
Dept. CSE, BMS Institute of Technology and Management, Bengaluru, Karnataka - 560064
1by20cs038@bmsit.in

Brundaja D N
Dept. CSE, BMS Institute of Technology and Management, Bengaluru, Karnataka - 560064
1by20cs041@bmsit.in

Abstract: Identifying frequent item sets is a pivotal challenge in data mining. With growing demand for more results under tight time constraints, algorithms capable of swift computation are essential. This study aims to identify an algorithm that generates frequent item sets with markedly less time and effort. Frequent item sets are combinations of items frequently purchased together, such as the classic example of bread and butter. We explore two association rule mining algorithms, FP-Growth and Eclat, used to ascertain the likelihood of secondary items being purchased when a primary item is acquired. FP-Growth operates through a two-phase methodology and a depth-first search; it minimizes database reads by storing the data in a tree format within primary memory. The Eclat algorithm, on the other hand, takes a vertical view of the data and employs depth-first search to discover frequent item sets. The algorithms are compared on time efficiency. The comparison establishes that FP-Growth exhibits superior speed for smaller datasets, generating frequent item sets in less time than Eclat.

Keywords: Association Rule, Frequent Item Sets, Eclat, FP-Growth

I. INTRODUCTION

Investing time and resources into strategically placing products in a store not only reduces customer shopping duration but also triggers reminders of related products, potentially amplifying cross-sales. This strategy hinges on association learning, which navigates relationships among items within extensive databases [1]. A quintessential example is Market Basket Analysis [2]. Upon entering a department store, individuals often have a list of intended purchases based on their needs and preferences. If items X and Y are frequently purchased together, positioning both X and Y on the same shelf can entice buyers to consider complementary purchases. Conversely, separating products X and Y prompts customers to traverse the store, potentially prompting recall of additional desired items.

Traditional rule learning algorithms focus on constructing regression, classification, and predictive rules [3]. This domain, part of machine learning, falls under supervised or predictive learning. Meanwhile, descriptive induction, including associative rule learning and subgroup discovery, has gained notable traction as an approach to non-classical induction.

Diverse algorithms are employed in association rule learning, among which FP-Growth and Eclat stand prominent. FP-Growth, a two-phase method, reads the database only twice and stores it in a tree structure within primary memory. Eclat, a depth-first alternative, uses set intersection to compute the support of candidate item sets, bypassing the generation of subsets that do not exist in the prefix tree.

II. LITERATURE SURVEY

1. Christian Borgelt [4], in the article "Efficient Implementations of FP-Growth and Eclat," details the implementations of FP-Growth and Eclat alongside various optimization techniques aimed at maximizing performance concerning execution time and memory usage. Borgelt
explains that FP-Growth relies on an FP tree representation to identify frequent item sets, while Eclat utilizes bit matrices to portray transaction lists, facilitating the filtering of closed and maximal item sets.

2. In the paper "Comparative Survey on Association Rule Mining Algorithms" by Manisha Girotra et al [5], the exploration of algorithms employed in associative rule learning is segmented into two stages. Girotra and colleagues thoroughly examine the merits and drawbacks of these algorithms. Interestingly, despite fundamental differences in their strategies, these algorithms demonstrate almost similar runtimes. Notably, the FP-Growth algorithm outperforms others for closed item sets, whereas Eclat takes the lead in free item sets.

3. Mohammod Akib Khan et al, in their paper titled "Market Basket Analysis for improving the effectiveness of marketing and sales using Apriori, FP Growth, and Eclat Algorithm" [6], expound on the design and execution of the three most prominent algorithms in market basket analysis.

4. The paper "Association Rule – Extracting Knowledge Using Market Basket Analysis" by Raorane A.A. et al [7] conducts an in-depth analysis of a substantial volume of data to elucidate consumer behavior. The aim is to make informed decisions, gaining a competitive edge over competitors.

5. In [8], an optimized and efficient algorithm known as Customer Purchase Behavior (CPB) was devised for discovering frequent item sets while minimizing scans, time, and memory usage, subsequently generating rules. The CPB algorithm comprises two primary components: generating frequent item sets and predicting customer behavior based on these sets. However, the CPB algorithm consumes more time in generating the associated rules.

III. CONCEPTUAL INFORMATION

Market Basket Analysis - Market basket analysis serves as a strategic data mining approach employed by retailers to boost sales through a comprehensive understanding of customer purchasing patterns. This technique involves scrutinizing substantial datasets, including historical purchase records, to unveil inherent product groupings and identify items frequently bought together.

Recognition of these co-occurrence patterns empowers retailers to make informed decisions, optimizing inventory management, devising effective marketing strategies, employing cross-selling tactics, and refining store layout for enhanced customer engagement.

Consider, for instance, the scenario where customers purchase milk. What is the likelihood that they will also purchase bread (and the specific type of bread) during the same supermarket visit? Leveraging this information can potentially increase sales by enabling retailers to engage in targeted marketing based on predictions, implement cross-selling initiatives, and plan shelf space for optimal product placement.

In a conceptual shift, envision the universe as the set of items available in the store, where each item possesses a Boolean variable indicating its presence or absence. Each basket is then represented by a Boolean vector of values assigned to these variables. The analysis of these Boolean vectors unveils purchase patterns reflecting frequently associated or co-bought items, represented in the form of association rules.

Association Rule Mining - Association rule mining plays a crucial role in recommendation systems, guiding decisions on the user's interests. It underpins recommendations on platforms like Netflix, where user interactions are monitored to personalize content. Association rule learning extends beyond recommendation systems, revealing relationships between variables in large databases, and is applicable in domains such as retail, web analytics, and bioinformatics.

Support, a key parameter, quantifies the frequency of an item's appearance in the dataset. The support count of item set 'X' over the set of transactions 'T' is the number of transactions containing 'X':

$$\sigma(X) = |\{\, t \in T : X \subseteq t \,\}|$$

If 'N' is the total number of transactions, the support of 'X' is the support count expressed as a fraction of all transactions:

$$\mathrm{support}(X) = \frac{\sigma(X)}{N}$$

The 'confidence' parameter gauges the reliability of a rule, indicating the certainty with which a rule applies to a specific item set. Confidence for 'X' implies 'Y' is the fraction of transactions containing 'X' that also contain 'Y':

$$\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}$$

Lift measures how much more often the targeted item 'Y' occurs together with the primary item 'X' than would be expected if the two were independent. It signifies the correlation between the targeted item and the primary item; a value above 1 indicates a positive association:

$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{confidence}(X \Rightarrow Y)}{\mathrm{support}(Y)}$$

FP-Growth Algorithm - FP-Growth, which stands for Frequent Pattern Growth, represents a novel approach to data representation through an FP tree, emphasizing frequent patterns. It is a method employed for mining frequent item sets and serves as an evolution beyond the Apriori Algorithm.
Notably, FP-Growth eliminates the necessity for candidate generation to derive frequent patterns, introducing the concept of a frequent pattern tree structure that retains associations between item sets.

A Frequent Pattern Tree is a tree structure constructed using the initial item sets from the data, with the primary objective of mining the most frequent patterns. Each node in the FP tree represents an item of an item set. The root node denotes the null value, while subsequent nodes signify the item sets in the data. The creation of the tree preserves the associations between these nodes, ensuring a clear representation of the relationships between different item sets.

ECLAT Algorithm - ECLAT, which stands for Equivalence Class Clustering and Bottom-Up Lattice Traversal, is an association rule mining method that addresses the challenge of generating frequent item sets efficiently. While various algorithms exist for discovering frequent item sets, Apriori, a fundamental method, can be time-consuming as it requires multiple scans of the database. To overcome these limitations, the ECLAT algorithm was developed.

In contrast to Apriori, ECLAT adopts a vertical database representation, enabling a single scan of the database. Initially, the horizontally formatted data is transformed into a vertical format during the first database scan. Mining operations are then performed on this vertically structured dataset by intersecting the TID-sets of each pair of frequent single items. The support count of an itemset is determined by the length of its TID-set. For instance, if the minimum support count is set to 2, any item set whose TID-set contains at least two transactions is frequent, and association rules can be generated from the frequent item sets.

A notable optimization in ECLAT is the use of "fast intersection." In this approach, when two TID-lists are intersected, the resulting TID-list is only considered if its cardinality meets the minimum support requirement. This means that each intersection is promptly eliminated if it fails to satisfy the minimum support criteria, contributing to the efficiency of the algorithm.

IV. IMPLEMENTATION

Steps of FP-Growth Algorithm

1. Data Preprocessing:

Transaction Encoding:
➔ Represent each transaction in the dataset as a set of items.
➔ Encode transactions into a binary format where each column corresponds to an item, and each row corresponds to a transaction.

2. Building the FP-Tree:

First Pass: Counting Item Frequencies:
➔ Scan the dataset to count the frequency of each item.
➔ Discard infrequent items below a specified minimum support threshold.

Second Pass: Building the FP-Tree:
➔ Construct the FP-Tree (Frequent Pattern Tree) from the transactions.
➔ Start with an empty tree.
➔ For each transaction, insert the items into the tree, updating node counts as needed.

3. Mining Frequent Item sets:

Mining the FP-Tree:
➔ Traverse the FP-Tree to generate frequent item sets.
➔ Start with the least frequent item and grow the item sets incrementally.
➔ For each item, find its conditional pattern base and construct a conditional FP-Tree.
➔ Recursively mine the conditional FP-Tree to find frequent item sets.

4. Generating Association Rules:

Calculating Confidence:
➔ For each frequent itemset, generate association rules.
➔ Calculate the confidence of each rule, indicating the likelihood of the consequent given the antecedent.

Pruning Rules:
➔ Prune rules that do not meet the minimum confidence threshold.
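
The support, confidence, and lift measures from Section III, which the rule-generation step uses for pruning, can be checked by hand on a toy dataset. The following is a minimal sketch only: the five transactions, the helper names `support` and `rules_from`, and the 0.7 confidence threshold are illustrative assumptions, not the data or code used in this study.

```python
from itertools import combinations

# Hypothetical toy baskets (an assumption for illustration, not the study's dataset).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "jam"},
]
N = len(transactions)


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / N


def rules_from(itemset, min_confidence):
    """Yield (antecedent, consequent, confidence, lift) for rules that pass the threshold.

    Assumes `itemset` is frequent, so every antecedent has non-zero support.
    """
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            confidence = support(itemset) / support(antecedent)
            lift = confidence / support(consequent)
            if confidence >= min_confidence:  # pruning step
                yield antecedent, consequent, confidence, lift


rules = list(rules_from({"bread", "butter"}, min_confidence=0.7))
for antecedent, consequent, confidence, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent), confidence, lift)
```

In this toy data both bread and butter appear in four of five baskets and together in three, so each direction of the rule has confidence of about 0.75 while the lift is below 1: the pair co-occurs slightly less often than independence would predict, which is why lift is worth reporting alongside confidence.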
Pseudocode for FP-Growth Algorithm
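
The two passes and the recursive mining over conditional pattern bases can be sketched in a few dozen lines of Python. This is a hedged illustration under stated assumptions, not the implementation evaluated in this paper: the names `FPNode`, `build_tree`, and `fp_growth`, the toy transactions, and the use of an absolute minimum support count (`min_count`) are all hypothetical choices made for the example.

```python
from collections import defaultdict


class FPNode:
    """One node of the FP-tree: an item, its count, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """First pass counts items; second pass inserts filtered, sorted transactions."""
    freq = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            freq[item] += weight
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = FPNode(None, None)  # root node holds the null item
    node_links = defaultdict(list)  # item -> every tree node holding that item
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        kept = sorted((i for i in set(items) if i in freq),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                node_links[item].append(child)
            child.count += weight
            node = child
    return freq, node_links


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    freq, node_links = build_tree(weighted_transactions, min_count)
    result = {}
    # Start with the least frequent item and grow the item sets from it.
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        itemset = suffix | {item}
        result[itemset] = freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in node_links[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recursively mine the conditional FP-tree built from that base.
        result.update(fp_growth(base, min_count, itemset))
    return result


# Hypothetical toy baskets (an assumption for illustration only).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "jam"},
]
frequent = fp_growth([(t, 1) for t in transactions], min_count=2)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each transaction carries a weight so that the same `build_tree` function serves both the original database (weight 1) and the conditional pattern bases, where a prefix path inherits the count of the node it leads to. No candidate item sets are generated at any point; item sets grow only along paths that exist in the tree.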
Steps of ECLAT algorithm

1. Input:

➔ Transactional database D with transactions T1, T2, ..., Tn.
➔ Minimum support threshold min_support.

2. Transaction Encoding:

➔ Represent each transaction as a set of items.

3. Building Equivalence Classes:

➔ Create a vertical database representation.
➔ Sort items by support count in descending order.
➔ Create equivalence classes for each item.

4. Recursive Depth-First Search:

➔ Recursively combine TID-lists of items in descending order of support.

5. Generate Frequent Itemsets:

➔ Explore the lattice structure in a bottom-up manner.

6. Generating Association Rules:

➔ Calculate confidence for each frequent itemset.
➔ Prune rules that do not meet the minimum confidence threshold.

Pseudocode for ECLAT algorithm

In this implementation, we leverage the publicly available Market Basket Optimization dataset from Kaggle, encompassing transaction records of a retail company over a week. With 7501 transaction records, each detailing items sold, the analysis aims to unveil association rules between items, contributing to various applications in data-driven decision-making.

Output: Trial 1
Trial 2

Trial 3

V. CONCLUSION

The Eclat and FP Growth algorithms underwent a thorough comparative analysis to assess their operational efficiencies, blending both runtime assessments and theoretical considerations. Despite clear disparities in their strategic approaches, a surprising similarity emerged in their runtime performances. However, the empirical evidence decisively favors the Eclat algorithm, showcasing its superior performance and time efficiency over FP Growth.

This conclusion is supported by an evaluation of the graphical results derived from an online dataset, which demonstrate Eclat's dominance. Notably, a discernible trend surfaces indicating a decrease in execution time with an increase in support, an insight crucial for optimizing these algorithms.

Beyond their current application, these algorithms boast versatility transcending diverse domains, capable of unearthing fascinating insights within various datasets. Moreover, the potential amalgamation of association rules generated by both algorithms holds promise in devising more effective algorithms tailored for real-world applications. The prospect of amalgamating these algorithms into a unified, more efficient computational framework is tantalizing. Such integration could pave the way for innovative algorithms capable of delving deeper into datasets, unlocking new dimensions of understanding, and driving impactful advancements across multifaceted domains.

REFERENCES

[1] Heaton J.T., College of Engineering and Computing, Nova Southeastern University, Ft. Lauderdale. “Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP Growth Frequent Itemset Mining Algorithm”, conference paper, IEEE, Jan 2016.
[2] Aggarwal C.C. and Yu P.S. Mining associations with the collective strength approach, IEEE Transactions on Knowledge and Data Engineering, 13(6), 863-873 (2001).
[3] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Simon Fraser University, ISBN 1-55860-489-8 (2001).
[4] Borgelt, C. (2003). Efficient Implementations of Apriori and Eclat. December 2003.
[5] Girotra, M. (2013). Comparative Survey on Association Rule Mining Algorithms. 84(10), 18–22.
[6] Arif, H., Akib, M., Solaiman, K. M., Pritom, T. H., & Science, C. (2017). Market basket Analysis for improving the effectiveness of marketing and sales using FP Growth and Eclat Algorithm. August, 1–61.
[7] V, R. A. A. K. R., & Jitkar, B. D. (2012). Association Rule – Extracting Knowledge Using Market Basket Analysis. 1(2), 19–27.
[8] M. Krishnamurthy, E. Rajalakshmi, R. Baskaran, A. Kannan, “Prediction of customer buying nature from frequent itemsets generation using Quine-McCluskey method”, IET Chennai Fourth International Conference on Sustainable Energy and Intelligent Systems (SEISCON 2013), Pages 12-14, 2013.
