Introduction:
• The Apriori algorithm uses frequent itemsets to generate association rules, and it works on databases that contain transactions.
• With the help of these association rules, it determines how strongly or weakly two objects are connected.
• The algorithm uses breadth-first search and a hash tree to compute the itemset supports efficiently.
• The process works iteratively to find the frequent itemsets in a large dataset.
• Lift: the ratio of the observed support to the support expected if the two sides of the rule were independent.
• A lift value close to 1 means the itemsets are essentially independent.
• Lift values greater than 1 are more useful and indicate a meaningful rule pattern.
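As a quick check of this definition, lift can be computed directly from the supports; the support fractions below are purely illustrative:

```python
def lift(support_ab, support_a, support_b):
    """Lift of a rule A -> B: observed joint support divided by the
    support expected if A and B were independent."""
    return support_ab / (support_a * support_b)

# Illustrative support fractions
print(lift(0.2, 0.4, 0.5))   # 1.0 -> A and B look independent
print(lift(0.3, 0.4, 0.5))   # ~1.5 -> useful positive association
```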
Steps For Apriori Algorithm
• Step 1: Compute the support of the itemsets in the transactional database, and choose the minimum support and minimum confidence.
• Step 2: Keep all the itemsets in the transactions whose support is higher than the chosen minimum support.
• Step 3: Find all the rules over these subsets whose confidence is higher than the threshold (the minimum confidence).
• Step 4: Sort the rules in decreasing order of lift.
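The four steps above can be sketched in plain Python; the toy transaction database and thresholds here are illustrative, not the dataset of the worked example that follows:

```python
from itertools import combinations

# Toy transaction database and thresholds (illustrative values)
transactions = [{'A', 'B'}, {'B', 'C'}, {'A', 'B', 'C'}, {'A', 'C'}]
min_support = 2          # absolute support count
min_confidence = 0.5     # 50%

def support(itemset):
    """Step 1: number of transactions containing every item of the itemset."""
    return sum(set(itemset) <= t for t in transactions)

# Steps 1-2: frequent single items, then iteratively grow larger itemsets
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support([i]) >= min_support]
all_frequent = list(frequent)
k = 2
while frequent:
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= min_support]
    all_frequent += frequent
    k += 1

# Step 3: keep only rules whose confidence clears the threshold
rules = []
for itemset in (s for s in all_frequent if len(s) > 1):
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                rules.append((set(antecedent), set(itemset - antecedent), conf))

# Step 4: sort the rules by decreasing confidence
rules.sort(key=lambda rule: -rule[2])
```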
Apriori Algorithm Working
• We have a dataset that contains various transactions.
• From this dataset, we need to find the frequent itemsets and create association rules using the Apriori algorithm.
• Given: Minimum Support = 2, Minimum Confidence = 50%.
Solution
• Step 1:
• The first step is to create a table that contains the support count, i.e., the frequency of each individual item in the given dataset.
• This table is called the candidate set, or C1.
• Next, keep all the itemsets that have a support count greater than or equal to the minimum support (2).
• This gives us the table for the frequent itemset L1.
• All the itemsets meet the minimum support except E, so the E itemset is removed.
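The C1 → L1 step amounts to one counting pass. The transaction table below is an assumption, reconstructed to be consistent with the counts described in this example (E appears only once, so it is pruned):

```python
from collections import Counter

# Assumed transaction table, consistent with the worked example
transactions = [
    {'A', 'B'}, {'B', 'D'}, {'B', 'C'}, {'A', 'B', 'D'}, {'A', 'C'},
    {'B', 'C'}, {'A', 'C'}, {'A', 'B', 'C', 'E'}, {'A', 'B', 'C'},
]
min_support = 2

# C1: support count of every individual item
c1 = Counter(item for t in transactions for item in t)

# L1: keep only the items whose count is at least the minimum support
l1 = {item: count for item, count in c1.items() if count >= min_support}
```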
• Step 2: Candidate Generation, C2 and L2:
• In this step, we generate C2 with the help of L1.
• In C2, create pairs of the itemsets of L1 in the form of subsets.
• After creating the subsets, find the support count again from the main transaction table, i.e., how many times each pair occurs together in the given dataset.
• This will give us the below table for C2.
• Again, we compare the C2 support counts with the minimum support count; after comparing, the itemsets with a lower support count are eliminated from table C2.
• This will give us the table for L2.
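The C2 → L2 step can be made concrete with `itertools.combinations`; the transaction table below is the same assumed reconstruction, consistent with this worked example:

```python
from itertools import combinations

# Assumed transactions, consistent with the worked example
transactions = [
    {'A', 'B'}, {'B', 'D'}, {'B', 'C'}, {'A', 'B', 'D'}, {'A', 'C'},
    {'B', 'C'}, {'A', 'C'}, {'A', 'B', 'C', 'E'}, {'A', 'B', 'C'},
]
min_support = 2
l1_items = ['A', 'B', 'C', 'D']   # the frequent single items from L1

# C2: every pair of L1 items with its support count from the transactions
c2 = {frozenset(pair): sum(set(pair) <= t for t in transactions)
      for pair in combinations(l1_items, 2)}

# L2: only the pairs that meet the minimum support survive
l2 = {pair: count for pair, count in c2.items() if count >= min_support}
```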
• Candidate Generation, C3 and L3:
• For C3, repeat the same two processes.
• Now, form the C3 table with subsets of three items together.
• Calculate the support count from the dataset.
• Next, create the L3 table.
• From the C3 table, we can see that only one combination of items has a support count equal to the minimum support count.
• So L3 has only one combination: {A, B, C}.
• The minimum threshold (confidence level) is 50%, so the first three rules are considered strong association rules for the given problem.
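The rule confidences for {A, B, C} can be checked directly; the support counts below are reconstructed to be consistent with this worked example, not read from a shown table. Only the first three rules reach the 50% threshold:

```python
# Support counts consistent with the worked example (reconstructed values)
support = {
    frozenset('ABC'): 2,
    frozenset('AB'): 4, frozenset('AC'): 4, frozenset('BC'): 4,
    frozenset('A'): 6, frozenset('B'): 7, frozenset('C'): 6,
}

def confidence(antecedent, consequent):
    # confidence(X -> Y) = support(X union Y) / support(X)
    x = frozenset(antecedent)
    return support[x | frozenset(consequent)] / support[x]

rules = {
    'A,B -> C': confidence('AB', 'C'),   # 2/4 = 0.50
    'B,C -> A': confidence('BC', 'A'),   # 2/4 = 0.50
    'A,C -> B': confidence('AC', 'B'),   # 2/4 = 0.50
    'A -> B,C': confidence('A', 'BC'),   # 2/6 ~ 0.33
    'B -> A,C': confidence('B', 'AC'),   # 2/7 ~ 0.29
    'C -> A,B': confidence('C', 'AB'),   # 2/6 ~ 0.33
}

# Only the first three rules reach the 50% confidence threshold
strong = {rule: c for rule, c in rules.items() if c >= 0.5}
```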
Python Implementation:
• Import Step:
• The first step is to import the dataset required for our model.
• The rows of the dataset show the transactions made by the customers.
• The first row is the transaction made by the first customer.
• There is no specific name for each column; each holds its own value or product details.
• There is no header specified.
import pandas as pd

df = pd.read_excel('http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx')
df.head()
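The `basket` variable used below is never built in these notes; presumably it follows the same groupby/unstack pattern as the per-country `basket2` shown later. A minimal sketch of that step, on a tiny invented frame (the rows, invoice numbers, and the France filter are assumptions so the snippet runs without the download):

```python
import pandas as pd

# Small synthetic stand-in for the Online Retail frame (column names
# assumed from the real dataset: InvoiceNo, Description, Quantity, Country)
df = pd.DataFrame({
    'InvoiceNo':   ['536365', '536365', '536366', '536366'],
    'Description': ['ALARM CLOCK BAKELIKE GREEN', 'RED RETROSPOT CUP',
                    'ALARM CLOCK BAKELIKE GREEN', 'ALARM CLOCK BAKELIKE RED'],
    'Quantity':    [6, 2, 3, 4],
    'Country':     ['France', 'France', 'France', 'France'],
})

# One row per invoice, one column per product, summed quantities as values
basket = (df[df['Country'] == "France"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))
```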
def encode_units(x):
    # Convert quantities to 0/1 flags: any positive quantity becomes 1
    if x >= 1:
        return 1
    return 0

basket_sets = basket.applymap(encode_units)
basket_sets.drop('POSTAGE', inplace=True, axis=1)
• When we look at the rules, we can find that red and green alarm clocks are purchased together.
• Red paper cups, napkins and plates are purchased together.
• The popularity of one product can be used to drive the sales of the other product.
• For example, we sell 340 green alarm clocks but only 316 red alarm clocks.
• So, we can drive more red alarm clock sales through recommendations.

basket['ALARM CLOCK BAKELIKE GREEN'].sum()
340.0
basket['ALARM CLOCK BAKELIKE RED'].sum()
316.0
• The combinations vary by country of purchase.
basket2 = (df[df['Country'] == "Germany"]
           .groupby(['InvoiceNo', 'Description'])['Quantity']
           .sum().unstack().reset_index().fillna(0)
           .set_index('InvoiceNo'))
from mlxtend.frequent_patterns import apriori, association_rules

basket_sets2 = basket2.applymap(encode_units)
basket_sets2.drop('POSTAGE', inplace=True, axis=1)
frequent_itemsets2 = apriori(basket_sets2, min_support=0.05, use_colnames=True)
rules2 = association_rules(frequent_itemsets2, metric="lift", min_threshold=1)