
Topic: Association Rules

Instructions
Please share your answers filled inline in the Word document. Submit Python and R code
files wherever applicable.

Please ensure you update all the details:


Name: GURRAM DATHU SWAMY
Batch Id: DS_08032021

Topic: Association Rules

Hints:

1. Business Problem
1.1. Objective
1.2. Constraints (if any)
2. Work on each feature of the dataset to create a data dictionary as displayed in the image
below:

Using R and Python code, perform:


3. Data Pre-processing
3.1 Data Cleaning, Feature Engineering, etc.

4. Model Building
4.1 Application of the Apriori Algorithm.
4.2 Build the most frequent item sets and plot the rules.
4.3 Work on both R and Python codes.

5. Deployment
5.1 Deploy solutions using R Shiny and Python Flask (a minimal Flask sketch follows this list).

6. Results: Share the benefits/impact of the solution, i.e., how or in what way the business (client)
benefits from the solution provided.
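For hint 5.1, a minimal Python Flask sketch that serves the mined rules as JSON could look like the following; the endpoint name, port, and rules file ("bk_rules.csv", exported by the R code later in this document) are illustrative assumptions, not part of the assignment:

# Minimal Flask deployment sketch (file name and endpoint are assumptions)
from flask import Flask, jsonify
import pandas as pd

app = Flask(__name__)

# rules exported earlier, e.g. by write(arules, file = "bk_rules.csv") in R
rules = pd.read_csv("bk_rules.csv")

@app.route("/rules")
def top_rules():
    # return the 10 strongest rules by lift as JSON
    top = rules.sort_values("lift", ascending=False).head(10)
    return jsonify(rules=top.to_dict(orient="records"))

if __name__ == "__main__":
    app.run(debug=True)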



Note:

1. For each assignment, the solution should be submitted in the above format.
2. Research and perform all possible steps for improving the rules, and also
check whether you can take out sub rules from the main rules (see the sketch after these notes).
3. All the code (executable programs) must run without errors.
4. Documentation of the module should be submitted along with the R & Python code, elaborating
on every step mentioned here; that is, commenting is necessary in the code.

5. Please send all files at once when submitting assignments.
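As a starting point for note 2, "sub rules" can be pulled out of a main rule by keeping the rules whose items form a strict subset of the main rule's items. A minimal Python sketch, assuming `rules` is the DataFrame produced by mlxtend's association_rules() as in the code later in this document:

# Hedged sketch: extract sub rules of the strongest rule by lift.
# Assumes `rules` has frozenset-valued 'antecedents'/'consequents' columns,
# as returned by mlxtend.frequent_patterns.association_rules().
main_rule = rules.sort_values('lift', ascending=False).iloc[0]
main_items = main_rule['antecedents'] | main_rule['consequents']

# a sub rule uses a strict subset of the main rule's items
is_sub = rules.apply(lambda r: (r['antecedents'] | r['consequents']) < main_items, axis=1)
sub_rules = rules[is_sub]
print(sub_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])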

Problem Statement: -
Kitabi Duniya, a famous bookstore in India established before Independence, grew incrementally
year after year, but due to online book selling and widespread Internet access its annual growth
collapsed into sharp downfalls. As a Data Scientist, help this heritage bookstore regain its popularity,
increase customer footfall, and suggest ways the business can improve exponentially: apply the
Association Rules algorithm, explain the rules, and visualize the graphs for a clear understanding
of the solution.
1.) Books.csv



R code:

install.packages("arules")


library("arules") # Used for building association rules i.e. apriori algorithm


bks <- read.csv(file.choose())

bks

# Building rules using the apriori algorithm
# Keep changing support and confidence values to obtain different rules


arules <- apriori(bks, parameter = list(support = 0.003, confidence = 0.65, minlen = 2))
arules

# Viewing rules based on lift value


inspect(head(sort(arules, by = "lift"))) # to view we use inspect

# Overall quality
head(quality(arules))

# install.packages("arulesViz")
library("arulesViz") # for visualizing rules

# Different Ways of Visualizing Rules


plot(arules)

windows()  # open a new plotting device (Windows only)
plot(arules, method = "grouped")
plot(arules[1:10], method = "graph") # for good visualization try plotting only few rules

write(arules, file = "bk_rules.csv", sep = ",")

getwd()

Sample output (inspect of the top rules by lift):

lhs rhs support confidence coverage lift count
[1] {ChildBks=[0,1]} => {YouthBks=[0,1]} 1 1 1 1 2000
[2] {YouthBks=[0,1]} => {ChildBks=[0,1]} 1 1 1 1 2000
[3] {ChildBks=[0,1]} => {CookBks=[0,1]} 1 1 1 1 2000
[4] {CookBks=[0,1]} => {ChildBks=[0,1]} 1 1 1 1 2000
[5] {ChildBks=[0,1]} => {DoItYBks=[0,1]} 1 1 1 1 2000
[6] {DoItYBks=[0,1]} => {ChildBks=[0,1]} 1 1 1 1 2000

All the book categories were evenly distributed, except that some were bought together, such as
children's books with youth books, children's books with cookbooks, and children's books with DIY
books (DoItYBks).

Pay attention to these co-purchased categories: the books in the store should be shelved side by
side according to these pairings.
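For reference, the quality measures in the output above relate as follows: support(A => B) = P(A and B), confidence(A => B) = support(A => B) / support(A), and lift(A => B) = confidence(A => B) / support(B). A small worked check in Python with invented counts (illustrative only, not taken from Books.csv):

# Worked check of the rule quality measures for a hypothetical rule A => B
N, n_A, n_B, n_AB = 2000, 800, 500, 300   # invented counts for illustration

support = n_AB / N                 # 0.15
confidence = n_AB / n_A            # 0.375
lift = confidence / (n_B / N)      # 1.5 -> A and B co-occur more often than chance
print(support, confidence, lift)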

Python Code:

# Implementing Apriori algorithm from mlxtend

# conda install mlxtend


# or
# pip install mlxtend
import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules
from io import StringIO

# adjust the path to wherever book.csv has been extracted
with open(r"C:\Users\Admin\AppData\Local\Temp\Temp2_Datasets_Association Rules.zip\book.csv") as f:
    groceries = f.read()

books = StringIO(groceries)
# book.csv is already one-hot encoded, so it is read directly into a DataFrame
book = pd.read_csv(books)

frequent_itemsets = apriori(book, min_support = 0.0075, max_len = 4, use_colnames = True)

# Most Frequent item sets based on support


frequent_itemsets.sort_values('support', ascending = False, inplace = True)

plt.bar(x = list(range(0, 11)), height = frequent_itemsets.support[0:11], color ='rgmyk')


plt.xticks(list(range(0, 11)), frequent_itemsets.itemsets[0:11], rotation=20)
plt.xlabel('item-sets')
plt.ylabel('support')
plt.show()

rules = association_rules(frequent_itemsets, metric = "lift", min_threshold = 1)


rules.head(20)
rules.sort_values('lift', ascending = False).head(10)
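In the spirit of the earlier hint to keep changing support and confidence values, a small sweep shows how sensitive the rule count is to the thresholds. A sketch assuming the one-hot `book` DataFrame built above (the grid values themselves are arbitrary):

# Hedged sketch: count surviving rules per (support, confidence) pair
for sup in (0.005, 0.0075, 0.01):
    fi = apriori(book, min_support = sup, use_colnames = True)
    for conf in (0.5, 0.65, 0.8):
        n_rules = len(association_rules(fi, metric = "confidence", min_threshold = conf))
        print(f"support={sup}, confidence={conf}: {n_rules} rules")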

################################# Extra part ###################################


def to_list(i):
    return sorted(list(i))

ma_X = rules.antecedents.apply(to_list) + rules.consequents.apply(to_list)

ma_X = ma_X.apply(sorted)

rules_sets = list(ma_X)

unique_rules_sets = [list(m) for m in set(tuple(i) for i in rules_sets)]

index_rules = []

for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))

# getting rules without any redundancy
rules_no_redundancy = rules.iloc[index_rules, :]

# Sorting them with respect to lift and getting the top 10 rules
rules_no_redundancy.sort_values('lift', ascending = False).head(10)



Problem Statement:
The Departmental Store has gathered data on the products it sells on a daily basis.
Using Association Rules concepts, provide insights on the rules and the plots.
2.) Groceries.csv



install.packages("arules")

library("arules") # Used for building association rules i.e. apriori algorithm


grc <- read.csv(file.choose())  # for true basket data, arules::read.transactions() is generally preferred

grc

# Building rules using the apriori algorithm
# Keep changing support and confidence values to obtain different rules


grules <- apriori(grc, parameter = list(support = 0.003, confidence = 0.85, minlen = 2))
grules

# Viewing rules based on lift value


inspect(head(sort(grules, by = "lift"))) # to view we use inspect

# Overall quality
head(quality(grules))

# install.packages("arulesViz")
library("arulesViz") # for visualizing rules

# Different Ways of Visualizing Rules


plot(grules)

windows()
plot(grules, method = "grouped")
plot(grules[1:10], method = "graph") # for good visualization try plotting only few rules

write(grules, file = "groceries_rules.csv", sep = ",")

getwd()

Semi-finished bread => sausage has the highest lift. Semi-finished bread, pot plants, margarine,
citrus fruit, and candy were bought together (higher lift).

Citrus fruit and specialty bar have the most support.

Python code:
# Implementing Apriori algorithm from mlxtend

# conda install mlxtend


# or
# pip install mlxtend
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# adjust the path to wherever groceries.csv has been extracted
with open(r"C:\Users\Admin\AppData\Local\Temp\Temp2_Datasets_Association Rules.zip\groceries.csv") as f:
    groceries = f.read()

# splitting the data into separate transactions using separator as "\n"


groceries = groceries.split("\n")

groceries_list = []
for i in groceries:
    groceries_list.append(i.split(","))
all_groceries_list = [i for item in groceries_list for i in item]

from collections import Counter # ,OrderedDict

item_frequencies = Counter(all_groceries_list)

# after sorting
item_frequencies = sorted(item_frequencies.items(), key = lambda x:x[1])
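# Note (equivalent sketch): Counter can produce this ranking directly, without
# the manual sort above, e.g.:
# top10 = Counter(all_groceries_list).most_common(10)   # [(item, count), ...]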

# Storing frequencies and items in separate variables


frequencies = list(reversed([i[1] for i in item_frequencies]))
items = list(reversed([i[0] for i in item_frequencies]))

# barplot of top 10
import matplotlib.pyplot as plt

plt.bar(height = frequencies[0:11], x = list(range(0, 11)), color = 'rgbkymc')


plt.xticks(list(range(0, 11)), items[0:11])
plt.xlabel("items")
plt.ylabel("Count")
plt.show()

# Creating Data Frame for the transactions data


groceries_series = pd.DataFrame(pd.Series(groceries_list))
groceries_series = groceries_series.iloc[:9835, :] # removing the last empty transaction

groceries_series.columns = ["transactions"]

# creating dummy columns for each item in each transaction, using item names as column names
X = groceries_series['transactions'].str.join(sep = '*').str.get_dummies(sep = '*')
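# Alternative sketch: mlxtend's TransactionEncoder builds the same one-hot
# matrix straight from the list of transactions (result equivalent to X above):
# from mlxtend.preprocessing import TransactionEncoder
# te = TransactionEncoder()
# X_alt = pd.DataFrame(te.fit(groceries_list).transform(groceries_list),
#                      columns = te.columns_)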

frequent_itemsets = apriori(X, min_support = 0.0075, max_len = 4, use_colnames = True)

# Most Frequent item sets based on support


frequent_itemsets.sort_values('support', ascending = False, inplace = True)

plt.bar(x = list(range(0, 11)), height = frequent_itemsets.support[0:11], color ='rgmyk')


plt.xticks(list(range(0, 11)), frequent_itemsets.itemsets[0:11], rotation=20)
plt.xlabel('item-sets')
plt.ylabel('support')
plt.show()

rules = association_rules(frequent_itemsets, metric = "lift", min_threshold = 1)


rules.head(20)
rules.sort_values('lift', ascending = False).head(10)
################################# Extra part ###################################
def to_list(i):
    return sorted(list(i))

ma_X = rules.antecedents.apply(to_list) + rules.consequents.apply(to_list)

ma_X = ma_X.apply(sorted)

rules_sets = list(ma_X)

unique_rules_sets = [list(m) for m in set(tuple(i) for i in rules_sets)]

index_rules = []

for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))

# getting rules without any redundancy
rules_no_redundancy = rules.iloc[index_rules, :]

# Sorting them with respect to lift and getting the top 10 rules
rules_no_redundancy.sort_values('lift', ascending = False).head(10)



Problem Statement:
A film distribution company wants to target its audience based on their likes and dislikes.
As Chief Data Scientist, analyze the data and come up with different rules for the movie list
so that the business objective is achieved.
3.) my_movies.csv



install.packages("arules")

library("arules") # Used for building association rules i.e. apriori algorithm


mymv <- read.csv(file.choose())

mymv

# Building rules using the apriori algorithm
# Keep changing support and confidence values to obtain different rules


mvrules <- apriori(mymv, parameter = list(support = 0.1, confidence = 1, minlen = 2))
mvrules

# Viewing rules based on lift value


inspect(head(sort(mvrules, by = "lift"))) # to view we use inspect

# Overall quality
head(quality(mvrules))

# install.packages("arulesViz")
library("arulesViz") # for visualizing rules

# Different Ways of Visualizing Rules


plot(mvrules)

windows()
plot(mvrules, method = "grouped")
plot(mvrules[1:10], method = "graph") # for good visualization try plotting only few rules

write(mvrules, file = "my_movies_rules.csv", sep = ",")

getwd()



Python code:
# Implementing Apriori algorithm from mlxtend

# conda install mlxtend


# or
# pip install mlxtend
import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules
from io import StringIO

# adjust the path to wherever my_movies.csv has been extracted
with open(r"C:\Users\Admin\AppData\Local\Temp\Temp2_Datasets_Association Rules.zip\my_movies.csv") as f:
    groceries = f.read()

Movies = StringIO(groceries)
# my_movies.csv is read directly as a DataFrame; the one-hot movie columns are selected below
movie = pd.read_csv(Movies)
movie_like = movie.iloc[:, 5:15]
frequent_itemsets = apriori(movie_like, min_support = 0.0075, max_len = 4, use_colnames = True)

# Most Frequent item sets based on support


frequent_itemsets.sort_values('support', ascending = False, inplace = True)

plt.bar(x = list(range(0, 11)), height = frequent_itemsets.support[0:11], color ='rgmyk')


plt.xticks(list(range(0, 11)), frequent_itemsets.itemsets[0:11], rotation=20)
plt.xlabel('item-sets')
plt.ylabel('support')
plt.show()

rules = association_rules(frequent_itemsets, metric = "lift", min_threshold = 1)


rules.head(20)
rules.sort_values('lift', ascending = False).head(10)

################################# Extra part ###################################


def to_list(i):
    return sorted(list(i))

ma_X = rules.antecedents.apply(to_list) + rules.consequents.apply(to_list)

ma_X = ma_X.apply(sorted)

rules_sets = list(ma_X)

unique_rules_sets = [list(m) for m in set(tuple(i) for i in rules_sets)]

index_rules = []

for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))

# getting rules without any redundancy
rules_no_redundancy = rules.iloc[index_rules, :]

# Sorting them with respect to lift and getting the top 10 rules
rules_no_redundancy.sort_values('lift', ascending = False).head(10)



Problem Statement: -
A mobile phone manufacturing company wants to launch its three brand-new phones into the market,
but before going with its traditional marketing approach, this time it wants to analyze the sales data
of its previous models in different regions. You have been hired as a Data Scientist to help: use the
Association Rules concept and provide insights to the company's marketing team to improve its sales.
4.) myphonedata.csv



install.packages("arules")

library("arules") # Used for building association rules i.e. apriori algorithm


phn <- read.csv(file.choose())

phn

# Building rules using the apriori algorithm
# Keep changing support and confidence values to obtain different rules


phrules <- apriori(phn, parameter = list(support = 0.05, confidence = 0.55, minlen = 2))
phrules

# Viewing rules based on lift value


inspect(head(sort(phrules, by = "lift"))) # to view we use inspect

# Overall quality
head(quality(phrules))

# install.packages("arulesViz")
library("arulesViz") # for visualizing rules

# Different Ways of Visualizing Rules


plot(phrules)

windows()

plot(phrules, method = "grouped")
plot(phrules[1:10], method = "graph") # for good visualization try plotting only few rules

write(phrules, file = "myphonedata_rules.csv", sep = ",")  # renamed to avoid confusion with the input file

getwd()

Python Code:
# Implementing Apriori algorithm from mlxtend

# conda install mlxtend


# or
# pip install mlxtend
import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules
from io import StringIO

# adjust the path to wherever myphonedata.csv has been extracted
with open(r"C:\Users\Admin\AppData\Local\Temp\Temp2_Datasets_Association Rules.zip\myphonedata.csv") as f:
    groceries = f.read()

phones = StringIO(groceries)
# myphonedata.csv is read directly as a DataFrame; the one-hot item columns are selected below
phone = pd.read_csv(phones)
Phone_like = phone.iloc[:, 3:9]
frequent_itemsets = apriori(Phone_like, min_support = 0.005, max_len = 3, use_colnames = True)

# Most Frequent item sets based on support


frequent_itemsets.sort_values('support', ascending = False, inplace = True)

plt.bar(x = list(range(0, 11)), height = frequent_itemsets.support[0:11], color ='rgmyk')


plt.xticks(list(range(0, 11)), frequent_itemsets.itemsets[0:11], rotation=20)
plt.xlabel('item-sets')
plt.ylabel('support')
plt.show()

rules = association_rules(frequent_itemsets, metric = "lift", min_threshold = 1)


rules.head(20)
rules.sort_values('lift', ascending = False).head(10)

################################# Extra part ###################################


def to_list(i):
    return sorted(list(i))

ma_X = rules.antecedents.apply(to_list) + rules.consequents.apply(to_list)

ma_X = ma_X.apply(sorted)

rules_sets = list(ma_X)

unique_rules_sets = [list(m) for m in set(tuple(i) for i in rules_sets)]

index_rules = []

for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))

# getting rules without any redundancy
rules_no_redundancy = rules.iloc[index_rules, :]

# Sorting them with respect to lift and getting the top 10 rules
rules_no_redundancy.sort_values('lift', ascending = False).head(10)



Problem Statement: -
A retail store in India has its transaction data and would like to know the buying patterns of
consumers in its locality. You have been assigned the task of providing the manager with rules on
how products should be placed on shelves so as to improve the buying patterns of consumers and
increase customer footfall.
5.) transaction_retail.csv



install.packages("arules")

library("arules") # Used for building association rules i.e. apriori algorithm


trn <- read.csv(file.choose())

trn

# Building rules using the apriori algorithm
# Keep changing support and confidence values to obtain different rules


trnrules <- apriori(trn, parameter = list(support = 0.002, confidence = 0.85, minlen = 2))
trnrules

# Viewing rules based on lift value


inspect(head(sort(trnrules, by = "lift"))) # to view we use inspect

# Overall quality
head(quality(trnrules))

# install.packages("arulesViz")
library("arulesViz") # for visualizing rules

# Different Ways of Visualizing Rules


plot(trnrules)

windows()
plot(trnrules, method = "grouped")
plot(trnrules[1:10], method = "graph") # for good visualization try plotting only few rules

write(trnrules, file = "transaction.csv", sep = ",")

getwd()

Poppy’s and playhouse have more support and also more lift than the other products.

Python code:

# Implementing Apriori algorithm from mlxtend

# conda install mlxtend

# or

# pip install mlxtend


import pandas as pd

from mlxtend.frequent_patterns import apriori, association_rules



# adjust the path to wherever transactions_retail1.csv has been extracted
with open(r"C:\Users\Admin\AppData\Local\Temp\Temp2_Datasets_Association Rules.zip\transactions_retail1.csv") as f:
    groceries = f.read()

# stripping noise characters, stray separators, and NA tokens from the raw text
bad_chars = [';', ':', '!', '*', '"', '.', '&', '(', ')', '+', '-', '/', ' ', '...', '#', '?', ',,', ',,,', ',,,,', ',,,,,', 'NA']

for i in bad_chars:
    groceries = groceries.replace(i, '')
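# Alternative sketch (assumption: the same cleanup in one regex pass; verify
# the result matches the loop above before relying on it):
# import re
# groceries = re.sub(r'[;:!*".&()+\-/ #?]|NA|,{2,}', '', groceries)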

# splitting the data into separate transactions using separator as "\n"

groceries = groceries.split("\n")

groceries_list = []

for i in groceries:
    groceries_list.append(i.split(","))

all_groceries_list = [i for item in groceries_list for i in item if str(i) != 'NA']

from collections import Counter # ,OrderedDict

item_frequencies = Counter(all_groceries_list)

# after sorting

item_frequencies = sorted(item_frequencies.items(), key = lambda x:x[1])


# Storing frequencies and items in separate variables

frequencies = list(reversed([i[1] for i in item_frequencies]))

items = list(reversed([i[0] for i in item_frequencies]))

# barplot of top 10

import matplotlib.pyplot as plt

plt.bar(height = frequencies[0:11], x = list(range(0, 11)), color = 'rgbkymc')

plt.xticks(list(range(0, 11)), items[0:11])

plt.xlabel("items")

plt.ylabel("Count")

plt.show()

# Creating Data Frame for the transactions data

groceries_series = pd.DataFrame(pd.Series(groceries_list))

groceries_series = groceries_series.dropna(axis = 0)  # assign the result, dropna is not in-place

groceries_series = groceries_series.iloc[:13698, :] # removing the last empty transaction

groceries_series.columns = ["transactions"]

# creating dummy columns for each item in each transaction, using item names as column names

X = groceries_series['transactions'].str.join(sep = '*').str.get_dummies(sep = '*')

frequent_itemsets = apriori(X, min_support = 0.0075, max_len = 4, use_colnames = True)

# Most Frequent item sets based on support


frequent_itemsets.sort_values('support', ascending = False, inplace = True)

plt.bar(x = list(range(0, 11)), height = frequent_itemsets.support[0:11], color ='rgmyk')

plt.xticks(list(range(0, 11)), frequent_itemsets.itemsets[0:11], rotation=20)

plt.xlabel('item-sets')

plt.ylabel('support')

plt.show()

rules = association_rules(frequent_itemsets, metric = "lift", min_threshold = 1)

rules.head(20)

rules.sort_values('lift', ascending = False).head(10)

################################# Extra part ###################################

def to_list(i):
    return sorted(list(i))

ma_X = rules.antecedents.apply(to_list) + rules.consequents.apply(to_list)

ma_X = ma_X.apply(sorted)

rules_sets = list(ma_X)

unique_rules_sets = [list(m) for m in set(tuple(i) for i in rules_sets)]

index_rules = []

for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))

# getting rules without any redundancy
rules_no_redundancy = rules.iloc[index_rules, :]

# Sorting them with respect to lift and getting the top 10 rules

rules_no_redundancy.sort_values('lift', ascending = False).head(10)
