Instructions
Please share your answers inline in the Word document. Submit Python and R code
files wherever applicable.
Hints:
1. Business Problem
1.1. Objective
1.2. Constraints (if any)
2. Work on each feature of the dataset to create a data dictionary as displayed in the below
image:
4. Model Building
4.1 Application of Apriori Algorithm.
4.2 Build most frequent item sets and plot the rules.
4.3 Work on both R and Python Codes.
5. Deployment
5.1 Deploy solutions using R shiny and Python Flask.
6. Result: Share the benefits/impact of the solution, that is, how or in what way the business
(client) benefits from the solution provided.
1. For each assignment, submit the solution in the above format.
2. Research and perform all possible steps for improving the rules, and
check whether sub-rules can be extracted from the main rules.
3. All code (executable programs) must run without errors.
4. Submit documentation of the module along with the R and Python code, elaborating
on every step mentioned here; that is, comments are required in the code.
Problem Statement:
Kitabi Duniya, a famous book store in India, was established before Independence. The company
grew incrementally year by year, but with online book selling and widespread Internet access its
annual growth started to collapse, with sharp downfalls. As a Data Scientist, help this heritage
book store regain its popularity, increase customer footfall, and find ways the business can
improve exponentially. Apply an association rule algorithm, explain the rules, and visualize the
graphs so that the solution is clearly understood.
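Explaining the rules usually comes down to three measures: support, confidence, and lift. The sketch below computes them by hand on a few made-up transactions; the item names and the data are assumptions for illustration and are not taken from Books.csv.

```python
from fractions import Fraction

# Toy transactions, invented for illustration only (not from Books.csv).
transactions = [
    {"ChildBks", "YouthBks"},
    {"ChildBks", "CookBks"},
    {"ChildBks", "YouthBks", "CookBks"},
    {"CookBks"},
    {"YouthBks"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

rule_support = support({"ChildBks", "YouthBks"}, transactions)
rule_confidence = confidence({"ChildBks"}, {"YouthBks"}, transactions)
rule_lift = lift({"ChildBks"}, {"YouthBks"}, transactions)
print(rule_support, rule_confidence, rule_lift)  # prints 2/5 2/3 10/9
```

A lift above 1 suggests the two items appear together more often than they would if bought independently, which is the kind of rule worth explaining to the business.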
1.) Books.csv
install.packages("arules")
data()
bks
# Overall quality
head(quality(arules))
# install.packages("arulesViz")
library("arulesViz") # for visualizing rules
windows()
plot(arules, method = "grouped")
plot(arules[1:10], method = "graph") # for good visualization try plotting only few rules
getwd()
All the book types were evenly distributed, except that some types were frequently bought together:
child books-youth books, child books-cookbooks, and child books-DoItYBks.
Pay attention to these co-purchased types: the books in the store should be arranged side by
side according to these pairings.
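One simple way to see which types are bought together is to count how often each pair appears in the same basket. The following is a minimal sketch under stated assumptions: the baskets are invented for illustration, and the category names are only placeholders modelled on the types named above.

```python
from collections import Counter
from itertools import combinations

# Invented baskets for illustration; not the real Books.csv data.
baskets = [
    ["ChildBks", "YouthBks"],
    ["ChildBks", "CookBks", "DoItYBks"],
    ["ChildBks", "YouthBks", "CookBks"],
    ["ArtBks"],
    ["ChildBks", "DoItYBks"],
]

pair_counts = Counter()
for basket in baskets:
    # sorted(set(...)) makes (A, B) and (B, A) count as the same pair
    # and ignores an item repeated within one basket.
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# The most frequent pairs are candidates for adjacent shelving.
top_pairs = pair_counts.most_common(3)
for pair, count in top_pairs:
    print(pair, count)
```

Raw pair counts favour popular categories, so in practice they would be read alongside lift before rearranging shelves.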
Python Code:
ma_X = ma_X.apply(sorted)
rules_sets = list(ma_X)
index_rules = []
for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))
grc
# Overall quality
head(quality(grules))
# install.packages("arulesViz")
library("arulesViz") # for visualizing rules
windows()
plot(grules, method = "grouped")
plot(grules[1:10], method = "graph") # for good visualization try plotting only few rules
getwd()
Python code:
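# Implementing Apriori algorithm from mlxtend
groceries = []
# the path must be a (raw) string literal
with open(r"C:\Users\Admin\AppData\Local\Temp\Temp2_Datasets_Association Rules.zip\groceries.csv") as f:
    groceries = f.read()
# split the raw text into one line per transaction
groceries = groceries.split("\n")
groceries_list = []
for i in groceries:
    groceries_list.append(i.split(","))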
© 2013 - 2020 360DigiTMG. All Rights Reserved.
from collections import Counter
# flatten the list of transactions into one list of items
all_groceries_list = [i for item in groceries_list for i in item]
item_frequencies = Counter(all_groceries_list)
# after sorting
item_frequencies = sorted(item_frequencies.items(), key = lambda x:x[1])
# barplot of top 10
import matplotlib.pyplot as plt
groceries_series.columns = ["transactions"]
# creating a dummy column for each item in each transaction, using the item names as column names
X = groceries_series['transactions'].str.join(sep = '*').str.get_dummies(sep = '*')
ma_X = ma_X.apply(sorted)
rules_sets = list(ma_X)
index_rules = []
for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))
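The mlxtend calls themselves are not shown in this document, so the following is a plain-Python sketch of the level-wise Apriori idea rather than the library's implementation. The function name `apriori_frequent_itemsets`, the toy transactions, and the 0.5 support threshold are all assumptions made for illustration.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate and keep those that meet the threshold.
        found = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                found[c] = s
        return found

    # Level 1: single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Toy transactions, invented for illustration (not from groceries.csv).
toy = [
    ["milk", "bread"],
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk"],
]
frequent = apriori_frequent_itemsets(toy, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what makes Apriori cheaper than brute force: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.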
mymv
# Overall quality
head(quality(mvrules))
# install.packages("arulesViz")
library("arulesViz") # for visualizing rules
windows()
plot(mvrules, method = "grouped")
plot(mvrules[1:10], method = "graph") # for good visualization try plotting only few rules
getwd()
ma_X = ma_X.apply(sorted)
rules_sets = list(ma_X)
index_rules = []
phn
# Overall quality
head(quality(phrules))
# install.packages("arulesViz")
library("arulesViz") # for visualizing rules
windows()
getwd()
Python Code:
# Implementing Apriori algorithm from mlxtend
ma_X = ma_X.apply(sorted)
rules_sets = list(ma_X)
index_rules = []
for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))
trn
# Overall quality
head(quality(trnrules))
# install.packages("arulesViz")
library("arulesViz") # for visualizing rules
windows()
plot(trnrules, method = "grouped")
plot(trnrules[1:10], method = "graph") # for good visualization try plotting only few rules
getwd()
Python code:
# or
import pandas as pd
groceries = f.read()
for i in bad_chars:
    groceries = groceries.replace(i, '')
groceries = groceries.split("\n")
groceries_list = []
for i in groceries:
    groceries_list.append(i.split(","))
item_frequencies = Counter(all_groceries_list)
# after sorting
# barplot of top 10
plt.xlabel("items")
plt.ylabel("Count")
plt.show()
groceries_series = pd.DataFrame(pd.Series(groceries_list))
# dropna returns a new DataFrame, so the result has to be assigned back
groceries_series = groceries_series.dropna(axis=0)
groceries_series.columns = ["transactions"]
# creating a dummy column for each item in each transaction, using the item names as column names
plt.xlabel('item-sets')
plt.ylabel('support')
plt.show()
rules.head(20)
def to_list(i):
    return (sorted(list(i)))
ma_X = ma_X.apply(sorted)
rules_sets = list(ma_X)
index_rules = []
for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))
rules_no_redudancy = rules.iloc[index_rules, :]
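The fragment above appears to keep one rule per unique item set, so that mirrored rules such as A -> B and B -> A are not both reported. The same idea can be sketched without pandas, assuming each rule is a plain dict with hypothetical `antecedents` and `consequents` keys; the rules and lift values below are made up for illustration.

```python
# Made-up rules for illustration; in the document these would come from the rules table.
rules = [
    {"antecedents": {"milk"}, "consequents": {"bread"}, "lift": 1.2},
    {"antecedents": {"bread"}, "consequents": {"milk"}, "lift": 1.2},
    {"antecedents": {"butter"}, "consequents": {"bread"}, "lift": 1.5},
]

def drop_mirrored_rules(rules):
    """Keep the first rule seen for each distinct set of items."""
    seen = set()
    kept = []
    for rule in rules:
        # Antecedent and consequent together identify the item set,
        # regardless of which side each item is on.
        key = frozenset(rule["antecedents"]) | frozenset(rule["consequents"])
        if key not in seen:
            seen.add(key)
            kept.append(rule)
    return kept

unique_rules = drop_mirrored_rules(rules)
for rule in unique_rules:
    print(sorted(rule["antecedents"]), "->", sorted(rule["consequents"]), rule["lift"])
```

Keeping the first occurrence mirrors the use of `rules_sets.index(i)` above, which also returns the first matching position.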