
Market Basket Analysis

Interim Progress Report (IPR)

University of Hertfordshire

Contents
Background Research
Summary of Progress to date
Consideration of Ethical, Legal, Professional, and Social Issues
  Ethical Issues
  Legal Issues
  Professional Issues
  Social Issues
Project Plan
Bibliography
Appendix
  Appendix 01
  Appendix 02

Background Research
The retail sector has grown steadily over time, and the industry today looks very different from
that of two decades ago, largely because technology has changed how retailers operate. This
change has created new opportunities for businesses and consumers. However, the expansion of
e-commerce puts tremendous pressure on retailers, particularly those competing with powerful
companies such as Amazon and Walmart Inc. Multiple techniques are being studied to enhance
retail sales. Many companies track consumer buying patterns to uncover associations between the
products that customers buy. They analyze which items are purchased together and which items
are purchased according to season. This process produces valuable information: if customers buy
a set of items, what is the probability that they will buy another set of items? For instance, if
customers purchase bundles of water bottles in the summer, what is the likelihood that they will
also buy bottles of juice? This information is used to increase sales and boost total profit by
redesigning the store layout and placing items that are purchased together near each other
(Hruschka, 2021).
So, the question arises: How can Market Basket Analysis be used to enhance sales in retail?
Suppose a retail store manager would like to know the purchasing patterns of consumers. More
specifically, the manager wants to determine which set of goods customers will buy on a
particular visit to the store. To find associations between items and to identify purchasing
patterns, one must perform a Market Basket Analysis (MBA) on a database of customer
transactions at the store. These transactions may cover a day, a week, a month, or a year.
Once the results are available, marketing or advertising tactics are devised, and the store's
layout is revised. For instance, if there is a high probability that customers buy bread with
butter, then placing the butter near the bread can increase the sales of both items. Conversely,
sales may be lower if these items are placed far apart. So, the main objective of Market Basket
Analysis is to identify items or groups of items that frequently occur together (or are related) in
purchasing transactions (Hossain et al., 2019).
Many algorithms have been proposed to identify associations between items. In simple terms,
finding associations between items is called association rule mining, and the association rule is
its central concept. An association rule has the form X => Y, where X is referred to as the
antecedent and Y as the consequent. According to such a rule, a customer who buys product X is
more likely to also buy product Y. A rule is measured by its support and confidence, and
association rules must meet the minimum support and confidence levels that the user has
specified. Apriori and FP Growth are two Market Basket Analysis algorithms used to find frequent
itemsets and identify associations between items. Many studies have been carried out to find
associations between items. For example, one study used the Apriori algorithm to find
associations between itemsets with minimum support = 1% and minimum confidence = 50%
(Arora and Arora, 2022).
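To make the two measures concrete, the following is a minimal sketch in plain Python. The toy transactions and the helper names (`count`, `support`, `confidence`) are assumptions made for illustration only; they are not taken from the project dataset or from any library. Support is the fraction of transactions that contain an itemset, and the confidence of X => Y is the number of transactions containing both X and Y divided by the number containing X.

```python
# Toy transactions invented for illustration; not the project dataset.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "juice"},
    {"bread", "butter", "juice"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


# Rule under test: bread => butter
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)

# Thresholds mirror the 1% / 50% values quoted above; a rule is kept only if it clears both.
MIN_SUPPORT, MIN_CONFIDENCE = 0.01, 0.50
keep_rule = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE

print(rule_support, rule_confidence, keep_rule)
```

In this toy data, bread and butter appear together in 3 of 5 transactions, and bread appears in 4, so the rule bread => butter has a support of 3/5 and a confidence of 3/4 and would be kept.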
On the other hand, another author used the FP Growth algorithm to address the disadvantages of
the Apriori algorithm. According to that author, FP Growth constructs an FP tree that stores the
transaction information in a substantially compressed form. So here the question arises: which
Market Basket Analysis algorithm is more efficient for mining frequent itemsets?
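The compression idea behind the FP tree can be sketched as follows. This is a simplified illustration that only builds the tree (it omits the header table and the mining step of FP Growth); the class `FPNode`, the functions `build_fp_tree` and `count_nodes`, and the toy transactions are all hypothetical names and data chosen for this example.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item and how many transactions pass through it."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count how many transactions contain each item; drop infrequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert each transaction with its most frequent items first,
    # so transactions that share a prefix share the same nodes.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
            node = node.children[item]
            node.count += 1
    return root


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Toy transactions invented for illustration.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "juice"],
    ["bread", "butter", "juice"],
]

root = build_fp_tree(transactions, min_count=2)
print(count_nodes(root))
```

With a minimum count of 2, the five toy transactions contain 11 occurrences of frequent items, yet the tree needs only 6 nodes, because the shared prefix bread, butter is stored once with a counter instead of being repeated.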

Summary of Progress to date


There are eight deliverables in my project:

1. Introduction to Project
I have done extensive research on market basket analysis. To complete this deliverable, I
completed the following task:
• Background Study
This first deliverable is 100% complete.

2. Research Question One
I have designed my first research question. For this deliverable, I completed two tasks:
• Problem Discussion
• Purpose of Study
The second deliverable is 100% complete.

3. Research Question Two
For this deliverable, I completed two tasks:
• Studies on MBA
• Literature Review
The third deliverable is 100% complete.

With these deliverables done, I have completed my Interim Progress Report.
I am now working on the practical part of my project, which is discussed below.

Identify dataset
Identifying the data is a crucial part of my project. I have identified the dataset of a retail
store on which I will run the Market Basket Analysis algorithms to find associations between
items. After identifying the dataset, the next step is cleaning it through data preprocessing so
that it is usable. I faced several data preprocessing problems, such as missing, repeated, and
redundant values in the dataset. After identifying the data, I imported the dataset.
See Appendix 01 for code.
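As an illustration of the preprocessing problems mentioned above (missing and repeated values), here is a minimal pandas sketch. The column names `invoice` and `item` and the sample rows are assumptions invented for this example; the real dataset may be laid out differently.

```python
import pandas as pd

# Hypothetical raw export with one row per (invoice, item); invented for illustration.
raw = pd.DataFrame({
    "invoice": [1, 1, 1, 2, 2, 3, None],
    "item": ["bread", "butter", "butter", "milk", None, " Bread ", "jam"],
})

clean = (
    raw.dropna(subset=["invoice", "item"])      # drop rows with a missing invoice or item
       .assign(
           # normalise spelling so " Bread " and "bread" count as the same item
           item=lambda d: d["item"].str.strip().str.lower(),
           invoice=lambda d: d["invoice"].astype(int),
       )
       .drop_duplicates()                       # drop repeated (invoice, item) rows
       .reset_index(drop=True)
)

# Group the cleaned rows into one basket (list of items) per invoice.
baskets = clean.groupby("invoice")["item"].apply(list).to_dict()
print(baskets)
```

Dropping duplicates after normalising the item names matters here: presence of an item in a basket is what association rules use, so a repeated row would add nothing but noise.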
Market Basket Analysis using Python
Some people use the R language to perform Market Basket Analysis, and others use Java.
After identifying the dataset, the next step was deciding which programming language would be
most suitable for MBA. I chose Python.
I have a clear plan for how I will perform MBA in Python:
• Import the dataset
• Remove all null values
• Encode the data
• Filter the transactions
• Apply the Apriori algorithm
• Find the associations
• Apply the FP Growth algorithm
• Find the associations
• Compare the results

Importing Libraries
Based on the data specification, I have identified the libraries I will use in the practical part:
• import pandas as pd
• from gapandas import query
• from mlxtend.frequent_patterns import apriori
• from mlxtend.frequent_patterns import fpgrowth
• from mlxtend.frequent_patterns import association_rules
See Appendix 02 for code.
I am working on the remaining deliverables:
4. Methodology
5. Perform Practical Part
6. Comparison of Both Algorithms' Results (Apriori & FP Growth)
7. Result-Oriented Discussion
8. Final Report
Consideration of Ethical, Legal, Professional, and Social Issues
Ethical Issues
Applying learning algorithms to massive datasets to produce patterns and models has ethical
consequences. For example, performing Market Basket Analysis on a dataset to identify
customers' purchases and the associations between items can raise ethical concerns. For
instance, if a company receives Sainsbury's data in order to apply market basket analysis and
then shares that data with other stores such as Tesco, Aldi, and Iceland, it breaches privacy,
which is an ethical issue. However, there are no ethical issues involved in my project.

Legal Issues
There are no legal issues involved in Market Basket Analysis, because the analysis reveals
customer buying patterns, not personal information.

Professional Issues
There are no professional issues in my project.

Social Issues
There are no social issues in my project.

Project Plan
I have completed the first three deliverables. The primary remaining task is to perform MBA
using Python and then apply the Apriori and FP Growth algorithms. Once these tasks are done, the
practical deliverable will be complete. After completing all the deliverables listed in the
Gantt chart, I will be able to evaluate my project.
I have designed a Gantt chart to show my project plan.

Bibliography
Hruschka, H., 2021. Comparing unsupervised probabilistic machine learning methods for market basket
analysis. Review of Managerial Science, 15(2), pp.497-527.

Hossain, M., Sattar, A.S. and Paul, M.K., 2019, December. Market basket analysis using apriori and FP
growth algorithm. In 2019 22nd International Conference on Computer and Information Technology
(ICCIT) (pp. 1-6). IEEE.

Singh, H., Shelke, N., Bavaskar, A., Nikam, S., Shewale, P. and Mahajan, D., 2021. Study on Market Basket
Analysis with Apriori Algorithm Approach.

Trnka, A., 2010, June. Market basket analysis with data mining methods. In 2010 International Conference
on Networking and Information Technology (pp. 446-450). IEEE.

Zamil, A.M.A., Al Adwan, A. and Vasista, T.G., 2020. Enhancing customer loyalty with market basket
analysis using innovative methods: a python implementation approach. International Journal of Innovation,
Creativity and Change, 14(2), pp.1351-1368.

Appendix
Appendix 01
>>> import pandas as pd
>>> df = pd.read_csv('Datasets/BL-Flickr-Images-Book.csv')
>>> df.head()

Identifier Edition Statement Place of Publication \


0 206 NaN London
1 216 NaN London; Virtue & Yorston
2 218 NaN London
3 472 NaN London
4 480 A new edition, revised, etc. London

Date of Publication Publisher \


0 1879 [1878] S. Tinsley & Co.
1 1868 Virtue & Co.
2 1869 Bradbury, Evans & Co.
3 1851 James Darling
4 1857 Wertheim & Macintosh

Title Author \
0 Walter Forbes. [A novel.] By A. A A. A.
1 All for Greed. [A novel. The dedication signed... A., A. A.
2 Love the Avenger. By the author of “All for Gr... A., A. A.
3 Welsh Sketches, chiefly ecclesiastical, to the... A., E. S.
4 [The World in which I live, and my place in it... A., E. S.

Contributors Corporate Author \


0 FORBES, Walter. NaN
1 BLAZE DE BURY, Marie Pauline Rose - Baroness NaN
2 BLAZE DE BURY, Marie Pauline Rose - Baroness NaN
3 Appleyard, Ernest Silvanus. NaN
4 BROOME, John Henry. NaN

Corporate Contributors Former owner Engraver Issuance type \


0 NaN NaN NaN monographic
1 NaN NaN NaN monographic
2 NaN NaN NaN monographic
3 NaN NaN NaN monographic
4 NaN NaN NaN monographic

Flickr URL \
0 http://www.flickr.com/photos/britishlibrary/ta...
1 http://www.flickr.com/photos/britishlibrary/ta...
2 http://www.flickr.com/photos/britishlibrary/ta...
3 http://www.flickr.com/photos/britishlibrary/ta...
4 http://www.flickr.com/photos/britishlibrary/ta...

Shelfmarks
0 British Library HMNTS 12641.b.30.
1 British Library HMNTS 12626.cc.2.
2 British Library HMNTS 12625.dd.1.
3 British Library HMNTS 10369.bbb.15.
4 British Library HMNTS 9007.d.28.
Appendix 02
import numpy as np
import pandas as pd

from apyori import apriori

# Importing the dataset
# reading the dataset (header=None: treat the first row as data, not column names)
df = pd.read_csv('drive/My Drive/Super/market basket/Market_Basket_Optimisation.csv', header=None)

df.head()
print(df.shape)  # (7501, 20)
df.describe()

# turning each customer's row into a list of strings (missing cells become the string 'nan')
rows, cols = df.shape
values = df.values
trans = [[str(values[i, j]) for j in range(cols)] for i in range(rows)]

# converting it into a NumPy array
trans = np.array(trans)

# checking the shape of the array
print(trans.shape)

# having a look at the first 10 customers' item lists
print(trans[:10])

import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud

plt.rcParams['figure.figsize'] = (10, 10)

# word cloud built from the first column, i.e. the first item listed in each basket
wordcloud = WordCloud(background_color='white', width=1200, height=1200,
                      max_words=20).generate(str(df[0]))
plt.imshow(wordcloud)
plt.axis('off')
plt.title('Most Popular Items bought first by the Customers', fontsize=40)
plt.show()
