Machine Learning & Deep Learning

Market Basket Analysis

Submitted By -

Group 4

Rohit Singh

Akshay Anand

Sushobhan Dutta

Saurabh Shimpi

Abhishek Giri

Kaustubh Tidke

Himanshu Patil

Ajey Verma

● What is Market Basket Analysis?

Market basket analysis is a method used to increase sales by understanding consumers'
buying patterns. It analyzes large sets of data, such as consumer purchasing histories, to find
product groups and products that are likely to be purchased together. It determines the
strength of the relationship between products purchased together and reveals patterns of
co-occurrence.

Types Of Market Basket Analysis

Predictive market basket analysis: This type considers items purchased in sequence to
determine cross-sell opportunities.

Differential market basket analysis: This type takes into account data from different stores, as
well as purchases from different groups of customers at different times of the day, month or year.
If the rule is true in one dimension (for example, in one store, time period, or customer group),
but not in others, analysts can identify the factors responsible for the exception. These insights
can lead to new product offers that drive sales.

Benefits of market basket analysis

● It can increase customer satisfaction and sales for retailers.

● By using the data to determine which products are often purchased together, retailers can
optimize product placement, run special offers, and create new product bundles to encourage
upsells of these combinations.

Example

E-commerce websites such as Flipkart and Amazon use market basket analysis on their product
pages. They recommend related products and products that are frequently bought together with
the one being viewed.

● Problem Statement

The goal of our project is to find the products that can be sold together to the target customers
of a UK-based and registered non-store online retailer. The data set is transactional data that
contains all the transactions occurring between 01/12/2010 and 09/12/2011. The objective of this
project is to increase the sales and profit of the online retail store, make useful
recommendations to customers, and increase customer satisfaction.

● Association Rules

Association rule mining finds interesting associations and relationships among large sets of data
items. An association rule shows how frequently an itemset occurs across the transactions. A
typical example is Market Basket Analysis.

An association rule has 2 parts:

1. an antecedent (if) and

2. a consequent (then)

Example: “If a customer buys bread, they are 70% likely to buy milk.”

Association rules are created by analyzing the data for frequent if/then patterns. The important
relationships are then identified using the following two parameters:

1. Support: Support indicates how frequently the if/then relationship appears in the database.

2. Confidence: Confidence indicates how often the relationship has been found to be true, that is,
the share of transactions containing the antecedent that also contain the consequent.

So, in a given transaction with multiple items, Association Rule Mining primarily tries to find
the rules that govern how or why such products/items are often bought together.

Association Rule Mining is sometimes referred to as “Market Basket Analysis”, as it was the
first application area of association mining. The aim is to discover associations of items
occurring together more often than you’d expect from randomly sampling all the possibilities.

• Association Rule (AR): an implication X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅

• Support of AR (s) X ⇒ Y: percentage of transactions that contain X ∪ Y

• Confidence of AR (a) X ⇒ Y: ratio of the number of transactions that contain X ∪ Y to the
number that contain X.
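
The two definitions above can be checked by hand on a small example. The following is a minimal Python sketch, not the R workflow used in this report. The transactions and the helper names `support` and `confidence` are made up for illustration.

```python
# Toy transactions; the items are invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X) for the rule X => Y."""
    x = set(antecedent)
    xy = x | set(consequent)
    return support(xy, transactions) / support(x, transactions)


# Rule: {bread} => {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here bread and milk appear together in 3 of 5 transactions, and bread appears in 4 of 5, so the rule has support 0.60 and confidence 0.75.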

Some areas where Association Rule Mining has helped quite a lot:

1. Market Basket Analysis

2. Medical Diagnosis

3. Census Data

4. Protein Sequence

– Market Basket Analysis Flowchart

Steps for Market Basket Analysis

The following steps were used for the Market Basket Analysis of the Online Retail store
dataset:

Acquiring Dataset: The dataset for this study was downloaded from Kaggle. It is transactional
data that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a
UK-based and registered non-store online retailer. The details of the dataset are as follows:

Data Set Characteristics: Multivariate, Sequential, Time-Series


Attribute Characteristics: Integer, Real
Associated Tasks: Classification, Clustering
Number of Instances: 541909
Number of Attributes: 8
Missing Values? N/A
Area: Business
Date Donated: 2015-11-06
Number of Web Hits: 438868

Data Pre-Processing: This step has two major parts: eliminating the missing values and cleaning
the data set. We first check whether there are any missing values, which columns they are in,
and how many of them there are.

After this initial look at the missing values, we clean the data set of the values and attributes
that are not required for our association analysis. In our example, we clean the Invoice Number
and the Description of items in the dataset.

We then look for buzzwords, the undesirable words in the dataset, and clean them out. This
gives us the dataset required for the rest of our analysis.
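
The missing-value check and the cleaning pass described above might look roughly like this. It is a hedged Python sketch, not the R code used in this study. The column names (`InvoiceNo`, `Description`, `Quantity`) and the sample rows are assumptions made for illustration.

```python
import csv
import io

# Tiny in-memory stand-in for the retail CSV. Column names and rows are
# assumptions for illustration, not values taken from the report's data.
raw_csv = """InvoiceNo,Description,Quantity
A001,MUG,6
A001,,2
A002,TEA TOWEL,6
,CANDLE,3
A003,  BIRD ORNAMENT ,32
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Step 1: count missing (empty) values in each column.
missing = {col: sum(1 for r in rows if not r[col].strip()) for col in rows[0]}

# Step 2: drop rows that lack an invoice number or a description, and
# trim stray whitespace from the descriptions that remain.
clean_rows = [
    {**r, "Description": r["Description"].strip()}
    for r in rows
    if r["InvoiceNo"].strip() and r["Description"].strip()
]

print(missing)
print(len(rows), "rows before cleaning,", len(clean_rows), "after")
```

Removing unwanted words from the descriptions would be a further pass over `clean_rows`. Which words count as undesirable depends on the data, so that step is left out here.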

Creating the Baskets: Now that the data is cleaned and pre-processed, we create baskets of
items from the transactions in the dataset. These baskets are the basis for the rest of our
analysis and for the rules we create for Market Basket Analysis.
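
As a rough illustration of basket creation, the Python sketch below groups line items by invoice so that each invoice becomes one basket. The invoice numbers and item names are invented, and grouping by invoice number is an assumption about how the baskets are keyed.

```python
from collections import defaultdict

# (invoice number, item) pairs; the values are invented for illustration.
line_items = [
    ("A001", "mug"),
    ("A001", "tea towel"),
    ("A002", "mug"),
    ("A001", "mug"),  # the same item repeated within one invoice
    ("A003", "candle"),
    ("A002", "coaster"),
]

# Collect the distinct items of each invoice into one basket.
baskets = defaultdict(set)
for invoice_no, item in line_items:
    baskets[invoice_no].add(item)

# One sorted item list per basket, in order of first appearance.
transactions = [sorted(items) for items in baskets.values()]
print(transactions)
```

Using a set per invoice means a repeated item counts once per basket, which is what support and confidence need.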

Creating Rules: To generate the association rules described in the section above, we use
Support, Confidence and Lift as our metrics. In our study, we generate two sets of rules, one
with a minimum support of 5% and confidence of 75% and the other with a minimum support of 1%
and confidence of 70%. After creating the rules, we sort them: the first set by decreasing
Lift and the second set by decreasing Confidence.
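
To make the thresholds and the sorting concrete, here is a small Python sketch that enumerates single-item rules by brute force. It is not the Apriori implementation from the arules package. The toy transactions and the function name `mine_pair_rules` are hypothetical, and the thresholds are chosen to suit the toy data rather than the 5%/75% and 1%/70% settings of the study. Lift is computed with its usual definition, confidence(X ⇒ Y) divided by support(Y).

```python
from itertools import permutations

# Toy transactions; the items are invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
n = len(transactions)


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n


def mine_pair_rules(min_support, min_confidence):
    """Brute-force all rules {x} => {y} that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        s_xy = support({x, y})
        if s_xy < min_support:
            continue
        conf = s_xy / support({x})
        if conf < min_confidence:
            continue
        lift = conf / support({y})  # lift > 1: x and y co-occur more than by chance
        rules.append(
            {"lhs": x, "rhs": y, "support": s_xy, "confidence": conf, "lift": lift}
        )
    return rules


rules = mine_pair_rules(min_support=0.3, min_confidence=0.7)
by_lift = sorted(rules, key=lambda r: r["lift"], reverse=True)
by_confidence = sorted(rules, key=lambda r: r["confidence"], reverse=True)

for r in by_lift:
    print(
        f"{r['lhs']} => {r['rhs']}: support={r['support']:.2f}, "
        f"confidence={r['confidence']:.2f}, lift={r['lift']:.2f}"
    )
```

On this toy data, three rules pass the thresholds, and butter => bread ranks first by lift (1.25). A real run over hundreds of thousands of transactions would need a frequent-itemset algorithm such as Apriori rather than this exhaustive loop.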

Data Visualization: We then visualize the set of rules through scatter plots and graphs to find
the most effective product associations to recommend from our transaction history. The
visualizations, results and interpretation are presented in later sections of this report.

● Packages Used in this Analysis

The following packages were used in our association-rule-based Market Basket Analysis.
These libraries are free to download and are available for the latest version of R. For the
complete documentation of these packages, please refer to the Sources section of the
report.

1. Stringr - The stringr package provides functions for working with strings, which are very
helpful in data cleaning and preparation tasks. Its functions are designed to make working with
strings as easy as possible.
2. Plyr - The plyr package is used for splitting, applying and combining data. It provides tools
for the common problem where you split a large data set into homogeneous pieces, apply a
function to each piece and combine all the results.
3. Arules - The arules package provides an infrastructure to represent, manipulate and analyse
transactional data and the patterns found in it, namely frequent itemsets and association
rules.

4. ArulesViz - The arulesViz package is used to visualize the association rules generated in
our analysis. It is an extension of the arules package and is used to plot interactive
visualizations for exploring the rules.

● Interpretation

● Results & Conclusion

● Sources
● Stringr Package

https://www.rdocumentation.org/packages/stringr/versions/1.4.0#:~:text=stringr%20is%20built%20on%20top,almost%20anything%20you%20can%20imagine

● Plyr Package

https://www.rdocumentation.org/packages/plyr

● Arules Package

https://www.rdocumentation.org/packages/arules

● Arulesviz Package

https://www.rdocumentation.org/packages/arulesViz

Data Set used for the project:

A transactional data set that contains all the transactions occurring between 01/12/2010 and
09/12/2011 for a UK-based and registered non-store online retailer has been used for this
project.

Data Set Characteristics: Multivariate, Sequential, Time-Series
Number of Instances: 541909
Area: Business
Attribute Characteristics: Integer, Real
Number of Attributes: 8
Date Donated: 2015-11-06
Associated Tasks: Classification, Clustering
Missing Values? N/A
Number of Web Hits: 438868

Association Rules obtained from the data set:

Visualization:

Conclusion:

● We can recommend certain items to customers based on the results obtained from market
basket analysis.

● We can bundle items together using bundle pricing in order to increase sales.

● We can increase or decrease prices based on the association data.
