
DATA MINING FOR BUSINESS INTELLIGENCE
(1 and 2)
Data Mining: A process for extracting
information from large data sets to solve
business problems.

Data Warehouse: A large database created specifically for decision support
throughout the enterprise. It usually consists of data extracted from other
company databases that has been cleaned and organized for easy access, and it
often includes a metadata store as well.
Data Mining:
Data mining is defined as the process of extracting significant and
potentially useful patterns from large volumes of data. In other words,
data mining is the search for the relationships and global patterns that
exist in large databases but are hidden among vast amounts of data. These
relationships represent valuable knowledge about the database and, if the
database is a faithful mirror of the real world, about the real world it
records.
Some application areas:
(i) Sales and Customer Service:
Market Basket Analysis (analysis of transactional databases to find sets of
items that appear frequently together in a single purchase) has already shown
phenomenal gains in cross-selling, floor and shelf layout, catalog and web page
layout, and promotion schemes.
(ii) Customer Retention:
- Identifying patterns that lead to customer defection and suggesting
preventive measures for current customers
(iii) Risk Assessment and Fraud Detection:
- A mail order retailer can identify payment patterns from different
customers at the same address, exposing potentially fraudulent practices
by an individual using different names
- An insurance company can identify clients who hold different kinds of
policies totaling more than an acceptable level
- A bank can identify companies that may be in financial jeopardy before
extending a loan to them
(iv) Customer Segmentation
(v) Product Grouping
TYPES OF KNOWLEDGE EXTRACTED USING DATA MINING

- Association Rule
- Classification
- Clustering
- Feature Selection
- Factor Analysis
- Sequence Mining
- Regression
ASSOCIATION RULE
Association rule mining is a type of data mining that correlates one set of
items or events with another set of items or events. It employs association
or linkage analysis, searching transactions from operational systems for
interesting patterns with a high probability of repetition.

Algorithms:
- Apriori
- Partitioning
- Dynamic Itemset Counting
- Frequent Pattern Tree Algorithm
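Of the algorithms listed above, Apriori is the simplest to illustrate. The following is a minimal, unoptimized Python sketch of its level-wise search, assuming transactions are given as sets of item names and min_support as a fraction. Frequent (k-1)-itemsets are joined into k-item candidates, any candidate with an infrequent (k-1)-subset is pruned (the Apriori property: every subset of a frequent itemset must itself be frequent), and the survivors are counted in one pass over the data. The function name apriori and the basket data are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every frequent itemset.

    min_support is a fraction of all transactions (e.g. 0.3 for 30%).
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    def count(candidates):
        # One pass over the database counts every candidate of the current size.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent single items.
    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical basket data, for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
frequent = apriori(baskets, min_support=0.4)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Here {bread, butter, jam} is never counted: its subset {butter, jam} occurs only once, so the prune step discards it before the database is scanned again.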
Examples of Association Rules
(i) When people buy butter, they also buy bread
70% of the time (Association)
(ii) When people buy Pepsi, they also buy Lays
chips 70% of the time, on Sunday evenings
(Temporal Association)
(iii) 70% of the readers who buy a DBMS book
also buy a Data Mining book after a semester
(Sequence Rule, a type of Temporal
Association)
(iv) When people buy coke, they do not buy
coffee 95% of the time (Negative Association)
The strength of an association rule is measured in terms of ‘support’,
‘confidence’ and ‘lift’.

‘Support’ of an itemset in a transaction database is defined as the
percentage of all transactions that contain the itemset.
X=>Y holds with support s% if s% of the transactions in the database
contain both ‘X’ and ‘Y’.
For a given transaction database, if ‘X’ and ‘Y’ represent two
items/itemsets such that X∩Y=∅, i.e., they have no item in common,
we say ‘X’ associates ‘Y’ and represent this as X=>Y.
‘Confidence’ of an association rule X=>Y is defined as the percentage of
transactions containing both X and Y, out of all the transactions
containing X.
X=>Y holds with confidence c% if c% of the transactions in the database
that contain ‘X’ also contain ‘Y’.
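The support and confidence definitions translate directly into code. This is a small Python sketch, assuming each transaction is a set of item names; the helper names count, support and confidence and the basket data are hypothetical. Confidence is computed from raw counts, n(X ∪ Y)/n(X), which is the same ratio as support(X ∪ Y)/support(X).

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def support(itemset, transactions):
    """Support as a fraction of all transactions (multiply by 100 for %)."""
    return count(itemset, transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence of X => Y: share of X-transactions that also contain Y."""
    x, y = set(x), set(y)
    if x & y:
        raise ValueError("X and Y must be disjoint (X ∩ Y = ∅)")
    return count(x | y, transactions) / count(x, transactions)

# Hypothetical basket data, for illustration only.
baskets = [
    {"butter", "bread"},
    {"butter", "bread", "milk"},
    {"butter", "jam"},
    {"bread", "milk"},
    {"butter", "bread", "jam"},
]
print(support({"butter", "bread"}, baskets))       # 0.6
print(confidence({"butter"}, {"bread"}, baskets))  # 0.75
```

Butter appears in 4 of the 5 baskets and butter with bread in 3, so butter=>bread holds with support 60% and confidence 75%.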
‘Lift’ of an association rule X=>Y:
The lift ratio is the confidence of the rule divided by the confidence the
rule would have if the consequent were independent of the antecedent, which
is simply the support of the consequent Y. A lift ratio greater than 1.0
suggests that the rule has some usefulness; the larger the lift ratio, the
stronger the association.

Lift = Confidence/Percentage Support of Y
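Because the confidence expected under independence is just the support of Y, lift is a one-line calculation. The Python sketch below, with hypothetical function names, computes it from confidence and support given as fractions and labels the result: above 1 indicates a positive association, below 1 a negative association (as in the coke/coffee example above), and exactly 1 indicates independence. The figures used are those of rule I3=>I1 in the worked example that follows (confidence 80%, support of I1 70%).

```python
def lift(confidence_xy, support_y):
    """Lift of X => Y = confidence(X => Y) / support(Y), both as fractions."""
    if support_y <= 0:
        raise ValueError("support(Y) must be positive")
    return confidence_xy / support_y

def describe(lift_value):
    # A lift of exactly 1 means X and Y co-occur as often as chance predicts.
    if lift_value > 1:
        return "positive association"
    if lift_value < 1:
        return "negative association"
    return "independent"

value = lift(0.80, 0.70)
print(round(value, 2), describe(value))  # 1.14 positive association
```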


Various types of association rules:
- Ordinary Association Rule
  - Boolean Type
  - Quantitative Type
  - Categorical Type
- Temporal Association Rule
  - Boolean Type
  - Quantitative Type
  - Categorical Type
- Spatial Association Rule
  - Boolean Type
  - Quantitative Type
  - Categorical Type
Transactions of items:
TID I1 I2 I3
1 1 0 1
2 1 1 0
3 0 0 0
4 1 0 0
5 1 1 1
6 1 0 0
7 0 0 0
8 0 0 1
9 1 0 1
10 1 0 1
Total no. of transactions = 10
n(I1) = 7, n(I2) = 2, n(I3) = 5
n(I1 and I2) = 2
n(I1 and I3) = 4
n(I2 and I3) = 1
n(I1, I2 and I3) = 1
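The counts above can be reproduced mechanically from the 0/1 table. A minimal Python sketch, assuming the table is stored row by row in TID order; the names table, items and n are chosen here for illustration:

```python
from itertools import combinations

# Rows of the transaction table above (TIDs 1..10): 0/1 flags for I1, I2, I3.
table = [
    (1, 0, 1), (1, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 1),
    (1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 0, 1),
]
items = ("I1", "I2", "I3")

# Convert each 0/1 row into the set of items bought in that transaction.
transactions = [{item for item, flag in zip(items, row) if flag} for row in table]

def n(*itemset):
    """Number of transactions that contain all of the given items."""
    return sum(1 for t in transactions if set(itemset) <= t)

for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        print(combo, n(*combo))
```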

Find all the association rules for min_support = 30%, min_confidence = 60%
and min_lift = 1.
Itemset    No. of   Support  Frequent   Confidence & Lift                      Association
           Trans.            Itemset                                           Rule
I1         7        70%      I1         I1=>I3: Conf = n(I1,I3)/n(I1)          -
                                                = 4/7 = 57.1% (below 60%)
I2         2        20%      -          -                                      -
I3         5        50%      I3         I3=>I1: Conf = n(I1,I3)/n(I3)          I3=>I1
                                                = 4/5 = 80%
                                        Lift = 80/70 = 1.14
I1,I2      2        20%      -          -                                      -
I1,I3      4        40%      (I1,I3)    -                                      -
I2,I3      1        10%      -          -                                      -
I1,I2,I3   1        10%      -          -                                      -
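The procedure in the table can be checked with a short Python sketch that uses the same ten transactions and thresholds. It finds frequent itemsets by brute-force enumeration, which is only practical for a handful of items (algorithms such as Apriori avoid this at scale), then splits every frequent itemset of two or more items into candidate rules X=>Y and keeps those meeting the confidence and lift thresholds. Variable names are illustrative, and min_lift = 1 is read here as lift >= 1.

```python
from itertools import combinations

# The ten transactions from the table above: 0/1 flags for (I1, I2, I3).
ROWS = [
    (1, 0, 1), (1, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 1),
    (1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 0, 1),
]
ITEMS = ("I1", "I2", "I3")
transactions = [frozenset(i for i, flag in zip(ITEMS, row) if flag) for row in ROWS]
N = len(transactions)

MIN_SUPPORT, MIN_CONFIDENCE, MIN_LIFT = 0.30, 0.60, 1.0

def n(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Step 1: frequent itemsets (brute-force enumeration; fine for three items).
frequent = [
    frozenset(combo)
    for k in range(1, len(ITEMS) + 1)
    for combo in combinations(ITEMS, k)
    if n(frozenset(combo)) / N >= MIN_SUPPORT
]

# Step 2: split each frequent itemset of two or more items into X => Y
# and keep the rules that pass the confidence and lift thresholds.
rules = []
for itemset in frequent:
    for k in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), k):
            x = frozenset(combo)
            y = itemset - x
            conf = n(itemset) / n(x)
            lift = conf / (n(y) / N)
            if conf >= MIN_CONFIDENCE and lift >= MIN_LIFT:
                rules.append((sorted(x), sorted(y), conf, lift))

for x, y, conf, lift in rules:
    print(f"{x} => {y}: confidence {conf:.1%}, lift {lift:.2f}")
# prints: ['I3'] => ['I1']: confidence 80.0%, lift 1.14
```

As in the table, I1=>I3 is generated but rejected (confidence 57.1%), leaving I3=>I1 as the only rule.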
Association Rules in Retail Sale Transaction Data

Rule  Conf. %  Antecedent (a)                                   Consequent (c)                                   Support(a)  Support(c)  Support(a U c)  Lift Ratio
1     92.62    Noodles-maggi-200gm, Tomato sauce (Maggi)-200gm  MDH-Masala-chicken masala                        149         201         138             2.7647
2     68.66    MDH-Masala-chicken masala                        Noodles-maggi-200gm, Tomato sauce (Maggi)-200gm  201         149         138             2.7647
3     87.34    MDH-Masala-chicken masala, Noodles-maggi-200gm   Tomato sauce (Maggi)-200gm                       158         204         138             2.56888
4     67.65    Tomato sauce (Maggi)-200gm                       MDH-Masala-chicken masala, Noodles-maggi-200gm   204         158         138             2.56888
5     92.41    Kismis - 100 gm                                  Basmati Rice (1 Kg)                              158         233         146             2.37953
6     74.63    MDH-Masala-chicken masala                        Tomato sauce (Maggi)-200gm                       201         204         150             2.19491
7     73.53    Tomato sauce (Maggi)-200gm                       MDH-Masala-chicken masala                        204         201         150             2.19491
8     66.99    Coffee (50gm)_Nescafe                            Bournvita-Cadbury-500gm                          209         230         140             1.74745
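The confidence and lift columns follow from the three support counts: Confidence = Support(a U c)/Support(a), and Lift = Confidence/(Support(c)/N), where N is the total number of transactions. The table does not state N; a total of 600 is the value that reproduces every published lift ratio, so the Python sketch below treats N = 600 as an assumption. Function names are hypothetical.

```python
# (Support(a), Support(c), Support(a U c)) as transaction counts, rules 1-8 above.
rules = {
    1: (149, 201, 138),
    2: (201, 149, 138),
    3: (158, 204, 138),
    4: (204, 158, 138),
    5: (158, 233, 146),
    6: (201, 204, 150),
    7: (204, 201, 150),
    8: (209, 230, 140),
}

# ASSUMPTION: total transactions are not given in the table; N = 600 is
# inferred because it reproduces the published lift ratios.
N = 600

def confidence_pct(sup_a, sup_ac):
    """Confidence of a => c as a percentage."""
    return 100 * sup_ac / sup_a

def lift_ratio(sup_a, sup_c, sup_ac, total):
    """Confidence of a => c divided by the support of c as a fraction of all transactions."""
    return (sup_ac / sup_a) / (sup_c / total)

for rule, (sup_a, sup_c, sup_ac) in rules.items():
    print(rule, round(confidence_pct(sup_a, sup_ac), 2),
          round(lift_ratio(sup_a, sup_c, sup_ac, N), 5))
```

Note that rules 1 and 2 (and likewise 3 and 4, 6 and 7) share the same lift: swapping antecedent and consequent changes confidence but not lift, since lift = Support(a U c)·N/(Support(a)·Support(c)) is symmetric in a and c.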
Case-I
Maximize Return on Investment in Retail Industry Customer Rewards Programs

(Publications: Articles & Whitepapers, Inductis Retail Rewards Solutions,
by Dr. Matt Hasan, May 2007)

“Optimally Manage Churn by Leveraging Each Shopper’s Inherent Loyalty Intensity.”
Recent research by industry sources on retail customer rewards programs
reveals the following facts:
- Reward programs, especially card-based ones, are widespread in the retail
industry
- Programs have very high adoption rates due to no joining fee and low
barriers to entry; 80% of customers have a loyalty card
- 54% of shoppers have multiple loyalty cards from competing retailers
- 80%-90% of grocery purchases are made with a loyalty card
- Fewer than 30% of shoppers say reward programs have a major impact on
their shopping decisions
- 40% of long-term reward program members never redeem their rewards
- Shoppers with emotional ties to a store (or chain) tend to shop there even
if they have to go farther or pay more
- The majority of retail rewards programs use only the cumulative dollar
value of purchases to allocate rewards; some use transaction data to match
coupons and discounts to customer buying patterns
Success Story of a Leading Retail Chain in the USA

Background: Store sales were declining, and:
- The average shelf life of store merchandise was higher than industry norms
- The rewards/loyalty program, including discounts and coupons, was
attracting less-profitable customers without significantly impacting the
top line

Using SCWM (Strategic Customer Worth Management, developed by Inductis):
“They were able to reverse a decline in sales while decreasing
expenditures on rewards program, resulting in a gross margin
improvement of 17%.”

Strategic Customer Worth Management (SCWM): SCWM is a proven enterprise
solution based on a unique “true customer worth” evaluation framework that
determines the optimal treatment and rewards to offer to acquire, retain, or
expand business with each customer.
Current Rewards/Loyalty Programs:
- Point of Sale Data
- Customer Database
- Product/Discount Database
  - Rewards frequent and high-spending shoppers
  - Provides broad demographic and transaction data

Enhancements/Additions Based on SCWM:
- Personalized Customer Interaction
- Predictive Modeling
  - Matches incentives to customer profile
  - Scores customers for true economic worth
  - Provides details on the inherent loyalty (emotional) profile of customers
    in addition to demographic and transaction data
Approach
- Applied the unique SCWM framework to perform in-depth customer
segmentation by intrinsic loyalty, as well as by preferences for merchandise
and communications channel, using in-store survey data as well as
geodemographic data from consumer research services
- Aligned the loyalty programs, merchandising, and communications channels
of each store type with the profile of the profitable customers of that
store type
- Implemented a targeted direct mail program designed to resonate with
shoppers matching the profitable customer profile of each store type
- Designed and executed a rewards program based on the true customer worth
and inherent loyalty coefficient of each customer segment
- Instituted an ongoing survey and tracking system to collect data on
customer demographics and browsing, purchasing, and loyalty patterns,
creating a virtuous circle of understanding customers and creating programs
that retain their loyalty
Results
- Average revenue per store increased by 5%
- Among the stores that were previously the worst performers, average
revenue per customer increased by 14%
- Traffic increased by an average of 11%, with some stores seeing increases
of as much as 20%
- Expenditures on rewards programs decreased by 8%, as targeted rewards
replaced some sales coupons and discounts
- Gross margin increased by 17%
….
