
Next-Item Discovery in the Market Basket Analysis
Luis Cavique, IEEE Member
ESCS - Polytechnic Institute of Lisboa,
Campus de Benfica do IPL,
1509-014 Lisboa, Portugal
phone: (+351) 21 711 90 00

Abstract - This paper addresses the problem of finding the next-item for each customer in large-scale database marketing and can be seen as an extension of the market basket analysis. Most of the existing software uses Apriori-like algorithms. The outputs of the Apriori algorithms are easy to understand and many new patterns can be identified. However, the sheer number of association rules may make the interpretation of the results difficult. The aim of this work is to automate the cross-selling strategy. We would like to simplify the work of the marketer, avoiding the analysis of thousands of rules in associating customers with their next-item.

Index Terms - market basket analysis, temporal knowledge extraction, database marketing

I. INTRODUCTION

Current database capacities, associated with bar code technology and the growth of the Internet, have led to a huge collection of customer transaction data. Companies in different sectors such as banking, insurance, telecommunications and airlines have now become more customer-oriented than ever.

To obtain the customer's profile there are two main data sources: the customer's personal data or the item-oriented data. In order to gather demographic, social, geographic, personality or lifestyle data of the customer, costly surveys are needed. On the other hand, item-oriented data, regarding the frequency or the quantity each customer buys of a certain item, already exists in the companies' databases.

A market basket is composed of an itemset bought together in a single trip to a store [1]. The fact that each customer buys different itemsets, in different quantities, at different times, makes the market basket analysis a difficult task. To simplify the problem only two attributes are used for each transaction: the customer and the purchased item.

Most of the existing software uses Apriori-like algorithms [2], known for their high time complexity. The outputs of the Apriori algorithm are easy to understand and many new patterns can be identified.

For the cross-selling marketing strategy the knowledge of the next-item is essential. However, the sheer number of association rules may make the interpretation of the results difficult. Therefore it is difficult for the marketer to predict the next-item that each customer will buy.

In order to obtain frequent market baskets in reduced computational times the Similis algorithm [3] can be used, since this algorithm reduces the number of passes over the database.

This paper addresses the problem of finding the next-item for each customer in large-scale database marketing and can be seen as an extension of the market basket analysis.

We describe a way of finding the next-item each specific customer will buy using the market basket analysis. We study not only the discovery of data patterns, but also sequential patterns, allowing a clearer real-world approach for one-time purchases. A typical example is buying a laptop as first item, followed by the purchase of a mouse, some software, a bag, a memory stick, a diskette drive, a printer/scanner, a modem and so on [4].

The sequence of items must be added to the information in the database marketing. Finally, the association of each customer with his next-item is obtained using a procedure developed in SQL and written in Relational Algebra.

II. SEQUENTIAL PATTERNS

Sequential pattern mining is an important data mining problem with a wide range of applications. Sequence analysis is used to determine data patterns throughout a sequence of temporal states.

Nowadays, sequence analysis is widely applied to click-stream data from Web sites. In this work we are going to apply it to the market basket sequence. Several authors have been trying to find customers' purchase patterns of goods in the retail and in the financial sectors [5].

For the market basket analysis, in the input database, each transaction represents a purchase, which occurred at a specific time and place, and the transaction needs at least two attributes: customer and item. For this new problem each transaction needs three attributes: customer, item and time.
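To make the three-attribute input concrete, the following is a minimal sketch, not the paper's implementation, of how (customer, item, time) transactions could be ordered into one item sequence per customer and how consecutive item pairs could then be counted. The function names, the tuple layout and the ISO-formatted date strings are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def item_sequences(transactions):
    """Group (customer, item, date) rows into one time-ordered item list per customer.

    Dates are assumed to be ISO strings (YYYY-MM-DD), so sorting them as text
    is chronological; same-day purchases are ordered by item name.
    """
    by_customer = defaultdict(list)
    for customer, item, date in transactions:
        by_customer[customer].append((date, item))
    return {c: [item for _, item in sorted(rows)] for c, rows in by_customer.items()}


def consecutive_pairs(sequences):
    """Count how often one item is immediately followed by another."""
    counts = Counter()
    for seq in sequences.values():
        counts.update(zip(seq, seq[1:]))
    return counts


# Hypothetical sample data, loosely following the laptop example.
sample = [
    ("c1", "mouse", "2005-02-01"),
    ("c1", "laptop", "2005-01-10"),
    ("c2", "laptop", "2005-01-15"),
    ("c2", "mouse", "2005-03-02"),
    ("c2", "bag", "2005-04-20"),
]
sequences = item_sequences(sample)
pairs = consecutive_pairs(sequences)
print(sequences)
print(pairs)
```

With this sample, customer c1 yields the sequence laptop, mouse and customer c2 yields laptop, mouse, bag, so the pair (laptop, mouse) is counted twice.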

0-7803-9365-1/05/$17.00 ©2005 IEEE


The output of the algorithm will be the most frequent sequence of items.

This work is based on the principle that the purchase patterns can be represented by a State Transition Model in such a way that if the present state of a specific customer is known, the following purchase state can be predicted, and an adequate cross-selling strategy can be implemented.

S1 - Item-Sequence Algorithm
Input: a database Transaction (transaction-id, customer-id, item-id, date);
Output: a sequence of items;
1) Find a maximal market basket (i.e. a market basket that is not included in another market basket) using Apriori-like algorithms or the Similis algorithm;
2) Select the transactions using the items found in step 1;
3) Transform the input data into a dataset of item sequences for each customer, as described by Agrawal and Srikant [6];
4) Build a state transition diagram, where each state corresponds to an item and each transition represents the sequence from one item to the next-item;
5) Find the most probable path using the Viterbi algorithm [7].

III. ASSOCIATING CUSTOMERS WITH THE NEXT-ITEM

Given the following basic database marketing, where the primary keys are underlined, the output of the Item-Sequence algorithm is stored in the table Basket:
Customer (customer_id, customer-details);
Item (item_id, item-details);
Transaction (transaction_id, customer-id, item-id, date);
Basket (sequence_id, item_id);
The foreign keys are the following:
Transaction.item-id ⊆ Item.item_id;
Transaction.customer-id ⊆ Customer.customer_id;
Basket.item-id ⊆ Item.item_id;

We propose the following algorithm, expressed in Relational Algebra, where σ and ⋈ represent the Selection operator and the Join operator respectively. Firstly, all the possible purchases are generated by applying a Cartesian Product to the Customer and Basket tables. To obtain all the next purchases, we subtract from the Cartesian Product output the purchases already made in Transaction. Then, for each customer the first item in his sequence is chosen in Min-Sequence. Finally, Next-Item returns the association of the customer with his next-item.

S2 - Next-item Algorithm
Input: a database marketing
Output: Customer-Next-Item (customer-id, customer-details, next-item)
1) Cartesian-Product = Customer × Basket
2) Minus = Cartesian-Product − Transaction
3) Min-Sequence = σ min(Minus.sequence_id) (Minus)
4) Next-Item = (Min-Sequence ⋈ Basket ⋈ Item)

IV. CONCLUSIONS

The aim of this work is to automate the cross-selling strategy. We would like to simplify the work of the marketer, avoiding the analysis of thousands of rules in associating customers with their next-item.

To find the sequential patterns we use a state transition algorithm that returns the most probable item sequence. With the given sequence in the database marketing, it is possible to discover the next-item for each customer.

The S1 and S2 algorithms scale well, allowing their inclusion in a commercial database-marketing system.

During the presentation some specific examples will be given using real datasets.

REFERENCES

[1] Berry, M. and Linoff, G., Data Mining Techniques for Marketing, Sales and Customer Support, John Wiley and Sons, 1997.
[2] Agrawal, R. and Srikant, R., "Fast algorithms for mining association rules", Proceedings of the 20th international conference on very large data bases, 1994, pp. 478-499.
[3] Cavique, L., "Graph-based Structures for the Market Basket Analysis", Revista de Investigação Operacional, 2004, vol. 24, pp. 1-14.
[4] Chen, Y.-L., Chiang, M.-C. and Ko, M.-T., "Discovering time-interval sequential patterns in sequence databases", Expert Systems with Applications, 2003, vol. 25, pp. 343-354.
[5] Prinzie, A. and Van-Den-Poel, D., "Investigating Purchasing Patterns for Financial Services using Markov, MTD and MTDg Models", European Journal of Operational Research (forthcoming).
[6] Agrawal, R. and Srikant, R., "Mining sequential patterns", Proceedings of the 11th international conference on data engineering, 1995, pp. 3-14.
[7] Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Series in Data Management Systems, 2002.
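The four steps of the S2 Next-item Algorithm can be mirrored with plain Python sets and dictionaries. This is a sketch under stated assumptions rather than the SQL procedure used by the author: tables are modelled as in-memory dictionaries, the subtraction in step 2 is assumed to compare rows on (customer, item) only, and the function name and sample data are hypothetical.

```python
def next_item(customers, items, transactions, basket):
    """Return sorted (customer_id, customer_details, next_item) rows.

    customers:    {customer_id: customer_details}
    items:        {item_id: item_details}
    transactions: iterable of (transaction_id, customer_id, item_id, date)
    basket:       {sequence_id: item_id}, the stored item sequence
    """
    # 1) Cartesian-Product = Customer x Basket
    cartesian = {(cust, seq, item) for cust in customers for seq, item in basket.items()}

    # 2) Minus = Cartesian-Product - Transaction
    #    (assumption: rows are matched on customer and item only)
    bought = {(cust, item) for _tid, cust, item, _date in transactions}
    minus = {row for row in cartesian if (row[0], row[2]) not in bought}

    # 3) Min-Sequence: smallest remaining sequence_id per customer
    min_sequence = {}
    for cust, seq, _item in minus:
        min_sequence[cust] = min(seq, min_sequence.get(cust, seq))

    # 4) Next-Item = Min-Sequence join Basket join Item
    return sorted(
        (cust, customers[cust], basket[seq])
        for cust, seq in min_sequence.items()
        if basket[seq] in items
    )


# Hypothetical sample data.
customers = {"c1": "Ana", "c2": "Rui", "c3": "Eva"}
items = {"laptop": "portable computer", "mouse": "pointing device", "software": "office suite"}
basket = {1: "laptop", 2: "mouse", 3: "software"}
transactions = [
    (1, "c1", "laptop", "2005-01-10"),
    (2, "c2", "laptop", "2005-01-15"),
    (3, "c2", "mouse", "2005-03-02"),
    (4, "c3", "laptop", "2005-01-20"),
    (5, "c3", "mouse", "2005-02-11"),
    (6, "c3", "software", "2005-02-25"),
]
print(next_item(customers, items, transactions, basket))
```

In this sample, c1 is associated with the mouse and c2 with the software, while c3 has already bought every item in the sequence and therefore receives no next-item.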

