Analysis
Luis Cavique, IEEE Member
ESCS - Polytechnic Institute of Lisboa,
Campus de Benfica do IPL,
1509-014 Lisboa, Portugal
phone: (+351) 21 711 90 00
Abstract: This paper addresses the problem of finding the
next-item for each customer in large-scale database marketing and can
be seen as an extension of market basket analysis. Most of
the existing software uses Apriori-like algorithms. The outputs of
the Apriori algorithms are easy to understand and many new
patterns can be identified. However, the sheer number of
association rules may make the interpretation of the results
difficult. The aim of this work is to automate the cross-selling
strategy. We would like to simplify the work of the marketer,
avoiding the analysis of thousands of rules in associating
customers with their next-item.

Index Terms: market basket analysis, temporal knowledge
extraction, database marketing

I. INTRODUCTION

Current database capacities, together with bar code
technology and the growth of the Internet, have led to a huge
collection of customer transaction data. Companies in
different sectors such as banking, insurance,
telecommunications and airlines have now become more
customer-oriented than ever.

To obtain the customer's profile there are two main data
sources: the customer's personal data or the item-oriented
data. In order to gather demographic, social,
geographic, personality or lifestyle data of the customer,
costly surveys are needed. On the other hand, item-oriented
data, regarding the frequency or the quantity each customer
buys of a certain item, already exists in the companies'
databases.

A market basket is composed of an itemset bought together
in a single trip to a store [1]. The fact that each customer buys
different itemsets, in different quantities, at different times
makes market basket analysis a difficult task. To simplify
the problem, only two attributes are used for each transaction:
the customer and the purchased item.

Most of the existing software uses Apriori-like algorithms
[2], which are known for their high time complexity.
The outputs of the Apriori algorithm are easy to understand
and many new patterns can be identified.

For the cross-selling marketing strategy the knowledge of
the next-item is essential. However, the sheer number of
association rules may make the interpretation of the results
difficult. Therefore it is difficult for the marketer to predict the
next-item that each customer will buy.

In order to obtain frequent market baskets in reduced
computational time, the Similis algorithm [3] can be used,
since this algorithm reduces the number of passes over the
database.

This paper addresses the problem of finding the next-item
for each customer in large-scale database marketing and can be
seen as an extension of market basket analysis.

We describe a way of finding the next-item each specific
customer will buy using market basket analysis. We study
not only the discovery of data patterns, but also sequential
patterns, allowing a clearer real-world approach for one-time
purchases. A typical example is buying a laptop as the first item,
followed by the purchase of a mouse, some software, a bag,
a memory stick, a diskette drive, a printer/scanner, a modem
and so on [4].

The sequence of items must be added to the information in
the marketing database. Finally, the association of each
customer with their next-item is obtained using a procedure
developed in SQL and written in Relational Algebra.

II. SEQUENTIAL PATTERNS

Sequential pattern mining is an important data mining
problem with a wide range of applications. Sequence analysis
is used to determine data patterns throughout a sequence of
temporal states.

Nowadays, sequence analysis is widely applied to find
patterns in click-stream data from Web sites. In this work we
apply it to the market basket sequence. Several authors
have been trying to find customers' purchase patterns of
goods in the retail and financial sectors [5].

For market basket analysis, each transaction in the input
database represents a purchase, which occurred at a specific
time and place, and the transaction needs at least two
attributes: customer and item. For this new problem each
transaction needs three attributes: customer, item and time.
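
As an illustration of the three-attribute transaction (customer, item, time), the following minimal Python sketch groups purchases into one time-ordered item sequence per customer. The record type `Transaction`, the helper `purchase_sequences`, the integer timestamps and the toy data are all assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Transaction(NamedTuple):
    customer: str  # customer identifier
    item: str      # purchased item
    time: int      # purchase time (hypothetical integer timestamp)


def purchase_sequences(transactions):
    """Group items by customer, ordered by purchase time."""
    sequences = defaultdict(list)
    for t in sorted(transactions, key=lambda t: t.time):
        sequences[t.customer].append(t.item)
    return dict(sequences)


# Toy data, invented for illustration only.
sample = [
    Transaction("c1", "mouse", 2),
    Transaction("c1", "laptop", 1),
    Transaction("c2", "laptop", 3),
]
sequences = purchase_sequences(sample)
print(sequences)
```

Without the time attribute the two purchases of customer c1 would be an unordered itemset; with it, the order laptop then mouse can be recovered.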
S2 - Next-item Algorithm
Input: a marketing database
Output: Customer-Next-Item (customer-id, customer-details, next-item)
1) Cartesian-Product = Customer × Basket
2) Minus = Cartesian-Product − Transaction
3) Min-Sequence = 𝒢 min(Minus.sequence-id) (Minus)
4) Next-Item = (Min-Sequence ⋈ Basket ⋈ Item)
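
The four steps above can be sketched in Python with plain sets and dictionaries. This is only an illustrative sketch under stated assumptions: the schema is not given in this part of the paper, so the example assumes that Basket maps a sequence id to an item, that Transaction holds (customer, item) pairs, and that the minimum in step 3 is taken per customer. The function name `next_items` and the toy data are hypothetical.

```python
# Hypothetical toy relations; the schema is an assumption for illustration.
customers = {"c1", "c2", "c3"}
# Basket: sequence id -> item, in the assumed order of purchase.
basket = {1: "laptop", 2: "mouse", 3: "software"}
# Transaction: (customer, item) pairs already observed.
transactions = {("c1", "laptop"), ("c1", "mouse"), ("c2", "laptop")}


def next_items(customers, basket, transactions):
    # 1) Cartesian product: every customer paired with every basket position.
    cartesian = {(c, seq) for c in customers for seq in basket}
    # 2) Minus: remove the pairs whose item the customer has already bought.
    minus = {(c, seq) for (c, seq) in cartesian
             if (c, basket[seq]) not in transactions}
    # 3) Min-Sequence: smallest remaining sequence id, assumed per customer.
    min_seq = {}
    for c, seq in minus:
        min_seq[c] = min(seq, min_seq.get(c, seq))
    # 4) Next-Item: join back to the basket to recover the item.
    return {c: basket[seq] for c, seq in min_seq.items()}


result = next_items(customers, basket, transactions)
print(sorted(result.items()))
```

In this sketch a customer who has already bought every item in the basket has no rows left after step 2 and therefore does not appear in the result.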