
Clementine Tutorial - Market Basket Project

Data Warehousing and Data Mining December 10, 2007

Market Basket Exercise

Briefing: This example deals with fictitious data describing the contents of supermarket baskets (that is, collections of items bought together) plus the associated personal data of the purchaser, which might be acquired through a loyalty card scheme. The goal is to discover groups of customers who buy similar products and can be characterized demographically, such as by age, income, and so on.

This example illustrates two phases of data mining:
- Association rule modeling and a web display revealing links between items purchased
- C5.0 rule induction profiling the purchasers of identified product groups

Note: This application does not make direct use of predictive modeling, so there is no accuracy measurement for the resulting models and no associated training/test distinction in the data mining process.

This example uses the streams named basklinks.str and baskrule.str, which reference the data file named BASKETS1n. These files are available from the Demos directory of any Clementine Client installation and can be accessed from the Start menu: Programs > SPSS Clementine Desktop 11.0 > Demos.

1.1 Accessing the data

A few steps:

1. Using a Variable File node, connect to the dataset BASKETS1n, selecting the option to read field names from the file.
2. Connect a Type node to the data source. Set the type of the field cardid to Typeless (because each loyalty card ID occurs only once in the dataset and can therefore be of no use in modeling). Select Set as the type for the field sex (this ensures that the GRI modeling algorithm will not treat sex as a flag). You also need to modify the Direction field as shown in Figure 1.
3. Connect the Type node to a Table node.

Figure 1: Market Basket Project - Type Node

4. Attach a GRI node to the Type node and execute the GRI node. You should get a window like the one in Figure 2.
5. Attach a Web node to the Type node to get a clear view of how different items are connected (Figure 3).

Figure 2: Market Basket Project - GRI model

Figure 3: Market Basket Project - Web Node

6. At the end, you should have a graph of nodes like the one in Figure 4.

Figure 4: Market Basket Project

7. Execute the Web node; you will get a graph like the one in Figure 5.

In the resulting display, three groups of customers stand out:
- Those who buy fish and fruits and vegetables, who might be called "Healthy eaters"
- Those who buy wine and confectionery
- Those who buy beer, frozen meals, and canned vegetables ("Beer, beans, and pizza")
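The links in the web display reflect how often two items appear in the same basket. The following Python sketch is only an illustration of that idea, not of Clementine's Web node or the GRI algorithm. It assumes a comma-separated layout with a header row and T/F purchase flags; the sample rows and the helper names (`load_baskets`, `link_counts`, `confidence`) are hypothetical. The field cardid is simply ignored, mirroring the Typeless setting above.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical miniature sample in an ASSUMED layout for BASKETS1n:
# a header row, then one row per basket with T/F purchase flags.
SAMPLE = """cardid,sex,fruitveg,fish,wine,confectionery
1001,F,T,T,F,F
1002,M,F,F,T,T
1003,F,T,T,F,T
1004,M,T,F,T,T
"""

ITEM_FIELDS = ["fruitveg", "fish", "wine", "confectionery"]


def load_baskets(text):
    """Return each basket as the set of item fields flagged 'T'."""
    reader = csv.DictReader(io.StringIO(text))
    return [{item for item in ITEM_FIELDS if row[item] == "T"} for row in reader]


def link_counts(baskets):
    """Count how many baskets contain each pair of items (link strength)."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


baskets = load_baskets(SAMPLE)
links = link_counts(baskets)

# Print the strongest links first, as a rough text version of the web display.
for pair, count in links.most_common():
    print(pair, count)
```

Pairs with high counts correspond to the thick links in the web display, and `confidence` gives the usual confidence measure of a simple one-item association rule.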

1.2 Profiling the Customer Groups

You have now identified three groups of customers based on the types of products they buy, but you would also like to know who these customers are, that is, their demographic profile. This can be achieved by tagging each customer with a flag for each of these groups and using rule induction (C5.0) to build rule-based profiles of these flags.

1. First, you must derive a flag for each group. This can be automatically generated using the web display that you just created. Using the right mouse button, click the link between fruitveg and fish and select Generate Derive Node For Link (Figure 6).

Figure 5: Market Basket Project - Web Node After Execution

Figure 6: Market Basket Project - Web Node Derived Node

2. Edit the resulting Derive node to change the Derive field name to healthy. Repeat the exercise with the link from wine to confectionery, naming the resultant Derive field winechocs. For the third group (involving three links), first make sure that no links are selected. Then select all three links in the cannedveg, beer, and frozenmeal triangle by holding down the Shift key while you click the left mouse button. (Be sure you are in Interactive mode rather than Edit mode.) Then from the web display menus choose Generate > Derive Node ("And"). Change the name of the resultant Derive field to beer_beans_pizza (Figure 7).

Figure 7: Market Basket Project - Web Node Multiple Derived Node

3. To profile these customer groups, connect the existing Type node to these three Derive nodes in series, and then attach another Type node (Figure 8). In the new Type node, set all fields to direction None, except for value, pmethod, sex, homeown, income, and age, which should be set to In, and the relevant customer group (for example, beer_beans_pizza), which should be set to Out. Attach a C5.0 node, set the Output type to Rule set (Figure 9), and execute it.

Figure 8: Market Basket Project

Figure 9: Market Basket Project - C5.0 Node
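The three derived flags from step 2 can be pictured as simple "all of these items were bought" tests. The sketch below is a minimal Python illustration under stated assumptions: purchase flags are taken to be stored as T/F strings, the helper names (`derive_link_flag`, `add_group_flags`) and the sample record are hypothetical, and the expression that Clementine actually generates inside each Derive node is not shown in this tutorial.

```python
# Customer group name -> item fields that must all be 'T'.
# Selecting all three links of the triangle with Derive Node ("And")
# is modeled here as requiring all three items to be present.
GROUPS = {
    "healthy": ("fruitveg", "fish"),
    "winechocs": ("wine", "confectionery"),
    "beer_beans_pizza": ("cannedveg", "beer", "frozenmeal"),
}


def derive_link_flag(record, fields):
    """Return 'T' when every field in `fields` is 'T' for this record, else 'F'."""
    return "T" if all(record[f] == "T" for f in fields) else "F"


def add_group_flags(record):
    """Return a copy of `record` with one derived flag per customer group."""
    tagged = dict(record)
    for name, fields in GROUPS.items():
        tagged[name] = derive_link_flag(record, fields)
    return tagged


# Hypothetical record; the values are invented for illustration only.
record = {
    "fruitveg": "T", "fish": "T",
    "wine": "F", "confectionery": "T",
    "cannedveg": "T", "beer": "T", "frozenmeal": "F",
}
tagged = add_group_flags(record)
print({name: tagged[name] for name in GROUPS})
```

Chaining the three Derive nodes in series corresponds to adding the three flags one after another to the same record, which is what `add_group_flags` does in a single loop.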

The resultant model (for beer_beans_pizza) contains a clear demographic profile for this customer group (Figure 10):

Figure 10: Market Basket Project - Profile
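To give a feel for what rule induction does with the In and Out fields, the following Python sketch searches for the single numeric threshold that best separates a flag, using information gain. This is a deliberately simplified stand-in and not the C5.0 algorithm, which is considerably more elaborate. The records and their values are invented for illustration and say nothing about the actual profile shown in Figure 10; the function names are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 'T'/'F' labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / total
        result -= p * math.log2(p)
    return result


def best_threshold(records, field, target):
    """Find the `field <= threshold` split with the highest information gain.

    Returns (threshold, gain); threshold is None if no split improves purity.
    """
    labels = [r[target] for r in records]
    base = entropy(labels)
    best = (None, 0.0)
    # The largest value is skipped because it would leave the upper side empty.
    for threshold in sorted({r[field] for r in records})[:-1]:
        low = [r[target] for r in records if r[field] <= threshold]
        high = [r[target] for r in records if r[field] > threshold]
        remainder = (len(low) * entropy(low) + len(high) * entropy(high)) / len(records)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical records: invented incomes and flags, NOT taken from BASKETS1n.
records = [
    {"income": 12000, "beer_beans_pizza": "T"},
    {"income": 15000, "beer_beans_pizza": "T"},
    {"income": 21000, "beer_beans_pizza": "F"},
    {"income": 30000, "beer_beans_pizza": "F"},
]
threshold, gain = best_threshold(records, "income", "beer_beans_pizza")
print(f"income <= {threshold} (information gain {gain:.3f})")
```

A rule set as produced in step 3 combines several such conditions over the In fields; this sketch only shows how one condition on one field might be chosen.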

The same method can be applied to the other customer group flags by selecting them as the output in the second Type node. A wider range of alternative profiles can be generated by using GRI instead of C5.0 in this context; GRI can also be used to profile all of the customer group flags simultaneously because it is not restricted to a single output field.

END of Tutorial 2