You are on page 1of 57

Solving Business Problems with Oracle Data Mining

Purpose
This tutorial shows you how to use Oracle Data Mining to solve business problems.
Time to Complete
Approximately 2 hours
Topics
This tutorial covers the following topics:
Overview

Scenario

Prerequisites

Launching Oracle Data Miner

Preparing Data for Data Mining

Using the Mining Activity Guides

Summary

Viewing Screenshots

Place the cursor over this icon to load and view all the screenshots for this tutorial.
(Caution: This action loads all screenshots simultaneously, so response time may be
slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor over an individual icon in the following
steps to load and view only the screenshot associated with that step. You can hide an
individual screenshot by clicking it.
Overview
Data Mining is sometimes called Knowledge Discovery-its goal is to provide actionable
information, not found by other means. This information can improve the functioning of
your business. For example, suppose a marketing campaign results in a 2% positive
response. If Data Mining can help focus the campaign on the people most likely to
respond, resulting in a 3% response, then the business outcome is a 50% increase in
revenue.
Data Mining can be divided into two types of "Learning".
The goal is to predict a value for a particular
Supervised characteristic, or attribute that describes some
Learning behavior. The attribute being predicted is called the
Target Attribute.
The goal is to discover relationships and patterns
Unsupervised
rather than to determine a particular value. Here,
Learning
there is no Target Attribute.
Oracle Data Mining (ODM) is powerful data mining software embedded in the Oracle
Database that enables you to discover new insights hidden in your data. Oracle Data
Mining helps businesses to target their best customers, find and prevent fraud, discover
the most influential attributes that affect Key Performance Indicators (KPIs), and find
valuable new information hidden in the data. Oracle Data Mining helps technical
professionals find patterns in their data, identify key attributes, discover new clusters
and associations, and uncover valuable insights.
Oracle Data Mining enables companies to:
KNOW Leverage your data and discover valuable new
MORE information and insights that were previously hidden.
DO Build applications that automate the extraction and
MORE dissemination of new information and insights.
Oracle Data Mining is significantly less expensive
SPEND compared to traditional approaches and, as a component
LESS of your investment in Oracle technology, significantly
reduces your total cost of ownership.
Oracle Data Mining enables you to go beyond standard query and reporting tools and
Online Analytical Processing (OLAP). Query and reporting and OLAP tools can tell you
who are your top customers, what products have sold the most, and where you are
incurring the highest costs. With Oracle Data Mining, you can implement strategies to:
Anticipate and prevent customer attrition

Acquire new customers and identify the most profitable customers

Identify promising cross-sell opportunities

Detect noncompliant and fraudulent activities

Discover new clusters or segments

Develop customer profiles

Identify likely targets and promising leads in drug discovery


Find association relationships of co-occurring items and/or events

Mine unstructured data, that is, text.

Traditional business intelligence (BI) tools such as reports, interactive query and
reporting, and Online Analytical Processing (OLAP), only report on what has happened
in the past. Oracle Data Mining (ODM) allows you to go beyond traditional BI and
reporting to mine your data and build advanced data mining applications. ODM enables
you to discover new insights, segments and associations, make more accurate
predictions, find the variables that most influence your business, and in general, extract
more information from your data. For example, by analyzing the profiles of your best
customers, ODM enables you to build data mining models and integrated applications to
identify customers who are likely to become your best customers in the future. These
customers may not represent your most valuable customers today, but may match
profiles of your current best customers. Moreover, with ODM you can do more and
transform a predictive model into a regular production application that distributes lists
of your most promising customers to your Sales force every Monday morning. Knowing
the “strategic value” of your customers — which are likely to become profitable
customers in the future and which are not, or predicting which customers are likely to
churn or likely to respond to a marketing offer — and integrating this information into
your operations is the key to proactively managing your business.
The phases of solving a business problem using Oracle Data Mining are as follows:
Problem Definition in terms of Data Mining and Business Goals

Data Acquisition and Preparation


Building and Evaluation of Models

Deployment.

Note: This tutorial is not intended as a comprehensive course on Oracle Data Mining. It
illustrates the techniques required to carry out some common data mining operations. If
you want more background information about some of the topics, see the Oracle Data
Mining Concepts Guide.
Back to Topic List
Scenario
An electronics store chain wants to distribute a discount card to its customers, but only
to those customers who are expected to increase their buying (and thus the company’s
revenue) because of this card. A test campaign was run on a sample of customers and
the results were compiled into a table containing the customer demographics,
purchasing patterns, and a measure of revenue produced by each customer.
A dataset describing the customers and results in the test campaign is used to create a
model that can be applied to all customers for the purpose of predicting revenue levels
expected from each customer who uses a discount affinity card – the Target attribute to
be predicted is AFFINITY_CARD with values 0 (= low revenue) and 1 (= high
revenue). The distinct target values (0 and 1 in this case) are sometimes called the target
“classes”, thus the prediction of a target value for each customer is called Classification.
Back to Topic List
Prerequisites
Before starting this tutorial, you should:
1. Perform the Installing Oracle Database 10g on Windows tutorial.
2. Perform the Installing Oracle Data Miner 10g Release 2 tutorial.

Back to Topic List


Launching Oracle Data Miner
To launch Oracle Data Miner perform the following.
1. Double-click <your_path>\bin\odminerw.exe.
2. To connect as dmuser1, click OK.

3. You may want to confirm sample tables and views in your user schema. Expand Data Sources >
dmuser1.
This displays both tables and views. Click Tables to view tables; Click Views to have the list of
views.
Back to Topic List
Preparing Data for Data Mining
The data used in the Data Mining process usually has to be collected from various
locations. Also, some transformation of the data is usually required to prepare the data
for Data Mining operations. Oracle Data Mining has powerful utilities for sampling,
recoding, discretizing and creating new transformed attributes.
In this section, you perform the following tasks:
Display Data and Statistics

Use Data Transformation wizard

Back to Topic List


Display Data and Statistics
Perform the following steps:
1. You can display the structure of a table or view. Click MINING_DATA_BUILD_V.
2. Click the Data tab to view a sample of the contents.
Scroll to the right of the window to view full details
Note: The default number of records shown is 100; Enter a different number in the Fetch Size
window, then click Refresh to change the size of the display. Also, you can click Fetch Next to
add more rows to the display.

3. Right-click MINING_DATA_BUILD_V to expose a menu with more options. You have access
to Transformation wizards and other tools for data exploration.
4. To see a statistical summary, you can click one of the two selections depending on the data
format type. Click Show Summary Single-Record.
Your output should match the image below:
For each numerical attribute, Maximum and Minimum values, as well as average and variance
are shown. These statistics are calculated on a sample. The sample size in this example is 999.
You can change the sample size by adjusting ODM Preferences, available on the Tools pull-
down menu.

5. For any highlighted attribute, you can view a distribution of values.


a. Highlight AFFINITY_CARD.
b.Click Histogram.
The default maximum number of bins is 10; this number can be changed by clicking Preferences.
Numerical attributes are divided into the designated number of bins of equal width between the
minimum and maximum. The bins are displayed in ascending order of attribute values.
Categorical attributes are binned using the "Top N" method(N is the number of bins). The N
values occurring most frequently have bins of their own; the remaining values are thrown into a
bin labeled "Other". The bins are displayed in descending order of bin size.
When done reviewing, click OK, then close the Data Summarization Viewer window.

Back to Topic
Use Data Transformation wizard
The Mining Activity Guides assist you in joining data from disparate sources into one
view or table. Also, they carry out transformations that are required by a particular
algorithm. However, there are transforms that typically are completed on a standalone
basis using one of the following Data Transformation wizards.
Filter
Recode

Derive field

In this tutorial, you learn how to use the Filter Single -Record Transformation wizard.
Suppose you want to concentrate on customers between the ages of 21 and 35. You can
filter the data to include only those people. Oracle Data Miner provides a filtering
transformation to define a subset of the data based upon attribute values. Perform the
following steps:
1. Click Data > Transform > Filter Single-Record.

2. Click Next on the Welcome page.


3. Identify the input data and click Next. In this example the view used is
MINING_DATA_BUILD_V.
Note: If you accessed the wizard by right-clicking the table or view name, then the data is
already known and this step is skipped.

4. Enter a name for the resultant view and click Next.


5. Enter the filter condition in the Filter text box. Alternatively, to construct the condition in a
dialog box click the icon to the right of the Filter field.
6. In the Expression Editor page, peform the following steps:
a. Double-click the attribute name AGE.
b. Click the >= button and type 21 to construct the first part of the condition shown.
c. Click AND to continue defining the full condition. Double Click AGE.
d. Click the <= button and type 35.
e. Click the Validate button to check that the condition is satisfied by a subset of the source data.
f. Click OK to display the condition in the Filter window.
7. Click Next.
8. Click Finish to complete the transformation.
Your output should match the image below:

Back to Topic
Back to Topic List
Using the Mining Activity Guides
When the data mining problem has been defined and the source data identified, there are
two phases remaining in the Data Mining Process: Build/Evaluate models and deploy
the results. Oracle Data Miner contains Activity Guides for the purpose of carrying out
these phases with the minimum of required intervention. Moreover, the implicit and
explicit choices and settings used in the Build activity can be passed on seamlessly to
the Apply activity. Therefore, many operations usually required are hidden or
eliminated.
The Build Activity Wizard allows you to :
Identify supplemental data to add to the case table(the basic source data)

Select the Data Mining Functionality and Algorithm

Adjust the Activity settings manually, rather than accept automatic settings

This example uses the Decision Tree Classification algorithm to create, evaluate, and
apply a predictive model classifying customers as fitting the profile of either a high-
revenue or low-revenue customer. You need to perform the following tasks:
1. Build a Predictive Model using a Mining Activity
2. Apply a Predictive Model using a Mining Activity

Back to Topic List


Build a Predictive Model using a Mining Activity
1. For this topic, you use the original view, not the view you just created. Click
MINING_DATA_BUILD_V in the left navigator.

2. To launch the Activity Build Wizard, click Activity > Build.


3. The New Activity Wizard-Welcome page is displayed. Click Next.

4. On the Select Mining Type page, accept the default of Classification Function Type and the
Decision Tree Algorithm. Click Next.
5. To select all the data to be mined from the case table or view, ensure that the box Join additional
data with case table is not checked. If the box is checked, you can specify data from other tables
and schemas to be joined with the case table. This example assumes that all the necessary data is
in the case table MINING_DATA_BUILD_V. From the Unique Identifier drop down list, select
CUST_ID. Then click Next.
6. You can specify the attribute to be predicted for a data mining problem. Click the
AFFINITY_CARD radio button to indicate the target and then click Next.
Note: The Target Attribute to be predicted in this tutorial is AFFINITY_CARD
7. Some evaluation results depend upon your choice of Preferred Target Value. In this example, you
want to identify the high-revenue customers (Target Value = 1). Select 1 for the Preferred Target
Value and click Next.
8. Enter a name for the activity that is descriptive and click Next.
9. The last step of the Activity Wizard is displayed. You can display the settings in the Activity.
Click Advanced Settings.
10 The parameters have default settings that ensure a good result. The Build parameters determine
. how the branches of the decision tree are constructed. Click OK.
11. Ensure that the Run upon finish check box is selected. Click Finish.
When the Activity Wizard is completed, the steps appropriate to the chosen activity are
displayed. If you choose Run Upon Finish, the steps are executed to completion in sequence and
a check appears on the right side of each step as it is completed.

12 To view the structure of the tree, click Result in the Build step.
.
13 Your results might differ from the display due to the small size of the sample dataset. The default
. view shows all nodes and the attribute values used to determine splits. You can highlight a node
to show the rule for a record to be included in that node. The predicted value is the target value of
the majority of records in that node. Confidence is the percentage of records in the node having
the predicted taget value. Cases is the actual number of cases in the source data satisfying the
rule for that node. Support is the percentage of cases in the source data satisfying the rule for that
node.
14 You may want to view only the terminal nodes(known as Leaves) and eliminate the intermediate
. nodes. Click the Show Leaves Only check box.
15 A decision tree is sensitive to missing values when applied to new data. However, if the splitting
. attribute is missing, the ODM Decision Tree Algorithm provides an alternative attribute known
as surrogate to be used in its place. If both the splitting attribute and its surrogate are missing, the
predicted value is determined at the parent node of the tree. To view the surrogate, highlight a
node and click the Surrogate radio button.
Close the Build Results window.

16 To evaluate the Decision Tree Model, click Result in the Test Metrics step.
.
17 Click the Accuracy tab.
.
18 Click the More Detail button to see the Confusion Matrix.
.
19 The Confusion Matrix is calculated by applying the model to the hold-out sample from the test
. campaign. The values of AFFINITY_CARD are known and are represented by the rows; the
columns are the predictions made by the Classification Model. The upper right and lower left
cells indicate potential prediction errors. Most business problem solutions attempt to minimize
one type of error. For example, the lower left cell contains the number of “missed opportunities”
– cases of high-revenue customers who were not correctly identified by the model. Click the
ROC tab.
20 To explore possible changes in the model's parameters, you can slide the red line back and forth
. by clicking the arrow buttons at the bottom right of the graph and observe the change in the
Confusion Matrix until the values indicate a good solution to the problem. Click Lift.
21 You can view two graphs showing different interpretations of the lift calculations. With the
. Cumulative Lift radio button selected, this indicates how much better the model does than the
naive guess. In this example, the Decision Tree finds about 2.5 times more positive cases than
chance in the first quantile. Select the Cumulative Positive Cases radio button.
21 The graph indicates how many cases must be analysed in order to find the desired number of
. positive cases, that is high-revenue customers.
Back to Topic
Apply a Predictive Model using a Mining Activity
When a model is applied to new data, the input data must be prepared and transformed
in exactly the same way that the source data for the Build activity was prepared. The
Apply activity is based on a Build activity. This Build activity passes to the Apply
activity whatever knowledge is required to prepare the input data appropriately. Perform
the following steps:
1. Click Activity > Apply
2. Click Next.

3. Expand Classification and select the Build Activity that you just built. Then click Next.
4. Click the Select link to choose the input data for the Apply procedure.
5. Expand DMUSER1 and select MINING_DATA_APPLY_V. Then click OK.

6. Accept the default data source. Click Next.


7. A table will be produced containing the Prediction and Probability for each customer. You can
select additional attributes for display in the results. Accept the defaults and click Next.
8. You have a choice of formats for the Predictions. Accept the default, which includes the Target
value with the highest Probability for each case, then click Next.
9. Enter a Name for the activity and click Next.
10 Ensure that the Run upon finish check box is selected. Click Finish.
.
11. Your Data Mining activity is running. When completed, click the Result link.
12 For each customer (case), the prediction of either 1 (high-revenue) or 0 (low-revenue) is
. included, along with the Confidence in that Prediction (Probability). Cost is another measure of
the value of the prediction – low Cost means high Probability. Since this model is a Tree, where
each branch represents a rule, you can highlight a row and click Rule to see how the prediction
was made.

13 The rule logic is explained.


.
Back to Topic
Summary
In this tutorial, you learned how to:
Launch Oracle Data Miner

Prepare Data for Data Mining

Use the Mining Activity Guides