
Business Intelligence
DATA MINING
“We’re drowning in information
but starving for knowledge.”
(John Naisbitt)
Topics

• Data mining - an enabling technology for BI – predictive analytics


• Objectives and benefits of DM
• The wide range of applications of DM
• Standardized DM processes: CRISP-DM, SEMMA, KDD
• Data preprocessing for data mining
• Methods and algorithms of data mining
• Data mining software tools
• Pitfalls and myths of data mining
• DM – privacy issues
WHAT is Data Mining?
• Generally speaking, data mining is a way to
develop intelligence (actionable information
or knowledge) from data that an organization
collects, organizes, and stores.
• A wide range of data mining techniques is being
used by organizations
– to gain a better understanding of their customers and their
operations and
– to solve complex organizational problems
Why Data Mining?

• More intense competition at the global scale.

• Recognition of the “hidden” value in data sources.

• Movement toward the de-massification (conversion of
information resources into nonphysical form) of business
practices
Drivers

• Market – from focus on product/service to focus on customer
• IT – from focus on up-to-date balances to focus on patterns in
transactions (Data Warehouses, OLAP)

• Availability of quality data on customers, vendors, transactions, Web, …
• Consolidation and integration of data repositories into data warehouses.
The pressing problem now is not the generation of data, but the attempt
to understand it, as many companies are data rich but information poor.
• The exponential increase in data processing and storage capabilities,
and the decrease in their cost.
What is Data Mining?

• “The nontrivial process of identifying valid,
novel, potentially useful, and ultimately
understandable patterns in large collections of
data”
(Fayyad et al., 1996)

• “Data mining is the process of discovering
meaningful new correlations, patterns and
trends by sifting through large amounts of data
stored in repositories, using pattern recognition
technologies as well as statistical and
mathematical techniques.”
(Gartner Group)
DM definitions
• the process through which previously unknown
patterns in data are discovered.
• a process that uses statistical, mathematical, and
artificial learning techniques to extract and
identify useful information and subsequent
knowledge from large sets of data
• the process of finding mathematical patterns
from (usually) large sets of data; these can be
rules, affinities, correlations, trends, or prediction
models
Data Mining is a blend of multiple disciplines:
Statistics, Artificial Intelligence, Pattern Recognition, Machine Learning,
Mathematical Modeling, Databases, Management Science & Information Systems
Just another name for good old
statistics?

Statistics:
- Impose a model on the data that we feel will replicate the actual
patterns in the data

DM:
- Let the data tell us the story
- Make sense of what could not be seen before
Statistical forecasting and Data Mining

Statistical Forecasting
- we seek verification of previously held hypotheses
- we know which patterns exist in the time series data we forecast

Data Mining
- seeks discovery of new knowledge from the data
- allows the data itself to reveal the patterns within, rather than imposing
the patterns on the data at the outset
DM terminology – PREDICTION
(prediction + forecasting)
Prediction
• The act of “telling” about the future
• Guessing + experiences + opinions + other relevant information
Forecasting
• Estimating a future value based on past data values
• Data and model based
Terminology in Data Mining

Data mining terminology | Statistical terminology
Output variable = Target variable | Dependent variable
Algorithm | Forecasting model
Attribute = Feature | Explanatory variable
Record | Observation
Score (PREDICT) | Forecast
Data mining

• In a state of flux: many definitions, a lot of debate about what it is and
what it is not.

• “Statistics at scale and speed” (Darryl Pregibon)
Possible extension:
“. . . and simplicity”

• Terminology is not standard: classification, prediction,
feature = independent variable, target = dependent variable, etc.
Data Mining Characteristics/Objectives

• Source of data for DM is often a consolidated data warehouse (not
always!).
• DM environment is usually a client-server or a Web-based information
systems architecture.
• Data is the most critical ingredient for DM and may include
soft/unstructured data.
• The miner is often an end user.
• Striking it rich requires creative thinking.
• Data mining tools’ capabilities and ease of use are essential (Web,
parallel processing, …).
Examples: What is (not) Data Mining?

• What is not Data Mining?
– Look up phone number in phone directory
– Query a Web search engine for information about “Amazon”
• What is Data Mining?
– Certain names are more prevalent in certain locations
– Group together similar documents returned by search engine
according to their context (Amazon rainforest, Amazon.com)


Database processing vs Data mining

• QUERY
Database processing: well defined; SQL
Data mining: poorly defined; no precise query language
• DATA
Database processing: operational data
Data mining: not operational data
• OUTPUT
Database processing: precise; subset of a database
Data mining: fuzzy; not a subset of a database
Query Examples

• OLTP (Querying a database)


Find all credit applicants with last name AAA
Identify customers who have purchased more than 1000… last month
Find all customers who have purchased Y

• DATA MINING
Find all credit applicants who are poor credit risks - CLASSIFICATION
Identify customers with similar buying habits - CLUSTERING
Find all items which are frequently purchased with Y - ASSOCIATION
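The contrast between the OLTP queries and the data mining tasks above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical list of (customer, items) transactions and a hypothetical 50% co-occurrence threshold; it is not a full association-rule algorithm, just a count of what tends to be bought alongside Y.

```python
from collections import Counter

# Hypothetical toy data: each transaction is (customer_id, set of items bought).
transactions = [
    ("c1", {"Y", "bread", "milk"}),
    ("c2", {"Y", "milk"}),
    ("c3", {"bread", "eggs"}),
    ("c4", {"Y", "milk", "eggs"}),
]

def customers_who_bought(item, data):
    """OLTP-style query: a precise question whose answer is a subset of the records."""
    return sorted(cust for cust, items in data if item in items)

def items_frequently_bought_with(item, data, min_share=0.5):
    """Mining-style task: count co-occurrences across all baskets containing `item`
    and keep the other items that appear in at least `min_share` of them."""
    baskets = [items for _, items in data if item in items]
    counts = Counter(other for items in baskets for other in items if other != item)
    return sorted(other for other, n in counts.items() if n / len(baskets) >= min_share)

print(customers_who_bought("Y", transactions))          # ['c1', 'c2', 'c4']
print(items_frequently_bought_with("Y", transactions))  # ['milk']
```

The first function only filters stored rows; the second produces something no single row contains, which is the distinction the slide draws.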
How Data Mining Works

• DM extracts patterns from data
– Pattern? A mathematical (numeric and/or symbolic) relationship
among data items

• Types of patterns
– Association
– Prediction
– Cluster (segmentation)
– Sequential (or time series) relationships
Data Mining Applications

• Customer Relationship Management


– Maximize return on marketing campaigns
– Improve customer retention (churn analysis)
– Maximize customer value (cross-, up-selling)
– Identify and treat most valued customers
• Banking & Other Financial
– Automate the loan application process
– Detect fraudulent transactions
– Maximize customer value (cross-, up-selling)
– Optimize cash reserves with forecasting
Data Mining Applications

• Retailing and Logistics


– Optimize inventory levels at different locations
– Improve the store layout and sales promotions
– Optimize logistics by predicting seasonal effects
– Minimize losses due to limited shelf life
• Manufacturing and Maintenance
– Predict/prevent machinery failures
– Identify anomalies in production systems to optimize the use of
manufacturing capacity
– Discover novel patterns to improve product quality
Data Mining Applications

• Brokerage and Securities Trading


– Predict changes in certain bond prices
– Forecast the direction of stock fluctuations
– Assess the effect of events on market movements
– Identify and prevent fraudulent activities in trading
• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities
Data Mining Applications

• Computer hardware and software


• Science and engineering
• Government and defense
• Homeland security and law enforcement
• Travel, entertainment, sports
• Healthcare and medicine
• … virtually everywhere
Data Mining Process

• A manifestation of the best practices


• A systematic way to conduct DM projects
• Moving from Art to Science for DM projects
• Everybody has a different version
• Most common standard processes:
– CRISP-DM (Cross-Industry Standard Process for Data Mining)
– SEMMA (Sample, Explore, Modify, Model, and Assess)
– KDD (Knowledge Discovery in Databases)
Data Mining Process: CRISP-DM

• Cross-Industry Standard Process for Data Mining
• Proposed in the 1990s by a European consortium
• Composed of six consecutive steps
– Step 1: Business Understanding
– Step 2: Data Understanding
– Step 3: Data Preparation
(Steps 1 to 3 account for ~85% of total project time)
– Step 4: Model Building
– Step 5: Testing and Evaluation
– Step 6: Deployment
Data Preparation – A Critical DM Task

Real-world Data
• Data Consolidation
– Collect data
– Select data
– Integrate data
• Data Cleaning
– Impute missing values
– Reduce noise in data
– Eliminate inconsistencies
• Data Transformation
– Normalize data
– Discretize/aggregate data
– Construct new attributes
• Data Reduction
– Reduce number of variables
– Reduce number of cases
– Balance skewed data
Well-formed Data
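Three of the preparation steps listed above (imputing missing values, normalizing, discretizing) can be sketched in plain Python. The specific choices are assumptions made for illustration: mean imputation, min-max scaling to [0, 1] and equal-width bins are only some of the available options, and the income figures are hypothetical.

```python
from statistics import mean

def impute_missing(values):
    """Data cleaning: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, bins):
    """Data transformation: map each value in [0, 1] to an equal-width bin index."""
    return [min(int(v * bins), bins - 1) for v in values]

incomes = [20.0, None, 50.0, 80.0]   # hypothetical incomes, in thousands
clean = impute_missing(incomes)      # [20.0, 50.0, 50.0, 80.0]
scaled = min_max_normalize(clean)    # [0.0, 0.5, 0.5, 1.0]
print(discretize(scaled, bins=3))    # [0, 1, 1, 2]
```

Each function handles one step in isolation; real projects usually chain many such steps and repeat them as the data understanding improves.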
Data Mining Process: CRISP-DM
The Six-Step CRISP-DM Data Mining Process

The process is highly repetitive and experimental
(DM: art versus science?)
Data Mining Process: SEMMA

Developed by SAS Institute


Data Mining Process: KDD
KDD (Knowledge Discovery in Databases) Process.
Which Data Mining Process is the Best?
Ranking of Data Mining Methodologies/Processes.

Source: KDnuggets.com.
DM tasks
• Prediction: the act of “telling” about the future
• Classification: analyzing the historical behavior
of groups of entities with similar characteristics,
to predict the future behavior of a new entity
from its similarity to those groups
• Clustering: finding groups of entities with similar
characteristics
• Association: establishing relationships among
items that occur together
• Sequence discovery: finding time-based
associations
DM tasks
• Visualization: presenting results obtained
through one or more of the other methods
• Regression: a statistical estimation technique
based on fitting a curve defined by a
mathematical equation of known type but
unknown parameters to existing data
• Forecasting: estimating a future data value
based on past data values.
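Forecasting as defined above can be illustrated with one of the simplest possible methods, a moving average of the most recent observations. The slides do not prescribe a method; the three-period window and the monthly sales figures below are hypothetical.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` past values."""
    if len(history) < window:
        raise ValueError("need at least `window` past values")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical monthly sales (past data values).
monthly_sales = [120, 130, 125, 135, 140]
print(moving_average_forecast(monthly_sales))  # (125 + 135 + 140) / 3, about 133.33
```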
A Taxonomy for Data Mining Tasks, Methods and Algorithms
PREDICTION
• CLASSIFICATION
– Prediction based on historical data and relationships
– What is being predicted is a class label
– Weather predictions: sunny, rainy, cloudy, …
• REGRESSION
– A statistical estimation technique: fitting a curve defined by a
mathematical equation of known type but unknown parameters to
existing data
– What is being predicted is a numeric value
– Weather predictions: 25 °C
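The regression entry can be made concrete with ordinary least squares for a straight line, y = a + b·x: the equation type is fixed in advance and only the parameters a and b are estimated from existing data. A minimal sketch, assuming a linear model and hypothetical hour/temperature readings.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx           # slope minimizing the sum of squared errors
    a = y_bar - b * x_bar   # fitted line passes through (x_bar, y_bar)
    return a, b

# Hypothetical readings: hour of day vs. temperature in °C.
hours = [8, 10, 12, 14]
temps = [16.0, 17.5, 17.5, 19.0]
a, b = fit_line(hours, temps)
print(round(a + b * 13, 2))  # numeric prediction for 13:00 -> 18.4
```

The output is a number rather than a class label, which is exactly the difference between regression and classification in the taxonomy.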
Classification
Analyzing the historical behavior of groups of entities with similar
characteristics, to predict the future behavior of a new entity from its
similarity to those groups

• Part of the machine-learning family: employs supervised learning
• Learn from past data, classify new data
• Purpose: to create a model that allows us to predict the class of
objects whose label is unknown
• The output variable is categorical (nominal or ordinal) in nature
Classification
Techniques
• Decision tree analysis
• Statistical analysis
• Neural networks
• Support vector machines
• Case-based reasoning
• Bayesian classifiers
• Genetic algorithms
• Rough sets
Classification: Application 1
Direct Marketing
– Goal: Reduce cost of mailing by targeting a set of consumers likely to
buy a new cell-phone product.
– Approach:
• Use the data for a similar product introduced before.
• We know which customers decided to buy and which decided
otherwise. This {buy, don’t buy} decision forms the class attribute.
• Collect various demographic, lifestyle, and company-interaction
related information about all such customers.
– Type of business, where they stay, how much they earn, etc.
• Use this information as input attributes to learn a classifier model.
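A classifier for the {buy, don't buy} problem can be sketched with a categorical naive Bayes model, one of the Bayesian classifiers listed earlier. This is an illustration only: the attributes (income, region), their values and the labels are hypothetical, and Laplace (add-one) smoothing is an assumed design choice that keeps an attribute value never seen with a class from zeroing out that class.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> distinct values observed
    for record, label in zip(records, labels):
        for attr, value in record.items():
            value_counts[(label, attr)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def classify(model, record):
    """Return the class with the highest prior x product of smoothed likelihoods."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for attr, value in record.items():
            count = value_counts[(label, attr)][value]
            # Laplace (add-one) smoothed estimate of P(value | class)
            score *= (count + 1) / (n_label + len(values_seen[attr]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical past campaign: customer attributes and the {buy, don't buy} class.
customers = [
    {"income": "high", "region": "urban"},
    {"income": "high", "region": "rural"},
    {"income": "high", "region": "urban"},
    {"income": "low", "region": "urban"},
    {"income": "low", "region": "rural"},
]
decisions = ["buy", "buy", "buy", "don't buy", "don't buy"]

model = train_naive_bayes(customers, decisions)
print(classify(model, {"income": "high", "region": "rural"}))  # buy
print(classify(model, {"income": "low", "region": "rural"}))   # don't buy
```

Only customers predicted as "buy" would then receive the mailing; decision trees, neural networks and the other listed techniques would slot into the same train/classify pattern.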
Classification: Application 2
Fraud Detection
– Goal: Predict fraudulent cases in credit card transactions.
– Approach:
• Use credit card transactions and information about the account
holder as attributes.
– When does a customer buy, what does he buy, how often he
pays on time, etc.
• Label past transactions as fraud or fair transactions. This forms the
class attribute.
• Learn a model for the class of the transactions.
• Use this model to detect fraud by observing credit card
transactions on an account.
Classification: Application 3
Customer Attrition/Churn:
– Goal: To predict whether a customer is likely to be lost to a
competitor.
– Approach:
• Use detailed record of transactions with each of the past
and present customers, to find attributes.
– How often the customer calls, where he calls, what time
of day he calls most, his financial status, marital
status, etc.
• Label the customers as loyal or disloyal.
• Find a model for loyalty.
Segmentation - Cluster Analysis
for Data Mining
Finding groups of entities with similar characteristics
• Used for automatic identification of natural groupings of things
• Part of the machine-learning family: employs unsupervised
learning
• Learns the clusters of things from past data, then assigns new
instances
• There is no output variable
• Whether the clusters unearthed are useful to the business manager
is subjective
Cluster Analysis for Data Mining
Clustering results may be used to
– Identify natural groupings of customers
– Identify rules for assigning new cases to classes for
targeting/diagnostic purposes
– Provide characterization, definition, labeling of populations
– Decrease the size and complexity of problems for other data
mining methods
– Identify outliers in a specific domain (e.g., rare-event detection)
Cluster Analysis for Data Mining
Analysis methods
– Statistical methods (including both hierarchical and
nonhierarchical), such as k-means, k-modes, and so on.
– Neural networks (adaptive resonance theory [ART], self-organizing
map [SOM])
– Fuzzy logic
– Genetic algorithms
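k-means, named above among the nonhierarchical statistical methods, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal sketch, assuming numeric customer attributes (the coordinates below are hypothetical) and a seeded random choice of k input points as initial centroids; practical tools often use smarter initialisation and several restarts.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Return (cluster index per point, centroids) after plain k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids

# Hypothetical customers described by two numeric attributes.
customer_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centers = k_means(customer_points, k=2)
print(labels)   # the first three and the last three points share a cluster label
print(centers)
```

Note that k must be chosen up front; as the slides say, there is no truly optimal way to pick it.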
Cluster Analysis for Data Mining
How many clusters?
– There is no “truly optimal” way to determine it
– Heuristics are often used

Most cluster analysis methods involve the use of a distance measure to
calculate the closeness between pairs of items.
– Euclidean versus Manhattan/Rectilinear distance
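The two distance measures can be written down directly. For points p and q with n numeric attributes, Euclidean distance is the square root of Σ(pᵢ − qᵢ)², while Manhattan (rectilinear, city-block) distance is Σ|pᵢ − qᵢ|. A short sketch with hypothetical two-attribute points:

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Rectilinear (city-block) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Two hypothetical items described by two numeric attributes.
x, y = (1, 2), (4, 6)
print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7
```

For the same pair of points the Manhattan value is never smaller than the Euclidean one, so the choice of measure can change which items a clustering method treats as close.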
Clustering: Application
Market Segmentation
– Goal: subdivide a market into distinct subsets of customers
where any subset may conceivably be selected as a market
target to be reached with a distinct marketing mix.
– Approach:
• Collect different attributes of customers based on their
geographical and lifestyle related information.
• Find clusters of similar customers.
• Measure the clustering quality by observing buying patterns
of customers in same cluster vs. those from different
clusters.
Association Rule Mining

Establishing relationships among items
that occur together
• Finds interesting relationships (affinities) between
variables (items or events)
• Part of the machine-learning family: employs
unsupervised learning
• There is no output variable
• Also known as market basket analysis / affinity
analysis
Association Rule Mining
• Input: the simple point-of-sale transaction data
• Output: Most frequent affinities among items
• Example: according to the transaction data…
“Customers who bought a laptop computer and virus protection
software also bought an extended service plan 70 percent of the time.”
• How do you use such a pattern/knowledge?
– Put the items next to each other
– Promote the items as a package
– Place items far apart from each other!
Association Rule Mining
Representative applications of association rule mining include
– In business: cross-marketing, cross-selling, store design, catalog
design, e-commerce site design, optimization of online advertising,
product pricing, and sales/promotion configuration
– In medicine: relationships between symptoms and illnesses; diagnosis
and patient characteristics and treatments (to be used in medical
DSS); and genes and their functions (to be used in genomics projects)
– …
Association Rule Mining
Are all association rules interesting and useful?
A Generic Rule: X ⇒ Y [S%, C%]

X, Y: products and/or services
X: Left-hand side (LHS)
Y: Right-hand side (RHS)
S: Support: how often X and Y occur together
C: Confidence: how often Y occurs in transactions that contain X

Example: {Laptop Computer, Antivirus Software} ⇒ {Extended Service
Plan} [30%, 70%]
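Support and confidence are usually computed as shares of transactions: support(X ⇒ Y) is the fraction of all transactions that contain both X and Y, and confidence(X ⇒ Y) = support(X ∪ Y) / support(X), i.e. the fraction of transactions containing X that also contain Y. A minimal sketch, assuming transactions are Python sets of item names; the baskets are hypothetical, so the numbers differ from the 30%/70% example above.

```python
def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Fraction of transactions containing `lhs` that also contain `rhs`."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0  # the rule's LHS never occurs
    return support(set(lhs) | set(rhs), transactions) / lhs_support

# Hypothetical point-of-sale baskets.
baskets = [
    {"laptop", "antivirus", "service plan"},
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "service plan"},
    {"laptop", "mouse"},
    {"mouse"},
]
lhs, rhs = {"laptop", "antivirus"}, {"service plan"}
print(support(lhs | rhs, baskets))    # 0.4 -> S = 40%
print(confidence(lhs, rhs, baskets))  # about 0.67 -> C of roughly 67%
```

Rules are typically kept only when both values exceed user-chosen minimum thresholds, which is one way to filter out rules that are not interesting.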
Association Rule Discovery:
Application 1
Marketing and Sales Promotion:
– Let the rule discovered be
{Bagels, … } --> {Potato Chips}
– Potato Chips as consequent => Can be used to determine what
should be done to boost its sales.
– Bagels in the antecedent => Can be used to see which products
would be affected if the store discontinues selling bagels.
– Bagels in antecedent and Potato chips in consequent => Can be
used to see what products should be sold with Bagels to
promote sale of Potato chips!
Association Rule Discovery:
Application 2
Supermarket shelf management.
– Goal: To identify items that are bought together by sufficiently
many customers.
– Approach: Process the point-of-sale data collected with barcode
scanners to find dependencies among items.
– A classic rule:
• If a customer buys diapers and milk, then he is very likely to
buy beer.
Data Mining Software Tools
Popular Data Mining Software Tools (Poll Results).
• Commercial
– IBM SPSS Modeler (formerly
Clementine)
– SAS Enterprise Miner
– Statistica - Dell/Statsoft
– … many more
• Free and/or Open Source
– KNIME
– RapidMiner
– Weka
– R, …

XLMiner - learning tool
Source: KDnuggets.com.


Data Mining & Privacy Issues

Case Study. Predicting Customer Buying Patterns:
The Target Story

What is the threshold between discovery of
knowledge and infringement of privacy?
How Target used advanced analytical
methods to drive new revenue
After analysing consumer-purchasing behaviour,
Target’s statisticians determined that the retailer
made a great deal of money from three main life-
event situations.
• Marriage, when people tend to buy many new
products
• Divorce, when people buy new products and
change their spending habits
• Pregnancy, when people have many new things
to buy and have an urgency to buy them
How Target used advanced analytical
methods to drive new revenue
• Target determined that the most lucrative of these life-
events is the third situation: pregnancy.
• Using data collected from shoppers, Target was able to
identify this fact and predict which of its shoppers were
pregnant. In one case, Target knew a female shopper was
pregnant even before her family knew. This kind of
knowledge allowed Target to offer specific coupons and
incentives to their pregnant shoppers.
• In fact, Target could not only determine if a shopper was
pregnant, but in which month of pregnancy a shopper may
be. This enabled Target to manage its inventory, knowing
that there would be demand for specific products and it
would likely vary by month over the coming nine- to ten-
month cycles.
Data Mining & Privacy Issues

Predicting Customer Buying Patterns – The Target Story

1. What do you think about data mining and its implications for
privacy? What is the threshold between discovery of
knowledge and infringement of privacy?

2. Did Target go too far? Did it do anything illegal? What do you
think Target should have done? What do you think Target
should do next (quit these types of practices)?
Data Mining Myths

• Myth: Data mining provides instant, crystal-ball-like predictions.
Reality: Data mining is a multistep process that requires deliberate,
proactive design and use.
• Myth: Data mining is not yet viable for mainstream business applications.
Reality: The current state of the art is ready for almost any business
type and/or size.
• Myth: Data mining requires a separate, dedicated database.
Reality: Because of the advances in database technology, a dedicated
database is not required.
• Myth: Only those with advanced degrees can do data mining.
Reality: Newer Web-based tools enable managers of all educational levels
to do data mining.
• Myth: Data mining is only for large firms that have lots of customer data.
Reality: If the data accurately reflect the business or its customers,
any company can use data mining.
Common Data Mining Blunders

1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and what it
really can/cannot do
3. Leaving insufficient time for data acquisition, selection and
preparation
4. Looking only at aggregated results and not at individual
records/predictions
5. Being sloppy about keeping track of the data mining procedure
and results
Successful managers need to know about the
possibilities and limitations of data mining

• It is essential that managers are able to
translate business or other functional problems
into the appropriate statistical problem before
it can be “handed off” to a technical team

• Case studies and industry
– Datamation magazine website
http://www.datamation.com
