
Data Mining and Data Visualization
SOM 485
Fall 2007

Getting Started
What is Data Mining?
Online Analytical Processing
Data Mining Techniques
Market Basket Analysis
Limitations and Challenges to Data Mining
Data Visualization
Siftware Technologies

What is Data Mining (DM)?


Group of activities used to find different patterns in data
Information provided through a Data Warehouse
Provides valuable information for different types of research.

Applications of DM
Customer Relationship Management (CRM) software is an application that can benefit from DM

Activities of CRM
One-to-One Marketing
Sales Force Automation
Sales Campaign Management
Marketing Encyclopedia
Call Center Automation

Verification of DM
Requires a lot of prior knowledge on the decision maker's part
Used mainly in casinos
e.g., can determine if a new customer is a high roller, a souvenir buyer, a ticket purchaser, etc.

Uses Siftware to help discover new patterns of customer spending habits
Allows effective targeting to a specific group of customers

Online Analytical Processing


Online Analytical Processing (OLAP) was introduced by E. F. Codd in 1993
OLAP: a computer process that allows a user to extract data from different viewpoints
Scientific and academic organizations store about 1 terabyte (1 trillion bytes) of new data each day.

OLAP continued
Codd's 12 Rules for OLAP

1. Multidimensional View
2. Transparent to the User
3. Accessible
4. Consistent Reporting
5. Client-Server Architecture
6. Generic Dimensionality
7. Dynamic Sparse Matrix Handling
8. Multi-user Support
9. Cross-Dimensional Operations
10. Intuitive Data Manipulation
11. Flexible Reporting
12. Infinite Levels of Dimension and Aggregation

OLAP: MOLAP & ROLAP


OLAP data is stored in a Multidimensional Database (MDB)
MOLAP: an OLAP application that accesses data from a multidimensional database
MDBs are frequently created using input from an existing Relational Database
ROLAP: a Relational Database server that works with SQL for portability and scalability.
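
As a rough illustration of extracting data from different viewpoints, the sketch below rolls up a tiny fact table along one or more dimensions. The rows, the dimension names (region, product, quarter), and the roll_up helper are all hypothetical and invented for illustration; a real OLAP server would do this against an MDB or through SQL.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, units_sold).
FACTS = [
    ("East", "Pizza", "Q1", 10),
    ("East", "Cola", "Q1", 5),
    ("West", "Pizza", "Q1", 7),
    ("West", "Pizza", "Q2", 3),
    ("East", "Cola", "Q2", 8),
]
# Assumed dimension names, in the same order as the tuple fields above.
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, *dims):
    """Sum units_sold grouped by the chosen dimensions (one 'viewpoint')."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # last field is the measure
    return dict(totals)


by_region = roll_up(FACTS, "region")
by_product_quarter = roll_up(FACTS, "product", "quarter")
print(by_region)
print(by_product_quarter)
```

The same five rows answer two different questions (units per region, units per product per quarter) without changing the underlying data, which is the idea behind the multidimensional view.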

DATA MINING
TECHNIQUES

FOUR MAJOR
CATEGORIES

1. Classification
2. Association
3. Sequence
4. Cluster

CLASSIFICATION
- Mining processes intended to discover rules that define whether an item belongs to a particular class of data
- Two Sub-processes:
1) Building a Model
2) Predicting Classifications

ASSOCIATION
Techniques that employ association search all details from operational systems for patterns with a high probability of repetition

Example: Market Basket Analysis

SEQUENCE
Time series analysis methods relate events in time based on a series of preceding events
Through analysis, various hidden trends, often highly predictive of future events, can be discovered.
Example: Mail Industry

CLUSTER
Creates partitions so that all members of each set are similar according to some metric
A cluster is simply a set of objects grouped together by virtue of their similarity or proximity to each other
Example: Credit Card Transactions
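
A minimal sketch of grouping by proximity, assuming made-up credit card transaction amounts. The slides do not name a specific clustering algorithm; k-means on one-dimensional data is used here only as one familiar method, and the amounts, starting centers, and function name are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Very small k-means for 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assign every value to its nearest center (metric: absolute distance).
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical transaction amounts: a few small purchases and a few large ones.
amounts = [12.0, 15.0, 14.0, 480.0, 510.0, 495.0]
centers, clusters = k_means_1d(amounts, centers=[0.0, 100.0])
print(centers)
print(clusters)
```

With these inputs the small purchases and the large purchases end up in separate partitions, which is the kind of split a card issuer might inspect for unusual spending.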

DATA MINING TECHNOLOGIES

Providing new answers to old questions
Developing new knowledge and understanding through discovery
Statistical Analysis: statistically evaluating products and making a decision based on logical reasoning
Neural Networks: attempt to mirror the way the human brain works in recognizing patterns by developing mathematical structures with the ability to learn

DATA MINING TECHNOLOGIES CONT.

Genetic Algorithms and Fuzzy Logic: machine learning techniques that derive meaning from complicated and imprecise data and can extract patterns from and detect trends within the data that are far too complex to be noticed by humans
Decision Trees: assist in data mining applications by classifying items or events contained within the warehouse

NEW APPLICATIONS FOR DATA MINING

Two new categories of applications

1) Text Mining: summarizes, navigates, and clusters documents contained in a database
2) Web Mining: integrates data and text mining within a Web site; enhances the Web site with intelligent behavior, such as suggesting related links or recommending new products to the consumer

Market Basket Analysis

Market Basket Analysis is an algorithm that examines a long list of transactions in order to determine which items are most frequently purchased together.
It takes its name from the idea of a person in a supermarket throwing all of their items into a shopping cart (a "market basket").

Market basket analysis is one of the most common and useful types of data analysis for marketing.
With the data gathered from MBA, marketers can identify products that customers like and group them together.
Market basket analysis can improve the effectiveness of marketing and sales tactics.

Benefits of Market Basket Analysis:


A good indication of consumer behavior
Increases sales
Improves customer satisfaction
Tracks what types of products interest consumers and finds related alternatives to introduce to them.

ASSOCIATION RULES for MBA


Support
Confidence
Lift
Method
Association rules: a common undirected data mining technique that complements market basket analysis.
These rules are unidirectional
Left-hand side IMPLIES Right-hand side
ex. Pasta IMPLIES Wine, but Wine IMPLIES Pasta may not hold

Example: 40% of transactions that contain Pasta also contain Wine. 4% of transactions contain both of these items.
Support: the percentage of baskets where the association rule is true, i.e. baskets that contain both the Left-hand side and the Right-hand side.
ex. 4% of transactions contain both
Confidence: the probability that the Right-hand side item is present given that the Left-hand side item is present.
ex. 40% of transactions that contain Pasta also contain Wine, p = .40
Lift: compares the confidence of the rule with the likelihood of finding the Right-hand side item in any random basket. Measures how well an association rule performs by comparing it with how well the item sells without the other item (improvement).
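
The three measures can be sketched in a few lines. The basket list below is hypothetical and was built only to reproduce the slide's figures (4% support, 40% confidence for Pasta IMPLIES Wine). The overall share of baskets containing Wine (20%) is an assumption, since the slide does not give it. Lift is computed with the usual definition, confidence divided by the share of all baskets that contain the Right-hand side item.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs IMPLIES rhs.

    Assumes at least one basket contains lhs and at least one contains rhs.
    """
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs in t)
    n_rhs = sum(1 for t in transactions if rhs in t)
    n_both = sum(1 for t in transactions if lhs in t and rhs in t)
    support = n_both / n              # share of baskets with both items
    confidence = n_both / n_lhs       # P(rhs present | lhs present)
    lift = confidence / (n_rhs / n)   # confidence relative to P(rhs)
    return support, confidence, lift


# Hypothetical data: 50 baskets, 5 with Pasta, 2 of those also with Wine,
# and 10 baskets with Wine overall (the 20% Wine share is an assumption).
transactions = (
    [{"pasta", "wine"}] * 2
    + [{"pasta"}] * 3
    + [{"wine"}] * 8
    + [{"milk"}] * 37
)

support, confidence, lift = rule_metrics(transactions, "pasta", "wine")
print(support, confidence, lift)
```

Under these assumptions the lift is above 1, meaning Wine shows up more often in Pasta baskets than in baskets overall; a lift near 1 would suggest the rule adds little.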

Method
[Table: co-occurrence of Frozen Pizza, Milk, Cola, Potato Chips, and Pretzels; cell values not preserved]
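
A co-occurrence table over the five products named above could be built roughly as follows. The baskets are hypothetical, so the counts printed here are not the values from the original slide.

```python
from collections import Counter
from itertools import combinations

ITEMS = ["Frozen Pizza", "Milk", "Cola", "Potato Chips", "Pretzels"]

# Hypothetical baskets, invented for illustration only.
BASKETS = [
    {"Frozen Pizza", "Cola"},
    {"Milk", "Potato Chips"},
    {"Frozen Pizza", "Cola", "Potato Chips"},
    {"Milk", "Pretzels", "Cola"},
    {"Frozen Pizza", "Milk"},
]


def co_occurrence(baskets):
    """Count how often each pair of items appears in the same basket.

    The diagonal entry (item, item) is the number of baskets with that item.
    """
    counts = Counter()
    for basket in baskets:
        for item in basket:
            counts[(item, item)] += 1
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1  # store both orders so the table is symmetric
            counts[(b, a)] += 1
    return counts


counts = co_occurrence(BASKETS)
for row in ITEMS:
    print(row.ljust(13), [counts[(row, col)] for col in ITEMS])
```

Reading across a row shows which products most often share a basket with that row's product; those pairs are the candidates for association rules.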

Market Basket Analysis

Market Basket Analysis: determines what products customers purchase together

Limits to Market Basket Analysis


A large amount of data is required to obtain meaningful results, but accuracy is compromised if the products do not all occur with similar frequency.
ex. Milk sells in almost every transaction, but Elmer's glue sells sporadically, so it is not effective to put them in the same basket analysis.

Sometimes presents results that are actually due to the success of previous marketing campaigns.
ex. Discounted price of cola with purchase of pizza.

Using Data from MBA


Once information has been gathered about different items and how they sell with respect to other items, a store may want to change its layout of items to improve its profits.

ex. Lunchboxes and School Supplies

Businesses without an actual storefront may want to offer promotions for products that sell together, increasing sales.

MARKET BASKET ANALYSIS In a Nutshell

Current Limitations and Challenges to Data Mining

New and underdeveloped field
Identification of missing information

Most companies run legacy systems
Not DW (data warehouse) friendly
DW designers have to convert existing ODSs (operational data stores) to the homogeneous form of the DW

Current Limitations & Challenges to Data Mining (continued)
Not all knowledge about application domains is present in the data
The contents of ODSs are normally limited to what is needed by the operational application associated with that DB
Data warehouse designers need to include mechanisms for inventorying data

Data noise & missing values


Most operational databases contain data errors in their values and/or classification
Errors lead to misclassification
Future data mining systems must incorporate more sophisticated mechanisms for treating noisy data
Bayesian technique: a statistical technique for treating noisy data

Large Databases & high dimensionality
Databases are large & dynamic
Contents are always changing
Data patterns must be constantly updated
New discovery applications have to partition problems into smaller chunks of manageable data without losing any essential attributes of the data

Data Visualization
Process by which numerical data are converted into meaningful 3-D images
Example
Intended to analyze complex data
Data from: satellite photos, sonar measurements, surveys, or computer simulations

History of Data Visualization


Originated from statistics and science
Example of 2-D
Advancement credited to NCSA (National Center for Supercomputing Applications)
Newest developments by Xerox PARC in virtual reality

Human Visual Perception


Human visual cortex dominates our perception
Accelerates the identification of hidden patterns in data
"A picture is worth a thousand words"

Geographical Information Systems (GIS)

A special-purpose DB in which a common spatial coordinate system is the primary means of reference

Requires:
1. Data input
2. Data storage, retrieval, and query
3. Data transformation, analysis, and modeling
4. Data reporting

Integrates information and aids in decision making

GIS continued
Spatial Data: elements stored in map form

Contain three basic components:
1. Points
2. Lines
3. Polygons

Attribute Data: describes spatial data


Example of GIS

Applications of Data Visualization Techniques
Retail Banking
Government
Insurance
Health Care and Medicine
Telecommunications
Transportation
Capital Markets
Asset Management

Siftware Technologies

IBM

Informix
Red Brick
DB2

Oracle
Silicon Graphics
Sybase

Offers several Data Mining solutions, depending on the user's needs.

IBM Information Warehouse Solutions


IBM Visualizer
Red Brick

Informix
Three-tier model
Tier 1: Client presentation layer
Tier 2: Hewlett-Packard hardware
Tier 3: Data layer INFORMIX OnLine
database

Sybase Warehouse WORKS


Assemble data from many sources
Transform data for a consistent and understandable view
Distribute data where needed
Provide high-speed access to the data

Leading company for large-scale data mining
Data spread across multiple databases
Data spread across processors for faster queries
Discover new patterns and trends that may not be realized using traditional SQL

Three-dimensional Visualization
Visual models can save days and even months
from the review process

Review
Data mining (DM)
Techniques used to mine data
Market Basket Analysis: The King of DM
Algorithms

Review continued
Current Limitations and Challenges to Data Mining

Data Visualization
Siftware Technologies
