Overview
Introduction
Explanation of Data Mining Techniques
Advantages
Applications
Privacy
Data Mining
What is Data Mining?
“The process of semi-automatically analyzing large
databases to find useful patterns” (Silberschatz)
KDD – “Knowledge Discovery in Databases” (3)
“Attempts to discover rules and patterns from data”
Discover Rules → Make Predictions
Areas of Use
Internet – Discover needs of customers
Economics – Predict stock prices
Science – Predict environmental change
Medicine – Match patients with similar problems to find a cure
Example of Data Mining
A credit card company wants to discover information about its
clients from its databases. It wants to find:
Clients who respond to promotions in “junk mail”
Clients who are likely to switch to a competitor
Clients who are unlikely to pay
Services that clients use, so that services affiliated
with the credit card company can be promoted to them
Anything else that may help the company provide or promote
services to its clients and ultimately make more
money.
Data Mining & Data Warehousing
Data Warehouse: “is a repository (or archive) of
information gathered from multiple sources, stored under
a unified schema, at a single site.” (Silberschatz)
Collect data → Store in a single repository
Allows for easier query development as a single repository
can be queried.
Data Mining:
Analyzing databases or data warehouses to discover
patterns in the data and gain knowledge.
Knowledge is power.
Discovery of Knowledge
Data Mining Techniques
Classification
Clustering
Regression
Association Rules
Classification
Classification: Given a set of classes and a set of past
instances (training instances), each labeled with its
class, classification is the process of predicting the
class to which a new item belongs.
Example: A bank wants to classify its Home Loan Customers into
groups according to their response to bank advertisements. The
bank might use the classifications “Responds Rarely, Responds
Sometimes, Responds Frequently”.
The bank will then attempt to find rules about the customers
that respond Frequently and Sometimes.
The rules could be used to predict needs of potential customers.
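The bank example can be sketched in a few lines of Python. This is a minimal illustration only: the customer attributes (age, income in $K), the training instances, and the function name `classify` are all hypothetical, and the 1-nearest-neighbor rule is just one simple way to predict a class from past instances, not a method named in these slides.

```python
import math

# Hypothetical training instances: (age, income in $K) -> response class.
# The values are invented purely for illustration.
TRAINING = [
    ((25, 30), "Responds Rarely"),
    ((32, 45), "Responds Rarely"),
    ((41, 70), "Responds Sometimes"),
    ((45, 85), "Responds Sometimes"),
    ((52, 120), "Responds Frequently"),
    ((58, 140), "Responds Frequently"),
]


def classify(item, training=TRAINING):
    """Predict the class of `item` as the class of its nearest training instance."""
    nearest = min(training, key=lambda pair: math.dist(pair[0], item))
    return nearest[1]


# A new customer, aged 50 with an income of 115K, is assigned the class
# of the most similar past customer.
print(classify((50, 115)))
```

In practice the attributes would need scaling so that income does not dominate the distance, and more neighbors would usually be consulted.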
Technique for Classification
Decision-Tree Classifiers
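[Figure: decision tree. The root node tests Job (Engineer, Carpenter, Doctor); each branch then splits on income thresholds (<30K, >50K, <40K, >90K, <50K, >100K).]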
Hierarchical
Group data into t-trees
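A decision-tree classifier of the kind in the Job/income figure can be sketched as a lookup followed by threshold tests. The sketch below is an assumption-laden illustration: the pairing of income thresholds with jobs, the middle income bands, and the leaf labels ("low", "medium", "high") are hypothetical, since the figure's leaves did not survive in this copy.

```python
# Hypothetical decision tree: the root tests "job"; each branch then tests
# income (in $K) against ascending upper bounds. Thresholds per job and the
# leaf labels are illustrative assumptions, not taken from the slides.
TREE = {
    "engineer": [(30, "low"), (50, "medium"), (float("inf"), "high")],
    "carpenter": [(40, "low"), (90, "medium"), (float("inf"), "high")],
    "doctor": [(50, "low"), (100, "medium"), (float("inf"), "high")],
}


def classify(job, income_k):
    """Walk the tree: pick the branch for `job`, then the first income band that fits."""
    for upper_bound, label in TREE[job]:
        if income_k < upper_bound:
            return label
    # Unreachable: the last band of every branch is unbounded.
    raise ValueError("no matching income band")


print(classify("engineer", 60))
print(classify("doctor", 75))
```

A real decision-tree learner would derive the tests and thresholds from the training instances rather than have them written by hand.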
Regression
“Regression deals with the prediction of a value, rather
than a class.” (1, P747)
Example: Find out whether there is a relationship between
smoking and cancer-related illness in patients.
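To show what "predicting a value" means, here is a minimal sketch of simple linear regression by ordinary least squares. The data are synthetic and the variable names (`cigarettes_per_day`, `risk_score`) are hypothetical; nothing here is a real medical result, and a straight line is only one possible model.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, made-up data for illustration only.
cigarettes_per_day = [0, 5, 10, 15, 20]
risk_score = [1.0, 2.1, 2.9, 4.2, 5.0]

slope, intercept = fit_line(cigarettes_per_day, risk_score)

# Predict a numeric value (not a class) for an unseen input.
predicted = slope * 12 + intercept
print(round(slope, 3), round(intercept, 3), round(predicted, 3))
```

A positive slope on real data would suggest a relationship between the two quantities, although regression alone does not establish causation.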