Data Mining-4 - Overview of Data Mining Methods (Old Book)

4
COIS 448: Data Mining & Business Intelligence
Overview of Data Mining Methods
Information Systems Department

Faculty of Computing and Information Technology Rabigh
King Abdulaziz University
Data mining applications
• Automobile insurance company: fraud detection
• Business applications: loan evaluation, customer segmentation, employee evaluation…
• Data mining tools are categorized by the tasks of classification, estimation, prediction, clustering, and summarization.
• Classification, estimation, and prediction are predictive, while clustering and summarization are descriptive.
History
• Statistics
• AI: genetic algorithms, neural networks (analogies with biology)
• Memory-based reasoning
• Link analysis from graph theory
Data mining perspectives
• Methods can be viewed from different perspectives. Data mining methods include:
  • Market basket analysis
  • Classification analysis
  • Clustering analysis
  • Regression of various forms
• AI:
  • Artificial neural network (ANN)
  • Rule induction (decision trees)
  • Genetic algorithms (supplement)
Techniques
Statistical
• Market-Basket Analysis - find groups of items
• Memory-Based Reasoning - case based
• Cluster Detection - undirected (quantitative)
Artificial Intelligence
• Link Analysis - MCI’s Friends & Family
• Decision Trees, Rule Induction - production rule
• Neural Networks - automatic pattern detection
• Genetic Algorithms - keep best parameters
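As a rough illustration of what "find groups of items" means for Market-Basket Analysis, the sketch below counts how often each pair of items appears in the same basket and keeps the pairs above a minimum support. The transactions, the `pair_support` helper, and the 0.6 cutoff are all invented for this example; it is not a full association-rule miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def pair_support(baskets):
    """Return, for each item pair, the fraction of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(transactions)

# Keep only pairs that occur in at least 60% of baskets (assumed cutoff).
frequent = {pair: s for pair, s in support.items() if s >= 0.6}
print(frequent)
```

With these made-up baskets only the bread and milk pair clears the cutoff; lowering the threshold would admit more pairs.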
Models
Regression: Y = a + bX
Classification: assign new record to class
Predictive: assign value to new record
Clustering: groups for data
Time-series: assign future value
Links: patterns in data
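The regression model Y = a + bX can be made concrete with a small fit. The slide does not say how a and b are estimated, so the sketch below assumes ordinary least squares, and the data points and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the model Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical observations lying exactly on Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
predicted = a + b * 5.0  # assign a value to a new record with X = 5
print(a, b, predicted)
```

The last line also illustrates the "Predictive" model type: once a and b are known, a new record gets its value from the fitted formula.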
Fitting
Underfitting: not enough detail
• leave out important variables
Overfitting: too much detail
• memorizes training set, but doesn’t help with new data
• data set too small
• redundancy in data
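The overfitting point can be shown with a toy comparison. In the sketch below, a nearest-neighbour lookup stands in for a model that memorizes the training set, and a least-squares line stands in for a simpler model. Both choices, the data, and all function names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: training targets are 2x with alternating noise (+1, -1);
# test targets are the noise-free 2x at points not seen in training.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [1.0, 3.0, 5.0, 7.0, 9.0]


def memorizer(x):
    """Return the stored target of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


a, b = fit_line(train_x, train_y)


def line(x):
    return a + b * x


memorizer_train = mse(memorizer, train_x, train_y)
memorizer_test = mse(memorizer, test_x, test_y)
line_train = mse(line, train_x, train_y)
line_test = mse(line, test_x, test_y)
print(memorizer_train, memorizer_test, line_train, line_test)
```

The memorizer reproduces the training set with zero error, noise included, yet does worse than the straight line on the new points, which is the pattern the slide describes.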
Comparison of Features
                  Rules       Neural Net   CaseBase    Genetic
Noisy data        Good        Very good    Good        Very good
Missing data      Good        Good         Very good   Good
Large sets        Very good   Poor         Good        Good
Different types   Good        Numerical    Very good   Transform
Accuracy          High        Very high    High        High
Explanation       Very good   Poor         Very good   Good
Integration       Good        Good         Good        Very good
Ease              Easy        Difficult    Easy        Difficult

Data Mining Functions
Classification
• Identify categories in data
Prediction
• Formula to predict future observations
Association
• Rules using relationships among entities
Detection
• Anomalies (unusual) & irregularities (fraud detection)
Financial Applications
Technique     Application              Problem Type
Neural net    Forecast stock price     Prediction
NN, Rule      Forecast bankruptcy      Prediction
NN, Rule      Fraud detection          Detection
NN, Case      Forecast interest rate   Prediction
NN, visual    Late loan detection      Detection
Rule          Credit assessment        Prediction
Rule          Risk classification      Classification
Rule, Case    Corporate bond rate      Prediction
Telecom Applications

Technique                    Application                 Problem Type
Neural net, Rule induction   Forecast network behavior   Prediction
Rule induction               Churn                       Classification
Rule induction               Fraud detection             Detection
Case based                   Call tracking               Classification
Marketing Applications

Technique                         Application             Problem Type
Rule induction                    Market segment          Classification
Rule induction                    Cross-selling           Association
Rule induction, visual            Lifestyle analysis      Classification
Rule induction, visual            Performance analysis    Association
Rule induction, genetic, visual   Reaction to promotion   Prediction
Case based                        Online sales support    Classification
Web Applications

Technique                       Application                         Problem Type
Rule induction, Visualization   User browsing similarity analysis   Classification, Association
Rule-based heuristics           Web page content similarity         Association
Other Applications

Technique                    Application             Problem Type
Neural net                   Software cost           Detection
Neural net, rule induction   Litigation assessment   Prediction
Rule induction               Insurance fraud         Detection
Rule induction               Healthcare except.      Detection
Case based                   Insurance claim         Prediction
Case based                   Software quality        Classification
Genetic algorithm            Budget spending         Classification

Data Sets
Loan Applications
• classification
Job Applications
• classification
Insurance Fraud
• detection
Expenditure Data
• prediction
Loan Data
650 observations
OUTCOMES (binary):
• On-time: cost of error $300
• Late (default): cost of error $2,000
Variables
• Age, Income, Assets, Debts, Want, Credit
Credit is ordinal
• Transform: Assets, Debts, & Want → Risk
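The two error costs mean that a classifier for the loan data should be judged by total cost rather than by raw error count. The sketch below assumes the $300 applies when an on-time loan is misclassified and the $2,000 applies when a late (default) loan is misclassified. The labels, predictions, and the `total_error_cost` helper are hypothetical.

```python
# Costs taken from the slide; which error each one attaches to is an assumption.
COST_ERROR_ON_TIME = 300   # an on-time loan wrongly predicted as late
COST_ERROR_LATE = 2000     # a late (default) loan wrongly predicted as on-time


def total_error_cost(actual, predicted):
    """Sum the cost of every misclassified loan, weighted by its true class."""
    cost = 0
    for truth, guess in zip(actual, predicted):
        if truth == guess:
            continue  # correct predictions cost nothing
        cost += COST_ERROR_ON_TIME if truth == "on-time" else COST_ERROR_LATE
    return cost


# Hypothetical outcomes for five loans.
actual = ["on-time", "on-time", "late", "late", "on-time"]
predicted = ["on-time", "late", "late", "on-time", "late"]

cost = total_error_cost(actual, predicted)
print(cost)  # two on-time errors and one late error: 300 + 2000 + 300 = 2600
```

Under these costs one missed default outweighs several wrongly rejected good loans, so two models with the same error rate can differ widely in cost.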
Job Application Data
500 observations
OUTCOMES (ordinal):
• Unacceptable
• Minimal
• Acceptable
• Excellent
Variables
• Age, State, Degree, Major, Experience
State nominal; degree & major ordinal
State is superfluous
Insurance Claim Data
5000 observations
OUTCOMES (binary):
• OK: cost of error $500
• Fraudulent: cost of error $2,500
Variables
• Age, Gender, Claim, Tickets, Prior claims, Attorney
Gender & attorney nominal; tickets & prior claims categorical
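Unequal error costs also shift the point at which a claim should be flagged. Assume a model supplies a fraud probability p for each claim, that clearing a fraudulent claim costs $2,500, and that flagging an OK claim costs $500. Flagging is then cheaper in expectation whenever p × 2,500 > (1 − p) × 500. The sketch below works out that break-even point; the probability inputs and function names are hypothetical.

```python
# Costs taken from the slide; which error each one attaches to is an assumption.
COST_ERROR_OK = 500       # an OK claim wrongly flagged as fraudulent
COST_ERROR_FRAUD = 2500   # a fraudulent claim wrongly passed as OK


def break_even_probability(cost_error_ok, cost_error_fraud):
    """Fraud probability at which flagging and passing cost the same.

    Solves p * cost_error_fraud == (1 - p) * cost_error_ok for p.
    """
    return cost_error_ok / (cost_error_ok + cost_error_fraud)


def flag_claim(p_fraud, threshold):
    """Flag the claim when its expected cost is lower if treated as fraud."""
    return p_fraud > threshold


threshold = break_even_probability(COST_ERROR_OK, COST_ERROR_FRAUD)
print(threshold)                   # 500 / 3000, i.e. one in six
print(flag_claim(0.2, threshold))  # above the break-even point
print(flag_claim(0.1, threshold))  # below the break-even point
```

Because a missed fraud is five times as costly as a false alarm, the break-even probability sits well below 0.5.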
Expenditure Data
10,000 observations
OUTCOMES:
• Could predict response in a number of categories
• Others
Variables:
• Age, Gender, Marital, Dependents, Income, Job years, Town years, Education years, Drivers license, Own home, Number of credit cards
• Churn, proportion of income spent on seven categories
