Classification
Data mining methods, Spring 2005
Pirjo Moen, Univ. of Helsinki/Dept. of CS
Overview
Course outline: Introduction 17.1., KDD process 20.1., ..., Classification 28.2., Clustering, ...

What is classification?
Aim: predict categorical class labels for new tuples/samples.
Input: a training set of tuples/samples, each with a class label.
Output: a model (a classifier) based on the training set and the class labels.
What is prediction?
Prediction is similar to classification: it constructs a model and uses the model to predict unknown or missing values.
Typical applications
Credit approval, target marketing, medical diagnosis, treatment effectiveness analysis, performance prediction.
Prediction
Prediction models continuous-valued functions and predicts unknown or missing values.
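As a contrast to classification, here is a minimal sketch of predicting a continuous value; the use of scikit-learn's LinearRegression and the tiny salary data set are illustrative assumptions, not part of the original slides.

```python
# Hedged sketch: predicting a continuous target (a hypothetical salary)
# from years of experience with ordinary linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[1], [3], [5], [7], [9]])               # training inputs
salary = np.array([30000, 38000, 45000, 52000, 60000])    # continuous target values

model = LinearRegression().fit(years, salary)             # construct the model
print(model.predict([[4]]))                               # predict an unknown value
```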
Model construction
Each tuple/sample is assumed to belong to a predefined class, which is given by the class label attribute. The training set of tuples/samples is used for model construction. The model is represented as classification rules, decision trees or mathematical formulae.
Step 1: build the model from the training set.

NAME   RANK            YEARS  TENURED
Mary   Assistant Prof  3      No
James  Assistant Prof  7      Yes
Bill   Professor       2      Yes
John   Associate Prof  7      Yes
Mark   Assistant Prof  6      No
Annie  Associate Prof  3      No
Model usage
Step 2: use the model to classify future or unknown objects.

Test data:

NAME  RANK            YEARS  TENURED
Tom   Assistant Prof  2      No
Lisa  Associate Prof  7      No
Jack  Professor       5      Yes
Ann   Assistant Prof  7      Yes

Unseen tuple (Jeff, Professor, 4): Tenured?
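A minimal sketch of the two steps with scikit-learn; the slides do not name a particular learning algorithm here, so the choice of a decision tree and the one-hot encoding of RANK are assumptions. The data are the tables above, and the last line classifies the unseen tuple (Jeff, Professor, 4).

```python
# The algorithm choice (decision tree) and the encoding are assumptions;
# the training data are the tuples from the Step 1 table.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Step 1: build a classifier from the training set (class label attribute: TENURED).
train = pd.DataFrame({
    "NAME":    ["Mary", "James", "Bill", "John", "Mark", "Annie"],
    "RANK":    ["Assistant Prof", "Assistant Prof", "Professor",
                "Associate Prof", "Assistant Prof", "Associate Prof"],
    "YEARS":   [3, 7, 2, 7, 6, 3],
    "TENURED": ["No", "Yes", "Yes", "Yes", "No", "No"],
})
X_train = pd.get_dummies(train[["RANK", "YEARS"]])   # one-hot encode RANK
y_train = train["TENURED"]
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2: use the model to classify future or unknown tuples.
jeff = pd.DataFrame({"RANK": ["Professor"], "YEARS": [4]})
X_jeff = pd.get_dummies(jeff).reindex(columns=X_train.columns, fill_value=0)
print("Tenured?", model.predict(X_jeff)[0])
```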
Data preparation
Data cleaning: handle noise and missing values.
Data transformation.

Decision trees
A decision tree is a tree where
an internal node is a test on an attribute,
a tree branch is an outcome of the test, and
a leaf node is a class label or a class distribution.
(Figure: a small example tree with internal test nodes B? and D? and a leaf labeled Yes.)
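To make the node/branch/leaf terminology concrete, here is a small hand-written sketch, not taken from the slides: a decision tree as a nested Python structure, loosely modelled on the tenure example above, and a classify function that follows one branch per test outcome.

```python
# A decision tree as nested data: internal node = a test on an attribute,
# branches = outcomes of the test, leaf = a class label.
# The tests below are hypothetical, loosely modelled on the tenure example.
tree = {
    "test": ("rank", "Professor"),      # internal node: rank == 'Professor'?
    "yes": "Yes",                       # leaf node: class label
    "no": {
        "test": ("years", 6),           # internal node: years > 6?
        "yes": "Yes",
        "no": "No",
    },
}

def classify(node, sample):
    """Follow one branch per test outcome until a leaf is reached."""
    if not isinstance(node, dict):
        return node                     # leaf node: return its class label
    attr, value = node["test"]
    if isinstance(value, str):
        branch = "yes" if sample[attr] == value else "no"
    else:
        branch = "yes" if sample[attr] > value else "no"
    return classify(node[branch], sample)

print(classify(tree, {"rank": "Assistant Prof", "years": 7}))  # -> Yes
```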
Tree pruning
Identify and remove branches that reflect noise or outliers.
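One concrete way to do this, assumed here rather than taken from the slides, is cost-complexity pruning in scikit-learn via the ccp_alpha parameter; the data set below is only a stand-in.

```python
# Hedged sketch: cost-complexity pruning of a decision tree in scikit-learn.
# A larger ccp_alpha removes more branches; the data set is just a placeholder.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)              # fully grown tree
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

print("leaves:", full.get_n_leaves(), "->", pruned.get_n_leaves())
print("test accuracy:", full.score(X_te, y_te), "->", pruned.score(X_te, y_te))
```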
Training set
(Training-set table: 14 weather examples, 9 of class P and 5 of class N, with attributes including outlook and windy.)
Information gain
Select the attribute with the highest information gain.
Let P and N be two classes and S a data set with p elements of class P and n elements of class N. The amount of information needed to decide if an arbitrary example in S belongs to the class P or N is

  I(p, n) = - (p / (p + n)) log2(p / (p + n)) - (n / (p + n)) log2(n / (p + n)).

If an attribute A partitions S into subsets S_1, ..., S_v, where S_i contains p_i elements of class P and n_i elements of class N, the expected information (entropy) after branching on A is

  E(A) = sum over i of ((p_i + n_i) / (p + n)) * I(p_i, n_i),

and the information gained by branching on A is Gain(A) = I(p, n) - E(A).

For the training set above, with p = 9 and n = 5, I(9, 5) = 0.940.
For example, the attribute outlook has the class distribution

  outlook   p_i  n_i
  sunny      2    3
  overcast   4    0
  rain       3    2

Now

  E(outlook) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.694,

and hence

  Gain(outlook) = I(9, 5) - E(outlook) = 0.940 - 0.694 = 0.246.

Similarly, the gains of the other attributes can be computed, and the attribute with the highest gain is chosen for branching.
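These numbers can be checked with a few lines of Python; the counts are the ones in the table above, and everything else is just the formulas.

```python
from math import log2

def info(p, n):
    """I(p, n): information needed to decide between classes P and N."""
    total = p + n
    return sum(-c / total * log2(c / total) for c in (p, n) if c > 0)

# Class distribution of outlook over the 14 training examples: (p_i, n_i) per value.
outlook = {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)}

p, n = 9, 5
E = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in outlook.values())
print(round(info(p, n), 3))      # 0.94
print(round(E, 3))               # 0.694
print(round(info(p, n) - E, 3))  # 0.247 (the slide's 0.246 comes from rounding I and E first)
```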
Postpruning: remove branches from a fully grown tree; the best pruned tree can then be chosen, e.g., on a separate data set.
Other elements of a decision tree construction scheme are the branching scheme and the labeling rule for the leaves.
Why Bayesian methods?
Incremental: the model can be updated as new training examples arrive.
Probabilistic prediction: provides a standard of optimal decision making against which other methods can be measured.
Bayesian classification
The classification problem may be formalized using posterior probabilities:
P(C|X) = probability that the sample tuple X = <x1, ..., xk> belongs to the class C.
For example, P(class=N | outlook=sunny, windy=true, ...).
Idea: assign to a sample X the class label C such that P(C|X) is maximal.
By Bayes' theorem, P(C|X) = P(X|C) P(C) / P(X), and P(X) is the same for every class. Finding a class C such that P(C|X) is maximal therefore means finding a class C such that P(X|C) P(C) is maximal.
Problem: estimating P(X|C) directly is infeasible!
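A standard workaround (an addition here, not stated on these slides) is the naive Bayes assumption that the attributes are conditionally independent given the class, so that P(X|C) = P(x1|C) * ... * P(xk|C). Below is a minimal sketch on a tiny, purely hypothetical weather-style data set.

```python
# Hedged sketch of naive Bayes: P(C|X) is proportional to P(C) * prod_i P(x_i|C).
# The tuples below are hypothetical and only illustrate the counting involved.
from collections import Counter, defaultdict

data = [
    (("sunny", "true"), "N"), (("sunny", "false"), "P"),
    (("overcast", "false"), "P"), (("rain", "true"), "N"),
    (("rain", "false"), "P"), (("overcast", "true"), "P"),
]

class_counts = Counter(label for _, label in data)
cond_counts = defaultdict(Counter)          # (attribute index, class) -> value counts
for x, label in data:
    for i, value in enumerate(x):
        cond_counts[(i, label)][value] += 1

def posterior(x):
    """Return unnormalized P(C|x) = P(C) * prod_i P(x_i|C) for each class C."""
    n = len(data)
    scores = {}
    for c, nc in class_counts.items():
        p = nc / n
        for i, value in enumerate(x):
            p *= cond_counts[(i, c)][value] / nc
        scores[c] = p
    return scores

print(posterior(("sunny", "true")))          # the query from the slide's example
```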
Classification accuracy
Estimating error rates by partitioning (training-and-testing, for large data sets):
use two independent data sets, e.g., a training set (2/3 of the data) and a test set (1/3).
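A minimal sketch of this holdout estimate with scikit-learn; the 2/3 : 1/3 split follows the example above, while the iris data set and the decision tree classifier are placeholders.

```python
# Holdout estimate of classification accuracy: train on 2/3, test on the held-out 1/3.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("estimated accuracy:", accuracy_score(y_te, pred))   # error rate = 1 - accuracy
```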
More methods
Other classification methods exist beyond the decision tree and Bayesian approaches discussed here; see the references below.

Summary
Classification is an extensively studied problem and probably one of the most widely used data mining techniques, with many extensions. Scalability is still an important issue for database applications. Research directions include the classification of non-relational data, e.g., text, spatial and multimedia data.
References
C. Apte and S. Weiss. Data mining with decision trees and decision rules. Future Generation Computer Systems, 13, 1997.
F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. Using Data Mining Techniques in Fiscal Fraud Detection. In Proc. DaWaK'99, First Int. Conf. on Data Warehousing and Knowledge Discovery, Sept. 1999.
F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. A Classification-based Methodology for Planning Audit Strategies in Fraud Detection. In Proc. KDD-99, ACM-SIGKDD Int. Conf. on Knowledge Discovery & Data Mining, Aug. 1999.
J. Catlett. Megainduction: machine learning on very large databases. PhD Thesis, Univ. Sydney, 1991.
P. K. Chan and S. J. Stolfo. Metalearning for multistrategy and parallel learning. In Proc. 2nd Int. Conf. on Information and Knowledge Management, p. 314-323, 1993.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
P. K. Chan and S. J. Stolfo. Learning arbiter and combiner trees from partitioned data for scaling machine learning. In Proc. KDD'95, August 1995.
J. Gehrke, R. Ramakrishnan, and V. Ganti. RainForest: A framework for fast decision tree construction of large datasets. In Proc. 1998 Int. Conf. Very Large Data Bases, pages 416-427, New York, NY, August 1998.
References (2)
B. Liu, W. Hsu and Y. Ma. Integrating classification and association rule mining. In Proc. KDD'98, New York, 1998.
J. Magidson. The CHAID approach to segmentation modeling: Chi-squared automatic interaction detection. In R. P. Bagozzi, editor, Advanced Methods of Marketing Research, pages 118-159. Blackwell Business, Cambridge, Massachusetts, 1994.
M. Mehta, R. Agrawal, and J. Rissanen. SLIQ: A fast scalable classifier for data mining. In Proc. 1996 Int. Conf. Extending Database Technology (EDBT'96), Avignon, France, March 1996.
S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345-389, 1998.
J. R. Quinlan. Bagging, boosting, and C4.5. In Proc. 13th Natl. Conf. on Artificial Intelligence (AAAI'96), 725-730, Portland, OR, Aug. 1996.
R. Rastogi and K. Shim. PUBLIC: A decision tree classifier that integrates building and pruning. In Proc. 1998 Int. Conf. Very Large Data Bases, 404-415, New York, NY, August 1998.
J. Shafer, R. Agrawal, and M. Mehta. SPRINT: A scalable parallel classifier for data mining. In Proc. 1996 Int. Conf. Very Large Data Bases, 544-555, Bombay, India, Sept. 1996.
S. M. Weiss and C. A. Kulikowski. Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, 1991.
D. E. Rumelhart, G. E. Hinton and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing. The MIT Press, 1986.
Thank you
Thanks to Jiawei Han from Simon Fraser University for his slides, which greatly helped in preparing this lecture! Thanks also to Fosca Giannotti and Dino Pedreschi from Pisa for their slides on classification!