CLASSIFICATION
Dr. Aso Mohammad Darwesh
Chapter Four - 1
aso.darwesh@yahoo.fr
Outline
- Classification and prediction
- Decision Tree Induction
- Bayesian classification
- Nearest Neighbor Classification
- Rule-Based Classification
- Artificial Neural Network
- Support Vector Machines
Data Mining - 4th class UHD, Aso M. Darwesh
Example
- Medical diagnosis: classifying a tumour as malignant or benign
Classification
- Given dataset: a collection of instances (records)
- Model building: find a model that describes the classes
- Goal: classify unseen instances
Classification illustration

Learning set:
Age | Gender | Specialty   | Sportive
19  | F      | IT          | Yes
21  | F      | IT          | Yes
20  | M      | Medicine    | No
35  | M      | Engineering | No
34  | M      | Medicine    | Yes
28  | M      | Sociology   | No
35  | F      | IT          | Yes
40  | F      | Medicine    | No
35  | M      | IT          | Yes
23  | M      | IT          | No
24  | F      | Engineering | No
23  | F      | Medicine    | No
24  | F      | Sociology   | Yes

Test set:
Age | Gender | Specialty   | Sportive
23  | F      | IT          | ?
30  | M      | IT          | ?
28  | F      | Medicine    | ?
27  | M      | Engineering | ?
29  | F      | Sociology   | ?
A model can be represented as
- Classification rules (e.g., IF x THEN y)
- A decision tree (automatically generating classification rules)
- Mathematical formulae (e.g., f(attributes) = class label)

Model building
- Training set: used to build the model
- Test set: used to validate it and find the accuracy of the model
- Partition the dataset based on the value of an attribute
- Create a branch for each of the attribute's possible values
- For continuous attributes the test is normally of the form "less than or equal to" or "greater than"
- The splitting process continues until each branch can be labeled with just one classification
[Decision-tree figure: the root splits on Specialty into Engineering, Medicine, Sociology and IT branches; some branches end in "No" leaves, others split further on Gender (F → Yes, M → No or Yes), and one branch adds an Age < 30 → No test.]
Compression
- The tree representation is equivalent to the dataset, in the sense that any combination of attribute values leads to an identical classification in both

Prediction
- The tree can be used to predict the classification of unseen instances
Top-Down Induction of Decision Trees (TDIDT)
- Has no preconditions
- The same attribute cannot be used twice in the same branch
- Produces decision rules in the implicit form of a decision tree
- At each non-leaf node an attribute is chosen for splitting
TDIDT algorithms have two main distinguishing aspects
- Impurity measure
- Selection method (underspecified): the algorithm says "select an attribute A to split on", but no method is given for doing this

The algorithm
- If all instances in S belong to the same class, stop
- Otherwise, select the most informative attribute A, partition S according to A's values, and recursively construct subtrees T1, T2, ... for the subsets of S
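The recursive construction above can be sketched in Python. This is a minimal sketch with hypothetical names (`tdidt`, `entropy`); it uses entropy to pick the most informative attribute, as developed later in the chapter:

```python
import math
from collections import Counter

def entropy(labels):
    """E = -sum(p_i * log2(p_i)) over the classes present in labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def tdidt(rows, labels, attributes):
    """Build a decision tree as nested dicts: {attribute: {value: subtree}}."""
    if len(set(labels)) == 1:            # pure subset: label the leaf
        return labels[0]
    if not attributes:                   # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]

    def weighted_entropy(attr):          # E_new for a split on attr
        subsets = {}
        for row, lab in zip(rows, labels):
            subsets.setdefault(row[attr], []).append(lab)
        return sum(len(s) / len(labels) * entropy(s) for s in subsets.values())

    best = min(attributes, key=weighted_entropy)       # max information gain
    remaining = [a for a in attributes if a != best]   # never reuse in a branch
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        tree[best][value] = tdidt([r for r, _ in pairs],
                                  [l for _, l in pairs], remaining)
    return tree
```

Note that the "never reuse" step enforces the rule above that the same attribute cannot appear twice in one branch.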
[Figure: a node splits on A's values v1, v2, ..., vn, producing subtrees T1, T2, ..., Tn.]
Choosing the splitting attribute
- Select the attribute which partitions the learning set into subsets that are as pure as possible
- Impurity measures: Entropy, Gini index, frequency tables, G2
Entropy

The average amount of information needed to classify an object; n is the number of classes in the dataset:

E = - Σ_{i=1..n} p_i log2(p_i)
log2 x = y means 2^y = x (x > 0); e.g., log2 8 = 3, because 2^3 = 8
Properties: the value of log2 x is
- positive when x > 1
- negative when x < 1
- zero when x = 1
log2(a·b) = log2 a + log2 b
log2(a/b) = log2 a - log2 b
log2(a^n) = n·log2 a
log2(1/a) = -log2 a
- The value of -x·log2 x is in [0, 1] when x is in [0, 1]
- The maximum value of -x·log2 x occurs at x = 1/e (e ≈ 2.71828)
- The initial minus sign is included to make the value of the function positive (or zero)
Natural logarithm: loge x, usually written ln x
Entropy: Example

E = - Σ_{i=1..n} p_i log2(p_i)

C1 = 0, C2 = 6:  p1 = 0/6 = 0, p2 = 6/6 = 1  →  E = 0
C1 = 1, C2 = 5:  p1 = 1/6, p2 = 5/6          →  E ≈ 0.650
C1 = 2, C2 = 4:  p1 = 2/6, p2 = 4/6          →  E ≈ 0.918
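The three distributions above can be checked numerically. A minimal sketch (the `entropy` helper is not from the slides), using the convention 0·log2 0 = 0:

```python
import math

def entropy(counts):
    """E = -sum(p_i * log2(p_i)), with the convention 0 * log2(0) = 0."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([0, 6]))   # 0.0: all instances in one class
print(entropy([1, 5]))   # ≈ 0.650
print(entropy([2, 4]))   # ≈ 0.918
```

Note the guard `if c > 0`: it implements the 0·log2 0 = 0 convention, since `math.log2(0)` would raise an error.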
Entropy: Example

Age | Gender | Specialty   | Sportive
19  | F      | IT          | Yes
21  | F      | IT          | Yes
20  | M      | Medicine    | No
35  | M      | Engineering | No
34  | M      | Medicine    | Yes
28  | M      | Sociology   | No
35  | F      | IT          | Yes
40  | F      | Medicine    | No
35  | M      | IT          | Yes
23  | M      | IT          | No
24  | F      | Engineering | No
23  | F      | Medicine    | No
24  | F      | Sociology   | Yes

The learning set contains 6 Yes and 7 No instances:
EStart = -(6/13)·log2(6/13) - (7/13)·log2(7/13) ≈ 0.99572745
Entropy: Example

IT subset (5 instances: 4 Yes, 1 No):
EIT = -(4/5)·log2(4/5) - (1/5)·log2(1/5) ≈ 0.72192809
Example contd

Medicine subset (1 Yes, 3 No):
Age | Gender | Sportive
20  | M      | No
34  | M      | Yes
40  | F      | No
23  | F      | No

EMed = -(1/4)·log2(1/4) - (3/4)·log2(3/4) ≈ 0.81127812
Example contd

Engineering subset:
Age | Gender | Sportive
35  | M      | No
24  | F      | No

All instances belong to the same class, so EEng = 0
Example contd

Sociology subset:
Age | Gender | Sportive
28  | M      | No
24  | F      | Yes

The two classes are equally represented, so ESoc = 1
Example contd

Now we calculate ENew
- ENew is the weighted mean of the Ei values; the weights are the proportions of the original instances in each subset
- ENew = (5/13)·EIT + (4/13)·EMed + (2/13)·EEng + (2/13)·ESoc
       = (5/13)·0.72192809 + (4/13)·0.81127812 + (2/13)·0 + (2/13)·1
       = 0.68113484
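The weighted mean above can be reproduced in a few lines. A minimal sketch using the (Yes, No) counts of each Specialty subset from the learning set:

```python
import math

def entropy(counts):
    """E = -sum(p_i * log2(p_i)), skipping empty classes."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (Yes, No) counts per Specialty subset of the 13-instance learning set
subsets = {"IT": (4, 1), "Medicine": (1, 3),
           "Engineering": (0, 2), "Sociology": (1, 1)}
total = sum(sum(c) for c in subsets.values())          # 13 instances

# E_new: entropy of each subset, weighted by its share of the instances
e_new = sum(sum(c) / total * entropy(c) for c in subsets.values())
print(round(e_new, 8))   # 0.68113484
```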
Example contd

We define Information Gain = EStart - ENew
IG = 0.99572745 - 0.68113484 = 0.31459261

ENew must also be calculated for all the other attributes of the original dataset (homework)
Example contd

We define Information Gain = EStart - ENew
- The entropy method of attribute selection is to split on the attribute that maximizes the value of the Information Gain
- This is equivalent to minimizing the value of ENew, since EStart is fixed
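The selection rule can be sketched as follows. The `information_gain` helper and the toy weather rows are hypothetical illustrations, not from the slides:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """IG = E_start - E_new for a split on a categorical attribute."""
    subsets = {}
    for row, lab in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(lab)
    e_new = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - e_new

# toy data: "outlook" separates the classes perfectly, "windy" not at all
rows = [{"outlook": "sunny", "windy": "no"},
        {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain",  "windy": "no"},
        {"outlook": "rain",  "windy": "yes"}]
labels = ["play", "play", "stay", "stay"]

best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best)   # outlook (IG = 1.0, versus 0.0 for windy)
```

Because EStart is the same for every candidate attribute, maximizing IG and minimizing ENew pick the same attribute.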
References
- Max Bramer, Principles of Data Mining, Springer-Verlag London Limited, 2006. 342 pages, ISSN 1863-7310
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2006. 772 pages, ISBN 1-55860-489-8
- Seyed R. Mousavi and Krysia Broda, "Impact of Binary Coding on Multiway-split TDIDT Algorithms", International Journal of Electrical and Electronics Engineering 2:3, 2008, pp. 150-159
University of Human Development, College of Science and Technology, Computer Department, 4th class
CLASSIFICATION
Dr. Aso Mohammad Darwesh
Chapter Four - 2
aso.darwesh@yahoo.fr
Gini index
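The detail of this slide is not preserved here. As a hedged sketch: the Gini index of a subset with class proportions p_i is Gini = 1 - Σ p_i², which, like entropy, is 0 for a pure subset:

```python
def gini(counts):
    """Gini index: 1 - sum(p_i^2); 0 for a pure subset."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([0, 6]))   # 0.0: pure subset
print(gini([3, 3]))   # 0.5: maximally mixed two-class subset
```

As with entropy, a TDIDT algorithm would split on the attribute giving the smallest weighted Gini index over the resulting subsets.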
TDIDT algorithm
Bayesian classification

Naive Bayes classifiers
- Do not use rules, decision trees, etc.
- Use probability theory to find the most likely of the possible classifications
- The sum of the probabilities of a set of mutually exclusive and exhaustive events must always be 1
- The outcome of each trial is recorded in one row of a table; each row must have one and only one classification
Bayesian classification

- Consider a training set in which each instance records attribute values (e.g., season) together with a classification (e.g., whether a train is on time)
- The probability of an event occurring, given that an attribute has a particular value (or that several variables have particular values), is called the conditional probability of the event
- e.g., p(class = on time | season = winter)
Bayesian classification

Named after Thomas Bayes (1702-1761)
- Combines the prior and conditional probabilities in a single formula
- Naive: the effect of the value of one attribute on the probability of a given classification is assumed to be independent of the values of the other attributes
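The combination of prior and conditional probabilities can be sketched as follows. `naive_bayes` and the toy train data are hypothetical illustrations, with conditional probabilities estimated by relative frequencies:

```python
from collections import Counter

def naive_bayes(rows, labels, instance):
    """Score each class by p(class) * prod(p(attr=value | class)),
    assuming the attributes are independent given the class."""
    n = len(labels)
    scores = {}
    for cls, cls_count in Counter(labels).items():
        score = cls_count / n                    # prior probability
        for attr, value in instance.items():     # conditional probabilities
            match = sum(1 for row, lab in zip(rows, labels)
                        if lab == cls and row[attr] == value)
            score *= match / cls_count
        scores[cls] = score
    return max(scores, key=scores.get)

rows = [{"season": "winter"}, {"season": "winter"}, {"season": "summer"}]
labels = ["late", "late", "on time"]
print(naive_bayes(rows, labels, {"season": "winter"}))   # late
```

This sketch also exposes the weakness noted on the next slide: with few matching instances, a relative frequency of zero wipes out a whole class's score.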
Bayesian classification

Problems
- It relies on all attributes being categorical
- Estimating probabilities by relative frequencies can give a poor estimate if the number of instances with a given attribute/value combination is small
- A test set is used to determine the accuracy of the model
- Usually, the given dataset is divided into
  - Training set: used to build the model
  - Test set: used to test the accuracy of the model
Nearest Neighbor Classification

- Used when all attribute values are continuous
- The idea is to estimate the classification of an unseen instance using the classification of the instance or instances that are closest to it, in some sense that we need to define
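One common choice of "closest" is Euclidean distance; the idea can then be sketched as a 1-nearest-neighbour classifier (the helper names and sample points are hypothetical):

```python
import math

def nearest_neighbour(train, instance):
    """Return the label of the training point closest to `instance`
    (Euclidean distance); assumes all attribute values are continuous."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    _, label = min(train, key=lambda t: dist(t[0], instance))
    return label

train = [((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbour(train, (1.5, 1.2)))   # A
```

A k-nearest-neighbour variant would take the majority label among the k closest points instead of just the single closest one.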
Rule-Based Classification