
CLASSIFICATION IN DATA MINING

SUSHIL KULKARNI

Classification

o What is classification?
o Model construction
o ID3
o Information theory
o Naïve Bayesian classifier


CLASSIFICATION PROBLEM

Given a database D = {t1, t2, …, tn} and a set of classes C = {C1, …, Cm}, the classification problem is to define a mapping f : D → C where each ti is assigned to one class.

The mapping is constructed from the data with the help of a given set of data called the training set.

CLASSIFICATION EXAMPLES

o Assign students grades A, B, C, D, or F based on their marks.
o Identify mushrooms as poisonous or edible.
o Identify individuals with credit risks.

Why Classification? A motivating application

Credit approval

o A bank wants to classify its customers based on whether they are expected to pay back their approved loans.
o The history of past customers is used to train the classifier.
o The classifier provides rules which identify potentially reliable future customers.

Why Classification? A motivating application

Credit approval

o Classification rule:
  If age = "31...40" and income = high then credit_rating = excellent
o Future customers:
  Suhas: age = 35, income = high ⇒ excellent credit rating
  Heena: age = 20, income = medium ⇒ fair credit rating

Classification: A Two-Step Process

o Model construction: describe a set of predetermined classes (Excellent and Fair) using the training set.
o Model usage: apply the constructed model, for example as classification rules, to classify future or unknown samples.

Supervised Learning

o Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
o New data is classified based on the training set.

Classification Process (1): Model Construction

Training data:

NAME    RANK            YEARS  TEACH
Henna   Assistant Prof  3      no
Leena   Assistant Prof  7      yes
Meena   Professor       2      yes
Dinesh  Associate Prof  7      yes
Dinu    Assistant Prof  6      no
Amar    Associate Prof  3      no

A classification algorithm learns a classifier (the model) from the training data; here the model is the rule:

IF rank = 'professor' OR years > 6 THEN teach = 'yes'
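To make the model concrete, here is a minimal Python sketch (not part of the original slides; the function name classify is an illustrative choice) that applies the learned rule:

```python
def classify(rank, years):
    """Apply the learned model: IF rank = 'professor' OR years > 6 THEN teach = 'yes'."""
    return "yes" if rank == "professor" or years > 6 else "no"

# The unseen sample from the next slide: (Dina, Professor, 4)
print(classify("professor", 4))   # -> 'yes'
```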

Classification Process (2): Use the Model in Prediction

The classifier is first run on testing data (to estimate its accuracy) and then on unseen data.

Testing data:

NAME    RANK            YEARS  TEACH
Swati   Assistant Prof  2      no
Malika  Associate Prof  7      no
Tina    Professor       5      yes
June    Assistant Prof  7      yes

Unseen data: (Dina, Professor, 4) → teach? The model predicts 'yes'.

Model Construction: Example

Sr. Gender Age BP Drug

1 M 20 Normal A

2 F 73 Normal B

3 M 37 High A

4 M 33 Low B

5 F 48 High A

6 M 29 Normal A

7 F 52 Normal B

8 M 42 Low B

9 M 61 Normal B

10 F 30 Normal A

11 F 26 Low B

12 M 54 High A


Model Construction: Example

Directed tree:

Blood Pressure?
  High   → Drug A
  Low    → Drug B
  Normal → Age?
             ≤ 40 → Drug A
             > 40 → Drug B

Model Construction: Example

o If BP = High prescribe Drug A
o If BP = Low prescribe Drug B
o If BP = Normal and age ≤ 40 prescribe Drug A, else prescribe Drug B

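The three rules translate directly into code. A minimal sketch (illustrative, not from the slides; the function name prescribe is assumed):

```python
def prescribe(bp, age):
    """Decision rules read off the directed tree above."""
    if bp == "High":
        return "A"
    if bp == "Low":
        return "B"
    # bp == "Normal": split on age
    return "A" if age <= 40 else "B"

print(prescribe("Normal", 30))   # -> 'A', matching record 10 (F, 30, Normal, A)
```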

Model Construction: Example

o The tree is constructed with the training data, and there is no training error: every record is classified correctly according to the training data.
o The tree therefore yields rules with 100% accuracy and with high support.

Model Construction: Example

Accuracy and Support:

o Accuracy is 100% for the given rules.
o If BP = High prescribe Drug A (Support = 3/12)
o If BP = Low prescribe Drug B (Support = 3/12)
o If BP = Normal and age ≤ 40 prescribe Drug A, else prescribe Drug B (Support = 3/12)

Error and Support

Let t = total no. of data points, r = no. of data points at a node, max = no. of data points in the node's majority class, min = no. of data points in the node's minority class. Then:

o Accuracy = max / r
o Error = min / r
o Support = max / t

Note that support for the class is calculated with respect to the total number of data points in the given set.
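These three measures are easy to compute. A minimal sketch (illustrative; node_stats is an assumed name), checked against node P on the next slide:

```python
def node_stats(class_counts, t):
    """class_counts: data points per class at a node; t: total points in the set."""
    r = sum(class_counts)
    return {"accuracy": max(class_counts) / r,   # max / r
            "error":    min(class_counts) / r,   # min / r
            "support":  max(class_counts) / t}   # max / t

print(node_stats([115, 5], t=180))
# {'accuracy': 0.958..., 'error': 0.041..., 'support': 0.638...}
```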

Rules with different accuracy & support

A set of 180 data points is split on X into two nodes:

Node P (X < 60): 115 in class A, 5 in class B
  A = 115/120, E = 5/120, S = 115/180

Node Q (X > 60): 58 in class A, 2 in class B
  A = 58/60, E = 2/60, S = 58/180

Criteria to grow the tree

o If the target variable is categorical, the tree is called a classification tree [e.g. drug prescribed].
o If the target variable is continuous, the tree is called a regression tree [e.g. income].

CLASSIFICATION TREES FOR CATEGORICAL ATTRIBUTES

INDUCTION OF DECISION TREES [ID3]

Decision tree generation consists of two phases:

o Tree construction
  • At start, all the training examples are at the root
  • Partition examples recursively based on selected attributes
o Tree pruning
  • Identify and remove branches that reflect noise or outliers

o Use of the decision tree: classify an unknown sample by testing its attribute values against the decision tree.

Training Dataset

This follows an example from Quinlan's ID3.

No  age     income  student  credit_rating  buys_computer
1   <=30    high    no       fair           no
2   <=30    high    no       excellent      no
3   31…40   high    no       fair           yes
4   >40     medium  no       fair           yes
5   >40     low     yes      fair           yes
6   >40     low     yes      excellent      no
7   31…40   low     yes      excellent      yes
8   <=30    medium  no       fair           no
9   <=30    low     yes      fair           yes
10  >40     medium  yes      fair           yes
11  <=30    medium  yes      excellent      yes
12  31…40   medium  no       excellent      yes
13  31…40   high    yes      fair           yes
14  >40     medium  no       excellent      no

Output: ID3 for "buys_computer"

'no' and 'yes' are the two classes created.

age?
  <=30   → student?
             no  → no
             yes → yes
  31..40 → yes
  >40    → credit_rating?
             excellent → no
             fair      → yes

ANOTHER EXAMPLE: MARKS

o If x >= 90 then grade = A.
o If 80 <= x < 90 then grade = B.
o If 70 <= x < 80 then grade = C.
o If 60 <= x < 70 then grade = D.
o If x < 60 then grade = F.

As a decision tree:

x?
  >= 90 → A
  < 90  → x?
            >= 80 → B
            < 80  → x?
                      >= 70 → C
                      < 70  → x?
                                >= 60 → D
                                < 60  → F

ALGORITHM FOR ID3

Basic algorithm (a greedy algorithm):

o The tree is constructed in a top-down recursive divide-and-conquer manner.
o At start, all the training examples are at the root.
o Attributes are categorical.
o Samples are partitioned recursively based on selected attributes.
o Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).

ALGORITHM FOR ID3

Conditions for stopping the partitioning:

o All samples for a given node belong to the same class.
o There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf.
o There are no samples left.
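Putting the algorithm and the stopping conditions together, here is a compact recursive sketch in Python (illustrative only; the helper names entropy and id3 are assumptions, and rows are dicts mapping attribute names to values):

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # all samples in the same class
        return labels[0]
    if not attributes:                 # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                       # information gain of splitting on a
        e = 0.0
        for v in set(r[a] for r in rows):
            sub = [r[target] for r in rows if r[a] == v]
            e += len(sub) / len(rows) * entropy(sub)
        return entropy(labels) - e

    best = max(attributes, key=gain)   # greedy choice: highest gain
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in set(r[best] for r in rows)}}
```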

ID3: ADVANTAGES

o Easy to understand.

ID3: DISADVANTAGES

o Does not easily handle continuous data.
o Trees can be quite large; pruning is necessary.


INFORMATION THEORY

o When all the marbles in the bowl are mixed up, little information is given.
o When the marbles are separated into different classes, more information is given.

ENTROPY

o Entropy measures the amount of information and is used to select the best splitting attribute for a tree.
o The classes are 'yes' or 'no' in our example.

BUILDING THE TREE

Information Gain in ID3

o Select the attribute with the highest information gain.
o Assume there are two classes, P and N.
o Let the set S contain p elements of class P and n elements of class N.
o The amount of information needed to decide if an arbitrary object in S belongs to P or N is defined as

$$ I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} $$
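As a quick check of the formula, a minimal Python sketch (illustrative; the name info is an assumption):

```python
import math

def info(p, n):
    """I(p, n): expected information for a two-class set."""
    total = p + n
    return sum(-(x / total) * math.log2(x / total) for x in (p, n) if x > 0)

print(round(info(9, 5), 3))   # 0.94 for the 9 'yes' / 5 'no' training set above
```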

Information Gain in Decision Tree Induction

o Assume that using attribute A, a set S will be partitioned into sets {S1, S2, …, Sν}.
o If Si contains pi elements of P and ni elements of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

$$ E(A) = \sum_{i=1}^{\nu} \frac{p_i + n_i}{p + n}\, I(p_i, n_i) $$

o The encoding information that would be gained by branching on A is

$$ \mathrm{Gain}(A) = I(p, n) - E(A) $$


Attribute Selection by Information Gain Computation

Class P: buys_computer = "yes". Class N: buys_computer = "no".

I(p, n) = I(9, 5) = 0.940

For age:

age     pi  ni  I(pi, ni)
<=30    2   3   0.971
31..40  4   0   0
>40     3   2   0.971

E(age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

Hence Gain(age) = I(p, n) − E(age) = 0.246

Similarly, Gain(income) = 0.029, Gain(student) = 0.151 and Gain(credit_rating) = 0.048.

Age gives the maximum gain, so it is selected as the splitting attribute.
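The whole attribute-selection step can be verified with a short script over the 14-record table above (a sketch; variable names are illustrative):

```python
import math

# (age, income, student, credit_rating, buys_computer) for the 14 records above
data = [
    ("<=30","high","no","fair","no"),         ("<=30","high","no","excellent","no"),
    ("31..40","high","no","fair","yes"),      (">40","medium","no","fair","yes"),
    (">40","low","yes","fair","yes"),         (">40","low","yes","excellent","no"),
    ("31..40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
    ("<=30","low","yes","fair","yes"),        (">40","medium","yes","fair","yes"),
    ("<=30","medium","yes","excellent","yes"),("31..40","medium","no","excellent","yes"),
    ("31..40","high","yes","fair","yes"),     (">40","medium","no","excellent","no"),
]

def info(p, n):
    total = p + n
    return sum(-(x / total) * math.log2(x / total) for x in (p, n) if x > 0)

def gain(col):
    p = sum(1 for row in data if row[-1] == "yes")       # 9 'yes' records
    e = 0.0
    for v in {row[col] for row in data}:
        sub = [row for row in data if row[col] == v]
        pi = sum(1 for row in sub if row[-1] == "yes")
        e += len(sub) / len(data) * info(pi, len(sub) - pi)
    return info(p, len(data) - p) - e                    # I(9,5) - E(A)

for name, col in [("age", 0), ("income", 1), ("student", 2), ("credit_rating", 3)]:
    print(name, round(gain(col), 3))
# age 0.246, income 0.029, student 0.151, credit_rating 0.048
```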

Splitting the samples using age

age?

Branch age <= 30:

income  student  credit_rating  buys_computer
high    no       fair           no
high    no       excellent      no
medium  no       fair           no
low     yes      fair           yes
medium  yes      excellent      yes

Branch age 31...40 (every record has buys_computer = yes, so the node becomes a leaf labeled 'yes'):

income  student  credit_rating  buys_computer
high    no       fair           yes
low     yes      excellent      yes
medium  no       excellent      yes
high    yes      fair           yes

Branch age > 40:

income  student  credit_rating  buys_computer
medium  no       fair           yes
low     yes      fair           yes
low     yes      excellent      no
medium  yes      fair           yes
medium  no       excellent      no

Output: ID3 for "buys_computer"

age?
  <=30   → student?
             no  → no
             yes → yes
  31..40 → yes
  >40    → credit_rating?
             excellent → no
             fair      → yes

CART


CART [CLASSIFICATION AND REGRESSION TREES]

o The algorithm is similar to ID3 but uses the Gini index, an impurity measure, to select variables.
o If the target variable is nominal and has more than two categories, merging the target categories into two super-categories may be considered. This process is called twoing.

Gini Index (IBM Intelligent Miner)

If a data set T contains examples from n classes, the Gini index gini(T) is defined as

$$ \mathrm{gini}(T) = 1 - \sum_{j=1}^{n} p_j^2 $$

where pj is the relative frequency of class j in T.
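A one-function sketch of the measure (illustrative naming):

```python
def gini(counts):
    """gini(T) = 1 - sum_j p_j^2, where counts holds the class frequencies in T."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(round(gini([9, 5]), 3))   # 0.459 for a 9/5 class split; 0.0 for a pure node
```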

Extracting Classification Rules from Trees

o Represent the knowledge in the form of IF-THEN rules.
o One rule is created for each path from the root to a leaf.
o Each attribute-value pair along a path forms a conjunction.

Extracting Classification Rules from Trees

o The leaf node holds the class prediction.
o Rules are easy for humans to understand.

Example:

IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

BAYESIAN CLASSIFICATION

Classification and Regression

o What is classification? What is regression?
o Issues regarding classification and regression
o Classification by decision tree induction
o Bayesian classification
o Other classification methods
o Regression

What is Bayesian Classification?

o Bayesian classifiers are statistical classifiers.
o For a given sample, they predict the probability that the sample belongs to a class (for all classes).

What is Bayesian Classification?

Example: predict the probability that a sample belongs to class buys_computer = "yes" given that age = "<=30", income = "medium", student = "no" and credit_rating = "fair".

Naive Bayesian Classifier

Example: shall we play tennis?

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

Naive Bayesian Classifier: Example

The 9 records of class P:

Outlook   Temperature  Humidity  Windy  Class
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
overcast  cool         normal    true   P
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P

The 5 records of class N:

Outlook  Temperature  Humidity  Windy  Class
sunny    hot          high      false  N
sunny    hot          high      true   N
rain     cool         normal    true   N
sunny    mild         high      false  N
rain     mild         high      true   N

Naive Bayesian Classifier: Example

Given the training set, we compute the probabilities:

Outlook   P    N
sunny     2/9  3/5
overcast  4/9  0
rain      3/9  2/5

Temperature  P    N
hot          2/9  2/5
mild         4/9  2/5
cool         3/9  1/5

Humidity  P    N
high      3/9  4/5
normal    6/9  1/5

Windy  P    N
true   3/9  3/5
false  6/9  2/5

P(P) = 9/14 and P(N) = 5/14

Naive Bayesian Classifier

o We use the notation P(A) for the probability of an event A, and P(A|B) for the probability of A conditional on another event B.
o If H is the hypothesis and E is the evidence (the combination of attribute values), then

$$ p(H \mid E) = \frac{p(E \mid H)\, p(H)}{p(E)} $$

o Example: let H be 'yes' and E the combination of attribute values for a new day: outlook = sunny, temperature = cool, humidity = high, windy = true. Call these four pieces E1, E2, E3 and E4. If they are independent, then

$$ p(H \mid E) = \frac{p(E_1 \mid H)\, p(E_2 \mid H)\, p(E_3 \mid H)\, p(E_4 \mid H)\, p(H)}{p(E)} $$

Naive Bayesian Classifier

The denominator p(E) can be dropped and recovered as a final normalizing step, since the probabilities of the different classes must sum to 1.

Naive Bayesian Classifier: Example

To classify a new day E:

outlook = sunny, temperature = cool, humidity = high, windy = false

Prob(P|E) ∝ Prob(P) * Prob(sunny|P) * Prob(cool|P) * Prob(high|P) * Prob(false|P)
          = 9/14 * 2/9 * 3/9 * 3/9 * 6/9 = 0.01

Prob(N|E) ∝ Prob(N) * Prob(sunny|N) * Prob(cool|N) * Prob(high|N) * Prob(false|N)
          = 5/14 * 3/5 * 1/5 * 4/5 * 2/5 = 0.013

Naive Bayesian Classifier: Example

Probability of 'Playing':

0.01 / (0.01 + 0.013) = 43%

Probability of 'Not Playing':

0.013 / (0.01 + 0.013) = 57%

Therefore E takes class label N.
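The full classifier fits in a few lines of Python. This sketch (illustrative names; the counts are taken straight from the training table above) reproduces the example, with slightly more precise values than the slides' rounding:

```python
from collections import Counter, defaultdict

# (outlook, temperature, humidity, windy) -> class, from the training table above
data = [
    (("sunny","hot","high","false"),"N"),      (("sunny","hot","high","true"),"N"),
    (("overcast","hot","high","false"),"P"),   (("rain","mild","high","false"),"P"),
    (("rain","cool","normal","false"),"P"),    (("rain","cool","normal","true"),"N"),
    (("overcast","cool","normal","true"),"P"), (("sunny","mild","high","false"),"N"),
    (("sunny","cool","normal","false"),"P"),   (("rain","mild","normal","false"),"P"),
    (("sunny","mild","normal","true"),"P"),    (("overcast","mild","high","true"),"P"),
    (("overcast","hot","normal","false"),"P"), (("rain","mild","high","true"),"N"),
]

prior = Counter(cls for _, cls in data)          # {'P': 9, 'N': 5}
cond = defaultdict(Counter)                      # per-(attribute, value) class counts
for features, cls in data:
    for i, v in enumerate(features):
        cond[(i, v)][cls] += 1

def score(features, cls):
    s = prior[cls] / len(data)                   # e.g. P(P) = 9/14
    for i, v in enumerate(features):
        s *= cond[(i, v)][cls] / prior[cls]      # e.g. P(sunny|P) = 2/9
    return s

e = ("sunny", "cool", "high", "false")
sp, sn = score(e, "P"), score(e, "N")
print(f"P(play) = {sp / (sp + sn):.0%}")         # ~44%, so E is classified as N
```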

Naive Bayesian Classifier: Example

Second example: X = <rain, hot, high, false>

P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p)
              = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582

P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n)
              = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

Sample X is classified in class N (don't play).

Naive Bayesian Classifier: Example

Probability of 'Playing':

0.010582 / (0.010582 + 0.018286) = 37%

Probability of 'Not Playing':

0.018286 / (0.010582 + 0.018286) = 63%

Therefore X takes class label N.

REGRESSION


What is Regression?

o Regression is similar to classification:
  • First, construct a model.
  • Second, use the model to predict unknown values.
o The major method for regression is regression analysis:
  • Linear and multiple regression
  • Non-linear regression
o Regression is different from classification:
  • Classification predicts categorical class labels.
  • Regression models continuous-valued functions.

Predictive Modeling in Databases

o Predictive modeling: predict data values or construct generalized linear models based on the database data.
o One can only predict value ranges or category distributions.
o Determine the major factors which influence the regression:
  • Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.

Regression Analysis and Log-Linear Models in Regression

o Linear regression: Y = α + βX. The two parameters α and β specify the line and are to be estimated by using the data at hand.
o They are fitted by applying the least squares criterion to the known values (x1, y1), (x2, y2), ..., (xs, ys):

$$ \beta = \frac{\sum_{i=1}^{s} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{s} (x_i - \bar{x})^2}, \qquad \alpha = \bar{y} - \beta \bar{x} $$
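A direct translation of the two estimators (a sketch; the (x, y) pairs are made-up illustrative data, not from the slides):

```python
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]        # hypothetical x values
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]  # hypothetical y values

x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs))
alpha = y_bar - beta * x_bar
print(f"Y = {alpha:.2f} + {beta:.2f} X")       # the fitted line
```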

Regression Analysis and Log-Linear Models in Regression

o Multiple regression: Y = b0 + b1X1 + b2X2. Many nonlinear functions can be transformed into the above. E.g., Y = b0 + b1X + b2X² + b3X³ with X1 = X, X2 = X², X3 = X³.
o Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables:

$$ p(a, b, c, d) = \alpha_{ab}\, \beta_{ac}\, \chi_{ad}\, \delta_{bcd} $$
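The transformation trick in code: fit the cubic Y = b0 + b1X + b2X² + b3X³ as a multiple linear regression (a sketch with synthetic data; numpy's least-squares solver is used):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2 + 1 * x - 0.5 * x**2 + 0.1 * x**3       # synthetic data with known coefficients

# Introduce X1 = X, X2 = X^2, X3 = X^3 and solve the resulting linear system.
A = np.column_stack([np.ones_like(x), x, x**2, x**3])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(b, 3))                          # [ 2.   1.  -0.5  0.1]
```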

THANKS!

SUSHIL KULKARNI
