
: M0614 / Data Mining & OLAP : Feb - 2010

**Classification and Prediction**

Session 08

Learning Outcomes

By the end of this session, students are expected to be able to:

• Apply classification by decision tree induction, Bayesian classification, classification by back propagation, and lazy learners in data mining. (C3)


Bina Nusantara

Acknowledgments

These slides have been adapted from Han, J., Kamber, M., & Pei, J., Data Mining: Concepts and Techniques, and Tan, P.-N., Steinbach, M., & Kumar, V., Introduction to Data Mining.


Outline Materi

• Bayesian classification



**Bayesian Classification: Why?**

• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.
• Foundation: based on Bayes' theorem.
• Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable with decision tree and selected neural network classifiers.
• Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data.
• Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured.

June 20, 2010

Data Mining: Concepts and Techniques


**Bayesian Theorem: Basics**

• Let X be a data sample ("evidence"); its class label is unknown.
• Let H be the hypothesis that X belongs to class C.
• Classification is to determine P(H|X), the posterior probability: the probability that the hypothesis holds given the observed data sample X.
• P(H) is the prior probability, the initial probability.
  – E.g., X will buy a computer, regardless of age, income, …
• P(X) is the probability that the sample data is observed.
• P(X|H) is the likelihood: the probability of observing the sample X given that the hypothesis holds.
  – E.g., given that X will buy a computer, the probability that X is 31..40 with medium income.


Bayesian Theorem

• Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be written as: posterior = likelihood × prior / evidence.
• Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes.
• Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost.


**Example of Bayes Theorem**

• Given:
  – A doctor knows that meningitis causes stiff neck 50% of the time.
  – The prior probability of any patient having meningitis is 1/50,000.
  – The prior probability of any patient having stiff neck is 1/20.
• If a patient has a stiff neck, what is the probability that he/she has meningitis?

P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
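The meningitis calculation above is a direct application of Bayes' theorem; a minimal sketch using the numbers from the slide:

```python
# Bayes' theorem for the meningitis example; all numbers are from the slide.
p_s_given_m = 0.5        # P(S|M): stiff neck given meningitis
p_m = 1 / 50000          # P(M): prior probability of meningitis
p_s = 1 / 20             # P(S): prior probability of stiff neck

# P(M|S) = P(S|M) P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```

Note that even though meningitis causes stiff neck half the time, the posterior is tiny because the prior P(M) is so small.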

Bayesian Classifiers

• Consider each attribute and the class label as random variables.
• Given a record with attributes (A1, A2, …, An):
  – The goal is to predict class C.
  – Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An).
• Can we estimate P(C | A1, A2, …, An) directly from data?

Bayesian Classifiers

• Approach:
  – Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes' theorem:

P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

  – Choose the value of C that maximizes P(C | A1, A2, …, An).
  – This is equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C).
• How do we estimate P(A1, A2, …, An | C)?

**Naïve Bayes Classifier**

• Assume independence among attributes Ai when the class is given:
  – P(A1, A2, …, An | Cj) = P(A1|Cj) P(A2|Cj) … P(An|Cj)
  – We can estimate P(Ai|Cj) for all Ai and Cj.
  – A new point is classified to Cj if P(Cj) Π P(Ai|Cj) is maximal.

**How to Estimate Probabilities from Data?**

| Tid | Refund | Marital Status | Taxable Income | Evade |
|-----|--------|----------------|----------------|-------|
| 1   | Yes    | Single         | 125K           | No    |
| 2   | No     | Married        | 100K           | No    |
| 3   | No     | Single         | 70K            | No    |
| 4   | Yes    | Married        | 120K           | No    |
| 5   | No     | Divorced       | 95K            | Yes   |
| 6   | No     | Married        | 60K            | No    |
| 7   | Yes    | Divorced       | 220K           | No    |
| 8   | No     | Single         | 85K            | Yes   |
| 9   | No     | Married        | 75K            | No    |
| 10  | No     | Single         | 90K            | Yes   |

• Class prior: P(C) = Nc/N
  – e.g., P(No) = 7/10, P(Yes) = 3/10

**• For discrete attributes: P(Ai | Ck) = |Aik| / Nck**

  – where |Aik| is the number of instances having attribute value Ai that belong to class Ck.
  – Examples:
    P(Status=Married | No) = 4/7
    P(Refund=Yes | Yes) = 0
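These counting estimates can be sketched directly from the ten training records above (a minimal sketch; `p_attr_given_class` is an illustrative helper, not from the slides):

```python
# The ten training records from the table: (Refund, Marital Status, Evade).
records = [
    ("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
    ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]

def p_attr_given_class(attr_idx, value, cls):
    """Estimate P(Ai = value | Class = cls) by counting, as on the slide."""
    in_class = [r for r in records if r[2] == cls]
    matches = sum(1 for r in in_class if r[attr_idx] == value)
    return matches / len(in_class)

print(p_attr_given_class(1, "Married", "No"))   # 4/7 ≈ 0.571
print(p_attr_given_class(0, "Yes", "Yes"))      # 0.0
```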

**How to Estimate Probabilities from Data?**

• For continuous attributes:
  – Discretize the range into bins:
    • one ordinal attribute per bin
    • violates the independence assumption
  – Two-way split: (A < v) or (A > v):
    • choose only one of the two splits as the new attribute
  – Probability density estimation:
    • assume the attribute follows a normal distribution
    • use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    • once the probability distribution is known, use it to estimate the conditional probability P(Ai|c)

**How to Estimate Probabilities from Data?**

• Normal distribution:

P(Ai | cj) = (1 / √(2π σij²)) exp( −(Ai − μij)² / (2σij²) )

  – one distribution for each (Ai, cj) pair

**• For (Income, Class=No):**
  – sample mean = 110
  – sample variance = 2975

P(Income = 120 | No) = (1 / (√(2π) × 54.54)) e^( −(120 − 110)² / (2 × 2975) ) = 0.0072
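The density calculation above can be reproduced directly (a minimal sketch; the `gaussian` helper is illustrative, with the mean and variance taken from the slide):

```python
import math

def gaussian(x, mean, var):
    """Normal density used by naive Bayes for continuous attributes."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Income given Class=No: sample mean 110, sample variance 2975 (from the slide).
p = gaussian(120, 110, 2975)
print(round(p, 4))  # 0.0072
```

Note that √2975 ≈ 54.54, which is where the 54.54 in the slide's formula comes from.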

**Example of Naïve Bayes Classifier**

Given a Test Record:

**X = (Refund = No, Married, Income = 120K)**

Naïve Bayes classifier (estimates from the training data):

P(Refund=Yes | No) = 3/7
P(Refund=No | No) = 4/7
P(Refund=Yes | Yes) = 0
P(Refund=No | Yes) = 1
P(Marital Status=Single | No) = 2/7
P(Marital Status=Divorced | No) = 1/7
P(Marital Status=Married | No) = 4/7
P(Marital Status=Single | Yes) = 2/7
P(Marital Status=Divorced | Yes) = 1/7
P(Marital Status=Married | Yes) = 0

For taxable income:
  If Class=No: sample mean = 110, sample variance = 2975
  If Class=Yes: sample mean = 90, sample variance = 25

P(X | Class=No) = P(Refund=No | No) × P(Married | No) × P(Income=120K | No)
                = 4/7 × 4/7 × 0.0072 = 0.0024

P(X | Class=Yes) = P(Refund=No | Yes) × P(Married | Yes) × P(Income=120K | Yes)
                 = 1 × 0 × 1.2×10⁻⁹ = 0

Since P(X|No) P(No) > P(X|Yes) P(Yes), it follows that P(No|X) > P(Yes|X).

=> Class = No
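Putting the pieces together, the classification of the test record can be sketched as follows (a minimal sketch; all probabilities and Gaussian parameters are the ones from the slide):

```python
import math

def gaussian(x, mean, var):
    """Normal density for the continuous Income attribute."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# X = (Refund=No, Married, Income=120K); score each class by P(C) * prod P(Ai|C).
score_no = (7 / 10) * (4 / 7) * (4 / 7) * gaussian(120, 110, 2975)
score_yes = (3 / 10) * 1.0 * 0.0 * gaussian(120, 90, 25)

print(score_no > score_yes)  # True -> Class = No
```

The zero factor P(Married | Yes) = 0 drives the Yes score to exactly zero, which is precisely the problem the Laplace correction on the next slide addresses.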

**Naïve Bayes Classifier**

• If one of the conditional probabilities is zero, the entire expression becomes zero.
• Probability estimation:

Original:     P(Ai | C) = Nic / Nc
Laplace:      P(Ai | C) = (Nic + 1) / (Nc + c)
m-estimate:   P(Ai | C) = (Nic + m·p) / (Nc + m)

where c is the number of classes, p is a prior probability, and m is a parameter.
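The Laplace correction can be sketched as below (a minimal sketch; following the slide, `c` is taken as the number of classes, and the counts are the zero-probability case from the earlier example):

```python
def laplace(n_ic, n_c, c):
    """Laplace-corrected estimate P(Ai|C) = (N_ic + 1) / (N_c + c)."""
    return (n_ic + 1) / (n_c + c)

# P(Refund=Yes | Evade=Yes) was 0/3 in the earlier table; with c = 2 the
# smoothed estimate is small but no longer zero, so one unseen attribute
# value no longer forces the whole class score to zero.
print(laplace(0, 3, 2))  # 0.2
```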

**Example of Naïve Bayes Classifier**

| Name          | Give Birth | Can Fly | Live in Water | Have Legs | Class       |
|---------------|------------|---------|---------------|-----------|-------------|
| human         | yes        | no      | no            | yes       | mammals     |
| python        | no         | no      | no            | no        | non-mammals |
| salmon        | no         | no      | yes           | no        | non-mammals |
| whale         | yes        | no      | yes           | no        | mammals     |
| frog          | no         | no      | sometimes     | yes       | non-mammals |
| komodo        | no         | no      | no            | yes       | non-mammals |
| bat           | yes        | yes     | no            | yes       | mammals     |
| pigeon        | no         | yes     | no            | yes       | non-mammals |
| cat           | yes        | no      | no            | yes       | mammals     |
| leopard shark | yes        | no      | yes           | no        | non-mammals |
| turtle        | no         | no      | sometimes     | yes       | non-mammals |
| penguin       | no         | no      | sometimes     | yes       | non-mammals |
| porcupine     | yes        | no      | no            | yes       | mammals     |
| eel           | no         | no      | yes           | no        | non-mammals |
| salamander    | no         | no      | sometimes     | yes       | non-mammals |
| gila monster  | no         | no      | no            | yes       | non-mammals |
| platypus      | no         | no      | no            | yes       | mammals     |
| owl           | no         | yes     | no            | yes       | non-mammals |
| dolphin       | yes        | no      | yes           | no        | mammals     |
| eagle         | no         | yes     | no            | yes       | non-mammals |

A: attributes M: mammals N: non-mammals

P(A|M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A|N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A|M) P(M) = 0.06 × 7/20 = 0.021
P(A|N) P(N) = 0.0042 × 13/20 = 0.0027

P(A|M)P(M) > P(A|N)P(N) => Mammals

Test record A: Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no, Class = ?
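The mammal/non-mammal scores above can be reproduced from the likelihoods read off the animal table (a minimal sketch using the slide's numbers):

```python
# Class priors: 7 mammals and 13 non-mammals out of 20 animals.
p_m, p_n = 7 / 20, 13 / 20

# Likelihoods of A = (Give Birth=yes, Can Fly=no, Live in Water=yes,
# Have Legs=no) for each class, counted from the table.
p_a_m = (6 / 7) * (6 / 7) * (2 / 7) * (2 / 7)        # P(A|Mammals)
p_a_n = (1 / 13) * (10 / 13) * (3 / 13) * (4 / 13)   # P(A|Non-mammals)

print(round(p_a_m * p_m, 3))      # 0.021
print(p_a_m * p_m > p_a_n * p_n)  # True -> mammals
```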

**Example Naïve Bayesian Classifier: Training Dataset**

Classes:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Data sample: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

| age   | income | student | credit_rating | buys_computer |
|-------|--------|---------|---------------|---------------|
| <=30  | high   | no      | fair          | no            |
| <=30  | high   | no      | excellent     | no            |
| 31…40 | high   | no      | fair          | yes           |
| >40   | medium | no      | fair          | yes           |
| >40   | low    | yes     | fair          | yes           |
| >40   | low    | yes     | excellent     | no            |
| 31…40 | low    | yes     | excellent     | yes           |
| <=30  | medium | no      | fair          | no            |
| <=30  | low    | yes     | fair          | yes           |
| >40   | medium | yes     | fair          | yes           |
| <=30  | medium | yes     | excellent     | yes           |
| 31…40 | medium | no      | excellent     | yes           |
| 31…40 | high   | yes     | fair          | yes           |
| >40   | medium | no      | excellent     | no            |



**Example Naïve Bayesian Classifier: Training Dataset**

• P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no") = 5/14 = 0.357

• Compute P(X|Ci) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4

• For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

• P(X|Ci) × P(Ci):
  P(X | buys_computer = "yes") × P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") × P(buys_computer = "no") = 0.007

Therefore, X belongs to the class buys_computer = "yes".
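The buys_computer decision can be reproduced from these estimates (a minimal sketch; priors and likelihood factors are the ones computed on the slide):

```python
# Class priors and per-class products of P(Ai|C) for
# X = (age<=30, income=medium, student=yes, credit_rating=fair).
prior = {"yes": 9 / 14, "no": 5 / 14}
likelihood = {
    "yes": (2 / 9) * (4 / 9) * (6 / 9) * (6 / 9),
    "no": (3 / 5) * (2 / 5) * (1 / 5) * (2 / 5),
}

# Score each class by P(X|Ci) * P(Ci) and pick the maximum.
scores = {c: prior[c] * likelihood[c] for c in prior}
best = max(scores, key=scores.get)
print(best)  # yes
```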


**Naïve Bayes: Summary**

• Robust to isolated noise points.
• Handles missing values by ignoring the instance during probability estimate calculations.
• Robust to irrelevant attributes.
• The independence assumption may not hold for some attributes.
  – Use other techniques such as Bayesian Belief Networks (BBN).

**Naïve Bayesian Classifier: Comments**

• Advantages:
  – Easy to implement.
  – Good results obtained in most cases.
• Disadvantages:
  – Assumption of class-conditional independence, and therefore a loss of accuracy.
  – In practice, dependencies exist among variables.
    • E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.).
    • Dependencies among these cannot be modeled by a naïve Bayesian classifier.
• How to deal with these dependencies? Bayesian Belief Networks.

Continued in Session 09

Classification and Prediction (cont.)

