
M0614 / Data Mining & OLAP : Feb 2010

**Classification and Prediction**

Session 08

Learning Outcomes

By the end of this session, students are expected to be able to:

• Apply classification by decision tree induction, Bayesian classification, classification by back propagation, and lazy learners to data mining tasks. (C3)


Bina Nusantara

Acknowledgments

These slides have been adapted from Han, J., Kamber, M., & Pei, J., Data Mining: Concepts and Techniques, and Tan, P.-N., Steinbach, M., & Kumar, V., Introduction to Data Mining.


Outline

• Bayesian classification


**Bayesian Classification: Why?**

• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
• Foundation: based on Bayes' theorem
• Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable with decision tree and selected neural network classifiers
• Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
• Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured

June 20, 2010

Data Mining: Concepts and Techniques

6

**Bayes' Theorem: Basics**

• Let X be a data sample ("evidence"): the class label is unknown
• Let H be a hypothesis that X belongs to class C
• Classification is to determine P(H|X), the posterior probability: the probability that the hypothesis holds given the observed data sample X
• P(H) is the prior probability: the initial probability
  – E.g., X will buy a computer, regardless of age, income, …
• P(X) is the probability that the sample data is observed
• P(X|H) is the likelihood: the probability of observing the sample X, given that the hypothesis holds
  – E.g., given that X will buy a computer, the probability that X is 31..40 with medium income


Bayes' Theorem

• Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be written as: posterior = likelihood × prior / evidence
• Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes
• Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost


**Example of Bayes' Theorem**

• Given:
  – A doctor knows that meningitis causes stiff neck 50% of the time
  – The prior probability of any patient having meningitis is 1/50,000
  – The prior probability of any patient having stiff neck is 1/20
• If a patient has a stiff neck, what is the probability that he/she has meningitis?

P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002
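The meningitis calculation above can be sketched in a few lines of Python (a minimal illustration; the variable names are ours, not from the slides):

```python
# Bayes' theorem for the meningitis example:
# P(M|S) = P(S|M) * P(M) / P(S)
p_s_given_m = 0.5        # meningitis causes stiff neck 50% of the time
p_m = 1 / 50_000         # prior probability of meningitis
p_s = 1 / 20             # prior probability of stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 4))  # 0.0002
```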

Bayesian Classifiers

• Consider each attribute and the class label as random variables
• Given a record with attributes (A1, A2, …, An):
  – Goal is to predict class C
  – Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)
• Can we estimate P(C | A1, A2, …, An) directly from the data?

Bayesian Classifiers

• Approach:
  – Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes' theorem:

P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

  – Choose the value of C that maximizes P(C | A1, A2, …, An)
  – Equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C)
• How to estimate P(A1, A2, …, An | C)?

**Naïve Bayes Classifier**

• Assume independence among the attributes Ai when the class is given:
  – P(A1, A2, …, An | Cj) = P(A1|Cj) P(A2|Cj) … P(An|Cj)
  – Can estimate P(Ai|Cj) for all Ai and Cj
  – A new point is classified to Cj if P(Cj) Π P(Ai|Cj) is maximal

**How to Estimate Probabilities from Data?**

| Tid | Refund | Marital Status | Taxable Income | Evade |
|-----|--------|----------------|----------------|-------|
| 1   | Yes    | Single         | 125K           | No    |
| 2   | No     | Married        | 100K           | No    |
| 3   | No     | Single         | 70K            | No    |
| 4   | Yes    | Married        | 120K           | No    |
| 5   | No     | Divorced       | 95K            | Yes   |
| 6   | No     | Married        | 60K            | No    |
| 7   | Yes    | Divorced       | 220K           | No    |
| 8   | No     | Single         | 85K            | Yes   |
| 9   | No     | Married        | 75K            | No    |
| 10  | No     | Single         | 90K            | Yes   |

• Class prior: P(C) = Nc/N
  – e.g., P(No) = 7/10, P(Yes) = 3/10
• For discrete attributes: P(Ai | Ck) = |Aik| / Nc,k
  – where |Aik| is the number of instances having attribute value Ai that belong to class Ck
  – Examples: P(Status=Married|No) = 4/7, P(Refund=Yes|Yes) = 0
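The counting estimates above can be sketched in Python over the slide's 10-record tax-evasion table (a minimal illustration; the tuple layout and function names are ours):

```python
# Each record: (Refund, Marital Status, Evade). Taxable Income is omitted
# here because it is a continuous attribute, handled separately.
records = [
    ("Yes", "Single",   "No"),  ("No", "Married", "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married", "No"),
    ("No",  "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"),  ("No", "Single",  "Yes"),
    ("No",  "Married",  "No"),  ("No", "Single",  "Yes"),
]

def p_class(c):
    # P(C) = Nc / N
    return sum(1 for r in records if r[2] == c) / len(records)

def p_attr_given_class(idx, value, c):
    # P(Ai | Ck) = |Aik| / Nc,k  -- count within the class only
    in_class = [r for r in records if r[2] == c]
    return sum(1 for r in in_class if r[idx] == value) / len(in_class)

print(p_class("No"))                           # 0.7
print(p_attr_given_class(1, "Married", "No"))  # 4/7
print(p_attr_given_class(0, "Yes", "Yes"))     # 0.0
```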

**How to Estimate Probabilities from Data?**

• For continuous attributes:
  – Discretize the range into bins
    • one ordinal attribute per bin
    • violates the independence assumption
  – Two-way split: (A < v) or (A > v)
    • choose only one of the two splits as the new attribute
  – Probability density estimation:
    • assume the attribute follows a normal distribution
    • use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    • once the probability distribution is known, use it to estimate the conditional probability P(Ai|c)

**How to Estimate Probabilities from Data?**

• Normal distribution:

P(Ai | cj) = (1 / √(2πσij²)) exp(−(Ai − μij)² / (2σij²))

  – one distribution for each (Ai, cj) pair
• For (Income, Class=No):
  – sample mean = 110
  – sample variance = 2975

P(Income = 120 | No) = (1 / (√(2π) · 54.54)) exp(−(120 − 110)² / (2 · 2975)) = 0.0072
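The Gaussian density evaluation above can be sketched in Python (a minimal illustration; the function name is ours):

```python
import math

# Gaussian density used to estimate P(Income = 120 | No)
# from sample mean 110 and sample variance 2975.
def gaussian_likelihood(x, mean, variance):
    return math.exp(-((x - mean) ** 2) / (2 * variance)) \
           / math.sqrt(2 * math.pi * variance)

p = gaussian_likelihood(120, 110, 2975)
print(round(p, 4))  # 0.0072
```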

**Example of Naïve Bayes Classifier**

Given a Test Record:

**X = (Refund = No, Married, Income = 120K)**

Naïve Bayes classifier (estimates from the training data):

P(Refund=Yes|No) = 3/7
P(Refund=No|No) = 4/7
P(Refund=Yes|Yes) = 0
P(Refund=No|Yes) = 1
P(Marital Status=Single|No) = 2/7
P(Marital Status=Divorced|No) = 1/7
P(Marital Status=Married|No) = 4/7
P(Marital Status=Single|Yes) = 2/7
P(Marital Status=Divorced|Yes) = 1/7
P(Marital Status=Married|Yes) = 0

For Taxable Income:
  If Class=No: sample mean = 110, sample variance = 2975
  If Class=Yes: sample mean = 90, sample variance = 25

P(X|Class=No) = P(Refund=No|No) × P(Married|No) × P(Income=120K|No)
              = 4/7 × 4/7 × 0.0072 = 0.0024
P(X|Class=Yes) = P(Refund=No|Yes) × P(Married|Yes) × P(Income=120K|Yes)
               = 1 × 0 × 1.2 × 10⁻⁹ = 0

Since P(X|No) P(No) > P(X|Yes) P(Yes), it follows that P(No|X) > P(Yes|X)

=> Class = No
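The full classification of X above can be sketched in Python (a minimal illustration; the dictionary layout and names are ours, with the conditional probabilities taken from the slide's counts):

```python
import math

# Classify X = (Refund=No, Married, Income=120K) with naive Bayes.
def gaussian(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) \
           / math.sqrt(2 * math.pi * var)

# conditional probabilities counted from the 10-record table
p_refund_no = {"No": 4/7, "Yes": 1.0}   # P(Refund=No | class)
p_married   = {"No": 4/7, "Yes": 0.0}   # P(Married | class)
income      = {"No": (110, 2975), "Yes": (90, 25)}  # (mean, variance)
prior       = {"No": 7/10, "Yes": 3/10}

score = {}
for c in ("No", "Yes"):
    mean, var = income[c]
    score[c] = prior[c] * p_refund_no[c] * p_married[c] * gaussian(120, mean, var)

print(max(score, key=score.get))  # 'No'
```

The "Yes" score collapses to zero because P(Married|Yes) = 0, which motivates the smoothing corrections on the next slide.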

**Naïve Bayes Classifier**

• If one of the conditional probabilities is zero, the entire expression becomes zero
• Probability estimation:

Original: P(Ai | C) = Nic / Nc
Laplace: P(Ai | C) = (Nic + 1) / (Nc + c)
m-estimate: P(Ai | C) = (Nic + m·p) / (Nc + m)

where:
  c: number of classes
  p: prior probability
  m: parameter
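The three estimators above can be sketched in Python, applied to a conditional probability whose raw count is zero (a minimal illustration; the function names are ours, and the values p = 0.5 and m = 2 are arbitrary choices for demonstration):

```python
# Smoothed probability estimates for a zero raw count, e.g. Nic = 0, Nc = 3.
def original(n_ic, n_c):
    return n_ic / n_c

def laplace(n_ic, n_c, c):
    # c: number of classes (as defined on the slide)
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, p, m):
    # p: prior probability, m: parameter (both user-chosen)
    return (n_ic + m * p) / (n_c + m)

print(original(0, 3))            # 0.0 -- zeroes out the whole product
print(laplace(0, 3, 2))          # 0.2
print(m_estimate(0, 3, 0.5, 2))  # 0.2
```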

**Example of Naïve Bayes Classifier**

| Name          | Give Birth | Can Fly | Live in Water | Have Legs | Class       |
|---------------|------------|---------|---------------|-----------|-------------|
| human         | yes        | no      | no            | yes       | mammals     |
| python        | no         | no      | no            | no        | non-mammals |
| salmon        | no         | no      | yes           | no        | non-mammals |
| whale         | yes        | no      | yes           | no        | mammals     |
| frog          | no         | no      | sometimes     | yes       | non-mammals |
| komodo        | no         | no      | no            | yes       | non-mammals |
| bat           | yes        | yes     | no            | yes       | mammals     |
| pigeon        | no         | yes     | no            | yes       | non-mammals |
| cat           | yes        | no      | no            | yes       | mammals     |
| leopard shark | yes        | no      | yes           | no        | non-mammals |
| turtle        | no         | no      | sometimes     | yes       | non-mammals |
| penguin       | no         | no      | sometimes     | yes       | non-mammals |
| porcupine     | yes        | no      | no            | yes       | mammals     |
| eel           | no         | no      | yes           | no        | non-mammals |
| salamander    | no         | no      | sometimes     | yes       | non-mammals |
| gila monster  | no         | no      | no            | yes       | non-mammals |
| platypus      | no         | no      | no            | yes       | mammals     |
| owl           | no         | yes     | no            | yes       | non-mammals |
| dolphin       | yes        | no      | yes           | no        | mammals     |
| eagle         | no         | yes     | no            | yes       | non-mammals |

Test record:

| Give Birth | Can Fly | Live in Water | Have Legs | Class |
|------------|---------|---------------|-----------|-------|
| yes        | no      | yes           | no        | ?     |

A: attributes, M: mammals, N: non-mammals

P(A|M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A|N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A|M) P(M) = 0.06 × 7/20 = 0.021
P(A|N) P(N) = 0.0042 × 13/20 = 0.0027

P(A|M) P(M) > P(A|N) P(N) => Mammals
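The mammal / non-mammal comparison above can be sketched directly from the counted conditional probabilities (a minimal illustration; the variable names are ours):

```python
# Test record A = (Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no)
# Conditional probabilities counted from the 20-animal table:
p_a_given_m = (6/7) * (6/7) * (2/7) * (2/7)      # among 7 mammals
p_a_given_n = (1/13) * (10/13) * (3/13) * (4/13)  # among 13 non-mammals

score_m = p_a_given_m * 7/20    # P(A|M) P(M)
score_n = p_a_given_n * 13/20   # P(A|N) P(N)

print(round(p_a_given_m, 2))  # 0.06
print(round(score_m, 3))      # 0.021
print("mammals" if score_m > score_n else "non-mammals")  # mammals
```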

**Example Naïve Bayesian Classifier: Training Dataset**

Classes:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Data sample: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

| age   | income | student | credit_rating | buys_computer |
|-------|--------|---------|---------------|---------------|
| <=30  | high   | no      | fair          | no            |
| <=30  | high   | no      | excellent     | no            |
| 31…40 | high   | no      | fair          | yes           |
| >40   | medium | no      | fair          | yes           |
| >40   | low    | yes     | fair          | yes           |
| >40   | low    | yes     | excellent     | no            |
| 31…40 | low    | yes     | excellent     | yes           |
| <=30  | medium | no      | fair          | no            |
| <=30  | low    | yes     | fair          | yes           |
| >40   | medium | yes     | fair          | yes           |
| <=30  | medium | yes     | excellent     | yes           |
| 31…40 | medium | no      | excellent     | yes           |
| 31…40 | high   | yes     | fair          | yes           |
| >40   | medium | no      | excellent     | no            |


**Example Naïve Bayesian Classifier: Training Dataset**

• P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no") = 5/14 = 0.357
• Compute P(X|Ci) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
• For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X|buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
  P(X|buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
• P(X|Ci) × P(Ci):
  P(X|buys_computer = "yes") × P(buys_computer = "yes") = 0.028
  P(X|buys_computer = "no") × P(buys_computer = "no") = 0.007
• Therefore, X belongs to class "buys_computer = yes"


**Naïve Bayes: Summary**

• Robust to isolated noise points
• Handles missing values by ignoring the instance during probability estimate calculations
• Robust to irrelevant attributes
• The independence assumption may not hold for some attributes
  – Use other techniques such as Bayesian Belief Networks (BBN)

**Naïve Bayesian Classifier: Comments**

• Advantages
  – Easy to implement
  – Good results obtained in most cases
• Disadvantages
  – Assumption of class conditional independence, hence loss of accuracy
  – In practice, dependencies exist among variables
    • E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
    • Dependencies among these cannot be modeled by a naïve Bayesian classifier
• How to deal with these dependencies?
  – Bayesian Belief Networks

Continued in Session 09

Classification and Prediction (cont.)

