
Bayes Classification

• Bayesian classifiers are statistical classifiers based on Bayes' theorem
• Predict class membership probabilities
• Naive Bayesian classifier
  – Assumes that the effect of an attribute value on a given class is independent of the values of the other attributes – class conditional independence
  – Simplifies the computations
  – Has performance comparable with decision tree and selected neural network classifiers

Bayes Classification

• Let X be a data sample (“evidence”): n attributes, class label unknown
• H: some hypothesis, such as that the data tuple X belongs to a specified class C
• Find P(H|X): the probability that the hypothesis H holds given the observed data tuple X
• The probability that tuple X belongs to class C, given that we know the attribute description of X
• P(H|X) is the a posteriori probability of H conditioned on X

Bayes Classification

• Bayes' theorem
  – customers described by age and income
  – X is a 35-year-old customer with an income of $40,000
  – H: hypothesis that the customer will buy a computer
  – P(H|X) = ?

Example

• P(H|X): posterior probability that customer X will buy a computer, given his age and income
• P(H): prior probability that any given customer will buy a computer, regardless of age and income
• P(X|H): posterior probability that customer X is 35 years old and earns $40,000, given that he buys a computer
• P(X): prior probability that a person from the set of customers is 35 years old and earns $40,000
• P(H), P(X|H), and P(X) may be estimated from the given data
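These quantities combine through Bayes' theorem, P(H|X) = P(X|H) P(H) / P(X). A minimal sketch in Python; the three probabilities below are invented for illustration, not estimated from any real data:

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
# All three input probabilities are hypothetical.
p_h = 0.6          # P(H): prior prob. that a customer buys a computer
p_x_given_h = 0.2  # P(X|H): prob. of being 35 with $40,000 income, among buyers
p_x = 0.15         # P(X): prob. of being 35 with $40,000 income overall

p_h_given_x = p_x_given_h * p_h / p_x  # posterior, ≈ 0.8 here
```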

Naive Bayesian Classification

• D: training set of tuples. Each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, …, xn). The attributes are A1, A2, …, An. There are m classes: C1, C2, …, Cm
• Given a tuple X, the classifier predicts that X belongs to the class having the highest posterior probability conditioned on X. Tuple X belongs to class Ci if and only if

  P(Ci|X) > P(Cj|X)  for 1 ≤ j ≤ m, j ≠ i

• By Bayes' theorem,

  P(Ci|X) = P(X|Ci) P(Ci) / P(X)

Naive Bayesian Classification

• As P(X) is constant for all classes, only P(X|Ci)P(Ci) needs to be maximized
• If the class prior probabilities are not known, assume P(C1) = P(C2) = … = P(Cm), and therefore maximize P(X|Ci)
• Class prior probabilities may be estimated by P(Ci) = |Ci,D| / |D|
• Given data sets with many attributes, it is extremely computationally expensive to compute P(X|Ci)
• Assumption: class-conditional independence, i.e., no dependence relations between the attributes:

  P(X|Ci) = ∏(k=1..n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
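Under this independence assumption, training reduces to counting. A minimal counting-based sketch (the dataset, attribute names, and values below are made up, and no smoothing is applied):

```python
from collections import Counter, defaultdict

def train_nb(data, attrs):
    """Estimate P(Ci) and the counts behind each P(xk|Ci)."""
    priors = Counter(t['class'] for t in data)
    cond = defaultdict(Counter)              # (class, attr) -> value counts
    for t in data:
        for a in attrs:
            cond[(t['class'], a)][t[a]] += 1
    return priors, cond, len(data)

def predict_nb(x, priors, cond, n, attrs):
    """Pick the class maximizing P(Ci) * prod_k P(xk|Ci)."""
    best_c, best_p = None, -1.0
    for c, nc in priors.items():
        p = nc / n                           # P(Ci)
        for a in attrs:
            p *= cond[(c, a)][x[a]] / nc     # P(xk|Ci)
        if p > best_p:
            best_c, best_p = c, p
    return best_c

attrs = ['age', 'student']
data = [
    {'age': 'youth',  'student': 'yes', 'class': 'yes'},
    {'age': 'youth',  'student': 'no',  'class': 'no'},
    {'age': 'senior', 'student': 'yes', 'class': 'yes'},
    {'age': 'senior', 'student': 'no',  'class': 'no'},
]
model = train_nb(data, attrs)
print(predict_nb({'age': 'youth', 'student': 'yes'}, *model, attrs))  # yes
```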

Naive Bayesian Classification

• Attributes may be categorical or continuous-valued
• If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D)
• If Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:

  P(xk|Ci) = g(xk, μCi, σCi) = (1 / (√(2π) σCi)) e^(−(xk − μCi)² / (2σCi²))

• Compute P(X|Ci) for each class Ci
• The predicted class label is the class Ci for which P(X|Ci) is maximum
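The Gaussian density above translates directly into code. A small sketch; the mean and standard deviation in the usage line are hypothetical:

```python
import math

def gaussian(x, mu, sigma):
    """g(x, mu, sigma): Gaussian density for a continuous attribute."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * \
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical: within class Ci, attribute age has mean 38 and std 12.
p_age_given_ci = gaussian(35, 38.0, 12.0)
```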


Avoiding the Zero-Probability Problem

• Naive Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero:

  P(X|Ci) = ∏(k=1..n) P(xk|Ci)

• Ex: a dataset with 1000 tuples, with income = low (0), income = medium (990), and income = high (10)
• Use the Laplacian correction (or Laplacian estimator)
  – Add 1 to each case:
    Prob(income = low) = 1/1003
    Prob(income = medium) = 991/1003
    Prob(income = high) = 11/1003
  – The “corrected” probability estimates are close to their “uncorrected” counterparts
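The correction can be reproduced directly from the counts in the example:

```python
def laplace_probs(counts):
    """Laplacian correction: add 1 to each case before normalizing."""
    corrected = {k: v + 1 for k, v in counts.items()}
    total = sum(corrected.values())          # 1000 + 3 = 1003 here
    return {k: v / total for k, v in corrected.items()}

income_counts = {'low': 0, 'medium': 990, 'high': 10}
probs = laplace_probs(income_counts)
# 1/1003, 991/1003, and 11/1003, as in the example above
```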

Naive Bayes Classifier: Comments

• Advantages
  – Easy to implement
  – Good results obtained in most of the cases
• Disadvantages
  – Assumption of class conditional independence, therefore loss of accuracy
  – In practice, dependencies exist among variables
    • E.g., patients: profile – age, family history, etc.; symptoms – fever, cough, etc.; disease – lung cancer, diabetes, etc.
    • Dependencies among these cannot be modeled by a naive Bayes classifier

Bayesian Belief Networks

• Unlike naive Bayesian classifiers, belief networks do not assume class conditional independence
• Allow the representation of dependencies among subsets of attributes
• Graphical model of causal relationships
• Two components
  – a directed acyclic graph
  – a set of conditional probability tables (CPTs)
• Node = random variable (discrete- or continuous-valued)
• Arc = probabilistic dependence

Bayesian Belief Networks

• An arc Y → Z implies that Y is a parent of Z, and Z is a descendant of Y

Bayesian Belief Networks

• Each variable is conditionally independent of its non-descendants in the graph, given its parents
• One CPT for each variable
• Let X = (x1, …, xn) be a data tuple described by the variables or attributes Y1, …, Yn
• The network gives a complete representation of the existing joint probability distribution:

  P(x1, …, xn) = ∏(i=1..n) P(xi | Parents(Yi))
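Given the CPTs, the joint probability is the product of each variable's CPT entry conditioned on its parents' values. A sketch over a hypothetical two-node network A → B (all CPT entries are invented):

```python
# Tiny belief network A -> B; every probability below is made up.
cpt = {
    'A': {(): {'t': 0.3, 'f': 0.7}},          # A has no parents
    'B': {('t',): {'t': 0.9, 'f': 0.1},       # P(B | A = t)
          ('f',): {'t': 0.2, 'f': 0.8}},      # P(B | A = f)
}
parents = {'A': (), 'B': ('A',)}

def joint(assignment):
    """P(x1, ..., xn) = product over variables of P(xi | parents)."""
    p = 1.0
    for var, val in assignment.items():
        parent_vals = tuple(assignment[q] for q in parents[var])
        p *= cpt[var][parent_vals][val]
    return p

p = joint({'A': 't', 'B': 't'})  # 0.3 * 0.9 ≈ 0.27
```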

Bayesian Belief Networks

• A node within the network can be selected as an “output” node, representing a class label attribute
• There may be more than one output node
• Can return the probability of each class
• Various learning algorithms – e.g., gradient descent
• Some applications
  – genetic linkage analysis
  – computer vision
  – document and text analysis
  – decision support systems
  – sensitivity analysis

Rule-Based Classification

Using IF-THEN Rules for Classification

• Represent the knowledge in the form of IF-THEN rules
• R: IF age = youth AND student = yes THEN buys_computer = yes
• Rules can be generated either from a decision tree or directly from the training data using a sequential covering algorithm
• The “IF” part (or left side) of a rule is known as the rule antecedent or precondition
• The “THEN” part (or right side) is the rule consequent
• If the rule antecedent holds true for a given tuple, we say that the rule is satisfied and that the rule covers the tuple
• Assessment of a rule R: coverage and accuracy
  – ncovers = # of tuples covered by R
  – ncorrect = # of tuples correctly classified by R
  – coverage(R) = ncovers / |D|; accuracy(R) = ncorrect / ncovers
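Coverage and accuracy of a rule reduce to two counts over the dataset. A sketch using a made-up four-tuple dataset:

```python
# coverage(R) = ncovers / |D|; accuracy(R) = ncorrect / ncovers
def rule_stats(antecedent, consequent, data):
    covered = [t for t in data if antecedent(t)]
    ncorrect = sum(1 for t in covered if t['class'] == consequent)
    return len(covered) / len(data), ncorrect / len(covered)

# R: IF age = youth AND student = yes THEN buys_computer = yes
R = lambda t: t['age'] == 'youth' and t['student'] == 'yes'
data = [
    {'age': 'youth',  'student': 'yes', 'class': 'yes'},
    {'age': 'youth',  'student': 'no',  'class': 'no'},
    {'age': 'senior', 'student': 'yes', 'class': 'no'},
    {'age': 'senior', 'student': 'no',  'class': 'yes'},
]
coverage, accuracy = rule_stats(R, 'yes', data)
print(coverage, accuracy)  # 0.25 1.0
```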

Using IF-THEN Rules for Classification

• Conflict resolution
  – Size ordering: assigns the highest priority to the triggering rule that has the “toughest” requirement (i.e., with the most attribute tests)
  – Rule ordering: rules are prioritized beforehand
    • class-based or rule-based ordering

Using IF-THEN Rules for Classification

• Class-based ordering: classes are sorted in decreasing order of prevalence or of misclassification cost per class
• Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts
• What if no rule is satisfied by X?
  – Set up a default rule to specify a default class, based on the training set
  – This may be the majority class, or the majority class of the tuples that were not covered by any rule

Rule Extraction from a Decision Tree

• Rules are easier to understand than large trees
• One rule is created for each path from the root to a leaf
• Each splitting criterion along a given path is logically ANDed to form the rule antecedent (“IF” part)
• The leaf node holds the class prediction, forming the rule consequent (“THEN” part)

Rule Extraction from a Decision Tree

• Example

IF age = youth AND student = yes THEN buys_computer = yes

IF age = mid-age THEN buys_computer = yes

IF age = senior AND credit_rating = fair THEN buys_computer = no

IF age = senior AND credit_rating = excellent THEN buys_computer = yes
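Applying such a rule set to a tuple is a first-match scan down the list (harmless here, since rules extracted from a tree are mutually exclusive). A sketch over the four rules above; the default class 'no' is an arbitrary choice:

```python
# The four extracted rules, in order: (antecedent, consequent)
rules = [
    (lambda t: t['age'] == 'youth' and t.get('student') == 'yes', 'yes'),
    (lambda t: t['age'] == 'mid-age', 'yes'),
    (lambda t: t['age'] == 'senior' and t.get('credit_rating') == 'fair', 'no'),
    (lambda t: t['age'] == 'senior' and t.get('credit_rating') == 'excellent', 'yes'),
]

def classify(t, default='no'):
    """Return the consequent of the first rule whose antecedent holds."""
    for antecedent, consequent in rules:
        if antecedent(t):
            return consequent
    return default   # fall back to a default class

print(classify({'age': 'mid-age'}))  # yes
```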

Rule Extraction from a Decision Tree

• The rules extracted are mutually exclusive and exhaustive
• Mutually exclusive: there can be no rule conflicts, because no two rules will be triggered for the same tuple
• Exhaustive: there is one rule for each possible attribute–value combination
• Since one rule is extracted per leaf, the set of rules is not much simpler than the corresponding decision tree
• Rule pruning is required

Rule Induction: Sequential Covering Algorithm

• Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
• Rules are learned sequentially; each rule for a given class Ci should cover many tuples of Ci but none (or few) of the tuples of other classes
• Steps:
  – Rules are learned one at a time
  – Each time a rule is learned, the tuples covered by the rule are removed
  – The process repeats on the remaining tuples until a termination condition holds, e.g., when there are no more training examples or when the quality of the rule returned is below a user-specified threshold
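The steps above can be sketched as follows. The single-attribute-test rule learner and the 0.5 quality threshold are simplifications for illustration; real learners such as RIPPER grow and prune conjunctions of tests:

```python
def learn_one_rule(data, target):
    """Greedily pick the single attribute test with the best accuracy
    (ties broken by coverage) for the target class."""
    best = None
    attrs = [a for a in data[0] if a != 'class']
    for a in attrs:
        for v in {t[a] for t in data}:
            covered = [t for t in data if t[a] == v]
            acc = sum(t['class'] == target for t in covered) / len(covered)
            key = (acc, len(covered))
            if best is None or key > best[0]:
                best = (key, (a, v))
    return best[1]

def sequential_covering(data, target):
    """Learn rules one at a time, removing covered tuples each round."""
    rules, remaining = [], list(data)
    while any(t['class'] == target for t in remaining):
        attr, value = learn_one_rule(remaining, target)
        covered = [t for t in remaining if t[attr] == value]
        if sum(t['class'] == target for t in covered) / len(covered) < 0.5:
            break  # rule quality below threshold: terminate
        rules.append((attr, value, target))
        remaining = [t for t in remaining if t[attr] != value]
    return rules
```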

Basic Sequential Covering Algorithm

How are rules learned?

• Start with the most general rule possible:
  – IF THEN loan_decision = accept
• Add new attribute tests by adopting a greedy depth-first strategy
  – Pick the test that improves the rule quality the most
• The resulting rule should cover relatively more of the “accept” tuples

Rule Learning

Rule-Quality measures

• Need to consider both coverage and accuracy
• Entropy – prefers rules that cover a large number of tuples of a single class and few tuples of other classes
• Tuples of the class for which rules are being learned are called positive tuples; the remaining tuples are negative
• FOIL_Gain (in FOIL & RIPPER): assesses the information gained by extending the antecedent, where pos/neg (pos′/neg′) are the numbers of positive and negative tuples covered before (after) the extension:

  FOIL_Gain = pos′ × ( log2( pos′ / (pos′ + neg′) ) − log2( pos / (pos + neg) ) )

• Favors rules that have high accuracy and cover many positive tuples
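FOIL_Gain is a direct translation of the formula. A sketch; the counts in the usage line are invented:

```python
import math

def foil_gain(pos, neg, pos_new, neg_new):
    """Information gained by extending a rule's antecedent.
    pos/neg: positives/negatives covered by rule R;
    pos_new/neg_new: those covered by the extended rule R'."""
    return pos_new * (math.log2(pos_new / (pos_new + neg_new))
                      - math.log2(pos / (pos + neg)))

# Hypothetical: R covers 100 pos / 400 neg; R' covers 80 pos / 20 neg.
gain = foil_gain(100, 400, 80, 20)  # ≈ 160
```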

Rule Pruning

• Prune a rule, R, if the pruned version of R has greater quality, as assessed on an independent set of tuples:

  FOIL_Prune(R) = (pos − neg) / (pos + neg)

• If FOIL_Prune is higher for the pruned version of R, prune R
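Likewise for FOIL_Prune, with invented counts on a hypothetical prune set:

```python
def foil_prune(pos, neg):
    """FOIL_Prune(R) = (pos - neg) / (pos + neg), on a prune set."""
    return (pos - neg) / (pos + neg)

# Hypothetical counts: the pruned rule covers more positives per
# negative, so it scores higher and R would be pruned.
r_full = foil_prune(45, 5)    # 0.8
r_pruned = foil_prune(90, 6)  # 0.875
print(r_pruned > r_full)      # True -> prune R
```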

Classifier Evaluation Metrics

Confusion Matrix:

  Actual class \ Predicted class |          C1          |         ¬C1
  C1                             | True Positives (TP)  | False Negatives (FN)
  ¬C1                            | False Positives (FP) | True Negatives (TN)

Example:

  Actual class \ Predicted class | buy_computer = yes | buy_computer = no | Total
  buy_computer = yes             |               6954 |                46 |  7000
  buy_computer = no              |                412 |              2588 |  3000
  Total                          |               7366 |              2634 | 10000

Classifier Evaluation Metrics

• Classifier accuracy, or recognition rate: the percentage of test-set tuples that are correctly classified
  Accuracy = (TP + TN) / All
• Error rate: 1 − accuracy, or
  Error rate = (FP + FN) / All
• Sensitivity: true positive recognition rate
  Sensitivity = TP / P
• Specificity: true negative recognition rate
  Specificity = TN / N

  A \ P |  C  | ¬C  |
    C   | TP  | FN  | P
   ¬C   | FP  | TN  | N
        | P′  | N′  | All
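Computed on the buy_computer confusion matrix above:

```python
# Counts taken from the buy_computer example confusion matrix.
TP, FN = 6954, 46
FP, TN = 412, 2588
P, N = TP + FN, FP + TN      # 7000 positives, 3000 negatives
ALL = P + N                  # 10000 test tuples

accuracy = (TP + TN) / ALL   # 0.9542
error_rate = (FP + FN) / ALL # 0.0458
sensitivity = TP / P         # ≈ 0.9934
specificity = TN / N         # ≈ 0.8627
```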

Classifier Evaluation Metrics

• Precision: exactness – what % of the tuples that the classifier labeled as positive are actually positive?
  Precision = TP / (TP + FP)
• Recall: completeness – what % of the positive tuples did the classifier label as positive?
  Recall = TP / (TP + FN)
• F measure (F1 score): harmonic mean of precision and recall
  F = 2 × precision × recall / (precision + recall)
• Fß: weighted measure of precision and recall
  Fß = (1 + ß²) × precision × recall / (ß² × precision + recall)
  – assigns ß times as much weight to recall as to precision
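For the buy_computer example, precision = TP/(TP+FP), recall = TP/(TP+FN), and the Fß measure weights recall ß times as much as precision:

```python
# Counts from the buy_computer confusion matrix.
TP, FP, FN = 6954, 412, 46

precision = TP / (TP + FP)   # ≈ 0.944
recall = TP / (TP + FN)      # ≈ 0.993
f1 = 2 * precision * recall / (precision + recall)

def f_beta(precision, recall, beta):
    """F_beta: weights recall beta times as much as precision."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

f2 = f_beta(precision, recall, 2)  # emphasizes recall over precision
```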
