Bayesian Classifiers
Probabilistic Graphical Models

L. Enrique Sucar, INAOE

(INAOE) 1 / 56
Outline

1 Introduction
2 Bayesian Classifiers
3 Naive Bayes Classifier
4 TAN and BAN
5 Semi-Naive Bayesian Classifiers
6 Multidimensional Bayesian Classifiers
    Bayesian Chain Classifiers
7 Hierarchical Classification
8 Applications
9 References

Introduction

Classification

Classification consists of assigning classes or labels to objects. There are two basic types of classification problems:

Unsupervised: the classes are unknown, so the problem consists of dividing a set of objects into n groups or clusters, so that a class is assigned to each different group. It is also known as clustering.

Supervised: the possible classes or labels are known a priori, and the problem consists of finding a function or rule that assigns each object to one of the classes.

Probabilistic Classification

• Supervised classification consists of assigning to a particular object, described by its attributes $A_1, A_2, \ldots, A_n$, one of m classes, $C = \{c_1, c_2, \ldots, c_m\}$, such that the probability of the class given the attributes is maximized:

$$\arg\max_{C} P(C \mid A_1, A_2, \ldots, A_n) \tag{1}$$

• If we denote the set of attributes as $A = \{A_1, A_2, \ldots, A_n\}$:

$$\arg\max_{C} P(C \mid A)$$

Classifier Evaluation

Accuracy: how well a classifier predicts the correct class for unseen examples (that is, those not considered for learning the classifier).

Classification time: how long it takes the classification process to predict the class, once the classifier has been trained.

Training time: how much time is required to learn the classifier from data.

Memory requirements: how much memory is required to store the classifier parameters.

Clarity: whether the classifier is easily understood by a person.

Class Imbalance

• In general we want to maximize the classification accuracy; however, this is only optimal if the cost of a wrong classification is the same for all the classes
• When there is imbalance in the costs of misclassification, we must then minimize the expected cost (EC). For two classes, this is given by:

$$EC = FN \times P(-)\,C(- \mid +) + FP \times P(+)\,C(+ \mid -) \tag{2}$$

where FN is the false negative rate, FP is the false positive rate, $P(+)$ is the probability of positive, $P(-)$ is the probability of negative, $C(- \mid +)$ is the cost of classifying a positive as negative, and $C(+ \mid -)$ is the cost of classifying a negative as positive.

Bayesian Classifiers

Bayes Classifier (I)

• Applying Bayes rule:

$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(C)\,P(A_1, A_2, \ldots, A_n \mid C)}{P(A_1, A_2, \ldots, A_n)} \tag{3}$$

• Which can be written more compactly as:

$$P(C \mid A) = P(C)\,P(A \mid C) / P(A) \tag{4}$$

Bayes Classifier (II)

• The classification problem can be formulated as:

$$\arg\max_{C} \left[ P(C \mid A) = P(C)\,P(A \mid C) / P(A) \right] \tag{5}$$

• Equivalently:
    • $\arg\max_{C} \left[ P(C)\,P(A \mid C) \right]$
    • $\arg\max_{C} \left[ \log\left( P(C)\,P(A \mid C) \right) \right]$
    • $\arg\max_{C} \left[ \log P(C) + \log P(A \mid C) \right]$

Note that the probability of the attributes, $P(A)$, does not vary with respect to the class, so it can be considered a constant for the maximization.

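The equivalence of these formulations can be checked numerically. The following is a small sketch; the prior and likelihood values are made-up numbers for illustration and do not come from the slides.

```python
import math

# Hypothetical values for two classes (illustrative numbers only).
prior = {"c1": 0.7, "c2": 0.3}          # P(C)
likelihood = {"c1": 0.02, "c2": 0.09}   # P(A | C) for one observed A

# P(A) is the same for every class, so dividing by it only rescales the scores.
evidence = sum(prior[c] * likelihood[c] for c in prior)

posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}
product = {c: prior[c] * likelihood[c] for c in prior}
log_sum = {c: math.log(prior[c]) + math.log(likelihood[c]) for c in prior}

# The three scoring functions select the same class.
best_posterior = max(posterior, key=posterior.get)
best_product = max(product, key=product.get)
best_log = max(log_sum, key=log_sum.get)
```

The log form is usually preferred in practice because sums of logarithms avoid numerical underflow when many small probabilities are multiplied.
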
Complexity

• The direct application of the Bayes rule results in a computationally expensive problem
• The number of parameters in the likelihood term, $P(A_1, A_2, \ldots, A_n \mid C)$, increases exponentially with the number of attributes
• An alternative is to consider some independence properties as in graphical models, in particular that all attributes are independent given the class, resulting in the Naive Bayesian Classifier

Naive Bayes Classifier

Naive Bayes

• The naive or simple Bayesian classifier (NBC) is based on the assumption that all the attributes are independent given the class variable:

$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(C)\,P(A_1 \mid C)\,P(A_2 \mid C) \cdots P(A_n \mid C)}{P(A)} \tag{6}$$

where $P(A)$ can be considered, as mentioned before, a normalization constant.

Complexity

• The naive Bayes formulation drastically reduces the complexity of the Bayesian classifier, as in this case we only require the prior probability of the class (a one-dimensional vector) and the n conditional probabilities of each attribute given the class (two-dimensional matrices)
• The space requirement is reduced from exponential to linear in the number of attributes
• The calculation of the posterior is greatly simplified: estimating it (unnormalized) requires only n multiplications

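To make the exponential versus linear growth concrete, the sketch below counts free parameters for discrete attributes. The function names and the choice of ten binary attributes are assumptions made for illustration.

```python
from math import prod

def full_likelihood_params(attr_sizes, n_classes):
    """Free parameters of P(A1, ..., An | C) stored as one full table per class."""
    return n_classes * (prod(attr_sizes) - 1)

def naive_bayes_params(attr_sizes, n_classes):
    """Free parameters of the n separate tables P(Aj | C)."""
    return n_classes * sum(r - 1 for r in attr_sizes)

# Hypothetical setting: 10 binary attributes and 2 classes.
sizes = [2] * 10
full = full_likelihood_params(sizes, 2)   # 2 * (2**10 - 1)
naive = naive_bayes_params(sizes, 2)      # 2 * 10
```

Each extra binary attribute doubles the size of the full table but adds only one parameter per class to the naive model.
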
Graphical Model

[Figure: graphical model of the naive Bayes classifier]

Parameter Learning

• The probabilities can be estimated from data using, for instance, maximum likelihood estimation
• The prior probabilities of the class variable, C, are given by:

$$P(c_i) \sim N_i / N \tag{7}$$

where $N_i$ is the number of examples of class $c_i$ among the $N$ training examples.
• The conditional probabilities of each attribute, $A_j$, can be estimated as:

$$P(A_{jk} \mid c_i) \sim N_{jki} / N_i \tag{8}$$

where $N_{jki}$ is the number of examples of class $c_i$ in which attribute $A_j$ takes its k-th value.

Inference

• The posterior probability can be obtained just by multiplying the prior by the likelihood for each attribute
• Given the values for the n attributes, $a_1, \ldots, a_n$, for each class $c_i$ the posterior is proportional to:

$$P(c_i \mid a_1, \ldots, a_n) \sim P(c_i)\,P(a_1 \mid c_i) \cdots P(a_n \mid c_i) \tag{9}$$

• The class $c_k$ that maximizes the previous equation will be selected

Example - Classifier for Golf

Outlook    Temperature  Humidity  Windy  Play
sunny      high         high      false  no
sunny      high         high      true   no
overcast   high         high      false  yes
rain       medium       high      false  yes
rain       low          normal    false  yes
rain       low          normal    true   no
overcast   low          normal    true   yes
sunny      medium       high      false  no
sunny      low          normal    false  yes
rain       medium       normal    false  yes
sunny      medium       normal    true   yes
overcast   medium       high      true   yes
overcast   high         normal    false  yes
rain       medium       high      true   no

Example - NBC for Golf

[Figure: naive Bayes classifier for the golf example]

Example - inference for Golf

Given: Outlook=rain, Temperature=high, Humidity=normal, Windy=false

P(Play = yes | rain, high, normal, false) = k × 0.64 × 0.33 × 0.22 × 0.67 × 0.67 = k × 0.021
P(Play = no | rain, high, normal, false) = k × 0.36 × 0.40 × 0.40 × 0.20 × 0.40 = k × 0.0046

k = 1/(0.021 + 0.0046) ≈ 39.27 (computed from the unrounded products)

P(Play = yes | rain, high, normal, false) = 0.82
P(Play = no | rain, high, normal, false) = 0.18

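The golf example can be reproduced with a short sketch that estimates the parameters by counting (Eqs. 7 and 8) and then applies Eq. 9 to the evidence in the "Given" line. The function names are illustrative choices. Because the code uses exact count ratios rather than the two-decimal factors shown on the slide, the intermediate products differ slightly, while the normalized posteriors agree to two decimals.

```python
from collections import Counter, defaultdict

# Golf data from the table: (Outlook, Temperature, Humidity, Windy, Play).
DATA = [
    ("sunny", "high", "high", "false", "no"),
    ("sunny", "high", "high", "true", "no"),
    ("overcast", "high", "high", "false", "yes"),
    ("rain", "medium", "high", "false", "yes"),
    ("rain", "low", "normal", "false", "yes"),
    ("rain", "low", "normal", "true", "no"),
    ("overcast", "low", "normal", "true", "yes"),
    ("sunny", "medium", "high", "false", "no"),
    ("sunny", "low", "normal", "false", "yes"),
    ("rain", "medium", "normal", "false", "yes"),
    ("sunny", "medium", "normal", "true", "yes"),
    ("overcast", "medium", "high", "true", "yes"),
    ("overcast", "high", "normal", "false", "yes"),
    ("rain", "medium", "high", "true", "no"),
]

def train(data):
    """Count N_i per class and N_jki per (attribute index, class, value)."""
    class_counts = Counter(row[-1] for row in data)
    value_counts = defaultdict(Counter)
    for *attrs, c in data:
        for j, v in enumerate(attrs):
            value_counts[(j, c)][v] += 1
    return class_counts, value_counts

def posterior(class_counts, value_counts, attrs):
    """Normalized P(class | attrs) under the naive Bayes assumption."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                          # prior, Eq. 7
        for j, v in enumerate(attrs):
            score *= value_counts[(j, c)][v] / n_c   # likelihood, Eq. 8
        scores[c] = score
    k = 1.0 / sum(scores.values())                   # normalization constant
    return {c: k * s for c, s in scores.items()}

class_counts, value_counts = train(DATA)
result = posterior(class_counts, value_counts, ("rain", "high", "normal", "false"))
```

Pure maximum likelihood counts assign probability zero to any attribute value never seen with a class; this sketch makes no attempt to smooth such cases.
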
Analysis

• Advantages:
    • The low number of required parameters, which reduces the memory requirements and facilitates learning them from data.
    • The low computational cost for inference (estimating the posteriors) and learning.
    • The relatively good performance (classification precision) in many domains.
    • A simple and intuitive model.
• Limitations:
    • In some domains, the performance is reduced given that the conditional independence assumption is not valid.
    • If there are continuous attributes, these need to be discretized (or alternative models, such as the linear discriminator, need to be considered).

TAN and BAN

TAN

• The Tree augmented Bayesian Classifier, or TAN, incorporates some dependencies between the attributes by building a directed tree among the attribute variables
• The n attributes form a graph which is restricted to a directed tree that represents the dependency relations between the attributes

BAN

• The Bayesian Network augmented Bayesian Classifier, or BAN, considers that the dependency structure among the attributes constitutes a directed acyclic graph (DAG)

Inference and Learning

• The TAN and BAN classifiers can be considered as particular cases of a more general model, that is, Bayesian networks
• Techniques for inference and for learning Bayesian networks can therefore be applied to obtain the posterior probabilities (inference) and the model (learning) for the TAN and BAN classifiers

Semi-Naive Bayesian Classifiers

Semi-Naive Bayesian Classifiers (SNBC)

• Another alternative to deal with dependent attributes is to transform the basic structure of a naive Bayesian classifier, while maintaining a star or tree-structured network
• The basic idea of the SNBC is to eliminate or join attributes which are not independent given the class
• This is analogous to feature selection in machine learning

Structural Improvement

• Two alternative operations to modify the structure of a NBC: (i) node elimination, and (ii) node combination, considering that we start from a full structure

Node elimination and combination

• Node elimination consists of simply eliminating an attribute, $A_i$, from the model. This could be because it is not relevant for the class ($A_i$ and C are independent), or because the attribute $A_i$ and another attribute, $A_j$, are not independent given the class
• Node combination consists of merging two attributes, $A_i$ and $A_j$, into a new attribute $A_k$, such that $A_k$ has as possible values the cross product of the values of $A_i$ and $A_j$ (assuming discrete attributes). For example, if $A_i = \{a, b, c\}$ and $A_j = \{1, 2\}$, then $A_k = \{a1, a2, b1, b2, c1, c2\}$. This is an alternative when two attributes are not independent given the class

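Node combination can be sketched in a few lines. The helper names below are hypothetical; the first builds the value set of the merged attribute and the second rewrites two data columns as a single combined column.

```python
from itertools import product

def combine_values(values_i, values_j):
    """Possible values of the merged attribute: cross product of both value sets."""
    return [f"{a}{b}" for a, b in product(values_i, values_j)]

def combine_columns(column_i, column_j):
    """Replace two attribute columns by a single column of joined values."""
    return [f"{a}{b}" for a, b in zip(column_i, column_j)]

# The example from the slide: Ai = {a, b, c}, Aj = {1, 2}.
a_k_values = combine_values(["a", "b", "c"], [1, 2])

# Three hypothetical data rows with their Ai and Aj values.
a_k_column = combine_columns(["a", "c", "b"], [2, 2, 1])
```

The merged attribute has |Ai| × |Aj| values, so repeated combination enlarges the conditional probability tables and makes them harder to estimate from limited data.
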
Node Insertion

• Consists of adding a new attribute that makes two dependent attributes independent
• This new attribute is a kind of virtual or hidden node in the model, for which we do not have any data
• An alternative for estimating the parameters of hidden variables in Bayesian networks, such as in this case, is based on the Expectation-Maximization (EM) procedure

Multidimensional Bayesian Classifiers

Multidimensional classification

• Several important problems require predicting several classes simultaneously
• For example: text classification, where a document can be assigned to several topics; gene classification, as a gene may have different functions; image annotation, as an image may include several objects

Definition

• The multi-dimensional classification problem corresponds to searching for a function h that assigns to each instance, represented by a vector of m features $X = (X_1, \ldots, X_m)$, a vector of d class values $C = (C_1, \ldots, C_d)$:

$$\arg\max_{c_1, \ldots, c_d} P(C_1 = c_1, \ldots, C_d = c_d \mid X) \tag{10}$$

Multi-label classification

• Multi-label classification is a particular case of multi-dimensional classification, where all class variables are binary
• Two basic approaches:
    • Binary relevance approaches transform the multi-label classification problem into d independent binary classification problems, one for each class variable, $C_1, \ldots, C_d$
    • The label power-set approach transforms the multi-label classification problem into a single-class scenario by defining a new compound class variable whose possible values are all the possible combinations of values of the original classes

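The two transformations can be illustrated on a small label matrix. This is a sketch with made-up label vectors and hypothetical function names. Note that the power-set helper only enumerates the combinations that actually occur in the data, whereas the compound variable described above ranges over all $2^d$ combinations.

```python
def binary_relevance(Y):
    """Split a list of d-dimensional label vectors into d binary target lists."""
    d = len(Y[0])
    return [[y[i] for y in Y] for i in range(d)]

def label_powerset(Y):
    """Map each distinct label vector to the id of one compound class."""
    ids = {}
    targets = []
    for y in Y:
        key = tuple(y)
        if key not in ids:
            ids[key] = len(ids)   # next unused compound class id
        targets.append(ids[key])
    return targets, ids

# Four hypothetical instances with d = 3 binary labels each.
Y = [(1, 0, 1), (0, 0, 1), (1, 0, 1), (1, 1, 0)]

per_label_targets = binary_relevance(Y)
compound_targets, compound_ids = label_powerset(Y)
```

Binary relevance ignores dependencies among the labels, while the power-set encoding keeps them at the price of a compound class whose number of values can grow exponentially with d.
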
Multidimensional Bayesian Network Classifiers

• A multidimensional Bayesian network classifier (MBC) is a Bayesian network with a particular structure
• The set V of variables is partitioned into two sets: $V_C = \{C_1, \ldots, C_d\}$, $d \geq 1$, of class variables and $V_X = \{X_1, \ldots, X_m\}$, $m \geq 1$, of feature variables ($d + m = n$)
• The set A of arcs is also partitioned into three sets, $A_C$, $A_X$, $A_{CX}$, such that $A_C \subseteq V_C \times V_C$ is composed of the arcs between the class variables, $A_X \subseteq V_X \times V_X$ is composed of the arcs between the feature variables, and finally, $A_{CX} \subseteq V_C \times V_X$ is composed of the arcs from the class variables to the feature variables

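The arc partition can be expressed directly in code. The sketch below uses a hypothetical network with two class variables and three features; the function name and the variable names are assumptions for illustration.

```python
def partition_arcs(arcs, class_vars, feature_vars):
    """Split directed arcs (parent, child) into the sets A_C, A_X and A_CX."""
    a_c, a_x, a_cx = [], [], []
    for parent, child in arcs:
        if parent in class_vars and child in class_vars:
            a_c.append((parent, child))
        elif parent in feature_vars and child in feature_vars:
            a_x.append((parent, child))
        elif parent in class_vars and child in feature_vars:
            a_cx.append((parent, child))
        else:
            # A feature -> class arc (or an unknown variable) fits none of the three sets.
            raise ValueError(f"arc {parent}->{child} is not valid in an MBC")
    return a_c, a_x, a_cx

# Hypothetical MBC structure.
class_vars = {"C1", "C2"}
feature_vars = {"X1", "X2", "X3"}
arcs = [("C1", "C2"), ("C1", "X1"), ("C2", "X2"), ("C2", "X3"), ("X1", "X2")]

a_c, a_x, a_cx = partition_arcs(arcs, class_vars, feature_vars)
```
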
MDBNC - Graphical Model

[Figure: graphical model of a multidimensional Bayesian network classifier]

Inference

• The problem of obtaining the classification of an instance with an MBC, that is, the most likely combination of classes, corresponds to the MPE (Most Probable Explanation) or abduction problem
• This is a complex problem with a high computational cost

Bayesian Chain Classifiers

Chain Classifiers

• Chain classifiers are an alternative method for multi-label classification that incorporates class dependencies, while keeping the computational efficiency of the binary relevance approach
• A chain classifier consists of d base binary classifiers which are linked in a chain, such that each classifier incorporates the classes predicted by the previous classifiers as additional attributes
• The feature vector for each binary classifier, $L_i$, is extended with the labels (0/1) for all previous classifiers in the chain

Chain Classifiers - example

[Figure: example of a chain classifier]

Learning and Inference

• Each classifier in the chain is trained to learn the association of label $l_i$ given the features augmented with all previous class labels in the chain
• For classification, it starts at $L_1$ and propagates the predicted classes along the chain, such that for $L_i \in L$ (where $L = \{L_1, L_2, \ldots, L_d\}$) it predicts $P(L_i \mid X, L_1, L_2, \ldots, L_{i-1})$
• The class vector is determined by combining the outputs of all the binary classifiers

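The training and prediction steps of a chain classifier can be sketched as follows. The base classifier here is a minimal categorical naive Bayes with add-one smoothing; both the class names and the smoothing are assumptions of this sketch, and any binary classifier could take its place.

```python
from collections import Counter, defaultdict

class TinyNB:
    """Minimal categorical naive Bayes (add-one smoothing is an assumption here)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n = len(y)
        self.class_counts = Counter(y)
        self.counts = defaultdict(Counter)   # (feature index, class) -> value counts
        self.values = defaultdict(set)       # feature index -> observed values
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(j, c)][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        def score(c):
            p = self.class_counts[c] / self.n
            for j, v in enumerate(row):
                p *= (self.counts[(j, c)][v] + 1) / (self.class_counts[c] + len(self.values[j]))
            return p
        return max(self.classes, key=score)

class ChainClassifier:
    def fit(self, X, Y):
        self.models = []
        for i in range(len(Y[0])):
            # Training: features augmented with the true labels of the previous classes.
            X_aug = [tuple(x) + tuple(y[:i]) for x, y in zip(X, Y)]
            self.models.append(TinyNB().fit(X_aug, [y[i] for y in Y]))
        return self

    def predict(self, x):
        labels = []
        for model in self.models:
            # Classification: propagate the labels predicted so far along the chain.
            labels.append(model.predict(tuple(x) + tuple(labels)))
        return tuple(labels)

# Toy data (hypothetical): the second label always equals the first.
X = [(0,), (0,), (1,), (1,)]
Y = [(0, 0), (0, 0), (1, 1), (1, 1)]
chain = ChainClassifier().fit(X, Y)
pred_one = chain.predict((1,))
pred_zero = chain.predict((0,))
```

Because each prediction is fed forward, an early mistake can propagate down the chain, so the order of the classes matters.
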
Bayesian chain classifiers

• Bayesian chain classifiers are a type of chain classifier under a probabilistic framework:

$$\arg\max_{C_1, \ldots, C_d} P(C_1 \mid C_2, \ldots, C_d, X) \cdots P(C_d \mid X) \tag{11}$$

• If we consider the dependency relations between the class variables, and represent these relations as a directed acyclic graph (DAG), then we can simplify the previous equation:

$$\arg\max_{C_1, \ldots, C_d} \prod_{i=1}^{d} P(C_i \mid \mathrm{Pa}(C_i), X) \tag{12}$$

• A further simplification is obtained by assuming that the most probable joint combination of classes can be approximated by concatenating the individual most probable classes:

$$\arg\max_{C_1} P(C_1 \mid \mathrm{Pa}(C_1), X)$$
$$\arg\max_{C_2} P(C_2 \mid \mathrm{Pa}(C_2), X)$$
$$\cdots$$
$$\arg\max_{C_d} P(C_d \mid \mathrm{Pa}(C_d), X)$$

• This last approximation corresponds to a Bayesian chain classifier.
• For the base classifier we can use any of the Bayesian classifiers presented in the previous sections, for instance a NBC

BCC

[Figure: Bayesian chain classifier (BCC)]

Hierarchical Classification

Hierarchical Classification

• Hierarchical classification is a type of multidimensional classification in which the classes are ordered in a predefined structure, typically a tree, or in general a directed acyclic graph (DAG)
• In hierarchical classification, an example that belongs to a certain class automatically belongs to all its superclasses
• Hierarchical classification has applications in several areas, such as text categorization, protein function prediction, and object recognition

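The superclass property can be sketched with a small traversal. The hierarchy below is a made-up example, stored as a map from each class to its list of parents so that both trees and DAGs are covered; the function name is hypothetical.

```python
def superclasses(label, parents):
    """Return every ancestor of `label` in a hierarchy given as child -> parents."""
    found = []
    stack = list(parents.get(label, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.append(node)
            stack.extend(parents.get(node, []))
    return found

# Hypothetical class hierarchy (a tree rooted at "animal").
parents = {
    "dog": ["mammal"],
    "cat": ["mammal"],
    "mammal": ["animal"],
    "sparrow": ["bird"],
    "bird": ["animal"],
}

dog_superclasses = superclasses("dog", parents)
```

An example labeled "dog" is therefore also an example of "mammal" and "animal".
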
Hierarchical Classification - example

[Figure: example of a class hierarchy]

Basic approaches

• The global approach builds a classifier that predicts all the classes at once; this becomes too complex computationally for large hierarchies.
• The local approaches train several classifiers and combine their outputs:
    • Local Classifier per hierarchy Level, which trains one multi-class classifier for each level of the class hierarchy
    • Local binary Classifier per Node, in which a binary classifier is built for each node (class) in the hierarchy, except the root node
    • Local Classifier per Parent Node (LCPN), where a multi-class classifier is trained to predict its child nodes
• Local methods commonly use a top-down approach for classification

Types of Methods

[Figure: types of hierarchical classification methods]

Hierarchical Structure

[Figure: hierarchical structure]

Chained path evaluation

• Chained Path Evaluation (CPE) analyzes each possible path from the root to a leaf node in the hierarchy, taking into account the level of the predicted labels to give a score to each path and finally return the one with the best score
• Additionally, it considers the relations of each node with its ancestors in the hierarchy, based on chain classifiers

Training

• A local classifier is trained for each node, $C_i$, in the hierarchy, except the leaf nodes, to classify its child nodes
• The classifier for each node, $C_i$, for instance a naive Bayes classifier, is trained considering examples from all its child nodes, as well as some examples of its sibling nodes in the hierarchy
• To consider the relation with other nodes in the hierarchy, the class predicted by the parent (tree structure) or parents (DAG) is included as an additional attribute in each local classifier

Classification

• In the classification phase, the probabilities for each class for all local classifiers are obtained based on the input data
• The score for each path in the hierarchy is calculated by a weighted sum of the log of the probabilities of all local classifiers in the path:

$$score = \sum_{i=0}^{n} w_{C_i} \times \log\left( P(C_i \mid X_i, \mathrm{pa}(C_i)) \right) \tag{13}$$

• The purpose of these weights is to give more importance to the upper levels of the hierarchy
• Once the scores for all the paths are obtained, the path with the highest score will be selected as the set of classes corresponding to a given instance

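The path scoring of Eq. 13 can be sketched as below. The hierarchy, the local probabilities and the weights are all made-up values for illustration; in particular, the slides do not give the weight formula, so the level weights used here (1.0 for the first level, 0.5 for the second) are only an assumption that respects the stated goal of favoring upper levels.

```python
import math

def path_score(path, probs, weights):
    """Weighted sum of log-probabilities of the nodes along one path (Eq. 13)."""
    return sum(weights[c] * math.log(probs[c]) for c in path)

def best_path(paths, probs, weights):
    """Return the root-to-leaf path with the highest score."""
    return max(paths, key=lambda p: path_score(p, probs, weights))

# Hypothetical two-level hierarchy: root -> {A, B}, A -> {A1, A2}, B -> {B1}.
paths = [("A", "A1"), ("A", "A2"), ("B", "B1")]

# Hypothetical outputs of the local classifiers, P(Ci | Xi, pa(Ci)).
probs = {"A": 0.6, "B": 0.4, "A1": 0.3, "A2": 0.7, "B1": 1.0}

# Assumed weights: upper-level nodes count more than lower-level ones.
weights = {"A": 1.0, "B": 1.0, "A1": 0.5, "A2": 0.5, "B1": 0.5}

selected = best_path(paths, probs, weights)
scores = {p: path_score(p, probs, weights) for p in paths}
```

Since every term is a logarithm of a probability, all scores are at most zero and the selected path is the one closest to zero.
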
CPE - Classification Example

[Figure: CPE classification example]

Applications

Visual skin detection

• Skin detection is a useful pre-processing stage for many applications in computer vision, such as person detection and gesture recognition, among others
• A simple and very fast way to obtain an approximate classification of pixels in an image as skin or not-skin is based on the color attributes of each pixel
• Usually, pixels in a digital image are represented as the combination of three basic (primary) colors: Red (R), Green (G) and Blue (B), in what is known as the RGB model. There are alternative color models, such as HSV, YIQ, etc.

SNBC

• An alternative is to consider a semi-naive Bayesian classifier and select the best attributes from the different color models for skin classification by eliminating or joining attributes
• An initial NBC was learned from data: examples of skin and not-skin pixels taken from several images. This initial classifier obtained 94% accuracy when applied to other (test) images.
• The classifier was then optimized. Starting from the full NBC with 9 attributes, the method applies the variable elimination and combination stages until the simplest classifier with maximum accuracy is obtained. With this final model, accuracy improved to 98%.

Optimization

[Figure: optimization of the classifier]

Example

[Figure: example]

HIV drug selection

• The Human Immunodeficiency Virus (HIV) is the causative agent of AIDS, a condition in which progressive failure of the immune system allows opportunistic life-threatening infections to occur
• To combat HIV infection, several antiretroviral (ARV) drugs have been developed, belonging to different drug classes that affect specific steps in the viral replication cycle. It is important to select the best drug combination according to the virus' mutations in a patient.

Multilabel Classification

• Selecting the best group of antiretroviral drugs for a patient can be seen as an instance of a multi-label classification problem
• This particular problem can be accurately modeled with a multi-dimensional Bayesian network classifier
• This model can be learned from data and then used for selecting the set of drugs according to the features (virus mutations)

MBC for HIV drug selection

[Figure: MBC for HIV drug selection]

References

Book

Sucar, L.E.: Probabilistic Graphical Models. Springer (2015), Chapter 4

Additional Reading (1)

Bielza, C., Li, G., Larrañaga, P.: Multi-Dimensional Classification with Bayesian Networks. International Journal of Approximate Reasoning (2011)

Borchani, H., Bielza, C., Toro, C., Larrañaga, P.: Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artificial Intelligence in Medicine 57, 219–229 (2013)

Cheng, J., Greiner, R.: Comparing Bayesian Network Classifiers. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 101–108 (1999)

Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian Network Classifiers. Machine Learning 29, 131–163 (1997)

Additional Reading (2)

Kwoh, C.K., Gillies, D.F.: Using Hidden Nodes in Bayesian Networks. Artificial Intelligence 88, Elsevier, Essex, UK, 1–38 (1996)

Martinez, M., Sucar, L.E.: Learning an optimal naive bayes classifier. International Conference on Pattern Recognition (ICPR), Vol. 3, 1236–1239 (2006)

Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Proceedings ECML/PKDD, 254–269 (2009)

Ramírez, M., Sucar, L.E., Morales, E.: Path Evaluation for Hierarchical Multi-label Classification. Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference (FLAIRS), pp. 502–507 (2014)

Additional Reading (3)

Silla-Jr., C.N., Freitas, A.A.: A survey of hierarchical classification across different application domains. Data Min. and Knowledge Discovery 22(1-2), 31–72 (2011)

Sucar, L.E., Bielza, C., Morales, E., Hernandez, P., Zaragoza, J., Larrañaga, P.: Multi-label Classification with Bayesian Network-based Chain Classifiers. Pattern Recognition Letters 41, 14–22 (2014)

Tsoumakas, G., Katakis, I.: Multi-Label Classification: An Overview. International Journal of Data Warehousing and Mining 3, 1–13 (2007)

van der Gaag, L.C., de Waal, P.R.: Multi-dimensional Bayesian Network Classifiers. Third European Conference on Probabilistic Graphic Models, 107–114, Prague, Czech Republic (2006)
