1.CIS 2000 Ruta Gabrys Fusion Methods Overview

Published by: Al Arafat on Jun 15, 2012
Copyright:Attribution Non-commercial







Computing and Information Systems,
(2000) p.1-10
© 2000
University of Paisley
An Overview of Classifier Fusion Methods
Dymitr Ruta and Bogdan Gabrys
A number of classifier fusion methods have recently been developed, opening an alternative approach that can lead to improved classification performance. As there is little theory of information fusion itself, we are currently faced with different methods designed for different problems and producing different results. This paper gives an overview of classifier fusion methods and attempts to identify new trends that may dominate this area of research in the future. A taxonomy of fusion methods, which tries to bring some order into the existing “pudding of diversities”, is also provided.
The objective of all decision support systems (DSS) is to create a model which, given a minimum amount of input data/information, is able to produce correct decisions. Quite often, especially in safety-critical systems, the correctness of the decisions taken is of crucial importance. In such cases the minimum information constraint is less important, as long as the final decision is derived in a reasonable time. According to one approach, the progress of DSS should be based on continuous development of existing methods as well as the discovery of new ones. Another approach suggests that, as the limits of existing individual methods are approached and it becomes hard to develop a better one, the solution might be simply to combine existing well-performing methods in the hope that better results will be achieved. Such fusion of information seems worth applying as a means of reducing uncertainty. Each individual method produces some errors, not to mention that the input information might be corrupted or incomplete. However, different methods operating on different data should produce different errors and, assuming that all individual methods perform well, a combination of such multiple experts should reduce the overall classification error and consequently emphasise correct outputs.

Information fusion techniques have been intensively investigated in recent years and their applicability to the classification domain has been widely tested [1]-[14]. The problem arose naturally from the need to improve the classification rates obtained from individual classifiers. Fusion of data/information can be carried out on three levels of abstraction closely connected with the flow of the classification process: data level fusion, feature level fusion, and classifier fusion. There is little theory about the first two levels of information fusion. However, there have been successful attempts to transform numerical, interval and linguistic data into a single space of symmetric trapezoidal fuzzy numbers [14], [15], and some heuristic methods have been successfully used for feature level fusion [15]. A number of methods have been developed for classifier fusion, also referred to as decision fusion or mixture of experts.

Essentially, there are two general groups of classifier fusion techniques. The methods subjectively assigned to the first group operate on the classifiers themselves and emphasise the development of the classifier structure. They do nothing with the classifier outputs until the combination process finds a single best classifier or a selected group of classifiers; only then are their outputs taken as a final decision or passed on for further processing [2], [9], [10]. The second group of methods operates mainly on the classifier outputs, effectively computing a combination of those outputs [1], [3]-[8], [11]-[15]. The methods operating on classifier outputs can be further divided according to the type of output produced by the individual classifiers. A diagrammatic representation of the proposed taxonomy of classifier fusion methods is shown in Figure 1.
 ¢¡£¥¤§¦©¨§§¡¡¢ §!¡"§#%$&!')(0"!1!
Soft / FuzzyoutputsClass LabelsClassRankingsClass SetReductionClass SetReorderingUnion ofNeighbourhoodsIntersection ofNeighbourhoods
Classifier Fusion (CF) System
ClassifierStructuring andGroupingDynamicClassifierSelectionLogisticReggresionThe HighestRankBorda CountAverage BayesCobminationProduct ofExpertsFuzzy TemplateVoting MethodsKnowledge-Behaviour SpaceDempster-ShafferCombinationHierarchicalMixtures ofExpertsFuzzy IntegralsNeural NetworkNo trainingTrained
CF methods operatingon classifier outputs
CF methods operatingon classifiers
Of the three possible types of outputs generated by individual classifiers, crisp labels offer the minimum amount of input information for fusion methods, as no information about potential alternatives is available. Some additional useful information can be gained from classification methods generating outputs in the form of class rankings. However, fusion methods operating on classifiers with soft/fuzzy outputs can be expected to produce the greatest improvement in classification performance.

The remainder of this paper is organised as follows. In Section II, methods operating on classifiers are briefly presented. The following three sections provide an overview of the classifier fusion methods operating on single class labels, class rankings and fuzzy measures respectively. Finally, conclusions and suggestions for future work are presented.
As mentioned in the introduction, a number of fusion methods operate on the classifiers rather than their outputs, trying to improve the classification rate by pushing classifiers into an optimised structure. Among these methods a dominant role is played by Dynamic Classifier Selection, which is sometimes referred to as an alternative approach to classifier fusion. The other two approaches reviewed in this section are classifier structuring and grouping, and hierarchical mixture of experts.
2.1 Dynamic Classifier Selection
Dynamic Classifier Selection (DCS) methods reflect the tendency to extract a single best classifier instead of mixing many different classifiers. DCS attempts to determine the single classifier that is most likely to produce the correct classification label for an input sample [2], [10]. As a result, only the output of the selected classifier is taken as the final decision. The dynamic classifier selection process includes a partitioning of the input samples. There are a number of methods for forming the partitions, ranging from the classifiers' agreement on the top choices to grouping by the features of the input samples.

An example of a DCS method is the recently developed DCS by Local Accuracy (DCS-LA). Once the set of partitions has been defined, the best classifier for each partition is selected locally. In the final classification process, an unknown sample is assigned to a partition, and the output of the best classifier for that partition is taken as the final decision. The idea of DCS-LA is to estimate each classifier’s accuracy in a local region of feature space and to take the final decision from the most locally accurate classifier.

Another approach estimates a local regression model (e.g. logistic regression [2]) for each partition. After the model is estimated for each partition, a relevant decision combination function is selected dynamically for each test case.

All DCS methods rely strongly on training data and, by choosing only the locally best classifier, they seem to lose some useful information available from other well-performing local classifiers. However, by applying the DCS method sequentially and excluding the best classifier each time, it is possible to obtain a very reliable ranking of classifiers and eventually also class rankings. Such an approach could be treated as a good pre-processing stage before other methods operating on class rankings are used.
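
The local-accuracy idea behind DCS-LA can be illustrated with a minimal sketch. It is not the implementation from [2] or [10]: it assumes that the "local region" is the set of k nearest validation samples under Euclidean distance, and that each classifier is a plain callable mapping an array of samples to an array of labels. The names `dcs_la_predict`, `X_val`, `y_val` and `k` are hypothetical.

```python
import numpy as np

def dcs_la_predict(x, classifiers, X_val, y_val, k=5):
    """Return (label, index) from the classifier that is most accurate near x.

    Assumption: the local region is the k validation samples closest to x.
    """
    # Find the k labelled validation samples nearest to the query sample.
    distances = np.linalg.norm(X_val - x, axis=1)
    neighbours = np.argsort(distances)[:k]

    # Estimate each classifier's accuracy on that local region only.
    local_accuracy = [
        np.mean(clf(X_val[neighbours]) == y_val[neighbours])
        for clf in classifiers
    ]

    # Only the output of the locally best classifier is used as the decision.
    best = int(np.argmax(local_accuracy))
    label = classifiers[best](x[np.newaxis, :])[0]
    return label, best
```

Ties in local accuracy are resolved here in favour of the first classifier in the list, which is an arbitrary choice of this sketch.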
2.2 Classifier Structuring and Grouping
Classifiers and their combination functions may be organised in many different ways [2]. The standard approach is to organise them in parallel, obtain their outputs simultaneously and separately, and use these as the input to a combination function or, alternatively, to apply several combination functions sequentially. According to another strategy, a more reasonable approach is to organise all classifiers into groups and to apply different fusion methods to each group. In general, classifiers may be arranged in a multistage structure. At each stage different fusion methods should be applied to different groups of classifiers. Additionally, DCS methods could be used at some stage to select the best classifier in each group. There are many different design options, which are likely to be specific to a particular application. However, at each stage of grouping, a very important factor is the level of diversity of the classifier types, training data and methods involved [17]. Any classification improvement can only be achieved if the total information uncertainty is reduced [16]. This in turn depends on the diversity of information supporting different classification methods.

On the other hand, the same goal can be achieved by reducing the errors produced by individual classifiers. The decision combination process tries to minimise the final classification error. As most combination functions work on the basis of the increased importance of repeated inputs, the greater the diversity of the errors produced by individual classifiers, the lower their impact on the final decision and, effectively, the lower the final error. This rule can be applied to any kind of grouping and structuring that might be used in a multiple classifier system.
2.3 Hierarchical mixture of experts
Hierarchical mixture of experts (HME) is an example of a fusion method whose strength comes from the classifier structure. HME is a supervised learning technique based on the divide-and-conquer principle, which is broadly used throughout computer science and applied mathematics [9]. The HME is conceptually organised as a tree-like structure. Each leaf represents an individual expert network which, given the input vector x, tries to solve a local supervised learning problem. The outputs of the elements of the same node are partitioned and combined by the gating network, and the total output of the node is given as a convex combination. The expert networks are trained to increase the posterior probability according to Bayes' rule, and a number of learning algorithms can then be applied to tune the mixture model.

Recently the EM algorithm was developed for the HME architecture [9]. Tests on a robot dynamics problem showed a substantial improvement in comparison with the back-propagation neural network. The HME technique does not seem to be applicable to high-dimensional data, as the increased complexity of the tree-like architecture and the associated input space subdivision lead to increased variance and numerical instability.
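
To make the phrase "convex combination" concrete, the sketch below computes the output of a single HME node. The text above does not specify the form of the experts or of the gating network, so the sketch assumes linear experts with scalar outputs and a softmax gating network. The names `mixture_node_output`, `expert_weights` and `gate_weights` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Map arbitrary scores to non-negative weights that sum to one."""
    e = np.exp(z - np.max(z))  # subtract the maximum for numerical stability
    return e / e.sum()

def mixture_node_output(x, expert_weights, gate_weights):
    """Combine the experts under one node for the input vector x.

    Assumptions: each expert is linear (one row of expert_weights) and the
    gating network is a softmax over linear scores (rows of gate_weights).
    """
    expert_outputs = expert_weights @ x   # one scalar output per expert
    gate = softmax(gate_weights @ x)      # mixing coefficients for this input
    # Convex combination: the weights are non-negative and sum to one, so the
    # node output always lies between the smallest and largest expert output.
    return float(gate @ expert_outputs), gate
```

In a full hierarchy the same computation would be repeated at each internal node, with the node outputs of one level acting as the expert outputs of the level above.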
Classifiers producing crisp, single class labels (SCL) provide the least amount of useful information for the combination process. However, they are still well-performing classifiers, which can be applied to a variety of real-life problems. If some training data are available, it is possible to upgrade the outputs of these classifiers to the group operating on class rankings or even fuzzy measures. There are a number of methods to achieve this goal, for instance by estimating an empirical probability distribution over a set of training data. The two most representative methods for fusing SCL classifiers, namely the generalised voting method and the Behaviour-Knowledge Space method, are now presented.
3.1 Voting Methods
Voting strategies can be applied to a multiple classifier system assuming that each classifier gives a single class label as an output and no training data are available. There are a number of approaches to combining such uncertain information units in order to obtain the best final decision. However, they all lead to the generalised voting definition. For convenience let the outputs of the $n$ classifiers form the decision vector $\mathbf{d}$ defined as

$$\mathbf{d} = [d_1, d_2, \ldots, d_n]^T, \qquad d_j \in \{c_1, c_2, \ldots, c_m, r\},$$

where $c_i$ denotes the label of the $i$-th class and $r$ the rejection of assigning the input sample to any class. Let the binary characteristic function be defined as follows:

$$B_j(c_i) = \begin{cases} 1 & \text{if } d_j = c_i \\ 0 & \text{if } d_j \neq c_i \end{cases}$$

Then the general voting routine can be defined as:

$$E(\mathbf{d}) = \begin{cases} c_i & \text{if } \forall_{t \in \{1,\ldots,m\}} \; \sum_{j=1}^{n} B_j(c_t) \le \sum_{j=1}^{n} B_j(c_i) \ge \alpha \cdot n + k(\mathbf{d}) \\ r & \text{otherwise} \end{cases}$$

where $\alpha$ is a parameter and $k(\mathbf{d})$ is a function that provides additional voting constraints. The most conservative voting rule is given if $k(\mathbf{d}) = 0$ and $\alpha = 1$, meaning that the class is chosen only when all classifiers produce the same output. This rule can be liberalised by lowering the parameter $\alpha$. The case where $\alpha = 0.5$ is commonly known as the majority vote. The function $k(\mathbf{d})$ is usually interpreted as a level of objection to the most often selected class and refers mainly to the score of the second ranked class. This option makes it possible to adjust the level of conflict that is still acceptable for giving a correct decision.
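
The generalised voting routine can be sketched as follows: a class is returned when it collects the most votes and its vote count reaches alpha * n + k(d), where n is the number of classifiers; otherwise the sample is rejected. The function name `general_vote`, the `REJECT` marker and the default constraint function are hypothetical. The rule itself does not say what to do when two classes tie for first place, so rejecting on ties is an assumption of this sketch.

```python
from collections import Counter

REJECT = "r"  # marker for the rejection decision (hypothetical encoding)

def general_vote(decisions, alpha=0.5, k=lambda d: 0.0):
    """Fuse crisp labels from n classifiers.

    decisions: list of labels, one per classifier; REJECT entries cast no vote.
    alpha:     fraction of the n classifiers that must agree.
    k:         callable giving the extra voting constraint k(d).
    """
    n = len(decisions)
    votes = Counter(d for d in decisions if d != REJECT)
    if not votes:
        return REJECT

    ranked = votes.most_common()
    top_label, top_votes = ranked[0]

    # Assumption of this sketch: an exact tie for first place is rejected.
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return REJECT

    # The winner must also reach the threshold alpha * n + k(d).
    if top_votes >= alpha * n + k(decisions):
        return top_label
    return REJECT
```

With `alpha=1.0` and the default `k`, the function accepts only unanimous decisions, while `alpha=0.5` corresponds to the majority vote described above.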
3.2 Behaviour-Knowledge Space Method
Most fusion methods assume independence of the decisions made by individual classifiers. This is in fact not necessarily true, and the Behaviour-Knowledge Space (BKS) method does not require this condition [12]. It provides a knowledge space by collecting the records of the decisions of all classifiers for each learned sample. If the decision fusion problem is defined as a mapping of the decisions of $K$ classifiers, $e_1, \ldots, e_K$, onto one of $M$ classes, the method operates on the $K$-dimensional space. Each dimension corresponds to an individual classifier, which can produce $M+1$ crisp decisions: $M$ class labels and one rejection decision. A unit of BKS is an intersection of the decisions of every single classifier. Each BKS unit contains three types of data: the total number of incoming samples, $T_{e_1,\ldots,e_K}$; the best representative class, $R_{e_1,\ldots,e_K}$; and the total number of incoming samples for each class, $n_{e_1,\ldots,e_K}(m)$. In the first stage of the BKS method the training data are extensively exploited to build the BKS. Then the final classification decision for an input sample is derived in the focal unit, where a balance is estimated between the current classifiers' decisions and the recorded behaviour information, as shown in the following rule:

$$E(x) = \begin{cases} R_{e_1,\ldots,e_K} & \text{if } T_{e_1,\ldots,e_K} > 0 \text{ and } \dfrac{n_{e_1,\ldots,e_K}(R_{e_1,\ldots,e_K})}{T_{e_1,\ldots,e_K}} \ge \lambda \\ \text{rejection} & \text{otherwise} \end{cases}$$

where $\lambda$ is a threshold controlling the reliability of the final decision. The model tuning process should include automatic finding of the threshold $\lambda$.
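
The two stages of the BKS method can be sketched with a lookup table keyed by the tuple of classifier decisions. This is an illustrative sketch rather than the implementation from [12]: the names `build_bks`, `bks_decide` and `threshold` are hypothetical, and rejection is encoded as `None`.

```python
from collections import Counter, defaultdict

REJECT = None  # hypothetical encoding of the rejection decision

def build_bks(classifier_decisions, true_labels):
    """Stage 1: fill the knowledge space from training data.

    classifier_decisions: one tuple per training sample holding the decision
    of every classifier; each distinct tuple addresses one BKS unit.
    """
    table = defaultdict(Counter)
    for unit, label in zip(classifier_decisions, true_labels):
        table[tuple(unit)][label] += 1  # per-class sample count in this unit
    return table

def bks_decide(table, unit, threshold=0.5):
    """Stage 2: decide for a new sample from its focal unit."""
    counts = table.get(tuple(unit))
    if not counts:
        return REJECT  # no training sample ever fell into this unit
    total = sum(counts.values())                        # samples in the unit
    best_class, best_count = counts.most_common(1)[0]   # best representative class
    # Accept only if the best class is dominant enough within the unit.
    if best_count / total >= threshold:
        return best_class
    return REJECT
```

Because the table stores joint decisions, no independence between classifiers is assumed, but the number of possible units grows exponentially with the number of classifiers, so many units may remain empty when training data are scarce.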
Among the fusion methods operating on class rankings as the outputs from multiple classifiers, two main approaches are worth mentioning. The first is based on class set reduction; its objective is to reduce the set of considered classes to as small a number as possible while ensuring that the correct class is still represented in the reduced set. Another
