
Uncertainty Based Classification Fusion - A Soft-Biometrics Test Case

Hugo Gamboa, Ana L. N. Fred
Instituto de Telecomunicações
{hgamboa,afred}@lx.it.pt

Abstract

We address the problem of classification of data with low separability. We adopt a Bayesian approach, with discriminant functions expressing a posteriori class probabilities. We propose a novel classification scheme that incorporates estimates of the classification error probability in the decision process, and extend this approach into a classifier fusion framework. The proposed methods are evaluated in the context of user authentication using multimodal biometrics. Results on real data confirm the usefulness of the proposed method, which outperforms the corresponding deterministic classifier.

1. Introduction

Let x \in R^d represent a pattern in a d-dimensional feature space. We assume the classifier is expressed in terms of discriminant functions, g_i(x), i = 1, ..., c, where c is the number of classes. A MAP classifier corresponds to using g_i(x) = p(w_i|x), the a posteriori probability of class w_i given the observation x, with the decision rule:

decide w_i if i = \arg\max_j p(w_j|x) = \arg\max_j g_j(x).   (1)

The classification error probability for sample x is then given by

pe(x) = 1 - \max_i g_i(x).   (2)

Classifiers with a rejection option typically discard samples x for which pe(x) > \delta, where \delta is the maximum acceptable error. In practical situations the p.d.f. p(w_i|x) is not known, and is estimated (learned) from the training data. Discriminant functions are thus replaced by estimates \hat{g}_i(x) = \hat{p}(w_i|x). Two problems can occur: (1) too few data are available, leading to poor estimates of the probability distribution; (2) a parametric model is used that does not fit the data well (incorrect model). These two problems are not reflected in the above discriminant function; in spite of this, the discriminant function is used both for the classification decision and for the rejection decision.

We propose a new classification scheme based on the estimation of the classification uncertainty, obtained by evaluating the random variable \hat{g}_i(x) at each sample x to classify. Our goal is to design better classifiers by incorporating information about the classification error probability into the decision process.

2. Uncertainty Modeling

We want to characterize the behavior of the estimates of the classification error, \hat{pe}_{w_i}(x) = 1 - \hat{p}(w_i|x). This is equivalent to characterizing the random variable \hat{g}_i(x). We use the term uncertainty to denote the dispersion of the estimates \hat{g}_i(x) at the point x^*. We start by studying p(x|w_i), related to p(w_i|x) according to Bayes rule:

p(w_i|x) = p(x|w_i) P(w_i) / \sum_j p(x|w_j) P(w_j).   (3)
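As a concrete illustration of the decision rule (1), the error probability (2), and Bayes rule (3), the following is a minimal sketch with a standard reject option. The two univariate Gaussian classes, the uniform priors, and the threshold value are illustrative assumptions, not the paper's actual models:

```python
import math

def gauss_pdf(mu, sigma):
    """Return a univariate Gaussian density function (illustrative class model)."""
    return lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posteriors(x, likelihoods, priors):
    """p(w_i|x) via Bayes rule, eq. (3), for a scalar observation x."""
    joint = [f(x) * p for f, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

def map_with_reject(x, likelihoods, priors, delta=0.3):
    """Decide argmax_i p(w_i|x), eq. (1); reject when pe(x) = 1 - max_i p(w_i|x) > delta."""
    post = posteriors(x, likelihoods, priors)
    i = max(range(len(post)), key=post.__getitem__)
    pe = 1.0 - post[i]                      # classification error probability, eq. (2)
    return ("reject", pe) if pe > delta else (i, pe)

like = [gauss_pdf(0, 1), gauss_pdf(2, 1)]   # two hypothetical classes
prior = [0.5, 0.5]
print(map_with_reject(-1.0, like, prior))   # clearly class 0, small pe
print(map_with_reject(1.0, like, prior))    # midpoint of the two classes: pe = 0.5, rejected
```

At the midpoint x = 1 the two posteriors are equal, so pe(x) = 0.5 exceeds the threshold and the sample is rejected.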

As an illustrative example, assume that p(x|w_i) follows a univariate normal distribution with parameters \mu, \sigma: X ~ N(\mu, \sigma^2). For a given training set of size n, the maximum likelihood estimates of these parameters are given by

\hat{\mu} = (1/n) \sum_{i=1}^{n} x_i,   \hat{\sigma}^2 = (1/n) \sum_{i=1}^{n} (x_i - \hat{\mu})^2,

leading to the distribution estimate \hat{p}(x|w_i) = p(x|w_i, \hat{\mu}, \hat{\sigma}). Using a different training set will lead to a new estimate. Figure 1 presents cuts at the points x = [-3, -2, -1, 0], where we observe the histogram of the empirical distribution of \hat{p}(x|w_i), produced by 100 estimates of p(x|w_i), X ~ N(0, 1), each based on 100 samples randomly selected from the training patterns. We can verify that for distinct values of x the distribution follows different models. Given an arbitrary probability function p(w_i|x) and a point x^*, there is no simple way of obtaining a direct closed form for the distribution of the discriminant function \hat{g}_i(x^*) = \hat{p}(w_i|x^*).

Figure 1. Distribution of \hat{p}(x|w_i) at points x = [-3, -2, -1, 0].

We characterize this random variable by its mean value and standard deviation, estimated at each point x^* by bootstrapping [1, 3]. We create bootstrap sets of size n, represented by x^b = [x^b_1, x^b_2, ..., x^b_n], obtained by sampling with replacement from the training population of m training samples x = [x_1, x_2, ..., x_m]. With each bootstrap set x^b we generate a sample \hat{g}^b_i(x^*), from which we extract statistical properties, providing insight into the random variable \hat{g}_i(x^*). This approach can be used under either a parametric or a non-parametric estimation framework. Using the bootstrap approach we estimate the mean value and the variance of \hat{g}_i(x^*) for each of the classes. For a sufficiently large n, the mean of \hat{g}_i(x^*) (denoted by \bar{g}_i(x^*)) is equal to g_i(x^*).
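The bootstrap characterization described above can be sketched as follows, under a parametric Gaussian model: each class's training set is resampled with replacement, the model is refitted, and the resulting posterior estimate at x^* is recorded. The synthetic class data, priors, and bootstrap count are illustrative assumptions, not the paper's EDA data:

```python
import math
import random

def fit_gauss(samples):
    """Maximum likelihood estimates of mu and sigma for one class."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n
    return mu, math.sqrt(var)

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bootstrap_g(x_star, class_data, priors, n_boot=100, seed=0):
    """Per-class (mean, std) of the bootstrap estimates of g_i(x*)."""
    rng = random.Random(seed)
    draws = [[] for _ in class_data]
    for _ in range(n_boot):
        # one bootstrap resample (with replacement) and one refit per class
        fits = [fit_gauss([rng.choice(data) for _ in data]) for data in class_data]
        joint = [gauss_pdf(x_star, mu, sd) * p for (mu, sd), p in zip(fits, priors)]
        total = sum(joint)
        for i, j in enumerate(joint):
            draws[i].append(j / total)      # one sample of g_i(x*) per bootstrap set
    out = []
    for d in draws:
        m = sum(d) / len(d)
        out.append((m, math.sqrt(sum((v - m) ** 2 for v in d) / len(d))))
    return out

# Synthetic stand-in for two classes' training populations.
rng = random.Random(1)
class0 = [rng.gauss(0, 1) for _ in range(100)]
class1 = [rng.gauss(2, 1) for _ in range(100)]
stats = bootstrap_g(0.2, [class0, class1], [0.5, 0.5])
print(stats)   # [(mean, std) of g_0(x*), (mean, std) of g_1(x*)]
```

Since every bootstrap posterior vector sums to one, the per-class means also sum to one; the standard deviations quantify the dispersion that the modified discriminant of section 2.1 exploits.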

2.1. Uncertainty Based Reject Option Classifier

The reject option classifier either selects one of the classes or rejects a sample, based on the estimated error probability. We propose a simple extension to the typical rejection option, using the variance of \hat{g}_i(x^*) to establish a new rejection rule. We define g^u_i(x^*) as:

g^u_i(x^*) = k [ \bar{g}_i(x^*) + w \cdot std(\hat{g}_i(x^*)) ],   (4)

where k is a normalizing factor that guarantees \sum_{i=1}^{nc} g^u_i(x^*) = 1, and w weights the contribution of the standard deviation of \hat{g}_i(x^*) in the modified discriminant function. When w -> 0, g^u_i(x^*) -> \bar{g}_i(x^*), corresponding to the standard discriminant function. When w -> \infty, g^u_i(x^*) -> std(\hat{g}_i(x^*)). For our tests we selected w = 1. The new discriminant functions add a class-dependent value that balances the uncertainty estimated for the classification among all the classes. The proposed rejection rule is expressed by:

if \max_i (g^u_i(x^*)) < \delta, reject.   (5)

This rule relaxes the reject option by incorporating the uncertainty of the locally estimated decision error into the discriminant function. The threshold \delta is directly related to \bar{g}_i(x^*), the probability of a classification being correct (1 - pe(x^*)). We call the selection of a suitable \delta the rejection operating point selection. By selecting a rejection operating point, we control the acceptable error probability of our classifier. This approach not only permits direct classification, but also generates useful information about the classification, in particular the uncertainty in the error probability of each particular decision.

2.2. Uncertainty Based Classification Fusion

We consider the fusion of MAP classifiers with an uncertainty reject option, using a parallel architecture for the two-classifier problem [4]. Let x be the data generated by a data source A, for instance a given biometric modality. Similarly, let y represent the data generated by a second source B, a distinct biometric modality. Assuming conditionally independent data sources, the a posteriori class probabilities given the observation (x, y) factorize into p(w_i|x, y) = p(w_i|x) p(w_i|y) / p(w_i). Estimating p(w_i|x) and p(w_i|y) with the discriminant functions of classifiers c1 and c2, respectively, we obtain the product rule for classifier fusion:

p_f(w_i|x, y) = p_{c1}(w_i|x) p_{c2}(w_i|y) / p(w_i).   (6)

We integrate the uncertainty based reject option classifier into the fusion rule by replacing p_{cj}(w_i|x_j) with g^u_i(x_j), as given in equation 4. The decision rule based on the two classifiers is:

Accept(x, y -> w_i) = true if g^u_i(x) g^u_i(y) / p(w_i) > \delta; false otherwise.   (7)
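A minimal sketch of the modified discriminant (equation 4), the rejection rule (equation 5), and the fusion decision (equation 7). The bootstrap (mean, std) pairs, priors, and threshold values below are hypothetical numbers chosen for illustration, not results from the paper:

```python
def g_u(stats, w=1.0):
    """Eq. (4): g^u_i = k [mean_i + w * std_i], with k normalizing the sum to 1."""
    raw = [m + w * s for m, s in stats]
    k = 1.0 / sum(raw)
    return [k * r for r in raw]

def reject(gu, delta):
    """Eq. (5): reject when max_i g^u_i(x*) < delta."""
    return max(gu) < delta

def accept_fusion(gu_x, gu_y, priors, i, delta):
    """Eq. (7): accept class w_i when g^u_i(x) g^u_i(y) / p(w_i) > delta."""
    return gu_x[i] * gu_y[i] / priors[i] > delta

stats_x = [(0.80, 0.05), (0.20, 0.10)]   # hypothetical bootstrap output, source A
stats_y = [(0.70, 0.08), (0.30, 0.12)]   # hypothetical bootstrap output, source B
gx, gy = g_u(stats_x), g_u(stats_y)
print(reject(gx, 0.5))                   # False: the top class is confident enough
print(accept_fusion(gx, gy, [0.5, 0.5], 0, 0.9))
```

Note how a class with a small mean but a large bootstrap standard deviation gains weight under g^u, which is exactly the balancing effect described for equation 4.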

3. Results
We tested the use of electrodermal activity (EDA), an electrophysiological signal, for user authentication. The experimental setup and authentication results are presented next.

3.1. Experimental Setup


We collected electrodermal activity from 27 subjects while they were interacting with a computer, performing a concentration task that consists of selecting pairs of consecutive digits that add to 10 from a dense matrix of numbers (see [2]). An EDA event, called a skin conductance response (SCR), is a detected response in the electrodermal signal [5]. For performance evaluation we selected a random set of events from each user, guaranteeing that the set dimension is equal for every user; we used 125 samples per user. We used the amplitude feature of each SCR, with a lognormal model for p(x|w_i). In all the following results we used half of the data for training and half for classification. We used the classifier in sequential classification mode, where we consider each sample independent, assuming p(X|w_i) = \prod_j p(x_j|w_i). The discriminant function for a sequence can be derived as:

p(w_i|X) = \prod_{j=1}^{ns} p(x_j|w_i) / \sum_{k=1}^{nc} \prod_{j=1}^{ns} p(x_j|w_k),   (8)

where ns is the number of sequential samples and nc the number of classes. The sequence was sampled from the testing set, allowing repetition in the sampling process. For each user we present 125 samples.

The error bar plot of figure 2 provides a view of the EDA data, where the solid line is the mean value for each user and the bar extends to one standard deviation. For visualization purposes the users are ordered by the mean value of the feature. Some user pairs are clearly distinct, suggesting that some discriminatory information can be extracted from the data.

Figure 2. EDA data: error bar of the EDA data for each user.

3.2. Uncertainty Based Reject Option

The uncertainty based reject option (see section 2.1) was tested with the EDA data to assess the possible use of the EDA signal as a stand-alone modality, even if a relevant percentage of the samples would need to be rejected. The identification error probability versus rejection probability trade-off is depicted in figure 3 for the standard rejection option classifier (solid line) and the proposed uncertainty based reject option classifier (dashed line). The example is from a sequential classifier with 5 sequential samples (the training vector had 5 sequential samples per user). The bootstrap estimates were computed from 100 bootstrap samples.

Figure 3. Error probability versus rejection probability.

We see that the uncertainty based rejection classifier presents a lower error probability for the same rejection level. If we select a fixed error probability, the uncertainty based reject option also has lower rejection levels when compared to the standard reject option.

3.3. Uncertainty Based Classification Fusion

The EDA data is now used with a sequential classifier in authentication mode. We also present the improvements resulting from fusing our EDA based classifier with a synthetic data classifier.

Figure 4 presents the results of the stand-alone EDA biometric system for an increasing number of events, with EER values starting at 35% for a single-event classifier and decreasing to near 10% with 40 sequential EDA events.

Figure 4. EDA equal error rate results.

In the case of the EDA signal, even with a sequence of samples, the error does not decrease to acceptable values. The option is data fusion with other hard-biometric techniques, or sets of them. We need to verify that the fusion tools we designed create a fusion classifier that improves on the first, hard-biometric based classifier. In the fusion experiments we created a synthetic data source with the same number of classes. The data model was a multivariate normal N(\mu_i, \Sigma), with \mu_i = [\mu_1, ..., \mu_{nc}], where \mu_j = 1 if j = i and \mu_j = 0 otherwise (i is the class/user number). \Sigma is a diagonal matrix with \sigma in each diagonal element. This model creates a class space where the class means are equally spaced from each other, providing a classification error that is distributed among all the classes. For this model we only need to define the scalar \sigma that controls the class separation. We used a \sigma that provided a base error for the synthetic classifier equal to 0.071.

As base fusion rule we used the multiplication of \hat{g}_i for each classifier, referred to as the product rule. We also tested the uncertainty based classification fusion with the sum rule, but results were always worse than those obtained with the product rule. In the following examples we compare the product rule with the product rule based on the bootstrap discriminant functions g^u_i(x^*) (see equation 4).

The evolution of the equal error rate for different training sample sizes is depicted in figure 5. The solid line presents the EER results of the regular fusion rule (standard deviation indicated by the error bars). The dashed line represents the uncertainty reject option based fusion rule. The dotted line is the base EER of the synthetic data classifier. We generated the data from 10 runs of the fusion algorithm for different training set sizes (we report the number of training samples per user). The number of bootstrap samples was set to 100. We used a sequence of 25 sequential testing samples. We observed that with a small training set the direct fusion generates more error than the synthetic data classifier alone. The uncertainty based reject option always improves classification and, for small training sets, greatly outperforms the normal fusion. When more training data is available, both classification fusion techniques have similar performance.
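The sequential discriminant of equation 8 can be sketched as follows, computed in the log domain for numerical stability. The three equally spaced Gaussian classes mimic the synthetic fusion source; the paper's EDA model is lognormal, and the test sequence here is a deterministic illustrative stand-in:

```python
import math

def gauss_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def sequence_posterior(xs, class_params):
    """p(w_i|X) = prod_j p(x_j|w_i) / sum_k prod_j p(x_j|w_k)   (eq. 8)."""
    # sum of log-likelihoods per class = log of the product over the sequence
    logs = [sum(gauss_logpdf(x, mu, sd) for x in xs) for mu, sd in class_params]
    m = max(logs)                           # subtract the max before exponentiating
    expd = [math.exp(v - m) for v in logs]
    total = sum(expd)
    return [e / total for e in expd]

# Equally spaced class means; sigma controls the separability, as in the
# synthetic fusion source of section 3.3.
params = [(0.0, 0.8), (1.0, 0.8), (2.0, 0.8)]
xs = [1.0 + 0.3 * math.sin(j) for j in range(25)]   # a 25-sample sequence near class 1
post = sequence_posterior(xs, params)
print(post.index(max(post)))   # the sequence concentrates the posterior on class 1
```

Even though a single sample near x = 1 is ambiguous between the three classes, accumulating 25 samples makes the sequence posterior nearly degenerate on the true class, which is the effect behind the EER drop with longer sequences in figure 4.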

Figure 5. Classification fusion versus uncertainty reject option based fusion.

4. Conclusions

We presented novel classification schemes integrating estimates of the classification error into a Bayesian classification framework. Both single classifiers with a reject option and classifier fusion methods were proposed, incorporating the classification error probability estimates into the decision process. The proposed methods were evaluated in a real-world problem, testing the use of the electrodermal activity physiological signal for user authentication. Experimental results have shown that this new behavioral biometric modality, when using a single event for classification, has a low capacity to discriminate between humans. The use of a sequential classifier led to better, but still unacceptable, performance as a stand-alone biometric. Application of the proposed uncertainty based classifier to this low-separability data led to improved authentication performance. We then tested how this soft biometric could be included in a multimodal biometric system, using synthetic data as the second biometric modality and applying both the conventional product rule for classifier fusion and the proposed uncertainty based classifier fusion method. Experimental results have shown increased performance when using the proposed approach, the fusion of a hard biometric with the soft biometric based on EDA events proving to be a valid and worth-exploring biometric modality.

References

[1] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall/CRC, May 1994.
[2] H. Gamboa, H. Silva, and A. Fred. Himotion project report. Technical report, Instituto de Telecomunicações, 2006.
[3] A. Jain, R. Dubes, and C. Chen. Bootstrap techniques for error estimation. Pattern Analysis and Machine Intelligence, 9(5):628-633, September 1987.
[4] J. Kittler. Pattern classification: Fusion of information. In Proceedings of the International Conference on Advances in Pattern Recognition, pages 13-22. Springer-Verlag, 1998.
[5] J. Malmivuo and R. Plonsey. Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields, chapter The Electrodermal Response. Oxford University Press, USA, July 1995.
