Using The Beta-Binomial Distribution For The Analysis of Biometric Identification
All content following this page was uploaded by Gabor Werner on 28 December 2018.
Abstract—In this paper we present a method that users and customers can apply when testing biometric devices. Failed events during biometric identification are more common than the literature data suggest, yet there is no standard or widely accepted manual for such examinations. Conventional statistical methods use the binomial distribution to estimate the expected number of failures, but in biometrics the probability parameter cannot be constant, which means that it has to be described as a process rather than as a fixed value. Our results show that the probability is characterized by the two parameters of the beta distribution, and that these can be estimated from a small sample of the investigated population with the maximum-likelihood method.

Keywords—biometrics, beta-binomial distribution, false rejection rate, Bayesian analysis, maximum likelihood

I. INTRODUCTION

In biometric identification, the recognition and authentication of an individual identifier pattern (IIP) is a quite complex process. Experience has shown that the number of false identifications is high. It is accepted practice to use the FAR1 (False Acceptance Rate), the FRR2 (False Rejection Rate) and the ROC3 (Receiver Operating Characteristic) to classify the 'goodness' of the operation. However, these rates do not represent the uncertainties behind the operation. With Bayesian analysis the 'goodness' can be qualified more specifically than with the common statistical rates, by means of the beta-binomial distribution, which is a special form of the binomial distribution that implicitly applies the parametric gamma function [1].

1 The false acceptance rate, or FAR, is the measure of the likelihood that the biometric security system will incorrectly accept an access attempt by an unauthorized user.
2 The False Rejection Rate shows the proportion of persons holding access authorization that a given device rejects under the given operating settings.
3 In statistics, a receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate against the false positive rate at various threshold settings.

The logic of biometric identification differs from that of classical RFID or PIN code authentication, so the beta-binomial distribution of the recognition's characteristic is very useful in practice. Before the era of biometric identification, the main question during authentication was whether the given 'key' (RFID chip, PIN code, etc.) "is able to open the lock". This 'key' was mostly built from a binary code. Such a code, regardless of how it is encrypted, admits only one suitable solution and can therefore be totally selective. In biometrics, by contrast, it is almost impossible to find a totally matching pattern. As Herakleitos said, "You can't step twice into the same river", because nature, including human beings, is changing all the time [2].

This uncertainty does not mean that biometric patterns cannot be totally selective. It only calls for a philosophy other than bipolar logic, one that manages the lack of precision and the failures. If failures have to be taken into account, it is also necessary to estimate their density and distribution. On the market of biometric devices, manufacturers usually report low discrete values (FAR: 0.0001-0.01%, FRR: 0.01-1.0%). These values affect the daily number of errors, which can be estimated a posteriori with the Poisson distribution. The Poisson distribution is in fact a limiting case of the binomial distribution.

However, the a priori failure distribution cannot be approximated with the Poisson distribution in biometrics. Thus neither the Poisson nor the binomial distribution is an appropriate approximation of the a posteriori failure distribution. To resolve this contradiction, the model below was constructed; it adapts the beta-binomial distribution to classify the 'goodness' of biometric devices in real use [3].

II. METHODS

A. Mathematical Background of Beta-binomial Distribution

The beta-binomial distribution is a special type of the classical binomial distribution in which the probability of the possible events is not constant but follows a distribution of its own. This distribution of the probability parameter (p) is determined by two parameters (α – alpha, β – beta). According to Bayesian analysis, if the a priori distribution is beta and the transitional distribution is binomial, then the a posteriori distribution is also beta [4], [5].
(2.1)
(2.2)
(2.3)
(2.4)
(2.5)
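As a reference sketch of the quantities described in this subsection, the textbook forms of the beta prior, the binomial likelihood, the beta posterior and the resulting beta-binomial probability are given below. The notation (n trials, x failures, B for the beta function) is an assumption and may not map one-to-one onto the paper's equations (2.1)-(2.5).

```latex
\begin{align*}
f(p \mid \alpha, \beta) &= \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}, \qquad
B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}, \qquad 0 < p < 1,\\
P(X = x \mid n, p) &= \binom{n}{x}\, p^{x} (1-p)^{n-x},\\
f(p \mid x, n) &= \frac{p^{\alpha+x-1}(1-p)^{\beta+n-x-1}}{B(\alpha+x,\ \beta+n-x)},\\
P(X = x \mid n, \alpha, \beta) &= \int_0^1 \binom{n}{x}\, p^{x}(1-p)^{n-x}\, f(p \mid \alpha,\beta)\, dp
  = \binom{n}{x}\,\frac{B(\alpha+x,\ \beta+n-x)}{B(\alpha,\beta)}.
\end{align*}
```

The third line is the conjugacy statement in formula form: a beta prior combined with a binomial likelihood gives a beta posterior with parameters α + x and β + n − x.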
It is proven that the parameters (α, β) determine the probability parameter of the binomial distribution's density function. However, determining these parameters is quite complex. They are specific to each analyzed sample, and they characterize the entire population. It was therefore necessary to find an automatic and suitable method to calculate the parameters in each experiment.

As shown in Fig. 1, if the α and β parameters are equal (α=β), the density function is symmetric. If both of them equal one (α=1, β=1), the density function is constant with value one. This means that the probability parameter can take any value of the interval 0-1 with the same likelihood, so the a posteriori distribution is determined only by the ordinary binomial distribution [4], [5].

In this paper we focus on the possible failure rates, more specifically on the probability of each possible number of failures: the larger the number of failures, the lower the probability of its occurrence. This implies an asymmetric shape for the beta distribution. In this study the parameter values mostly stayed close to α ≈ 1 and β ≈ 2, which yield a decreasing density function over the possible number of failures, as presented in Fig. 2.

Figure 2. Beta distribution density function for different parameter values

B. The Purpose of the Study

The main goal of this work was to create an algorithm that applies the beta-binomial distribution to model the real mathematical background of FAR and FRR in biometric identification. In this paper we present the entire process of the Bayesian analysis, from sampling to the estimation of failures in daily use. The algorithm was implemented in MATLAB (Matrix Laboratory) and takes the statistical results of prior tests as input. In the prior tests a fingerprint reader (iEVO) was tested by eight volunteers. Unfortunately, only four of them had patterns that could be evaluated. The collected results formed the database used as input.

We focused on the FRR rather than the FAR, because false rejections are typical and all too common in everyday use. In our practical experience, a significant rate of false rejections leads users to ignore biometric access systems and bypass them. The risk of an impostor's entry thus rises to a significantly higher level than it would at an access point with a less strict setting.
SISY 2015 • IEEE 13th International Symposium on Intelligent Systems and Informatics • September 17–19, 2015, Subotica, Serbia
The main goal of the Applied Biometrics Institute is to standardize the examination of biometric identification devices, so that market participants can get a more objective view of the practical effectiveness of these devices. In line with this goal, we used a database recorded in our laboratory.

C. Methodology of the Examination

In our experience there is no appropriate methodology for such examinations, so it was necessary to find one. Statistical methods were investigated with easy reproducibility and usage in mind. The results showed that the fewer variables have to be investigated, the more willing customers4 are to apply the method. We needed an automatic technique that is easy to adopt in practice. The theory of the process is the following.

4 Companies that buy and apply biometric identification devices for access systems.

In the first step a representative sample of the population has to be picked. In this case that means a small group of the employees who use the investigated access point. Human identification patterns (HIP) are not permanent; they change continuously over time, sometimes from day to day, and differ from person to person. Consequently, the number of failures increases in non-dynamic systems. To determine the exact failure rates, the examination of these devices has to be regular and systematic.

With our technique it is possible to establish whether the devices themselves perform worse (break down or get dirty) or whether just a few users have trouble with access. It is established that no two biometric samples are totally equal, even if they were recorded from the same person at the same time. What is more, a small percentage of the population cannot be identified by fingerprint at all. Experiments show that the higher the average age of the users, the more failures are detected [2], [6], [7].

In biometrics we have to reckon with continuously changing user attributes and environment, so the number of failures (unrecognized HIP) varies. This variation is not chaotic, however; it can be described by the beta-binomial distribution.

In the experiments we examined how probable the failures are. A round of ten tests was repeated ten times, and each time we recorded the number of failed identifications. These rounds describe the distribution of the possible failures. We then had to find a method that can characterize this distribution numerically. As mentioned above, this distribution is the beta distribution, so the task in the second step was to find the right way to determine the parameters of the beta distribution for every subject. It is important to highlight that these results on their own cannot classify the goodness of the fingerprint reader; they only show whether the methodology works.

In biometrics, the investigated origins of failure are reading obstacles (shift, rotation, dirt), disturbances from the environment (optical, thermal or EM radiation, humidity, dust) and the internal malfunction of the device (reading, computing, deciding). The elementary investigation of these failures is complex and is the subject of separate ongoing research, so in this paper we focus on the aggregate of these elements in a black-box model. We distinguished the failures at the level of individuals and tests, but the model might also be applicable at the level of the different origins [7].

With the parameters of the beta distribution it is possible to calculate the beta-binomial density function, which shows the probability of the different numbers of failures. By comparing the individual density function with the density function of the aggregated data, it is possible to establish whether there is a subject with a very poor pattern or whether the device has gone wrong.

D. Applied Mathematical Methods

To determine the beta parameters we used the maximum-likelihood method, so the parameters sought can be calculated as the solution of an extreme-value problem. Differentiating the log-likelihood function with respect to the parameters gave the Jacobi matrix (2.6):

(2.6)

where:

(2.7)

Several iteration methods were investigated to calculate the constants of the Jacobi matrix (2.6). The Newton-Raphson method and fixed-point iteration could not be used because of non-quadratic convergence, but an Armijo-type gradient method solved the iteration problem. The essence of this method is to examine the gradient of a non-negative function (2.8), because the approximation of the extreme value sought is determined by the direction of steepest decrease of this non-negative function. During the approximation two constants (ε, η) have to be determined empirically. The examined non-negative function is the following [8], [9]:

(2.8)
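The fitting step of Section II.D can be sketched in Python. This is an illustrative stand-in under stated assumptions, not the authors' MATLAB program: it maximizes the beta-binomial log-likelihood directly by gradient ascent with Armijo backtracking instead of working with the non-negative function (2.8), and the constants `c` and `shrink` are hypothetical choices rather than the paper's (ε, η). The input is the pooled histogram of Table 2, read as the number of rounds with 0, 1, ..., 10 failures summed over the four subjects; that reading of the table is also an assumption.

```python
import math

N_TRIALS = 10  # tests per round
# Rounds with 0, 1, ..., 10 failures, pooled over the four subjects of Table 2.
ROUNDS = [15, 6, 5, 2, 4, 3, 3, 1, 1, 0, 0]

def lbeta(u, v):
    """Logarithm of the beta function B(u, v)."""
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)

def log_lik(a, b):
    """Beta-binomial log-likelihood, up to the constant binomial coefficients."""
    return sum(cnt * (lbeta(a + x, b + N_TRIALS - x) - lbeta(a, b))
               for x, cnt in enumerate(ROUNDS))

def gradient(a, b):
    """Exact partial derivatives of log_lik (finite sums, valid for integer x)."""
    common = sum(1.0 / (a + b + k) for k in range(N_TRIALS))
    ga = gb = 0.0
    for x, cnt in enumerate(ROUNDS):
        ga += cnt * (sum(1.0 / (a + k) for k in range(x)) - common)
        gb += cnt * (sum(1.0 / (b + k) for k in range(N_TRIALS - x)) - common)
    return ga, gb

def fit(a=1.0, b=1.0, c=1e-4, shrink=0.5, tol=1e-5, max_iter=2000):
    """Gradient ascent; each step length is chosen by Armijo backtracking."""
    for _ in range(max_iter):
        ga, gb = gradient(a, b)
        g2 = ga * ga + gb * gb
        if math.sqrt(g2) < tol:
            break
        t, f0 = 1.0, log_lik(a, b)
        # Shrink t until both parameters stay positive and the Armijo
        # sufficient-increase condition holds.
        while (a + t * ga <= 0 or b + t * gb <= 0
               or log_lik(a + t * ga, b + t * gb) < f0 + c * t * g2):
            t *= shrink
            if t < 1e-14:
                return a, b  # no acceptable step found; keep current estimate
        a, b = a + t * ga, b + t * gb
    return a, b

alpha_hat, beta_hat = fit()
print(f"alpha ~ {alpha_hat:.3f}, beta ~ {beta_hat:.3f}")
```

Backtracking alone enforces only the "step not too large" half of the Armijo-Goldstein criteria; starting each search from t = 1 and shrinking is a common substitute for the lower bound. The gradient uses the identity that, for integer x, the difference of log-gamma terms collapses into finite sums, so no digamma function is needed.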
G. Á. Werner and L. Hanka • Using the Beta-Binomial Distribution for the Analysis of Biometric Identification
Table 1. Distribution of the test population

It is well known that the direction of steepest decrease is given by the negative gradient, but to avoid both too large an iteration step and too small a step, the criteria below (2.9) have to hold at the same time. These Armijo-Goldstein criteria ensure a suitable approximation [9]:
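In their usual textbook form, the two conditions on a step length t along the descent direction d_k read as follows. This is a sketch under assumptions: the roles assigned to ε and η and the symbols F, x_k, d_k and t may differ from the paper's own notation in (2.9).

```latex
\begin{align*}
F(x_k + t\,d_k) &\le F(x_k) + \varepsilon\, t\, \nabla F(x_k)^{\mathsf T} d_k
  && \text{(the step is not too large)},\\
F(x_k + t\,d_k) &\ge F(x_k) + \eta\, t\, \nabla F(x_k)^{\mathsf T} d_k
  && \text{(the step is not too small)},\\
0 < \varepsilon &< \eta < 1, \qquad d_k = -\nabla F(x_k).
\end{align*}
```

Because d_k is the negative gradient, the product ∇F(x_k)ᵀd_k equals −‖∇F(x_k)‖² and is negative, so the first line demands a sufficient decrease of F while the second rules out vanishingly small steps.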
(3.3)
5 The manufacturer gave the original (empirical) data to characterize the goodness of the device: FRR<0.1% and FAR<0.00001%.
Table 2. Number of failures in the tests (columns 0-10: number of failures in a round of ten tests; cells: number of rounds)

SUBJECT        0     1     2     3     4     5     6     7     8     9     10    SUM
I. subject     4     0     0     1     2     1     1     1     0     0     0     29
II. subject    7     2     1     0     0     0     0     0     0     0     0     4
III. subject   2     2     0     1     1     2     2     0     0     0     0     32
IV. subject    2     2     4     0     1     0     0     0     1     0     0     18
av.            3.75  1.5   1.25  0.5   1     0.75  0.75  0.25  0.25  0     0     20.75

Once n different events have been observed, of which x are adverse events, so that the number of correct identifications is n-x, the corrected equation is the following [4], [10]:

(3.2)
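One common reading of such a correction is the conjugate beta update, in which observing x failures in n attempts turns Beta(α, β) into Beta(α + x, β + n − x). The sketch below illustrates that update only; whether it matches (3.2) term for term is an assumption, the function names are hypothetical, and the prior Beta(1, 2) is an illustrative choice. The counts are taken from the SUM column of Subject I in Table 2 (29 failures in 100 attempts).

```python
def update_beta(alpha, beta, n, x):
    """Posterior beta parameters after x failures in n attempts."""
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")
    return alpha + x, beta + (n - x)

def beta_mean(alpha, beta):
    """Expected failure probability under Beta(alpha, beta)."""
    return alpha / (alpha + beta)

prior = (1.0, 2.0)                            # hypothetical prior, alpha ~ 1, beta ~ 2
posterior = update_beta(*prior, n=100, x=29)  # Subject I: 29 failures in 100 attempts
print(posterior, round(beta_mean(*posterior), 4))
```

The posterior mean moves from the prior mean towards the observed failure rate as n grows, which is the sense in which a larger sample "corrects" the initial parameters.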
and the parameters of the summed values are getting closer. Given the discrete values, it would be highly advisable to repeat the tests on a larger population.

As a trend we can highlight the following: the number of rounds in which the number of failures is zero or quite small is significantly higher than the number of rounds with many failures. Conventional statistics uses the binomial distribution to estimate the average number of failures. To compare the conventional method with the method examined here, we used the average observed failure rate as the parameter p in the binomial formula.

As Figure 4 shows, the difference between the density functions is significant. With the beta-binomial method the probability of zero failures is about 35%, in contrast with the ordinary binomial method, where it is 10-15%. Although the tests were done on a small population, the necessary minimum number of failed events was detected at the 99% confidence level according to the Doddington formula. When small populations are tested, the distortion usually means that the observed number of failures is smaller than the statistically expected one. The Doddington formula helps to correct this distortion. The problem originating from the small population is thus slightly corrected, so we could focus on comparing the significant difference between the ordinary binomial and the beta-binomial distribution.

In Figure 5 we show the cumulative distribution functions for a better view.

Figure 5. Experimental cumulative distribution functions
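The comparison between the two models can be illustrated numerically. The sketch below takes p = 0.2075 from the average SUM in Table 2 (20.75 failures per 100 attempts) and, as an assumption, a beta distribution with the same mean and a hypothetical α = 0.6; these are illustrative parameters, not the fitted values of the study.

```python
import math

def binom_pmf(x, n, p):
    """Probability of x failures in n attempts with constant failure rate p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def betabinom_pmf(x, n, a, b):
    """Probability of x failures in n attempts when p follows Beta(a, b)."""
    def lbeta(u, v):
        return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)
    return math.comb(n, x) * math.exp(lbeta(a + x, b + n - x) - lbeta(a, b))

n = 10                  # tests per round
p = 0.2075              # average failure rate from Table 2
a = 0.6                 # hypothetical alpha
b = a * (1 - p) / p     # beta chosen so that a / (a + b) equals p

binom = [binom_pmf(x, n, p) for x in range(n + 1)]
bb = [betabinom_pmf(x, n, a, b) for x in range(n + 1)]

print(f"P(0 failures): binomial {binom[0]:.3f}, beta-binomial {bb[0]:.3f}")
print("binomial mode:", binom.index(max(binom)),
      " beta-binomial mode:", bb.index(max(bb)))
```

With the same mean failure rate, letting p vary always raises the probability of a failure-free round, because (1 − p)^n is a convex function of p; this is the qualitative gap between the two curves described above.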
C. Summary of the results

At the Applied Biometric Institute we attempted to implement an innovative method for a clear and practical testing process for biometric devices. Based on our experience, we assumed that there is a method which can be used to analyze the goodness of identification further, because the difference between the practical and the theoretical FRR is significant.

We hypothesized that the distribution of false rejections follows a binomial distribution in which the probability of each failure can be described by a beta distribution. Thus, after the Bayesian analysis, the a posteriori formula follows a beta-binomial distribution characterized by two parameters (alpha, beta). These parameters can be estimated from a sample of the user population.

To determine the parameters, we wrote a MATLAB program that applies the Armijo gradient method to the extreme-value calculation of the maximum-likelihood estimation. Once the parameters are determined, it is easy to calculate the density and distribution functions. To demonstrate the differences, the ordinary binomial and the beta-binomial distributions were compared with each other.

In this study the ordinary binomial distribution has its maximum at exactly two failures, whereas the beta-binomial distribution has its maximum at zero failures and decreases monotonically.

Finally, we found that false rejection depends on several variables for which no explicit formula exists. Because the failures come from different sets of mistakes and from statistical uncertainty, the ordinary binomial failure estimate cannot lead to the right conclusions. We chose a distribution for the probability p instead of a constant. With this methodology it is possible to characterize a biometric device from a small sample of a population.

IV. CONCLUSIONS

Based on our experience, it may be stated that the implemented mathematical apparatus has to be considered when characterizing biometric devices. The data and characteristics given by manufacturers are usually not of the same order of magnitude as those experienced in practice.

While manufacturers perform their own testing processes, there is no standard for characterizing biometric devices, so these tools need to be investigated on the side of customers and users. The method shown in this study may be incorporated into such testing processes.

Based on our experience, it can be stated that the beta-binomial distribution is closer to reality and gives more accurate results than the ordinary binomial distribution.

REFERENCES

[1] D. C. Hitchcock, Evaluation and Combination of Biometric Authentication System, University of Florida, 2003.
[2] Anil K. Jain, Arun Ross, Multibiometric Systems, Communications of the ACM, Vol. 47, No. 1, 2004.
[3] Carl Young, Metrics and Methods for Security Risk Management, pp. 105, Syngress, 2010.
[4] I. Fazekas, Probability and Statistic, University of Debrecen, 2004.
[5] Dan Navarro, Amy Perfors, An introduction to the Beta-Binomial model, COMPSCI 3016: Computational Cognitive Science, University of Adelaide.
[6] Mike Silverman, Tony Thompson, Written in Blood, Transworld, 2014.
[7] Anil Jain, Lin Hong, Sharath Pankanti, Biometric Identification, Communications of the ACM, Vol. 43, Issue 2, Feb. 2000.
[8] Larry Armijo, Minimization of Functions Having Lipschitz Continuous First Partial Derivatives, Pacific Journal of Mathematics, Vol. 16, No. 1, 1966.
[9] Alessandro Astolfi, Optimization, handout pp. 25-27, London, 2009.
[10] Zs. Balogh, László Hanka, Bayesian Analyzes in the Risk Assessment, Application of Discrete Probability Distributions, Statements in Aeronautics, vol. XXV, 2013/02.
[11] László Hanka, Mathematical Methods in Biometrics, University of Óbuda, 2012.