
Fundamentals of Biometric System

Design
by S. N. Yanushkevich

Chapter 3
Biometric Methods and Techniques: Part I –
Statistics
• Computing type I errors (FRR)
• Computing type II errors (FAR)
• System performance evaluation

[Figure: FRR versus FAR curves, with the equal error rate (EER) at their intersection]
S.N. Yanushkevich, Fundamentals of Biometric System Design 2

Preface
The key methodology of measurement in biometric technology is engineering statistics.
The key methodology of biometric data processing is signal
processing and pattern recognition.
The key performance metrics in biometrics are related to
matching rates (false match rate, false nonmatch rate, and
failure-to-enroll rate).
The crucial point of biometric system design is the measurement of biometric data. This is
mandatory knowledge for all biometric design teams. In the engineering environment, the
data is always a sample¹ selected from some population. For example, the calculation of
reliability parameters of biometric data, such as the confidence interval and the sample
size, is a typical problem in experimental study, reliability design, and quality control.
Engineering statistics provides various tools for assessing the reliability of biometric data,
in particular:
◮ Techniques for estimation of mean, variance, and correlation,
◮ Techniques for computing confidence intervals,
◮ Techniques for hypothesis testing, and
◮ Techniques for computing type I and type II errors.
In a biometric system, decision making is based on statistical criteria. This is because
biometric data is characterized by high variability. Every time a user presents
biometric data, a unique template² is generated. Depending on the type of biometric, even two
immediately successive samples of data from the same user generate entirely different templates.
Statistical techniques are used to recognize that these templates belong to the same
person. For example, a user may place the same finger on a biometric device several times,
and all generated templates will be different. To deal with this variability, statistical tools
must be used.
For processing the raw biometric data, various techniques of the signal or image
processing, pattern recognition, and decision making are used, in particular,
◮ 2D discrete Fourier transform,
◮ Filtering in spatial and frequency domains using Fourier transform,
◮ Classifiers, and
◮ Pattern recognition module design.

¹ In a biometric system, a sample is a biometric measure submitted by the user and captured by the data acquisition tool.
² A template is a small file derived from the distinctive features of a user's biometric data. A template is used in a system to perform biometric matches. Biometric systems store and compare templates derived from biometric data. Biometric data cannot be reconstructed from a biometric template.

In the design of a biometric system, these techniques should be considered with respect to
the software or hardware implementation. This lecture brings these techniques together
in the context of the implementation.
Finally, this lecture introduces the notion of the performance of a biometric system.
Because a biometric system is an application-specific computer system, the performance
is defined:

◮ In terms of the specific application, such as operational accuracy, and
◮ In terms of the computer platform, such as operational time.

In this lecture, performance in terms of operational accuracy is introduced (false reject
and accept rates, false match and non-match rates, failure to enroll).

Essentials of this lecture


• Statistical thinking. A statistical approach should be applied at all phases of the life cycle of
a biometric system, including experimental study, design techniques, testing, reliability
estimation, and quality control. Biometric data must be represented in a form
acceptable for decision making in verification and identification procedures.
For this, classic signal processing and pattern recognition methods are adopted.
• Statistical performance evaluation. Performance parameters of a biometric system in
terms of operational (system) accuracy and operational time (computational
speed) cannot be measured exactly; they can only be estimated using statistical techniques.
• Statistical decision-making. The variability of biometric data is propagated into the
templates and decision making. Decision making at the various levels of the biometric system
hierarchy is a statistical procedure by nature, that is, decision making under
uncertainty.

Biometric Methods and Techniques

Biometrics is a multidisciplinary area. Various advanced mathematical and engineering
methods and techniques are used in biometric system design. In this lecture, methods
from the following directions are briefly introduced:

◮ Statistical methods,
◮ Methods of signal processing, and
◮ Methods of pattern recognition.

1 Basic statistics for biometric system design


Biometric systems begin with the measurement of a behavioral/physiological charac-
teristic. Key to all systems is the underlying assumption that the measured biometric
characteristic is both distinctive between individuals and repeatable over time for the
same individual. Statistical methods provide the techniques for measuring the biometric
characteristic.
In the implementation, the problems of measuring and controlling these random variations
begin in the data acquisition module. The user's characteristic must be presented
to a sensor. The presentation of any biometric to the sensor introduces a behavioral
(random) component to every biometric method. The output of the sensor forms the
input data upon which the system is built. It is a combination of (a) the biometric
measure, (b) the way the measure is presented, and (c) the technical characteristics of
the sensor. Both the repeatability and distinctiveness of the measurement are negatively
impacted by changes in any of these factors.

The engineering method and statistical thinking


An engineering approach to formulating and solving problems is applicable to the design of
biometric devices and systems.
Step 1: Develop a clear and concise description of the problem.
Step 2: Identify, at least tentatively, the important factors that affect this problem or that may play a role
in its solution.
Step 3: Propose a model for the problem, using knowledge of the biometric phenomenon being used in the
biometric system. State any limitations or assumptions of the model.
Step 4: Conduct appropriate experiments and collect data to test or validate the tentative model or con-
clusions made in Steps 2 and 3.
Step 5: Refine the model on the basis of observed data.
Step 6: Manipulate the model to assist in developing an algorithm, program, and hardware platform.

Step 7: Conduct an appropriate experiment to confirm that the proposed design solutions are both effective
and efficient with respect to given criteria.
Step 8: Draw conclusions or make recommendations based on design solutions.

The field of statistics deals with the collection, presentation, analysis, and use of
data to make decisions. Statistical techniques are used in all phases of biometric system
design, their comparison and testing, and improving existing designs. Statistical methods
are used to help us describe and understand variability. By variability, we mean that
any successive observation of a biometric system or biometric phenomenon does not
produce an identical result. Because the measurements exhibit variability, we say that
the measured parameter is a random variable. A convenient way to think of a random
variable, say X, which represents a measured quantity, is by using an appropriate model,
for example,
    Random variable X = Constant µ + Noise ε.
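The model above can be sketched in a few lines of Python. This is a minimal simulation under our own assumption of Gaussian noise; the numbers µ = 50 and σ = 2.5 echo those used later in Example 4:

```python
import random

# Sketch (our assumption, not from the text): simulate the model
# X = mu + noise for repeated biometric measurements with Gaussian noise.
MU = 50.0      # constant (true) value of the measured characteristic
SIGMA = 2.5    # spread of the measurement noise

random.seed(1)
samples = [MU + random.gauss(0.0, SIGMA) for _ in range(10)]

# Successive observations of the same quantity differ -- this is the
# variability that statistical methods are designed to handle.
print(min(samples), max(samples))
```

Running this repeatedly (with different seeds) shows that no two observation sets are identical, which is exactly why the measured parameter is treated as a random variable.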
In the engineering environment, the data is almost always a sample that has been
selected from some population. In biometric system design, data is collected in three
ways:

◮ A retrospective study based on historical data; the engineer uses either all or one
sample of the historical process data from some period of time; for example, biometric
data from databases, data from previous experimental studies, etc.
◮ An observation study; the engineer observes the process during a period of routine
operation; for example, facial expressions, signatures, etc.
◮ A designed experiment; the engineer makes deliberate or purposeful changes in
controllable variables called factors of the system, observes the system output,
and makes a decision or an inference about which variables are responsible for the
changes that he/she observes in the system output; for example, feature extraction
from biometric data using an appropriate algorithm.

Distinction between a designed experiment and an observational/retrospective study
An important distinction between a designed experiment and either an observational
or retrospective study is that in the former the different combinations of the factors of
interest are applied randomly to a set of experimental units. This allows cause-and-effect
relationships to be established, and that cannot be done with observational/retrospective
studies. A designed experiment is based on two statistical techniques: hypothesis testing
and confidence intervals.

Example 1: (Designed experiment.) Assume that a company introduces a new biometric
device. How should an experiment be designed to test its effectiveness? The basic method
would be to perform a comparison between the control devices and the new device.

Any comparison is based on a measurement. If the same thing is measured several
times, in an ideal world, the same result would be obtained each time. In practice, there
are differences. Each result is thrown off by chance error, and the error changes from
measurement to measurement. No matter how carefully a measurement was made, it
could have turned out a bit differently from the way it did.

Statistical hypothesis
Many problems in biometric system design require that we decide whether to accept or
reject a statement about some parameters. The statement is called a hypothesis and the
decision-making procedure about the hypothesis is called hypothesis testing.

Statistical hypothesis
A statistical hypothesis is an assertion or conjecture concerning one or more populations.
The truth or falsity of a statistical hypothesis is never known with absolute certainty unless we
examine the entire population. This is impractical. Instead, we take a random sample from
the population of interest and use the data contained in this sample to provide evidence that
either supports or does not support the hypothesis (leads to rejection of the hypothesis).
The decision procedure must be done with the awareness of the probability of the wrong
conclusion. The rejection of a hypothesis implies that the sample evidence
refutes it. In other words: The rejection means that there is a small probability
of obtaining the sample information observed when, in fact, the hypothesis
is true.

The structure of hypothesis testing is formulated using the term null hypothesis. This
refers to any hypothesis we wish to test and is denoted by H0 . The rejection of H0 leads
to the acceptance of an alternative hypothesis, denoted by H1 .

Null and alternative hypothesis

The alternative hypothesis H1 represents the question to be answered; its specification is
crucial. The null hypothesis H0 nullifies or opposes H1 and is often the logical complement
to H1. This results in one of the two following conclusions:
Reject H0 : In favor of H1 because of sufficient evidence in the data
Fail to reject H0 : because of insufficient evidence in the data

Example 2: (Null and alternative hypothesis.) Suppose that we are interested in deciding
whether or not the mean, µ, is equal to the value 50. Formally, this is expressed as the
null hypothesis H0 : µ = 50 and the alternative hypothesis H1 : µ ≠ 50. That is, the
conclusion is that we reject the hypothesis H0 in favor of hypothesis H1 if µ ≠ 50.

Because in Example 2 the alternative hypothesis specifies values of µ that could
be either greater or less than 50, it is called a two-sided alternative hypothesis. In
some situations, we may wish to formulate a one-sided alternative hypothesis:

Null hypothesis H0 : µ = 50
One-sided alternative hypothesis H1 : µ < 50 or
One-sided alternative hypothesis H1 : µ > 50

Testing a statistical hypothesis

Let the null hypothesis be that the mean is µ = a, and the alternative hypothesis be
that µ ≠ a. That is, we wish to test:

Null hypothesis H0 : µ = a
Two-sided alternative hypothesis H1 : µ ≠ a

Suppose that a data sample of size n is tested, and that the sample mean x is observed.
The sample mean is an estimate of the true population mean µ = a. A value of the
sample mean x that falls close to the hypothesized value of µ is evidence that the
true mean µ is really a; that is, such evidence supports the null hypothesis H0. On the
other hand, a sample mean x that is considerably different from a is evidence in support
of the alternative hypothesis H1. Thus, the sample mean represents the test statistic.

Example 3: (Critical region and values.) The sample mean x can take on many different
values. Suppose that if 48.5 ≤ x ≤ 51.5, we will not reject the null hypothesis
H0 : µ = 50. If either x < 48.5 or x > 51.5, we will reject the null hypothesis in favor of
the alternative hypothesis H1 : µ ≠ 50. The values of x that are less than 48.5 and
greater than 51.5 constitute the critical region for the test. The boundaries that
define the critical regions (48.5 and 51.5) are called critical values.
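The decision rule of Example 3 can be sketched as follows (the function name and return strings are ours):

```python
# Sketch of the decision rule in Example 3: reject H0: mu = 50 when the
# observed sample mean falls outside the critical values 48.5 and 51.5.
LOWER, UPPER = 48.5, 51.5   # critical values from Example 3

def decide(sample_mean: float) -> str:
    """Return the test decision for H0: mu = 50 vs H1: mu != 50."""
    if LOWER <= sample_mean <= UPPER:
        return "fail to reject H0"
    return "reject H0"        # sample mean lies in the critical region

print(decide(50.2))  # fail to reject H0
print(decide(47.9))  # reject H0
```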

Therefore, we reject H0 in favor of H1 if the test statistic falls in the critical region,
and fail to reject H0 otherwise. This decision procedure can lead to either of two
wrong conclusions:

Type I error or False reject rate (FRR): is defined as rejecting the null hypothesis
H0 when it is true. The probability of a type I error is also called the significance level
of the test. The probability of making a type I error is

α = P (Type I error) = P (Reject H0 when H0 is true)

Type II error or False accept rate (FAR): is defined as failing to reject the null hy-
pothesis when it is false. The probability of making a type II error is

β = P (Type II error) = P (Fail to reject H0 when H0 is false)

Properties of type I (FRR) and type II (FAR) errors

Property 1: Type I and type II errors are related. A decrease in the probability
of one generally results in an increase in the probability of the other.
Property 2: The size of the critical region, and, therefore, the probability of committing
a type I error, can always be reduced by adjusting the critical value(s).
Property 3: An increase in the sample size n will reduce α and β simultaneously.
Property 4: If H0 is false, β is maximum when the true value of a parameter approaches
the hypothesized value. The greater the distance between the true value and the
hypothesized value, the smaller β will be.
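Properties 1 and 2 can be illustrated numerically. The sketch below (our own helper functions, built on the standard normal CDF) uses the chapter's numbers, n = 10 and σ = 2.5, and an assumed true mean of 52 under H1; widening the critical region around 50 lowers α but raises β:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

SE = 2.5 / sqrt(10)   # standard error of the mean, sigma/sqrt(n), about 0.79

def alpha(c: float) -> float:
    """P(reject H0 | mu = 50) for the critical region |x - 50| > c."""
    return phi(-c / SE) + (1.0 - phi(c / SE))

def beta(c: float, true_mu: float) -> float:
    """P(fail to reject H0 | true mean = true_mu): P(50-c <= Xbar <= 50+c)."""
    return phi((50.0 + c - true_mu) / SE) - phi((50.0 - c - true_mu) / SE)

# Widening the acceptance region (c: 1.5 -> 2.0) lowers alpha but raises beta.
for c in (1.5, 2.0):
    print(round(alpha(c), 4), round(beta(c, 52.0), 4))
```

With c = 1.5 this reproduces (up to rounding of the standard error) the values α ≈ 0.0574 and β ≈ 0.2643 derived in Example 4 below.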

Recommendations for computing type I and II errors
Type I error. Generally, the designer controls the type I error probability α, called
a significance level, when the critical values (the boundaries that define the critical
region, see Example 3) are selected. Thus, it is usually easy for the designer to set the
type I error probability at (or near) any desired value. Because the designer can directly
control the probability of wrongly rejecting H0 , we always think of rejection of the null
hypothesis H0 as a strong conclusion.
Because we can control the probability of making a type I error, α, the problem is
what value should be used. The type I error probability is a measure of risk,
specifically, the risk of concluding that the null hypothesis is false when it really is not.
So, the value of α should be chosen to reflect the consequences (for the biometric data,
device, system, etc.) of incorrectly rejecting H0:
◮ Smaller values of α would reflect more serious consequences, and

◮ Larger values of α would be consistent with less severe consequences.

This is often hard to do, and what has evolved in much of biometric system design is
to use the value α = 0.05 in most situations, unless there is information available that
indicates that this is an inappropriate choice.
Type II error. The probability of a type II error, β, is not a constant. It depends on
both the true value of the parameter and the sample size that we have selected. Because
the type II error probability β is a function of both the sample size and the extent to which
the null hypothesis H0 is false, it is customary to think of the decision not to reject H0
as a weak conclusion, unless we know that β is acceptably small. Therefore, rather
than saying we “accept H0”, we prefer the terminology “fail to reject H0”.

Failing to reject H0 implies that we have not found sufficient evidence to reject H0 ,
that is, to make a strong statement. Failing to reject H0 does not necessarily mean
there is a high probability that H0 is true. It may simply mean that more data
are required to reach a strong conclusion. This can have important implications
for the formulation of hypotheses.

The power of a statistical test is the probability of rejecting the null hypothesis
H0 when the alternative hypothesis is true. The power is computed as

Power of a statistical test = 1 − β

The power of a statistical test can be interpreted as the probability of correctly
rejecting a false null hypothesis. The power is a very descriptive and concise
measure of the sensitivity of a statistical test, where by sensitivity we mean the ability
of the test to detect differences.

Example 4: (Type I and II errors.) The techniques for computing
type I and II errors for a given data sample are shown in Fig. 1.

Estimating the mean


Even the most efficient estimator is unlikely to estimate a population parameter θ exactly.
It is true that our accuracy increases with large samples, but there is still no reason
why we should expect a point estimate from a given sample to be exactly equal to
the population parameter it is supposed to estimate. It is preferable to determine an
interval within which we would expect to find the value of the parameter. Such an
interval is called an interval estimate.

Design example: Computing type I and II errors

Problem formulation:
Let face features such as the regions of the lips, mouth, nose, ears,
eyes, eyebrows, and other facial measurements be detected. Let the biometric
data corresponding to the lip topology be represented by a sample of
size n = 10, where the mean and the standard deviation are µ = 50
and σ = 2.5, respectively. This biometric data has a distribution
for which the conditions of the central limit theorem apply, so the
distribution of the sample mean is approximately normal with mean
µ = 50 and standard deviation σ/√n = 2.5/√10 = 0.79. Find the probability of type I
error.

Step 1: The probability of type I error

The probability of making a type I error (or significance level of our test)

    α = P (Type I error) = P (Reject H0 when H0 is true)

is equal to the sum of the areas that have been shaded in the tails of the normal distribution. We
may find this probability as

    Probability of type I error, α = P (X < 48.5 when µ = 50) + P (X > 51.5 when µ = 50)

where the first term is the left tail (critical value x1 = 48.5) and the second term is the
right tail (critical value x2 = 51.5).

The z-values that correspond to the critical values 48.5 and 51.5 are calculated as follows:

    z1 = (x1 − µ)/(σ/√n) = (48.5 − 50)/0.79 = −1.90  and  z2 = (x2 − µ)/(σ/√n) = (51.5 − 50)/0.79 = 1.90

Therefore α = P (Z < −1.90) + P (Z > 1.90) = P (Z < −1.90) + (1 − P (Z < 1.90)) =
0.0287 + (1 − 0.9713) = 0.0574
Conclusion: This implies that 5.74% of all random samples would lead to rejection of the
hypothesis H0 : µ = 50 when the true mean is really 50.
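Step 1 can be reproduced in code. A minimal sketch using the standard normal CDF; the z-values are rounded to two decimals, as in the text, so the result matches 0.0574:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Step 1 of Example 4: n = 10, sigma = 2.5, hypothesized mean 50,
# critical values 48.5 and 51.5.
se = 2.5 / sqrt(10)                  # standard error, about 0.79
z1 = round((48.5 - 50.0) / se, 2)    # -1.90, as in the text
z2 = round((51.5 - 50.0) / se, 2)    # +1.90
alpha = phi(z1) + (1.0 - phi(z2))    # left tail + right tail
print(round(alpha, 4))               # 0.0574, matching Step 1
```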

Step 2: Reducing a type I error by decreasing the critical region

From inspection of the critical region for H0 : µ = 50 versus H1 : µ ≠ 50 and n = 10,
note that we can reduce α by pushing the critical regions further into the tails of the
distribution. For example, if we make the critical values 48 and 52, the value of α is

    α = P (Z < (48 − 50)/0.79) + P (Z > (52 − 50)/0.79)
      = P (Z < −2.53) + P (Z > 2.53) = 0.0057 + 0.0057 = 0.0114

Fig. 1: Techniques for computing type I and II errors (Example 4).



Design example: Computing type I and II errors (Continuation)

Step 3: Reducing type I error by increasing the sample size

We could also reduce α by increasing the sample size, assuming that the critical values of 48.5
and 51.5 do not change. If n = 16, then σ/√n = 2.5/√16 = 0.625, and using the original
critical region, we find

    z1 = (48.5 − 50)/0.625 = −2.40  and  z2 = (51.5 − 50)/0.625 = 2.40

Therefore α = P (Z < −2.40) + P (Z > 2.40) = 0.0082 + 0.0082 = 0.0164


Step 4: Design decision on type I error
An acceptable type I error can be chosen from the following possibilities:
    Type I error from the original critical region Z < −1.90, Z > 1.90: α = 0.0574
    Type I error reduced by decreasing the critical region from Z < −1.90, Z > 1.90 to Z < −2.53, Z > 2.53: α = 0.0114
    Type I error reduced by increasing the sample size from n = 10 to n = 16: α = 0.0164

Step 5: Specification of the probability of type II error

The probability of making a type II error is

    β = P (Type II error) = P (Fail to reject H0 when H0 is false)

To calculate β, we must have a specific alternative hypothesis; that is, we must have a particular
value of µ. For example, suppose we want to reject the null hypothesis H0 : µ = 50 whenever
the mean µ is greater than 52 or less than 48. We could calculate the probability of type II error
β for the values µ = 52 and µ = 48, and use this result to tell us something about how the
test procedure would perform. Because of the symmetry of the normal distribution function, it is
only necessary to evaluate one of the two cases, say, to find the probability of not rejecting the null
hypothesis H0 : µ = 50 when the true mean is µ = 52.

◮ The normal distribution on the left (see the figure) is the distribution of the test statistic X
when the null hypothesis H0 : µ = 50 is true (this is what is meant by the expression
“under H0 : µ = 50”).
◮ The normal distribution on the right is the distribution of the test statistic X when the
alternative hypothesis is true and the value of the mean is 52 (or “under H1 : µ = 52”).

Fig. 2: Techniques for computing type I and II errors (continuation of Example 4).



Design example: Computing type I and II errors (Continuation)

Step 5: (continuation)
Now the type II error will be committed if the sample mean x falls between 48.5 and 51.5 (the
critical region boundaries) when µ = 52. This is the probability that 48.5 ≤ X ≤ 51.5 when the
true mean is µ = 52, or the shaded area under the normal distribution on the right, that is

β = P (Type II error) = P (48.5 ≤ X ≤ 51.5 when µ = 52)

Step 6: Computing of the probability of type II error

The z-values corresponding to 48.5 and 51.5 when µ = 52 are

    z1 = (48.5 − 52)/0.79 = −4.43  and  z2 = (51.5 − 52)/0.79 = −0.63
Therefore,

    Probability of type II error, β = P (−4.43 ≤ Z ≤ −0.63)
                                    = P (Z ≤ −0.63) − P (Z ≤ −4.43)
                                    = 0.2643 − 0.0000 = 0.2643

Conclusion: If we are testing H0 : µ = 50 against H1 : µ ≠ 50 with n = 10, and the true
value of the mean is µ = 52, the probability that we will fail to reject the false null hypothesis is
0.2643. By symmetry (see the graphical representation in Fig. 2), if the true value of the mean
is µ = 48, the value of β will also be 0.2643.
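Steps 5 and 6 can be checked the same way in code. A sketch with the chapter's numbers (n = 10, σ = 2.5, hypothesized mean 50, true mean 52):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Probability of failing to reject H0: mu = 50 when the true mean is 52,
# i.e. P(48.5 <= Xbar <= 51.5 | mu = 52).
se = 2.5 / sqrt(10)
true_mu = 52.0
z1 = round((48.5 - true_mu) / se, 2)   # -4.43, as in the text
z2 = round((51.5 - true_mu) / se, 2)   # -0.63
beta = phi(z2) - phi(z1)
print(round(beta, 4))                  # 0.2643, matching Step 6
```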

Step 7: Analysis of a type II error

The probability of making a type II error β increases rapidly as the true value µ approaches the
hypothesized value.
For example, consider the case when the true value of the mean is µ = 50.5 and the hypothesized
value is H0 : µ = 50. The true value of µ is very close to 50, and the probability of
type II error is β = P (48.5 ≤ X ≤ 51.5) when µ = 50.5.
The z-values corresponding to 48.5 and 51.5 when µ = 50.5 are

    z1 = (48.5 − 50.5)/0.79 = −2.53  and  z2 = (51.5 − 50.5)/0.79 = 1.27

Therefore β = P (−2.53 ≤ Z ≤ 1.27) = P (Z ≤ 1.27) − P (Z ≤ −2.53) = 0.8980 − 0.0057 =
0.8923. This is higher than in the case µ = 52; that is, we are more likely to accept the false
hypothesis µ = 50 (fail to reject H0 : µ = 50).

Fig. 3: Techniques for computing type I and type II errors (continuation of Example 4).

Design example: Computing type I and II errors (Continuation)

Step 7: (Continuation)
Conclusion: The type II error probability is much higher for the case in which the true mean is 50.5
than for the case in which the mean is 52. Of course, in many practical situations, we would not be as
concerned with making a type II error if the mean were “close” to the hypothesized value. We would be
much more interested in identifying the large differences between the true mean and the value specified
in the null hypothesis.

Step 8: Reducing a type II error by increasing the sample size

The type II error probability also depends on the sample size n. Suppose that the null hypothesis
is H0 : µ = 50 and that the true value of the mean is µ = 52. By letting the sample size increase
from n = 10 to n = 16, we can compare the two cases graphically. The normal distribution on
the left is the distribution of X when µ = 50 (“under H0 : µ = 50”), and the normal distribution
on the right is the distribution of X when µ = 52 (“under H1 : µ = 52”). As shown in the figure,
the type II error probability is

    Probability of type II error, β = P (48.5 ≤ X ≤ 51.5) when µ = 52

When n = 16, the standard deviation of X is σ/√n = 2.5/√16 = 0.625, and the z-values corresponding
to 48.5 and 51.5 when µ = 52 are

    z1 = (48.5 − 52)/0.625 = −5.60  and  z2 = (51.5 − 52)/0.625 = −0.80

Therefore

    Probability of type II error, β = P (−5.60 ≤ Z ≤ −0.80)
                                    = P (Z ≤ −0.80) − P (Z ≤ −5.60)
                                    = 0.2119 − 0.0000 = 0.2119

This β = 0.2119 is smaller than β = 0.2643, so we decrease the probability of accepting the false
hypothesis H0 by increasing the sample size.

Step 9: Design decision on type II error

An acceptable type II error can be chosen from the following possibilities:
    The type II error for the original sample size n = 10 and −4.43 ≤ Z ≤ −0.63 is 0.2643
    The type II error, reduced by increasing the sample size from n = 10 to n = 16, is 0.2119

Fig. 4: Techniques for computing type I and type II errors (continuation of Example 4).

Design example: Computing type I and II errors (Continuation)

Step 10: Computing the power of a test

Suppose that the true value of the mean is µ = 52. When n = 10, we found that β = 0.2643, so the
power of this test is

    Power of the test = 1 − β = 1 − 0.2643 = 0.7357

Conclusion: The sensitivity of the test for detecting the difference between a mean of 50 and a
mean of 52 is 0.7357. That is, if the true mean is really 52, this test will correctly reject H0 : µ = 50 and
“detect” this difference 73.57% of the time. If this value of power is judged to be too low, the designer
can increase either α or the sample size n.

Fig. 5: Techniques for computing type I and type II errors (continuation of Example 4).
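Step 10 reduces to a quick computation. The sketch below uses the β values found in Steps 6 and 8 (0.2643 for n = 10 and 0.2119 for n = 16):

```python
# Power of the test against the alternative mu = 52:
# power = 1 - beta, where beta is the type II error probability.
for n, beta in ((10, 0.2643), (16, 0.2119)):
    power = 1.0 - beta
    print(n, round(power, 4))   # 10 -> 0.7357, 16 -> 0.7881
```

The larger sample not only lowers α (Step 3) and β (Step 8) but, equivalently, raises the power of the test.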

Let 0 < α < 1; then the interval a < θ < b, computed from the selected sample, is
called a 100(1 − α)% confidence interval, the fraction 1 − α is called the degree of
confidence, and the endpoints a and b are called the lower and upper confidence
limits.
If x is the mean of a random sample of size n from a population with known variance
σ², a 100(1 − α)% confidence interval for µ is given by

    x − z_{α/2} (σ/√n) < µ < x + z_{α/2} (σ/√n)     (1)

where z_{α/2} is the z-value leaving an area of α/2 to the right.


Practice recommendation. In experiments, σ is often unknown, and normality
cannot always be assumed. If n ≥ 30, s can replace σ, and the confidence interval

    Confidence interval = x ± z_{α/2} (s/√n)

may be used. This is often referred to as a large-sample confidence interval. The justification
lies only in the presumption that with a sample as large as 30 and the population
distribution not too skewed, s (the standard deviation of the sample) will be very close
to the true σ and, thus, the central limit theorem prevails. It should be emphasized that
this is only an approximation, and the quality of the approximation becomes better as the
sample size grows larger.
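The interval of Eq. (1) can be sketched as a small helper (the function name is ours). With the hand-geometry numbers used later in Example 5 (n = 36, x = 2.6, σ = 0.3), it reproduces the intervals computed in Fig. 6:

```python
from math import sqrt

# Sketch of the 100(1 - alpha)% confidence interval of Eq. (1)
# for a known population standard deviation sigma.
def confidence_interval(x_bar, sigma, n, z_half_alpha):
    margin = z_half_alpha * sigma / sqrt(n)   # z_{alpha/2} * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

print(confidence_interval(2.6, 0.3, 36, 1.96))    # 95% interval, about (2.50, 2.70)
print(confidence_interval(2.6, 0.3, 36, 2.575))   # 99% interval, about (2.47, 2.73)
```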

The 100(1 − α)% confidence interval provides an estimate of the accuracy of our point
estimate. If µ is actually the center value of the interval, then x estimates µ without
error. However, in most cases, x will not be exactly equal to µ and the point estimate is
in error.

Theorem 1: If x is used as an estimate of µ, we can then be 100(1 − α)% confident that
the error will not exceed the value

    Error = z_{α/2} × σ/√n     (2)

Example 5: (Errors of the confidence intervals.) Hand geometry
is defined by the surface area of the hand or fingers and the corresponding
measures (length, width, and thickness). The average
distance between two points on a hand in 36 different measurements
is found to be 2.6 mm. Calculate: (a) the 95% and 99%
confidence intervals for the mean distance between these hand points and
(b) the accuracy of the point estimate using Theorem 1. Assume
that the population standard deviation is σ = 0.3. The solution
is given in Fig. 6.

Often in experimental studies, we are interested in how large a sample of biometric


data is necessary to ensure that the error in estimating µ will be less than a specified
amount e.

Theorem 2: If x is used as an estimate of µ, we can be 100(1 − α)% confident that the
error will not exceed a specified amount e when the sample size is

    n = (z_{α/2} × σ / e)²     (3)

Theorem 2 is applicable only if we know the variance of the population from which we
are to select our sample. Lacking this information, we could take a preliminary sample of
size n ≥ 30 to provide an estimate of σ. Then, using this estimate as an approximation for
σ in Theorem 2, we could determine approximately how many observations are needed
to provide the desired degree of accuracy.

Design example: Errors of the confidence intervals

Problem formulation:
Let the hand geometry measurement result in a sample of size n = 36, with the
sample mean x = 2.6 and the population standard deviation σ = 0.3.
Calculate:
(a) the 95% and 99% confidence intervals for the mean distance between these hand
points;
(b) the accuracy of a point estimate using Theorem 1;
(c) the sample size, if we want to be 95% confident that the error of our estimate of µ
does not exceed 0.05 (Theorem 2).

Step 1: If x is the mean of a random sample of size n from a population with known variance
σ², a 100(1 − α)% confidence interval for µ is given by

    x − z_{α/2} (σ/√n) < µ < x + z_{α/2} (σ/√n)

where z_{α/2} is the z-value leaving an area of α/2 to the right.
Step 2: For α = 0.05, n = 36, x = 2.6, and σ = 0.3, the 95% confidence interval is

    2.6 − (1.96)(0.3/√36) < µ < 2.6 + (1.96)(0.3/√36), that is, 2.50 < µ < 2.70

Note that the z-value leaving an area of 0.025 to the right, and, therefore, an area of 0.975 to the
left, is z_{0.05/2} = z_{0.025} = 1.96 (see the table).
Step 3: For α = 0.01, n = 36, x = 2.6, and σ = 0.3, the 99% confidence interval is

    2.6 − (2.575)(0.3/√36) < µ < 2.6 + (2.575)(0.3/√36), that is, 2.47 < µ < 2.73

Note that the z-value leaving an area of 0.005 to the right, and, therefore, an area of 0.995 to the
left, is z_{0.01/2} = z_{0.005} = 2.575 (see the table).
Observation: A longer interval is required to estimate µ with a higher degree of confidence.
Decision: Based on Theorem 1, we are 95% confident that the sample mean x = 2.6 differs from
the true mean µ by an amount that is less than

    z_{α/2} × σ/√n = (1.96)(0.3/√36) = 0.098

By analogy, we are 99% confident that the sample mean x = 2.6 differs from the true mean µ by
an amount that is less than

    z_{α/2} × σ/√n = (2.575)(0.3/√36) = 0.13

Fig. 6: The error of estimating the mean (Example 5).
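The interval arithmetic of Steps 2 and 3 is easy to check numerically; the sketch below is a minimal Python illustration, with the critical z-values hard-coded from the standard normal table:

```python
import math

def confidence_interval(xbar, sigma, n, z):
    """Two-sided CI for the mean with known population sigma: xbar +/- z*sigma/sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hand-geometry sample from the design example: n = 36, xbar = 2.6, sigma = 0.3
lo95, hi95 = confidence_interval(2.6, 0.3, 36, 1.96)    # z_{0.025} = 1.96
lo99, hi99 = confidence_interval(2.6, 0.3, 36, 2.575)   # z_{0.005} = 2.575
print(round(lo95, 2), round(hi95, 2))  # 2.5 2.7
print(round(lo99, 2), round(hi99, 2))  # 2.47 2.73
```

As expected, the 99% interval is wider than the 95% interval.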



Example 6: (Sample size.) (Continuation of Example 5.) How
large a sample is required if we want to be 95% confident that
our estimate of µ is off by less than 0.05? Using Theorem 2,

    n = ( z_{α/2} × σ / e )² = ( 1.96 × 0.3 / 0.05 )² = 138.3 ≈ 139

Therefore, we can be 95% confident that a random sample of size
139 will provide an estimate x that differs from µ by an amount of
less than 0.05.
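Equation (3), with rounding up to the next whole observation, can be sketched as:

```python
import math

def sample_size(z, sigma, e):
    """Smallest n such that the estimation error exceeds e with probability at most alpha (Eq. 3)."""
    return math.ceil((z * sigma / e) ** 2)

# Example 6: 95% confidence (z_{0.025} = 1.96), sigma = 0.3, e = 0.05
print(sample_size(1.96, 0.3, 0.05))  # 139
```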

2 Biometric system performance evaluation


Fig. 7 contains the basic definitions and terminology used in the design and testing of
biometric systems. In this context, terms such as a sample of biometric data, user
template, matching score, decision making, decision rule, and decision error rates
carry application-specific meanings.

2.1 Matching score


A broad category of variables impacts the way in which a user's inherent biometric
characteristics are presented to the sensor. In many cases, the distinction between
changes in the underlying biometric characteristics and these presentation effects may
not be clear.
Two samples of the same biometric characteristic from the same person are not iden-
tical, due to imperfect imaging conditions, changes in the user's physiological or behavioral
characteristics, ambient conditions, and the user's interaction with the sensor. Therefore,
the response of a biometric matching system is the matching score

    Response = S(X_Q, X_I)

which quantifies the similarity between the input X_Q and the template X_I representations.
This similarity can be encoded by a single number.

Basic definitions and terminology


Sample: A biometric measure submitted by the user.
Template: A user’s reference measure based on features extracted from
the enrolment samples.
Matching score: A measure of the similarity between features derived
from a presented sample and a stored template. A match/nonmatch
decision may be made according to whether this score exceeds a
decision threshold.
System decision: A determination of the probable validity of a user's
claim to identity/non-identity in the system.
Transaction: An attempt by a user to validate a claim of identity or
non-identity by consecutively submitting one or more samples, as
allowed by the system’s decision policy.
Verification: The user makes a positive claim to an identity, requiring
a one-to-one comparison of the submitted sample to the enrolled
template for the claimed identity.
Identification: The user makes either no claim or an implicit negative
claim to an enrolled identity, and a one-to-many search of the
entire enrolled database is required.
Positive claim of identity: The user claims to be enrolled in or known to
the system. An explicit claim might be accompanied by a claimed
identity in the form of a name, or personal identification number
(PIN). Common access control systems are an example.
Negative claim of identity: The user claims not to be known to or enrolled
in the system. For example, enrolment in social service systems
open only to those not already enrolled.
Genuine claim of identity: A user making a truthful positive claim
about identity in the system. The user truthfully claims to be
him/herself, leading to a comparison of a sample with a truly
matching template.
Impostor claim of identity: A user making a false positive claim about
identity in the system. The user falsely claims to be someone
else, leading to the comparison of a sample with a non-matching
template.

Fig. 7: Basic definitions and terminology that are used in biometric system design.

Example 7: (Response.) Similarity between the input X_Q, given
its number 11101, and the template X_I, given its number 10011,
encoded by YES (1) or NO (0), can be represented by the follow-
ing binary number:

    0 11101 10011

where the leading bit (0 = NO) encodes the decision, 11101 is the
number of X_Q, and 10011 is the number of X_I.

2.2 Decision rule


If the stored biometric template of a user I is represented by X_I and the acquired input
for recognition is represented by X_Q, then the null hypothesis H0 and the alternative
hypothesis H1 are:
Null hypothesis H0: Input X_Q does not come from the same person as the template
X_I; the associated decision is: "Person I is not who he/she claims to be."
Alternative hypothesis H1: Input X_Q comes from the same person as the template
X_I; the associated decision is: "Person I is who he/she claims to be."
That is, we wish to test
Null hypothesis H0: D = D0
Alternative hypothesis H1: D ≠ D0
The decision rule is as follows: if the matching score S(X_Q, X_I) is less than the system
threshold t, then decide H0; otherwise, decide H1.
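The decision rule reduces to a one-line threshold test; the sketch below is a minimal Python illustration (the scores and the threshold value are hypothetical, on a 0–1 similarity scale):

```python
def decide(matching_score: float, threshold: float) -> str:
    """Threshold decision rule: score below t -> H0 (reject the claim), else H1 (accept it)."""
    if matching_score < threshold:
        return "H0"  # "Person I is not who he/she claims to be"
    return "H1"      # "Person I is who he/she claims to be"

# Hypothetical similarity scores against a threshold t = 0.5
print(decide(0.82, 0.5))  # H1
print(decide(0.31, 0.5))  # H0
```

Note that a score exactly equal to the threshold decides H1, matching Decision 1 below ("higher than or equal to t").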

Controlled decision making in a biometric system

The higher the score, the more certain the system is that the two biometric
measurements come from the same person. The system decision is regulated
by the threshold t:

Decision 1: Pairs of biometric samples generating scores higher than or
equal to t are inferred as mate pairs, that is, the pairs belong to
the same person.

Decision 2: Pairs of biometric samples generating scores lower than t are
inferred as nonmate pairs, that is, the pairs belong to different
persons.

(The accompanying figure plots probability against matching score, with the
threshold t separating nonmate pairs (different persons) from mate pairs
(the same person).)

2.3 Decision error rates


Decision errors are due to matching errors and image-acquisition errors. These errors are
summed up and drive the decision process at various levels of the system, in particular,
in situations where (a) one-to-one or one-to-many matching is required; (b) there is a
positive or negative claim of identity; and (c) the system allows multiple attempts (the
decision policy). Biometric performance has traditionally been stated in terms of the
decision error rates.

2.4 FRR computing


The FRR (type I error) is defined as the probability that the user making a true claim
about his/her identity will be rejected as him/herself. That is, the FRR is the expected
proportion of transactions with truthful claims of identity (in a positive ID system) or
non-identity (in a negative ID system) that are incorrectly denied. A transaction may
consist of one or more truthful attempts, depending upon the decision policy. Note that
rejection always refers to the claim of the user.

Example 8: (False reject.) If person A1 types his/her correct user


ID into the biometric login for the given terminal, this means
that A1 has just made a true claim that he/she is A1 . Person
A1 presents his/her biometric measurement for verification. If
the biometric system does not match the template of A1 to the
A1 ’s presented measurement, then there is a false reject. This
could happen because the matching threshold is too low, or the
biometric features presented by a person A1 are not close enough
to the biometric template.

Suppose a person A1 was denied authentication as A1 (unsuccessfully authenticated)
n times, while the total number of attempts was N; then FRR = n/N. Statistically,
the more times something is done, the greater the confidence in the result. The result
is the mean (average) FRR for K users of the system:

    FRR = (1/K) Σ_{i=1}^{K} FRR_i

FRR and matching algorithm


The FRR reflects the robustness of the matching algorithm: the more accurate the
matching algorithm, the less likely a false rejection will happen.
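The per-user rates and their average over K users can be sketched in Python (the rejection and attempt counts below are hypothetical):

```python
def mean_frr(rejections, attempts):
    """Average FRR over K users: user i contributes FRR_i = n_i / N_i."""
    per_user = [n / N for n, N in zip(rejections, attempts)]
    return sum(per_user) / len(per_user)

# Hypothetical counts: user 1 falsely rejected in 2 of 100 genuine attempts,
# user 2 in 5 of 100, user 3 in 3 of 100
print(round(mean_frr([2, 5, 3], [100, 100, 100]), 4))  # 0.0333
```

The FAR of Section 2.5 averages over users in exactly the same way, with n counting falsely accepted impostor attempts instead.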

2.5 FAR computing


The FAR (type II error) is defined as the probability that a user making a false claim
about his/her identity will be verified as that false identity. That is, FAR is the expected
proportion of transactions with wrongful claims of identity (in a positive ID system) or
non-identity (in a negative ID system) that are incorrectly confirmed. A transaction may
consist of one or more wrongful attempts, depending upon the decision policy. Note that
acceptance always refers to the claim of the user³.

Example 9: (False accept.) If a person A1 types the user ID of


another person A2 into the biometric login for the given terminal,
this means that A1 has just made a false claim that he or she
is A2 . The person A1 presents his biometric measurement for
verification. If the biometric system matches A1 to A2 , then
there is a false acceptance. This could happen because the
matching threshold is set too high, or it could be that biometric
features of A1 and A2 are very similar.

Suppose the person A1 was successfully authenticated as A2 n times out of a total
number of attempts N; then FAR = n/N. The FAR is the mean (average) for K users
of a system:

    FAR = (1/K) Σ_{i=1}^{K} FAR_i

FAR and matching algorithm


The FAR characterizes the strength of the matching algorithm. The stronger the algorithm,
the less likely that a false authentication will happen.

2.6 Matching errors


Matching-algorithm errors, occurring while performing a single comparison of a submitted
sample against a single enrolled template/model, are defined so as to avoid ambiguity
within a system that allows multiple attempts or has multiple templates.

³ It should be noted that conflicting definitions are implicit in the literature. In the access control
literature, a false acceptance is said to have occurred when a submitted sample is incorrectly matched
to a template enrolled by another user.

False match rate (FMR) is the expected probability that a sample will be falsely
declared to match a single randomly-selected non-self template; that is, measure-
ments from two different persons are interpreted as if they were from the same
person.
False non-match rate (FNMR) is the expected probability that a sample will be
falsely declared not to match a template of the same measure from the same user
supplying the sample; that is, measurements from the same person are treated as
if they were from two different persons.
Equal error rate (EER) is the value defined as EER=FMR=FNMR, that is, the
point where false match and false non-match curves cross is called equal error rate
or crossover rate. The EER provides an indicator of the system's performance: a
lower EER indicates a system with a good level of sensitivity and performance.

The difference between false match/non-match rates and false accept/reject rates is
illustrated in Fig. 8.

Example 10: (FMR and FNMR.) Let us assume that a certain


commercial biometric verification system wishes to operate at
0.001% FMR. At this setting, several biometric systems, such as
the state-of-the-art fingerprint and iris recognition systems, can
deliver less than 1% FNMR. An FMR of 0.001% indicates that,
if a hacker launches a brute-force attack with a large number of
different fingerprints, 1 out of 100,000 attempts will succeed on
average.

To attack a biometric-based system, one needs to generate (or acquire) a large number
of samples of that biometric, which is much more difficult than generating a large number
of PINs/passwords. The FMR of a biometric system can be arbitrarily reduced for higher
security at the cost of increased inconvenience to the users that results from a higher
FNMR. Note that a longer PIN or password also increases the security while causing
more inconvenience in remembering and correctly typing them.

Difference between false match/non-match rates and
false accept/reject rates

False match rate (FMR) and false non-match rate (FNMR) are not generally synonymous
with false accept rate (FAR) and false reject rate (FRR), respectively:

◮ False match/non-match rates are calculated over the number of comparisons
performed by the verification system.

◮ False accept/reject rates are calculated over transactions and refer to the acceptance
or rejection of the stated hypothesis, whether positive or negative.

Fig. 8: Difference between false match/non-match rates and false accept/reject rates.

Example 11: (FMR and FNMR.) Consider that airport authori-
ties are looking for 100 criminals.
(a) Consider a verification system. A state-of-the-art finger-
print verification system operates at 1% FNMR and 0.001%
FMR; that is, this system would fail to match the correct users
1% of the time and erroneously verify wrong users 0.001% of the
time.
(b) Consider an identification system. Against the watch list of
100 criminals, the FNMR is still 1%, while the identification FMR
grows to about 0.1% (approximately 100 × 0.001%). That is, while
the system has a 99% chance of catching a criminal, it will produce
a large number of false alarms. For example, if 10,000 people use
an airport in a day, the system will produce 10 false alarms.

In fact, the tradeoff between the FMR and FNMR rates in a biometric system is no
different from that in any detection system, including the metal detectors already in use
at all the airports. Other negative recognition applications such as background checks
and forensic criminal identification are also expected to operate in semi-automatic mode
and their use follows a similar cost-benefit analysis.

2.7 FTE computing


The FTE (failure-to-enroll rate) is defined as the probability that a user attempting to
enroll biometrically will be unable to do so. The FTE is usually defined on a minimum
of three attempts. The FTE can be calculated as follows. An unsuccessful enrollment
event occurs if a person A1, on his/her third attempt, is still unsuccessful. Let n be the
number of unsuccessful enrollment events, and N be the total number of enrollment
events. Then FTE = n/N. The mean (average) FTE for K users of a system is

    FTE = (1/K) Σ_{i=1}^{K} FTE_i

The EER (equal error rate) is defined as the crossover point on a graph on which both
the FAR and FRR curves are plotted.

Genuine and impostor distributions

The distribution of scores generated from pairs of samples taken from the
same person is called the genuine distribution. The distribution of scores
generated when the samples are taken from different persons is called the
impostor distribution.

The FMR and FNMR for a given threshold t are displayed over the genuine
and impostor score distributions: the FMR is the percentage of nonmate
pairs whose matching scores are greater than or equal to t, and the FNMR
is the percentage of mate pairs whose matching scores are less than t.

(The accompanying figure plots the two score distributions against the
matching score, with the threshold t marking the FMR and FNMR regions.)
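Given samples from the two score distributions, the FMR and FNMR at a threshold t follow directly from these definitions; a minimal Python sketch with hypothetical scores:

```python
def fmr_fnmr(genuine_scores, impostor_scores, t):
    """FMR: fraction of nonmate (impostor) scores >= t; FNMR: fraction of mate (genuine) scores < t."""
    fmr = sum(s >= t for s in impostor_scores) / len(impostor_scores)
    fnmr = sum(s < t for s in genuine_scores) / len(genuine_scores)
    return fmr, fnmr

genuine = [0.8, 0.9, 0.7, 0.6, 0.95]   # hypothetical mate-pair scores
impostor = [0.2, 0.4, 0.55, 0.3, 0.1]  # hypothetical nonmate-pair scores
print(fmr_fnmr(genuine, impostor, t=0.5))  # (0.2, 0.0)
```

Raising t lowers the FMR and raises the FNMR, which is exactly the trade-off discussed next.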

The FMR (FAR) and FNMR (FRR) are related and must be balanced (Figure 9).
For example, in access control, perfect security would require denying access to everyone.
Conversely, granting access to everyone would mean no security. Obviously, neither
extreme is reasonable, and a biometric system must operate somewhere between the two.

False match rate (FMR) or false accept rate (FAR)

◮ An FMR (FAR) occurs when a system incorrectly matches an identity;
the FMR (FAR) is the probability of individuals being wrongly matched.
◮ False matches may occur because there is a high degree of similarity
between two individuals' characteristics.
◮ In a verification or positive identification system, unauthorized people
can be granted access to facilities or resources as a result of an
incorrect match.
◮ In a negative identification system, the result of a false match may
be to deny access.

False non-match rate (FNMR) or false reject rate (FRR)

◮ An FNMR (FRR) occurs when a system incorrectly rejects a valid identity;
the FNMR (FRR) is the probability of valid individuals being wrongly
not matched.
◮ False non-matches occur because there is not a sufficiently strong
similarity between an individual's enrollment and trial templates,
which could be caused by any number of conditions. For example, an
individual's biometric data may have changed as a result of aging or
injury.
◮ In a verification or positive identification system, people can be
denied access to some facility or resource as a result of a system's
failure to make a correct match.
◮ In a negative identification system, the result of a false non-match
may be that a person is granted access to resources to which he/she
should be denied.

Balance of FMR (FAR) and FNMR (FRR)

FMR (FAR) and FNMR (FRR) are related and must, therefore, always be
assessed in tandem, and acceptable risk levels must be balanced with the
disadvantages of inconvenience.

Fig. 9: Relations of the FMR (FAR) and FNMR (FRR).



3 Receiver operating characteristic (ROC) curves


The standard method for expressing the technical performance of a biometric device for a
specific population in a specific application is the Receiver Operating Characteristic
(ROC) curve.

3.1 Applications of biometric systems in terms of the ROC


The system performance at all operating points (thresholds) can be depicted in the
form given in Fig. 10.
Fig. 10: Typical operating points of different biometric applications (forensic, civilian,
and high-security) displayed on the ROC curve, which plots the false accept rate
(FAR) against the false reject rate (FRR).

An ROC curve plots, parametrically as a function of the decision threshold t = T,
the rate of "false positives" (i.e., impostor attempts accepted) on the X-axis against
the corresponding rate of "true positives" (i.e., genuine attempts accepted) on the
Y-axis.

3.2 Equal error rate (EER) in terms of ROC


Graphical interpretation of the EER is given in Figure 11. The FMR, FNMR, and EER
behavior is expressed in terms of a ROC. The FMR and FNMR can be considered as the
functions of the threshold t = T . These functions give the error rates when the match
decision is made at some threshold T .

3.3 Comparing the performance of biometric systems


ROC curves make it possible to compare the performance of different systems under
similar conditions, or of a single system under differing conditions.

◮ When the threshold T is set low, the FMR is high and the FNMR is low;
when T is set high, the FMR is low and the FNMR is high.
◮ For a given matcher, the operating point (a point on the ROC) is often
given by specifying the threshold T.
◮ In biometric system design, when specifying an application or a
performance target, or when comparing two matchers, the operating
point is specified by choosing the FMR or the FNMR.
◮ The equal-error operating point is defined as the EER. A matcher can
operate with highly unequal FMR and FNMR; in this case, the EER is an
unreliable summary of system accuracy.

Fig. 11: The relationship between FRR, FAR, and EER.
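The parametric construction of the ROC, and a simple approximation of the EER as the operating point where FMR and FNMR are closest, can be sketched as follows (the score samples are hypothetical):

```python
def roc_points(genuine, impostor, thresholds):
    """Sweep the decision threshold T; at each T compute the (FMR, FNMR) operating point."""
    points = []
    for t in thresholds:
        fmr = sum(s >= t for s in impostor) / len(impostor)   # nonmate scores >= T
        fnmr = sum(s < t for s in genuine) / len(genuine)     # mate scores < T
        points.append((t, fmr, fnmr))
    return points

def approx_eer(points):
    """The operating point where |FMR - FNMR| is smallest approximates the EER."""
    _, fmr, fnmr = min(points, key=lambda p: abs(p[1] - p[2]))
    return (fmr + fnmr) / 2

# Hypothetical score samples
genuine = [0.6, 0.7, 0.8, 0.9]
impostor = [0.1, 0.2, 0.3, 0.65]
points = roc_points(genuine, impostor, [i / 100 for i in range(101)])
print(approx_eer(points))  # 0.25
```

With real data the genuine and impostor samples overlap much less cleanly, and the EER is read off a finely sampled threshold sweep in the same way.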

Example 12: (Comparing two matchers.) Various approaches
can be used in matcher design. The matchers must be compared
using criteria of operational accuracy (method and algorithm)
and operational time (computing platform). In Figure 12, a
technique for comparing two matchers by the criterion of opera-
tional accuracy is introduced.

3.4 Confidence intervals for the ROC


Each point on the ROC curve is calculated by integrating the "genuine" and "impostor"
score distributions between zero and some threshold, t = T. Confidence intervals for
the ROC at each threshold t can be found through a summation of the binomial
distribution, under the assumption that each comparison represents a Bernoulli trial⁴.
The confidence β that k sample/template comparison scores, or fewer, out of n
independent comparison scores (with non-varying probability p) fall in the region of
⁴ An experiment can be represented by n repeated Bernoulli trials, each with two outcomes that
can be labeled success, with probability p, or failure, with probability 1 − p. The probability
distribution of the binomial random variable X, that is, the number of successes in n independent
trials, is b(x; n, p) = [n!/(x!(n − x)!)] p^x q^(n−x), x = 0, 1, . . . , n, where q = 1 − p. For example,
for n = 3 and p = 0.25, the probability distribution of X can be calculated as
b(x; 3, 0.25) = [3!/(x!(3 − x)!)] (0.25)^x (0.75)^(3−x), x = 0, 1, . . . , 3.

Design example: Comparing two matchers using ROC curves

Problem formulation:
In biometric system design, two matchers, A and B, are specified. These
matchers are described by their ROC curves. The figure shows the corre-
sponding ROCs (FRR versus FAR) and their operating points for some
specified target FNMR. The problem is to choose the better matcher.

Step 1: Understanding the initial data


The ROCs of the two matchers are plotted in a form acceptable for comparison (the same type of
ROC and the same scaling factors). The ROC shows the trade-off between FMR and FNMR with
respect to the threshold T. For a given operational matcher, the operating point is specified by a
particular threshold T.

Step 2: Comparison of the two matchers


It follows from the ROC characteristics of the matchers that, for matcher A:
◮ For every specified FMR, it has a lower FNMR;
◮ For every specified FNMR, it has a lower FMR.

Conclusion
Matcher A is better than matcher B for all possible thresholds T.

Fig. 12: Technique for comparing two matchers using ROC curves (Example 12).

integration would be

    1 − β = P(i ≤ k) = Σ_{i=0}^{k} b(i; n, p)                (4)

where the binomial sums b(i; n, p) are tabulated for different values of n and p.

Example 13: (Binomial distribution.) Examples of manipulating
the binomial distribution, given n = 15 and p = 0.4, are as follows:

(a) P(i ≥ 10) = 1 − P(i ≤ 9) = 1 − Σ_{i=0}^{9} b(i; 15, 0.4)
             = 1 − 0.9662 = 0.0338   (the sum is from the table)

(b) P(3 ≤ i ≤ 8) = Σ_{i=3}^{8} b(i; 15, 0.4)
                 = Σ_{i=0}^{8} b(i; 15, 0.4) − Σ_{i=0}^{2} b(i; 15, 0.4)
                 = 0.9050 − 0.0271 = 0.8779

(c) P(i = 5) = b(5; 15, 0.4)
             = Σ_{i=0}^{5} b(i; 15, 0.4) − Σ_{i=0}^{4} b(i; 15, 0.4)
             = 0.4032 − 0.2173 = 0.1859
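The table lookups in Example 13 can be reproduced directly from the binomial formula of the footnote; a short Python check (the tiny discrepancy in part (b) reflects the four-decimal rounding of the tabulated sums):

```python
from math import comb

def b(x, n, p):
    """Binomial pmf b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cumulative(k, n, p):
    """P(X <= k): the binomial sum of Equation 4, sum of b(i; n, p) for i = 0..k."""
    return sum(b(i, n, p) for i in range(k + 1))

# Example 13 checks, n = 15, p = 0.4:
print(round(1 - cumulative(9, 15, 0.4), 4))                       # (a) 0.0338
print(round(cumulative(8, 15, 0.4) - cumulative(2, 15, 0.4), 4))  # (b) 0.8778 (table arithmetic gives 0.8779)
print(round(b(5, 15, 0.4), 4))                                    # (c) 0.1859
```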

Equation 4 might be inverted to determine the required size, n, of a biometric test


for a given level of confidence, β, if the error probability, p, is known in advance.

3.5 The number of comparison scores


The required number of comparison scores (and test subjects) cannot be predicted prior
to testing. To deal with this, Doddington's law says: test until 30 errors have been
observed.
Example 14: (Doddington's law.) If the test is large enough to
produce 30 errors, we will be about 95% sure that the true value
of the error rate for this test lies within about 40% of the mea-
sured value.

This holds provided that Equation 4 is applicable. The comparisons of biometric
measures will not be Bernoulli trials, and Equation 4 will not be applicable, if:
(a) the trials are not independent, or
(b) the error probability varies across the population.

Example 15: (Equation 4 is not applicable.) Trials will not be
independent if users stop after a successful attempt but continue
after an unsuccessful one.

Example 16: (Failure to enroll (FTE) rate.) A fingerprint bio-


metric system may be unable to extract features from the fin-
gerprints of certain individuals, due to the poor quality of the
ridges. Thus, there is a failure to enroll (FTE) rate associated
with using a single biometric trait. It has been empirically es-
timated that as much as 4% of the population may have poor
quality fingerprint ridges that are difficult to image with the
currently available fingerprint sensors. This fact results in FTE
errors.

3.6 Test size


The size of an evaluation, in terms of the number of volunteers and the number of
attempts made (and, if applicable, the number of fingers/hands/eyes used per person)
will affect how accurately we can measure error rates. The larger the test, the more
accurate the results are likely to be.
Rules such as the Rule of 3 and Rule of 30, detailed below, give lower bounds to
the number of attempts needed for a given level of accuracy. However, these rules are
overoptimistic, as they assume that error rates are due to a single source of variability,
which is not generally the case with biometrics. Ten enrolment-test sample pairs from
each of a hundred people is not statistically equivalent to a single enrolment-test sample
pair from each of a thousand people, and will not deliver the same level of certainty in
the results.
The Rule of 3 addresses the question: what is the lowest error rate that can be
statistically established with a given number N of (independent, identically distributed)
comparisons? This value is the error rate p for which the probability of observing zero
errors in N trials, purely by chance, equals a chosen significance level, for example 5%.

The Rule of 3

    Error rate p ≈ 3/N for a 95% confidence level                (5)
    Error rate p ≈ 2/N for a 90% confidence level                (6)

Example 17: (Rule of 3.) A test of 300 independent samples can
be said, with 95% confidence, to have an error rate of 3/300 = 1%
or less.

The “Rule of 30”: Doddington⁵ proposes the Rule of 30 to help determine the
test size: to be 90% confident that the true error rate is within ±30% of the
observed value, we need at least 30 errors.

The rule below generalizes different proportional error bands:

The Rule of 30
To be 90% confident that the true error rate is within

±10% of the observed value, we need at least 260 errors


±30% of the observed value, we need at least 30 errors
±50% of the observed value, we need at least 11 errors

Example 18: (Rule of 30.) If we have 30 false non-match errors in
3,000 independent genuine trials, we can say with 90% confidence
that the true error rate is 30/3000 = 1% ± 30%, that is,

    1% − 0.3% ≤ True error rate ≤ 1% + 0.3%

    0.7% ≤ True error rate ≤ 1.3%
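Both rules reduce to one-line computations; a Python sketch reproducing Examples 17 and 18:

```python
def rule_of_3(num_trials):
    """Rule of 3: with zero errors in N independent trials, the error rate is
    at most about 3/N at 95% confidence (2/N at 90% confidence)."""
    return 3 / num_trials

def rule_of_30_band(errors, trials):
    """Rule of 30: with >= 30 observed errors, the true error rate lies within
    about +/-30% of the observed rate n/N at 90% confidence."""
    rate = errors / trials
    return round(0.7 * rate, 6), round(1.3 * rate, 6)

print(rule_of_3(300))             # Example 17: 0.01, i.e. 1% or less
print(rule_of_30_band(30, 3000))  # Example 18: (0.007, 0.013), i.e. 0.7% .. 1.3%
```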

3.7 Estimating confidence intervals


With sufficiently large samples, the central limit theorem implies that the observed error
rates should follow an approximately normal distribution. However, because we are
dealing with proportions near 0%, and the variance of the measures is not uniform
over the population, some skewness is likely to remain until the sample size is quite large.
Confidence intervals under the assumption of normality are considered in Section 1.
Often, when Equation 1 is applied, the confidence interval reaches into negative values
for the observed error rate. However, negative error rates are impossible; this is due
to the non-normality of the distribution of observed error rates. In such cases, special
approaches are required, such as non-parametric methods. These reduce the need
to make assumptions about the underlying distribution of the observed error rates and
the dependencies between attempts.

⁵ Doddington, G.R., Przybocki, M.A., Martin, A.F., and Reynolds, D.A. The NIST speaker recog-
nition evaluation: Overview, methodology, systems, results, perspective. Speech Communication, 2000,
31(2-3), 225–254.


4 Problems
Problem 1: The distances Di between feature points measured in a sample of signatures are
represented by a normally distributed random variable d with mean µ and standard deviation σ,
n(d; µ, σ) (Fig. 13a):

(a) If µ = 40 and σ = 1.5, calculate the probability P(39 < d < 42)
Solution:

Step 1: (d1 − µ)/σ < z < (d2 − µ)/σ, that is, (39 − 40)/1.5 < z < (42 − 40)/1.5,
so −0.67 < z < 1.33

Step 2: P(39 < d < 42) = P(−0.67 < z < 1.33)

Step 3: P(−0.67 < z < 1.33) = P(z < 1.33) − P(z < −0.67) = 0.6568

Answer: P(39 < d < 42) = 0.6568

(b) If µ = 2.03 and σ = 0.44, calculate the probability P(d > 2.5)

Solution:

Step 1: (d − 2.03)/0.44 > (2.5 − 2.03)/0.44, that is, z > 1.07

Step 2: P(d > 2.5) = P(z > 1.07)

Step 3: P(z > 1.07) = 1 − P(z < 1.07) = 0.1423

Answer: P(d > 2.5) = 0.1423

(c) If µ = 5 and σ = 1.58, calculate the probability P(d = 4)

Solution: Let d1 = 1.5 and d2 = 4.5; then

Step 1: (d1 − µ)/σ < z < (d2 − µ)/σ, that is, (1.5 − 5)/1.58 < z < (4.5 − 5)/1.58,
so −2.22 < z < −0.32

Step 2: P(1.5 < d < 4.5) = P(−2.22 < z < −0.32)

Step 3: P(−2.22 < z < −0.32) = P(z < −0.32) − P(z < −2.22) = 0.3613

Answer: P(d = 4) ≈ 0.3613
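The table lookups in Problem 1 can be cross-checked against the exact standard normal CDF via the error function; the small differences from the table-based answers come from rounding the z-scores to two decimals:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_between(d1, d2, mu, sigma):
    """P(d1 < d < d2) for d ~ n(d; mu, sigma), by standardizing to z-scores."""
    return phi((d2 - mu) / sigma) - phi((d1 - mu) / sigma)

# Problem 1(a): mu = 40, sigma = 1.5
print(round(prob_between(39, 42, 40, 1.5), 4))  # 0.6563 (table value with rounded z: 0.6568)
# Problem 1(b): P(d > 2.5) = 1 - phi((2.5 - 2.03)/0.44)
print(round(1 - phi((2.5 - 2.03) / 0.44), 4))   # 0.1427 (table value with z rounded to 1.07: 0.1423)
```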

Fig. 13: The distances Di between feature points measured in a signature (a) and a hand
(b) are represented by a normally distributed random variable d, n(d; µ, σ), with
mean µ and standard deviation σ (Problems 1 and 2).

Problem 2: The distances Di between feature points measured in a sample of hand images are
represented by a normally distributed, n(d; µ, σ), random variable d with mean µ = 10 and
standard deviation σ = 1.5 (Fig. 13b). Calculate

(a) P (9 < d < 11) and P (8 < d < 12)


(b) P (d < 10) and P (d < 9)
(c) P (d > 10) and P (d > 11)
(d) P (d = 10) , P (d = 9), and P (d = 11)
(e) P (8 < d < 10) and P (10 < d < 12)

Problem 3: The sample of distances Di between feature points measured on a retina image is
represented by a normally distributed, n(d; µ, σ), random variable d (Fig. 14a). The sample
size is n = 36 and the sample mean is d = 2.6. The standard deviation, σ, of the population is
assumed to be σ = 0.3. Calculate:

(a) the 90% confidence interval for µ

Solution: Using Equation 1,

    d − z_{α/2} (σ/√n) < µ < d + z_{α/2} (σ/√n)

For α = 0.1, α/2 = 0.05, and z_{0.05} = 1.645:

    2.6 − 1.645 (0.3/√36) < µ < 2.6 + 1.645 (0.3/√36), that is, 2.52 < µ < 2.68

Answer: With 90% confidence, the true mean µ lies within the interval
2.52 < µ < 2.68 around the observed sample mean d = 2.6.
(b) 95% confidence interval for µ
Solution: Using Equation 1,

d − z_{α/2} σ/√n < µ < d + z_{α/2} σ/√n

For α = 0.05, α/2 = 0.025 and z_{0.025} = 1.96 (from the standard normal table), so

2.6 − 1.96 (0.3/√36) < µ < 2.6 + 1.96 (0.3/√36)
2.50 < µ < 2.70

Answer: With 95% confidence, the interval 2.50 < µ < 2.70 around the observed sample
mean d = 2.6 contains the true mean µ.
(c) 99% confidence interval for µ
Solution: Using Equation 1,

d − z_{α/2} σ/√n < µ < d + z_{α/2} σ/√n

For α = 0.01, α/2 = 0.005 and z_{0.005} = 2.575 (from the standard normal table), so

2.6 − 2.575 (0.3/√36) < µ < 2.6 + 2.575 (0.3/√36)
2.47 < µ < 2.73

Answer: With 99% confidence, the interval 2.47 < µ < 2.73 around the observed sample
mean d = 2.6 contains the true mean µ.
Observation: The larger the value we choose for z_{α/2}, the wider all the intervals become, and
the more confident we can be that the interval computed from the selected sample contains
the unknown parameter µ.
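The three intervals above differ only in the value of z_{α/2}. A short Python sketch (our illustration; the table values are hard-coded, and the function name is an assumption) reproduces them:

```python
from math import sqrt

# z_{alpha/2} values taken from the standard normal table
Z_HALF_ALPHA = {0.10: 1.645, 0.05: 1.96, 0.01: 2.575}

def confidence_interval(d_bar, sigma, n, alpha):
    # Two-sided interval for the mean when the population sigma is known
    half_width = Z_HALF_ALPHA[alpha] * sigma / sqrt(n)
    return d_bar - half_width, d_bar + half_width

# Problem 3: n = 36, sample mean 2.6, population sigma 0.3
for alpha in (0.10, 0.05, 0.01):
    low, high = confidence_interval(2.6, 0.3, 36, alpha)
    print(f"{100 * (1 - alpha):.0f}%: {low:.2f} < mu < {high:.2f}")
```

Running this prints the three intervals 2.52 < mu < 2.68, 2.50 < mu < 2.70, and 2.47 < mu < 2.73, matching the worked solutions.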

Fig. 14: The distances Di between feature points measured in a retina (a) and gait (b)
are represented by a normally distributed, n(d; µ, σ) (Problems 3 and 4).

Problem 4: The sample of distances Di between feature points measured in a sample of gait
images is represented by a normally distributed, n(d; µ, σ), random variable d (Fig. 14b). The
sample size is n = 49 and the sample mean is d = 4.0. The standard deviation, σ, of the
population is assumed to be σ = 0.2. Calculate:

(a) 85% confidence interval for µ


(b) 90% confidence interval for µ

(c) 95% confidence interval for µ


(d) 98% confidence interval for µ

Compare the confidence intervals

Problem 5: How large must the sample considered in Problem 3 be, if we want to be:

(a) 90% confident that our estimate of µ is off by less than 0.05.
Solution: Using Equation 3, the sample size is

n = (z_{α/2} × σ / e)² = (1.645 × 0.3 / 0.05)² = 97.4, rounded up to n = 98

(b) 95% confident that our estimate of µ is off by less than 0.05.
Solution: Using Equation 3, the sample size is

n = (z_{α/2} × σ / e)² = (1.96 × 0.3 / 0.05)² = 138.3, rounded up to n = 139

(c) 99% confident that our estimate of µ is off by less than 0.05.
Solution: Using Equation 3, the sample size is

n = (z_{α/2} × σ / e)² = (2.575 × 0.3 / 0.05)² = 238.7, rounded up to n = 239
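Equation 3 can be evaluated mechanically. In the sketch below (our illustration; the function name is ours), the result is rounded up, since a sample size must be a whole number of measurements:

```python
from math import ceil

def sample_size(z_half_alpha, sigma, e):
    # n = (z_{alpha/2} * sigma / e)^2, rounded up to the next integer
    return ceil((z_half_alpha * sigma / e) ** 2)

# Problem 5: sigma = 0.3, tolerated error e = 0.05
print(sample_size(1.645, 0.3, 0.05))   # 98  (90% confidence)
print(sample_size(1.960, 0.3, 0.05))   # 139 (95% confidence)
print(sample_size(2.575, 0.3, 0.05))   # 239 (99% confidence)
```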

Problem 6: How large must the sample in Problem 4 be, if we want to be:

(a) 85% confident that our estimate of µ is off by less than 0.5
(b) 90% confident that our estimate of µ is off by less than 0.5
(c) 95% confident that our estimate of µ is off by less than 0.5
(d) 99% confident that our estimate of µ is off by less than 0.5

Problem 7: Estimate the lowest error rate that can be statistically established with the following
number N of (independent, identically distributed) comparisons:

(a) With 90% confidence, the lowest error rate p for which zero errors in 30 trials could occur
purely by chance
Solution: Using Rule 5, the lowest error rate is p = 2/30 ≈ 0.07, or 7%
(b) With 90% confidence, the lowest error rate p for which zero errors in 100 trials could occur
purely by chance
Solution: Using Rule 5, the lowest error rate is p = 2/100 = 0.02, or 2%
(c) With 95% confidence, the lowest error rate p for which zero errors in 30 trials could occur
purely by chance
Solution: Using Rule 5, the lowest error rate is p = 3/30 = 0.1, or 10%
(d) With 95% confidence, the lowest error rate p for which zero errors in 100 trials could occur
purely by chance
Solution: Using Rule 5, the lowest error rate is p = 3/100 = 0.03, or 3%
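The rule applied above (often called the "rule of 3" in the testing literature) reduces to a one-line computation. In this sketch (our illustration) the numerators 2 and 3, corresponding to the 90% and 95% confidence levels used in the text, are hard-coded:

```python
def lowest_error_rate(n_trials, confidence):
    # Zero errors observed in n_trials: the lowest error rate that is
    # statistically established is about 2/N at 90% or 3/N at 95% confidence
    numerator = {0.90: 2, 0.95: 3}[confidence]
    return numerator / n_trials

print(lowest_error_rate(30, 0.90))    # ~0.067, i.e. about 7%
print(lowest_error_rate(100, 0.90))   # 0.02
print(lowest_error_rate(30, 0.95))    # 0.1
print(lowest_error_rate(100, 0.95))   # 0.03
```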
Problem 8: Using the Rule of 30, estimate the true error rate in the following experiments:

(a) 1 error is observed in 30 independent trials
(b) 1 error is observed in 100 independent trials
(c) 10 errors are observed in 500 independent trials
(d) 50 errors are observed in 1000 independent trials
Problem 9: Suppose that a device's performance goal is to reach a 1% false non-match rate and
a 0.1% false match rate. Using the Rule of 30, estimate the number of genuine attempt trials
and impostor attempt trials.
Solution: 30 errors at 1% false non-match rate implies a total of 3,000 genuine attempt trials,
and 30 errors at 0.1% false match rate implies a total of 30,000 impostor attempt trials. Note
that the key assumption is that these trials are independent.
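The same Rule-of-30 arithmetic generalizes: divide the required number of observed errors by the target error rate to obtain the trial count. A minimal sketch (the function name is ours):

```python
def trials_needed(target_error_rate, n_errors=30):
    # Rule of 30: at least 30 errors must be observed, so the number of
    # independent trials is n_errors / target_error_rate
    return round(n_errors / target_error_rate)

print(trials_needed(0.01))    # 3000  genuine attempts for a 1% FNMR goal
print(trials_needed(0.001))   # 30000 impostor attempts for a 0.1% FMR goal
```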
Problem 10: The distances Di between feature points measured in 100 fingerprints are represented
by a normally distributed, n(x; µ, σ), random variable x with the sample mean x = 71.8
(Fig. 15a). Assuming a population standard deviation of σ = 8.9, does this seem to indicate
that the mean of distances is greater than 70? Use a 0.05 level of significance.
Solution:

Input data: x = 71.8, σ = 8.9, n = 100, µ = 70, and α = 0.05

Step 1: Formulate the hypotheses
H0 : µ = 70
H1 : µ > 70

Step 2: The critical point for α = 0.05 is z_{0.05} = 1.645 (from the standard normal
table); the critical region is z > 1.645

Step 3: Compute the test statistic for the input data (x = 71.8, σ = 8.9, n = 100, and
µ = 70):

z = (x − µ)/(σ/√n) = (71.8 − 70)/(8.9/√100) = 2.02

Step 4: Decision: Since z = 2.02 falls in the critical region, reject H0 and conclude that the
mean is greater than 70.
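The test statistic in Step 3 is easy to script. The sketch below is our illustration (the function name is an assumption); it recomputes the z-value and compares it against the critical point:

```python
from math import sqrt

def z_statistic(x_bar, mu0, sigma, n):
    # z-test statistic for H0: mu = mu0 with known population sigma
    return (x_bar - mu0) / (sigma / sqrt(n))

# Problem 10: x_bar = 71.8, mu0 = 70, sigma = 8.9, n = 100
z = z_statistic(71.8, 70.0, 8.9, 100)
print(round(z, 2))   # 2.02
print(z > 1.645)     # True, so H0 is rejected
```

The same function applies to the two-tailed test in the next problem, with the comparison made against both critical points.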


Fig. 15: The distances Di between feature points measured in a fingerprint (a) and face
(b) are represented by a normally distributed, n(d; µ, σ) (Problems 10 and 11).

Problem 11: The distances Di between feature points measured in 50 facial images are represented
by a normally distributed, n(x; µ, σ), random variable x with the sample mean x = 7.8
(Fig. 15b). Assuming a population standard deviation of σ = 0.5, does this seem to indicate
that the mean of distances is greater or less than 8? Use a 0.01 level of significance.
Solution:

Input data: x = 7.8, σ = 0.5, n = 50, µ = 8, and α = 0.01

Step 1: Formulate the hypotheses
H0 : µ = 8
H1 : µ ≠ 8

Step 2: The critical points for α/2 = 0.01/2 = 0.005 are z_{0.005} = ±2.575 (from the
standard normal table); the critical region is z < −2.575 and z > 2.575

Step 3: Compute the test statistic for the input data (x = 7.8, σ = 0.5, n = 50, and µ = 8):

z = (x − µ)/(σ/√n) = (7.8 − 8)/(0.5/√50) = −2.83

Step 4: Decision: Since z = −2.83 falls in the critical region, reject H0 in favor of the
alternative hypothesis H1 : µ ≠ 8

Problem 12: Evaluate the performance of a system that accepts at least 5 facial images of
impostors as belonging to a database of 100 enrolled persons, and rejects 10 faces of persons
who are enrolled in the database.
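No worked solution is given for Problem 12. One way to start, assuming the 5 falsely accepted impostors and the 10 falsely rejected enrollees each come out of 100 presentations (an assumption on our part, since the problem states only the database size), is to compute the empirical false acceptance and false rejection rates:

```python
def error_rates(false_accepts, impostor_trials, false_rejects, genuine_trials):
    # FAR = accepted impostors / impostor attempts
    # FRR = rejected enrolled users / genuine attempts
    return false_accepts / impostor_trials, false_rejects / genuine_trials

# Assumed trial counts: 100 impostor and 100 genuine presentations
far, frr = error_rates(5, 100, 10, 100)
print(far, frr)   # 0.05 0.1, i.e. FAR = 5% and FRR = 10% under these assumptions
```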
