Professional Documents
Culture Documents
X
nN
n1
F
1
n F
2
n
2
v
u
u
t
(2)
3.3. The Cosine u distance
The Cosine u distance equation (Eq. (3)) is a similarity measurement and is used
to measure the angle between two vectors. A cosine difference of 1.0 between two
faces establishes that those two faces have identical vectors of facial ratios. An
advantage of using this equation is that the range of values is bound from 1.0 to
+1.0 and useful comparisons are ranged from zero to one. A difference of zero is
indicative of a face that shows no correlation whereas a result of 0.5 is achieved by
random chance. Any negative result shows the face comparison produces an inverse
correlation:
Cos u
X
nN
n1
F
1
n F
2
n
F
1
k k F
2
k k
(3)
A comparison between two faces was deemed a true positive match (TP) if the
match was a correct match between the video image and photograph of the same
subject. A true negative match (TN) was one that excludes the faces and which was a
correct exclusion because it involved a video image and a photograph of two
different subjects. A false positive match (FP) was one which was an incorrect match
between a video image and a photograph of two different subjects and a false
negative match (FN) was one which is excluded but which was an incorrect
exclusion because it involved the video image and photograph of the same subject.
To answer the questions laid forth in Section 1, the three equations were applied
in the following four scenarios to test the comparison of Sample 1 faces to Sample 2
faces; within sample comparisons, between sample comparisons, error in landmark
placement, and the potential sample of photographs subject to manual verication.
4. Results
4.1. Within sample comparisons
To test if similar faces were separable from dissimilar faces
within a single sample the equations were applied so that every face
in a single sample was compared to itself and everyother face within
this sample. Each sample contained only one image of each face and
for this reason all that could be determined was the true negativity of
this collection of different faces. Therefore, no estimate of the degree
to which two same faces (true positives) would match when
captured at different times could be made from this data. The same
tests were carried out on Sample 1 (video) and then separately on
Sample 2 (photographs). Testing all combinations of pairs of faces
within each sample was important because it compared faces
acquired under the same capture conditions, allowing the tests to
ascertain if it were possible to discriminate between different (true
negative) faces. Therefore, in this experiment the primary source of
variability between faces should be attributable to differences in the
measured facial landmark ratios, i.e. generated by genuine face
shape differences, whilst the statistics of the remaining sources of
variability remain constant; same media, same operator placing
landmarks and same facial pose.
The similarity or dissimilarity between the faces in a single
sample is cross-checked by comparing the distributions of the
similarity statistics of Sample 1 to those of Sample 2. A Sample 1 to
Sample 2 cross-comparison of the statistics produced by matching
faces within their own samples can be used to in future to predict
how discriminable faces are when making comparisons between
these two samples. If both samples exhibit similar statistics, this
would be indicative that it is possible to distinguish faces between
Fig. 3. Linear measurements that created the ratios utilised in this study.
K.F. Kleinberg, J.P. Siebert / Forensic Science International 219 (2012) 248258 251
samples because any difference between faces would be a result of
the true difference in faces rather than a result of the different
media recording each of the two samples of images. Results are
summarised in Fig. 4ac and are illustrated by superimposing the
normal distribution curves and similarity density histograms of
the two samples. In Table 3a the standard deviation scaled
difference between the Sample 1 and Sample 2 means indicates the
difference in statistics between media, the cosine distance by far
exhibiting the greatest difference.
In order to equalise the absolute ranges of the feature vector
values in each sample and address the observed difference in the
statistics of the comparisons of Sample 1 and Sample 2, the
equations were completed using the application of Z-normalised
ratio values, illustrated in Fig. 4ac. Each element, F(n) in the
measurement vector F is expressed by a population of measure-
ment ratios within a sample, Z-normalisation potentially enhances
range of variation (and accentuates any differences) about the ratio
mean of this sub-population of ratios, allowing small differences in
the data to become more apparent. This was accomplished by
dividing the mean subtracted element by the sample standard
deviation for the particular ratio (Eq. (4)). Z-normalisation can be
applied to all three of the equations:
Z-normalized element F
Z
n
Fn m
Fn
s
Fn
(4)
Therefore, by applying Z-normalisation it becomes possible to
force the distributions into a standardised range. Taking the Z-
normalised cosine distance as an example, the result of this
process, driving the means and standard deviations together and
reducing difference between the variance-scaled means is
illustrated in Table 3b.
4.2. Between sample comparisons
Results from conducting the equations were illustrated using
distribution histograms, separating TP faces from TN faces, Fig. 5a
c. These normal histogram distribution curves of TP faces and TN
faces were superimposed to determine if it was possible to
distinguish between faces in the two groups. The amount of
overlap shows the possibility of achieving either a FP or FN face
match, also known as the rate of misclassication. The smaller the
area, the smaller the chances of obtaining a FP or FN face match.
In the graphs, TP face matches are represented by the dotted
lines and the solid lines represent TN face matches. In order to
ensure equal numbers of faces in the two samples, the 39 faces in
Sample 2 that were not in Sample 1 were not included in this
Table 3b
Mean, standard deviation, and standard deviation scaled difference between the Sample 1 and Sample 2 means of Z-normalised: cosine distance for Sample 1 and Sample 2.
Comparison method Sample Sample mean m Sample standard deviation (SD)
m
Sample
SD
Sample
m
Sample 1
SD
Sample 1
m
Sample 2
SD
Sample 2
N, number of samples
Z-normalised Cos(u) 1 0.0113 0.2460 0.04594 0.01474 3160
Z-normalised Cos(u) 2 0.0077 0.2468 0.0312 7021
Table 3a
Mean, standard deviation, and standard deviation scaled difference between the Sample 1 and Sample 2 means of unnormalised: mean absolute distance (MAD), Euclidean
distance and cosine distance for Sample 1 and Sample 2.
Comparison method Sample Sample mean m Sample standard deviation (SD)
m
Sample
SD
Sample
m
Sample 1
SD
Sample 1
m
Sample 2
SD
Sample 2
N, number of samples
MAD 1 0.08354 0.02316 3.607 0.561 3160
MAD 2 0.08507 0.02041 4.168 7021
Euclidean distance 1 1.015 0.4121 2.463 1.042 3160
Euclidean distance 2 1.149 0.3278 3.505 7021
Cos(u) 1 0.9859 0.01314 75.03 74.955 3160
Cos(u) 2 0.9887 0.006592 149.985 7021
Fig. 4. (ac) Summary of the conditions imposed and results achieved in the within
sample comparisons of faces. Histograms and superimposed mean and standard
deviation of unnormalised: mean absolute distance (a), Euclidean distance (b) and
cosine distance (c) within sample comparisons for Sample 1 (dotted lower line) and
Sample 2 (solid upper line).
K.F. Kleinberg, J.P. Siebert / Forensic Science International 219 (2012) 248258 252
analysis and every face in Sample 1 was compared to every face in
Sample 2.
The mean absolute difference, the Euclidean distance and the
Cosine u distance equations (all distance measures Z-normalised)
were applied in the between sample comparisons. Superimposed
normal histogram distribution curves of TP and TN face matches
were used to illustrate the discrimination between the two groups.
In general, a slightly narrower distribution was seen for the TP
faces. This was most likely because the distribution contained only
TP matches and therefore the data should be centred on a smaller
range of values. The amount of overlap between the TP and TN face
matches correlated to the possibility of achieving either a FP or FN
face match.
Superimposing the normal curves to demonstrate the separa-
tion between TP and TN face matches, the Cosine u distance (Z-
normalised) equation produced the smallest amount of overlap
and of the three equations conducted was determined to be best
equation to test the discrimination between faces of two samples.
Examination of the superimposed curves showed approximately a
30% chance of the best match between compared faces corre-
sponding to a correct identication. The TP distribution is very
small and difcult to see on the graphs. However, it is still possible
to see the 0.7 threshold emerging for the Cosine u distance with
careful observation of Fig. 5c.
Table 4 illustrates that following Z-normalisation, the cosine
distance provides the greatest separation of TP comparisons from
the TN comparisons, based on the difference between the TP and
TN standard deviation scaled means, respectively. The conclusion
made from this investigation was that the cosine distance equation
was the best predictor of face discrimination tested thus far and
was the sole equation used to test the error in landmark placement
and to determine the sample of images from the database that
could be narrowed down for further verication by an operator.
In pattern matching based on the cosine distance between two
unit vectors, the returned measure can be interpreted as a match
probability. Whilst a cosine distance of 1 indicates a 100%
probability of the compared vectors being the same, and 0
indicates zero probability, a distance of 0.5 indicates the 50%
chance level of correlation between compared vectors. The mean
of the TP distribution barely reaches this 50% level, although this of
course indicates that approximately half of the TP comparisons will
at least exceed a chance match value. A standard deviation of
2.37 about the TP distribution mean of 0.48 indicates that over
17.5% of the TP matches will exceed a 70% chance of producing a best
closest match for the database tested.
4.3. Error in landmark placement
A small inter-operator study was carried out, to assess the
inuence of landmark placement conducted by multiple operators.
It has been reported that landmark placement, tested on 3D images
in a clinical setting, reveals that average operator error can vary
widely [30]. Therefore the effect of landmark placement error is
important to test because although landmark placement on images
in the two samples used in this study was conducted by a single
operator, this would not likely occur in practice.
Facial landmarks were placed on a total of six video images,
chosen at random, six times each by ve different operators. One
operator had previous experience in using the equipment and
Table 4
Summary of the conditions imposed and results achieved in the between sample TP and TN face comparisons. Mean, standard deviation, and standard deviation scaled
difference between TP and TN comparisons for Z-normalised vectors: mean absolute distance, Euclidean distance and cosine distance TP and TN data sets.
Comparison method
(all Z-normalised)
Sample Sample mean m Sample standard
deviation (SD)
m
Sample
SD
Sample
m
TP
SD
TP
m
TN
SD
TN
N, number
of samples
MAD TP 0.7771 0.2234 3.479 0.815 80
MAD TN 1.1053 0.2574 4.294 6320
Euclidean distance TP 7.594 2.258 3.3632 0.9660 80
Euclidean distance TN 10.572 2.442 4.3292 6320
Cos(u) TP 0.4822 0.2035 2.370 2.3950 80
Cos(u) TN 0.0061 0.2389 0.02553 6320
Fig. 5. (ac) Summary of the conditions imposed and results achieved in the
between sample TP and TN face comparisons. Results are illustrated by the
superimposed normal histogram curves showing the amount of overlap in TP
(dotted lower line) and TN faces (solid upper line). Mean absolute distance (a),
Euclidean distance (b) and cosine distance (c).
K.F. Kleinberg, J.P. Siebert / Forensic Science International 219 (2012) 248258 253
knowledge of the landmarks; landmark locations were studied
using the denitions provided in [18] and [24]. The remaining
operators had no experience in using the equipment and no
previous knowledge of anthropometric landmarks. The inexperi-
enced operators were given a list of landmark denitions (Table 1)
adapted from the literature [18,24] as well as a single photocopy of
an enlarged male face (A4 sized), front facing, with previously
placed landmarks to use as a guide. The same equipment was used
by all operators and each operator conducted their landmark
placement of images in a single day. Using the Cosine u distance
equation, comparisons of re-landmarked images were analysed
rst from the single experienced operator and second, from all
operators (Fig. 6a and b).
The Cosine u distance (Z-normalised) equation was used to
compare the re-landmarked images because, when applied in the
comparison of faces between samples, it was found to be the
equation in which the statistics of the TP and TN populations were
the most separated. Each face in the subset sample was compared
to every other face in the subset sample and resulting data was
illustrated as superimposed normal histogram curves of TP and TN
face matches. It was hypothesised that conducting an inter-
operator test, using high resolution research material, but
completed by inexperienced operators, would produce a greater
amount of variation than from an experienced operator and this
hypothesis was tested and found to hold.
The effect that inexperienced operators had on the separation
rate of TP and TN faces was compared to that of an experienced
operator and is summarised in Fig. 6a and b and Table 5. Compared
to that of the experienced operator, the effect of landmark
placement by inexperienced operators can clearly be seen in the
separation rates of TP and TN face matches: the mean for TP
comparisons collapses from 0.8 for the experienced operator to
0.44 for the mix of experienced and in-experienced operators.
The TP vs. TN standard deviation scaled means separation is more
than double for the experienced operator compared to that of the
mix of operators. A similar observation can be made by inspecting
the superimposed histograms for the TP and TN comparisons for
the experienced operator vs. the mix of experienced and
inexperienced operators. Although a bimodal result is generated
by the experienced operator for both TP and TN, a much greater
separation of the TPTN distributions is observed. However, for all
operators a typical averaged picture emerges and the effect of the
expert can be seen as a small additional bump at the top of the TP
distribution.
The most important point witnessed in Fig. 6a and b was to
observe the strong effect that the experienced operator had in
creating a larger separation of TP and TN face matches. As multiple
experienced operators were not tested, it cannot be stated that this
difference in separation rates between the experienced operator
and all operators was due to the experience of the operators or
instead, the effect that will naturally occur with multiple
operators. This could be tested by conducting a study using a
pool of experienced operators. A further study analysing the
distribution achieved from the re-landmarked images of each
operator after applying the Cosine u distance (Z-normalised)
equation could determine if any of the inexperienced operators
also achieved the same strong separation rate as the experienced
operator. An inexperienced operator producing a similar degree of
separation to the experienced operator would signify that the large
separation rate produced from all operators was caused by the
inclusion of multiple operators rather than their experience.
However, from the literature in [31], it can be predicted that the
spread from a single inexperienced operator would be larger than
an experienced operator.
4.4. Potential sample of photographs subject to manual verication
The amount of overlap between the distributions of TP and TN
faces illustrates an approximation of the misclassication rate (see
Fig. 5ac). However, given the task of comparing a suspects image
to a large database of identity photographs, the ability to decrease
the number of possible face matches could potentially save
signicant numbers of investigation hours. This smaller sample of
suspect photographs could then be more closely scrutinised by an
expert. For this analysis, each face in Sample 1 was compared to
Fig. 6. (a and b) Superimposed normal curve histograms illustrating TP (dotted
lower line) and TN (solid upper line) face comparisons of the Cosine u (Z-
normalised) distance equations in six re-landmarked images from Sample 1 using
one experienced operator (a) and multiple operators (b).
Table 5
Mean, standard deviation, and standard deviation scaled difference between TP and TN comparisons in landmark placement error study: mean absolute distance, Euclidean
distance and cosine distance for TP and TN data sets illustrating TP and TN face comparisons in six re-landmarked images from Sample 1 using one experienced operator and
multiple operators.
Comparison method: cosine distance (all Z-normalised) Sample Sample mean m Sample standard
deviation (SD)
m
Sample
SD
Sample
m
TP
SD
TP
m
TN
SD
TN
N, number
of samples
One experienced operator TP 0.8029 0.1489 5.392 5.8224 90
One experienced operator TN 0.1666 0.3871 0.4304 540
Multiple operators: experienced and inexperienced TP 0.4409 0.2655 1.6606 2.01351 2610
Multiple operators: experienced and inexperienced TN 0.08657 0.2453 0.3529 13,500
K.F. Kleinberg, J.P. Siebert / Forensic Science International 219 (2012) 248258 254
each face in Sample 2 for a total of 80 comparisons. The Cosine u (Z-
normalised) distance equation was used and the resulting values
were placed in descending order, noting the rank of the TP. The best
match was dened as the match value that returned a Cosine u
value that was highest or closest to 1.0. This was used to determine
within a condence range given a best match value how many
additional faces in the database would need to be veried before
the true positive match was found.
Best match values were placed in intervals of 0.1. The mean
rank of the TP, SD, and 2SD condence interval for each match
interval was found and results shown in Table 6. In this instance
the condence interval says that for within a given condence
range, how many database images should be looked at in total.
Results in Table 6 indicate that given match values of 0.7 the best
match from the database is also likely to be the TP face. This result
is consistent with the observed degree of overlap between the TP
and TN distributions shown in Fig 5. A match threshold of 0.7 is not
an unreasonably high value to set, given that a distance of 0.5
indicates the 50% chance level of correlation between compared
vectors. A larger sample of images should be tested to determine if
results consistent with those presented here are produced.
5. Discussion
Using high resolution photographic research material, the
object of the study was to assess if a facial anthropometric feature
vector could be utilised to distinguish between individuals of a
similar age group, ancestry and sex. Given a database of subjects,
knowledge of the type of information gathered in this study may
help in future to narrow down the number of possible suspects in
an investigation. The technique presented here entailed analysing
vector comparisons to differentiate between images of two
samples. The feature vector was utilised in three types of equations
testing the differences between faces in the samples. Normal-
isation was applied to the ratio values as a way to equalise the
feature vector values in each sample and account (to some degree)
for the statistics that different camera parameters would produce.
Z-normalisation enhances any differences between means and
makes the interpretation of the data more straightforward
allowing small differences in the data can to be more simply
seen. We found that the face matching technology investigated in
this study can assist in a database search; however, it does not
provide an unequivocal means of conrming facial identications
suitable to use in court. Therefore, the focus of future work should
concentrate on the potential for this approach to extract facial
information improving the search of databases and leaving
humans as the ultimate authenticator.
The rst step to answering the objectives laid forth in the
introduction was to evaluate each sample of images to determine if
once the equations were applied, any differences could be seen
between the two samples. Testing faces against those found in the
same sample is important because it allows the equations to
ascertain if there are any differences between faces which fall
under the same conditions. This means that other than the
possibility of slight changes in facial expression the facial ratios
will be the only changeable variable between faces as all other
variables remain constant; same media, same operator placing
landmarks and same facial pose.
Once samples were looked at individually, a between sample
comparison was conducted to determine how distinguishable the
faces were in the two samples. Once the respective equations were
conducted, superimposed normal histogram distribution curves of
true positive faces and true negative faces were used to illustrate
the discrimination of the two groups. In general, a narrower
distribution was seen for the true positive faces. This was because
as the distribution contained only true positive matches, the data
should be centred on a smaller range of values. The amount of
overlap correlated to the possibility of achieving either a false
positive or false negative face match.
Although other research used the squared Euclidean distance to
measure the likeness between pairs of faces [32], we found by
superimposing the normal curves to demonstrate the separation
between true positive and true negative faces, the Cosine u
distance (Z-normalised) equation produced the least amount of
overlap between true positive faces and true negative faces when
statistics of the two samples were known. The match values of true
negative faces in the superimposed histogram normal curves begin
to trail off at 0.7, indicating that although it is still possible to
achieve a true negative identication above this value, it is likely
that a returned match score of below 0.7 will result in a true
negative face after closer examination. Although this result
occurred in this study, it may not be replicated with a larger test
database. The investigations undertaken in this study to determine
if it is possible to discriminate between individuals of two samples
using a multi dimensional facial feature vector found that the
Cosine u distance was the best discriminator this but could further
be improved upon by administering a more comprehensive
statistical analysis.
A small inter-operator study was carried out, to assess the
inuence of landmark placement conducted by multiple operators.
This is important to test because although landmark placement on
all images used in the comparative process of this study was
conducted by a single operator, this would not likely be the case in
the real world. Landmark placement has been tested by other
researchers on 3D images in a clinical setting and it was suggested
that average operator error varies widely [30]. Using a digital
sliding calliper to measure photographs, researchers carried out an
intra observer study to test reliability of measurements and results
showed a low reliability in measurements of ls-sto and n-sn [33].
The current analysis was conducted with one experienced operator
but the remaining operators were inexperienced It would be
benecial to analyse this data further in an inter-operator study
using experienced operators located in different graphical regions
because this scenario would be more likely as a police procedure.
Experience was shown to be a beneting factor when the inter-
operator variation in taking standard skeletal measurements was
Table 6
Interval showing two standard deviations of how many images in the database should be manually investigated.
Interval of best
match values
n (number of best
matches in the interval)
Mean of TP rank SD of TP rank Min of TP rank Max of TP rank Number of images to
manually investigate
in database (mean + 2SD)
0.900.99 0 N/A N/A N/A N/A N/A
0.800.89 2 1 0 1 1 1
0.700.79 7 1 0 1 1 1
0.600.69 23 3.6 7.6 1 37 19
0.500.59 30 8.9 13.5 1 65 36
0.400.49 16 16.7 28.5 1 110 74
0.300.39 2 2 1.4 1 3 5
K.F. Kleinberg, J.P. Siebert / Forensic Science International 219 (2012) 248258 255
tested with a panel of experienced forensic anthropologists and
found to be minimal [31]. For evidence interpretation in the court
of law, any variation that occurs in the data as a result of multiple
operators placing landmarks should be small.
Results achieved by the experienced operator illustrated a
strong separation rate of true positive and true negative faces;
however, once the data from the inexperienced operators were
included, the distribution no longer depicted the strong separation.
Locating exact landmark location on a photograph is difcult, even
for experienced operators, and could account for the wide variation
in measurement in this study. A further study analysing the
distribution achieved from the re-landmarked images of each
operator after applying the Cosine u distance (Z-normalised)
equation could determine if any of the inexperienced operators
also achieved the same strong separation rate as the experienced
operator. An inexperienced operator producing a similar degree of
separation to the experienced operator would signify that the
small separation rate produced from all operators was caused by
the inclusion of multiple operators rather than their experience.
However, from the literature [31], it can be predicted that the
spread from a single inexperienced operator would be larger than
an experienced operator.
Graphing the TP rank vs. the best match value can aid to
determine an approximation of how many images from the
database would need to be manually examined to nd the TP face,
assuming the TP face was in the database. Once the unknown
image has been tested against all database images and the best
match noted, the graph could be consulted to nd where the TP
was ranked and this gives an estimate of how many images should
be given additional verication. Both the gures showing TP rank
against TP match values and best match values indicate that if a
match occurs at 0.7 and above, then it may likely be a TP match and
is worthy of further examination. Values below 0.7 still produce TP
matches; however, there are also poor TP ranks in the same area.
This is an indication that a larger set of images should be tested in
order to get a more accurate graph. The Z-normalised Cosine u
equation used to obtain the match values can also be referred to as
a statistical correlation or match probability and thus a match
value of 0.5 corresponds to chance. Therefore it is reasonable to set
0.7 as a good match threshold. This was conrmed by binning the
matches into intervals. Finding the condence interval based upon
the mean and standard deviations of the rank of TP match scores
for a given interval agreed with match scores of 0.7 and above were
considered good scores for returning a TP face. The conclusion from
studying this small sample of images is that if a match value is
returned above 0.7, there is a good chance that the best match will
be a TP, however, a larger sample of images in both databases
should be tested to determine if consistent results are produced as
well as giving a more accurate approximation of images.
The rate of misclassication tested thus far takes into account
the TP match between Sample 1 and Sample 2 plus any matches
deemed closer. However, in cases where the TP face was the best
match, it does not inform the researcher how close the next best
match was to the TP match. A TP match will hold more weight as
evidence when the next best match proves to be of signicant
distance away. Future work would determine the distance
between the TP match and the next best match by nding the
log likelihood ratio between matches. This is the log of the ratio of
the second best match to the best match. The full analysis would
include nding the log likelihood ratio between each match based
upon a descending order of matches that would show their relation
to each other. A poor log likelihood ratio could be indicative of the
necessity for more images to be tested. Any thoroughly tested and
validated method of identication can only improve the adminis-
tration of justice. The basic principle of matching evidence from an
unknown suspect to a database of known individuals, as the
methods in this study were designed, could be used for police
casework as investigations are likely to be conducted in the same
manner. However, the method tested in this study of utilizing
anthropometric ratios to compare faces is not yet at the stage
where it would benet prosecutors cases in the judicial system. At
present, identifying individuals through a comparison of anthro-
pometric measurements is restricted but can possibly be of benet
if used in conjunction with evidence found at the scene and may
act as a form of corroborating evidence that when combined with
other evidence would serve to strengthen the identication. Using
a substantially different methodology, research reported by Davis
et al. [32], broadly support the same conclusions as described in
our paper. Although the chosen landmarks in the two studies were
similar, there were differences in the way these landmarks were
utilised, for example, different distances between landmarks
measured, different proportions analysed and the inclusion of
angular measurements.
Several limitations were encountered over the course of this
study. First, the number of images for comparison did not provide
the ideal sample size in relation to the number of elements tested.
The method was tested on images of a similar physiognomy, as
occurs in practice, and these were the images available that suited
the criteria.
The landmarks, linear measurements, and to a lesser extent
ratios, were chosen for this study based on their use in previous
research. The number of landmarks chosen was not in question as
it was believed that enough landmarks were selected to gain an
acceptable representation of the face. The quantity of ratios was
selected for this same reason. However, even though the number of
ratios tested was considerably smaller than the possibilities based
on the number of landmarks, it is not known if all 59 ratios were
paramount to the analysis. Incorporating the 1,772,892 ratio
possibilities into a feature vector is something that could be tested
in further study by standard statistical methods such as a PCA and
factor analysis. It would then be necessary to determine which of
those would be benecial and which would hinder the comparison
process. Depending on the pose of the image, it would be suitable
to know which ratios were valuable.
Another limitation of the study was that a more comprehensive
statistical analysis could have further improved the data analysis. A
common precursor to further statistical investigation is a
multivariate technique called a Procrustes analysis. A Procrustes
analysis would match landmarks or shapes from two sets of data
removing the variation of translation, rotation and scaling in the
data so that the data becomes a single frame of reference.
Statistical tests to compare samples such as Students t-test [34] or
Hotellings, which is the multivariate equivalent of the t-test [35],
could be applied to the data investigated in this study; however,
the results achieved thus far can be interpreted and determined
from looking at the data directly. Additional analysis may focus on
creating receiver operating characteristics (ROC) graphs [36]. This
technique separates classiers based on their performance; in this
case true positive and false positive face matches. This is helpful to
visualise the overall efcacy of the comparison method because it
allows the operator to appraise how many false positives matches
are likely to occur with true positives matches.
This study tested the outcome of comparing two sets of 2D
images in a best-case scenario. Video images were of a high
resolution compared to that of typical grainy surveillance video
often found and all faces were positioned to the front with neutral
expressions. Best-case scenario images were utilised to set a
benchmark; if identications could not be made on these images
then there was less hope for identication based on images of a
lower quality. Two-dimensional images were used because they
are more accessible compared to 3D images and can be directly
obtained from surveillance video without the liability of
K.F. Kleinberg, J.P. Siebert / Forensic Science International 219 (2012) 248258 256
potentially changing data through manipulation into a 3D image.
Although not utilised in this study, the creation of 3D images has
its place and is important to the eld of facial recognition to
address the fundamental problem of rotation variation amongst
images. A great deal of effort by researchers has been spent on
creating a 3D facial image [7,8,37,38], but 3D images are not
without their problems as well. Discouraging results were seen
when comparing a 2D image with that of a 3D laser scan image
by comparing the locations of the X and Y coordinates of a
maximum of seven landmarks [39]. The goal of this was to
manipulate the head positioning to mirror that of the unknown
individual [39]. The set of comparisons was small and combined
with the fact that only seven landmarks were used could have
inuenced the poor outcome. In reality, it may not be feasible to
take a 3D image scan of a possible suspect as the subject will
probably not agree, but using photographs taken of the suspect
at several different angles to create a 3D image of the suspect
would be possible [7,8].
Another problem effect which may affect the comparison
between individuals from 2D images, apart from matching head
positions, is distortion. The effect of distortion on photographs may
affect the ability to compare images taken at different times with
different camera parameters. The focal length of the camera lens
and the subject distance from the camera are factors that
contribute to distortion, however, lens distortion can be corrected
by calibration. Farkas found that the greatest effect from distortion
was shortening of the upper third of the face and that one reason
the nasion and stomion landmarks proved to be accurate was
because they were on the same focusing plane [18].
In a photogrammetric analysis, the worst-case scenario is when
nothing is known about the cameras that captured the images. In
this type of circumstance there is a strong danger of generating any
comparison to t, whereas compelling comparisons can be made
from calibrated images. The camera parameters were unknown for
both sets of images utilised in the present study. In practice it is
likely that camera parameters will be known for at least one set of
images. In a photogrammetric analysis, Lee et al. found it necessary
to be aware of the distortion parameter of a camera lens in addition
to obtaining a sufcient number of calibration points in order to
effectively measure the height of a person standing in a xed
location [40]. Camera parameters of video images obtained from
surveillance cameras can be acquired using basic photogrammetry
skills. In an ideal comparison, nothing would deviate between the
two cameras parameters, or at most only the focal length would be
varied.
A signicant problem with distinguishing between two 2D
images is that the face is a complex 3D structure and any
comparison protocol should therefore be designed with that in
mind. Future research should concentrate on creating a 3D
reconstruction of the suspect from their police identity photograph
and this 3D image could then be matched to the facial position of
the individual in the video image. Creating a 3D image should
demonstrate a unique t factor and superimposing a 2D image
(individual on video) onto a 3D image (suspect photograph) will
clearly exhibit the distribution of the tting error vs. the pose angle
and individuality of the person. It is possible to extract 3D
landmarks from video by matching two or more frames and
applying photogrammetry [41], however, facial expression can be
a confounding factor.
The general conclusion derived from the investigations
undertaken in this study was that these tests do not offer a
signicant and infallible method of discriminating between
individuals of two samples. At best they may offer corroborating
evidence and could be used to narrow down a list of suspects as
long as other evidence was available. However, it should be noted
as digital camera technology improves, the potential for capturing
high quality imagery is becoming more relevant and favours this
approach. The cases tested in this research were done as a best case
scenario; facial poses in both samples were faced frontal, images
from both samples were taken on the same day, landmarks on
images from both samples were placed by the same operator and
the resolution of video was high. These best case samples would
most likely not be available to forensic scientists but the
advantages for further testing on worst case scenarios are non-
existent if the discriminatory power is not sufcient with the best
cases.
Acknowledgements
We would like to thank A. Mike Burton in the School of
Psychology at the University of Aberdeen for providing the police
photographs used in this research.
References
[1] D.L. Lewis, Surveillance video in law enforcement, J. Forensic Ident. 54 (2004)
547559.
[2] S.A. Cole (Ed.), Suspect Identities: A History of Fingerprinting and Criminal
Identication, Harvard University Press, Cambridge, MA, 2002 , pp. 3259,
140146.
[3] A.A. Moenssens, Fingerprint Techniques, Chilton Book Company, Philadelphia,
1971.
[4] N.L. Rogers, et al., The belated autopsy and identication of an eighteenth century
naval hero-The saga of John Paul Jones, J. Forensic Sci. 49 (2004) 10361049.
[5] R (on the application of Taj) v Chief Immigration Ofcer, Midlands Enforcement
Unit, Queens Bench Division (Administration Court), CO/1084/99, 29 January
2001, ed, 2001.
[6] Crim. L.R. 1999, SEP, ed, 1999, pp. 750751.
[7] J.P. Siebert, S.J. Marshall, Human body 3D imaging by speckle texture projection
photogrammetry, Sensor Rev. 20 (2000) 218226.
[8] C.W. Urquhart, J.P. Siebert, Towards Real-time Dynamic Close Range Photogram-
metry, Presented at the SPIE Videometrics II, Boston, USA, 1993.
[9] R.A. Halberstein, The application of anthropometric indices in forensic photogra-
phy: three case studies, J. Forensic Sci. 46 (2001) 14381441.
[10] Q. Chen, W. Cham, 3D model based pose invariant face recognition froma single
frontal view, Electron. Lett. Comp. Vis. Image Anal. 6 (2007) 1326.
[11] F.J. Huang, et al., Pose invariant face recognition, in: Fourth IEEE International
Conference on Automatic Face and Gesture Recognition Grenoble, France, 2000.
[12] U. Park, A.K. Jain, Face matching and retrieval using soft biometrics, Inf. Foren. Sec.
IEEE Trans. 5 (2010) 406415.
[13] Databases. Available: 8 June http://www.interpol.int/Public/ICPO/FactSheets/
GI04.pdf.
[14] Reoffending of adults: results from the 2006 cohort England and Wales,
Ministry of Justice Statistics bulletin, 4 September 2008.
[15] K.F. Kleinberg, et al., Failure of anthropometry as a facial identication technique
using high-quality photographs, J. Forensic Sci. 52 (2007) 779783.
[16] K.V. Mardia, et al., On statistical problems with face identication from photo-
graphs, J. Appl. Stat. 23 (1996) 655675.
[17] V. Bruce, et al., Verication of face identities from images captured on video, J.
Exp. Psychol. Appl. 5 (1999) 339360.
[18] L.G. Farkas, Anthropometry of the Head and Face, 2nd ed., Raven Press, Ltd., New
York, 1994.
[19] R. Purkait, Anthropometric landmarks: how reliable are they? Anthropometric
landmarks, Med. Leg. Update 4 (2004) 133140.
[20] N. Fieller, Statistical facial identication. 2006 [cited 2007, 12 September];
available from: http://nickeller.staff.shef.ac.uk/seminars/faces04-10-06.pdf.
[21] M. Evison, R. Bruegge, The magna database: a database of threedimensional facial
images for research in human identication and recognition, Forensic Sci. Com-
mun. (April) (2008).
[22] I. Craw, et al., How should we represent faces for automatic recognition, IEEE
Trans. Pattern Anal. Machine Intelligence 21 (1999) 725735.
[23] K. Okada, C.V.D. Malsburg, S. Akamatsu, A pose-invariant face recognition system
using linear pcmap model, in: Proceedings of IEICE Workshop of Human Infor-
mation Processing, Okinawa, 1999.
[24] J.C. Kolar, E.M. Salter, Craniofacial Anthropometry Practical Measurement of the
Head and Face for Clinical, Surgical and Research Use, Charles C Thomas,
Springeld, IL, 1997.
[25] K.F. Kleinberg, P. Vanezis, Variation in proportion indices and angles between
selected facial landmarks with rotation in the Frankfort plane, Med. Sci. Law 47
(2007) 107116.
[26] G. Porter, G. Doran, An anatomical and photographic technique for forensic facial
identication, Forensic Sci. Int. 114 (2000) 97105.
[27] M. Yoshino, et al., Computer-assisted facial image identication systemusing a 3-
D physiognomic range nder, Forensic Sci. Int. 109 (2000) 225237.
[28] T. Catterick, Facial measurements as an aid to recognition, Forensic Sci. Int. 56
(1992) 2327.
K.F. Kleinberg, J.P. Siebert / Forensic Science International 219 (2012) 248258 257
[29] I.L. Dryden, K.V. Mardia, Statistical Shape Analysis, Wiley-Blackwell, West Sussex,
1998.
[30] A. Ayoub, et al., Validation of a vision-based, three-dimensional facial imaging
system, Cleft Palate: Cran. J. 40 (2003) 523529.
[31] B.J Adams, J.E. Byrd, Interobserver variation of selected postcranial skeletal
measurements, J. Forensic Sci. 47 (2002) 11931202.
[32] J.P. Davis, T. Valentine, R.E. Davis, Computer assisted photo-anthropometric
analyses of full-face and prole facial images, Forensic Sci. Int. 200 (2010)
165176.
[33] M. Roelofse, et al., Photo identication: facial metrical and morphological features
in South African males, Forensic Sci. Int. 177 (2008) 168175.
[34] B. Murphy, R.D. Morrison, Introduction to Environmental Forensics, Academic
Press, 2007.
[35] D. Sheskin, Handbook of Parametric Nonparametric Statistical Procedures, Chap-
man Hall/CRC, 2007.
[36] T. Fawcett, An introduction to ROC analysis, Pattern Recogn. Lett. 27 (2006)
861874.
[37] D. DeCarlo, et al., An anthropometric face model using variational techniques, in:
Proceedings of the 25th Annual Conference on Computer Graphics and Interactive
Techniques, 1998, pp. 6774.
[38] C. Zhang, S.F. Cohen, 3-D face structure extraction and recognition from images
using 3-D morphing and distance mapping, IEEE Trans. Image Proc. 11 (2002)
12491259.
[39] M.I.M. Goos, et al., 2D/3D image (facial) comparison using camera matching,
Forensic Sci. Int. 163 (2006) 1017.
[40] J. Lee, et al., Efcient height measurement method of surveillance camera image,
Forensic Sci. Int. 177 (2008) 1723.
[41] H.C. Longuet-Higgins, A computer algorithmfor reconstructing a scene fromtwo
projections, Nature 293 (1981) 133135.
K.F. Kleinberg, J.P. Siebert / Forensic Science International 219 (2012) 248258 258