
Technical Report: Bat Skulls Classification Based on 2D Shape Matching

Xufeng Han
Department of Computer Science, SUNY Stony Brook.
xufhan@cs.sunysb.edu
February 3, 2010

Abstract
A textbook method in computer vision, shape matching using shape context [1], is applied to the clas-
sification of bat skulls. Distances between pairs of shapes are computed and fed to a nearest neighbor
classifier, which achieves 85% classification accuracy under leave-one-out cross-validation. Although the
features used in this method are not derived from biological knowledge, the experimental results suggest
that such knowledge is helpful in selecting feature points.

1 Shape Matching Approach


Belongie et al. proposed a general shape matching approach that has three stages [1]:
1. Solve the correspondence between two shapes;
2. Solve a transformation from one shape to the other based on the correspondence;
3. Compute a distance as the weighted sum of the matching error between corresponding points and
a measure of the aligning transformation.
An iterative process for the first two stages is suggested, in which a new correspondence is solved from
the transformed shape, in the hope that it is better than the previous one. From the new correspondence
we can compute a new transformation. After a fixed number of such iterations (6 in our experiment), the
process stops and a distance is calculated from the latest correspondence and the latest transformation.

1.1 Shape as a point set


The 2D shape is modeled as a point set P = {p1 , p2 , . . . , pn }, usually consisting of points randomly and
densely sampled from the contour of the object or from the edge map of a gray-scale image, as shown in Figure 1.

Figure 1: Random points sampled from the contours of two spatially normalized skulls. Coordinates are
pixel positions.

1.2 Shape context
Shape context is a descriptor for any point (called the reference point) in the point set P. It is a vector
expressing the configuration of the entire shape relative to the reference point. This vector is defined as
the histogram h_i of the relative coordinates of the remaining n − 1 points [1],

h_i(k) = |{q ≠ p_i : (q − p_i) ∈ bin(k)}|, (1)

where the bins are uniformly distributed in log-polar space, thus giving more weight to points that are
close to p_i.
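A minimal NumPy sketch of the descriptor may help make Equation (1) concrete. The bin counts (5 radial by 12 angular) and the radial range [1/8, 2] are the choices reported in [1], not values stated in this report; the radii are normalized by the mean pairwise distance, which anticipates the scale-invariance discussed below.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """Log-polar shape context histograms for every point in `points`.

    points : (n, 2) array of 2D coordinates.
    Returns an (n, n_r * n_theta) array of normalized histograms.
    """
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]   # diff[i, j] = q_j - p_i
    dist = np.hypot(diff[..., 0], diff[..., 1])
    theta = np.arctan2(diff[..., 1], diff[..., 0])

    # Normalize radii by the mean pairwise distance (scale invariance).
    mean_dist = dist[~np.eye(n, dtype=bool)].mean()
    r = dist / mean_dist

    # Log-spaced radial edges between r = 1/8 and r = 2 (choice from [1]).
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.digitize(r, r_edges) - 1              # -1 below range, n_r above
    t_bin = np.floor((theta + np.pi) / (2 * np.pi / n_theta)).astype(int)
    t_bin = np.clip(t_bin, 0, n_theta - 1)

    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i != j and 0 <= r_bin[i, j] < n_r:
                hists[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    # Normalize so the chi-squared comparison of Equation (2) is well scaled.
    sums = hists.sum(axis=1, keepdims=True)
    return hists / np.where(sums == 0, 1, sums)
```

Points farther than the outermost radial edge are simply not counted, which is consistent with the truncated log-polar diagram in [1].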
For two points p_i and q_j on two shapes, the distance C(p_i, q_j) between their descriptors is given by
the χ² distance

C(p_i, q_j) = (1/2) Σ_{k=1}^{K} [h_{p_i}(k) − h_{q_j}(k)]² / (h_{p_i}(k) + h_{q_j}(k)), (2)

where h_{p_i}(k) and h_{q_j}(k) denote the K-bin normalized histograms defined in Equation (1).
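Equation (2) can be evaluated for all point pairs at once with broadcasting; this is a small sketch, with the only non-obvious detail being the guard against 0/0 for bins empty in both histograms.

```python
import numpy as np

def chi2_cost_matrix(h_p, h_q):
    """Pairwise chi-squared distances between two sets of K-bin
    normalized histograms, following Equation (2).

    h_p : (n, K) histograms of shape P;  h_q : (m, K) histograms of shape Q.
    Returns the (n, m) cost matrix with entry [i, j] = C(p_i, q_j).
    """
    num = (h_p[:, None, :] - h_q[None, :, :]) ** 2
    den = h_p[:, None, :] + h_q[None, :, :]
    # Bins empty in both histograms contribute zero, not 0/0.
    terms = np.where(den > 0, num / np.where(den == 0, 1, den), 0.0)
    return 0.5 * terms.sum(axis=-1)
```

The resulting matrix is exactly the input the bipartite matching step below expects.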
Shape context is intrinsically translation-invariant. It can be scale-invariant if we normalize the radial
distance by the mean distance between all the pairs in point set P . Rotation-invariance can be achieved
if needed. In our experiment, we spatially normalized the images in the preprocessing step so that there
will not be significant rotation between two shapes, so rotation-invariant property is not necessary. Shape
context is also reported to be empirically non-sensitive to small geometrical distortions [1].

1.3 Bipartite graph matching


Assume the correspondence between the point sets of two shapes is a permutation π that gives a one-to-one
mapping from p_i to q_{π(i)}. Then the best correspondence is defined to be the permutation that minimizes
the total matching cost,

H(π) = Σ_i C(p_i, q_{π(i)}). (3)

This matching problem can be solved in O(N³) time using the Hungarian method [3], with the cost matrix
as input.
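In practice this assignment problem does not need a hand-written Hungarian method; SciPy's `linear_sum_assignment` solves the same problem (also in O(N³)). A toy cost matrix stands in for the χ² costs here:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy 3x3 cost matrix; in the matching problem, cost[i, j] = C(p_i, q_j).
cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])

# rows[i] is matched to cols[i], minimizing the total cost H(pi) of Eq. (3).
rows, cols = linear_sum_assignment(cost)
total = cost[rows, cols].sum()   # optimal H(pi)
```

For this matrix the optimal assignment matches row 0 to column 1, row 1 to column 0, and row 2 to column 2, for a total cost of 5.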
In the case when the sizes of the point sets are not equal, dummy points are introduced to match the extra
points. The cost of matching with a dummy point is set to a fixed value ε_d. Another benefit of the
dummy points is that improper mappings containing high-cost edges become less favorable, because
in that case the algorithm tends to match those points with dummy points instead. Points matched to
dummy points are not used for estimating the transformation; however, they are kept for the next round's
correspondence search, in the hope that they will match real points after the transformation. The
correspondence is visualized in Figure 2(a).

Figure 2: (a) The correspondence found by the Hungarian method. There are 44 matches between real points.
(b) The grid points show the interpolation of the TPS function computed from the correspondence.
The blue markers are supposed to be moved to the corresponding red markers by the transformation.
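The dummy-point trick amounts to padding the rectangular cost matrix to a square one before running the assignment; a minimal sketch follows, where the dummy cost `eps_d` is an arbitrary illustrative value, not the one used in the report.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_with_dummies(cost, eps_d=0.5):
    """Match two point sets of possibly different sizes.

    cost : (n, m) matrix of real-to-real matching costs.
    Pads to an (n+m, n+m) square matrix in which every entry involving a
    dummy point costs `eps_d`, then solves the assignment problem.
    Returns only the (i, j) pairs where both points are real.
    """
    n, m = cost.shape
    padded = np.full((n + m, n + m), eps_d)
    padded[:n, :m] = cost                      # real-to-real block
    rows, cols = linear_sum_assignment(padded)
    return [(i, j) for i, j in zip(rows, cols) if i < n and j < m]
```

A point whose cheapest real match exceeds `eps_d` is pushed onto a dummy, which is exactly the outlier-rejection behavior described above.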

1.4 Transformation
Given the correspondence, we then compute a 2D transformation that moves the points in set P to
their counterparts in set Q. Following [1], a thin plate spline (TPS) [4] is used to model the transformation.
Specifically, two TPS functions transform the x- and y-coordinates,

T (x, y) = (fx (x, y), fy (x, y)). (4)

The form of fx and fy and the regression details can be found in [1]. The transformation is visualized in
Figure 2(b).
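SciPy's RBF interpolator with the thin-plate-spline kernel can serve as a stand-in for the TPS regression of [1]; this sketch fits one spline per output coordinate, i.e. f_x and f_y of Equation (4). The regularized variant described in [1] would correspond to a nonzero `smoothing` parameter.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def fit_tps(src, dst, smoothing=0.0):
    """Fit a thin plate spline warp T(x, y) = (f_x(x, y), f_y(x, y)) that
    maps each source point to its corresponding destination point.

    src, dst : (n, 2) arrays of corresponding points.
    Returns a callable mapping an (m, 2) array to an (m, 2) array.
    """
    fx = RBFInterpolator(src, dst[:, 0], kernel='thin_plate_spline',
                         smoothing=smoothing)
    fy = RBFInterpolator(src, dst[:, 1], kernel='thin_plate_spline',
                         smoothing=smoothing)
    return lambda pts: np.stack([fx(pts), fy(pts)], axis=1)
```

Evaluating the fitted warp on a regular grid produces exactly the kind of deformed-grid visualization shown in Figure 2(b).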

2 Dataset
The dataset consists of lateral photos of 20 specimens belonging to 2 species, Platyrrhinus infuscus and
Platyrrhinus lineatus, with 10 specimens per species. As Figure 3 shows, the scales and poses of the skulls
vary among images, which calls for spatial normalization to simplify later processing. The photos also
contain rulers, which let us determine the pixel size (the real-world length of one pixel in the image) and
thereby measure the real size of the skull.

Figure 3: (a), (b) are sample photos of Platyrrhinus infuscus and Platyrrhinus lineatus respectively.

3 Image preprocessing
The goal of the preprocessing step is to spatially normalize all the skulls so that they fit into the same
rectangle with roughly the same pose, in other words, with little rotation from one to another. We first
extract the skull from the image by hand. Then we mark two points, namely the most posterior point
of the occiput and the most anterior point of the skull excluding the incisors. Using these two marks we
find a rotated bounding box of the skull. We then apply a simple transformation, a rotation plus a
translation plus a scaling, to the content of the bounding box, so that the box along with its content is
registered to a standard box of fixed size. The whole process is illustrated in Figure 4.
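The rotation-plus-translation-plus-scaling step can be sketched from the two marker points alone. This sketch uses a single isotropic scale factor and an arbitrary target width of 200 pixels; the report's actual pipeline records separate scaling factors along the x- and y-axes, so treat this as an illustration, not the exact procedure.

```python
import numpy as np

def normalize_skull(points, posterior, anterior, box_w=200.0):
    """Register a skull point set to a standard box using two markers.

    Rotates so the posterior-to-anterior axis is horizontal, translates
    the posterior marker to the origin, and scales that axis to box_w.
    """
    posterior = np.asarray(posterior, float)
    axis = np.asarray(anterior, float) - posterior
    angle = np.arctan2(axis[1], axis[0])
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])                 # rotation by -angle
    centered = (np.asarray(points, float) - posterior) @ R.T
    scale = box_w / np.hypot(axis[0], axis[1])      # scale axis to box width
    return centered * scale
```

After this step the anterior marker of every skull lands at (box_w, 0), so the shapes can be compared with little residual rotation, as required by the non-rotation-invariant shape context.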
Besides the normalized images, the preprocessing also yields the pixel sizes and the scaling factors
along the x- and y-axes. The pixel sizes are plotted in Figure 5. The plot clearly shows that the first 10
samples are similar in size and significantly larger than the remaining ten, which is consistent with the
fact that the first 10 samples are from the same species. It turns out that scale alone is a diagnostic
feature for distinguishing between the two.

Figure 4: (a) The original image. (b) The hand-labeled mask. (c) Two marker points. (d) The rotated
bounding box. (e) The normalized mask. (f) The normalized color image.

Figure 5: Pixel size of each normalized image.
Since length will not always be such an effective feature for other kinds of bats, for the sake of
generality it is still interesting to evaluate the performance of shape features alone.

4 Generating the distance matrix


Following the approach described in Section 1, we compute the distance between each pair of skulls to
form a distance matrix. Specifically, the distance is a weighted sum,

D(P, Q) = w1 Dac (P, Q) + w2 Dsc (P, Q) + w3 Dbe (P, Q), (5)

where Dac , Dsc and Dbe are the affine cost, the shape context cost and the bending energy of the
transformation, respectively, and P, Q are the point sets of the two shapes.
We generate two distance matrices for comparison (Figure 7). The first measures similarity over the
whole contour of the skulls; the second measures similarity only over the teeth portion of the skulls
(see Figure 6). This selection of feature points comes from zoologists' knowledge: in Velazco's paper [2],
besides length features, 35 of the 60 proposed features concern dentition.

Figure 6: Shape matching when we only consider the teeth portion.

Figure 7: Distance matrices. Left: the whole contour; right: the teeth part only. The more bluish a
patch is, the more similar the two shapes are. Since the first ten specimens belong to one species and
the remaining ten to the other, ideally the distance matrices should appear blue in the first and fourth
quadrants (marked in red boxes) and red in the second and third quadrants. In this sense the 'teeth
matrix' is better.

5 Classification results
Given the distance matrix, we use a simple nearest neighbor classifier with leave-one-out cross-
validation to test the classification performance. The average accuracy is listed in Table 1. Because
the shape matching is performed on the normalized images, the results are not affected by the size of
the skulls and reflect only the effectiveness of scale-independent shape features.
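Given a precomputed distance matrix, the leave-one-out 1-NN evaluation is only a few lines; this sketch classifies each sample by its nearest neighbor among all the other samples.

```python
import numpy as np

def loocv_1nn_accuracy(D, labels):
    """Leave-one-out nearest-neighbor accuracy.

    D : (n, n) precomputed distance matrix (need not be symmetric).
    labels : length-n array of class labels.
    """
    D = np.asarray(D, float).copy()
    labels = np.asarray(labels)
    np.fill_diagonal(D, np.inf)          # a sample cannot vote for itself
    nearest = D.argmin(axis=1)           # index of each sample's neighbor
    return float(np.mean(labels[nearest] == labels))
```

Note that the distance of Equation (5) is not guaranteed to be symmetric, which is why the nearest neighbor is taken row-wise rather than assuming D = Dᵀ.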

Using the whole contour Using only the teeth part
Average Accuracy 60% 85%

Table 1: Classification performance.

6 Conclusion
We used the shape context feature, a distance metric generated by an iterative matching process, and a
nearest neighbor classifier to classify two bat species based on 2D photos of their skulls, achieving an
average accuracy of 85%.

References
[1] S. Belongie, J. Malik, and J. Puzicha. Shape Matching and Object Recognition Using Shape
Contexts. IEEE Trans. PAMI, 2002.
[2] P. M. Velazco. Morphological phylogeny of the bat genus Platyrrhinus Saussure, 1860 (Chiroptera:
Phyllostomidae) with the description of four new species. Fieldiana Zoology, new series, no. 105, 2005.
[3] C. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity.
Prentice Hall, 1982.
[4] F. L. Bookstein. Principal warps: thin-plate splines and the decomposition of deformations. IEEE
Trans. PAMI, 1989.
