however, the object database is rather small, so we cannot be sure how the results scale to a large set of images.

Other approaches for visual object recognition include template matching. [Gavrilla and Davis, 1994] proposed a modified cross-correlation to allow a number of comparisons to run in parallel. Approximately 20 images could be successfully compared without loss of accuracy. This improves the execution speed for comparing images; however, the computation time still scales linearly with the size of the set. Recently, [Bala and Cetin, 2004] used decimated wavelet transforms to develop an affine invariant function for modelling the outlines of objects (model plane images). The work extends previous work [Guggenheimer, 1963], [L Arbter. W. E. Synder and Hirzinger, 1990], [Reiss, 1991], [Khalil and Bayoumi, 2001], [Tieng and Boles, 1997] that uses affine invariant functions to recognise objects. Their results show a 47% improvement in computational complexity without decreasing recognition performance or accuracy. However, these experiments are on a very limited test set of 20 images and only make use of the silhouette of the object. Another recent development has been the Scale Invariant Feature Transform (SIFT) [Lowe, 1999], which has also been applied to recognition of a set of ten different household objects by Ke and Sukthankar [Ke and Sukthankar, 2004]. The method uses principal components analysis (PCA) to reduce the SIFT descriptors and make them more distinctive. They report a recognition rate of 68% if the top two matches are used. Recognition takes about 5 seconds. Again the test set is very limited, though large variations of viewpoint are present.

Template matching is the simple task of performing a normalised cross-correlation between a template image (an object in the training set) and a new image to classify. This paper proposes a new method of using template matching across a large set. Section 2 discusses the approach to object recognition, while Section 3 explains how to reduce the large set in order to perform efficient object recognition. The results for object recognition using template matching against our reduced set are presented in Section 6.
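As an illustration of the normalised cross-correlation (NCC) score that template matching relies on, the following is a minimal Python sketch. It assumes the zero-mean form of NCC, equal-sized greyscale images stored as lists of rows, and a score of 0.0 for flat images; these conventions and the function name `ncc` are assumptions for illustration, not details taken from the paper.

```python
import math


def ncc(template, image):
    """Zero-mean normalised cross-correlation of two equal-sized images.

    Both arguments are lists of rows of grey levels. The result lies in
    [-1, 1]. Returning 0.0 for an image with zero variance is an assumed
    convention, since the correlation is undefined in that case.
    """
    a = [p for row in template for p in row]
    b = [p for row in image for p in row]
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom
```

Because the means are subtracted and the result is normalised, the score is unchanged by a uniform brightness offset or contrast scaling of either image, which is why a fixed threshold on the score can be meaningful across images.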
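function ReduceClasses(classes, M) {
    reduced_classes = array.empty();
    for each class in classes {
        blobs = GetBlobsFor(class);
        clusters = array.empty();
        for each blob_found in blobs {
            // Assumed helper: best NCC score of blob_found against the
            // current cluster averages, and the index of that cluster.
            (score, score_index) = BestNCCScore(blob_found, clusters);
            if (score < M)
                clusters.append(blob_found);
            else
                AddImageToAvgImage(blob_found, score_index, clusters);
        }
        reduced_classes.append(clusters);
    }
    return reduced_classes;
}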
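The reduction step of Figure 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: images are flat lists of grey levels of equal length, the score is zero-mean NCC, each cluster stores a running average and a count, and the names `reduce_class` and `reduce_classes` are hypothetical.

```python
import math


def ncc(a, b):
    """Zero-mean NCC of two equal-length flat pixel lists (0.0 if flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def reduce_class(images, m):
    """Greedily cluster the images of one class.

    Returns a list of (average_image, image_count) pairs.
    """
    clusters = []
    for img in images:
        scores = [ncc(img, avg) for avg, _ in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < m:
            # No cluster is similar enough: start a new one.
            clusters.append((list(img), 1))
        else:
            # Fold the image into the best cluster's running average.
            avg, n = clusters[best]
            merged = [(p * n + q) / (n + 1) for p, q in zip(avg, img)]
            clusters[best] = (merged, n + 1)
    return clusters


def reduce_classes(classes, m):
    """Apply reduce_class to every class in a {name: images} mapping."""
    return {name: reduce_class(imgs, m) for name, imgs in classes.items()}
```

Note that the result of a greedy pass like this depends on the order in which images are presented. A higher threshold `m` makes merging harder, so it tends to produce more, smaller clusters.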
best_score = -1.0;
best_blob = NULL;
blobs = GetBlobsFor(similar_class);
return GetBlobName(best_blob);
}
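The surviving fragments of the recognition pseudocode, together with the class averages and closest classes n_{avg} discussed in the results, suggest a two-phase search. The Python sketch below illustrates one way such a search could work; the function name `classify`, the data layout, and the use of zero-mean NCC are assumptions, and this is not a reconstruction of the authors' exact procedure.

```python
import math


def ncc(a, b):
    """Zero-mean NCC of two equal-length flat pixel lists (0.0 if flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def classify(image, class_averages, reduced_set, n_avg):
    """Two-phase template matching.

    class_averages: {class_name: average image of the whole class}
    reduced_set:    {class_name: [cluster average image, ...]}
    n_avg:          number of closest classes to search in phase two
    """
    # Phase 1: rank classes by similarity to their overall average image.
    ranked = sorted(
        class_averages,
        key=lambda name: ncc(image, class_averages[name]),
        reverse=True,
    )
    # Phase 2: search only the cluster averages of the closest classes.
    best_score, best_class = -1.0, None
    for name in ranked[:n_avg]:
        for template in reduced_set[name]:
            score = ncc(image, template)
            if score > best_score:
                best_score, best_class = score, name
    return best_class, best_score
```

With this layout, the number of NCC evaluations per classification is the number of class averages plus the number of cluster averages in the `n_avg` selected classes, so both the threshold M and `n_avg` trade accuracy against time.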
represented by the average of the images in the cluster. If the best NCC score is less than the threshold, then we decide that the image belongs to a separate, new cluster and initialise a new cluster in the reduced set. The algorithm is presented in Figure 2.

4 Recognition Procedure

but we have sought to develop a more general, rigorous approach. Unfortunately, this has failed, so we have had to resort to a more Lego-specific approach: greyscale NCC to identify the shape and average colour comparison to determine the block colour (we call this the colour hack approach). This is a weakness of our test data set: the colours are particularly distinct.
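An average colour comparison of the kind used in the colour hack can be sketched as below. The sketch assumes RGB pixel tuples taken from the detected block, a hypothetical palette of reference colours, and plain squared Euclidean distance in RGB; the paper does not state which distance it uses, so these are illustrative choices only.

```python
def mean_colour(pixels):
    """Average (r, g, b) over an iterable of RGB pixel tuples."""
    pixels = list(pixels)
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def nearest_colour(pixels, palette):
    """Name of the palette entry closest to the mean pixel colour.

    palette maps a colour name to a reference (r, g, b) tuple; the
    entries used in the example below are hypothetical values.
    """
    mean = mean_colour(pixels)

    def dist2(name):
        # Squared Euclidean distance in RGB (an assumed metric).
        return sum((m - c) ** 2 for m, c in zip(mean, palette[name]))

    return min(palette, key=dist2)


palette = {"black": (20, 20, 20), "grey": (128, 128, 128), "red": (200, 30, 30)}
block_pixels = [(190, 40, 35), (210, 25, 20)]
colour = nearest_colour(block_pixels, palette)
```

A comparison this simple only works when the reference colours are well separated, which matches the remark above that the colours in the test data set are particularly distinct.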
Figure 5: Recognition accuracy vs number of similar views (n_{avg}) for recognising 1000 new images against various reduced sets (M). (Using the colour hack.) (x-axis: closest classes n_{avg}; curves for M = 0.75, 0.8, 0.85 and 0.9.)

Figure 6: Latest two recognition results: accuracy vs number of similar views (n_{avg}) for recognising 1000 new images against various reduced sets (M). (x-axis: closest classes n_{avg}.)
6 Results
The success of the algorithm was tested via comparing the trade-off of different sized reduced sets and the num-

Figure 8: Average accuracy vs number of similar views (n_{avg}) for recognising 1000 new images against various reduced sets. (x-axis: closest classes n_{avg}; y-axis: accuracy (%).)

Figure 10: Average number of images examined from each reduced set per classification vs. similar views (n_{avg}) for recognising 1000 new images against various reduced sets. (x-axis: closest classes n_{avg}.)
Figure 9: Number of clusters for each reduced set from training set of 101,936 images. (x-axis: M; y-axis: number of images.)

accurate classification techniques used here see Figures 11 and 12, which show (far left) the average image of all angle views for anglebracket-grey-2x2 and block-black-4x2, respectively. For our data set there are a total of 90 of these images, which are used in the first phase approximate classification. The other images in Figures 11 and 12 show the same view obtained from different reduced sets. They illustrate how the algorithm sees the similarity between views and, as expected, the higher the threshold M, the fewer views are used in creating a cluster (average) image.

7 Discussion and Future Work
This paper has presented a simple method for visual object recognition. Visual object recognition is a hard problem and cannot really be solved (from a single view there are often ambiguities). Furthermore, processing the necessary data fast enough for a real-world application is also difficult. The simple template matching algorithm presented here has achieved promising results, with 90% recognition rates in times of 6.7 seconds. However, further work is required before it will be suitable for integration into another system.

8 Acknowledgements
This work was supported by funding from National ICT Australia. National ICT Australia is funded by the Australian Government's Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Centre of Excellence program.

References
[Austin and Barnes, 2003] D. Austin and N. Barnes. Red is the new black - or is it? In Australasian Conference on Robotics and Automation, December 2003.
[Bala and Cetin, 2004] E. Bala and A. E. Cetin. Computationally efficient wavelet affine invariant functions for shape recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):1095–1099, August 2004.
[Cyr and Kimia, 2004] C. M. Cyr and B. B. Kimia. A similarity-based aspect-graph approach to 3D object recognition. International Journal of Computer Vision, 2004.
[Gavrilla and Davis, 1994] D. M. Gavrilla and L. S. Davis. Fast correlation matching in large (edge) image databases. In Proc. of the 23rd AIPR Workshop, August 1994.
[Guggenheimer, 1963] H. W. Guggenheimer. Differential Geometry. McGraw Hill, 1963.
[ITB CompuPhase, 2004] ITB CompuPhase. Colour metric, 2004. http://www.compuphase.com/cmetric.htm.
[Ke and Sukthankar, 2004] Yan Ke and Rahul Sukthankar. PCA-SIFT: A more distinctive representation for local image
Average image Cluster for M = 0.75 Cluster for M = 0.8 Cluster for M = 0.85 Cluster for M = 0.9
(1024 images) (129 images) (57 images) (31 images) (22 images)
Figure 11: Example class average and one cluster average for a range of different thresholds M for brick
anglebracket-grey-2x2.
Average image Cluster for M = 0.75 Cluster for M = 0.8 Cluster for M = 0.85 Cluster for M = 0.9
(991 images) (319 images) (217 images) (59 images) (45 images)
Figure 12: Example class average and one cluster average for a range of different thresholds M for brick
block-black-4x2.