
Introduction to Object Recognition

Outline
• The Problem of Object Recognition
• Approaches to Object Recognition
• Requirements and Performance Criteria
• Representation Schemes
• Matching Schemes
• Example Systems
• Indexing
• Grouping
• Error Analysis
Problem Statement
• Given some knowledge of how certain objects
may appear and an image of a scene possibly
containing those objects, report which objects are
present in the scene and where.

Recognition should be:
(1) invariant to viewpoint changes and object transformations
(2) robust to noise and occlusions
Challenges

• The appearance of an object can have a large range of
variation due to:
– photometric effects
– scene clutter
– changes in shape (e.g., non-rigid objects)
– viewpoint changes
• Different views of the same object can give rise to
widely different images!
Object Recognition Applications

• Quality control and assembly in industrial plants.
• Robot localization and navigation.
• Monitoring and surveillance.
• Automatic exploration of image databases.
Human Visual Recognition
• A spontaneous, natural activity for humans and
other biological systems.
– People know about tens of thousands of different
objects, yet they can easily distinguish among them.
– People can recognize objects with movable parts or
objects that are not rigid.
– People can balance the information provided by
different kinds of visual input.
Why Is It Difficult?

• Hard mathematical problems in understanding
the relationship between geometric shapes and
their projections into images.
• We must match an image to one of a huge number
of possible objects, in any of an infinite number of
possible positions (computational complexity).
Why Is It Difficult? (cont’d)

• We do not understand the recognition problem.
What do we do in practice?
• Impose constraints to simplify the problem.
• Construct useful machines rather than
modeling human performance.
Approaches Differ According To:

• Knowledge they employ
– Model-based approach (i.e., based on an explicit model of
the object's shape or appearance)
– Context-based approach (i.e., based on the context in
which objects may be found)
– Function-based approach (i.e., based on the function the
objects are meant to serve)
Approaches Differ According To:
(cont’d)

• Restrictions on the form of the objects
– 2D or 3D objects
– Simple vs. complex objects
– Rigid vs. deforming objects
• Representation schemes
– Object-centered
– Viewer-centered
Approaches Differ According To:
(cont’d)

• Matching scheme
– Geometry-based
– Appearance-based
• Image formation model (see the sketch below)
– Perspective projection
– Affine transformation (e.g., planar objects)
– Orthographic projection + scale
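
To make the three image formation models above concrete, here is a minimal sketch assuming a simple pinhole camera; the function names, focal length, scale, and affine parameters are all illustrative assumptions, not values from the slides.

```python
# Illustrative sketch (not from the slides): how the three image formation
# models listed above map a model point to 2D image coordinates.
# All names and numbers below are made up for the example.
import numpy as np

def perspective(X, f=1.0):
    """Pinhole perspective projection: x = f*X/Z, y = f*Y/Z."""
    X = np.asarray(X, dtype=float)
    return f * X[:2] / X[2]

def orthographic_plus_scale(X, s=0.5):
    """Scaled orthographic (weak perspective): drop Z, then scale."""
    X = np.asarray(X, dtype=float)
    return s * X[:2]

def affine_2d(x, A=np.array([[1.1, 0.2], [0.0, 0.9]]), t=np.array([3.0, -1.0])):
    """2D affine transformation, often adequate for planar objects."""
    return A @ np.asarray(x, dtype=float) + t

point_3d = np.array([2.0, 1.0, 10.0])        # a model point in camera coordinates
print(perspective(point_3d))                  # [0.2 0.1]
print(orthographic_plus_scale(point_3d))      # [1.  0.5]
print(affine_2d([2.0, 1.0]))                  # planar-object case
```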
Requirements
• Viewpoint invariance
– Translation, rotation, scale
• Robustness
– Noise (i.e., sensor noise)
– Local errors in early processing modules (e.g., edge detection)
– Illumination/shadows
– Partial occlusion (i.e., self-occlusion and occlusion by other objects)
– Intrinsic shape distortions (i.e., non-rigid objects)
Performance Criteria

• Scope
– What kinds of objects can be recognized, and in what
kinds of scenes?
• Robustness
– Does the method tolerate reasonable amounts of noise
and occlusion in the scene?
– Does it degrade gracefully as those tolerances are
exceeded?
Performance Criteria (cont’d)

• Efficiency
– How much time and memory are required to search the
solution space?
• Accuracy (see the sketch below)
– Correct recognition
– False positives (wrong recognitions)
– False negatives (missed recognitions)
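
One common way to summarize these counts, not stated in the slides, is precision and recall over correct recognitions (true positives), wrong recognitions (false positives), and missed recognitions (false negatives); a minimal sketch:

```python
# Sketch (an assumption, not from the slides): summarizing recognition
# accuracy from counts of correct, wrong, and missed recognitions.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of reported objects that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of present objects that are found
    return precision, recall

print(precision_recall(tp=8, fp=2, fn=4))  # (0.8, 0.666...)
```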
Object-centered Representation (cont’d)

• Two different matching approaches:
(1) Derive a similar object-centered description from
the scene and match it with the models (e.g., using
“shape from X” methods).
(2) Apply a model of the image formation process to
the candidate model to back-project it onto the scene
(camera calibration required); see the sketch below.
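
A minimal sketch of approach (2), assuming a pinhole camera with known (calibrated) intrinsics; the camera matrix, hypothesized pose, model points, and scoring rule are illustrative assumptions rather than a prescribed implementation.

```python
# Sketch of approach (2): back-project a candidate model into the scene using
# a calibrated pinhole camera and score the hypothesis by how close the
# projected points fall to extracted scene features.
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 model points with intrinsics K and pose (R, t)."""
    cam = (R @ points_3d.T).T + t       # model -> camera coordinates
    uvw = (K @ cam.T).T                 # camera -> homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]     # divide by depth

def verification_score(projected, scene_features, tol=3.0):
    """Fraction of projected model points with a scene feature within tol pixels."""
    d = np.linalg.norm(projected[:, None, :] - scene_features[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) < tol))

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])                       # assumed calibration
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])               # hypothesized pose
model = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
scene = project(model, K, R, t) + np.random.normal(0, 0.5, (3, 2))  # noisy "detections"
print(verification_score(project(model, K, R, t), scene))
```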
Predicting New Views

• There is some evidence that the human visual
system uses a “viewer-centered” representation for
object recognition.
• Such a representation predicts the appearance of objects
in images obtained under novel conditions by generalizing
from familiar images of the objects.
Predicting New Views (cont’d)
(Figure: familiar views of an object are used to predict a novel view.)
Matching Schemes

(1) Geometry-based: explore correspondences between
model and scene features.
(2) Appearance-based: represent objects from all possible
viewpoints and all possible illumination directions.
Geometry-based Matching

• Advantage: efficient in “segmenting” the object
of interest from the scene and robust in handling
“occlusion”.
• Disadvantage: relies heavily on feature extraction,
and performance degrades when imaging
conditions give rise to poor segmentations.
Appearance-based Matching

• Advantage: circumvents the feature extraction
problem by enumerating many possible object
appearances in advance (see the sketch below).
• Disadvantages: (i) difficulty segmenting the
objects from the background and dealing with
occlusions, (ii) too many possible appearances, and (iii)
how to sample the space of appearances?
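
One common instantiation of appearance-based matching, assumed here for illustration rather than prescribed by the slides, stores many sampled views per object, compresses them with PCA into a low-dimensional “eigenspace,” and recognizes a test image by its nearest stored view.

```python
# Sketch of one appearance-based scheme (an assumption): store sampled views,
# build an eigenspace with PCA, classify a test image by nearest stored view.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 objects x 20 sampled views, each view a flattened 16x16 image.
views = rng.normal(size=(40, 256))
labels = np.repeat([0, 1], 20)

# PCA via SVD of the mean-centered view matrix.
mean = views.mean(axis=0)
U, S, Vt = np.linalg.svd(views - mean, full_matrices=False)
basis = Vt[:8]                        # keep 8 principal directions
coords = (views - mean) @ basis.T     # project stored views into the eigenspace

def recognize(image):
    """Return the label of the stored view nearest to `image` in eigenspace."""
    q = (image - mean) @ basis.T
    return labels[np.argmin(np.linalg.norm(coords - q, axis=1))]

test = views[5] + rng.normal(scale=0.1, size=256)  # a noisy version of a stored view
print(recognize(test))                              # expected: 0
```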
Model-Based Object Recognition
• The environment is rather constrained, and recognition
relies upon the existence of a set of predefined objects.
Goals of Matching
• Identify a group of features from an unknown scene
that approximately match a set of features from a
known view of a model object.
• Recover the geometric transformation that the model
object has undergone (see the sketch below).
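
A minimal sketch of the second goal for the 2D case (2 translation, 1 rotation, 1 scale, as on the next slide), assuming point-feature correspondences are already known; the linear least-squares formulation is one standard choice, not the only one.

```python
# Sketch: recover the similarity transform x' = s*R(theta)*x + t from matched
# model/scene points, solving for a = s*cos(theta), b = s*sin(theta), tx, ty.
import numpy as np

def fit_similarity(model_pts, scene_pts):
    """model_pts, scene_pts: Nx2 arrays of corresponding points (N >= 2)."""
    x, y = model_pts[:, 0], model_pts[:, 1]
    A = np.column_stack([x, -y, np.ones_like(x), np.zeros_like(x)])  # rows for x'
    B = np.column_stack([y,  x, np.zeros_like(x), np.ones_like(x)])  # rows for y'
    M = np.vstack([A, B])
    rhs = np.concatenate([scene_pts[:, 0], scene_pts[:, 1]])
    a, b, tx, ty = np.linalg.lstsq(M, rhs, rcond=None)[0]
    scale = np.hypot(a, b)
    theta = np.arctan2(b, a)
    return scale, theta, np.array([tx, ty])

# Synthetic check: rotate by 30 degrees, scale by 2, translate by (5, -3).
model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
th = np.deg2rad(30.0)
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
scene = 2.0 * model @ R.T + np.array([5.0, -3.0])
print(fit_similarity(model, scene))   # ~ (2.0, 0.5236, [5., -3.])
```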
Transformation Space

• 2D objects (2 translation, 1 rotation, 1 scale)
• 3D objects, perspective projection (3 rotation, 3 translation)
• 3D objects, orthographic projection + scale
(essentially 5 parameters and a constant for depth)
Indexing-based Recognition

• Preprocessing step: groups of model features are
used to index the database, and the indexed locations
are filled with entries containing references to the
model objects and information that can later be used
for pose recovery.
• Recognition step: groups of scene features are used to
index the database, and the model objects listed in the
indexed locations are collected into a list of candidate
models (hypotheses); see the sketch below.
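
A minimal sketch of the two steps above in a geometric-hashing flavor; the choice of index key (quantized distance ratios of point triplets) and the quantization step are illustrative assumptions rather than the slides' exact scheme.

```python
# Sketch of indexing-based recognition: preprocessing hashes feature groups of
# each model; recognition indexes with scene groups and votes for candidates.
from collections import defaultdict
from itertools import combinations
import numpy as np

def triplet_key(p, q, r, step=0.05):
    """A crude similarity-invariant key for a feature triplet: sorted ratios
    of pairwise distances, quantized to a grid (assumed key; other invariants work too)."""
    d = sorted([np.linalg.norm(p - q), np.linalg.norm(q - r), np.linalg.norm(p - r)])
    return (int(round(d[0] / d[2] / step)), int(round(d[1] / d[2] / step)))

def build_index(models):
    """Preprocessing step: hash every feature triplet of every model."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j, k in combinations(range(len(pts)), 3):
            table[triplet_key(pts[i], pts[j], pts[k])].append(name)
    return table

def hypothesize(table, scene_pts):
    """Recognition step: index with scene triplets and vote for candidate models."""
    votes = defaultdict(int)
    for i, j, k in combinations(range(len(scene_pts)), 3):
        for name in table.get(triplet_key(scene_pts[i], scene_pts[j], scene_pts[k]), []):
            votes[name] += 1
    return sorted(votes, key=votes.get, reverse=True)   # best hypotheses first

models = {"square": np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float),
          "triangle": np.array([[0, 0], [2, 0], [1, 1.5]], float)}
table = build_index(models)
scene = 3.0 * models["square"] + np.array([4.0, 2.0])   # scaled, translated square
print(hypothesize(table, scene))                         # "square" should rank first
```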
