
Introduction to Object Recognition

Outline
• The Problem of Object Recognition
• Approaches to Object Recognition
• Requirements and Performance Criteria
• Representation Schemes
• Matching Schemes
• Example Systems
• Indexing
• Grouping
• Error Analysis
Problem Statement
• Given some knowledge of how certain objects
may appear and an image of a scene possibly
containing those objects, report which objects are
present in the scene and where.

Recognition should be:
(1) invariant to viewpoint changes and object transformations
(2) robust to noise and occlusions
Challenges

• The appearance of an object can have a large range of
variation due to:
– photometric effects
– scene clutter
– changes in shape (e.g., non-rigid objects)
– viewpoint changes
• Different views of the same object can give rise to
widely different images!
Object Recognition Applications

• Quality control and assembly in industrial plants.
• Robot localization and navigation.
• Monitoring and surveillance.
• Automatic exploration of image databases.
Human Visual Recognition
• A spontaneous, natural activity for humans and
other biological systems.
– People know about tens of thousands of different
objects, yet they can easily distinguish among them.
– People can recognize objects with movable parts or
objects that are not rigid.
– People can balance the information provided by
different kinds of visual input.
Why Is It Difficult?

• Hard mathematical problems in understanding
the relationship between geometric shapes and
their projections into images.
• We must match an image to one of a huge number
of possible objects, in any of an infinite number of
possible positions (computational complexity).
Why Is It Difficult? (cont’d)

• We do not understand the recognition problem.
What do we do in practice?
• Impose constraints to simplify the problem.
• Construct useful machines rather than
modeling human performance.
Approaches Differ According To:

• Knowledge they employ
– Model-based approach (i.e., based on an explicit model of
the object's shape or appearance)
– Context-based approach (i.e., based on the context in
which objects may be found)
– Function-based approach (i.e., based on the function the
objects are meant to serve)
Approaches Differ According To:
(cont’d)

• Restrictions on the form of the objects
– 2D or 3D objects
– Simple vs. complex objects
– Rigid vs. deforming objects
• Representation schemes
– Object-centered
– Viewer-centered
Approaches Differ According To:
(cont’d)

• Matching scheme
– Geometry-based
– Appearance-based
• Image formation model (see the sketch below)
– Perspective projection
– Affine transformation (e.g., planar objects)
– Orthographic projection + scale
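
To make the three image formation models above concrete, here is a minimal sketch assuming a simple pinhole camera; the function names, focal length, scale, and affine parameters are all illustrative assumptions, not values from the slides.

```python
# Illustrative sketch (not from the slides): how the three image formation
# models listed above map a model point to 2D image coordinates.
# All names and numbers below are made up for the example.
import numpy as np

def perspective(X, f=1.0):
    """Pinhole perspective projection: x = f*X/Z, y = f*Y/Z."""
    X = np.asarray(X, dtype=float)
    return f * X[:2] / X[2]

def orthographic_plus_scale(X, s=0.5):
    """Scaled orthographic (weak perspective): drop Z, then scale."""
    X = np.asarray(X, dtype=float)
    return s * X[:2]

def affine_2d(x, A=np.array([[1.1, 0.2], [0.0, 0.9]]), t=np.array([3.0, -1.0])):
    """2D affine transformation, often adequate for planar objects."""
    return A @ np.asarray(x, dtype=float) + t

point_3d = np.array([2.0, 1.0, 10.0])        # a model point in camera coordinates
print(perspective(point_3d))                  # [0.2 0.1]
print(orthographic_plus_scale(point_3d))      # [1.  0.5]
print(affine_2d([2.0, 1.0]))                  # planar-object case
```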
Requirements
• Viewpoint invariance
– Translation, rotation, scale
• Robustness
– Noise (i.e., sensor noise)
– Local errors in early processing modules (e.g., edge detection)
– Illumination/shadows
– Partial occlusion (i.e., self-occlusion and occlusion by other objects)
– Intrinsic shape distortions (i.e., non-rigid objects)
Performance Criteria

• Scope
– What kinds of objects can be recognized, and in what
kinds of scenes?
• Robustness
– Does the method tolerate reasonable amounts of noise
and occlusion in the scene?
– Does it degrade gracefully as those tolerances are
exceeded?
Performance Criteria (cont’d)

• Efficiency
– How much time and memory are required to search the
solution space?
• Accuracy (see the sketch below)
– Correct recognition
– False positives (wrong recognitions)
– False negatives (missed recognitions)
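
One common way to summarize these counts, not stated in the slides, is precision and recall over correct recognitions (true positives), wrong recognitions (false positives), and missed recognitions (false negatives); a minimal sketch:

```python
# Sketch (an assumption, not from the slides): summarizing recognition
# accuracy from counts of correct, wrong, and missed recognitions.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of reported objects that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of present objects that are found
    return precision, recall

print(precision_recall(tp=8, fp=2, fn=4))  # (0.8, 0.666...)
```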
Object-centered Representation (cont’d)

• Two different matching approaches:
(1) Derive a similar object-centered description from
the scene and match it with the models (e.g., using
“shape from X” methods).
(2) Apply a model of the image formation process to
the candidate model to back-project it onto the scene
(camera calibration required); see the sketch below.
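
A minimal sketch of approach (2), assuming a pinhole camera with known (calibrated) intrinsics; the camera matrix, hypothesized pose, model points, and scoring rule are illustrative assumptions rather than a prescribed implementation.

```python
# Sketch of approach (2): back-project a candidate model into the scene using
# a calibrated pinhole camera and score the hypothesis by how close the
# projected points fall to extracted scene features.
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 model points with intrinsics K and pose (R, t)."""
    cam = (R @ points_3d.T).T + t       # model -> camera coordinates
    uvw = (K @ cam.T).T                 # camera -> homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]     # divide by depth

def verification_score(projected, scene_features, tol=3.0):
    """Fraction of projected model points with a scene feature within tol pixels."""
    d = np.linalg.norm(projected[:, None, :] - scene_features[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) < tol))

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])                       # assumed calibration
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])               # hypothesized pose
model = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
scene = project(model, K, R, t) + np.random.normal(0, 0.5, (3, 2))  # noisy "detections"
print(verification_score(project(model, K, R, t), scene))
```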
Predicting New Views

• There is some evidence that the human visual
system uses a “viewer-centered” representation for
object recognition.
• Such a representation predicts the appearance of objects
in images obtained under novel conditions by generalizing
from familiar images of the objects.
Predicting New Views (cont’d)
(Figure: familiar views of an object are used to predict a novel view.)
Matching Schemes

(1) Geometry-based: explore correspondences between
model and scene features.
(2) Appearance-based: represent objects from all possible
viewpoints and all possible illumination directions.
Geometry-based Matching

• Advantage: efficient in “segmenting” the object
of interest from the scene and robust in handling
“occlusion”.
• Disadvantage: relies heavily on feature extraction,
and performance degrades when imaging
conditions give rise to poor segmentations.
Appearance-based Matching

• Advantage: circumvents the feature extraction
problem by enumerating many possible object
appearances in advance (see the sketch below).
• Disadvantages: (i) difficulty segmenting the
objects from the background and dealing with
occlusions, (ii) too many possible appearances, and (iii)
how to sample the space of appearances?
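
One common instantiation of appearance-based matching, assumed here for illustration rather than prescribed by the slides, stores many sampled views per object, compresses them with PCA into a low-dimensional “eigenspace,” and recognizes a test image by its nearest stored view.

```python
# Sketch of one appearance-based scheme (an assumption): store sampled views,
# build an eigenspace with PCA, classify a test image by nearest stored view.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 objects x 20 sampled views, each view a flattened 16x16 image.
views = rng.normal(size=(40, 256))
labels = np.repeat([0, 1], 20)

# PCA via SVD of the mean-centered view matrix.
mean = views.mean(axis=0)
U, S, Vt = np.linalg.svd(views - mean, full_matrices=False)
basis = Vt[:8]                        # keep 8 principal directions
coords = (views - mean) @ basis.T     # project stored views into the eigenspace

def recognize(image):
    """Return the label of the stored view nearest to `image` in eigenspace."""
    q = (image - mean) @ basis.T
    return labels[np.argmin(np.linalg.norm(coords - q, axis=1))]

test = views[5] + rng.normal(scale=0.1, size=256)  # a noisy version of a stored view
print(recognize(test))                              # expected: 0
```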
Model-Based Object Recognition
• The environment is rather constrained, and recognition
relies upon the existence of a set of predefined objects.
Goals of Matching
• Identify a group of features from an unknown scene
that approximately match a set of features from a
known view of a model object.
• Recover the geometric transformation that the model
object has undergone (see the sketch below).
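
A minimal sketch of the second goal for the 2D case (2 translation, 1 rotation, 1 scale, as on the next slide), assuming point-feature correspondences are already known; the linear least-squares formulation is one standard choice, not the only one.

```python
# Sketch: recover the similarity transform x' = s*R(theta)*x + t from matched
# model/scene points, solving for a = s*cos(theta), b = s*sin(theta), tx, ty.
import numpy as np

def fit_similarity(model_pts, scene_pts):
    """model_pts, scene_pts: Nx2 arrays of corresponding points (N >= 2)."""
    x, y = model_pts[:, 0], model_pts[:, 1]
    A = np.column_stack([x, -y, np.ones_like(x), np.zeros_like(x)])  # rows for x'
    B = np.column_stack([y,  x, np.zeros_like(x), np.ones_like(x)])  # rows for y'
    M = np.vstack([A, B])
    rhs = np.concatenate([scene_pts[:, 0], scene_pts[:, 1]])
    a, b, tx, ty = np.linalg.lstsq(M, rhs, rcond=None)[0]
    scale = np.hypot(a, b)
    theta = np.arctan2(b, a)
    return scale, theta, np.array([tx, ty])

# Synthetic check: rotate by 30 degrees, scale by 2, translate by (5, -3).
model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
th = np.deg2rad(30.0)
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
scene = 2.0 * model @ R.T + np.array([5.0, -3.0])
print(fit_similarity(model, scene))   # ~ (2.0, 0.5236, [5., -3.])
```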
Transformation Space

• 2D objects (2 translation, 1 rotation, 1 scale)
• 3D objects, perspective projection (3 rotation, 3 translation)
• 3D objects, orthographic projection + scale
(essentially 5 parameters and a constant for depth)
Indexing-based Recognition

• Preprocessing step: groups of model features are
used to index the database, and the indexed locations
are filled with entries containing references to the
model objects and information that can later be used
for pose recovery.
• Recognition step: groups of scene features are used to
index the database, and the model objects listed in the
indexed locations are collected into a list of candidate
models (hypotheses); see the sketch below.
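
A minimal sketch of the two steps above in a geometric-hashing flavor; the choice of index key (quantized distance ratios of point triplets) and the quantization step are illustrative assumptions rather than the slides' exact scheme.

```python
# Sketch of indexing-based recognition: preprocessing hashes feature groups of
# each model; recognition indexes with scene groups and votes for candidates.
from collections import defaultdict
from itertools import combinations
import numpy as np

def triplet_key(p, q, r, step=0.05):
    """A crude similarity-invariant key for a feature triplet: sorted ratios
    of pairwise distances, quantized to a grid (assumed key; other invariants work too)."""
    d = sorted([np.linalg.norm(p - q), np.linalg.norm(q - r), np.linalg.norm(p - r)])
    return (int(round(d[0] / d[2] / step)), int(round(d[1] / d[2] / step)))

def build_index(models):
    """Preprocessing step: hash every feature triplet of every model."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j, k in combinations(range(len(pts)), 3):
            table[triplet_key(pts[i], pts[j], pts[k])].append(name)
    return table

def hypothesize(table, scene_pts):
    """Recognition step: index with scene triplets and vote for candidate models."""
    votes = defaultdict(int)
    for i, j, k in combinations(range(len(scene_pts)), 3):
        for name in table.get(triplet_key(scene_pts[i], scene_pts[j], scene_pts[k]), []):
            votes[name] += 1
    return sorted(votes, key=votes.get, reverse=True)   # best hypotheses first

models = {"square": np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float),
          "triangle": np.array([[0, 0], [2, 0], [1, 1.5]], float)}
table = build_index(models)
scene = 3.0 * models["square"] + np.array([4.0, 2.0])   # scaled, translated square
print(hypothesize(table, scene))                         # "square" should rank first
```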
