Recognition by relations

Matching by relations
• Idea:
– find bits, then say object is present if bits are ok
• Advantage:
– objects with complex configuration spaces don’t make good
templates
• internal degrees of freedom
• aspect changes
• (possibly) shading
• variations in texture
• etc.
Computer Vision - A Modern Approach

Set: Recognition by relations
Slides by D.A. Forsyth
Simplest
• Define a set of local feature templates

– could find these with filters, etc.
– corner detector+filters
• Think of objects as patterns
• Each template votes for all patterns that contain it
• Pattern with the most votes wins
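The voting scheme above can be sketched in a few lines. This is a minimal illustration, not the method used in the cited figure: the pattern names, template names, and the `vote` helper are all hypothetical, and real templates would come from filter or corner-detector responses rather than strings.

```python
from collections import Counter

def vote(detected_templates, pattern_templates):
    """Each detected template votes for every pattern that contains it."""
    votes = Counter()
    for template in detected_templates:
        for pattern, templates in pattern_templates.items():
            if template in templates:
                votes[pattern] += 1
    return votes

# Hypothetical patterns, each described by the set of local templates it contains.
pattern_templates = {
    "face": {"eye", "nose", "mouth"},
    "car": {"wheel", "window", "headlight"},
}

# Hypothetical detector output for one image.
detected = ["eye", "wheel", "nose", "mouth"]

votes = vote(detected, pattern_templates)
winner = votes.most_common(1)[0][0]  # pattern with the most votes
```

Every vote counts equally here, whether or not the template is distinctive or easily produced by noise.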

Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr,
IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE
Probabilistic interpretation
• Write
• Assume
• Likelihood of image given pattern

Possible alternative strategies
• Notice:
– different patterns may yield different templates with different
probabilities
– different templates may be found in noise with different
probabilities
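One possible way to act on these two observations is to replace equal votes with log-likelihood ratios. The sketch below assumes templates are detected independently given the pattern (an assumption made here for illustration), and every probability value and name in it is invented for the example.

```python
import math

def log_likelihood_ratio(found, p_given_pattern, p_noise):
    """Sum of per-template log ratios: pattern present vs. noise only.

    Assumes templates are detected independently (an illustrative assumption).
    """
    score = 0.0
    for template, q in p_noise.items():
        # Templates the pattern does not contain behave like noise (ratio 1).
        p = p_given_pattern.get(template, q)
        if template in found:
            score += math.log(p / q)
        else:
            score += math.log((1 - p) / (1 - q))
    return score

# Hypothetical probability that each template fires in noise alone.
p_noise = {"eye": 0.05, "nose": 0.05, "mouth": 0.05,
           "wheel": 0.3, "window": 0.3, "headlight": 0.1}

# Hypothetical probability that each pattern yields each of its templates.
patterns = {
    "face": {"eye": 0.9, "nose": 0.8, "mouth": 0.8},
    "car": {"wheel": 0.9, "window": 0.7, "headlight": 0.8},
}

found = {"eye", "nose", "wheel", "window"}
scores = {name: log_likelihood_ratio(found, p, p_noise)
          for name, p in patterns.items()}
best = max(scores, key=scores.get)
```

With plain voting, the two patterns tie at two votes each. The weighted score prefers "face" because eye and nose responses are rare in noise, while wheel and window responses are common.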

Employ spatial relations

Finding faces using relations
• Strategy:
– Face is eyes, nose, mouth, etc. with appropriate relations between
them
– build a specialised detector for each of these (template matching)
and look for groups with the right internal structure
– Once we’ve found enough of a face, there is little uncertainty
about where the other bits could be

Finding faces using relations
• Strategy: compare
Notice that once some facial features have been found, the
position of the rest is quite strongly constrained.
Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung,
M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
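To see how finding one feature constrains another, suppose (purely as an illustrative assumption) that two feature coordinates are jointly Gaussian. Conditioning on the found feature shifts the mean of the other and shrinks its variance. All numbers below are made up.

```python
def condition_gaussian(mean_a, mean_b, var_a, var_b, cov_ab, observed_a):
    """Mean and variance of b given a = observed_a, for a bivariate Gaussian."""
    cond_mean = mean_b + cov_ab / var_a * (observed_a - mean_a)
    cond_var = var_b - cov_ab ** 2 / var_a
    return cond_mean, cond_var

# Hypothetical x-coordinates (pixels) of an eye (a) and the nose (b).
# Before any detection, either could be almost anywhere (variance 400),
# but they move together (covariance 390).
nose_mean, nose_var = condition_gaussian(
    mean_a=100.0, mean_b=110.0,
    var_a=400.0, var_b=400.0, cov_ab=390.0,
    observed_a=140.0,  # the eye detector fired at x = 140
)
```

In this example the predicted nose position moves to x = 149 and its variance drops from 400 to 19.75, so the search for the nose can be confined to a small window.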
Detection
This means we compare

Issues
• Plugging in values for position of nose, eyes, etc.
– search for next one given what we’ve found
• when to stop searching
– when nothing that is added to the group could change the
decision
– i.e. it’s not a face, whatever features are added or
– it’s a face, and anything you can’t find is occluded
• what to do next
– look for another eye? or a nose?
– probably look for the easiest to find
• What if there’s no nose response
– marginalize
Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung,
M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
Pruning
• Prune using a classifier
– crude criterion: if this small assembly doesn’t work, there
is no need to build on it.
• Example: finding people without clothes on
– find skin
– find extended skin regions
– construct groups that pass local classifiers (i.e. lower arm,
upper arm)
– give these to broader scale classifiers (e.g. girdle)

Pruning
• Prune using a classifier
– better criterion: if there is
nothing that can be added to
this assembly to make it
acceptable, stop
– equivalent to projecting
classifier boundaries.
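The crude criterion can be sketched as incremental assembly with early rejection. This is a toy version under stated assumptions: parts are reduced to 1D positions, and the "classifier" is a hypothetical rule that consecutive parts must lie within two units of each other.

```python
def grow_assemblies(candidates_per_part, partial_ok):
    """Build assemblies one part at a time, discarding partial ones that fail."""
    assemblies = [()]
    for candidates in candidates_per_part:
        extended = []
        for assembly in assemblies:
            for candidate in candidates:
                new_assembly = assembly + (candidate,)
                # Crude pruning: a failed small assembly is never built on.
                if partial_ok(new_assembly):
                    extended.append(new_assembly)
        assemblies = extended
    return assemblies

def partial_ok(assembly):
    """Hypothetical local classifier: consecutive parts must be close."""
    return all(abs(a - b) <= 2 for a, b in zip(assembly, assembly[1:]))

# Hypothetical 1D positions of candidate segments for each part.
lower_arms = [0, 10]
upper_arms = [1, 5, 11]
torsos = [2, 20]

people = grow_assemblies([lower_arms, upper_arms, torsos], partial_ok)
```

The better criterion would also reject a partial assembly when no possible extension could make it acceptable, which needs knowledge of the classifier's boundary rather than just a test on the parts found so far.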

Horses

Hidden Markov Models
• Elements of sign language understanding

– the speaker makes a sequence of signs
– Some signs are more common than others
– the next sign depends (roughly, and probabilistically) only on the
current sign
– there are measurements, which may be inaccurate; different signs
tend to generate different probability densities on measurement
values
• Many problems share these properties
– tracking is like this, for example

Hidden Markov Models
• Now in each state we could emit a measurement, with
probability depending on the state and the measurement
• We observe these measurements

HMM’s - dynamics

HMM’s - the Joint and Inference

Trellises
• Each column corresponds to a
measurement in the sequence
• Trellis makes the collection of
legal paths obvious
• Now we would like to get the
path with the largest posterior, i.e. the
smallest negative log-posterior
• Trellis makes this easy, as
follows.
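A minimal sketch of dynamic programming on the trellis (the Viterbi algorithm), assuming discrete measurements. The two-state model, its probabilities, and all names below are invented for illustration. Costs are negative log-probabilities, so the best path is the cheapest one.

```python
import math

def viterbi(measurements, states, log_prior, log_trans, log_emit):
    """Return the cheapest path through the trellis and its cost."""
    # First column: cost of starting in each state and emitting measurement 0.
    cost = {s: -(log_prior[s] + log_emit[s][measurements[0]]) for s in states}
    back = []
    for m in measurements[1:]:
        new_cost, pointers = {}, {}
        for s in states:
            # Best predecessor for state s in this column.
            prev = min(states, key=lambda r: cost[r] - log_trans[r][s])
            new_cost[s] = cost[prev] - log_trans[prev][s] - log_emit[s][m]
            pointers[s] = prev
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest state in the last column.
    last = min(states, key=lambda s: cost[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, cost[last]

def logs(table):
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical two-state model with measurements "x" and "y".
states = ["A", "B"]
log_prior = logs({"A": 0.5, "B": 0.5})
log_trans = {"A": logs({"A": 0.9, "B": 0.1}), "B": logs({"A": 0.1, "B": 0.9})}
log_emit = {"A": logs({"x": 0.8, "y": 0.2}), "B": logs({"x": 0.2, "y": 0.8})}

best_path, best_cost = viterbi(["x", "x", "y", "y", "y"],
                               states, log_prior, log_trans, log_emit)
```

The cost per column is proportional to the square of the number of states, so the total work grows linearly with the sequence length instead of exponentially.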

Fitting an HMM
• I have:
– sequence of measurements
– collection of states
– topology
• I want
– state transition probabilities
– measurement emission probabilities
• Straightforward application of EM
– discrete vars give state for each measurement
– M step is just averaging, etc.
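A compact sketch of one EM iteration (Baum-Welch) for a discrete-measurement HMM and a single sequence. The model sizes, starting values, and function names are assumptions for illustration. The E step computes soft state assignments with a scaled forward-backward pass, and the M step normalises the expected counts.

```python
import math

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; also returns the log-likelihood."""
    n, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                   for j in range(n)]
        c = sum(row)
        alpha.append([v / c for v in row])
        scale.append(c)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    return alpha, beta, scale, sum(math.log(c) for c in scale)

def em_step(obs, pi, A, B):
    """One EM update; returns new parameters and the old log-likelihood."""
    n, T, m = len(pi), len(obs), len(B[0])
    alpha, beta, scale, loglik = forward_backward(obs, pi, A, B)
    # E step: posterior probability of each state at each time.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    # M step: expected transition counts, then normalise each row.
    new_A = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                new_A[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                * beta[t + 1][j] / scale[t + 1])
    new_A = [[v / sum(row) for v in row] for row in new_A]
    # M step: expected emission counts, then normalise each row.
    new_B = [[0.0] * m for _ in range(n)]
    for t in range(T):
        for i in range(n):
            new_B[i][obs[t]] += gamma[t][i]
    new_B = [[v / sum(row) for v in row] for row in new_B]
    return gamma[0], new_A, new_B, loglik

# Hypothetical measurement sequence and starting guess (2 states, 2 symbols).
obs = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.6, 0.4], [0.3, 0.7]]
logliks = []
for _ in range(10):
    pi, A, B, loglik = em_step(obs, pi, A, B)
    logliks.append(loglik)
```

A transition that starts at zero stays at zero under these updates, which is how a fixed topology is preserved during fitting.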

HMM’s for sign language understanding - 1
• Build an HMM for each word

HMM’s for sign language understanding - 2
• Build an HMM for each word
• Then build a language model

For both isolated word recognition tasks
and for recognition using a language
model that has five-word sentences
(words always appearing in the order
pronoun verb noun adjective pronoun),
Starner and Pentland’s system displays a word
accuracy of the order of 90%. Values are
slightly larger or smaller, depending on
the features, the task, etc.
User gesturing
Figure from “Real time American sign language recognition using desk and wearable computer
based video,” T. Starner, et al. Proc. Int. Symp. on Computer Vision, 1995, copyright 1995, IEEE
HMM’s can be spatial rather than
temporal; for example, we have a
simple model where the position of
the arm depends on the position of
the torso, and the position of the
leg depends on the position of the
torso. We can build a trellis, where
each node represents correspondence
between an image token and a body
part, and do DP on this trellis.
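A minimal sketch of this idea for the torso, arm, leg model, not the algorithm of the paper cited below. Token positions, match costs, ideal offsets, and the squared-distance deformation cost are all assumptions made for the example. Because the arm and the leg each depend only on the torso, the best arm and the best leg can be chosen independently for every torso candidate.

```python
def best_configuration(tokens, unary, offsets):
    """Cheapest assignment of image tokens to torso, arm and leg.

    tokens:  list of (x, y) token positions
    unary:   unary[part][k] is the cost of matching token k to the part
    offsets: offsets[part] is the ideal displacement of the part from the torso
    """
    def deform(torso, child, part):
        dx = tokens[child][0] - tokens[torso][0] - offsets[part][0]
        dy = tokens[child][1] - tokens[torso][1] - offsets[part][1]
        return dx * dx + dy * dy

    best = None
    for t in range(len(tokens)):
        total = unary["torso"][t]
        choice = {"torso": t}
        # Given the torso, each limb is optimised on its own.
        for part in ("arm", "leg"):
            k = min(range(len(tokens)),
                    key=lambda c: unary[part][c] + deform(t, c, part))
            total += unary[part][k] + deform(t, k, part)
            choice[part] = k
        if best is None or total < best[0]:
            best = (total, choice)
    return best

# Hypothetical image tokens and costs.
tokens = [(0, 0), (2, 0), (0, -3), (5, 5)]
unary = {
    "torso": [0.1, 2.0, 2.0, 0.5],
    "arm": [2.0, 0.2, 2.0, 0.3],
    "leg": [2.0, 2.0, 0.1, 0.3],
}
offsets = {"arm": (2, 0), "leg": (0, -3)}

total_cost, assignment = best_configuration(tokens, unary, offsets)
```

The work is proportional to (torso candidates) × (arm candidates + leg candidates) rather than the product of all three. This sketch does not stop two parts from claiming the same token.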

Figure from “Efficient Matching of Pictorial Structures,” P. Felzenszwalb and D.P. Huttenlocher, Proc.
Computer Vision and Pattern Recognition 2000, copyright 2000, IEEE
The future is bright
• Computation is cheap
• Lots of pix
– cameras are cheap, many pix are digital, ink wars
• Lots of demand for “slicing and dicing” pix
– generate models
– new movies from old
– search
• Lots of “hidden value”
– can’t do data mining for collections with pix in them
• e.g. mortgage papers, cheques, etc.
• e.g. filtering

Recent flowering of vision
• can do (sort of!)
– structure from motion
– segmentation
– video representation
– model building
– tracking
– face finding
• will be able to do (sort of!)
– face recognition
– inference about people
– character recognition
– perhaps more

Big open problems
• Next step in structure from motion
• Really good missing variable formalism
• Decent understanding of illumination, materials and shading
• Segmentation
• Representation for recognition
• Efficient management of relations
• Recognition processes for lots of objects
• A lot of this looks like applied statistics

