

Dictionary of Computer Vision and
Image Processing

R. B. Fisher
University of Edinburgh

K. Dawson-Howe
Trinity College Dublin

A. Fitzgibbon
Oxford University

C. Robertson
CEO, Epipole Ltd

C. Williams
University of Edinburgh


Copyright 2004 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee
to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400,
fax (978) 646-8600, or on the web. Requests to the Publisher for permission should
be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts
in preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be
suitable for your situation. You should consult with a professional where appropriate. Neither the
publisher nor author shall be liable for any loss of profit or any other commercial damages, including
but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:

Dictionary of Computer Vision and Image Processing / Robert B. Fisher . . . [et al.].
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
From Bob to Rosemary,
Mies, Hannah, Phoebe
and Lars

From AWF to Liz, to my

parents, and again to D.

To Karen and Aidan.

Thanks pips !

From Ken to Jane,

William and Susie

From Manuel to Emily,

Francesca, and Alistair

Preface

References


This dictionary arose out of a continuing interest in the resources needed by
beginning students and researchers in the fields of image processing, computer
vision and machine vision (however you choose to define these overlapping fields).
As instructors and mentors, we often found confusion about what various terms
and concepts mean for the beginner. To support these learners, we have tried to
define the key concepts that a competent generalist should know about these
fields.
This second edition adds approximately 1000 new terms to the more than 2500
terms in the original dictionary. We have chosen new terms that have entered
reasonably common usage (e.g., appeared in the index of influential books), and
terms that were not included originally. We are pleased to welcome Chris Williams
into the authorial team and to thank Manuel Trucco for all of his help in the past.
One innovation in the second edition is the addition of reference links for a
majority of the old and new terms. Unlike more traditional dictionaries, which
provide references to establish the origin or meaning of the word, our goal here
was instead to provide further information about the term. Hence, we tried to use
Wikipedia as much as possible as this is an easily accessible and constantly
improving resource.
This is a dictionary, not an encyclopedia, so the definitions are necessarily brief
and are not intended to replace a proper textbook explanation of the term. We
have tried to capture the essentials of the terms, with short examples or
mathematical precision where feasible or necessary for clarity.
Further information about many of the terms can be found in the references
below. These are mostly general textbooks, each providing a broad view of a
portion of the field. Some of the concepts are also quite recent and, although
commonly used in research publications, have not yet appeared in mainstream
textbooks. Thus this book is also a useful source for recent terminology and
concepts.
Certainly some concepts are still missing from the dictionary, but we have
scanned both textbooks and the research literature to find the central and
commonly used terms.
Although the dictionary was intended for beginning and intermediate students
and researchers, as we developed the dictionary it was clear that we also had some
confusions and vague understandings of the concepts. It also surprised us that
some terms had multiple usages. To improve quality and coverage, each definition
was reviewed during development by at least two people besides its author. We
hope that this has caught any errors and vagueness, as well as reproduced the
alternative meanings. Each of the co-authors is quite experienced in the topics
covered here, but it was still educational to learn more about our field in the
process of compiling the dictionary. We hope that you find using the dictionary
equally valuable.
The authors would like to thank Xiang (Lily) Li and Georgios Papadimitriou for
their help with finding citations for the content from the first edition. We also
greatly appreciate all the support from the Wiley editorial and production team!

To help the reader, terms appearing elsewhere in the dictionary are underlined.
We have tried to be reasonably thorough about this, but some terms, such as 2D,
3D, light, camera, image, pixel and color were so commonly used that we decided
to not cross-reference all of these.

We have tried to be consistent with the mathematical notation: italics for scalars
($s$), arrowed italics for points and vectors ($\vec{v}$), and bold letters for matrices ($\mathbf{M}$).

The reference for most of the terms has two parts: AAA: BBB. The AAA
component refers to one of the items listed below. The BBB component normally
refers to a chapter/section/page in reference AAA. Wikipedia entries (WP) are
slightly different, in that the BBB term is the relevant Wikipedia page.

1. A. Gibbons. Algorithmic Graph Theory, Cambridge University Press, 1985.

2. A. Hornberg (Ed.). Handbook of Machine Vision, Wiley-VCH, 2006.
3. A. Jain. Fundamentals of Digital Image Processing, Prentice Hall Intl, 1989.
4. A. Low, Introductory Computer Vision and Image Processing, McGraw-Hill, 1991.
5. A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill,
New York, Third Edition, 1991.
6. T. Acharya, A. K. Roy. Image Processing, Wiley, 2005.
7. B. A. Wandell. Foundations of Vision, Sinauer, 1995.
8. D. Ballard, C. Brown. Computer Vision, Prentice Hall, 1982.
9. A. M. Bronstein, M. M. Bronstein, R. Kimmel. Numerical Geometry of Non-Rigid
Shapes, Springer, 2008.
10. B. G. Batchelor, D. A. Hill, D. C. Hodgson. Automated Visual Inspection, IFS, 1985.
11. A. Blake, S. Isard. Active Contours, Springer, 1998.
12. J. C. Bezdek, J. Keller, R. Krishnapuram, N. Pal. Fuzzy Models and Algorithms for
Pattern Recognition and Image Processing, Springer, 2005.
13. B. K. P. Horn. Robot Vision, MIT Press, 1986.
14. M. Bennamoun, G. J. Mamic. Object Recognition - Fundamentals and Case Studies,
Springer, 2002.
15. B. Noble, Applied Linear Algebra, Prentice-Hall, 1969.
16. R. D. Boyle, R. C. Thomas. Computer Vision: A First Course, Blackwell, 1988.
17. S. Boyd, L. Vandenberghe. Convex Optimization, Cambridge University Press, 2004.

Dictionary of Computer Vision and Image Processing, First Edition. By Robert B. Fisher, et al.
ISBN *** 2012 John Wiley & Sons, Inc.

18. C. Chatfield; The Analysis of Time Series: An Introduction, Chapman and Hall,
London, 4th edition, 1989.
19. C. M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.
20. B. Croft, D. Metzler, T. Strohman. Search Engines: Information Retrieval in
Practice, Addison-Wesley Publishing Company, USA, 2009.
21. B. Cyganek, J. P. Siebert. An Introduction to 3D Computer Vision Techniques and
Algorithms, Wiley, 2009.
22. T. M. Cover, J. A. Thomas. Elements of Information Theory, John Wiley & Sons,
23. R. O. Duda, P. E. Hart. Pattern Classification and Scene Analysis, James Wiley,
24. D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms,
Cambridge University Press, Cambridge, 2003.
25. D. Marr. Vision, Freeman, 1982.
26. A. Desolneux, L. Moisan, J.-M. Morel. From Gestalt Theory to Image Analysis,
Springer, 2008.
27. E. Hecht. Optics. Addison-Wesley, 1987.
28. E. R. Davies. Machine Vision, Academic Press, 1990.
29. E. W. Weisstein. MathWorld: A Wolfram Web Resource, accessed March 1, 2012.
30. F. R. K. Chung. Spectral Graph Theory, American Mathematical Society, 1997.
31. D. Forsyth, J. Ponce. Computer Vision - a modern approach, Prentice Hall, 2003.
32. J. Flusser, T. Suk, B. Zitová. Moments and Moment Invariants in Pattern
Recognition, Wiley, 2009.
33. A. Gelman, J. B. Carlin, H. S. Stern, D. B. Rubin. Bayesian Data Analysis,
Chapman and Hall, London, 1995.
34. A. Gersho, R. Gray. Vector Quantization and Signal Compression, Kluwer, 1992.
35. P. Green, L. MacDonald (Eds.). Colour Engineering, Wiley, 2003.
36. G. R. Grimmett, D. R. Stirzaker. Probability and Random Processes, Clarendon
Press, Oxford, Second edition, 1992.
37. G. H. Golub, C. F. Van Loan. Matrix Computations, Johns Hopkins University
Press, Second edition, 1989.
38. S. Gong, T. Xiang. Visual Analysis of Behaviour: From Pixels to Semantics,
Springer, 2011.
39. H. Freeman (Ed). Machine Vision for Three-dimensional Scenes, Academic Press,
40. H. Freeman (Ed). Machine Vision for Measurement and Inspection, Academic Press,
41. R. M. Haralick, L. G. Shapiro. Computer and Robot Vision, Addison-Wesley
Longman Publishing, 1992.
42. H. Samet. Applications of Spatial Data Structures, Addison Wesley, 1990.
43. T. J. Hastie, R. J. Tibshirani, J. Friedman. The Elements of Statistical Learning,
Springer-Verlag, 2008.
44. R. Hartley, A. Zisserman. Multiple View Geometry, Cambridge University Press,

45. J. C. McGlone (Ed.). Manual of Photogrammetry, ASPRS, 2004.

46. J. J. Koenderink, What does the occluding contour tell us about solid shape?,
Perception, Vol. 13, pp 321-330, 1984.
47. R. Jain, R. Kasturi, B. Schunck. Machine Vision, McGraw Hill, 1995.
48. J. Pearl; Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference, Morgan Kaufmann, San Mateo, CA, 1988.
49. D. Koller, N. Friedman. Probabilistic Graphical Models, MIT Press, 2009.
50. K. Fukunaga. Introduction to Statistical Pattern Recognition, Academic Press, 1990.
51. B. V. K. Vijaya Kumar, A. Mahalanobis, R. D. Juday. Correlation Pattern
Recognition, Cambridge, 2005.
52. K. P. Murphy. Machine Learning: a Probabilistic Perspective, MIT Press, 2012.
53. L. A. Wasserman. All of Statistics, Springer, 2004.
54. L. J. Galbiati. Machine Vision and Digital Image Processing Fundamentals, Prentice
Hall, 1990.
55. T. Luhmann, S. Robson, S. Kyle, I. Harley; Close Range Photogrammetry, Whittles,
56. S. Lovett. Differential Geometry of Manifolds, Peters, 2010.
57. F. Mokhtarian, M. Bober; Curvature Scale Space Representation: Theory,
Applications and MPEG-7 Standardization, Springer, Computational Imaging and
Vision Series, Vol. 25, 2003.
58. K. V. Mardia, J. T. Kent, J. M. Bibby. Multivariate Analysis, Academic Press,
London, 1979.
59. C. D. Manning, P. Raghavan, H. Schütze. Introduction to Information Retrieval,
Cambridge University Press, 2008.
60. J.-M. Morel, S. Solimini; Variational Models for Image Segmentation: with seven
image processing experiments, Birkhäuser, 1994.
61. V. S. Nalwa. A Guided Tour of Computer Vision, Addison Wesley, 1993.
62. M. Nixon, A. Aguado. Feature Extraction & Image Processing, Elsevier Newnes,
63. N. A. C. Cressie. Statistics for Spatial Data, Wiley, New York, 1993.
64. O. Faugeras. Three-Dimensional Computer Vision - A Geometric Viewpoint, MIT
Press, 1999.
65. M. Petrou and P. Bosdogianni. Image Processing: The Fundamentals, Wiley
Interscience, 1999.
66. M. Petrou, P. Garcia Sevilla. Image Processing - Dealing with Texture, Wiley, 2006.
67. W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery. Numerical Recipes
in C, Cambridge University Press, Second edition, 1992.
68. R. J. Schalkoff. Digital Image Processing and Computer Vision, Wiley, 1989.
69. R. Nevatia. Machine Perception, Prentice-Hall, 1982.
70. C. E. Rasmussen, C. K. I. Williams. Gaussian Processes for Machine Learning, MIT
Press, Cambridge, Massachusetts, 2006.
71. R. W. G. Hunt. The Reproduction of Colour, Wiley, 2004.
72. E. Reinhard, G. Ward, S. Pattanaik, P. Debevec. High Dynamic Range Imaging,
Morgan Kaufmann, 2006.

73. R. Szeliski. Computer Vision: Algorithms and Applications, Springer, 2010.

74. C. Solomon, T. Breckon. Fundamentals of Digital Image Processing,
Wiley-Blackwell, 2011.
75. R. S. Sutton, A. G. Barto. Reinforcement Learning, MIT Press, 1998.
76. S. E. Palmer. Vision Science: Photons to Phenomenology, MIT Press, 1999.
77. S. E. Umbaugh. Computer Vision and Image Processing, Prentice Hall, 1998.
78. M. Sonka, V. Hlavac, R. Boyle. Image Processing, Analysis, and Machine Vision,
Chapman and Hall, 1993.
79. M. Sonka, V. Hlavac, R. Boyle. Image Processing, Analysis, and Machine Vision,
Thomson, 2008.
80. M. Seul, L. O'Gorman, M. J. Sammon. Practical Algorithms for Image Analysis,
Cambridge University Press, 2000.
81. W. E. Snyder and H. Qi. Machine Vision, Cambridge, 2004.
82. L. Shapiro, G. Stockman. Computer Vision, Prentice Hall, 2001.
83. B. Schölkopf, A. Smola. Learning with Kernels, MIT Press, 2002.
84. J. Shawe-Taylor, N. Cristianini. Kernel Methods for Pattern Analysis, Cambridge
University Press, 2004.
85. S. Winkler. Digital Video Quality, Wiley, 2005.
86. S. Thrun, W. Burgard, D. Fox; Probabilistic Robotics, MIT Press, 2005.
87. J. T. Tou, R. C. Gonzalez. Pattern Recognition Principles, Addison Wesley, 1974.
88. T. M. Mitchell. Machine Learning, McGraw-Hill, New York, 1997.
89. E. Trucco, A. Verri. Introductory Techniques for 3-D Computer Vision, Prentice
Hall, 1998.
90. Wikipedia, accessed March 11, 2011.
91. X. S. Zhou, Y. Rui, T. S. Huang. Exploration of Visual Data, Kluwer Academic,

1D: One dimensional, usually in reference to some structure. Examples include: 1) a signal
$x(t)$ that is a function of time $t$, 2) the dimensionality of a single property value or
3) one degree of freedom in shape variation or motion. [EH:2.1]

2D: Two dimensional. A space describable using any pair of orthogonal basis vectors
consisting of two elements. [WP:Two-dimensional space]

2D coordinate system: A system associating uniquely 2 real numbers to any point of a plane.
First, two intersecting lines (axes) are chosen on the plane, usually perpendicular to each
other. The point of intersection is the origin of the system. Second, metric units are
established on each axis (often the same for both axes) to associate numbers to points. The
coordinates $P_x$ and $P_y$ of a point, $P$, are obtained by projecting $P$ onto each axis in
a direction parallel to the other axis, and reading the numbers at the intersections.
[JKS:1.4]

2D Fourier transform: A special case of the general Fourier transform often used to find
structures in images. [FP:7.3.1]

2D image: A matrix of data representing samples taken at discrete intervals. The data may be
from a variety of sources and sampled in a
variety of ways. In computer vision applications the image values are often encoded color or
monochrome intensity samples taken by digital cameras but may also be range data. Some
typical intensity values are [SQ:4.1.1]:

[Figure: an image patch shown as a grid of two-digit intensity values, labeled "image values"]

2D input device: A device for sampling light intensity from the real world into a 2D matrix
of measurements. The most popular two dimensional imaging device is the charge-coupled
device (CCD) camera. Other common devices are flatbed scanners and X-ray scanners.
[SQ:4.2.1]

2D point: A point in a 2D space, that is, characterized by two coordinates; most often, a
point on a plane, for instance an image point in pixel coordinates. Notice, however, that two
coordinates do not necessarily imply a plane: a point on a 3D surface can be expressed either
in 3D coordinates or by two coordinates given a surface parameterization (see surface patch).
[JKS:1.4]

2D point feature: Localized structures in a 2D image, such as interest points, corners and
line meeting points (X, Y and T shaped for example). One detector for these features is the
SUSAN corner finder. [TV:4.1]

2D pose estimation: A fundamental open problem in computer vision where the correspondence
between two sets of 2D points is found. The problem is defined as follows: Given two sets of
points $\{\vec{x}_j\}$ and $\{\vec{y}_k\}$, find the Euclidean transformation
$\{\mathbf{R}, \vec{t}\}$ (the pose) and the match matrix $\{M_{jk}\}$ (the correspondences)
that best relates them. A large number of techniques has been used to address this problem,
for example tree-pruning methods, the Hough transform and geometric hashing. A special case
of 3D pose estimation.

2D projection: A transformation mapping higher dimensional space onto two dimensional space.
The simplest method is to simply discard higher dimensional coordinates, although generally a
viewing position is used and the projection is performed.

[Figure: a 3D solid and its projected points in 2D space]

For example, the main steps for a computer graphics projection are as follows: apply
normalizing transform to 3D point world coordinates; clip against canonical view volume;
project onto projection plane; transform into viewport in 2D device coordinates for display.
Commonly used projection functions are parallel projection or perspective projection.
[JKS:1.4]
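
To illustrate the two projection functions named in the 2D projection entry, here is a minimal sketch in plain Python. The function names, the viewing position at the origin looking along +Z, and the `focal_length` parameter are assumptions made for this illustration only; they are not taken from the dictionary.

```python
def parallel_projection(points):
    """Parallel (orthographic) projection along Z: discard the third coordinate."""
    return [(x, y) for x, y, _z in points]


def perspective_projection(points, focal_length=1.0):
    """Perspective projection for an assumed viewer at the origin looking along +Z.

    A point (X, Y, Z) with Z > 0 maps to (f * X / Z, f * Y / Z).
    """
    projected = []
    for x, y, z in points:
        if z <= 0:
            raise ValueError("points must lie in front of the viewer (Z > 0)")
        projected.append((focal_length * x / z, focal_length * y / z))
    return projected


# Three hypothetical 3D points; the first two differ only in depth.
points_3d = [(1.0, 1.0, 2.0), (1.0, 1.0, 4.0), (-2.0, 0.5, 4.0)]
parallel = parallel_projection(points_3d)
perspective = perspective_projection(points_3d, focal_length=2.0)
print(parallel)
print(perspective)
```

Under parallel projection the first two points coincide, whereas under perspective projection the more distant point lands closer to the image center. Clipping and the viewport transform mentioned in the entry are left out of this sketch.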
2.5D image: A range image obtained by scanning from a single viewpoint. This allows the data
to be represented in a single image array, where each pixel value encodes the distance to the
observed scene. The reason this is not called a 3D image is to make explicit the fact that
the back sides of the scene objects are not represented. [SQ:4.1.1]

2.5D sketch: Central structure of Marr's Theory of vision. An intermediate description of a
scene indicating the visible surfaces and their arrangement with respect to the viewer. It is
built from several different elements: the contour, texture and shading information coming
from the primal sketch, stereo information and motion. The description is theorized to be a
kind of buffer where partial resolution of the objects takes place. The name 2½D sketch stems
from the fact that although local changes in depth and discontinuities are well resolved, the
absolute distance to all scene points may remain unknown. [FP:11.3.2]

3D: Three dimensional. A space describable using any triple of mutually orthogonal basis
vectors consisting of three elements. [WP:Three-dimensional space]

3D coordinate system: Same as 2D coordinate system, but in three dimensions. [JKS:1.4]

[Figure: 3D coordinate axes, including +Y and +Z]

3D data: Data described in all three spatial dimensions. See also range data, CAT and NMR.
An example of a 3D data set is:

[Figure: example 3D data set]

3D data acquisition: Sampling data in all three spatial dimensions. There are a variety of
ways to perform this sampling, for example using structured light triangulation. [FP:21.1]

3D image: See range image. [SQ:4.1.1]

3D interpretation: A 3D model, e.g., a solid object, that explains an image or a set of image
data. For instance, a certain configuration of image lines can be explained as the
perspective projection of a polyhedron; in simpler words, the image lines are the images of
some of the polyhedron's lines. See also image interpretation. [BB:9.1]
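
As an illustration of how a 2.5D image relates to 3D data, the sketch below back-projects a small range image into 3D points. It assumes a pinhole camera, assumes that each pixel stores depth along the optical axis, and uses hypothetical intrinsic parameters `focal_length`, `cx` and `cy`; none of these choices come from the dictionary, and real range sensors may store distance differently.

```python
def range_image_to_points(depth, focal_length, cx, cy):
    """Back-project a range image into a list of (X, Y, Z) points.

    depth[v][u] is the assumed depth Z at pixel column u, row v;
    None marks a pixel with no measurement.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:
                continue  # no range reading at this pixel
            x = (u - cx) * z / focal_length
            y = (v - cy) * z / focal_length
            points.append((x, y, z))
    return points


# A hypothetical 2 x 3 range image with one missing reading.
depth = [
    [2.0, 2.0, None],
    [2.0, 4.0, 4.0],
]
points = range_image_to_points(depth, focal_length=2.0, cx=1.0, cy=0.0)
print(points)
```

Each pixel yields at most one 3D point, so surfaces hidden from the single viewpoint never appear in the output, which is the reason the entry gives for calling such data 2.5D rather than 3D.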
3D model: A description of a 3D object that primarily describes its shape. Models of this
sort are regularly used as exemplars in model based recognition and 3D computer graphics.
[TV:10.6]

3D moments: A special case of moment where the data comes from a set of 3D points.

3D object: A subset of $\mathbb{R}^3$. In computer vision, often taken to mean a volume in
$\mathbb{R}^3$ that is bounded by a surface. Any solid object around you is an example:
table, chairs, books, cups, and you yourself. [BB:9.1]

3D point: An infinitesimal volume of 3D space. [JKS:1.4]

3D point feature: A point feature on a 3D object or in a 3D environment. For instance, a
corner in 3D space.

3D pose estimation: 3D pose estimation is the process of determining the transformation
(translation and rotation) of an object in one coordinate frame with respect to another
coordinate frame. Generally, only rigid objects are considered, models of those objects exist
a priori, and we wish to determine the position of that object in an image on the basis of
matched features. This is a fundamental open problem in computer vision where the
correspondence between two sets of 3D points is found. The problem is defined as follows:
Given two sets of points $\{\vec{x}_j\}$ and $\{\vec{y}_k\}$, find the parameters of a
Euclidean transformation $\{\mathbf{R}, \vec{t}\}$ (the pose) and the match matrix
$\{M_{jk}\}$ (the correspondences) that best relates them. Assuming the points correspond,
they should match exactly under this transformation. [TV:11.2]

3D reconstruction: A general term referring to the computation of a 3D model from 2D images.
[BT:8]

3D skeleton: See skeleton. [FP:24.2.1]

3D stratigraphy: A modeling and visualization tool used to display different underground
layers. Often used for visualizations of archaeological sites or for detecting different rock
and soil structures in geological surveying.

3D structure recovery: See 3D reconstruction. [BT:8]

3D texture: The appearance of texture on a 3D surface when imaged, for instance, the fact
that the density of texels varies with distance due to perspective effects. 3D surface
properties (e.g., shape, distances, orientation) can be estimated from such effects. See also
shape from texture, texture orientation.

3D vision: A branch of computer vision dealing with characterizing data composed of 3D
measurements. For example, this may involve segmentation of the data into individual surfaces
that are then used to identify the data as one of several models. Reverse engineering is a
specialism inside 3D vision. [ERD:16.2]

4 connectedness: A type of image connectedness in which each rectangular pixel is considered
to be connected to the four neighboring pixels that share a common crack edge. See also
8 connectedness. This figure shows the four pixels connected to the central pixel (*)
[SQ:4.5]:
[Figure: the four pixels connected to the central pixel (*)]

and the four groups of pixels joined by 4 connectedness:

[Figure: object pixels numbered 1 to 4 by connected group; legend: object pixel, connected
object pixels, background pixel]

8 connectedness: A type of image connectedness in which each rectangular pixel is considered
to be connected to all eight neighboring pixels. See also 4 connectedness. This figure shows
the eight pixels connected to the central pixel (*) [SQ:4.5]:

[Figure: the eight pixels connected to the central pixel (*)]

and the two groups of pixels joined by 8 connectedness:

[Figure: object pixels numbered 1 and 2 by connected group; legend: object pixel, connected
object pixels, background pixel]
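
The difference between 4 connectedness and 8 connectedness can be seen by grouping the object pixels of a small binary image under each rule. The following is a minimal sketch using a breadth-first flood fill; the function name `label_components` and the test image are made up for this illustration and are not the image shown in the book's figures.

```python
from collections import deque

# Row/column offsets of the neighbors considered under each rule.
NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBORS_8 = NEIGHBORS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def label_components(image, offsets):
    """Label connected groups of non-zero pixels; returns (labels, group count)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0 or labels[r][c] != 0:
                continue  # background pixel, or already assigned to a group
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] != 0 and labels[nr][nc] == 0):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels, next_label


# Hypothetical binary image: 1 = object pixel, 0 = background pixel.
image = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]
labels_4, count_4 = label_components(image, NEIGHBORS_4)
labels_8, count_8 = label_components(image, NEIGHBORS_8)
print(count_4, count_8)
```

Pixels that touch only at a corner are separate groups under 4 connectedness but merge under 8 connectedness, so the 8-connected count can never exceed the 4-connected count for the same image.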

A*: A search technique that performs best-first searching based on an evaluation function
that combines the cost so far and the estimated cost to the goal. [WP:A* search algorithm]

a posteriori probability: Literally, "after" probability. It is the probability $p(s|e)$ that
some situation $s$ holds after some evidence $e$ has been observed. This contrasts with the
a priori probability $p(s)$ that is the probability of $s$ before any evidence is observed.
Bayes' rule is often used to compute the a posteriori probability from the a priori
probability and the evidence. [JKS:15.5]

a priori probability: Suppose that there is a set $Q$ of equally likely outcomes for a given
action. If a particular event $E$ could occur on any one of a subset $S$ of these outcomes,
then the a priori or theoretical probability of $E$ is defined by

$$P(E) = \frac{\mathrm{size}(S)}{\mathrm{size}(Q)}$$

[JKS:15.5]

aberration: Problem exhibited by a lens or a mirror whereby unexpected results are obtained.
There are two types of aberration commonly encountered: chromatic aberration, where different
frequencies of light focus at different positions, and spherical aberration, where light
passing through the edges of a lens (or mirror) focuses at slightly different positions.
[FP:1.2.3]

[Figure: chromatic aberration, with blue and red light focusing at different positions]

absolute conic: The conic in 3D projective space that is the intersection of the unit (or
any) sphere with the plane at infinity. It consists only of complex points. Its importance in
computer vision is due to its role in the problem of autocalibration: the image of the
absolute conic (IAC), a 2D conic, is represented by a $3 \times 3$ matrix $\omega$ that is the
inverse of the matrix $\mathbf{K}\mathbf{K}^\top$, where $\mathbf{K}$ is the matrix of the
internal camera calibration parameters. Thus, identifying $\omega$ allows the camera
calibration to be computed. [FP:13.6]

absolute coordinates: Generally used in contrast to local or relative coordinates. A
coordinate system that is referenced to some external datum. For example, a pixel in a
satellite image might be at (100,200) in image coordinates, but at (51:48:05N, 8:17:54W) in
georeferenced absolute coordinates. [JKS:1.4.2]

absolute orientation: In photogrammetry, the problem of registering two corresponding sets of
3D points. Used to register a photogrammetric reconstruction to some absolute coordinate
system. Often expressed as the problem of determining the rotation $\mathbf{R}$, translation
$\vec{t}$ and scale $s$ that best transforms a set of model points
$\{\vec{m}_1, \ldots, \vec{m}_n\}$ to corresponding data points
$\{\vec{d}_1, \ldots, \vec{d}_n\}$ by minimizing the least-squares error

$$\epsilon(\mathbf{R}, \vec{t}, s) = \sum_{i=1}^{n} \| \vec{d}_i - s(\mathbf{R}\vec{m}_i + \vec{t}) \|^2$$

to which a solution may be found by using singular value decomposition. [JKS:1.4.2]

absolute point: A 3D point defining the origin of a coordinate system.

absolute quadric: The symmetric $4 \times 4$ rank 3 matrix
$\Omega = \begin{pmatrix} \mathbf{I}_3 & \vec{0}_3 \\ \vec{0}_3^\top & 0 \end{pmatrix}$.
Like the absolute conic, it is defined to be invariant under Euclidean transformations, is
rescaled under similarities, takes the form
$\Omega = \begin{pmatrix} \mathbf{A}^\top\mathbf{A} & \vec{0}_3 \\ \vec{0}_3^\top & 0 \end{pmatrix}$
under affine transforms and becomes an arbitrary $4 \times 4$ rank 3 matrix under projective
transforms. [FP:13.6]

absorption: Attenuation of light caused by passing through an optical system or being
incident on an object surface. [EH:3.5]

accumulation method: A method of accumulating evidence in histogram form, then searching for
peaks, which correspond to hypotheses. See also Hough transform, generalized Hough transform.
[AL:9.3]

accumulative difference: A means of detecting motion in image sequences. Each frame in the
sequence is compared to a reference frame (after registration if necessary) to produce a
difference image. Thresholding the difference image gives a binary motion mask. A counter for
each pixel location in the accumulative image is incremented every time the difference
between the reference image and the current image exceeds some threshold. Used for
change detection. [JKS:14.1.1]

accuracy: The error of a value away from the true value. Contrast this with precision.
[WP:Accuracy and precision]

acoustic sonar: SOund Navigation And Ranging. A device that is used
primarily for the detection and location of objects (e.g., underwater or in air, as in mobile
robotics, or internal to a human body, as in medical ultrasound) by reflecting and
intercepting acoustic waves. It operates with acoustic waves in an analogous way to that of
radar, using both the time of flight and Doppler effects, giving the radial component of
relative position and velocity. [WP:Sonar]

ACRONYM: A vision system developed by Brooks that attempted to recognize three dimensional
objects from two dimensional images, using generalized cylinder primitives to represent both
stored model and objects extracted from the image. [RN:10.2]

active appearance model: A generalization of the widely used active shape model approach that
includes all of the information in the image region covered by the target object, rather than
just that near modeled edges. The active appearance model has a statistical model of the
shape and gray-level appearance of the object of interest. This statistical model generalizes
to cover most valid examples. Matching to an image involves finding model parameters that
minimize the difference between the image and a synthesized model example, projected into the
image. [NA:6.5]

active blob: A region based approach to the tracking of non-rigid motion in which an
active shape model is used. The model is based on an initial region that is divided using
Delaunay triangulation and then each patch is tracked from frame to frame (note that the
patches can deform).

active contour models: A technique used in model based vision where object boundaries are
detected using a deformable curve representation such as a snake. The term "active" refers to
the ability of the snake to deform shape to better match the image data. See also
active shape model. [SQ:8.5]

active contour tracking: A technique used in model based vision where object boundaries are
tracked in a video sequence using active contour models.

active illumination: A system of lighting where intensity, orientation, or pattern may be
continuously controlled and altered. This kind of system may be used to generate
structured light. [CS:1.2]

active learning: Learning about the environment through interaction (e.g., looking at an
object from a new viewpoint). [WP:Active learning]

active net: An active shape model that parameterizes a triangulated mesh.

active sensing: 1) A sensing activity carried out in an active or purposive way, for instance
where a camera is moved in space to acquire multiple or optimal views of an object. (See also
active vision, purposive vision, sensor planning.) 2) A sensing activity implying the
projection of a pattern of energy, for instance a laser line, onto the scene. See also
laser stripe triangulation, structured light triangulation. [FP:21.1]

active shape model: Statistical models of the shapes of objects that can deform to fit to a
new example of the object. The shapes are constrained by a statistical shape model so that
they may vary only in ways seen in a training set. The models are usually formed by using
principal component analysis to identify the dominant modes of shape variation in observed examples of the shape. Model shapes are formed by linear combinations of the dominant modes. [WP:Active shape model]

active stereo: An alternative approach to traditional binocular stereo. One of the cameras is replaced with a structured light projector, which projects light onto the object of interest. If the camera calibration is known, the triangulation for computing the 3D coordinates of object points simply involves finding the intersection of a ray and known structures in the light field. [CS:1.2]

active surface: 1) A surface determined using a range sensor; 2) an active shape model that deforms to fit a surface. [WP:Active surface]

active triangulation: Determination of surface depth by triangulation between a light source at a known position and a camera that observes the effects of the illuminant on the scene. Light stripe ranging is one form of active triangulation. A variant is to use a single scanning laser beam to illuminate the scene and use a stereo pair of cameras to compute depth. [WP:3D scanner#Triangulation]

active vision: An approach to computer vision in which the camera or sensor is moved in a controlled manner, so as to simplify the nature of a problem. For example, rotating a camera with constant angular velocity while maintaining fixation at a point allows absolute calculation of scene point depth, instead of only relative depth that depends on the camera speed. (See also kinetic depth.) [VSN:10]

active volume: The volume of interest in a machine vision application.

activity analysis: Analyzing the behavior of people or objects in a video sequence, for the purpose of identifying the immediate actions occurring or the long term sequence of actions. For example, detecting potential intruders in a restricted area. [WP:Occupational therapy#Activity analysis]

acuity: The ability of a vision system to discriminate (or resolve) between closely arranged visual stimuli. This can be measured using a grating, i.e., a pattern of parallel black and white stripes of equal widths. Once the bars become too close, the grating becomes indistinguishable from a uniform image of the same average intensity as the bars. Under optimal lighting, the minimum spacing that a person can resolve is 0.5 min of arc. [SEU:7.6]

adaptive: The property of an algorithm to adjust its parameters to the data at hand in order to optimize performance. Examples include adaptive contrast enhancement, adaptive filtering and adaptive smoothing. [WP:Adaptive algorithm]

adaptive coding: A scheme for the transmission of signals over unreliable channels, for example a wireless link. Adaptive coding varies the parameters of the encoding to respond to changes in the channel, for example fading, where the signal-to-noise ratio degrades. [WP:Adaptive coding]

adaptive contrast enhancement: An image processing operation that applies histogram equalization locally across an image. [WP:Adaptive histogram equalization]
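As a rough illustration of the adaptive contrast enhancement entry above, the following sketch equalizes the histogram of each tile of an 8-bit image independently. The function names, the non-overlapping tiling and the default tile size are assumptions made for this example; practical implementations usually blend neighbouring tiles to avoid visible block boundaries.

```python
import numpy as np

def equalize(tile, levels=256):
    """Histogram-equalize one uint8 tile using its own gray-level histogram."""
    hist = np.bincount(tile.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]            # CDF value at the darkest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:                       # constant tile: nothing to stretch
        return tile.copy()
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[tile]                     # apply the look-up table per pixel

def local_equalize(image, tile=8):
    """Apply histogram equalization separately to each tile x tile block.

    Assumption for this sketch: blocks do not overlap and are not blended.
    """
    out = np.empty_like(image)
    h, w = image.shape
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            out[r:r + tile, c:c + tile] = equalize(image[r:r + tile, c:c + tile])
    return out
```

Calling `local_equalize(img, tile=8)` on a low-contrast `uint8` image stretches each block to use the full gray-level range.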
adaptive edge detection: Edge detection with adaptive thresholding of the gradient magnitude image. [VSN:3.1.2]

adaptive filtering: In signal processing, any filtering process in which the parameters of the filter change over time, or where the parameters are different at different parts of the signal or image. [WP:Adaptive filter]

adaptive histogram equalization: A localized method of improving image contrast. A histogram is constructed of the gray levels present. These gray levels are re-mapped so that the histogram is approximately flat. It can be made perfectly flat by dithering. [WP:Adaptive histogram equalization]

[Figure: original image (left) and the result after adaptive histogram equalization (right)]

adaptive Hough transform: A Hough transform method that iteratively increases the resolution of the parameter space quantization. It is particularly useful for dealing with high dimensional parameter spaces. Its disadvantage is that sharp peaks in the histogram can be missed. [NA:5.6]

adaptive meshing: Methods for creating simplified meshes where elements are made smaller in regions of high detail (rapid changes in surface orientation) and larger in regions of low detail, such as planes. [WP:Adaptive mesh refinement]

adaptive pyramid: A method of multi-scale processing where small areas of image having some feature in common (say color) are first extracted into a graph representation. This graph is then manipulated, for example by pruning or merging, until the level of desired scale is reached.

adaptive reconstruction: Data driven methods for creating statistically significant data in areas of a 3D data cloud where data may be missing due to sampling problems.

adaptive smoothing: An iterative smoothing algorithm that avoids smoothing over edges. Given an image I(x, y), one iteration of adaptive smoothing proceeds as follows:

1. Compute gradient magnitude image $G(x, y) = |\nabla I(x, y)|$
2. Make weights image $W(x, y) = e^{-\lambda G(x, y)}$
3. Smooth the image
$$S(x, y) = \frac{\sum_{i=-1}^{1} \sum_{j=-1}^{1} A_{xyij}}{\sum_{i=-1}^{1} \sum_{j=-1}^{1} B_{xyij}}$$
where $A_{xyij} = I(x+i, y+j)\,W(x+i, y+j)$ and $B_{xyij} = W(x+i, y+j)$. [WP:Additive smoothing]

adaptive thresholding: An improved image thresholding technique where the threshold value is varied at each pixel. A common technique is to use the average intensity in a neighbourhood to set the threshold. [ERD:4.4]
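The adaptive smoothing and adaptive thresholding entries above can be sketched together, since both rely on a 3 × 3 neighbourhood sum. This is a minimal sketch under stated assumptions: the gradient is estimated with finite differences, image borders are handled by edge replication, and the decay rate `lam` and the threshold `offset` are illustrative parameters not fixed by the entries.

```python
import numpy as np

def box_sum3(a):
    """Sum over each pixel's 3x3 neighbourhood (borders replicated)."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[1 + i:1 + i + h, 1 + j:1 + j + w]
               for i in (-1, 0, 1) for j in (-1, 0, 1))

def adaptive_smooth_step(image, lam=0.1):
    """One iteration of adaptive smoothing (steps 1 to 3 of the entry)."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)            # finite differences (an assumption)
    g = np.hypot(gx, gy)                 # step 1: gradient magnitude
    w = np.exp(-lam * g)                 # step 2: small weights near edges
    return box_sum3(img * w) / box_sum3(w)   # step 3: weighted local mean

def adaptive_threshold(image, offset=6.0):
    """Threshold each pixel against its 3x3 neighbourhood mean minus offset."""
    img = image.astype(np.float64)
    local_mean = box_sum3(img) / 9.0
    return img > local_mean - offset
```

Because pixels with a large gradient receive weights close to zero, a sharp step edge passes through `adaptive_smooth_step` almost unchanged, whereas a plain 3 × 3 mean would blur it.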
[Figure: adaptive thresholding. Panels: Image, $I$; Smoothed, $S$; Thresholded, $I > S - 6$]

adaptive triangulation: See adaptive meshing.

adaptive visual servoing: See visual servoing. [WP:Visual Servoing]

additive color: The way in which multiple wavelengths of light can be combined to allow other colors to be perceived (e.g., if equal amounts of green and red light are shone on a sheet of white paper, the paper will appear to be illuminated with a yellow light source). Contrast this with subtractive color. [LG:3.7]

[Figure: overlapping green and red light]

additive noise: Generally, image-independent noise that is added to an image by some external process. The recorded image I at pixel (i, j) is then the sum of the true signal S and the noise N.
$$I_{i,j} = S_{i,j} + N_{i,j}$$
The noise added at each pixel (i, j) could be different. [SEU:3.2]

adjacent: Commonly meaning next to each other, whether in a physical sense of being connected pixels in an image, image regions sharing some common boundary, nodes in a graph connected by an arc or components in a geometric model sharing some common bounding component, etc. Formally defining adjacent can be somewhat heuristic because you may need a way to specify closeness (e.g., on a quantized grid of pixels) or consider how much shared boundary is required before two structures are adjacent. [RN:2.1.1]

adjacency: See adjacent. [RN:2.1.1]

adjacency graph: A graph that shows the adjacency between structures, such as segmented image regions. The nodes of the graph are the structures and an arc implies adjacency of the two structures connected by the arc. This figure shows the graph associated with the segmented image on the left:

[Figure: Regions (left) and Adjacency graph (right)]

affine: A term first used by Euler. Affine geometry is a study of properties of geometric objects that remain invariant under affine transformations (mappings). These include: parallelness, cross ratio, adjacency. [WP:Affine geometry]

affine arc length: For a parametric equation of a curve $\vec{f}(u) = (x(u), y(u))$, arc length is not preserved under an affine transformation. The affine length
$$\tau(u) = \int_0^u (\dot{x}\ddot{y} - \ddot{x}\dot{y})^{1/3}\,du$$
is invariant under affine transformations. [SQ:8.4]

affine camera: A special case of the projective camera that is obtained by constraining the 3 × 4 camera parameter matrix T such that $T_{3,1} = T_{3,2} = T_{3,3} = 0$ and reducing the camera parameter vector from 11 degrees of freedom to 8. [FP:2.3.1]

affine curvature: A measure of curvature based on the affine arc length, $\tau$. For a parametric equation of a curve $\vec{f}(u) = (x(u), y(u))$, its affine curvature, $\mu$, is
$$\mu(\tau) = x''(\tau)\,y'''(\tau) - x'''(\tau)\,y''(\tau)$$
[WP:Affine curvature]

affine flow: A method of finding the movement of a surface patch by estimating the affine transformation parameters required to transform the patch from its position in one view to another.

affine fundamental matrix: The fundamental matrix which is obtained from a pair of cameras under affine viewing conditions. It is a 3 × 3 matrix whose upper left 2 × 2 submatrix is all zero. [HZ:13.2.1]

affine invariant: An object or shape property that is not changed (i.e., is invariant) by the application of an affine transformation. See also invariant. [FP:18.4.1]

affine length: See affine arc length. [WP:Affine curvature]

affine moment: Four shape measures derived from second and third order moments that remain invariant under affine transformations. They are given by
$$I_1 = \frac{\mu_{20}\mu_{02} - \mu_{11}^2}{\mu_{00}^4}$$
$$I_2 = \frac{\mu_{30}^2\mu_{03}^2 - 6\mu_{30}\mu_{21}\mu_{12}\mu_{03} + 4\mu_{30}\mu_{12}^3 + 4\mu_{21}^3\mu_{03} - 3\mu_{21}^2\mu_{12}^2}{\mu_{00}^{10}}$$
$$I_3 = \frac{\mu_{20}(\mu_{21}\mu_{03} - \mu_{12}^2) - \mu_{11}(\mu_{30}\mu_{03} - \mu_{21}\mu_{12}) + \mu_{02}(\mu_{30}\mu_{12} - \mu_{21}^2)}{\mu_{00}^{7}}$$
$$I_4 = \big(\mu_{20}^3\mu_{03}^2 - 6\mu_{20}^2\mu_{11}\mu_{12}\mu_{03} - 6\mu_{20}^2\mu_{02}\mu_{21}\mu_{03} + 9\mu_{20}^2\mu_{02}\mu_{12}^2 + 12\mu_{20}\mu_{11}^2\mu_{21}\mu_{03} + 6\mu_{20}\mu_{11}\mu_{02}\mu_{30}\mu_{03} - 18\mu_{20}\mu_{11}\mu_{02}\mu_{21}\mu_{12} - 8\mu_{11}^3\mu_{30}\mu_{03} - 6\mu_{20}\mu_{02}^2\mu_{30}\mu_{12} + 9\mu_{20}\mu_{02}^2\mu_{21}^2 + 12\mu_{11}^2\mu_{02}\mu_{30}\mu_{12} - 6\mu_{11}\mu_{02}^2\mu_{30}\mu_{21} + \mu_{02}^3\mu_{30}^2\big) / \mu_{00}^{11}$$
where each $\mu$ is the associated central moment. [NA:7.3]

affine quadrifocal tensor: The form taken by the quadrifocal tensor when specialized to the viewing conditions modeled by the affine camera.

affine reconstruction: A three dimensional reconstruction where the ambiguity in the choice of basis is affine only. Planes that are parallel in the Euclidean basis are parallel in the affine reconstruction. A projective reconstruction can be upgraded to affine by identification of the plane at infinity, often by locating the absolute conic in the reconstruction. [HZ:9.4.1]

affine stereo: A method of scene reconstruction using two calibrated views of a scene from known view points. It is a simple but very robust approximation to the geometry of stereo vision, to estimate positions, shapes and surface orientations. It can
be calibrated very easily by observing just four reference points. Any two views of the same planar surface will be related by an affine transformation that maps one image to the other. This consists of a translation and a tensor, known as the disparity gradient tensor, representing the distortion in image shape. If the standard unit vectors X and Y in one image are the projections of some vectors on the object surface and the linear mapping between images is represented by a 2 × 3 matrix A, then the first two columns of A will be the corresponding vectors in the other image. Since the centroid of the plane will map to both image centroids, it can be used to find the surface orientation.

affine transformation: A special set of transformations in Euclidean geometry that preserve some properties of the construct being transformed. Affine transformations preserve:

• Collinearity of points: if three points belong to the same straight line, their images under affine transformations also belong to the same line and the middle point remains between the other two points.
• Parallel lines remain parallel, concurrent lines remain concurrent (images of intersecting lines intersect).
• The ratio of lengths of line segments of a given line remains constant.
• The ratio of areas of two triangles remains constant.
• Ellipses remain ellipses and the same is true for parabolas and hyperbolas.
• Barycenters of triangles (and other shapes) map into the corresponding barycenters.

Analytically, affine transformations are represented in the matrix form
$$f(x) = Ax + b$$
where the determinant det(A) of the square matrix A is not 0. In 2D the matrix is 2 × 2; in 3D it is 3 × 3. [FP:2.2]

affine trifocal tensor: The form taken by the trifocal tensor when specialized to the viewing conditions modeled by the affine camera.

affinely invariant region: Image patches that automatically deform with changing viewpoint in such a way that they cover identical physical parts of a scene. Since such regions are describable by a set of invariant features they are relatively easy to match between views under changing illumination.

agglomerative clustering: A class of iterative clustering algorithms that begin with a large number of clusters and at each iteration merge pairs (or tuples) of clusters. Stopping the process at a certain number of iterations gives the final set of clusters, or the process can be run until only one cluster remains, and the progress of the algorithm represented as a dendrogram. [WP:Cluster analysis#Agglomerative hierarchical clustering]
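The agglomerative clustering entry above can be illustrated with a small sketch. It assumes single-linkage merging (the distance between two clusters is the smallest distance between their members) and Euclidean distance; the entry itself does not prescribe a particular linkage. The function name and return format are chosen for this example, and the brute-force search is only suitable for small point sets.

```python
import numpy as np

def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, merges): clusters is a list of lists of point indices,
    merges records (cluster_a, cluster_b, distance) for each merge, which is
    the information a dendrogram would display.
    """
    pts = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(pts))]   # start: one cluster per point
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage (an assumption): closest pair of members
                d = min(np.linalg.norm(pts[i] - pts[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges
```

Stopping at `n_clusters=2` gives a flat clustering, while running to `n_clusters=1` records the full merge sequence.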
albedo: Whiteness. Originally a term used in astronomy to describe reflecting power.

[Figure: albedo values 1.0, 0.75, 0.5, 0.25, 0.0]

If a body reflects 50% of the light falling on it, it is said to have albedo 0.5. [FP:4.3.3]

algebraic distance: A linear distance metric commonly used in computer vision applications because of its simple form and standard matrix based least mean square estimation operations. If a curve or surface is defined implicitly by $f(\vec{x}, \vec{a}) = 0$ (e.g., $\vec{x} \cdot \vec{a} = 0$ for a hyperplane) the algebraic distance of a point $\vec{x}_i$ to the surface is simply $f(\vec{x}_i, \vec{a})$. [FP:10.1.5]

aliasing: The erroneous replacement of high spatial frequency (HF) components by low-frequency ones when a signal is sampled. The affected HF components are those that are higher than the Nyquist frequency, or half the sampling frequency. Examples include the slowing of periodic signals by strobe lighting, and corruption of areas of detail in image resizing. If the source signal has no HF components, the effects of aliasing are avoided, so the low pass filtering of a signal to remove HF components prior to sampling is one form of anti-aliasing. The image below is the perspective projection of a checkerboard. The image is obtained by sampling the scene at a set of integer locations. First figure: The spatial frequency increases as the plane recedes, producing aliasing artifacts (jagged lines in the foreground, moire patterns in the background). Second figure: removing high-frequency components (i.e., smoothing) before downsampling mitigates the effect. [FP:7.4]

alignment: An approach to geometric model matching by registering a geometric model to the image data. [FP:18.2]

ALVINN: Autonomous Land Vehicle In a Neural Network. An early attempt, at Carnegie Mellon University, to learn a complex behaviour (maneuvering a vehicle) by observing humans.

ambient light: Illumination by diffuse reflections from all surfaces within a scene (including the sky, which acts as an external distant surface). In other words, light that comes from all directions, such as the sky on a cloudy day. Ambient light ensures that all
surfaces are illuminated, including those not directly facing light sources. [FP:5.3.3]

AMBLER: An autonomous active vision system using both structured light and sonar, developed by NASA and Carnegie Mellon University. It is supported by a six-legged robot and is intended for planetary exploration.

amplifier noise: Spurious additive noise signal generated by the electronics in a sampling device. The standard model for this type of noise is Gaussian. It is independent of the signal. In color cameras, where more amplification is used in the blue color channel than in the green or red channel, there tends to be more noise in the blue channel. In well-designed electronics amplifier noise is generally negligible.

analytic curve finding: A method of detecting parametric curves by first transforming data into a feature space that is then searched for the hypothesized curve parameters. Examples might be line finding using the Hough transform.

anamorphic lens: A lens having one or more cylindrical surfaces. Anamorphic lenses are used in photography to produce images that are compressed in one dimension. Images can later be restored to true form using another reversing anamorphic lens set. This form of lens is used in wide-screen movie photography.

anatomical map: A biological model usable for alignment with or region labeling of a corresponding image dataset. For example, one could use a model of the brain's functional regions to assist in the identification of brain structures in an NMR dataset.

AND operator: A boolean logic operator that combines two input binary images, applying the AND logic

    p  q  p&q
    0  0   0
    0  1   0
    1  0   0
    1  1   1

at each pair of corresponding pixels. This approach is used to select image regions. The rightmost image below is the result of ANDing the two leftmost images. [SB:3.2.2]

[Figure: two binary input images (left and middle) and the result of ANDing them (right)]

angiography: A method for imaging blood vessels by introducing a dye that is opaque when photographed by X-ray. Also the study of images obtained in this way. [WP:Angiography]

angularity ratio: Given two figures, X and Y, $\alpha_i(X)$ and $\beta_j(Y)$ are angles subtending convex parts of the contour of the figure X and $\gamma_k(X)$ are angles subtending plane parts of the contour of figure X, then the angularity ratios are:
$$\frac{\sum_i \alpha_i(X)}{360^\circ}$$
and
$$\frac{\sum_j \beta_j(Y)}{\sum_k \gamma_k(X)}$$

anisotropic filtering: Any filtering technique where the filter parameters vary over the image or signal being filtered. [WP:Anisotropic filtering]

anomalous behavior detection: Special case of surveillance where human movement is analyzed. Used in particular to detect intruders or behavior likely to precede or indicate crime. [WP:Anomaly detection]
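For the AND operator entry above, a minimal sketch of the pixelwise operation on two binary images follows. The function name is chosen for this example, and treating any nonzero pixel as 1 is an assumption about how the binary images are stored.

```python
import numpy as np

def and_images(a, b):
    """Pixelwise AND of two binary images of identical shape."""
    a = np.asarray(a, dtype=bool)   # nonzero pixels count as 1 (assumption)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("input images must have the same shape")
    return np.logical_and(a, b)
```

Used with a region mask as the second image, this keeps only the pixels of the first image that fall inside the selected region.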
antimode: The minimum between two maxima. For example, one method of threshold selection is done by determining the antimode in a bimodal histogram.

[Figure: bimodal histogram plotted against x]

aperture: Opening in the lens diaphragm of a camera through which light is admitted. This device is often arranged so that the amount of light can be controlled accurately. A small aperture reduces the amount of light available, but increases the depth of field. This figure shows nearly closed (left) and nearly open (right) aperture positions [TV:2.2.2]:

[Figure: aperture positions, closed (left) and open (right)]

aperture control: Mechanism for varying the size of a camera's aperture. [WP:Aperture#Aperture control]

aperture problem: If a motion sensor has a finite receptive field, it perceives the world through something resembling an aperture, making the motion of a homogeneous contour seem locally ambiguous. Within that aperture, different physical motions are therefore indistinguishable. For example, the two alternative motions of the square below are identical in the circled receptive fields [VSN:8.1.1]:

[Figure: aperture problem. Panels: BEFORE, AFTER]

apparent contour: The apparent contour of a surface S in 3D is the set of critical values of the projection of S on a plane, in other words, the silhouette. If the surface is transparent, the apparent contour can be decomposed into a collection of closed curves with double points and cusps. The convex envelope of an apparent contour is also the boundary of its convex hull. [VSN:4]

apparent motion: The 3D motion suggested by the image motion field, but not necessarily matching the real 3D motion. The reason for this mismatch is that the motion field may be ambiguous, that is, it may be generated by different 3D motions or by light source movement. Mathematically, there may be multiple solutions to the problem of reconstructing 3D motion from the image motion field. See also visual illusion, motion estimation. [WP:Apparent motion]

appearance: The way an object looks from a particular viewpoint under particular lighting conditions. [FP:25.1.3]

appearance based recognition: Object recognition where the object model encodes the possible appearances of the object (as contrasted with a geometric model
that encodes the shape as used in model based recognition). In principle, it is impossible to encode all appearances when occlusions are considered; however, small numbers of appearances can often be adequate, especially if there are not many models in the model base. There are many approaches to appearance based recognition, such as using a principal component model to encode all appearances in a compressed framework, using color histograms to summarize the appearance, or using a set of local appearance descriptors such as Gabor filters extracted at interest points. A common feature of these approaches is learning the models from examples. [TV:10.4]

appearance based tracking: Methods for object or target recognition in real time, based on image pixel values in each frame rather than derived features. Temporal filtering, such as the Kalman filter, is often

appearance change: Changes in an image that are not easily accounted for by motion, such as an object actually changing form.

appearance enhancement transform: Generic term for operations applied to images to change, or enhance, some aspect of them. Examples include brightness adjustment, contrast adjustment, edge sharpening, histogram equalization, saturation adjustment or magnification.

appearance flow: Robust methods for real time object recognition from a sequence of images depicting a moving object. Changes in the images are used rather than the images themselves. It is analogous to processing using optical flow.

appearance model: A representation used for interpreting images that is based on the appearance of the object. These models are usually learned by using multiple views of the objects. See also active appearance model and appearance based recognition. [WP:Active appearance model]

appearance prediction: Part of the science of appearance engineering, where an object texture is changed so that the viewer experience is

appearance singularity: An image position where a small change in viewer position can cause a dramatic change in the appearance of the observed scene, such as the appearance or disappearance of image features. This is contrasted with changes occurring when in a generic viewpoint. For example, when viewing the corner of a cube from a distance, a small change in viewpoint still leaves the three surfaces at the corner visible. However, when the viewpoint moves into the infinite plane containing one of the cube faces (a singularity), one or more of the planes disappear.

arc length: If f is a function such that its derivative $f'$ is continuous on some closed interval [a, b] then the arc length of f from x = a to x = b is the integral
$$\int_a^b \sqrt{1 + [f'(x)]^2}\,dx$$
[FP:19.1]
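The arc length integral above rarely has a closed form, so it is often evaluated numerically. The sketch below assumes the derivative of f is available as a Python function `df` and uses the composite Simpson's rule; the function name, the choice of quadrature rule and the default number of intervals are illustrative assumptions.

```python
import math

def arc_length(df, a, b, n=1000):
    """Approximate the arc length of f on [a, b] given its derivative df.

    Integrates sqrt(1 + df(x)**2) with the composite Simpson's rule using
    n subintervals (n is rounded up to an even number).
    """
    if n % 2:
        n += 1
    h = (b - a) / n

    def g(x):
        return math.sqrt(1.0 + df(x) ** 2)

    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0
```

For a straight line with slope 2 the integrand is constant, so `arc_length(lambda x: 2.0, 0.0, 3.0)` agrees with the exact length 3√5 up to rounding error.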
[Figure: arc length of f(x) between x = a and x = b]

arc of graph: Two nodes in a graph can be connected by an arc. The dashed lines here are the arcs:

[Figure: graph with nodes A, B and C joined by dashed arcs]

[WP:Graph (mathematics)]

architectural model reconstruction: A generic term for reverse engineering buildings based on collected 3D data as well as libraries of building constraints.

area: The measure of a region or surface's extension in some given units. The units could be image units, such as square pixels, or in scene units, such as square centimeters. [JKS:2.2.1]

area based: Image operation that is applied to a region of an image, as opposed to pixel based. [CS:6.6]

array processor: A group of time-synchronized processing elements that perform computations on data distributed across them. Some array processors have elements that communicate only with their immediate neighbors, as in the topology shown below. See also single instruction multiple data. [WP:Vector processor]

[Figure: array processor topology with elements connected to their immediate neighbors]

arterial tree segmentation: Generic term for methods used in finding internal pipe-like structures in medical images. Example image types are NMR images, angiograms and X-rays. Example trees are bronchial systems and veins.

articulated object: An object composed of a number of (usually) rigid subparts or components connected by joints, which can be arranged in a number of different configurations. The human body is a typical example. [BM:1.9]

articulated object model: A representation of an articulated object that includes both its separate parts and their range of movement (typically joint angles) relative to each other.

articulated object segmentation: Methods for acquiring an articulated object from 2D or 3D data.

articulated object tracking: Tracking an articulated object in an image sequence. This includes both the pose of the object and also its shape parameters, such as joint angles. [WP:Finger tracking]
aspect graph: A graph of the set of views (aspects) of an object, where the arcs of the graph are transitions between two neighboring views (the nodes) and a change between aspects is called a visual event. See also characteristic view. This graph shows some of the aspects of the hippopotamus [FP:20]

aspect ratio: 1) The ratio of the sides of the bounding box of an object, where the orientation of the box is chosen to maximize this ratio. Since this measure is scale invariant it is a useful metric for object recognition. 2) In a camera, it is the ratio of the horizontal to vertical pixel sizes. 3) In an image, it is the ratio of the image width to height. For example, an image of 640 by 480 pixels has an aspect ratio of 4:3. [AL:2.2]

aspects: See characteristic view and aspect graph. [FP:20]

association graph: A graph used in structure matching, such as matching a geometric model to a data description. In this graph, each node corresponds to a pairing between a model and a data feature (with the implicit assumption that they are compatible). Arcs in the graph mean that the two connected nodes are pairwise compatible. Finding maximal cliques is one technique for finding good matches. The graph below shows a set of pairings of model features A, B and C with image features a, b, c and d. The maximal clique consisting of A:a, B:b and C:c is one match hypothesis. [BB:11.2.1]

[Figure: association graph with nodes A:a, B:b, C:c and C:d]

astigmatism: Astigmatism is a refractive error where the light is focused within an optical system, such as in this example. It occurs when a lens has irregular curvature causing light rays to focus at an area, rather than at a point. It may be corrected with a toric lens, which has a greater refractive power on one axis than the others. In human eyes, astigmatism often occurs with nearsightedness and farsightedness. [FP:1.2.3]

atlas based segmentation: A segmentation technique used in medical image processing, especially with brain images. Automatic tissue segmentation is achieved using a model of the brain structure and imagery (see atlas registration) compiled with the assistance of human experts. See also image segmentation.

atlas registration: An image registration technique used in medical
image processing, especially to register brain images. An atlas is a model (perhaps statistical) of the characteristics of multiple brains, providing examples of normal and pathological structures. This makes it possible to take into account anomalies that single-image registration could not. See also medical image registration.

ATR: See automatic target recognition. [WP:Automatic target recognition]

attention: See visual attention. [WP:Attention]

attenuation: The reduction of a particular phenomenon, for instance, noise attenuation as the reduction of image noise. [WP:Attenuation]

attributed graph: A graph useful for representing different properties of an image. Its nodes are attributed pairs of image segments, their color or shape for example. The relations between them, such as relative texture or brightness, are encoded as arcs. [BM:4.5.2]

augmented reality: Primarily a projection method that adds graphics or sound, etc. as an overlay to the original image or audio. For example, a firefighter's helmet display could show exit routes registered to his/her view of the building. [WP:Augmented reality]

autocalibration: The recovery of a camera's calibration using only point (or other feature) correspondences from multiple uncalibrated images and geometric consistency constraints (e.g., that the camera settings are the same for all images in a sequence). [AL:13.7]

autocorrelation: The extent to which a signal is similar to shifted copies of itself. For an infinitely long 1D signal $f(t) : \mathbb{R} \to \mathbb{R}$, the autocorrelation at a shift $\Delta t$ is
$$R_f(\Delta t) = \int_{-\infty}^{\infty} f(t)\,f(t + \Delta t)\,dt$$
The autocorrelation function $R_f$ always has a maximum at 0. A peaked autocorrelation function decays quickly away from $\Delta t = 0$. The sample autocorrelation function of a finite set of values $f_{1..n}$ is $\{r_f(d) \mid d = 1, \ldots, n-1\}$ where
$$r_f(d) = \frac{\sum_{i=1}^{n-d} (f_i - \bar{f})(f_{i+d} - \bar{f})}{\sum_{i=1}^{n} (f_i - \bar{f})^2}$$
and $\bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i$ is the sample mean. [WP:Autocorrelation]

autofocus: Automatic determination and control of image sharpness in an optical or vision system. There are two major variations in this control system: active focusing and passive focusing. Active autofocus is performed using a sonar or infrared signal to determine the object distance. Passive autofocus is performed by analyzing the image itself to optimize differences between adjacent pixels in the CCD array. [WP:Autofocus]

automatic: Performed by a machine without human intervention. The opposite of manual. [WP:Automation]

automatic target recognition (ATR): Sensors and algorithms used for detecting hostile objects in a scene. Sensors are of many different types, sampling in infrared, visible light and using sonar and radar. [WP:Automatic target recognition]

autonomous vehicle: A mobile robot controlled by computer, with human input operating only at a very high level, stating the ultimate destination or task for example. Autonomous
navigation requires the visual tasks of route detection, self-localization, landmark location and obstacle detection, as well as robotics tasks such as route planning and motor control. [WP:Driverless car]

autoregressive model: A model that uses statistical properties of past behavior of some variable to predict future behavior of that variable. A signal $x_t$ at time t satisfies an autoregressive model if
$$x_t = \sum_{n=1}^{p} \alpha_n x_{t-n} + \omega_t$$
where $\omega_t$ is noise. [WP:Autoregressive model]

autostereogram: An image similar to a random dot stereogram in which the corresponding features are combined into a single image. Stereo fusion allows the perception of a 3D shape in the 2D image. [WP:Autostereogram]

average smoothing: See mean smoothing. [VSN:3.1]

AVI: Microsoft format for audio and video files (audio video interleaved). Unlike MPEG, it is not a standard, so that compatibility of AVI video files and AVI players is not always guaranteed. [WP:Audio Video Interleave]

axis of elongation: 1) The line that minimizes the second moment of the data points. If $\{\vec{x}_i\}$ are the data points, and $d(\vec{x}, L)$ is the distance from point $\vec{x}$ to line L, then the axis of elongation A minimizes $\sum_i d(\vec{x}_i, A)^2$. Let $\vec{\mu}$ be the mean of $\{\vec{x}_i\}$. Define the scatter matrix
$$S = \sum_i (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^T$$
Then the axis of elongation is the eigenvector of S with the largest eigenvalue. See also principal component analysis. The figure below shows this axis of elongation for a set of points. 2) The longer midline of the bounding box with largest length-to-width ratio. A possible axis of elongation is the line in this figure [JKS:2.2.3]:

axis of rotation: A line about which a rotation is performed. Equivalently, the line whose points are fixed under the action of a rotation. Given a 3D rotation matrix R, the axis is the eigenvector of R corresponding to the eigenvalue 1. [JKS:12.2.2]
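Definition 1 of the axis of elongation entry above translates almost directly into code. This sketch assumes the data points are the rows of an array; the function name and the returned (mean, direction) pair are choices made for this example. The sign of the returned direction is arbitrary, since an eigenvector and its negation describe the same line.

```python
import numpy as np

def axis_of_elongation(points):
    """Return (mean, unit direction) of the axis of elongation of a point set.

    The direction is the eigenvector of the scatter matrix
    S = sum_i (x_i - mu)(x_i - mu)^T with the largest eigenvalue.
    """
    x = np.asarray(points, dtype=float)
    mu = x.mean(axis=0)
    d = x - mu
    scatter = d.T @ d                       # sum of outer products
    vals, vecs = np.linalg.eigh(scatter)    # eigenvalues in ascending order
    return mu, vecs[:, -1]                  # eigenvector of largest eigenvalue
```

The axis is then the line through `mu` along the returned direction.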
axial representation: A region representation that uses a curve to describe the image region. The axis may be a skeleton derived from the region by a thinning process.

axis-angle curve representation: A rotation representation based on the amount of twist about the axis of rotation, here a unit vector $\vec{a}$. The quaternion rotation representation is

B-rep: See surface boundary representation. [BT:8]

b-spline: A curve approximation spline represented as a combination of basis functions:
$$\vec{c}(t) = \sum_i \vec{a}_i B_i(t)$$
where $B_i$ are the basis functions and $\vec{a}_i$ are the control points. B-splines do not necessarily pass through any of the control points; however, if b-splines are calculated for adjacent sets of control points the curve segments will join up and produce a continuous curve. [JKS:13.7.1]

b-spline fitting: Fitting a b-spline to a set of data points. This is useful for noise reduction or for producing a more compact model of the observed curve. [JKS:13.7.1]

b-spline snake: A snake made from b-splines.

back projection: 1) A form of display where a translucent screen is illuminated from the side not facing the viewer. 2) The computation of a 3D quantity from its 2D projection. For example, a 2D homogeneous point $\vec{x}$ is the projection of a 3D point $\vec{X}$ by a perspective projection matrix P, so $\vec{x} = P\vec{X}$. The backprojection of $\vec{x}$ is the 3D line $\{\mathrm{null}(P) + P^{+}\vec{x}\}$ where $P^{+}$ is the pseudoinverse of P. 3) Sometimes used interchangeably with triangulation. 4) Technique to compute the attenuation coefficients from intensity profiles covering a total cross section under various angles. It is used in CT and MRI to recover 3D from essentially 2D images. 5) Projection of the estimated 3D position of a shape
B 23

back into the 2D image from which the

shapes pose was estimated. [ AJ:10.3]

background: In computer vision, generally used in the context of object recognition. The background is either (1) the area of the scene behind an object or objects of interest or (2) the part of the image whose pixels sample from the background in the scene. As opposed to foreground. See also figure/ground separation. [JKS:2.5]

background labeling: Methods for differentiating objects in the foreground of images or those of interest from those in the background. [AL:10.4]

background modeling: Segmentation or change detection method where the scene behind the objects of interest is modeled as a fixed or slowly changing background, with possible foreground occlusions. Each pixel is modeled as a distribution which is then used to decide if a given observation belongs to the background or an occluding object. [NA:3.5.2]

background normalization: Removal of the background by some image processing technique to estimate the background image and then dividing or subtracting the background from an original image. The technique is useful for when the background is non-uniform. The images below illustrate this where the first shows the input image, the second is the background estimate obtained by dilation with ball(9, 9) structuring element and the third is the (normalized) division of the input image by the background image. [JKS:3.2.1]

backlighting: A method of illuminating a scene where the background receives more illumination than the foreground. Commonly this is used to produce silhouettes of opaque objects against a lit background, for easier object detection. [LG:2.1.1]

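
The background normalization entry can be sketched on a single image row. This is a minimal illustration only: it assumes dark objects on a brighter, non-uniform background and substitutes a flat sliding-maximum window for the ball(9, 9) structuring element; `dilate_1d` and `normalize_background` are hypothetical names.

```python
def dilate_1d(signal, radius):
    """Grey-scale dilation with a flat window: each sample becomes the local maximum."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def normalize_background(signal, radius):
    """Estimate the background by dilation, then divide the input by it."""
    background = dilate_1d(signal, radius)
    return [s / b if b > 0 else 0.0 for s, b in zip(signal, background)]


# A background that brightens from 100 to 190, with a dark two-pixel object.
row = [100 + 10 * i for i in range(10)]
row[4] = row[5] = 30
normalized = normalize_background(row, radius=2)
```

After division the background samples all lie close to 1 while the object samples stay well below it, so a single global threshold can separate them despite the illumination ramp.
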
bandpass filter: A signal processing
filtering technique that allows signals
between two specified frequencies to
pass but cuts out signals at all other
frequencies. [ FP:9.2.2]

back-propagation: One of the best-studied neural network training algorithms for supervised learning. The name arises from using the propagation of the discrepancies between the computed and desired responses at the network output back to the network inputs. The discrepancies are one of the inputs into the network weight recomputation process. [WP:Backpropagation]

back-tracking: A basic technique for graph searching: if a terminal but non-solution node is reached, search does not terminate with failure, but continues with still unexplored children of a previously visited non-terminal node. Classic back-tracking algorithms are breadth-first, depth-first, and A*. See also graph, graph searching, search tree. [BB:11.3.2]

bar: A raw primal sketch primitive that represents a dark line segment against a lighter background (or its inverse). Bars are also one of the primitives in Marr's theory of vision. The following is a small dark bar observed inside a receptive field:

[Figure: a dark bar inside a receptive field]

bar detector: 1) Method or algorithm that produces maximum excitation when a bar is in its receptive field. 2) Device used by thirsty undergraduates. [WP:Feature detection (nervous system)#History]

bar-code reading: Methods and algorithms used for the detection, imaging and interpretation of black parallel lines of different widths arranged to give details on products or other objects. Bar codes themselves have many different coding standards and arrangements. An example bar code is [LG:7]:

barrel distortion: Geometric lens distortion in an optical system that causes the outlines of an object to curve outward, forming a barrel shape. See also pincushion distortion. [EH:6.3.1]

barycentrum: See center of mass. [JKS:2.2.2]

bas-relief ambiguity: The ambiguity in reconstructing a 3D object with Lambertian reflectance using shading from an image under orthographic projection. If the true surface is $z(x, y)$, then the family of surfaces $a z(x, y) + bx + cy$ generate identical images under these viewing conditions, so any reconstruction, for any values of $a, b, c$ is equally valid. The ambiguity is thus up to a three-parameter family.

baseline: Distance between two cameras used in a binocular stereo system. [DH:10.6]

[Figure: stereo geometry showing the object point, epipolar plane, left and right image planes, left and right cameras and the stereo baseline]
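
The back-tracking entry on this page can be illustrated with a depth-first search. This is a small sketch under the assumption that the graph is an acyclic search tree stored as a dictionary of child lists; `backtrack_search` and the example tree are hypothetical.

```python
def backtrack_search(graph, node, is_solution, path=None):
    """Depth-first search that backtracks from dead ends.

    Assumes `graph` is an acyclic search tree given as {node: [children]}.
    Returns the path from the start node to a solution node, or None.
    """
    path = (path or []) + [node]
    if is_solution(node):
        return path
    for child in graph.get(node, []):
        result = backtrack_search(graph, child, is_solution, path)
        if result is not None:
            return result
    # Terminal but non-solution node (or exhausted children): return to the
    # caller, which continues with its still unexplored children.
    return None


tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "goal"]}
found = backtrack_search(tree, "root", lambda n: n == "goal")
```

Here the search first exhausts the subtree under `a`, backs up to `root`, and only then descends into `b`.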

basis function representation: A method of representing a function as a sum of simple (usually orthonormal) ones. For example the Fourier transform represents functions as a weighted sum of sines and cosines. [AJ:1.2]

Bayes rule: The relationship between the conditional probability of event A given B and the conditional probability of event B given event A. This is expressed as

$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$

providing that $P(B) \neq 0$. [SQ:14.2.1]

Bayesian classifier: A mathematical approach to classifying a set of data, by selecting the class most likely to have generated that data. If $\vec{x}$ is the data and $c$ is a class, then the probability of that class is $p(c|\vec{x})$. This probability can be hard to compute so Bayes rule can then be used here, which says that $p(c|\vec{x}) = \frac{p(\vec{x}|c)\,p(c)}{p(\vec{x})}$. Then we can compute the probability of the class $p(c|\vec{x})$ in terms of the probability of having observed the given data $\vec{x}$ with, $p(\vec{x}|c)$, and without, $p(\vec{x})$, assuming the class $c$ plus the a priori likelihood, $p(c)$, of observing the class. The Bayesian classifier is the most common statistical classifier currently used in computer vision processes. [DH:3.3.1]

Bayesian filtering: A probabilistic data fusion technique. It uses a formulation of probabilities to represent the system state and likelihood functions to represent their relationships. In this form, Bayes rule can be applied and further related probabilities deduced. [WP:Bayesian filtering]

Bayesian model: A statistical modeling technique based on two input models:

1. a likelihood model $p(y|x, h)$, describing the density of observing $y$ given $x$ and $h$. Regarded as a function of $h$, for a fixed $y$ and $x$, the density is also known as the likelihood of $h$.

2. a prior model, $p(h|D_0)$ which specifies the a priori density of $h$ given some known information denoted by $D_0$ before any new data are taken into account.

The aim of the Bayesian model is to predict the density for outcomes $y$ in test situations $x$ given data $D = D_T, D_0$ with both pre-known and training data.

Bayesian model learning: See probabilistic model learning. [DH:3.1]

Bayesian network: A belief modeling approach using a graph structure. Nodes are variables and arcs are implied causal dependencies and are given probabilities. These networks are useful for fusing multiple data (possibly of different types) in a uniform and rigorous manner. [WP:Bayesian network]

BDRF/BRDF: See bidirectional reflectance distribution function. [FP:4.2.2]

beam splitter: An optical system that divides unpolarized light into two orthogonally polarized beams, each at 90° to the other, as in this example [EH:4.3.4]:
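
The Bayes rule and Bayesian classifier entries on this page can be made concrete with a small discrete example. This is only a sketch: the class names, priors and likelihood values are invented for illustration, and `posterior` is a hypothetical helper.

```python
def posterior(priors, likelihoods):
    """Return p(c|x) for every class c, given p(c) and p(x|c), via Bayes rule."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # p(x) = sum over c of p(x|c) p(c)
    return {c: joint[c] / evidence for c in joint}


# Invented numbers: prior class frequencies and the likelihood of the
# observed feature vector x under each class.
priors = {"bolt": 0.7, "nut": 0.3}
likelihoods = {"bolt": 0.2, "nut": 0.9}

post = posterior(priors, likelihoods)
best_class = max(post, key=post.get)  # class most likely to have generated x
```

Although "bolt" has the larger prior, the observation is far more likely under "nut", so the posterior favors "nut".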

behavior analysis: Model based vision techniques for identifying and tracking behavior in humans. Often used for threat analysis. [WP:Applied behavior analysis]

behavior learning: Generation of goal-driven behavior models by some learning algorithm, for example reinforcement learning.

Beltrami flow: A noise suppression technique where images are treated as surfaces and the surface area is minimized in such a way as to preserve edges. See also diffusion smoothing.

bending energy: 1) A metaphor borrowed from the mechanics of thin metal plates. If a set of landmarks is distributed on two infinite flat metal plates and the differences in the coordinates between the two sets are vertical displacements of the plate, one Cartesian coordinate at a time, then the bending energy is the energy required to bend the metal plate so that the landmarks are coincident. When applied to images, the sets of landmarks may be sets of features. 2) Denotes the amount of energy that is stored due to an object's shape.

best next view: See next view planning.

Bhattacharyya distance: A measure of the (dis)similarity of two probability distributions. Given two arbitrary distributions $p_i(x)$, $i = 1, 2$, the Bhattacharyya distance between them is [PGS:4.5]

$d = -\log \int \sqrt{p_1(x)\,p_2(x)}\,dx$

bicubic spline interpolation: A special case of surface interpolation that uses cubic spline functions in two dimensions. This is like bilinear surface interpolation except that the interpolating surface is curved, instead of flat. [WP:Bicubic interpolation#Bicubic convolution algorithm]

bidirectional reflectance distribution function (BRDF/BDRF): If the energy arriving at a surface patch is denoted $E(\theta_i, \phi_i)$ and the energy radiated in a particular direction is denoted $L(\theta_e, \phi_e)$ in polar coordinates, then BRDF is defined as the ratio of the energy radiated from a patch of a surface in some direction to the amount of energy arriving there. The radiance is determined from the irradiance by

$L(\theta_e, \phi_e) = f(\theta_i, \phi_i, \theta_e, \phi_e)\,E(\theta_i, \phi_i)$

where the function $f$ is the bidirectional reflectance distribution function. This function often only depends on the difference between the incident angle $\phi_i$ of the ray falling on the surface and the angle $\phi_e$ of the reflected ray. The geometry is illustrated by [FP:4.2.2]:

[Figure: BRDF geometry showing the arriving energy E, the radiated energy L and the incident and emitted directions]
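
For the Bhattacharyya distance entry on this page, a discrete analogue is easy to compute: the integral becomes a sum over histogram bins. The sketch below assumes both inputs are normalized histograms over the same bins; the function name and the example histograms are hypothetical.

```python
import math


def bhattacharyya_distance(p1, p2):
    """Discrete Bhattacharyya distance: -log of the sum over bins of sqrt(p1 * p2).

    Assumes p1 and p2 are normalized histograms defined over the same bins.
    """
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p1, p2))
    if coefficient == 0.0:
        return math.inf  # no overlap at all between the two distributions
    return -math.log(coefficient)


hist_a = [0.2, 0.5, 0.3]
hist_b = [0.1, 0.3, 0.6]

same = bhattacharyya_distance(hist_a, hist_a)       # close to 0
different = bhattacharyya_distance(hist_a, hist_b)  # strictly positive
disjoint = bhattacharyya_distance([1.0, 0.0], [0.0, 1.0])
```

Identical distributions give a distance of zero (up to rounding), and the distance grows as the overlap between the two distributions shrinks.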

bilateral filtering: A non-iterative alternative to anisotropic filtering where images can be smoothed but edges present in them are preserved. [WP:Bilateral filter]

bilateral smoothing: See bilateral filtering. [WP:Bilateral filter]

bilinear surface interpolation: To determine the value of a function $f(x, y)$ at an arbitrary location $(x, y)$, of which only discrete samples $f_{ij} = \{f(x_i, y_j)\}_{i=1}^{n}{}_{j=1}^{m}$ are available. The samples are arranged on a 2D grid, so the value at point $(x, y)$ is interpolated from the values at the four surrounding points. In the diagram below

$f_{bilinear}(x, y) = \frac{A + B}{(d_1 + d_1')(d_2 + d_2')}$

where

$A = d_1' d_2' f_{11} + d_1 d_2' f_{21}$

$B = d_1' d_2 f_{12} + d_1 d_2 f_{22}$

The gray lines offer an easy aide memoire: each function value $f_{ij}$ is multiplied by the two closest $d$ values. [TV:8.4.2]

[Figure: grid cell with corner samples f11, f21, f12, f22 and the distances d1, d1', d2, d2' around the point (x, y)]

bilinearity: A function of two variables $x$ and $y$ is bilinear in $x$ and $y$ if it is linear in $y$ for fixed $x$ and linear in $x$ for fixed $y$. For example, if $\vec{x}$ and $\vec{y}$ are vectors and $A$ is a matrix such that $\vec{x}^{\top} A \vec{y}$ is defined, then the function $f(\vec{x}, \vec{y}) = \vec{x}^{\top} A \vec{y}$ is bilinear in $\vec{x}$ and $\vec{y}$. [WP:Bilinear form]

bimodal histogram: A histogram with two pronounced peaks, or modes. This is a convenient intensity histogram for determining a binarizing threshold. An example is:

[Figure: example bimodal intensity histogram; horizontal axis 0 to 250, vertical axis 0 to 10000]

bin-picking: The problem of getting a robot manipulator equipped with vision sensors to pick parts, for instance screws, bolts, components of a given assembly, from a random pile. A classic challenge for hand-eye robotic systems, involving at least segmentation, object recognition in clutter and pose estimation.

binarization: See thresholding. [ERD:2.2.1]

binary image: An image whose pixels can either be in an 'on' or 'off' state, represented by the integers 1 and 0 respectively. An example is [DH:7.4]:

binary mathematical morphology: A group of shape-based operations that

can be applied to binary images, based around a few simple mathematical concepts from set theory. Common usages include noise reduction, image enhancement and image segmentation. The two most basic operations are dilation and erosion. These operators take two pieces of data as input: the input binary image and a structuring element (also known as a kernel). Virtually all other mathematical morphology operators can be defined in terms of combinations of erosion and dilation along with set operators such as intersection and union. Some of the more important are opening, closing and skeletonization. Binary morphology is a special case of gray scale mathematical morphology. See also mathematical morphology. [SQ:7.1]

binary moment: Given a binary image $B(i, j)$, there is an infinite family of moments indexed by the integer values $p$ and $q$. The $pq$th moment is given by $m_{pq} = \sum_i \sum_j i^p j^q B(i, j)$.

binary noise reduction: A method of removing salt-and-pepper noise from binary images. For example, a point could have its value set to the median value of its eight neighbors.

binary object recognition: Model based techniques and algorithms used to recognize objects from their binary images.

binary operation: An operation that takes two images as inputs, such as image subtraction. [SOS:2.3]

binary region skeleton: See skeleton. [ERD:6.8]

binocular: A system that has two cameras looking at the same scene simultaneously usually from a similar viewpoint. See also stereo vision. [TV:7.1]

binocular stereo: A method of deriving depth information from a pair of calibrated cameras set at some distance apart and pointing in approximately the same direction. Depth information comes from the parallax between the two images and relies on being able to derive the same feature in both images. [JKS:12.6]

binocular tracking: A method that tracks objects or features in 3D using binocular stereo.

biometrics: The science of discriminating individuals from accurate measurement of their physical features. Example biometric measurements are retinal lines, finger lengths, fingerprints, voice characteristics and facial features. [WP:Biometrics]

bipartite matching: Graph matching technique often applied in model based vision to match observations with models or stereo to solve the correspondence problem. Assume a set $V$ of nodes partitioned into two non-intersecting subsets $V^1$ and $V^2$. In other words, $V = V^1 \cup V^2$ and $V^1 \cap V^2 = \emptyset$. The only arcs $E$ in the graph lie between the two subsets, i.e., $E \subseteq \{V^1 \times V^2\} \cup \{V^2 \times V^1\}$. This is the bipartite graph. The bipartite matching problem is to find a maximal matching in the bipartite graph, in other words, a maximal set of nodes from the two subsets connected by arcs such that each node is connected by exactly one arc. One maximal matching in the graph below with sets $V^1 = \{A, B, C\}$ and $V^2 = \{X, Y\}$ pairs $(A, Y)$ and $(C, X)$. The selected arcs are solid, and other arcs are dashed.
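
The bipartite matching entry on this page can be sketched with the classic augmenting-path method, which finds a matching of maximum size. The arc list below is hypothetical (the arcs of the printed example graph are not all recoverable), and `maximum_bipartite_matching` is an illustrative name rather than a reference implementation.

```python
def maximum_bipartite_matching(arcs):
    """Maximum-cardinality matching by augmenting paths.

    `arcs` maps each node of V1 to the list of V2 nodes it is linked to.
    Returns a set of (v1_node, v2_node) pairs in which no node appears twice.
    """
    match_of = {}  # V2 node -> V1 node currently matched to it

    def try_assign(u, visited):
        for v in arcs.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # Use v if it is free, or if its current partner can move elsewhere.
            if v not in match_of or try_assign(match_of[v], visited):
                match_of[v] = u
                return True
        return False

    for u in arcs:
        try_assign(u, set())
    return {(u, v) for v, u in match_of.items()}


# Hypothetical arcs between V1 = {A, B, C} and V2 = {X, Y}
arcs = {"A": ["Y"], "B": ["X", "Y"], "C": ["X"]}
matching = maximum_bipartite_matching(arcs)
```

With only two nodes in V2, no matching can contain more than two arcs, so one node of V1 necessarily stays unmatched.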

[Figure: bipartite graph with nodes A, B, C in V1 and nodes X, Y in V2]
[WP:Matching (graph theory)#Maximum matchings in bipartite graphs]

blending operator: An image processing operator that creates a third image $C$ by a weighted combination of the input images $A$ and $B$. In other words, $C(i, j) = \alpha A(i, j) + \beta B(i, j)$ for two scalar weights $\alpha$ and $\beta$. Usually, $\alpha + \beta = 1$. The results of some process can be illustrated by blending the original and result images. An example of blending that adds a detected boundary to the original image is:
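
The blending operator can be written in a few lines. This sketch assumes images stored as equally sized lists of rows of numbers and defaults the second weight so that the two weights sum to 1; `blend` is a hypothetical helper name.

```python
def blend(image_a, image_b, alpha, beta=None):
    """Pixelwise C(i, j) = alpha * A(i, j) + beta * B(i, j).

    Assumes both images are equally sized lists of rows.
    If beta is omitted it defaults to 1 - alpha, the usual choice.
    """
    if beta is None:
        beta = 1.0 - alpha
    return [
        [alpha * a + beta * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


original = [[0, 100]]
result = [[200, 100]]
blended = blend(original, result, alpha=0.25)
```

Pixels that agree in both inputs are unchanged by the blend, while differing pixels move toward the more heavily weighted image.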

blob analysis: Blob analysis is a group of algorithms used in medical image analysis. There are four steps in the process: derive optimum foreground/background threshold to segment objects from their background; binarize the images by applying a thresholding operation; perform region growing and assign a label to each discrete group (blob) of connected pixels; extract physical measurements from the blobs.

bit map: An image with one bit per pixel. [JKS:3.3.1]

bit-plane encoding: An image compression technique where the image is broken into bit planes and run length coding is applied to each plane. To get the bit planes of an 8-bit gray scale image, the picture has a boolean AND operator applied with the binary value corresponding to the desired plane. For example, ANDing the image with 00010000 gives the fifth bit plane. [AJ:11.2]

bitangent: See curve bitangent. [WP:Bitangent]
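
The two steps of bit-plane encoding (extracting a plane with a boolean AND, then run length coding it) can be sketched as follows. The function names and the tiny test image are hypothetical, and planes are numbered from 1 for the least significant bit so that the mask 00010000 selects the fifth plane.

```python
from itertools import groupby


def bit_plane(image, plane):
    """Extract one bit plane as a 0/1 image; plane 1 is the least significant bit."""
    mask = 1 << (plane - 1)  # plane 5 -> 0b00010000
    return [[1 if pixel & mask else 0 for pixel in row] for row in image]


def run_length(bits):
    """Run length code a row of bits as (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(bits)]


image = [[16, 15, 48, 255],
         [0, 31, 17, 8]]
fifth_plane = bit_plane(image, 5)
encoded_rows = [run_length(row) for row in fifth_plane]
```

On real images the higher-order planes tend to contain long runs, which is what makes run length coding each plane worthwhile.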
bitshift operator: The bitshift operator shifts the binary representation of each pixel to the left or right by a set number of bit positions. Shifting 01010110 right by 2 bits gives 00010101. The bitshift operator is a computationally cheap method of dividing or multiplying an image by a power of 2. A shift of $n$ positions is a multiplication or division by $2^n$. [WP:Bitwise operation#Bit shifts]

blanking: Clearing a CRT or video device. The vertical blanking interval (VBI) in television transmission is used to carry data other than audio and video. [WP:Blanking (video)]

blob extraction: A part of blob analysis. See connected component labeling. [WP:Blob extraction]

block coding: A class of signal coding techniques. The input signal is partitioned into fixed-size blocks, and each block is transmitted after translation to a smaller (for compression) or larger (for error-correction) block size. [AJ:11.1]

blocks world: The blocks world is the simplified problem domain in which much early artificial intelligence and computer vision research was done. The essential feature of the blocks world is the restriction of analysis to

simplified geometric objects such as polyhedra and the assumption that geometric descriptions such as image edges can be easily recovered from the image. An example blocks world scene is [VSN:4]:

blooming: Blooming occurs when too much light enters a digital optical system. The light saturates CCD pixels, causing charge to overspill into surrounding elements giving either vertical or horizontal streaking in the image (depending on the orientation of the CCD). [CS:3.3.5]

Blum's medial axis: See medial axis transform. [JKS:2.5.10]

blur: A measure of sharpness in an image. Blurring can arise from the sensor being out of focus, noise in the environment or image capture process, target or sensor motion, as a side effect of an image processing operation, etc. A blurred image is: [WP:Blur]

border detection: See boundary detection. [RN:7.1]

border tracing: Given a pre-labeled (or segmented) image, the border is the inner layer of each region's connected pixel set. It can be traced using a simple 8-connective or 4-connective stepping procedure in a 3 × 3 neighborhood. [RN:8.1.4]

boundary: A general term for the lower dimensional structure that separates two objects, such as the curve between neighboring surfaces, or surface between neighboring volumes. [JKS:2.5.1]

boundary description: Functional, geometry based or set-theoretic description of a region boundary. For an example, see chain code. [ERD:7.8]

boundary detection: An image processing algorithm that finds and labels the edge pixels between two neighboring image segments after segmentation. The boundary represents physical discontinuities in the scene, for example changes in color, depth, shape or texture. [RN:7.1]

boundary grouping: An image processing algorithm that attempts to complete a fully connected image-segment boundary from many broken pieces. A boundary might be broken because it is commonplace for sharp transitions in property values to appear in the image as slow transitions, or sometimes disappear due to noise, blurring, digitization artifacts, poor lighting or surface irregularities, etc.

boundary length: The length of the boundary of an object. See also perimeter. [WP:Perimeter]

boundary matching: See curve matching.

boundary property: Characteristics of a boundary, such as arc length, curvature, etc.

boundary representation: See boundary description and B-Rep. [BT:8]

boundary segmentation: See curve segmentation.

boundary-region fusion: Region growing segmentation approach where two adjacent regions are merged when their characteristics are close enough to pass some similarity test. The candidate neighborhood for testing similarity can be the pixels lying near the shared region boundary. [WP:Region growing]

bounding box: The smallest rectangular prism that completely encloses either an object or a set of points. The ratio of the length of box sides is often used as a classification metric in model based recognition. [WP:Minimum bounding box]

bottom-up: Reasoning that proceeds from the data to the conclusions. In computer vision, describes algorithms that use the data to generate hypotheses at a low level, that are refined as the algorithm proceeds. Compare top-down. [RJS:6]

BRDF/BDRF: See bidirectional reflectance distribution function. [WP:Bidirectional reflectance distribution function]

breakpoint detection: See curve segmentation. [BB:8.2.1]

breast scan analysis: See mammogram analysis. [CS:8.4.7]

Brewster's angle: When light reflects from a dielectric surface it will be polarized perpendicularly to the surface normal. The degree of polarization depends on the incident angle and the refractive indices of the air and reflective medium. The angle of maximum polarization is called Brewster's angle and is given by

$\theta_B = \tan^{-1}\left(\frac{n_2}{n_1}\right)$

where $n_1$ and $n_2$ are the refractive indices of the two materials, with $n_1$ that of the medium the light arrives through (such as air) and $n_2$ that of the reflective medium. [EH:8.6]

brightness: The quantity of radiation reaching a detector after incidence on a surface. Often measured in lux or ANSI lumens. When translated into an image, the values are scaled to fit the bit patterns available. For example, if an 8-bit byte is used, the maximum value is 255. See also luminance. [DH:7.2]

brightness adjustment: Increase or decrease in the luminance of an image. To decrease, one can linearly interpolate between the image and a pure black image. To increase, one can linearly extrapolate from a black image and the target. The extrapolation function is

$v = (1 - \alpha)\,i_0 + \alpha\,i_1$

where $\alpha$ is the blending factor (often between 0 and 1), $v$ is the output pixel value and $i_0$ and $i_1$ are the corresponding image and black pixels. See also gamma correction and contrast enhancement. [WP:Gamma correction]

Brodatz texture: A well-known set of texture images often used for testing texture-related algorithms. [NA:8.2]

building detection: A general term for a specific, model-based set of algorithms for finding buildings in data. The range of data used is large, encompassing stereo images, range

images, aerial and ground-level photographs.

bundle adjustment: An algorithm used to optimally determine the three dimensional coordinates of points and camera positions from two dimensional image measurements. This is done by minimizing some cost function that includes the model fitting error and the camera variations. The bundles are the light rays between detected 3D features and each camera center. It is these bundles that are iteratively adjusted (with respect to both camera centers and feature positions). [FP:13.4.2]

burn-in: 1) A phenomenon of early tube-based cameras and monitors where, if the same image was presented for long periods of time it became permanently burnt into the phosphorescent layer. Since the advent of modern monitors (1980s) this no longer happens. 2) The practice of shipping only electronic components that have been tested for long periods, in the hope that any defects will manifest themselves early in the component's life (e.g., 72 hours of typical use). 3) The practice of discarding the first several samples of an MCMC process in the hope that a very low-probability starting point will converge to a high-probability point before beginning to output samples. [NA:1.4.1]

butterfly filter: A linear filter designed to respond to "butterfly" patterns in images. A small butterfly filter convolution kernel is

$\begin{pmatrix} 0 & -2 & 0 \\ 1 & 2 & 1 \\ 0 & -2 & 0 \end{pmatrix}$

It is often used in conjunction with the Hough transform for finding peaks in the Hough feature space, particularly when searching for lines. The line parameter values of $(p, \theta)$ will generally give a butterfly shape with a peak at the approximate correct values.
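
As a numerical check on the Brewster's angle entry earlier in this letter, the sketch below computes the angle for light arriving through air and reflecting off glass. The refractive indices are illustrative assumptions, the function names are hypothetical, and Snell's law is brought in only as an added sanity check (it is not part of the entry).

```python
import math


def brewster_angle_deg(n_incident, n_reflective):
    """Angle of maximum polarization in degrees.

    n_incident: refractive index of the medium the light arrives through.
    n_reflective: refractive index of the reflecting medium.
    """
    return math.degrees(math.atan(n_reflective / n_incident))


def refraction_angle_deg(angle_deg, n_incident, n_reflective):
    """Snell's law: n_incident * sin(incidence) = n_reflective * sin(refraction)."""
    s = n_incident * math.sin(math.radians(angle_deg)) / n_reflective
    return math.degrees(math.asin(s))


air, glass = 1.0, 1.5  # assumed illustrative refractive indices
theta_b = brewster_angle_deg(air, glass)
theta_t = refraction_angle_deg(theta_b, air, glass)
```

For these values the angle is roughly 56.3 degrees, and the incidence and refraction angles sum to 90 degrees, which is the usual geometric characterization of Brewster's angle.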

CAD: See computer aided design. [WP:Computer-aided design]

calculus of variations: See variational approach. [BKPH:6.13]

calibration object: An object or small scene with easily locatable features used for camera calibration. [HZ:7.5.2]

camera: 1) The physical device used to acquire images. 2) The mathematical representation of the physical device and its characteristics such as position and calibration. 3) A class of mathematical models of the projection from 3D to 2D, such as affine-, orthographic- or pinhole camera. [NA:1.4.1]

camera calibration: Methods for determining the position and orientation of cameras and range sensors in a scene and relating them to scene coordinates. There are essentially four problems in calibration:

1. Interior orientation. Determining the internal camera geometry, including its principal point, focal length and lens distortion.

2. Exterior orientation. Determining the orientation and position of the

camera with respect to some absolute coordinate system.

3. Absolute orientation. Determining the transformation between two coordinate systems, the position and orientation of the sensor in the absolute coordinate system from the calibration points.

4. Relative orientation. Determining the relative position and orientation between two cameras from projections of calibration points in the scene.

These are classic problems in the field of photogrammetry. [FP:3]

camera coordinates: 1) A viewer-centered representation relative to the camera. The camera coordinate system is positioned and oriented relative to the scene coordinate system and this relationship is determined by camera calibration. 2) An image coordinate system that places the camera's principal point at the origin $(0, 0)$, with unit aspect ratio and zero skew. The focal length in camera coordinates may or may not equal 1. If image coordinates are such that the $3 \times 4$ projection matrix is of the form

$\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} \left[\, R \mid \vec{t} \,\right]$

then the image and camera coordinate systems are identical. [HZ:5.1]

camera geometry: The physical geometry of a camera system. See also camera model. [RJS:2]

camera model: A mathematical model of the projection from 3D (real world) space to the camera image plane. For example see pinhole camera model. [RJS:2]

camera motion compensation: See sensor motion compensation.

camera motion estimation: See sensor motion estimation. [WP:Egomotion]

camera position estimation: Estimation of the optical position of the camera relative to the scene or observed structure. This generally consists of six degrees of freedom (three for rotation, three for translation). It is often a component of camera calibration. Camera position is sometimes called the extrinsic parameters of the camera. Multiple camera positions may be estimated simultaneously with the reconstruction of 3D scene structure in structure-and-motion algorithms.

Canny edge detector: The first of the modern edge detectors. It took account of the trade-off between sensitivity of edge detection versus the accuracy of edge localization. The edge detector consists of four stages: 1) Gaussian smoothing to reduce noise and remove small details, 2) gradient magnitude and direction calculation, 3) non-maximal suppression of smaller gradients by larger ones to focus edge localization and 4) gradient magnitude thresholding and linking that uses hysteresis so as to start linking at strong edge positions, but then also track weaker edges. An example of the edge detection results is [JKS:5.6.1]:

canonical configuration: A stereo camera configuration in which the optical axes of the cameras are parallel,

the baselines are parallel to the image planes and the horizontal axes of the image planes are parallel. This results in epipolar lines that are parallel to the horizontal axes, hence simplifying the search for correspondences.

[Figure: canonical stereo configuration showing the optical centers, optical axes, image plane 1, image plane 2 and the corresponding epipolar lines]

cardiac image analysis: Techniques involving the development of 3D vision algorithms for tracking the motion of the heart from NMR and echocardiographic images.

Cartesian coordinates: A position description system where an n-dimensional point, $\vec{P}$, is described by exactly n coordinates with respect to n linearly independent and often orthonormal vectors, known as axes. [WP:Cartesian coordinate system]

[Figure: a point $\vec{P} = (x_c, y_c, z_c)$ plotted against three coordinate axes]

cartography: The study of maps and map-building. Automated cartography is the development of algorithms that reduce the manual effort in map building. [WP:Cartography]

cascaded Hough transform: An application of several successive Hough transforms, with the output of one transform used as input to the next.

cascading Gaussians: A term referring to the fact that the convolution of a Gaussian with itself is another Gaussian. [JKS:4.5.4]

CAT: See X-ray CAT. [RN:10.3.4]

catadioptric optics: The general approach of using mirrors in combination with conventional imaging systems to get wide viewing angles (180°). It is desirable that a catadioptric system has a single viewpoint because it permits the generation of geometrically correct perspective images from the captured images. [WP:Catadioptric system]

categorization: The subdivision of a set of elements into clearly distinct groups, or categories, defined by specific properties. Also the assignment of an element to a category or recognition of its category. [WP:Categorization]

category: A group or class used in a classification system. For example, in mean and Gaussian curvature shape classification, the local shape of a surface is classified into four main categories: planar, ellipsoidal, hyperbolic, and cylindrical. Another example is the classification of observed grazing animals into one of {sheep, cow, horse}. See also categorization. [WP:Categorization]

CBIR: See content based image retrieval. [WP:Content-based image retrieval]

CCD: Charge-Coupled Device. A solid state device that can record the number of photons falling on it.

A 2D matrix of CCD elements is used, together with a lens system, in digital cameras where each pixel value in the final images corresponds to the output of one or more of the elements. [FP:1.4.1]

CCIR camera: Camera fulfilling color conversion and pixel formation criteria laid out by the Comite Consultatif International des Radio. [SEU:1.7.3]

cell microscopic analysis: Automated image processing procedures for finding and analyzing different cell types from images taken by a microscope vision system. Common examples are the analysis of pre-cancerous cells and blood cell analysis. [WP:Live blood analysis]

cellular array: A massively parallel computing architecture, composed of a high number of processing elements. Particularly useful in machine vision applications when a simple 1:N mapping is possible between image pixels and processing elements. See also systolic array and SIMD. [WP:Systolic array]

center line: See medial line.

center of curvature: The center of the circle of curvature (or osculating circle) at a point $\vec{P}$ of a plane curve at which the curvature is nonzero. The circle of curvature is tangent to the curve at $\vec{P}$, has the same curvature as the curve at $\vec{P}$, and lies towards the concave (inner) side of the curve. This figure shows the circle and center of curvature, $\vec{C}$, of a curve at point $\vec{P}$ [FP:19.1.1]:

[Figure: a curve with its circle of curvature at point P and the center of curvature C]

center of mass: The point within an object at which the force of gravity appears to act. If the object can be described by a multi-dimensional point set $\{\vec{x}_i\}$ containing $N$ points, the center of mass is $\sum_{i=0}^{N-1} \vec{x}_i f(\vec{x}_i) \,/\, \sum_{i=0}^{N-1} f(\vec{x}_i)$, where $f(\vec{x}_i)$ is the value of the image (e.g., binary or gray scale) at point $\vec{x}_i$. [JKS:2.2.2]

center of projection: The origin of the camera reference frame in the pinhole camera model. In such a camera, the projection of a point in space is determined by the line passing through the point itself and the center of projection. See [JKS:8.1]:

[Figure: pinhole camera diagram; surviving labels: LENS, CENTER OF, AXIS, IMAGE]

center-surround operator: An operator that is particularly sensitive to spot-like image features that have higher (or lower) pixel values in the center than the surrounding areas. A simple convolution mask that can be

used as an orientation independent small curve shown below using a 4

spot detector is: connected coding scheme, starting from
the upper right pixel [ JKS:6.2.1]
81 18 81
18 1 81
18 18 81

central moments: A family of image

moments that are invariant to
translation because the center of mass
has been subtracted during the
calculation. If f (c, r) is the input image
pixel value ( binary or gray scale ) at
row r and column c then the pq th
P P moment is
c )p (r r)q f (c, r) where
r (c c
c, r) is the center of mass of the image.
[ RJS:6] * ***
* * 2 0
** **
central projection: It is defined by
projection of an image on the surface of 3
a sphere onto a tangential plane by rays
from the center of the sphere. A great
circle is the intersection of a plane with
the sphere. The image of the great
circle under central projection will be a
line. Also known as the gnomonic
projection. [ RJS:2]

centroid: See center of mass .

[ JKS:2.2.2]

certainty representation: Any of a set of techniques for encoding the
belief in a hypothesis, conclusion, calculation, etc. Example
representation methods are probability and fuzzy logic.

chain code: An efficient method for contour coding where an arbitrary
curve is represented by a sequence of small vectors of unit length in a
limited set of possible directions. Depending on whether the 4-connected
or the 8-connected grid is employed, the chain code is defined as the
digits from 0 to 3 or 0 to 7, assigned to the 4 or 8 neighboring grid
points in a counter-clockwise sense. For example, the string 222233000011
describes the

chamfer matching: A matching technique based on the comparison of
contours, and based on the concept of chamfer distance assessing the
similarity of two sets of points. This can be used for matching edge
images using the distance transform. See also Hausdorff distance. To find
the parameters (for example, translation and scale below) that register a
library image and a test image, the binary edge map of the test image is
compared to the distance transform. Edges are detected on image 1, and the
distance transform of the edge pixels is computed. The edges from image 2
are then matched. [ZRH:2.3]
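
The chamfer matching steps described above (distance transform of the edges of image 1, then scoring the edges of image 2 against it) can be sketched as below. This is a minimal illustration that searches over translation only; the brute-force distance transform, the mean-distance score, and the names `distance_transform` and `chamfer_score` are assumptions of this example, not the method of the cited reference.

```python
import numpy as np


def distance_transform(edges):
    """Brute-force Euclidean distance from each pixel to the nearest edge pixel."""
    edge_pts = np.argwhere(edges)                           # (K, 2) row/col pairs
    rows, cols = np.indices(edges.shape)
    grid = np.stack([rows, cols], axis=-1)[:, :, None, :]   # (H, W, 1, 2)
    d = np.sqrt(((grid - edge_pts[None, None, :, :]) ** 2).sum(axis=-1))
    return d.min(axis=-1)


def chamfer_score(dist, template_pts, offset):
    """Mean distance-transform value under the translated template edge points."""
    pts = template_pts + np.asarray(offset)
    h, w = dist.shape
    inside = (pts[:, 0] >= 0) & (pts[:, 0] < h) & (pts[:, 1] >= 0) & (pts[:, 1] < w)
    if not inside.all():
        return np.inf
    return dist[pts[:, 0], pts[:, 1]].mean()


# Image 1: edge map containing a square outline with corners at (5, 5) and (8, 8).
image1 = np.zeros((16, 16), dtype=bool)
image1[5, 5:9] = image1[8, 5:9] = True
image1[5:9, 5] = image1[5:9, 8] = True
dist = distance_transform(image1)

# Image 2: the same outline, expressed relative to its own top-left corner.
template = np.argwhere(image1) - np.array([5, 5])

# Exhaustive search over translations; the lowest score is the best match.
offsets = [(r, c) for r in range(13) for c in range(13)]
best = min(offsets, key=lambda o: chamfer_score(dist, template, o))
print(best)
```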

[Figure panels: Image 1, Image 2, Dist. Trans., Edges 2, Best Match]

chamfering: See distance transform. [JKS:2.5.9]

change detection: See motion detection. [JKS:14.1]

character recognition: See optical character recognition. [RJS:8]

character verification: A process used to confirm that printed or
displayed characters are within some tolerance that guarantees that they
are readable by humans. It is used in applications such as labeling.

characteristic view: An approach to object representation in which an
object is encoded by a set of views of the object. The views are chosen so
that small changes in viewpoint do not cause large changes in appearance
(e.g., a singularity event). Real objects have an impractically large
number of singularities, so practical approaches to creating
characteristic views require approximations, such as only using views on a
tessellated viewsphere, or only representing the viewpoints that are
reasonably stable over large ranges on the viewsphere. See also aspect
graph and appearance based recognition.

chess board distance metric: See Manhattan metric.
[WP:Chebyshev distance]

chi-squared distribution: The chi-squared ($\chi^2$) probability
distribution describes the distribution of squared lengths of vectors
drawn from a normal distribution. Specifically let the cumulative
distribution function of the $\chi^2$ distribution with $d$ degrees of
freedom be denoted $\chi^2(d, u)$. Then the probability that a point
$\vec{x}$ drawn from a $d$-dimensional Gaussian distribution will have
squared norm $|\vec{x}|^2$ less than a value $\tau$ is given by
$\chi^2(d, \tau)$. Empirical and theoretical plots of the $\chi^2$
probability density function with five degrees of freedom are here:

[Figure: density of $|X|^2$, $X \in R^5$, horizontal axis from 0 to 30, legend entry "Empirical"]

[WP:Chi-square distribution]

chi-squared test: A statistical test of the hypothesis that a set of
sampled values has been drawn from a given distribution. See also
chi-squared distribution. [WP:Chi-square test]
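
The statement in the chi-squared distribution entry, that the squared norm of a $d$-dimensional Gaussian sample falls below a threshold with probability given by the chi-squared cumulative distribution function, can be checked numerically. The sketch below assumes a zero-mean, unit-covariance Gaussian, uses a series expansion of the regularized incomplete gamma function for the CDF, and picks d = 5 and a threshold of 4.0 arbitrarily; the name `chi2_cdf` is hypothetical.

```python
import math
import numpy as np


def chi2_cdf(d, u):
    """Chi-squared CDF with d degrees of freedom at u, computed as the
    regularized lower incomplete gamma function P(d/2, u/2) by its series."""
    a, x = d / 2.0, u / 2.0
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a + 1.0)) * total


rng = np.random.default_rng(0)
d, tau = 5, 4.0

# Draw points from a standard d-dimensional Gaussian and measure squared norms.
samples = rng.standard_normal((20000, d))
sq_norm = (samples ** 2).sum(axis=1)

empirical = float((sq_norm < tau).mean())
theoretical = chi2_cdf(d, tau)
print(empirical, theoretical)
```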

chip sensor: A CCD or other semiconductor based light sensitive imaging
device.

chord distribution: A 2D shape description technique based on all chords
in the shape (that is all pairwise segments between points on the
boundary). Histograms of their lengths and orientations are computed. The
values in the length histogram are invariant to rotations and scale
linearly with the size of the object. The orientation histogram values are
invariant to scale and shifts.

chroma: The color portion of a video signal that includes hue and
saturation, requiring luminance to make it visible. It is also referred to
as chrominance. [WP:Chroma]

chromatic aberration: A focusing problem where light of different
wavelengths (color) is refracted by different amounts and consequently
images at different places. As blue light is refracted more than red
light, objects may be imaged with color fringes at places where there are
strong changes in lightness. [FP:1.2.3]

chromaticity diagram: A 2D slice of a 3D color space. The CIE 1931
chromaticity diagram is the slice through the xyz color space of the CIE
where x + y + z = 1. This slice is shown below. The color gamut of
standard 0-1 RGB values in this model is the bright triangle in the center
of the horseshoe-like shape. Points outside the triangle have had their
saturations truncated. See also CIE chromaticity coordinates.
[WP:Chromaticity diagram#The CIE xy chromaticity diagram and the CIE xyY color space]

[Figure: chromaticity diagram, both axes from 0 to 1, with wavelength labels 380nm, 485nm, 495nm, 505nm, 535nm, 555nm, 595nm, 780nm]

chrominance: 1) The part of a video signal that carries color. 2) One or
both of the color axes in a 3D color space that distinguishes intensity
and color. See also chroma. [WP:Chrominance]

chromosome analysis: Vision technique used for the diagnosis of some
genetic disorders from microscope images. This usually includes sorting
the chromosomes into the 23 pairs and displaying them in a standard chart.

CID: Charge Injection Device. A type of semiconductor imaging device with
a matrix of light-sensitive cells. Every pixel in a CID array can be
individually addressed via electrical indexing of row and column
electrodes. It is unlike a CCD, which transfers collected charge out of
the pixel during readout, thus erasing the image.

CIE chromaticity coordinates: Coordinates in the CIE color space with
reference to three ideal standard colors X, Y and Z. Any visible color can
be expressed as a weighted sum of these three ideal colors, for example,
for a color $p = w_1 X + w_2 Y + w_3 Z$. The normalized values are given by

$x = \frac{w_1}{w_1 + w_2 + w_3}, \quad y = \frac{w_2}{w_1 + w_2 + w_3}, \quad z = \frac{w_3}{w_1 + w_2 + w_3}$

since x + y + z = 1, we only need to know two of these values, say (x, y).
These are the chromaticity coordinates. [JKS:10.3]
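
A small sketch of the normalization in the CIE chromaticity coordinates entry: each weight is divided by the sum of the three weights, so that the results sum to 1. The function name `chromaticity` and the example weights are assumptions made for illustration.

```python
def chromaticity(w1, w2, w3):
    """Normalize the weights (w1, w2, w3) of the ideal colors X, Y, Z."""
    total = w1 + w2 + w3
    if total == 0:
        raise ValueError("the weights must not sum to zero")
    return w1 / total, w2 / total, w3 / total


x, y, z = chromaticity(2.0, 3.0, 5.0)

# Only (x, y) needs to be stored, since z can be recovered as 1 - x - y.
z_recovered = 1.0 - x - y
print(x, y, z, z_recovered)
```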

CIE L*A*B* model: A color representation model based on that proposed by
the Commission Internationale de l'Éclairage (CIE) as an international
standard for color measurement. It is designed to be device-independent
and perceptually uniform (i.e., the separation between two points in this
space corresponds to the perceptual difference between the colors).
L*A*B* color consists of a luminance, L*, and two chromatic components:
A* component, from green to red; B* component, from blue to yellow. See
also CIE L*U*V* model. [JKS:10.3]

CIE L*U*V* model: A color representation system where colors are
represented by luminance (L*) and two chrominance components (U*V*). A
given change in value in any component corresponds approximately to the
same perceptual difference. See also CIE L*A*B* model. [JKS:10.3]

circle: A curve consisting of all points on a plane lying a fixed radius
$r$ from the center point $C$. The arc defining the entire circle is known
as the circumference and is of length $2\pi r$. The area contained inside
the curve is given by $A = \pi r^2$. A circle centered at the point
$(h, k)$ has equation $(x - h)^2 + (y - k)^2 = r^2$. The circle is a
special case of the ellipse. [NA:5.4.3]

circle detection: A class of algorithms, for example the Hough transform,
that locate the centers and radii of circles in digital images. In general
images, scene circles usually appear as ellipses, as in this example
[ERD:9]:

circle fitting: Techniques for deriving circle parameters from either 2D
or 3D observations. As with all fitting problems, one can either search
the parameter space using a good metric (using, for example, a Hough
transform), or can solve a well-posed least-squares problem. [JKS:6.8.4]

circular convolution: The circular
convolution ($c_k$) of two vectors $\{x_i\}$ and $\{y_i\}$ that are of
length $n$ is defined as $c_k = \sum_{i=0}^{n-1} x_i y_j$ where
$0 \le k < n$ and $j = (k - i) \bmod n$.
[ WP:Circular convolution]
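
As a sketch of the circular convolution sum, the code below evaluates $c_k = \sum_i x_i y_j$ directly, using the standard index $j = (k - i) \bmod n$, and compares the result with the product of discrete Fourier transforms. The function name `circular_convolution` and the example vectors are assumptions for illustration.

```python
import numpy as np


def circular_convolution(x, y):
    """Direct evaluation of c_k = sum_i x_i * y_((k - i) mod n)."""
    n = len(x)
    if len(y) != n:
        raise ValueError("both vectors must have the same length")
    return [sum(x[i] * y[(k - i) % n] for i in range(n)) for k in range(n)]


x = [1, 2, 3, 4]
y = [1, 0, 0, 1]
c = circular_convolution(x, y)

# The same result via the convolution theorem for the discrete Fourier transform.
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))
print(c, via_fft)
```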

circularity: One measure $C$ of the degree to which a 2D shape is similar
to a circle is given by

$C = \frac{4 \pi A}{P^2}$

where $C$ varies from 0 (non-circular) to 1 (perfectly circular). $A$ is
the object area and $P$ is the object perimeter. [WP:Circular definition]

city block distance: See Manhattan metric. [JKS:2.5.8]

close operator: The application of two binary morphology operators,
dilation followed by erosion, which has the effect of filling small holes
in an image. This figure shows the result of closing with a mask 22 pixels
in diameter [JKS:2.6]:
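
The close operator (dilation followed by erosion) can be sketched for binary images as below. This is a minimal illustration that assumes a 3 x 3 square structuring element rather than the 22-pixel mask mentioned in the entry; the helper names `_dilate`, `_erode` and `close` are hypothetical.

```python
import numpy as np


def _dilate(img):
    """Binary dilation with a 3x3 square; pixels outside the image count as 0."""
    h, w = img.shape
    p = np.pad(img, 1, constant_values=False)
    out = np.zeros_like(img)
    for dr in range(3):
        for dc in range(3):
            out |= p[dr:dr + h, dc:dc + w]
    return out


def _erode(img):
    """Binary erosion with a 3x3 square; pixels outside the image count as 0."""
    h, w = img.shape
    p = np.pad(img, 1, constant_values=False)
    out = np.ones_like(img)
    for dr in range(3):
        for dc in range(3):
            out &= p[dr:dr + h, dc:dc + w]
    return out


def close(img):
    """Closing: dilation followed by erosion."""
    img = np.asarray(img, dtype=bool)
    # Pad first so that the dilation has room to grow past the image border.
    padded = np.pad(img, 1, constant_values=False)
    return _erode(_dilate(padded))[1:-1, 1:-1]


# A 5x5 square with a one-pixel hole in the middle.
filled = np.zeros((9, 9), dtype=bool)
filled[2:7, 2:7] = True
holed = filled.copy()
holed[4, 4] = False

closed = close(holed)
print(closed[4, 4])
```

The hole is filled, while the outer boundary of the square is unchanged.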

classification: A general term for the assignment of a label (or class) to
structures (e.g., pixels, regions, lines, etc.). Example classification
problems include: a) labelling pixels as road, vegetation or sky, b)
deciding whether cells are cancerous based on cell shapes or c) deciding
whether the person with the observed face is an allowed system user.
[ERD:1.2.1]

classifier: An algorithm assigning a class among several possible to an
input pattern or data. See also classification, unsupervised
classification, clustering, supervised classification and rule-based
classification. [FP:22]

clipping: Removal or non-rendering of objects that do not coincide with
the display area. [NA:3.3.1]

clique: A clique of a graph G is a fully connected subgraph of G. In a
fully connected graph, every vertex is a neighbor of all others. The graph
below has a clique with five nodes. (There are other cliques in the graph
with fewer nodes, e.g., ABac with four nodes, etc.).
[WP:Clique (graph theory)]

clustering: 1) Grouping together image regions or pixels into larger,
homogeneous regions sharing some property. 2) Identifying the subsets of a
set of data points $\{\vec{x}_i\}$ based on some property such as
proximity. [FP:14.1.2]

clutter: A generic term for unmodeled or uninteresting elements in an
image. For example, a face detector generally has a model for faces, and
not for other objects, which are regarded as clutter. The background of an
image is often expected to include clutter. Loosely speaking, clutter is
more structured than noise. [FP:18.2.1]

CMOS: Complementary metal-oxide semiconductor. A technology used in making
image sensors and other computer chips. [NA:1.4.1]

CMY: See CMYK. [LG:3.7]


CMYK: Cyan, magenta, yellow and black color model. It is a subtractive
model where colors are absorbed by a medium, for example pigments in
paints. Where the RGB color model adds hues to black to generate a
particular color, the CMYK model subtracts from white. Red, green and blue
are secondary colors in this model. [LG:3.7]

coarse-to-fine processing: Multi-scale algorithm application that begins
by processing at a large or coarse level and then proceeds, iteratively,
to a small or fine level. Importantly, results from each level must be
propagated to ensure a good final result. It is used for computing, for
example, optical flow. [FP:7.7.2]

coaxial illumination: Front lighting with the illumination path running
along the imaging optical axis. Advantages of this technique are no
visible shadows or direct specularities from the camera's viewpoint.

[Figure labels: CAMERA, OPTICAL AXIS, TARGET AREA, LIGHT SOURCE]

cognitive vision: A part of computer vision focusing on techniques for
recognition and categorization of objects, structures and events, learning
and knowledge representation, control and visual attention.

coherence detection: Stereo vision technique where maximal patch
correlations are searched for across two images to generate features. It
relies on having a good correlation measure and a suitably chosen patch
size.

coherent fiber optics: Many fiber optic elements bound into a single cable
component with the individual fiber spatial positions aligned, so that it
can be used to transmit images.

coherent light: Light, for example generated by a laser, in which the
emitted light waves have the same wavelength and are in phase. Such light
waves can remain focused over long distances. [WP:Collimated light]

coincidental alignment: When two structures seem to be related, but in
fact the structures are independent or the alignment is just a consequence
of being in some special viewpoint. Examples are random edges being
collinear or surfaces coplanar, or object corners being nearby. See also
non-accidentalness.

collimate: To align the optics of a vision system, especially those in a
telescopic system.

collimated lighting: Collimated lighting (e.g., directional back-lighting)

is a special form of structured light. A collimator produces light in
which all the rays are parallel.

[Figure labels: object, optical system, lamp]

It is used to produce well defined shadows that can be cast directly onto
either a sensor or an object.

collinearity: The property of lying along the same straight line.
[HZ:1.3]

collineation: See projective transformation. [OF:2.2.1]

color: Color is both a physical and psychological phenomenon. Physically,
color refers to the nature of an object texture that allows it to reflect
or absorb particular parts of the light incident on it. (See also
reflectance.) The psychological aspect is characterized by the visual
sensation experienced when light of a particular frequency or wavelength
is incident on the retina. The key paradox here concerns why light of
slightly different wavelengths should be so perceptually different (e.g.,
red versus blue). [EH:4.4]

color based database indexing: See color based image retrieval.
[WP:Content-based image retrieval#Color]

color based image retrieval: An example of the more general image database
indexing process, where one of the main indices into the image database
comes from either color samples, the color distribution from a sample
image, or by a set of text color terms (e.g., red), etc.
[WP:Content-based image retrieval#Color]

color clustering: See color image segmentation.

color constancy: The ability of a vision system to assign a color
description to an object that is independent of the lighting environment.
This will allow the system to recognize objects under many different
lighting conditions. The human vision system does this automatically, but
most machine vision systems cannot. For example, humans observing a red
object in a cluttered scene under a blue light will still see the object
as red. A machine vision system might see it as a very dark blue.
[WP:Color constancy]

color co-occurrence matrix: A matrix (actually a histogram) whose elements
represent the sum of color values existing, in a given image in a
sequence, at a certain pixel position relative to another color existing
at a different position in the image. See also co-occurrence matrix.
[WP:Co-occurrence matrix]
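
One common way to build a color co-occurrence matrix is to count how often color i occurs at a fixed offset from color j. The sketch below assumes this counting form, and assumes the image has already been quantized to integer color indices; the function name `color_cooccurrence`, the offset convention and the toy image are all hypothetical choices for illustration.

```python
import numpy as np


def color_cooccurrence(labels, offset, n_colors):
    """Count pairs (color at p, color at p + offset) over all valid pixels p.

    labels: 2D array of integer color indices in [0, n_colors).
    offset: (row shift, column shift) between the two pixels of a pair.
    """
    labels = np.asarray(labels)
    dr, dc = offset
    h, w = labels.shape
    # Restrict to pixels whose offset partner lies inside the image.
    r0, r1 = max(0, -dr), min(h, h - dr)
    c0, c1 = max(0, -dc), min(w, w - dc)
    first = labels[r0:r1, c0:c1]
    second = labels[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    matrix = np.zeros((n_colors, n_colors), dtype=int)
    np.add.at(matrix, (first.ravel(), second.ravel()), 1)
    return matrix


labels = [[0, 0, 1],
          [0, 1, 1],
          [2, 2, 1]]

# Pairs of horizontally adjacent pixels (partner is one column to the right).
m = color_cooccurrence(labels, offset=(0, 1), n_colors=3)
print(m)
```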

color correction: 1) Adjustment of colors to achieve color constancy. 2)
Any change to the colors of an image. See also gamma correction.
[WP:Color correction]

color differential invariant: A type of differential invariant based on
color information, such as
$\frac{\nabla R \cdot \nabla G}{\|\nabla R\| \|\nabla G\|}$, whose value is
invariant to translation, rotation and variations in uniform illumination.

color doppler: A method for noninvasively imaging blood flow through the
heart or other body parts by displaying flow data on the two dimensional
echocardiographic image. Blood flow in different directions will be
displayed in different colors.

color edge detection: The process of edge detection in color images. A
simple approach is to combine (e.g., by addition) the edge strengths of
the individual RGB color planes.

color efficiency: A tradeoff that is made with lighting systems, where
conflicting design constraints require energy efficient production of
light while simultaneously producing sufficiently broad spectrum
illumination that the colors look natural. An obvious example of a skewed
tradeoff is with low pressure sodium street lighting. This is energy
efficient but has poor color appearance.

color gamut: The subset of all possible colors that a particular display
device (CRT, LCD, printer) can display. Because of physical difference in
how various devices produce colors, each scanner, display, and printer has
a different gamut, or range of colors, that it can represent. The RGB
color gamut can only display approximately 70% of the colors that can be
perceived. The CMYK color gamut is much smaller, reproducing about 20% of
perceivable colors. The color gamut achieved with premixed inks (like the
Pantone Matching System) is also smaller than the RGB gamut. [WP:Gamut]

color halftoning: See dithering.
[WP:Halftone#Multiple screens and color halftoning]

color histogram matching: Used in color image indexing where the
similarity measure is the distance between color histograms of two images,
e.g., by using the Kullback-Leibler divergence or Bhattacharyya distance.

color image: An image where each element (pixel) is a tuple of values from
a set of color bases. [SEU:1.7.3]

color image restoration: See image restoration. [SEU:1.3]

color image segmentation: Segmenting a color image into homogeneous
regions based on some similarity criteria. The boundaries around typical
regions are shown here:

color indexing: Using color information, e.g., color histograms, for image
database indexing. A key issue is varying illumination. It is possible to
use ratios of colors from neighboring locations to obtain illumination
invariance. [WP:Color index]

color matching: Due to the phenomenon of trichromacy, any color stimulus
can be matched by a mixture of the three primary stimuli. Color

matching is expressed as:

$C = R\,\mathbf{R} + G\,\mathbf{G} + B\,\mathbf{B}$

where a color stimulus $C$ is matched by $R$ units of primary stimulus
$\mathbf{R}$ mixed with $G$ units of primary stimulus $\mathbf{G}$ and $B$
units of primary stimulus $\mathbf{B}$. [SW:2.5.1]

color mixture model: A mixture model based on distributions in some color
representation system that specifies both the color groups in a model as
well as their relationships to each other. The conditional probability of
an observed pixel $\vec{x}_i$ belonging to an object $O$ is modeled as a
mixture with $K$ components.

color models: See color representation system. [WP:Color model]

color moment: A color image description based on moments of each color
channel's histogram, e.g., the mean, variance and skewness of the

color normalization: Techniques for normalizing the distribution of color
values in a color image, so that the image description is invariant to
illumination. One simple method for producing invariance to lightness is
to use vectors of unit length for color entries, rather than coordinates
in the color representation system.

color quantization: The process of reducing the number of colors in an
image by selecting a subset of colors, then representing the original
image using only them. This has the side-effect of allowing image
compression with fewer bits. A color image encoded with progressively
fewer numbers of colors is shown here:

[Figure panels: 16,777,216 colors, 256 colors, 16 colors, 4 colors]

[WP:Color quantization]

color re-mapping: An image transformation where each original color is
replaced by another color from a colormap. If the image has indexed
colors, this can be a very fast operation and can provide special
graphical effects for very low processing overhead.

[Figure panels: Original, Color remapped]

color representation system: A 2D or 3D space used to represent a set of
absolute color coordinates. RGB and CIE are examples of such spaces.

color spaces: See color representation system. [WP:Color space]

color temperature: A scalar measure of color. 1) The color temperature of
a given color C is the temperature in kelvins at which a heated black body
would emit light that is dominated by

color C. It is relevant to computer vision in that the illumination color
changes the appearance of the observed objects. The color temperature of
incandescent lights is about 3200 kelvins and sunlight is about 5500
kelvins. 2) Photographic color temperature is the ratio of blue to red
intensity. [WP:Color temperature]

color texture: Variations (texture) in the appearance of a surface (or
region, illumination, etc.) arising because of spatial variations in
either the color, reflectance or lightness of a surface.

colorimetry: The measurement of color intensity relative to some standard.
[WP:Colorimetry]

combinatorial explosion: When used correctly, this term refers to how the
computational requirements of an algorithm increase very quickly relative
to the increase in the number of elements to be processed, as a
consequence of having to consider all combinations of elements. For
example, consider matching $M$ model features to $D$ data features with
$D \ge M$, where each data feature can be used at most once and all model
features must be matched. Then the number of possible matchings that need
to be considered is
$D \times (D - 1) \times (D - 2) \times \cdots \times (D - M + 1)$. Here,
if $M$ increases by only one, approximately $D$ times as much matching
effort is needed. Combinatorial explosion is also loosely used for other
non-combination algorithms whose effort grows rapidly with even small
increases in input data sizes. [WP:Combinatorial explosion]

compactness: A scale, translation and rotation invariant descriptor based
on the ratio $\frac{\text{perimeter}^2}{\text{area}}$. [JKS:2.5.7]

compass edge detector: A class of edge detectors based on combining the
response of separate edge operators applied at several orientations. The
edge response at a pixel is commonly the maximum of the responses over the
several orientations.

composite filter: Hardware or software image processing method based on a
mixture of components such as noise reduction, feature detection,
grouping, etc. [WP:Composite image filter]

composite video: A television video transmission method created as a
backward-compatible solution for the transition from black-and-white to
color television. The black-and-white TV sets ignore the color component
while color TV sets separate out the color information and display it with
the black-and-white intensity. [WP:Composite video]

compression: See image compression. [SEU:1.3]

computational theory: An approach to computer vision algorithm description
promoted by Marr. A process can be described at three levels,
implementation (e.g., as a program), algorithm (e.g., as a sequence of
activities) and computational theory. This third level is characterized by
the assumptions behind the process, the mathematical relationship between
the input and output process and the description of the properties of the
input data (e.g., assumptions of statistical distributions). The claimed
advantage of this approach is that the computational theory level makes
explicit the essentials of the process, that can then be compared to the
essentials of other processes solving the same problem. By this method,
the implementation details that can confuse comparisons can be ignored.
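
Returning to the combinatorial explosion entry above: the number of ways to match M model features to D data features, each data feature used at most once, is the product D (D - 1) ... (D - M + 1). The sketch below evaluates that product and the growth factor when M increases by one. The function name `matchings` and the value D = 20 are assumptions chosen for illustration.

```python
import math


def matchings(D, M):
    """Number of ordered assignments of M model features to D data features."""
    if M > D:
        raise ValueError("needs D >= M")
    count = 1
    for k in range(M):
        count *= D - k
    return count


D = 20
counts = [matchings(D, M) for M in range(1, 6)]

# Growth factor each time M increases by one: D - M, which is close to D
# while M is small compared with D.
ratios = [counts[i + 1] / counts[i] for i in range(len(counts) - 1)]
print(counts, ratios)
```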

computational vision: See computer vision. [JKS:1.1]

computer aided design: 1) A general term for object design processes where
a computer assists the designer, e.g., in the specification and layout of
components. For example, most current mechanical parts are designed by a
computer aided design (CAD) process. 2) A term used for distinguishing
objects designed with the assistance of a computer.
[WP:Computer-aided design]

computer vision: A broad term for the processing of image data. Every
professional will have a different definition that distinguishes computer
vision from machine vision, image processing or pattern recognition. The
boundary is not clear, but the main issues that lead to this term being
used are more emphasis on 1) underlying theories of optics, light and
surfaces, 2) underlying statistical, property and shape models, 3)
theory-based algorithms, as contrasted to commercially exploitable
algorithms and 4) issues related to what humans broadly relate to
understanding as contrasted with automation. [JKS:1.1]

computed axial tomography: Also known as CAT. An X-ray procedure used in
conjunction with vision techniques to build a 3D volumetric image from
multiple X-ray images taken from different viewpoints. The procedure can
be used to produce a series of cross sections of a selected part of the
human body, that can be used for medical diagnosis.
[WP:X-ray computed tomography#Terminology]

concave mirror: The type of mirror used for imaging, in which a concave
surface is used to reflect light to a focus. The reflecting surface
usually is rotationally symmetric about the optical or principal axis and
the mirror surface can be part of a sphere, paraboloid, ellipsoid,
hyperboloid or other surfaces. It is also known as a converging mirror
because it brings light to a focus. In the case of the spherical mirror,
half way between the vertex and the sphere center, C, is the mirror focal
point, F, as shown here:

[Figure labels: concave, object, image, principal axis, C, F]

[WP:Curved mirror#Concave mirrors]

concave residue: The set difference between a shape and its convex hull.
For a convex shape, the concave residue is empty. Some shapes (in black)
and their concave residues (in gray) are shown here:

concavity: Loosely, a depression, dent, hollow or hole in a shape or
surface. More precisely, a connected component of a shape's concave
residue.

concavity tree: A hierarchical description of an object in the form of a
tree. The concavity tree of a shape has the convex hull of its shape as
the parent node and the concavity trees of its concavities as the child
nodes.

These are subtracted from the parent shape to give the original object.
The concavity tree of a convex shape is the shape itself. The concavity
tree of the gray shape is shown below [ERD:6.6]:

[Figure: concavity tree with root S, nodes S1, S2, S3 and S4 below it, and nodes S31, S32 and S41 at the lowest level]

concurrence matrix: See co-occurrence matrix. [RJS:6]

condensation tracking: Conditional density propagation tracking. The
particle filter technique applied by Blake and Isard to edge tracking. A
framework for object tracking with multiple simultaneous hypotheses that
switches between multiple continuous autoregressive process motion models
according to a discrete transition matrix. Using importance sampling it is
possible to keep only the N strongest hypotheses.

condenser lens: An optical device used to collect light over a wide angle
and produce a collimated output beam.

conditional dilation: A binary image operation that is a combination of
the dilation operator and a logical AND operation with a mask, that only
allows dilation into pixels that belong to the mask. This process can be
described by the formula: $\mathrm{dilate}(X, J) \cap M$, where $X$ is the
original image, $M$ is the mask and $J$ is the structuring element.

conditional distribution: A distribution of one variable given the values
of one or more other variables.
[WP:Conditional probability distribution]

conditional replenishment: A method for coding of video signals, where
only the portion of a video image that has changed since the previous
frame is transmitted. Effective for sequences with largely stationary
backgrounds, but more complex sequences require more sophisticated
algorithms that perform motion compensation. [WP:MPEG-1#Motion vectors]

conformal mapping: A function from the complex plane to itself,
$f : \mathbb{C} \to \mathbb{C}$, that preserves local angles. For example,
the complex function $y = \sin(z) = -\frac{1}{2} i (e^{iz} - e^{-iz})$ is
conformal. [WP:Conformal map]

conic: Curves arising from the intersection of a cone with a plane (also
called conic sections). This is a family of curves including the circle,
ellipse, parabola and hyperbola. The general form for a conic in 2D is
$ax^2 + bxy + cy^2 + dx + ey + f = 0$. Some example conics are [JKS:6.6]:

[Figure panels: circle, ellipse, parabola, hyperbola]

conic fitting: The fitting of a geometric model of a conic section
$ax^2 + bxy + cy^2 + dx + ey + f = 0$ to a set of data points
$\{(x_i, y_i)\}$. Special cases include fitting circles and ellipses.
[JKS:6.6]
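
One simple way to carry out the conic fitting described above is an algebraic least-squares fit: stack one row $(x_i^2, x_i y_i, y_i^2, x_i, y_i, 1)$ per data point and take the unit-norm coefficient vector that minimizes the sum of squared residuals, which is the right singular vector with the smallest singular value. The sketch below assumes this algebraic formulation (other fitting criteria exist); the function name `fit_conic` and the sample points are hypothetical.

```python
import numpy as np


def fit_conic(x, y):
    """Return unit-norm (a, b, c, d, e, f) minimizing the algebraic residual
    of a x^2 + b x y + c y^2 + d x + e y + f = 0 over the data points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([x ** 2, x * y, y ** 2, x, y, np.ones_like(x)])
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(design)
    return vt[-1]


# Sample points on the circle x^2 + y^2 = 4.
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
x = 2.0 * np.cos(angles)
y = 2.0 * np.sin(angles)

theta = fit_conic(x, y)
# Rescale so that a = 1; the overall scale and sign of a conic are arbitrary.
normalized = theta / theta[0]
print(normalized)
```

The unit-norm constraint on the coefficients matches the normalization $a^2 + b^2 + c^2 + d^2 + e^2 + f^2 = 1$ often used for conics.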

conic invariant: An invariant of a conic section. If the conic is in
canonical form

$ax^2 + bxy + cy^2 + dx + ey + f = 0$

with $a^2 + b^2 + c^2 + d^2 + e^2 + f^2 = 1$, then the two invariants to
rotation and translation are functions of the eigenvalues of the leading
quadratic form matrix
$A = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$. For example, the
trace and determinant are invariants that are convenient to compute. For
an ellipse, the eigenvalues are functions of the radii. The only invariant
to affine transformation is the class of the conic (hyperbola, ellipse,
parabola, etc.). The invariant to projective transformation is the set of
signs of the eigenvalues of the $3 \times 3$ matrix representing the conic
in homogeneous coordinates.

conical mirror: A mirror in the shape of (possibly part of) a cone. It is
particularly useful for robot navigation since a camera placed facing the
apex of the cone aligning the cone's axis and the optical axis and
oriented towards its base can have a full 360° view. Conical mirrors were
used in antiquity to produce cipher images known as

conjugate direction: Optimization scheme where a set of independent
directions are identified on the search space. A pair of vectors $\vec{u}$
and $\vec{v}$ are conjugate with respect to matrix $A$ if
$\vec{u}^\top A \vec{v} = 0$. A conjugate direction optimization method is
one in which a series of optimization directions are devised that are
conjugate with respect to the normal matrix but do not require the normal
matrix in order for them to be determined.

conjugate gradient: A basic technique of numerical optimization in which
the minimum of a numerical target function is found by iteratively
descending along non-interfering (conjugate) directions. The conjugate
gradient method does not require second derivatives and can find the
optima of an N dimensional quadratic form in N iterations. By comparison,
a Newton method requires one iteration and gradient descent can require an
arbitrarily large number of iterations. [WP:Conjugate gradient method]

connected component labeling: 1) A standard graph problem. Given a graph
consisting of nodes and arcs, the problem is to identify nodes forming a
connected set. A node is in a set if it has an arc connecting it to
another node in the set. 2) Connected component labeling is used in binary
and gray scale image processing to join together neighboring pixels into
regions. There are several efficient sequential algorithms for this
procedure. In this image, the pixels in each connected component have a
different color [JKS:2.5.2]:

connectivity: See pixel connectivity. [JKS:2.5.1]

conservative smoothing: A noise filtering technique whose name derives
from the fact that it employs a fast filtering algorithm that sacrifices
noise suppression power to preserve the image detail. A simple form of
conservative smoothing replaces a pixel that is larger (smaller) than its
8-connected neighbors by the largest

(smallest) value amongst those neighbors. This process works well with
impulse noise but is not as effective with Gaussian noise.

constrained least squares: It is sometimes useful to minimize
$\|A\vec{x} - \vec{b}\|^2$ over some subset of possible solutions
$\vec{x}$ that are predetermined. For example, one may already know the
function values at certain points on the parameterized curve. This leads
to an equality constrained version of the least squares problem, stated
as: minimize $\|A\vec{x} - \vec{b}\|^2$ subject to $B\vec{x} = \vec{c}$.
There are several approaches to the solution of this problem such as QR
factorization and the SVD. As an example, this regression technique can be
useful in least squares surface fitting where the plane described by
$\vec{x}$ is constrained to be perpendicular to some other plane.

constrained matching: A generic term for recognition approaches where two
objects are compared under a constraint on either or both. One example of
this would be a search for moving vehicles under 20 feet in length.

constrained optimization: Optimization of a function $f$ subject to
constraints on the parameters of the function. The general problem is to
find the $x$ that minimizes (or maximizes) $f(x)$ subject to $g(x) = 0$
and $h(x) \ge 0$, where the functions $f$, $g$, $h$ may all take
vector-valued arguments, and $g$ and $h$ may also be vector-valued,
encoding multiple constraints to be satisfied. Optimization subject to
equality constraints is achieved by the method of Lagrange multipliers.
Optimization of a quadratic form subject to equality constraints results
in a generalized eigensystem. Optimization of a general $f$ subject to
general $g$ and $h$ may be achieved by iterative methods, most notably
sequential quadratic programming. [WP:Constraint optimization]

constraint satisfaction: An approach to problem solving that consists of
three components: 1) a list of what variables need values, 2) a set of
allowable values for each variable and 3) a set of relationships that must
hold between the values for each variable (i.e., the constraints). For
example, in computer vision, this approach has been used for different
structure labelling (e.g., line labelling, region labelling) and geometric
model recovery tasks (e.g., reverse engineering of 3D parts or buildings
from range data). [WP:Constraint satisfaction]

constructive solid geometry (CSG): A method for defining 3D shapes in
terms of a mathematically defined set of primitive shapes. Boolean set
theoretic operations of intersection, union and difference are used to
combine shapes to make more complex shapes. For example [JKS:15.3.2]:

[Figure: a difference of two shapes, laid out as shape - shape = result]

content based image retrieval: Image database searching methods that
produce matches based on the contents of the images in the database, as
contrasted with using text descriptors to do the indexing. For example,
one can use descriptors based on color moments to select images with
similar invariants. [WP:Content-based image retrieval]

context: In vision, the elements, information, or knowledge occurring
together with or accompanying some data, contributing to the data's full
meaning. For example, in a video

sequence one can speak of spatial context of a pixel, indicating the intensities at surrounding locations in a given frame (image), or of temporal context, indicating the intensities at that pixel location (same coordinates) but in previous and following frames. Information deprived of appropriate context can be ambiguous: for instance, differential optical flow methods can only estimate the normal flow; the full flow can be estimated considering the spatial context of each pixel. At the level of scene understanding, knowing that the image data comes from a theater performance provides context information that can help distinguish between a real fight and a stage act. [DH:2.11]

contextual image classification: Algorithms that take into account the source or setting of images in their search for features and relationships in the image. Often this context is composed of region identifiers, color, topology and spatial relationships as well as task-specific knowledge.

contextual method: Algorithms that take into account the spatial arrangement of found features in their search for new ones.

continuous convolution: The convolution of two continuous signals. In 2D image processing terms the convolution of two images $f$ and $h$ is:

$$g(x, y) = f(x, y) \otimes h(x, y) = \int\!\!\int f(u, v)\, h(x - u, y - v)\, du\, dv$$

continuous Fourier transform: See Fourier transform. [NA:2.3]

continuous learning: A general term describing how a system continually updates its model of a process based on current data. For example, updating a background model (for change detection) as the illumination changes during the day.

contour analysis: Analysis of outlines of image regions.

contour following: See contour linking. [DH:7.7]

contour grouping: See contour linking.

contour length: The length of a contour in appropriate units of measurement. For instance, the length of an image contour in pixels. See also arc length. [WP:Arc length]

contour linking: Edge detection or boundary detection processes typically identify pixels on the boundary of a region. Connecting these pixels to form a curve is the goal of contour linking.

contour matching: See curve matching.

contour partitioning: See curve segmentation.

contour representation: See boundary representation.

contour tracing: See contour linking.

contour tracking: See contour linking.

contours: See object contour. [FP:19.2.1]

contrast: 1) The difference in brightness values between two structures, such as regions or pixels. 2) A texture measure. In a gray scale image, contrast, $C$, is defined as

$$C = \sum_i \sum_j (i - j)^2 P[i, j]$$

where $P$ is the gray-level co-occurrence matrix. [JKS:7.2]

contrast enhancement: Contrast enhancement (also known as contrast stretching) expands the distribution of intensity values in an image so that a larger range of sensitivity in the output device can be used. This can make subtle changes in an image more obvious by increasing the displayed contrast between image brightness levels. Histogram equalization is one method of contrast enhancement. An example of contrast enhancement is

[Figure: input image; after contrast enhancement]

contrast stretching: See contrast enhancement. [ERD:2.2.1]

control strategy: The guidelines behind the sequence of processes performed by an automatic image analysis or scene understanding system. For instance, control can be top-down (searching for image data that verifies an expected target) or bottom-up (progressively acting on image data or results to derive hypotheses). The control strategy may allow selection of alternative hypotheses, processes or parameter values, etc.

convex hull: Given a set of points, $S$, the convex hull is the smallest convex set that contains $S$. A 2D example is shown here [ERD:6.6]:

[Figure: convex hull]

convexity ratio: Also known as solidity. A measure that characterizes deviations from convexity. The ratio for shape $X$ is defined as $\frac{\mathrm{area}(X)}{\mathrm{area}(C_X)}$, where $C_X$ is the convex hull of $X$. A convex figure has convexity factor 1, while all other figures have convexity less than 1.

convolution operator: A widely used general image and signal processing operator that computes the weighted sum $y(j) = \sum_i w(i)\, x(j - i)$ where $w(i)$ are the weights, $x(i)$ is the input signal and $y(j)$ is the result. Similarly, convolutions of image data take the form $y(r, c) = \sum_{i,j} w(i, j)\, x(r - i, c - j)$. Similar forms using integrals exist for continuous signals and images. By the

appropriate choice of the weight values, convolution can compute low pass/smoothing, high pass/differentiation filtering or template matching/matched filtering, as well as many other linear functions. The right image below is the result of convolving (and then inverting) the left image with a [+1 -1] mask [FP:7.1.1]:

[Figure: left image and right image, the result of the convolution]

co-occurrence matrix: A representation commonly used in texture analysis algorithms. It records the likelihood (usually empirical) of two features or properties being at a given position relative to each other. For example, if the center of the matrix $M$ is position $(a, b)$ then the likelihood that the given property is observed at an offset $(i, j)$ from the current pixel is given by matrix value $M(a + i, b + j)$. [WP:Co-occurrence matrix]

cooperative algorithm: An algorithm that solves a problem by a series of local interactions between adjacent structures, rather than some global process that has access to all data. The value at a structure changes iteratively in response to changing values at the adjacent structures, such as pixels, lines, regions, etc. The expectation is that the process will converge to a good solution. The algorithms are well suited for massive local parallelism (e.g., SIMD), and are sometimes proposed as models for human image processing. An early algorithm to solve the stereo correspondence problem used cooperative processing between elements representing the disparity at a given picture element.

coordinate system: A spanning set of linearly independent vectors defining a vector space. One example is the set generally referred to as the X, Y and Z axes. There are, of course, an infinite number of sets of three linearly independent vectors describing 3D space. The right-handed version of this is shown in the figure. [FP:2.1.1]

coordinate system transformation: A geometric transformation that maps points, vectors or other structures from one coordinate system to another. It is also used to express the relationship between two coordinate systems. Typical transformations include translation and rotation. See also Euclidean transformation. [WP:Coordinate system#Transformations]

coplanarity: The property of lying in the same plane. For example, three vectors $\vec{a}$, $\vec{b}$ and $\vec{c}$ are coplanar if their scalar triple product $(\vec{a} \times \vec{b}) \cdot \vec{c}$ is zero. [WP:Coplanarity]

coplanarity invariant: A projective invariant that allows one to determine when five corresponding points observed in two (or more) views are coplanar in the 3D space. The five points allow the construction of a set of four collinear points whose cross ratio value can be computed. If the five points are coplanar, then the cross ratio value must be the same in the two

views. Here, point A is selected and the lines AB, AC, AD and AE are used to define an invariant cross ratio for any line L that intersects them:

[Figure: lines AB, AC, AD and AE through point A, intersected by a line L]

core line: See medial line.

corner detection: See curve segmentation. [NA:4.6]

corner feature detectors: See interest point feature detectors and curve segmentation. [NA:4.6.4]

coronary angiography: A class of image processing techniques (usually based on X-ray data) for visualizing and inspecting the blood vessels surrounding the heart (coronaries). See also angiography. [WP:Coronary catheterization]

correlation: See cross correlation. [OF:6.4]

correlation based optical flow estimation: Optical flow estimated by correlating local image texture at each point in two or more images and noting their relative movement.

correlation based stereo: Dense stereo reconstruction (i.e., at every pixel) computed by cross correlating local image neighborhoods in the two images to find corresponding points, from which depth can be computed by stereo triangulation.

correspondence constraint: See stereo correspondence constraint.

correspondence problem: See stereo correspondence problem. [JKS:11.2]

cosine diffuser: Optical correction mechanism for correcting spatial responsivity to light. Since off-angle light is treated with the same response as normal light, a cosine transfer is used to decrease the relative responsivity to it.

cosine transform: Representation of a signal in terms of a basis of cosine functions. For an even 1D function $f(x)$, the cosine transform is

$$F(u) = 2 \int_0^\infty f(x) \cos(2\pi u x)\, dx.$$

For a sampled signal $f_{0..(n-1)}$, the discrete cosine transform is the vector $b_{0..(n-1)}$ where, for $k \geq 1$:

$$b_0 = \sqrt{\frac{1}{n}} \sum_{i=0}^{n-1} f_i$$

$$b_k = \sqrt{\frac{2}{n}} \sum_{i=0}^{n-1} f_i \cos\left(\frac{\pi (2i + 1) k}{2n}\right)$$

For a 2D signal $f(x, y)$ the cosine transform $F(u, v)$ is

$$F(u, v) = 4 \int_0^\infty\!\!\int_0^\infty f(x, y) \cos(2\pi u x) \cos(2\pi v y)\, dx\, dy$$

[SEU:2.5.2]

cost function: The function or metric quantifying the cost of a certain action, move or configuration, that is to be minimized over a given parameter space. A key concept of optimization. See also Newton's optimization method and functional optimization. [HZ:3.2]
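The discrete cosine transform in the cosine transform entry above can be sketched directly from its two sums. This is a minimal, unoptimized illustration; the function names `dct` and `idct` are hypothetical, and the inverse is included only as a check, relying on the fact that this normalization makes the transform orthonormal.

```python
import math

def dct(f):
    """Discrete cosine transform b[0..n-1] of a sampled signal f[0..n-1]."""
    n = len(f)
    b = [math.sqrt(1.0 / n) * sum(f)]              # b_0
    for k in range(1, n):                          # b_k for k >= 1
        s = sum(f[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        b.append(math.sqrt(2.0 / n) * s)
    return b

def idct(b):
    """Inverse of dct(); valid because the transform above is orthonormal."""
    n = len(b)
    f = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * b[0]
        s += sum(math.sqrt(2.0 / n) * b[k]
                 * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                 for k in range(1, n))
        f.append(s)
    return f

# A constant signal puts all of its energy into b_0.
coefficients = dct([1.0, 1.0, 1.0, 1.0])
```

The direct sums cost O(n^2) operations; practical implementations use fast algorithms instead.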

covariance: The covariance, denoted $\sigma^2$, of a random variable $X$ is the expected value of the square of the deviation of the variable from the mean. If $\mu$ is the mean, then $\sigma^2 = E[(X - \mu)^2]$. For a $d$-dimensional data set represented as a set of $n$ column vectors $\vec{x}_{1..n}$, the sample mean is $\vec{\mu} = \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i$, and the sample covariance is the $d \times d$ matrix $\Sigma = \frac{1}{n-1} \sum_{i=1}^{n} (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^\top$. [DH:2.7.2]

covariance propagation: A method of statistical error analysis, in which the covariance of a derived variable can be estimated from the covariances of the variables from which it is derived. For example, assume that independent variables $\vec{x}$ and $\vec{y}$ are sampled from multi-variate normal distributions with associated covariance matrices $C_x$ and $C_y$. Then, the covariance of the derived variable $\vec{z} = a\vec{x} + b\vec{y}$ is $C_z = a^2 C_x + b^2 C_y$. [HZ:4.2]

crack code: A contour description method that codes not the pixels themselves but the cracks between them. This is done as a four-directional scheme as shown below. It can be viewed as a chain code with four directions rather than eight. [WP:Chain code]

[Figure: the four crack directions, 0 (up), 1 (right), 2 (down) and 3 (left), with an example contour whose crack code = { 2, 2, 1, 2, 3, 2 }]

crack edge: A type of edge used in line labeling research to represent where two aligned blocks meet. Here, neither a step edge nor fold edge is seen:

[Figure: CRACK EDGE]

crack following: Edge tracking on the dual lattice or cracks between pixels based on the continuous segments of line from a crack code.

Crimmins smoothing operator: An iterative algorithm for speckle (salt-and-pepper noise) reduction. It uses a nonlinear noise reduction technique that compares the intensity of each image pixel with its eight neighbors and either increments or decrements the value to try and make it more representative of its surroundings. The algorithm raises the intensity of pixels that are darker relative to their neighbors and lowers pixels that are relatively brighter. More iterations produce more reduction in noise but at the cost of increased blurring of detail.

critical motion: In the problem of self-calibration of a moving camera, there are certain motions for which calibration algorithms fail to give unique solutions. Sequences for which self-calibration is not possible are known as critical motion sequences.

cross correlation: Standard method of estimating the degree to which two series are correlated. Given two series $\{x_i\}$ and $\{y_i\}$, where $i = 0, 1, 2, \ldots, (N - 1)$, the cross correlation, $r_d$, at a delay $d$ is defined as

$$r_d = \frac{\sum_i (x_i - m_x)(y_{i-d} - m_y)}{\sqrt{\sum_i (x_i - m_x)^2}\, \sqrt{\sum_i (y_{i-d} - m_y)^2}}$$

where $m_x$ and $m_y$ are the means of the corresponding sequences. [EH:11.3.4]
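The cross correlation formula above can be evaluated with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and summing only over the indices for which $y_{i-d}$ exists is an assumed way of handling the series boundaries, which the entry does not specify.

```python
import math

def cross_correlation(x, y, d):
    """Cross correlation r_d of series x and y at delay d.

    Assumption: only indices i for which y[i - d] exists are summed.
    """
    mx = sum(x) / len(x)                  # mean of x
    my = sum(y) / len(y)                  # mean of y
    idx = [i for i in range(len(x)) if 0 <= i - d < len(y)]
    num = sum((x[i] - mx) * (y[i - d] - my) for i in idx)
    den = (math.sqrt(sum((x[i] - mx) ** 2 for i in idx))
           * math.sqrt(sum((y[i - d] - my) ** 2 for i in idx)))
    return num / den if den else 0.0

# y is x advanced by one sample, so the best match is at delay d = 1.
x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
y = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
best_delay = max(range(-2, 3), key=lambda d: cross_correlation(x, y, d))
```

Scanning a range of delays and keeping the one with the highest score is the basic step behind correlation based matching.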

cross correlation matching: Matching based on the cross correlation of two sets. The closer the correlation is to 1, the better the match is. For example, in correlation based stereo, for each pixel in the first image, the corresponding pixel in the second image is the one with the highest correlation score, where the sets being matched are the local neighborhoods of each pixel. [NA:5.3.1]

cross ratio: The simplest projective invariant. It generates a scalar from four points of any 1D projective space (e.g., a projective line). The cross ratio for the four points ABCD below is [FP:13.1]:

$$\frac{(r + s)(s + t)}{s(r + s + t)}$$

[Figure: four collinear points A, B, C and D, separated by the distances r, s and t]

cross section function: Part of the generalized cylinder representation that gives a volumetric based representation of an object. The representation defines the volume by a curved axis, a cross section and a cross section function at each point on that axis. The cross section function defines how the size or shape of the cross section varies as a function of its position along the axis. See also generalized cone. This example shows how the size of the square cross section varies along a straight line to create a truncated pyramid:

[Figure: CROSS SECTION, AXIS, CROSS SECTION FUNCTION, TRUNCATED PYRAMID]

cross-validation: A test of how well a model generalizes to other data (i.e., using samples other than those that were used to create the model). This approach can be used to determine when to stop training/learning, before over-generalization occurs. See also leave-one-out test. [FP:16.3.5]

crossing number: The crossing number of a graph is the minimum number of arc intersections in any drawing of that graph. A planar graph has crossing number zero. This graph has a crossing number of one [ERD:6.8.1]:

[Figure: a graph with crossing number one]

CSG: See constructive solid geometry. [BT:8]

CT: See X-ray CAT. [WP:X-ray computed tomography]

cumulative histogram: A histogram where the bin contains not only the count of all instances having that value but also the count of all bins having a lower index value. This is the discrete equivalent of the cumulative probability distribution. The right figure is the cumulative histogram corresponding to

the normal histogram on the left:

[Figure: a normal histogram (left) and its cumulative histogram (right), each over bins 1 to 5]

[WP:Histogram#Cumulative histogram]

currency verification: Algorithms for checking that printed money and coinage are genuine. A specialist field involving optical character recognition.

curse of dimensionality: The exponential growth of possibilities as a function of dimensionality. This might manifest as several effects as the dimensionality increases: 1) the increased amount of computational effort required, 2) the exponentially increasing amount of data required to populate the data space in order that training works and 3) how all data points tend to become equidistant from each other, thus causing problems for clustering and machine learning. [WP:Curse of dimensionality]

cursive script recognition: Methods of optical character recognition whereby hand-written cursive (also called joined-up) characters are automatically classified. [BM:5.2]

curvature: Usually meant to refer to the change in shape of a curve or surface. Mathematically, the curvature of a curve is the length of the second derivative $\left| \frac{\partial^2 \vec{x}}{\partial s^2} \right|$ of the curve $\vec{x}(s)$ parameterized as a function of arc length $s$. A related definition holds for surfaces, only here there are two distinct principal curvatures at each point on a sufficiently smooth surface. [NA:4.6]

curvature primal sketch: A multi-scale representation of the significant changes in curvature along a planar curve. [NA:4.8]

curvature scale space: A multi-scale representation of the curvature zero-crossing points of a planar contour as it evolves during smoothing. It is found by parameterizing the contour using arc length, which is then convolved with a Gaussian filter of increasing standard deviation. Curvature zero-crossing points are then recovered and mapped to the scale-space image with the horizontal axis representing the arc length parameter on the original contour and the vertical axis representing the standard deviation of the Gaussian filter. [WP:Curvature Scale Space]

curvature sign patch classification: A method of local surface classification based on its mean and Gaussian curvature signs, or principal curvature sign class. See also mean and Gaussian curvature shape classification.

curve: A set of connected points in 2D or 3D, where each point has at most two neighbors. The curve could be defined by a set of connected points, by an implicit function (e.g., $y + x^2 = 0$), by an explicit form (e.g., $(t, t^2)$ for all $t$), or by the intersection of two surfaces (e.g., by intersecting the planes $X = 0$ and $Y = 0$), etc. [NA:4.6.2]

curve binormal: The vector perpendicular to both the tangent and normal vectors to a curve at any given point:

[Figure: a curve with its TANGENT, NORMAL and BINORMAL vectors at a point]

curve bitangent: A line tangent to a curve or surface at two different points, as illustrated here: [WP:Bitangent]

[Figure: a curve with its INFLECTION POINTS, BITANGENT POINTS and bitangent LINE marked]

curve evolution: A curve abstraction method whereby a curve can be iteratively simplified, as in this

[Figure: curve evolution]

For example, a relevance measure is assigned to every vertex in the curve. The least important can be removed at each iteration by directly connecting its neighbors. This elimination is repeated until the desired stage of abstraction is reached. Another method of curve evolution is to progressively smooth the curve with Gaussian weighting of increasing standard deviation.

curve fitting: Methods for finding the parameters of a best-fit curve through a set of 2D (or 3D) data points. This is often posed as a minimization of the least-squares error between some hypothesized curve and the data points. If the curve, $y(x)$, can be thought of as the sum of a set of $m$ arbitrary basis functions, $X_k$, and written

$$y(x) = \sum_{k=1}^{m} a_k X_k(x)$$

then the unknown parameters are the weights $a_k$. The curve fitting process can then be considered as the minimization of some log-likelihood function giving the best fit to $N$ points whose Gaussian error has standard deviation $\sigma_i$. This function may be defined as

$$\chi^2 = \sum_{i=1}^{N} \left[ \frac{y_i - y(x_i)}{\sigma_i} \right]^2$$

The weights that minimize this can be found from the design matrix $D$

$$D_{i,j} = \frac{X_j(x_i)}{\sigma_i}$$

by finding the solution to the linear equation

$$D\vec{a} = \vec{r}$$

where the vector $r_i = \frac{y_i}{\sigma_i}$. [NA:4.6.2]

curve inflection: A point on a curve where the curvature is zero as it changes sign from positive to negative, as in the two examples below [FP:19.1.1]:

[Figure: a curve with its INFLECTION POINTS and BITANGENT POINTS marked]
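The curve fitting entry above reduces to a small linear algebra problem once the design matrix is built. The following is a minimal sketch under stated assumptions: the function name `fit_curve` is hypothetical, and solving the overdetermined system through the normal equations with Gaussian elimination is one simple choice made here for illustration (QR or SVD based solvers are numerically safer).

```python
def fit_curve(xs, ys, sigmas, basis):
    """Weights a_k minimizing chi^2 for y(x) = sum_k a_k * basis[k](x)."""
    # Design matrix D[i][j] = X_j(x_i) / sigma_i and vector r[i] = y_i / sigma_i.
    D = [[X(x) / s for X in basis] for x, s in zip(xs, sigmas)]
    r = [y / s for y, s in zip(ys, sigmas)]
    m = len(basis)
    # Normal equations (D^T D) a = D^T r, stored as an augmented m x (m+1) matrix.
    A = [[sum(row[j] * row[k] for row in D) for k in range(m)]
         + [sum(row[j] * ri for row, ri in zip(D, r))]
         for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for c in range(m):
        p = max(range(c, m), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, m):
            factor = A[i][c] / A[c][c]
            for k in range(c, m + 1):
                A[i][k] -= factor * A[c][k]
    # Back substitution.
    a = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][k] * a[k] for k in range(i + 1, m))
        a[i] = (A[i][m] - tail) / A[i][i]
    return a

# Fit a quadratic using the basis functions 1, x and x^2.
quadratic_basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
```

Points with a larger $\sigma_i$ are scaled down in both $D$ and $\vec{r}$, so they pull less on the fitted weights.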

curve invariant: Measures taken over a curve that remain invariant under certain transformations, e.g., arc length and curvature are invariant under Euclidean transformations.

curve invariant point: A point on a curve that has a geometric property that is invariant to changes in projective transformation. Thus, the point can be identified and used for correspondence in multiple views of the same scene. Two well known planar curve invariant points are curvature inflection points and bitangent points, as shown here:

[Figure: a curve with its INFLECTION POINTS, BITANGENT POINTS and bitangent LINE marked]

curve matching: The comparison of data sets to previously modeled curves or other curve data sets. If a modeled curve closely corresponds to a data set then an interpretation of similarity can be made. Curve matching differs from curve fitting in that curve fitting involves minimizing an error over the parameters of theoretical models rather than comparing against actual examples.

curve normal: The vector perpendicular to the tangent vector to a curve at any given point and that also lies in the plane that locally contains the curve at that point:

[Figure: curve normal vector]

curve representation system: Methods of representing or modeling curves parametrically. Examples include: b-splines, crack codes, cross section functions, Fourier descriptors, intrinsic equations, polycurves, polygonal approximations, radius vector functions, snakes, splines, etc. [JKS:6.1]

curve saliency: A voting method for the detection of curves in a 2D or 3D image. Each pixel is convolved with a curve mask to build a saliency map. This map will hold high values for locations in space where likely candidates for curves exist.

curve segmentation: Methods of identifying and splitting curves into different primitive types. The location of changes between one primitive type and another is particularly important. For example, a good curve segmentation algorithm should detect the four lines that make up a square. Methods include: corner detection, Lowe's method and recursive splitting.

curve smoothing: Methods for rounding polygon approximations or vertex-based approximations of surface boundaries. Examples include Bézier curves in 2D and NURBS in 3D. See also curve evolution. An example of a polygonal data curve smoothed by a Bézier curve is:



[Figure: data curve and smoothed curve]

curve tangent vector: The vector that is instantaneously parallel to a curve at any given point:

[Figure: curve tangent vector]

cut detection: The identification of the frames in film or video where the camera viewpoint suddenly changes, either to a new viewpoint within the current scene or to a new scene. [WP:Shot transition detection]

cyclopean view: A term used in stereo image analysis, based on the mythical one-eyed Cyclops. When stereo reconstruction of a scene occurs based on two cameras, one has to consider what coordinate system to use as the basis for the reconstructed 3D coordinates, or what viewpoint to use when presenting the reconstruction. The cyclopean viewpoint is located at the midpoint of the baseline between the two cameras.

cylinder extraction: Methods of identifying the cylinders and the constituent data points from 2.5D and 3D images that are samples from 3D cylinders.

cylinder patch extraction: Given a range image or a set of 3D data points, cylinder patch extraction finds (usually connected) sets of points that lie on the surface of a cylinder, and usually also the equation of that cylinder. This process is useful for detecting and modelling pipework in range images of industrial scenes.

cylindrical mosaic: A photomosaicing approach where individual 2D images are projected onto a cylinder. This is possible only when the camera rotates about a single axis or the camera center of projection remains approximately fixed with respect to the distance to the nearest scene points.

cylindrical surface region: A region of a surface that is locally cylindrical. A region in which all points have zero Gaussian curvature, and nonzero mean curvature.
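The test in the cylindrical surface region entry above can be sketched from the two principal curvatures at a point, using the standard relations that the Gaussian curvature is their product and the mean curvature is their average. The function name and the tolerance `tol` are assumptions made for this illustration; with real, noisy curvature estimates the tolerance would need tuning.

```python
def is_cylindrical(k1, k2, tol=1e-6):
    """True if principal curvatures k1, k2 describe a locally cylindrical point."""
    gaussian = k1 * k2              # Gaussian curvature K
    mean = 0.5 * (k1 + k2)          # mean curvature H
    return abs(gaussian) <= tol and abs(mean) > tol

# A cylinder of radius 2 has principal curvatures 0 and 1/2.
on_cylinder = is_cylindrical(0.0, 0.5)
```

A plane (both curvatures zero) and a sphere (both curvatures equal and nonzero) both fail the test, for different reasons.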

darkfield illumination: A specialized illumination technique that uses oblique illumination to enhance contrast in subjects that are not imaged well under normal illumination conditions. [LG:2.1.1]

data fusion: See sensor fusion. [WP:Data fusion]

data integration: See sensor fusion. [WP:Data integration]

data parallelism: Reference to the parallel structuring of either the input to programs, the organization of programs themselves or the programming language used. Data parallelism is a useful model for much image processing because the same operation can be applied independently and in parallel at all pixels in the image. [RJS:8]

data reduction: A general term for processes that 1) reduce the number of data points, e.g., by subsampling or by using cluster centers of mass as representative points or by decimation, or 2) reduce the number of dimensions in each data point, e.g., by projection or principal component analysis (PCA). [WP:Data reduction]

data structure: A fundamental concept in programming: a collection of computer data organized in a precise structure, for instance a tree (see for instance quadtree), a queue, or a stack. Data structures are accompanied by sets of procedures, or libraries, implementing various types of data manipulation, for instance storage and indexing. [WP:Data structure]

DCT: See discrete cosine transform. [SEU:2.5.2]
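The two kinds of data reduction named in the data reduction entry above can be illustrated with three small helpers. The function names are hypothetical and the sketch assumes points stored as tuples of numbers; PCA is omitted because it needs an eigen-decomposition, so plain coordinate projection stands in for the dimension-reducing case.

```python
def subsample(points, step):
    """Reduce the number of points by keeping one out of every `step`."""
    return points[::step]

def center_of_mass(cluster):
    """Represent a cluster of d-dimensional points by its center of mass."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n
                 for k in range(len(cluster[0])))

def project(points, keep):
    """Reduce dimensions by keeping only the coordinates listed in `keep`."""
    return [tuple(p[k] for k in keep) for p in points]

# Drop the z coordinate of two 3D points.
flat = project([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], keep=[0, 1])
```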


deblur: To remove the effect of a known blurring function on an image. If an observed image $I$ is the convolution of an unknown image $I'$ and a known blurring kernel $B$, so that $I = I' \otimes B$, then deblurring is the process of computing $I'$ given $I$ and $B$. See deconvolution, image restoration, Wiener filtering.

decentering distortion (lens): Lens decentering is a common cause of tangential distortion. It arises when the lens elements are not perfectly aligned and creates an asymmetric component to the distortion. [WP:Distortion (optics)#Software correction]

decimation: 1) In digital signal processing, a filter that keeps one sample out of every $N$, where $N$ is a fixed number. See also subsampling. 2) Mesh decimation: merging of similar adjacent surface patches or mesh vertices in order to reduce the size of a model. Often used as a processing step when deriving a surface model from a range image. [WP:Decimation (signal processing)]

decision tree: Tools for helping to choose between several courses of action. They are an effective structure within which an agent can search options and investigate the possible outcomes. They also help to balance the risks and rewards associated with each possible course of action. [WP:Decision tree]

[Figure: a decision tree, with labels Rule, Decisions made, Decisions and Results]

decoding: Converting a signal that has been encoded back into its original form (lossless coding) or into a form close to the original (lossy coding). See also image compression. [WP:Decoding]

decomposable filters: A complex filter that can be applied as a number of simpler filters applied one after the other. For example the 2D Laplacian of Gaussian filter can be decomposed into four simpler filters.

deconvolution: The inverse process of convolution. Deconvolution is used to remove certain signals (for example blurring) from images by inverse filtering (see deblur). For a convolution producing image $h = f \otimes g + \eta$, where $f$ is the image, $g$ is the convolution mask, $\eta$ is the noise and $\otimes$ is the convolution, deconvolution attempts to estimate $f$. Deconvolution is often an ill-posed problem and may not have a unique solution. See also image restoration. [AL:14.5]

defocus: Blurring of an image, either accidental or deliberate, by incorrect focus or viewpoint parameters use or estimation. See also shape from focus, shape from defocus. [BKPH:6.10]

defocus blur: Deformation of an image due to the predictable behavior

of optics when incorrectly adjusted. The blurring is the result of light rays that, after entering the optical system, misconverge on the imaging plane. If the camera parameters are known in advance, the blurring can be partially corrected. [BKPH:6.10]

deformable model: Object descriptors that model a specific class of deformable objects (e.g., eyes, hands) where the shapes vary according to the values of the parameters. If the general, but not specific, characteristics of an object type are known then a deformable model can be constructed and used as a matching template for new data. The degree of deformation needed to match the shape can be used as a matching score. See also modal deformable model, geometric deformable model. [WP:Active contour model]

deformable shape: See deformable model.

deformable superquadric: A type of superquadric volumetric model that can be deformed by bending, twisting, etc. in order to fit to the data being modeled.

deformable template model: See deformable model.

deformation energy: The metric that must be minimized when determining an active shape model. It comprises terms for both internal energy (or force) arising from the model shape deformation and external energy (or force) arising from the discrepancy between the model shape and the data. [WP:Internal energy#Description and definition]

degradation: A loss of quality suffered by an image, the content of which gets corrupted by unwanted processes. For instance, MPEG compression-decompression can alter some intensities, so that the image is degraded. See also JPEG image compression, image noise. [WP:Degradation (telecommunications)]

degree of freedom: A free variable in a given function. For instance, rotations in 3D space depend on three angles, so that a rotation matrix has nine entries but only three degrees of freedom. [VSN:3.1.3]

Delaunay triangulation: The Delaunay graph of the point set can be constructed from its Voronoi diagram by connecting the points in adjacent polygons. The connections form the Delaunay triangulation. The triangulation has the property that the circumcircle of every triangle contains no other points. The approach can be used to construct a polyhedral surface approximation from a set of 3D sample points. The solid lines connecting the points below are the Delaunay triangulation and the dashed lines are the boundaries of the Voronoi diagram. [OF:10.4.4]

[Figure: a Delaunay triangulation (solid lines) and the corresponding Voronoi diagram (dashed lines)]

demon: A program that runs in the background, for instance performing checks or guaranteeing the correct functioning of a module of a complex system. [WP:Daemon (computer software)]
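The empty circumcircle property stated in the Delaunay triangulation entry above can be checked directly for a given 2D triangulation. This is a brute-force verification sketch, not a construction algorithm; the function names are hypothetical, and the inside test uses the standard in-circle determinant, corrected for triangle orientation.

```python
def in_circumcircle(a, b, c, p):
    """True if point p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det flips with the orientation of (a, b, c).
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0

def is_delaunay(points, triangles):
    """Check that no triangle's circumcircle contains another point."""
    for i, j, k in triangles:
        for n, p in enumerate(points):
            if n not in (i, j, k) and in_circumcircle(
                    points[i], points[j], points[k], p):
                return False
    return True

# A kite-shaped point set: splitting along the short diagonal is Delaunay.
kite = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
short_split = [(0, 2, 3), (1, 2, 3)]
long_split = [(0, 1, 2), (0, 1, 3)]
```

The check costs one determinant per triangle and point pair, which is fine for verification but far too slow as a way of building a triangulation.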

demosaicing: The process of converting a single color per pixel image (as captured by most digital cameras) into a three color per pixel image. [WP:Demosaicing]

Dempster-Shafer: A belief modeling approach for testing a hypothesis that allows information, in the form of beliefs, to be combined into a plausibility measure for that hypothesis. [WP:Dempster-Shafer theory]

dense reconstruction: A class of techniques estimating depth at each pixel of an input image or sequence, thus generating a dense sampling of the 3D surfaces imaged. This can be achieved, for instance, by range sensing, or stereo vision.

dense stereo matching: A class of methods establishing the correspondence (see stereo correspondence problem) between all pixels in a stereo pair of images. The generated disparity map can then be used for depth estimation.

densitometry: A class of techniques that estimate the density of a material from images, for instance bone density in the medical domain (bone densitometry). [WP:Densitometry]

depth: Distance of scene points from either the camera center or the camera imaging plane. In a range image, the intensity value in the image is a measure of depth. [JKS:13.1]

depth estimation: The process of estimating the distance between a sensor (e.g., a stereo pair) and a part of the scene being imaged. Stereo vision and range sensing are two well-known ways to estimate depth.

depth from defocus: The depth from defocus method uses the direct relationships among the depth, camera parameters and the amount of blurring in images to derive the depths from parameters that can be directly measured.

depth from focus: A method to determine distance to one point by taking many images in better and better focus. This is also called autofocus or software focus. [WP:Depth of focus]

depth image: See range image. [JKS:11]

depth image edge detector: See range image edge detector.

depth map: See range image. [JKS:11]

depth of field: The distance between the nearest and the farthest point in focus for a given camera [JKS:8.3]:

[Figure: Depth of field, the distance between the nearest point in focus and the furthest point in focus]

depth perception: The ability to perceive distances from visual stimuli, for instance motion or stereo vision. [WP:Depth perception]

[Figure: a 3D model seen in View 1 and View 2]

depth sensor: See range sensor . dichroic filter: A dichroic filter

[ BT:8] selectively transmits light of a given
wavelength. [ WP:Dichroic filter]
Deriche edge detector: Convolution
filter for edge finding similar to the dichromatic model: The dichromatic
Canny edge detector . Deriche uses a model states that the light reflected
different optimal operator where the from a surface is the sum of two
filter is assumed to have infinite extent. components, body and interface
The resulting convolution filter is reflectance. Body reflectance follows
sharper than the derivative of the Lambert's law. Interface reflectance
Gaussian that Canny uses models highlights. The model has been
applied to several computer vision tasks
$f(x) = A x e^{-\alpha |x|}$ including color constancy , shape
recovery and color image segmentation .
See also edge detection . See also color .

derivative based search: Numerical difference image: An image

optimization methods assuming that computed as pixelwise difference of two
the gradient can be estimated. An other images, that is, each pixel in the
example is the quasi-Newton approach, difference image is the difference
that attempts to generate an estimate between the pixels at the same location
of the inverse Hessian matrix. This is in the two input images. For example,
then used to determine the next in the figure below the right image is
iteration point. the difference of the left and middle
images (after adding 128 for display
purposes). [ RJS:5]
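
The difference image entry can be illustrated with a minimal sketch. It assumes 8-bit gray-scale images stored as nested lists; the function name `difference_image` and the clipping to 0..255 are illustrative choices, not something the entry prescribes.

```python
def difference_image(a, b, offset=128):
    """Pixelwise a - b, shifted by `offset` and clipped to 0..255 for display."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    return [
        [max(0, min(255, pa - pb + offset)) for pa, pb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


# Two tiny 2x2 example images (made-up values).
left = [[10, 200], [50, 50]]
middle = [[10, 100], [60, 50]]
diff = difference_image(left, middle)
```

Unchanged pixels map to the offset value (mid-gray), so positive and negative differences both remain visible.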


diffeomorphism: A differentiable
one-to-one map between manifolds.
The map has a differentiable inverse.
[ WP:Diffeomorphism]

difference-of-Gaussians operator:
A convolution operator used to locate
edges in a gray-scale image using an
Conjugate gradient search approximation to the
Laplacian of Gaussian operator. In 2D
DFT: See discrete Fourier transform . the convolution mask is:
[ SEU:2.5.1]
$c_1 e^{-(x^2+y^2)/\sigma_1^2} - c_2 e^{-(x^2+y^2)/\sigma_2^2}$

diagram analysis: Syntactic analysis

of images of line drawings, possibly where the constants c1 and c2 control
with text in a report or other the height of the individual Gaussians
document. This field is closely related and $\sigma_1$, $\sigma_2$ are the standard deviations.
to the analysis of visual languages. [ CS:4.5.4]
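
As a rough illustration of the difference-of-Gaussians entry, the sketch below samples a 2D mask on a small grid. It assumes the mask has the form c1 exp(-(x^2+y^2)/sigma1^2) - c2 exp(-(x^2+y^2)/sigma2^2); the function name `dog_mask` and the example parameter values are hypothetical.

```python
import math


def dog_mask(size, sigma1, sigma2, c1=1.0, c2=1.0):
    """Sample a difference of two Gaussians on a size x size grid centered at 0."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the mask has a center pixel")
    half = size // 2
    mask = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            narrow = c1 * math.exp(-r2 / sigma1 ** 2)  # first Gaussian
            wide = c2 * math.exp(-r2 / sigma2 ** 2)    # second Gaussian
            row.append(narrow - wide)
        mask.append(row)
    return mask


mask = dog_mask(5, sigma1=1.0, sigma2=2.0, c1=1.0, c2=0.25)
```

Convolving an image with such a mask and looking for zero crossings is one common way to use it for edge location; the constants would normally be tuned to the application.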

[Figure: Diffraction, showing the source and the light banding at m=0, m=1 and m=2]

differential geometry: A field of
mathematics studying the local
derivative-based properties of curves
and surfaces, for instance tangent plane
and curvature . [ TV:A.5]

differential invariant: Image
descriptors that are invariant under


geometric transformations as well as

illumination changes. Invariant diffuse illumination: Light energy
descriptors are generally classified as that comes from a multitude of
global invariants (corresponding to directions, hence not causing significant
object primitives) and local invariants shading or shadow effects. The opposite
(typically based on derivatives of the of diffuse illumination is
image function). The image function is directed illumination .
always assumed to be continuous and
differentiable. diffuse reflection: Scattering of light
[ WP:Differential invariant] by a surface in many directions. Ideal
Lambertian diffusion results in the
differential pulse code modulation: same energy being reflected in every
A technique for converting an analogue direction regardless of the direction of
signal to binary by sampling it, the incoming light energy.
expressing the value of the sampled [ WP:Diffuse reflection]
data modulation in binary and then
reducing the bit rate by taking account
of the fact that consecutive samples do
not change much. [ AJ:11.3]
[Figure: diffuse reflection, showing the reflected light]

differentiation filtering: See

gradient filter.

diffraction: The bending of light rays diffusion smoothing: A technique

at the edge of an object or through a achieving Gaussian smoothing as the
transparent medium. The amount by solution of a diffusion equation with the
which a ray is bent is dependent on image to be filtered as the initial
wavelength. [ VSN:2.1.4] boundary condition. The advantage is
diffraction grating: An array of that, unlike repeated averaging,
diffracting elements that has the effect diffusion smoothing allows the
of producing periodic alterations in a construction of a continuous
wave's phase, amplitude or both. The scale space .
simplest arrangement is an array of slits digital camera: A camera in which
(see moire interferometry ). the image sensing surface is made up of
[ WP:Diffraction grating] individual semiconductor sampling
elements (typically one per pixel of the
image), and quantized versions of the
sensed values are recorded when an
image is captured .
[ WP:Digital camera]

digital elevation map: A sampled blood vessels are made more visible by
and quantized map where every point using an X-ray contrast medium. See
represents a height above a reference also medical image registration .
ground plane (i.e., the elevation). [ WP:Digital subtraction angiography]

digital terrain map: See

digital elevation map .

digital topology: Topology (i.e., how

things are connected/arranged) in a
digital domain (e.g., in a
digital image). See also connectivity .
[ WP:Digital topology]

digital watermarking: The process

of embedding a signature/watermark
digital geometry: Geometry (points,
into digital data. In the domain of
lines, angles, surfaces, etc.) in a
digital images this is most normally
sampled and quantized domain.
done for copyright protection. The
[ WP:Digital geometry]
digital watermark may be invisible or
digital image: Any sampled and visible (as shown).
quantized image . [ SEU:1.7] [ WP:Digital watermarking]

41 43 45 51 56 49 45 40
56 48 65 85 55 52 44 46
59 77 99 81 127 83 46 56
52 116 44 54 55 186 163 163
51 129 46 48 71 164 86 97
50 85 192 140 167 99 51 44
57 63 91 126 102 56 54 49
146 169 213 246 243 139 180 163
41 44 54 56 47 45 36 54 digitization: The process of making a
sampled digital version of some analog
digital image processing: signal (such as an image).
Image processing restricted to the [ WP:Digitizing]
domain of digital images .
[ WP:Digital image processing] dihedral edge: The edge made by two
planar surfaces. A fold in a surface:
digital signal processor: A class of
co-processors designed to execute
processing operations on digitized
signals efficiently. A common
characteristic is the provision of a fast
multiply and accumulate function, e.g.,
a ← a + b × c.
[ WP:Digital signal processor]
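
The multiply and accumulate operation mentioned in the digital signal processor entry can be sketched in software as follows. This is only an illustration of the arithmetic pattern (a DSP would execute each step in hardware); the function name `multiply_accumulate` is an assumption.

```python
def multiply_accumulate(samples, coefficients):
    """Sum of pairwise products, built from repeated multiply-accumulate steps."""
    acc = 0
    for b, c in zip(samples, coefficients):
        acc = acc + b * c  # one multiply-accumulate step
    return acc


result = multiply_accumulate([1, 2, 3], [4, 5, 6])
```

Filtering and transform operations on digitized signals reduce largely to loops of this shape, which is why a fast version of this single step matters.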

digital subtraction angiography: A dilate operator: The operation of

basic technique used in medical image expanding a binary or gray-scale
processing to detect, visualize and object with respect to the background .
inspect blood vessels, based on the This has the effect of filling in any
subtraction of a background image from small holes in the object(s) and joining
the target image, usually where the any object regions that are close

together. Most frequently described as discontinuity preserving

a morphological transformation , and is regularization: A method for
the dual of the erode operator . preserving edges (discontinuities) from
[ SEU:2.4.6] being blurred as a result of some
regularization operation (such as the
recovery of a dense disparity map from
a sparse set of disparities computed at
matching feature points).

discontinuous event tracking:

Tracking of events (such as a moving
person) through a sequence of images.
The discontinuous nature of the
dimensionality: The number of tracking is caused by the distance that
dimensions that need to be considered. a person (or hand, arm, etc.) can travel
For example 3D object location is often between frames and also by the
considered as a seven dimensional possibility of occlusion (or
problem (three dimensions for position, self-occlusion).
three for orientation and one for the
object scale). [ SQ:18.3.2]

direct least square fitting: Direct

fitting of a model to some data by a
method that has a closed form or
globally convergent solution.

directed illumination: Light energy

that comes from a particular direction discrete cosine transform (DCT):
hence causing relatively sharp shadows. A transformation that converts digital
The opposite of this form of images into the frequency domain in
illumination is diffuse illumination . terms of the coefficients of discrete
cosine functions. Used, for example,
directional derivative: A derivative within JPEG image compression .
taken in a specific direction, for [ SEU:2.5.2]
instance, the component of the
gradient along one coordinate axis. discrete Fourier transform (DFT):
The images on the right are the vertical A version of the Fourier transform for
and horizontal directional derivatives of sampled data. [ SEU:2.5.1]
the image on the left.
discrete relaxation: A technique for
[ WP:Directional derivative]
labeling objects in which the possible
type of each object is iteratively
constrained based on relationships with
other objects in the scene. The aim is
to obtain a globally consistent
interpretation (if possible) from locally
consistent relationships.

discontinuity detection: See discrimination function: A binary

edge detection . function separating data into two
classes. See classifier . [ DH:2.5.1]

applied to binary images in which every

disparity: The image distance shifted object point is transformed into a value
between corresponding points in stereo representing the distance from the
image pairs. [ JKS:11.1] point to the nearest object boundary.
[Figure: left image features, right image features and the resulting disparity]
This operation is also referred to as
chamfering (see chamfer matching ).
[ JKS:2.5.9]
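
A minimal sketch of a distance transform on a binary image follows. It assumes object pixels are 1 and background pixels are 0, and it uses the city-block distance computed with a two-pass propagation scheme, which is one common choice rather than the only one; the function name `distance_transform` is hypothetical.

```python
def distance_transform(binary):
    """City-block distance from each object pixel (1) to the nearest background pixel (0)."""
    rows, cols = len(binary), len(binary[0])
    inf = rows + cols  # larger than any city-block distance inside the image
    dist = [[0 if binary[r][c] == 0 else inf for c in range(cols)]
            for r in range(rows)]
    # Forward pass: propagate distances from the top and left neighbors.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbors.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
result = distance_transform(image)
```

Background pixels keep the value 0, and the values grow toward the interior of the object.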

disparity gradient: The gradient of a

disparity map for a stereo pair, that
estimates the surface slope at each
image point. See also binocular stereo .
[ OF:6.2.5]
[Figure: example grid of distance transform values]

disparity gradient limit: The

maximum allowed disparity gradient in
a potential stereo feature match.
distortion coefficient: A coefficient
disparity limit: The maximum
in a given image distortion model, for
allowed disparity in a potential stereo
instance k1 , k2 in the
feature match. The notion of a
distortion polynomial . See also
disparity limit is supported by evidence
pincushion distortion , barrel distortion
from the human visual system.
dispersion: Scattering of light by the
distortion polynomial: A polynomial
medium through which it is traveling.
model of radial lens distortion . A
[ WP:Dispersion (optics)]
common example is
distance function: See $x = x_d (1 + k_1 r^2 + k_2 r^4)$,
distance metric . [ JKS:2.5.8] $y = y_d (1 + k_1 r^2 + k_2 r^4)$. Here, x, y are
the undistorted image coordinates,
distance map: See range image . $x_d$, $y_d$ are the distorted image
[ JKS:11] coordinates, $r^2 = x_d^2 + y_d^2$, and $k_1$, $k_2$
are the distortion coefficients . Usually
distance metric: A measure of how k2 is significantly smaller than k1 , and
far apart two things are in terms of can be set to 0 in cases where high
physical distance or similarity. A metric accuracy is not required.
can be other functions besides the
standard Euclidean distance , such as distortion suppression: Correction
the algebraic or Mahalanobis of image distortions (such as
distances. A true metric must satisfy: non-linearities introduced by a lens).
1) d(x, y) + d(y, z) ≥ d(x, z), 2) See geometric distortion and
d(x, y) = d(y, x), 3) d(x, x) = 0 and 4) geometric transformation .
d(x, y) = 0 implies x = y, but computer
vision processes often use functions that dithering: A technique simulating the
do not satisfy all of these criteria. appearance of different shades or colors
[ JKS:2.5.8] by varying the pattern of black and
white (or different color) dots. This is a
distance transform: An common task for inkjet printers.
image processing operation normally [ AL:4.3.5]

character recognition and

document mosaicing ).

document mosaicing:
Image mosaicing of documents.

document retrieval: Identification of

a document in a database of scanned
documents based on some criteria.
divide and conquer: A technique for [ WP:Document retrieval]
solving problems efficiently by
subdividing the problem into smaller DoG: See difference of Gaussians .
subproblems, and then recursively [ CS:4.5.4]
solving these subproblems in the
expectation that the smaller problems dominant plane: A degenerate case
will be easier to solve. An example is encountered in uncalibrated
an algorithm for deriving a structure and motion recovery where
polygonal approximation of a contour most or all of the tracked
in which a straight line estimate is image features are coplanar in the
recursively split in the middle (into two scene.
segments with the midpoint put exactly
Doppler: A physics phenomenon
on the contour) until the distance
whereby an instrument receiving
between the polygonal representation
acoustic or electromagnetic waves from
and the actual contour is below some
a source in relative motion measures an
increasing frequency if the source is
[ WP:Divide and conquer algorithm]
approaching, and decreasing if receding.
The acoustic Doppler effect is employed
in sonar sensors to estimate target
velocity as well as position.
[ WP:Doppler effect]

[Figure: polygonal approximation by divide and conquer, from the initial estimate to the final estimate]
downhill simplex: A method for
finding a local minimum using a
divisive clustering: simplex (a geometrical figure specified
Clustering/cluster analysis in which all by N + 1 vertices) to bound the
items are initially considered as a single optimal position in an N -dimensional
set (cluster) and subsequently divided space. See also optimization .
into component subsets (clusters). [ WP:Nelder-Mead method]

DIVX: An MPEG 4 based video DSP: See digital signal processor .

compression technology aiming to [ WP:Digital signal processing]
achieve sufficiently high compression to
enable transfer of digital video contents dual of the image of the absolute
over the Internet, while maintaining conic (DIAC): If $\omega$ is the matrix
high visual quality. [ WP:DivX] representing the image of the
absolute conic , then $\omega^{-1}$ represents its
document analysis: A general term dual (DIAC). Calibration constraints
describing operations that attempt to are sometimes more readily expressed
derive information from documents in terms of the DIAC than the IAC.
(including for example [ HZ:7.5]

duality: The property of two concepts

or theories having similar properties
that can be applied to the one or to the
other. For instance, several relations
linking points in a projective space are
formally the same as those linking lines
in a projective space; such relations are
dual. [ OF:2.4.1]
dynamic scene: A scene in which
dynamic appearance model: A some objects move, in contrast to the
model describing the changing common assumption in
appearance of an object/scene over shape from motion that the scene is
time. rigid and only the camera is moving.

dynamic programming: An dynamic stereo: Stereo vision for a

approach to numerical optimization in moving observer. This allows
which an optimal solution is searched shape from motion techniques to be
by keeping several competing partial used in addition to the stereo
paths throughout and pruning techniques.
alternative paths that reach the same
point with a suboptimal value. dynamic time warping: A technique
[ VSN:7.2.2] for matching a sequence of observations
(usually one per time sample) to a
dynamic range: The ratio of the model sequence of features, where the
brightest and darkest values in an hope is for a one-to-one match of
image. Most digital images have a observations to features. But, because
dynamic range of around 100:1 but of variations in rate at which
humans can perceive detail in dark observations are produced, some
regions when the range is even 10,000:1. features may get skipped or others
To allow for this we can create high matched to more than one observation.
dynamic range images. [ SQ:4.2.1] The usual goal is to minimize the
amount of skipping or multiple samples
matched (time warping). Efficient
algorithms to solve this problem exist
based on the linear ordering of the
sequences. See also
hidden Markov models (HMM) .
[ WP:Dynamic time warping]
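
The dynamic time warping entry can be illustrated with the classic dynamic programming recurrence. The sketch assumes scalar observations and features and minimizes the summed absolute difference along the warping path, which is a common formulation; it does not add an explicit penalty for skipped or repeated samples. The function name `dtw_cost` is hypothetical.

```python
def dtw_cost(observations, model):
    """Minimum accumulated |o - m| over monotonic alignments of two sequences."""
    n, m = len(observations), len(model)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i observations with the first j features.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(observations[i - 1] - model[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # one-to-one match
                cost[i - 1][j],      # feature matched to more than one observation
                cost[i][j - 1],      # observation matched to more than one feature
            )
    return cost[n][m]


same_shape = dtw_cost([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_cost([0, 0], [1, 1])
```

The table is filled in order along both sequences, which is where the linear ordering mentioned in the entry makes the search efficient.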

early vision: A general term referring length of any orthogonal chord.

to the initial stages of computer vision [ WP:Eccentricity (mathematics)]
(i.e., image capture and
image processing ). Also known as

low level vision . [ BKPH:1.4]
earth mover's distance: A metric for
comparing two distributions by
evaluating the minimum cost of
transforming one distribution into the
other (e.g., it can be applied to
color histogram matching ).
[ FP:25.2.2]
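
For one-dimensional histograms with equally spaced bins, the earth mover's distance has a simple closed form: the sum of absolute differences between the two cumulative distributions. The sketch below covers only that special case, with a unit cost per bin moved as an assumption; the function name `earth_movers_distance_1d` is hypothetical.

```python
def earth_movers_distance_1d(hist_a, hist_b):
    """Earth mover's distance between two 1D histograms with unit bin spacing."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")
    carried = 0.0  # mass that still has to be moved past the current bin
    work = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a / total_a - b / total_b
        work += abs(carried)
    return work


far = earth_movers_distance_1d([1, 0, 0], [0, 0, 1])
same = earth_movers_distance_1d([0, 1, 0], [0, 2, 0])
```

Both histograms are normalized to unit mass first, so only their shapes are compared. General (multi-dimensional) cases require solving a transportation problem instead.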
[Figure: earth mover's distance, showing Distribution 1, Distribution 2 and the transformation between them]
echocardiography: Cardiac
ultrasonography (echocardiography) is
a non-invasive technique for imaging
the heart and surrounding structures.
Generally used to evaluate cardiac
chamber size, wall thickness, wall
motion, valve configuration and motion
eccentricity: A shape representation and the proximal great vessels.
that measures how non-circular a shape [ WP:Echocardiography]
is. One way of computing this is to take
the ratio of the maximum chord length edge: A sharp variation of the
of the shape to the maximum chord intensity function. Represented by its

position, the magnitude of the intensity

gradient, and the direction of the
maximum intensity variation. [ FP:8]

edge based segmentation:

Segmentation of an image based on the
edges detected.
edge finding: See edge detection .
edge based stereo: A type of [ FP:8.3]
feature based stereo where the features
used are edges . [ VSN:7.2.2] edge following: See edge tracking .
[ FP:8.3.2]
edge detection: An image processing
operation that computes edge vectors edge gradient image: See
(gradient and orientation) for every edge image . [ WP:Image gradient]
point in an image. The first stage of
edge based segmentation . [ FP:8.3] edge grouping: See edge tracking .

edge image: An image where every

pixel represents an edge or the
edge magnitude .

edge linking: See edge tracking .

[ AJ:9.4]

edge magnitude: A measure of the

contrast at an edge, typically the
magnitude of the intensity gradient at
the edge point. See also edge detection,
edge point . [ JKS:5.1]

edge direction: The direction edge matching: See curve matching .

perpendicular to the normal to an [ BKPH:13.9.3]
edge, that is, the direction along the
edge motion: The motion of edges
edge, parallel to the lines of constant
through a sequence of images. See also
intensity. Alternatively, the normal
shape from motion and the
direction to the edge, i.e., the direction
aperture problem . [ JKS:14.2.1]
of maximum intensity change
(gradient). See also edge detection , edge orientation: See edge direction .
edge point . [ TV:4.2.2] [ TV:4.2.2]
edge enhancement: An edge point: 1) A location in an image
image enhancement operation that where some quantity (e.g., intensity)
makes the gradient of edges steeper. changes rapidly. 2) A location where
This can be achieved, for example, by the gradient is greater than some
adding some multiple of a Laplacian threshold. [ FP:8]
convolved version of the image L(i, j)
to the image g(i, j). edge preserving smoothing: A
$f(i, j) = g(i, j) + \lambda L(i, j)$ where $f(i, j)$ smoothing filter that is designed to
is the enhanced image and $\lambda$ is some preserve the edges in the image while
constant. [ RJS:4] reducing image noise . For example see

median filter . of A are images of faces. These vectors

[ WP:Edge-preserving smoothing] can be used for face recognition .
[ WP:Eigenface]

eigenspace based recognition:

Recognition based on an
eigenspace representation . [ TV:10.4]
edge sharpening: See eigenspace representation: See
edge enhancement . [ RJS:4] principal component representation.
[ TV:10.4.2]
edge tracking: 1) The grouping of
edges into chains of significant edges. eigenvalue: A scalar $\lambda$ that for a
The second stage of matrix A satisfies $Ax = \lambda x$ where x is a
edge based segmentation . Also known nonzero vector ( eigenvector ).
as edge following , edge grouping and [ SQ:2.2.3]
edge linking . 2) Tracking how the edge
moves in a video sequence. [ ERD:4] eigenvector: A non-zero vector x that
for a matrix A satisfies $Ax = \lambda x$ where
edge type labeling: Classification of $\lambda$ is a scalar (the eigenvalue ).
edge points or edges into a limited [ SQ:2.2.3]
number of types (e.g., fold edge ,
shadow edge, occluding edge, etc.). eigenvector projection: Projection
[ ERD:6.11] onto the PCA basis vectors.
[ SQ:13.1.4]
EGI: See extended Gaussian image .
[ FP:20.3] electromagnetic spectrum: The
entire range of frequencies of
egomotion: The motion of the electromagnetic waves including X-rays,
observer with respect to the observed ultraviolet, visible light, infrared,
scene. [ FP:17.5.1] microwave and radio waves. [ EH:3.6]
[Figure: the electromagnetic spectrum by wavelength (in meters), from 10^-12 to 10^4: X rays, ultraviolet, visible, infrared, microwave and radio]
egomotion estimation:
Determination of the motion of a
camera. Generally based on image
features corresponding to static objects
in the scene. See also

structure and motion . A typical image

pair where the camera position is to be ellipse fitting: Fitting of an ellipse
estimated is: [ WP:Egomotion] model to the boundary of some shape,
data points, etc. [ TV:5.3]
[Figure: images from Position A and Position B, with the motion of the observer between the two positions]
eigenface: An eigenvector determined ellipsoid: A 3D volume in which all
from a matrix A in which the columns plane cross sections are ellipses or

circles. An ellipsoid is the set of points MPEG and JPEG image compression .
(x, y, z) satisfying $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$. [ WP:Code]
Ellipsoids are used in computer vision
as a basic shape primitive and can be endoscope: An instrument for visually
combined with other primitives in order examining the interior of various bodily
to describe a complex shape. [ SQ:9.9] organs. See also fiberscope .
[ WP:Endoscopy]
elliptic snake: An
active contour model of an ellipse energy minimization: The problem
whose parameters are estimated of determining the absolute minimum
through energy minimization from an of a multivariate function representing
initial position. (by a potential energy-like penalty) the
distance of a potential solution from
elongatedness: A the optimal solution. It is a
shape representation that measures specialization of the optimization
how long a shape is with respect to its problem. Two popular minimization
width (i.e., the ratio of the length of algorithms in computer vision are the
the bounding box to its width), as Levenberg-Marquardt and Newton
illustrated below. See also eccentricity . optimization methods.
[ WP:Elongatedness] [ WP:Energy minimization]

entropy: 1. Colloquially, the amount
of disorder in a system. 2. A measure
of the information content of a
random variable X. Given that X has
a set of possible values or outcomes $\mathcal{X}$,
with probabilities $\{P(x), x \in \mathcal{X}\}$, the
entropy H(X) of X is defined as

$H(X) = -\sum_{x \in \mathcal{X}} P(x) \log P(x)$
EM: See expectation maximization .
[ FP:16.1.2]
with the understanding that
empirical evaluation: Evaluation of 0 log 0 := 0. For a multivariate
computer vision algorithms in order to distribution, the joint entropy H(X, Y )
characterize their performance by of X, Y is
comparing the results of several
algorithms on standardized test
problems. Careful evaluation is a
difficult research problem in its own right.

$H(X, Y) = -\sum_{x} \sum_{y} P(x, y) \log P(x, y)$

encoding: Converting a digital signal,

represented as a set of values, from one For a set of values represented as a
form to another, often to compress the histogram , the entropy of the set may
signal. In lossy encoding, information is be defined as the entropy of the
lost in the process and the decoding probability distribution function
algorithm cannot recover it. See also represented by the histogram.

epipolar plane image (EPI): An

image that shows how a particular line
from a camera changes as the camera
position is changed such that the image
line remains on the same epipolar plane
. Each line in the EPI is a copy of the
relevant line from the camera at a
different time. Features that are distant
Left: $-p \log p$ as a function of p. from the camera will remain in the
Probabilities near 0 and 1 signal low same position in each line, and features
entropy, probabilities between are more that are close to the camera will move
entropic. Right: The entropy of the from line to line (the closer the feature
gray scale histograms in some windows the further it will move). [ AL:17.3.4]
on an image. [ AJ:2.13]
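
The entropy of a set of values represented as a histogram can be computed directly from the bin counts. The sketch below assumes non-negative counts and uses base-2 logarithms (entropy in bits) by default; the function name `histogram_entropy` is hypothetical.

```python
import math


def histogram_entropy(counts, base=2.0):
    """Entropy of the probability distribution obtained by normalizing `counts`."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one sample")
    entropy = 0.0
    for count in counts:
        if count > 0:  # empty bins contribute nothing, since 0 log 0 := 0
            p = count / total
            entropy -= p * math.log(p, base)
    return entropy


uniform = histogram_entropy([1, 1, 1, 1])
single_bin = histogram_entropy([4, 0, 0, 0])
```

A flat histogram gives the largest entropy for a given number of bins, while a histogram concentrated in one bin gives zero.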
Image 1 Image 8
epipolar constraint: A geometric
constraint reducing the dimensionality
of the stereo correspondence problem .
For any point in one image, the possible
matching points in the other image are
constrained to lie on a line known as
EPI from 8 images for highlighted line:
the epipolar line . This constraint may
be described mathematically using the
fundamental matrix . See also
epipolar geometry . [ FP:10.1.1] epipolar plane image analysis: An
approach to determining
epipolar correspondence matching: shape from motion in which epipolar
Stereo matching using the plane images (EPIs) are analyzed. The
epipolar constraint . slope of lines in an EPI is proportional
to the distance of the object from the
epipolar geometry: The geometric
camera, where vertical lines
relationship between two
correspond to features at infinity.
perspective cameras . [ FP:10.1.1]
[ AL:17.3.4]

[Figure: epipolar geometry of Camera 1 and Camera 2, showing the real world point, optical centers, image planes, image points and epipolar lines]
epipolar plane motion: See
epipolar plane image analysis .
epipolar rectification: The
image rectification of stereo images so
that the epipolar lines are aligned with
epipolar line: The intersection of the the image rows (or columns).
epipolar plane with the image plane .
See also epipolar constraint . epipolar transfer: The transfer of
[ FP:10.1.1] corresponding epipolar lines in a stereo
pair of images, defined by a
epipolar plane: The plane defined by homography . See also stereo and
any real world scene point together stereo vision . [ FP:10.1.4]
with the optical centers of two
cameras. [ FP:10.1.1] epipole: The point through which all
epipolar lines from a camera appear to

pass. See also epipolar geometry . function of the translation and rotation
[ FP:10.1.1] of the camera in the world reference
frame. See also the fundamental matrix .
[ FP:10.1.2]
[Figure: an image with its epipolar lines]

Euclidean distance: The geometric

distance between two points $(x_1, y_1)$
and $(x_2, y_2)$, i.e.,
$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. For
n-dimensional vectors $\vec{x}_1$ and $\vec{x}_2$, the
distance is $\left(\sum_{i=1}^{n} (x_{1,i} - x_{2,i})^2\right)^{1/2}$.
epipole location: The operation of
[ SQ:9.1]
locating the epipoles . [ OF:]
Euclidean reconstruction: 3D
equalization: See
reconstruction of a scene using a
histogram equalization . [ JKS:4.1]
Euclidean frame of reference, as
erode operator: The operation of opposed to an affine reconstruction or
reducing a binary or gray scale object projective reconstruction . The most
with respect to the background . This complete reconstruction achievable. For
has the effect of removing any isolated example, using stereo vision .
object regions and separating any
Euclidean space: A representation of
object regions that are only connected
the space of all n-tuples (where n is the
by a thin section. Most frequently
dimensionality ). For example the three
described as a
dimensional Euclidean space (X, Y, Z)
morphological transformation and is
is typically used to describe the real
the dual of the dilate operator .
world. Also known as Cartesian space
[ AL:8.2]
(see Cartesian coordinates ).
[ WP:Euclidean space]

Euclidean transformation: A
transformation that operates in
Euclidean space (i.e., maintaining the
Euclidean spatial arrangements).
error propagation: 1) The Examples include rotation and
propagation of errors resulting from one translation. Often applied to
computation to the next computation. homogeneous coordinates. [ FP:2.1.2]
2) The estimation of the error (e.g., [ SQ:7.3]
variance) of a process based on the Euler angle: The Euler angles
estimates of the error in the input data are a particular set of three angles
and intermediate computations. describing rotations in three
[ WP:Propagation of uncertainty] dimensional space. [ JKS:12.2.1]
essential matrix: In Euler-Lagrange: The
binocular stereo, a matrix E expressing Euler-Lagrange equations are the basic
a bilinear constraint between equations in the calculus of variations ,
corresponding image points $u$, $u'$ in a branch of calculus concerned with
camera coordinates: $u'^{T} E u = 0$. This maxima and minima of definite
constraint is the basis for several integrals. They occur, for instance, in
reconstruction algorithms. E is a

Lagrangian mechanics and have been method works well even when there are
used in computer vision for a variety of missing values. [ FP:16.1.2]
optimizations, including for surface
interpolation. See also expectation value: The mean value
variational approach and of a function (i.e., the average expected
variational problem . [ TV:9.4.2] value). If p(x) is the probability density
function of a random variable x, the
Euler number: The number of expectation of x is $\bar{x} = \int p(x)\, x\, dx$.
contiguous parts (regions) less the [ VSN:A2.2]
number of holes. Also known as the
genus. [ AJ:9.10] expert system: A system that uses
available knowledge and heuristics to
even field: The first of the two fields solve problems. See also
in an interlaced video signal. knowledge based vision . [ AL:11.2]
[ AJ:11.1]
exponential smoothing: A method
even function: for predicting a data value (Pt+1 ) based
A function where $f(x) = f(-x)$ for all x.
[ WP:Even and odd functions#Even functions] and the previous prediction (Pt ).
$P_{t+1} = \alpha D_t + (1 - \alpha) P_t$ where $\alpha$ is a
weighting value between 0 and 1.
event analysis: See [ WP:Exponential smoothing]
event understanding .
[ WP:Event study]
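
The exponential smoothing entry above describes a one-line recurrence: the next prediction is a weighted mix of the latest observation and the previous prediction, with a weight between 0 and 1. A minimal sketch follows; the function name `exponential_smoothing` and the choice to seed the first prediction with the first observation are assumptions.

```python
def exponential_smoothing(observations, alpha, initial_prediction=None):
    """Return the predictions for the step after each observation."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if not observations:
        return []
    # Assumption: with no prior prediction, start from the first observation.
    prediction = observations[0] if initial_prediction is None else initial_prediction
    predictions = []
    for observed in observations:
        prediction = alpha * observed + (1.0 - alpha) * prediction
        predictions.append(prediction)
    return predictions


follow = exponential_smoothing([2.0, 4.0, 6.0], alpha=1.0)
smooth = exponential_smoothing([0.0, 10.0, 10.0], alpha=0.5)
```

With a weight of 1.0 the prediction simply repeats the last observation; smaller weights react more slowly and smooth out noise.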

event detection: Analysis of a

sequence of images to detect activities
in the scene.
[Figure: an image from a sequence of images and the movement detected in the image]
[Figure: exponential smoothing predictions $P_t$ for $\alpha = 1.0$ and $\alpha = 0.5$ over samples 0 to 9]

event understanding: Recognition of

an event (such as a person walking) in exponential transformation: See
a sequence of images. Based on the pixel exponential operator .
data provided by event detection .
[ WP:Event study] expression understanding: See
facial expression analysis .
exhaustive matching: Matching
where all possibilities are considered. extended Gaussian image (EGI):
As an alternative see Use of a Gaussian sphere for
hypothesize and verify . histogramming surface normals. Each
surface normal is considered from the
expectation maximization (EM): A center of the sphere and the value
method of finding a maximum associated with the surface patch with
likelihood estimate of some parameters which it intersects is incremented.
based on a sample data set. This [ FP:20.3]

and an active shape model that is part

of the model's deformation energy .
This measure is used to deform the
model to the image data. [ SQ:8.5.1]

extremal point: Points that lie on the

boundary of the smallest convex region
extended light source: A enclosing a set of points (i.e., that lie
light source that has a significant size on the convex hull ). [ SOS:4.6.1]
relative to the scene, i.e., is not
extrinsic parameters: See
approximated well by a
exterior orientation . [ TV:2.4.2]
point light source . In other words this
type of light source has a diameter and eye location: The task of finding eyes
hence can produce fuzzy shadows. in images of faces. Approaches include
Contrast with: point light sources . blink detection, face feature detection ,
[ BKPH:10.5] etc.

eye tracking: Tracking the position of

No shadow the eyes in a face image sequence. Also,
Light Source
Fuzzy shadow tracking the gaze direction .
Complete shadow
[ WP:Eye tracking]

exterior orientation: The position of

a camera in a global coordinate system.
That which is determined by an
absolute orientation calculation.
[ FP:3.4]

external energy (or force): A

measure of fit between the image data

face analysis: A general term covering the analysis of face images and models. Often used to refer to facial expression analysis.

face authentication: Verification that (the image of) a face corresponds to a particular individual. This differs from face recognition in that here only the model of a single person is considered. [WP:Facial recognition system]

face detection: Identification of faces within an image or series of images. This often involves a combination of human motion analysis and skin color analysis. [WP:Face detection]

face feature detection: The location of features (such as eyes, nose, mouth) from a human face. Normally performed after face detection although it can be used as part of face detection. [WP:Face detection]

face identification: See face recognition. [WP:Facial recognition system]

face indexing: Indexing from a database of known faces as a precursor to face recognition.

face modeling: Representing a face using some type of model typically derived from an image (or images). These models are used in face authentication, face recognition,

face recognition: The task of recognizing a face from an image as an instance of a person recorded in a database of faces. [WP:Facial recognition system]

face tracking: Tracking of a face in a sequence of images. Often used as part of a human-computer interface. [F. J. Huang, and T. Chen, Tracking of multiple faces for human-computer interfaces and virtual environments, IEEE Int. Conf. on Multimedia and Expo, Vol. 3, pp 1563-1566, 2000.]

face verification: See face authentication. [WP:Facial recognition system]

facet model based extraction: The extraction of a model based on facets (small simple surfaces; e.g., see planar facet model) from range data. See also planar patch extraction.

facial animation: The way in which facial expressions change. See also face feature detection. [WP:Computer facial animation]

facial expression analysis: Study or identification of the facial expression(s) of a person from an image or sequence of images.

[Figure: faces with happy, perplexed and surprised expressions]

factorization: See motion factorization. [TV:8.5.1]

false alarm: See false positive. [TV:A.1]

false negative: A binary classifier c(x) returns + or - for examples x. A false negative occurs when the classifier returns - for an example that is in reality +. [TV:A.1]

false positive: A binary classifier c(x) returns + or - for examples x. A false positive occurs when the classifier returns + for an example that is in reality -. [TV:A.1]

fast Fourier transform (FFT): A version of the Fourier transform for discrete samples that is significantly more efficient (order $N \log_2 N$) than the standard discrete Fourier transform (which is order $N^2$) on data sets with $N$ points. [AL:13.5]

fast marching method: A type of level set method in which the search can move in only one direction (hence making it faster). [WP:Fast marching method]

feature: 1) A distinctive part of something (e.g., the nose and eyes are distinctive features of the face), or an attribute derived from an object/shape

(e.g., circularity). See also image feature. 2) A numerical property (possibly combined with others to form a feature vector) and generally used in a classifier. [TV:4.1]

feature based optical flow estimation: Calculation of optical flow in a sequence of images from image features.

feature based stereo: A solution to the stereo correspondence problem in which image features are compared from the two images. The main alternative approach is correlation based stereo.

feature based tracking: Tracking the motion of image features through a sequence. [TV:8.4.2]

feature contrast: The difference between two features. This can be measured in many domains (e.g., intensity, orientation, etc.). [SEU:2.6.1]

feature detection: Identification of given features in an image (or model). For example see corner detection. [SEU:2.6]

feature extraction: See feature detection. [SEU:2.6]

feature location: See feature detection. [SEU:2.6]

feature matching: Matching of image features in several images of the same object (for instance, feature based stereo), or of features from an unknown object with features from known objects (feature based recognition). [TV:8.4.2]

feature orientation: The orientation of an image feature with respect to the image frame of reference.

feature point: The image location at which a particular feature is found.

feature point correspondence: Matching feature points in two or more images. The assumption is that the feature points are the image of the same scene point. Having the correspondence allows the estimation of the depth from binocular stereo, fundamental matrix, homography or trifocal tensor in the case of 3D scene structure recovery or of the 3D target motion in the case of target tracking. [TV:8.4.2]

feature point tracking: Tracking of individual image features in a sequence of images.

feature selection: Selection of suitable features (properties) for a specific task, for example, classification. Typically features should be independent, detectable, discriminatory and reliable. [FP:22.3]

feature similarity: How much two features resemble each other. Measures of feature similarity are required for feature based stereo, feature based tracking, feature matching, etc. [SEU:2.6.1]

feature space: The dimensions of a feature space are the feature (property) values of a given problem. An object or shape is mapped to feature space by computing the values of the set of features defining the space, typically for recognition and classification. In the example below, different shapes are mapped to a 2D feature space defined by area and rectangularity. [SEU:2.6.1]

[Figure: Feret's diameter]

feature stabilization: A technique for stabilizing the position of an image feature in an image sequence so that it remains in a particular position on a display (allowing/causing the rest of the image to move relative to that feature).

[Figure: original sequence]

FERET: A standard database of face images with a defined experimental protocol for the testing and comparison of face recognition algorithms. [WP:FERET database]

FFT: See fast Fourier transform. [AL:13.5]
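
To illustrate why the fast Fourier transform is cheaper than the direct discrete Fourier transform, here is a small sketch that implements both and can be used to check that they agree. It assumes the recursive radix-2 (Cooley-Tukey) variant, which requires a power-of-two length; the function names and the test signal are illustrative choices, not taken from the dictionary.

```python
import cmath


def dft(samples):
    """Direct discrete Fourier transform: two nested sums, order N^2."""
    n = len(samples)
    return [sum(samples[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def fft(samples):
    """Recursive radix-2 FFT, order N log2 N; the length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of the even-indexed samples
    odd = fft(samples[1::2])   # transform of the odd-indexed samples
    result = [0j] * n
    for f in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * f / n) * odd[f]
        result[f] = even[f] + twiddle
        result[f + n // 2] = even[f] - twiddle
    return result


signal = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5]  # hypothetical samples
largest_difference = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
```

Each level of the recursion does order N work and there are log2 N levels, which is where the order N log2 N cost comes from.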

[Figure: stabilized sequence, with the stabilized feature marked]

feature tracking: See feature based tracking. [TV:8.4.2]

feature vector: A vector formed by the values of a number of image features (properties), typically all associated with the same object or image. [SEU:2.6.1]

feedback: The use of outputs from a system to control the system's actions. [WP:Feedback]

Feret's diameter: The distance between two parallel lines at the extremities of some shape that are tangential to the boundary of the shape. Maximum, minimum and mean values of Feret's diameter are often used (where every possible pair of parallel tangent lines is considered).

fiber optics: A medium for transmitting light that consists of very thin glass or plastic fibers. It can be used to provide much higher bandwidth for signals encoded as patterns of light pulses. Alternately, it can be used to transmit images directly through rigidly connected bundles of fibers, so as to see around corners, past obstacles, etc. [EH:5.6]

fiberscope: A flexible fiber optic instrument allowing parts of an object to be viewed that would normally be inaccessible. Most often used in medical examinations. [WP:Fiberscope]

fiducial point: A reference point for a given algorithm, e.g., a fixed, known, easily detectable pattern for a calibration algorithm.

figure-ground separation: The segmentation of the area of the image representing the object of interest (the figure) from the remainder of the image (the background).
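
For the Feret's diameter entry above, a rough sketch of the computation for a shape given as a list of boundary points: for one direction, the diameter is the extent of the points projected onto that direction, and sampling many directions gives approximate minimum, maximum and mean values. The function names, the 1-degree sampling and the rectangle are assumptions for illustration; an exact method would examine the convex hull rather than sampled angles.

```python
import math


def feret_diameter(points, angle):
    """Distance between the two parallel tangent lines whose normal is at `angle` radians."""
    nx, ny = math.cos(angle), math.sin(angle)
    projections = [x * nx + y * ny for x, y in points]
    return max(projections) - min(projections)


def feret_summary(points, steps=180):
    """Sample directions over a half turn; return (minimum, maximum, mean) diameters."""
    values = [feret_diameter(points, math.pi * i / steps) for i in range(steps)]
    return min(values), max(values), sum(values) / len(values)


rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # hypothetical 4 x 3 shape
smallest, largest, average = feret_summary(rectangle)
```

For this rectangle the minimum is the short side, the maximum approaches the diagonal, and the mean is close to the perimeter divided by pi, as expected for a convex shape.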

[Figure: image, figure and ground]

fingerprint indexing: See
fingerprint database indexing .

finite element model: A class of

numerical methods for solving
differential problems. Another relevant
class is finite difference methods.
[ WP:Finite element method]
figure of merit: Any scalar that is used to characterize the performance of an algorithm. [WP:Figure of merit]

filter: In general, any algorithm that transforms a signal into another. For instance, bandpass filters remove/reduce the parts of an input signal outside a given frequency interval; gradient filters allow only image gradients to pass through; smoothing filters attenuate high frequencies. [ERD:3]

filter ringing: A type of distortion caused by the application of a steep recursive filter. Normally this term applies to electronic filters in which certain components (e.g., capacitors and inductors) can store energy and later release it, but there are also digital equivalents to this effect.

finite impulse response filter (FIR): A filter that produces an output value ($y_n$) based on the current and past input values ($x_i$). $y_n = \sum_{i=0}^{p} a_i x_{n-i}$ where $a_i$ are weights. See also infinite impulse response filters. [AJ:2.3]

FIR: See finite impulse response filter. [AJ:2.3]

Firewire (IEEE 1394): A serial digital bus system supporting 400 Mbits per second. Power, control and data signals are carried in a single cable. The bus system makes it possible to address up to 64 cameras from a single interface card and multiple computers can acquire images from the same camera simultaneously. [WP:IEEE 1394]
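
For the finite impulse response filter entry above, a minimal sketch of the sum $y_n = \sum_{i=0}^{p} a_i x_{n-i}$ in Python. The function name and the choice to treat samples before the start of the signal as zero are assumptions for illustration.

```python
def fir_filter(inputs, weights):
    """Compute y_n = sum_i a_i * x_{n-i}, taking inputs before index 0 as zero."""
    outputs = []
    for n in range(len(inputs)):
        acc = 0.0
        for i, a in enumerate(weights):
            if n - i >= 0:  # skip taps that would reach before the first sample
                acc += a * inputs[n - i]
        outputs.append(acc)
    return outputs


two_point_average = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], [3.0, 2.0, 1.0])
```

Feeding a unit impulse through the filter returns the weights themselves and then zeros, which is the sense in which the impulse response is finite.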

filtering: Application of a filter. [BB:3.1]

fingerprint database indexing: Indexing into a database of fingerprints using a number of features derived from the fingerprints. This allows a smaller number of fingerprints to be considered when attempting fingerprint identification within the database.

first derivative filter: See gradient filter.

first fundamental form: See surface curvature. [FP:21.2.1]

Fisher linear discriminant (FLD): A classification method that maps high dimensional data into a single dimension in such a way as to maximize class separability. [DH:4.10]
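
As a hedged sketch of the Fisher linear discriminant entry above, the code below handles the simplest case: two classes of 2D points. It assumes the common closed form in which the projection direction is the inverse within-class scatter matrix applied to the difference of the class means; the helper names and sample points are illustrative only.

```python
def mean(points):
    """Component-wise mean of a list of 2D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points, centre):
    """2 x 2 scatter matrix of the points about `centre`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - centre[0], p[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Direction w = Sw^{-1} (mean_b - mean_a) for two classes of 2D points."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    # Closed-form inverse of the 2 x 2 matrix, applied to the mean difference
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]


def project(point, w):
    """Map a 2D point to a single dimension along w."""
    return point[0] * w[0] + point[1] * w[1]


class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]  # hypothetical samples
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]
w = fisher_direction(class_a, class_b)
```

Projecting every sample onto `w` gives one number per point; for these samples all class A projections fall below all class B projections, so a single threshold separates them.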

fingerprint identification: Identification of an individual through comparison of an unknown fingerprint (or fingerprints) with previously known fingerprints. [WP:Automated fingerprint identification]

fisheye lens: See wide angle lens. [WP:Fisheye lens]

flat field: 1) An object of uniform color, used for photometric calibration of optical systems. 2) A camera system is flat field correct if the gray scale

output at each pixel is the same for a given light input. [AJ:4.4]

flexible template: A model of a shape in which the relative position of points is not fixed (e.g., defined in probabilistic form). This approach allows for variations in the appearance of the shape.

FLIR: Forward Looking Infrared. An infrared system mounted on a vehicle looking ahead along the direction of travel. [WP:Forward looking infrared]

[Figure: infrared sensor]

flow field: See optical flow field. [OF:9.2]

flow histogram: A histogram of the optical flow in an image sequence. This can be used, for example, to provide a qualitative description of the motion of the observer.

flow vector field: Optical flow is described by a vector (magnitude and orientation) for each image point. Hence a flow vector field is the same as an optical flow field. [OF:9.2]

fluorescence: The emission of visible light by a substance caused by the absorption of some other (possibly invisible) electromagnetic wavelength. This property is sometimes used in industrial machine vision. [FP:4.2]

fMRI: Functional Magnetic Resonance Imaging, or fMRI, is a technique for identifying which parts of the brain are activated by different types of physical stimulation, e.g., visual or acoustic stimuli. An MRI scanner is set up to register the increased blood flow to the activated areas of the brain on Functional MRI scans. See also nuclear magnetic resonance. [WP:Functional magnetic resonance imaging]

FOA: See focus of attention. [WP:Focus of attention]

FOC: See focus of contraction. [JKS:14.5.2]

focal length: 1) The distance between the camera lens and the focal plane. 2) The distance from a lens at which an object viewed at infinity would be in focus. [FP:1.2.2]

[Figure: light from infinity passing through a lens, with the focal length marked]

focal point: The point on the optical axis of a lens where light rays from an object at infinity (also placed on the optical axis) converge. [FP:1.2.2]

[Figure: focal point on the optical axis]

focal plane: The plane on which an image is focused by a lens system. Generally this consists of an array of photosensitive elements. See also image plane. [EH:5.2.3]

focal surface: A term most frequently used when a concave mirror is used to focus an image (e.g., in a reflector telescope). The focal surface in this case is the surface of the mirror. [WP:Focal surface]

[Figure: focal surface, optical axis and focal point]

focus: To focus a camera is to arrange for the focal points of various image features to converge on the focal plane. An image is considered to be in focus if the main subject of interest is in focus. Note that focus (or lack of focus) can be used to derive useful information (e.g., see depth from focus). [TV:2.2.2]

[Figure: an image in focus and out of focus]

focus control: The control of the focus of a lens system usually by moving the lens along the optical axis or by adjusting the focal length. See also autofocus.

focus following: A technique for slowly changing the focus of a camera as an object of interest moves. See also depth from focus. [WP:Follow focus]

focus invariant imaging: Imaging systems that are designed to be invariant to focus. Such systems have large depths of field.

focus of attention (FOA): The feature or object or area to which the attention of a visual system is directed. [WP:Focus of attention]

focus of contraction (FOC): The point of convergence of the optical flow vectors for a translating camera. The component of the translation along the optical axis must be nonzero. Compare focus of expansion. [JKS:14.5.2]

focus of expansion (FOE): The point from which all optical flow vectors appear to emanate in a static scene where the observer is moving. For example if a camera system was moving directly forwards along the optical axis then the optical flow vectors would all emanate from the principal point (usually near the center of the image). [FP:10.1.3]

[Figure: two images from a moving observer, and the blended image with the FOE marked]

FOE: See focus of expansion. [FP:10.1.3]

fold edge: A surface orientation discontinuity. An edge where two locally planar surfaces meet. The figure below shows a fold edge.

[Figure: fold edge]

foreground: In computer vision, generally used in the context of object recognition. The area of the scene or image in which the object of interest lies. See figure-ground separation. [JKS:2.5.1]

foreshortening: A typical perspective effect whereby distant objects appear smaller than closer ones. [FP:4.1.1]

form factor: The physical size or arrangement of an object. This term is frequently used with reference to computer boards. [FP:5.5.2]

Förstner operator: A feature detector used for corner detection as well as other edge features. [WP:Interest-Operator#F.C3.B6rstner-Operator]

forward looking radar: A radar system mounted on a vehicle looking ahead along the direction of travel. See also side looking radar.

Fourier-Bessel transform: See Hankel transform. [WP:Hankel transform]

Fourier domain convolution: Convolution in the Fourier domain involves simply multiplication of the Fourier transformed image by the Fourier transformed filter. For very large filters this operation is much more efficient than convolution in the original domain. [BB:2.2.4]

Fourier domain inspection: Identification of defects based on features in the Fourier transform of an image.

Fourier image processing: Image processing in the Fourier domain (i.e., processing images that have been transformed using the Fourier transform). [SEU:2.5.4]

Fourier matched filter object recognition: Object recognition in which correlation is determined using a matched filter that is the conjugate of the Fourier transform of the object being located.

Fourier shape descriptor: A boundary representation of a shape in terms of the coefficients of a Fourier transformation. [BB:8.2.4]

Fourier slice theorem: A slice at an angle of a 2D Fourier transform of an object is equal to a 1D Fourier transform of a parallel projection of the object taken at the same angle. See also slice based reconstruction. [WP:Projection-slice theorem]

Fourier space: The frequency domain space in which an image (or other signal) is represented after application of the Fourier transform.

Fourier space smoothing: Application of a smoothing filter (e.g., to remove high-frequency noise) in a Fourier transformed image. [SEU:2.5.4]

Fourier transform: A transformation that allows a signal to be considered in the frequency domain as a sum of sine and cosine waves or equivalently as a sum of exponentials. For a two dimensional image $F(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, e^{-2\pi i (xu + yv)}\, dx\, dy$. See also fast Fourier transform, discrete Fourier transform and inverse Fourier transform. [FP:7.3.1]

fovea: The high-resolution central region of the human retina. The analogous region in an artificial sensor that emulates the retinal arrangement of photoreceptors, for example a log-polar sensor. [FP:1.3]

foveal image: An image in which the sampled pattern is inspired by the arrangement of the human fovea, i.e., sampling is most dense in the image center and gets progressively sparser towards the periphery of the image.
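
The Fourier slice theorem entry can be checked numerically in its simplest discrete form. The sketch below covers only the axis-aligned case (a slice at angle zero): summing a small image down its columns gives a parallel projection, and the 1D discrete Fourier transform of that projection should match the v = 0 row of the image's 2D discrete Fourier transform. The function names and the 3 x 4 test image are assumptions for illustration.

```python
import cmath


def dft_1d(values):
    """Direct 1D discrete Fourier transform."""
    n = len(values)
    return [sum(values[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


def dft_2d_row(image, v):
    """Row v of the 2D discrete Fourier transform, i.e. F(u, v) for every u."""
    rows, cols = len(image), len(image[0])
    return [sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * x / cols + v * y / rows))
                for y in range(rows) for x in range(cols))
            for u in range(cols)]


image = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 3.0, 1.0, 0.0],
         [2.0, 1.0, 4.0, 1.0]]  # hypothetical image, indexed image[y][x]

# Parallel projection along y: add up each column
projection = [sum(row[x] for row in image) for x in range(len(image[0]))]

slice_from_2d = dft_2d_row(image, 0)
slice_from_projection = dft_1d(projection)
```

Slices at other angles need interpolation on the discrete grid, so this sketch deliberately stays with the one case that is exact.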

foveation: 1) The process of creating a foveal image. 2) Directing the camera optical axis to a given direction. [WP:Foveated imaging]

fractal image compression: An image compression method based on exploiting self-similarity at different scales. [WP:Fractal compression]

fractal measure/dimension: A measure of the roughness of a shape. Consider a curve whose length ($L_1$ and $L_2$) is measured at two scales ($S_1$ and $S_2$). If the curve is rough the length will grow as the scale is increased. The fractal dimension is $D = \frac{\log(L_1 - L_2)}{\log(S_2 - S_1)}$. [JKS:7.4]

fractal representation: A representation based on self-similarity. For example a fractal representation of an image could be based on similarity of blocks of pixels.

fractal surface: A surface model that is defined progressively using fractals (i.e., the surface displays self-similarity at different scales).

fractal texture: A texture representation based on self-similarity between scales. [JKS:7.4]

frame: 1) A complete standard television video image consisting of both the even and odd video fields. 2) A knowledge representation technique suitable for recording a related set of facts, rules of inference, preconditions, etc. [TV:8.1]

frame buffer: A device that stores a video frame for access, display and processing by a computer. For example such devices are used to store the frame from which a video display is refreshed. See also frame store. [TV:2.3.1]

frame grabber: See frame store. [FP:1.4.1]

frame of reference: A coordinate system defined with respect to some object, the camera or with respect to the real world. [WP:Frame of reference]

[Figure: frames of reference for the world, a cube and a cylinder, with labeled X, Y and Z axes]

frame store: An electronic device for recording a frame from an imaging system. Typically such devices are used as interfaces between CCIR cameras and computers. [ERD:2.2]

freeform surface: A surface that does not follow any particular mathematical form; for example, the folds of a piece of fabric, as shown below. [BM:4.1]

Freeman code: A type of chain code in which a contour is represented by coordinates for the first point followed by a series of direction codes (typically 0 through 7). In the following figure we show the Freeman codes relative to the center point on the left and an example

of the codes derived from a chain of points on the right. [AJ:9.6]

[Figure: Freeman direction codes around the center point (top row 5, 6, 7; middle row 4 and 0; bottom row 3, 2, 1), and an example chain with codes 0, 0, 2, 3, 1, 0, 7, 7, 6, 0, 1, 2, 2, 4]

Frenet frame: A triplet of mutually orthogonal unit vectors (the normal, the tangent and the binormal/bitangent) describing a point on a curve. [BB:9.3.1]

[Figure: Frenet frame showing the normal, tangent and binormal]

frequency domain filter: A filter defined by its action in the Fourier space. See high pass filter and low pass filter. [SEU:3.4]

frequency spectrum: The range of (electromagnetic) frequencies. [EH:7.8]

front lighting: A general term covering methods of lighting a scene where the lights are on the same side of the object as the camera. As an alternative consider backlighting. [WP:Frontlight]

[Figure: example of front lighting, showing the light sources and the objects imaged]

frontal: Frontal presentation of a planar surface is one in which the plane is parallel to the image plane.

full primal sketch: A representation described as part of Marr's theory of vision, that is made up of the raw primal sketch primitives together with grouping information. The sketch contains described image structures that could correspond with scene structures (e.g., image regions with scene surfaces).

function based model: An object representation based on the object functionality (e.g., an object's purpose or the way in which an object moves and interacts with other objects) rather than its geometric properties.

function based recognition: Object recognition based on object functionality rather than geometric properties. See also function based model.

functional optimization: An analytical technique for optimizing (maximizing or minimizing) complex functions of continuous variables. [BKPH:6.13]

functional representation: See function based model. [WP:Function representation]

fundamental form: A metric that is useful in determining local properties of surfaces. See also first fundamental form and second fundamental form. [OF:C.3]

fundamental matrix: A bilinear relationship between corresponding points ($\vec{u}$, $\vec{u}'$) in binocular stereo images. The fundamental matrix, $F$, incorporates the two sets of camera parameters ($K$, $K'$) and the relative position ($\vec{t}$) and orientation ($R$) of the cameras. Matching points $\vec{u}$ from one image and $\vec{u}'$ from the other image satisfy $\vec{u}^T F \vec{u}' = 0$ where $S(\vec{t})$ is the skew symmetric matrix of $\vec{t}$ and $F = (K^{-1})^T S(\vec{t}) R^{-1} (K')^{-1}$. See also the essential matrix. [TV:7.3.4]

fusion: Integration of data from multiple sources into a single representation. [SQ:18.5]

fuzzy logic: A form of logic that allows a range of possibilities between true and false (i.e., a degree of truth). [WP:Fuzzy logic]

fuzzy morphology: A type of mathematical morphology that is based on fuzzy logic rather than the more conventional Boolean logic.

fuzzy reasoning: See fuzzy logic.

fuzzy set: A grouping of data (into a set) where each item in the set has an associated grade/likelihood of membership in the set. [WP:Fuzzy set]

Gabor filter: A filter formed by multiplying a complex oscillation by an elliptical Gaussian distribution (specified by two standard deviations and an orientation). This creates filters that are local, selective for orientation, have different scales and are tuned for intensity patterns (e.g., edges, bars and other patterns observed to trigger responses in the simple cells of the mammalian visual cortex) according to the frequency chosen for the complex oscillation. The filter can be applied in the frequency domain as well as the spatial domain. [FP:9.2.2]

Gabor transform: A transformation that allows a 1D or 2D signal (such as an image) to be represented as a weighted sum of Gabor functions. [NA:2.7.3]

Gabor wavelets: A type of wavelet formed by a sinusoidal function that is restricted by a Gaussian envelope function. [NA:2.7.3]

gaging: Measuring or testing. A standard requirement of industrial machine vision systems.

gait analysis: Analysis of the way in which human subjects move. Frequently used for biometric or medical purposes. [WP:Gait analysis]

gait classification: 1) Classification of different types of human motion (such as walking, running, etc.). 2) Biometric identification of people based on their gait parameters. [WP:Gait#Energy-based gait classification]

[Figure for the Gaussian derivative entry: original image, normal first derivative and Gaussian first derivative]

Galerkin approximation: A method for determining the coefficients of a power series solution for a differential equation.

gamma: Devices such as cameras and displays that convert between analogue (denoted $a$) and digital ($d$) images generally have a nonlinear relationship between $a$ and $d$. A common model for this nonlinearity is that the signals are related by a gamma curve of the form $a = c\, d^{\gamma}$, for some constant $c$. For CRT displays, common values of $\gamma$ are in the range 1.0 to 2.5. [BB:2.3.1]

Gaussian distribution: A probability density function with this distribution: $P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ where $\mu$ is the mean and $\sigma$ is the standard deviation. If $\vec{x} \in \mathbb{R}^d$, then the multivariate probability density function is $p(\vec{x}) = \det(2\pi\Sigma)^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right)$ where $\vec{\mu}$ is the distribution mean and $\Sigma$ is its covariance. [BKPH:2.5.2]
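
For the gamma entry, a small sketch of the gamma curve $a = c\, d^{\gamma}$ and its inverse. It assumes signal values normalized to the range 0 to 1 and a default constant of 1.0; the function names are illustrative.

```python
def gamma_curve(d, gamma, c=1.0):
    """Analogue value a = c * d**gamma for a normalized digital value d."""
    return c * d ** gamma


def inverse_gamma_curve(a, gamma, c=1.0):
    """Recover d from a by inverting a = c * d**gamma."""
    return (a / c) ** (1.0 / gamma)


mid_grey = gamma_curve(0.5, 2.0)
round_trip = inverse_gamma_curve(gamma_curve(0.3, 2.2), 2.2)
```

Applying the inverse curve before display is one simple way to compensate for this nonlinearity.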
gamma correction: The correction of brightness and color ratios so that an image has the correct dynamic range when displayed on a monitor. [WP:Gamma correction]

Gaussian mixture model: A representation for a distribution based on a combination of Gaussians. For instance, used to represent color histograms with multiple peaks. See expectation maximization. [WP:Mixture model]
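
As a sketch of how expectation maximization can fit a Gaussian mixture model, the code below handles the smallest useful case: two one-dimensional Gaussians. The function names, starting values, iteration count, standard deviation floor and sample data are all assumptions for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, means, stds, weights, iterations=50):
    """Fit a two-component 1D Gaussian mixture by expectation maximization."""
    means, stds, weights = list(means), list(stds), list(weights)
    for _ in range(iterations):
        # E step: responsibility of each component for each sample
        resp = []
        for x in data:
            likes = [weights[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M step: re-estimate the parameters from the weighted samples
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor guards against a collapsed component
            weights[k] = nk / len(data)
    return means, stds, weights


# Hypothetical samples forming two well separated groups
data = [0.8, 1.0, 1.1, 1.3, 0.9, 1.2, 4.7, 5.0, 5.2, 4.9, 5.1, 5.3]
means, stds, weights = em_two_gaussians(
    data, means=[0.0, 6.0], stds=[1.0, 1.0], weights=[0.5, 0.5])
```

On data this well separated the estimated means settle near the two group averages; on overlapping data the result depends more strongly on the starting values.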
gauge coordinates: A coordinate system local to the image surface itself. Gauge coordinates provide a convenient frame of reference for operators such as the gradient operator.

Gaussian convolution: See Gaussian smoothing. [TV:3.2.2]

Gaussian curvature: A measure of the surface curvature at a point. It is the product of the maximum and minimum of the normal curvatures in all directions through the point. See also mean curvature. [FP:19.1.2]

Gaussian derivative: The combination of Gaussian smoothing and a gradient filter. This results in a gradient filter that is less sensitive to noise. [FP:8.2.1]

Gaussian noise: Noise whose distribution is Gaussian in nature. Gaussian noise is specified by its standard deviation about a zero mean, and is often modeled as a form of additive noise. [TV:3.1.1]

Gaussian pyramid: A multi-resolution representation of an image formed by several images, each one a subsampled and Gaussian smoothed version of the original one at increasing standard deviation. [WP:Gaussian pyramid]

[Figure: Gaussian smoothed images: original image, sigma = 1.0 and sigma = 3.0]

Gaussian smoothing: An image processing operation that aims to attenuate image noise, computed by convolution with a mask sampling a Gaussian distribution. [TV:3.2.2]

gaze direction tracking: Continuous gaze direction estimation (e.g., in a video sequence or a live camera feed).
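
For the Gaussian smoothing entry, a minimal one-dimensional sketch: a mask is built by sampling a Gaussian, normalized so its weights sum to one, and convolved with the signal. The three-sigma mask radius and the replicated border samples are assumptions chosen for illustration; an image version would apply the same mask along rows and then columns.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Sample a Gaussian at integer offsets and normalize the weights to sum to 1."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # assumed cut-off at three standard deviations
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve a 1D signal with the sampled Gaussian mask."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # replicate the border samples
            acc += w * signal[j]
        out.append(acc)
    return out


spike = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]  # hypothetical noisy sample
smoothed_spike = smooth(spike, 1.0)
```

The isolated spike is spread over its neighbours and its peak is reduced, which is the attenuation of noise the entry describes.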
Gaussian speckle: Speckle that has a Gaussian distribution.

Gaussian sphere: A sampled representation of a unit sphere where the surface of the sphere is defined by a number of triangular patches (often computed by dividing a dodecahedron). See also extended Gaussian image. [VSN:9.2.5]

gaze control: The ability of a human subject or a robot head to control their gaze direction.

gaze direction estimation: Estimation of the direction in which a human subject is looking. Used for human-computer interaction.

gaze location: See gaze direction estimation.

generalized cone: A generalized cylinder in which the swept curve changes along the axis. [VSN:9.2.3]

generalized curve finding: A general term referring to methods that locate arbitrary curves. For example, see generalized Hough transform. [ERD:10]

generalized cylinder: A volumetric representation where the volume is defined by sweeping a closed curve along an axis. The axis does not need to be straight and the closed curve may vary in shape as it is moved along the axis. For example a cylinder may be defined by moving a circle along a straight axis, and a cone may be defined by moving a circle of changing diameter along a straight axis. [FP:24.2.1]

generalized Hough transform: A version of the Hough transform capable of detecting the presence of arbitrary shapes. [ERD:10]

generalized order statistics filter: A filter in which the values within the filter mask are considered in increasing order and then combined in some fashion. The most common such filter is the median filter that selects the middle value.

generate and test: See hypothesize and verify. [WP:Trial and error]

generic viewpoint: A viewpoint such that small motions may cause small changes in the size or relative positions of features, but no features appear or disappear. This contrasts with a privileged viewpoint. [WP:Neuroesthetics#The Generic Viewpoint]

genetic algorithm: An optimization algorithm seeking solutions by iteratively refining a small set of candidates with a process mimicking genetic evolution. The suitability (fitness) of a set of possible solutions (population) is used to generate a new population until some conditions are satisfied (e.g., the best solution has not changed for a given number of iterations). [WP:Genetic algorithm]

genetic programming: Application of genetic algorithms in some programming language to evolve programs that satisfy some evaluation criteria. [WP:Genetic programming]

genus: In the study of topology, the number of holes in a surface. In computer vision, sometimes used as a discriminating feature for simple object recognition. [WP:Genus]

Gestalt: German for shape. The Gestalt school of psychology, led by the German psychologists Wertheimer, Köhler and Koffka in the first half of the twentieth century, had a profound influence on perception theories, and subsequently on computer vision. Its basic tenet was that a perceptual pattern has properties as a whole, which cannot be explained in terms of its individual components. In other words, the whole is more than the sum of its parts. This concept was captured in some basic laws (proximity, similarity, closure, common destiny or good form, saliency), that would apply to all mental phenomena, not just perception. Much work on low-level computer vision, most notably on perceptual grouping and perceptual organization, has exploited these ideas. See also visual illusion. [FP:14.2]

geodesic: The shortest line between two points (on a mathematically defined surface). [AJ:3.10]

geodesic active contour: An active contour model similar to the snake model in that it attempts to minimize an energy function between the model and the data, but which also incorporates a geometrical model.

[Figure: initial contour and final contour]

geodesic active region: A technique for region based segmentation that builds on geodesic active contours by adding a force that takes into account information within regions. Typically a geodesic active region will be bounded by a single geodesic active contour.

geodesic distance: The length of the shortest path between two points along some surface. This is different from the Euclidean distance that takes no account of the surface. The following example shows the geodesic distance

between Calgary and London (following the curvature of the Earth). [WP:Distance (graph theory)]

geometric distance: In curve and surface fitting, the shortest distance from a given point to a given surface. In many fitting problems, the geometric distance is expensive to compute but yields more accurate solutions. Compare algebraic distance. [HZ:3.2.2]
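
To make the contrast between geometric and algebraic distance concrete, here is a small sketch for one easy case: a circle centred on the origin. It assumes the algebraic distance is the value of the implicit equation $x^2 + y^2 - r^2$ at the point, while the geometric distance is the shortest Euclidean distance to the circle. The function names and sample points are illustrative.

```python
import math


def algebraic_distance(point, radius):
    """Value of the implicit circle equation x^2 + y^2 - r^2 at the point."""
    x, y = point
    return x * x + y * y - radius * radius


def geometric_distance(point, radius):
    """Shortest Euclidean distance from the point to the circle."""
    x, y = point
    return abs(math.hypot(x, y) - radius)


near = (3.0, 4.0)  # hypothetical points, measured against a circle of radius 2
far = (6.0, 8.0)
algebraic = [algebraic_distance(p, 2.0) for p in (near, far)]
geometric = [geometric_distance(p, 2.0) for p in (near, far)]
```

The algebraic value grows much faster than the true distance as a point moves away from the circle, which is one reason fits based on it can be biased. The circle is an unusually easy case, since for general curves and surfaces the geometric distance has no simple closed form.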

geometric distortion: Deviations

from the idealized image formation
geodesic transform: Assigns to each model (for example, pinhole camera) of
point the geodesic distance to some an imaging system. Examples include
feature or class of feature. radial lens distortion in standard
geographic information system
(GIS): A computer system that stores geometric feature: A general term
and manipulates geographically describing a shape characteristic of
referenced data (such as images of some data, that encompasses features
portions of the Earth taken by such as edges , corners , geons , etc.
[ WP:Geographic information system] geometric feature learning:
Learning geometric features from
geometric compression: The examples of the feature.
compression of geometric structures
such as polygons. geometric feature proximity: A
measure of the distance between
geometric constraint: A limitation geometric features, e.g., by using the
on the possible physical distance between data and overlaid
arrangement/appearance of objects model features in
based on geometry. These types of hypothesis verification .
constraints are used extensively in
stereo vision (e.g., the geometric hashing: A technique for
epipolar constraint ), motion analysis matching models in which some
(e.g., rigid motion constraint) and geometric invariant features are
object recognition (e.g., focusing on mapped into a hash table, and this
specific classes of objects or relations hash table is used to perform the
between features). [ OF:6.2.6] recognition. [ BM:4.5.4]

geometric correction: In geometric invariant: A quantity

remote sensing , an algorithm or describing some geometric configuration
technique for correction of that remains unchanged under certain
geometric distortion . [ AJ:8.16] transformations (e.g., cross-ratio ,
perspective projection ).
geometric deformable model: A [ WP:Geometric invariant theory]
deformable model in which the
deformation of curves is based on the geometric model: A model that
level set method and stops at object describes the geometric shape of some
boundaries. A typical example is a object or scene. A model can be 2D
geodesic active contour model. (e.g., polycurve ) or 3D (e.g., surface

based models), etc.

[ WP:Geometric modeling]

geometric model matching:

Comparison of two geometric models
or of a model and a set of image data
shapes, for the purposes of recognition .

geometric optics: A general term geon: GEometrical iON. A basic

referring to the description of optics volumetric primitive proposed by
from a geometrical point of view. Biederman and used in
Includes concepts such as the simple recognition by components . Some
pinhole camera model , magnification , example geons are:
lenses , etc. [ EH:3] [ WP:Geon (psychology)]

geometric reasoning: Reasoning

with geometric shapes in order to
address such tasks as robot motion
planning, shape similarity, spatial
position estimation, etc.

geometric representation: See geometric model.
[ WP:RGB color model#Geometric representation]

geometric shape: A shape that takes
a relatively simple geometric form (such
as a square, ellipse, cube, sphere,
generalized cylinder , etc.) or that can
be described as a combination of such
geometric primitives.
[ WP:Tomahawk (geometric shape)]

geometric transformation: A class
of image processing operations that
transform the spatial relationships in an
image. They are used for the correction
of geometric distortions and general
image manipulation. A geometric
transformation requires the definition of
a pixel coordinate transformation
together with an interpolation scheme.
For example, a rotation does
[ SEU:3.5]:

gesture analysis: Basic analysis of
video data representing human gestures
preceding the task of
gesture recognition .
[ WP:Gesture recognition]

gesture recognition: The recognition
of human gestures generally for the
purpose of human-computer
interaction. See also
hand sign recognition .
[ WP:Gesture recognition]

Gibbs sampling: A method for
probabilistic inference based on
transition probabilities (between
states). [ WP:Gibbs sampling]

GIF: Graphics Interchange Format. A

common compressed image format

based on the Lempel-Ziv-Welch Radon transform , and the

algorithm. [ SEU:1.8] wavelet transform .

GIS: See golden template: An image of an

geographic information system . unflawed object/scene that is used
[ WP:Geographic information system] within template matching to identify
any deviations from the ideal
glint: A specular reflection visible on object/scene.
a mirror -like surface. [ WP:Glint]
gradient: Rate of change. This is
Glint frequently associated with
edge detection . See also
gray scale gradient . [ VSN:3.1.2]


[Figure for the gradient entry; horizontal axis: position in image row]
global: A global property of a
mathematical object is one that
depends on all components of the

object. For example, the average

intensity of an image is a global gradient based flow estimation:
property, as it depends on all the image Estimation of the optical flow based on
pixels. [ WP:Global variable] gradient images. This computation can
be done directly through the
global positioning system (GPS): computation of a time derivative as
A system of satellites that allow the long as the movement between frames
position of a GPS receiver to be is quite small. See also the
determined in absolute aperture problem .
Earth-referenced coordinates. Accuracy
of standard civilian GPS is of the order gradient descent: An iterative
of meters. Greater accuracy is method for finding the (local) minimum
obtainable using differential GPS. of a function. [ DH:5.4.2]
[ WP:Global Positioning System]
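
The gradient descent entry above can be illustrated with a minimal sketch. The function name, the fixed step size, the stopping tolerance and the one-dimensional test function are all assumptions made for illustration, not part of the definition:

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Repeatedly step against the gradient until the update is tiny."""
    x = float(x0)
    for _ in range(max_iter):
        update = step * grad(x)  # move proportionally to the local slope
        x -= update
        if abs(update) < tol:    # assumed stopping rule: negligible change
            break
    return x

# Hypothetical test function f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

For a function with several minima the same loop would only reach the minimum nearest to the starting point, which is why the entry speaks of a (local) minimum.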
gradient edge detection:
global structure extraction: Edge detection based on image
Identification of high level gradients . [ BB:3.3.1]
structures/relationships in an image
(e.g., symmetry detection ). gradient filter: A filter that is
convolved with an image to create an
global transform: A general term image in which every point represents
describing an operator that transforms the gradient in the original image in an
an image into some other space. orientation defined by the filter.
Sample global transforms include the Normally two orthogonal filters are
discrete cosine transform , the used and by combining these a
Fourier transform , the Haar transform , gradient vector can be determined for
the Hadamard transform , the every point. Common filters include
Hartley transform, histograms , the Roberts cross gradient operator ,
Hough transform , the Prewitt gradient operator and the
Karhunen-Loève transform , the Sobel gradient operator . The Sobel

horizontal gradient operator gives:

[ WP:Edge detection]

[Figure: input image * gradient filter = gradient image. The filter is:]
-1 0 1
-2 0 2
-1 0 1
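
As a rough sketch of how the Sobel horizontal gradient operator from the gradient filter entry is applied, the following slides the 3 by 3 kernel over a small image. The helper name `apply_kernel` and the test image are assumptions; the kernel is applied without flipping (cross-correlation) and without border padding, so a true convolution would give the opposite sign:

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def apply_kernel(image, kernel):
    """Slide a 3x3 kernel over the image interior (no padding, no flipping)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(rows - 2):
        for c in range(cols - 2):
            out[r][c] = sum(kernel[i][j] * image[r + i][c + j]
                            for i in range(3) for j in range(3))
    return out

# Hypothetical image with a vertical step edge: dark left, bright right.
image = [[0, 0, 10, 10] for _ in range(4)]
gx = apply_kernel(image, SOBEL_X)  # strong response across the step
```

Applying the transposed kernel in the same way would give the vertical component, and the two together form the gradient vector at each point.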

gradient image: See edge image .

[ WP:Image gradient#Computer vision]
gradient magnitude thresholding:
Thresholding of a gradient image in
order to identify strong edge points .

gradient space: A representation of
surface orientations in which each
orientation is represented by a pair
$(p, q)$ where $p = \partial z / \partial x$
and $q = \partial z / \partial y$ (where
the z axis is aligned with the
optical axis of the viewing device).
[ BKPH:15.3]

[Figure: vectors representing various surface orientations and the corresponding points in gradient space, with axes p and q]

gradient matching stereo: An gradient vector: A vector describing

approach to stereo matching in which the magnitude and direction of
the image gradients (or features maximal change on an N-dimensional
derived from the image gradients) are surface. [ WP:Gradient]
matched. [ CS:6.9]
graduated non-convexity: An
gradient operator: An algorithm for finding a global minimum
image processing operator that in a function that has many sharp local
produces a gradient image from a minima (a non-convex function). This
gray scale input image I. Depending is achieved by approximating the
on the usage of the term, the output function by a convex function with just
could be 1) the vectors $\nabla I$ of the x and one minimum (near the global
y derivatives at each point or 2) the minimum of the non-convex function)
magnitudes of these gradient vectors. and then gradually improving the
The usual role of the gradient operator approximation.
is to locate regions of strong gradients
that signal the position of an edge . grammar: A system of rules
The figure below shows a gray scale constraining the way in which
image and its gradient magnitude primitives (such as words) can be
image, where darker lines indicate combined. Used in computer vision to
stronger magnitudes. The gradient was represent objects where the primitives
calculated using the Sobel operator . are simple shapes, textures or features.
[ DH:7.3] [ DH:12.2.1]

Determining whether two graphs are

grammatical representation: A isomorphic is the graph isomorphism
representation that describes shapes problem, which is in NP but is not known to be
using a number of primitives that can NP-complete . These small graphs are
be combined using a particular set of isomorphic with A:b, C:a, B:c
rules (the grammar). [ OF:11.2.2]:
granulometric spectrum: The
resultant distribution from a A b
granulometry . B
granulometry: The study of the size
C a
characteristics of a set (e.g., the size of
a set of regions). Most normally this is graph matching: A general term
achieved by applying a series of describing techniques for comparing two
morphological openings (with graph models . These techniques may
structured elements of increasing size) attempt to find graph isomorphisms ,
and then studying the resultant size subgraph isomorphisms , or may just
distributions. try to establish similarity between
[ WP:Granulometry (morphology)] graphs. [ OF:11.2.2]
[Figure: graph matching; panels: Graph Model, Graph Model]

graph: A graph is formed by a set of
vertices $V$ and a set of edges $E \subseteq V \times V$
linking pairs of vertices. Vertices $u$ and
$v$ are neighbors if $(u, v) \in E$ or
$(v, u) \in E$. See graph isomorphism ,
subgraph isomorphism . This is a graph
with five nodes [ FP:14.5.1]:

graph model: A model of data in

A terms of a graph. Typical uses in
computer vision include
object representation (see
graph matching ) and edge gradients
b c a (see graph searching ).
[ WP:Graphical model]

graph partitioning: The operation of

graph cut: A partition of the vertices splitting a graph into subgraphs
of a directed graph V into two disjoint satisfying some criteria. For example
sets S and T . The cost of the cut is the we might want to partition a graph of
sum of the costs of all the edges that go from a all polygonal edge segments in an image
vertex in S to a vertex in T . [ CS:6.11] into subgraphs corresponding to objects
in the scene. [ WP:Graph partition]
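
The cost of a graph cut, as described in the graph cut entry, can be sketched as follows. The edge dictionary, the vertex names and the function name `cut_cost` are hypothetical and chosen only for illustration:

```python
def cut_cost(edges, S, T):
    """Sum the costs of directed edges that leave S and enter T."""
    return sum(cost for (u, v), cost in edges.items() if u in S and v in T)

# Hypothetical directed graph: keys are (from, to) pairs, values are edge costs.
edges = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
         ("a", "t"): 2, ("b", "t"): 3, ("b", "a"): 4}
S, T = {"s", "a"}, {"b", "t"}
cost = cut_cost(edges, S, T)  # counts (s,b), (a,b) and (a,t), but not (b,a)
```

Only edges directed from S to T contribute, so the edge from b back to a is ignored even though it crosses the partition.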

graph isomorphism: Two graphs are graph representation: See

isomorphic if there exists a mapping graph model .
(bijection) between their vertices that [ WP:Graph (data structure)#Representations]
makes the edge sets identical.

graph searching: Search for a specific 0 255

node or path through a graph . Used

for, among other things,
border detection (e.g., in an edge gray scale co-occurrence: The
gradient image) and object occurrence of two particular gray levels
identification (e.g., decision trees). some particular distance and
orientation apart. Used in
graph similarity: The degree to co-occurrence matrices . [ RN:8.3.1]
which two graph representations are
similar. Typically (in computer vision) gray scale correlation: The
these representations will not be cross correlation of gray scale values in
exactly the same and hence a double image windows or full images.
subgraph isomorphism may need to be
found to evaluate similarity. gray scale distribution model: A
model of how gray scales are
graph theoretic clustering: distributed in some image region . See
Clustering algorithms that use concepts also intensity histogram .
from graph theory, in particular
leveraging efficient graph-theoretic gray scale gradient: The rate of
algorithms such as maximum flow. change of the gray levels in a
[ WP:Cluster analysis#Graph- gray scale image . See also edge ,
theoretic methods] gradient image and
first derivative filter .

grassfire algorithm: A technique for gray scale image: A monochrome

finding a region skeleton based on image in which pixels typically
wave propagation. A virtual fire is lit represent brightness values ranging
on all region boundaries and the from 0 to 255. See also gray scale .
skeleton is defined by the intersection of [ SQ:4.1.1]
the wave fronts.



grating: See diffraction grating .

gray scale mathematical
[ WP:Grating]
morphology: The application of
gray level . . . : See gray scale . . . mathematical morphology to
[ LG:3.4] gray scale images . Each
quantization level is treated as a
gray scale: A monochromatic distinct set where pixels are members of
representation of the value of a pixel. the set if they have a value greater than
Typically this represents image or equal to particular quantization
brightness and ranges from 0 (black) to levels. [ SQ:7.2]
255 (white). [ LG:3.4]

gray scale moment: A moment that of a number of pixels) from the local
is based on image or region neighborhood are used. Grid filters
gray scales . See also binary moment. require a training phase where noisy
data and corresponding ideal data are
gray scale morphology: See presented.
gray scale mathematical morphology.
[ SQ:7.2] ground following: See
ground tracking .
gray scale similarity: See
gray scale correlation . ground plane: The horizontal plane
that corresponds to the ground (the
gray scale texture moment: A surface on which objects stand). This
moment that describes texture in a concept is only really useful when the
gray scale image (e.g., the Haralick ground is roughly flat. The ground
texture operator describes image plane is highlighted here:
homogeneity). [ WP:Ground plane]
gray scale transformation: A
general term describing a class of
image processing operations that apply
to gray scale images , and simply
manipulate the gray scale of pixels.
Example operations include
contrast stretching and
histogram equalization .

gray value . . .: See gray scale . . . ground tracking: A loosely defined

[ WP:Grey] term describing the robot navigation
problem of sensing the ground plane
greedy search: A search algorithm and following some path.
seeking to maximize a local criterion [ WP:Ground track]
instead of a global one. Greedy
algorithms sacrifice generality for speed. ground truth: In performance
For instance, the stable configuration of analysis, the true value, or the most
a snake is typically found by an accurate value achievable, of the output
iterative energy minimization . The of a specific instrument under analysis,
snake configuration at each step of the for instance a vision system measuring
optimization can be found globally, by the diameter of circular holes. Ground
searching the space of all allowed truth values may be known
configurations of all pixels theoretically, e.g., from formulae, or
simultaneously (a large space) or locally obtained through an instrument more
(greedy algorithm), by searching the accurate than the one being evaluated.
space of all allowed configurations of [ TV:A.1]
each pixel individually (a much smaller
space). [ NA:6.3.2] grouping: 1) In human perception,
the tendency to perceive certain
grey . . .: See gray . . . [ LG:3.4] patterns or clusters of stimuli as a
coherent, distinct entity as opposed to a
grid filter: An approach to set of independent elements. 2) A
noise reduction where a nonlinear whole class of segmentation algorithms
function of features (pixels or averages is based on this idea. Much of this work

was inspired by the Gestalt school of

psychology. See also segmentation , grouping transform: An
image segmentation , image analysis technique for grouping
supervised classification , and clustering image features together (e.g., based on
. [ FP:14] collinearity, etc.). [ TV:5.5]

Haar transform: A wavelet transform
that is used in image compression . The
basis functions used are similar to those
used by first derivative edge detectors,
resulting in images that are decomposed
into horizontal, diagonal and vertical
edges at different scales. [ PGS:4.4]

Hadamard transform: A
transformation that can be used to
transform an image to its constituent
Hadamard components. A fast version
of the algorithm exists that is similar
to the fast Fourier transform , but all
values in the basis functions are either
+1 or -1. It requires significantly less
computation and as such is often used
for image compression . [ SEU:2.5.3]

halftoning: See dithering .
[ WP:Halftone]

Hamming distance: The number of
different bits in corresponding positions
in two bit strings. For instance, the
Hamming distance of 01110 and 01100
is 1, that of 10100 and 10001 is 2. A
very important concept in digital
communications. [ CS:6.3.2]

hand sign recognition: The
recognition of hand gestures such as
those used in sign language.

[Figure: example hand signs, labeled H and I]

hand tracking: The tracking of a
person's hand in a video sequence ,
often for use in human-computer
interaction.

hand-eye calibration: The
calibration of a manipulator (such as a
robot arm) together with a visual
system (such as a number of cameras).
The main issue here is ensuring that
both systems use the same frame of
reference. See also camera calibration .
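
The Hamming distance entry can be checked with a short sketch. The function name and the choice to represent bit strings as Python strings of "0" and "1" characters are assumptions:

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

d1 = hamming_distance("01110", "01100")
d2 = hamming_distance("10100", "10001")
```

The two calls use the pairs of bit strings given as examples in the entry.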

but the coefficients used are real

hand-eye coordination: The use of (whereas those used in the Fourier
visual feedback to direct the movement transform are complex). [ AL:13.4]
of a manipulator. See also
hand-eye calibration . Hausdorff distance: A measure of
the distance between two sets of
handwriting verification: (image) points. For every point in both
Verification that the style of sets determine the minimum distance
handwriting corresponds to that of to any point in the other set. The
some particular individual. Hausdorff distance is the maximum of
[ WP:Handwriting recognition] these minimum values. [ OF:10.3.1]
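
A minimal sketch of the Hausdorff distance as defined in the entry above, using brute-force search over small point sets. The function names and the sample points are assumptions, and the Euclidean metric is assumed for the point-to-point distance:

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from any point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Maximum over both sets of the nearest-neighbor distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Hypothetical point sets: B contains one outlier far from A.
A = [(0, 0), (1, 0)]
B = [(0, 0), (1, 0), (1, 3)]
d = hausdorff(A, B)
```

Every point of A coincides with a point of B, so the result is driven entirely by the outlier in B, which illustrates how sensitive the measure is to a single stray point.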
handwritten character recognition: HDTV: High Definition TeleVision.
The automatic recognition of characters [ WP:High-definition television]
that have been written by hand.
[ WP:Handwriting recognition] height image: See range image .
[ WP:Range imaging]

Hankel transform: A simplification
of the Fourier transform for radially
symmetric functions.
[ WP:Hankel transform]

hat transform: See
Laplacian of Gaussian (also known as
Mexican hat operator ) and/or
top hat operator .
[ WP:Top-hat transform]

Harris corner detector: A
corner detector where a corner is
detected if the eigenvalues of the matrix
M are large and locally maximum
(f (i, j) is the intensity at point (i,j)).

$$M = \begin{bmatrix} \frac{\partial f}{\partial i}\frac{\partial f}{\partial i} & \frac{\partial f}{\partial i}\frac{\partial f}{\partial j} \\ \frac{\partial f}{\partial i}\frac{\partial f}{\partial j} & \frac{\partial f}{\partial j}\frac{\partial f}{\partial j} \end{bmatrix}$$

To avoid explicit computation of the
eigenvalues, the local maxima of
$\det(M) - 0.004\,\mathrm{trace}(M)$ can be
used. This is also known as the
Plessey corner finder .
[ WP:Harris affine region detector#Harris corner measure]

Helmholtz reciprocity: An
observation by Helmholtz about the
bidirectional reflectance
distribution function $f_r(\vec{i}, \vec{e})$ of a local
surface patch, where $\vec{i}$ and $\vec{e}$ are the
incoming and outgoing light rays
respectively. The observation is that
the reflectance is symmetric about the
incoming and outgoing directions, i.e.,
$f_r(\vec{i}, \vec{e}) = f_r(\vec{e}, \vec{i})$. [ FP:4.2.2]

Hessian: The matrix of second
derivatives of a multi-valued scalar
function. It can be used to design a
second derivative edge detector
[ FP:3.1.2].

$$H = \begin{bmatrix} \frac{\partial^2 f(i,j)}{\partial i^2} & \frac{\partial^2 f(i,j)}{\partial i\,\partial j} \\ \frac{\partial^2 f(i,j)}{\partial j\,\partial i} & \frac{\partial^2 f(i,j)}{\partial j^2} \end{bmatrix}$$

heterarchical/mixed control: An
approach to system control where
control is shared amongst several
systems.

heuristic search: A search process
that employs common-sense rules
(heuristics) to speed up search.
[ BB:4.4]
hexagonal image representation:
An image representation where the
Hartley transform: Similar pixels are hexagonal rather than
transform to the Fourier transform ,

rectangular. This representation might sequence of problems beginning with a

be used because 1) it is similar to the low-resolution Hough space and
human retina or 2) the distances to all proceeding to high-resolution space, or
adjacent pixels are equal, unlike using low-resolution images, or
diagonally connected pixels in operating on subimages of the input
rectangular grids image before combining the results.

Hexagonal Sampling Grid

hierarchical image compression:
Image compression using
hierarchical coding . This leads to the
concept of progressive image

hierarchical matching: Matching at

increasingly greater levels of detail.
hidden Markov model (HMM): A This approach can be used when
model for predicting the probability of matching images or more abstract
system state on the basis of the representations.
previous state together with some
observations. HMMs have been used hierarchical model: A model formed
extensively in by smaller submodels, each of which
handwritten character recognition . may have further smaller submodels.
[ FP:23.4] The model may contain multiple
instances of the subcomponent models.
hierarchical: A general term referring The subcomponents may be placed
to the approach of considering data at a relative to the model by using a
low level of detail initially and then coordinate system transformation or
gradually increasing the level of detail. may just be listed in a set structure.
This approach often results in better This is a three-level hierarchical model
performance. [ WP:Hierarchy] with multiple usage of the
hierarchical clustering: An approach subcomponents:
to grouping in which each item is [ WP:Hierarchical database model]
initially put in a separate cluster, the
two most similar clusters are merged
and this merging is repeated until some
condition is satisfied (e.g., no clusters
of less than a particular size remain).
[ DH:6.10]
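
The merging loop described in the hierarchical clustering entry can be sketched as below. Several details are assumptions not stated in the entry: the data are one-dimensional, cluster similarity is measured by the smallest gap between members (single linkage), and the stopping condition is a target number of clusters rather than a minimum cluster size:

```python
def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # each item starts in its own cluster

    def gap(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(abs(a - b) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]

groups = agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
```

Other similarity measures (for example, the distance between cluster means) fit the same loop by changing only the `gap` function.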

hierarchical coding: Coding of

(image) data at multiple layers starting
with the lowest level of detail and hierarchical recognition: See
gradually increasing the resolution. See hierarchical matching .
also hierarchical image compression . [ WP:Cognitive neuroscience of visual object recognition#Hierarchical Rec

hierarchical Hough transform: A

technique for improving the efficiency of hierarchical texture: A way of
the standard Hough transform . considering texture elements at
Commonly used to describe any multiple levels (e.g., basic texture
Hough-based technique that solves a elements may themselves be grouped

together to form a texture element at intensity levels (i.e., whose

another scale, and so on). [ BB:6.2] intensity histogram is flat). When this
technique is applied to a digital image ,
hierarchical thresholding: A however, the resulting histogram will
thresholding technique where an image often have large values interspersed
is considered at different levels of detail with zeros. [ AL:5.3]
in a pyramid data structure, and
thresholds are identified at different
levels in the pyramid starting at the
highest level.
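
The histogram equalization entry can be illustrated with a small sketch that maps each gray level through the normalized cumulative histogram. This particular mapping, the function name and the sample pixel list are assumptions; other variants of the mapping exist:

```python
def equalize(pixels, levels=256):
    """Map gray levels through the normalized cumulative histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:       # running sum of the histogram
        total += count
        cdf.append(total)
    n = len(pixels)
    return [round((levels - 1) * cdf[p] / n) for p in pixels]

# Hypothetical low-contrast image: all values lie between 50 and 53.
pixels = [50, 50, 51, 52, 52, 52, 53, 53]
equalized = equalize(pixels)
```

The four input levels are spread across the full range but remain four distinct levels, which shows why the histogram of an equalized digital image often has large values interspersed with zeros rather than being exactly flat.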

high level vision: A general term

referring to image analysis and
understanding tasks (i.e., those tasks histogram modeling: A class of
that address reasoning about what is techniques, such as
seen, as opposed to basic processing of histogram equalization , modifying the
images). [ BKKP:5.10] dynamic range and contrast of an image
by changing its intensity histogram into
high pass filter: A one with desired properties.
frequency domain filter that removes or
suppresses all low-frequency histogram modification: See
components. [ SEU:2.5.4] histogram modeling . [ SEU:4.2.1]

highlight: See specular reflection . histogram moment: A moment

[ FP:4.3.4] derived from a histogram .
[ WP:Algorithms for calculating variance#Higher-
histogram: A representation of the order statistics]
frequency distribution of some values.
See intensity histogram , an example of
which is shown below. [ AL:5.2] histogram smoothing: The
application of a smoothing filter (e.g.,
600 Gaussian smoothing ) to a histogram .
This is often required before
histogram analysis operations can be


[Figure: intensity histograms; horizontal axis: grey scale from 0 to 255, vertical axis from 0 to 600]

histogram analysis: A general term

describing a group of techniques that
abstract information from histograms 0 0

(e.g., determining the 0 Grey Scale 255 0 Grey Scale 255

anti-mode/trough in a
bi-modal histogram for use in hit and miss/hit or miss operator:
thresholding). A morphological operation where a
new image is formed by ANDing
histogram equalization: An (logical AND) together corresponding
image enhancement operation that bits for every pixel of an input image
processes a single image and results in and a structuring element. This
an image with a uniform distribution of operator is most appropriate for

binary images but may also be applied homogeneous coordinates: Points

to gray scale images . described in projective space. For
[ WP:Hit-or-miss transform] example an (x, y, z) point in Euclidean
space would be described as
$(\lambda x, \lambda y, \lambda z, \lambda)$ for any nonzero $\lambda$ in
homogeneous coordinates. [ FP:2.1.1]

homogeneous representation: A
representation defined in
HK: See mean and Gaussian curvature projective space . [ HZ:1.2.1]
shape classification . homography: The relationship
[ WP:Gaussian curvature] described by a
HK segmentation: See homography transformation .
mean and Gaussian curvature [ WP:Homography]
shape classification . homography transformation: Any
[ WP:Gaussian curvature] invertible linear transformation between
HMM: See hidden Markov model . projective spaces. It is commonly used
[ FP:23.4] for image transfer , which maps one
planar image or region to another. The
holography: The process of creating a transformation can be estimated using
three dimensional image (a hologram) four non-collinear point pairs.
by recording the interference pattern [ WP:Homography]
produced by coherent laser light that
has been passed through a homomorphic filtering: An
diffraction grating. [ WP:Holography] image enhancement technique that
simultaneously normalizes brightness
homogeneous, homogeneity: 1. ( and enhances contrast. It works by
Homogeneous coordinates :) In applying a high pass filter to the
projective n-dimensional geometry, a original image in the frequency domain,
point is represented by an n + 1 element hence reducing intensity variation (that
vector, with the Cartesian changes slowly) and highlighting
representation being found by dividing reflection detail (that changes rapidly).
the first n components by the last one. [ SEU:3.4.4]
Homogeneous quantities such as points
are equal if they are scalar multiples of homotopic transformation: A
each other. For example a 2D point is continuous deformation that preserves
represented as (x, y) in Cartesian the connectivity of object features (e.g.,
coordinates and in homogeneous skeletonization ). Two objects are
coordinates by the point (x, y, 1) and homotopic if they can be made the
any multiple thereof. 2. (Homogeneous same by some series of homotopic
texture:) A two (or higher) dimensional transformations.
pattern, defined on a space $S \subset \mathbb{R}^2$ for Hopfield network: A type of neural
which some functions (e.g., mean, network mainly used in optimization
standard deviation) applied to a problems, which has been used in
window on S have values that are object recognition . [ WP:Hopfield net]
independent of the position of the
window. [ WP:Homogeneous space]
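
The conversion between Cartesian and homogeneous coordinates described in the homogeneous coordinates entries can be sketched as follows. The function names and the `scale` parameter are assumptions introduced for illustration:

```python
def to_homogeneous(point, scale=1.0):
    """Append a last component; any nonzero scale describes the same point."""
    return tuple(scale * c for c in point) + (scale,)

def to_cartesian(h):
    """Divide the first n components by the last one."""
    *head, last = h
    if last == 0:
        raise ValueError("point at infinity has no Cartesian representation")
    return tuple(c / last for c in head)

p = to_homogeneous((2.0, 3.0))              # the 2D point (x, y) as (x, y, 1)
q = to_homogeneous((2.0, 3.0), scale=5.0)   # a scalar multiple of p
```

Both `p` and `q` convert back to the same Cartesian point, which is the sense in which homogeneous quantities are equal if they are scalar multiples of each other.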

horizon line: The line defined by all

vanishing points from the same plane. HSV: Hue Saturation Value
The most commonly used horizon line is color image format. [ FP:6.3.2]
that associated with the ground plane .
hue: Describes color using the
[ WP:Horizon#Theoretical model]
dominant wavelength of the light. Hue
is a common component of color image
formats (see HSI , HSL , HSV ).
[ FP:6.3.2]

Hueckel edge detector: A

Horizon Line Horizon Line
parametric edge detector that models
Hough transform: A technique for an edge using a parameterized model
transforming image features directly within a circular window (the
into the likelihood of occurrence of parameters are edge contrast, edge
some shape. For example see orientation and distance background
Hough transform line finder and mean intensity).
generalized Hough transform .
[ AL:9.3] Huffman encoding: An optimal,
variable-length encoding of values (e.g.,
Hough transform line finder: A pixel values) based on the relative
version of the Hough transform based probability of each value. The code
on the parametric equation of a line lengths may change dynamically if the
($s = i \cos\theta + j \sin\theta$) in which a set of relative probabilities of the data source
edge points {(i, j)} is transformed into change. This technique is commonly
the likelihood of a line being present as used in image compression. [ AL:15.3]
represented in a $(s, \theta)$ space. The
likelihood is quantified, in practice, by human motion analysis: A general
a histogram of the $\sin\theta$, $\cos\theta$ values term describing the application of
observed in the images. [ AL:9.3.1] motion analysis to human subjects.
[Figure: Hough transform line finder; panels: Image, Edge Image, Significant Lines]
Such analysis is used to track moving
people, to recognize the pose of a
person and to derive 3D properties.
[ WP:Motion analysis#Human motion analysis]
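
The voting scheme behind the Hough transform line finder can be sketched as follows, using the line parameterization s = i cos(theta) + j sin(theta) from the entry. The accumulator layout, the quantization (one-degree steps in theta, s rounded to the nearest integer) and the sample points are assumptions:

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180):
    """Vote in a quantized (s, theta) accumulator for each edge point (i, j)."""
    votes = Counter()
    for i, j in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            s = i * math.cos(theta) + j * math.sin(theta)
            votes[(round(s), k)] += 1  # one vote per point per angle
    return votes

# Five hypothetical edge points on the line i = 4, i.e. s = 4 at theta = 0.
points = [(4, j) for j in range(5)]
votes = hough_lines(points)
```

Cells that collect votes from many points correspond to likely lines. With this coarse quantization, neighboring cells may tie with the true one, so practical implementations usually add peak detection or smoothing of the accumulator.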

HSI: Hue-Saturation-Intensity HYPER: HYpothesis Predicted and

color image format. [ JKS:10.4] Evaluated Recursively. A well known
vision system developed by Nicholas
HSL: Hue-Saturation-Luminance Ayache and Olivier Faugeras, in which
color image format. geometric relations derived from
[ WP:HSL and HSV] polygonal models are used for
Color Image

hyperbolic surface region: A region

of a 3D surface that is locally
saddle-shaped. A point on a surface at
Hue Saturation Luminance which the Gaussian curvature is
negative (so the signs of the principal
curvatures are opposite).

hyperspectral image .
[ WP:Hyperspectral imaging]
hypothesize and test: See
hypothesize and verify . [ JKS:15.1]

hyperfocal distance: The distance D hypothesize and verify: A common

at which a camera should be focused in approach to object recognition in
order that the depth of field extends which possibilities (of object type and
from D/2 to infinity. Equivalently, if a pose) are hypothesized and then
camera is focused at a point at distance evaluated against evidence from the
D, points at D/2 and infinity are images. This is done either until all
equally blurred. [ JKS:8.3] possibilities are considered or until a
hypothesis with a sufficiently high
hyperquadric: A class of degree of fit is found. [ JKS:15.1]
volumetric shape representations that
include superquadrics . Hyperquadric Possible hypotheses:
models can describe arbitrary convex
What piece
polyhedra. [ SQ:9.11] goes here?

hyperspectral image: An image with Hypotheses which do not need to be considered

(in this 3 by 3 jigsaw):
a large number (perhaps hundreds) of
spectral bands. An image with a lower
number of spectral bands is referred to
as multi-spectral image .
[ WP:Hyperspectral imaging]

hyperspectral sensor: A sensor

capable of collecting many (perhaps hysteresis tracking: See
hundreds) of spectral bands thresholding with hysteresis . [ OF:4.5]
simultaneously. Produces a

ICA: See a digital image , which will suffer from

independent component analysis . rasterization. May also be used to refer
[ WP:Independent component analysis] to a vanishing point. [ HZ:1.2.2]

iconic: Having the characteristics of an IDECS: Image Discrimination

image. See iconic model . [ SQ:4.1.1] Enhancement Combination System. A
well-known vision system developed by
iconic model: A representation Haralick and Currier.
having the characteristics of an image.
For example the template used in identification: The process of
template matching . [ SQ:4.1.1] associating some observations with a
particular instance or class of object
iconic recognition: that is already known. [ TV:10.1]
Object recognition using iconic models.
identity verification: Confirmation of
the identity of a person based on some
ICP: See iterative closest point . biometrics (e.g., face authentication ).
[ FP:21.3.2] This differs from the recognition of an
unknown person in that only one model
ideal line: A line described in the
has to be compared with the
continuous domain as opposed to one in
information that is observed.
a digital image , which will suffer from
rasterization. [ HZ:1.2.2] IGS: Interpretation Guided
Segmentation. A vision technique for
ideal point: A point described in the
grouping image elements into regions
continuous domain as opposed to one in

based on semantic interpretations in illusory contour: A perceived border

addition to raw image values. where there is no edge present in the
Developed by Tenenbaum and Barrow. image data. See also
subjective contour. For example the
IHS: Intensity Hue Saturation following diagram shows the Kanizsa
color image format. [ BB:2.2.5] triangles. [ FP:14.2]
IIR: See
infinite impulse response filter.
[ WP:Infinite impulse response]

ill-posed problem: A mathematical

problem that infringes at least one of
the conditions in the definition of
well-posed problem. Informally, these
are that the solution must (a) exist, (b)
be unique, and (c) depend continuously
on the data. Ill-posed problems in
computer vision have been approached image: A function describing some
using regularization theory. See quantity (such as brightness ) in terms
regularization . [ SQ:6.2.1] of spatial layout (See
image representation ). Most frequently
illuminance: The total amount of computer vision is concerned with two
visible light incident upon a point on a dimensional digital images . [ SB:1.1]
surface. Measured in lux (lumens per
meter squared), or footcandles (lumens image addition: See
per foot squared). Illuminance pixel addition operator . [ SB:3.2.1]
decreases as the distance between the
viewer and the source increases. image analysis: A general term
[ JKS:9.1.1] covering all forms of analysis of image
data. Generally image analysis
illuminant direction: The direction operations result in a symbolic
from which illuminance originates. See description of the image contents.
also light source geometry . [ TV:9.3] [ AJ:1.5]

illumination: See illuminance. [ JKS:9.1.1]

image acquisition: See image capture. [ TV:2.3]

illumination constancy: The phenomenon that allows humans to perceive
the lightness/brightness of surfaces as approximately constant
regardless of the illuminance.

illumination field calibration: Determination of the illuminance
falling on a scene. Typically this is done by taking an image of a
white object of known brightness.

image based: A general term describing operations or representations
that are based on images. [ WP:Image analysis]

image arithmetic: A general term covering image processing operations
that are based on the application of an arithmetic or logical operator
to two images. Such operations include addition, subtraction,
multiplication, division, blending, AND, NAND, OR, XOR, and XNOR.
[ SB:3.2]
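
To illustrate the image arithmetic entry, the following minimal sketch applies a few of the listed operators pixel by pixel. It assumes that images are small lists of rows of 8-bit gray values and that results are clipped to the range 0 to 255; the helper names `clamp`, `pixelwise` and `blend` are hypothetical, chosen only for this example.

```python
def clamp(value, low=0, high=255):
    """Clip a result back into the assumed 8-bit range."""
    return max(low, min(high, int(round(value))))


def pixelwise(op, image_a, image_b):
    """Apply a binary operator to corresponding pixels of two equal-sized images."""
    return [[clamp(op(a, b)) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


def blend(image_a, image_b, weight_a):
    """Weighted blend in which the two weights sum to 1.0."""
    weight_b = 1.0 - weight_a
    return pixelwise(lambda a, b: weight_a * a + weight_b * b, image_a, image_b)


image_a = [[100, 200], [50, 255]]
image_b = [[100, 100], [60, 1]]

added = pixelwise(lambda a, b: a + b, image_a, image_b)       # addition
subtracted = pixelwise(lambda a, b: a - b, image_a, image_b)  # subtraction
xored = pixelwise(lambda a, b: a ^ b, image_a, image_b)       # bitwise XOR
blended = blend(image_a, image_b, 0.7)                        # 0.7 * A + 0.3 * B
```

Clipping is only one possible way of handling overflow; wrapping or rescaling the result are equally plausible choices.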

image based rendering: The production of a new image of a scene from an
arbitrary viewpoint based on a number of images of the scene together
with associated range images. [ FP:26]

image blending: An arithmetic operation similar to image addition where
a new image is formed by blending the values of corresponding pixels
from two input images. Each input image is given a weight for the
blending so that the total weight is 1.0. [ SB:]

(Figure: first image * 0.7 + second image * 0.3 = blended image.)

image capture: The acquisition of an image by a recording device, e.g.,
a camera. [ TV:2.3]

image coding: The mapping or algorithm required to encode or decode an
image representation (such as a compressed image).
[ WP:Graphics Interchange Format#Image coding]

image compression: A method of representing an image in order to reduce
the amount of storage space that it occupies. Techniques can be
lossless (which allows all image data to be recorded perfectly) or
lossy (where some loss of quality is allowed, typically resulting in
significantly better compression rates). [ SB:1.3.2]

image connectedness: See pixel connectivity. [ SB:4.2]

image coordinates: See image plane coordinates and pixel coordinates.
[ JKS:12.1]

image database indexing: The technique of associating indices (for
example, keywords) with images that allows the images to be indexed
efficiently within a database. [ SQ:13A.3]

image difference: See image subtraction. [ SB:3.2.1]

image digitization: The process of sampling and quantizing an analogue
image function to create a digital image. [ VSN:2.3.1]

image distortion: Any effect that alters an image from the ideal image.
Most typically this term refers to geometric distortions, although it
can also refer to other types of distortion such as image noise and
effects of sampling and quantization. [ WP:Distortion (optics)]

(Figure: Correct Image; Distorted Image.)

image encoding: The process of converting an image into a different
representation. For example see image compression.
[ WP:Image compression]

image enhancement: A general term covering a number of image processing
operations that alter an image in order to make it easier for humans to
perceive. Example operations include contrast stretching and histogram
equalization. For example, the following shows a histogram equalization
operation [ SB:4]:

image feature: A general term for an interesting image structure that
could
arise from a corresponding interesting scene structure. Features can be
single points such as interest points, curve vertices, image edges,
lines or curves or surfaces, etc. [ TV:4.1]

image feature extraction: A group of image processing techniques
concerned with the identification of particular features in an image.
Examples include edge detection and corner detection. [ TV:4.1]

image flow: See optic flow. [ JKS:14.4]

image formation: A general term covering issues relating to the manner
in which an image is formed. For example in the case of a digital
camera this term would include the camera geometry as well as the
process of sampling and quantization. [ SB:2.1]

image grid: A geometric map describing the image sampling in which
every image point is represented by a vertex (or hole) in the map/grid.

image indexing: See image database indexing. [ SQ:13A.3]

image intensifier: A device for amplifying an image, so that the
resultant sensed luminous flux is significantly higher.
[ WP:Image intensifier]

image interleaving: Describes the way in which image pixels are
organized. Different possibilities include pixel interleaving (where
the image data is ordered by pixel position), and band interleaving
(where the image data is ordered by band, and is then ordered by pixel
position within each band). [ WP:Interleaving]

image interpolation: A method for computing a value for a pixel in an
output image based on non-integer coordinates in some input image. The
computation is based on the values of nearby pixels in the input image.
This type of operation is required for most geometric transformations
and computations requiring subpixel resolution. Types of interpolation
scheme include nearest-neighbor interpolation, bilinear interpolation,
bicubic interpolation, etc. This figure shows the result of
interpolation in image enlargement [ RJS:2]:

(Figure: Enlarged image using bicubic interpolation.)

image interpretation: A general term for computer vision processes that
extract descriptions from images (as opposed to processes that produce
output images for human viewing). There is often the assumption that
the descriptions are very high-level, e.g., "the boy is walking to the
store carrying a book" or "these cells are cancerous". A broader
definition would also allow processes that extract information needed
by a subsequent (usually non-image processing) activity, e.g., the
position of a bright spot in an image.

image invariant: An image feature or measurement image that is
invariant to some properties. For example invariant color features are
often used in image database indexing. [ WP:Image moment]

image irradiance equation: Usually expressed as $E(x, y) = R(p, q)$,
this equality (up to a constant scale factor to account for
illumination strength,
surface color and optical efficiency) says that the observed brightness
$E$ at pixel $(x, y)$ is equal to the reflectance $R$ of the surface
for surface normal $(p, q, 1)$. Usually there is a
one-degree-of-freedom family of surface normals with the same
reflectance value so the observed brightness only partially constrains
local surface orientation and thus shape. [ JKS:9.3.1]

image morphology: An approach to image processing that considers all
operations in terms of set operations. See mathematical morphology.
[ WP:Mathematical morphology]
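
As a worked illustration of the image irradiance equation entry, one commonly used reflectance map is that of a Lambertian surface lit by a distant point source. The source direction $(p_s, q_s, 1)$, written in the same gradient form as the surface normal, and the unit albedo are assumptions of this sketch and are not part of the entry.

```latex
E(x, y) = R(p, q)
        = \frac{1 + p\,p_s + q\,q_s}
               {\sqrt{1 + p^2 + q^2}\;\sqrt{1 + p_s^2 + q_s^2}}
```

With the source along the viewing direction ($p_s = q_s = 0$) this reduces to $R(p, q) = 1/\sqrt{1 + p^2 + q^2}$, so every normal with the same value of $p^2 + q^2$ produces the same brightness. That circle of normals is an example of the one-degree-of-freedom family mentioned in the entry.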

image magnification: The extent to which an image is expanded for
viewing. If the image size is actually changed then image interpolation
must be used. Normally quoted relative to the original size (e.g., x2,
x10, etc.). [ AJ:7.4]

(Figure: Magnified image (x4).)

image matching: The comparison of two images, often evaluated using
cross correlation. See also template matching. [ TV:10.4.2]

(Figure: Image 1; Image 2; locations where Image 2 matches Image 1.)

image memory: See frame store. [ ERD:2.2]

image modality: A general term for the sensing technique used to
capture an image, e.g., a visible light, infrared or X-ray image.

image morphing: A gradual transformation from one image to another
image. [ WP:Morphing]

image mosaic: A composition of several images, to provide a single
larger image covering a wider field of view. For example, the following
is a mosaic of three images [ RJS:2]:

image motion estimation: Computation of optical flow for all
pixels/features in an image. [ WP:Motion estimation]

image multiplication: See pixel multiplication operator. [ SB:]

image noise: Degradation of an image where pixels have values which are
different from the ideal values. Often noise is modeled as having a
Gaussian distribution with a zero mean, although it can take on
different forms such as salt-and-pepper noise depending upon the cause
of the noise (e.g., the environment, electrical interference, etc.).
Noise is measured in terms of the signal-to-noise ratio. [ SB:2.3.3]
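
The two noise models named in the image noise entry, and one way of measuring their effect, can be sketched as follows. This is an illustrative sketch only: the image is assumed to be a flat list of 8-bit gray values, the function names are hypothetical, and the decibel form of the signal-to-noise ratio (mean signal power over mean squared error) is one of several definitions in use.

```python
import math
import random


def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise to each pixel and clip to [0, 255]."""
    return [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in pixels]


def add_salt_and_pepper(pixels, fraction, rng):
    """Replace roughly `fraction` of the pixels by 0 (pepper) or 255 (salt)."""
    return [rng.choice((0, 255)) if rng.random() < fraction else p
            for p in pixels]


def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels: signal power over noise power."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)                 # fixed seed so the run is repeatable
clean = [120] * 1000                   # a flat gray test signal
gaussian = add_gaussian_noise(clean, sigma=10.0, rng=rng)
speckled = add_salt_and_pepper(clean, fraction=0.05, rng=rng)
print(round(snr_db(clean, gaussian), 1), round(snr_db(clean, speckled), 1))
```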

(Figure: Original Image; Image with Gaussian Noise; Image with Salt and
Pepper Noise.)

image plane coordinates: The position of points in the physical image
sensing plane. These have physically meaningful values, such as
centimeters. These can be converted to pixel coordinates, which are in
pixels. The two meanings are sometimes used interchangeably.
[ JKS:1.6]

image processing: A general term covering all forms of processing of
captured image data. It can also mean processing that starts from an
image and results in an image, as contrasted to ending with symbolic
descriptions of the image contents or scene. [ JKS:1.2]

image normalization: The purpose of image normalization is to reduce or
eliminate the effects of different illumination on the same or similar
scenes. A typical approach is to subtract the mean of the image and
divide by the standard deviation, which produces a zero mean, unit
variance image. Since images are not Gaussian random samples, this
approach does not completely solve the problem. Further, light source
placement can also cause variations in shading that are not corrected
by this approach. This figure shows an original image (left) and its
normalization (right): [ WP:Normalization (image processing)]
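
The typical approach described in the image normalization entry (subtract the mean, divide by the standard deviation) can be sketched as below. The representation of an image as a list of rows, the function name `normalize`, and the choice to return zeros for a constant image are assumptions made for this example.

```python
import math


def normalize(image):
    """Subtract the mean gray value and divide by the standard deviation."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(variance)
    if std == 0:
        # A constant image has no contrast to rescale; return all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(p - mean) / std for p in row] for row in image]


dim = [[10, 20], [30, 40]]
bright = [[110, 130], [150, 170]]   # the same values under gain 2, offset 90

normalized_dim = normalize(dim)
normalized_bright = normalize(bright)
```

Because the second image is a linear rescaling of the first, both normalize to the same zero mean, unit variance image. Shading differences caused by light source placement are not linear in this sense and would not be removed.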

image of absolute conic: See absolute conic. [ HZ:7.5]

image pair rectification: See image rectification. [ FP:11.1.1]

image plane: The mathematical plane behind the lens onto which an image
is focused. In practice, the physical sensing surface aims to be placed
here, but its position will vary slightly due to minor variations in
sensor shape and placement. The term is also used to describe the
geometry of the image recorded at this location. See [ JKS:1.4].

image processing operator: A function that may be applied to an image
in order to transform it in some way. See also image processing.
[ ERD:2.2]

image pyramid: A hierarchical image representation in which each level
contains a smaller version of the image at the previous level. Often
pixel values are obtained by a smoothing process. Usually the reduction
is by a power of two (i.e., 2 or 4). The figure below shows four levels
of a pyramid in which each level is formed by averaging together two
pixels from the previous layer. The levels are enlarged to the original
image size for inspection of the effect of the compression. [ FP:7.7]
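
A minimal sketch of building an image pyramid follows. It assumes a reduction by a factor of two in each dimension, obtained by averaging non-overlapping 2 x 2 blocks; this is one simple smoothing choice and not necessarily the one used for the figure in the entry. The names `reduce_level` and `build_pyramid` are hypothetical.

```python
def reduce_level(image):
    """Halve each dimension by averaging non-overlapping 2 x 2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]


def build_pyramid(image, levels):
    """Return [original, level 1, ...], stopping early once a level is too small."""
    pyramid = [image]
    while (len(pyramid) < levels
           and len(pyramid[-1]) >= 2 and len(pyramid[-1][0]) >= 2):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


image = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [4, 4, 2, 2],
         [4, 4, 2, 2]]
pyramid = build_pyramid(image, levels=3)
```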

image quality: A general term, usually referring to the extent to which
the image data records the observed scene faithfully. The specific
issues that are important to image quality are problem specific, but
may include low image noise, high image contrast, good image focus, low
motion blur, etc. [ WP:Image quality]

image querying: A shorthand term for indexing into image databases.
This is often done based on color, texture or shape indices. The
database keys could be based on global or local measures.
[ WP:Content-based image retrieval#Query by example]

image reconstruction: A term used in image compression to describe the
process of recreating a digital image from some compressed form.

image rectification: A warping of a stereo pair of images such that
conjugate epipolar lines (defined by the two cameras' epipoles and any
3D scene point) are collinear. Usually the lines are transformed to be
parallel to the horizontal axis so that corresponding image features
can be found on the same raster line. This reduces the computational
complexity of the stereo correspondence problem. [ FP:11.1.1]

image registration: See registration. [ FP:21.3]

image representation: A general term for how the image data is
represented. Image data can be one, two, three or more dimensional.
Image data is often stored in arrays where the spatial layout of the
array reflects the spatial layout of the data. The figure below shows a
small 10 x 10 pixel image patch with the gray scale values for the
corresponding pixels. [ AJ:1.2]

    123 123 123 123 123 123 123 123  96  96
    123 123 112  96  96 123 123 123 123  96
    123 123  96  96 112 123 137 123 123  96
    123 123  96  96 123 214 234 178 123  96
    123 100  72 109 178 230 230 137 123  96
    125  78  51 142 218 178  96  76  96  96
     92 100  92  92  81  76  76  96 123 123
     81 109 129 129 100  81  92 123 123 123
     51 109 142 137 123 123 123 123 123 123
     33  76 123 123 137 137 123 123 123 123

image resolution: Usually used to record the number of pixels in the
horizontal and vertical directions in the image, but may also refer to
the separation between pixels (e.g., 1 m) or the angular separation
between the lines of sight corresponding to adjacent pixels. [ SB:1.2]

image restoration: The process of removing some known (and modelled)
distortion from an image, such as blur in an out-of-focus image. The
process may not produce a perfect image, but may remove an undesired
distortion (e.g., motion blur) at the cost of another ignorable
distortion (e.g., phase distortion). [ SB:6]

image sampling: The process of measuring some pixel values from the
physical image focused onto the image plane. The sampling could be
monochrome, color or multi-spectral, such as RGB. The sampling usually
results in a rectangular array of pixels sampled at nearly equal
spacing, but other sampling could be used such as space variant
sensing. [ VSN:2.3.1]

image scaling: The operation of increasing or reducing the size of an
image by some scale factor. This operation may require the use of some
type of image interpolation method.
See also image magnification. [ WP:Image scaling]

image segmentation: The grouping of image pixels into meaningful,
usually connected, structures such as curves and regions. The term is
applied to a variety of image modalities, such as intensity data or
range data and properties, such as similar feature orientation, feature
motion, surface shape or texture. [ SB:10.1]

image sequence: A series of images generally taken at regular intervals
in time. Typically the camera and/or objects in the scene will be
moving. [ TV:8.1]

image sequence fusion: The integration of information from the many
images in an image sequence. Different types of fusion include 3D
structure recovery, production of a mosaic of the scanned scene,
tracking of a moving object, improved scene imaging due to image
averaging, etc. [ WP:Image fusion]

image sequence matching: Computing the correspondence between pixels or
image features in frames of the image sequence. With the
correspondences, one can construct image mosaics, stabilize image
jitter or recover scene structure.

image sequence stabilization: Normal hand-held video camera recordings
contain some image motion due to the jitter of the human operator.
Image stabilization attempts to estimate the random portion of the
camera motion jitter and translate the images in the sequence to reduce
or remove the jitter. A similar application would be to remove
systematic camera motions to produce a motionless image. See also
feature stabilization. [ WP:Image stabilization]

image sharpening operator: An image enhancement operator that increases
the high spatial frequency component of the image, so as to make the
edges of objects appear sharper or less blurred. See also edge
enhancement. These images show a raw image (left) and an image
sharpened with the unsharp operator (right). [ SB:4.6]

image size: The number of pixels in an image, for example, 768
horizontally by 494 vertically.
[ WP:Wikipedia:What is a featured picture%3F/Image size]

image smoothing: See noise reduction. [ TV:3.2]

image stabilization: See image sequence stabilization.
[ WP:Image stabilization]

image storage devices: See frame store. [ ERD:2.2]

image subtraction operator: See pixel subtraction operator.
[ SB:3.2.1]

image transfer: 1) See novel view synthesis. 2) Alternatively, a
general term describing the movement of an image from one device to
another, or alternatively from one representation
to another. [ WP:Picture Transfer Protocol]

image understanding: A general term referring to the derivation of
high-level (abstract) information from an image or series of images.
This term is often used to refer to the emulation of human visual
capabilities. [ AJ:9.15]

Image Understanding Environment (IUE): A C++ based collection of
data-types (classes) and standard computer vision algorithms. The
motivation behind the development of the IUE was to reduce the
independent re-invention of basic computer vision code in government
funded computer vision research.

image warping: A general term for transforming the positions of pixels
in an image, usually while maintaining image topology (i.e.,
neighboring original pixels remain neighbors in the warped image). This
results in an image with a new shape. This operation might be done, for
example, to correct some geometric distortion, align two images (see
image rectification), or transform shapes into a more easily processed
form (e.g., circles into straight lines). [ SB:7.10]

imaging geometry: A general term referring to the relative placement of
sensors, structured light sources, point light sources, etc.
[ BB:2.2.2]

imaging spectroscopy: The acquisition and analysis of surface
composition by using image data from multiple spectral channels. A
typical sensor (AVIRIS) records 224 measurements at 10 nm increments
from 400 to 2500 nm. The term might refer to the raw multi-dimensional
signal or to the classification of that signal into surface types
(e.g., vegetation or mineral types). [ WP:Imaging spectroscopy]

imaging surface: The surface within a camera on which the image is
projected by the lens. This surface in a digital camera is comprised of
photosensitive elements that record the incident illumination. See also
image plane.

implicit curve: A curve that is defined by an equation of the form
$f(\vec{x}) = 0$. Then the curve is the set of points
$S = \{\vec{x} \mid f(\vec{x}) = 0\}$. [ FP:15.3.1]

implicit surface: The representation of a surface as the set of points
that makes a function have the value zero. For example, the sphere
$x^2 + y^2 + z^2 = r^2$ of radius $r$ at the origin could be
represented by the function $f(x, y, z) = x^2 + y^2 + z^2 - r^2$. The
set of points where $f(x, y, z) = 0$ is the implicit surface.
[ SQ:4.1.2]

impossible object: An object that cannot physically exist, such as
[ VSN:4.1.1]:

impulse noise: A form of image corruption where image pixels have their
value replaced by the maximum value (e.g., 255). See also
salt-and-pepper noise. This figure shows impulse noise on an image
[ TV:3.1.2]:
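
The corruption described in the impulse noise entry can be simulated with a short sketch. It assumes that each pixel is replaced independently with a fixed probability; the function name `add_impulse_noise` and its parameters are hypothetical and chosen for this example only.

```python
import random


def add_impulse_noise(image, probability, max_value=255, seed=None):
    """Replace each pixel by max_value with the given probability."""
    rng = random.Random(seed)
    return [[max_value if rng.random() < probability else p for p in row]
            for row in image]


# A small synthetic ramp image whose values stay below the maximum value.
image = [[10 * (r + c) for c in range(8)] for r in range(8)]
noisy = add_impulse_noise(image, probability=0.1, seed=1)

# Count how many pixels were replaced by the impulse value.
changed = sum(n != p
              for noisy_row, row in zip(noisy, image)
              for n, p in zip(noisy_row, row))
print(changed, "of", 64, "pixels replaced")
```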

incandescent lamp: A light source whose light arises from the glowing
of a very hot structure, such as a tungsten filament in the common
light bulb. [ WP:Incandescent light bulb]

incident light: A general term referring to the light that strikes or
illuminates a surface.

incremental learning: Learning that is incremental in nature. See
continuous learning. [ WP:Population-based incremental learning]

independent component analysis: A multi-variate data analysis method.
It finds a linear transformation that makes each component of the
transformed data vectors independent of each other. Unlike principal
component analysis, which considers only second order properties
(covariances) and transforms onto basis vectors that are orthogonal to
each other, ICA considers properties of the whole distribution and
transforms onto basis vectors that need not be orthogonal.
[ WP:Independent component analysis]

index of refraction: The absolute index of refraction in a material is
the ratio of the speed of an electromagnetic wave in a vacuum to the
speed in the material. More commonly used is the relative index of
refraction of two media, which is the ratio of their absolute indices
of refraction. This ratio is used in lens design and explains the
bending of light rays as the light passes into a new material (Snell's
Law). [ FP:1.2.1]

indexing: The process of retrieving an element from a data structure
using a key. A powerful concept imported into computer vision from
programming. For example, the problem of establishing the identity of
an object given an image and a set of candidate models is typically
approached by locating some characterizing elements in the image, or
features, then using the features' properties to index a database of
models. See also model base indexing. [ FP:18.4.2]

industrial vision: A general term covering the application of machine
vision technology to industrial processes. Applications include product
inspection, process feedback, part or tool alignment. A large range of
lighting and sensing techniques are used. A common feature of
industrial vision systems is fast processing rates (e.g., several times
a second), which may require limiting the rate at which targets are
analyzed or limiting the types of processing.

inflection point: A point at which the second derivative of a curve
changes its sign, corresponding to a change in concavity. See also
curve inflection. [ FP:19.1.1]

infinite impulse response filter (IIR): A filter that produces an
output value ($y_n$) based on the current and past input values
($x_i$) together with past output values ($y_j$):
$y_n = \sum_{i=0}^{p} a_i x_{n-i} + \sum_{j=1}^{q} b_j y_{n-j}$, where
$a_i$ and $b_j$ are weights. [ WP:Infinite impulse response]
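
The recurrence in the infinite impulse response filter entry can be evaluated directly, as in the sketch below. It assumes that input and output samples before the start of the signal are zero; the function name `iir_filter` and the list-based coefficient layout are hypothetical choices for this example.

```python
def iir_filter(x, a, b):
    """Evaluate y[n] = sum_i a[i] * x[n-i] + sum_j b[j-1] * y[n-j].

    `a` holds the weights a_0..a_p on current and past inputs and `b`
    holds the weights b_1..b_q on past outputs.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, a_i in enumerate(a):
            if n - i >= 0:              # inputs before the start count as zero
                acc += a_i * x[n - i]
        for j, b_j in enumerate(b, start=1):
            if n - j >= 0:              # outputs before the start count as zero
                acc += b_j * y[n - j]
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
response = iir_filter(impulse, a=[0.5], b=[0.5])
```

With one feedback weight the response to a single impulse halves at every step and never reaches exactly zero, which is the behavior that gives the filter its name. Passing an empty `b` removes the feedback and leaves a finite impulse response filter.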

(Figure: inflection point on a curve, labeled INFLECTION.)

influence function: A function describing the effect of an individual
observation on a statistical model. This allows us to evaluate whether
the observation is having an undue influence on the model.
[ WP:Influence function]

information fusion: Fusion of information from multiple sources. See
sensor fusion. [ WP:Information integration]

infrared: See infrared light. [ SB:3.1]

infrared imaging: Production of an image through use of an infrared
sensor. [ SB:3.1]

infrared light: Electromagnetic energy with wavelengths approximately
in the range 700 nm to 1 mm. Immediately shorter wavelengths are
visible light and immediately longer wavelengths are microwave radio.
Infrared light is often used in machine vision systems because: 1) it
is easily observed by most semiconductor image sensors yet is not
visible to humans or 2) it is a measure of the heat emitted by the
observed scene. [ SB:3.1]

infrared sensor: A sensor capable of observing or measuring infrared
light. [ SB:3.1]

inlier: A sample that falls within an assumed probability distribution
(e.g., within the 95th percentile). See also outlier. [ WP:RANSAC]

inspection: A general term for visually examining a target to detect
defects. Common practical inspection examples include printed circuit
boards for breaks or solder joint failures, paper production for holes
or discolorations, and food for irregularities. [ SQ:17.4]

integer lifting: A method used to construct wavelet representations.
[ WP:Lifting scheme]

integer wavelet transform: An integer version of the discrete wavelet
transform.

integral invariant: An integral (of some function) that is invariant
under a set of transformations. For example, local integrals along a
curve of curvature or arc length are invariant to rotation and
translation. Integral invariants potentially have greater stability to
noise than, e.g., differential invariants, such as curvature itself.

integration time: The length of time that a light-sensitive sensor
medium is exposed to the incident light (or other stimulus). Shorter
times reduce the signal strength and possible motion blur (if the
sensor or objects in the scene are moving).

intensity: 1) The brightness of a light source. 2) Image data that
records the brightness of the light that comes from the observed scene.
[ TV:2.2.3]

intensity based database indexing: This is a form of image database
indexing that uses intensity descriptors such as histograms of pixel
(monochrome or color) values or vectors of local derivative values.

intensity cross correlation: Cross correlation using intensity data.

intensity data: Image data that represents the brightness of the
measured light. There is not usually a linear mapping between the
brightness of the measured light and the stored values. The term can
refer to the intensity of observed visible light as

intensity gradient: The mathematical gradient operation applied to an
intensity image $I$ gives the intensity gradient $\nabla I$ at each
image point. The intensity gradient direction shows the local image
direction in which the maximum change in intensity occurs. The
intensity gradient magnitude gives the magnitude of the local rate of
change in image intensity. These terms are illustrated below. At each
of the two designated points, the length of the vector shows the
magnitude of the change in intensity and the direction of the vector
shows the direction of greatest change. [ WP:Image gradient]

intensity histogram: A data structure that records the number of pixels
of each intensity value. A typical gray scale image will have pixels
with values in [0,255]. Thus the histogram will have 256 entries
recording the number of pixels that had value 0, the number having
value 1, etc. A dark object against a lighter background and its
histogram are shown here [ SB:3.4]
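
The data structure described in the intensity histogram entry can be built in a few lines. This sketch assumes an 8-bit gray scale image stored as a list of rows of integers; the function name `intensity_histogram` is hypothetical.

```python
def intensity_histogram(image, levels=256):
    """Count how many pixels take each intensity value 0..levels-1."""
    histogram = [0] * levels
    for row in image:
        for value in row:
            histogram[value] += 1
    return histogram


image = [[0, 0, 255],
         [0, 128, 255]]
histogram = intensity_histogram(image)
```

Entry `histogram[v]` holds the number of pixels with value `v`, so the entries always sum to the number of pixels in the image.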


intensity gradient direction: The local image direction in which the
maximum change in intensity occurs. See also intensity gradient.
[ WP:Gradient#Interpretations]

intensity gradient magnitude: The magnitude of the local rate of change
in image intensity. See also intensity gradient. The image below shows
the raw image and its intensity gradient magnitude (contrast enhanced
for clarity). [ WP:Gradient#Interpretations]

intensity image: An image that records the measured intensity data.
[ TV:2.1]

intensity level slicing: An image processing operation in which pixels
with values other than the selected value (or range of values) are set
to zero. If the image is viewed as a landscape, with height
proportional to brightness, then the slicing operator takes a cross
section through the height surface. The right image below shows (in
black) the intensity level 80 of the left image. [ AJ:7.2]
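
The operation in the intensity level slicing entry can be sketched as follows, assuming an image stored as a list of rows of integer gray values. The function name `intensity_level_slice` and the inclusive `low`/`high` bounds are assumptions made for this example.

```python
def intensity_level_slice(image, low, high):
    """Keep pixels with low <= value <= high; set every other pixel to zero."""
    return [[p if low <= p <= high else 0 for p in row] for row in image]


image = [[10, 80, 200],
         [80, 79, 81]]
single_level = intensity_level_slice(image, 80, 80)   # one selected value
narrow_band = intensity_level_slice(image, 79, 81)    # a range of values
```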

intensity matching: This approach finds corresponding points in a pair
of images by matching the gray scale intensity patterns. The goal is to
find image neighborhoods that have nearly identical pixel intensities.
All image points could be considered for matching or only feature or
interest points. An algorithm where intensity matching is used is
correlation based stereo matching.

intensity sensor: A sensor that measures intensity data. [ BM:1.9.1]

interest point: A general term for pixels that have some interesting
property. Interest points are often used for making feature point
correspondences between images. Thus, the points usually have some
identifiable property. Further, because of the need to limit the
combinatorial explosion that matching can produce, interest points are
often expected to be infrequent in an image. Interest points are often
points of high variation in pixel values. See also point feature.
Example interest points from the Harris corner detector (courtesy of
Marc Pollefeys) are seen here [ SB:10.9]:

interest point feature detector: An operator applied to an image to
locate interest points. Well-known examples are the Moravec and the
Plessey interest point operators. [ SB:10.9]

interference: When 1) ordinary light interacts with matter that has
dimensions similar to the wavelength of the light or 2) coherent light
interacts with itself, then interference occurs. The most notable
effect from a computer vision perspective is the production of
interference fringes and the speckle of laser illumination. May
alternatively refer to electrical interference which can affect an
image when it is being transmitted on an electrical medium. [ EH:9]

interference fringe: When optical interference occurs, the most
noticeable effect it has is the production of interference fringes
where the light illuminates a surface. These are parallel roughly
equally spaced lighter and darker bands of brightness. One important
consequence of these bands is blurring of the edge positions.
[ EH:9.1]

interferometric SAR: An enhancement of synthetic aperture radar (SAR)
sensing to incorporate phase information from the reflected signal,
increasing accuracy.
[ WP:Interferometric synthetic aperture radar]

interior orientation: A photogrammetry term for the calibration of the
intrinsic parameters of a camera, including its focal length, principal
point, lens distortion, etc. This allows transformation of measured
image coordinates into camera coordinates. [ JKS:12.9]

internal parameters (of camera): See intrinsic parameters. [ FP:2.2]

inter-reflection: The reflection caused by light reflected off a
surface and bouncing off another surface of the same object. See also
mutual illumination. [ WP:Diffuse reflection#Interreflection]

interlaced scanning: A technique arising from television engineering,
whereby alternate rows of an image are scanned or transmitted instead of
consecutive rows. Thus, one television frame is transmitted by sending
first the odd rows, forming the odd field, and then the even rows,
forming the even field. [ LG:4.1.2]

intermediate representation: A representation that is created as a stage
in the derivation of some other representation from some input
representation. For example, the raw primal sketch, full primal sketch
and 2.5D sketch were intermediate representations between input images
and a 3D model in Marr's theory. In the following example a binary image
of the notice board is an intermediate representation between the input
image and the textual output. [ WP:Intermediate representation]

[Figure: Intermediate Representation. Surviving text from the figure:
"1 Malahide".]

internal energy (or force): A measure of the stability of a shape (such
as smoothness) of an active shape or deformable contour model which is
part of the deformation energy. This measure is used to constrain the
appearance of the model. [ WP:Internal energy]

interval tree: An efficient structure for searching in which every node
in the tree is a parent to nodes in a particular interval of values.
[ WP:Interval tree]

interpolation: A mathematical process whereby a value is inferred from
other nearby values or from a mathematical function linking nearby
values. For example, dense values along a curve can be linearly
interpolated between two known curve points by fitting a line connecting
the two curve points. Image, surface and volume values can be
interpolated, as well as higher dimensional structures. Interpolating
functions can be curved as well as linear. [ BB:A1.11]

interpretation tree search: An algorithm for matching between the
members of two discrete sets. For each feature from the first set, it
builds a depth-first search tree considering all possible matching
features from the second set. After a match is found for one feature (by
satisfying a set of consistency tests), it tries to match the remaining
features. The algorithm can cope when no match is possible for a given
feature by allowing a given number of skipped features. Here we see an
example of a partial interpretation tree that is matching model features
to data features [ TV:10.2]:
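The depth-first matching described under interpretation tree search can be sketched as follows. This is only an illustration under stated assumptions, not the method of [ TV:10.2]: features are 2D points, the single consistency test is that pairwise distances agree within a tolerance, and the names `interpretation_tree`, `tol` and `max_skips` are invented for the example. `None` plays the role of the wildcard (a skipped data feature).

```python
import math

def interpretation_tree(data, model, tol=0.1, max_skips=1):
    """Depth-first search pairing each data point with a model point.

    Returns one entry per data point: the index of the matched model
    point, or None where the data point was skipped (wildcard).
    Returns None if no consistent interpretation exists.
    """
    def consistent(pairs, d_idx, m_idx):
        # Pairwise test: the distance between two data points must agree
        # with the distance between the model points they are matched to.
        for prev_d, prev_m in pairs:
            if prev_m is None:
                continue
            if prev_m == m_idx:
                return False  # each model feature is used at most once
            d_data = math.dist(data[prev_d], data[d_idx])
            d_model = math.dist(model[prev_m], model[m_idx])
            if abs(d_data - d_model) > tol:
                return False
        return True

    def search(pairs, skips):
        d_idx = len(pairs)
        if d_idx == len(data):
            return [m for _, m in pairs]
        for m_idx in range(len(model)):
            if consistent(pairs, d_idx, m_idx):
                result = search(pairs + [(d_idx, m_idx)], skips)
                if result is not None:
                    return result
        if skips < max_skips:
            # Wildcard branch: leave this data feature unmatched.
            return search(pairs + [(d_idx, None)], skips + 1)
        return None

    return search([], 0)

model = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
# The model shifted by (10, 10) and listed in a different order,
# plus one spurious point that matches nothing.
data = [(10.0, 13.0), (50.0, 50.0), (10.0, 10.0), (14.0, 10.0)]
match = interpretation_tree(data, model)
```

Here `match` pairs the first, third and fourth data points with model points 2, 0 and 1 and skips the spurious second point. A distance-only test cannot tell a shape from its mirror image, so a practical system would add further consistency tests.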

[Figure: partial interpretation tree with levels DATA 1, DATA 2 and
DATA 3, each branching to M1, M2, M3 and the wildcard *. M1 = MODEL 1,
M2 = MODEL 2, M3 = MODEL 3.]

intrinsic camera parameters: Parameters such as focal length,
coefficients of radial lens distortion, and the position of the principal
point, that describe the mapping from image pixels to world rays in a
camera. Determining the parameters of this mapping is the task of
camera calibration. For a pinhole camera, world rays $\vec{r}$ are mapped
to homogeneous image coordinates $\vec{x}$ by $\vec{x} = K\vec{r}$ where
$K$ is the upper triangular $3 \times 3$ matrix

\[ K = \begin{pmatrix} \alpha_u f & s & u_0 \\ 0 & \alpha_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \]

In this form, $f$ represents the focal length, $s$ is the skew angle
between the image coordinate axes, $(u_0, v_0)$ is the principal point,
and $\alpha_u$ and $\alpha_v$ are the aspect ratios (e.g., pixels/mm) in
the $u$ and $v$ image directions. [ FP:2.2]

intrinsic dimensionality: The number of dimensions (degrees of freedom)
inherent in a data set, independent of the dimensionality of the space in
which it is represented. For example, a curve in 3D is intrinsically 1D
although its points are represented in 3D. [ WP:Intrinsic dimension]

intrinsic image: A term describing one of a set of images registered with
the input intensity image that describe properties intrinsic to the
scene, instead of properties of the input image. Example intrinsic images
include: distance to scene points, scene surface orientations, surface
reflectance, etc. The right image below shows a depth image registered
with the intensity image on the left. [ BB:1.5]

intruder detection: An application of machine vision, usually analyzing a
video sequence to detect the appearance of an unwanted person in a scene.
[ SQ:17.5]

invariant: Something that does not change under specified operations
(e.g., translation invariant). [ WP:Invariant (mathematics)]

invariant contour function: The contour function characterizes the shape
of a planar figure based on the external boundary. Values invariant to
position, scale or orientation can be computed from the contour
functions. These invariants can be used for recognition of instances of
the planar figure.

inverse convolution: See deconvolution. [ AL:14.5]

inverse Fourier transform: A transformation that allows a signal to be
recreated from its Fourier coefficients. See Fourier transform.
[ SEU:2.5.1]

inverse square law: A physical law that says the illumination power
received at distance d from a point light

source is inversely proportional to the square of d, i.e., is
proportional to $1/d^2$. [ WP:Inverse-square law]

invert operator: A low-level image processing operation where a new image
is formed by replacing each pixel by an inverted value. For
binary images, this is 1 if the input pixel is 0 or 0 if the input pixel
is 1. For gray level images, this depends on the maximum range of
intensity values. If the range of intensity values is [0,255] then the
inverse of a pixel with value $x$ is $255 - x$. The result is like a
photographic negative. Below is a gray level image and its inverted image
[ LG:5.1.2]:

IR: See infrared. [ SB:3.1]

irradiance: The amount of energy received at a point on a surface from
the corresponding scene point. [ JKS:9.1]

isometry: A transformation that preserves distances. Thus the
transformation $T : \vec{x} \mapsto \vec{u}$ is an isometry if, for all
pairs $(\vec{x}, \vec{y})$, we have
$|\vec{x} - \vec{y}| = |T(\vec{x}) - T(\vec{y})|$. [ HZ:1.4.1]

isophote curvature: Isophotes are curves of constant image intensity.
Isophote curvature is defined at any given pixel as $-L_{vv}/L_w$, where
$L_w$ is the magnitude of the gradient perpendicular to the isophote and
$L_{vv}$ is the curvature of the intensity surface along the isophote at
that point.

iso-surface: A surface in a 3D space where the value of some function is
constant, i.e., $f(x, y, z) = C$ where $C$ is some constant.
[ WP:Isosurface]

isotropic gradient operator: A gradient operator that computes the scalar
magnitude of the gradient, i.e., a value that is independent of edge
direction. [ JKS:5.1]

isotropic operator: An operator that produces the same output
irrespective of the local orientation of the pixel neighborhood where the
operator is applied. For example, a mean smoothing operator produces the
same output value, even if the image data is rotated at the point where
the operator is being applied. On the other hand, a directional
derivative operator would produce different values if the image were
rotated. This concept is particularly relevant to feature detectors,
some of which are sensitive to the local orientation of the image pixel
values and some of which are not (isotropic). [ LG:6.4.1]

iterated closest point: See iterative closest point. [ FP:21.3.2]

iterative closest point (ICP): A shape alignment algorithm that works by
iterating its two-stage process until some termination point: step 1)
given an estimated transformation of the first shape onto the second,
find the closest feature from the second shape for each feature of the
first shape, and step 2) given the new set of closest features,
re-estimate the transformation that maps the first feature set onto the
second. Most variations of the algorithm need a good initial estimate of
the alignment. [ FP:21.3.2]

IUE: See Image Understanding Environment.
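The two-stage loop of the iterative closest point (ICP) entry can be sketched for 2D point sets as below. The sketch assumes point features, a rigid (rotation plus translation) transformation and a fixed iteration count as the termination rule; the function names are invented for the example and it is not the formulation of [ FP:21.3.2].

```python
import math

def closest_points(src, dst):
    """Step 1: for each source point, the nearest destination point."""
    return [min(dst, key=lambda q: math.dist(p, q)) for p in src]

def fit_rigid_2d(src, dst):
    """Step 2: least-squares rotation and translation taking src to dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(q[0] for q in dst) / n
    cdy = sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - csx, py - csy          # centered source point
        bx, by = qx - cdx, qy - cdy          # centered destination point
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty

def transform(theta, tx, ty, pts):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]

def icp(src, dst, iterations=20):
    current = list(src)
    for _ in range(iterations):
        matched = closest_points(current, dst)              # step 1
        theta, tx, ty = fit_rigid_2d(current, matched)      # step 2
        current = transform(theta, tx, ty, current)
    return current

dst = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
# A slightly rotated and shifted copy: a good initial estimate, as the
# entry says most variants require.
src = transform(0.05, 0.1, -0.05, dst)
aligned = icp(src, dst)
residual = max(math.dist(p, q) for p, q in zip(aligned, dst))
```

With a small initial misalignment every closest-point pairing is correct on the first pass, so the residual drops to numerical noise. A larger misalignment can make step 1 pick wrong pairs and the loop may settle in a poor local optimum.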

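As a small illustration of the invert operator entry, the sketch below assumes an image stored as a list of rows and an intensity range of [0, max_value], so that inversion maps x to max_value - x and the two ends of the range swap. The function name `invert` is an assumption of the example.

```python
def invert(image, max_value=255):
    """Replace every pixel x by max_value - x."""
    return [[max_value - x for x in row] for row in image]

gray = [[0, 64], [128, 255]]
binary = [[0, 1], [1, 0]]

inverted_gray = invert(gray)                    # 8-bit gray level image
inverted_binary = invert(binary, max_value=1)   # binary image
```

Applying the operator twice returns the original image, which is a quick sanity check on the chosen `max_value`.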
Jacobian: The matrix of derivatives of a vector function. Typically if
the function $\vec{f}(\vec{x})$ is written in component form as

\[ \vec{f}(\vec{x}) = \vec{f}(x_1, x_2, \ldots, x_p) =
   \begin{pmatrix} f_1(x_1, x_2, \ldots, x_p) \\ f_2(x_1, x_2, \ldots, x_p) \\ \vdots \\ f_n(x_1, x_2, \ldots, x_p) \end{pmatrix} \]

then the Jacobian $J$ is the $n \times p$ matrix

\[ J = \begin{pmatrix}
   \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_p} \\
   \vdots & \ddots & \vdots \\
   \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_p}
   \end{pmatrix} \]

[ SQ:2.2.1]

joint entropy registration: Registration of data using joint entropy (a
measure of the degree of uncertainty) as a criterion.
[ WP:Mutual information#Applications of mutual information]

JPEG: A common format for compressed image representation designed by the
Joint Photographic Experts Group (JPEG). [ SEU:1.8]

junction label: A symbolic label for the pattern of edges meeting at the
junction. This approach is mainly used in blocks world scenes where all
objects are polyhedra, and thus all lines are straight and meet at only a
limited number of configurations. Example "Y" (i.e., corner of a block
seen front on) and "arrow" (i.e., corner of a block seen from the side)
junctions are shown here. See also line label. [ VSN:4.1.1]

[Figure: an ARROW junction and a Y JUNCTION.]
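To make the n x p layout of the Jacobian entry concrete, the sketch below estimates the matrix by central differences and can be compared with a hand-derived Jacobian. The helper name `numerical_jacobian`, the step size `h` and the polar-to-Cartesian test function are assumptions chosen for the example.

```python
import math

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference estimate of the n x p Jacobian of f at x.

    Row i holds the derivatives of component f_i; column j holds the
    derivatives with respect to x_j.
    """
    n, p = len(f(x)), len(x)
    J = [[0.0] * p for _ in range(n)]
    for j in range(p):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def polar_to_cartesian(v):
    r, theta = v
    return [r * math.cos(theta), r * math.sin(theta)]

r, theta = 2.0, math.pi / 6
J = numerical_jacobian(polar_to_cartesian, [r, theta])

# Jacobian of the same function derived by hand, for comparison.
J_exact = [[math.cos(theta), -r * math.sin(theta)],
           [math.sin(theta),  r * math.cos(theta)]]
```

Here n = p = 2, so `J` is square; for a function from p inputs to n outputs the same code returns n rows of p entries.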


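The two alternating steps described under k-means (assign each point to its closest center, then recompute each center as the mean of its points) can be sketched as below. This is a minimal version under stated assumptions: points are tuples, the initial centers are supplied by the caller, an empty cluster keeps its previous center, and the loop stops when the centers no longer change. The function name and the `iterations` cap are choices made for the example.

```python
import math

def kmeans(points, centers, iterations=100):
    """Return (centers, clusters) after alternating the two k-means steps."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(iterations):
        # Step 1: assign every point to its closest cluster center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            clusters[j].append(p)
        # Step 2: recompute each center as the mean of its points.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(c))))
            else:
                new_centers.append(c)  # assumption: keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```

The result depends on the initial guess: the procedure only finds a local minimum of the summed squared distance, which is why the entry says the estimate is "likely" to minimize it.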
k-means: An iterative squared error clustering algorithm. Input is a set
of points $\{\vec{x}_i\}_{i=1}^n$, and an initial guess at the locations
$\vec{c}_1, \ldots, \vec{c}_k$ of $k$ cluster centers. The algorithm
alternates two steps: points are assigned to the cluster center closest
to them, and then the cluster centers are recomputed as the mean of the
associated points. Iterating yields an estimate of the $k$ cluster
centers that is likely to minimize
$\sum_{\vec{x}} \min_{\vec{c}} |\vec{x} - \vec{c}|^2$. [ FP:14.4.2]

k-means clustering: See k-means. [ FP:14.4.2]

k-medians (also k-medoids): A variant of k-means clustering in which
multi-dimensional medians are computed instead of means. The definition
of multi-dimensional median varies, but options for the median $\vec{m}$
of a set of points $\{\vec{x}_i\}_{i=1}^n$, i.e.,
$\{(x^i_1, \ldots, x^i_d)\}_{i=1}^n$, include the componentwise definition
$\vec{m} = (\mathrm{median}\{x^i_1\}_{i=1}^n, \ldots, \mathrm{median}\{x^i_d\}_{i=1}^n)$
and the analogue of the one dimensional definition
$\vec{m} = \mathrm{argmin}_{\vec{m} \in R^d} \sum_{i=1}^n |\vec{m} - \vec{x}_i|$.
[ WP:K-medians clustering]

k-nearest-neighbor algorithm: A nearest neighbor algorithm that uses the
classifications of the nearest k neighbors when making a decision.
[ FP:22.1.4]

Kalman filter: A recursive linear estimator of a varying state vector and
associated covariance from observations, their associated covariances and
a dynamic model of the state evolution. Improved estimates are calculated
as new data is obtained. [ FP:17.3]

Karhunen-Loève transformation: The projection of a vector (or image when
treated as a vector) onto an orthogonal space that has uncorrelated


components constructed from the autocorrelation (scatter) matrix of a set
of example vectors. An advantage is the orthogonal components have a
natural ordering (by the largest eigenvalues of the covariance of the
original vector space) so that one can select the most significant
variation in the dataset. The transformation can be used as a basis for
image compression, for estimating linear models in high dimensional
datasets and estimating the dominant modes of variation in a dataset,
etc. It is also known as the principal component transformation. The
following image shows a dataset before and after the KL transform was
applied. [ AJ:5.11]

[Figure: a dataset before and after the KL transform; labels +X, +X and
PRINCIPAL EIGENVECTOR.]

kernel: 1) A small matrix of numbers that is used in image convolutions.
2) The structuring element used in mathematical morphology. 3) The
mathematical transformation used in kernel discriminant analysis.
[ FP:7.1.1]

kernel discriminant analysis: A classification approach based on three
key observations: 1) some problems need curved classification boundaries,
2) the classification boundaries should be defined locally by the classes
rather than globally and 3) a high dimensional classification space can
be avoided by using the kernel method. The method provides a
transformation via a kernel so that linear discriminant analysis can be
done in the input space instead of the transformed space.
[ WP:Linear discriminant analysis#Practical use]

kernel function: (1) A function in an integral transformation (e.g., the
exponential term in the Fourier transform); (2) a function applied at
every point in an image (see convolution). [ WP:Kernel (mathematics)]

kernel principal component analysis: An extension of the
principal component analysis (PCA) method that allows classification with
curved region boundaries. The kernel method is equivalent to a nonlinear
mapping of the data into a high dimensional space from which the global
axes of maximum variation are extracted. The method provides a
transformation via a kernel so that PCA can be done in the input space
instead of the transformed space.
[ WP:Kernel principal component analysis]

key frames: Primarily a computer graphics animation technique, where key
frames in a sequence are drawn by more experienced animators and then
intermediate interpolating frames are drawn by less experienced
animators. In computer vision motion sequence analysis, key frames are
the analogous video frames, typically displaying motion discontinuities
between which the scene motion can be smoothly interpolated.
[ WP:Key frame]

KHOROS: An image processing development environment with a large set of
operators. The system comes with a pull-down interactive development
workspace where operators can be instantiated and connected by click and
drag operations.

kinetic depth: A technique for estimating the depth at image feature
points (usually edges) by exploiting a

controlled sensor motion. This technique generally does not work at all
points of the image because of insufficient image structure or sensor
precision in smoothly varying regions, such as walls. See also
shape from motion. A typical motion case is for the camera to rotate on a
circular trajectory while fixating on a point in front of the camera, as
seen here [ WP:Depth perception#Monocular cues]:

[Figure: camera moving on a circular trajectory around a fixation point;
surviving labels: FIXATION, SWEPT.]

Kirsch compass edge detector: A first derivative edge detector that
computes the gradient in different directions according to which
calculation mask is used. Edges have high gradient values, so
thresholding the intensity gradient magnitude is one approach to
edge detection. A Kirsch mask that detects edges at 45 degrees is
[ SEU:2.3.4]:

\[ \begin{pmatrix} -3 & 5 & 5 \\ -3 & 0 & 5 \\ -3 & -3 & -3 \end{pmatrix} \]

knowledge-based vision: A style of image interpretation that relies on
multiple processing components capable of different image analysis
processes, some of which may solve the same task in different ways.
Linking the components together is a reasoning algorithm that knows about
the capabilities of the different components and when they might be
usable or might fail. An additional common component is some form of task
dependent knowledge encoded in a knowledge representation that is used to
help guide the reasoning algorithm. Also common is some uncertainty
mechanism that records the confidence that the system has about the
outcomes of its processing. For example, a knowledge-based vision system
might be used for aerial analysis of road networks, containing
specialized detection modules for straight roads, road junctions and
forest roads, as well as survey maps, terrain type classifiers, curve
linking, etc. [ RN:10.2]

knowledge representation: A general term for methods of computer encoding
knowledge. In computer vision systems, this is usually knowledge about
recognizable objects and visual processing methods. A common knowledge
representation scheme is the geometric model that records the 2D or 3D
shape of objects. Other commonly used vision knowledge representation
schemes are graph models and frames. [ BT:9]

Koenderink's surface shape classification: An alternative to the more
common mean curvature and Gaussian curvature 3D surface shape
classification labels. Koenderink's scheme decouples the two intrinsic
shape parameters into one parameter (S) that represents the local surface
shape (including cylindrical, hyperbolic, spherical and planar) and a
second parameter (C) that encodes the magnitude of the curvedness of the
shape. The shape classes represented in Koenderink's classification
scheme are illustrated:
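The "compass" idea in the Kirsch compass edge detector entry can be sketched by rotating one mask through the eight compass directions and keeping the strongest response. The sketch assumes the usual Kirsch coefficients of 5 and -3 (so every mask sums to zero) and an image stored as a list of rows; `rotate45`, `compass_masks` and `kirsch_response` are names invented for the example.

```python
# 45-degree mask, assuming the usual Kirsch coefficients 5 and -3.
BASE = [[-3,  5,  5],
        [-3,  0,  5],
        [-3, -3, -3]]

# Border positions of a 3x3 mask in clockwise order from the top-left.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def rotate45(mask):
    """Shift the eight border coefficients one place clockwise."""
    out = [row[:] for row in mask]
    for k, (r, c) in enumerate(RING):
        prev_r, prev_c = RING[k - 1]
        out[r][c] = mask[prev_r][prev_c]
    return out

def compass_masks(base=BASE):
    """The base mask followed by its seven successive 45-degree rotations."""
    masks = [base]
    for _ in range(7):
        masks.append(rotate45(masks[-1]))
    return masks

def kirsch_response(image, r, c):
    """Largest of the eight mask responses at interior pixel (r, c)."""
    return max(
        sum(mask[i][j] * image[r - 1 + i][c - 1 + j]
            for i in range(3) for j in range(3))
        for mask in compass_masks())

step = [[0, 0, 10, 10]] * 3      # vertical step edge
flat = [[7, 7, 7]] * 3           # constant region

edge_strength = kirsch_response(step, 1, 1)
flat_strength = kirsch_response(flat, 1, 1)
```

Thresholding the returned value gives the edge decision described in the entry; the index of the winning mask, not kept here, would give the edge direction.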

[Figure: Koenderink's shape classes for S: -1, -1/2, 0, +1/2, +1.]

Kohonen network: A multi-variate data clustering and analysis method that
produces a topological organization of the input data. The response of
the whole network to a given data vector can be used as a lower
dimensional signature of the data vector.
[ WP:Counterpropagation network]

KTC noise: A type of noise associated with Field Effect Transistor (FET)
image sensors. The "KTC" term is used because the noise is proportional
to $kTC$ where $T$ is the temperature, $C$ is the capacitance of the image
sensor and $k$ is Boltzmann's constant. This noise arises during image
capture at each pixel independently and is also independent of
integration time.
[ WP:Johnson-Nyquist noise#Thermal noise on capacitors]

Kullback-Leibler distance/divergence: A measure of the relative entropy
or distance between two probability densities $p_1(\vec{x})$ and
$p_2(\vec{x})$, defined as [ CS:6.3.4]

\[ D(p_1 \| p_2) = \int p_1(\vec{x}) \log \frac{p_1(\vec{x})}{p_2(\vec{x})} \, d\vec{x} \]

kurtosis: A measure of the flatness of a distribution of gray scale
values. If $n_g$ is the number of pixels out of $N$ with gray scale value
$g$, then the fourth histogram moment is
$\mu_4 = \frac{1}{N} \sum_g n_g (g - \mu_1)^4$, where $\mu_1$ is the mean
pixel value. The kurtosis is $\mu_4 - 3$. [ AJ:9.2]

Kuwahara: An edge-preserving noise reduction filter. The filter uses four
regions surrounding the pixel being smoothed. The smoothed value for that
pixel is the mean value of the region with smallest variance.
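The selection rule of the Kuwahara entry can be sketched as below. The sketch assumes four square regions that each have the pixel at one corner, a list-of-rows image, and border pixels that are simply copied; the function name and the `radius` parameter are choices made for the example.

```python
from statistics import mean, pvariance

def kuwahara(image, radius=1):
    """Filter interior pixels; pixels closer than radius to the border are copied."""
    rows, cols = len(image), len(image[0])
    out = [list(row) for row in image]
    a = radius
    for r in range(a, rows - a):
        for c in range(a, cols - a):
            best_var, best_mean = None, None
            # Four (a+1) x (a+1) regions, each with the pixel at one corner.
            for dr in (-a, 0):
                for dc in (-a, 0):
                    region = [image[r + dr + i][c + dc + j]
                              for i in range(a + 1) for j in range(a + 1)]
                    v = pvariance(region)
                    if best_var is None or v < best_var:
                        best_var, best_mean = v, mean(region)
            # Output the mean of the region with the smallest variance.
            out[r][c] = best_mean
    return out

# A step edge between 10 and 90 with one noisy pixel (14) on the dark side.
image = [[10, 10, 10, 90, 90],
         [10, 14, 10, 90, 90],
         [10, 10, 10, 90, 90]]
smoothed = kuwahara(image)
```

In the middle row the noisy value is pulled toward its neighbors while the pixel just right of the step keeps the value 90, because the region lying wholly on the bright side has zero variance and wins.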

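For histograms or other discrete distributions, the integral in the Kullback-Leibler distance/divergence entry becomes a sum over bins. The sketch below assumes two lists of probabilities over the same bins, natural logarithms, and that the second distribution is nonzero wherever the first is; bins where the first distribution is zero contribute nothing.

```python
import math

def kl_divergence(p1, p2):
    """D(p1 || p2) for two discrete distributions over the same bins."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p2) if a > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_pp = kl_divergence(p, p)
```

The two directions give different values, which is why the measure is not a distance in the metric sense even though it is zero when the distributions coincide.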
label: A description associated with something for the purposes of
identification. For example see region labeling. [ BB:12.4]

labeling problem: Given a set $S$ of image structures (which may be
pixels as well as more structured objects like edges) and a set of labels
$L$, the labeling problem is the question of how to assign a label
$l \in L$ for each image structure $s \in S$. This process is usually
dependent on both the image data and neighboring labels. A typical remote
sensing application is to label image pixels by their land type, such as
water, snow, sand, wheat field, forest, etc. A range image (below left)
has its pixels labeled by the sign of their mean curvature (white:
negative, light gray: zero, dark gray: positive, black: missing data).
[ BB:12.4]

lacunarity: A scale dependent measure of translational invariance based
on the size distribution of holes within a set. High lacunarity indicates
that the set is heterogeneous and low lacunarity indicates homogeneity.
[ PGS:3.3]

LADAR: LAser Detection And Ranging or Light Amplification for Detection
and Ranging. See laser radar. [ BB:2.3.2]

Lagrange multiplier technique: A method of constrained optimization to

find a solution to a numerical problem that includes one or more
constraints. The classical form of the Lagrange multiplier technique
finds the parameter vector $\vec{v}$ minimizing (or maximizing) the
function $f(\vec{v}) = g(\vec{v}) + \lambda h(\vec{v})$, where $g()$ is
the function being minimized and $h()$ is a constraint function that has
value zero when its argument satisfies the constraint. The Lagrange
multiplier is $\lambda$. [ BKPH:A.5]

Laguerre formula: A formula for computing the directed angle between two
3D lines based on the cross ratio of four points. Two points arise where
the two image lines intersect the ideal line (i.e., the line through the
vanishing points) and the other two points are the ideal line's
absolute points (intersection of the ideal line and the absolute conic).

Lambert's law: The observed shading on ideal diffuse reflectors is
independent of observer position and varies with the angle between the
surface normal and source direction [ JKS:9.1.2]:

Lambertian surface: A surface whose reflectance obeys Lambert's law, more
commonly known as a matte surface. These surfaces have equally bright
appearance from all viewpoints. Thus, the shading of the surface depends
only on the relative direction of the incident illumination. [ FP:4.3.3]

landmark detection: A general term for detecting an image feature that is
commonly used for registration. The registration might be between a model
and the image or it might be between two images, etc. Landmarks might be
task specific, such as components on an electronic circuit card or an
anatomical feature such as the tip of the nose, or might be a more
general image feature such as interest points. [ SB:9.1]

LANDSAT: A series of satellites launched by the United States of America
that are a common source of satellite images of the Earth. LANDSAT 7 for
example was launched in April 1999 and provides complete coverage of the
Earth every 16 days. [ BB:2.3.1]

Laplacian: Loosely, the Laplacian of a function is the sum of its second
order partial derivatives. For example the Laplacian of
$f(x, y, z) : R^3 \mapsto R$ is
$\nabla^2 f(x, y, z) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}$.
In computer vision, the Laplacian operator may be applied to an image, by
convolution with the Laplacian kernel, one definition of which is given
by the sum of second derivative kernels $[1, -2, 1]$ and $[1, -2, 1]^T$,
with zero padding to make the result $3 \times 3$ [ JKS:5.3.1]:

\[ \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]

Laplacian of Gaussian operator: A low-level image operator that applies
the second derivative Laplacian operator ($\nabla^2$) after a
Gaussian smoothing operation everywhere in an image. It is an
isotropic operator. It is often used as part of a zero crossing
edge detection operator because the locations where the value changes
sign (positive to negative or vice versa) of the output image are located
near the edges in the input image, and the detail of the detected edges
can be controlled by use of the scale parameter of the Gaussian
smoothing. An example mask that implements the Laplacian of Gaussian

operator with smoothing parameter $\sigma = 1.4$ is [ JKS:5.4]:

Laplacian pyramid: A compressed image representation in which a pyramid
of Laplacian images is created. At each level of the scheme, the current
gray scale image has the Laplacian applied to it. The next level gray
scale image is formed by Gaussian smoothing and subsampling. At the final
level, the smoothed and subsampled image is kept. The original image can
be approximately reconstructed level by level through expanding and
smoothing the current level image and then adding the Laplacian.
[ FP:9.2.1]

laser: Light Amplification by Stimulated Emission of Radiation. A very
bright light source often used for machine vision applications because of
its properties: most light is at a single spectral frequency, the light
is coherent, so various interference effects can be exploited and the
light beam can be processed so that divergence is slight. Two common
applications are for structured light triangulation and range sensing.
[ EH:14.2]

laser illumination: A very bright light source useful because of its
limited spectrum, bright power and coherence. See also laser. [ EH:14.2]

laser radar: (LADAR) A LIDAR range sensor that uses laser light. See also
laser range sensor. [ BB:2.3.2]

laser range sensor: A laser-based range sensor records the distance from
the sensor to a target or target scene by detecting the image of a laser
spot or stripe projected onto the scene. These sensors are commonly based
on structured light triangulation, time of flight or phase difference
technologies. [ TV:2.5.3]

laser speckle: A time-varying light pattern produced by interference of
the light reflected from a surface illuminated by a laser. [ EH:14.2.2]

laser stripe triangulation: A structured light triangulation system that
uses laser light. For example, a projected plane of light that would
normally result in a straight line in the camera image is distorted by
any objects in the scene where the distortion is proportional to the
height of the object. A typical triangulation geometry is illustrated
here [ JKS:11.4.1]:

[Figure: triangulation geometry; labels PROJECTOR, LASER STRIPE,
SCENE OBJECT, CAMERA/SENSOR.]

lateral inhibition: A process whereby a given feature weakens or
eliminates nearby features. An example of this appears in the
Canny edge detector where locally maximal intensity gradient magnitudes
cause

adjacent gradient values that lie across (as contrasted with along) the
edge to be set to zero. [ RN:6.2]

Laws texture energy measure: A measure of the amount of image intensity
variation at a pixel. The measure is based on 5 one dimensional finite
difference masks convolved orthogonally to give 25 2D masks. The 25 masks
are then convolved with the image. The outputs are smoothed nonlinearly
and then combined to give 14 contrast and rotation invariant measures.
[ PGS:4.6]

least mean square estimation: Also known as least square estimation or
mean square estimation. Let $\vec{v}$ be the parameter vector that we are
searching for and $e_i(\vec{v})$ be the error measure associated with the
$i$th of $N$ data items. The error measure often used is the Euclidean,
algebraic or Mahalanobis distance between the $i$th data item and a curve
or surface being fit, that is parameterized by $\vec{v}$. Then the mean
square error is:

\[ \frac{1}{N} \sum_{i=1}^{N} e_i(\vec{v})^2 \]

The desired parameter vector $\vec{v}$ minimizes this sum.
[ WP:Least squares]

least median of squares estimation: Let $\vec{v}$ be the parameter vector
that we are searching for and $e_i(\vec{v})$ be the error associated with
the $i$th of $N$ data items. The error measure often used is the
Euclidean, algebraic or Mahalanobis distance between the $i$th data item
and a curve or surface being fit that is parameterized by $\vec{v}$. Then
the median square error is the median or middle value of the sorted set
$\{e_i(\vec{v})^2\}$. The desired parameter vector $\vec{v}$ minimizes
this median value. This estimator usually requires more computation for
the iterative and sorting algorithms but can be more robust to outliers
than the least mean square estimator. [ JKS:13.6.3]

least square curve fitting: A least mean square estimation process that
fits a parametric curve model or a line to a collection of data points,
usually 2D or 3D. Fitting often uses the Euclidean, algebraic or
Mahalanobis distance to evaluate the goodness of fit. Here is an example
of least square ellipse fitting [ FP:15.2-15.3]:

least square estimation: See least mean square estimation.

least squares fitting: A general term for a least mean square estimation
process that fits some parametric shape, such as a curve or surface, to a
collection of data. Fitting often uses the Euclidean, algebraic or
Mahalanobis distance to evaluate the goodness of fit. [ BB:A1.9]

least square surface fitting: A least mean square estimation process that
fits a parametric surface model to a collection of data points, usually
range data. Fitting often uses the Euclidean, algebraic or
Mahalanobis distance to evaluate the goodness of fit. The range image
(below left) has planar and cylindrical surfaces fitted to the data
(below right). [ JKS:3.5]
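The contrast drawn between least mean square estimation and least median of squares estimation can be illustrated with straight-line fits to data containing one outlier. The sketch makes several assumptions not in the entries: the error is the vertical distance to the line y = a*x + b, the least squares fit uses the standard closed form, and the median criterion is minimized by exhaustively trying the line through every pair of points, which is practical only for small data sets.

```python
from itertools import combinations
from statistics import median

def least_squares_line(points):
    """Slope and intercept minimizing the mean squared vertical error."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def least_median_line(points):
    """Slope and intercept minimizing the median squared vertical error,
    searched over the lines through all pairs of points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line: not expressible as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        med = median((y - (a * x + b)) ** 2 for x, y in points)
        if best is None or med < best[0]:
            best = (med, a, b)
    return best[1], best[2]

# Seven points on y = 2x + 1 and one gross outlier.
points = [(x, 2 * x + 1) for x in range(7)] + [(7, 40)]

a_ls, b_ls = least_squares_line(points)
a_lmeds, b_lmeds = least_median_line(points)
```

The outlier drags the least squares slope to about 4.08, while the median criterion ignores it and recovers slope 2 and intercept 1. The pair search also shows why the entry describes the median estimator as more expensive.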

leave-one-out test: A method for testing a solution in which one sample
is left out of the training set and used instead for testing. This can be
done for every sample. [ FP:22.1.5]

LED: Light Emitting semiconductor Diode. Often used as detectable
point light source markers or controllable illumination. [ LG:7.1]

left-handed coordinate system: A 3D coordinate system with the XYZ axes
arranged as shown below. The alternative is a
right-handed coordinate system.
[ WP:Cartesian coordinate system#Orientation and handedness]

Lempel-Ziv-Welch (LZW): A form of file compression based on encoding
commonly occurring byte sequences. This form of compression is used in
the common GIF image file format. [ SEU:5.2.3]

lens: A physical optical device for focusing incident light onto an
imaging surface, such as photographic film or an electronic sensor.
Lenses can also be used to change magnification, enhance or modify a
field of view. [ BKPH:2.3]

lens distortion: Unexpected variation in the light field passing through
a lens. Examples are radial lens distortion or chromatic aberration and
usually arise from how the lens differs from the ideal lens. [ JKS:12.9]

lens equation: The simplest case of a convex converging lens with focal
length $f$ perfectly focused on a target at distance $D$ has distance $d$
between the lens and the image plane as related by the equation
$\frac{1}{f} = \frac{1}{D} + \frac{1}{d}$ and illustrated here
[ JKS:8.1]:

[Figure: lens geometry; surviving labels: d, H.]
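The lens equation can be rearranged to give the image-plane distance d for a given focal length f and target distance D. The sketch below assumes all three quantities share one unit and uses a hypothetical function name, `image_distance`.

```python
def image_distance(f, D):
    """Solve 1/f = 1/D + 1/d for d (same units as f and D)."""
    if D == f:
        raise ValueError("target at the focal distance: image at infinity")
    return 1.0 / (1.0 / f - 1.0 / D)

f = 50.0      # focal length, e.g. in mm
D = 1000.0    # target distance in the same unit
d = image_distance(f, D)
```

For a 50 mm lens focused on a target 1 m away, d is about 52.6 mm, slightly more than f; as D grows, d approaches f. A target closer than f gives a negative d under this formula.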

Legendre moment: The Legendre moment of a piecewise continuous function
$f(x, y)$ with order $(m, n)$ is

\[ \frac{(2m + 1)(2n + 1)}{4} \int_{-1}^{+1} \int_{-1}^{+1} P_m(x) P_n(y) f(x, y) \, dx \, dy \]

where $P_m(x)$ is the $m$th order Legendre polynomial. These moments can
be used for characterizing image data and images can be reconstructed
from the infinite set of moments.

lens type: A general term for lens shapes and functions, such as convex
or half-cylindrical, converging, magnifying, etc. [ BKPH:2.3]

level set: The set of data points $\vec{x}$ that satisfy a given equation
of the form: $f(\vec{x}) = c$. Varying the value of $c$ gives different
sets of usually closely related points. A visual analogy is of a
geographic surface and the ocean rising. If the function $f()$ is the
height of the land, then the level sets are the shore lines for different
sea levels $c$. The figure below

shows an intensity image and the pixels at level (brightness) 80.
[ SQ:8.6.1]

Levenberg-Marquardt optimization: A numerical multi-variate optimization
method that switches smoothly between gradient descent when far from a
(local) optimum and a second-order inverse Hessian (quadratic) method
when nearer. [ FP:3.1.2]

license plate recognition: A computer vision application that aims to
identify a vehicle's license plate from image data. Image data is often
acquired from automatic cameras at places where vehicles slow down such
as bridges and toll barriers. [ WP:Automatic number plate recognition]

LIDAR: LIght Detection And Ranging. A range sensor using (usually)
laser light. It can be based on the time of flight of a pulse of laser
light or the phase shift of a waveform. The measurement could be of a
single point or an array of measurements if the light beam is swept
across the scene/object. [ BB:2.3.2]

Lie groups: A group that can be represented as a continuous and
differentiable manifold of a space, such that group operations are also
continuous. An example of a Lie group is the orthogonal group
$SO(3) = \{R \in R^{3 \times 3} : R^\top R = I, \det(R) = 1\}$ of rigid
3D rotations. [ WP:Lie group]

light: A general term for the electromagnetic radiation used in many
computer vision applications. The term could refer to the illumination in
the scene or the irradiance coming from the scene onto the sensor. Most
computer vision applications use light that is visible, infrared or
ultraviolet. [ AJ:3.2]

light source: A general term for the source of illumination in a scene,
whether deliberate or accidental. The light source might be a
point light source or an extended light source. [ FP:5.2]

light source detection: The process of detecting the position of or
direction to the light sources in the scene, even if not observable. The
light sources are usually assumed to be point light sources for this
process.

light source geometry: A general term referring to the shape and
placement of the light sources in a scene.

light source placement: A general term for the positions of the
light sources in a scene. It may also refer to the care that machine
vision applications engineers take when placing the light sources so as
to minimize unwanted lighting effects, such as shadows and
specular reflections, and to enhance the visibility of desired scene
structures, e.g., by back lighting or oblique lighting.

light stripe ranging: See structured light triangulation. [ JKS:11.4.1]

lightfield: A function that encodes the radiance on an empty point in
space as a function of the point's position and

the direction of the illumination. A lightfield allows image based
rendering of new (unoccluded) scene views from arbitrary positions within
the lightfield. [ WP:Light field]

lighting: A general term for the illumination in a scene, whether
deliberate or accidental. [ LG:2.1.1]

lightness: The estimated or perceived reflectance of a surface, when
viewed in monochrome. [ RN:6.1-6.2]

lightpen: A user-interface device that allows people to indicate places
on a computer screen by touching the screen at the desired place with the
pen. The computer can then draw items, select actions, etc. It is
effectively a type of mouse that acts on the display screen instead of on
a mat. [ WP:Light pen]

likelihood ratio: The ratio of probabilities of observing data $D$ with
and without condition $C$: $\frac{P(D|C)}{P(D|\neg C)}$.
[ WP:Likelihood function]

limb extraction: A process of image interpretation that extracts 1) the
arms or legs of people or animals, e.g., for tracking or 2) the barely
visible edge of a curved surface as it curves away from an observer
(derived from an astronomical term). See figure below. See also
occluding contour.

[Figure: curved surface with its LIMB marked.]

line: Usually refers to a straight ideal line that passes through two
points, but may also refer to a general curve marking, e.g., on paper.
[ RJS:APPENDIX 1]

line cotermination: When two lines have endpoints in exactly or nearly
the same location. See examples:

[Figure: examples of COTERMINATIONS.]

line detection operator: A feature detection process that detects lines.
Depending on the specific operator, locally linear line segments may be
detected or straight lines might be globally detected. Note that this
detects lines as contrasted with edges. [ RN:7.3]

line drawing analysis: 1) Analysis of hand-made or CAD drawings to
extract a symbolic description or shape description. For example,
research has investigated extracting 3D building models from CAD
drawings. Another application is the analysis of hand-drawn circuit
sketches to form a circuit description. 2) Analysis of the line junctions
in a polyhedral blocks world scene, in order to understand the 3D
structure of the scene. [ VSN:4]

line fitting: A curve fitting problem where the objective is to estimate
the parameters of a straight line that best interpolates given point
data. [ DH:9.2]

line following: See line grouping. [ DH:7.7]

line grouping: Generally refers to the process of creating a longer curve
by grouping together shorter fragments found by line detection. These
might

be short connecting locally detected

line fragments, or might be longer line matching: The process of making
straight line segments separated by a a correspondence between the lines in
gap. May also refer to the grouping of two sets. One set might be a
line segments on the basis of grouping geometric model such as used in
principles such as parallelism. See also model based recognition or
edge tracking , perceptual organization , model registration or alignment .
Gestalt . Alternatively, the lines may have been
extracted from different images, as
line intersection: Where two or more when doing feature based stereo or
lines intersect at a point. The lines estimating the epipolar geometry
cross or meet at a line junction . See between the two lines.
[ BKPH:15.6]:
line moment: A line moment is
similar to the traditional area moment
but is calculated only at points
(x(s), y(s)) along the object contour.
The pq-th moment is: $\int x(s)^p y(s)^q \, ds$.
The infinite set of line moments uniquely determines the contour.

[Figure: LINE INTERSECTIONS]
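As a rough illustration of the line moment entry, the sketch below approximates the integral of x(s)^p y(s)^q ds for a contour given as a polygon. The polygonal representation, the midpoint rule and the function name `line_moment` are assumptions made for this example, not part of the dictionary definition.

```python
import math

def line_moment(contour, p, q, closed=True):
    """Approximate the (p, q) line moment of a polygonal contour.

    contour: list of (x, y) vertices. Each segment contributes the
    integrand evaluated at its midpoint times the segment length
    (midpoint rule), so the result is only an approximation for
    curved contours sampled as polygons.
    """
    n = len(contour)
    last = n if closed else n - 1
    total = 0.0
    for i in range(last):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        ds = math.hypot(x1 - x0, y1 - y0)            # segment length
        xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0    # segment midpoint
        total += (xm ** p) * (ym ** q) * ds
    return total

# Hypothetical test contour: the unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
perimeter = line_moment(square, 0, 0)   # zeroth moment is the contour length
m10 = line_moment(square, 1, 0)
```

The zeroth moment is simply the contour length, which gives a quick sanity check on any implementation.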
line junction: The point at which two line moment invariant: A set of
or more lines meet. See invariant values computable from the
junction labeling . [ VSN:4.1.1] line moments . These may be invariant
line label: In an ideal polyhedral to translation, scaling and rotation.
blocks world scene, lines arise from line of sight: A straight line from the
only a limited set of physical situations observer or camera into the scene,
such as convex or concave usually to some target. See [ JKS:1.4]:
surface shape discontinuities (
fold edges ), occluding edges where a
fold edge is seen against the LINE OF SIGHT
background (blade edge), crack edges
where two polyhedra have aligned edges
or shadow edges. Line labels identify
the type of line (i.e., one of these
types). Assigning labels is one step in
scene understanding that helps deduce
the 3D structure of the scene. See also line scan camera: A camera that uses
junction label . Here is an example of a solid-state or semiconductor (e.g.,
the usual line labels for convex(+), CMOS) linear array sensor , in which
concave(-) and occluding (>) edges. all of the photosensitive elements are in
[ BKPH:15.6] a single 1D line. Typical line scan
cameras have between 32 and 8192
elements. These sensors are used for a
variety of machine vision applications
such as scanning, flow process control
and position sensing. [ BT:3]
line segmentation: See
line linking: See line grouping . curve segmentation. [ DH:9.2.4]

line spread function: The line spread linear features: A general term for
function describes how an ideal features that are locally or globally
infinitely thin line would be distorted straight, such as lines or straight
after passing through an optical system. edges.
Normally, this can be computed by
integrating the point spread functions linear filter: A filter whose output is a
of an infinite number of points along weighted sum of its inputs, i.e., all
the line. [ EH:11.3.5] terms in the filter are either constants
or variables. If {xi } are the inputs
line thinning: See thinning . (which may be pixel values from a
[ JKS:2.5.11] local neighborhood or pixel values from
the same position in different images of
linear: 1) Having a line-like form. 2) A the same scene, etc.), then the linear
mathematical description for a process filter output would be
$\sum_i a_i x_i + a_0$, for
in which the relationship between some some constants ai . [ FP:7]
input variables ~x and some output
variables ~y is given by ~y = A~x where A linear regression: Estimation of the
is a matrix. [ BKPH:6.1] parameters of a linear relationship
between two random variables X and Y
linear array sensor: A solid-state or given sets of samples ~xi and ~yi . The
semiconductor (e.g., CMOS) sensor in objective is to estimate the matrix A
which all of the photosensitive elements and vector ~a that minimize the residual
are in a single 1D line. Typical linear $r(A, \vec{a}) = \sum_i \| \vec{y}_i - A\vec{x}_i - \vec{a} \|^2$. In this
array sensors have between 32 and 8192 form, the ~xi are assumed to be
elements and are used in line scan noise-free quantities. When both
cameras. variables are subject to error,
orthogonal regression is preferred.
linear discriminant analysis: See
[ WP:Linear regression]
linear discriminant function .
[ SB:11.6] linear transformation: A
mathematical transformation of a set of
linear discriminant function:
values by addition and multiplication
Assume a feature vector ~x based on
by constants. If the set of values is a
observations of some structure.
vector ~x, the general linear
(Assume that the feature vector is
transformation produces another vector
augmented with an extra term with
~y = A~x, where ~y need not have the
value 1.) A linear discriminant function
same dimension as ~x and A is a
is a basic classification process that
constant matrix (i.e., is not a function
determines which of two classes or cases
of ~x). [ SQ:2.2.1]
the structure belongs to based on the
sign of the linear function lip shape analysis: An application of
$l = \vec{a} \cdot \vec{x} = \sum_i a_i x_i$, for a given coefficient computer vision to understanding the
vector ~a. For example, to discriminate position and shape of human lips as
between unit side squares and unit part of face analysis . The goal might
diameter circles based on the area A, be face recognition or
the feature vector is ~x = (A, 1) and the expression understanding .
coefficient vector ~a = (1, -0.89) . If
l > 0, then the structure is a square, lip tracking: An application of
otherwise a circle. [ SB:11.6] computer vision to following the

position and shape of human lips in a the curve is locally uncurved or

video sequence. The goal might be for straight), although the curve has
lip reading, augmentation of deaf sign nonzero local curvature at other
analysis or focusing of resolution during points (e.g., at $\pi/4$ ). See also
image compression . differential geometry .

local: A local property of a Local Feature Focus (LFF) method:

mathematical object is one that is A 2D part identification and
defined in terms only of a small pose estimation algorithm that can
neighborhood of the object, for cope with large amounts of occlusion of
instance, curvature . In image the parts. The algorithm uses a
processing, a local operator operates on mixture of property-based classifiers ,
a small number of nearby pixels at a graph models and geometric models .
time. [ BKPH:4.2] The key identification process is based
around local configurations of
local binary pattern: Given a local image features that is more robust to
neighborhood about a point, use the occlusion . [ R. C. Bolles, and R. A.
value of the central pixel to threshold Cain, Recognizing and locating
the neighborhood. This creates a local partially visible objects, the
descriptor of the gray scale structure local-feature-focus method, Int. J. of
that is invariant to lightness and Robotics Research, 1:57-82, 1982.]
contrast transformations, that can be
used to create local texture primitives . local invariant: See
[ PGS:4.7] local point invariant .

local contrast adjustment: A form local operator: An image processing

of contrast enhancement that adjusts operator that computes its output at
pixel intensities based on the values of each pixel from the values of the nearby
nearby pixels instead of the values of all pixels instead of using all or most of the
pixels in the image. The right image pixels in the image. [ JKS:1.7.2]
has the eye area's brightness (from
original image at the left) enhanced local point invariant: A property of
while maintaining the background's local shape or intensity that is invariant
contrast: to, e.g., translation, rotation, scaling,
contrast or brightness changes, etc. For
example, a surface's
Gaussian curvature is invariant to
change in position.

local surface shape: The shape of a

surface in a small region around a
point, often classified into one of a small
number of surface shape classifications .
local curvature estimation: A part Computed as a function of the
of surface or curve shape estimation surface curvatures .
that estimates the curvature at a given local variance contrast: The
point based on the position of nearby variance of the pixel values computed
parts of the curve or surface. For in a neighborhood about each pixel.
example, the curve y = sin(x) has zero Contrast is the difference between the
local curvature at the point x = 0 (i.e.,

larger and smaller values of this variance. Large values of this property
occur in highly textured or varying areas.

log-polar image: An image representation in which the pixels are not in
the standard Cartesian layout but instead have a space varying layout. In
the log-polar case, the image is parameterized by a polar coordinate
$\theta$ and a radial coordinate $r$. However, unlike polar coordinates ,
the radial distance increases exponentially as $r$ grows. The mapping from
position $(\theta, r)$ to Cartesian coordinates is
$(\beta^r \cos(\theta), \beta^r \sin(\theta))$, where $\beta$ is some
design parameter. Further, the amount of area of the image plane
represented by each pixel grows exponentially with $r$, although the
precise pixel size depends on factors like amount of pixel overlap, etc.
See also foveal image .
[Figure: receptive fields of a log-polar image, seen in the outer rings (courtesy of Herman Gomes)]

calculus. For example, a square can be defined as:
$square(s) \Leftrightarrow polygon(s) \;\&\; number\_of\_sides(s, 4)
\;\&\; \forall e_1 \forall e_2 \, (e_1 \neq e_2 \;\&\; side\_of(s, e_1) \;\&\; side\_of(s, e_2)
\;\&\; length(e_1) = length(e_2)
\;\&\; (parallel(e_1, e_2) \;|\; perpendicular(e_1, e_2)))$.

long baseline stereo: See wide baseline stereo .

long motion sequence: A video sequence of more than just a few frames in
which there is significant camera or scene motion. The essential idea is
that the 3D scene structure can be inferred by effectively a stereo vision
process. Here the matched image features can be tracked through the
sequence, instead of having to solve the stereo correspondence problem . If
a long sequence is not available, then analysis could use optical flow or
short baseline stereo .
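The mapping given in the log-polar image entry can be sketched in a few lines. The function name and the default value of the design parameter (called `beta` here) are hypothetical choices for illustration only; real sensors also need a rule for pixel size and overlap, which this sketch ignores.

```python
import math

def log_polar_to_cartesian(theta, r, beta=1.1):
    """Map a log-polar coordinate (theta, r) to Cartesian (x, y).

    The radial distance is beta ** r, so it grows exponentially with r.
    beta is the design parameter; 1.1 is an arbitrary illustrative value.
    """
    radius = beta ** r
    return (radius * math.cos(theta), radius * math.sin(theta))

# Successive rings are a constant factor (beta) further from the center.
ring_radii = [log_polar_to_cartesian(0.0, r, beta=2.0)[0] for r in range(4)]
```

With `beta=2.0` the ring radii double at each step, which is why the outer receptive fields cover much more of the image plane than the inner ones.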
look-up table: Given a finite set of
input values { xi } and a function on
these values, f (x), a look-up table
records the values
{ (xi , f (xi )) } so that the value of the
function f () can be looked up directly
rather than recomputed each time.
Look-up tables can be easily used for
color remapping or standard functions
of integer pixel values (e.g., the
logarithm of a pixel's value).
[ BKPH:10.14]
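A minimal sketch of the look-up table idea for 8-bit pixel values follows. The names `LOG_LUT` and `apply_lut` are hypothetical, and the table stores log(1 + v) rather than log(v), an assumption made here so that a pixel value of 0 has a defined entry.

```python
import math

# Precompute f(v) = log(1 + v) once for every possible 8-bit pixel value.
LOG_LUT = [math.log(1 + v) for v in range(256)]

def apply_lut(image, lut):
    """Replace each integer pixel value by its precomputed table entry.

    image: list of rows, each a list of integers in range(len(lut)).
    """
    return [[lut[v] for v in row] for row in image]

result = apply_lut([[0, 255], [9, 99]], LOG_LUT)
```

Each pixel then costs one table index instead of one logarithm evaluation, which is the saving the entry describes.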

lossless compression: A category of

log-polar stereo: A form of image compression in which the
stereo vision in which the input images original image can be exactly
come from log-polar sensors instead of reconstructed from the compressed
the standard Cartesian layout. image. This contrasts with
logarithmic transformation: See lossy compression . [ SB:1.3.2]
pixel logarithm operator . [ SB:3.3.1] lossy compression: A category of
logical object representation: An image compression in which the
object representation based on some original image cannot be exactly
logical formalism such as the predicate reconstructed from the compressed

image. The goal is to lose insignificant

image details (e.g., noise ) while
limiting perception of changes to the
image appearance. Lossy algorithms
generally produce greater compression
than lossless compression . [ SB:1.3.2]

low angle illumination: A

machine vision technique, often used
for industrial vision , where a
light source (usually a
point light source ) is placed so that a
ray of light from the source to the low level vision: A general and
inspection point is almost somewhat imprecisely (i.e.,
perpendicular to the surface normal at contentiously) defined term for the
that point. The situation can also arise initial stages of image analysis in a
naturally, e.g., from the sun position at vision system. It can also be used for
dawn or dusk. One consequence of this the initial stages of processing in
low angle is that shallow surface shape biological vision systems. Roughly, low
defects and cracks cast strong shadows level vision refers to the first few stages
that may simplify the inspection of processing applied to
process. See: intensity images . Some authors use this
term only for operations that result in
other images. So, edge detection is
about where most authors would say
that low-level vision ends and
middle-level vision starts. [ BB:1.2]

low pass filter: This term is imported

from 1D signal processing theory into
image processing . The term low is a
shorthand for low frequency, that, in
the context of a single image, means
low spatial frequency , i.e., intensity
patterns that change over many pixels.
Thus a low pass filter applied to an
low frequency: Usually referring to image leaves the low spatial frequency
low spatial frequency in the context of patterns, or large, slowly changing
computer vision . The low-frequency patterns, and removes the high spatial
components of an image are the slowly frequency components (sharp edges ,
changing intensity components of the noise ). Low pass filters are a kind of
image, such as large regions of bright smoothing or noise reduction filter.
and dark pixels. If low temporal Alternatively, filtering is applied to the
frequency is the intended meaning, then changing values of a given pixel over an
low frequency refers to slowly changing image sequence. In this case the pixel
patterns of brightness or darkness at values can be treated as a sampled time
the same pixel in a video sequence. sequence and the original signal
This image shows the low-frequency processing definition of low pass filter
components of an image. is appropriate. Filtering this way
[ WP:Low frequency] removes rapid temporal changes. See

also high pass filter . Here is an image

and a low-pass filtered version luma: The luminance component of
[ LG:6.2]: light. Color can be divided into luma
and chroma . [ FP:6.3.2]

luminance: The measured intensity

from a portion of a scene. [ AJ:3.2]

luminance efficiency: The sensor

specific function $V(\lambda)$ that determines
how the observed light $I(x, y, \lambda)$ at
sensor position $(x, y)$ of wavelength
$\lambda$ contributes to the measured luminance
$l(x, y) = \int I(\lambda) V(\lambda) \, d\lambda$ at that point.
[ AJ:3.2]
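The luminance integral in the luminance efficiency entry can be approximated numerically when the spectrum is sampled at discrete wavelengths. The sketch below uses the trapezoidal rule; the function name and the sample values are hypothetical, and the triangular efficiency curve is not a measured sensor response.

```python
def luminance(wavelengths, intensity, efficiency):
    """Trapezoidal approximation of l = integral of I(lambda) V(lambda) d(lambda).

    wavelengths: increasing sample positions (e.g., in nm).
    intensity, efficiency: I and V sampled at those wavelengths,
    for a single sensor position (x, y).
    """
    total = 0.0
    for k in range(len(wavelengths) - 1):
        f0 = intensity[k] * efficiency[k]
        f1 = intensity[k + 1] * efficiency[k + 1]
        total += 0.5 * (f0 + f1) * (wavelengths[k + 1] - wavelengths[k])
    return total

# Illustrative (made-up) samples: flat spectrum, triangular efficiency.
l_value = luminance([400.0, 500.0, 600.0], [1.0, 1.0, 1.0], [0.0, 1.0, 0.0])
```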
luminous flux: The amount of light at all wavelengths that passes through
a given region in space. Proportional to perceived brightness.
[ WP:Luminous flux]

luminosity coefficient: A component of tristimulus color theory . The
luminosity coefficient is the amount of luminance contributed by a given
primary color to the total perceived luminance. [ AJ:3.8]

Lowe's curve segmentation method: An algorithm that tries to split a curve
into a sequence of straight line segments. The algorithm has three main
stages: 1) a recursive splitting of segments into two shorter, but more
line-like segments, until all remaining segments are very short. This
forms a tree of segments. 2) Merging segments in the tree in a bottom-up
fashion according to a straightness measure. 3) Extracting the remaining
unmerged segments from the tree as the segmentation result.

M-estimation: A robust some of the competences of the human

generalization of vision system. [ JKS:1.1]
least square estimation and
maximum likelihood estimation . macrotexture: The intensity pattern
[ FP:15.5.1] formed by spatially organized texture
primitives on a surface, such as a tiling.
Mach band effect: An effect in the This contrasts with microtexture .
human visual system in which a human [ JKS:7.1]
observer perceives a variation in
brightness at the edges of a region of magnetic resonance imaging
constant brightness. This variation (MRI): See NMR . [ FP:18.6]
makes the region appear slightly darker
magnification: The process of
when it is beside a brighter region and
enlargement (e.g., of an image). The
appear slightly brighter when it is
amount of enlargement applied.
beside a darker region. [ AJ:3.2]
[ AJ:7.4]
machine vision: A general term for
magnitude-retrieval problem: The
processing image data by a computer
reconstruction of a signal based on only
and often synonymous with
the phase (not the magnitude) of the
computer vision . There is a slight
Fourier transform .
tendency to use machine vision for
practical vision systems, such as for Mahalanobis distance: The distance
industrial vision , and computer between two N -dimensional points
vision for more exploratory vision scaled by the statistical variation in
systems or for systems that aim at

each component of the point. For require identifying roads, buildings or

example, if ~x and ~y are two points from land features. This image shows a road
the same distribution that has model (black) overlaying an aerial
covariance matrix C then the image
Mahalanobis distance is given by
$((\vec{x} - \vec{y})^\top C^{-1} (\vec{x} - \vec{y}))^{1/2}$

The Mahalanobis distance is the same

as the Euclidean distance if the
covariance matrix is the identity
matrix. A common usage in
computer vision systems is for
comparing feature vectors whose
elements are quantities having different
ranges and amounts of variation, such
as a 2-vector recording the properties of marching cubes: An algorithm for
area and perimeter. [ SB:11.8] locating surfaces in volumetric datasets.
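For the 2-vector case mentioned in the Mahalanobis distance entry (e.g., area and perimeter), the distance can be sketched with an explicit 2x2 matrix inverse. The function name and the example covariance values are hypothetical.

```python
import math

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2D points x and y.

    cov: 2x2 covariance matrix as ((a, b), (c, d)).
    Computes sqrt((x - y)^T C^{-1} (x - y)) using the closed-form
    inverse of a 2x2 matrix.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - y[0], x[1] - y[1]
    # C^{-1} = (1 / det) * [[d, -b], [-c, a]]
    q = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(q)

identity = ((1.0, 0.0), (0.0, 1.0))
d_euclid = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), identity)
# A component with larger variance counts for less in the distance.
d_scaled = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
```

With the identity covariance the result equals the Euclidean distance, as the entry states.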
Given a function f () on the voxels , the
mammogram analysis: A algorithm estimates the position of the
mammogram is an X-ray of the human surface f (~x) = c for some c. This
female breast. The main purpose of requires estimating where the surface
analysis is the detection of potential intersects each of the twelve edges of a
signs of cancerous growths. voxel. Many implementations
propagate from one voxel to its
Manhattan distance: Also called the
neighbors, hence the marching term.
Manhattan metric. Motivated by the
[ W. Lorensen, and H. Cline, Marching
problem of only being able to walk
Cubes: a high resolution 3D surface
along city blocks in dense urban
construction algorithm, Computer
environments, the distance between
Graphics, Vol. 21, pp 163-169, 1987.]
points $(x_1, y_1)$ and $(x_2, y_2)$ is
$|x_1 - x_2| + |y_1 - y_2|$. [ BB:2.2.6] marginal distribution: A probability
distribution of a random variable X
many view stereo: See
derived from the joint probability
multi-view stereo . [ FP:11.4]
distribution of a number of random
MAP: See variables integrated over all variables
maximum a posteriori probability . except X. [ WP:Marginal distribution]
[ AJ:8.15]

map analysis: Analyzing an image of Markov Chain Monte Carlo:

a map (e.g., obtained with a flat-bed Markov Chain Monte Carlo (MCMC) is
scanner) in order to extract a symbolic a statistical inference method useful for
description of the terrain described by estimating the parameters of complex
the map. This is now a largely obsolete distributions. The method generates
process given digital map databases. samples from the distribution by
[ WP:Map analysis] running the Markov Chain that models
the problem for a long time (hopefully
map registration: The registration to equilibrium) and then uses the
of a symbolic map to (usually) aerial ensemble of samples to estimate the
or satellite image data. This may distribution. The states of the Markov

Chain are the possible configurations of type style, or a particular face viewed
the problem. at the right scale. It is similar to
[ WP:Markov chain Monte Carlo] template matching except the matched
filter can be tuned for spatially
Markov random field (MRF): An separated patterns. This is a
image model in which the value at a signal processing term imported into
pixel can be expressed as a linear image processing . [ AJ:9.12]
weighted sum of the values of pixels in
a finite neighborhood about the matching function: See
original pixel plus an additive random similarity metric . [ DH:6.7]
noise value. [ JKS:7.4]
matching method: A general term
Marr's theory: A shortened term for for finding the correspondences between
Marr's theory of the human vision two structures (e.g., surface matching )
system. Some of the key stages in this or sets of features (e.g.,
integrated but incomplete theory are stereo correspondence ). [ JKS:15.5.2]
the raw primal sketch ,
full primal sketch , 2.5D sketch and 3D mathematical morphology
object recognition . [ BT:11] operation: A class of mathematically
defined image processing operations in
Marr-Hildreth edge detector: An which the result is based on the spatial
edge detector based on multi-scale pattern of the input data values rather
analysis of the zero-crossings of the than values themselves. For example, a
Laplacian of Gaussian operator . morphological line thinning algorithm
[ NA:4.3.3] would identify places in an image where
a line description was represented by
mask: A term for an $m \times n$ array of data more than 1 pixel wide (i.e., the
numbers or symbolic labels. A mask pattern to match). As this is
can be the smoothing mask used in a redundant, the thinning algorithm
convolution , the target in a would choose one of the redundant pixels
template matching or the kernel used to be set to 0. Mathematical
in a mathematical morphology morphology operations can apply to
operation, etc. Here is a simple mask both binary and gray scale images .
for computing an approximation to the This figure shows a small image patch
Laplacian operator [ TV:3.2]: image before and after a thinning
operation [ SQ:7]

matched filter: A matched filter is an

operator that produces a strong result matrix: A mathematical structure of a
in the output image when it processes a given number of rows and columns with
portion of the input image containing a each entry usually containing a number.
pattern for which it is matched. For A matrix can be used to represent a
example, the filter could be tuned for transformation between two coordinate
the letter e in a given font size and systems, record the covariance of a set

of vectors, etc. A matrix for rotating a 2D vector by $\pi/6$ radians is
[ AJ:2.7]:

$$\begin{pmatrix} \cos(\pi/6) & -\sin(\pi/6) \\ \sin(\pi/6) & \cos(\pi/6) \end{pmatrix}
= \begin{pmatrix} 0.866 & -0.500 \\ 0.500 & 0.866 \end{pmatrix}$$

parameters, position or identity respectively that have highest
probability given the observed image data. [ AJ:8.15]

maximum entropy: A method for extracting the maximum amount of
information ( entropy ) from a measurement (such as an image) in the
presence of noise. This method will
matrix array camera: A 2D always give a conservative result; only
solid state imaging sensor , such as presenting structure where there is
those found in typical current video, evidence for it. [ AJ:6.2]
webcam and machine vision cameras.
[ LG:2.1.3] maximum entropy restoration: An
image restoration technique based on
matte surface: A surface whose maximum entropy . [ AJ:8.14]
reflectance follows the Lambertian
model. [ BB:3.5.1] maximum likelihood estimation:
Estimating the parameters of a problem
maximal clique: A clique (all nodes that has the highest likelihood or
are connected to all other nodes in the probability, i.e., given the observed
clique) where no further nodes exist data. For example, the maximum
that are connected to all nodes in the likelihood estimate of the mean of a
clique. Maximal cliques may have Gaussian distribution is the average of
different sizes; the issue is maximality, the observed samples drawn from that
not size. Maximal cliques are used in distribution. [ AJ:8.15]
association graph matching algorithms
to represent maximally matched MCMC: See
structures. The graph below has two Markov Chain Monte Carlo .
maximal cliques: BCDE and ABD. [ WP:Markov chain Monte Carlo]
[ BB:11.3.3]
MDL: See
minimum description length.
A [ FP:16.3.4]

mean and Gaussian curvature

shape classification: A classification
of a local (i.e., very small) surface
patch (often at single pixels from a
range image ) into one of a set of simple
surface shape classes based on the signs
of the mean and Gaussian curvatures.
The standard set of shape classes is:
maximum a posteriori probability: {plane, concave cylinder, convex
The highest probability after some cylinder, concave ellipsoid, convex
event or observations. This term is ellipsoid, saddle valley, saddle ridge,
often used in the context of minimal}. Sometimes the classes
parameter estimation , pose estimation {saddle valley, saddle ridge, minimal}
or object recognition problems, in are conflated into the single class
which case we wish to estimate the

hyperbolic. This table summarizes the classifications based on the
curvature signs:

[Table: shape class for each combination of mean and Gaussian curvature sign (-, 0, +); one combination is marked IMPOSSIBLE]

measurement resolution: The degree to which two differing quantities can
be distinguished by measurement. This may be the minimum spatial distance
that two adjacent pixels represent ( spatial resolution ) or the minimum
time difference between visual observations ( temporal resolution ), etc.
[ WP:Resolution#Measurement resolution]
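The sign-based classification described in the mean and Gaussian curvature shape classification entry can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name, the zero tolerance `eps`, and in particular the convention that negative mean curvature means a convex shape are choices made here (the convention depends on the direction chosen for the surface normal). The only sign combination excluded is zero mean curvature with positive Gaussian curvature, which cannot occur because the squared mean curvature is never smaller than the Gaussian curvature.

```python
def classify_surface(H, K, eps=1e-6):
    """Classify a local surface patch from mean (H) and Gaussian (K) curvature.

    Assumption: H < 0 is treated as convex and H > 0 as concave; flip the
    labels if the opposite surface-normal convention is in use.
    Values with magnitude below eps are treated as zero.
    """
    def sign(v):
        if abs(v) < eps:
            return 0
        return 1 if v > 0 else -1

    h, k = sign(H), sign(K)
    if h == 0 and k > 0:
        raise ValueError("H = 0 with K > 0 is impossible")
    table = {
        (0, 0): "plane",
        (-1, 0): "convex cylinder",
        (1, 0): "concave cylinder",
        (-1, 1): "convex ellipsoid",
        (1, 1): "concave ellipsoid",
        (-1, -1): "saddle ridge",
        (0, -1): "minimal",
        (1, -1): "saddle valley",
    }
    return table[(h, k)]
```

For noisy range data the tolerance `eps` matters a great deal, since curvature estimates near zero would otherwise flip class from pixel to pixel.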
mean curvature: A mathematical
characterization for a component of
medial axis skeletonization: See
local surface shape at a point on a
medial axis transform . [ BB:8.3.4]
smooth surface. Each point can be
uniquely described by a pair of medial axis transform: An operation
principal curvatures . The mean on a binary image that transforms
curvature is the average of the principal regions into sets of pixels that are the
curvatures. [ JKS:13.3.2] centers of circles that are bitangent to
the boundary and that fit entirely
mean filter: See
within the region. The value of each
mean smoothing operator . [ JKS:4.3]
point on the axis is the radius of the
mean shift: An adaptive gradient bitangent circle. This can be used to
ascent technique that operates by represent the region by a simpler
iteratively moving the center of a search axis-like structure and is most effective
window to the average of certain points on elongated regions. A region and its
within the window. [ WP:Mean-shift] medial axis are below. [ BB:8.3.4]

mean smoothing operator: A

noise reduction operator that can be
applied to a gray scale image or to
separate components of a
multi-spectral image . The output value
at each pixel is the average of the
values of all pixels in a neighborhood
of the input pixel. The size of the
neighborhood determines how much
smoothing (or noise reduction) is done, medial line: A curve going through
but also how much blurring of fine the middle of an elongated structure.
detail occurs. An image with See also medial axis transform . This
Gaussian noise with $\sigma = 13$ and its figure shows a region and its medial
mean smoothing are [ JKS:4.3]: line. [ BB:8.3.4]
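The averaging described in the mean smoothing operator entry can be sketched for a gray scale image stored as a list of rows. The function name and the border rule (averaging only over the neighbors that fall inside the image) are assumptions made for this example; other border treatments are equally common.

```python
def mean_smooth(image, radius=1):
    """Mean smoothing with a (2 * radius + 1) square neighborhood.

    image: list of rows of numbers. Near the border the neighborhood is
    clipped to the image, so fewer pixels are averaged there.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                    total += image[rr][cc]
                    count += 1
            out[r][c] = total / count
    return out

# A single bright pixel is spread over its neighborhood.
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = mean_smooth(spike)
```

A larger `radius` averages over more pixels, which reduces noise further but also blurs fine detail more.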

medical image registration: A

general term for registration of two or
more medical image types or an atlas
with some image data. A typical
registration would align X-ray CAT
and NMR images.
[ WP:Image registration#Applications]

membrane model: A surface fitting

model that minimizes a combination of
the smoothness of the fit surface and
the closeness of the fit surface to the
original data. The surface class must
have $C^0$ continuity and thus it differs from the smoother thin plate
model that has $C^1$ continuity.

medial surface: The medial surface of a volume is the 3D generalization of
the medial axis of a planar region. It is the locus of centers of spheres
that touch the surface of the volume at three or more points. [ BB:8.3.4]

mesh model: A tessellation of an image or surface into polygonal
median filter: See median smoothing. patches, much used in
[ JKS:4.4] computer aided design (CAD) . The
vertices of the mesh are called nodes, or
median flow filtering: A nodal points. A popular class of meshes
noise reduction operation on vector is based on triangles, for instance the
data that generalizes the median filter Delaunay triangulation . Meshes can be
on image data. The assumption is that uniform, i.e., all polygons are the same,
the vectors in a spatial neighborhood or non-uniform. Uniform meshes can be
about the current vector should be represented by small sets of parameters.
similar. Dissimilar vectors are rejected. Surface meshes have been used for
The term flow arose through the modeling free-form surfaces (e.g., faces,
filter's development in the context of landscapes). See also surface fitting .
image motion. This icosahedron is a mesh model of a
nearly spherical object [ JKS:13.5]:
median smoothing: An image
noise reduction operator that replaces a
pixel's value by the median (middle) of
the sorted pixel values in its
neighborhood . An image with
salt-and-pepper noise and the result of
applying median smoothing are
[ JKS:4.4]:
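A minimal sketch of the median smoothing entry, for a gray scale image stored as a list of rows. The function name is hypothetical, and two details are assumptions of this example: the neighborhood is clipped at the image border, and for an even number of neighbors the upper of the two middle values is used.

```python
def median_smooth(image, radius=1):
    """Replace each pixel by the median of its square neighborhood."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            values = sorted(
                image[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            )
            out[r][c] = values[len(values) // 2]   # middle of the sorted values
    return out

# One "salt" pixel in a flat region is removed completely.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
cleaned = median_smooth(noisy)
```

Unlike averaging, the outlier value 255 does not leak into the neighboring pixels, which is why this operator suits salt-and-pepper noise.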

mesh subdivision: Methods for

subdividing cells in a mesh model into
progressively smaller cells. For example
see Delaunay triangulation .
[ WP:Mesh subdivision]

metameric colors: Colors that are extrinsic camera parameters to enable

defined by a limited number of channels metric reconstruction of a scene.
each of which integrates a range of the
spectrum. Hence the same metameric Mexican hat operator: A
color can be caused by a variety of convolution operator that implements
spectral distributions. [ BKPH:2.5.1] either a Laplacian of Gaussian or
difference of Gaussians operator (which
metric determinant: The metric produce very similar results). The mask
determinant is a measure of curvature. that can be used to implement this
For surfaces, it is the square root of the convolution has a shape similar to a
determinant of the Mexican hat (sombrero), as seen here
first fundamental form matrix of the [ JKS:5.4]:
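The mask shape mentioned in the Mexican hat operator entry can be generated from the closed form of the Laplacian of Gaussian. In this sketch the function name, the square mask size and the sign choice are assumptions: the values are the negated Laplacian of a 2D Gaussian, so that the center is positive and the mask has the "hat" orientation.

```python
import math

def mexican_hat_kernel(size, sigma):
    """Return a size x size mask sampled from the negated Laplacian of Gaussian.

    size should be odd so that the mask has a central element.
    value(r) = (1 / (pi * sigma^4)) * (1 - r^2 / (2 sigma^2)) * exp(-r^2 / (2 sigma^2))
    """
    half = size // 2
    s2 = sigma * sigma
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            value = (1.0 / (math.pi * s2 * s2)) * (1.0 - r2 / (2.0 * s2)) \
                * math.exp(-r2 / (2.0 * s2))
            row.append(value)
        kernel.append(row)
    return kernel

hat = mexican_hat_kernel(9, 1.0)
```

The mask is positive near the center and negative in the surrounding ring, changing sign where r^2 = 2 sigma^2.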
metric property: A visual property
that is a measurable quantity, such as a
distance or area. This contrasts with
logical properties such as
image connectedness . [ HZ:1.7]

metric reconstruction:
Reconstruction of the 3D structure of a
scene with correct spatial dimensions
and angles. This contrasts with
projective reconstruction . Two views of
a metrical and projective reconstruction
of a cube are below. The metrical
projection looks correct from all
views, but the perspective projection micron: One millionth of a meter; a
may look correct only from the views micrometer. [ EH:2.2]
where the data was acquired.
[ WP:Camera auto- microscope: An optical device
calibration#Problem statement] for observing small structures such as
organic cells, plant fibers or integrated
circuits. [ EH:5.7.5]

microtexture: See statistical texture .

[ RN:8.3.1]

OBSERVED VIEW mid-sagittal plane: The plane that


METRICAL RECONSTRUCTION separates the body (and brain) into left


and right halves. In medical imaging

metric stratum: These are the set of (e.g., NMR ), it usually refers to a view
similarity transformations (i.e., rigid of the brain sliced down the middle
transformations with a scaling). This is between the two hemispheres.
what can be recovered from image data [ WP:Sagittal plane#Variations]
without external information such as
some known length. middle level vision: A general term
referring to the stages of visual data
metrical calibration: Calibration of processing between low level and
intrinsic and high level vision. There are many

variations of the definition of this term usually requires several components: 1)

but a usable rule of thumb is that the models observed (e.g., whether
middle level vision starts with lines or circular arcs), 2) the parameters
descriptions of the contents of an image of the models (e.g., the line endpoints),
and results in descriptions of the 3) how the image data varies from the
features of the scene. Thus, models (e.g., explicit deviations or
binocular stereo would be a middle noise model parameters) and 4) the
level vision process because it acts on remainder of the image that is not
image edge fragments to produce 3D explained by the models. [ FP:16.3.4]
scene fragments.
MIMD: See multiple instruction multiple data . [ RJS:8]

minimum distance classifier: Given an unknown sample with feature vector
$\vec{x}$, select the class $c$ with model vector $\vec{m}_c$ for which
the distance $\| \vec{x} - \vec{m}_c \|$ is smallest. [ SB:11.8]
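The rule in the minimum distance classifier entry amounts to a single minimization over the class models. In this sketch the function name, the class labels and the model vectors (area and perimeter of a unit square and a unit-diameter circle) are hypothetical, and the Euclidean norm is assumed for the distance.

```python
import math

def minimum_distance_classify(x, models):
    """Return the class label whose model vector is nearest to x.

    models: dict mapping class label -> model vector (same length as x).
    Uses the Euclidean distance between x and each model vector.
    """
    return min(models, key=lambda label: math.dist(x, models[label]))

# Illustrative models: (area, perimeter) for two shape classes.
models = {"square": (1.0, 4.0), "circle": (0.785, 3.142)}
label = minimum_distance_classify((0.8, 3.2), models)
```

When the features have very different ranges or variances, a scaled distance such as the Mahalanobis distance is usually a better choice than the plain Euclidean norm.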
minimal point: A point on a
hyperbolic surface where the two minimum spanning tree: See
principal curvatures are equal in minimal spanning tree . [ DH:]
magnitude but opposite in sign, i.e.,
$\kappa_1 = -\kappa_2$ . [ WP:Maxima and minima] MIPS: millions of instructions per
second. [ WP:Instructions per second]

minimal spanning tree: Consider a graph G and a subset T of the arcs in G such that all nodes in G are still connected in T and there is exactly one path joining any two nodes. T is a spanning tree. If each arc has a weight (possibly constant), the minimal spanning tree is the tree T with smallest total weight. This is a graph and its minimal spanning tree [ DH:]:

mirror: A specularly reflecting surface for which incident light is reflected only at the same angle and in the same plane as the surface normal. [ EH:5.4]

miss-one-out test: See leave-one-out test. [ FP:22.1.5]

missing data: Data that is unavailable, hence requiring it to be estimated. For example a moving person may become occluded resulting in missing position data for a number of frames. [ FP:16.6.1]
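The minimal spanning tree entry above can be made concrete with a small sketch. The dictionary does not name an algorithm; Kruskal's algorithm is used here as one common choice, and the arc format `(weight, u, v)` and the example graph are assumptions.

```python
def minimal_spanning_tree(num_nodes, arcs):
    """Kruskal's algorithm. arcs is a list of (weight, u, v) tuples."""
    parent = list(range(num_nodes))

    def find(node):
        # Follow parent links to the representative of the node's component.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, u, v in sorted(arcs):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The arc joins two components, so it cannot create a second path.
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical weighted graph with 4 nodes.
arcs = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 1, 3), (3, 2, 3)]
tree = minimal_spanning_tree(4, arcs)
total_weight = sum(weight for weight, _, _ in tree)
```

For a connected graph with n nodes the result has n - 1 arcs, which matches the "exactly one path joining any two nodes" condition.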

missing pixel: A pixel for which no

value is available (e.g., if there was a
minimum bounding rectangle: The problem with a sensing element in the
rectangle of smallest area that image sensor). [ FP:16.6.1]
surrounds a set of image data.
[ WP:Minimum bounding rectangle] mixed pixel: A pixel whose
measurement arises from more than one
minimum description length scene phenomena. For example, a pixel
(MDL): A criterion for comparing that observes the edge between two
descriptions usually based on the regions. This pixel has a gray level
implicit assumption that the best that lies between the different gray
description is the one that is shortest levels of the two regions.
(i.e., takes the fewest number of bits to
encode). The minimum description

mixed reality: Image data that

contains both original image data and
overlaid computer graphics. See also
augmented reality . This image shows
an example of mixed reality, where the
butterfly is a graphical object added to
the image of the small robot:
[ WP:Mixed reality]

model: An abstract representation of

some object or class of objects.
[ WP:Model]

model acquisition: The process of

learning a model, usually based on
observed instances or examples of the
structure being modeled. This may be
simply learning the parameters of a
distribution from examples. For
example, one might learn the image
texture properties that distinguish
tumorous cells from normal cells.
Alternatively, the structure of the
object might be learned as well, such as
constructing a model of a building from
a video sequence. Another type of model acquisition is learning the properties of an object, such as what properties and relations define a square as compared to other geometric shapes. [ FP:21.3]

mixture model: A probabilistic representation in which more than one distribution is combined, modeling a situation where the data may arise from different sources or have different behaviors, each with different probability distributions. [ FP:16.6.1]

MLE: See maximum likelihood estimation. [ AJ:8.15]

modal deformable model: A deformable model based on modal analysis (i.e., study of the different shapes that an object can assume).

mode filter: A noise reduction filter that, for each pixel, outputs the mode (most common) value in its local neighborhood. The figure below shows a raw image with salt-and-pepper noise and the filtered version at the right. [ NA:3.5.3]

model base: A database of models usually used as part of an identification process. [ JKS:15.1]

model base indexing: Selecting one or more candidate models from a model database of structures known by the system. This is usually to eliminate exhaustive testing with every member of the model base. [ FP:16.3]

model based coding: A method of encoding the contents of an image (or video sequence) using a pre-defined or learned set of models. This could be for producing a more compact description of the image data (see model based compression) or for

producing a symbolic description. For example, a Mondrian style image could be encoded by the positions, sizes and colors of the colored rectangular regions.

model based compression: An application of model based coding for the purpose of reducing the amount of memory required to describe the image while still allowing reconstruction of the original image. [ SEU:5.3.6]

model based feature detection: Using a parametric model of a feature to locate instances of the feature in an image. For example, a parametric edge detector uses a parameterized model of a step edge that encodes edge direction and edge magnitude.

model based recognition: Identification of the structures in an image by using some internally represented model of the objects known to the computer system. The models are usually geometric models. The recognition process finds image features that match the model features with the right shape and position. The advantage of model based recognition is that the model encodes the object shape thus allowing predictions of image data and less chance of coincidental features being falsely recognized. [ TV:10.1]

model based segmentation: An image segmentation process that uses geometric models to partition the image into different regions. For example, aerial images could have the visible roads segmented by using a geographic information system model of the road network. [ FP:14]

model based tracking: An image tracking process that uses models to locate the position of moving targets in an image sequence. For example, the estimated position, orientation and velocity of a modeled vehicle in one image allows a strong prediction of its location in the next image in the sequence. [ FP:17]

model based vision: A general term for using models of the objects expected to be seen in the image data to help with the image analysis. The model allows, among other things, prediction of additional model feature positions, verification that a set of features could be part of the model and understanding of the appearance of the model in the image data. [ FP:18]

model building: See also model acquisition. The process of constructing a geometric model usually based on observed instances or examples of the structure being modeled, such as from a video sequence. [ FP:21.3]

model fitting: See model registration. [ RN:3.3]

model invocation: See model base indexing. [ FP:16.3]

model reconstruction: See model acquisition. [ FP:21.3]

model registration: A general term for aligning a geometric model to a set of image data. The process may require estimating the rotation, translation and scale that maps a model onto the image data. There may also be shape parameters, such as model length, that need to be estimated. The fitting may need to account for perspective distortion. This figure shows a 2D model registered on an intensity image of the same part. [ RN:3.3]

model selection: See model base indexing. [ FP:16.3]

modulation transfer function (MTF): Informally, the MTF is a measure of how well spatially varying patterns are observed by an optical system. More formally, in a 2D image, let $X(f_h, f_v)$ and $Y(f_h, f_v)$ be the Fourier transforms of the input $x(h, v)$ and output $y(h, v)$ images. Then, the MTF of a horizontal and vertical spatial frequency pair $(f_h, f_v)$ is $|H(f_h, f_v)| / |H(0, 0)|$, where $H(f_h, f_v) = Y(f_h, f_v)/X(f_h, f_v)$. This is also the magnitude of the optical transfer function. [ AJ:2.6]

Moire fringe: An interference pattern that is observed when spatially sampling, at a given spatial frequency, a signal that has a slightly different spatial frequency. The result is a set of light and dark bands in the observed image. As well as causing image degradation, this effect can also be used in range sensors, where the fringe positions give an indication of surface depth. An example of typical observed fringe patterns is [ AJ:4.4]:

Moire interferometry: A technique for contouring surfaces that works by projecting a fringe pattern (e.g., of straight lines) and observing this pattern through another grating. This effect can be achieved in other ways as well. The technique is useful for measuring extremely small stress and distortion movements. [ WP:Moire pattern#Interferometric approach]

Moire pattern: See moire fringe. [ AJ:4.4]

Moire topography: A method for measuring the local shape of a surface by analyzing the spacing of moire fringes on the target surface.

moment: A method for summarizing the distribution of pixel positions or values. Moments are a parameterized family of values. For example, if $I(x, y)$ is a binary image then $\sum_{x,y} I(x, y) x^p y^q$ computes its $pq$th moment $m_{pq}$. (See also gray level moments and moments of intensity.) [ AJ:9.8]

moment characteristic: See moment invariant. [ AJ:9.8]

moment invariant: A function of image moment values that keeps the same value even if the image is

transformed in some manner. For example, the value $\frac{1}{A^2}\left((\mu_{20})^2 + (\mu_{02})^2\right)$ is invariant where $\mu_{pq}$ are central moments of a binary image region and $A$ is the area of the region. This value is a constant even if the image data is translated, rotated or scaled. [ AJ:9.8]

moments of intensity: An image moment value that takes account of the gray scales of the image pixels as well as their positions. For example, if $G(x, y)$ is a gray scale image, then $\sum_{x,y} G(x, y) x^p y^q$ computes its $pq$th moment of intensity $g_{pq}$. See also gray level moment.

another. For example, motion parallax or occlusion relationships give evidence of relative depths. [ WP:Depth perception#Monocular cues]

monocular visual space: The visual space behind the lens in an optical system. This space is commonly assumed to be without structure but scene depth can be recovered from the defocus blurring that occurs in this space.

monotonicity: A sequence of values or function that is either continuously increasing (monotone increasing) or continuously decreasing (monotone decreasing). [ WP:Monotonic function]

Mondrian: A famous visual artist from the Netherlands, whose later paintings were composed of adjacent
rectangular blocks of constant (i.e., Moravec interest point operator:
without shading ) color . This style of An operator that locates interest points
image has been used for much color at pixels where neighboring intensity
vision research and, in particular, values change greatly in at least one
color constancy because of its direction. These points can be used for
simplified image structure without stereo matching or feature point
shading, specularities , shadows or tracking. The operator computes the
light sources . [ BKPH:9.2] sum of the squares of pixel differences
in a line vertically, horizontally and
monochrome: Containing only both diagonal directions in a 5 × 5
different shades of a single color . This window about the given pixel. The
color is usually different shades of gray, minimum of these four values is
going from pure black to pure white. selected and then all values that are not
[ WP:Monochrome] local maxima or are below a given
threshold are suppressed. This image
monocular: Using a single camera, shows the interest points found by the
sensor or eye. This contrasts with Moravec operator as white dots on the
binocular and multi-ocular stereo original image. [ JKS:14.3]
where more than one sensor is used.
Sometimes there is also the implication
that the image data is acquired from
only a single viewpoint as a single
camera taking images over time is
mathematically equivalent to multiple
cameras. [ BB:2.2.2]
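The Moravec interest point operator described above can be sketched as follows. This is one reading of the entry, not a reference implementation: differences are taken between neighboring pixels along the 5-pixel line through the window center in each of the four directions, and non-maximum suppression uses a 3 × 3 neighborhood. The function name, the threshold value and the test image are assumptions.

```python
import numpy as np


def moravec_interest_points(image, threshold):
    """Return (response, points) for a simple Moravec-style operator."""
    img = np.asarray(image, dtype=float)
    height, width = img.shape
    r = 2  # half-width of the 5 x 5 window
    # Vertical, horizontal and the two diagonal directions as (dy, dx).
    directions = [(1, 0), (0, 1), (1, 1), (1, -1)]
    response = np.zeros((height, width))
    for y in range(r, height - r):
        for x in range(r, width - r):
            sums = []
            for dy, dx in directions:
                total = 0.0
                # Sum of squared differences of neighboring pixels on the line.
                for k in range(-r, r):
                    a = img[y + k * dy, x + k * dx]
                    b = img[y + (k + 1) * dy, x + (k + 1) * dx]
                    total += (b - a) ** 2
                sums.append(total)
            # Keep the minimum of the four directional values.
            response[y, x] = min(sums)
    points = []
    for y in range(r, height - r):
        for x in range(r, width - r):
            value = response[y, x]
            local_max = response[y - 1:y + 2, x - 1:x + 2].max()
            # Suppress values below the threshold or that are not local maxima.
            if value >= threshold and value == local_max:
                points.append((y, x))
    return response, points


# Hypothetical test image: a bright square whose corner is at (6, 6).
corner = np.zeros((12, 12))
corner[6:, 6:] = 1.0
response, points = moravec_interest_points(corner, threshold=0.5)
```

Taking the minimum over directions is what rejects straight edges: along an edge at least one direction shows no intensity change, so the response there is zero.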

monocular depth cue: Image

evidence that indicates that one surface morphological gradient: A
may be closer to the viewer than gray scale mathematical morphology

operation applied to gray scale images that results in an output image similar to the standard intensity gradient. The gradient is calculated by $\frac{1}{2}(D_G(A, B) - E_G(A, B))$ where $D_G()$ and $E_G()$ are the gray scale dilate and erode respectively of image $A$ by kernel $B$. [ CS:4.5.5]

morphology: The shape of a structure. See also mathematical morphology. [ AJ:9.9]

morphometry: Techniques for the measurement of shape. [ WP:Morphometrics]
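The morphological gradient formula, half the difference between the gray scale dilation and erosion, can be sketched with NumPy alone. The sketch assumes a flat square kernel B (so dilation and erosion reduce to local maximum and minimum) and edge replication at the image border; the function names and the step image are hypothetical.

```python
import numpy as np


def _shifted_windows(image, size):
    """Stack every shifted copy of the image covered by a size x size kernel."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    height, width = image.shape
    return np.stack([padded[dy:dy + height, dx:dx + width]
                     for dy in range(size) for dx in range(size)])


def gray_dilate(image, size=3):
    # Flat kernel: dilation is the local maximum.
    return _shifted_windows(image, size).max(axis=0)


def gray_erode(image, size=3):
    # Flat kernel: erosion is the local minimum.
    return _shifted_windows(image, size).min(axis=0)


def morphological_gradient(image, size=3):
    image = np.asarray(image, dtype=float)
    return 0.5 * (gray_dilate(image, size) - gray_erode(image, size))


# Hypothetical example: a vertical step edge of height 10.
step = np.zeros((5, 6))
step[:, 3:] = 10.0
gradient = morphological_gradient(step)
```

The response is non-zero only in the columns adjacent to the step, which is why the output resembles an intensity gradient magnitude image.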
morphological segmentation: Using mathematical morphology operations applied to binary images to extract isolated regions of the desired shape. The desired shape is specified by the morphological kernel. The process could also be used to separate touching objects.

morphological smoothing: A gray scale mathematical morphology operation applied to gray scale images that results in an output image similar to that produced by standard noise reduction. The smoothing is calculated by $C_G(O_G(A, B), B)$ where $C_G()$ and $O_G()$ are the gray scale close and open operations respectively of image $A$ by kernel $B$.

morphological transformation: One of a large class of binary and gray scale image transformations whose primary characteristic is they react to the pattern of the pixel values rather than the values themselves. Examples include dilation, erosion, skeletonizing, thinning, etc. The right figure below is the opening of the left figure, when using a disk shaped structuring element 11 pixels in diameter. [ AJ:9.9]

mosaic: The construction of a larger image from a collection of partially overlapping images taken from different viewpoints. The reconstructed image could have different geometries, e.g., as if seen from a single perspective viewpoint, or as if seen from an orthographic viewpoint. See also image mosaic. [ RJS:2]

motion: A general language term, but, in the context of computer vision, refers to analysis of an image sequence where the camera position or scene structure changes over time. [ BB:7]

motion analysis: Analysis of an image sequence in order to extract useful information. Examples of information routinely extracted include: shape of observed scene, figure-ground separation, egomotion estimation, and estimates of a target's position and motion. [ BB:7.2-7.3]

motion blur: The blurring of an image that arises when either the camera or something in the scene moves while the image is being acquired. The image below shows the blurring that occurs when an object moves during image capture. [ WP:Motion blur]

motion coding: 1) A component of video sequence compression in which efficient methods are used for representing movement of image regions between video frames. 2) A term for neural cells tuned to respond to the direction and speed of image motion. [ WP:Motion coding]

motion detection: Analysis of an image sequence to determine if or when something in the observed scene moves. See also change detection. [ JKS:14.1]

motion discontinuity: When the smooth motion of either the camera or something in the scene changes, such as the speed or direction of motion. Another form of motion discontinuity is between two groups of adjacent pixels that have different motions.

motion estimation: Estimating the motion direction and speed of the camera or something in the scene. [ RJS:5]

motion factorization: Given a set of tracked feature points through an image sequence, a measurement matrix can be constructed. This matrix can be factored into component matrices that represent the shape and 3D motion of the structure up to a 3D affine transform (which is removable using knowledge of the intrinsic camera parameters).

motion field: The projection of the relative motion vector for each scene point onto the image plane. In many circumstances this is closely related to the optical flow, but may differ as image intensities can also change due to illumination changes. Similarly, motion of a uniformly shaded region is not observable locally because there are no changes in image intensity values. [ TV:8.2]

motion layer segmentation: The segmentation of an image into different regions where the motion is locally consistent. The layering effect is most noticeable when the observer is moving through a scene with objects at different depths (causing different amounts of parallax) some of which might also be moving. See also motion segmentation. [ TV:8.6]

motion model: A mathematical model of types of motion allowable for the target object or camera, such as only linear motion along the optical axis with constant velocity. Another example might allow velocities and accelerations in any direction, but occasionally discontinuities, such as for a bouncing ball. [ BB:7]

motion representation: See motion model. [ BB:7]

motion segmentation: See motion layer segmentation. [ TV:8.6]

motion sequence analysis: The class of computer vision algorithms that process sequences of images captured close together in space and time, typically by a moving camera. These analyses are often characterized by assumptions on temporal coherence that simplify computation. [ BB:7.3]

motion smoothness constraint: The assumption that nearby points in the image have similar motion directions and speeds, or similar optical flow.

This constraint is based on the fact that adjacent pixels generally record data from the projection of adjacent surface patches from the scene. These scene components will have similar motion relative to the observer. This assumption can help reduce motion estimation errors or constrain the ambiguity in optical flow estimates arising from the aperture problem.

motion tracking: Identification of the same target feature points through an image sequence. This could also refer to tracking complete objects as well as feature points, including estimating the trajectory or motion parameters of the target. [ FP:17]

movement analysis: A general term for analyzing an image sequence of a scene where objects are moving. It is often used for analysis of human motion such as for people walking or using sign language. [ BB:7.2-7.3]

moving average smoothing: A form of image noise reduction that occurs over time by averaging the most recent images together. It is based on the assumption that variations in time of the observed intensity at a pixel are random. Thus, averaging the values will produce intensity estimates closer to the true (mean) value. [ DH:7.4]

moving light display: An image sequence of a darkened scene containing objects with attached point light sources. The light sources are observed as a set of moving bright spots. This sort of image sequence was used in the early research on structure from motion.

moving object detection: Analyzing an image sequence, usually with a stationary camera, to detect whether any objects in the scene move. [ JKS:14.1]

moving observer: A camera or other sensor that is moving. Moving observers have been extensively used in recent research on structure from motion. [ VSN:8]

MPEG: Moving Picture Experts Group. A group developing standards for coding digital audio and video, as used in video CD, DVD and digital television. This term is often used to refer to media that is stored in the MPEG 1 format. [ WP:Moving Picture Experts Group]

MPEG 2: A standard formulated by the ISO Moving Picture Experts Group (MPEG), a subset of ISO Recommendation 13818, meant for transmission of studio-quality audio and video. It covers four levels of video resolution. [ WP:MPEG-2]

MPEG 4: A standard formulated by the ISO Moving Picture Experts Group (MPEG), originally concerned with similar applications as H.263 (very low bit rate channels, up to 64 kbps). Subsequently extended to encompass a large set of multimedia applications, including over the Internet. [ WP:MPEG-4]

MPEG 7: A standard formulated by the ISO Moving Picture Experts Group (MPEG). Unlike MPEG 2 and MPEG 4, which deal with compressing multimedia contents within specific applications, it specifies the structure and features of the compressed multimedia content produced by the different standards, for instance to be used in search engines. [ WP:MPEG-7]

MRF: See Markov random field. [ JKS:7.4]
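The moving average smoothing entry on this page can be illustrated with a short sketch that averages the most recent frames of a sequence. The class name, the window length and the example frames are assumptions; a fixed-length queue is only one way to hold the recent images.

```python
from collections import deque

import numpy as np


class MovingAverageSmoother:
    """Average the most recent `window` frames, pixel by pixel."""

    def __init__(self, window):
        # A deque with maxlen silently drops the oldest frame when full.
        self.frames = deque(maxlen=window)

    def update(self, frame):
        self.frames.append(np.asarray(frame, dtype=float))
        return np.stack(list(self.frames)).mean(axis=0)


# Hypothetical 1 x 2 frames whose true intensity is 10 with random variation.
smoother = MovingAverageSmoother(window=3)
outputs = [smoother.update(np.full((1, 2), value)) for value in (8, 12, 10, 14)]
```

The sketch assumes a static scene and camera; if something moves, averaging over time blurs it, so the window length trades noise reduction against motion smear.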

MRI: Magnetic Resonance Imaging. See nuclear magnetic resonance. [ FP:18.6]

MSRE: Mean Squared Reconstruction Error.

MTF: See modulation transfer function. [ AJ:2.6]

multi-dimensional edge detection: A variation on standard edge detection of gray scale images in which the input image is multi-spectral (e.g., an RGB color image). The edge detection operator may detect edges in each dimension independently and then combine the edges or may use all information at each pixel directly. The following image shows edges detected from red, green and blue components of an RGB image.

multi-dimensional histogram: A histogram with more than one dimension. For example consider measurements as vectors, e.g., from a multi-spectral image, with N dimensions in the vector. Then one could create a histogram represented by an array with dimension N. The N components in each vector are used to index into the array. Accumulating counts or other evidence values in the array makes it a histogram. [ BB:5.3.1]

multi-grid method: An efficient algorithm for solving systems of discretized differential (or other) equations. The term multi-grid is used because the system is first solved at a coarse sampling level, which is then used to initialize a higher-resolution solution. [ WP:Multigrid method]

multi-image registration: A general term for the geometric alignment of two or more image datasets. Alignment allows pixels from the different source images to lie on top of each other or to be combined. (See also sensor fusion.) For example, two overlapping intensity images could be registered to help create a mosaic. Alternatively, the images need not be from the same type of sensor. (See multi-modal fusion.) For example, NMR and CAT images of the same body part could be registered to provide richer information, e.g., for a doctor. This image shows two unregistered range images on the left and the registered datasets on the right. [ FP:21.3]

multi-level: See multi-scale method.

multi-modal analysis: A general term for image analysis using image data from more than one sensor type. There is often the assumption that the data is registered so that each pixel records data of two or more types from the same portion of the observed scene. [ WP:Computer Audition#Multi-modal analysis]

multi-modal fusion: See sensor fusion. [ WP:Multimodal integration]

multi-modal neighborhood signature: A description of a feature point based on the image data in its neighborhood. The data comes from several registered sensors, such as X-ray and

multi-ocular stereo: A stereo triangulation process that uses more than one camera to infer 3D information. The terms binocular stereo and trinocular stereo are commonly used when there are only two or three cameras respectively.

multi-resolution method: See multi-scale method. [ BB:3.7]

multi-scale description: See multi-scale method.

multi-scale integration: 1) Combining information extracted by using operators with different scales. 2) Combining information extracted from registered images with different scales. These two definitions could just be two ways of considering the same process if the difference in operator scale is only a matter of the amount of smoothing. An example of multi-scale integration occurs when combining edges extracted from images with different amounts of smoothing to produce more reliable edges.

multi-scale method: A general term for a process that uses information obtained from more than one scale of image. The different scales might be obtained by reducing the image size or by Gaussian smoothing of the image. Both methods reduce the spatial frequency of the information. The main reasons for multi-scale methods are: 1) some structures have different natural scales (e.g., a thick bar could also be considered to be two back-to-back edges) and 2) coarse scale information is generally more reliable in the presence of image noise, but the spatial accuracy is better in finer scale information (e.g., an edge detector might use a coarse scale to reliably detect the edges and a finer scale to locate them more accurately). Below is an image with two scales of blurring.

multi-scale representation: A representation having image features or descriptions that belong to two or more scales. An example might be zero crossings detected from intensity images that have received increasing amounts of Gaussian smoothing. A multi-scale model representation might represent an arm as a single generalized cylinder at a coarse scale, two generalized cylinders at an intermediate scale and with a surface triangulation at a fine scale. The representation might have results from several discrete scales or from a more continuous range of scales, as in a scale space. Below are zero crossings found at two scales of Gaussian blurring. [ WP:Scale space#Related multi-scale representations]

multi-sensor geometry: The relative placement of a set of sensors or multiple views from a single sensor but from different positions. One key consequence of the different placements is the ability to deduce the 3D structure of the scene. The sensors need not be the same type but usually are for convenience. [ FP:11.4]

multi-spectral analysis: Using the observed image brightness at different wavelengths to aid in the understanding of the observed pixels. A simple version uses RGB image data. Seven or more bands, including several infrared wavelengths are often used for satellite remote sensing analysis. Recent hyperspectral sensors can give measurements at 100–200 different wavelengths. [ SQ:17.1]

multi-spectral image: An image containing data measured at more than one wavelength. The number of wavelengths may be as low as two (e.g., some medical scanners), three (e.g., RGB image data), or seven or more bands, including several infrared wavelengths (e.g., satellite remote sensing). Recent hyperspectral sensors can give measurements at 100–200 different wavelengths. The typical image representation uses a vector to record the different spectral measurements at each pixel of an image array. The following image shows the red, green and blue components of an RGB image. [ SEU:1.7.4]

multi-spectral segmentation: Segmentation of a multi-spectral image. This can be addressed by segmenting the image channels individually and then combining the results, or alternatively the segmentation can be based on some combination of the information from the channels. [ WP:Multispectral segmentation]

multi-spectral thresholding: A segmentation technique for multi-spectral image data. A common approach is to threshold each spectral channel independently and then logically AND together the resulting images. An alternative is to cluster pixels in a multi-spectral space and choose thresholds that select desired clusters. The images below show a colored image first thresholded in the blue channel (0–100 accepted) and then ANDed with the thresholded green channel (0–100 accepted).

multi-tap camera: A camera that provides multiple outputs.

multi-thresholding: Thresholding using a number of thresholds giving a result that has a number of gray scales or colors. In the following example the image has been thresholded with two thresholds (113 and 200).

multiple motion segmentation: See motion segmentation. [ TV:8.6]

multiple target tracking: A general term for tracking multiple objects simultaneously in an image sequence. Example applications include tracking football players and automobiles on a road.
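The multi-thresholding entry can be sketched with NumPy's `digitize`, using the two thresholds (113 and 200) quoted in the entry. The function name and the small test image are assumptions, as is the convention that a pixel equal to a threshold falls in the upper band.

```python
import numpy as np


def multi_threshold(image, thresholds):
    """Label each pixel with the index of the threshold band it falls in."""
    # With two thresholds the labels are 0 (below both), 1 (between), 2 (above).
    return np.digitize(np.asarray(image), bins=sorted(thresholds))


# Hypothetical gray scale values around the two thresholds from the entry.
image = np.array([[0, 112, 113],
                  [199, 200, 255]])
labels = multi_threshold(image, [113, 200])
```

Each label can then be mapped to a display gray level or color to give the kind of multi-level result the entry describes.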

multi-variate normal distribution: A Gaussian distribution for a variable that is a vector rather than a scalar. Let $\vec{x}$ be the vector variable with dimension $N$. Assume that this variable has mean value $\vec{\mu}_x$ and covariance matrix $C$. Then the probability of observing the particular value $\vec{x}$ is given by [ SB:11.11]:

$$\frac{1}{(2\pi)^{N/2} |C|^{1/2}} \, e^{-\frac{1}{2} (\vec{x} - \vec{\mu}_x)^\top C^{-1} (\vec{x} - \vec{\mu}_x)}$$

multi-view geometry: See multi-sensor geometry. [ FP:11.4]

multi-view image registration: See multi-image registration. [ FP:21.3]

multiple view interpolation: A technique for creating (or recognizing) new unobserved views of a scene from example images captured from other viewpoints.

multiplicative noise: A model for the corruption of a signal where the noise is proportional to the signal strength. $f(x, y) = g(x, y) + g(x, y) \cdot v(x, y)$ where $f(x, y)$ is the observed signal, $g(x, y)$ is the ideal (original) signal and $v(x, y)$ is the noise.

Munsell color notation system: A system for precisely specifying colors and their relationships, based on hue, value (brightness) and chroma (saturation). The Munsell Book of Color contains colored chips indexed by these three attributes. The color of any unknown surface can be identified by comparison with the colors in the book under specified lighting and viewing conditions. [ GM:5.3.6]
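The density given in the multi-variate normal distribution entry can be evaluated directly, as in the sketch below. The function name and the example values are assumptions; the covariance matrix is assumed to be symmetric and positive definite, and a linear solve is used instead of forming the inverse explicitly.

```python
import numpy as np


def multivariate_normal_density(x, mean, cov):
    """Evaluate the N-dimensional Gaussian density at x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.size
    diff = x - mean
    # (x - mean)^T C^{-1} (x - mean), computed with a solve for stability.
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    normalizer = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * mahalanobis_sq) / normalizer)


# Hypothetical 2D example with identity covariance, evaluated at the mean.
density_at_mean = multivariate_normal_density([1.0, 2.0], [1.0, 2.0], np.eye(2))
```

With N = 2 and an identity covariance the value at the mean is 1/(2π), which gives a quick sanity check on the normalizing constant.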
multi-view stereo: See multi-sensor geometry. [ FP:11.4]

multiple instruction multiple data (MIMD): A form of parallelism in which, at any given time, each processor might be executing a different instruction or program on a different dataset or pixel. This contrasts with single instruction multiple data parallelism where all processors execute the same instruction simultaneously although on different pixels. [ RJS:8]

mutual illumination: When light reflecting from one surface illuminates another surface and vice versa. The consequence of this is that light observed coming from a surface is a function of not only the light source spectrum and the reflectance of the target surface, but also the reflectance of the nearby surface (through the spectrum of the light reflecting from the nearby surface onto the first surface). The following diagram shows how mutual illumination can occur.

[Figure: mutual illumination diagram; recoverable labels include LIGHT and RED SURFACE]

mutual interreflection: See mutual illumination.

mutual information: The amount of information two pieces of data (such as images) have in common. In other words given a data item A and an unknown data item B, the mutual information $MI(A, B) = H(B) - H(B|A)$ where $H(x)$ is the entropy. [ CS:6.3.4]

NAND operator: An from a video sequence taken by a

arithmetic operation where a new moving camera.
image is formed by NANDing (logical
AND followed by NOT) together near infrared: Light wavelengths
corresponding bits for every pixel of the approximately in the range 750–5000
two images. This operator is nm. [ WP:Infrared]
most appropriate for binary images but
nearest neighbor: A classification ,
may also be applied to
labeling or grouping principle in which
gray scale images . For example the
a data item is associated with or takes
following shows the NAND operator
the same label as the previously
applied to two binary images
classified data item that is nearest to
[ SB:3.2.2]:
the first data item. This distance might
be based on spatial distance or a
distance in a property space. In this
figure the unknown square is classified
with the label of the nearest point,
namely a circle. [ JKS:15.5.1]
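The nearest neighbor principle described above can be sketched as a lookup of the closest previously labeled item. Euclidean distance in a 2D property space is assumed here, and the function name, sample points and labels are hypothetical stand-ins for the crosses and circle in the figure.

```python
import numpy as np


def nearest_neighbor_label(query, samples, labels):
    """Return the label of the previously classified sample nearest to query."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    distances = np.linalg.norm(samples - query, axis=1)
    return labels[int(np.argmin(distances))]


# Hypothetical labeled data: three crosses and two circles.
samples = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]]
labels = ["cross", "cross", "cross", "circle", "circle"]
predicted = nearest_neighbor_label([4.5, 4.0], samples, labels)
```

Unlike a minimum distance classifier, which compares against one model vector per class, this rule compares against every stored example.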

narrow baseline stereo: A form of stereo triangulation in which the sensor positions are close together. The baseline is the distance between the sensor positions. Narrow baseline stereo often occurs when the image data is

Necker cube: A line drawing of a cube drawn under orthographic projection, which as a result can be interpreted in two ways. [ VSN:4]

Necker reversal: An ambiguity in the recovery of 3D structure from multiple images. Under affine viewing conditions, the sequence of 2D images of a set of rotating 3D points is the same as the sequence produced by the rotation in the opposite direction of a different set of points, so that two solutions to the structure and motion problem are possible. The different set of points is the reflection of the first set about any plane perpendicular to the optical axis of the camera. [ HZ:13.6]

needle map: An image representation used for displaying 2D and 3D vector fields, such as surface normals. Each pixel has a vector. Diagrams showing these use little lines with the magnitude and direction of the vector projected onto the image of a 3D vector. To avoid overcrowding the image, the pixels where the lines are drawn are a subset of the full image. This image shows a needle map of the surface normals on the block sides. [ BKPH:11.8]

negate operator: See invert operator. [ SB:3.2.2]

neighborhood: 1) The neighborhood of a vertex v in a graph is the set of vertices that are connected to v by an arc. 2) The neighborhood of a point (or pixel) x is a set of points near x. A common definition is the set of points within a certain distance of x, where the distance metric may be Manhattan distance or Euclidean distance. 3) The 4 connected neighborhood of a 2D location $(x, y)$ is the set of image locations $\{(x+1, y), (x-1, y), (x, y+1), (x, y-1)\}$. The 8 connected neighborhood is the set of pixels $\{(x + i, y + j) \mid -1 \le i, j \le 1\}$. The 26 connected neighborhood of a 3D point $(x, y, z)$ is defined analogously. [ SQ:4.5]

[Figure: 4-connected and 8-connected neighborhoods]

neural network: A classifier that maps input data $\vec{x}$ of dimension $n$ to a space of outputs $\vec{y}$ of dimension $m$. As a black box, the network is a function $f : \mathbb{R}^n \mapsto [0, 1]^m$. The most commonly used form of neural network is the multi-layer perceptron (MLP). An MLP is characterized by an $m \times n$ matrix of weights $W$, and a transfer function $\sigma$ that maps the reals to $[0, 1]$. The output of the single-layer network is $\vec{f}(\vec{x}) = \sigma(W\vec{x})$ where $\sigma$ is applied elementwise to vector arguments. A multi-layer network is a cascade of single-layer networks, with different weights matrices at each layer. For example, a two-layer network with $k$ hidden nodes is defined by weights matrices $W_1 \in \mathbb{R}^{k \times n}$ and $W_2 \in \mathbb{R}^{m \times k}$, and written $\vec{f}(\vec{x}) = \sigma(W_2 \sigma(W_1 \vec{x}))$. A common choice for $\sigma$ is the sigmoid function $\sigma(t) = (1 + e^{-st})^{-1}$ for some value of $s$. When we make it explicit that $\vec{f}$ is a function of the weights as well as the input vector, it is written
166 N

f~(W; ~x).Typically, a neural network is noise: A general term for the deviation
trained to predict the relationship of a signal away from its true value.
between the ~xs and ~y s of a given In the case of images , this leads to pixel
collection of training examples . values (or other measurements) that are
Training means setting the weights different from their expected values.
matrices Pto minimize the training error The causes of noise can be random
e(W) = i d(~yi , f~(W; ~xi )) where d factors, such as thermal noise in the
measures distance between the network sensor, or minor scene events, such as
output and a training example. dust or smoke. Noise can also represent
Common choices for d(~y , ~y ) include the systematic, but unmodeled, events such
2-norm k~y ~y k2 . [ FP:22.4] as short term lighting variations or
quantization . Noise might be reduced
Newtons optimization method: To or removed using a noise reduction
find a local minimum of function method. Here are images without and
f : Rn 7 R from starting position ~x0 . with salt-and-pepper noise . [ TV:3.1]
Given the functions gradient f and
Hessian H evaluated at ~xk , the Newton
update is ~xk+1 = ~xk H1 f . If f is a
quadratic form then a single Newton
step will directly yield the global
minimum. For general f , repeated
Newton steps will generally converge to
a local optimum. [ FP:3.1.2]

next view planning: When

inspecting an object or obtaining a
geometric or appearance-based model, noise model: A way to model the
it may be necessary to observe the statistical properties of noise without
object from several places. Next view having to model the causes of the noise.
planning determines where to next One general assumption about noise is
place the camera (by moving either the that it has some underlying, but
object or the camera) based on either perhaps unknown, distribution. A
what was observed (in the case of Gaussian noise model is a commonly
unknown objects) or a geometric model used for random factors and a
(in the case of known objects). uniform distribution is often used for
unmodeled scene effects. Noise could be
next view prediction: See modeled with a mixture model . The
next view planning . noise model typically has one or more
parameters that control the magnitude
NMR: See nuclear magnetic resonance of the noise. The noise model can also
. [ FP:18.6] specify how the noise affects the signal,
node of graph: A symbolic such as additive noise (which offsets
representation of some entity or feature. the true value) or multiplicative noise
It is connected to other nodes in a (which rescales the true value). The
graph by arcs , that represent type of noise model can constrain the
relationships between the different type of noise reduction method.
entities. [ SQ:12.1] [ AJ:8.2]

noise reduction: An
image processing method that tries to
N 167

reduce the distortion of an image that

has been caused by noise . For example, non-accidentalness: A general
the images from a video sequence taken principle that can be used to improve
with a stationary camera and scene image interpretation based on the
can be averaged together to reduce the concept that when regularities appear
effect of Gaussian noise because the in an image , they are most likely to
average value of a signal corrupted with result from regularities in the scene .
this type of noise converges to the true For example, if two straight lines end
value. Noise reduction methods often near to each other, then this could have
introduce other distortions, but these arisen from a coincidental alignment of
may be less significant to the the line ends and the observer.
application than the original noise. An However, it is much more probable that
image with salt-and-pepper noise and the two lines end at the same point in
its noise reduced by median smoothing the observed scene. This figure shows
are shown in the figure. [ TV:3.2] line terminations and orientations that
are unlikely to be coincidental.



noise removal: See noise reduction.

non-hierarchical control: A way of
[ TV:3.2]
structuring the sequence of actions in
noise source: A general term for an image interpretation system.
phenomena that corrupt image data. Non-hierarchical control is when there
This could be systematic unmodeled is no master process that orders the
processes (e.g., 60 Hz electromagnetic sequence of actions or operators
noise) or random processes (e.g., applied. Instead, typically, each
electronic shot noise). The sources operator can observe the current results
could be in the scene (e.g., chaff), in and decide if it is capable of executing
the medium (e.g., dust), in the lens and if it is desirable to do so.
(e.g., imperfections) or in the sensor
nonlinear filter: A process where the
(e.g., sensitivity variations).
outputs are a nonlinear function of the
[ WP:Noise]
inputs. This covers a large range of
noise suppression: See algorithms. Examples of nonlinearity
noise reduction . [ TV:3.2] might be: 1) doubling the values of all
input data does not double the values
noise-whitening filter: A noise of the output results (e.g., a filter that
modifying filter that outputs images reports the position at which a given
whose pixels have noise that is value appears), 2) applying an operator
independent of 1) other pixels noise to the sum of two images gives different
(spatial noise) or 2) other values of that results from adding the results of the
pixel at other times (temporal noise). operator applied to the two original
The resulting images noise is images (e.g., thresholding ). [ AJ:8.5]
white noise . [ AJ:6.2]
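The two nonlinearity tests listed in the nonlinear filter entry can be checked numerically. The following is a minimal sketch, assuming short Python lists stand in for images; the helper names `threshold`, `median3` and `add` are illustrative only and do not come from the cited sources.

```python
from statistics import median


def threshold(signal, t=5):
    """Binary thresholding: 1 where the value exceeds t, else 0."""
    return [1 if v > t else 0 for v in signal]


def median3(signal):
    """3-sample median filter; the two end samples are copied unchanged."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        out[i] = median(signal[i - 1:i + 2])
    return out


def add(a, b):
    """Elementwise sum of two equally long signals."""
    return [x + y for x, y in zip(a, b)]


# Test 1: doubling the input does not double the output.
x = [3, 4, 6]
doubled_then_filtered = threshold([2 * v for v in x])
filtered_then_doubled = [2 * v for v in threshold(x)]

# Test 2: filtering a sum differs from summing the filtered inputs.
a = [0, 9, 0, 0, 0]
b = [0, 0, 9, 0, 0]
filter_of_sum = median3(add(a, b))
sum_of_filters = add(median3(a), median3(b))

print(doubled_then_filtered != filtered_then_doubled)  # nonlinear by test 1
print(filter_of_sum != sum_of_filters)                 # nonlinear by test 2
```

A linear filter, such as a fixed-weight local average, would pass both tests with equality, which is why these two checks are a convenient way to tell the classes apart.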

non-maximal suppression: A technique for suppressing multiple responses (e.g., high values of gradient magnitude) representing a single edge or other feature. The resulting edges should be a single pixel wide. [JKS:5.6.1]

non-parametric clustering: A data clustering process such as k-nearest neighbor that does not assume an underlying probability distribution.

non-parametric method: A probabilistic method used when the form of the underlying probability distribution is unknown or multi-modal. Typical applications are to estimate the a posteriori probability of a classification given an observation. Parzen windows or k-nearest neighbor classifiers are often used. [WP:Non-parametric statistics]

non-rigid model representation: A model representation where the shape of the model can change, perhaps under the control of a few parameters. These models are useful for representing objects whose shape can change, such as moving humans or biological specimens. The differences in shape may occur over time or be between different instances. Changes in apparent shape due to perspective projection and observer viewpoint are not relevant here. By contrast, a rigid model would have the same actual shape irrespective of the viewpoint of the observer.

non-rigid motion: A motion of an object in the scene in which the shape of the object also changes. Examples include: 1) the position of a walking person's limbs and 2) the shape of a beating heart. Changes in apparent shape due to perspective projection and viewpoint are not relevant here.

non-rigid registration: The problem of registering, or aligning, two shapes that can take on a variety of configurations (unlike rigid shapes). For instance, a walking person, a fish, and facial features like mouth and eyes are all non-rigid objects, the shape of which changes in time. This type of registration is frequently needed in medical imaging as many human body parts deform. Non-rigid registration is considerably more complex than rigid registration. See also alignment, registration, rigid registration.

non-rigid tracking: A tracking process that is designed to track non-rigid objects. This means that it can cope with changes in actual object shape as well as apparent shape due to perspective projection and observer viewpoint.

non-symbolic representation: A model representation in which the appearance is described by a numerical or image-based description rather than a symbolic or mathematical description. For example, non-symbolic models of a line would be a list of the coordinates of the points in the line or an image of the line. Symbolic object representations include the equation of the line or the endpoints of the line.

normal curvature: A plane that contains the surface normal $\vec{n}$ at point $\vec{p}$ to a surface intersects that surface to form a planar curve $\Gamma$ that passes through $\vec{p}$. The normal curvature is the curvature of $\Gamma$ at $\vec{p}$. The intersecting plane can be at any specified orientation about the surface normal. See [JKS:13.3.2]:
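The thinning effect described in the non-maximal suppression entry can be illustrated in one dimension. This is a minimal sketch, assuming a 1D profile of gradient magnitudes sampled across an edge; the function name `non_max_suppress_1d` and the tie-breaking rule (keep the first sample of a plateau) are illustrative choices, not taken from the cited source. A 2D edge detector would instead compare each pixel with its neighbors along the gradient direction.

```python
def non_max_suppress_1d(responses):
    """Zero every response that is not a local maximum of its two neighbors.

    On a plateau of equal values only the first sample is kept, so each
    peak survives as a single-sample-wide response.
    """
    n = len(responses)
    out = [0] * n
    for i, r in enumerate(responses):
        left = responses[i - 1] if i > 0 else float("-inf")
        right = responses[i + 1] if i < n - 1 else float("-inf")
        if r > left and r >= right:
            out[i] = r
    return out


# A blurred edge gives a multi-sample-wide bump of gradient magnitude.
profile = [0, 1, 4, 9, 5, 1, 0]
thinned = non_max_suppress_1d(profile)
print(thinned)
```

Only the peak sample of the bump survives, which corresponds to the single-pixel-wide edge the entry asks for.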

[Figure: planar curve through point p]

normal distribution: See Gaussian distribution. [AJ:2.9]

normal flow: The component of optical flow in the direction of the intensity gradient. The orthogonal component is not locally observable because small motions orthogonally do not change the appearance of local neighborhoods.

normalized correlation: 1) An image or signal similarity measure that scales the differences between the signals by a measure of the average signal strength:

$$\frac{\sum_i (x_i - y_i)^2}{\sqrt{\left(\sum_i x_i^2\right)\left(\sum_i y_i^2\right)}}$$

This scales the difference so that it is less significant if the inputs are larger. The similarities lie in the range [0,1], where 0 is most similar. 2) A statistical cross correlation process where the correlation coefficient is normalized to lie in the range [-1,1], where 1 is most similar. In the case of two scalar variables, this means dividing by the standard deviations of the two variables. [RJS:6]

NOT operator: See invert operator. [SB:3.2.2]

novel view synthesis: A process whereby a new view of an object is synthesized by combining information from several images of the object from different viewpoints. One method is by 3D reconstruction, e.g., from binocular stereo, and then rendering the reconstruction using computer graphics. However, the main approaches to novel view synthesis use epipolar geometry and the pixels of two or more images of the object to directly synthesize a new image without creating a 3D reconstruction.

NP-complete: A concept in computational complexity covering a special set of problems. All of these problems currently can be solved, in the worst case, in time exponential $O(e^N)$ in the number or size N of their input data. For the subset of exponential problems called NP-complete, if an algorithm for one could be found that executes in polynomial time $O(N^p)$ for some p, then a related algorithm could be found for any other NP-complete problem. [SQ:12.5]

NTSC: National Television System Committee. A television signal recording system used for encoding video data at approximately 60 video fields per second. Used in the USA, Japan and other countries. [AJ:4.1]

nuclear magnetic resonance (NMR): An imaging technique based on magnetic properties of the atomic nuclei. Protons and neutrons within atomic nuclei generate a magnetic dipole that can respond to an external magnetic field. Several properties related to the relaxation of that magnetic dipole give rise to values that depend on the tissue type, thus allowing identification or at least visualization of the different soft tissue types. The measurement of the signal is a way of measuring the density of certain types of atoms, such as hydrogen in the case of biological NMR scanners. This technology is used for medical body scanning, where a detailed 3D volumetric image can be produced. Signal levels are highly correlated with different biological structures so one can easily observe different tissues and their positions. Also called MRI/magnetic resonance imaging. [FP:18.6]

NURBS: Non-Uniform Rational B-Splines: a type of shape modeling primitive based on ratios of b-splines. Capable of accurately representing a wide range of geometric shapes including freeform surfaces. [WP:Non-uniform rational B-spline]

Nyquist frequency: The minimum sampling frequency for which the underlying true image (or signal) can be reconstructed from the samples, namely twice the highest frequency present in the signal. If sampling at a lower frequency, then aliasing will occur, creating apparent image structure that does not exist in the original image. [SB:]

Nyquist sampling rate: See Nyquist frequency. [SB:]
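The aliasing mentioned in the Nyquist frequency entry can be reproduced with a few samples. This is a minimal sketch, assuming a 1D cosine stands in for an image row; the helper name `sample_cosine` and the specific frequencies (7 Hz and 3 Hz, sampled at 10 Hz) are illustrative assumptions. A 7 Hz signal needs a sampling rate above 14 Hz, so at 10 Hz its samples coincide with those of a 3 Hz signal.

```python
import math


def sample_cosine(freq_hz, rate_hz, n):
    """Return n samples of cos(2*pi*freq*t) taken at times t = k / rate."""
    return [math.cos(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]


rate = 10.0                            # sampling rate in Hz
high = sample_cosine(7.0, rate, 20)    # undersampled: would need more than 14 Hz
alias = sample_cosine(3.0, rate, 20)   # adequately sampled: 10 Hz exceeds 6 Hz

# The two sample sequences agree up to floating-point rounding, so the
# 7 Hz signal cannot be told apart from a 3 Hz one after sampling.
max_diff = max(abs(h - a) for h, a in zip(high, alias))
print(max_diff < 1e-9)
```

The apparent 3 Hz pattern is the "image structure that does not exist in the original" referred to in the entry.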

object: 1) A general term referring to a group of features in a scene that humans consider to compose a larger structure. In vision it is generally thought of as that to which attention is directed. 2) A general system theory term, where the object is what is of interest (unlike the background). Resolution or scale may determine what is considered the object. [AL:p.

object centered representation: A model representation in which the position of the features and components of the model are described relative to the position of the object itself. This might be a relative description (the nose is 4 cm from the mouth) or might use a local coordinate system (e.g., the right eye is at position (0,25,10) where (0,0,0) is the nose.) This contrasts with, for example, a viewer centered representation. Here is a rectangular solid defined in its local coordinate system [JKS:15.3.2]:

[Figure: rectangular solid; surviving labels: (L,H,W), H]

object contour: See occluding contour. [FP:19.2]

object grouping: A general term meaning the clustering of all of the image data associated with a distinct observed object. For example, when observing a person, object grouping could cluster all of the pixels from the image of the person. [FP:24.1]

object plane: In the case of convex simple lenses typically used in laboratory TV cameras, the object plane is the 3D scene plane where all points are exactly in focus on the image plane (assuming a perfect lens and the optical axis is perpendicular to the image plane). The object plane is illustrated here: [WP:Microscopy#Oblique illumination]

object recognition: A general term for identifying which of several (or many) possible objects is observed in an image. The process may also include computing the object's image or scene position, or labeling the image pixels or image features that belong to the object. [FP:21.4]

object representation: An encoding of an object into a form suitable for computer manipulation. The models could be geometric models, graph models or appearance models, as well as other forms. [JKS:15.3]

object verification: A component of an object recognition process that attempts to verify a hypothesized object identity by examining evidence. Commonly, geometric object models are used to verify that object features are observed in the correct image positions. [FP:18.5]

objective function: 1) The cost function used in an optimization process. 2) A measure of the misfit between the data and the model. [SQ:2.3]

oblique illumination: See low angle illumination.

observer: The individual (or camera) making observations. Most frequently this refers to the camera system from which images are being supplied. See also observer motion estimation. [WP:Observer]

observer motion estimation: When an observer is moving, image data of the scene provides optical flow or trackable scene feature points. These allow an estimate of how the observer is moving relative to the scene, which is useful for navigation control and position estimation. [BKPH:17.1]

obstacle detection: Using visual data to detect objects in front of the observer, usually for mobile robotics applications.

Occam's razor: An argument attributed to William of Occam (Ockham), an English nominalist philosopher of the early fourteenth century, stating that assumptions must not be needlessly multiplied when explaining something (entia non sunt multiplicanda praeter necessitatem). Often used simply to suggest that, other conditions being equal, the simplest solution must be preferred. Notice variant spelling Ockham. See also minimum description length. [WP:Occam's razor]

occluding contour: The visible edge of a smooth curved surface as it bends away from an observer. The occluding contour defines a 3D space curve on the surface, such that a line of sight from the observer to a point on the space curve is perpendicular to the surface normal at that point. The 2D image of this curve may also be called the occluding contour. The contour can often be found by an edge detection process. The cylinder boundaries on both the left and right are occluding contours from our viewpoint [FP:19.2]:

occluding contour analysis: A general term that includes 1) detection of the occluding contour, 2) inference of the shape of the 3D surface at the occluding contour and 3) determining the relative depth of the surfaces on both sides of the occluding contour. [FP:19.2]

occluding contour detection: Determining which of the image edges arise from occluding contours. [FP:19.2]

occlusion: Occlusion occurs when one object lies between an observer and another object. The closer object occludes the more distant one in the acquired image. The occluded surface is the portion of the more distant object hidden by the closer object. Here, the cylinder occludes the more distant brick [ERD:7.7]:

occlusion recovery: The process of attempting to infer the shape and appearance of a surface hidden by occlusion. This recovery helps improve completeness when reconstructing scenes and objects for virtual reality. This image shows two occluded pipes and an estimated recovery [ERD:7.7]:

occlusion understanding: A general term for analyzing scene occlusions that may include occluding contour detection, determining the relative depths of the surfaces on both sides of an occluding contour, searching for tee junctions as a cue for occlusion and depth order, etc. [ERD:7.7]

occupancy grid: A map construction technique used mainly for autonomous vehicle navigation. The grid is a set of squares or cubes representing the scene, which are marked according to whether the observer believes the corresponding scene region is empty (hence navigable) or full. A probabilistic measure could also be used. Visual evidence from range, binocular stereo or sonar sensors is typically used to construct and update the grid as the observer moves. [WP:Occupancy grid mapping]

OCR: See optical character recognition. [JKS:2.7]

octree: A volumetric representation in which 3D space is recursively divided into eight (hence "oct") smaller volumes by planes parallel to the XY, YZ, XZ coordinate system planes. A tree is formed by linking the eight subvolumes to each parent volume. Additional subdivision need not occur when a volume contains only object or empty space. Thus, this representation can be more efficient than a pure voxel representation. Here are three levels of a pictorial representation of an octree, where one octant of the largest (leftmost) level is expanded to give the middle figure, and similarly an octant of the middle [H. H. Chen, and T. S. Huang, A Survey of Construction and Manipulation of Octrees, Computer Vision, Graphics and Image Processing, Vol. 43, pp 409-431, 1988.]:

odd field: Standard interlaced video transmits all of the even scan lines in an image frame first and then all of the odd lines. The set of odd lines is the odd field. [AJ:11.1]

O'Gorman edge detector: A parametric edge detector. A decomposition of the image and model by orthogonal Walsh function masks was used to compute the step edge parameters (contrast and orientation). One advantage of the parametric model was a goodness of model fit as well as the edge contrast that increased the reliability of the detected edges. [LG:5]

omnidirectional sensing: Literally, sensing all directions simultaneously. In practice, this means using mirrors and lenses to project most of the lines of sight at a point onto a single camera image. The space behind the mirrors and camera(s) is typically not visible. See also catadioptric optics. Here a camera using a spherical mirror achieves a very wide field of view: [WP:Omnidirectional camera]

opaque: When light cannot pass through a structure. This causes shadows and occlusion. [WP:Opacity (optics)]

open operator: A mathematical morphology operator applied to a binary image. The operator is a sequence of N erodes followed by N dilates, both using a specified structuring element. The operator is useful for separating touching objects and removing small regions. The right image was created by opening the left image with an 11-pixel disk kernel [SB:8.15]:

operator: A general term for a function that is applied to some data in order to transform it in some way. For example see image processing operator.

opponent color: A color representation system originally developed by Hering in which an image is represented by three channels with contrasting colors: Red-Green, Yellow-Blue, and Black-White. [BB:2.2.5]

optical: A process that uses light and lenses is an optical process. [WP:Optics]

optical axis: The ray, perpendicular to the lens and through the optical center, around which the lens is symmetrical. [FP:1.1.1]

[Figure: lens diagram; surviving labels: Focal Point, Optical Axis]

optical center: See focal point. [FP:1.2.2]

optical character recognition (OCR): A general term for extracting an alphabetic text description from an image of the text. Common specialisms include bank numerals, handwritten digits, handwritten characters, cursive text, Chinese characters, Arabic characters, etc. [JKS:2.7]

optical flow: An instantaneous velocity measurement for the direction and speed of the image data across the visual field. This can be observed at every pixel, creating a field of velocity vectors. It is the set of apparent motions of the image pixel brightness values. [FP:25.4]

optical flow boundary: The boundary between two regions where the optical flow is different in direction or magnitude. The regions can arise from objects moving in different directions or surfaces at different depths. See also optical flow field segmentation. The dashed line in this image is the boundary between optical flow moving left and right:

optical flow constraint equation: The equation $\frac{\partial I}{\partial t} + \nabla I \cdot \vec{u}_x = 0$ that links the observed change in image I's intensities over time $\frac{\partial I}{\partial t}$ at image position $\vec{x}$ to the spatial change in pixel intensities at that position $\nabla I$ and the velocity $\vec{u}_x$ of the image data at that pixel. The constraint does not completely determine the image motion, as this has two degrees of freedom. The equation provides only one constraint, thus leading to an aperture problem. [WP:Optical flow#Estimation of the optical flow]

optical flow field: The field composed of the optical flow vector at each pixel in an image. [FP:25.4]

optical flow field segmentation: The segmentation of an optical flow image into regions where the optical flow has a similar direction or magnitude. The regions can arise from objects moving in different directions or surfaces at different depths. See also optical flow boundary.

optical flow region: A region where the optical flow has a similar direction or magnitude. Regions can arise from objects moving in different directions, or surfaces at different depths. See also optical flow boundary.

optical flow smoothness constraint: The constraint that nearby pixels in an image usually have similar optical flow because they usually arise from projection of adjacent surface patches having similar motions relative to the observer. The constraint can be relaxed at optical flow boundaries.

optical image processing: An image processing technique in which the processing occurs by use of lenses and coherent light instead of by a computer. The key principle is that a coherent light beam that passes through a transparency of the target image and is then focused produces the Fourier transform of the image at the focal point where frequency domain filtering can occur. A typical processing arrangement is:

[Figure: optical image processing arrangement; surviving labels: PLANE, SENSOR, FILTER]

optical transfer function (OTF): Informally, the OTF is a measure of how well spatially varying patterns are observed by an optical system. More formally, in a 2D image, let $X(f_h, f_v)$ and $Y(f_h, f_v)$ be the Fourier transforms of the input $x(h, v)$ and output $y(h, v)$ images. Then, the OTF of a horizontal and vertical spatial frequency pair $(f_h, f_v)$ is $H(f_h, f_v)/H(0, 0)$, where $H(f_h, f_v) = Y(f_h, f_v)/X(f_h, f_v)$. The optical transfer function is usually a complex number encoding both the reduction in signal strength at each spatial frequency and the phase shift. [SB:5.11]

optics: A general term for the manipulation and transformation of light and images using lenses and mirrors. [JKS:8]

optimal basis encoding: A general technique for encoding image or other data by projecting onto some basis functions of a linear space and then using the projection coefficients instead of the original data. Optimal basis functions produce projection coefficients that allow the best discrimination between different classes of objects or members in a class (such as for face recognition).

optimization: A general term for finding the values of the parameters that maximize or minimize some quantity. [BB:11.1.2]

optimization parameter estimation: See optimization. [BB:11.1.2]

OR operator: A pixelwise logic operator defined on binary variables. It takes as input two binary images, $I_1$ and $I_2$, and returns an image $I_3$ in which the value of each pixel is 0 if both $I_1$ and $I_2$ are 0, and 1 otherwise. The rightmost image below shows the result of ORing the left and middle figures (note that the white pixels have value 1) [SB:3.2.2]:

order statistic filter: A filter based on order statistics, a technique that sorts the pixels of a neighborhood by intensity value, and assigns a rank (the position in the sorted sequence) to each. An order statistics filter replaces the central value of the filtering neighborhood with the value at a given rank in the sorted list. A popular example is the median filter. As this filter is less sensitive to outliers, it is often used in robust statistics processes. See also rank order filter. [SEU:3.3.1]

ordered texture: See macrotexture. [JKS:7.3]

ordering: Sorting a collection of objects by a given property, for instance, intensity values in an order statistic filter. [SEU:3.3.1]

orientation: The property of being directed towards or facing a particular region of space, or of a line; also, the pose or attitude of a body in space. For instance, the orientation of a vector (where the vector points to), specified by its unit vector; the orientation of an ellipsoid, specified by its principal directions; the orientation of a wire-frame model, specified by its own reference frame with respect to a world reference frame. [WP:Orientation (computer vision)]

orientation error: The amount of error associated with an orientation value.

orientation representation: See pose representation.

oriented texture: A texture in which a preferential direction can be detected. For instance, the direction of the bricks in a regular brick wall. See also texture direction, texture orientation.

orthogonal image transform: Orthogonal Transform Coding is a well-known class of techniques for image compression. The key process is the projection of the image data onto a set of orthogonal basis functions. See, for instance, the discrete cosine, Fourier or Haar transforms. This is a special case of the linear integral transform.

orthogonal regression: Also known as total least squares. Traditionally seen as the generalization of linear regression to the case where both x and y are measured quantities and subject to error. Given samples $x_i$ and $y_i$, the objective is to find estimates of the true points $(\hat{x}_i, \hat{y}_i)$, and line parameters (a, b, c) such that $a\hat{x}_i + b\hat{y}_i + c = 0,\ \forall i$, and such that the error $\sum_i (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2$ is minimized. This estimate is easily obtained as the line (or plane, etc., in higher dimensions) passing through the centroid of the data, in the direction of the eigenvector of the data scatter matrix that has the largest eigenvalue; equivalently, the normal to the line (or plane) is the eigenvector that has the smallest eigenvalue. [WP:Total least squares]

orthographic: The characteristic property of orthographic (or perpendicular) projection onto the image plane. See orthographic projection. [FP:2.3]

orthographic camera: A camera in which the image is formed according to an orthographic projection. [FP:2.3]

orthographic projection: Rendering of a 3D scene as a 2D image by a set of rays orthogonal to the image plane. The size of the objects imaged does not depend on their distance from the viewer. As a consequence, parallel lines in the scene remain parallel in the image. The equations of orthographic projections are

$$x = X \qquad y = Y$$

where x, y are the image coordinates of an image point in the camera reference frame (that is, in millimeters, not pixels), and X, Y, Z are the coordinates of the corresponding scene point. An example is seen here [FP:2.3]:

[Figure: orthographic projection; surviving label: IMAGE PLANE]

orthoimage: In photogrammetry, the warp of an aerial photograph to an approximation of the image that would have been taken had the camera pointed directly downwards. See also orthographic projection. [WP:Orthophoto]

orthonormal: A property of a set of basis functions or vectors. If $\langle \cdot, \cdot \rangle$ is the inner product function and a and b are any two different members of the set, then we have $\langle a, a \rangle = \langle b, b \rangle = 1$ and $\langle a, b \rangle = 0$. [WP:Orthonormal basis]

OTF: See optical transfer function. [SB:5.11]

outlier: If a set of data mostly conforms to some regular process or is well represented by a model, with the exception of a few data points, then these exception points are outliers. Classifying points as outliers depends on both the models used and the statistics of the data. This figure shows a line fit to some points and an outlying point. [CS:3.4.6]

[Figure: line fit with one outlying point; surviving label: OUTLIER]

outlier rejection: Identifying outliers and removing them from the current process. Identification is often a difficult process. [CS:3.4.6]

over-segmented: Describing the output of a segmentation algorithm. Given an image where a desired segmentation result is known, the algorithm over-segments if the desired regions are represented by too many algorithmically output regions. This image should be segmented into three regions but it was oversegmented into five regions [SQ:8.7]:
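One common way to carry out the outlier rejection described in the entry above is to compare each residual with a robust estimate of spread. The sketch below is an illustration only, not the method of the cited source: it assumes the residuals of some model fit are already available, uses the median absolute deviation (MAD) as the spread estimate, and the function name `reject_outliers` and the cutoff `k = 3` are hypothetical choices.

```python
from statistics import median


def reject_outliers(values, k=3.0):
    """Split values into (inliers, outliers) using a median/MAD rule.

    A value is an outlier if it lies more than k robust standard
    deviations from the median. The factor 1.4826 rescales the MAD so
    that it estimates the standard deviation for Gaussian data.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against; treat everything as an inlier.
        return list(values), []
    limit = k * 1.4826 * mad
    inliers = [v for v in values if abs(v - med) <= limit]
    outliers = [v for v in values if abs(v - med) > limit]
    return inliers, outliers


# Residuals of a hypothetical line fit: five small errors and one gross error.
residuals = [0.1, -0.2, 0.05, 0.0, -0.1, 4.0]
inliers, outliers = reject_outliers(residuals)
print(inliers, outliers)
```

The median and MAD are used instead of the mean and standard deviation because the latter are themselves pulled towards the outliers they are meant to detect.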

paired boundaries: See paired contours.

paired contours: A pair of contours occurring together in images and related by a spatial relationship, for instance the contours generated by river banks in aerial images, or the contours of a human limb (arm, leg). Co-occurrence can be exploited to make contour detection more robust. See also feature extraction. An example is seen here:

pairwise geometric histogram: A line- or edge-based shape representation used for object recognition, especially 2D. Histograms are built by computing, for each line segment, the relative angle and perpendicular distance to all other segments. The representation is invariant to rotation and translation. PGHs can be compared using the Bhattacharyya metric.

PAL camera: A camera conforming to the European PAL standard (Phase Alternation by Line). See also NTSC, RS-170, CCIR camera. [AJ:4.1]

palette: The range of colors available. [NA:2.2]

pan: Rotation of a camera about a single axis through the camera center and (approximately) parallel to the image vertical: [WP:Panning (camera)]
180 P

panchromatic: Sensitive to light of all visible wavelengths. Panchromatic
images are gray scale images where each pixel averages light equally over the
visible range.

panoramic: Associated with a wide field-of-view often created or observed by a
panned camera. [ WP:Panoramic photography]

panoramic image mosaic: A class of techniques for collating a set of partially
overlapping images into a panoramic, single image. [Figure: example mosaic.]
This is a mosaic built from the frames of a hand-held camera sequence.
Typically, the mosaic yields both very high resolution and large field of view,
which cannot be simultaneously achieved by a physical camera. There are several
ways to build a panoramic mosaic, but, in general, there are three necessary
steps: first, determining correspondences (see stereo correspondence problem)
between adjacent images; second, using the correspondences to find a warping
transformation between the two images (or between the current mosaic and a new
image); third, blending the new image into the current mosaic.

panoramic image stereo: A stereo system working with a very large field of
view, say 360 degrees in azimuth and 120 degrees in elevation. Disparity maps
and depths are recovered for the whole field of view simultaneously. A normal
stereo system would have to be moved and results registered to achieve the same
result. See also binocular stereo, multi-view stereo, omnidirectional sensing.

Pantone matching system (PMS): A color matching system used by the printing
industry to print spot colors. Colors are specified by the Pantone name or
number. PMS works well for spot colors but not for process colors, usually
specified by the CMYK color model. [ WP:Pantone]

Panum's fusional area: The region of space within which single vision is
possible (that is, you do not perceive double images of objects) when the eyes
fixate a given point. [ CS:]

parabolic point: A point on a smooth surface where the Gaussian curvature is
zero. See also HK segmentation. [ VSN:9.2.5]

parallax: The angle between the two straight lines that join a point (possibly
a moving one) to two viewpoints. In motion analysis, motion parallax occurs
when two scene points that project to the same image point at one viewpoint
later project to different points as the camera moves. The vector between the
two new points is the parallax. See [ TV:8.2.4]:
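
To make the geometric definition of parallax above concrete, the following sketch computes the angle between the two lines joining a scene point to two viewpoints. The function name parallax_angle and the example coordinates are assumptions chosen for illustration; the only point being made is that, for a fixed pair of viewpoints, nearer points subtend a larger parallax angle than distant ones.

```python
import numpy as np

def parallax_angle(point, view1, view2):
    """Angle in radians between the lines joining `point` to two viewpoints."""
    d1 = np.asarray(view1, dtype=float) - np.asarray(point, dtype=float)
    d2 = np.asarray(view2, dtype=float) - np.asarray(point, dtype=float)
    cos_a = np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# Two viewpoints one unit apart, looking at a near and a far point.
left_view, right_view = [-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]
near = parallax_angle([0.0, 0.0, 2.0], left_view, right_view)
far = parallax_angle([0.0, 0.0, 20.0], left_view, right_view)
```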
[Figure: parallax illustration; two labels reading POSITION survive]

parallel processing: An algorithm is executed in parallel, or through parallel
processing, when it can be divided into a number of computations that are
performed simultaneously on separate hardware. See also single instruction
multiple data, multiple instruction multiple data, pipeline parallelism, task
parallelism. [ BB:10.4.1]

parallel projection: A generalization of orthographic projection in which a
scene is projected onto the image plane by a set of parallel rays not
necessarily perpendicular to the image plane. This is a good approximation of
perspective projection, up to a uniform scale factor, when the scene is small
in comparison to its distance from the center of projection. Parallel
projection is a subset of weak perspective viewing, where the weak perspective
projection matrix is subject not only to orthogonality of the rows of the left
$2 \times 3$ submatrix, but also to the constraint that the rows have equal
norm. In orthographic projection, both rows have unit norm. [ FP:2.3.1]

parameter estimation: A class of techniques aimed to estimate the parameters of
a given parametric model. For instance, assuming that a set of image points lie
on an ellipse, and considering the implicit ellipse model
$ax^2 + bxy + cy^2 + dx + ey + f = 0$, the parameter vector
$[a, b, c, d, e, f]$ can be estimated, for instance, by least square surface
fitting. [ DH:3.1]

parametric edge detector: An edge detection technique that seeks to match image
data using a parametric model of edge points and thus detects edges when the
image data fits the edge model well. See Hueckel edge detector. [ VSN:3.1.3]

parametric mesh: A type of surface modeling primitive for 3D models in which
the surface is defined by a mesh of points. A typical example is NURBS
(non-uniform rational b-splines).

parametric model: A mathematical model expressed as function of a set of
parameters, for instance, the parametric equation of a curve or surface (as
opposed to its implicit form), or a parametric edge model (see parametric edge
detector). [ VSN:3.1.3]

paraperspective: An approximation of perspective projection, whereby a scene is
divided into parts that are imaged separately by parallel projection with
different parameters. [ FP:2.3.1-2.3.3]

part recognition: A class of techniques for recognizing assemblies or
articulated objects from their subcomponents (parts), e.g., a human body from
head, trunk, and limbs. Parts have been represented by 3D models like
generalized cones, superquadrics, and others. In industrial contexts, part
recognition indicates the recognition of specific items (parts) in a production
line, typically for classification and quality control.

part segmentation: A class of techniques for partitioning a set of data into
components (parts) with an identity of their own, for instance a
human body into limbs, head, and trunk. Part segmentation methods exist for
both 2D and 3D data, that is, intensity images and range images, respectively.
Various geometric models have been adopted for the parts, e.g., generalized
cylinders, superellipses, and superquadrics. See also articulated object
segmentation. [ BM:6.2.2]

partially constrained pose: A situation whereby an object is subject to a
number of constraints restricting the number of admissible orientations or
positions, but not fixing one univocally. For instance, cars on a road are
constrained to rotate around an axis perpendicular to the road.

particle counting: An application of particle segmentation to counting the
instances of small objects (particles) like pebbles, cells, or water droplets,
in images or sequences, such as in this image: [ WP:Particle counter]

particle filter: A tracking strategy where the probability density of the model
parameters is represented as a set of particles. A particle is a single sample
of the model parameters, with an associated weight. The probability density
represented by the particles is typically a set of delta functions or a set of
Gaussians with means at the particle centers. At each tracking iteration, the
current set of particles represents a prior on the model parameters, which is
updated via a dynamical model and observation model to produce the new set
representing the posterior distribution. See also condensation tracking.
[ WP:Particle filter]

particle segmentation: A class of techniques for detecting individual instances
of small objects (particles) like pebbles, cells, or water droplets, in images
or sequences. A typical problem is severe occlusion caused by overlapping
particles. This problem has been approached successfully with the watershed
transform.

particle tracking: See condensation tracking.

Parzen: A Parzen window is a linearly increasing and decreasing weighting
window (triangle-shaped) used to limit leakage to spurious frequencies when
computing the power spectrum of a signal: [figure not reproduced] See also
windowing, Fourier transform. [ DH:4.3]

passive sensing: A sensing process that does not emit any stimulus or where the
sensor does not move is passive. A normal stationary camera is passive.
Structured light triangulation or a moving video camera are active. [ VSN:1.1]

passive stereo: A passive stereo algorithm uses only the information obtainable
using a stationary set of cameras and ambient illumination. This contrasts with
the active vision
paradigm in stereo, where the camera(s) might move or some projected stimulus
might be used to help solve the stereo correspondence problem. [ BM:1.9.2]

patch classification: The problem of attributing a surface patch to a
particular class in a shape catalogue, typically computed from dense range data
using curvature estimates or shading. See also curvature sign patch
classification, mean and Gaussian curvature shape classification.

path coherence: A property used in tracking objects in an image sequence. The
assumption is that the object motion is mostly smooth in the scene and thus the
observed motion in a projected image of the scene is also smooth. [ JKS:14.6]

path finding: The problem of determining a path with given properties in a
graph, for example, the shortest path connecting two given nodes, or two nodes
with given properties. A path is defined as a linear subgraph. Path finding is
a characteristic problem of state-space methods, inherited from symbolic
artificial intelligence. See also graph searching. This term is also used in
the context of dynamic programming search, for instance applied to the stereo
correspondence problem. [ WP:Pathfinding]

pattern grammar: See shape grammar. [ WP:Pattern grammar]

pattern recognition: A large research area concerned with the recognition and
classification of structures, relations or patterns in data. Classic techniques
include syntactic, structural and statistical pattern recognition. [ RJS:6]

PCA: See principal component analysis. [ FP:22.3.1]

PDM: See point distribution model. [ WP:Point distribution model]

peak: A general term for when a signal value is greater than the neighboring
signal values. An example of a signal peak measured in one dimension is when
crossing a bright line lying on a dark surface along a scanline. A
cross-section along a scanline of an image of a light line on a dark background
might observe the pixel values 7, 45, 105, 54, 7. The peak would be at 105. A
two dimensional example is when observing the image of a bright spot on a
darker background. [ SOS:3.4.5]

pedestrian surveillance: See person surveillance.

pel: See pixel. [ SB:3]

pencil of lines: A bundle of lines passing through the same point. For example,
if $\vec{p}$ is a generic bundle point and $\vec{p}_0$ the point through which
all lines pass, the bundle is

$$\vec{p} = \vec{p}_0 + \lambda \vec{v}$$

where $\lambda$ is a real number and $\vec{v}$ the direction of the individual
line (both are parameters). An example is [ FP:13.1.4]: [figure not reproduced]

percentile method: A specialized thresholding technique used for
selecting the threshold. The method assumes that the percentage of the scene
that belongs to the desired object (e.g., a darker object against a lighter
background) is known. The threshold that selects that percentage of pixels is
used. [ JKS:3.2.1]

perception: The process of understanding the world through the analysis of
sensory input (such as images). [ DH:1.1]

performance characterization: A class of techniques aimed to assess the
performance of computer vision systems in terms of, for instance, accuracy,
precision, robustness to noise, repeatability, and reliability. [ TV:A.1]
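
As a rough sketch of the percentile method defined above: assuming the object is darker than the background and is known to cover a given fraction of the image, the threshold can be read off the intensity distribution as the corresponding quantile. The function name percentile_threshold, the object_is_dark switch, and the sample image are assumptions made for this illustration, not part of the cited method.

```python
import numpy as np

def percentile_threshold(image, object_fraction, object_is_dark=True):
    """Threshold that selects roughly `object_fraction` of the pixels."""
    pixels = np.asarray(image, dtype=float).ravel()
    if object_is_dark:
        # Dark object: pixels at or below the threshold are selected.
        return float(np.quantile(pixels, object_fraction))
    # Bright object: pixels at or above the threshold are selected.
    return float(np.quantile(pixels, 1.0 - object_fraction))

# A 4x4 image in which 4 of the 16 pixels (25%) belong to a dark object.
image = np.array([[10, 12, 200, 210],
                  [11, 205, 198, 202],
                  [13, 201, 207, 199],
                  [204, 203, 206, 208]])
t = percentile_threshold(image, 0.25)
mask = image <= t   # True for pixels assigned to the dark object
```

The quantile here uses NumPy's default linear interpolation, so the threshold falls between the darkest background value and the brightest object value rather than on an actual pixel value.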
perceptron: A computational element $\phi(\vec{w} \cdot \vec{x})$ that acts on
a data vector $\vec{x}$, where $\vec{w}$ is a vector of weights and $\phi()$ is
the activation function. Perceptrons are often used for classifying data into
one of two sets (i.e., if $\phi(\vec{w} \cdot \vec{x}) \geq 0$ or
$\phi(\vec{w} \cdot \vec{x}) < 0$). See also classification, supervised
classification, pattern recognition. [ RN:2.4]

perceptron network: A multi-layer arrangement of perceptrons, closely related
to the well-known back-propagation networks. [ RN:2.4]

perceptual grouping: See perceptual organization. [ FP:14.2]

perceptual organization: A theory based on Gestalt psychology, centered on the
tenet that certain organizations (or interpretations) of visual stimuli are
preferred over others by the human visual system. A famous example is that a
drawing of a wire-frame cube is immediately interpreted as a 3D object, instead
of a 2D collection of lines. This concept has been used in several low-level
vision systems, typically to find groups of low-level features most probably
generated by interesting objects. See also grouping and Lowe's curve
segmentation. A more complex example is below, where the line of feature
endings suggests a virtual horizontal line. [ FP:14.2] [figure not reproduced]

perimeter: 1) The perimeter of a binary image is the set of foreground pixels
that touch the background. 2) The length of the path through those pixels.
[ JKS:2.5.6]

periodicity estimation: The problem of estimating the period of a periodic
phenomenon, e.g., given a texture created by the repetition of a fixed pattern,
determine the pattern's size.

person surveillance: A class of techniques aimed at detecting, tracking,
counting, and recognizing people or their behavior in CCTV videos, for security
purposes. For example, systems have been reported for the automated
surveillance of car parks, banks, airports and the like. A typical system must
detect the presence of a person, track the person's movement over time,
possibly identify the person using a database of known faces, and classify the
person's behavior according to a small class of pre-defined behaviors (e.g.,
normal or anomalous). See also anomalous behavior detection, face recognition,
and face tracking.

perspective: The rendering of a 3D scene as a 2D image according to perspective
projection, the key characteristic of which is, intuitively, that the size of
the imaged objects depends on their distance from the
viewer. As a consequence, the image of a bundle of parallel lines is a bundle
of lines converging into a point, the vanishing point. The geometry of
perspective was formalized by the master painters of the Italian Quattrocento
and Renaissance. [ FP:2.2]

perspective camera: A camera in which the image is formed according to
perspective projection. The corresponding mathematical model is commonly known
as the pinhole camera model. An example of the projection in the perspective
camera is [ FP:2.2]:

[Figure: perspective camera projection; surviving labels: LENS, PROJECTION,
AXIS, PLANE, SCENE]

perspective distortion: A type of distortion in which lines that are parallel
in the real world appear to converge in a perspective image. In the example
notice how the train tracks appear to converge in the distance. [ SB:2.3.1]
[figure not reproduced]

perspective inversion: The problem of determining the position of a 3D object
from its image. I.e., solving the perspective projection equations for the 3D
coordinates. See also absolute orientation. [ FP:2.2]

perspective projection: Imaging a scene with foreshortening. The projection
equation of perspective is

$$x = f \frac{X}{Z}, \qquad y = f \frac{Y}{Z},$$

where $x, y$ are the image coordinates of an image point in the camera
reference frame (e.g., in millimeters, not pixels), $f$ is the focal length and
$X, Y, Z$ are the coordinates of the corresponding scene point. [ FP:1.1.1]

PET: See positron emission tomography. [ AJ:10.1]

phase congruency: The property whereby components of the Fourier transform of
an image are maximally in phase at feature points like step edges or lines.
Phase congruency is invariant to image brightness and contrast and has been
therefore used as an absolute measure of the significance of feature points.
See also image feature. [ WP:Phase congruency]

phase correlation: A motion estimation method that uses the translation-phase
duality property of the Fourier transform, that is, a shift in the spatial
domain is equivalent to a phase shift in the frequency domain. When using
log-polar coordinates, and the rotation and scale properties of the Fourier
transform, spatial rotation and scale can be estimated from the frequency
shift, independent of spatial translation. See also planar motion estimation.
[ WP:Phase correlation]

phase matching stereo algorithm: An algorithm for solving the stereo
correspondence problem by looking for similarity of the phase of the Fourier
transform.

phase-retrieval problem: The problem of reconstructing a signal
based on only the magnitude (not the phase) of the Fourier transform.
[ WP:Phase retrieval]

phase spectrum: The Fourier transform of an image can be decomposed into its
phase spectrum and its power spectrum. The phase spectrum is the relative phase
offset of the given spatial frequency. [ EH:11.2.1]

phase unwrapping technique: The process of reconstructing the true phase shift
from phase estimates wrapped into $[-\pi, \pi]$. The true phase shift values
may not fall in this interval but instead be mapped into the interval by
addition or subtraction of multiples of $2\pi$. The technique maximizes the
smoothness of the phase image by adding or subtracting multiples of $2\pi$ at
various image locations. See also Fourier transform.
[ WP:Range imaging#Interferometry]

phi-s curve ($\phi$-$s$): A technique for representing planar contours. Each
point $P$ in the contour is represented by the angle $\phi$ formed by the line
through $P$ and the shape's center (e.g., the barycentrum or center of mass)
with a fixed direction, and the distance $s$ from the center to $P$: [figure
not reproduced] See also shape representation. [ BB:8.2.3]

photo consistency: See shape from photo consistency. [ WP:Photo-consistency]

photodiode: The basic element, or pixel, of a CCD or other solid state sensor,
converting light to an electric signal. [ WP:Photodiode]

photogrammetry: A research area concerned with obtaining reliable and accurate
measurements from noncontact imaging, e.g., a digital height map from a pair of
overlapping satellite images. Consequently, accurate camera calibration is a
primary concern. The techniques used overlap with many of those typical of
image processing and pattern recognition. [ FP:3.4]

photometric invariant: A feature or characteristic of an image that is
insensitive to changes in illumination. See also invariant.

photometric decalibration: The correction of intensities in an image so that
the same surface (at the same orientation) will give the same response
regardless of the position in which it appears in the image.

photometric stereo: A technique recovering surface shape (more precisely, the
surface normal at each surface point) using multiple images acquired from a
single viewpoint but under different illumination conditions. These lead to
different reflectance maps, that together constrain the surface normal at each
point. [ FP:5.4]

photometry: A branch of optics concerned with the measurement of the amount or
the spectrum of light. In computer vision, one frequently uses photometric
models expressing the amount of light emerging from a surface, be it
fictitious, or the surface of a radiating source, or from an illuminated
object. A well-known photometric model is Lambert's law.
[ WP:Photometry (optics)]
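
Lambert's law, mentioned under photometry, is also a common starting point for photometric stereo. The sketch below assumes an idealized Lambertian pixel, with intensity equal to albedo times the dot product of the unit surface normal and the unit light direction, no shadows or highlights, and at least three known, non-coplanar light directions. Under those assumptions the scaled normal follows from a linear least-squares solve. The function name and the synthetic data are illustrative only.

```python
import numpy as np

def photometric_stereo_normal(intensities, light_dirs):
    """Recover albedo and unit normal at one pixel from >= 3 observations.

    Assumes the Lambertian model I_k = albedo * dot(n, l_k).
    """
    L = np.asarray(light_dirs, dtype=float)     # shape (K, 3), unit rows
    I = np.asarray(intensities, dtype=float)    # shape (K,)
    g, *_ = np.linalg.lstsq(L, I, rcond=None)   # g = albedo * n
    albedo = float(np.linalg.norm(g))
    return albedo, g / albedo

# Synthetic check: render one pixel under three lights, then invert.
true_normal = np.array([0.0, 0.6, 0.8])
true_albedo = 0.5
lights = np.array([[0.0, 0.0, 1.0],
                   [0.6, 0.0, 0.8],
                   [0.0, 0.6, 0.8]])
observed = true_albedo * lights @ true_normal
albedo, normal = photometric_stereo_normal(observed, lights)
```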
photon noise: Noise generated by the statistical fluctuations associated with
photon counting over a finite time interval in the CCD or other solid state
sensor of a digital camera. Photon noise is not independent of the signal, and
is not additive. See also image noise, digital camera. [ WP:Image noise]

photopic response: The sensitivity-wavelength curve modeling the response of
the human eye to normal lighting conditions. In such conditions, the cones are
the photoreceptors on the retina that best respond to light. Their response
curve peaks at 555 nm, indicating that the eye is maximally sensitive to
green-yellow colors in normal lighting conditions. When light intensity is very
low, the rods determine the eye's response, modeled by the scotopic curve,
which peaks near to 510 nm. [ AJ:3.2]

photosensor spectral response: The spectral response of a photosensor,
characterizing the sensor's output as a function of the input light's spectral
frequency. See also Fourier transform, frequency spectrum, spectral frequency.
[ WP:Frequency spectrum]

physics based vision: An area of computer vision seeking to apply physics laws
or methods (of optics, surfaces, illumination, etc.) to the analysis of images
and videos. Examples include polarization based methods, in which physical
properties of the scene surfaces are estimated via estimates of the state of
polarization of the incoming light, and the use of detailed radiometric models
of image formation.

picture element: A pixel. It is an indivisible image measurement. This is the
smallest directly measured image feature. [ SB:3]

picture tree: A recursive image and 2D shape representation in which a tree
data structure is used. Each node in the tree represents a region that is then
decomposed into subregions. These are represented by child nodes. The figure
below shows a segmented image with four regions (left) and the corresponding
picture tree. [ JKS:3.3.4]

[Figure: segmented image with four regions and its picture tree; surviving
labels: *, A, B, C]

piecewise rigidity: The property of an object or scene that some of its parts,
but not the object or scene as a whole, are rigid. Piecewise rigidity can be a
convenient assumption, e.g., in motion analysis.

pincushion distortion: A form of radial lens distortion where image points are
displaced away from the center of distortion by an amount that increases with
the distance to the center. A straight line that would have been parallel to an
image side is bowed towards the center of the image. This is the opposite of
barrel distortion. [ EH:6.3.1]
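
The behavior described for pincushion distortion can be reproduced with a simple one-coefficient radial model, in which each point at radius r from the center of distortion is scaled by (1 + k1 r^2). This particular model, the coefficient name k1, and the sign convention (k1 > 0 giving pincushion, k1 < 0 giving barrel) are assumptions of this sketch; real lens models usually carry more terms.

```python
import numpy as np

def radial_distort(points, k1, center=(0.0, 0.0)):
    """Apply x_d = c + (x - c) * (1 + k1 * r^2) to an (N, 2) array of points."""
    c = np.asarray(center, dtype=float)
    pts = np.asarray(points, dtype=float) - c
    r2 = np.sum(pts ** 2, axis=1, keepdims=True)  # squared radius per point
    return pts * (1.0 + k1 * r2) + c

# A straight line parallel to the top image side, in normalized coordinates.
xs = np.linspace(-1.0, 1.0, 5)
line = np.stack([xs, np.full_like(xs, 0.8)], axis=1)
bent = radial_distort(line, k1=0.1)   # k1 > 0: pincushion in this model
```

In the distorted line the end points are pushed further out than the midpoint, so the line bows towards the image center, as the entry describes.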
pinhole camera model: The mathematical model for an ideal perspective camera
formed by an image plane and a point aperture, through which all incoming rays
must pass. For equations, see perspective projection. This is a good model for
a simple convex lens camera, where all rays pass through the virtual pinhole at
the focal point. [ FP:1.1]

[Figure: pinhole camera; surviving labels: AXIS, PLANE, SCENE]

pink noise: Noise that is not white, i.e., when there is a correlation between
the noise at two pixels or at two times. [ WP:Pink noise]

pipeline parallelism: Parallelism achieved with two or more, possibly
dissimilar, computation devices. The non-parallel process comprises steps $A$
and $B$, and will operate on a sequence of items $x_i$, $i > 0$, producing
outputs $y_i$. The result of $B$ depends on the result of $A$, so a sequential
computer will compute $a_i = A(x_i)$; $y_i = B(a_i)$; for each $i$. A parallel
computer cannot compute $a_i$ and $y_i$ simultaneously as they are dependent,
so the computation requires the following steps

$$\begin{array}{l}
a_1 = A(x_1) \\
y_1 = B(a_1) \\
a_2 = A(x_2) \\
y_2 = B(a_2) \\
a_3 = A(x_3) \\
\ldots \\
a_i = A(x_i) \\
y_i = B(a_i) \\
\ldots
\end{array}$$

However, notice that we compute $y_i$ just after $y_{i-1}$, so the computation
can be arranged as

$$\begin{array}{ll}
a_1 = A(x_1) & \\
a_2 = A(x_2) & y_1 = B(a_1) \\
a_3 = A(x_3) & y_2 = B(a_2) \\
\ldots & \\
a_{i+1} = A(x_{i+1}) & y_i = B(a_i) \\
\ldots &
\end{array}$$

where steps on the same line may be computed concurrently as they are
independent. The output values $y_i$ therefore arrive at a rate of one every
cycle rather than one every two cycles without pipelining. The pipeline process
can be visualized as:

[Figure: pipeline diagram; surviving labels: $x_{i+1}$, $a_i$, $y_{i-1}$]

pit: 1) A general term for when a signal value is lower than the neighboring
signal values. Unlike signal peaks, pits usually refer to two dimensional
images. For example, a pit occurs when observing the image of a dark spot on a
lighter background. 2) A local point-like concave shape defect in a surface.

pitch: A 3D rotation representation (along with yaw and roll) often used for
cameras or moving observers. The pitch component specifies a rotation
about a horizontal axis to give an up-down change in orientation. This figure
shows the pitch rotation direction [ JKS:12.2.1]: [figure not reproduced]

pixel: The intensity values of a digital image are specified at the locations
of a discrete rectangular grid; each location is a pixel. A pixel is
characterized by its coordinates (position in the image) and intensity value
(see intensity and intensity image). Values can express physical quantities
other than intensity for different kinds of images, as in, e.g., infrared
imaging. In physical terms, a pixel is the photosensitive cell on the CCD or
other solid state sensor of a digital camera. The CCD pixel has a precise size,
specified by the manufacturer and determining the CCD's aspect ratio. See also
intensity sensor and photosensor spectral response. [ SB:3]

pixel addition operator: A low-level image processing operator taking as input
two gray scale images, $I_1$ and $I_2$, and returning an image $I_3$ in which
the value of each pixel is $I_3 = I_1 + I_2$. This figure shows at the right
the sum of the two images at the left (the sum divided by 2 to rescale to the
original intensity level) [ SB:3.2.1]: [figure not reproduced]

pixel classification: The problem of assigning the pixels of an image to
certain classes. See also image segmentation, supervised classification, and
clustering. This image shows the pixels of the left image classified into four
classes denoted by the four different shades of gray [ VSN:3.3.1]: [figure not
reproduced]

pixel connectivity: The pattern specifying which pixels are considered
neighbors of a given one (X) for the purposes of computation. Common
connectivity schemes are 4 connectedness and 8 connectedness, as seen in the
left and right images here [ SB:4.2]: [figure not reproduced]

pixel coordinates: The coordinates of a pixel in an image. Normally these are
the row and column position. [ JKS:12.1]

pixel coordinate transformation: The mathematical transformation linking two
image reference frames, specifying how the coordinates of a pixel in one
reference frame are obtained from the coordinate of that pixel in the other
reference frame. One linear transformation can be specified by

$$i_1 = a i_2 + b j_2 + e, \qquad j_1 = c i_2 + d j_2 + f$$

where the coordinates of $\vec{p}_2 = (i_2, j_2)$ are transformed into
$\vec{p}_1 = (i_1, j_1)$. In matrix form, $\vec{p}_1 = A\vec{p}_2 + \vec{t}$,
with
$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ a rotation matrix and
$\vec{t} = \begin{bmatrix} e \\ f \end{bmatrix}$ a translation vector. See also
Euclidean, affine and homography transforms.

pixel counting: A simple algorithm to determine the area of an image region by
counting the number of pixels composing the region. See also region.
[ WP:Simulation cockpit#Aircraft Simpits]

pixel division operator: An operator taking as input two gray scale images,
$I_1$ and $I_2$, and returning an image $I_3$ in which the value of each pixel
is $I_3 = I_1 / I_2$.

pixel exponential operator: A low-level image processing operator taking as
input one gray scale image, $I_1$, and returning an image $I_2$ in which the
value of each pixel is $I_2 = c\,b^{I_1}$. This operator is used to change the
dynamic range of an image. The value of the base $b$ depends on the desired
degree of compression of the dynamic range. $c$ is a scaling factor. See also
logarithmic transformation, pixel logarithm operator. The right image is 1.005
raised to the pixel values of the left image: [figure not reproduced]

pixel gray scale resolution: The number of different gray levels that can be
represented in a pixel, depending on the number of bits associated with each
pixel. For instance, an 8-bit pixel (or image) can represent $2^8 = 256$
different intensity values. See also intensity, intensity image, and intensity
sensor.

pixel interpolation: See image interpolation. [ WP:Pixelation]

pixel jitter: A frame grabber must estimate the pixel sampling clock of a
digital camera, i.e., the clock used to read out the pixel values, which is not
included in the output signal of the camera. Pixel jitter is a form of image
noise generated by time variations in the frame grabber's estimate of the
camera's clock.

pixel logarithm operator: An image processing operator taking as input one gray
scale image, $I_1$, and returning an image $I_2$ in which the value of each
pixel is $I_2 = c \log_b(|I_1 + 1|)$. This operator is used to change the
dynamic range of an image (see also contrast enhancement), such as for the
enhancement of the magnitude of the Fourier transform. The base $b$ of the
logarithm function is often $e$, but it does not actually matter because the
relationship between logarithms of any two bases is only one of scaling. See
also pixel exponential operator. The right image is the scaled logarithm of the
pixel values of the left image [ SB:3.3.1]: [figure not reproduced]

pixel multiplication operator: An image processing operator taking as input two
gray scale images, $I_1$ and $I_2$, and returning an image $I_3$ in which the
value of each pixel is $I_3 = I_1 \cdot I_2$. The right image is the product of
the left
and middle images (scaled by 255 for contrast here) [ SB:]: [figure not
reproduced]

planar patch extraction: The problem of finding planar regions, or patches,
most commonly in range images. Plane extraction can be useful, for instance, in
3D pose estimation, as several model-based matching techniques yield higher
accuracy with planar than non-planar surfaces.

planar patches: See surface triangulation.

planar projective transformation: See homography. [ HZ:1.3]

planar rectification: A class of rectification algorithms projecting the
original images onto a plane parallel to the baseline of the cameras. See also
stereo and stereo vision.

planar scene: 1) When the depth of a scene is small with respect to its
distance from the camera, the scene can be considered planar, and useful
approximations can be adopted; for instance, the transformation between two
views taken by a perspective camera is a homography. See also planar mosaic.
2) When all of the surfaces in a scene are planar, e.g., a blocksworld scene.

pixel subsampling: The process of producing a smaller image from a given one by
including only one pixel out of every $N$. Subsampling is rarely applied this
literally, however, as severe aliasing is introduced; scale space filtering is
applied instead.

pixel subtraction operator: A low-level image processing operator taking as
input two gray scale images, $I_1$ and $I_2$, and returning an image $I_3$ in
which the value of each pixel is $I_3 = I_1 - I_2$. This operator implements
the simplest possible change detection algorithm. The right image (with 128
added) is the middle image subtracted from the left image [ SB:3.2.1]: [figure
not reproduced]
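
The pixel subtraction operator and its use for change detection can be sketched as follows. The offset of 128 mirrors the figure description in that entry; the function names, the threshold of 20 gray levels, and the tiny test images are assumptions made for this illustration. Signed 16-bit arithmetic is used so that negative differences do not wrap around in 8-bit images.

```python
import numpy as np

def subtract_images(i1, i2, offset=128):
    """Return I1 - I2 + offset as an 8-bit image suitable for display."""
    diff = i1.astype(np.int16) - i2.astype(np.int16) + offset
    return np.clip(diff, 0, 255).astype(np.uint8)

def changed_pixels(i1, i2, threshold=20):
    """Simplest change detector: mark pixels whose difference is large."""
    diff = i1.astype(np.int16) - i2.astype(np.int16)
    return np.abs(diff) > threshold

# Two 4x4 frames that differ only in a 2x2 block.
left = np.full((4, 4), 100, dtype=np.uint8)
middle = left.copy()
middle[1:3, 1:3] = 160
display = subtract_images(left, middle)
mask = changed_pixels(left, middle)
```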

planar facet model: See surface mesh. [ JKS:13.5]

planar mosaic: A panoramic image mosaic of a planar scene. If the scene is
planar, the transformation linking different views is a homography.

planar motion estimation: A class of techniques aiming to estimate the motion
parameters of bodies moving on a plane in space. See also motion estimation.
[ HZ:18.8]

plane: The locus of all points $\vec{x}$ such that the surface normal $\vec{n}$
of the plane and a point in the plane $\vec{p}$ satisfy the relation
$(\vec{x} - \vec{p}) \cdot \vec{n} = 0$. In 3D space, for instance, a plane is
defined by two vectors and a point lying on the plane, so that the plane's
parametric equation is

$$\vec{p} = a\vec{u} + b\vec{v} + \vec{p}_0,$$

where $\vec{p}$ is the generic plane point, $\vec{u}, \vec{v}, \vec{p}_0$ are
the two vectors and the point defining the plane, respectively. The implicit
equation of a plane is $ax + by + cz + d = 0$, where $[x, y, z]$ are the
coordinates of the generic plane point. In vector form,
$\vec{p} \cdot \vec{n} = -d$, where
$\vec{p} = [x, y, z]$, $\vec{n} = [a, b, c]$ is a vector perpendicular to the
plane, and $|d|$ is the distance of the plane from the origin when $\vec{n}$
has unit length. All of these definitions are equivalent. [ JKS:13.3.1]

plane conic: Any of the curves defined by the intersection of a plane with a 3D
double cone, namely ellipse, hyperbola and parabola. Two intersecting lines and
a single point represent degenerate conics, defined by special configurations
of the cone and plane. The implicit equation of a conic is
$ax^2 + bxy + cy^2 + dx + ey + f = 0$. See also conic fitting. This figure
shows an ellipse formed by intersection [ JKS:6.6]: [figure not reproduced]

plane projective transfer: An algorithm based on projective invariants that,
given two images of a planar object, $I_1$ and $I_2$, and four feature
correspondences, determines the position of any other point of $I_1$ in $I_2$.
Interestingly, no knowledge of the scene or of the imaging system's parameters
is necessary.

plane projective transformation: The linear transformation between the
coordinates of two projective planes, also known as homography. See also
projective geometry, projective plane, and projective transformation.
[ FP:18.4.1]

plenoptic function representation: A parameterized function for describing
everything that is visible from a given point in space, a fundamental
representation in image based rendering. [ FP:26.3]

Plessey corner finder: A well-known corner detector also known as Harris corner
detector, based on the local autocorrelation of first-order image derivatives.
See also feature extraction.
[ WP:Corner detection#The Harris & Stephens / Plessey / Shi-Tomasi corner detection algorithm]

Plücker line coordinates: A representation of lines in projective 3D space. A
line is represented by six numbers
$(l_{12}, l_{13}, l_{14}, l_{23}, l_{24}, l_{34})$ that must satisfy the
constraint that $l_{12} l_{34} - l_{13} l_{24} + l_{14} l_{23} = 0$. The
numbers are the entries of the Plücker matrix, $L$, for the line. For any two
points $A, B$ on the line, $L$ is given by $l_{ij} = A_i B_j - B_i A_j$. The
pencil of planes containing the line are the nullspace of $L$. The six numbers
may also be seen as a pair of 3-vectors, one a point $\vec{a}$ on the line, one
the direction $\vec{n}$ with $\vec{a} \cdot \vec{n} = 0$. [ OF:2.5.1]

PMS: See Pantone matching system. [ WP:Pantone]

point: A primitive concept of Euclidean geometry, representing an infinitely
small entity. In computer vision, pixels are regarded as image points, and one
speaks of points in the scene as positions in the 3D space observed by the
cameras. [ WP:Point (geometry)]

point distribution model (PDM): A shape representation for flexible 2D
contours. It is a type of deformable template model and its
parameters can be learned by supervised learning. It is suitable for 2D
shapes that undergo general but correlated deformations or variations,
such as component motion or shape variation. For instance, fronto-parallel
images of leaves, fish or human hands, resistors on a board, people
walking in surveillance videos, and the like. The shape variations of the
contour in a series of examples are captured by
principal component analysis. [WP:Point distribution model]

point feature: An image feature that occupies a very small portion of an
image, ideally one pixel, and is therefore local in nature. Examples are
corners (see corner detection) or edge pixels. Notice that, although
point features occupy only one pixel, they require a neighborhood to be
defined; for instance, an edge pixel is characterized by a sharp
variation of image values in a small neighborhood of the pixel.

point invariant: A property that 1) can be measured at a point in an
image and 2) is invariant to some transformation. For instance, the ratio
of a pixel's observed intensity to that of its brightest neighbor is
invariant to changes in illumination. Another example: the magnitude of
the gradient of intensity at a point is invariant to translation and
rotation. (Both of these examples assume ideal images and observation.)

point light source: A point-like light source, typically radiating
energy radially, whose intensity decreases as $r^{-2}$, where $r$ is the
distance to the source. [FP:5.2.2]

point matching: A class of algorithms solving the matching or
correspondence problem for point features.

point of extreme curvature: A point where the curvature achieves an
extremum, that is, a maximum or a minimum. This figure shows one of each
type circled: [WP:Vertex (geometry)]

[Figure: labels MINIMA, MAXIMA]

point sampling: Selection of discrete points of data from a continuous
signal. For example a digital camera samples a continuous image function
into a digital image. [WP:Sampling (signal processing)]

point similarity measure: A function measuring the similarity of image
points (actually small neighborhoods to include sufficient information to
characterize the image location), for instance cross correlation, SAD
(sum of absolute differences), or SSD (sum of squared differences).

point source: A point light source. An ideal illumination source in
which all light comes from a single spatial point. The alternative is an
extended light source. The assumption of being a point source allows
easier interpretation of shading and shadows, etc. [FP:5.2.2]

point spread function: The response of a 2D system or filter to an input
Dirac impulse. The response is typically spread over a region surrounding
the point of application of the impulse, hence the name. Analogous to the
impulse response of a 1D system. See also filter, linear filter.
[FP:7.2.2]

polar coordinates: A system of coordinates specifying the position of a
point P in terms of the direction of the line through P and the origin,
and the distance from P to the origin along that line. For example, the
transformation between polar $(r, \theta)$ and Cartesian coordinates
$(x, y)$ in the plane is given by $x = r\cos\theta$ and
$y = r\sin\theta$, or $r = \sqrt{x^2 + y^2}$ and
$\theta = \operatorname{atan}(y/x)$. [BB:A1.1.2]

polar rectification: A rectification algorithm designed to cope with any
camera geometry in the context of uncalibrated vision, re-parameterizing
the images in polar coordinates around the epipoles.

polarization: The characterizing property of polarized light. [EH:8]

polarized light: Unpolarized light results from the nondeterministic
superposition of the x and y components of the electric field. Otherwise,
the light is said to be polarized, and the tip of the electric field
evolves on an ellipse (elliptically polarized light). Light is often
partially polarized, that is, it can be regarded as the sum of completely
polarized and completely unpolarized light. In computer vision,
polarization analysis is an area of physics based vision, and has been
used for metal-dielectric discrimination, surface reconstruction, fish
classification, defect detection, and in structured light triangulation.
[EH:8]

polarizer: A device changing the state of polarization of light to a
specific polarized state, for example, producing linearly polarized light
in a given plane. [EH:8.2]

polycurve: A simple curve C that is smooth everywhere but at a finite
set of points, and such that, given any point P on C, the tangent to C
converges to a limit approaching P from each direction. Computer vision
shape models often describe boundary shapes using polycurve models
consisting of a sequence of curved or straight segments, such as in this
example using four circular arcs. See also polyline.

polygon: A closed, piecewise linear, 2D contour. Squares, rectangles and
pentagons are examples of polygons. In a regular polygon, all sides have
equal length and all angles formed by contiguous sides are equal. This
does not hold for a general polygon. [WP:Polygon]

polygon matching: A class of techniques for matching polygonal shapes.
See polygon.

polygonal approximation: A polyline approximating a curve. This circular
arc is (badly) approximated by the polyline [BB:8.2]:

polyhedron: A 3D object with planar faces, a "3D polygon". A subset of
$R^3$ whose boundary is a subset of finitely many planes. The basic
primitive of many 3D modeling schemes, as many hardware accelerators
process polygons
particularly quickly. A tetrahedron is the simplest polyhedron
[DH:12.4]:

polyline: A piecewise linear contour. If closed, it becomes a polygon.
See also polycurve, contour analysis and contour representation.
[JKS:6.4]

pose: The location and orientation of an object in a given reference
frame, especially a world or camera reference frame. A classic problem of
computer vision is pose estimation. [SQ:4.2.2]

pose clustering: A class of algorithms solving the pose estimation
problem using clustering techniques (see clustering/cluster analysis).
See also pose, k-means clustering. [FP:18.3]

pose consistency: An algorithm seeking to establish whether two shapes
are equivalent. Given two sets of points $G_1$ and $G_2$, for example,
the algorithm finds a sufficient number of point correspondences to
determine a transformation $T$ between the two sets, then applies $T$ to
all other points of $G_1$. If the transformed points are close to points
in $G_2$, consistency is satisfied. Also known as viewpoint consistency.
See also feature point correspondence. [FP:18.2]

pose determination: See pose estimation.
[WP:Pose (computer vision)#Pose Estimation]

pose estimation: The problem of determining the orientation and
translation of an object, especially a 3D one, from one or more images
thereof. Often the term means finding the transformation that aligns a
geometric model with the image data. Several techniques exist for this
purpose. See also alignment, model registration, orientation estimation,
and rotation representation. [WP:Pose (computer vision)#Pose Estimation]

pose representation: The problem of representing the angular position,
or pose, of an object (especially 3D) in a given reference frame. A
common representation is the rotation matrix, which can be parameterized
in different ways, e.g., Euler angles, pitch-, yaw-, roll-angles,
rotation angles around the coordinate axes, axis-angle, and quaternions.
See also orientation estimation and rotation representation.

position: Location in space (either 2D or 3D). [WP:Position (vector)]

position dependent brightness correction: A technique seeking to
counteract the brightness variation caused by a real imaging system,
typically the fact that brightness decreases as one moves away from the
optical axis in a lens system with finite aperture. This effect may be
noticeable only in the periphery of the image. See also lens.

position invariant: Any property that does not vary with position. For
instance, the length of a 3D line segment is invariant to the line's
position in 3D space, but the length of the line's projection on the
image plane is not. See also invariant.

positron emission tomography (PET): A medical imaging method that can
measure the concentration and
movement of a positron-emitting isotope in living tissue. [AJ:10.1]

postal code analysis: A set of image analysis techniques concerned with
understanding written or printed postal codes. See handwritten and
optical character recognition. [WP:Handwriting recognition]

posture analysis: A class of techniques aiming to estimate the posture
of an articulated body, for instance a human body (e.g., pointing,
sitting, standing, crouching, etc.). [WP:Motion analysis#Applications]

potential field: A mathematical function that assigns some (usually
scalar) value at every point in some space. In computer vision and
robotics, this is usually a measure of some scalar property at each point
of a 2D or 3D space or image, such as the distance from a structure. The
representation is used in path planning, such that the potential at every
point indicates, for example, the ease/difficulty of getting to some
destination. [DH:5.11]

power spectrum: In the context of computer vision, normally the amount
of energy at each spatial frequency. The term could also refer to the
amount of energy at each light frequency. Also called the power spectrum
density function or spectral density function. [AJ:11.5]

precision: 1) The repeatability of the accuracy of a vision system (in
general, of an instrument) over many measures carried out in the same
conditions. Typically measured by the standard deviation of a target
error measure. For instance, the precision of a vision system measuring
linear size would be assessed by taking thousands of measurements of a
perfectly known object and computing the standard deviation of the
measurements. See also accuracy. 2) The number of significant bits in a
floating point or double precision number that lie to the right of the
decimal point. [WP:Accuracy and precision]

predictive compression method: A class of image compression algorithms
using redundancy information, mostly correlation, to build an estimate of
a pixel value from values of neighboring pixels.
[WP:Linear predictive coding]

pre-processing: Operations on an image that, for example, suppress some
distortion(s) or enhance some feature(s). Examples include
geometric transformations, edge detection, image restoration, etc. There
is no clear distinction between image pre-processing and
image processing. [WP:Data Pre-processing]

Prewitt gradient operator: An edge detection operator based on
template matching. It applies a set of convolution masks, or kernels (see
Prewitt kernel), implementing matched filters for edges at various
(generally eight) orientations. The magnitude (or strength) of the edge
at a given pixel is the maximum of the responses to the masks.
Alternatively, some implementations use the sum of the absolute value of
the responses from the horizontal and vertical masks. [JKS:5.2.3]

Prewitt kernel: The mask used by the Prewitt gradient operator. The
horizontal and vertical masks are [JKS:5.2.3]:
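
The Prewitt kernel and Prewitt gradient operator entries above refer to a
pair of masks and to an edge strength built from their responses. As a
minimal sketch, assuming the commonly cited 3x3 Prewitt masks (sign and
naming conventions differ between texts) and the "sum of absolute
responses" variant, the computation might look as follows. The names
PREWITT_X, PREWITT_Y, correlate3x3 and prewitt_magnitude are illustrative
only.

```python
# Commonly cited 3x3 Prewitt masks (an assumption made for this sketch).
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]    # responds to intensity changes along x
PREWITT_Y = [[-1, -1, -1],
             [0, 0, 0],
             [1, 1, 1]]     # responds to intensity changes along y


def correlate3x3(image, mask, row, col):
    """Sum of mask * image over the 3x3 window centred on (row, col)."""
    return sum(mask[i][j] * image[row - 1 + i][col - 1 + j]
               for i in range(3) for j in range(3))


def prewitt_magnitude(image):
    """Edge strength |gx| + |gy| at interior pixels; border pixels stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = correlate3x3(image, PREWITT_X, r, c)
            gy = correlate3x3(image, PREWITT_Y, r, c)
            out[r][c] = abs(gx) + abs(gy)
    return out


# Hypothetical test image: dark left half, bright right half (a step edge).
step = [[0, 0, 10, 10] for _ in range(4)]
edges = prewitt_magnitude(step)
```

On this step image the interior pixels next to the intensity jump get a
non-zero strength, while a uniform image gives zero everywhere.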

primal sketch: A representation for early vision introduced by Marr,
focusing on low-level features like edges. The full primal sketch groups
the information computed in the raw primal sketch (consisting largely of
edge, bar, end and blob feature information extracted from the images),
for instance by forming subjective contours. See also
Marr-Hildreth edge detection and raw primal sketch. [RN:7.2]

primary color: A color coding scheme whereby a range of perceivable
colors can be made by a weighted combination of primary colors. For
example, color television and computer screens use red, green and blue
light-emitting chemicals to produce these three primary colors. The
ability to use only three colors to generate all others arises from the
tri-chromacy of the human eye, which has cones that respond to three
different color spectral ranges. See also additive and subtractive color.
[EH:4.4]

principal component analysis (PCA): A statistical technique useful for
reducing the dimensionality of data, at the basis of many computer vision
techniques (e.g., point distribution models and
eigenspace based recognition). In essence, the deviation of a random
vector, $\vec{x}$, from the population mean, $\vec{\mu}$, can be
expressed as the product of $A$, the matrix of eigenvectors of the
covariance matrix of the population, and a vector $\vec{y}$ of projection
weights:

    $\vec{y} = A(\vec{x} - \vec{\mu})$

so that

    $\vec{x} = A^{-1}\vec{y} + \vec{\mu}$.

Usually only a subset of the components of $\vec{y}$ is sufficient to
approximate $\vec{x}$. The elements of this subset correspond to the
largest eigenvalues of the covariance matrix. See also
Karhunen-Loève transformation. [FP:22.3.1]

principal component basis space: In principal component analysis, the
space generated by the basis formed by the eigenvectors, or
eigendirections, of the covariance matrix.
[WP:Principal component analysis]

principal component representation: See principal component analysis.
[FP:22.3.1]

principal curvature: The maximum or minimum normal curvature at a
surface point, achieved along a principal direction. The two principal
curvatures and directions together completely specify the local surface
shape. The principal curvatures in the two directions at the point X on
the cylinder of radius $r$ below are 0 (along axis) and $1/r$ (across
axis). [JKS:13.3.2]

[Figure: cylinder with point X; label PRINCIPAL DIRECTIONS]

principal curvature sign class: See mean and Gaussian curvature shape
classification.
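
The principal component analysis entry above writes the projection as
y = A(x - mu) and the reconstruction as x = A^{-1} y + mu. A minimal
sketch for 2D data follows, assuming the rows of A are the orthonormal
eigenvectors of the covariance matrix (so that A^{-1} is the transpose of
A) and using the closed-form eigen-decomposition of a symmetric 2x2
matrix. The function names pca_2d, project and reconstruct are
hypothetical.

```python
import math


def pca_2d(points):
    """Return (mean, A, eigenvalues) for a list of 2D points.

    A holds the unit eigenvectors of the covariance matrix as rows,
    ordered by decreasing eigenvalue.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the (population) covariance matrix [[a, b], [b, c]].
    a = sum((p[0] - mx) ** 2 for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    A = [[math.cos(theta), math.sin(theta)],
         [-math.sin(theta), math.cos(theta)]]
    half_trace = (a + c) / 2.0
    spread = math.hypot((a - c) / 2.0, b)
    return (mx, my), A, (half_trace + spread, half_trace - spread)


def project(x, mean, A):
    """Projection weights y = A (x - mean)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return [A[0][0] * dx + A[0][1] * dy, A[1][0] * dx + A[1][1] * dy]


def reconstruct(y, mean, A, keep=2):
    """x = A^T y + mean, using only the first `keep` components of y."""
    y = [y[k] if k < keep else 0.0 for k in range(2)]
    return [A[0][0] * y[0] + A[1][0] * y[1] + mean[0],
            A[0][1] * y[0] + A[1][1] * y[1] + mean[1]]


# Hypothetical data lying exactly on a line, so one component suffices.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, A, eigenvalues = pca_2d(points)
approx = [reconstruct(project(p, mean, A), mean, A, keep=1) for p in points]
```

Because the sample points are collinear, the second eigenvalue vanishes
and keeping a single component already reproduces the data, which is the
dimensionality reduction the entry describes.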

principal direction: The direction in which the normal curvature
achieves an extremum, that is, a principal curvature. The two principal
curvatures and directions, together, specify completely the local surface
shape. The principal directions at the point X on the cylinder below are
parallel to the axis and around the cylinder. [FP:19.1.2]

[Figure: cylinder with point X; label PRINCIPAL DIRECTIONS]

principal point: The point at which the optical axis of a
pinhole camera model intersects the image plane, as in [JKS:12.9]:

[Figure: surviving labels AXIS, PLANE, SCENE]

principal texture direction: An algorithm identifying the direction of a
texture. A directional or oriented texture in a small image patch
generates a peak in the Fourier transform. To determine the direction,
the Fourier amplitude plot is regarded as a distribution of physical
mass, and the minimum-inertia axis identified.

privileged viewpoint: A viewpoint where small motions cause image
features to appear or disappear. This contrasts with a generic viewpoint.

probabilistic causal model: A representation used in artificial
intelligence for causal models. The simplest causal model is a causal
graph, in essence an acyclic graph in which nodes represent variables and
directed arcs represent cause and effect. A probabilistic causal model is
a causal graph with the probability distribution of each variable
conditional to its causes.

probabilistic Hough transform: The probabilistic Hough transform
computes an approximation to the Hough transform by using only a
percentage of the image data. The goal is to reduce the computational
cost of the standard Hough transform. A threshold effect has been
observed so that if the percentage sampled is above the threshold level
then few false positives are detected. [WP:Randomized Hough Transform]

probabilistic model learning: A class of Bayesian learning algorithms
based on probabilistic networks, that allow you to input information at
any node (unlike neural networks), and associate uncertainty coefficients
to classification answers. See also Bayes rule, Bayesian model,
Bayesian network. [WP:Bayesian probability]

probabilistic principal component analysis: A technique defining a
probability model for principal component analysis (PCA). The model can
be extended to mixture models, trained using the
expectation maximization (EM) algorithm. The original data is modeled as
being generated by the reduced-dimensionality subset typical of PCA plus
Gaussian noise (called a latent variable model).
[WP:Nonlinear dimensionality reduction#Gaussian-process latent variable
models]

probabilistic relaxation: A method of data interpretation in which local
inconsistencies act as inhibitors and local consistencies act as
excitors. The hope is that the combination of these two influences
constrains the

probabilistic relaxation labeling: An extension of relaxation labeling
in which each entity to be labeled, for instance each image feature, is
not simply assigned to a label, but to a set of probabilities, each
giving the likelihood that the feature could be assigned a specific
label. [BM:2.9]

probability: A measure of the confidence one may have in the occurrence
of an event, on a scale from 0 (impossible) to 1 (certain), and defined
as the proportion of favorable outcomes to the total number of
possibilities. For instance, the probability of getting any given number
from a die in a single throw is $1/6$. Probability theory, an important
part of statistics, is the basis of several vision techniques.
[WP:Probability]

probability density estimation: A class of techniques for estimating the
density function or its parameters given a sample from a population. A
related problem is testing whether a particular sample has been generated
by a process characterized by a particular probability distribution. Two
common tests are the goodness-of-fit and the Kolmogorov-Smirnov tests.
The former is a parametric test best used with large samples; the latter
gives good results with smaller samples, but is a non-parametric test
and, as such, does not produce estimates of the population parameters.
See also non-parametric method. [VSN:A2.2]

procedural representation: A class of representations used in artificial
intelligence that are used to encode how to perform a task (procedural
knowledge). A classic example is the production system. In contrast,
declarative representations encode how an entity is structured. [RJS:7]

Procrustes analysis: A method for comparing two data sets through the
minimization of squared errors, by translation, rotation and scaling.
[WP:Procrustes analysis]

production system: 1) An approach to computerized logical reasoning,
whereby the logic is represented as a set of "production rules". A rule
is of the form "LHS -> RHS". This states that if the pattern or set of
conditions encoded in the left-hand side (LHS) are true or hold, then do
the actions specified in the right-hand side (RHS), which may simply be
the assertion of some conclusion. A sample rule might be "If the number
of detected edge fragments is less than 10, then decrease the threshold
by 10%". 2) An industrial system that manufactures some product. 3) A
system that is to be actually used, as compared to a demonstration
system. [RJS:7]

profiles: A shape signature for image regions, specifying the number of
pixels in each column (vertical profile) or row (horizontal profile).
Used in pattern recognition. See also shape, shape representation.
[SOS:4.9.2]

progressive image transmission: A method of transmitting an image in
which a low-resolution version is first transmitted, followed by details
that allow progressively higher resolution versions to be recreated.
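
The Procrustes analysis entry above describes aligning two data sets by
translation, rotation and scaling so that the squared error is minimal.
As a minimal sketch for 2D point sets with known one-to-one
correspondence, the points can be treated as complex numbers, in which
case the least-squares scale and rotation reduce to a single complex
factor. This is a simplified variant (no reflection, no shape
normalization), and the name procrustes_2d is hypothetical.

```python
import cmath
import math


def procrustes_2d(source, target):
    """Least-squares similarity transform mapping source onto target.

    Returns (scale, angle_in_radians, (tx, ty), residual), where residual
    is the remaining sum of squared distances after alignment.
    """
    n = len(source)
    z = [complex(x, y) for x, y in source]
    w = [complex(x, y) for x, y in target]
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    zc = [p - z_mean for p in z]   # centred source points
    wc = [p - w_mean for p in w]   # centred target points
    # Minimise sum |a * zc - wc|^2 over the complex factor a.
    a = (sum(p.conjugate() * q for p, q in zip(zc, wc))
         / sum(abs(p) ** 2 for p in zc))
    t = w_mean - a * z_mean
    residual = sum(abs(a * p + t - q) ** 2 for p, q in zip(z, w))
    return abs(a), cmath.phase(a), (t.real, t.imag), residual


# Hypothetical data: a unit square, and the same square scaled by 2,
# rotated by 90 degrees and shifted by (3, 4).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = [(3.0, 4.0), (3.0, 6.0), (1.0, 6.0), (1.0, 4.0)]
scale, angle, shift, residual = procrustes_2d(square, moved)
```

The residual returned by the function is the quantity the entry refers to
as the minimized squared error; it can serve as a dissimilarity score
between the two shapes.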

[Figure for progressive image transmission; panel labels: FIRST IMAGE,
BETTER IMAGE, BEST IMAGE]

progressive scan camera: A camera that transfers an entire image in the
order of left-to-right, top-to-bottom, without the alternate line
interlacing used in television standards. This is much more convenient
for machine vision and other computer-based applications.
[WP:Digital video#Technical overview]

projection: 1) The transformation of a geometric structure from one
space to another, e.g., the projection of a 3D point onto the nearest
point in a given plane. The projection may be specified by a linear
function, i.e., for all points $\vec{p}$ in the initial structure, the
points $\vec{p}'$ in the projected structure are given by
$\vec{p}' = M\vec{p}$ for some matrix $M$. Alternatively, the projection
need not be linear, e.g., $\vec{p}' = \vec{f}(\vec{p})$. 2) The specific
case of projection of a scene that creates an image on a plane by use of,
for example, a perspective camera, according to the rules of perspective.
[VSN:2.1]

projection matrix: The matrix transforming the homogeneous projective
coordinates of a 3D scene point $(x, y, z, 1)$ into the pixel coordinates
$(u, v, 1)$ of the point's image in a pinhole camera. It can be factored
as the product of the two matrices of the intrinsic camera parameters and
extrinsic camera parameters. See also camera coordinates,
image coordinates, scene coordinates. [FP:2.2-2.3]

projective geometry: A field of geometry dealing with projective spaces
and their properties. A projective geometry is one where only properties
preserved by projective transformations are defined. Projective geometry
provides a convenient and elegant theory to model the geometry of the
common perspective camera. Most notably, the perspective projection
equations become linear. [FP:13.1]

projective invariant: A property, say $I$, that is not affected by a
projective transformation. More specifically, assume an invariant,
$I(\vec{P})$, of a geometric structure described by a parameter vector
$\vec{P}$. When the structure is subject to a projective transformation
($M$) this gives a structure with parameter vector $\vec{p}$, and
$I(\vec{P}) = I(\vec{p})$. The most fundamental projective invariant is
the cross ratio. In some applications, invariants of weight $w$ occur,
which transform as $I(\vec{p}) = I(\vec{P})(\det M)^w$. [TV:10.3.2]

projective plane: A plane, usually denoted by $P^2$, on which a
projective geometry is defined. [TV:A.4]

projective reconstruction: The problem of reconstructing the geometry of
a scene from a set or sequence of images in a projective space. The
transformation from projective to Euclidean coordinates is easy if the
Euclidean coordinates of the five points in a projective basis are known.
See also projective geometry and projective stereo vision.
[WP:Fundamental matrix (computer vision)#Projective Reconstruction

projective space: A space of $(n + 1)$-dimensional vectors, usually
denoted by $P^n$, on which a
projective geometry is defined. [FP:13.1.1]

projective stereo vision: A class of stereo algorithms based on
projective geometry. Key concepts expressed elegantly by the projective
framework are epipolar geometry, fundamental matrix, and
projective reconstruction.

projective stratum: A layer in the stratification of 3D geometries.
Moving from the simplest to the most complex, we have the projective,
affine, metric and Euclidean strata. See also projective geometry,
projective reconstruction.

projective transformation: Also known as projectivity, from one
projective plane to another. It can be represented by a non-singular
$3 \times 3$ matrix acting on homogeneous coordinates. The transformation
has eight degrees of freedom, as only the ratio of projective coordinates
is significant. [FP:2.1.2]

property based matching: The process of comparing two entities (e.g.,
image features or patterns) using their properties, e.g., the moments of
a region. See also classification, boundary property, metric property.

property learning: A class of algorithms aiming at learning and
characterizing attributes of spatio-temporal patterns. For example,
learning the color and texture distributions that differentiate between
normal and cancerous cells. See also boundary property, metric property,
unsupervised learning and supervised learning. [WP:Supervised learning]

prototype: An object or model serving as representative example for a
class, capturing the defining characteristics of the class.
[WP:Prototype]

proximity matrix: A matrix $M$ occurring in cluster analysis. $M(i, j)$
denotes the distance (e.g., the Hamming distance) between clusters $i$
and $j$.

pseudocolor: A way of assigning a color to pixels that is based on an
interpretation of the data rather than the original scene color. The
usual purpose of pseudocoloring is to label image pixels in a useful
manner. For example, one common pseudocoloring assigns different colors
according to the local surface shape class. A pseudocoloring scheme for
aerial or satellite images of the earth assigns colors according to the
land type, such as water, forest, wheat field, etc. [JKS:7.7]

PSF: See point spread function. [FP:7.2.2]

purposive vision: An area of computer vision linking perception with
purposive action; that is, modifying the position or parameters of an
imaging system purposively, so that a visual task is facilitated or made
possible. Examples include changing the lens parameters so as to obtain
information about depth, as in depth from defocus, or moving around an
object to achieve full shape information.

pyramid: A representation of an image including information at several
spatial scales. The pyramid is constructed from the original image
(maximum resolution) and a scale operator that reduces the content of the
image (e.g., a Gaussian filter) by discarding details at coarser scales:

[Figure: pyramid levels; surviving labels 64x64, 256x256]

Applying the operator and subsampling the resulting image leads to the
next (lower-resolution) level of the pyramid. See also scale space,
image pyramid, Gaussian pyramid, Laplacian pyramid, pyramid transform.
[JKS:3.3.2]

pyramid architecture: A computer architecture supporting pyramid-based
processing, typically occurring in the context of multi-scale processing.
See also scale space, pyramid, image pyramid, Laplacian pyramid,
Gaussian pyramid. [JKS:3.3.2]

pyramid transform: An operator for building a pyramid from an image. See
pyramid, image pyramid, Laplacian pyramid, Gaussian pyramid. [JKS:3.3.2]
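
The pyramid entry above builds each coarser level by applying a scale
operator and subsampling. A minimal sketch follows, assuming a plain 2x2
block average in place of the Gaussian filter mentioned in the entry (a
deliberately simpler operator, chosen to keep the example short). The
names reduce_level and build_pyramid are hypothetical.

```python
def reduce_level(image):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]


def build_pyramid(image, min_size=1):
    """Return [original, half size, quarter size, ...] down to min_size."""
    levels = [image]
    while (len(levels[-1]) // 2 >= min_size
           and len(levels[-1][0]) // 2 >= min_size):
        levels.append(reduce_level(levels[-1]))
    return levels


# Hypothetical 4x4 image whose pixel values are 0..15 in row-major order.
image = [[4 * r + c for c in range(4)] for r in range(4)]
levels = build_pyramid(image)
```

For this 4x4 input the pyramid has three levels (4x4, 2x2 and 1x1), and
the single pixel at the top holds the mean of the whole image.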

QBIC: See query by image content.
[WP:Content-based image retrieval#Other query methods]

quadratic variation: 1) Any function (here, expressing a variation of
some variables) that can be modeled by a quadratic polynomial. 2) The
specific measure of surface shape deformation
$f_{xx}^2 + 2f_{xy}^2 + f_{yy}^2$ of a surface $f(x, y)$. This measure
has been used to constrain the smoothness of reconstructed surfaces.
[BKPH:8.2]

quadrature mirror filter: A class of filters occurring in wavelet and
image compression filtering theory. The filter splits a signal into a
high pass component and a low pass component, with the low pass
component's transfer function a mirror image of that of the high pass
component. [WP:Quadrature mirror filter]

quadric: A surface defined by a second-order polynomial. See also conic.
[FP:2.1.1]

quadric patch: A quadric surface defined over a finite region of the
independent variables or parameters; for instance, in range image
analysis, a part of a range surface that is well approximated by a
quadric (e.g., an elliptical patch). [WP:Quadric]

quadric patch extraction: A class of algorithms aiming to identify the
portions of a surface that are well approximated by quadric patches.
Techniques are similar to those applied for conic fitting. See also
surface fitting, least square surface fitting.

quadrifocal tensor: An algebraic constraint imposed on quadruples of
corresponding points by the geometry
of four simultaneous views, analogous to the epipolar constraint for the
two-camera case and to the trifocal tensor for the three-camera case. See
also stereo correspondence, epipolar geometry. [FP:10.3]

quadrilinear constraint: The geometric constraint on four views of a
point (i.e., the intersection of four epipolar lines). See also
epipolar constraint and trilinear constraint. [FP:10.3]

quadtree: A hierarchical structure representing 2D image regions, in
which each node represents a region, and the whole image is the root of
the tree. Each non-leaf node, representing a region R, has four children
that represent the four subregions into which R is divided, as
illustrated below. Hierarchical subdivision continues until the remaining
regions have constant properties. Quadtrees can be used to create a
compressed image structure. The 3D extension of a quadtree is the octree.
[SQ:5.9.1]
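
The recursive subdivision in the quadtree entry above can be sketched as
follows, assuming a square image whose side is a power of two and taking
"constant properties" to mean that all pixels in a region share one
value. The dictionary layout and the names build_quadtree and
count_leaves are illustrative choices, not a standard representation.

```python
def build_quadtree(image, row=0, col=0, size=None):
    """Return a leaf value for a constant region, else four child nodes."""
    if size is None:
        size = len(image)  # assumes a square, power-of-two image
    first = image[row][col]
    if all(image[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first  # leaf: the whole region has one value
    half = size // 2
    return {
        "nw": build_quadtree(image, row, col, half),
        "ne": build_quadtree(image, row, col + half, half),
        "sw": build_quadtree(image, row + half, col, half),
        "se": build_quadtree(image, row + half, col + half, half),
    }


def count_leaves(node):
    """Number of constant regions stored in the tree."""
    if not isinstance(node, dict):
        return 1
    return sum(count_leaves(child) for child in node.values())


# Hypothetical 4x4 binary image: only the lower-right quadrant is mixed.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 1],
         [0, 0, 1, 1]]
tree = build_quadtree(image)
```

Here three quadrants collapse to single leaves and only the mixed
quadrant is split further, which is where the compression mentioned in
the entry comes from.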

qualitative vision: A paradigm based on the idea that many perceptual
tasks could be better accomplished by computing only qualitative
descriptions of objects and scenes from images, as opposed to
quantitative information like accurate measurements. Suggested in the
framework of computational theories of human vision. [VSN:10]

quantization: See spatial quantization. [SEU:2.2.4]
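
As a small illustration of quantizing a continuous variable onto a
regularly spaced scale of values, and of the approximation error this
introduces, the sketch below assumes six levels on the interval [0, 5]
(values chosen only for the example). The function name quantize is
hypothetical.

```python
def quantize(value, lo, hi, levels):
    """Map value to the nearest of `levels` evenly spaced values in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    index = round((value - lo) / step)
    index = max(0, min(levels - 1, index))  # clamp values outside [lo, hi]
    return lo + index * step


# Hypothetical samples of a continuous signal.
samples = [0.0, 0.4, 1.7, 3.2, 4.9]
quantized = [quantize(v, 0.0, 5.0, 6) for v in samples]
# Quantization error: signed difference between quantized and true value.
errors = [q - v for q, v in zip(quantized, samples)]
```

With evenly spaced levels and inputs inside the range, the magnitude of
the error never exceeds half the spacing between levels.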

quantization error: The approximation error created by the quantization
of a continuous variable, typically using a regularly spaced scale of
values. This figure shows a continuous function (dashed) and its
quantized version (solid line) using six values only. The quantization
error is the vertical distance between the two curves. For instance, the
intensity values in a digital image can only take on a certain number
(often 256) of discrete values. See also sampling theorem and
Nyquist sampling rate. [SQ:4.2.1]

[Figure: vertical axis ticks 0, 1, 2, 3, 4, 5]

quantization noise: See quantization error. [SQ:4.2.1]

quasi-invariant: An approximation of an invariant. For instance,
quasi-invariant parameterizations of image curves have been built by
approximating the invariant arc length with lower spatial derivatives.
[WP:Quasi-invariant measure]

quaternion: A forerunner of the modern vector concept, invented by
Hamilton, used in vision to represent rotations. Any rotation matrix,
$R$, can be parameterized by a vector of four numbers,
$\vec{q} = (q_0, q_1, q_2, q_3)$, such that $\sum_{k=0}^{3} q_k^2 = 1$,
that define uniquely the rotation. A rotation has two representations,
$\vec{q}$ and $-\vec{q}$. See rotation matrix for alternative
representations of rotations. [FP:21.3.1]

query by image content (QBIC): A class of techniques for selecting
members from a database of images by using examples of the desired image
content (as opposed to textual search). Examples of contents include
color, shape, and texture. See also image database indexing.
[WP:Content-based image retrieval#Other query methods]
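
The quaternion entry above states that a unit quaternion
(q0, q1, q2, q3) determines a rotation and that q and -q describe the
same rotation. A minimal sketch of the conversion to a rotation matrix
follows, assuming q0 is the scalar part and the common Hamilton
convention; other texts order or sign the components differently. The
name quaternion_to_matrix is hypothetical.

```python
import math


def quaternion_to_matrix(q):
    """3x3 rotation matrix for the quaternion q = (q0, q1, q2, q3).

    q0 is taken as the scalar part (an assumption of this sketch); q is
    normalized first so that the sum of squares equals 1.
    """
    norm = math.sqrt(sum(c * c for c in q))
    q0, q1, q2, q3 = (c / norm for c in q)
    return [
        [1 - 2 * (q2 * q2 + q3 * q3), 2 * (q1 * q2 - q0 * q3), 2 * (q1 * q3 + q0 * q2)],
        [2 * (q1 * q2 + q0 * q3), 1 - 2 * (q1 * q1 + q3 * q3), 2 * (q2 * q3 - q0 * q1)],
        [2 * (q1 * q3 - q0 * q2), 2 * (q2 * q3 + q0 * q1), 1 - 2 * (q1 * q1 + q2 * q2)],
    ]


# Rotation by 90 degrees about the z axis: q = (cos 45deg, 0, 0, sin 45deg).
half_angle = math.pi / 4
q = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
R = quaternion_to_matrix(q)
# The negated quaternion represents the same rotation.
R_negated = quaternion_to_matrix(tuple(-c for c in q))
```

Every entry of the matrix is a product of two quaternion components, so
negating all four components leaves the matrix unchanged, which is the
two-to-one ambiguity the entry mentions.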

R-S curve: A contour representation giving the distance, $r$, of each
point of the contour from an origin chosen arbitrarily, as a function of
the arc length, $s$. Allows rotation-invariant comparison of contours.
See also contour, shape representation.

[Figure: labels s, s=0]

radar: An active sensor detecting the presence of distant objects. A
narrow beam of very high-frequency radio pulses is transmitted and
reflected by a target back to the transmitter. The direction of the
reflected beam and the time of flight of the pulse determine the target's
position. See also time-of-flight range sensor. [TV:2.5.2]

radial lens distortion: A type of geometric distortion introduced by a
real lens. The effect is to shift the position of each image point, $p$,
away from its true position, along the line through the image center and
$p$. See also lens, lens distortion, barrel distortion,
tangential distortion, pin cushion distortion, distortion coefficient.
This figure shows the typical deformations of a square (exaggerated)
[FP:3.3]:

radiance: The amount of light (radiating energy) leaving a surface. The
light can be generated by the
R 207

surface itself, as in a light source, or reflected by it. The surface can be real (e.g., a wall)
or imaginary (e.g., an infinite plane). See also irradiance, radiometry. [FP:4.1.3]

radiance map: A map of radiance for a scene. Sometimes used to refer to a high dynamic range
image. [FP:4.1.3]

radiant flux: The radiant energy per time unit, that is, the amount of energy transmitted or
absorbed per time unit. See also radiance, irradiance, radiometry. [EH:3.3.1]

radiant intensity: See radiant flux. [EH:3.3.1]

radiometric calibration: A process seeking to estimate radiance from pixel values. The rationale
for radiometric calibration is that the light entering a real camera (the radiance) is, in
general, altered by the camera itself. A simple calibration model is
$E(i,j) = g(i,j)\,I + o(i,j)$, where, for each pixel (i, j), E is the radiance to estimate, I
the measured intensity, and g and o a pixel-specific gain and offset to be calibrated. Ground
truth values for E can be measured using photometers. [WP:Radiometric calibration]

radiometry: The measurement of optical radiation, i.e., electromagnetic radiation between
$3 \times 10^{11}$ and $3 \times 10^{16}$ Hz (wavelengths between 0.01 and 1000 μm). This
includes ultraviolet, visible and infrared. Common units encountered are watts/m^2 and
photons/(sec·steradian). Compare with photometry, which is the measurement of visible light.
[FP:4.1]

radius vector function: A contour or boundary representation based about a point $\vec{c}$ in
the center of the figure (usually the center of gravity or a physically meaningful point). The
representation then records the distance $r(\theta)$ from $\vec{c}$ to points on the boundary,
as a function of $\theta$, which is the angle between the direction and some reference
direction. The representation has problems when the vector at angle $\theta$ intersects the
boundary more than one time. See:
[Figure: boundary with center c and radius vector r(θ).]

radon transform: A transformation mapping an image into a parameter space highlighting the
presence of lines. It can be regarded as an extension of the Hough transform. One definition is
$g(\rho, \theta) = \int\!\!\int I(x,y)\,\delta(\rho - x\cos\theta - y\sin\theta)\,dx\,dy$
where I(x, y) is the image (gray values) and $\rho = x\cos\theta + y\sin\theta$ is a parametric
line in the image. Lines are identified by peaks in the $\rho$, $\theta$ space. See also
Hough transform line finder. [AJ:10.2]

RAG: See region adjacency graph. [JKS:3.3.4]

random access camera: A random access camera is characterized by the possibility of accessing
any image location directly. The name was introduced to distinguish such cameras from
sequential scan cameras, where image values are transmitted in a standard order.

random dot stereogram: A stereo pair formed by one random dot image
(that is, binary images in which each pixel is assigned to black or white at random), and a
second image that is derived from the first. This figure shows an example, in which a central
square is shifted horizontally. Looking cross-eyed at close distance, you should perceive a
strong 3D effect. See also stereo and stereo vision. [VSN:7.1]

random sample consensus: See RANSAC. [FP:15.5.2]

random variable: A scalar or a vector variable that takes on a random value. The set of possible
values may be describable by a standard distribution, such as the Gaussian,
mixture of Gaussians, uniform, or Poisson distributions. [VSN:A2.2]

randomized Hough transform: A variation of the standard Hough transform designed to produce
higher accuracy with less computational effort. The line-finding variant of the algorithm
selects pairs of image edge points randomly and increments the accumulator cell corresponding
to the line through these two points. The selection process is repeated a fixed number of times.
[WP:Randomized Hough Transform]

range compression: Reducing the dynamic range of an image to enhance the appearance of the
image. This is often needed for images resulting from the magnitude of the Fourier transform,
which might have pixels with both large and very low values. Without range compression it will
be hard to see the structure in the pixels with the low values. The left image shows the
magnitude of a 2D Fourier transform with a single bright spot in the middle. The right image
shows the logarithm of the left image, revealing more details. [AJ:7.2]

range data: A representation of the spatial distribution of a set of 3D points. The data is
often acquired by stereo vision or by a range sensor. In computer vision, range data are often
represented as a cloud of points, i.e., a set of triplets representing the X, Y, Z coordinates
of each point, or as range images, also known as moiré patch. The figure below shows a range
image of an industrial part, where brighter pixels are closer [TV:2.5]:

range data fusion: The merging of multiple sets of range data, especially for the purpose of
1) extending the portion of an object's surface described by the range data, or 2) increasing the
accuracy of measurements by exploiting the redundancy of multiple measures available for each
point of surface area. See also information fusion, fusion, sensor fusion.

range data integration: See range data fusion.

range data registration: See registration. [FP:21.3]

range data segmentation: A class of techniques partitioning range data into a set of regions.
For instance, a well-known method for segmenting range images is HK segmentation, which
produces a set of surface patches covering the initial surface. The right image shows the plane,
cylinder and spherical patches extracted from the left range image. See also
surface segmentation. [RN:9.3]

range edge: See surface shape discontinuity.

range flow: A class of algorithms for the measurement of motion in time-varying range data, made
possible by the evolution of fast range sensors. See also optical flow.

range image: A representation of range data as an image. The pixel coordinates are related to
the spatial position of each point on the range surface, and the pixel value represents the
distance of the surface point from the sensor (or from an arbitrary, fixed background). The
figure below shows a range image of a face, where darker pixels are closer [JKS:11.4]:

range image edge detector: An edge detector working on range images. Typically, edges occur
where depths or surface normal directions (fold edge) change rapidly. See also edge detection,
range images. The right image shows the depth and fold edges extracted from the left range
image:

range sensor: Any sensor acquiring range data. The most popular range sensors in computer vision
are based on optical and acoustic technologies. A laser range sensor often uses
structured light triangulation. A time-of-flight range sensor measures the round-trip time of
an acoustic or optical pulse. See also depth estimation. An example of a triangulation range
sensor is [TV:2.5.2]
[Figure: triangulation range sensor, with labels PROJECTOR, OBJECT, STRIPE IMAGE.]

rank order filtering: A class of filters the output of which depends on
an ordering (ranking) of the pixels within the region of support. The classic example is the
median filter, which selects the middle value of the set of input values. More generally, the
filter selects the kth largest value in the input set. [SB:4.4.3]

RANSAC: Acronym for random sample consensus, a robust estimator seeking to counter the effect
of outliers in data used, for example, in a least square estimation problem. In essence, RANSAC
considers a number of data subsets of the minimum size necessary to solve the problem (say a
parametric surface fit), then looks for statistical agreement of the results. See also
least median square estimation, M-estimation, outlier rejection. [FP:15.5.2]

raster scan: Raster refers to the region of a monitor, e.g., a cathode ray tube (CRT) or a
liquid crystal display (LCD), capable of rendering images. In a CRT, the raster is a sequence of
horizontal lines that are scanned rapidly with an electron beam from left to right and top to
bottom, largely in the same way as a TV picture tube is scanned. In an LCD, the raster (usually
called a grid) covers the whole device area and is scanned differently, in that image elements
are displayed individually. [AL:4.2]

rate-distortion: A statistical method useful in analog-to-digital conversion. It determines the
minimum number of bits required to encode data while tolerating a given level of distortion, or
vice versa. [AJ:2.13]

rate-distortion function: The number of bits per sample (the rate $R_D$) to encode an analog
image (or other signal) value given the allowable distortion D (or mean square of the error).
Also needed is the variance $\sigma^2$ of the input value (assuming it is a Gaussian random
variable). Then $R_D = \max(0, \frac{1}{2}\log_2(\frac{\sigma^2}{D}))$. [AJ:2.13]

raw primal sketch: The first representation built in the perception process according to Marr's
theory of vision, heavily based on detection of local edge features. It represents the location,
orientation, contrast and scale of center-surround, edge, bar and truncated bar features. See
also primal sketch.

RBC: See recognition by components. [WP:Recognition by Components Theory]

real time processing: Any computation performed within the time limits imposed by a given
process. For example, in visual servoing a tracking system feeds positional data to a control
algorithm generating control signals; if the control signals are generated too slowly, the whole
system may become unstable. Different processes can impose very different constraints for real
time processing. When processing video-stream data, real time means complete processing of one
frame of data in the time before the next frame is acquired (possibly with several frames of lag
time, as in a pipeline parallel process). [WP:Real-time computing]

receiver operating curves and performance analysis for vision: A receiver operating curve (ROC)
is a diagram showing the performance of a classifier. It plots the number or percentage of true
positives against the number or percentage of false positives. Performance analysis is a
substantial topic in computer vision and the object of an ongoing debate. See also
performance characterization, test, classification. [FP:22.2.1]
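
To make the receiver operating curve entry concrete, here is a minimal sketch that computes the points of an ROC from classifier scores, using the common convention of true positive rate against false positive rate. The function name `roc_points`, the scores, and the labels are illustrative assumptions, not data from the dictionary.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]                      # threshold above every score: nothing accepted
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical classifier outputs: higher score means "more likely positive".
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
```

Lowering the threshold moves along the curve from (0, 0) to (1, 1); a classifier whose curve hugs the top-left corner separates the two classes well.
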
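
The RANSAC entry can be sketched for the simplest case, fitting a 2D line. This is a minimal illustration under stated assumptions: the "statistical agreement" step is simplified to counting points within a fixed tolerance, and the function name `ransac_line`, the trial count, and the tolerance are all hypothetical choices.

```python
import random

def ransac_line(points, n_trials=200, tolerance=0.1, seed=0):
    """Fit y = a*x + b to 2D points with a basic RANSAC loop; return (a, b, inliers)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal subset: two points define a line
        if x1 == x2:
            continue                                  # vertical line, not representable here
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus set: points that agree with this candidate line.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tolerance]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best

# Ten points on y = 2x + 1 plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.5, 40.0), (7.5, -30.0)]
a, b, inliers = ransac_line(points)
```

A fuller implementation would typically re-estimate the line from all inliers by least squares; that refinement is omitted to keep the sketch short.
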

receptive field: 1) The retinal area generating the response to a photostimulus. The main cells
responsible for visual perception in the retina are the rods and the cones, active in low- and
high-intensity situations respectively. See also photopic response. 2) The region of visual
space giving rise to that response. 3) The region of an image that is input to the calculation
of each output value. (See region of support.) [FP:1.3]

recognition: See identification. [TV:10.1]

recognition by components (RBC): 1) A theory of human image understanding devised by Biederman.
The foundation is a set of 3D shape primitives called geons, reminiscent of Marr's
generalized cones. Different combinations of geons yield a large variety of 3D shapes,
including articulated objects. 2) The recognition of a complex object by recognizing
subcomponents and then combining these to recognize more complex objects. See also
hierarchical matching, shape representation, model based recognition, object recognition.
[WP:Recognition by Components Theory]

recognition by parts: See recognition by components.
[WP:Object recognition (computer vision)#Recognition by parts]

recognition by structural decomposition: See recognition by components.

reconstruction: The problem of computing the shape of a 3D object or surface from one or more
intensity or range images. Typical techniques include model acquisition and the many shape from
X methods reported (see shape from contour and following entries). [TV:7.4]

reconstruction error: Inaccuracies in a model when compared to reality. These can be caused by
inaccurate sensing or compression. (See lossy compression.)
[WP:Constructivism (learning theory)#Pedagogies based on constructivism

rectification: A technique warping two images into some form of geometric alignment, e.g., so
that the vertical pixel coordinates of corresponding points are equal. See also
stereo image rectification. This figure shows a stereo pair (top row) and its rectified version
(bottom row), highlighting some of the corresponding scanlines, where corresponding image
features lie [JKS:12.5]:

recursive region growing: A class of recursive algorithms for region growing. An initial pixel
is chosen. Given an adjacency rule to determine the neighbors of a pixel (e.g., 8-adjacency),
the neighboring pixels are explored. If any meets the criteria for addition to the region, the
growing procedure is called recursively on that pixel. The process continues until all connected
image pixels have been examined. See also adjacent, image connectedness,
neighborhood, recursive splitting. [SQ:8.3.1]

recursive splitting: A class of recursive algorithms for region segmentation, dividing an image
into a region set. The region set is initialized to the whole image. A homogeneity criterion is
then applied; if not satisfied, the image is split according to a given scheme (e.g., into four
sub-images, as in a quadtree), leading to a new region set. The procedure is applied recursively
to all regions in the new region set, until all remaining regions are homogeneous. See also
region segmentation, region based segmentation, recursive region growing. [RN:8.1.1]

reference frame transformation: See coordinate system transformation.
[WP:Rotating reference frame]

reflectance estimation: A class of techniques for estimating the bidirectional reflectance
distribution function (BRDF). Used notably within the techniques for shape from shading and
image based rendering, which seeks to render arbitrary images of scenes from video material
only. All information about geometry and photometry (e.g., the BRDF) is derived from video. See
also physics based vision. [FP:4.2.2]

reflectance map: The reflectance map expresses the reflectance of a material in terms of a
viewer-centered representation of local surface orientation. The most commonly used is the
Lambertian reflectance map, based on Lambert's law. See also shape from shading,
photometric stereo. [JKS:9.3]
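
The recursive splitting entry can be sketched with a quadtree-style split. In this minimal sketch the homogeneity criterion is an assumption (intensity range within `max_range`), as are the function name `split_regions` and the toy image; blocks that are one pixel high or wide are not split further, to keep the code short.

```python
def split_regions(image, top, left, height, width, max_range=10):
    """Recursively split a rectangular block until every block is homogeneous.

    Returns a list of (top, left, height, width) blocks.
    """
    values = [image[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    homogeneous = max(values) - min(values) <= max_range   # assumed homogeneity criterion
    if homogeneous or height == 1 or width == 1:
        return [(top, left, height, width)]
    h2, w2 = height // 2, width // 2
    blocks = []
    # Split into four sub-blocks, as in a quadtree, and recurse on each.
    for t, l, h, w in [(top, left, h2, w2),
                       (top, left + w2, h2, width - w2),
                       (top + h2, left, height - h2, w2),
                       (top + h2, left + w2, height - h2, width - w2)]:
        blocks += split_regions(image, t, l, h, w, max_range)
    return blocks

image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 200, 200],
         [0, 0, 200, 0]]
regions = split_regions(image, 0, 0, 4, 4)
```

Only the bottom-right quadrant of the toy image is inhomogeneous, so it alone is split down to single pixels while the other three quadrants stay whole.
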
reference image: An image of a known scene or of a scene at a particular time used for
comparison with a current image. See, for example, change detection.

reference views: In iconic recognition, the views chosen as most representative for a 3D object.
See also eigenspace based recognition, characteristic view.

reference white: A sample image value which corresponds to a known white object. The knowledge
of such a value facilitates white balance corrections. [WP:White point]

reflectance: The ratio of reflected to incident flux, in other words the ratio of reflected to
incident (light) power. See also bidirectional reflectance distribution function. [JKS:9.1.2]

reflectance ratio: A photometric invariant used for segmentation and recognition. It is based on
the observation that the illumination on both sides of a reflectance or color edge is nearly the
same. So, although we cannot factor out the reflectance and illumination from only the observed
lightness, the ratio of the lightnesses on both sides of the edge equals the ratio of the
reflectances, independent of illumination. Thus the ratio is invariant to illumination and local
surface geometry for a significant class of reflectance maps. See also invariant,
physics based vision.

reflection: 1) A mathematical transformation where the output image is the input image flipped
over about a given transformation line in the image plane. See reflection operator. 2) An optics
phenomenon whereby all light incident on a surface is deflected
away, without absorption, diffusion or scattering. An ideal mirror is the perfect reflecting
surface. Given a single ray of light incident on a reflecting surface, the angle of incidence
equals the angle of reflection, as shown below. See also specular reflection.
[WP:Reflection (physics)]

reflection operator: A linear transformation intuitively changing each vector or point of a
given space to its mirror image, as shown below. The corresponding transformation matrix, say H,
has the property $HH = I$, i.e., $H^{-1} = H$: a reflection matrix is its own inverse. See also
rotation.

refraction: An optical phenomenon whereby a ray of light is deflected while passing through
different optical media, e.g., from air to water. The amount of deflection is governed by the
difference between the refraction indices of the two media, according to Snell's law:
$n_1 \sin(\alpha_1) = n_2 \sin(\alpha_2)$
where $n_1$ and $n_2$ are the refraction indices of the two media, and $\alpha_1$, $\alpha_2$
the respective refraction angles, measured from the surface normal [EH:4.1]:
[Figure: refraction diagram with the label INCIDENT RAY.]

region: A connected part of an image, usually homogeneous with respect to a given criterion.
[BB:5.1]

region adjacency graph (RAG): A graph expressing the adjacency relations among image regions,
for instance generated by a segmentation algorithm. See also region segmentation and
region based segmentation. The adjacency relations of the regions in the left figure are
encoded in the RAG at the right [JKS:3.3.4]:

region based segmentation: A class of segmentation techniques producing a number of image
regions, typically on the basis of a given homogeneity criterion. For instance, intensity image
regions can be homogeneous by color (see color image segmentation) or texture properties (see
texture field segmentation); range image regions can be homogeneous by shape or curvature
properties (see HK segmentation). [JKS:3.2]
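
As a worked example for the refraction entry, this minimal sketch applies Snell's law in its standard form, n1 sin(a1) = n2 sin(a2), with angles measured from the surface normal. The function name `refraction_angle` is hypothetical, and the indices 1.0 (air) and 1.33 (water) are approximate illustrative values.

```python
import math

def refraction_angle(n1, n2, incidence_deg):
    """Angle of the refracted ray in degrees, or None when total internal reflection occurs."""
    s = n1 * math.sin(math.radians(incidence_deg)) / n2
    if abs(s) > 1.0:
        return None            # no refracted ray exists beyond the critical angle
    return math.degrees(math.asin(s))

air_to_water = refraction_angle(1.0, 1.33, 30.0)   # ray bends toward the normal
water_to_air = refraction_angle(1.33, 1.0, 60.0)   # beyond the critical angle
```

Passing into the denser medium bends the ray toward the normal (about 22 degrees here), while the steep water-to-air ray has no refracted solution at all.
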
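
The property HH = I stated in the reflection operator entry can be checked numerically. The sketch below builds a reflection about a plane through the origin using the Householder form H = I - 2nn^T, which is one common construction and an assumption here (the entry does not give a formula); the helper names are hypothetical.

```python
import math

def reflection_matrix(normal):
    """Householder matrix H = I - 2 n n^T for the plane through the origin with this normal."""
    length = math.sqrt(sum(c * c for c in normal))
    n = [c / length for c in normal]
    size = len(n)
    return [[(1.0 if i == j else 0.0) - 2.0 * n[i] * n[j] for j in range(size)]
            for i in range(size)]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

H = reflection_matrix([1.0, 1.0, 0.0])
HH = matmul(H, H)   # applying the reflection twice should give the identity
```

Reflecting twice returns every point to where it started, which is exactly what `HH` being the identity matrix expresses.
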

region boundary extraction: The problem of computing the boundary of a region, for example, the
contour of a region in an intensity image after color based segmentation.

region decomposition: A class of algorithms aiming to partition an image or region thereof into
regions. See also region based segmentation. [JKS:3.2]

region descriptor: 1) One or more properties of a region, such as compactness or moments. 2) The
data structure containing all data pertaining to a region. For instance, for image regions this
could include the region's position in the image (e.g., the coordinates of the center of mass),
the region's contour (e.g., a list of 2D coordinates), some indicator of the region shape (e.g.,
compactness or perimeter squared over area), and the value of the region's homogeneity index.
[NA:7.3]

region detection: A vast class of algorithms seeking to partition an image into regions with
particular properties. For details see region identification, region labeling,
region matching, region based segmentation. [SOS:4.3.2]

region filling: A class of algorithms assigning a given value to all the pixels in the interior
of a closed contour identifying a region. For instance, one may want to fill the interior of a
closed contour in a binary image with zeros or ones. See also morphology,
mathematical morphology, binary mathematical morphology. [SOS:4.3.2]

region growing: A class of algorithms that construct a connected region by incrementally
expanding the region, usually at the boundary. New data are merged into the region when the data
are consistent with the previous region. The region is often redescribed after each new set of
data is added to it. Many region growing algorithms have the form: 1) Describe the region based
on the current pixels that belong to the region (e.g., fit a linear model to the intensity
distribution). 2) Find all pixels adjacent to the current region. 3) Add an adjacent pixel to
the region if the region description also describes this pixel (e.g., it has a similar
intensity). 4) Return to step 1 as long as new pixels continue to be added. A similar algorithm
exists for region growing with 3D points, giving a surface fitting. The data points could come
from a regular grid (pixel or voxel) or from an unstructured list. In the latter case, it is
harder to determine adjacency. [JKS:3.5]

region identification: A class of algorithms seeking to identify regions with special
properties, for instance, a human figure in a surveillance video, or road vehicles in an aerial
sequence. Region identification covers a very wide area of techniques spanning many
applications, including remote sensing, visual surveillance, surveillance, and agricultural and
forestry surveying. See also target recognition, automatic target recognition (ATR),
binary object recognition, object recognition, pattern recognition.

region invariant: 1) A property of a region that is invariant (does not change) after some
transformation is applied to the region, such as translation, rotation or
perspective projection. 2) A property or function which is invariant over a region.
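
The four-step form given in the region growing entry can be sketched as follows. This is a minimal illustration under stated assumptions: the region description is simply the mean intensity (the entry suggests a linear model as one option), adjacency is 4-connectivity, and the function name `grow_region`, the tolerance, and the toy image are hypothetical.

```python
def grow_region(image, seed, tolerance=10.0):
    """Grow a 4-connected region from `seed` (row, col), following steps 1 to 4 of the entry."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    while True:
        # Step 1: describe the region by the mean intensity of its current pixels.
        mean = sum(image[r][c] for r, c in region) / len(region)
        # Step 2: find all pixels adjacent to the current region.
        border = set()
        for r, c in region:
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                    border.add((nr, nc))
        # Step 3: add the adjacent pixels that the region description also fits.
        accepted = {p for p in border if abs(image[p[0]][p[1]] - mean) <= tolerance}
        # Step 4: repeat as long as new pixels continue to be added.
        if not accepted:
            return region
        region |= accepted

image = [[10, 12, 90, 91],
         [11, 13, 92, 90],
         [12, 11, 89, 12],
         [10, 12, 11, 13]]
region = grow_region(image, (0, 0))
```

Starting from the dark corner, the region spreads through all eleven dark pixels, including the one at (2, 3) that is reachable only around the bright block, and stops at the bright pixels.
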

region labeling: A class of algorithms which are used to assign a label or meaning to each image
region in a given image segmentation to achieve an appropriate image interpretation.
Representative techniques are relaxation labeling, probabilistic relaxation labeling, and
interpretation trees (see interpretation tree search). See also labeling problem.

region matching: 1) Establishing the correspondences between matching members of two sets of
regions. 2) The degree of similarity between two regions, i.e., solving the matching problem for
regions. See, for instance, template matching, color matching, color histogram matching.

region merging: A class of algorithms fusing two image regions into one if a given homogeneity
criterion is satisfied. See also region, region based segmentation, region splitting. [RJS:6]

region of interest: A subregion of an image where processing is to occur. Regions of interest
may be used to: 1) reduce the amount of computation that is required or 2) focus processing so
that image data outside the region do not distract from or distort results. As an example, when
tracking a target through an image sequence, most algorithms for locating the target in the next
video frame only consider image data from a region of interest surrounding the predicted target
position. The figure shows a boxed region of interest: [WP:Region of interest]

region of support: The subregion of an image that is used in a particular computation. For
example, an edge detector usually only uses a subregion of pixels neighboring the pixel
currently being considered for being an edge. [KMJ:5.4.1-5.4.2]

region neighborhood graph: See region adjacency graph. [JKS:3.3.4]

region propagation: The problem of tracking moving image regions.

region representation: A class of methods to represent the defining characteristics of an image
region. For encoding the shapes, see axial representation, convex hull, graph model, quadtree,
run-length coding, skeletonization. For encoding a region by its properties, see moments,
curvature scale space, Fourier shape descriptor, wavelet descriptor, shape representation.
[JKS:3.3]

region segmentation: See region based segmentation. [JKS:3.2]

region snake: A snake representing the boundary of some region. The operation of computing the
snake may be used as a region segmentation technique.

region splitting: A class of algorithms dividing an image, or a
region thereof, into parts (subregions) if a given homogeneity criterion is not satisfied over
the region. See also region, region based segmentation, region merging. [RJS:6]

registration: A class of techniques aiming to align, superimpose, or match two objects of the
same kind (e.g., images, curves, models); more specifically, to compute the geometric
transformation superimposing one onto the other. For instance, image registration determines the
region common to two images, thereby finding the planar transformation (rotation and
translation) aligning them; similarly, curve registration determines the transformation aligning
the similar (or same) part of two curves. This figure shows the registration (right) of the
solid (left) and dashed (middle) curves. The transformation need not be rigid;
non-rigid registration is common in medical imaging, for instance in
digital subtraction angiography. Notice also that most often there is no exact solution, as the
two objects are not exactly the same, and the best approximate solution must be found by least
squares or more complex methods. See also Euclidean transformation,
medical image registration, model registration, multi-image registration. [FP:21.3]

regression: 1) In statistics, the relationship between one variable and another, as in
linear regression. A particular case of curve and surface fitting. 2) Regression testing
verifies that changes to the implementation of a system have not caused a loss of functionality,
or regression to the state where that functionality did not exist. [WP:Regression analysis]

regularization: A class of mathematical techniques to solve an ill-posed problem. In essence, to
determine a single solution, one introduces the constraint that the solution must be smooth, in
the intuitive sense that similar inputs must correspond to similar outputs. The problem is then
cast as a variational problem, in which the variational integral depends both on the data and on
the smoothness constraint. For instance, a regularization approach to the problem of estimating
a function f from a set of values $y_1, y_2, \ldots, y_n$ at the data points
$\vec{x}_1, \ldots, \vec{x}_n$ leads to the minimization of the functional
$H(f) = \sum_{i=1}^{n} (f(\vec{x}_i) - y_i)^2 + \lambda \Phi(f)$
where $\Phi(f)$ is the smoothness functional, and $\lambda$ a positive parameter called the
regularization number. [JKS:13.7]

relational graph: A graph in which the arcs express relations between the properties of image
entities (e.g., regions or other features) which are the nodes in the graph. For regions, for
instance, commonly used properties are adjacency, inclusion, connectedness, and relative area
size. See also region adjacency graph (RAG), shape representation. The adjacency relations of
the regions in the left figure are encoded in the RAG at the right [DH:12.2.2]:
[Figure: regions A, B, C, D (left) and the corresponding graph with nodes A, B, C, D (right).]

relational matching: A class of matching algorithms based on
relational descriptors. See also relational graph. [BB:11.2]

relational model: See relational graph. [DH:12.2.2]

relational shape description: A class of shape representation techniques based on relations
between the properties of image entities (e.g., regions or other features). For regions, for
instance, commonly used properties are adjacency, inclusion, connectedness, and relative area
size. See also relational graph, region adjacency graph.

relative depth: The difference in depth values (distance from some observer) for two points. In
certain situations, while it may not be possible to compute actual or absolute depth, it may be
possible to compute relative depth.

relative motion: The motion of an object with respect to some other, possibly also moving, frame
of reference (typically the observer's).

relative orientation: The problem of computing the orientation of an object with respect to
another coordinate system, such as that of the sensor. More specifically, the rotation matrix
aligning the reference frames attached to the object and second object. See also pose and
pose estimation. [JKS:12.4]

relaxation: A technique for assigning values from a continuous or discrete set to the nodes of a
network or graph by propagating the effects of local constraints. The network can be an image
grid, in which case the pixels are nodes, or features, for instance edges or regions. At each
iteration, each node interacts with its neighbors, altering its value according to the local
constraints. As the number of iterations increases, the effects of local constraints are
propagated to farther and farther parts of the network. Convergence is achieved when no more
changes occur, or changes become insignificant. See also discrete relaxation,
relaxation labeling, probabilistic relaxation labeling. [SQ:6.1]

relaxation labeling: A relaxation technique for assigning a label from a discrete set to each
node of a network or graph. A well-known example, a classic in artificial intelligence, is
Waltz's line labeling algorithm (see also line drawing analysis). [JKS:14.3]

relaxation matching: A relaxation labeling technique for model matching, the purpose of which is
to label (match) each model primitive with a scene primitive. Starting from an initial labeling,
the algorithm iteratively harmonizes neighboring labels using a coherence measure for the set of
matches. See also discrete relaxation, relaxation labeling, probabilistic relaxation labeling.

relaxation segmentation: A class of segmentation techniques based on relaxation. See also
image segmentation. [BT:5]

remote sensing: The acquisition, analysis and understanding of imagery, mainly of the Earth's
surface, acquired by airplanes or satellites. Used frequently in agriculture, forestry,
meteorological and military applications. See also multi-spectral analysis,
multi-spectral image, geographic information system (GIS). [RJS:6]

representation: A description or model specifying the properties defining an object or class of
objects. A classic
example is shape representation, a removes the slowly varying components

group of techniques for describing the by exploiting the fact that the observed
geometric shape of 2D and 3D objects. brightness B = L I is product of the
See also Koenderinks surface lightness (or reflectance) L and
surface shape classification . the illumination I. By taking the
Representations can be symbolic or logarithm of B at each pixel, the
non-symbolic (see product of L and I become a sum of
symbolic object representation and logarithms. Slow changes can be
non-symbolic representation ), a detected by differentiation and then
distinction inherited from artificial removed by thresholding.
intelligence. Re-integration of the result produces
[ WP:Representation (mathematics)] the lightness image (up to an arbitrary
scale factor). [ BKPH:9.3]
resection: The computation of the
position of a camera given the images reverse engineering: The problem of
of some known 3D points. Also known generating a model of a 3D object from
as camera calibration , or a set of views, for instance a VRML or
pose estimation . [ HZ:21.1] a triangulated model . The model can
be purely geometric, that is, describing
resolution: The number of pixels per just the objects shape, or combine
unit area, length, visual angle, etc. shape and textural properties.
[ AL:p. 236] Techniques exists for reverse
engineering from both range images
restoration: Given a noisy sample of
and intensity images. See also
some true data, the goal of restoration
geometric model , model acquisition .
is to recover the best possible estimate
[ TV:4.6]
of the original true data, using only the
noisy sample. [ TV:3.1.1] RGB: A format for color images,
encoding the Red, Green, and Blue
reticle: The network of fine wires or
component of each pixel in separate
receptors placed in the focal plane of an
channels. See also YUV , color image.
optical instrument for measuring the
[ FP:6.3.1]
size or position of the objects under
observation. [ WP:Reticle] ribbon: A shape representation for
pipe-like planar objects whose contours
retinal image: The image which is
are approximately parallel, e.g., roads
formed on the retina of the human eye.
in aerial imagery. See also
[ VSN:1.2.2]
generalized cones ,
retinex: An image enhancement shape representation.
algorithm based on retinex theory, [ FP:24.2.2-24.2.3]
aimed to compute an
ridge: A particular type of
illuminant-independent quantity called
discontinuity of the intensity function,
lightness at each image pixel. The key
giving rise to thick edges and lines.
observation is that normal illumination
This figure
on a surface changes slowly, leading to
slow changes in the observed brightness
of a surface. This contrasts with strong
changes in brightness at reflectance
and fold edges. The retinex algorithm

shows a characteristic dark-to-light-to-dark intensity ridge profile along a
scanline [figure: intensity profile across the ridge]. See also step edge,
roof edge, edge detection. [WP:Ridge detection]

ridge detection: A class of algorithms, especially edge and line detectors,
for detecting ridges in images. [WP:Ridge detection]

right-handed coordinate system: A 3D coordinate system with the XYZ axes
arranged as follows. The alternative is a left-handed coordinate system.
[FP:2.1.1]

[Figure: axes +X, +Y and +Z, with +Z pointing out of the page]

rigid body segmentation: The problem of automatically partitioning the image
of an articulated or deformable body into a number of rigid subcomponents.
See also part segmentation, recognition by components (RBC).

rigid motion estimation: A class of techniques aiming to estimate the 3D
motion of a rigid body or scene in space from a sequence of images by
assuming that there are no changes in shape. Rigidity simplifies the problem
significantly so that changes in appearance arise solely from changes in
relative position and projection. Techniques exist for using known 3D models,
for estimating the motion of a general cloud of 3D points or of image feature
points, and for estimating motion from optical flow. See also
motion estimation, egomotion. [BKPH:17.2]

rigid registration: Registration where neither the model nor the data is
allowed to deform. This reduces registration to estimating the Euclidean
transformation that aligns the model with the data. See also
non-rigid registration.

rigidity constraint: The assumption that a scene or object under analysis is
rigid, implying that all 3D points remain in the same relative position in
space. This constraint can significantly simplify many algorithms, for
instance shape reconstruction (see shape and following shape from entries)
and motion estimation. [JKS:14.7]

road structure analysis: A class of techniques which are used to derive
information about roads from images. These can be close-up images (e.g.,
images of the tarmac as acquired from a moving vehicle, to map defects
automatically over extended distances) or remotely sensed images (e.g., to
analyze the geographical structure of road networks).

Roberts cross gradient operator: An operator used for edge detection,
computing an estimate of perpendicular components of the image gradient at
each pixel. The image is convolved with the two Roberts kernels, yielding two

components, $G_x$ and $G_y$, for each pixel. The gradient magnitude
$\sqrt{G_x^2 + G_y^2}$ and orientation $\arctan(G_y/G_x)$ can then be
estimated as for any 2D vector. See also edge detection, Canny edge detector,
Sobel gradient operator, Sobel kernel, Deriche edge detector,
Hueckel edge detector, Kirsch edge detector, Marr-Hildreth edge detector,
O'Gorman edge detector, Robinson edge detector. [JKS:5.2.1]

Roberts kernel: A pair of kernels, or masks, used to estimate perpendicular
components of the image gradient within the Roberts cross gradient operator:

     0  1        1  0
    -1  0        0 -1

The masks respond maximally to edges oriented at plus or minus 45 degrees
from the vertical axis of the image. [JKS:5.2.1]

Robinson edge detector: An operator for edge detection, computing an estimate
of the directional first derivatives of the image in eight directions. The
image is convolved with the eight kernels, three of which are shown here:

     1  1  1        1  1  1        1  1 -1
     1 -2  1        1 -2 -1        1 -2 -1
    -1 -1 -1        1 -1 -1        1  1 -1

Two of these, typically those responding maximally to differences along the
coordinate axes, can be taken as estimates of the two components of the
gradient, $G_x$ and $G_y$. The gradient magnitude $\sqrt{G_x^2 + G_y^2}$ and
orientation $\arctan(G_y/G_x)$ can then be estimated as for any 2D vector.
See also edge detection, Roberts cross gradient operator,
Sobel gradient operator, Sobel kernel, Canny edge detector,
Deriche edge detector, Hueckel edge detector, Kirsch edge detector,
Marr-Hildreth edge detector, O'Gorman edge detector. [SEU:2.3.5]

robust: A general term referring to a technique which is insensitive to noise
or other perturbations. [FP:15.5]

robust estimator: A statistical estimator which, unlike normal least squares
estimators, is not distracted by even significant percentages of outliers in
the data. Popular robust estimators in computer vision include RANSAC,
least median of squares, and M-estimators. See also outlier rejection.
[FP:15.5]

robust regression: A form of regression that does not use outlier values in
computing the fitting parameters. For example, if doing a least squares
straight line fit to a set of data, normal regression methods use all data
points, which can give distorted results if even one point is very far away
from the true line. Robust processes either eliminate these outlying points
or reduce their contribution to the results. The figure below shows a
rejected outlying point [JKS:6.8.3]:

[Figure: straight line fitted to the points labeled INLIERS, with one
outlying point rejected]
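As a rough illustration of the robust regression entry, the sketch below
contrasts an ordinary least squares line fit with a fit based on the median
of pairwise slopes (the Theil-Sen estimator). Theil-Sen is used here only
because it is short to write down; it is one robust technique among many and
is not taken from the cited references. The function names and the sample
data are illustrative assumptions.

```python
from itertools import combinations
from statistics import median


def least_squares_line(points):
    """Ordinary least squares fit y = slope*x + intercept (uses all points)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def theil_sen_line(points):
    """Robust fit: median of all pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


# Ten points on the line y = 2x + 1, with one gross outlier at x = 5.
data = [(x, 2.0 * x + 1.0) for x in range(10)]
data[5] = (5, 100.0)

ls_slope, ls_intercept = least_squares_line(data)
ts_slope, ts_intercept = theil_sen_line(data)
print("least squares:", ls_slope, ls_intercept)
print("Theil-Sen:    ", ts_slope, ts_intercept)
```

The single outlier pulls the least squares line away from the true
parameters, while the median-based fit is unaffected because most pairwise
slopes do not involve the outlying point.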

the masks used in the

robust statistics: A general term Robinson edge detector . Most
describing statistical methods which are commonly used as a type of
not significantly influenced by outliers . average smoothing in which the most
[ WP:Robust statistics] homogeneous mask is used to compute
the smoothed value for every pixel. In
robust technique: See
the example, notice how although image
robust estimator . [ FP:15.5]
detail has been reduced the major
ROC: See boundaries have not been smoothed.
receiver operating characteristic .
[ FP:22.2.1]

roll: A 3D rotation representation

component (along with pitch and yaw )
often used for cameras or moving
observers. The roll component specifies
a rotation about the optical axis or line
of sight. This figure shows the roll
rotation direction [ JKS:12.2.1]:


roof edge: 1) An image edge where

the values increase continuously to a
maximum and then decrease
continuously, such as the brightness
values on a Lambertian cylinder when
lit by a point light source , or an
rotation: A circular motion of a set of
orientation discontinuity (or fold edge )
points or object around a given point
in a range image . 2) A scene edge
(2D) or line (3D, called the
where an orientation discontinuity
axis of rotation ). [ JKS:12.2.1-12.2.2]
occurs. The figure shows a horizontal
roof edge in a range image [ JKS:5]: rotation estimation: The problem of
estimating rotation from raw or
processed image, video or range data,
typically from two sets of corresponding
points (or lines, planes, etc.) taken
from rotated versions of a pattern. The
problem usually appears in one of three
forms: 1) estimating the 3D rotation
rotating mask: A mask which is from 3D data (three points are needed),
considered in a number of orientations 2) estimating the 3D rotation from 2D
relative to some pixel. See, for example, data (three points are needed but lead

to multiple solutions), or 3) estimating the 2D rotation from 2D data (two
points are needed). A second issue to consider is the effect of noise:
typically more than the minimum number of points are needed to counteract the
effects of noise, which leads to least squares algorithms.

rotation invariant: A property that keeps the same value even if the data
values, the camera, the image or the scene from which the data comes is
rotated. One needs to distinguish between 2D (i.e., in the image) and 3D
(i.e., in the scene) rotation invariance. For example, the angle between two
image lines is invariant to image rotation, but not to rotation of the lines
in the scene. [WP:Rotational invariance]

rotation matrix: A linear operator rotating a vector in a given space. The
inverse of a rotation matrix equals its transpose. A rotation matrix has only
three degrees of freedom in 3D and one in 2D. In 3D space, there are three
eigenvalues, namely $1$, $\cos\theta + i\sin\theta$ and
$\cos\theta - i\sin\theta$, where $i$ is the imaginary unit and $\theta$ the
rotation angle. A rotation matrix in 3D has nine entries but only three
degrees of freedom, as it must satisfy six orthogonality constraints. It can
be parameterized in various ways, usually through Euler angles,
yaw-pitch-roll, rotation angles around the coordinate axes, and axis-angle,
etc. See also orientation estimation, rotation representation, quaternions.
[FP:2.1.2]

rotation operator: A linear operator expressed by a rotation matrix.
[JKS:12.2.1]

rotation representation: A formalism describing rotations and their algebra.
The most frequent is definitely the rotation matrix, but quaternions,
Euler angles, yaw-pitch-roll, rotation angles around the coordinate axes, and
axis-angle, etc. have also been used. [BKPH:18.10]

rotational symmetry: The property of a set of points or an object to remain
unchanged after a given rotation. For instance, a cube has several rotational
symmetries, with respect to any 90 degree rotation around any axis passing
through the centers of opposite faces. See also rotation, rotation matrix.
[WP:Rotational symmetry]

RS-170: The standard black-and-white video format in the United States. The
EIA (Electronic Industries Association) is the standards body that originally
defined the 525-line, 30 frame per second TV standard for North America,
Japan, and a few other parts of the world. The EIA standard, also defined
under US standard RS-170A, defines only the monochrome picture component but
is mainly used with the NTSC color encoding standard. A version exists for
PAL cameras. [LG:4.1.3]

rubber sheet model: See membrane model.
[WP:Gravitational well#The rubber-sheet model]

rule-based classification: A method of object recognition drawn from
artificial intelligence in which logical rules are used to infer object type.
[WP:Concept learning#Rule-Based Theories of Concept Learning]

run code: See run length coding. [AJ:9.7]

run length coding: A lossless compression technique used to reduce the size
of a repeating string of

characters, called a run, also applicable to images. The algorithm encodes a
run of symbols into two bytes, a count and a symbol. For instance, the 6-byte
string "xxxxxx" would become "6x", occupying 2 bytes only. It can compress
any type of information content, but the content itself obviously affects the
compression ratio. Compression ratios are not high compared to other methods,
but the algorithm is easy to implement and quick to execute. Run-length
coding is supported by bitmap file formats such as TIFF, BMP and PCX. See
also image compression, video compression, JPEG. [AJ:11.2, 11.9]

run length compression: See run length coding. [AJ:11.2, 11.9]
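The count-and-symbol scheme described under run length coding can be sketched
as follows. This is a minimal illustration, not the encoding used by any
particular file format: the function names are hypothetical, and the choice
to cap each run at 255 so that the count fits in one byte is an assumption.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run as two bytes: a count (1..255) and the symbol."""
    out = bytearray()
    i = 0
    while i < len(data):
        symbol = data[i]
        run = 1
        # Extend the run while the next byte matches and the count fits a byte.
        while i + run < len(data) and data[i + run] == symbol and run < 255:
            run += 1
        out += bytes([run, symbol])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding every (count, symbol) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, symbol = encoded[k], encoded[k + 1]
        out += bytes([symbol]) * count
    return bytes(out)


packed = rle_encode(b"xxxxxx")          # six bytes become one (count, symbol) pair
roundtrip = rle_decode(rle_encode(b"aaabccdddd"))
expanded = rle_encode(b"abc")           # no runs: the output is longer than the input
print(len(packed), roundtrip, len(expanded))
```

The last call shows why the content affects the compression ratio: data
without repeated symbols doubles in size under this scheme.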

saccade: A movement of the eye or camera, changing the direction of fixation
sharply. [WP:Saccade]

saliency map: A representation encoding the saliency of given image elements,
typically features or groups thereof. See also salient feature, Gestalt,
perceptual grouping, perceptual organization. [WP:Salience (neuroscience)]

salient feature: A feature associated with a high value of a saliency
measure, quantifying feature suggestiveness for perception (from the Latin
salire, to leap). For instance, inflection points have been indicated as
salient features for representing contours. Saliency is a concept that
originated in Gestalt psychology. See also perceptual grouping,
perceptual organization.

salt-and-pepper noise: A type of impulsive noise. Let $x, y \in [0, 1]$ be
two uniform random variables, $I$ the true image value at a given pixel, and
$I_n$ the corrupted (noisy) version of $I$. We can define the effect of
salt-and-pepper noise as $I_n = i_{min} + y(i_{max} - i_{min})$ iff
$x \geq l$, where $l$ is a parameter controlling how much of the image is
corrupted, and $i_{min}$, $i_{max}$ the range of the noise. See also
image noise, Gaussian noise. This image was corrupted with 1% noise
[TV:3.1.2]:

[Figure: image corrupted with 1% salt-and-pepper noise]
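The noise model in the salt-and-pepper noise entry can be sketched as below.
The sketch assumes that a pixel is corrupted when x >= l, so that roughly a
fraction 1 - l of the pixels is replaced; the function name, the list-of-rows
image representation and the seed parameter are illustrative assumptions.

```python
import random


def salt_and_pepper(image, l, i_min=0, i_max=255, seed=None):
    """Return a noisy copy of image (a list of rows of numbers).

    For every pixel two uniform samples x, y in [0, 1) are drawn. If x >= l
    (assumed corruption condition) the pixel becomes
    i_min + y * (i_max - i_min); otherwise it keeps its true value.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for value in row:
            x, y = rng.random(), rng.random()
            if x >= l:
                new_row.append(i_min + y * (i_max - i_min))
            else:
                new_row.append(value)
        noisy.append(new_row)
    return noisy


clean = [[128] * 100 for _ in range(100)]        # flat gray test image
noisy = salt_and_pepper(clean, l=0.99, seed=0)   # about 1% of pixels corrupted
changed = sum(v != 128 for row in noisy for v in row)
print(changed, "of", 100 * 100, "pixels changed")
```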


sampling: The transformation of a continuous signal into a discrete one by
recording its values at discrete instants or locations. Most digital images
are sampled in space, time and intensity, as intensity values are defined
only on a regular spatial grid, and can only take integer values. This shows
an example of a continuous signal and its samples [FP:7.4.1]:

[Figure: a continuous signal and its samples]

sampling density: The density of a sampling grid, that is, the number of
samples collected per unit interval. See also sampling. [BB:2.2.6]

sampling theorem: If an image is sampled at a rate higher than its
Nyquist frequency then an analog image could be reconstructed from the
sampled image whose mean square error with the original image converges to
zero as the number of samples goes to infinity. [AJ:4.2]

Sampson approximation: An approximation to the geometric distance in the
fitting of implicit curves or surfaces that are defined by a parameterized
function of the form $f(\vec{a}; \vec{x}) = 0$ for $\vec{x}$ on the surface
$S(\vec{a})$ defined by parameter vector $\vec{a}$. Fitting the surface to
the set of points $\{\vec{x}_1, \ldots, \vec{x}_n\}$ consists in minimizing a
function of the form $e(\vec{a}) = \sum_{i=1}^{n} d(\vec{x}_i, S(\vec{a}))$.
Simple solutions are often available if the distance function
$d(\vec{x}, S(\vec{a}))$ is the algebraic distance
$d(\vec{x}, S(\vec{a})) = f(\vec{a}; \vec{x})^2$, but under certain common
assumptions, the optimal solution arises when $d$ is the more complicated
geometric distance
$d(\vec{x}, S(\vec{a})) = \min_{\vec{y} \in S} \|\vec{x} - \vec{y}\|^2$. The
Sampson approximation defines

$$d(\vec{x}, S(\vec{a})) = \frac{f(\vec{a}; \vec{x})^2}{\|\nabla f(\vec{a}; \vec{x})\|^2}$$

which is a first-order approximation to the geometric distance. If an
efficient algorithm for minimizing weighted algebraic distance is available,
then the Sampson iterations are a further approximation, where the $k$th
iterate $\vec{a}_k$ is the solution to

$$\vec{a}_k = \arg\min_{\vec{a}} \sum_{i=1}^{n} w_i \, f(\vec{a}; \vec{x}_i)^2$$

with weights computed using the previous estimate so
$w_i = 1/\|\nabla f(\vec{a}_{k-1}; \vec{x}_i)\|^2$. [HZ:3.2.6, 11.4]

SAR: See synthetic aperture radar. [WP:Synthetic aperture radar]

SAT: See symmetric axis transform. [VSN:9.2.2]

satellite image: An image of a section of the Earth acquired using a camera
mounted on an orbiting satellite. [WP:Satellite imagery]

saturation: Reaching the upper limit of a dynamic range. For instance,
intensity saturation occurs for an 8-bit monochromatic image when intensities
greater than 255 are recorded: any such value is encoded as 255, the largest
possible value in the range. [WP:Saturation (color theory)]

Savitzky-Golay filtering: A class of filters achieving least squares fitting
of a polynomial to a moving window of a signal. Used for fitting and data
smoothing. See also linear filter, curve fitting.
[WP:Savitzky-Golay smoothing filter]
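To make the Savitzky-Golay filtering entry concrete, the sketch below applies
the widely tabulated 5-point quadratic smoothing weights
(-3, 12, 17, 12, -3)/35, which are what a least squares fit of a quadratic to
each 5-sample window reduces to at the window center. The function name and
the decision to leave the two samples at each end unchanged are assumptions
made for brevity.

```python
# Numerators of the 5-point quadratic smoothing weights; the divisor is 35.
SG_5_QUADRATIC = (-3, 12, 17, 12, -3)


def savgol_smooth_5(signal):
    """Smooth a 1D sequence with a 5-point quadratic Savitzky-Golay window.

    Interior samples are replaced by the value at the window center of the
    least squares quadratic through the 5 surrounding samples. The first and
    last two samples are copied unchanged (a simplifying assumption).
    """
    smoothed = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_5_QUADRATIC, window)) / 35
    return smoothed


# A quadratic is reproduced exactly, since the window polynomial fits it.
quadratic = [t * t - 3 * t + 2 for t in range(10)]
same = savgol_smooth_5(quadratic)

# A single spike is spread over its neighbors according to the weights.
spike = savgol_smooth_5([0, 0, 0, 35, 0, 0, 0])
print(same)
print(spike)
```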

representing simplifications of the finer

scalar: A one dimensional entity; a real ones. The finest scale is the input image
number. [ WP:Scalar (mathematics)] itself. See scale space representation
for details. [ CS:5]
scale: 1) The ratio between the size of
an object, image, or feature and that of scale space filtering: The filtering
a reference or model. 2) The property operation that transforms one
that some image features are apparent resolution level into another in a
only when viewed at a given size, such scale space , for instance Gaussian
as a line being enlarged so much that it filtering. [ RJS:7]
appears as a pair of parallel edge
features. 3) A measure of the degree to scale space matching: A class of
which fine features have been removed matching techniques that compare
or reduced in an image. One can shape at various scales. See also
analyze images at multiple spatial scale space and image matching .
scales, whereby only features in certain [ CS:5.2.3]
size ranges appear at each scale (see
scale space and pyramid). [VSN:3.1.2]

scale invariant: A property that keeps the same value even if the data, the
image or the scene from which the data comes is shrunk or enlarged. The ratio
$\mathrm{perimeter}^2 / \mathrm{area}$ is invariant to image scaling.
[WP:Scale invariance]

scale operator: An operator suppressing details (high-frequency contents) in
an image, e.g., Gaussian smoothing. Details at small scales are discarded.
The resulting content can be represented in a smaller-size image. See also
scale space, image pyramid, Gaussian pyramid, Laplacian pyramid,
pyramid transform.

scale reduction: The result of the application of a scale operator.

scale space representation: A representation of an image, and more generally
of a signal, making explicit the information contained at multiple spatial
scales, and establishing a causal relationship between adjacent scale levels.
The scale level is identified by a scalar parameter, called scale parameter.
A crucial requirement is that coarser levels, obtained by successive
applications of a scale operator, should constitute simplifications of
previous (finer) levels, i.e., introduce no spurious details. A popular scale
space representation is the Gaussian scale space, in which the next coarser
image is obtained by convolving the current image with a Gaussian kernel. The
variance of this kernel is the scale parameter. See also scale space,
image pyramid, Gaussian smoothing. [CS:5.3]
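A Gaussian scale space of the kind described under
scale space representation can be sketched for a 1D signal as follows. This
is an illustration under stated assumptions: the kernel is a sampled Gaussian
truncated at three standard deviations, border samples are replicated, and
the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Sampled, normalized Gaussian truncated at about 3 sigma (assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve a 1D signal with the Gaussian kernel, replicating the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)   # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def scale_space(signal, sigma, levels):
    """Level 0 is the input; each coarser level smooths the previous one."""
    stack = [list(signal)]
    for _ in range(levels):
        stack.append(smooth(stack[-1], sigma))
    return stack


signal = [0, 0, 10, 0, 0, 5, 5, 5, 0, 9, 0, 0]
stack = scale_space(signal, sigma=1.0, levels=4)
ranges = [max(level) - min(level) for level in stack]
print(ranges)
```

Because each output sample is a weighted average of input samples, the value
range can only shrink from one level to the next, which is one simple way in
which coarser levels are simplifications of finer ones.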
scaling: 1) The process of zooming or
scale space: A theory for early vision shrinking an image. 2) Enlarging or
developed to account properly for the shrinking a model to fit a set of data.
multi-scale nature of images. The 3) The process of transforming a set of
rationale is that, in the absence of a values so that they lie inside a standard
priori information on the optimal range (e.g., [1,1]), often to improve
spatial scale at which a specific numerical stability. [ VSN:6.2.1]
problem should be treated (e.g., edge
detection), images should be analyzed scanline: A single (horizontal) line of
at all possible scales, the coarser ones an image. Originally this term was used

for cameras in which the image is

acquired line by line by a sensing scattergram: See scatterplot .
element that generally scans each pixel [ DH:1.2]
on a line and then moves onto the next
scatterplot: A data display technique
line. [ WP:Scan line]
in which each data item is plotted as a
scanline slice: The cross section of a single point in an appropriate
structure along an image scanline . For coordinate system , that might help a
instance, the scanline slice of a convex person to better understand the data.
polygon in a binary image is: For example, if a set of estimated
surface normals is plotted in a 3D
scatterplot, then planar surfaces should
produce tight clusters of points. The
figure shows a set of data points plotted
1 according to their values of features 1
and 2 [ DH:1.2]:
scanline stereo matching: The
stereo matching problem with rectified
images, whereby corresponding points
lie on scanlines with the same index.
See also rectification ,
stereo correspondence .

[Figure: scatterplot with horizontal axis FEATURE 1]

scanning electron microscope (SEM): A scientific microscope
introduced in 1942. It uses a beam of scene: The part of 3D space captured
highly energetic electrons to examine by an imaging sensor, and every visible
objects on a very fine scale. The object therein. [ RJS:1]
imaging process is essentially the same
as for a light microscope apart from the scene analysis: The process of
type of radiation used. Magnification is examining an image or video, for the
much higher than what can be achieved purpose of inferring information about
with light. The images are rendered in the scene in view, such as the shape of
gray shades. This technique is the visible surfaces, the identity of the
particularly useful for investigating objects in the scene, and their spatial
microscopic details of surfaces. or dynamic relationships. See also
[ BKPH:11.1.3] shape from contour and the following
shape from entries,
object recognition, and symbolic object representation. [RJS:6,7]

scene constraint: Any constraint imposed on the image data by the nature of
the scene, for instance, rigid motion, or the orthogonality of walls and
floors, etc. [HZ:9.4.1-9.4.2]

scatter matrix: For a set of $d$-dimensional points represented as column
vectors $\{\vec{x}_1, \ldots, \vec{x}_n\}$, with mean
$\vec{\mu} = \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i$, the scatter matrix is the
$d \times d$ matrix

$$S = \sum_{i=1}^{n} (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^\top$$

It is $(n - 1)$ times the sample covariance matrix. [DH:4.10]
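The scatter matrix definition can be checked numerically with a small sketch
such as the following. Points are given as tuples, the matrix is returned as
a list of rows, and the function name is an illustrative assumption.

```python
import statistics


def scatter_matrix(points):
    """Sum over all points of (x - mean)(x - mean)^T, as a d x d list of rows."""
    n = len(points)
    d = len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    S = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[k] - mean[k] for k in range(d)]
        for r in range(d):
            for c in range(d):
                S[r][c] += diff[r] * diff[c]   # outer product, accumulated
    return S


pts = [(1.0, 2.0), (3.0, 6.0), (5.0, 10.0)]
S = scatter_matrix(pts)

# Dividing a diagonal entry by (n - 1) gives the sample variance of that
# coordinate, in line with the (n - 1) relation to the sample covariance.
var_x = S[0][0] / (len(pts) - 1)
print(S, var_x, statistics.variance([p[0] for p in pts]))
```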

scene coordinates: A 3D coordinate system that describes the position of
scene objects relative to a given coordinate system origin. Alternative
coordinate systems are camera coordinates, viewer centered coordinates or
object centered coordinates. [JKS:1.4.2]

screw motion: A 3D transformation comprising a rotation about an axis
$\vec{a}$ and translation along $\vec{a}$. The general Euclidean
transformation $\vec{x} \mapsto R\vec{x} + \vec{t}$ is a screw transformation
if $R\vec{t} = \vec{t}$. [VSN:8.2.1]

search tree: A data structure that
records the choices that could be made
scene labeling: The problem of in a problem-solving activity, while
identifying scene elements from image searching through a space of alternative
data, associating them to labels choices for the next action or decision.
representing their nature and roles. See The tree could be explicitly created or
also labeling problem, region labeling , be implicit in the sequence of actions.
relaxation labeling , For example, a tree that records
image interpretation , alternative model-to-data feature
scene understanding . [ BB:12.4] matching is a specialized search tree
called an interpretation tree . If each
scene reconstruction: The problem non-leaf node has two children, we have
of estimating the 3D geometry of a a binary search tree. See also
scene, for example the shape of visible decision tree , tree classifier .
surfaces or contours, from image data. [ DH:12.4.1]
See also reconstruction ,
shape from contour and the following shape from entries or
architectural model, volumetric, surface and slice based reconstruction.
[WP:Computer vision#Scene reconstruction]

scene understanding: The problem of constructing a semantic interpretation of
a scene from image data, that is, describing the scene in terms of object
identities and relationships among objects. See also image interpretation,
object recognition, symbolic object representation, semantic net,
graph model, relational graph.

SCERPO: Spatial Correspondence, Evidential Reasoning and Perceptual
Organization. A well known vision system developed by David Lowe that
demonstrated recognition of complex polyhedral objects (e.g., razors) in a
complex scene.

SECAM: SECAM (Sequential Couleur avec Memoire) is the television broadcast
standard in France, the Middle East, and most of Eastern Europe. SECAM
broadcasts 625 lines per frame at 25 frames per second. It is one of three
main television standards throughout the world, the other two being PAL (see
PAL camera) and NTSC. [AJ:4.1]

second derivative operator: A linear filter estimating the second derivative
from an image at a given point and in a given direction. Numerically, a
simple approximation of the second derivative of a 1D function $f$ is the
central (finite) difference, derived from the Taylor approximation of $f$:

$$f''_i = \frac{f_{i+1} - 2 f_i + f_{i-1}}{h^2} + O(h^2)$$

where $h$ is the sampling step (assumed constant), and $O(h^2)$ indicates
that the truncation error vanishes as $h^2$ when $h$ tends to zero. A similar
but more complicated approximation exists for estimating the

second derivative in a given direction in an image. See also
first derivative filter. [JKS:5.3]

See also camera calibration, autocalibration, stratification,
projective geometry. [FP:13.6]
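The central difference given under second derivative operator can be tried
out on sampled 1D functions with a sketch like the one below. The function
name and the test functions are illustrative assumptions; only interior
samples are estimated, since the formula needs a neighbor on each side.

```python
import math


def second_derivative(samples, h):
    """Central-difference estimate (f[i+1] - 2 f[i] + f[i-1]) / h^2.

    samples holds f at equally spaced positions with step h; the result has
    one estimate per interior sample.
    """
    return [(samples[i + 1] - 2 * samples[i] + samples[i - 1]) / (h * h)
            for i in range(1, len(samples) - 1)]


h = 0.1
xs = [k * h for k in range(21)]

# For a cubic the estimate agrees with the exact value 6x up to rounding.
cubic_est = second_derivative([x ** 3 for x in xs], h)
cubic_err = max(abs(e - 6 * x) for e, x in zip(cubic_est, xs[1:-1]))

# For sin(x) the exact second derivative is -sin(x); a small error remains.
sine_est = second_derivative([math.sin(x) for x in xs], h)
sine_err = max(abs(e + math.sin(x)) for e, x in zip(sine_est, xs[1:-1]))
print(cubic_err, sine_err)
```

Halving h and repeating the sine experiment is a simple way to observe how
quickly the truncation error shrinks with the sampling step.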

second fundamental form: See self-localization: The problem of

surface curvature . [ FP:19.1.2] estimating the sensor's position within
an environment from image or video
seed region: The initial region used data. The problem can be cast as
in a region growing process such as geometric model matching if models of
surface fitting in range data or sufficiently complex objects are
intensity region finding in an available, i.e., containing enough points
intensity image . The patch on the to allow a full solution of the
surface here is a potential seed region pose estimation problem. In some
for region growing the full cylindrical situations it is possible to identify a
patch [ JKS:3.5]: sufficient number of landmark points
(see landmark detection ). If no
information at all is available about the
scene , one can still apply tracking or
optical flow techniques to get
corresponding points over time, or
stereo correspondences in multiple
simultaneous frames. See also
motion estimation , egomotion .
segmentation: The problem of
dividing a data set into parts according self-occlusion: Occlusion in which
to a given set of rules. The assumption part of an object is occluded by another
is that the different segments part of the same object. In the
correspond to different structures in the following example the left leg of the
original input domain observed in the person is occluding their right leg.
image. See for instance
image segmentation ,
color image segmentation ,
curve segmentation ,
motion segmentation ,
part segmentation ,
range data segmentation,
texture segmentation . [ FP:14-14.1.2]

self-calibration: The problem of SEM: See

estimating the calibration parameters scanning electron microscope .
using only information extracted from a [ BKPH:11.1.3]
sequence or set of images (typically
feature point correspondences in semantic net: A graph representation
subsequent frames of a sequence or in in which nodes represent the objects of
several simultaneous views), as opposed a given domain, and arcs properties and
to traditional calibration in relations between objects. See also
photogrammetry , that adopt specially symbolic object representation ,
built calibration objects. Self graph model , relational graph . A
calibration is intimately related with simple example: an arch and its
basic concepts of multi-view geometry . semantic net representation [ BB:10.2]:

from the sequence. A typical example is

ARCH image sequence stabilization , in which
a target moving across the image in the
original sequence appears stationary in
the output sequence. Another example
PART_OF is keeping a robot stationary in front of
SUPPORTS a target using only visual data (station
keeping). Suppression of jitter in
semantic region growing: A hand-held video recorders is now
region merging scheme incorporating a commercially available. Basic
priori knowledge about adjacent ingredients are tracking and
regions ; for instance, in aerial imagery motion estimation . See also egomotion.
of countryside areas, the fact that roads
are usually surrounded by fields.
Constraint propagation can then be sensor motion estimation: See
applied to achieve a globally optimal egomotion . [ FP:17.5.1]
region segmentation. See also
constraint satisfaction , sensor path planning: See
relaxation labeling , sensor planning.
region segmentation , sensor placement determination:
region based segmentation , See camera calibration and
recursive region growing . [ BB:5.5] sensor planning .
sensor: A general word for a sensor planning: A class of
mechanism that records information techniques aimed to determine optimal
from the outside world, generally for sensing strategies for a reconfigurable
processing by a computer. The sensor sensor system, normally given a task
might obtain raw measurements, e.g., a and a geometric model of the target
video camera, or partially processed object (that may be partially acquired
information, e.g., depth from a in previous views). For example, given
stereo triangulation process. [ BM:1.9] a geometric feature on an object for
which a CAD-like model is known, and
sensor fusion: A vast class of the task to verify the feature's size, a
techniques aiming to combine the sensor planning system would
different information contained in data determine the best position and
from different sensors, in order to orientation of, say, a single camera and
achieve a richer or more accurate associated illumination for estimating
description of a scene or action. Among the size of each feature. The two basic
the many paradigms for fusing sensory approaches have been generate-and-test,
information are the Kalman filter , in which sensor configurations are
Bayesian models , fuzzy logic , generated and then evaluated with
Dempster-Shafer evidential reasoning, respect to the task constraints, and
production systems and synthetic methods, in which task
neural networks . [ WP:Sensor fusion] constraints are characterized
analytically and the resulting equations
sensor motion compensation: A solved to yield the optimal sensor
class of techniques aiming to suppress configuration. See also active vision ,
the motion of a sensor (or its effects) in purposive vision .
a video sequence, or in data extracted

sensor position estimation: See pose estimation.
[WP:Pose (computer vision)#Pose Estimation]

sensor response: The output of a sensor, or a characterization of some key
output quantities, given a set of inputs. Typically expressed in the
frequency domain, as a function linking the magnitude and phase of the
Fourier transform of the output signal with the known frequency of the input.
See also phase spectrum, power spectrum, spectral response.

sensor sensitivity: In general, the weakest input signal that a sensor can
detect. It can be inferred from the sensor response curve. For the common CCD
sensor of video cameras, sensitivity depends on various parameters, mainly
the fill factor (the percentage of the sensor's area actually sensitive to
light) and well capacity (the amount of charge that a photosensitive element
can hold). The larger the values of the above parameters, the more sensitive
the camera. See also sensor spectral sensitivity. [WP:Sensor#Use]

sensor spectral sensitivity: A characterization of a sensor's response in
frequency. For example, the figure [Figure: spectral response curve of a CCD
sensor] shows the spectral sensitivity of a typical CCD sensor (actually its
spectral response, from which the spectral sensitivity can be inferred).
Notice that the high sensitivity of silicon in the infrared means that IR
blocking filters should be considered for fine measurements depending on
camera intensities. We also notice that a CCD camera makes a very good sensor
for the near-infrared range (750-1000 nm). [WP:Spectral sensitivity]

separability: A term used in classification problems referring to whether the
data is capable of being split into distinct subclasses by some automatic
decision process. If property values of two classes overlap, then the classes
are not separable. The circle class is linearly separable in the figure
below, but the x and box classes are not:

[Figure: scatter of circle, x and box class samples]

separable filter: A 2D (in image processing) filter that can be expressed as
the product of two filters, each of which acts independently on rows and
columns. The classic example is the linear Gaussian filter (see
Gaussian convolution). Separability implies a significant reduction in
computational complexity, typically reducing processing costs from $O(N^2)$
to $O(2N)$, where $N$ is the filter size. See also linear filter,
separable template.

separable template: A template or structuring element in a filter, for
instance a morphological filter (see morphology), that can be decomposed into
a sequence of smaller templates, similarly to separable kernels for
linear filters. The main advantage is a

reduction in the computational

complexity of the associated filter. See ATTACHED
also separable filter . SHADOW

set theoretic modeling: See CAST SHADOW

constructive solid geometry .
[ JKS:15.3.2]

shading: The pattern formed by the shadow, attached: A shadow caused

graded areas of an intensity image, by an object on itself by self-occlusion.
suggesting light and dark. Variations in See also shadow, cast. [ FP:5.3.1]
the lightness of surfaces in the scene
may be due to variations in shadow, cast: A shadow thrown by
illumination , surface orientation and an object on another object. See also
surface reflectance . See also shadow, attached . [ FP:5.3.1]
illumination , shadow . [ BKPH:10.10] shadow detection: The problem of

shading correction: A class of identifying image regions

techniques for changing undesirable corresponding to shadows in the scene,
shading effects , for instance strongly using photometric properties. Useful
uneven brightness distribution caused for true color estimation and region
by nonuniform illumination . All analysis. See also color ,
techniques assume a shading model, color image segmentation,
i.e., a photometric model of color matching , photometry ,
image formation , formalizing the region segmentation .
dependency of the measured image shadow type labeling: A problem
brightness on camera parameters similar to shadow detection , but
(typically gain and offset), illumination requiring classification of different types
and object reflectance . See also of shadows.
shadow , photometry .
shadow understanding: Estimating
shading from shape: A technique various properties of a 3D scene based
recovering the reflectance of isolated on the appearance or size of shadows,
objects given a single image and a e.g., building height. See also
geometric model , but not exactly the shadow type labeling .
inverse of the classic
shape from shading problem. See also shape: Informally, the form of an
photometric stereo . image or scene object. Typically
described in computer vision through
shadow: A part of a scene that direct geometric representations (see
illumination does not reach because of shape representation ), e.g., modeling
self-occlusion ( attached shadow or image contours with polynomials or
self-shadow) or occlusion caused by b-spline , or range data patches with
other objects (cast shadow ). Therefore, quadric surfaces. More formally,
this region appears darker than its definitions are: 1. (adj) The quality of
surroundings. See also an object that is invariant to changes of
shape from shading , the coordinate system in which it is
shading from shape , expressed. If the coordinate system is
photometric stereo [ FP:5.3.1]. See: Euclidean , this corresponds to the
conventional idea of shape. In an affine

coordinate system, the change of parameters) that must be calibrated.

coordinates may be affine, so that, for Depth is estimated using this model
example, an ellipse and a circle have once image readings (pixel values) are
the same shape. 2. (n) A family of available. Notice that the camera uses a
point sets, any pair being related by a large aperture, so that the points in the
coordinate system transformation. scene are in focus over the smallest
3. (n) A specific set of n-dimensional possible depth interval. See also
points, e.g., the set of squares. For shape from focus . [ E. Krotkov,
example a curve in $\mathbb{R}^2$ defined Focusing, Int. J. of Computer Vision,
parametrically as $\vec{c}(t) = (x(t), y(t))$ 1:223-237, 1987.]
comprises the point set or shape
$\{\vec{c}(t) \mid -\infty < t < \infty\}$. The volume shape from focus: A class of
inside the unit sphere in 3D is the shape algorithms for estimating scene depth
$\{\vec{x} \mid \|\vec{x}\| < 1, \vec{x} \in \mathbb{R}^3\}$. [ ZRH:2.3] at each image pixel, and therefore
surface shape, by varying the focus
shape class: One in a set of classes setting of a camera until the image
representing different types of shape in achieves optimal focus (minimum blur)
a given classification, for instance, in a neighborhood of the pixel under
locally convex or hyperbolic in examination. Obviously, pixels
HK segmentation of a range image . corresponding to different depths would
[ TV:4.4.1] achieve optimal focus for different
settings. A model of the relation
shape decomposition: See between depth and image focus is
segmentation and assumed, containing a number of
hierarchical modeling . [ FP:14-14.1.2] parameters (e.g., the optics parameters)
that must be calibrated. Notice that
shape from contours: A class of
the camera uses a large aperture, so
algorithms for estimating the shape of
that the smallest possible depth interval
a 3D object from the contour it
generates in-focus image points. See
generates in an image. A well-known
also shape from defocus . [ EK]
technique, shape from silhouettes ,
consists in extracting the object's shape from line drawings: A class
silhouette from a number of views, and of symbolic algorithms inferring 3D
intersecting the 3D cones generated by properties of scene objects (as opposed
the silhouettes' contours and the to exact shape measurements, as in
centers of projections. The intersection other shape from methods) from line
volume is known as the visual hull . drawings. First, assumptions are made
Work also exists on understanding about the type of line drawings
shape from the differential properties of admissible, e.g., polyhedral objects
apparent contours . [ JJK] only, no surface markings or shadows,
maximum three lines forming an image
shape from defocus: A class of
junction. Then, a dictionary of line
algorithms for estimating scene depth
junctions is formed, assigning a
at each image pixel, and therefore
symbolic label to every possible
surface shape, from multiple images
appearance of the line junctions in
acquired at different, controlled focus
space under the given assumptions.
settings. A closed-form model of the
This figure shows part of a simple
relation between depth and image focus
dictionary of junctions and a labeled
is assumed, containing a number of
parameters (e.g., the optics

as dense motion fields, i.e.,

optical flow, seeking to reconstruct
dense surfaces. See also
motion factorization . [ JKS:11.3]
shape from multiple sensors: A
class of algorithms recovering shape
from information collected from a
number of sensors of the same type, or
of different types. For the former class,
see multi-view stereo . For the second
class, see sensor fusion .
shape from optical flow: See
optical flow .
[Figure: part of a junction dictionary and a labeled shape]
where + means planes intersecting in a
convex shape, − in a concave shape, shape from orthogonal views: See
and the arrows a discontinuity shape from contours . [ JJK]
(occlusion) between surfaces. Each shape from perspective: A class of
image junction is then assigned the set techniques estimating depth for various
of all possible labels that its shape features from perspective cues, for
admits locally (e.g., all possible two-line instance the fact that a translation
junction labels for a two-line junction). along the optical axis of a
Finally, a constraint satisfaction perspective camera changes the size of
algorithm is used to prune labels the imaged objects. See also
inconsistent with the context. See also pinhole camera model . [ SQ:9A.2.1]
Waltz's line labeling ,
relaxation labeling . shape from photo consistency: A
technique based on space carving for
shape from monocular depth cues: recovering shape from multiple views
A class of algorithms estimating shape (photos). The basic constraint is that
from information related to depth the underlying shape must be
detected in a single image, i.e., from photo-consistent with all the input
monocular cues. See photos, i.e., roughly speaking, give rise
shape from contours , to compatible intensity values in all
shape from line drawings , cameras.
shape from perspective ,
shape from shading , shape from photometric stereo:
shape from specularity, See photometric stereo . [ JKS:11.3]
shape from structured light ,
shape from texture . shape from polarization: A
technique recovering local shape from
shape from motion: A vast class of the polarization properties of a surface
algorithms for estimating 3D shape under observation. The basic idea is to
(structure), and often depth , from the illuminate a surface with known
motion information contained in an polarized light , estimate the
image sequence. Methods exist that polarization state of the reflected light,
rely on tracking sparse sets of then use this estimate in a closed-form
image features (for instance, the model linking the surface normals with
Tomasi-Kanade factorization ) as well

the measured polarization parameters. shape from texture: The problem of

In practice, polarization estimates can estimating shape, here in the sense of a
be noisy. This method can be useful field of normals from which a surface
wherever intensity images do not can be recovered up to a scale factor,
provide information, e.g., featureless from the image texture . The
specular surfaces. See also deformation of a planar texture
polarization based methods . recorded in an image (the
texture gradient ) depends on the shape
shape from shading: The problem of of the surface to which the texture is
estimating shape, here in the sense of a applied. Techniques exist for shape
field of normals from which a surface estimation from statistical texture and
can be recovered up to a scale factor, regular texture patterns. [ FP:9.4-9.5]
from the shading pattern (light and
shadows) of an image. The key idea is shape from X: A generic term for a
that, assuming a reflectance map for method that generates 3D shape or
the scene (typically Lambertian), an position estimates from one of a variety
image irradiance equation can be of possible techniques, such as stereo ,
written linking the surface normals to shading , focus , etc. [ TV:9.1]
the illumination direction and the
image intensity. The constraint can be shape from zoom: The problem of
used to recover the normals assuming computing shape (in the sense of the
local surface smoothness. [ JKS:9.4] distance of each scene point from the
sensor) from two or more images
shape from shadows: A technique acquired at different zoom settings,
for recovering geometry from a number achieved through a zoom lens . The
of images of an outdoor scene acquired basic idea is to differentiate the
at different times, i.e., with the sun at projection equations with respect to the
different angles. Geometric information focal length , f , achieving an expression
can be recovered under various linking the variations of f and pixel
assumptions and knowledge of the sun's displacement with depth . [ J. Ma, and
position. Also called shape from S. I. Olsen, Depth from zooming, J.
darkness. See also shape from shading Optical Society of America A, Vol. 7,
and photometric stereo. pp 1883-1890, 1990.]

shape from silhouettes: See shape grammar: A grammar

shape from contours . [ JJK] specifying a class of shapes , whose
rules specify patterns for combining
shape from specularity: A class of more primitive shapes. Rules are
algorithms for estimating local shape composed of two parts, 1) describing a
from surface specularities. A specific shape and 2) how to replace or
specularity constrains the surface transform it. Used also in design, CAD,
normal as the incident and reflection and architecture. See also
angles must coincide. The detection of production system , expert system .
specularities in images is, in itself, a [ BB:6.3.2]
non-trivial problem.
shape index: A measure, usually
shape from structured light: See indicated by S, of the type of shape of
structured light triangulation . a surface patch in terms of its
[ JKS:11.4.1]

principal curvature . Formally

$$S = -\frac{2}{\pi} \arctan \frac{\kappa_M + \kappa_m}{\kappa_M - \kappa_m}$$

where $\kappa_m$ and $\kappa_M$ are the principal
curvatures. S is undetermined for
planar patches. A related parameter,
R, called curvedness, measures the
amount of curvedness of the patch:

$$R = \sqrt{(\kappa_M^2 + \kappa_m^2)/2}$$

All curvature-based shape classes map
to the unit circle in the R-S plane, with
planar patches at the origin. See also
mean and Gaussian curvature
shape classification ,
shape representation .

shape texture: The texture of a
surface from the point of view of the
variation in the shape, as contrasted to
the variation in the reflectance
patterns on the surface. See also
surface roughness characterization .

sharp-unsharp masking: A form of
image enhancement that makes the
edges of image structures crisper. The
operator can either add a weighted
amount of a gradient or
high-pass filter of the image or subtract
a weighted amount of a smoothing or
low pass filter of the image. The image
on the right is an unsharp masked
version of the one on the left
[ SEU:4.3]:
[Figure: an image and its unsharp masked version]
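
The unsharp masking operation described in the sharp-unsharp masking entry can be sketched in 1D as follows. This is only an illustration under stated assumptions: the low-pass filter is taken to be the kernel [1/4, 1/2, 1/4] with replicated end samples, and the names `smooth`, `unsharp_mask` and `amount` are hypothetical.

```python
def smooth(signal):
    """Low-pass filter with the kernel [1/4, 1/2, 1/4]; end samples are replicated."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [
        0.25 * padded[i] + 0.5 * padded[i + 1] + 0.25 * padded[i + 2]
        for i in range(len(signal))
    ]


def unsharp_mask(signal, amount=1.0):
    """Add `amount` times the high-pass residual (signal - smoothed) to the signal."""
    low = smooth(signal)
    return [x + amount * (x - l) for x, l in zip(signal, low)]


# A step edge: after masking, the values on either side of the step
# undershoot and overshoot, which makes the edge look crisper.
step = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
sharpened = unsharp_mask(step, amount=1.0)
```

Adding `amount` times the high-pass residual is the same as computing (1 + amount) times the signal minus `amount` times the smoothed signal, which is the "subtract a weighted low-pass" form mentioned in the entry.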

shape magnitude class: Part of a

local surface curvature representation
scheme in which each point has a
curvature class , and a magnitude of
curvature (shape magnitude). This
representation is an alternative to the
more common shape classification based
on either the two principal curvatures
or the mean and Gaussian curvature .

shape representation: A large class shear transformation: An affine

of techniques seeking to capture the image transformation changing one
salient properties of shapes, both 2D coordinate only. The corresponding
and 3D, for analysis and comparison transformation matrix, S, is equal to
purposes. Many representations have the identity apart from s12 = sx , which
been proposed in the literature, changes the first image coordinate.
including skeletons for 2D and 3D Shear on the second image coordinate is
shapes (see medial axis skeletonization obtained similarly by s21 = sy . An
and distance transform ), example of the result of a shear
curvature-based representations (for transformation is [ SQ:9.1]:
instance, the curvature primal sketch ,
the curvature scale space , the
extended Gaussian image ),
generalized cones for articulated
objects, invariants , and flexible objects
models (for instance snakes ,
deformable superquadrics , and
deformable template model ).
[ ZRH:2.3] shock tree: A 2D
shape representation technique based

on the singularities (see purposes. See image compression ,

singularity event ) of the radius function digital watermarking .
along the medial axis (MA). The MA is
represented by a tree with the same signal processing: The collection of
structure, and is divided into mathematical and computational tools
continuous segments of uniform for the analysis of typically 1D (but
behavior (local maximum, local also 2D, 3D, etc.) signals such as audio
minimum, constant, monotonic). See recordings or other intensity versus
also medial axis skeletonization , time or position measurements. Digital
distance transform . signal processing is the subset of signal
processing which pertains to signals
short baseline stereo: See that are represented as streams of
narrow baseline stereo . binary digits. [ WP:Signal processing]

shot noise: See impulse noise and signal-to-noise ratio (SNR): A

salt-and-pepper noise . [ FP:1.4.2] measure of the relative strength of the
interesting and uninteresting (noise)
shutter: A device allowing the light part of a signal. In signal processing,
into a camera for enough time to form SNR is usually expressed in decibels as
an image on a photosensitive film or the ratio of the power of signal and
chip. Shutters can be mechanical, as in noise, i.e., $10 \log_{10} (P_s / P_n)$. With statistical
traditional photographic cameras, or noise, the SNR can be defined as 10
electronic, as in a digital camera . In times the log of the ratio of the
the former case, a window-like variances of signal and noise.
mechanism is opened to allow the light [ AJ:3.6]
to be recorded by a photosensitive film.
In the latter case, a CCD or other type signature identification: A class of
of sensor is triggered electronically to techniques for verifying a written
record the amount of incident light at signature. Also known as Dynamic
each pixel. Signature Verification. An area of
[ WP:Shutter (photography)] biometrics . See also
handwriting verification ,
shutter control: The device handwritten character recognition ,
controlling the length of time that the fingerprint identification ,
shutter is open. face identification.
[ WP:Exposure (photography)#Exposure control]
[ WP:Handwriting recognition]

signature verification: The problem

side looking radar: A radar of authenticating a signature
projecting a fan-shaped beam automatically with image processing
illuminating a strip of the scene at the techniques; in practice, deciding
side of the instrument, typically used whether a signature matches a
for mapping a large area. The map is specimen sufficiently well. See also
produced as the instrument is carried handwriting verification and
along by a vehicle sweeping the surface handwritten character recognition .
to the side. See also sonar . [ WP:Handwriting recognition]
signal coding system: A system for silhouette: See object contour .
encoding a signal into another, [ FP:19.2]
typically for compression or security

single instruction multiple data . simple lens: A lens composed by a
[ RJS:8] single piece of refracting material,
shaped in such a way to achieve the
similarity: The property that makes desired lens behavior. For example, a
two entities (images, models, objects, convex focusing lens. [ BKPH:2.3]
features, shape, intensity values, etc.)
or sets thereof similar, that is, simulated annealing: A
resembling each other. A coarse-to-fine, iterative optimization
similarity transformation creates algorithm. At each iteration, a
perfectly similar structures and a smoothed version of the energy
similarity metric quantifies the degree landscape is searched and a global
of similarity of two possibly minimum located by a statistical (e.g.,
non-identical structures. Examples of random) process. The search is then
similar structures are 1) two polygons performed at a finer level of smoothing,
identical except for a change in size, and so on. The idea is to locate the
and 2) two image neighborhoods whose basin of the absolute minimum at
intensity values are identical except for coarse scales, so that fine-resolution
scaling by a multiplicative factor. The search starts from an approximate
concept of similarity lies at the heart of solution close enough to the absolute
several classic vision problems, minimum to avoid falling into
including stereo correspondence , surrounding local minima. The name
image matching , and derives from the homonymous
geometric model matching . procedure for tempering metal, in
[ JKS:14.3] which temperature is lowered in stages,
each time allowing the material to
similarity metric: A metric reach thermal equilibrium. See also
quantifying the similarity of two coarse-to-fine processing . [ SQ:2.3.3]
entities. For instance, cross correlation
is a common similarity metric for image single instruction multiple data
regions. For similarity metrics on (SIMD): A computer architecture
specific objects encountered in vision, allowing the same instruction to be
see feature similarity , graph similarity , simultaneously executed on multiple
gray scale similarity . See also processors and thus different portions of
point similarity measure , matching . the data set (e.g., different pixels or
[ DH:6.7] image neighborhoods). Useful for a
variety of low-level image processing
similarity transformation: A operations. See also MIMD ,
transformation changing an object into pipeline parallelism , data parallelism ,
a similar-looking one; formally, a parallel processing. [ RJS:8]
conformal mapping preserving the ratio
of distances (the magnification ratio). single photon emission computed
The transformation matrix, T, can be tomography (SPECT): A medical
written as $T = B^{-1} A B$, where T and imaging technique that involves the
A are similar matrices, that is, rotation of a photon detector array
representing the same transformation around the body in order to detect
after a change of basis. Examples photons emitted by the decay of
include rotation, translation, expansion previously injected radionuclides. This
and contraction (scaling). [ SQ:9.1] technique is particularly useful for
creating a volumetric image showing

metabolic activity. Resolution is lower
than PET but imaging is cheaper and
some SPECT radiopharmaceuticals
may be used where PET nuclides
cannot. [ WP:SPECT]

singular value decomposition
(SVD): A factorization of any $m \times n$
matrix A into $A = U D V^T$. The
columns of the $m \times m$ matrix U are
mutually orthogonal unit vectors, as are
the columns of the $n \times n$ matrix V. The
$m \times n$ matrix D is diagonal, and its
nonzero elements, the singular values
$\sigma_i$, satisfy $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n \geq 0$. The
SVD has extremely useful properties.
For example:
• A is nonsingular if and only if all
  its singular values are nonzero,
  and the number of nonzero
  singular values gives the rank of A;
• the columns of U corresponding
  to the nonzero singular values
  span the range of A; the columns
  of V corresponding to the
  zero singular values span the
  null space of A;
• the squares of the nonzero
  singular values are the nonzero
  eigenvalues of both $A A^T$ and
  $A^T A$, and the columns of U are
  eigenvectors of $A A^T$, those of V
  of $A^T A$.
Moreover, the pseudoinverse of a
matrix, occurring in the solution of
rectangular linear systems, can be
easily computed from the SVD
definition. [ FP:12.3.2]

singularity event: A point in the
domain of the map of a geometric curve
or surface where the first derivatives
vanish.
[ WP:Singular point of a curve]

sinusoidal projection: A family of
linear image transforms, C, the rows of
which are the eigenvectors of a special
symmetric tridiagonal matrix. This
includes the
discrete cosine transform (DCT) .
[ AJ:5.12]

skeleton: A curve, or tree-like set of
curves, capturing the basic structure of
an object. This figure shows an
example of a linear skeleton for a
puppet-like 2D shape:
[Figure: linear skeleton of a puppet-like 2D shape]
The curves forming the skeleton are
typically central to the shape. Several
algorithms exist for computing
skeletons, for instance, the medial axis
transform (see
medial axis skeletonization ) and the
distance transform , for which the
grassfire algorithm can be applied.
[ AJ:9.9]

skeleton by influence zones (SKIZ):
Commonly known as the
Voronoi diagram . [ SQ:7.3.2]

skeletonization: A class of techniques
that try to reduce a 2D (or 3D) binary
image to a skeleton form in which
every remaining pixel is a skeleton
pixel, but the essential shape of the
input image is captured. Definitions of
the skeleton include the set of centers of
circles bitangent to the object
boundary and
smoothed local symmetries . [ RJS:6]

skew: An error introduced in the
imaging geometry by a non-orthogonal

pixel grid, in which rows and columns

of pixels do not form an angle of exactly
90 degrees. This is usually considered
only in high-accuracy photogrammetry
applications. [ JKS:12.10.2]
[Figure: the slant angle between the surface normal and the direction of view]

skew correction: A transformation

compensating for the skew error.
[ JKS:12.10.2]

skew symmetry: A skew symmetric See also tilt , shape from texture .
contour is a planar contour such that [ FP:9.4.1]
every straight line oriented at an angle
with respect to a particular axis, slant normalization: A class of
called the skew symmetry axis of the algorithms used in handwritten
contour, intersects the contour at two character recognition, transforming
points equidistant from the axis. An slanted cursive characters into vertical
example [ BB:9.5.4]: ones. See
handwritten character recognition,
optical character recognition .

slice based reconstruction: The

reconstruction of a 3D object from a
number of planar slices, or sections
taken across the object. The slice plane
is typically advanced at regular spatial
intervals to sweep the working volume.
See also tomography ,
computerized tomography ,
single photon emission
computed tomography and
nuclear magnetic resonance .

slope density function: This is the

histogram of the tangential orientations
(slopes) of a curve or region boundary.
skin color analysis: A set of It can be used to represent the curve
techniques for color analysis applied to shape in a manner invariant to
images containing skin, for instance for translation and rotation (up to a shift
retrieving images from a database (see of the density function). [ BB:8.4.5]
color based image retrieval ). See also
color, color image , small motion model: A class of
color image segmentation , mathematical models representing very
color matching , and colorimetry . small (ideally, infinitesimal)
camera-scene motion between frames.
SKIZ: See skeleton by influence zones . Used typically in shape from motion .
[ SQ:7.3.2] See also optical flow .

slant: The angle between a smart camera: A hardware device

surface normal in the scene and the incorporating a camera and an
viewing direction: on-board computer in a single, small

container, thus achieving a signal-to-noise ratio of the image. See

programmable vision system within the also discontinuity preserving
size of a normal video camera. smoothing, anisotropic diffusion and
[ TV:2.3.1] adaptive smoothing . [ FP:7.1.1]

smooth motion curve: The curve smoothing filter: Smoothing is often

defined by a motion that can be achieved by convolution of the image
expressed by smooth (that is, with a smoothing filter to reduce noise
differentiable: derivatives of all orders or high spatial frequency detail. Such
exist) parametric functions of the image filters include discrete approximations
coordinates. Notice that smooth is to the symmetric probability densities
often used in an intuitive sense, not in such as the Gaussian , binomial and
the strict mathematical sense above uniform distributions. For example, in
(clearly, an exacting constraint), as, for 1D, the discrete signal $x_1 \ldots x_n$ is
example, in image smoothing . See also convolved with the kernel $[\frac{1}{6}\ \frac{4}{6}\ \frac{1}{6}]$ to
motion , motion analysis . produce the smoothed signal $y_1 \ldots y_{n+2}$
in which $y_i = \frac{1}{6} x_{i-1} + \frac{4}{6} x_i + \frac{1}{6} x_{i+1}$.
smoothed local symmetries: A class [ FP:7.1.1]
of skeletonization algorithms,
associated with Asada and Brady. smoothness constraint: An
Given a 2D curve that bounds a closed additional constraint used in data
region in the plane, the skeleton as interpretation problems. The general
computed by smoothed local principle is that results derived from
symmetries is the locus of chord nearby data must themselves have
midpoints of bitangent circles. similar values. Traditional examples of
Compare the symmetric axis transform. where the smoothness constraint can be
Two skeleton points as defined by applied are in shape from shading and
smoothed local symmetries are shown: optical flow . The underlying
observation that supports this
computational constraint is that the
observed real world surfaces and
motions are smooth almost everywhere.
[ JKS:9.4]

snake: A snake is the combination of a

deformable model and an algorithm for
fitting that model to image data. In
one common embodiment, the model is
a parameterized 2D curve , for example
smoothing: Generally, any a b-spline parameterized by its control
modification of a signal intended to points. Image data, which might be a
remove the effects of noise . Often used gradient image or 2D points, induces
to mean the attenuation of high forces on points on the snake that are
spatial frequency components of a translated to forces on the control
signal. As many models of noise have a points or parameters. An iterative
flat power spectral density (PSD), algorithm adjusts the control points
while natural images have a PSD that according to these forces and
decays toward zero at high spatial recomputes the forces. Stopping
frequencies, suppressing the high criteria, step lengths, and other issues
frequencies increases the overall

of optimization are all issues that must may be computed. See also
be dealt with in an effective snake. fuzzy morphology .
[ TV:5.4]
soft morphology: See
SNR: See signal-to-noise ratio . soft mathematical morphology .
[ AJ:3.6]
soft vertex: A point on a polyline
Sobel edge detector: An whose connecting line segments are
edge detector based on the almost collinear. Soft vertices may arise
Sobel kernels . The edge magnitude from segmentation of a smooth curve
image E is the square root of the sum into line segments. They are called
of squares of the convolution of the soft because they may be removed if
image with horizontal and vertical the segments of the polyline are
Sobel kernels, given by replaced by curve segments. [ JKS:6.6]
$E = \sqrt{(K_x * I)^2 + (K_y * I)^2}$. The
Sobel operator applied to the left image
gives the right image [ JKS:5.2.2]:
[Figure: an image and its Sobel edge magnitude image]

Sobel gradient operator: See
Sobel kernel. [ JKS:5.2.2]

Sobel kernel: A gradient estimation
kernel used for edge detection . The
horizontal kernel is the convolution of a
smoothing filter , s = [1, 2, 1] in the
horizontal direction and a
gradient operator d = [1, 0, −1] in the
vertical direction. The kernel

$$K_y = s * d = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}$$

highlights horizontal edges. The
vertical kernel $K_x$ is the transpose of
$K_y$. [ JKS:5.2.2]

solid angle: Solid angle is a property
of a 3D object: the amount of the unit
sphere's surface that the object's
projection onto the unit sphere
occupies. The unit sphere's surface area
is $4\pi$, so the maximum value of a solid
angle is $4\pi$ steradians [ FP:4.1.2]:
[Figure: solid angle]

source: An emitter of energy that
illuminates the vision system's sensors .
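
The Sobel kernel and Sobel edge detector entries above can be tied together with a short sketch. It builds the two kernels from the factors s = [1, 2, 1] and d = [1, 0, -1] and computes the edge magnitude E. Two simplifications are assumed here and are not part of the entries: border pixels are simply left at zero, and the kernel is applied without flipping (correlation rather than convolution), which changes only the sign of the responses and so leaves E unchanged. All function names are hypothetical.

```python
import math

# Separable factors from the entry: smoothing s and gradient d.
S = [1, 2, 1]
D = [1, 0, -1]

# K_y[r][c] = D[r] * S[c]: gradient down the rows, smoothing along the columns.
K_Y = [[d * s for s in S] for d in D]
# K_x is the transpose of K_y.
K_X = [list(col) for col in zip(*K_Y)]


def filter3x3(image, kernel):
    """Apply a 3x3 kernel at every interior pixel; border pixels stay 0 (assumption)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            )
    return out


def sobel_magnitude(image):
    """Edge magnitude E = sqrt(gx^2 + gy^2) from the two Sobel responses."""
    gx = filter3x3(image, K_X)
    gy = filter3x3(image, K_Y)
    return [[math.hypot(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]


# A vertical step edge: dark on the left, bright on the right.
step_image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(step_image)
```

On this step image only the two columns adjacent to the step respond. The flat regions on either side give zero magnitude, and the response comes entirely from K_x because every row is identical.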
soft mathematical morphology: An
extension of gray scale morphology in source geometry: See
which the min/max operations are light source geometry .
replaced by other rank operations e.g.,
replace each pixel in an image by the source image: The image on which an
90th percentile value in a $5 \times 5$ window image processing or an image analysis
centered at the pixel. Weighted ranks operation is based.

[Figure: source image and target image]

spatial angle: The area on a unit

sphere that is bounded by a cone with
its apex in the center of the sphere.
Measured in steradians. This is
source placement: See frequently used when analyzing
light source placement . luminance .
space carving: A method for creating
a 3D volumetric model from 2D
images. Starting from a voxel
representation in which a 3D cube is
marked occupied, voxels are removed [Figure: spatial angle]
if they fail to be photo-consistent in
the set of 2D images in which they
appear. The order in which the voxels
are processed is a key aspect of space
carving, as it allows otherwise
intractable visibility computations to
be avoided. [ K. N. Kutulakos, and S. spatial averaging: The pixels in the
M. Seitz, A Theory of Shape by Space output image are weighted averages of
Carving, Int. J. of Computer Vision, their neighboring pixels in the input
Vol. 38, pp 199-218, 2000.] image. Mean and Gaussian smoothing
are examples of spatial averaging.
space curve: A curve that may follow [ AJ:7.4]
a path in 3D space (i.e., it is not
restricted to lying in a plane). spatial domain smoothing: An
[ WP:Space curve#Topology] implementation of smoothing in which
each pixel is replaced by a value that is
space variant sensor: A sensor in directly computed from other pixels in
which the pixels are not uniformly the image. In contrast,
sampling the projected image data. For frequency domain smoothing first
example, a log-polar sensor has rings of processes all pixels to create a linear
pixels of exponentially increasing size as transformation of the image, such as a
one moves radially from the central point Fourier transform and expresses the
[ WP:Space Variant Imaging#Foveated sensors]: smoothing operation in terms of the
transformed image. [ RJS:4]

spatial frequency: The rate of spatial matched filter: See

repetition of intensities across an image. matched filter . [ ERD:10.4]
In a 2D image the space to which
spatial refers is the image's XY plane. spatial occupancy: A form of object
or scene representation in which a 3D
space is divided into a grid of voxels .
Voxels containing a part of the object
are marked as being occupied and other
voxels are marked as free space. This
representation is particularly useful for
tasks where properties of the object are
less important than simply the presence
and position of the object, as in robot
navigation. [ JKS:15.3.2]

spatial proximity: The distance

between two structures in real space (as
contrasted with proximity in a feature
This image has significant repetition at or property space). [ JKS:3.1]
a spatial frequency of 10 pixel$^{-1}$ in the spatial quantization: The conversion
horizontal direction. The 2D of a signal defined on an infinite
Fourier transform represents spatial domain to a finite set of
frequency contributions in all limited-precision samples. For example
directions, at all frequencies. A discrete the function $f(x, y): \mathbb{R}^2 \mapsto \mathbb{R}$ might be
approximation is efficiently computed quantized to the image g, of width w
using the fast Fourier transform (FFT). and height h defined as g(i, j):
[ EH:7.7] $\{1..w\} \times \{1..h\} \mapsto \mathbb{R}$. The value of a
spatial hashing: See spatial indexing . particular sample g(i, j) is determined
[ WP:Spatial index] by the point-spread function p(x, y),
and is given by
spatial indexing: 1) Conversion of a $g(i, j) = \int p(x - i, y - j) f(x, y)\,dx\,dy$.
shape to a number, so that it may be [ SEU:2.2.4]
quickly compared to other shapes.
Intimately linked with the computation spatial reasoning: Inference from
of invariants to spatial transformations geometric rather than symbolic or
and imaging distortions of the shape. linguistic information. See also
For example, a shape represented as a geometric reasoning .
collection of 2D boundary points might [ WP:Spatial reasoning]
be indexed by its compactness . 2) The spatial relation: An association of
design of efficient data structures for two or more spatial entities, expressing
search and storage of geometric the way in which such entities are
quantities. For example closest-point connected or related. Examples include
queries are made more efficient by the perpendicularity or parallelism of lines
computation of spatial indices such as or planes, and inclusion of one image
the Voronoi diagram , region in another. [ BKKP:5.8]
distance transform , k-D trees, or
Binary Space Partitioning (BSP) trees.
[ WP:Spatial index]
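To make sense 2 of spatial indexing concrete, the following is a small sketch of a k-D tree used for a closest-point query. It assumes a short list of points stored as tuples; build_kdtree, sq_dist and nearest are hypothetical names for this example and are not a library API.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-D tree, splitting on one coordinate axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], query) < sq_dist(best, query):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, best)
    return best
```

The pruning test in the last step is what makes the index worthwhile: whole subtrees are skipped when they cannot contain a closer point.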

[Figure (speckle): laser source, rough surface and imaging surface (e.g. CCD
array); beam interference gives light/dark spot.]

speckle reduction: Restoration of images corrupted with speckle noise, such as
laser or ultrasound images. [ AJ:8.13]

spatial resolution: The smallest separation between distinct signal features
that can be measured by a sensor. For a CCD camera, this is dictated by the
distance between adjacent pixel centers. It is often specified as an angle:
the angle between the 3D rays corresponding to adjacent pixels. Also defined as
the inverse of the highest spatial frequency that a sensor can represent
without aliasing. [ JKS:8.2]
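The angular form of spatial resolution can be worked out for a simple case. The sketch below assumes an ideal pinhole camera and two adjacent pixels that straddle the optical axis; the pinhole model, the function name and the example values are assumptions for illustration only.

```python
import math

def angular_resolution(pixel_pitch, focal_length):
    """Angle in radians between the rays through two adjacent pixel centres.

    Assumes an ideal pinhole camera with the two pixels placed symmetrically
    about the optical axis; pixel_pitch and focal_length share one unit.
    """
    return 2.0 * math.atan(pixel_pitch / (2.0 * focal_length))
```

For a hypothetical 5 micrometre pixel pitch and an 8 mm focal length this gives about 0.000625 radians, close to the small-angle value pixel_pitch / focal_length.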

spatio-temporal analysis: The analysis of moving images by processing that
operates on the 3D volume formed by the stack of 2D images in a sequence.
Examples include kinetic occlusion, the epipolar plane image (EPI) and
spatio-temporal autoregressive models (STAR).

special case motion: A subproblem of the general structure from motion
problem, where the camera motion is known to be constrained a priori. Examples
include planar motion, turntable motion or single-axis rotation, and pure
translation. In each case, the constrained motion simplifies the general
problem, yielding one or more of: closed-form solutions, greater efficiency,
increased accuracy. Similar benefits can be obtained from approximations such
as the affine camera and weak perspective.

SPECT: See single-photon emission computed tomography. [ WP:SPECT]

spectral analysis: 1) Analysis performed in either the spatial, temporal or
electromagnetic frequency domain. 2) Generally, any analysis that involves the
examination of eigenvalues. This is a nebulous concept, and consequently the
number of spectral techniques is large. Often equivalent to PCA.

spectral decomposition method: See spectral analysis.

spectral density function: See power spectrum. [ AJ:2.11]

spectral distribution: The spatial power spectrum or electromagnetic spectrum
distribution.
speckle: A pattern of light and dark spots superimposed on the image of a
scene that is illuminated by coherent light such as from a laser. Rough
surfaces in the scene change the path lengths and thus the interference
effects of different rays, so a fixed scene, laser and imager configuration
results in a fixed speckle pattern on the imaging surface. [ AJ:8.13]

spectral factorization: A method for designing linear filters based on
difference equations that have a given spectral density function when applied
to white noise. [ AJ:6.3]

spectral filtering: Modifying the light before it enters the sensor by using a
filter tuned to different spectral frequencies. A common use is with laser
sensing, in which the filter is chosen to pass only light at the laser's
frequency. Another usage is to eliminate ambient infrared light in order to
increase the sharpness of an

image (as most silicon-based sensors are also sensitive to infrared light).

spectral frequency: Electromagnetic or spatial frequency. [ EH:7.7]

spectral reflectance: See reflectance. [ JKS:9.1.2]

spectral response: The response $R$ of an imaging sensor illuminated by
monochromatic light of wavelength $\lambda$ is the product of the input light
intensity $I$ and the spectral response at that wavelength $s(\lambda)$, so
$R = I s(\lambda)$.

spectrum: A range of values such as the electromagnetic spectrum.
[ WP:Spectrum]

spherical harmonic: A function defined on the unit sphere of the form

$$Y_l^m(\theta, \phi) = \eta_{lm} P_l^m(\cos\theta) e^{im\phi}$$

is a spherical harmonic, where $\eta_{lm}$ is a normalizing factor, and
$P_l^m$ is an associated Legendre polynomial. Any real function defined on the
sphere $f(\theta, \phi)$ has an expansion in terms of the spherical harmonics
of the form

$$f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \alpha_{lm} Y_l^m(\theta, \phi)$$

that is analogous to the Fourier expansion of a function defined on the plane,
with the $\alpha_{lm}$ analogous to the Fourier coefficients. [Figure: polar
plots of the first ten spherical harmonics, for m = 0...2, l = 0...m. The
plots show $r = 1 + Y_l^m(\theta, \phi)$ in polar coordinates.] [ BB:9.2.3]

specular reflection: Mirror-like reflection or highlight. Formed when a light
source at 3D location $L$, surface point $P$, surface normal $N$ at that point
and camera center $C$ are all coplanar, and the angles $\angle LPN$ and
$\angle NPC$ are equal. [ FP:4.3.4-4.3.5]

[Figure: light source L, camera C and the surface.]

specularity: See specular reflection.

[ FP:4.3.4-4.3.5]
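The geometric condition for a specular reflection can be checked numerically. The sketch below tests whether the direction to the camera matches the mirror reflection of the direction to the light about the surface normal, which implies both the coplanarity and the equal-angle conditions. The function names and the tolerance tol are assumptions for this example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def is_specular(light, point, normal, camera, tol=1e-6):
    """True if the camera lies along the mirror direction of the light at point."""
    n = unit(normal)
    l = unit(sub(light, point))    # direction from the surface point to the light
    v = unit(sub(camera, point))   # direction from the surface point to the camera
    # Mirror reflection of l about n: r = 2 (n . l) n - l
    r = tuple(2.0 * dot(n, l) * ni - li for ni, li in zip(n, l))
    # The light must be on the outward side of the surface.
    return dot(n, l) > 0.0 and math.sqrt(dot(sub(r, v), sub(r, v))) <= tol
```

In practice a highlight is visible over a small range of directions around the mirror direction, so a looser tolerance than the one assumed here would normally be used.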

sphere: 1. A surface in any dimension defined by the $\vec{x}$ such that
$\|\vec{x} - \vec{c}\| = r$ for a center $\vec{c}$ and radius $r$. 2. The
volume of space bounded by the above, or $\vec{x}$ such that
$\|\vec{x} - \vec{c}\| \le r$. [ WP:Sphere]

spherical: Having the shape of, characteristics of, or associations with, a
sphere. [ WP:Spherical]

spherical mirror: Sometimes used in catadioptric cameras. A mirror whose shape
is a portion of a sphere. [ WP:Spherical mirror#Mirror shape]

spin image: A local surface representation of Johnson and Hebert. At selected
points $\vec{p}$ with surface normal $\vec{n}$, all other surface points
$\vec{x}$ can be represented in a 2D basis as $(\alpha, \beta) =$

$(\sqrt{\|\vec{x} - \vec{p}\|^2 - (\vec{n} \cdot (\vec{x} - \vec{p}))^2},\ \vec{n} \cdot (\vec{x} - \vec{p}))$.
The spin image is the histogram of all of the $(\alpha, \beta)$ values for the
surface. Each selected point $\vec{p}$ leads to a different spin image. Points
are matched by comparing their spin images by correlation. Key advantages of
the representation are 1) it is independent of pose and 2) it avoids
ambiguities of representation that can occur with nearly flat surfaces.
[ FP:21.4.2]

splash: An invariant representation of the region about a 3D point. It gives a
local shape representation useful for position invariant object recognition.

spline: 1) A curve $\vec{c}(t)$ defined as a weighted sum of control points:
$\vec{c}(t) = \sum_{i=0}^{n} w_i(t) \vec{p}_i$, where the control points are
$\vec{p}_{0...n}$ and one weighting (or blending) function $w_i$ is defined for
each control point. The curve may interpolate the control points or
approximate them. The construction of the spline offers guarantees of
continuity and smoothness. With uniform splines the weighting functions for
each point are translated copies of each other, so $w_i(t) = w_0(t - i)$. The
form of $w_0$ determines the type of spline: for B-splines and Bézier curves,
$w_0(t)$ is a polynomial (typically cubic) in $t$. Nonuniform splines
reparameterize the $t$ axis, $\vec{c}(t) = \vec{c}(u(t))$ where $u(t)$ maps the
integers $k = 0..n$ to knot points $t_{0..n}$ with linear interpolation for
non-integer values of $t$. Rational splines with n-D control points are
perspective projections of normal splines with (n + 1)-D control points.
2) Tensor-product splines define a 3D surface $\vec{x}(u, v)$ as a product of
splines in $u$ and $v$. [ JKS:6.7]

spline smoothing: Smoothing of a discretely sampled signal $x(t)$ by replacing
the value at $t_i$ by the value predicted at that point by a spline
$\hat{x}(t)$ fitted to neighboring values. [ AJ:8.7]

split and merge: A two-stage procedure for segmentation or clustering. The
data is divided into subsets, with the initial division being a single set
containing all the data. In the split stage, subsets are repeatedly subdivided
depending on the extent to which they fail to satisfy a coherence criterion
(for example, similarity of pixel colors). In the merge stage, pairs of
adjacent sets are found that, when merged, will again satisfy a coherence
criterion. Even if the coherence criteria are the same for both stages, the
merge stage may still find subsets to merge. [ VSN:3.3.2]

SPOT: Système Probatoire de l'Observation de la Terre. A series of satellites
launched by France that are a common source of satellite images of the earth.
SPOT-5 for example was launched in May 2002 and provides complete coverage of
the earth every 26 days. [ WP:SPOT (satellites)]

spot detection: An image processing operation for locating small bright or
dark locations against contrasting backgrounds. The main issues are the size
of spot and the amount of contrast to look for.

spur: A short segment attached to a more significant line or edge. Spurs often
arise when linear structures are tracked through noisy data, such as by an
edge detector. [Figure: some spurs.] [ SOS:5.2]
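The uniform spline construction in the spline entry, where every weighting function is a translated copy $w_i(t) = w_0(t - i)$, can be sketched for the cubic B-spline case. The piecewise cubic used for $w_0$ below is the standard uniform cubic B-spline blending function with support on 0 to 4; the function names are hypothetical, and the curve is only fully weighted for $t$ between 3 and n + 1, where n is the index of the last control point.

```python
def cubic_bspline_basis(u):
    """Uniform cubic B-spline blending function w0(u), zero outside 0 <= u < 4."""
    if 0.0 <= u < 1.0:
        return u ** 3 / 6.0
    if 1.0 <= u < 2.0:
        return (-3 * u ** 3 + 12 * u ** 2 - 12 * u + 4) / 6.0
    if 2.0 <= u < 3.0:
        return (3 * u ** 3 - 24 * u ** 2 + 60 * u - 44) / 6.0
    if 3.0 <= u < 4.0:
        return (4.0 - u) ** 3 / 6.0
    return 0.0

def spline_point(control_points, t):
    """Evaluate c(t) = sum_i w0(t - i) * p_i for control points p_0..p_n."""
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, p in enumerate(control_points):
        w = cubic_bspline_basis(t - i)  # translated copy of w0
        for k in range(dim):
            point[k] += w * p[k]
    return tuple(point)
```

With this choice of $w_0$ the curve approximates rather than interpolates the control points, and the weights sum to one wherever four blending functions overlap.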

$(\vec{x}, l)$ pairs or by a self-organizing learning algorithm. [ AJ:9.14]

statistical pattern recognition: Pattern recognition that depends on
classification rules learned from examples rather than constructed by
designers. Compare structural pattern recognition. [ RJS:6]
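As one simple instance of a classification rule learned from examples, the sketch below fits a nearest-mean classifier to labelled feature vectors. The nearest-mean rule is chosen here only for brevity and is not claimed to be the method of the cited text; the function names are hypothetical.

```python
def fit_nearest_mean(samples, labels):
    """Learn one mean feature vector per class label from labelled examples."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(means, x):
    """Assign x to the class whose learned mean is closest in squared distance."""
    return min(means, key=lambda label: sum((a - b) ** 2 for a, b in zip(means[label], x)))
```

The rule itself (the class means) comes entirely from the training examples, in contrast to a rule written by hand by a designer.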

statistical shape model: A

parameterized shape model where the
parameters are assumed to be random
squared error clustering: A class of variables drawn from a known
clustering algorithms that attempt to probability distribution. The
find cluster c