Images of natural scenes contain a rich variety of visual patterns. To learn and recognize these patterns from natural images, it is necessary to construct statistical models for them. In this review article we describe three statistical principles for modeling image patterns: the sparse coding principle, the minimax entropy principle, and the meaningful alignment principle. We explain these three principles and their relationships in the context of modeling images as compositions of Gabor wavelets. The three principles correspond to three regimes of composition patterns of Gabor wavelets, and these regimes are connected by changes in scale or resolution.
KEY WORDS: Gabor wavelet; Meaningful alignment; Minimax entropy; Scale; Sparse coding.
250 YING NIAN WU ET AL.
Figure 1. Examples from three regimes of image patterns: (a) texture regime, (b) object regime, and (c) geometry regime. The three regimes
are connected by the change in viewing distance or, equivalently, the change in camera resolution.
then we can recognize different image patterns within the window, and these patterns belong to different regimes described in the previous section. We recognize lines and regions at high resolution or small scale, shapes and objects at medium resolution or scale, and textures and clutters at low resolution or large scale. In addition, these patterns are highly organized. The large-scale texture patterns are composed of medium-scale shape patterns, which in turn are composed of small-scale edge and corner patterns. Figure 2(b) provides an illustration, where the image patches at different resolutions are labeled as various patterns such as foliage, leaves, and edges or corners. The vertical and diagonal arrows indicate the compositional relationships, and the horizontal arrows indicate the spatial arrangements. Thus a complete interpretation of an image must consist of the labels of image patches at all of these layers. It also should include the compositional and spatial organizations of the labels.

Figure 2 illustrates the multiscale structures of both the image data and the recognized patterns. The multiscale structure of the image data has been extensively studied in wavelet theory (Mallat 1989; Simoncelli, Freeman, Adelson, and Heeger 1992) and scale-space theory (Witkin 1983; Lindeberg 1994); however, the multiscale structure of the recognized patterns has not received as much treatment. Recent attempts to understand this issue have been made by Wu, Zhu, and Guo (2007) and Wang, Bahrami, and Zhu (2005). The compositional relationships have been studied by Geman, Potter, and Chi (2002). The AND–OR grammars for the multilayer organizations of the image patterns have been studied by Zhu and Mumford (2007).

1.3 Bayesian Generative Modeling and Posterior Inference

An elegant approach to recognizing the multiscale patterns from the multiscale image data is to construct a Bayesian generative model and use the posterior distribution to guide the inferential process (Grenander 1993; Grenander and Miller 1994; Geman and Geman 1984; Mumford 1994). Given the multilayer structures of both the image data and the labels of the recognized patterns as shown in Figure 2(b), a natural form of this model is a recursive coarse-to-fine generative process consisting of the following two schemes:
Figure 2. Multiscale representations of data and patterns. (a) A Gaussian pyramid. The image at each layer is a zoomed-out version of the
image at the layer beneath. The Laplacian pyramid can be obtained by computing the differences between consecutive layers of the Gaussian
pyramid. (b) For the same image, different patterns are recognized at different scales. These patterns are organized in terms of compositional
and spatial relationships. The vertical and diagonal arrows represent compositional relationships; the horizontal arrows, spatial relationships.
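A minimal numerical sketch of the pyramid construction described in the caption, using an illustrative 3-tap binomial blur and factor-2 subsampling (not necessarily the smoothing used for the figure):

```python
import numpy as np

def blur_downsample(img, factor=2):
    """Smooth with a separable 1-2-1 binomial kernel, then subsample."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return img[::factor, ::factor]

def upsample(img, shape):
    """Nearest-neighbor upsampling back to a given shape."""
    rows = np.repeat(img, 2, axis=0)[:shape[0], :]
    return np.repeat(rows, 2, axis=1)[:, :shape[1]]

def pyramids(img, levels=3):
    """Gaussian pyramid: each layer a zoomed-out version of the one beneath.
    Laplacian pyramid: differences between consecutive Gaussian layers."""
    gauss = [img]
    for _ in range(levels - 1):
        gauss.append(blur_downsample(gauss[-1]))
    lap = [gauss[i] - upsample(gauss[i + 1], gauss[i].shape)
           for i in range(levels - 1)]
    return gauss, lap

rng = np.random.default_rng(0)
gp, lp = pyramids(rng.standard_normal((64, 64)), levels=3)
print([layer.shape for layer in gp])
```

Each Gaussian layer is a blurred, subsampled copy of the layer beneath; each Laplacian layer is the difference between a Gaussian layer and the upsampled layer above it, as in the caption.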
Figure 3. Gabor wavelets. (a) A sample of Gabor wavelets at different locations, scales, and orientations. (b) An example of a Gabor sine
wavelet. (c) An example of a Gabor cosine wavelet.
Scheme 1: The labels generate their image data at the corresponding scale or frequency band; for example, a leaf pattern generates a leaf image patch at the same layer.

Scheme 2: The labels generate those constituent labels at the lower scale; for example, a leaf pattern is decomposed into edge and corner patterns at the lower layer.

These two schemes are closely related and lead to a joint distribution of the labels of the recognized patterns and the bandpass image patches over all of the scales, Pr(labels, image patches). In principle, this model can be learned from training images that have already been labeled. A new image can be interpreted by sampling from or maximizing the posterior distribution Pr(labels | image patches).

In the posterior distribution Pr(labels | image patches), the label of a particular image patch at a certain layer is determined not only by this image patch itself, but also by the labels of other image patches. For instance, the leaf pattern in the center of Figure 2(b) can be recognized by combining the following four sources of information:

Source 1. The image patch of this leaf at the current layer.
Source 2. The bottom-up information from the edge and corner patterns of the image patches at the layers below.
Source 3. The top-down information from the foliage patterns of the image patches at the layers above.
Source 4. The contextual information from the leaf patterns of the adjacent image patches at the same layer.

Although source 1 is the most direct, the other sources also can be important, especially when source 1 contains too much ambiguity, which is often the case in natural images. The combination of information can be accomplished by Bayesian posterior computation.

A realistic generative model, Pr(labels, image patches), based on the two schemes mentioned earlier, remains beyond our reach. The posterior computation for propagating and combining information also is not well understood. In this article we study statistical principles that may eventually lead to such a generative model. The focus of this article is on Scheme 1, that is, how the patterns generate their image data at the same layer. A recent article by Zhu and Mumford (2007) on the stochastic AND–OR grammars for Scheme 2 explores how the patterns are decomposed into the constituent patterns at lower layers in the compositional hierarchy.

1.4 Hint From Biological Vision

Given that biological vision is far superior to computer vision in learning and recognizing natural image patterns, it is useful to take some clues from biological vision when constructing image models for computer vision. In neuroscience, neuron recording data (Hubel and Wiesel 1962) indicate that visual cells in the primary visual cortex or V1 area (the part of the brain responsible for the initial stage of visual processing) respond to local image structures, such as bars, edges, and gratings at different positions, scales, and orientations.

Specifically, the V1 cells are classified into simple cells and complex cells. The simple cells are modeled by Gabor wavelets (Daugman 1985). Figure 3(a) displays a sample of Gabor wavelets (Lee 1996), which are localized functions at different locations with different orientations and scales. These functions are in the form of sinusoidal waves multiplied by Gaussian functions. Figures 3(b) and 3(c) display the perspective plots of two such functions. A more detailed explanation is provided in Section 2. Responses of simple cells are modeled as projection coefficients of the image onto these functions. These projection coefficients capture image structures at different locations, orientations, and scales. The complex cells are also sensitive to similar local image structures; however, they are more tolerant or invariant of the shifting of locations and scales of the image structures (Lampl, Ferster, Poggio, and Riesenhuber 2004). Such nonlinear behavior can be modeled by combining or pooling the outputs from the simple cells (Adelson and Bergen 1985; Riesenhuber and Poggio 1999).

The set of Gabor wavelets is self-similar and has a multiscale structure that is consistent with the multiscale structure of the image data described in the previous section. Specifically, projecting an image onto a large-scale Gabor wavelet is equivalent to projecting a zoomed-out version of this image onto a smaller-scale Gabor wavelet at the same location and orientation. These Gabor wavelets can be the link between the image data and the corresponding patterns. Specifically, they may serve as the building elements in the generative model Pr(image patches | labels), as well as the intermediate step in computing Pr(labels | image patches).
TECHNOMETRICS, AUGUST 2007, VOL. 49, NO. 3
1.5 Three Statistical Principles for Composing Gabor Wavelets

In this article we examine three existing statistical principles for image modeling. These principles shed light on the roles of Gabor wavelets in pattern recognition and provide useful guidance for constructing image models as compositions or poolings of Gabor wavelets.

1.5.1 Sparse Coding Principle. Olshausen and Field (1996) proposed that sparse coding is a strategy used by the primary visual cortex to represent image data. This principle seeks a dictionary of linear basis elements such that any typical natural image can be expressed as a linear combination of a small number of basis elements chosen from the dictionary. The basis elements learned by Olshausen and Field from natural images exhibit close resemblance to the Gabor wavelets.

1.5.2 Minimax Entropy Principle. Zhu, Wu, and Mumford (1997) proposed the minimax entropy principle for modeling textures. This principle seeks statistical summaries of the observed image such that the maximum entropy model constrained by these statistical summaries has the minimum entropy. The particular statistical summaries adopted by Zhu et al. are marginal histograms of Gabor wavelet coefficients. The resulting models are in the form of Markov random fields (Besag 1974; Geman and Graffigne 1987). Realistic texture patterns can be generated by sampling from such random fields.

1.5.3 Meaningful Alignment Principle. Moisan, Desolneux, and Morel (2000) proposed the meaningful alignment principle for perceptual grouping. This principle seeks to identify coincidence patterns in the observed image data that otherwise would be extremely rare in a completely random image. Such coincidences are said to be meaningful. One particular example is the detection of line segments in image data based on the alignment of local orientations along the potential line segments. The local orientations can be computed as the orientations of the best-tuned Gabor wavelets.

These three principles correspond to the three regimes of image patterns discussed in Section 1.1. The sparse coding principle corresponds to the object regime, where the image can be modeled as the composition of a small number of wavelet elements at different positions, scales, and orientations. The minimax entropy principle corresponds to the texture regime, where the image is modeled by pooling the wavelet coefficients into marginal histograms while discarding the position information. The meaningful alignment principle corresponds to the geometry regime, where the best-tuned Gabor wavelets are highly aligned in both the spatial and frequency domains.

Although these three principles correspond to three different regimes of patterns with different complexities and scales, they all seek to identify a small number of constraints on the Gabor wavelet coefficients that restrict the observed image to an image ensemble of small volume. As a result, these constraints constitute a simple and informative description of the observed image. The uniform distribution on the constrained image ensemble can be linked to an unconstrained statistical model. This point of view suggests that it is possible to develop a unified statistical model for all three regimes of image patterns as the composition or pooling of Gabor wavelet coefficients.

The rest of the article is organized as follows. Section 2 reviews Gabor wavelets. Section 3 reviews the three statistical principles, and Section 4 investigates the relationships between these principles. Section 5 concludes with a discussion.

2. GABOR WAVELETS: EDGE AND SPECTRUM

Gabor wavelets are localized in both the spatial and frequency domains; they can detect edges and bars in the spatial domain and extract spectra in the frequency domain. In this section we review Gabor wavelets first in the spatial domain and then in the frequency domain. For simplicity, we assume that both the image and the Gabor wavelets are functions defined on the two-dimensional real domain $\mathbb{R}^2$.

2.1 Edge-Bar Detector

A Gabor wavelet (Daugman 1985) is a sinusoidal wave multiplied by a Gaussian density function. The following function is an example:

    G(x) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left\{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)\right\} e^{i x_1},    (1)

where $x = (x_1, x_2) \in \mathbb{R}^2$ and $i = \sqrt{-1}$. This complex-valued function $G(x)$ consists of two parts. The first part is an elongate Gaussian function with $\sigma_1 < \sigma_2$, so the shorter axis is along the $x_1$ direction and the Gaussian function is oriented along the $x_2$ direction. The second part is a pair of sine and cosine plane waves propagating along the $x_1$ axis, that is, the shorter axis of the Gaussian function. Because of the Gaussian function, $G(x)$ is localized in the spatial domain. The Gabor cosine component, or the real part of $G(x)$, is an even-symmetric function, and the Gabor sine component, or the imaginary part of $G(x)$, is an odd-symmetric function [see Figs. 3(b) and 3(c) for illustrations].

We can translate, rotate, and dilate the function $G(x)$ of (1) to obtain a general form of Gabor wavelets such as those plotted in Figure 3(a): $G_{y,s,\theta}(x) = G(\tilde{x}/s)/s^2$, where $\tilde{x} = (\tilde{x}_1, \tilde{x}_2)$, $\tilde{x}_1 = (x_1 - y_1)\cos\theta - (x_2 - y_2)\sin\theta$, and $\tilde{x}_2 = (x_1 - y_1)\sin\theta + (x_2 - y_2)\cos\theta$. The function $G_{y,s,\theta}$ is centered at $y$, with orientation $\theta$. The standard deviations of the two-dimensional Gaussian function in $G_{y,s,\theta}(x)$ along its shorter and longer axes are $s\sigma_1$ and $s\sigma_2$. The sine and cosine waves propagate at frequency $1/s$ along the shorter axis of the Gaussian function.

For an image $I(x)$, the projection coefficient of $I$ onto $G_{y,s,\theta}$ is

    r_{y,s,\theta} = \langle I, G_{y,s,\theta} \rangle = \int I(x)\, G_{y,s,\theta}(x)\, dx,    (2)

where the integral is over $\mathbb{R}^2$. In the engineering literature, the Gabor wavelets are also called Gabor filters, $r_{y,s,\theta}$ obtained by (2) is called the filter response, and the image $J(y) = r_{y,s,\theta}$ is called the filtered image. The Gabor sine component, or the imaginary part of $G_{y,s,\theta}$, is odd-symmetric and sensitive to step-edge structures. If the image $I$ has a step-edge structure at location $y$ with orientation $\theta$, then the imaginary part of $r_{y,s,\theta}$ will be of large magnitude. The Gabor cosine component of $G_{y,s,\theta}$ is even-symmetric and sensitive to bar structures. If the image $I$ has a bar structure at location $y$ with orientation $\theta$, then the real part of $r_{y,s,\theta}$ will be of large magnitude.

The Gabor wavelets can be used as edge-bar detectors (Wang and Jenkin 1992; Perona and Malik 1990; Mallat and Zhong 1992; Canny 1986). Let $r_{x,s,\theta} = a_{x,s,\theta} + i b_{x,s,\theta}$. Here $|r_{x,s,\theta}|^2 = a_{x,s,\theta}^2 + b_{x,s,\theta}^2$ is the energy extracted by the Gabor wavelet $G_{x,s,\theta}$ (Adelson and Bergen 1985).
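To make (1) and (2) concrete, the sketch below discretizes $G_{y,s,\theta}$ on a small grid and computes the projection coefficient by a pixel sum; the grid size and parameter values are illustrative assumptions, not choices made in the article.

```python
import numpy as np

def gabor(size, y, s, theta, sigma1=1.0, sigma2=2.0):
    """Discretized G_{y,s,theta}(x) = G(x~/s)/s^2, with G as in (1):
    an elongated Gaussian (sigma1 < sigma2) times a complex wave exp(i*x1~)."""
    xs = np.arange(size, dtype=float)
    x1, x2 = np.meshgrid(xs, xs, indexing="ij")
    xt1 = ((x1 - y[0]) * np.cos(theta) - (x2 - y[1]) * np.sin(theta)) / s
    xt2 = ((x1 - y[0]) * np.sin(theta) + (x2 - y[1]) * np.cos(theta)) / s
    env = np.exp(-0.5 * (xt1**2 / sigma1**2 + xt2**2 / sigma2**2))
    return env * np.exp(1j * xt1) / (2.0 * np.pi * sigma1 * sigma2 * s**2)

def response(image, y, s, theta):
    """Projection coefficient r_{y,s,theta} = <I, G_{y,s,theta}> of (2),
    with the integral replaced by a sum over pixels."""
    return np.sum(image * gabor(image.shape[0], y, s, theta))

# A step edge between rows 16 and 17: the odd (sine) part responds strongly.
# Subtracting the image mean stands in for the mean-zero normalization of
# the cosine part discussed later in Section 3.2.
edge = np.zeros((33, 33)); edge[17:, :] = 1.0
r_edge = response(edge - edge.mean(), y=(16.5, 16.0), s=2.0, theta=0.0)

# A one-pixel bar on row 16: the even (cosine) part responds strongly.
bar = np.zeros((33, 33)); bar[16, :] = 1.0
r_bar = response(bar, y=(16.0, 16.0), s=2.0, theta=0.0)

print(abs(r_edge.imag) > abs(r_edge.real), abs(r_bar.real) > abs(r_bar.imag))
```

The odd (imaginary) component acts as a step-edge detector and the even (real) component as a bar detector, matching the description above.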
STATISTICAL PRINCIPLES IN IMAGE MODELING 253
Figure 4. The observed image (a) and edge maps at two different scales [(b) and (c)]. The scale of (b) is finer than that of (c).
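The edge maps in Figure 4 are built from the locally best-tuned wavelets of Section 2.1. A minimal sketch of the quantities involved (best-tuned orientation, local energy, local phase), on an illustrative grid, with the cosine part made mean-zero as in Section 3.2:

```python
import numpy as np

def gabor(size, y, s, theta, sigma1=1.0, sigma2=2.0):
    """Discrete G_{y,s,theta} with its cosine (real) part made mean-zero,
    following the normalization discussed in Section 3.2."""
    xs = np.arange(size, dtype=float)
    x1, x2 = np.meshgrid(xs, xs, indexing="ij")
    xt1 = ((x1 - y[0]) * np.cos(theta) - (x2 - y[1]) * np.sin(theta)) / s
    xt2 = ((x1 - y[0]) * np.sin(theta) + (x2 - y[1]) * np.cos(theta)) / s
    env = np.exp(-0.5 * (xt1**2 / sigma1**2 + xt2**2 / sigma2**2))
    G = env * np.exp(1j * xt1) / (2.0 * np.pi * sigma1 * sigma2 * s**2)
    return G - G.real.mean()

def local_features(img, y, s, thetas):
    """theta* = argmax_theta |r|^2, local energy A = |r_{theta*}|^2, and
    local phase phi = arctan(b/a) at (y, s)."""
    rs = [np.sum(img * gabor(img.shape[0], y, s, th)) for th in thetas]
    k = int(np.argmax([abs(r) ** 2 for r in rs]))
    return thetas[k], abs(rs[k]) ** 2, np.arctan2(rs[k].imag, rs[k].real)

# Horizontal step edge: best-tuned orientation 0, phase near +/- pi/2.
img = np.zeros((33, 33)); img[17:, :] = 1.0
thetas = [t * np.pi / 8 for t in range(8)]
theta_star, A, phi = local_features(img, y=(16.5, 16.0), s=2.0, thetas=thetas)
print(theta_star, round(abs(phi), 2))
```

For an edge structure the recovered phase is close to π/2, in line with the characterization of local phase given in Section 2.1.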
For fixed $x$ and $s$, let $\theta_* = \arg\max_\theta |r_{x,s,\theta}|^2$. We call $G_{x,s,\theta_*}$ the best-tuned Gabor wavelet at $(x, s)$. The local orientation is defined as $\theta(x, s) = \theta_*$. The local phase is defined as $\phi(x, s) = \arctan(b_{x,s,\theta_*}/a_{x,s,\theta_*})$. For an edge structure, the local phase is $\pi/2$ or $-\pi/2$; for a bar structure, the local phase is $0$ or $\pi$. The local energy is defined as $A(x, s) = |r_{x,s,\theta_*}|^2$. Here $x$ is an edge-bar point at scale $s$ if $A(x, s) \geq A(y, s)$ for all $y$ such that the direction of $y - x$ is perpendicular to $\theta(x, s)$ and $|y - x| < d$ for a predefined range $d$; that is, $A(x, s)$ is a local maximum along the normal direction of $x$, and we call $G_{x,s,\theta(x,s)}$ a locally best-tuned Gabor wavelet at scale $s$. Figures 4(b) and 4(c) display the edge-bar points of the observed image in Figure 4(a) at two different scales, where the intensity of an edge-bar point $(x, s)$ is proportional to $A(x, s)^{1/2}$. The scale $s$ for computing the edge map is smaller in Figure 4(b) than in Figure 4(c).

2.2 Spectral Analyzer

The Gabor wavelets also extract power spectra. An image $I(x)$ that is a function in $L^2$ can be represented as a superposition of sinusoidal waves,

    I(x) = \frac{1}{4\pi^2} \int \hat{I}(\omega) e^{i\omega x}\, d\omega,    (3)

where $x = (x_1, x_2) \in \mathbb{R}^2$, $\omega = (\omega_1, \omega_2) \in \mathbb{R}^2$, $\omega x = x_1\omega_1 + x_2\omega_2$, and the integral is over the two-dimensional frequency domain $\mathbb{R}^2$. For fixed $\omega$, the plane wave $e^{i\omega x}$ in (3) propagates in the spatial domain $\mathbb{R}^2$ along the orientation of the vector $\omega = (\omega_1, \omega_2)$ at frequency $|\omega| = (\omega_1^2 + \omega_2^2)^{1/2}$. The coefficient $\hat{I}(\omega)$ in (3) can be obtained by the Fourier transform $\hat{I}(\omega) = \int I(x) e^{-i\omega x}\, dx$.

Ideally, we can extract certain frequency content of $I$ by calculating $4\pi^2 J = \int_F \hat{I}(\omega) e^{i\omega x}\, d\omega$ for a certain set $F \subset \mathbb{R}^2$ of frequencies. If $F$ contains only those frequencies $\omega$ such that $|\omega|$ is below a threshold, then $J$ is called a low-pass image. If $F$ contains only those frequencies $\omega$ such that $|\omega|$ is within a certain finite interval, then $J$ is called a bandpass image. In practice, such images can be approximately extracted by convolving $I$ with a kernel function $g$. The convolved image is $J(x) = \int I(y) g(y - x)\, dy$. In the frequency domain, $\hat{J}(\omega) = \hat{I}(\omega)\hat{g}(\omega)$; thus $4\pi^2 J(x) = \int \hat{I}(\omega)\hat{g}(\omega) e^{i\omega x}\, d\omega$. One may consider $\hat{g}(\omega)$ a generalized version of the indicator function for $F$ and design $\hat{g}(\omega)$ to be a function localized in the frequency domain to extract the corresponding frequency content of $I$.

The Gabor wavelets can serve this purpose. Let $G_{s,\theta}(x) = G_{0,s,\theta}(x)$. [See the previous section for the definition of $G_{y,s,\theta}$; here $y = (0, 0)$.] The Fourier transform of $G_{s,\theta}$ is

    \hat{G}_{s,\theta}(\omega) = \exp\left\{-\frac{1}{2}\left[\left(s\sigma_1\left(\tilde{\omega}_1 - \frac{1}{s}\right)\right)^2 + (s\sigma_2 \tilde{\omega}_2)^2\right]\right\},    (4)

where $\tilde{\omega} = (\tilde{\omega}_1, \tilde{\omega}_2)$, with $\tilde{\omega}_1 = \omega_1 \cos\theta + \omega_2 \sin\theta$ and $\tilde{\omega}_2 = -\omega_1 \sin\theta + \omega_2 \cos\theta$. $\hat{G}_{s,0}$ is a Gaussian function centered at $(1/s, 0)$ with standard deviations $1/(s\sigma_1)$ and $1/(s\sigma_2)$ along $\omega_1$ and $\omega_2$; $\hat{G}_{s,\theta}$ is the Gaussian function obtained by rotating $\hat{G}_{s,0}$ by an angle $-\theta$, and it is localized in the frequency domain. Figure 5 shows an example of $\hat{G}_{s,\theta}$ in the frequency domain for a sample of $(s, \theta)$ corresponding to the sample of Gabor wavelets displayed in Figure 3(a). Each $\hat{G}_{s,\theta}$ is illustrated by an ellipsoid whose two axes are proportional to $1/(s\sigma_1)$ and $1/(s\sigma_2)$. One may intuitively imagine each ellipsoid as the frequency band covered by $\hat{G}_{s,\theta}$; together these $\hat{G}_{s,\theta}$ can pave or tile the entire frequency domain.

Figure 5. A sample of $\hat{G}_{s,\theta}$ in the frequency domain. Each $\hat{G}_{s,\theta}$ is illustrated by an ellipsoid whose two axes are proportional to the corresponding standard deviations of $\hat{G}_{s,\theta}$.

Let $J = I * G_{s,\theta}$. Then $J(x) = \langle I(y), G_{x,s,\theta}(-y) \rangle$ can be expressed as the projection of $I$ onto the Gabor wavelet centered at $x$. We have that $4\pi^2 J(x) = \int \hat{J}(\omega) e^{i\omega x}\, d\omega$, where $\hat{J}(\omega) = \hat{I}(\omega)\hat{G}_{s,\theta}(\omega)$. Thus $J$ extracts the frequency content of $I$ within the frequency band covered by $\hat{G}_{s,\theta}$. According to the Parseval identity, $4\pi^2 \|J\|^2 = 4\pi^2 \int |J(x)|^2\, dx = \|\hat{J}\|^2 = \int |\hat{I}(\omega)|^2 |\hat{G}_{s,\theta}(\omega)|^2\, d\omega$. Thus $\|J\|^2$ extracts the local average of the power spectrum of $I$ within the frequency band covered by $\hat{G}_{s,\theta}$. Section 3.2 provides more discussion of this issue.

In this section we have explained that Gabor wavelets can be used to extract edge-bar points as well as spectra. However, how to use Gabor wavelets to represent and recognize image patterns remains unclear. In the next section we review three statistical principles that shed light on this issue.

3. THREE STATISTICAL PRINCIPLES

3.1 Sparse Coding Principle: Beyond Edge Detection

The sparse coding principle can be used to justify the Gabor wavelets as linear basis elements for coding natural images. Specifically, let $\{B_n(x), n = 1, \ldots, N\}$ be a set of linear basis functions. Then we can code $I$ by

    I = \sum_{n=1}^{N} c_n B_n + \epsilon, \qquad C = (c_n, n = 1, \ldots, N) \sim p(C),    (7)

where $p(C)$ is often assumed to have independent components, and each $c_n$ is assumed to follow a mixture of a normal distribution and a point mass at 0 (Pece 2002; Olshausen and Millman 2000). This is the Bayesian variable selection model in linear regression (George and McCulloch 1997).

An efficient algorithm for selecting the basis elements to code a given image is the matching-pursuit algorithm (Mallat and Zhang 1993), which is a stepwise forward-regression algorithm. Assume that all of the basis elements are normalized to have unit $L^2$ norm. The algorithm starts from the empty set of basis elements. At the $k$th step, let $\{B_1, \ldots, B_k\}$ be the set of elements selected, with coefficients $\{c_1, \ldots, c_k\}$. Let $\epsilon = I - (c_1 B_1 + \cdots + c_k B_k)$. Then we choose an element $B_{k+1}$ from the dictionary so that the inner product $\langle \epsilon, B_{k+1} \rangle$ is maximized.

Figure 6. …tion, and length. (d) An edge-bar map created by the edge-bar detector described in Section 2.1. There are 4,099 edge-bar points.
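A sketch of the matching-pursuit loop just described, on a generic dictionary of unit-norm vectors rather than actual Gabor elements (the dictionary and signal here are illustrative):

```python
import numpy as np

def matching_pursuit(I, dictionary, K):
    """Stepwise forward selection (Mallat and Zhang 1993): at each step pick
    the unit-norm element whose inner product with the residual is largest
    in magnitude, then subtract its contribution from the residual."""
    residual = I.astype(float).copy()
    chosen, coefs = [], []
    for _ in range(K):
        inner = dictionary @ residual      # <epsilon, B_n> for every element
        n = int(np.argmax(np.abs(inner)))
        c = float(inner[n])                # coefficient for unit-norm B_n
        residual = residual - c * dictionary[n]
        chosen.append(n)
        coefs.append(c)
    return chosen, coefs, residual

# Toy dictionary: 20 random unit-norm vectors in R^50.
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=1, keepdims=True)
signal = 3.0 * D[4] + 1.5 * D[11]          # sparse combination of two atoms
idx, cs, res = matching_pursuit(signal, D, K=5)
print(idx[0], np.linalg.norm(res) < np.linalg.norm(signal))
```

Each step makes the residual orthogonal to the newly chosen element, so the residual norm never increases.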
As described in Section 2.1, the edge-bar points are detected by the locally best-tuned Gabor wavelets. One may ask whether the sparse coding principle offers anything more than edge detection. The answer lies in sparsity. Figure 6(d) displays the map of edge-bar points detected at a small scale. This edge map may be more visually pleasing than Figure 6(c); however, there are 4,099 edge points, far more than the number of elements in sparse coding (343). To represent a local shape in Figure 6(a), a few basis elements are sufficient, but a large number of edge points is required for this task. From a modeling perspective, it is much easier to model a low-dimensional structure than a high-dimensional structure. Such models have been studied by Wu et al. (2002), Guo, Zhu, and Wu (2003a,b), Wang and Zhu (2004), and Zhu, Guo, Wang, and Xu (2005). We discuss object modeling in Section 4.1. (See also Candes and Donoho 1999 and Huo and Chen 2005 for curvelet and beamlet systems for sparse coding.)

3.2 Minimax Entropy Principle: Beyond Spectral Analysis

The minimax entropy principle was adopted by Zhu et al. (1997) for modeling textures. For convenience, we assume in this section that $I$ is a stationary texture image defined on a finite lattice $D$ with $|D|$ pixels; for instance, $D = \{1, \ldots, M\} \times \{1, \ldots, N\}$, so $|D| = M \times N$. A popular choice of texture statistics is the marginal histograms of wavelet coefficients (Heeger and Bergen 1995). Let $\{G_k, k = 1, \ldots, K\}$ be a set of kernel functions, for example, the Gabor sine and cosine functions $G_{s,\theta}$ defined in Section 2.2. Let $H_k$ be the marginal histogram of the convolved image $I * G_k$. Specifically, we divide the range of $[I * G_k](x)$ into $T$ bins $\Delta_1, \ldots, \Delta_T$, so that

    H_{k,t}(I) = \sum_x \delta\big([I * G_k](x) \in \Delta_t\big), \qquad t = 1, \ldots, T,    (8)

where $\delta(\cdot)$ is the indicator function. Let $h_{k,t}(I) = H_{k,t}(I)/|D|$ be the normalized marginal histogram of $I * G_k$. For simplicity, we write $H_k = (H_{k,t}, t = 1, \ldots, T)$ and $h_k = (h_{k,t}, t = 1, \ldots, T)$.

Wu, Zhu, and Liu (2000) defined the following image ensemble: $\Omega = \{I : h_k(I) = h_k(I^{\mathrm{obs}}), k = 1, \ldots, K\}$. If the set of marginal histograms $\{h_k\}$ captures the texture information in $I^{\mathrm{obs}}$, then all the images in the ensemble $\Omega$ should share the same texture pattern. Thus we can model the image $I^{\mathrm{obs}}$ as a random sample from the uniform distribution over $\Omega$. Figure 7 shows an example of this. Figure 7(a) is the observed image $I^{\mathrm{obs}}$, whereas Figure 7(b) is an image randomly sampled from $\Omega$, where $\{G_k\}$ is a set of Gabor kernel functions $\{G_{s,\theta}\}$. Although the sampled image in Figure 7(b) is a different image than the observed one in Figure 7(a), the two images share identical texture patterns judged by human visual perception (see Zhu, Liu, and Wu 2000 for details).

Figure 7. Observed image $I^{\mathrm{obs}}$ (a) and a random image sampled from the image ensemble $\Omega = \{I : h_k(I) = h_k(I^{\mathrm{obs}}), k = 1, \ldots, K\}$ (b).

In terms of the image ensemble $\Omega$, the meaning of the minimax entropy principle can be stated as follows:

1. For a fixed set of kernel functions $\{G_k\}$, the maximum entropy model is the uniform distribution over $\Omega$.
2. To select the set of kernel functions $\{G_k, k = 1, \ldots, K\}$ (with prespecified $K$) from a dictionary (e.g., a dictionary of Gabor functions), we want to select the set $\{G_k, k = 1, \ldots, K\}$ that minimizes the volume of $\Omega$ (i.e., $|\Omega|$).

For the uniform distribution over $\Omega$, the entropy is $\log |\Omega|$, which is also the negative log-likelihood. By minimizing the volume of $\Omega$, we are minimizing the entropy of the corresponding uniform distribution, or maximizing its log-likelihood.

Although intuitively simple, the constrained image ensemble $\Omega$ and its entropy $\log |\Omega|$ are computationally difficult to handle. As pointed out by Wu et al. (2000), if the image size $|D|$ is large, then the uniform distribution over the constrained ensemble $\Omega$ can be approximated by an unconstrained statistical model,

    p(I; \Lambda) = \frac{1}{Z(\Lambda)} \exp\left\{ \sum_{k=1}^{K} \langle \lambda_k, H_k(I) \rangle \right\},    (9)

where $\Lambda = (\lambda_k, k = 1, \ldots, K)$, $\lambda_k = (\lambda_{k,t}, t = 1, \ldots, T)$, and $Z(\Lambda)$ is the normalizing constant that makes $p(I; \Lambda)$ integrate to 1. Here $\Lambda$ can be estimated by maximizing the log-likelihood $\log p(I^{\mathrm{obs}}; \Lambda)$, which amounts to solving the estimating equation $\mathrm{E}_\Lambda[H_k(I)] = H_k(I^{\mathrm{obs}})$ for $k = 1, \ldots, K$. The entropy $\log |\Omega|$ can be approximated by the negative log-likelihood of the fitted model, $-\log p(I^{\mathrm{obs}}; \hat{\Lambda})$. Thus the minimax entropy problem is equivalent to selecting $K$ kernel functions $\{G_k\}$ and computing the corresponding $\{\lambda_k\}$ to maximize the log-likelihood $\log p(I^{\mathrm{obs}}; \Lambda)$.

An intuitive explanation for the approximation of the uniform distribution over $\Omega$ by the statistical model (9) is that under $p(I)$, $h_{k,t}(I) = H_{k,t}(I)/|D| = \sum_x \delta([I * G_k](x) \in \Delta_t)/|D| \to \Pr([I * G_k](x) \in \Delta_t)$ as $|D| \to \infty$, if ergodicity holds. Here ergodicity means that the spatial averages go to the corresponding fixed expectations; that is, the random fluctuations in the spatial averages diminish in the limit. Because $p(I)$ assigns equal probabilities to images with fixed values of $\{h_k(I)\}$, $p(I)$ goes to a uniform distribution over an ensemble of images $I$ with fixed $\{h_k(I)\}$.

The model (9) generalizes early versions of Markov random-field models (Besag 1974; Geman and Graffigne 1987). Another way to express (9) is $p(I) \propto \prod_{k,x} f_k([I * G_k](x))$, where $f_k$ is a step function over the bins of the histogram $H_k$ such that $f_k(r) \propto \exp\{\lambda_{k,t}\}$ if $r \in \Delta_t$. This is similar to the form of projection pursuit density estimation (Friedman 1987), except that for each $f_k$ there is a product over all pixels $x$.
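The histogram statistics of (8) and the exponent of model (9) are straightforward to compute directly. The sketch below uses two crude difference filters and evenly spaced bins as illustrative stand-ins for a Gabor filter bank; fitting $\Lambda$ itself (solving the estimating equation) is not attempted here.

```python
import numpy as np

def filter_response(img, kernel):
    """[I * G_k](x) by direct valid-region sums."""
    H, W = img.shape
    h, w = kernel.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = float(np.sum(img[i:i + h, j:j + w] * kernel))
    return out

def marginal_histogram(img, kernel, edges):
    """h_k of (8): counts of [I * G_k](x) per bin, normalized by pixel count."""
    resp = filter_response(img, kernel).ravel()
    counts, _ = np.histogram(resp, bins=edges)
    return counts / resp.size

def model_energy(img, kernels, edges, lambdas):
    """sum_k <lambda_k, H_k(I)>: the exponent of (9), up to log Z(Lambda)."""
    total = 0.0
    for ker, lam in zip(kernels, lambdas):
        resp = filter_response(img, ker).ravel()
        counts, _ = np.histogram(resp, bins=edges)
        total += float(np.dot(lam, counts))
    return total

rng = np.random.default_rng(2)
img = rng.standard_normal((32, 32))
kernels = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]  # crude gradients
edges = np.linspace(-8.0, 8.0, 9)                               # T = 8 bins
h = marginal_histogram(img, kernels[0], edges)
lambdas = [np.zeros(8), np.zeros(8)]
print(h.sum(), model_energy(img, kernels, edges, lambdas))
```

With all $\lambda_{k,t} = 0$, the exponent vanishes and (9) reduces to a uniform (white noise) model, which is the $k = 0$ starting point of the filter pursuit procedure discussed next.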
Zhu et al. (1997) proposed a filter pursuit procedure that adds one filter or kernel function $G_k$ at a time to approximately maximize the log-likelihood $\log p(I^{\mathrm{obs}}; \Lambda)$ or minimize the entropy $\log |\Omega|$. Figure 8 displays an example of the filter pursuit procedure: (a) is the observed image, and (b)–(e) are images sampled from the fitted models $p(I; \Lambda)$ corresponding to an increasing sequence of the filter set $\{G_1, \ldots, G_k\}$ for $k = 0, \ldots, 7$. Specifically, for each $k$, after fitting the model $p(I; \Lambda)$ to the observed image for filters $\{G_1, \ldots, G_k\}$, we sample an image from the fitted model $p(I; \Lambda)$. With $k = 0$ filters, the sampled image is white noise. With $k = 7$ filters, the sampled image in (e) is perceptually equivalent to the input image. Della Pietra, Della Pietra, and Lafferty (1997) described earlier work on introducing features into exponential models.

Figure 8. Filter pursuit: Adding one filter at a time to reduce the entropy. (a) The observed image. (b)–(e) Images sampled from the fitted models.

One may ask what is in the marginal histograms defined in (8) beyond the spectrum information extracted by the Gabor kernel functions. Without loss of generality, let $J = I * G$, where $G$ is the Gabor function defined in Section 2.1. Suppose that we normalize the sine and cosine parts of $G$ so that both have mean 0 and unit $L^2$ norm. Let us consider the marginal distribution of $|J(x)|^2$ under the following three hypotheses:

Hypothesis 0: $I(x)$ is a white noise process; that is, $I(x) \sim \mathrm{N}(0, \sigma^2)$ independently.

Hypothesis 1: $I(x)$ is a plane wave $I(x) = \cos(x_1 + \phi)$ propagating along the $x_1$ direction at the unit frequency, which is the frequency of the wave component of $G$.

Hypothesis 2: $I(x)$ is a step-edge function $I(x) = \delta(x_1 > 0)$, where $\delta$ is the indicator function.

Under hypothesis 0, the real and imaginary parts of $J(x)$ follow independent normal distributions $\mathrm{N}(0, \sigma^2)$, so $|J(x)|^2$ follows $\sigma^2 \chi_2^2$, that is, the exponential distribution with $\mathrm{E}[|J(x)|^2] = 2\sigma^2$ and $\mathrm{var}[|J(x)|^2] = 4\sigma^4$. This conclusion still holds as long as $I(x)$ is a "locally white" stationary Gaussian process, in the sense that the spectrum of the process is constant within the frequency band covered by $G$. This is because $J$ does not contain frequency content of $I$ outside this frequency band. Therefore, for the marginal distribution of $|J(x)|^2$, $\mathrm{E}[|J(x)|^2]$ captures the average spectrum over the frequency band of $G$, and $\mathrm{var}[|J(x)|^2] = \mathrm{E}^2[|J(x)|^2]$ if the process is locally white.

Under hypothesis 1, $I(x) = [e^{i\phi} e^{i x_1} + e^{-i\phi} e^{-i x_1}]/2$; that is, it has two frequency components, at $(1, 0)$ and $(-1, 0)$. $\hat{G}(\omega)$ is centered at $(1, 0)$ with standard deviations $1/\sigma_1$ and $1/\sigma_2$. Assume that $1/\sigma_1$ is sufficiently small that $\hat{G}$ does not cover $(-1, 0)$. Then $J(x) = \hat{G}(1, 0) e^{i\phi} e^{i x_1}/2$. Thus $|J(x)|^2 = \hat{G}(1, 0)^2/4$, which is a constant, and $\mathrm{var}[|J(x)|^2] = 0 < \mathrm{E}^2[|J(x)|^2]$. Therefore, a small marginal variance of $|J(x)|^2$ indicates the possible presence of a periodic component within the frequency band covered by $\hat{G}$.

Under hypothesis 2, it is clear that $|J(x)|^2 > 0$ only if $|x_1|$ is small compared with $\sigma_1$, and $|J(x)|^2$ achieves its maximum at the edge point $x_1 = 0$. If $|x_1/\sigma_1|$ is large, so that the corresponding Gabor wavelet is away from the edge, then $J(x) \approx 0$. The marginal histogram of $|J(x)|^2$ has a high probability of being 0 and a low probability of being significantly greater than 0. For such a distribution, $\mathrm{var}[|J(x)|^2] > \mathrm{E}^2[|J(x)|^2]$ for a random point $x$.

We may consider hypothesis 0 a null hypothesis of a locally white Gaussian process. Hypothesis 1 is an alternative periodic hypothesis, and hypothesis 2 is an alternative edge hypothesis. The two alternative hypotheses are extremes of two directions of departure from the null hypothesis. Dunn and Higgins (1995) derived a class of marginal distributions for image patterns along the direction toward hypothesis 1, and Srivastava, Grenander, and Liu (2002) derived a class of marginal distributions for image patterns along the direction toward hypothesis 2. Biological vision recognizes patterns along both directions. There are V1 cells that are sensitive to edge and bar patterns (Hubel and Wiesel 1962). There is also a small portion of V1 cells that respond to periodic grating patterns but not to edge patterns (Petkov and Kruizinga 1997). Both types of cells can be modeled by combining Gabor wavelets.

For a set of $\{G_{s,\theta}\}$ that paves the frequency domain (as illustrated in Fig. 5), let $J_{s,\theta} = I * G_{s,\theta}$. Then the set of marginal averages $\{\mathrm{E}[|J_{s,\theta}(x)|^2]\}$ captures the spectrum of $I$ over the frequency bands covered by these $\{G_{s,\theta}\}$. If we constrain the set of $\{\mathrm{E}[|J_{s,\theta}(x)|^2]\}$, then the corresponding image ensemble is approximately a Gaussian process with a smooth spectrum. If we further constrain $\{\mathrm{var}[|J_{s,\theta}(x)|^2]\}$, then we add new information about the two directions of departure from the smooth-spectrum Gaussian process. Spectra and non-Gaussian statistics of natural images also have been studied by Ruderman and Bialek (1994), Simoncelli and Olshausen (2001), and Srivastava, Lee, Simoncelli, and Zhu (2003). Portilla and Simoncelli (2002) described the use of joint statistics for texture modeling.

For an image patch, the issue arises of whether to summarize it by marginal histograms or to represent it by sparse coding. Figure 9 displays an example of this situation. Here Figure 9(a) is the observed 256 × 256 image, Figure 9(b) is a random image from the ensemble constrained by the marginal histograms of Gabor wavelets, and Figure 9(c) is the image reconstructed by the sparse coding model using 1,265 wavelet elements. Although Figure 9(b) captures some edge information, the marginal histograms are implicit in the sense that by pooling the marginal histograms, we discard all of the position information. The sparse coding model captures the local edges at different positions explicitly and preserves the local shapes.
TECHNOMETRICS, AUGUST 2007, VOL. 49, NO. 3
STATISTICAL PRINCIPLES IN IMAGE MODELING 257
Figure 9. (a) The observed 256 × 256 image. (b) A random sample from the image ensemble constrained by the marginal histograms of
Gabor wavelets. (c) Reconstructed image by the sparse coding model using 1,265 wavelet elements.
histograms and the explicit sparse coding. Guo et al. (2003a,b, 4. CONNECTIONS AMONG THREE
2007) have attempted to explain this issue. STATISTICAL PRINCIPLES
3.3 Meaningful Alignment Principle: Beyond The three statistical principles reviewed in the previous sec-
Marginal Properties tion can be connected by a common perspective, where the goal
is to identify a small number of most powerful constraints to re-
The sparsity and marginal histograms studied in the previous strict the observed image into an image ensemble of a minimum
two sections are marginal properties of the wavelet coefficients. volume. Also see the recent article by Wu et al. (2007), which
They do not capture the joint properties of wavelet coefficients. unifies the three statistical principles in a theoretical framework
The meaningful alignment principle studied by Moisan et al. combining hypothesis testing and statistical modeling.
(2000) is intended to capture these joint properties.
One particular type of joint property studied by Moisan et 4.1 Pooling Projections
al. (2000) is the alignment of local orientations on line seg-
ments in natural images. For a pixel x, its orientation θ (x, s) First, we consider the sparse coding principle. Given the dic-
at scale s is defined in Section 2.1. The alignment event is de- tionary of basis elements, this principle seeks to identify a small
fined in terms of the coincidence of the orientations of a set number of basis elements {B1 , . . . , BK } from the dictionary to
of pixels. Specifically, consider any two pixels a and b, and code the observed image I with small error. Geometrically,
let θ (a, b) be the orientation of the line segment that connects this means that I projects most of its squared norm onto the
a and b. We can sample a number of equally spaced pixels subspace spanned by {B1 , . . . , BK }. Thus we can view sparse
a = x1 , x2 , . . . , xl = b on this line segment (a, b) and compute coding as constraining the projection of I onto the subspace
their orientations θ1 , θ2 , . . . , θl . Here θi is said to be aligned spanned by {B1 , . . . , BK }.
with θ (a, b) if |θi − θ (a, b)| ≤ α, Specifically, for any image I defined on a finite grid D of
where α is a prespecified |D| pixels, we may vectorize I to make it a |D|-dimensional
threshold. We then compute S = li=1 δ(|θi − θ | ≤ α), that is,
the number of sampled pixels whose orientations are aligned vector. Similarly, we also can vectorize B1 , . . . , BK into |D|-
with θ (a, b). dimensional vectors. Let rk = I, Bk , and let R = (r1 , . . . , rK )
Then we want to determine whether S is sufficiently large be the K-dimensional vector of projection coefficients. Let
for (a, b) to be a meaningful line segment. For this purpose, B = (B1 , . . . , BK ) be the (|D| × K)-dimensional matrix whose
we may choose a threshold k(l) = min{k : Pr(S ≥ k) ≤ /|D|2 }, kth column is Bk . Then R = B I. Let H = B(B B)−1 B be the
where Pr(A) is the probability of an event A under the hypoth- projection matrix. Then HI = B(B B)−1 R is the projection of
esis that the image is white noise and |D| is the total num- I onto the subspace spanned by {B1 , . . . , BK }, and HI 2 =
ber of pixels, so that |D|2 bounds the total number of poten- R (B B)−1 R.
tial line segments (a, b). If S ≥ k(l), then (a, b) is said to be Let B̄ be an (|D| × (|D| − K))-dimensional matrix whose
an -meaningful line segment. In a completely random image, columns are orthonormal vectors that are also orthogonal to
the expected number of falsely declared -meaningful line seg- all of the Bk ’s. Let R̄ = B̄ I. Then R̄ 2 = I 2 − HI 2 =
ments is less that . For computational convenience, we can I 2 − R (B B)−1 R.
sample the equally spaced pixels a = x1 , x2 , . . . , xl = b so that Consider the image ensemble
in the random image, θ1 , θ2 , . . . , θl are independent. Then S fol- = I : I 2 = |D|σ 2 = Iobs 2 , B I = R = B Iobs . (10)
lows a binomial distribution, and relevant probabilities can be
calculated or approximated in closed form. This ensemble is a sphere viewed in the (|D| − K)-dimensional
Generally speaking, an -meaningful event A is such that in space spanned by B̄,
a white noise image, the expected number of such events is = {R̄ : R̄ 2 = |D|σ 2 − R (B B)−1 R}. (11)
less than . This means that Pr(A) is small under the complete-
randomness hypothesis. The meaningful alignment principle is Let
closely related to the hypothesis testing (or, more precisely, |D|σ 2 − R (B B)−1 R
multiple hypothesis testing) problem in statistics. γ2 = . (12)
|D| − K
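The quantities in equations (10)–(12) reduce to a few lines of linear algebra once I and the B_k's are vectorized. The following sketch (Python with NumPy; the grid size, K, and the random dictionary are made-up stand-ins for illustration, not the Gabor dictionary of the article) computes R = B′I, the projected squared norm R′(B′B)⁻¹R, the residual norm ‖R̄‖², and γ²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical sizes): a vectorized "image" I with
# |D| = 64 pixels and K = 5 basis elements as the columns of B.
D, K = 64, 5
I = rng.normal(size=D)          # vectorized observed image I_obs
B = rng.normal(size=(D, K))     # basis elements B_1, ..., B_K as columns

# R = B'I: the K-dimensional vector of projection coefficients.
R = B.T @ I

# ||HI||^2 = R'(B'B)^{-1}R, the squared norm of the projection of I
# onto the subspace spanned by B_1, ..., B_K.
proj_sq_norm = R @ np.linalg.solve(B.T @ B, R)

# ||R_bar||^2 = ||I||^2 - ||HI||^2, the squared radius of the sphere
# in equation (11), with |D|σ² = ||I_obs||² as in equation (10).
resid_sq_norm = I @ I - proj_sq_norm

# γ² of equation (12): residual power per remaining dimension.
gamma_sq = resid_sq_norm / (D - K)

# Sanity check against the explicit projection matrix H = B(B'B)^{-1}B'.
H = B @ np.linalg.solve(B.T @ B, B.T)
assert np.allclose(proj_sq_norm, (H @ I) @ (H @ I))
print(gamma_sq)
```

Note that the code never forms B̄: the residual norm ‖R̄‖² is obtained from the Pythagorean identity ‖I‖² = ‖HI‖² + ‖R̄‖², which is how equation (11) is derived in the text.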
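The ε-meaningful test of Section 3.3 becomes a binomial tail computation under the sampling scheme that makes θ1, ..., θ_l independent. A minimal sketch, assuming that under the white-noise hypothesis each sampled orientation is uniform over an interval of length π (so the alignment probability is p = 2α/π); the parameter values in the example are illustrative, not taken from the article:

```python
import math

def binom_tail(l, p, k):
    """Pr(S >= k) for S ~ Binomial(l, p), computed exactly."""
    return sum(math.comb(l, i) * p**i * (1 - p)**(l - i)
               for i in range(k, l + 1))

def meaningful_threshold(l, alpha, num_pixels, eps=1.0):
    """k(l) = min{k : Pr(S >= k) <= eps / |D|^2}, the smallest count of
    aligned orientations that makes a segment eps-meaningful."""
    # Under white noise, the orientation at each sampled pixel is
    # uniform modulo pi, so it agrees with theta(a, b) within +/- alpha
    # with probability p = 2 * alpha / pi.
    p = 2 * alpha / math.pi
    bound = eps / num_pixels**2
    for k in range(l + 1):
        if binom_tail(l, p, k) <= bound:
            return k
    return l + 1  # unattainable: the segment can never be meaningful

# Example: l = 50 sampled pixels, alpha = pi/16, a 256 x 256 image.
k = meaningful_threshold(50, math.pi / 16, 256 * 256)
print(k)  # aligned orientations required for 1-meaningfulness
```

Because the bound ε/|D|² divides the error budget over all |D|² candidate segments, the expected number of false detections over the whole image stays below ε, which is the multiple-testing correction described in the text.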
Multiscale hierarchical models of wavelet coefficients have been developed by De Bonet and Viola (1997) for texture synthesis and recognition and by Buccigrossi and Simoncelli (1999) for image compression and denoising. What we are proposing here is to add the labels of the recognized patterns in the coarse-to-fine generative model. Needless to say, we remain far away from such a model, and much theoretical and experimental work needs to be done. A preliminary attempt toward such a model has been made by Wang et al. (2005).

5. DISCUSSION

5.1 Classification Rule or Generative Model?

There are two major schools of thought on statistical learning in computer vision. One school emphasizes learning generative models (Cootes, Edwards, and Taylor 2001; Doretto, Chiuso, Wu, and Soatto 2003; Isard and Blake 1998; Wang and Zhu 2004) and deriving the likelihood or the posterior distribution Pr(labels|image patches) from the learned models. The other school emphasizes learning Pr(labels|image patches) or, more simply, the classification rule Label = f(image patch) directly (Amit and Geman 1997; Viola and Jones 2004), without learning the generative models. Attempts have been made to combine these two schools of methods (Tu and Zhu 2002; Tu, Chen, Yuille, and Zhu 2005).

Comparing these two schools shows that the second school is more direct than the first and is very powerful for classification tasks in which the variable "label" is a category that takes values in a finite or binary set. But for general vision tasks, the variable "labels" has a much more complex structure than a category; for example, it can have a hierarchical structure with compositional and spatial organizations, as illustrated in Figure 2(b). For such a complex structure, learning Pr(labels|image patches) may not be practical; learning the coarse-to-fine generative model may be simpler and more natural.

5.2 Low-Level and Midlevel Vision?

Vision tasks are roughly classified into low-level, midlevel, and high-level tasks. Low-level tasks involve detecting local image structures (e.g., edges, corners, junctions) or detecting what Julesz called "textons" (Julesz 1981), or what Marr called "symbols" or "tokens" (Marr 1982). Midlevel tasks involve extracting curves and contours from the image or segmenting the image into regions of coherent intensity patterns. High-level tasks involve recognizing objects and their shapes and poses.

Detecting local image structures in low-level tasks can be difficult, because natural images are full of ambiguities if we look only at a local image patch at a fixed resolution. Extracting contours or regions from images also can be difficult, even with the help of regularity terms or generic prior distributions (Kass, Witkin, and Terzopoulos 1987; Mumford and Shah 1989) to enforce smoothness and continuity. The low-level and midlevel tasks are often considered the foundation for the high-level tasks of object recognition, but at the same time, the ambiguities in low-level and midlevel tasks can be resolved only after the specific objects are recognized.

The Bayesian framework discussed in Section 1.3 can be an elegant framework for resolving the mutual dependencies between the tasks at different levels. In this framework, object patterns and edge patterns are labeled simultaneously at different layers. The edge patterns are not treated as being more fundamental than the object patterns. Although an object pattern can be composed of edge patterns at a fine resolution, at a coarse resolution an object pattern occupies an image patch that is no larger than the image patch of an edge pattern at a fine resolution. Prior knowledge of how the object patterns are composed of the edge patterns aids the labeling of both types of patterns. Intuitively, this prior knowledge serves as a constraint to help eliminate the ambiguities in the local image patches at different layers.

ACKNOWLEDGEMENT

The work presented in this article was supported by NSF DMS 0707055, NSF IIS 0413214, and ONR C2Fusse.

[Received January 2006. Revised September 2006.]

REFERENCES

Adelson, E. H., and Bergen, J. R. (1985), "Spatiotemporal Energy Models for the Perception of Motion," Journal of the Optical Society of America, Ser. A, 2, 284–299.
Amit, Y., and Geman, D. (1997), "Shape Quantization and Recognition With Randomized Trees," Neural Computation, 9, 1545–1588.
Bell, A., and Sejnowski, T. J. (1997), "The 'Independent Components' of Natural Scenes Are Edge Filters," Vision Research, 37, 3327–3338.
Besag, J. (1974), "Spatial Interaction and the Statistical Analysis of Lattice Systems" (with discussion), Journal of the Royal Statistical Society, Ser. B, 36, 192–236.
Buccigrossi, R. W., and Simoncelli, E. P. (1999), "Image Compression via Joint Statistical Characterization in the Wavelet Domain," IEEE Transactions on Image Processing, 8, 1688–1701.
Burt, P., and Adelson, E. H. (1983), "The Laplacian Pyramid as a Compact Image Code," IEEE Transactions on Communication, 31, 532–540.
Candes, E. J., and Donoho, D. L. (1999), "Curvelets: A Surprisingly Effective Nonadaptive Representation for Objects With Edges," in Curves and Surfaces, eds. L. L. Schumaker et al., Nashville, TN: Vanderbilt University Press.
Canny, J. (1986), "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 8, 679–698.
Chen, S., Donoho, D., and Saunders, M. A. (1999), "Atomic Decomposition by Basis Pursuit," SIAM Journal on Scientific Computing, 20, 33–61.
Cootes, T. F., Edwards, G. J., and Taylor, C. J. (2001), "Active Appearance Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 681–685.
Daugman, J. (1985), "Uncertainty Relation for Resolution in Space, Spatial Frequency, and Orientation Optimized by Two-Dimensional Visual Cortical Filters," Journal of the Optical Society of America, 2, 1160–1169.
De Bonet, J. S., and Viola, P. A. (1997), "A Non-Parametric Multi-Scale Statistical Model for Natural Images," Advances in Neural Information Processing Systems.
Della Pietra, S., Della Pietra, V., and Lafferty, J. (1997), "Inducing Features of Random Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 380–393.
Doretto, G., Chiuso, A., Wu, Y. N., and Soatto, S. (2003), "Dynamic Textures," International Journal of Computer Vision, 51, 91–109.
Dunn, D., and Higgins, W. E. (1995), "Optimal Gabor Filters for Texture Segmentation," IEEE Transactions on Image Processing, 4, 947–964.
Friedman, J. H. (1987), "Exploratory Projection Pursuit," Journal of the American Statistical Association, 82, 249.
Geman, S., and Geman, D. (1984), "Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Geman, S., and Graffigne, C. (1987), "Markov Random Field Image Models and Their Applications to Computer Vision," Proceedings of the International Congress of Mathematicians, 1, 1496–1517.
Geman, S., Potter, D. F., and Chi, Z. (2002), "Composition System," Quarterly of Applied Mathematics, 60, 707–736.
George, E. I., and McCulloch, R. E. (1997), "Approaches to Bayesian Variable Selection," Statistica Sinica, 7, 339–373.
Grenander, U. (1993), General Pattern Theory, Oxford, U.K.: Oxford University Press.
Grenander, U., and Miller, M. I. (1994), "Representation of Knowledge in Complex Systems," Journal of the Royal Statistical Society, Ser. B, 56, 549–603.
Guo, C., Zhu, S. C., and Wu, Y. N. (2003a), "Visual Learning by Integrating Descriptive and Generative Models," International Journal of Computer Vision, 53, 5–29.
——— (2003b), "A Mathematical Theory of Primal Sketch and Sketchability," in Proceedings of the International Conference of Computer Vision, pp. 1228–1235.
——— (2007), "Primal Sketch: Integrating Structure and Texture," Computer Vision and Image Understanding, 106, 5–19.
Heeger, D. J., and Bergen, J. R. (1995), "Pyramid-Based Texture Analysis/Synthesis," Computer Graphics Proceedings, 229–238.
Hubel, D., and Wiesel, T. (1962), "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex," Journal of Physiology, 160, 106–154.
Huo, X., and Chen, J. (2005), "JBEAM: Multiscale Curve Coding via Beamlets," IEEE Transactions on Image Processing, 14, 1665–1677.
Isard, M., and Blake, A. (1998), "Condensation-Conditional Density Propagation for Visual Tracking," International Journal of Computer Vision, 29, 5–28.
Julesz, B. (1981), "Textons, the Elements of Texture Perception and Their Interactions," Nature, 290, 91–97.
Kass, M., Witkin, A., and Terzopoulos, D. (1987), "Snakes: Active Contour Models," International Journal of Computer Vision, 1, 321–331.
Kovesi, P. (1999), "Image Features From Phase Congruency," Videre: Journal of Computer Vision Research, 1.
Lampl, I., Ferster, D., Poggio, T., and Riesenhuber, M. (2004), "Intracellular Measurements of Spatial Integration and the MAX Operation in Complex Cells of the Cat Primary Visual Cortex," Journal of Neurophysiology, 92, 2704–2713.
Lee, T. S. (1996), "Image Representation Using 2D Gabor Wavelets," IEEE Transactions on Pattern Analysis and Machine Intelligence, 10, 959–971.
Lewicki, M. S., and Olshausen, B. A. (1999), "Probabilistic Framework for the Adaptation and Comparison of Image Codes," Journal of the Optical Society of America, 16, 1587–1601.
Lindberg, T. (1993), "Effective Scale: A Natural Unit for Measuring Scale-Space Lifetime," IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 1068–1074.
——— (1994), Scale-Space Theory in Computer Vision, Dordrecht: Kluwer Academic Publishers.
Mallat, S. (1989), "A Theory of Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 674–693.
Mallat, S., and Zhang, Z. (1993), "Matching Pursuit in a Time-Frequency Dictionary," IEEE Transactions on Signal Processing, 41, 3397–3415.
Mallat, S., and Zhong, Z. (1992), "Characterization of Signals From Multiscale Edges," IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 710–732.
Marr, D. (1982), Vision, Gordonsville, VA: W. H. Freeman.
Moisan, L., Desolneux, A., and Morel, J.-M. (2000), "Meaningful Alignments," International Journal of Computer Vision, 40, 7–23.
Morrone, M. C., Ross, J., Burr, D. C., and Owens, R. A. (1986), "Mach Bands Are Phase Dependent," Nature, 324, 250–253.
Mumford, D. B. (1994), "Pattern Theory: A Unifying Perspective," in Proceedings of the First European Congress of Mathematics, Boston: Birkhauser.
Mumford, D., and Shah, J. (1989), "Optimal Approximations by Piecewise Smooth Functions and Associated Variational Problems," Communications on Pure and Applied Mathematics, 42, 577–685.
Olshausen, B. A., and Field, D. J. (1996), "Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images," Nature, 381, 607–609.
Olshausen, B. A., and Millman, K. J. (2000), "Learning Sparse Codes With a Mixture-of-Gaussians Prior," Advances in Neural Information Processing Systems, 12, 841–847.
Pece, A. (2002), "The Problem of Sparse Image Coding," Journal of Mathematical Imaging and Vision, 17, 89–108.
Perona, P., and Malik, J. (1990), "Detecting and Localizing Composite Edges in Images," in Proceedings of the International Conference of Computer Vision, pp. 52–57.
Petkov, N., and Kruizinga, P. (1997), "Computational Models of Visual Neurons Specialised in the Detection of Periodic and Aperiodic Oriented Visual Stimuli: Bar and Grating Cells," Biological Cybernetics, 76, 83–96.
Portilla, J., and Simoncelli, E. P. (2000), "A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients," International Journal of Computer Vision, 40, 49–71.
Riesenhuber, M., and Poggio, T. (1999), "Hierarchical Models of Object Recognition in Cortex," Nature Neuroscience, 2, 1019–1025.
Ruderman, D. L., and Bialek, W. (1994), "Statistics of Natural Images: Scaling in the Woods," Physical Review Letters, 73, 814–817.
Simoncelli, E. P., Freeman, W. T., Adelson, E. H., and Heeger, D. J. (1992), "Shiftable Multiscale Transforms," IEEE Transactions on Information Theory, 38, 587–607.
Simoncelli, E. P., and Olshausen, B. A. (2001), "Natural Image Statistics and Neural Representation," Annual Review of Neuroscience, 24, 1193–1216.
Srivastava, A., Grenander, U., and Liu, X. (2002), "Universal Analytical Forms for Modeling Image Probabilities," IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 1200–1214.
Srivastava, A., Lee, A., Simoncelli, E., and Zhu, S. (2003), "On Advances in Statistical Modeling of Natural Images," Journal of Mathematical Imaging and Vision, 18, 17–33.
Tu, Z. W., and Zhu, S. C. (2002), "Image Segmentation by Data Driven Markov Chain Monte Carlo," IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 657–673.
Tu, Z. W., Chen, X. R., Yuille, A. L., and Zhu, S. C. (2005), "Image Parsing: Unifying Segmentation, Detection and Recognition," International Journal of Computer Vision, 63, 113–140.
Viola, P. A., and Jones, M. J. (2004), "Robust Real-Time Face Detection," International Journal of Computer Vision, 57, 137–154.
Wang, Y. Z., and Zhu, S. C. (2004), "Analysis and Synthesis of Textured Motion: Particles and Waves," IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1348–1363.
Wang, Y. Z., Bahrami, S., and Zhu, S. C. (2005), "Perceptual Scale Space and Its Applications," in Proceedings of the International Conference of Computer Vision, pp. 58–65.
Wang, Z., and Jenkin, M. (1992), "Using Complex Gabor Filters to Detect and Localize Edges and Bars," Advances in Machine Vision, 32, 151–170.
Witkin, A. (1983), "Scale-Space Filtering," in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1019–1021.
Wu, Y. N., Zhu, S. C., and Guo, C. (2002), "Statistical Modeling of Texture Sketch," in Proceedings of the European Conference of Computer Vision, pp. 240–254.
——— (2007), "From Information Scaling to Regimes of Statistical Models," Quarterly of Applied Mathematics, to appear.
Wu, Y. N., Zhu, S. C., and Liu, X. W. (2000), "Equivalence of Julesz Ensemble and FRAME Models," International Journal of Computer Vision, 38, 245–261.
Young, R. A. (1987), "The Gaussian Derivative Model for Spatial Vision: I. Retinal Mechanism," Spatial Vision, 2, 273–293.
Zhu, S. C., and Mumford, D. (2007), "Quest for a Stochastic Grammar of Images," Foundations and Trends in Computer Graphics and Vision, to appear.
Zhu, S. C., Guo, C. E., Wang, Y. Z., and Xu, Z. J. (2005), "What Are Textons?" International Journal of Computer Vision, 62, 121–143.
Zhu, S. C., Liu, X., and Wu, Y. N. (2000), "Exploring Texture Ensembles by Efficient Markov Chain Monte Carlo: Towards a 'Trichromacy' Theory of Texture," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 554–569.
Zhu, S. C., Wu, Y. N., and Mumford, D. (1997), "Minimax Entropy Principle and Its Applications in Texture Modeling," Neural Computation, 9, 1627–1660.