
Chapter 3

by different cells. That is, any given patch of retina is "looked at" by a whole range of cells, each cell dealing with one or other orientation. This is illustrated in 3.10 for cells with "bar detecting" fields. The angle to which each cell is tuned is determined by the pattern of its excitatory and inhibitory regions.

[Figure 3.10: receptive fields of a sample of simple cells with differing preferred orientations, all receiving their inputs from the same patch of retinal receptors.]

3.10 Many simple cells analyze each patch of retina
Beware: this is a highly schematic diagram; it does not show the cells intervening between the receptors in the retina and the simple cells in the brain. The properties of these intervening cells are described in Ch 6.

"Slit" and "edge" simple cells are also found for a full range of orientations. Cells of those types are also "looking at" any given patch of the retina and hence the associated region of the input image. Thus, a wide range of cell types is found analyzing what is present in each patch of retina. These cells all share the same receptors and hence work from the same gray level description. But the fibers going from the receptors to each cell are different and the patterns of excitatory and inhibitory connections are thereby different.

Different Cells for Different Places

So far we have considered analyzing the image in just one patch of retina. But the mammalian retina consists of millions of receptors and each patch of receptors needs to be analyzed. Convolution is the technical term for applying a given receptive field type all over an image. We will explain that concept in detail in Ch 5. For the present we note that a convolution is implemented in biological visual systems by having many cells of the same type, each one devoted to its patch of retina.

Given that cells with all the receptive field types shown in 3.6-3.7, each for a full range of orientations, are needed for each of the myriad retinal patches, one can see that the striate cortex is a very elaborate biological image processing device indeed. We will examine its structure and function in Ch 9.

Are Simple Cells Feature Detectors?

If you look at the optimal stimuli shown in 3.6-3.7 then you can easily see why these cells have often been dubbed feature detectors. That is, it has commonly been assumed that if a cell has a specific edge feature as its optimal stimulus then that cell must be a template recognition mechanism for signaling the presence of an edge feature in a given position on the retina.

But computer experiments that make use of simple cell models to perform image processing have shown this simple-minded "feature detector" idea must be wrong. The reason is that the output of any given simple cell is far too ambiguous to be regarded as a reliable feature detector on its own.

For example, consider a cell with a preferred orientation of 90° (vertical) with the tuning curve shown in 3.11a. Such a cell would by definition respond most strongly when a high contrast vertical black edge falls in the appropriate position in its receptive field. If the contrast of the edge is reduced, by making the black zone a dark gray and the white zone a light gray, then the cell still responds but less strongly, 3.11b. The response is lowered because less excitation and more inhibition are being fed to the cell than before.

Now consider the cell's response to a high contrast edge rotated by, say, ±20° or so away from the optimal vertical orientation, 3.11c. The cell would respond just as well to this non-optimally oriented stimulus as it would to the optimally oriented one of lower contrast. If activity in this cell is taken simply and directly as the neural representation of a vertical edge then we would be susceptible to some very awkward illusions. We would confuse faint vertical edges with high contrast just-off-vertical ones, a quite unsatisfactory state of affairs which fortunately does not arise. The brain has mechanisms for solving this problem, and we describe these next.
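The ambiguity just described can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: the function name `simple_cell_response`, the Gaussian tuning shape, the 20° bandwidth, and the rule that firing rate scales linearly with contrast are all hypothetical choices, not values taken from the recordings discussed in this chapter.

```python
import math

def simple_cell_response(orientation_deg, contrast, preferred_deg=90.0,
                         bandwidth_deg=20.0, max_rate=100.0):
    """Firing rate (impulses per second) of a hypothetical simple cell.

    Assumption: rate = max_rate * contrast * Gaussian tuning, where
    contrast lies between 0 and 1 and the Gaussian is centered on the
    cell's preferred orientation.
    """
    offset = orientation_deg - preferred_deg
    tuning = math.exp(-0.5 * (offset / bandwidth_deg) ** 2)
    return max_rate * contrast * tuning

# High contrast edge at the preferred (vertical) orientation.
strong_vertical = simple_cell_response(90.0, contrast=1.0)

# High contrast edge rotated 20 degrees away from vertical.
strong_tilted = simple_cell_response(110.0, contrast=1.0)

# Contrast at which a vertical edge produces the same rate as the tilted one.
matching_contrast = strong_tilted / strong_vertical
faint_vertical = simple_cell_response(90.0, contrast=matching_contrast)

print(f"high contrast vertical edge: {strong_vertical:.1f}")
print(f"high contrast tilted edge:   {strong_tilted:.1f}")
print(f"faint vertical edge:         {faint_vertical:.1f}")
```

In this model the last two rates are identical, so the cell's output alone cannot say whether the stimulus was a faint vertical edge or a strong just-off-vertical one.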
Seeing with Receptive Fields

[Figure 3.11: different stimuli on the same receptive field, each shown with the cell's impulse record (maintained discharge, stimulus on, maintained discharge). All responses are from the same simple striate cell, which has a vertical preferred stimulus orientation.
a Preferred stimulus: very brisk response.
b Preferred stimulus orientation but low contrast: weaker response than in a. Fewer +s show less excitation than in a; some -s show more inhibition than in a.
c Not preferred stimulus orientation but high contrast: similar response to that in b. The balance of excitation and inhibition is the same as in b.]

3.11 Simple cell ambiguity

[Figure 3.12: a row of cells with different preferred orientations, shown responding to two stimuli.
a Low contrast (faint) vertical edge stimulus. The most active cell fires at, say, 20 impulses per second; the other cells have very low activity (say, 0-5 impulses per second).
b High contrast just-off-vertical edge stimulus, same row of cells as above. The vertically tuned cell is active at 20 impulses per second, just as in its response to the faint edge above, but it is still much less active than the neighboring cell which is tuned to prefer the just-off-vertical edge orientation. That neighbor is the most active cell (say, 100 impulses per second).]

3.12 Interpreting simple cell responses in context

Many Cells Make Light Work

The general answer to the ambiguity problem just introduced is to consider each cell's output, not on its own, but in the context of the activities of other cells examining the same patch of retina.

For example, suppose mechanisms were present that identified the most active neuron when assessing the orientation of an input feature. (A better scheme will be described shortly but this example is a helpful starting point.) In this case, for the faint vertical edge in 3.11b, the most active cell would still be vertically tuned, 3.12a. This cell would not be firing as briskly as it would have been if the edge were a high contrast vertical one. Nevertheless, it would still "come out on top" against the opposition from other, even less active cells. Hence, the hypothetical pattern decision mechanism would note that this cell was the most active one, and register the input feature as having a vertical orientation.

Now consider responses to a high contrast edge oriented about 20° from vertical, 3.12b. This non-vertical stimulus stimulates the vertically oriented receptive field just as well as the vertical but faint edge shown in 3.12a. But in 3.12b the cell tuned to 20° from vertical fires at an even greater rate than the vertically tuned one because 20° from vertical is its preferred orientation.

Consequently, the mechanism which detects the most active neuron would register the input feature not as a vertical one but at its true orientation of 20° from vertical.

In this way, by taking advantage of the context in which any given cell's activity occurs, the brain need not be fooled by the intrinsically ambiguous responses of individual simple cells. (If cell outputs are free of any jitter or noise, the outputs of only two cells are sufficient to correctly resolve this ambiguity; Ch 11.)

This orientation-contrast example illustrates very nicely the idea of regarding each simple cell as delivering image measurements for interpretation, and not as an "edge detector" pure and simple. Simple cells can't serve as feature detectors because their individual responses are far too ambiguous.

The example of ambiguity just discussed involves two stimulus dimensions, orientation and contrast. We examine in detail in Ch 11 how to resolve ambiguities of that sort. We now consider how to compute the orientation of a stimulus when its contrast is held constant. Our goal is to show that there is a much better way of using the responses of a range of cells with different preferred orientations than simply to seek the most active one.

Computing Stimulus Orientation

Single cell recordings show that there are simple cells for only 18-20 different preferred orientations within each patch of retina. Hence, if cell responses were interpreted as suggested in 3.12, with the most active cell signalling the orientation of the input feature, then we could only see 18-20 different orientations. This implies that we would be limited to discriminating between lines differing in orientation only if their orientations differed by about 10°. That is, the 180° of orientation would be shared between, say, 18 orientation detectors with the peaks of their tuning curves (3.13, 3.14) separated by 180°/18 = 10°. But our perceptual capabilities are much better than this: we can manage discriminations of less than 0.25°. Clearly, there is a need for some method of interpolation between neighboring orientation measurements. By this we mean there must be a way for the brain to estimate stimulus orientations lying in between the 18 preferred orientations encoded by simple cells.

For example, compare the two situations illustrated schematically in 3.13 and 3.14. The input feature in 3.13 is vertical and this causes a symmetric distribution of simple cell firing rates with peak firing shown for the vertically tuned cell. Either side of this peak response, simple cells whose preferred stimuli are bars with orientations of 80° and 100° are shown firing quite briskly, but not as fast as the vertically (90°) tuned cell. Cells with optimal orientations of 70° and 110° also fire, but relatively weakly, and the responses from cells tuned to 60° and 120° are very small indeed. For cells with optimal orientations further away from vertical than about ±30° the firing rates quickly reduce to resting discharge levels. This symmetric pattern of firing rates centered on the vertically tuned cell can be regarded as a "signature" profile of activity for the feature representation vertical bar. It would be the signature noted by the interpretive mechanisms whose task it is to decide what the simple cell measurements convey about the input.

Now study 3.14, which illustrates firing rates that arise for an input bar whose orientation is 92°. As you can see, the distribution is slightly skewed, not symmetric. The vertically tuned cell is still firing fastest but it has almost been caught up by the 100° cell, which is now firing well above the level of the 80° cell. That wasn't the case for the 90° input. Equally, the 110° cell is firing more briskly than the 70° cell. This skewed activity profile is the signature profile for a 92° bar.

This is a much more sophisticated approach to the interpretation of simple cell measurements (outputs) than simply finding the most active neuron and assuming its preferred orientation is the orientation of the input feature. What we now have is a system in which simple cells can be said to sample the stimulus dimension of orientation, and which infers what is going on in the image from the pattern of simple cell outputs.

One advantage of this scheme for finding edge orientation is that it is a very neat way of avoiding the need for more simple cells than are necessary to do the job. This makes it economical in terms of the number of cells required. Another clever brain trick.

Integrating Channel Outputs: Weighted Means

A channel is defined as a population of cells that all have the same preferred value of a particular stimulus property. For example, if all the cells in a given population have the same preferred orientation then together they constitute a single orientation channel. Note that the cells in such a channel are not necessarily located close to one another. In fact, they can be distributed widely across the striate cortex (Ch 9). The examples shown in 3.13 and 3.14 illustrate orientation channels, but show only one single cell taken from each channel with a particular orientation tuning (90°, 80°, 70°, and so on).

[Figure 3.13: the same 90° input is shown to each cell. Upper panel: tuning of cells, with receptive fields drawn beneath. Lower panel: profile of channel activities to a vertical bar (90°); each line in this graph shows the response to a vertical (90°) stimulus of just one of the differently tuned cells shown above, plotted from low to high activity along the row of cells.]

3.13 Activities of simple cells to a vertically oriented stimulus
The upper graphs show the tuning functions of cells with preferred orientations from 60° to 120°. The striped bars under each tuning curve represent the receptive field of the cell: their orientations show the preferred orientations of each cell. Above each tuning curve is shown a vertical bar as input. Following down the dotted lines from the input bars leads to the firing rate, shown by the thin vertical line under each tuning curve, for each cell to the vertical bar stimulus. The vertically tuned (90°) channel fires most vigorously. Cells with preferred orientations close to vertical also become activated by the vertical input bar though to a lesser extent. The overall pattern of activity is symmetrical around vertical. (Drawn schematically.)

[Figure 3.14: the same 92° input is shown to each cell. Upper panel: tuning of cells, with receptive fields drawn beneath. Lower panel: profile of channel activities to a 92° bar; each line in this graph shows the response of one of the differently tuned cells shown above to a 92° stimulus, a bar slightly rotated anti-clockwise from vertical, plotted as activity level in each cell along the row of cells.]

3.14 Activities of simple cells to a stimulus oriented at 92°
As in 3.13, the vertically tuned (90°) cell fires most vigorously but to a slightly lesser extent. However, the cell tuned to 100° fires much more strongly than in 3.13, indeed almost as strongly as the 90° cell. Also, the cell tuned to 80° fires less briskly than it did when the stimulus was a vertical bar (compare 3.13). The overall pattern of activity is thus asymmetrical, being skewed toward the cells tuned to orientations over 90°. (Drawn schematically to emphasize main points.)

We now explore further the idea of interpreting simple cell outputs by giving an example of one particularly simple way of doing it. This is to regard all the activities of the cells as a mass of data and work out the weighted mean of all these data. Skip these details if you prefer and go to the next section.

To keep things arithmetically straightforward, let each impulse in a given time interval of, say, 1 second be regarded as one data item. Also, we will consider computing a weighted mean from just three cells, with preferred orientations of 80°, 90° and 100°. The basic idea is to let each cell contribute to the computation according to its firing rate (output) in comparison with the other cells.

Let's first take the situation in which these cells are responding to a vertical bar (90°), and let's suppose their firing rates are as shown in the second column of 3.15. The total firing rate is 35 + 50 + 35 = 120 impulses per second. We now weight the contribution of each cell taking into account how much each cell is firing in comparison with the other cells. Thus, we weight the output of the 80° cell by the fraction 35/120 = 0.29, the 90° cell by 50/120 = 0.42 and the 100° cell by 35/120 = 0.29. You can think of this weighting as reflecting how much influence is to be given to each of the cells in computing the stimulus orientation they are dealing with.

The final column of the table multiplies the preferred orientation of each cell by its weighted firing rate; these products sum to give the weighted mean of 90°. This is exactly as it should be, of course, for a 90° input: this is what we want the feature code to be saying when this particular symmetrical "signature tune" is "playing" in the orientation channels.

But now consider 3.16 which shows asymmetrical firing rates in the same three channels for a stimulus just-off-vertical, 92°. The weighted mean is now 92°. This output can be regarded as the result of interpolating between the preferred orientations of the three orientation channels to find the orientation of the stimulus. Progress.

This strategy of using weighted outputs from a set of channels has the advantage that it averages out the effects of noise in responses. Obviously, this will be better if more channels are used, for example, the full range shown in 3.13 and 3.14. We say more about the problems of noise in Ch 5.

[This simple scheme for calculating a weighted mean is used as an example to explain the basic idea. However, it would need to be refined in a practical system, as it collapses arithmetically for any orientation coded as 0°. This problem can be fixed by expressing angles trigonometrically, but we will not go into detail here.]

But you might say: the brain doesn't have a calculator for doing arithmetic, so how might it use its neurons to implement this type of interpolation calculation? The general answer is: the brain can "do" arithmetic using the processes of excitation and inhibition in combination.

Suppose the brain did do something along the lines of a weighted mean calculation, and then used just one neuron to encode each discriminable orientation. Given that we can distinguish orientations as little as 0.25° apart then that would entail having a few hundred neurons to encode the orientation of every edge feature in each patch of retina. Each such neuron would then be said to be a local code for just one particular orientation.

Alternatively, perhaps the brain doesn't do a weighted means calculation to decide which one of a set of neurons should become activated as the code for a given orientation. Perhaps instead the patterns of simple cell responses shown in 3.13 and 3.14 are used as a population code for the feature representation vertical bar present.

After all, our simple weighted mean calculation has demonstrated that the population of simple cells taken together has the orientation of the bar encoded in its activity pattern. Perhaps this distributed representation is sufficient as it stands for the uses the brain has for orientation data. If so, why bother going a step further and making the bar orientation explicit in a local code? This question raises some fundamental issues in trying to understand seeing and the brain. We return to them in detail in later chapters.

Coarsely Tuned Channels Are a Good Idea

We said above that simple cells can be viewed as channels sampling the stimulus dimension of orientation. It turns out that the basic principle underlying this sort of sampling scheme applies generally in vision.


Preferred Orientation    Cell firing rates     Weighted          Weighted
of Cells (degrees)       (impulses per sec)    firing rates      Orientations
80°                      35                    35/120 = 0.29     80° x 0.29 = 23.33°
90°                      50                    50/120 = 0.42     90° x 0.42 = 37.50°
100°                     35                    35/120 = 0.29     100° x 0.29 = 29.17°
                         Total = 120                             Total = 90.0° (this is the Weighted Mean)

3.15 Calculating a weighted mean from the responses of three channels to a vertical stimulus (90°)

Preferred Orientation    Cell firing rates     Weighted          Weighted
of Cells (degrees)       (impulses per sec)    firing rates      Orientations
80°                      24                    24/118 = 0.20     80° x 0.20 = 16.27°
90°                      46                    46/118 = 0.39     90° x 0.39 = 35.08°
100°                     48                    48/118 = 0.40     100° x 0.40 = 40.68°
                         Total = 118                             Total = 92.0° (this is the Weighted Mean)

3.16 Calculating a weighted mean from the responses of three channels to a 92° stimulus

Take color vision for example. Have you ever had to replace colored inks in a color printer and wondered: how can just three inks (cyan, magenta, and yellow) be sufficient to create all the colors that you see on the printed page?

The short answer is: evolution has "discovered" that having just three types of color receptors (often referred to as red, green, and blue cones; Ch 17) is a sufficient sampling of the stimulus dimension of wavelength of light. That is, just three measurements obtained using red, green, and blue cones are sufficient to infer all the colors that matter to us for survival. The three colored inks of a printer are sufficient to trigger these three receptor types appropriately for almost all the colors we can see. The result is a clever economy in stimulus creation (the printer) and in stimulus sampling (the eye and brain).

One consequence of having a limited number of samples of a stimulus dimension is that the cells taking each measurement need to be broadly tuned. This is why using such channels is called coarse coding. The sampling idea would not work for color vision if each retinal cone were exquisitely sensitive to just one wavelength.

The reason is that as soon as a wavelength appeared that was outside its narrowly tuned range it would fall silent. It would literally have nothing to say about that wavelength. Thus, we would either have to be blind to that wavelength, or we would need myriad sharply tuned cones, each specializing in just one wavelength. That is unworkable. A massive number of cones would be needed for each small patch of the fovea (the central region of the retina that mediates highest resolution vision). How could this myriad be packed in without unacceptable loss of spatial resolution for fine details?

Broad tuning is, then, an important requirement for exploiting the economy offered by using channels. However, broad tuning means that the coarsely coded cells in question cannot serve as a local code for a feature assertion. As stated above, the output of each broadly tuned cell is far too ambiguous to serve that role. Either the activities in the whole set of channels need to be used as a population code, or a process of interpretation needs to be performed to create a local code with the required resolution.

Whichever of these coding schemes, local or population, turns out to be used by the brain, it is clear that we need to break away entirely from the idea that simple cells are feature detectors, pure and simple. It is this realization which lay behind our earlier remarks about the inappropriateness of calling such cells detectors: "slit detectors," "bar detectors," and so on.

This conclusion was drawn by Marr and others from computer-based image processing experiments that modelled simple cells. The ambiguity in the responses of their models of simple cells meant that these cells must be regarded as image-measuring devices which provide useful data about features of the input image, but either this data needs to be interpreted before a proper feature description can be asserted, or their responses have to be treated as a population code.

One method for reducing the effects of noise is to average responses from many cells. The underlying assumption here is that the noise in any given cell will be independent of that affecting other cells. This means that the noise variations will tend to cancel out when an average is taken.

This is one reason why it may be a good idea for the brain to consider the responses of entire populations of cells when attempting to recover the parameters of the stimulus that caused those cells to respond. Using averaging to get around the problem of noise is explored in detail in Ch 5 in connection with the task of edge detection from noisy images.

Problem of Parameter Resolution

A question that we discuss in Ch 11 is: how few channels are needed to resolve the ambiguity in simple cell responses? It turns out that, in principle, only two very broadly tuned channels are needed. However, if we had only two cells to span the entire range of 180° then each cell would be very insensitive to most orientation changes. This is because a large part of each cell's orientation range would be on regions of the tuning curve that change very slowly with changes in input orientation, so that changes in stimulus orientation would cause very little change in output, 3.17. The output of such a very broadly tuned cell changes very little unless retinal line orientation falls on the flanks of the tuning curve. These flanks are where the slope of the tuning curve is greatest, and therefore where the change in cell output per degree of change in line orientation is greatest.

[Figure 3.17: tuning curve of a cell, plotted as firing rate (0 to 100) against orientation (-60° to +60°). Annotations mark a region where output changes relatively little as stimulus orientation changes, and a region where output changes a lot as stimulus orientation changes.]

3.17 How a change in stimulus orientation is related to the change in cell response
The solid curve depicts the tuning curve of a cell. The dashed curves show how the tuning curve slope changes with stimulus orientation; these curves therefore depict the sensitivity of the cell to changes of orientation at points along the orientation continuum. Note that, surprisingly at first sight, the cell is least sensitive to changes at its preferred orientation (here 0°) where its firing rate is highest. It has similarly poor sensitivity at the extremes (i.e., +60° and -60°). The cell is most sensitive at around +20° and -20°.

One way to ensure that most retinal line orientations coincide with this "sensitive" part of a tuning curve is to use a large number of cells with tuning curves which, taken together, tile the space of all possible orientations to yield adequate resolution, as in 3.18. This represents another reason for using many cells to encode orientation, which is independent of the noise problem above.

[Figure 3.18: tuning curves of many cells, plotted as firing rate (0 to 100) against orientation (degrees). Annotations mark examples of regions of peak sensitivity to stimulus orientation changes; these are the regions with the steepest changes in channel outputs.]

3.18 Using many simple cells with different preferred orientations to "tile" the full stimulus orientation range

A curious side effect of this way of extracting orientation is that it predicts peaks and troughs in the system's sensitivity to orientation. A peak of high sensitivity should be found in the region where the two response curves are changing sharply. Troughs should arise in the regions covered by the top of each cell's response curve. This is because at the tops the change in response as retinal orientation changes is not as great.

This is paradoxical. It would, at first sight, be natural to expect maximum sensitivity for orientations falling on the highest point of each cell's tuning curve. But on careful examination, each cell is most sensitive to changes in orientation about half way down from the top.

So it is reasonable to ask: does human vision show these predicted peaks and troughs in orientation sensitivity? The answer is yes. Regan and Beverley reported experiments in 1985 on human orientation discrimination which confirmed the prediction (see Regan, 2000).

Can Templates Ever Work as Recognizers?

We started this chapter by considering the use of bar templates to detect stimulus bars. We discovered that the problem of response ambiguity bedevils their use, but this led us to a general principle: ambiguities can be resolved by drawing inferences from the outputs of many templates (channels).

Even so, simple templates can be made to work well as pattern recognition devices in some special limited contexts. Consider, for example, a bank check number recognition system, 3.19. One way that the numbers can be recognized is to build into the number-recognizing machine a set of templates, one for each numeral. Then the task is to note which template best fits the number on the check being analyzed.

[Figure 3.19: a bank check showing its printed check number.]

3.19 Bank check number recognition

Using a template in this way is similar to the template bar detector in 3.4, except that the number template has a much more complex pattern than the simple bar feature. Moreover, bank check numbers are made more readily distinguishable one from another by using specially designed numerals with lines of different thicknesses to facilitate recognition.

But that trick alone would not be enough to get such templates to serve as pattern recognizers. The crucial added ingredient is using a special check scanning device which prevents complications arising from large variations in the input images in terms of the brightness, contrast, shape, size, and orientation of numerals. This permits a template recognition system that works well for the task it tackles.
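The "note which template best fits" step can be illustrated with a minimal sketch. Everything here is hypothetical: the 3x5 binary glyphs stand in for the specially designed check numerals, and counting agreeing pixels is just one simple way to score a fit. It assumes, as the scanning device is said to ensure, that the input is already normalized for position, size, and orientation.

```python
# Hypothetical 3x5 binary glyphs ("1" = ink, "0" = paper). Real check
# numerals are far more detailed; these exist only for illustration.
TEMPLATES = {
    "0": ("111", "101", "101", "101", "111"),
    "1": ("010", "110", "010", "010", "111"),
    "7": ("111", "001", "010", "010", "010"),
}

def match_score(image, template):
    """Count the pixels at which the image and the template agree."""
    return sum(
        image_pixel == template_pixel
        for image_row, template_row in zip(image, template)
        for image_pixel, template_pixel in zip(image_row, template_row)
    )

def recognize(image):
    """Return the numeral whose template best fits the image."""
    return max(TEMPLATES, key=lambda numeral: match_score(image, TEMPLATES[numeral]))

# A "7" with one corrupted pixel (fourth row) still fits its own template best.
noisy_seven = ("111", "001", "010", "011", "010")
print(recognize(noisy_seven))  # 7
```

The sketch tolerates a stray pixel, but only because the glyph sits exactly where the templates expect it; nothing in it copes with a numeral that is shifted, resized, or rotated.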


However, that task is a very si mple one by the We now have 18 x 18 x 18 x 18 x 18 = 1,889,568
standards of biological vision systems. They have templates.
to cope with all manner of variations in the way And, once again, this is for just one object, for
objects appear in retinal images, variations over example just one of the numerals that our bank
which they have no control. How human vision check number template recognizer would have to
copes with some of these variations is an issue that deal with if it was stripped of careful control of
we will address in later chapters, particularly eh variations in the input image.
8, Seeing Objects. But we pursue here the topic of Imagine needing this huge number of tem-
template recognition a bit further by way of intro- plates for all the different objects that we so readily
d ucing some basic facts that illuminate why the recognize-numerals, letters, birds, trees, chairs,
seeing problem is so hard. people, and so on.
There is a general formula for working out the
Templates and the Combinatorial Explosion number of combinations of parameters involved
You might be wondering: could a template recog- in this combinatorial explosion: the total number
nition system be made to work by having a range of different templates, each tuned to deal with one or other source of image variation?

The way we coped with the problem of variable image bar orientation illustrates this idea: we found it a good idea to have 18 or so differently oriented bar templates, and to use these coarsely coded measurements of orientation to work out bar orientation using weighted means. What if this approach was extended for other sources of image variation, such as color and size?

Each such variable, often called a parameter as already noted, would need its own set of coarsely coded templates, each one dealing with a limited range of the parameter's possible values.

The trouble with this idea is that it immediately hits a major snag, called the combinatorial explosion. If we need 18 templates for orientation then for each one of these we will need a suitable range of templates for size. Let's say for simplicity that this would also be 18. Hence, 18 × 18 templates would be needed to deal with all combinations of orientation and size.

Now consider adding another variable, such as contrast, and suppose that too needed 18 templates. This takes us to 18 × 18 × 18 = 5,832 templates for one numeral.

Well, the brain has a lot of neurons so perhaps that isn't so bad. But things rapidly get worse when other sources of image variation are brought into the equation.

Take object position on the retina for example. Again to keep things simple, let's suppose the parameter of vertical position needs 18 templates and similarly for the horizontal position parameter.

equals $N^k$ where N is the number of templates per parameter (18 in our example), and the exponent k is the number of parameters. Don't worry if this exponential formula seems a bit opaque: if you want to know more about it then read Ch 11 where we discuss its implications at length.

The combinatorial explosion reveals just how hard the problem of vision is. Any attempt to solve it using simple-minded templates doesn't work: the brain just doesn't have enough neurons.

Binding Problem

You might think at this point this is silly; surely there is no need to have a cell for every combination of parameter values? Why not simply have one population of cells that encodes only one parameter exclusively, for all objects?

For example, one population could encode only color, and another could encode only orientation. Using this scheme we would require k populations to encode k parameters. If each population consisted of one million cells then no more than k million cells would be required in total. The brain may do something along these lines (as we will see), but this raises another fundamental vision problem: how do we attach parameter values to objects? This is known as the binding problem and we discuss it in Ch 11.

Back to the Jumping Spider

We can now see that a spider would have a very hard time using templates as a means of deciding the question: is the object over there mate or prey? Such a spider would also be subject to the combinatorial explosion (further details in Ch 11).
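
The template counts discussed in this section can be checked with a short Python sketch. It compares two schemes: one template for every combination of parameter values (18 multiplied by itself once per parameter), and one separate population per parameter. The function names are hypothetical, and giving each separate population 18 units is an assumption made only to keep the two schemes comparable; the text itself speaks of populations of one million cells.

```python
def joint_template_count(n_per_parameter: int, n_parameters: int) -> int:
    """Templates needed if every combination of parameter values has its own template."""
    return n_per_parameter ** n_parameters


def separate_population_count(n_per_parameter: int, n_parameters: int) -> int:
    """Units needed if each parameter is encoded by its own population (assumed scheme)."""
    return n_per_parameter * n_parameters


# The five sources of image variation mentioned in the text.
parameters = [
    "orientation",
    "size",
    "contrast",
    "vertical position",
    "horizontal position",
]
N = 18  # templates per parameter, as in the text's example

for k in range(1, len(parameters) + 1):
    joint = joint_template_count(N, k)
    separate = separate_population_count(N, k)
    print(f"{k} parameter(s): joint = {joint:,}  separate = {separate}")
```

With all five parameters the joint scheme needs 1,889,568 templates for a single numeral, whereas the separate scheme needs only 90 units, but at the price of no longer recording which parameter values belong to the same object.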


However, it may be that the jumping spider has, as suggested earlier, evolved some special-purpose visual mechanisms that are quite unlike our own. Land has suggested that these spiders use scanning movements of their boomerang-shaped retinae to align them with the orientations of leg-like bars in the input image. This trick might allow the spider to avoid the 18 or so different orientation tuned channels that monkeys and humans seem to possess.

These retinal scanning movements might also solve the problem of variable object position in the image. They would do that if they ensured that the object's image falls on exactly the right spot for the spider's limited number of templates to be able to recognize Mate or Prey. Perhaps these and other special-purpose adaptations give the spider a working template-based recognition system.

The general idea here is that perhaps the spider has not evolved a general-purpose vision system, such as our own, but one specialized for its particular ecological niche. It can survive if it can capture insect prey and find mates. Perhaps it has a visual system set up to do those tasks and very little else. In this respect it may be a bit like the simple-minded but highly specialised bank check recognition system described above.

Concluding Remarks

This chapter has explored the task of building a bar detector using templates to discover what problems have to be overcome. Along the way we defined some essential technical terms, many to do with basic facts about the "hardware" of biological visual systems. A key concept has been that of a receptive field with excitatory and inhibitory regions.

Linked to this is the idea of receptive fields of various types, organised as channels analysing each patch of retina and providing measurements about a stimulus dimension (such as orientation) from which the brain can work out which features are present in the input image. In the next chapter, Ch 4, we use these ideas in showing how certain illusions called aftereffects have revealed orientation channels in the human visual system without need for invasive single cell recordings.

This chapter has also introduced a fundamental concept in vision research: the combinatorial explosion arising from the exponential $N^k$ formula. We examine that problem in considerable depth in Ch 11 to explain in more detail why vision is such a hard problem.

Finally, armed with core ideas introduced in this chapter, we are ready to have a much closer look in Ch 5 at the task of edge detection. We consider there and in Ch 6 the tasks of how to recover feature properties other than orientation, such as bar width, and whether an edge is sharp or fuzzy.

Ch 5, Seeing Edges, will also remedy a nagging irritation that may have formed in your mind. We emphasized in Chs 1 and 2 the need to be very clear about the computational theory, algorithmic, and hardware levels of task analysis when studying vision. But we have blatantly ignored our own advice in this chapter. We have jumped straight into considering a particular sort of algorithm, applying templates, without any guidance from a computational theory as to the design of those templates. You might feel a bit cheated. But we have done this simply to introduce a wide range of basic concepts and terms that it is best to get out of the way first.

In any event, if you do feel a bit cheated then you have drawn the right conclusion because this chapter illustrates just how unsatisfactory it can be to start addressing an image processing task without a clear computational theory of that task. Template matching is a species of algorithm. The design and use of templates demands the clarity afforded by a decent theory of the task. We investigate in Ch 5 how we can get a much better understanding of what the receptive fields of simple cells are doing by thinking a lot harder about the task of feature detection. That will be seen to be the moral of the story of simple cells told here.

Further Reading

The main function of this early chapter has been to explore some core ideas needed to understand seeing, such as receptive fields, channels, coarse and fine coding. We do not recommend much further reading at this stage on this material. We will be making suggestions for further reading for later chapters that use the core concepts dealt with here.

Hence, it is not suggested you consult the sources overleaf at this juncture but we provide them so that you can follow them up if you wish.


Barlow HB (1972) Single units and sensation: A neuron doctrine for perceptual psychology? Perception 1 371-394. Comment: Classic paper that discusses the relationship between the firing of single neurons in sensory systems and subjectively experienced sensations. Commentaries celebrating this landmark paper are in Perception 38 795-807. It is probably best tackled after reading Ch 11.

Drees O (1952) Untersuchungen über die angeborenen Verhaltensweisen bei Springspinnen (Salticidae). Z. Tierpsychol. 9 169-209.

Hubel DH and Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat's striate cortex. Journal of Physiology 160 106-154.

Hubel DH (1988) Eye, Brain and Vision. Scientific American Library. New York. WH Freeman. Comment: This gives a highly readable overview of Hubel and Wiesel's Nobel Prize-winning research. It is the source of the quotation from Hubel given in this chapter. However, it is best left until after reading Chs 8 and 9.

Jonsson E (2008) Channel-Coded Feature Maps for Computer Vision and Machine Learning. Linköping Studies in Science and Technology, Linköping University, Sweden. ISBN 978-91-7393-988-1. Comment: An advanced mathematical treatment of theory underlying the use of channels in computer vision. We cite it because it is up-to-date and of possible interest to students of computer vision.

Land M (1969a) Structure of the retinae of the principal eyes of jumping spiders (Salticidae: Dendryphantinae) in relation to visual optics. Journal of Experimental Biology 51 443-470.

Land M (1969b) Movements of the retina of jumping spiders (Salticidae: Dendryphantinae) in response to visual stimuli. Journal of Experimental Biology 51 471-493.

Land MF and Nilsson D-E (2001) Animal Eyes. Oxford University Press. Comment: A short, erudite, and fascinating account of animal eyes in all their diversity. Subsumes papers by Land cited above.

Regan D (2000) Human Perception of Objects. Sinauer Associates, Inc. Publishers: Sunderland, Mass. See pp. 116-120. Comment: An excellent book that we recommend strongly for readers who want to pursue various topics in human and animal vision at an advanced level, and specifically the work of Regan and Beverley on peaks and troughs in orientation sensitivity.
