
J. A. Fodor

References

Bruner, J. S., Goodnow, J. J., and Austin, G. A. (1956). A Study of Thinking. New York: Wiley. (Paperback Wiley Science Editions, 1962.)
Capranica, R. R. (1965). The Evoked Vocal Response of the Bullfrog: A Study of Communication by Sound. Cambridge, MA: MIT Press.
Chomsky, N. (1959). Review of Skinner's Verbal Behavior. Language, 35, 26-58.
Fodor, J. A., Bever, T., and Garrett, M. (1974). The Psychology of Language: An Introduction to Psycholinguistics and Generative Grammar. New York: McGraw-Hill.
Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Boston: Houghton.
Goodman, N. (1965). Fact, Fiction and Forecast. Indianapolis: Bobbs-Merrill.
Gregory, R. L. (1966). Eye and Brain: The Psychology of Seeing. New York: McGraw-Hill.
Lettvin, J., Maturana, H., Pitts, W., and McCulloch, W. (1961). Two remarks on the visual system of the frog. In Sensory Communication (W. Rosenblith, ed.). Cambridge, MA: MIT Press.
Loewenstein, W. R. (1960). Biological transducers. Scientific American, August. (Also in Perception: Mechanisms and Models, Readings from Scientific American (1972), San Francisco: Freeman.)
Ratliff, F. (1961). Inhibitory interaction and the detection and enhancement of contours. In Sensory Communication (W. Rosenblith, ed.). Cambridge, MA: MIT Press.
Teuber, H. L. (1960). Perception. In Handbook of Physiology, vol. 3 (J. Field, H. W. Magoun, and V. E. Hall, eds.). Washington, DC: Amer. Phys. Soc.
Thorpe, W. H. (1963). Learning and Instinct in Animals. London: Methuen.
Tolman, E. C. (1932). Purposive Behavior in Animals and Men. New York: Century.
Wason, P. C., and Johnson-Laird, P. N. (1972). Psychology of Reasoning: Structure and Content. London: Batsford; Cambridge, MA: Harvard University Press.
Young, R. K. (1968). Serial learning. In Verbal Behavior and General Behavior Theory (T. Dixon and D. Horton, eds.). Englewood Cliffs, NJ: Prentice-Hall.

Vision

D. Marr

Marr, D., Vision (© 1982 by W. H. Freeman and Company. Used with permission).

Understanding Complex Information-processing Systems

[...] Almost never can a complex system of any kind be understood as a simple extrapolation from the properties of its elementary components. Consider, for example, some gas in a bottle. A description of thermodynamic effects - temperature, pressure, density, and the relationships among these factors - is not formulated by using a large set of equations, one for each of the particles involved. Such effects are described at their own level, that of an enormous collection of particles; the effort is to show that in principle the microscopic and macroscopic descriptions are consistent with one another. If one hopes to achieve a full understanding of a system as complicated as a nervous system, a developing embryo, a set of metabolic pathways, a bottle of gas, or even a large computer program, then one must be prepared to contemplate different kinds of explanation at different levels of description that are linked, at least in principle, into a cohesive whole, even if linking the levels in complete detail is impractical. For the specific case of a system that solves an information-processing problem, there are in addition the twin strands of process and representation, and both these ideas need some discussion.

Representation and description

A representation is a formal system for making explicit certain entities or types of information, together with a specification of how the system does this. And I shall call the result of using a representation to describe a given entity a description of the entity in that representation (Marr and Nishihara, 1978).

For example, the Arabic, Roman, and binary numeral systems are all formal systems for representing numbers. The Arabic representation consists of a string of symbols drawn from the set (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), and the rule for constructing the description of a particular integer n is that one decomposes n into a sum of multiples of powers of 10 and unites these multiples into a string with the largest powers on the left and the smallest on the right. Thus, thirty-seven equals 3 × 10¹ + 7 × 10⁰, which becomes 37, the Arabic numeral system's description of the number. What this description makes explicit is the number's decomposition into powers of 10. The binary numeral system's description of the number thirty-seven is 100101, and this description makes explicit the number's decomposition into powers of 2. In the Roman numeral system, thirty-seven is represented as XXXVII.

This definition of a representation is quite general. For example, a representation for shape would be a formal scheme for describing some aspects of shape, together with rules that specify how the scheme is applied to any particular shape.
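As an illustration, the three descriptions of thirty-seven given above can be produced mechanically. What follows is a minimal Python sketch under stated assumptions, not a definitive implementation: the names `describe_in_base`, `describe_in_roman`, and `ROMAN_SYMBOLS` are hypothetical, and the Roman rule shown is the modern subtractive convention, assumed here for the sake of the example. Each function pairs a set of symbols with a rule for putting them together, which is all that "formal scheme" is meant to convey.

```python
"""Three representations of the same number, each with its own construction rule."""

# Symbol/value pairs for the Roman rule, largest value first.
# Assumption: the modern subtractive convention (IV, IX, XL, ...).
ROMAN_SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def describe_in_base(n: int, base: int) -> str:
    """Decompose n into multiples of powers of `base`, largest power on the left."""
    if n < 0 or not 2 <= base <= 10:
        raise ValueError("expects n >= 0 and 2 <= base <= 10")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the smallest power first
        digits.append(str(remainder))
    return "".join(reversed(digits))


def describe_in_roman(n: int) -> str:
    """Build the Roman description by taking the largest symbol that still fits."""
    if n <= 0:
        raise ValueError("no Roman description for zero or negative numbers")
    parts = []
    for value, symbol in ROMAN_SYMBOLS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


if __name__ == "__main__":
    print(describe_in_base(37, 10))  # 37
    print(describe_in_base(37, 2))   # 100101
    print(describe_in_roman(37))     # XXXVII
```

The same integer goes in each time; only the rule for building the description changes, and with it what the resulting string makes explicit (powers of 10, powers of 2, or neither).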
A musical score provides a way of representing a symphony; the alphabet allows the construction of a written representation of words; and so forth. The phrase "formal scheme" is critical to the definition, but the reader should not be frightened by it. The reason is simply that we are dealing with information-processing machines, and the way such machines work is by using symbols to stand for things - to represent things, in our terminology. To say that something is a formal scheme means only that it is a set of symbols with rules for putting them together - no more and no less.

A representation, therefore, is not a foreign idea at all - we all use representations all the time. However, the notion that one can capture some aspect of reality by making a description of it using a symbol and that to do so can be useful seems to me a fascinating and powerful idea. But even the simple examples we have discussed introduce some rather general and important issues that arise whenever one chooses to use one particular representation. For example, if one chooses the Arabic numeral representation, it is easy to discover whether a number is a power of 10 but difficult to discover whether it is a power of 2. If one chooses the binary representation, the situation is reversed. Thus, there is a trade-off; any particular representation makes certain information explicit at the expense of information that is pushed into the background and may be quite hard to recover.

This issue is important, because how information is represented can greatly affect how easy it is to do different things with it. This is evident even from our numbers example: It is easy to add, to subtract, and even to multiply if the Arabic or binary representations are used, but it is not at all easy to do these things - especially multiplication - with Roman numerals. This is a key reason why the Roman culture failed to develop mathematics in the way the earlier Arabic cultures had.

An analogous problem faces computer engineers today. Electronic technology is much more suited to a binary number system than to the conventional base 10 system, yet humans supply their data and require the results in base 10. The design decision facing the engineer, therefore, is: Should one pay the cost of conversion into base 2, carry out the arithmetic in a binary representation, and then convert back into decimal numbers on output; or should one sacrifice efficiency of circuitry to carry out operations directly in a decimal representation? On the whole, business computers and pocket calculators take the second approach, and general purpose computers take the first. But even though one is not restricted to using just one representation system for a given type of information, the choice of which to use is important and cannot be taken lightly. It determines what information is made explicit and hence what is pushed further into the background, and it has a far-reaching effect on the ease and difficulty with which operations may subsequently be carried out on that information.

Process

The term process is very broad. For example, addition is a process, and so is taking a Fourier transform. But so is making a cup of tea, or going shopping. For the purposes of this book, I want to restrict our attention to the meanings associated with machines that are carrying out information-processing tasks. So let us examine in depth the notions behind one simple such device, a cash register at the checkout counter of a supermarket.

There are several levels at which one needs to understand such a device, and it is perhaps most useful to think in terms of three of them. The most abstract is the level of what the device does and why. What it does is arithmetic, so our first task is to master the theory of addition. Addition is a mapping, usually denoted by +, from pairs of numbers into single numbers, for example, + maps the pair (3, 4) to 7, and I shall write this in the form (3 + 4) → 7. Addition has a number of abstract properties, however. It is commutative: both (3 + 4) and (4 + 3) are equal to 7; and associative: the sum of 3 + (4 + 5) is the same as the sum of (3 + 4) + 5. Then there is the unique distinguished element, zero, the adding of which has no effect: (4 + 0) → 4. Also, for every number there is a unique "inverse," written (-4) in the case of 4, which when added to the number gives zero: [4 + (-4)] → 0.

Notice that these properties are part of the fundamental theory of addition. They are true no matter how the numbers are written - whether in binary, Arabic, or Roman representation - and no matter how the addition is executed. Thus part of this first level is something that might be characterized as what is being computed.

The other half of this level of explanation has to do with the question of why the cash register performs addition and not, for instance, multiplication when combining the prices of the purchased items to arrive at a final bill. The reason is that the rules we intuitively feel to be appropriate for combining the individual prices in fact define the mathematical operation of addition. These can be formulated as constraints in the following way:

1. If you buy nothing, it should cost you nothing; and buying nothing and something should cost the same as buying just the something. (The rules for zero.)
2. The order in which goods are presented to the cashier should not affect the total. (Commutativity.)
3. Arranging the goods into two piles and paying for each pile separately should not affect the total amount you pay. (Associativity: the basic operation for combining prices.)
4. If you buy an item and then return it for a refund, your total expenditure should be zero. (Inverses.)

It is a mathematical theorem that these conditions define the operation of addition, which is therefore the appropriate computation to use.

This whole argument is what I call the computational theory of the cash register. Its important features are (1) that it contains separate arguments about what is computed and why and (2) that the resulting operation is defined uniquely by the constraints it has to satisfy. In the theory of visual processes, the underlying task is to reliably derive properties of the world from images of it; the business of isolating constraints that are both powerful enough to allow a process to be defined and generally true of the world is a central theme of our inquiry.

In order that a process shall actually run, however, one has to realize it in some way and therefore choose a representation for the entities that the process manipulates. The second level of the analysis of a process, therefore, involves choosing two things: (1) a representation for the input and for the output of the process and (2) an algorithm by which the transformation may actually be accomplished. For addition, of course, the input and output representations can both be the same, because they both consist of numbers. However this is not true in general. In the case of a Fourier transform, for example, the input representation may be the time domain, and the output, the frequency domain. If the first of our levels specifies what and why, this second level specifies how. For addition, we might choose Arabic numerals for the representations, and for the algorithm we could follow the usual rules about adding the least significant digits first and "carrying" if the sum exceeds 9. Cash registers, whether mechanical or electronic, usually use this type of representation and algorithm.

There are three important points here. First, there is usually a wide choice of representation. Second, the choice of algorithm often depends rather critically on the particular representation that is employed. And third, even for a given fixed representation, there are often several possible algorithms for carrying out the same process. Which one is chosen will usually depend on any particularly desirable or undesirable characteristics that the algorithms may have; for example, one algorithm may be much more efficient than another, or another may be slightly less efficient but more robust (that is, less sensitive to slight inaccuracies in the data on which it must run). Or again, one algorithm may be parallel, and another, serial. The choice, then, may depend on the type of hardware or machinery in which the algorithm is to be embodied physically.

This brings us to the third level, that of the device in which the process is to be realized physically. The important point here is that, once again, the same algorithm may be implemented in quite different technologies. The child who methodically adds two numbers from right to left, carrying a digit when necessary, may be using the same algorithm that is implemented by the wires and transistors of the cash register in the neighborhood supermarket, but the physical realization of the algorithm is quite different in these two cases. Another example: Many people have written computer programs to play tic-tac-toe, and there is a more or less standard algorithm that cannot lose. This algorithm has in fact been implemented by W. D. Hillis and B. Silverman in a quite different technology, in a computer made out of Tinkertoys, a children's wooden building set. The whole monstrously ungainly engine, which actually works, currently resides in a museum at the University of Missouri in St. Louis.

Some styles of algorithm will suit some physical substrates better than others. For example, in conventional digital computers, the number of connections is comparable to the number of gates, while in a brain, the number of connections is
much larger (× 10⁴) than the number of nerve cells. The underlying reason is that wires are rather cheap in biological architecture, because they can grow individually and in three dimensions. In conventional technology, wire laying is more or less restricted to two dimensions, which quite severely restricts the scope for using parallel techniques and algorithms; the same operations are often better carried out serially.

The three levels

We can summarize our discussion in something like the manner shown in table 5.1, which illustrates the different levels at which an information-processing device must be understood before one can be said to have understood it completely. At one extreme, the top level, is the abstract computational theory of the device, in which the performance of the device is characterized as a mapping from one kind of information to another, the abstract properties of this mapping are defined precisely, and its appropriateness and adequacy for the task at hand are demonstrated. In the center is the choice of representation for the input and output and the algorithm to be used to transform one into the other. And at the other extreme are the details of how the algorithm and representation are realized physically - the detailed computer architecture, so to speak. These three levels are coupled, but only loosely. The choice of an algorithm is influenced, for example, by what it has to do and by the hardware in which it must run. But there is a wide choice available at each level, and the explication of each level involves issues that are rather independent of the other two.

Table 5.1 The three levels at which any machine carrying out an information-processing task must be understood

Computational theory: What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?
Representation and algorithm: How can this computational theory be implemented? In particular, what is the representation for the input and output, and what is the algorithm for the transformation?
Hardware implementation: How can the representation and algorithm be realized physically?

Each of the three levels of description will have its place in the eventual understanding of perceptual information processing, and of course they are logically and causally related. But an important point to note is that since the three levels are only rather loosely related, some phenomena may be explained at only one or two of them. This means, for example, that a correct explanation of some psychophysical observation must be formulated at the appropriate level. In attempts to relate psychophysical problems to physiology, too often there is confusion about the level at which problems should be addressed. For instance, some are related mainly to the physical mechanisms of vision - such as afterimages (for example, the one you see after staring at a light bulb) or such as the fact that any color can be matched by a suitable mixture of the three primaries (a consequence principally of the fact that we humans have three types of cones). On the other hand, the ambiguity of the Necker cube (figure 5.1) seems to demand a different kind of explanation. To be sure, part of the explanation of its perceptual reversal must have to do with a bistable neural network (that is, one with two distinct stable states) somewhere inside the brain, but few would feel satisfied by an account that failed to mention the existence of two different but perfectly plausible three-dimensional interpretations of this two-dimensional image.

Figure 5.1 The so-called Necker illusion, named after L. A. Necker, the Swiss naturalist who developed it in 1832. The essence of the matter is that the two-dimensional representation (a) has collapsed the depth out of a cube and that a certain aspect of human vision is to recover this missing third dimension. The depth of the cube can indeed be perceived, but two interpretations are possible, (b) and (c). A person's perception characteristically flips from one to the other.

For some phenomena, the type of explanation required is fairly obvious. Neuroanatomy, for example, is clearly tied principally to the third level, the physical realization of the computation. The same holds for synaptic mechanisms, action potentials, inhibitory interactions, and so forth. Neurophysiology, too, is related mostly to this level, but it can also help us to understand the type of representations being used, particularly if one accepts something along the lines of Barlow's views that I quoted earlier. But one has to exercise extreme caution in making inferences from neurophysiological findings about the algorithms and representations being used, particularly until one has a clear idea about what information needs to be represented and what processes need to be implemented.

Psychophysics, on the other hand, is related more directly to the level of algorithm and representation. Different algorithms tend to fail in radically different ways as they are pushed to the limits of their performance or are deprived of critical information. As we shall see, primarily psychophysical evidence proved to Poggio and myself that our first stereo-matching algorithm (Marr and Poggio, 1976) was not the one that is used by the brain, and the best evidence that our second algorithm (Marr and Poggio, 1979) is roughly the one that is used also comes from psychophysics. Of course, the underlying computational theory remained the same in both cases, only the algorithms were different.

Psychophysics can also help to determine the nature of a representation. The work of Roger Shepard (1975), Eleanor Rosch (1978), or Elizabeth Warrington (1975) provides some interesting hints in this direction. More specifically, Stevens (1979) argued from psychophysical experiments that surface orientation is represented by the coordinates of slant and tilt, rather than (for example) the more traditional (p, q) of gradient space (see chapter 3). He also deduced from the uniformity of the size of errors made by subjects judging surface orientation over a wide range of orientations that the representational quantities used for slant and tilt are pure angles and not, for example, their cosines, sines, or tangents.

More generally, if the idea that different phenomena need to be explained at different levels is kept clearly in mind, it often helps in the assessment of the validity of the different kinds of objections that are raised from time to time. For example, one favorite is that the brain is quite different from a computer because one is parallel and the other serial. The answer to this, of course, is that the distinction between serial and parallel is a distinction at the level of algorithm; it is not fundamental at all - anything programmed in parallel can be rewritten serially (though not necessarily vice versa). The distinction, therefore, provides no grounds for arguing that the brain operates so differently from a computer that a computer could not be programmed to perform the same tasks.

Importance of computational theory

Although algorithms and mechanisms are empirically more accessible, it is the top level, the level of computational theory, which is critically important from an information-processing point of view. The reason for this is that the nature of the computations that underlie perception depends more upon the computational problems that have to be solved than upon the particular hardware in which their solutions are implemented. To phrase the matter another way, an algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied.

In a similar vein, trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers: It just cannot be done. In order to understand bird flight, we have to understand aerodynamics; only then do the structure of feathers and the different shapes of birds' wings make sense. More to the point, as we shall see, we cannot understand why retinal ganglion cells and lateral geniculate neurons have the receptive fields they do just by studying
their anatomy and physiology. We can understand how these cells and neurons behave as they do by studying their wiring and interactions, but in order to understand why the receptive fields are as they are - why they are circularly symmetrical and why their excitatory and inhibitory regions have characteristic shapes and distributions - we have to know a little of the theory of differential operators, band-pass channels, and the mathematics of the uncertainty principle (see chapter 2).

Perhaps it is not surprising that the very specialized empirical disciplines of the neurosciences failed to appreciate fully the absence of computational theory, but it is surprising that this level of approach did not play a more forceful role in the early development of artificial intelligence. For far too long, a heuristic program for carrying out some task was held to be a theory of that task, and the distinction between what a program did and how it did it was not taken seriously. As a result, (1) a style of explanation evolved that invoked the use of special mechanisms to solve particular problems, (2) particular data structures, such as the lists of attribute value pairs called property lists in the LISP programming language, were held to amount to theories of the representation of knowledge, and (3) there was frequently no way to determine whether a program would deal with a particular case other than by running the program.

Failure to recognize this theoretical distinction between what and how also greatly hampered communication between the fields of artificial intelligence and linguistics. Chomsky's (1965) theory of transformational grammar is a true computational theory in the sense defined earlier. It is concerned solely with specifying what the syntactic decomposition of an English sentence should be, and not at all with how that decomposition should be achieved. Chomsky himself was very clear about this - it is roughly his distinction between competence and performance, though his idea of performance did include other factors, like stopping in midutterance - but the fact that his theory was defined by transformations, which look like computations, seems to have confused many people. Winograd (1972), for example, felt able to criticize Chomsky's theory on the grounds that it cannot be inverted and so cannot be made to run on a computer; I had heard reflections of the same argument made by Chomsky's colleagues in linguistics as they turn their attention to how grammatical structure might actually be computed from a real English sentence.

The explanation is simply that finding algorithms by which Chomsky's theory may be implemented is a completely different endeavor from formulating the theory itself. In our terms, it is a study at a different level, and both tasks have to be done. This point was appreciated by Marcus (1980), who was concerned precisely with how Chomsky's theory can be realized and with the kinds of constraints on the power of the human grammatical processor that might give rise to the structural constraints in syntax that Chomsky found. It even appears that the emerging "trace" theory of grammar (Chomsky and Lasnik, 1977) may provide a way of synthesizing the two approaches - showing that, for example, some of the rather ad hoc restrictions that form part of the computational theory may be consequences of weaknesses in the computational power that is available for implementing syntactical decoding.

The approach of J. J. Gibson

In perception, perhaps the nearest anyone came to the level of computational theory was Gibson (1966). However, although some aspects of his thinking were on the right lines, he did not understand properly what information processing was, which led him to seriously underestimate the complexity of the information-processing problems involved in vision and the consequent subtlety that is necessary in approaching them.

Gibson's important contribution was to take the debate away from the philosophical considerations of sense-data and the affective qualities of sensation and to note instead that the important thing about the senses is that they are channels for perception of the real world outside or, in the case of vision, of the visible surfaces. He therefore asked the critically important question, How does one obtain constant perceptions in everyday life on the basis of continually changing sensations? This is exactly the right question, showing that Gibson correctly regarded the problem of perception as that of recovering from sensory information "valid" properties of the external world. His problem was that he had a much oversimplified view of how this should be done. His approach led him to consider higher-order variables - stimulus energy, ratios, proportions, and so on - as "invariants" of the movement of an observer and of changes in stimulation intensity.

"These invariants," he wrote, "correspond to permanent properties of the environment. They constitute, therefore, information about the permanent environment." This led him to a view in which the function of the brain was to "detect invariants" despite changes in "sensations" of light, pressure, or loudness of sound. Thus, he says that the "function of the brain, when looped with its perceptual organs, is not to decode signals, nor to interpret messages, nor to accept images, nor to organize the sensory input or to process the data, in modern terminology. It is to seek and extract information about the environment from the flowing array of ambient energy," and he thought of the nervous system as in some way "resonating" to these invariants. He then embarked on a broad study of animals in their environments, looking for invariants to which they might resonate. This was the basic idea behind the notion of ecological optics (Gibson, 1966, 1979).

Although one can criticize certain shortcomings in the quality of Gibson's analysis, its major and, in my view, fatal shortcoming lies at a deeper level and results from a failure to realize two things. First, the detection of physical invariants, like image surfaces, is exactly and precisely an information-processing problem, in modern terminology. And second, he vastly underrated the sheer difficulty of such detection. In discussing the recovery of three-dimensional information from the movement of an observer, he says that "in motion, perspective information alone can be used" (Gibson, 1966: 202). And perhaps the key to Gibson is the following:

    The detection of non-change when an object moves in the world is not as difficult as it might appear. It is only made to seem difficult when we assume that the perception of constant dimensions of the object must depend on the correcting of sensations of inconstant form and size. The information for the constant dimension of an object is normally carried by invariant relations in an optic array. Rigidity is specified. (emphasis added)

Yes, to be sure, but how? Detecting physical invariants is just as difficult as Gibson feared, but nevertheless we can do it. And the only way to understand how is to treat it as an information-processing problem.

The underlying point is that visual information processing is actually very complicated, and Gibson was not the only thinker who was misled by the apparent simplicity of the act of seeing. The whole tradition of philosophical inquiry into the nature of perception seems not to have taken seriously enough the complexity of the information processing involved. For example, Austin's (1962) Sense and Sensibilia entertainingly demolishes the argument, apparently favored by earlier philosophers, that since we are sometimes deluded by illusions (for example, a straight stick appears bent if it is partly submerged in water), we see sense-data rather than material things. The answer is simply that usually our perceptual processing does run correctly (it delivers a true description of what is there), but although evolution has seen to it that our processing allows for many changes (like inconstant illumination), the perturbation due to the refraction of light by water is not one of them. And incidentally, although the example of the bent stick has been discussed since Aristotle, I have seen no philosophical inquiry into the nature of the perceptions of, for instance, a heron, which is a bird that feeds by pecking up fish first seen from above the water surface. For such birds the visual correction might be present.

Anyway, my main point here is another one. Austin (1962) spends much time on the idea that perception tells one about real properties of the external world, and one thing he considers is "real shape" (p. 66), a notion which had cropped up earlier in his discussion of a coin that "looked elliptical" from some points of view. Even so,

    it had a real shape which remained unchanged. But coins in fact are rather special cases. For one thing their outlines are well defined and very highly stable, and for another they have a known and a nameable shape. But there are plenty of things of which this is not true. What is the real shape of a cloud? or of a cat? Does its real shape change whenever it moves? If not, in what posture is its real shape on display? Furthermore, is its real shape such as to be fairly smooth outlines, or must it be finely enough serrated to take account of each hair? It is pretty obvious that there is no answer to these questions - no rules according to which, no procedure by which, answers are to be determined. (emphasis added) (p. 67)

But there are answers to these questions. There are ways of describing the shape of a cat to an arbitrary level of precision (see chapter 5), and there are rules and procedures for arriving at
D. Marr Vision

such descriptions. That is exactly what vision is about, and precisely what makes it complicated.

A Representational Framework for Vision

Vision is a process that produces from images of the external world a description that is useful to the viewer and not cluttered with irrelevant information (Marr, 1976; Marr and Nishihara, 1978). We have already seen that a process may be thought of as a mapping from one representation to another, and in the case of human vision, the initial representation is in no doubt - it consists of arrays of image intensity values as detected by the photoreceptors in the retina.

It is quite proper to think of an image as a representation; the items that are made explicit are the image intensity values at each point in the array, which we can conveniently denote by I(x,y) at coordinate (x,y). In order to simplify our discussion, we shall neglect for the moment the fact that there are several different types of receptor, and imagine instead that there is just one, so that the image is black-and-white. Each value of I(x,y) thus specifies a particular level of gray; we shall refer to each detector as a picture element or pixel and to the whole array I as an image.

But what of the output of the process of vision? We have already agreed that it must consist of a useful description of the world, but that requirement is rather nebulous. Can we not do better? Well, it is perfectly true that, unlike the input, the result of vision is much harder to discern, let alone specify precisely, and an important aspect of this new approach is that it makes quite concrete proposals about what that end is. But before we begin that discussion, let us step back a little and spend a little time formulating the more general issues that are raised by these questions.

The purpose of vision

The usefulness of a representation depends upon how well suited it is to the purpose for which it is used. A pigeon uses vision to help it navigate, fly, and seek out food. Many types of jumping spider use vision to tell the difference between a potential meal and a potential mate. One type, for example, has a curious retina formed of two diagonal strips arranged in a V. If it detects a red V on the back of an object lying in front of it, the spider has found a mate. Otherwise, maybe a meal. The frog detects bugs with its retina; and the rabbit retina is full of special gadgets, including what is apparently a hawk detector, since it responds well to the pattern made by a preying hawk hovering overhead. Human vision, on the other hand, seems to be very much more general, although it clearly contains a variety of special-purpose mechanisms that can, for example, direct the eye toward an unexpected movement in the visual field or cause one to blink or otherwise avoid something that approaches one's head too quickly.

Vision, in short, is used in such a bewildering variety of ways that the visual systems of different animals must differ significantly from one another. Can the type of formulation that I have been advocating, in terms of representations and processes, possibly prove adequate for them all? I think so. The general point here is that because vision is used by different animals for such a wide variety of purposes, it is inconceivable that all seeing animals use the same representations; each can confidently be expected to use one or more representations that are nicely tailored to the owner's purposes.

As an example, let us consider briefly a primitive but highly efficient visual system that has the added virtue of being well understood. Werner Reichardt's group in Tübingen has spent the last 14 years patiently unraveling the visual flight-control system of the housefly, and in a famous collaboration, Reichardt and Tomaso Poggio have gone far toward solving the problem (Reichardt and Poggio, 1976, 1979; Poggio and Reichardt, 1976). Roughly speaking, the fly's visual apparatus controls its flight through a collection of about five independent, rigidly inflexible, very fast responding systems (the time from visual stimulus to change of torque is only 21 ms). For example, one of these systems is the landing system; if the visual field "explodes" fast enough (because a surface looms nearby), the fly automatically "lands" toward its center. If this center is above the fly, the fly automatically inverts to land upside down. When the feet touch, power to the wings is cut off. Conversely, to take off, the fly jumps; when the feet no longer touch the ground, power is restored to the wings, and the insect flies again.

In-flight control is achieved by independent systems controlling the fly's vertical velocity (through control of the lift generated by the wings) and horizontal direction (determined by the torque produced by the asymmetry of the horizontal thrust from the left and right wings). The visual input to the horizontal control system, for example, is completely described by the two terms

$D(\psi) - r(\psi)\dot{\psi}$

where r and D have the form illustrated in figure 5.2. This input describes how the fly tracks an object that is present at angle $\psi$ in the visual field and has angular velocity $\dot{\psi}$. This system is triggered to track objects of a certain angular dimension in the visual field, and the motor strategy is such that if the visible object was another fly a few inches away, then it would be intercepted successfully. If the target was an elephant 100 yd away, interception would fail because the fly's built-in parameters are for another fly nearby, not an elephant far away.

Figure 5.2 The horizontal component of the visual input R to the fly's flight system is described by the formula $R = D(\psi) - r(\psi)\dot{\psi}$, where $\psi$ is the direction of the stimulus and $\dot{\psi}$ is its angular velocity in the fly's visual field. $D(\psi)$ is an odd function, as shown in (a), which has the effect of keeping the target centered in the fly's visual field; $r(\psi)$ is essentially constant as shown in (b).

Thus, fly vision delivers a representation in which at least these three things are specified: (1) whether the visual field is looming sufficiently fast that the fly should contemplate landing; (2) whether there is a small patch - it could be a black speck or, it turns out, a textured figure in front of a textured ground - having some kind of motion relative to its background; and if there is such a patch, (3) $\psi$ and $\dot{\psi}$ for this patch are delivered to the motor system. And that is probably about 60% of fly vision. In particular, it is extremely unlikely that the fly has any explicit representation of the visual world around him - no true conception of a surface, for example, but just a few triggers and some specifically fly-centered parameters like $\psi$ and $\dot{\psi}$.

It is clear that human vision is much more complex than this, although it may well incorporate subsystems not unlike the fly's to help with specific and rather low-level tasks like the control of pursuit eye movements. Nevertheless, as Poggio and Reichardt have shown, even these simple systems can be understood in the same sort of way, as information-processing tasks. And one of the fascinating aspects of their work is how they have managed not only to formulate the differential equations that accurately describe the visual control system of the fly but also to express these equations, using the Volterra series expansion, in a way that gives direct information about the minimum possible complexity of connections of the underlying neuronal networks.
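The tracking rule for the fly's horizontal control system, R = D(psi) - r(psi) * psi_dot, can be made concrete with a small simulation. The following Python sketch is illustrative only. The sine shape chosen for D, the constant value of r, the unit moment of inertia, and the sign convention that R acts directly as the angular acceleration of the target direction are all assumptions made for this example. They are not taken from Reichardt and Poggio's equations. The sketch shows only the qualitative point made in the text: an odd D pulls the target back toward the center of the visual field, and the velocity term damps the motion.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
K_POSITION = 4.0   # strength of the position-dependent term D
R_CONSTANT = 2.0   # the "essentially constant" velocity gain r


def D(psi: float) -> float:
    """Position term: an odd function of the target direction psi (assumed shape)."""
    return -K_POSITION * math.sin(psi)


def r(psi: float) -> float:
    """Velocity gain, taken to be constant over the visual field."""
    return R_CONSTANT


def visual_input(psi: float, psi_dot: float) -> float:
    """R = D(psi) - r(psi) * psi_dot, following the figure caption."""
    return D(psi) - r(psi) * psi_dot


def track(psi0: float, psi_dot0: float = 0.0, dt: float = 0.001, steps: int = 10_000) -> float:
    """Integrate the assumed dynamics psi_ddot = R and return the final direction psi."""
    psi, psi_dot = psi0, psi_dot0
    for _ in range(steps):
        psi_dot += visual_input(psi, psi_dot) * dt  # assumed unit inertia
        psi += psi_dot * dt
    return psi


if __name__ == "__main__":
    # A target starting 1 radian off-center drifts back toward psi = 0.
    print(track(1.0))
```

Under these assumptions the system behaves like a damped pendulum, so a target that starts off-center ends up near the middle of the visual field. Only two numbers, psi and psi_dot, are ever used, which mirrors the point that the fly needs no explicit representation of surfaces or objects.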

Advanced vision

Visual systems like the fly's serve adequately and with speed and precision the needs of their owners, but they are not very complicated; very little objective information about the world is obtained. The information is all very much subjective - the angular size of the stimulus as the fly sees it rather than the objective size of the object out there, the angle that the object has in the fly's visual field rather than its position relative to the fly or to some external reference, and the object's angular velocity, again in the fly's visual field, rather than any assessment of its true velocity relative to the fly or to some stationary reference point.

One reason for this simplicity must be that these facts provide the fly with sufficient information for it to survive. Of course, the information is not optimal and from time to time the fly will fritter away its energy chasing a falling leaf a medium distance away or an elephant a long way away as a direct consequence of the inadequacies of its perceptual system. But this apparently does not matter very much - the fly has sufficient excess energy for it to be able to absorb these extra costs. Another reason is certainly that translating these rather subjective measurements into more objective qualities involves much more computation. How, then, should one think about more advanced visual systems - human vision, for example? What are the issues? What kind of information is vision really delivering, and what are the representational issues involved?

My approach to these problems was very much influenced by the fascinating accounts of clinical neurology, such as Critchley (1953) and Warrington and Taylor (1973). Particularly important was a lecture that Elizabeth Warrington gave at MIT in October 1973, in which she described the capacities and limitations of patients who had suffered left or right parietal lesions. For me, the most important thing that she did was to draw a distinction between the two classes of patient (see Warrington and Taylor, 1978). For those with lesions on the right side, recognition of a common object was possible provided that the patient's view of it was in some sense straightforward. She used the words conventional and unconventional - a water pail or a clarinet seen from the side gave "conventional" views but seen end-on gave "unconventional" views. If these patients recognized the object at all, they knew its name and its semantics - that is, its use and purpose, how big it was, how much it weighed, what it was made of, and so forth. If their view was unconventional - a pail seen from above, for example - not only would the patients fail to recognize it, but they would vehemently deny that it could be a view of a pail. Patients with left parietal lesions behaved completely differently. Often these patients had no language, so they were unable to name the viewed object or state its purpose and semantics. But they could convey that they correctly perceived its geometry - that is, its shape - even from the unconventional view.

Warrington's talk suggested two things. First, the representation of the shape of an object is stored in a different place and is therefore a quite different kind of thing from the representation of its use and purpose. And second, vision alone can deliver an internal description of the shape of a viewed object, even when the object was not recognized in the conventional sense of understanding its use and purpose.

This was an important moment for me for two reasons. The general trend in the computer vision community was to believe that recognition was so difficult that it required every possible kind of information. The results of this point of view duly appeared a few years later in programs like Freuder's (1974) and Tenenbaum and Barrow's (1976). In the latter program, knowledge about offices - for example, that desks have telephones on them and that telephones are black - was used to help "segment" out a black blob halfway up an image and "recognize" it as a telephone. Freuder's program used a similar approach to "segment" and "recognize" a hammer in a scene. Clearly, we do use such knowledge in real life; I once saw a brown blob quivering amongst the lettuce in my garden and correctly identified it as a rabbit, even though the visual information alone was inadequate. And yet here was this young woman calmly telling us not only that her patients could convey to her that they had grasped the shapes of things that she had shown them, even though they could not name the objects or say how they were used, but also that they could happily continue to do so even if she made the task extremely difficult visually by showing them peculiar views or by illuminating the objects in peculiar ways. It seemed clear that the intuitions of the computer vision people were completely wrong and that even in difficult circumstances shapes could be determined by vision alone.

The second important thing, I thought, was that Elizabeth Warrington had put her finger on what was somehow the quintessential fact of human vision - that it tells about shape and space and spatial arrangement. Here lay a way to formulate its purpose - building a description of the shapes and positions of things from images. Of course, that is by no means all that vision can do; it also tells about the illumination and about the reflectances of the surfaces that make the shapes - their brightnesses and colors and visual textures - and about their motion. But these things seemed secondary; they could be hung off a theory in which the main job of vision was to derive a representation of shape.

To the desirable via the possible

Finally, one has to come to terms with cold reality. Desirable as it may be to have vision deliver a completely invariant shape description from an image (whatever that may mean in detail), it is almost certainly impossible in only one step. We can only do what is possible and proceed from there toward what is desirable. Thus we arrived at the idea of a sequence of representations, starting with descriptions that could be obtained straight from an image but that are carefully designed to facilitate the subsequent recovery of gradually more objective, physical properties about an object's shape. The main stepping stone toward this goal is describing the geometry of the visible surfaces, since the information encoded in images, for example by stereopsis, shading, texture, contours, or visual motion, is due to a shape's local surface properties. The objective of many early visual computations is to extract this information.

However, this description of the visible surfaces turns out to be unsuitable for recognition tasks. There are several reasons why, perhaps the most prominent being that like all early visual processes, it depends critically on the vantage point. The final step therefore consists of transforming the viewer-centered surface description into a representation of the three-dimensional shape and spatial arrangement of an object that does not depend upon the direction from which the object is being viewed. This final description is object centered rather than viewer centered.

The overall framework described here therefore divides the derivation of shape information from images into three representational stages (table 5.2): (1) the representation of properties of the two-dimensional image, such as intensity changes and local two-dimensional geometry; (2) the representation of properties of the visible surfaces in a viewer-centered coordinate system, such as surface orientation, distance from the viewer, and discontinuities in these quantities; surface reflectance; and some coarse description of the prevailing illumination; and (3) an object-centered representation of the three-dimensional structure and of the organization of the viewed shape, together with some description of its surface properties.

Table 5.2 Representational framework for deriving shape information from images

Name: Image(s)
Purpose: Represents intensity
Primitives: Intensity value at each point in the image

Name: Primal sketch
Purpose: Makes explicit important information about the two-dimensional image, primarily the intensity changes there and their geometrical distribution and organization
Primitives: Zero-crossings; blobs; terminations and discontinuities; edge segments; virtual lines; groups; curvilinear organization; boundaries

Name: 2½-D sketch
Purpose: Makes explicit the orientation and rough depth of the visible surfaces, and contours of discontinuities in these quantities in a viewer-centered coordinate frame
Primitives: Local surface orientation (the "needles" primitives); distance from viewer; discontinuities in depth; discontinuities in surface orientation

Name: 3-D model representation
Purpose: Describes shapes and their spatial organization in an object-centered coordinate frame, using a modular hierarchical representation that includes volumetric primitives (i.e., primitives that represent the volume of space that a shape occupies) as well as surface primitives
Primitives: 3-D models arranged hierarchically, each one based on a spatial configuration of a few sticks or axes, to which volumetric or surface shape primitives are attached
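The sequence of representations in table 5.2 can be pictured as a chain of data types, with each process mapping one representation to the next. The Python sketch below is a minimal illustration under stated assumptions. All class, field, and function names are hypothetical, and the single process that is implemented, a thresholded difference between horizontally adjacent pixels, is a crude stand-in for detecting intensity changes. It is not the zero-crossing analysis that the primal sketch is actually built on. The later stages appear as types only, to show which coordinate frame each one uses.

```python
from dataclasses import dataclass, field

# An image makes explicit the intensity value I(x, y) at each pixel.
# Assumed layout for this sketch: image[y][x], one gray level per pixel.
Image = list[list[float]]


@dataclass
class IntensityChange:
    """A primal-sketch-like primitive: a change in intensity at (x, y)."""
    x: int
    y: int
    magnitude: float


@dataclass
class PrimalSketch:
    """Stage 1: properties of the two-dimensional image."""
    changes: list[IntensityChange] = field(default_factory=list)


@dataclass
class SurfacePatch:
    """A 2.5-D sketch primitive, expressed in viewer-centered coordinates."""
    x: int
    y: int
    orientation: tuple[float, float]  # local surface orientation (a "needle")
    depth: float                      # rough distance from the viewer


@dataclass
class Sketch2HalfD:
    """Stage 2: visible surfaces in a viewer-centered frame."""
    patches: list[SurfacePatch] = field(default_factory=list)


@dataclass
class Model3D:
    """Stage 3: an object-centered, hierarchical shape description."""
    axis: tuple[float, float, float]  # principal axis in the object's own frame
    parts: list["Model3D"] = field(default_factory=list)


def primal_sketch(image: Image, threshold: float) -> PrimalSketch:
    """Toy process: mark horizontal intensity changes at least `threshold` in size."""
    sketch = PrimalSketch()
    for y, row in enumerate(image):
        for x in range(1, len(row)):
            diff = row[x] - row[x - 1]
            if abs(diff) >= threshold:
                sketch.changes.append(IntensityChange(x, y, diff))
    return sketch


if __name__ == "__main__":
    img = [[0, 0, 9, 9],
           [0, 0, 0, 0]]
    print(primal_sketch(img, threshold=5).changes)
```

The point of the types is the change of reference frame along the chain: the first two stages are indexed by image position and so depend on the vantage point, whereas `Model3D` carries its own axes and does not.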

This framework is summarized in table 5.2. Chapters 2 through 5 give a more detailed account. [...]

Synopsis

Our survey of this new, computational approach to vision is now complete. Although there are many gaps in the account, I hope that it is solid enough to establish a firm point of view about the subject and to prompt the reader to begin to judge its value. In this brief chapter, I shall take a very broad view of the whole approach, inquiring into its most important general features and how they relate to one another, and trying to say something about the style of research that this approach implies. It is convenient to divide the discussion into four main points.

The first point is one that we have met throughout the account - the notion of different levels of explanation. The central tenet of the approach is that to understand what vision is and how it works, an understanding at only one level is insufficient. It is not enough to be able to describe the responses of single cells, nor is it enough to be able to predict locally the results of psychophysical experiments. Nor is it enough even to be able to write computer programs that perform approximately in the desired way. One has to do all these things at once and also be very aware of the additional level of explanation that I have called the level of computational theory. The recognition of the existence and importance of this level is one of the most important aspects of this approach. Having recognized this, one can formulate the three levels of explanation explicitly (computational theory, algorithm, and implementation), and it then becomes clear how these different levels are related to different types of empirical observation and theoretical analysis that can be conducted. I have laid particular stress on the level of computational theory, not because I regard it as inherently more important than the other two levels - the real power of the approach lies in the integration of all three levels of attack - but because it is a level of explanation that has not previously been recognized and acted upon. It is therefore probably one of the most difficult ideas for newcomers to the field to grasp, and for this reason alone its importance should not be understated. [...]

The second main point is that by taking an information-processing point of view, we have been able to formulate a rather clear overall framework for the process of vision. This framework is based on the idea that the critical issues in vision revolve around the nature of the representations used - that is, the particular characteristics of the world that are made explicit during vision - and the nature of the processes that recover these characteristics, create and maintain the representations, and eventually read them. By analyzing the spatial aspects of the problem of vision, we arrived at an overall framework for visual information processing that hinges on three principal representations: (1) the primal sketch, which is concerned with making explicit properties of the two-dimensional image, ranging from the amount and disposition of the intensity changes there to primitive representations of the local image geometry, and including at the more sophisticated end a hierarchical description of any higher-order structure present in the underlying reflectance distributions; (2) the 2½-D sketch, which is a viewer-centered representation of the depth and orientation of the visible surfaces and includes contours of discontinuities in these quantities; and (3) the 3-D model representation, whose important features are that its coordinate system is object centered, that it includes volumetric primitives (which make explicit the organization of the space occupied by an object and not just its visible surfaces), and that primitives of various size are included, arranged in a modular, hierarchical organization.

The third main point concerns the study of processes for recovering the various aspects of the physical characteristics of a scene from images of it. The critical act in formulating computational theories for such processes is the discovery of valid constraints on the way the world behaves that provide sufficient additional information to allow the recovery of the desired characteristic. We saw many examples of this in chapter 3, and they were summarized in Table 3-3. The power of this type of analysis resides in the fact that the discovery of valid, sufficiently universal constraints leads to conclusions about vision that have the same permanence as conclusions in other branches of science.

Furthermore, once a computational theory for a process has been formulated, algorithms for implementing it may be designed, and their performance compared with that of the human visual processor. This allows two kinds of results. First, if performance is essentially identical, we have good evidence that the constraints of the underlying computational theory are valid and may be implicit in the human processor; second, if a process matches human performance, it is probably sufficiently powerful to form part of a general purpose vision machine.

The final point concerns the methodology or style of this type of approach, and it involves two main observations. First, the duality between representations and processes, which is set out explicitly in figure 5.3, often provides a useful aid to thinking how best to proceed when studying a particular problem. In the study both of representations and of processes, general problems are often suggested by everyday experience or by psychophysical or even neurophysiological findings of a quite general nature. Such general observations can often lead to the formulation of a particular process or representational theory, specific examples of which can be programmed or subjected to detailed psychophysical testing. Once we have sufficient confidence in the correctness of the process or representation at this level, we can inquire about its detailed implementation, which involves the ultimate and very difficult problems of neurophysiology and neuroanatomy.

Figure 5.3 Relationships between representations and processes.

The second observation is that there is no real recipe for this type of research - even though I have sometimes suggested that there is - any more than there is a straightforward procedure for discovering things in any other branch of science. Indeed, part of the fun is that we never really know where the next key is going to come from - a piece of daily experience, the report of a neurological deficit, a theorem about three-dimensional geometry, a psychophysical finding in hyperacuity, a neurophysiological observation, or the careful analysis of a representational problem. All these kinds of information have played important roles in establishing the framework that I have described, and they will presumably continue to contribute to its advancement in an interesting and unpredictable way. I hope only that these observations may persuade some of my readers to join in the adventures we have had and to help in the long but rewarding task of unraveling the mysteries of human visual perception.

Note

1 Editor's note: the passages to which Marr here refers are as follows (from pp. 12-13 of Vision):

If one explores the responsiveness of single ganglion cells in the frog's retina using handheld targets, one finds that one particular type of ganglion cell is most effectively driven by something like a black disc subtending a degree or so moved rapidly to and fro within the unit's receptive field. This causes a vigorous discharge which can be maintained without much decrement as long as the movement is continued. Now, if the stimulus which is optimal for this class of cells is presented to intact frogs, the behavioral response is often dramatic; they turn towards the target and make repeated feeding responses consisting of a jump and snap. The selectivity of the retinal neurons and the frog's reaction when they are selectively stimulated, suggest that they are "bug detectors" (Barlow, 1953) performing a primitive but vitally important form of recognition. The result makes one suddenly realize that a large part of the sensory machinery involved in a frog's feeding responses may actually reside in the retina rather than in mysterious "centers" that would be too difficult to understand by physiological methods. The essential lock-like property resides in each member
of a whole class of neurons and allows the cell to discharge only to the appropriate key pattern of sensory stimulation. Lettvin et al. (1959) suggested that there were five different classes of cell in the frog, and Barlow, Hill, and Levick (1964) found an even larger number of categories in the rabbit. [Barlow et al.] called these key patterns "trigger features," and Maturana et al. (1960) emphasized another important aspect of the behavior of these ganglion cells; a cell continues to respond to the same trigger feature in spite of changes in light intensity over many decades. The properties of the retina are such that a ganglion cell can, figuratively speaking, reach out and determine that something specific is happening in front of the eye. Light is the agent by which it does this, but it is the detailed pattern of the light that carries the information, and the overall level of illumination prevailing at the time is almost totally disregarded. (Barlow, 1972: 373)

The cumulative effect of all the changes I have tried to outline above has been to make us realize that each single neuron can perform a much more complex and subtle task than had previously been thought [emphasis added]. Neurons do not loosely and unreliably remap the luminous intensities of the visual image onto our sensorium, but instead they detect pattern elements, discriminate the depth of objects, ignore irrelevant causes of variation and are arranged in an intriguing hierarchy. Furthermore, there is evidence that they give prominence to what is informationally important, can respond with great reliability, and can have their pattern selectivity permanently modified by early visual experience. This amounts to a revolution in our outlook. It is now quite inappropriate to regard unit activity as a noisy indication of more basic and reliable processes involved in mental operations: instead, we must regard single neurons as the prime movers of these mechanisms. Thinking is brought about by neurons and we should not use phrases like "unit activity reflects, reveals or monitors thought processes," because the activities of neurons, quite simply, are thought processes.

This revolution stemmed from physiological work and makes us realize that the activity of each single neuron may play a significant role in perception. (ibid.: 380)

References

Austin, J. L. (1962). Sense and Sensibilia. Oxford: Clarendon Press.
Barlow, H. (1953). Summation and inhibition in the frog's retina. J. Physiol. (Lond.), 119, 69-88.
Barlow, H. (1972). Single units and sensation: a neuron doctrine for perceptual psychology? Perception, 1, 371-94.
Barlow, H., Hill, R., and Levick, W. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol. (Lond.), 173, 377-407.
Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, N., and Lasnik, H. (1977). Filters and control. Linguistic Inquiry, 8, 425-504.
Critchley, M. (1953). The Parietal Lobes. London: Edward Arnold.
Freuder, E. C. (1974). A Computer Vision System for Visual Recognition using Active Knowledge. MIT Artificial Intelligence Laboratory Technical Report 345.
Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Boston: Houghton-Mifflin.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton-Mifflin.
Lettvin, J., Maturana, H., McCulloch, W., and Pitts, W. (1959). What the frog's eye tells the frog's brain. Proc. Inst. Rad. Eng., 47, 1940-51.
Marcus, M. P. (1980). A Theory of Syntactic Recognition for Natural Language. Cambridge, MA: MIT Press.
Marr, D. (1976). Early processing of visual information. Phil. Transactions of the Royal Society (Lond. B), 275, 483-524.
Marr, D., and Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society (Lond. B), 200, 269-94.
Marr, D., and Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194, 283-7.
Marr, D., and Poggio, T. (1979). A computational theory of human stereo vision. Proceedings of the Royal Society of London B, 204, 301-28.
Maturana, H., Lettvin, J., McCulloch, W., and Pitts, W. (1960). Anatomy and physiology of vision in the frog (Rana pipiens). J. Gen. Physiol., 43 (suppl. no. 2, Mechanisms of Vision), 129-71.
Poggio, T., and Reichardt, W. (1976). Visual control of orientation behavior in the fly. Part II: Towards the underlying neural interactions. Quarterly Review of Biophys., 9, 377-438.
Reichardt, W., and Poggio, T. (1976). Visual control of orientation behavior in the fly. Part I: A quantitative analysis. Quarterly Review of Biophys., 9, 311-75.
Reichardt, W., and Poggio, T. (1979). Visual control of flight in flies. In W. E. Reichardt, V. B. Mountcastle, and T. Poggio (eds), Recent Theoretical Developments in Neurobiology.
Rosch, E. (1978). Principles of categorization. In E. Rosch and B. Lloyd (eds), Cognition and Categorization (pp. 27-48). Hillsdale, NJ: Erlbaum.
Shepard, R. N. (1975). Form, formation, and transformation of internal representations. In R. Solso (ed.), Information Processing and Cognition: The Loyola Symposium (pp. 87-122). Hillsdale, NJ: Erlbaum.
Stevens, K. A. (1979). Surface Perception from Local Analysis of Texture and Contour. PhD dissertation, MIT. (Available as: The information content of texture gradients. Biological Cybernetics, 42, 95-105; also, The visual interpretation of surface contours. Artificial Intelligence, 17 (1981), 47-74.)
Tenenbaum, J. M., and Barrow, H. G. (1976). Experiments in Interpretation-guided Segmentation. Stanford Research Institute Technical Note 123.
Warrington, E. K. (1975). The selective impairment of semantic memory. Quarterly Journal of Experimental Psychology, 27, 635-57.
Warrington, E. K., and Taylor, A. M. (1973). The contribution of the right parietal lobe to object recognition. Cortex, 9, 152-64.
Warrington, E. K., and Taylor, A. M. (1978). Two categorical stages of object recognition. Perception, 7, 695-705.
Winograd, T. (1972). Understanding Natural Language. New York: Academic Press.
BLACKWELL PHILOSOPHY ANTHOLOGIES

Each volume in this outstanding series provides an authoritative and comprehensive collection of
the essential primary readings from philosophy's main fields of study Designed to complement
the Blackwell Companions to Philosophy series, each volume represents an unparalleled resource in
its own right, and will provide the ideal platform for course use.

Cottingham: Western Philosophy: An Anthology
Cahoone: From Modernism to Postmodernism: An Anthology
LaFollette: Ethics in Practice: An Anthology
Goodin and Pettit: Contemporary Political Philosophy: An Anthology
Eze: African Philosophy: An Anthology
McNeill and Feldman: Continental Philosophy: An Anthology
Kim and Sosa: Metaphysics: An Anthology
Lycan: Mind and Cognition: An Anthology (second edition)
Kuhse and Singer: Bioethics: An Anthology
Cummins and Cummins: Minds, Brains, and Computers: The Foundations of Cognitive Science: An Anthology
Sosa and Kim: Epistemology: An Anthology

Edited by
Robert Cummins
University of California at Davis
and
Denise Dellarosa Cummins
University of California at Davis
