
Chrono-Geometry

Jan Koenderink

De Clootcrans Press, MMXII


Front cover: Umberto Boccioni (1882–1916), La strada entra nella casa, 1911; oil on canvas, 100 x 100.6 cm; Sprengel Museum, Hannover.

Foreword

This short e-book is intended as preliminary reading material for a summer course in visual perception. It is one of a series of short introductions.

The book was prepared in pdfLaTeX, using the movie15 and hyperref packages. In order to make use of its structure, use the (free!) Adobe Reader. Make sure that all options are enabled! Video and sound clips will start when you click (or double-click) the image; they are embedded in the file.

Make sure you are online. Clicking the light blue text will get you to Internet sites with additional material (try this page!). Be sure to read some of that; if good material is available elsewhere I’ll skip it in the text. Most references occur only once, saving both you and me time. However, it means that you may have to backtrack at times.

All links were active when I checked last. I will check them again with the next reprint.

Utrecht, February 12, 2012, Jan Koenderink

De Clootcrans Press
Utrecht, The Netherlands
jan.koenderink@telfort.nl

Copyright © 2012 by Jan Koenderink


All rights reserved. Please do not redistribute this file in any form without my
express permission. Thank you!
pax / jan koenderink

First edition, 2012
10 9 8 7 6 5 4 3 2 1

Jan Koenderink
Katholieke Universiteit Leuven
Laboratory of Experimental Psychology
Tiensestraat 102 – bus 3711
3000 Leuven
Belgium
jan.koenderink@ppw.kuleuven.be

PROBLEMS OF SPACE AND TIME

Space and time have remained enigmatic throughout the history of human understanding. One problem is the nature of the continuum. Other perspectives on the problem derive from physics. Here I consider the continuum, space and time in the context of (mostly visual) awareness.

Continua in Awareness

Continua that one may grasp in a blink of the eye are a curve or a surface. Of course, one grasps only part of either, but the structure of the continuum is contained in any part of it. Here “part” means something as vague as not being able to see the curve or surface in its totality at any given moment. The vagueness is important, for it is highly debatable whether you can cut a line into two parts, so to speak. In order to apply the axe, you need to decide on the point of the cut. (Clearly you can’t cut between points, for then the continuum would not exist; you would have a discretum.) But to which part should the cut-point go? (You evidently cannot “split it”, for a point (according to Euclid) is “that which has no part”.) Moreover, what about the points at either side of it? No one has been able to deal with such problems, although attempts were made throughout millennia. A modern consensus is perhaps that lines do not consist of points, although some would hold that points can be “pointed out” on a line. Others would deny even that: to point out a point (with infinite precision!) is certainly beyond human competence1. So if points exist at all, they would have to be “ideal” entities. To many that amounts to their non-existence.

1 Here the “choice sequences” of the Dutch mathematician Brouwer are especially important. A choice sequence defines a point of the real number line through its decimal expansion, like 3.1415926535897932384626433832795028841971693993751 . . . He also considered sequences where the next decimal is decided by a throw of dice. Such numbers are never “ready”, though as precisely known as you might ever need. When are such numbers equal?

If points exist, then they would all be the same, indistinguishable through any imaginable property. But if two entities do not differ in any way, then how are they different from a single entity? This is Leibniz’s ontological Principle of the Identity of Indiscernibles. But if there is only a single point in the universe, then what about space?

Gottfried Wilhelm Leibniz (1646–1716) and Sir Isaac Newton PRS (1642–1727). (Although Samuel Clarke (1675–1729), rather than Newton himself, conducted the correspondence with Leibniz, I have used Newton’s portrait instead of his: it seems to make more sense as seen from a distant perspective.)

In the famous Leibniz-Clarke correspondence2 the Newtonian position is that “absolute space” exists, even in the absence of objects. Objects assume space (a piece of space in which they exactly fit). In contradistinction, Leibniz argues that such “absolute space” is a nonentity. All there is are relations between objects. We understand such relations as “spatial”. The arguments on both sides are partly theological, partly theoretical physics (“natural philosophy”), partly formal, focussing on the difference between continua and discreta (atomism).

2 Leibniz wrote to the Princess of Wales, who handed the letters to Clarke. In the (almost a year long) interchange Clarke essentially argues for Newton. Newton himself was not involved in the debate. The debate came to an end when Leibniz died two weeks after one of Clarke’s letters. The debate was dominated by misunderstandings of the other party, especially on Clarke’s side.

The time continuum is even more difficult to consider, because you obviously can’t overview a stretch of time in the blink of an eye3. The discussion has primarily focussed on the extendedness of the “now”. Does the now include relations to past and future, that is to say, is there a “presence of the past”, and a “presence of the future” as there (certainly) is a “presence of the presence”4? Evidently past and future have to be understood in terms of the presence, because the past is gone and the future is still to happen. Neither, strictly speaking, exists in the present.

3 Saint Augustine remarks (concerning time): “What then is time? If no one asks of me, I know; if I wish to explain to him who asks, I know not”.

4 On this topic Saint Augustine has: “[there is . . . ] a present of things past, a present of things present, and a present of things future. . . . The present of things past is memory, the present of things present is sight, and the present of things future is expectation”.

Here I pose such fundamental questions in the context of visual awareness. The continua considered are space (visual field, visual world, . . . ) and time (the structure, if any, of the “moment now”).

Space and Time in Immediate Visual Awareness

Both time and space appear to have a certain “grain size”. There is clearly an interval so short that no one is able to discriminate temporal order, or even structure, within the interval. Likewise, there is clearly a spatial extent such that no one is able to discriminate spatial order, or even structure, within that extent. A modern digital movie is a good example. You are not aware of the sequence of individual images, nor of the pixels within such an image. But that is not to say that space or time are discreta. They are evidently not to visual awareness. Neither the movie frames nor the pixels are natural parts of the movie. You are aware of a spacetime continuum when you view the video clip.

One question that might be expected to find an empirical answer is whether the physical spatiotemporal relations encoded in the video clip are reflected in the spatiotemporal relations in the awareness of perceivers. Here we are not concerned with such issues as that you hear the thunder clap after you see the lightning (a historically important case); here we are only concerned with spatiotemporal relations in the retinal illuminance pattern and in visual awareness.

Vittorio Benussi (Trieste, 1878–1927, Padova) did the work on sounds at the University of Graz, when working under Alexius Meinong.

An early study was due to Vittorio Benussi. He used sequences of sounds that came within a short period. He noticed that the sequence dah (low tone), dih (high tone), bzz (noise) is heard in immediate auditive awareness as such, but that the sequence dah-bzz-dih is also heard as dah-dih-bzz. Apparently the time sequence in auditive awareness is unlike the physical stimulus sequence in the latter case. You may want to judge this for yourself. Notice that it is not an easy task to monitor your immediate awareness!
Your report is necessarily a product of reflective thought; it has to be formulated once the awareness is over. You may try to decide whether dah-dih-bzz and dah-bzz-dih appear different when played shortly after one another. This is somewhat easier.

The Benussi example. The graphs show the air pressure as a function of time. The full length is three hundred milliseconds, divided into three sublengths of a tenth of a second. The low and high tones are seen as the coarse and fine sinusoids; the noise bursts look like noise: chaotic. When listening to the sound clip you should try to correlate the sound with these graphs; it will aid your understanding. (Sound clip.)

Thus, it is not necessarily true that the temporal order in immediate awareness reflects the temporal order of the physical stimulus at the sensitive body surface.

Awareness seems to cook up its own account: an uninterrupted dah-dih (a “Gestalt”, like a micro-melody) sounds better (or more probable) than a dah-dih with a bzz (“no-thing”) interruption. That this interpretation is reasonable appears when you interrupt a longer sound. The interruptions are very objectionable, and interfere strongly with the understanding. When you fill the interruptions with bzz’s (noise bursts), you hear the (interrupted!) sound as continuous behind a sequence of meaningless noise bursts, and understandability is much improved.

You may try this for yourself in another example. I used the famous sound from “Good Morning, Vietnam”, a 1987 American comedy-drama film set in Saigon during the Vietnam War, starring Robin Williams. It is repeated thrice, with one-second pauses in between.

You will hear:
◦ the straight sound
◦ the sound with periodic interruptions
◦ the sound with the interruptions filled with noise.

The graphs show the sound pressure amplitude as a function of time. The sound is Robin Williams’s famous Gooood Morning Vietnaaamm!!, from Barry Levinson’s movie “Good Morning, Vietnam” (1987). (Sound clip.)

The original sound is heard as one entity, which is why some people use it as a ring tone on their cell phones. When the gaps are introduced the impression is completely different. You no longer hear a single (complex) sound; you are aware of a number of mutually unrelated noises. Introducing the noise bursts helps a lot to render the mutilated sound “readable”. With the noise bursts the main sound appears as a single Gestalt, whereas with the silent interruptions it is fragmented. It is as if a continuous sound went on “behind” an array of meaningless noises.

This clearly shows the creative nature of microgenesis. Awareness makes more “sense” than the physical disturbances on the body surface.
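The sound manipulations described above are easy to try out for yourself. What follows is only a minimal sketch, not the procedure used to make the clips: the sample rate, the tone frequencies (220 Hz and 880 Hz), the function names, and the choice to place each gap at the end of its period are all assumptions introduced for illustration. It builds a Benussi sequence from three 100 ms segments, and it cuts periodic gaps into a signal that are either left silent or filled with noise.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second; an assumption, the text gives none


def tone(freq_hz, dur_s, sr=SAMPLE_RATE):
    """Pure sinusoid of the given frequency and duration."""
    n = round(dur_s * sr)
    return [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]


def noise(dur_s, rng, sr=SAMPLE_RATE):
    """Burst of uniform white noise with values in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(round(dur_s * sr))]


def benussi(order, rng, seg_s=0.1, low_hz=220.0, high_hz=880.0):
    """Concatenate equal-length segments, e.g. order=("dah", "bzz", "dih")."""
    make = {
        "dah": lambda: tone(low_hz, seg_s),   # low tone
        "dih": lambda: tone(high_hz, seg_s),  # high tone
        "bzz": lambda: noise(seg_s, rng),     # noise burst
    }
    samples = []
    for name in order:
        samples.extend(make[name]())
    return samples


def interrupt(signal, period_s, gap_s, rng=None, sr=SAMPLE_RATE):
    """Replace the last gap_s seconds of every period_s seconds.

    Without rng the gaps are silent; with rng they are filled with noise.
    """
    out = list(signal)
    period, gap = round(period_s * sr), round(gap_s * sr)
    for start in range(period - gap, len(out), period):
        for i in range(start, min(start + gap, len(out))):
            out[i] = 0.0 if rng is None else rng.uniform(-1.0, 1.0)
    return out
```

Calling `benussi(("dah", "bzz", "dih"), random.Random(0))` gives 300 ms worth of samples. `interrupt(signal, 0.2, 0.05)` silences the last 50 ms of every 200 ms, and passing `rng=random.Random(0)` fills those same gaps with noise, so the two versions can be compared by ear once written to a sound file.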

This is equally true for vision. The visual domain is somewhat more complicated due to the fact that one has both a spatial and a temporal order. I consider a few examples.

Here is a simple case inspired by Gaetano Kanizsa. Take a picture of a horse, cut it in two, pull the pieces apart, and hide the gap somehow (almost anything works). What you get is a “dachshorse”; it can be pulled out almost arbitrarily! Your vision simply assumes that the horse continues behind the “occluder”.

As dachshorses go, this one is pretty extreme. It still works for me though. You may want to create your own and see how far you can go.

In the dachshorse case you filled in emptiness. If there is actually something to hide, this works if the occluder completely covers the object you want to remove. For instance, you may turn a decent lingerie ad into a pornographic item by hiding image content, thus paradoxically “revealing” by “hiding”. One produces apparent nakedness through the addition of body covering5!

5 The point is that you cover the image, not the body in pictorial space. The black rectangle exists only on the picture surface.

Hiding image content turns a decent lingerie ad (from a woman’s magazine) into a pornographic item, perhaps better suited to Playboy.

Images may be combined even rather sloppily, and visual awareness will still make sense of them. Thus the “broken face” effortlessly becomes a full portrait.

A “broken face” becomes a complete portrait. This is a “no brainer” to the microgenetic engine of most observers. Yet consider the implications!
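The two operations described above, pulling an image apart and hiding the gap (the dachshorse) versus covering content that is actually there, differ only in whether image information is moved or overwritten. A minimal sketch, assuming an image stored as a nested list of gray values and a vertical cut; the function names and the mid-gray occluder value are hypothetical choices made for illustration:

```python
GRAY = 128  # occluder value; an arbitrary mid-gray for 8-bit images


def pull_apart(image, cut_col, gap, fill=GRAY):
    """Dachshorse: cut at column cut_col and separate the halves.

    The gap between the halves is filled with occluder pixels, so the
    image becomes wider but no original pixel is lost.
    """
    return [row[:cut_col] + [fill] * gap + row[cut_col:] for row in image]


def cover(image, left, width, fill=GRAY):
    """Occlusion proper: overwrite columns left .. left + width - 1.

    The image keeps its size, but the covered pixels are gone.
    """
    return [
        row[:left] + [fill] * len(row[left:left + width]) + row[left + width:]
        for row in image
    ]
```

Both results show a gray bar flanked by image content. Only the history differs, which is exactly what the visual system cannot know and fills in for itself.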

Here I occlude a familiar portrait with gray strips. The portrait is seen as if “behind bars”. Even with rather fat strips the portrait is seen to carry on behind the bars, even if quite a bit of the image is now invisible.

It is fun to play with such cases; a portrait is excellently suited to the purpose (most landscapes are “too easy”). Consider what happens when you divide an image into tiles and pull them apart. When you manage to “hide the gaps” you apparently “explode” the image. Of course, this is very similar to the case of the dachshorse.

You may also cut strips off the borders of the tiles, thus paring them down. When you push the pieces together, you again obtain an integral portrait. However, it has diminished in size.

You may keep the size constant by hiding the gaps. Then you simply hallucinate the missing material. In that case the final image is the same as when you put occluding strips over the integral image, of course.

In the four images at left I cut away the gray strips and moved the pieces together. You have the same image information as with the strips present. However, the impression is very different: it looks as if nothing is missing, but the image simply becomes smaller. At right I have cut the image into squares, pulled the parts apart, and hid the gaps with gray stripes. Thus each image contains exactly the same image information. In this case the image appears to “grow”; you apparently imagine “hidden parts” that never existed!

Things become even more interesting when you include spatial disarray. What happens if you “do the puzzle sloppily”? The example of the broken face suggests that nothing much will happen at all. This is indeed the case.

Here the image was cut into vertical strips. The strips were randomly shifted vertically. Finally, the image was reassembled.

When you cut the portrait into vertical strips, and shift these randomly with
respect to each other, the result is still easily readable.

Here I cut square tiles from the image (if necessary taking a fresh copy). The tiles were cut at randomly dithered locations. Finally, the image was reassembled, the boundaries between the (ill-fitting) tiles being hidden by gray strips. The result looks like an integral image “behind bars”.

A more extreme case involves squares cut at randomly dithered locations (of course you need fresh copies of the image when the tiles overlap; electronic “cutting” is most convenient). When the “seams” where the ill-fitting squares meet are hidden with a mesh of gray stripes, you “see” a nice, integral image “behind bars”.

Such demonstrations are fun to play with. More importantly, they teach you a lesson. They are examples where the optical structure is chaotic, yet the awareness coherent. Thus they are cases of very non-veridical perception. They reveal that the Newtonian notion of an absolute space does not work in perception. If it did, you would see the stimulus for what it is: a chaotically scrambled image. But you don’t. You are aware of a coherent image because you (in microgenesis, that is in pre-awareness) construct space on the basis of relations that make sense.

Apparently this process of construction works both in time (the Benussi examples) and in space. Would it work in space-time? Nowadays you can check that, because you can handle space-time in your computer. Starting with a video sequence you can scramble both in space and in time. The temporal dimension is already tiled (the video frame), the spatial dimension is easy to tile (as in the Einstein portrait examples). All you need to do is shuffle and reassemble.

The image used in the demo is a painting by van Gogh (“Wheat field with cypresses”, video clip).

In the first example I use only spatial scrambling, although I also show what happens when you change the scrambling over time. In the video there are 5 sections:
◦ section 1: The painting is shown for a few seconds without any intervention.
◦ section 2: The painting tiled. Each tile is filled with a randomly shifted copy of what should be there. The random shifts are drawn anew for each frame. Notice the turmoil, which looks much like a continuous deformation. Occasionally one spots the edge of a tile, though this requires some scrutiny.
◦ section 3: Like section 2, but here I introduced cracks between the tiles. The impression is not that different from that in section 2, although the movements seem confined to the tiles. Occasionally one believes to see
the tiles themselves moving (which they don’t).
◦ section 4: As section 2, but here I introduced flashes between the frames. Notice that the impression of a continuous turmoil is gone. One notices occasional dislocations between the tiles.
◦ section 5: As section 4, but here I have both the cracks and the flashes. The major impression is that of a coherent image. Some scrutiny reveals occasional dislocations. The contrast with the impression of turmoil as in section 2 is very striking.

This clip is based on a short sequence from Sam Peckinpah’s movie “The Wild Bunch”. In this scene (“Let’s go!”) the men check their guns, and start walking towards the final shootout. They understand it will mean their end. Notice that the clip is free of scene cuts. Although the camera and the players move, one has a continuous view. There is plenty of movement, except for a short break in the middle, where the men line up just before walking off towards the left. (Video clip.)

In another demo I have actually scrambled things in space-time. The demo consists of five sections:
◦ section 1: The scene straight from the movie.
◦ section 2: The same scene in (local) temporal disarray. Notice the obvious jerks as the clip suddenly shifts towards past or future.
◦ section 3: Same as section 2, except for flashes between the temporal shifts. The flashes mask the apparent movements. The clip apparently runs smoothly, although the periodic flashes are objectionable.
◦ section 4: Here I introduce spatiotemporal disarray. The image is tiled, and in the tiles I randomly shift the image both in space and in time. Apart from the jerks, I introduce random shifts. This looks really bad!
◦ section 5: Like section 4, but I add both cracks and flashes (temporal cracks). This section should be compared with section 4 and section 1. Does it look more like section 1 than like section 4 (except for the cracks and flashes)? Most people I tried it on certainly think so.

This is surprising! Because I used rather extreme disarray, scrutiny reveals a certain degree of incoherence (mostly dislocations). One experiences some turmoil and occasional dislocations. Yet judge: does it look more like section 1 (disregard cracks and flashes) or like section 4? Applying a lesser amount of disarray would most likely yield examples that would not really look different from section 1. An attempt at a full parametric study seems likely to be rewarding.

Attempts at Physiological Accounts

The concept of “local sign” was introduced by Lotze in the mid nineteenth century. It played an important role in sensory physiology and psychology until the mid twentieth century. The concept seems to have been largely forgotten in recent times.

Why is the concept important? The point is that no one has the slightest notion of how the mind arrives at a two-dimensional, topologically coherent, and even (at least sloppily) metrically calibrated visual field. Of course the retinal image is a two-fold extended continuum, isomorphic to the array of visual directions, a line bushel with apex at the eye (either the entrance pupil or the center of rotation of the eye-ball, depending upon circumstances). But this is just physics and cannot be the cause of the topology of the visual field. Of course we know that the optic nerve (over a million connections) projects somatotopically on the primary visual cortex. Some people accept the somatotopy as a cause, but of course that is too naive to be even worth mentioning.

What would happen if you scrambled the optic nerve? Or if you exchanged the locations of two distant cortical columns, keeping the wiring intact? If you believe that the brain is just another machine, your answer should be: “nothing of course”! For the layout of the wiring is immaterial to a machine, merely a matter of convenience or economy; it has no bearing on the functioning as such. But if you say that, you admit that somatotopy is irrelevant to the problem.

This was obvious to Lotze. In his time the somatotopy was not yet established. In his writing he speculates that it might pertain, and asks the question whether it is relevant, only to firmly answer that question in the negative (yes, Lotze is one of my scientific heroes). Something more is needed, a kind of “label” on the fibers of the optic nerve. Lotze called that label “local sign”. He speculated on its nature.

Rudolf Hermann Lotze (1817–1881), the German philosopher who introduced the concept of “local sign” (Localzeichen). It appears prominently in his Medicinische Psychologie oder Physiologie der Seele (1852).

Eventually Lotze came up with the theory that local sign is learned via the effects of voluntary eye movements. This is reminiscent of Bishop Berkeley’s speculations regarding the relation between the upside-down retinal image and the “correct” awareness. Berkeley notices that you have to bend over to reach an object that optically tickles the topside of your retina. This, he speculated, would be learned through arbitrary association.

George Berkeley (Kilkenny 1685–Oxford 1753) was an Irish philosopher of British descent, and an Anglican bishop. He is associated with “subjective idealism”. I would count his “New Theory of Vision” (of 1709) as required reading for any vision scientist.

Hermann Ludwig Ferdinand von Helmholtz (1821–1894) was a German physician and physicist who made significant contributions to several widely varied areas of modern science.

Ernst Heinrich Weber (June 24, 1795 – January 26, 1878) was a German physician who is considered one of the founders of experimental psychology.

Much later, Helmholtz came up with another possible mechanism. Helmholtz started his career as an army physician. In the course of his daily work he noticed that in cases of acute toothache the patient generally is unable to point out whether the source of the pain is in the lower or in the upper jaw. Although it is simple enough to find out (merely apply pressure and listen to the response), Helmholtz is fascinated by the phenomenon. He speculates that in chewing food the nerves that serve teeth in opposite position are necessarily stimulated in synchrony. The brain may well interpret correlation of nervous activity as spatial overlap of sensitive areas (Empfindungskreisen, a notion due to Ernst Heinrich Weber (1795–1878)) on the sensitive body surface. Thus Helmholtz has found a physiological mechanism that might establish topological relations.

The Helmholtz mechanism can indeed be formalized as cohomology after Eduard Čech (1893–1960), a Czech mathematician. In principle it could serve to “unscramble” the optic nerve connections if some evil djinn were to scramble them!

This gives the flavor of the Helmholtz idea (diagram with regions P, Q and R). Here “P is included in Q” if and only if for any R such that R correlates with P it is the case that it also correlates with Q. Topological relations are defined in terms of neural correlations.

The formal theory of Helmholtz local sign is due to Alfred North Whitehead (1861–1947), an English mathematician (he wrote the Principia Mathematica (published in 1910, 1912, and 1913) with Bertrand Russell) who became a philosopher. The account is found in his Process and Reality (1929).

Psychophysical results for a tarachopic patient examined by Robert Hess. The MTFs of both eyes are identical, thus the “bad eye” (including brain) is physiologically not different from the “good eye”. The insets show the patient’s impressions of uniform grating patterns of various frequencies.

This might simulate various (extreme) forms of tarachopia. These images have identical histograms, that is to say, they are composed of the same pixels. Even more important, the pixels are only coarsely in disarray; neighboring pixels typically remain neighbors, implying that visual acuity is intact! Such a system could discriminate a zebra from a donkey, yet be unable to see whether the zebra’s stripes ran horizontally or vertically. The zebra might even be checkered or polka-dotted; it would make no difference.

There are some indications that the Helmholtz mechanism might be important. One is the existence of a form of amblyopia (“blunt eye”) known as “tarachopia”. The tarachopic patient has good optics and normal retinal receptive fields. The spatial modulation transfer function as determined by standard psychophysical techniques is normal. Thus the patient has (technically speaking) normal visual acuity. Yet the patient may be unable to read the
headlines of a newspaper! Tarachopia is apparently a form of Seelenblindheit (optical agnosia). It may well be a form of defective (Helmholtz) local sign.

In the mid twentieth century Platt6 came up with yet another idea, closely connected with the issue of local sign. He noticed that the retinal images of straight lines are moved along themselves (at least momentarily) by eye movements. Thus collinearity in the visual field is related to certain invariant cortical activations. This yields the possibility of a projective mensuration of the visual field.

6 Platt, J. R. (1960). How we see straight lines. Scientific American, 202(6), 121-129.

Like many animals, cats have vibrissal pads (“whiskers”). These allow them to “see in the dark”, at least in near space. The neural machinery of the vibrissal arrays may be implemented in human primary visual cortex for fine spatial vision. It is based on temporal, rather than spatial, patterns. It would put the involuntary eye movements during fixation to good use.

More recently, it has been argued that the foveal area may function much like the vibrissal array (the “whiskers”) of such animals7 as rats, cats, sea lions, and so forth. This would allow the system to fine-tune local sign using a temporal, rather than a spatial, encoding of the optical structure. This idea is actually not that novel; in the past it has often been speculated that the fast, small, involuntary eye movements during fixation might have such a function. But the idea to use the vibrissal system as a model appears to be recent.

7 Humans are reported to show vestiges of vibrissal capsular muscles in the upper lip.

Notice that these various mechanisms are by no means mutually exclusive. In fact, they appear to be mainly mutually complementary. I would expect them all to contribute to what may be summarily called “local sign”.

LANDMARK LOCATION is a different type of positioning that depends upon known local signs of a number of “landmarks”. A relative local sign already suffices. For instance, when you see a face you may locate the nose with respect to the eyes and mouth; it doesn’t really matter where the face is. This even works when you see a different view of the face, as long as you see the landmarks (thus it fails if you have only a profile view). This method is categorically different from the ones discussed above, because the “landmarks” are derived from the optical structure. Thus the nose will actually move with respect to eyes and mouth as the head turns; this happens because the nose does not lie in the plane defined by eyes and mouth8.

8 This gets you to the topic of motion parallax, and “Shape From Motion”.

There can be little doubt that landmark location is important in daily vision. It locates visual objects with respect to other visual objects, which makes the method very robust, and largely independent of “calibration”.

However, it presumes that the topology of the visual field is in place. This involves the Helmholtz-type local sign.

The Structure of Time in Awareness

In awareness the time dimension is qualitatively different from the space dimensions; this is unlike Einstein’s blend of space-time. Awareness is always an awareness of the present, although there is simultaneously a presence of the past, and a presence of the future. These latter presences are simultaneous with the now. A model would be a piece of sculpture like the Discobolus of Myron.

The discobolus is a piece of stone; it has not lost its essential shape over centuries. Yet you “see” both the immediate past and the immediate future. This is the principle of “anticipation and repeal”9 of Anthony Ashley Cooper, 3rd Earl of Shaftesbury (1671–1713).

9 To be found in the Characteristics (1737).

You see the history and fate of virtually anything in momentary awareness. Suppose you find a dented coke can near the road. The very characterization “dented coke can” already contains a rich history! You expect the object to be there a moment later, thus you are aware of its future.

The Discobolus of Myron (“discus thrower”, Greek “Diskobolos”) is a famous Greek sculpture dating from the end of the Severe period, circa 460–450 BCE. The original Greek bronze is lost. It is known through numerous Roman copies in marble.

Of course this history is your hallucination. The object might be a valuable piece of art, never having been dented at all, but produced in its present form. You will need little ingenuity to think of a dozen alternative histories.

The future is your hallucination too. After all, the object might be a booby trap, set to explode the moment you approach it. This is a common enough occurrence in some regions, yet it probably didn’t occur to you (that’s why booby traps are effective).

Without such hallucinations you couldn’t survive. What if you actually considered such pasts as the coke can being a piece of art, or such futures as the can being ready to explode (or suddenly vanish, and so forth)? Since the simplest scene has infinite potential for past and future, you’d be stuck for the remainder of your life. Hallucinations are good! At least if you hallucinate well. People who don’t end up in mental institutions.

This example involves a complicated scene. It certainly looks like a “concerted action”. Yet these are pieces of stone dumped there by workers renovating a building. (I took the photograph during the restoration of the Charlottenburg in Berlin.) You “see” cause and intention where there is none.

Phenomenologically your awareness is always now, and all you are ever aware of is now. That is your reality.

It is different in reflective thought. In your thoughts you are moving in time, leaving the past behind, and entering the future, the “now” being only a fleeting event that you will rarely consider. That is your illusion.

Awareness just happens. The moment it occurs it dies, being replaced by another awareness. The microgenesis takes about a tenth of a second.

Momentary awareness is somehow tied to a period roughly corresponding to a single voluntary fixation. Most fixations are involuntary; they are part of microgenesis. Such a period between voluntary fixations contains half a dozen to a dozen involuntary fixations, and might be called a “glance”. A
“good look” is made up of may be ten glances, perhaps containing a few
voluntary fixations.
Glances are short in terms of clock time. They last for about a heart beat.
This may actually be relevant as the heart’s electrical activity is (by way of
volume conduction) available throughout body and brain as a central timing
signal.
Scrutiny involves at least ten good looks and a larger number of voluntary fixations. There is no upper limit to the duration of scrutiny.

Scrutiny is connected to cognition and reflective thought. It is done by you. This is quite different from visual awareness per se. Visual awareness simply happens, it is not something you do. Visual awareness is both pre-personal and pre-cognitive. Because visual awareness is reality, cognition has to obtain its material from it. In doing so the material suffers a sea change in that it is stripped of quality and sense. Instead, you assign meaning to it. This meaning may well differ from the original sense, that is why I said that reflective thought is illusion. You often “correct” visual awareness. This is explicit in many “visual illusions”, where you “know” that what you see is “wrong”.

A good look is usually part of what is known as the “specious present”, a term coined by E. Robert Kelly (under the pseudonym “E. R. Clay”) and mainly developed by William James (1842–1910). The specious present holds a bunch of momentary awarenesses in no specific temporal order. Specious presents may have elements in common. In such a case they may be said to “overlap”, and the Helmholtz local sign principle applies.

The overlap of specious presents yields a division into past and future. Suppose you have a collection of specious presents $\{m_1, m_2, m_3, \ldots\}$, and a binary relation $R(i, j)$, such that $R(i, j) = 1$ in case the specious presents $m_i$ and $m_j$ overlap, and zero otherwise. Then you may split the set of specious presents into three mutually disjoint subsets $P$, $N$, and $F$, such that any two members of $N$ mutually overlap, all members of $F$ (and similarly for $P$) mutually overlap indirectly[10], whereas some elements of $P$ or $F$ overlap with elements of $N$, and no element of $P$ overlaps with any element of $F$.

Then the set $N$ is a slightly thickened specious present that one might call “now”, whereas the sets $P$ and $F$ may be understood as past and future of that now[11]. Thus one obtains a temporal order. It is a “time dimension” that might be used in cognition. Notice that the “time axis” cannot be subdivided arbitrarily (there is a “graininess” to it), whereas the deletion of a single glance, good look or specious present fails to cut the axis into two parts. The time domain is tightly knit together.

Notice that the discobolus assumes a pose that is unlikely to be found among the frames of a high frame rate video of the performance of an actual discus thrower. The discobolus is a “summary statement” of a specious present, roughly a video sequence of about a second’s worth. It works so well because it summarizes the past and future of a decisive moment. This is part of the artistic value of the discobolus; among other things, it is an act seen through an empathic mind.

William James (1842–1910) was a trained physician. He is mainly known as a Harvard professor who should be reckoned as one of the founding fathers of psychology.

[10] Let $a$ and $z$ overlap indirectly if there is a set $\{b, c, \ldots, x, y\}$ such that $a$ overlaps with $b$, $b$ overlaps with $c$, ..., $x$ overlaps with $y$, and $y$ overlaps with $z$.
[11] But notice that past and future play equivalent roles and may be interchanged!

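The split into past, now and future can be tried out on a toy collection. What follows is only a minimal sketch under stated assumptions, not a model taken from the text: each specious present is represented as a Python set of integer labels standing for momentary awarenesses (the labels and the function names `overlaps` and `split_around_now` are hypothetical), two presents overlap when their sets intersect, and a candidate set N is supplied by hand. The remaining presents are grouped by indirect overlap.

```python
from itertools import combinations


def overlaps(a, b):
    """Two specious presents overlap if they share a momentary awareness."""
    return bool(a & b)


def split_around_now(presents, now):
    """Group the presents outside `now` by indirect overlap.

    presents: list of sets of (hypothetical) awareness labels.
    now: set of indices into `presents`, proposed as the set N.
    """
    # Any two members of N must overlap each other directly.
    for i, j in combinations(sorted(now), 2):
        if not overlaps(presents[i], presents[j]):
            raise ValueError("members of N must mutually overlap")

    unvisited = set(range(len(presents))) - set(now)
    groups = []
    while unvisited:
        # Grow one group by following chains of overlaps.
        stack = [unvisited.pop()]
        group = set(stack)
        while stack:
            i = stack.pop()
            linked = {j for j in unvisited if overlaps(presents[i], presents[j])}
            unvisited -= linked
            group |= linked
            stack.extend(linked)
        groups.append(group)
    return sorted(groups, key=min)


# Five presents chained by shared labels; the middle one is taken as "now".
presents = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}]
groups = split_around_now(presents, now={2})
print(groups)
```

In this example exactly two groups remain, and no member of one overlaps any member of the other, so they can serve as P and F. Which of the two is called the past is not decided by the overlap relation, in line with footnote 11.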
It would be a great exercise at a film academy to let students “expand” a single statement like the discobolus into a short video sequence (say one or two seconds). It would likewise be a great exercise for a drawing course to do exactly the opposite.

At top a set of specious presents (time axes horizontally, vertical coordinate is just for separation). At bottom a thick “moment now” (black) divides the past (blue) from the future (red). The orange specious presents overlap with the thick now and are neither in the future, nor the past.

In the film academy one could have additional exercises by asking for short video sequences of the immediate past and future. Of course there is much more leeway in the past and future than in the present sequence. Increasing the time span, past and future movies would diverge into such diversity that it would be appropriate to call it to a halt because “anything goes”. A multiscreen presentation of a number of such movies (in sync of course) would make a great happening. You would see different worlds converge, start to cohere, come together in the present moment, then start to decohere and diverge arbitrarily. It would be an appropriate illustration of a momentary awareness.

Formal Accounts of Continua in Awareness

There figure quite a number of continua that are of immediate relevance to the formal description of visual awareness. The obvious ones are spaces like the “visual field”, “visual space”, “pictorial space”, “depth”, “color space”, and so forth. Not all of these play a role in immediate visual awareness. For instance, if you look at the scene in front of you, you experience visual space, not the visual field, or pictorial space. You may be able to switch. For instance, Claude Monet[12] was evidently able to experience the visual field, and to relate it to potential pictorial spaces, no doubt in immediate visual intuition.

In this chapter I consider mainly general, formal models. I limit the discussion to space and time, ignoring topics like color space.

[12] Claude Monet (1840–1926) was a founder of French impressionist painting. He was the most consistent and prolific practitioner of the movement’s basic philosophy. He largely specialized in plein-air landscape painting.

Scale space

“Scale space” is a well established formalism nowadays with numerous applications in such fields as image processing. Some excellent textbooks are available, so I will limit the discussion to an overview of the essential concepts.

The concept of “scale” is familiar enough. It is usually associated with “size”, in vision with “angular size”. At the battle of Bunker Hill (1775) patriot General William Prescott instructed his troopers, “Don’t one of you fire until you see the whites of their eyes!” This proved to be good advice: as the Redcoats approached within forty yards, the men started a barrage of musket fire. Nearly a hundred enemy troops were cut down, causing the British to retreat. Notice how “seeing the white of the eyes” translates into about forty yards. This derives from the resolution of human vision. Here we have an interplay between visual resolution and angular size (the British being of generic size, angular size depends directly on distance).
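
The relation between size, distance and angular size is a line of trigonometry, and it may help to see the numbers. This is only a rough sketch: the feature size of one centimetre is an assumed value for the visible white of an eye, not a figure from the text, and the function name is hypothetical.

```python
import math

YARD_IN_METRES = 0.9144


def angular_size_arcmin(size_m, distance_m):
    """Visual angle, in minutes of arc, subtended by an object of the given size."""
    angle_rad = 2.0 * math.atan(size_m / (2.0 * distance_m))
    return math.degrees(angle_rad) * 60.0


feature_m = 0.01                      # assumed: about 1 cm of visible eye white
distance_m = 40 * YARD_IN_METRES      # the forty yards of the anecdote

at_forty_yards = angular_size_arcmin(feature_m, distance_m)
at_eighty_yards = angular_size_arcmin(feature_m, 2 * distance_m)
print(round(at_forty_yards, 2), round(at_eighty_yards, 2))
```

Under this assumption the feature subtends a little under one minute of arc at forty yards, which is about the foveal resolution quoted elsewhere in this chapter, and about half of that at twice the distance.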

Powers of Ten is a 1968 American documentary short film written and directed by Charles and Ray Eames. The film depicts the relative scale of the Universe in factors of ten. The film is an adaptation of the book Cosmic View (1957) by Dutch educator Kees Boeke.

One usually uses “scale” relative to “size”. Thus an ant is at a “small scale” with respect to a leaf, whereas you are at small scale with respect to a tree, but the leaf is at small scale with respect to you. The ant’s and your perspectives naturally differ.

In image processing the size is usually set by the size of the image. For contemporary images the smallest scale is about one thousandth of this (an image of a thousand pixels across is typical). In vision the size might be taken to be the diameter of the field of view (about a hundred and eighty degrees of visual angle), but in practice it is some “region of interest”, which is much smaller, and somewhat indefinite. Of importance is the number of degrees of freedom, which is the square of the ratio of the size to the scale. In image processing this might be a million, but in human vision it is at best a few dozen. The human system simply lacks processing power. It works so well (at least that’s how we feel, knowing no better) because the visual system has an uncanny ability to reallocate resources. Moreover, what fails to be processed will not enter awareness, and will never be missed.

Pixellated portrait of Shakespeare. The blurred picture at right and the tiny picture at bottom have the same structural information, yet they are easily “read”, whereas the pixellated image can’t be. Here you “don’t see the portrait for the pixels”. The demo dates from 1973 and is due to Leon Harmon and Bela Julesz, who used a portrait of Lincoln.

Thus “scale” has always to be reckoned with respect to size. There is no such thing as an absolute scale, though there is a rock-bottom value. It is the pixel size in image processing, and roughly a minute of arc in foveal human vision, over a degree in very eccentric vision.

In formal scale space one avoids these practical problems by assuming unlimited size, say a piece of the Euclidean plane “larger than you would ever need”, and unlimited resolution, thus pixels “smaller than you would ever want”. The interesting aspect of scale space is that, as you zoom through the scales, you encounter qualitatively different aspects of reality. This is nicely illustrated by the classical video clip “Powers of Ten”.

The same benefits are obtained in image processing, and nowadays it is quite common to use scale space based methods. This is a rather technical subject. There are good texts to get you going.

It seems quite clear that human vision also uses some variety of scale space
processing. However, it is certainly far from perfect. There occur frequent cases where visual observers apparently fail to “see the forest for the trees”, for instance. It also happens that they fail to “see the trees for the forest”. Such cases have been insufficiently researched, there remains much to be done. The topic is certainly important, both from a conceptual, and from an applications oriented perspective.

Salvador Dali with skull sculpture “Voluptas Mors” (1959). Photograph by Philippe Halsman. In the first impression you are aware of the skull, scrutiny reveals that it is made up of female nude figures. Here you “don’t see the nudes for the skull”, it is the opposite of the previous case. Make sure you view both this and the previous figure also from a distance or through your eyelashes!

Psychophysics reveals that the visual system has invariant contrast sensitivity with respect to scale. You find this if you measure the contrast detection threshold for “punctate” (at the scale of course, points have a certain size) stimuli[13].

The visual field may be interpreted as a scale space structure. One understands it as a stack of scaled copies of a disk tiled with identical apertures. The apertures represent scale space “points”, one thinks of them as representing receptive fields, or (perhaps) columns. The largest disk covers the full visual field at (necessarily) very low resolution. Progressively smaller disks cover less than the full visual field, albeit at higher resolution. The smallest disk only covers the central fovea at the highest resolution. In such a structure the mean aperture diameter grows linearly with eccentricity. At the center of the visual field all aperture sizes are available, at more eccentric locations the smaller apertures are lacking, thus accounting for the low resolution of the peripheral visual field. This (ideal) structure implements the visual field as a “zoom lens” and allows the user to zoom freely through scale. Although the actual structure is not ideal, we have shown psychophysically that it accounts for the bulk of the properties of spatial contrast detection and discrimination.

The visual field structure as a scale space structure.

[13] The more common method uses sine-wave gratings. This complicates matters and yields a distorted picture because of spatial integration of numerous local excitations.

Jet spaces

Although the retinal illuminance pattern is generally considered to be the “input” to the visual system, the primary visual cortex receives something very different. It receives something like the gradient (first order directional derivative) of the local contrast, the second order directional derivative, and (perhaps) the third and fourth order. The first order derivatives are commonly
known as “edge detectors”, and the second order ones as “line detectors”, a very confusing habit with no good basis except in recent history.

Directional spatial derivatives are the base material of differential geometry. The bulk of interesting geometrical properties can be expressed in terms of algebraic combinations of such derivatives. The interesting thing is that differential geometry can be framed (without any approximation!) in terms of scale space. Then derivatives become essentially receptive field profiles. This again means that differential geometry allows you to design neural implementations! Each algebraic combination of directional derivatives can immediately be compiled into the blueprint of a small, local neural network. This means that the cortex can be described formally as an implementation of differential geometry, a veritable “geometry engine”.

A “jet”[14] is like a truncated Taylor expansion. Thus it describes the retinal image locally (in the neighborhood of a point) up to a certain differential order. The visual cortex may encode up to (and including) order four, which implies about a four by four pixel local representation. Such representations would exist at various scales. One imagines the jets to be implemented as cortical columns. In such a column one can perform a variety of geometrical operations almost “for free”. Some common computations might be prewired, others might be started after querying (or loading a tiny “program”) from higher brain centers.

Examples include the computation of edge curvature, the detection of corners, and so forth.

Even a common operation like “edge detection” is best implemented in terms of differential geometry. Any so called “edge finder” is only something akin to a first order directional derivative. Generically, it will “detect” an edge at any pixel. In order to tame it you need to set some arbitrary threshold. For something you might be ready to call “edge” essentially all directional derivatives, of any order, yield some output. If you tell the system what you mean with “edge”, you may implement a detector that uses the full structure available in the jet to come to a more focussed conclusion. It may enable you to set a threshold on a rational basis.

[14] Jet bundles were introduced by Charles Ehresmann and used to good advantage by Élie Cartan.

Atlas structure

It has become increasingly clear that immediate visual awareness is based on only a minor part of the optical structure that impinges upon the eye. That is to say, it appears to be organized such that it computes a rough statistical estimate of what happens throughout the visual field, and a more precise estimate of what appears to be the case in some limited region. Here “limited” applies to structural complexity, rather than mere size. As you somehow “integrate” successive looks, you need to relate a number of such local (limited structural complexity) samples. This leads to the problem of when and how such local samples may be compared.

A notion of distance is necessarily complicated when points have different sizes. One way to grasp the essential idea is to consider a simple example, that we will refer to as the “atlas model”.

Tourist map of Paris. There is no way to mark Berlin on this map.

Atlases contain maps of limited size and of various scale. Thus Paris and Berlin are both on the map of continental Europe. The Tour Eiffel will be on a map of central Paris, the Brandenburger Tor on a map of central Berlin. These landmarks are not to be found on a single map though. The distance between the Tour Eiffel and the Brandenburger Tor is measured on the highest resolution map that (via many levels of indirection) contains them both. This map will have a rather low resolution. The distance is simply the distance
Paris–Berlin on the map of continental Europe. The distance between the Arc de Triomphe and the Brandenburger Tor is the same as the distance between the Tour Eiffel and the Brandenburger Tor. Yet the Tour Eiffel and the Arc de Triomphe have a well defined distance as measured on the map of central Paris.

Tourist map of Berlin. There is no way to mark Paris on this map.

This induces a planar structure that is quite different from the familiar Euclidean plane.

The “location” in the spatial domain is defined hierarchically, as familiar from daily life experience. For instance, suppose you forgot a key, where would you look? Certainly in your home town, your neighborhood, your house, a certain room, a certain desk, a certain drawer, somewhere in the mess you probably expect there. If you had to specify the location over the phone it would depend on who you were speaking to (a stranger, your neighbor, or your spouse) where in the sequence you would start. Almost certainly you would describe some nested order though. The essential gain of such a description lies in the fact that you refer only to local structures. Doing this iteratively gets you by way of local methods (in the scale dimension) to “global” relations (in the space dimensions), though only indirectly so. This closely resembles the use of an atlas. Such a formal structure appears fit to describe well known properties of the visual field. We discuss some of the major features here.

A map is defined by a (central) location, a scope, and a grain size. The location tells you that “this is a map of such-and-so”, the scope tells you the “size” of the area that is covered by the map, and the grain size tells you the “resolution”. Anything smaller than the resolution is either omitted, or represented with a conventional sign, that is a pointlike entity without internal structure. In typical cases the ratio of the grain size to the scope is fixed throughout the atlas. The inverse square of the ratio (for a planar map) is the “number of pixels”, that is the number of independent entities represented in the map. In conventional geographical atlases it is large (a million say), in the visual field it is rather small, say ten to a hundred. This ratio is an important number that characterizes the atlas. We assume it will be fixed by neuroanatomical/physiological constraints. For the sake of experimental phenomenology it is just a “constant of nature” descriptive of the human condition.

Tourist map of Europe. Berlin and Paris are both indicated. Notice their representation.

Modeling the atlas structure is straightforward. I represent maps just as I do points in scale space, with all the obvious consequences (thus maps overlap, have intersections, and may dominate[15] each other). A map contains points that intersect with it and have a width equal to the grain size of the map. An object is located on a map if it is close to a point of the map. Here “close” may be defined as having an overlap of at least some characteristic number, say one half. The precise magnitude of the number is not really important.

[15] A map “dominates” another if it covers its area. Thus the map of the Netherlands dominates the map of Amsterdam. This does not imply that it contains the map of Amsterdam, of course. Amsterdam may be represented by a dot on the map of the Netherlands.

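The atlas model is simple enough to try out in a few lines. The following is only a sketch under stated assumptions, not the formal model itself: every coordinate, scope and grain size is invented for illustration (in kilometres), the class `Map` and the function `atlas_distance` are hypothetical names, and the grain is mimicked crudely by snapping positions to a grid of that size.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Map:
    name: str
    center: tuple   # (x, y) location of the map, km (illustrative values)
    scope: float    # diameter of the covered area, km
    grain: float    # resolution of the map, km

    def n_pixels(self):
        """Inverse square of the grain-to-scope ratio."""
        return (self.scope / self.grain) ** 2

    def covers(self, p):
        return math.dist(self.center, p) <= self.scope / 2

    def snap(self, p):
        """Position of p as far as this map can tell (assumed grid model)."""
        return tuple(round(c / self.grain) * self.grain for c in p)


def atlas_distance(atlas, p, q):
    """Distance on the finest map holding both points, with its grain size."""
    shared = [m for m in atlas if m.covers(p) and m.covers(q)]
    if not shared:
        return None
    finest = min(shared, key=lambda m: m.grain)
    return math.dist(finest.snap(p), finest.snap(q)), finest.grain


# The same grain-to-scope ratio (1/20) is used for every map in the atlas.
atlas = [
    Map("Europe", (440.0, 0.0), 4000.0, 200.0),
    Map("central Paris", (0.0, 0.0), 25.0, 1.25),
    Map("central Berlin", (880.0, 0.0), 25.0, 1.25),
]
tour_eiffel = (-2.5, 0.0)
arc_de_triomphe = (-2.5, 2.5)
brandenburger_tor = (880.0, 0.0)

eiffel_to_tor = atlas_distance(atlas, tour_eiffel, brandenburger_tor)
arc_to_tor = atlas_distance(atlas, arc_de_triomphe, brandenburger_tor)
eiffel_to_arc = atlas_distance(atlas, tour_eiffel, arc_de_triomphe)
print(eiffel_to_tor, arc_to_tor, eiffel_to_arc)
```

With these made-up numbers both Paris landmarks are at the same distance from the Brandenburger Tor, because only the coarse map of Europe holds both ends, whereas their mutual distance is read off the much finer map of central Paris. Each result is a pair of distance and grain size, so the resolution at which the distance holds is always reported along with it.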
When two points are both located on the same map, you may measure their distance on the map. In reporting it one mentions both the distance and the grain size, for instance 5 ± 2. Thus “distance” is indexed by resolution, one actually has a one-parameter family of distance functions. Two distances are different if they differ by more than the combined uncertainty (grain size).

Things start to be slightly interesting when one or perhaps both of the points are not on the map. Consider why a point might not be on a map. It may happen either because the point is too large, or because it is too small. These cases are categorically different. Europe is not on the city plan of Amsterdam: it is too large. Tietjerksteradeel (pronounce Tytsjerksteradiel) is not on the map of Europe because it is too small. But Tietjerksteradeel might (through some conventional mark) be indicated on the map of Europe, whereas it is evidently impossible to indicate Europe on the city plan of Amsterdam. This is not due to some lack of conventional signs.

The Minsky–Papert spirals show what happens when the configuration of interest doesn’t “fit” a single page of the atlas. You are unable to “see” (without scrutiny) which of the two convoluted blobs is single and which is composed of two disjoint blobs.

A point may dominate a set of points belonging to a map, and this set may be a proper subset of the set of all points belonging to the map. In such a case the point is an “area” in terms of the map. Then points of the atlas may be designated a distance from the area, for instance, the minimum distance of the given point in the atlas to any point of the area.

If a point is too small, one needs to find a suitable “representative” in the atlas. Possible representatives of a point are points that dominate the point. If the points have representatives in the map, then these representatives define a small area in the map. Any point of the area can be used to represent the point on the map.

A problem might be that the dominance relation may hardly be expected to be defined for points with extremely different width. The reason is that one cannot have a “punctate stimulation” that is effectively punctate for both, and simultaneously potent enough to excite both. In such cases the Helmholtz local sign mechanism has to break down. In practice there will be some vaguely defined limit on the ratio of widths. Thus points may be “really too small” or “really too large”. In the case of the visual field one cannot use the conventional methods of cartography. Any relation has to exist, that is to say, has to be relatable to prior experience, in order to be possibly meaningful.

From a formal point of view it is nice not to put arbitrary restrictions on atlas size. That is to say the point width and grain size could be infinitely small, and the scope of a map could be infinitely large. In real life one meets such restrictions of course. The visual field is about a hundred and eighty degrees in diameter, thus of finite extent, and the best resolution is about a minute of arc, thus not infinitesimally small.

Moreover, the resolution (smallest grain size) depends on the eccentricity, that is the distance to the center of the visual field. This leads to (important) complications that we have considered in some detail before. It also forces the use of saccadic eye fixations, most of them involuntary, and evidently part of the microgenetic process. Here I have ignored this aspect, though it is crucial to any understanding of the actual system.

When we ignore arbitrary constraints, the fuzzy plane, augmented with an atlas structure, is a self-similar entity. That is to say, if you scale all spatial dimensions by the same factor, you obtain an entity that is congruent to the original. That is why I refer to the “self-similar fuzzy plane”. It is an ideal, but quite apt, formal description of the structure of the visual field. The self-similarity has a firm basis in psychophysical fact. I propose that the formal structure may be fleshed out either in neurophysiology or in experimental phenomenology. These interpretations will be categorically different, of course.

METAMERS OF THE CURSORY GLANCE are images that cannot be distinguished from some fiducial image on the basis of a cursory glance, although they can possibly be distinguished with scrutiny. Images that are seen in eccentric vision are of this type, thus this applies to virtually all of the visual field with the exception of an area at the fixation location.

The metamers of a fiducial image form a very large set. For simplicity we may consider a subset of images that cannot be distinguished from each
other. This is still a very large set. The fiducial image itself cannot be distin-
guished from any image of the set, and is thus nothing special. “The” image
is any image from the set, and is thus only vaguely determined.
Given the atlas structure it is not hard to construct examples of such images.
For instance, one might perturb the map of Europe in such a way that the
differences would not show up in any particular map, yet such that Berlin
was displaced by a hundred kilometers relative to Paris, leaving the internal
relations (Brandenburger Tor relative to the Reichstag, for instance) intact.
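
The Berlin example can be checked numerically. This is a minimal sketch with invented numbers: the coordinates and the two grain sizes are assumptions chosen for illustration (in kilometres), and `distinguishable` simply applies the rule stated earlier that two distances differ only if they differ by more than the grain size.

```python
import math


def shift(points, dx, dy):
    """Translate every landmark of a city by the same vector."""
    return {name: (x + dx, y + dy) for name, (x, y) in points.items()}


def distinguishable(d1, d2, grain):
    """Two distances differ only if they differ by more than the grain size."""
    return abs(d1 - d2) > grain


paris = (0.0, 0.0)
berlin = {"Brandenburger Tor": (880.0, 0.0), "Reichstag": (880.0, 0.5)}
moved = shift(berlin, 100.0, 0.0)      # Berlin displaced by a hundred km

europe_grain = 200.0   # assumed grain of the map of Europe
city_grain = 1.25      # assumed grain of the map of central Berlin

# On the map of Europe: Paris to Berlin, before and after the shift.
before = math.dist(paris, berlin["Brandenburger Tor"])
after = math.dist(paris, moved["Brandenburger Tor"])
seen_on_europe = distinguishable(before, after, europe_grain)

# On the map of central Berlin: the internal relation is untouched.
inner_before = math.dist(berlin["Brandenburger Tor"], berlin["Reichstag"])
inner_after = math.dist(moved["Brandenburger Tor"], moved["Reichstag"])
seen_on_berlin = distinguishable(inner_before, inner_after, city_grain)

print(after - before, seen_on_europe, seen_on_berlin)
```

With these assumed grain sizes the displacement of a hundred kilometres shows up on neither map: it is below the grain of the map of Europe, and it leaves every distance on the map of central Berlin exactly as it was.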

Example of a deformation that cannot be detected given a certain atlas structure. Notice how the “local” structure is conserved on all scales. The local deformations are combinations of Euclidean movements (translations and rotations), Euclidean similarities (either dilations or contractions), and affine shears.

When you apply such transformations to an image you obtain deformed images that are immediately recognizable because the local structure is conserved on all scales. However, they certainly look deformed under scrutiny. When you flash the images, or view them eccentrically, they are difficult to distinguish from the fiducial image.

Of course, as you view at progressively more eccentric locations of the visual field, you also lose resolution. The images you obtain are both blurred and deformed.

Example of a mixture of blur and deformation. Each row gives some isomers at a fixed scale.

These are not the only losses in the cortical (and so forth) data structures. They yield a vivid insight in the structure of visual awareness though.

A major point is that the actual data structures cannot be illustrated by any
single image. The contents of visual awareness are necessarily metameric
sets of images. It is not that any single member of the set is your current
visual awareness. It is that momentary visual awareness is always vague. The
vagueness is described by the extension of the metameric set.

This movie shows the effect of a change of amplitude of the deformation. (Video clip.)

Perhaps surprisingly, the type of deformations described above has a decidedly aesthetic appeal for large amplitudes. The image becomes unrecognizable (useless as a portrait, of Einstein in this example), but it definitely looks “interesting”. The interest is apparently due to the fact that microgenesis has trouble coming up with something more or less definite, and instead gives a kind of image that hovers in limbo between definite and indefinite. The human mind enjoys this as a kind of entertainment[16].

In the early nineteen sixties Weegee[17] designed a method that yielded results very similar to these. He used it for caricatures of famous personalities and “artistic interpretations” of nudes.

If the deformations are large, they have serious consequences for recognition. Notice that these deformations appear to have an aesthetic appeal!

[16] I’m pretty sure such images would sell in an art gallery!
[17] Pseudonym of Arthur Fellig (1899–1968), photojournalist. Weegee worked in the Lower East Side of New York City as a press photographer during the nineteen thirties and forties. He specialized in “real life” (mob killings and the like) using a Speed Graphic with flash, typically being able to pull off one or a few four by five inch plates when arriving close to the police. He developed a way to distort photographs (using plastic lenses during enlarging) that yields results very similar to the “metamers of the cursory glance”.

Here the spatial dither has been applied to all layers of a scale space representation separately. Notice the important difference with the case in which the dither is the same on all scales. Here the scales “decohere”, so you see relative shifts between fine and coarse detail, local dissolution of edges, and so forth.

Weegee (notice the legendary Speed Graphic camera, using 4 × 5 inch plates, sporting a rangefinder and frame viewer) and his interpretation of Robert Kennedy.

The topic of “metamers of the cursory glance” is an important one, and there is considerable room for further development. For instance, it may be brought into correspondence with the current thoughts on “sparse coding”[18]. A first step in this direction is to apply the spatial dither to the levels of a scale space representation. This has the effect that fine detail may shift with respect to coarse detail, that edges may become less defined, at other places spurious edges may pop up, and so forth.

[18] See Simoncelli and Rosenholtz. The literature on the “crowding phenomenon” is also relevant.

Analogy Fields

It is not easy to pin down a point among other points. After all, all points look pretty much the same! This is why the concept of local sign is necessary.

In real life most objects do not all look the same, and this often enables you to locate an object relative to other objects. This is an important, very general principle. It works in cases of many objects that are mutually different, but not that different. For if all objects are mutually unlike each other you will not be able to find the reference objects, and you’re in the same spot as when the points were all alike.

The best of all worlds is the case where objects are similar, but different in some lawful manner. A good example would be a certain animal. (Notice that it already helps much to know that the object is an animal in the first place!) To say that the animal is a mammal is to focus down its definition greatly. To know it’s feline narrows it down even more. To say it’s about a foot high rules out tigers and so forth. To know it’s reddish pins it down to a red cat, which may be sufficient. (In case you need to identify it as a specific cat the process goes on of course.)

Notice that you use two different principles here. One is the Helmholtz–Whitehead type of identification by way of mutually nested boxes. The other is the principle I discuss here, the location of an item in an analogy field.

Once you’re on to it, you will notice analogy fields all over the place. Analogies are important in awareness. Microgenesis uses analogy all the time so as to somehow “define” objects in a qualitative sense. Cognition does much
the same thing. The railway locomotive was often called an “iron horse”, and
so forth. And for good reasons too. It pulls things (carriages) in much the
same way as the horses pulled the stage coach. Cars are in the same analogy
field. That most cars have the engine mounted in front is evidently due to
the basic pattern of a power source (man, horse, locomotive, . . . ) pulling a
container (for goods, people, . . . ).

Example of an analogy field. This is a case where a continuous parameterization is possible.

Some analogy fields are continuous (colors for instance), others more discrete (horses, locomotives, cars, . . . ), although combinations and approximate continuity occur. For instance, the field of Volkswagen cars is approximately continuous as fashions in car bodies change gradually over the years.

The mind is always ready to place objects in an analogy field (usually more than one). This happens more or less automatically, microgenesis handles it subconsciously. It works even when confronted by completely novel classes of objects, like random nonsense objects in a laboratory setting.

Of course, this ability is very useful in our world, just think of the structure of the animal and vegetable kingdoms, but also the world of artificial man-made objects.

An analogy field that cannot be parameterized smoothly, being made up of discrete instances.

Analogy fields are also important in the case of single objects. A rigid object may be seen from a variety of angles, distances, illuminations, and so forth. The retinal images for the object come in a rich variety, and may be mutually very different. However, they all have a certain “family likeness”. Moreover, any two specific views can be connected by way of a continuous series of in-between views. Such in-between views occur naturally as when you walk around an object, or rotate it in your hands. These analogy fields are highly structured. For instance, you can order them by viewpoint, light source location, and so forth. This imposes a rich structure on the optical input, that is related to the structure of the environment as well as to your actions. There has been some theoretical work on the topic[19], but much remains to be done.

This structure is used in a quantitative way in computer vision applications, but it can equally well be used in a qualitative or approximately or partly quantitative manner in human perception and action.

[19] Known as “aspect graphs”, or “vistas”.

[...] or maybe two plus one) with Euclidean space, it is definitely not a homogeneous space.
Visual space is perspectival, in that it is organized about one’s vantage
point. This may be a single vantage point, as when one is at rest in an other-
wise static scene, with one eye open, or it may be based on multiple vantage
points. This happens in binocular viewing, or when the observer and the envi-
ronment are in relative motion. In various cases the structure of visual space
is likely to vary, whereas Euclidean space remains just what it is.
P ICTORIAL SPACE is experienced when one “looks into” a picture. This is
different from “looking at” a picture. When one looks at a picture one expe-
riences an object in visual space, a photograph, a painting, or whatever the
case may be. When one looks into a picture one becomes visually aware of
the existence of another space that is evidently not part of visual space, and
does not “represent” aspects of the Euclidean structure of the scene in front
of the eye. Like visual space, pictorial space is a mental entity. It differs from
visual space in that there is no immediate relation to the scene in front of the
eye, especially that the observer’s eye is not in pictorial space as it appears to
The aspect graph of a cube. This can be understood as an analogy field, be — as its origin, and cause of one’s perspective — in visual space.
and/or as a picture of a cube as seen from all sides simultaneously. The picture plane is an object in physical space, and does also not belong
to pictorial space.
Euclidean space is, of course, well understood, and will be taken for granted
here. Visual space, understood as some representation of Euclidean space
The Depth Domain and Pictorial Space seen from one or more particular points, can be approached as a subfield of
Euclidean geometry. There exists an extensive literature on this, starting with
Euclid, and significantly continued in recent times, for instance in fields like
Although pictorial space and visual space are both three-dimensional machine vision. There has been remarkable progress in this area.
spaces related to visual perception, they are categorically different. I consider Visual space from the experiential point of view (the ontologically correct
some of the differences: viewpoint) has been studied in experimental psychology and psychophysics.
V ISUAL SPACE is experienced when one opens one’s eye in broad daylight. There exists both an extensive literature on empirical approaches, and a rather
In generic circumstances one experiences the scene in front of the eye. The limited one on theoretical, formal approaches. The progress in this area has
physical scene in front of the eye has (for all practical purposes) the structure not been remarkable, to put it mildly.
of three-dimensional Euclidean space. If vision were veridical, then visual Pictorial space is categorically different from both Euclidean space and vi-
space would be a faithful copy of Euclidean space. In practice this is only ap- sual space because it is not the space in front of the observer, nor necessarily
proximately the case. Although visual space shares its dimensionality (three, some representation of it. The eye, most importantly, is not in the pictorial

space. It is elsewhere with respect to that space. Neither is the picture plane in pictorial space.

Pictorial space is evidently based on some picture. A picture is a physical object such as a painting, a photograph, or a computer screen with some simultaneous arrangement of colored pixels on it. For convenience I will speak of a “picture” as of a physical surface, whose points are instantiated through a pigment (as in a painting), a silver deposit (as in an old-fashioned photograph), a radiance (as on an old-fashioned cathode ray tube), and so forth.

I will treat the (physical) “picture plane” as a Euclidean plane. Pictorial space is clearly based on the picture plane, the picture plane evoking the “visual field,” which is an object of visual awareness. The visual field appears when one assumes a “painter’s view,” for instance. Most people are never aware of a “visual field”; they are aware of a “visual space”, or, more simply, just “space”. I will simply equate the visual field with the picture plane, even if those are ontologically disparate entities. Pictorial space is different from the picture plane in that each point of the picture plane somehow carries another spatial dimension, the “depth domain.”

“Depth” is a basic feeling of “remoteness,” that is, a feeling of separateness from the observer. It appears as a one-dimensional, ordered entity in awareness, although it is capable of degrees of definiteness and vagueness. In its most articulate form it is like a one-dimensional line with perhaps the structure of the affine line. There certainly is no natural origin, and there is hardly a natural unit of depth. The affine structure is often evident though. For instance, it often makes sense to consider the midpoint of two fiducial points.

Thus, pictorial space may be described as a fiber bundle. Its coherence suggests that nearby fibers are somehow coordinated. I will refer to the fibers as “depth threads.” You should not conceive of the depth threads as somehow standing in a geometrical relation to the picture plane. They are in a different universe. The relation is purely formal. For instance, if you view a painting obliquely, the idea that the depth threads would be inclined with respect to the picture plane is simply senseless. The depth threads have to be understood in a purely abstract sense. They model an aspect of visual awareness, and are purely mental entities that do not correlate with some obvious structure of the environment.

The topic of pictorial space is discussed in a separate eBook from the Clootcrans Press.
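The affine character of depth described in this section can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a model taken from the text: the class DepthGauge, the function midpoint, and the toy relief values are all hypothetical, and the choice of the maps z -> scale * z + shift (with scale > 0, applied to all depth threads alike) is one simple reading of "no natural origin, hardly a natural unit". Under that reading, a midpoint of two depths is meaningful, whereas a raw depth difference is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthGauge:
    """Hypothetical re-gauging of depth: z -> scale * z + shift.

    Assumption (not from the text): the admissible changes of depth
    description are affine maps with scale > 0, so that origin and unit
    are arbitrary while the depth order is preserved.
    """
    scale: float
    shift: float

    def __call__(self, z: float) -> float:
        return self.scale * z + self.shift


def midpoint(z1: float, z2: float) -> float:
    """Midpoint of two depths; defined without reference to origin or unit."""
    return z1 + 0.5 * (z2 - z1)


# A toy "relief": each picture-plane point (x, y) carries one depth value,
# i.e. one point on its own depth thread (fiber). The values are made up.
relief = {(0, 0): 1.0, (1, 0): 4.0, (0, 1): 2.0}

# One gauge applied to all threads at once (the "coordinated fibers" assumption).
gauge = DepthGauge(scale=2.0, shift=-3.0)
regauged = {xy: gauge(z) for xy, z in relief.items()}

near, far = relief[(0, 0)], relief[(1, 0)]

# Taking the midpoint and re-gauging commute: the midpoint is an affine notion.
mid_then_gauge = gauge(midpoint(near, far))
gauge_then_mid = midpoint(gauge(near), gauge(far))

# A raw depth difference, in contrast, changes with the arbitrary unit.
diff_before = far - near
diff_after = gauge(far) - gauge(near)

# Depth order is preserved because scale > 0.
order_kept = (near < far) == (gauge(near) < gauge(far))

print(mid_then_gauge, gauge_then_mid, diff_before, diff_after, order_kept)
```

The picture-plane coordinates serve only as labels for the threads here; nothing in the sketch places a depth thread in any geometrical relation to the picture plane, in keeping with the purely formal relation described above.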
OTHER EBOOKS FROM THE CLOOTCRANS PRESS:

1. Awareness (2012)
2. MultipleWorlds (2012)
3. ChronoGeometry (2012)
4. Graph Spaces (2012)
5. Pictorial Shape (2012)
6. Shadows of Shape (2012)
(Available for download here.)

ABOUT THE CLOOTCRANS PRESS

The Clootcrans Press is a self-publishing initiative of Jan Koenderink. Notice that the publisher takes no responsibility for the contents, except that he gave it an honest try, as he always does. Since the books are free you should have no reason to complain.

THE “CLOOTCRANS” appears on the front page of Simon Stevin’s (Brugge, 1548–1620, Den Haag) De Beghinselen der Weeghconst, published 1586 at Christoffel Plantijn’s Press at Leyden in one volume with De Weeghdaet, De Beghinselen des Waterwichts, and an Anhang. In 1605 there appeared a supplement Byvough der Weeghconst in the Wisconstige Gedachtenissen. The text reads “Wonder en is gheen wonder”. The figure gives an intuitive “eye measure” proof of the parallelogram of forces. The key argument is

de cloten sullen uyt haer selven een eeuwich roersel maken, t’welck
valsch is.

Simon Stevin was a Dutch genius, not only a mathematician, but also an
engineer with remarkable horse sense. I consider his “clootcrans bewijs” one
of the jewels of sixteenth century science. It is “natural philosophy” at its
best.
