
Axiomathes (2005) 15: 399–486
DOI 10.1007/s10516-004-5445-y
© Springer 2005

DHANRAJ VISHWANATH

THE EPISTEMOLOGICAL STATUS OF VISION AND ITS IMPLICATIONS FOR DESIGN

ABSTRACT. Computational theories of vision typically rely on the analysis of two aspects of human visual function: (1) object and shape recognition; (2) co-calibration of sensory measurements. Both these approaches are usually based on an inverse-optics model, where visual perception is viewed as a process of inference from a 2D retinal projection to a 3D percept within a Euclidean space schema. This paradigm has had great success in certain areas of vision science, but has been relatively less successful in understanding perceptual representation, namely, the nature of the perceptual encoding. One of the drawbacks of inverse-optics approaches has been the difficulty of defining the constraints needed to make the inference computationally tractable (e.g. regularity assumptions, Bayesian priors, etc.). These constraints, thought to be learned assumptions about the nature of the physical and optical structures of the external world, have to be incorporated into any workable computational model in the inverse-optics paradigm. But inference models that employ an inverse-optics-plus-structural-assumptions approach inevitably result in a naïve realist theory of perceptual representation. Another drawback of inference models for theories of perceptual representation is their inability to explain central features of the visual experience. The one most evident in the process and visual understanding of design is the fact that some visual configurations appear, often spontaneously, as perceptually more coherent than others. The epistemological consequences of inferential approaches to vision indicate that they fail to capture enduring aspects of our visual experience. Therefore they may not be suited to a theory of perceptual representation, or useful for an understanding of the role of perception in the design process and product.

KEY WORDS: 3D shape and space perception, aesthetics, Bayesian inference, computational vision, design, epistemology, visual perception and cognition

1. INTRODUCTION

"When it comes to deriving suitable and rigorous concepts and designations for the various characteristics of our sensations, the first requirement is that these concepts should be derived entirely out of the sensations themselves. We must rigorously avoid confusing sensations with their physical or physiological causes, or deducing from the latter any principle of classification." Ewald Hering (1878)


A standard refrain in the introduction to most undergraduate textbooks on perception is that vision is not the result of a simple camera-like process in which the external world is imaged – faithfully – onto the mind's eye. Instead, it is often claimed that the first step towards an understanding of perception is to discard the notion that what we perceive is an objective view of the external world. For example, in their highly regarded textbook, Sekuler and Blake (1986, p. 3) suggest that a distinction has to be made between "one's perception of the world and the world itself". What we perceive should be more correctly thought of as the mind's reconstructed 3D representation of the world, generated from a meager 2D image impinging on the retina. Perception textbooks typically go on to say that dispelling the naïve realist view that "the world is exactly as it appears" (Figure 1) has historically taken two opposing approaches: (1) Empiricism, which is best exemplified by Helmholtz's theory of unconscious inference; (2) Nativism, which is best exemplified by Hering and the Gestalt school (see, for example, Rock (1984), Sekuler and Blake (1986), Palmer (1999), Turner (1994)). We find out that the empiricist believes that our perceptions are the result of our extensive experience and interaction with the world, while the nativist believes that our perceptions are entirely due to the mind's innate predisposition to organize the sensory stimulation in a particular way. The underlying motivation for both these theories, we are told, is what is known as the poverty of the stimulus argument: the retinal image highly underdetermines the structures that it gives rise to in our percepts.

Figure 1. Naïve realism.


Take the example of two possible images of a cube (Figure 2). Evidently, we perceive A as a cube while we perceive B as a square. But B is also consistent with an image of a cube. In fact, both images, assuming Euclidean projective geometry, are consistent with an infinite class of 3D shapes. The empiricist's reasoning for our stable and unitary percepts in A and B might be as follows: through experience we have noted that in the preponderance of situations in which we have encountered a cube, it has appeared to us as image A. We have rarely been in the position to view it "head-on" as shown in B, and have only experienced such an image when encountering a square. We have thus learned to recognize a cube in A and a square in B. Obviously, the actual story is more complicated. It may, for example, entail the fact that we support these claims through association with our other senses, such as touch. In more quantitative analyses of such inferential approaches, notions such as non-accidentalness or generic viewpoint assumptions may also be brought to bear (see, for example, Barlow, 1961; Nakayama and Shimojo, 1996; Richards et al., 1996). The Gestaltist, on the other hand, might say that the reason we perceive A and B the way we do should be attributed entirely to the mind's innate predisposition to organize each image. There are no cubes, squares, or surfaces in the world in the folk sense of the terms – i.e. in exactly the way we perceive them. Rather, what we see is the result of the spontaneous cortical organization of the sensory flux. Naturally, this does not preclude the possibility that the organized image is correlated non-trivially with the physical structure of the environment that gave rise to it.
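The underdetermination is easy to state formally (a standard observation, added here for concreteness rather than taken from the original text). Under perspective projection with focal length $f$, a scene point $(X, Y, Z)$ maps to the image point

$$ (x, y) = \left( \frac{fX}{Z}, \; \frac{fY}{Z} \right), $$

and this map is unchanged by the scaling $(X, Y, Z) \mapsto (\lambda X, \lambda Y, \lambda Z)$ for any $\lambda > 0$. Every image point is therefore consistent with an entire ray of scene points, and every image – including both drawings in Figure 2 – with an infinite family of 3D configurations.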

Figure 2. Two possible images of a cube.


The aforementioned textbooks will usually inform us that the last half century of research has found both these models – taken by themselves – lacking as theories of perception, and that a compromise between the two must therefore be struck. This new approach, usually a sophisticated variant on the classical theory of constructivism originating in Helmholtz's notion of unconscious inference, one might generically call Neoconstructivism.1 It is best characterized in the classic text by Marr (1982) (cf. Palmer, 1999). The theory is an attempt at an amalgamation of empirical findings in visual neurophysiology and computational theories of vision originating in artificial intelligence, which view perception as a problem of inference in an inverse-optics framework. In other words, perception is the inversion of the optical process that generates the 2D image from a 3D environment. It is usually argued that Neoconstructivism has the appropriate combination of elements from both empiricist and nativist theories of knowledge. The Neoconstructivist rejects a purely empiricist notion of perception because it can be shown that inverse optics, as well as concept learning, is impossible unless the visual system has pre-specified constraints determining how the image must be processed. To the Neoconstructivist, the Gestaltist or nativist position is also unattractive because it seems in danger of slipping into a kind of solipsism (if it's all in the cortex, where does the real world come into play?). A perfect compromise for the Neoconstructivist is to assume, as a nativist might, that there are indeed innate constraints for processing the image – constraints which capture objective properties and behavior of the world, learned through interaction with the external environment. A few examples might be: that there exist surfaces, lines, parallel lines, and common object shapes; that light impinges from above; that the observer is not viewing the environment from a special vantage point; and so on and so forth. These constraints, along with some tractable form of learning, are combined with the outputs of early perceptual processes that measure properties of the objects and environment such as brightness, illumination, distance, direction, size and orientation. The task of the visual system, then, is to detect, recover or infer from the 2D retinal image the simplest environmental configuration that is consistent with these various measurements and constraints. The requirement for simplicity arises from the well-regarded notion that nature abhors unnecessary complexity.

This principle – Occam's Razor – has been expressed in the perception literature in such terms as the minimum principle (see Hochberg and McAllister, 1953; Hatfield and Epstein, 1985), minimum length encoding (e.g. Boselie and Leeuwenberg, 1986), regularity assumptions (e.g. Horn, 1986), homogeneity and isotropy assumptions (e.g. Knill, 1998), and genericity (e.g. Richards et al., 1996). In much of the literature, these simplicity assumptions are assumed to be a direct reflection of the well-structured behavior of the physical world.

A cursory glance at Neoconstructivism may make it appear to have achieved, simultaneously, a successful rejection of naïve realism and a perfect compromise between a nativist and an empiricist theory of knowledge; an achievement that for many renders moot any discussion of theories of knowledge. On closer inspection, though, such a conclusion appears premature, because such a theory usually leads to the question of where the assumptions or constraints about structure-in-the-world come from, and how they are encoded. The Neoconstructivist will typically say that it happens through evolution; the claim is that these assumptions have to be hardwired into the system through phylogenetic interaction with the objective external world (see Pinker (1997) for a popular scientific account of this notion; also see Brain and Behavioral Sciences, Volume 24 for related analysis). The story might go: different computational "tricks" or "rules" could compete with each other through evolution, until those that most effectively "detect" the objective external structure are the ones that are incorporated into the phenotype. But this, one would submit, betrays an empiricist theory of knowledge applied to the bootstrapping of hardwired assumptions or constraints. Any empiricist theory of knowledge, as Hume admirably demonstrated, has to either reach the inexorable conclusion of an idealistic world, or cling to its initial mistaken (naïve realist) belief that experience can provide objective knowledge of a real world. Since the central claim of Neoconstructivism is non-idealistic (i.e. the constraints and assumptions actually reflect something objective about the real world), it reduces, by its own claims, to a naïve realist theory. In essence, despite the view espoused in the introductory paragraphs of the perception texts alluded to earlier, we find that the theoretical basis for current approaches to perception is essentially an empiricist, naïve-realist one.

The foundational issues that afflict Neoconstructivist approaches do not by any means bear on the whole research enterprise of human and computer vision.
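The connection between the simplicity principles and constraint-based inference can be made explicit in the Bayesian notation used by several of the authors just cited (a standard rendering, not a derivation specific to this paper). The visual system is modeled as selecting the scene hypothesis $S$ that maximizes the posterior given the image $I$:

$$ \hat{S} = \arg\max_{S} P(S \mid I) = \arg\max_{S} P(I \mid S)\,P(S), $$

where the prior $P(S)$ encodes the regularity assumptions. With a description-length prior $P(S) \propto 2^{-L(S)}$, maximizing the posterior is the same as minimizing $L(S) - \log_2 P(I \mid S)$, which is precisely the minimum-length-encoding form of the simplicity principle.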

Many areas of research can indeed remain agnostic to epistemic assumptions within the theory and, at least to a point, implicitly assume a naïve realist model. Table I is a partial classification of areas of research in visual science and perception based on whether or not epistemological issues are critical to such research. Where the foundational issues do raise a red flag is in any theoretical or empirical research that falls in category "B", which involves the issue of perceptual representation. In these areas of research, the representational scheme that is assumed, explicitly or implicitly, has a direct bearing on whether the theory is plausible or not. Note that we refer only to the theories' plausibility as theories of human perception; undoubtedly, many of the approaches in column B are quite suited to applications in machine vision.

The Neoconstructivist approach aligns itself with a theory of perceptual representation where the fruits of perception are, more or less, an objective 3D description of the external world – indeed a faithful image up to some resolution and hardware limitations – and the heavy lifting of perceptual processes is the inference from a 2D retinal image to just such a 3D description. Neoconstructivist theories are ultimately aligned with a notion of representation that involves "symbolic" tokens that signal external world measurements, properties or entities, such as orientation, color, size, shape, etc. This symbolic token may take the form of the firing pattern of an individual neuron or of groups of neurons. The existence of a symbolic form signaling a property in the brain indicates that such and such a property, measure, surface, part, object, or entity – a face, for instance – has been successfully (or perhaps erroneously) detected from the available sensorium. The critical assumption is that the properties being signaled are properties of the real external world that have been learned through experience, and are not synthetic constructs of perception; the representation is thus a direct mapping of objective properties in the world, and the informational content of the symbolic form is necessarily parasitic on the objective information contained in the external objects that it signals.

From an epistemological standpoint, it is perhaps the issue of information content that has been most overlooked by contemporary theories of perception. The critical importance of defining the nature of the information content of perception was first broached by the Gestaltists, and has been most forcefully and elegantly put forward by Leyton (1992, 1999).

TABLE I. Classification of research topics.

A: Research topics that can be neutral to epistemological assumptions
– Forward optics
– Physiological optics
– Sensory physiology
– Front-end properties of sensory apparatus (e.g. thresholds, adaptation)
– Estimation of spatial properties (direction, distance, slant, size, etc.): biases and limitations
– Sensor co-calibration (spatial estimation across multiple sources of information), including probabilistic approaches
– Spatial acuities and sensitivities (e.g. vernier, stereo acuities)
– Spatial localization and capacity
– Attentional allocation
– Correlations between perceptual and visuo-motor estimates of space

B: Research topics affected by epistemological assumptions
– Inverse optics; shape recovery as inverse optics
– Perception as shape/object recognition
– Shape perception as probabilistic inference
– Shape recovery from multiple "cues"
– Application of ecological statistics to shape recovery and representation
– Image correlation approaches to shape recovery and representation (eigenvector, cross correlation)
– Perceptual organization, grouping, grouping principles; shape recovery and representation via grouping
– Perceptual completion
– Figure-ground
– Lightness and brightness
– Parts and wholes
– Feature binding; object perception as feature binding
– Perceived stability of the visual world across eye movements, blinks, etc.
– Shape recovery and representation via primitives (e.g. geons)
– Shape recovery via heuristics, "bags of tricks", regularity, minima principles, non-accidentalness, genericity, etc.

Leyton's theory makes two crucial points regarding the informational structure of perception: (1) the information content of a percept (its causal structure) is constituted internal to the perceptual schema and does not reside in the external world; (2) the entities and relations used to construct a representational model cannot be parasitic on entities identified in the perceptual product, such as lines, surfaces, etc.; rather, such entities have to derive from the representational scheme itself. This is achieved in Leyton's model through a purely abstract, nested, algebraic (group-theoretic) representational schema.

For example, let us try to understand the informational content of a percept as proposed in Neoconstructivist theories by considering the example of an observer looking at a bend in a road. Under a Neoconstructivist theory, a bend perceived in the road is the activation of some set of signals that a bend in the road exists, and the bend exists descriptively in the world in more or less the way that those perceptual signals or symbols specify. In other words, both the physical thing that exists that is the bending road, as well as its percept, provide an objective spatial description of the bend in the road that exists externally, and also specify the content of those signals or symbols that make up the percept of the bending road; perhaps something less (resolution and hardware limitations), but certainly nothing more. Under such a theory, certain measures that we may specify the bending road to have, such as length, width, curvature, distance and direction, as well as any of the ontological categories that we might ascribe to it, such as line or surface, exist in the world independent of perception. Describing such objective properties and entities in the world in terms of these ontological categories and spatial attributes is thus the only objective way that they can be described, and the fruits of our percepts should naturally be described using these descriptors. And the descriptions that can be applied to the fruits of our percepts are exactly those descriptions that apply to the physical thing out there that is the road. In other words, we use these spatial and ontological descriptors not because that is the format in which our perceptions specify the world, but because our perceptions are (more or less) faithful descriptions of such objective physical properties and entities. Putting aside any generic distaste for naïve realism, the epistemology and ontology of a Neoconstructivist appear, even at first blush, to be at least weakly naïve realist. What else, then, might possibly be wrong with a theory of perception like Neoconstructivism? One of the most enduring questions an inferential theory such as Neoconstructivism raises is the following: if a percept is either an objective measure of, or an indicator of the existence of, a property or entity in the world, then how is this indication psychologically experienced?

This question has been vexing to perceptual researchers from Mach, Hering, and the Gestalt school, through to Gibson. Yet perhaps the most penetrating analysis of the question of perceptual experience and its relationship to the information content of the percept has been put forth by Leyton (1992). The question his theory raises and answers is the following: for the example of the bend in the road given earlier, how is it that we have a phenomenological sense of the bend itself if the "indication" is merely specifying certain static properties or quantities? Let us look more closely at Leyton's question, using the example of the sculpture in Figure 3. The sculpture does not represent any familiar object, and the most immediate markers of familiarity are that it is carved out of stone and is a solid rigid object. Yet the most phenomenologically striking aspect of it, as Leyton would point out, is that we can perceptually sense the forces, the bending, and the bulging. Yet all our direct familiarity cues should be telling us that such processes are not at work in the object, and that it is instead a static, stress-free, solid rigid object; so we are not "experiencing" the bending, but merely experiencing the lighting up of a hierarchical neural symbolic linkage. One might argue that those perceived forces arise merely because the object "resembles", say, a rolled toothpaste tube, or a clasped hand.

Figure 3. Carving #11, Barry Flanagan, 1981 (from Beal and Jacob, 1987).

The implied chain of reasoning might run as follows: "like a rolled paste tube → rolling requires force and action → that force and action produces internal stresses → internal stresses cause stretching of the external membrane → excessive external stress can cause disruption of the membrane". The entire informational structure (the sensed forces, deformations, etc., that Leyton enumerates) is, under a Neoconstructivist theory, either non-existent or the result of a simple application of our cognitive experience with objects. But any cognitive understanding of the object's similarity to a rolled toothpaste tube must be post-perceptual, since the shape only weakly invokes some sort of familiar object. What would a Neoconstructivist theory predict that the sculptor Barry Flanagan3 would see when looking at his own finished product? Presumably, since he is a sculptor and has himself carved the object, his experience should strongly evoke what the object itself really is, and should not make his visual system light up the above symbolic hierarchy (even though he may intend such trompe l'oeil in his observers). Instead, he should just have activation of the symbolic set that simply says "solid, hard, roundish object" (let us ignore the fact that even these have to be cashed out experientially). Indeed, if he hires an assistant to carve a multitude of the same shape through his lifetime, that assistant should cease to phenomenally experience any of the bending and bulging, and his very percept of the object should change. Similarly, an animal with a visual system comparable to the human visual system should have a completely neutral perceptual experience with respect to the object, since it is neither familiar as a known object, nor carved or created with a familiar procedure.

This line of thinking leads to another question, one that is more directly relevant to the theme of this volume. It arises when we consider what a Neoconstructivist theory of perception has to say about aesthetics and design. Do we reflexively perceive qualitative differences, above and beyond the objective spatial and recognition measures, when we view different visual configurations? In other words, is there a natural reflexive qualitative evaluation that occurs at the level of the perceptual understanding of a visual configuration, prior to any application of cognitive factors such as memory, experience, etc.? The ubiquitous perceptual evaluation that seems integral to the process of designing and to the experience of a designed product, as well as common visual phenomenology, suggests that the answer is yes.

That such direct perceptual evaluation is at some level central to the aesthetic experience in art and architecture has been of great interest historically, both in psychology (e.g. the analyses of Kant, the Gestalt theorists, Arnheim, Klee, etc.) and in artistic movements (e.g. Abstract Expressionism and Minimalism). More recently, Leyton's theory of perceptual representation has taken as its central charge the ability to explain fundamental aspects of aesthetics.

An inferential theory of perception such as Neoconstructivism implies that all physically plausible visual configurations are, from perception's point of view, psychologically equivalent. The implicit assumption is that since a functioning perceptual system only faithfully infers what is out in the world (up to limits on hardware), and does not inject any non-trivial informational structure of its own, all physically plausible configurations should yield the same perceptual quality – or perhaps no perceptual quality. Since all perception does is indicate that such and such a thing is out there, in the way that it is out there, and since the way that it is out there is physically valid (we have already made this caveat), there is nothing else that can be said about the percept in terms of quality. Obviously, sometimes the recovery may be erroneous; but since there is no marker on the percept telling us this, the erroneous percept is, at the perceptual level, just as valid. Of course cognitive factors, such as memory, experience, aversion, appetite, etc., might color the cognitive experience of the object that perception delivers, but the perceptual act itself remains neutral: rather than being a result of the very act of perception, aesthetic preference is cognitively applied onto the neutral product of perception. Any judgment on the appropriateness of a configuration must come from extra-perceptual considerations (memory, experience, appetite, etc.). It is interesting to note that for the aesthetician who wants to claim that all perceptual preferences are learned – what we refer to in this paper as cognitive aesthetics – a Neoconstructivist theory works very well. This assumption of psychological equivalence in inferential theories of perception is reflected in the fact that qualitative aspects of perception are usually judiciously sidestepped in favor of measurable ones. Yet the nature of the process and product of design (and art), as well as common phenomenology, is convincing evidence that such perceptual neutrality is not what we typically experience. The very act of painting and designing involves choices and manipulation of physical configurations that are deeply connected to perceiving differences in the quality of the configurations. Such differences are inexplicable within a naïve-realist theory of perception (which we will hopefully show Neoconstructivism to be).

Our experience of what one might call perceptual aesthetics suggests that, for a workable perceptual theory, the differences in perceptual quality should be deducible from the representational schema that embodies our perceptual system. The notion that the representational scheme of perception reveals its signature in our perceptual phenomenology is implicit, historically, in the work of several researchers (e.g. Hering), and particularly the Gestaltists. Yet surprisingly, these central observations of Gestalt theory are precisely the ones that have been jettisoned from contemporary theories of representation aligned with Neoconstructivism. Generically, contemporary vision science has shied away from tackling the enduring but difficult puzzles of perception that are tied to phenomenology, epistemology and aesthetics. Much of this might be attributed to the current lack of resources on the historical lineage of the epistemological and phenomenological problems, and on how they apply to contemporary scientific research in perception. None of the introductory or survey texts used for pedagogy provides a sustained critique of current approaches and their consequences. This paper is an attempt at filling this gap by bringing together, within an epistemological framework, issues that have been sometimes explicit and sometimes implicit in prior empirical and theoretical research in vision (notably Hering, the Gestalt theorists and Gibson, and most particularly Leyton's theory of shape), starting from the natural philosophy of the 18th century, and applying them to current approaches to understanding perception within vision research.4 Leyton (1992, 2001) in his theory has rigorously raised and answered many of the epistemological, phenomenological and aesthetic criteria implicit in Gestalt theory. Most, if not all, of these ideas have been expressed before in the literature, and we will generously borrow from the analyses provided in these works. There are six sections to this paper. Through these sections we will attempt to communicate a range of ideas, weaving an argument that consists of the following observations:

1. In empiricist theories of vision such as Neoconstructivism (perception as inference, inverse optics, etc.), the critical informational and causal distinction between the 2D image and the 3D percept, specified by the theory, is erased by the computational rendering of the theory.

2. Theories of perception-as-inference always involve positing objective measures, attributes and entities, and combinations thereof, in both the sensory stimulation and the external world. On closer inspection, such attributes ("features"), measures ("cues") and entities (lines, surfaces, objects) turn out to be subjective descriptors parasitic on the very perceptual structures that they are used to explain. This results, inexorably, in such theories becoming naïve realist ones.

3. A restricted model of perception as inverse optics that deals only with inter-sensory and intra-sensory co-calibration issues is a viable model for a range of empirical research studies in vision. Such a model is viable because it takes a strictly behaviorist approach to the notion of perceptual estimation of spatial attributes, where relationships are restricted to predictions between output and input, and can usually remain agnostic to explicit representational structures.

4. Standard computational renderings of Neoconstructivist theories conflate sensor co-calibration and object recognition with perceptual representation. Both calibration and object recognition exhibit characteristics of learning, which are usually taken by such theories to support an empiricist or constructivist epistemology for perceptual representation.

5. The result of Neoconstructivist theories is a computational model of perception where the percept itself is largely non-informative. In such theories the percept contains no non-metric information about the perceived world; all other information is rendered as properties of the outside world, properties which are merely symbolically instantiated in the inferential device. The only non-metric information is generic rather than percept-specific (e.g. the fact that surfaces are continuous), and such information is entirely the property of the inferential device. The remaining metric information is itself not informative outside the purview of inter- and intra-sensory calibration.

6. Although the notion of "cues" is a very useful construct for understanding how inter- and intra-sensory calibration occurs, it is problematic for areas of research aimed at understanding the nature of perceptual representation, because a "cue" is especially not, as often assumed, an objective measure on the external world; cues are merely ways in which to specify measurements within the perceptual output.

7. Neoconstructivist theories cannot explain how the percept is experienced.

8. Neoconstructivist theories cannot explain why our percepts seem to provide greater information content than appears to be "objectively" present in the external array. This is an argument implicit in Gestalt theory and central to Leyton's generative theory of shape.

9. Neoconstructivist theories cannot explain the phenomenological reality of the reflexive qualitative judgments of perceived visual configurations that appear pre-cognitively in art, design and everyday visual experience. Leyton is among the few who have argued that the understanding of aesthetics is central to any computational theory of shape.

10. Theories of inference introduce spurious problems into the understanding of the perceptual process. One such red herring is the puzzle of how a stable percept is maintained despite the constant changes in the retinal image across saccadic eye movements and blinks.

11. Recent Neoconstructivist approaches (e.g. perceptual organization, grouping, figure-ground) embrace Gestalt principles as important factors in the generation of the visual percept. Yet most of these approaches are contrary to the basic epistemological and functional proposals implied by Gestalt theory. A fundamental charge of the theories put forth by Hering, Gestalt theory, Gibson and Leyton is that cues, grouping principles, etc., are not, as is commonly assumed, objective descriptors of either the external stimulus or the internal image.

In Section 2 we provide a rudimentary review of the basic epistemological arguments in modern philosophy stretching from Descartes to Kant. This is important because the distinction between contingent and necessary connections between events will be crucial for understanding why all constructivist theories of shape representation/recovery ultimately reduce to untenable naïve realist ones. In Section 3 we review the two basic approaches to shape representation/recovery in modern research: (1) standard computational vision5 and (2) shape perception as Bayesian probabilistic inference.

We reiterate that the methodologies in both these approaches have important and wide application to many problems in human and computer vision, as well as in the assessment of visuo-motor capacities of humans, and are irreplaceable in the development of artificial systems. The intent here will be to try to show why they cannot, nevertheless, be successful theories of human perceptual representation. Many other ad hoc approaches to shape representation suffer similar problems, but in addition they do not provide any useful quantitative framework for other basic aspects of vision research; in that sense, an important distinction must be made between ad hoc theories and the sound quantitative frameworks of the computer-vision and probabilistic approaches. In Section 4 we assess two key theories of perception that have heavily influenced current research, namely Gibson's theory of perception and Gestalt theory. For the latter we mention only the theory and approach of the Berlin school of Gestalt (e.g. Wertheimer and Köhler), which is the one most familiar to researchers in perception. There are many important and crucial ideas that come out of the early Gestalt theorists such as Brentano, Mach and Von Ehrenfels, as well as other philosophers and psychologists of the Austrian and Italian schools of Gestalt theory; the reader is directed to the extensive reviews and analyses of their application to contemporary perceptual science by Albertazzi et al. (1996) and Albertazzi (2000, 2001, 2002). Section 5 analyses the shortcomings of inferential approaches and outlines diagrammatic frameworks for understanding the various approaches one might take for a theory of perception. Specifically, we will outline three of them: (1) shape perception as inference from the 2D image to the 3D world (naïve realism); (2) shape perception as a calibration map (here we will also argue that shape or object recognition can be thought of as a form of calibration); (3) shape perception as the presentation6 of sensory flux. Section 6 discusses the implications of each approach for perceptual experience; here the notion of representational conflict in perception is introduced. This section will, by design, be of a speculative nature. Section 7 discusses the implications of theories of perception for aesthetics and design. Since the paper is quite long, a short reading might include the Introduction, Section 2, and Sections 5, 6 and 7; and for those familiar with the basic philosophical arguments, a first reading might be possible by skipping Section 3.

2. PHILOSOPHICAL PRELIMINARIES

"Ich gestehe frei: die Erinnerung des David Hume war eben dasjenige, was mir vor vielen Jahren zuerst den dogmatischen Schlummer unterbrach und meinen Untersuchungen im Felde der spekulativen Philosophie eine ganz andere Richtung gab."7 ["I freely admit: the remembrance of David Hume was the very thing that, many years ago, first interrupted my dogmatic slumber and gave my investigations in the field of speculative philosophy a quite different direction."] Immanuel Kant (1783)

The central metaphysical questions and theories of knowledge in modern philosophy stretching from Descartes to Kant essentially arise from the attempt to explain how we arrive at our perceptual understanding of the world. Namely, it is the question of how we support the knowledge we have of the world. And as we mentioned earlier, the two competing theoretical approaches to explaining how knowledge is supported are nativism and empiricism. One reading of the philosophy from Descartes to Kant (the analysis of space, objecthood, causality, and induction) indicates that, rather than being a tussle between nativism and empiricism, it concerns how we support such knowledge when there is no basis for it within the sensorium itself. Notwithstanding the colloquial understanding of Hume's contributions as the refutation of the Cartesian position, Hume's analysis could be thought of less as an attack on rationalism than as a device to expose the inherent contradiction in empiricism; the entire enterprise is essentially a proof of the untenable nature of an empiricist theory of knowledge. It does so by arriving at the following conclusion: if we embrace empiricism as a theory of knowledge, we must either give up any claim to the existence of an objective world external to the perceiver, or maintain a naïve realist view of perception. Contemporary theories of perception appear to betray a blind-sighting of this central epistemological result in philosophy. A possible explanation of why these foundational questions receive short shrift is that the epistemological implication of the induction problem is viewed as having been beaten to death,8 and that with a more sophisticated analytic philosophy having arisen, neither the psychologist nor the philosopher finds a need to consider it in constructing theories of perception or cognition. Yet its analytic conclusion has reappeared quantitatively in every computational model of concept learning or perception ever devised, leading to the need for the introduction of inductive biases, priors, heuristics, regularity assumptions, etc., in order to make the computation tractable.

In the interest of not getting sucked into a philosophical black hole, we will steer clear of debates on causality, concept formation, etc., in current analytic philosophy and, instead, focus on only those aspects that are historically undisputed. We will be interested in those basic aspects that are essential for anyone attempting to develop a theory of perceptual representation, and we will quickly revisit the induction problem in the simplest way possible. In order to keep the historical introductory perspective, we will use characterizations from a well-regarded introduction to modern western philosophy by Scruton (1986) that quite sufficiently and elegantly captures the basic ideas we need.

Induction can be thought of as a process by which we create a necessary connection between two things that are, by their very definition, contingently separate. Hume's argument against a rationalist understanding of cause9 begins with his distinction between relations of ideas (necessary truths) and matters of fact (contingent truths). He stakes the claim that there can be no a priori proof of any matter of fact: since matters of fact are contingent on observing the state of affairs, and no set of observations however large will ever exhaust the possible theories that we may be able to construct through reason, we can do no better than merely "summarize what happened to be true" (Scruton, p. 119). Though it appears that he may be saying that one cannot rationalize one's way to the truth, what he is essentially saying is that the attempt to construct a theory of necessary connections from particulars of observation is doomed to failure. Hume then goes on to show that the notions of cause and objecthood as necessary connections suffer from precisely this problem. Scruton's excellent summary goes thus:

"The idea of necessary connection cannot be derived from an impression of necessary connection – for there is no such impression. We say that A causes B only when the conjunction between A and B is constant – that is, when there is a regular connection of A-type and B-type events, leading us to expect B whenever we have observed a case of A. Apart from this constant conjunction, we can observe nothing in the relation between the individual events A and B besides their contiguity in space and time, and the fact that A precedes B. There is nothing that we observe, and nothing that we could observe, in the relation between A and B, that would constitute a bond of 'necessary connection'. Why is Hume so confident that 'necessary connections' between events cannot be observed? His reasoning seems to be this: causal relations exist only between distinct events. If A causes B, then A is a distinct event from B. Hence it must be possible to identify A without identifying B. Propositions expressing matters of fact are always contingent; it is only those conveying relations of ideas between A and B that are necessary. If there were a relation of ideas between A and B, then there might also be a necessary connection – as there is a necessary connection between 2+3 and 5. But in that case A and B would not be distinct, any more than 2+3 is distinct from 5. But if A and B are identifiable apart from each other, we cannot deduce the existence of B from that of A: the relations between the two can only be matters of fact. The very nature of causality, as a relation between distinct existences, rules out the possibility of a necessary connection." (Scruton, p. 121)

It is the same line of reasoning that denies us the grounds for inducing the notion of enduring objecthood. The concept of an object relies on the notion of persistence over time, or, in other terms, on the necessary connection between distinct experiences of the same object. Once again there is no rational basis for positing a necessary connection between these distinct "impressions", and thus no basis for positing the existence of objects that endure even when unobserved (Scruton, p. 123). These arguments against the concepts of cause and objecthood are part of Hume's more general argument against the idea of induction: namely, that the problem with induction (of causality, of the existence of objects, of the existence of a material world) is that it hinges on establishing necessary connections based on the observation of contingent connections between distinct existences.

How then does the mind construct notions of causality and objecthood? Hume, relying on his empiricist ideology, claims that it is habit and custom. But his statements to the effect that the notion of causality appears to arise spontaneously within us, or that the mind "has a propensity to spread itself upon objects" (Hume, 1748), seem to betray his implicit belief that these notions – whose validity cannot be established a priori – originate in the peculiar way our mind organizes our experiences. Thus one way of interpreting Hume is that the properties imputed to the world cannot be derived solely from the particular sequence of sensory events, but rather that the most fundamental properties attributed to the external world are constructs of our mind.

Kant's theory of knowledge places equal importance on both experience and the role of the innate predisposition of the mind to interpret that experience. He rejects from Descartes the notion that through reason alone one can arrive at knowledge of the world.

He rejects from Hume the position that concepts arise only via association and custom, while from Descartes he retains the idea of innate conceptual constructs as the only basis with which to describe experience. What he retains from Hume (and the other empiricists) is the idea that our senses provide the only subject matter for knowledge. These innate "concepts" then represent the most basic foundations of human thought, beyond which no further analysis is plausible. Another way of saying this is that perception must be explained in terms of the innate predispositions that are required to bootstrap sensory stimulation so that it may be experienced. Kant's brilliant insight is his claim that despite these constraints on knowledge, a synthetic knowledge of the world is still possible.

Consider the gas equation PV = nRT. There is a necessary connection between pressure and temperature, because their distinction is merely in the way we define each in perceptual measurement space: quantities for which there appear to be functional reasons to distinguish, by defining a specific mode of measurement, whether it be the space between molecules (measurement in motor space) or the kinetic energy of a molecule (rate of change measured in motor space as a function of measured time). (See Section 5 for a description of visuo-motor measurement space.) An objective external distinction cannot exist between pressure and temperature, and for good reason: a real distinct property such as pressure cannot be said to exist in distinction to another real and distinct property, temperature. If it did, then "P" could not be said to be causally linked with "T", because a causal link is by definition a connection between two distinguishable quantities or entities, and their distinction is lost in such a formulation. Still, though the two sides of the equation are necessarily and not contingently connected – just as 2+3 is not distinct from 5 – the relationship gives us synthetic knowledge: genuine synthetic knowledge of the behavior of the distinct quantities and qualities (e.g. pressure and temperature) that we define over the sensorium via a particular mode of perceptual-motor measurement.
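To spell the example out (using only the standard kinetic-theory reading of the gas law, added here for concreteness):

$$ PV = nRT, \qquad \left\langle \tfrac{1}{2} m v^{2} \right\rangle = \tfrac{3}{2} k_{B} T, $$

so pressure is cashed out as force per unit area registered against a surface (a motor-space measurement), while temperature is cashed out as the mean kinetic energy of molecular motion (change of position in motor space per unit of measured time). The necessary connection asserted by the equation therefore holds between two modes of perceptual-motor measurement defined over the sensorium, not between two independently existing external properties.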

Within a Neoconstructivist theory, the 2D retinal image and the resulting 3D percept are defined to be distinguishable entities, and the critical causal linkage occurs at the level of the inference (the inductive step) from the 2D image to the 3D world. In such a theory, the foundational assumption is that the 2D retinal image is causally distinct from the 3D percept, since the former (part of the real world) gives rise to the latter (part of perceptual space). What we will hopefully show is that though such a theory depends on the distinguishability of the two, their very mode of definition leads, inexorably, to a place where that distinction is lost, resulting in a naïve realist or idealist metaphysics, as Hume warned.

Historically, the epistemological concerns in psychology have centered almost exclusively on the nature-nurture debate. This has derived from the nearly century-long tussle between psychological theories that embraced an empiricist theory of knowledge and ones that embraced a nativist one: the former represented by Helmholtz, the Structuralists, and the Behaviorists; the latter by Hering and the Gestaltists.10 With the waning of Behaviorism and associationist models of perception, what appeared to take its place is something in the guise of a compromise. Yet within the research arena most see the empiricist-nativist argument as moot, assured that the reality is somewhere in between and that any focus on the epistemological question is not required for a scientific understanding of perception; in effect there has been an unconscious return to naïve realism. This agnosticism regarding the status of perceptual knowledge has spread to a large swath of research surrounding the intersection of modern cognitive psychology and analytic philosophy, possibly due to the deference that philosophers began to give perceptual scientists late in the century, with the advent of sophisticated experimental methodologies and some rather remarkable empirical results in physiology.11 What is interesting – and as we will hopefully show – is that it is precisely the things that Kant rejected on the basis of Descartes' and Hume's analysis that have been embraced by Neoconstructivism (the possibility of rationalizing an objective world from our sense experience, and the possibility of an empiricist theory of knowledge), and it is precisely those things that Kant embraced that have been rejected (the innate predisposition of the mind in organizing experience, and the essential unknowability of the world as it is in itself12).

3. COMPUTATIONAL APPROACHES TO PERCEPTION AS INFERENCE

"So when, as Bayesians, we examine the 'external world' to determine what priors we should use, what do we find? We find our own posteriors. All we can ever see in perception is our own posteriors. And nothing else." Hoffman, 1996

3.1. Standard computational vision

The classical work in computational vision originated in work in artificial intelligence, and in the problem of developing robotic devices that could optically detect and categorize objects in controlled environments. The original goal of these approaches was not to understand human perception, but to develop computational schemes by which video images could be used to recover or recognize objects in highly constrained environmental configurations. One of the early approaches that embraced theory from perceptual research was that of Horn and collaborators (compiled in Horn (1986)). It was J. J. Gibson's formulation of invariant properties of surface features under optical projection that provided the foundation for this early work. Though we will later see that Gibson's approach is epistemologically untenable, its use in the artificial vision domain was appropriate, since epistemological issues could be ignored. However, this approach eventually became a model for theories of human perception as inference, and therefore it is important to see why it fails as a model of perception. We will call this approach standard computational vision, following Leyton (1992), to distinguish it from other computational approaches to vision such as perception as Bayesian inference, or computational theories generically referred to as "spatial vision" (spectral image analysis in space and time in the Fourier domain). Leyton (1992, 1999)13 has provided a very clear analysis of standard computational vision, and we will borrow much of the following analysis from there. Computational vision has largely focused on how the geometry of surfaces can be inferred from a 2D image using certain "cues" in the image. These cues are aspects of the image that can be shown to have invariant relations between the 3D environment and the 2D images generated from that 3D environment via forward optical projection.

Gibson provided a particularly clear description of these invariants. The invariant relationships, once characterized, are then used by the visual system to accomplish inverse optics: to infer the 3D environment, given the 2D image that was created by an optical projection. Standard computational vision can be broadly broken down into two phases (from Leyton, 1992): (1) modeling the observer–environment interaction, i.e. the image formation process (forward optics); and (2) modeling the recovery of observer-independent properties of the environment (inverse optics). The image formation process is based on an optical model that involves assumptions regarding the nature of the interaction between the environment (material substance), the medium (light), and the observer. One such assumption might be that the angle of reflection is equal to the angle of incidence. The analysis is derived for sparse environment/image configurations specified in a single domain only, e.g. shading (Ikeuchi and Horn, 1981), contour (Kanade, 1981), or texture (Stevens, 1981; Witkin and Tannenbaum, 1983). Modeling of the observer–environment interaction involves the derivation of functions that predict how the value of a particular measurable quantity at the 2D image varies with a measurable physical property of the environment, such as surface orientation: for example, the shape of a texture element in the image as a function of the orientation of a texture element in the environment (e.g. Stevens, 1981). In the domain of surface shading, the predicted quantity at the 2D image would be the measured illuminance or irradiance,14 while the assumed properties and quantities in the 3D environment would be surface reflectance properties and the nature of the illumination. The result of this modeling would provide, for example, functions that predict the illuminance of a point in the image, given the orientation of the corresponding surface point in the environment (assuming the environmental illumination and surface reflectance properties). At this stage, no assumptions about the actual structure of the environment (e.g. actual surface orientations) need to be made. Analytically, assuming certain relationships between quantities and properties in the 3D environment, we can relate point-wise surface orientations in the environment to point-wise image intensities.
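Schematically, the two phases can be summarized as a forward map and its attempted inversion (the notation here is only a summary device, not drawn from the sources just cited):

$$ I = \Phi(S;\, L, \rho) \ \text{(forward optics)}, \qquad S \in \Phi^{-1}(I;\, \hat{L}, \hat{\rho}) \ \text{(inverse optics)}, $$

where $S$ is the scene geometry, $L$ the illumination, $\rho$ the reflectance model, and the hats mark quantities that are assumed rather than recovered. $\Phi^{-1}(I)$ is in general a set rather than a single scene, which is why the second phase requires the additional assumptions about the environment discussed below.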

However, these computations do not result in a solution for environmental (surface) structure, but rather constrain the possible point-wise orientations of the surface or surfaces that could have given rise to a particular illuminance measurement at the image. In order to arrive at a single interpretation of surface structure, certain basic assumptions about the nature of the observer-independent environment must be made. These are the assumptions that are brought to bear in the second part, namely, modeling the recovery of the observer-independent properties of the environment. This phase has typically been referred to as surface interpolation; the classical results in this area are from Grimson (see Grimson, 1981) and Terzopoulos (see Blake and Zisserman, 1987).

3.2. Shape from shading

The goal of any shape-from-shading computation is to determine some property of surface shape – usually surface orientation – given the pattern of illuminance, or spatial pattern of intensity of light, observed at the image. In order to get a clearer understanding of the two-stage process described above, we review the classical work on shape from shading by Ikeuchi and Horn (1981), also reviewed in Horn (1986). The first part of Ikeuchi and Horn's model is to determine the intensity of light reflected from a point on a surface in the environment, based on the local surface orientation with respect to the viewing position (the surface luminance). Since the pattern of shading (the light emanating from an object) in the 3D environment is correlated with the pattern of illuminance measured at the image, knowledge of the latter might be used to infer the former. If the surface is illuminated from a single direction, the intensity of the measured illuminance at an image point will be directly proportional to the intensity of the light reflected from the corresponding surface point (Figure 4, left panel). Figure 4 (right panel) shows a surface patch viewed from point V and illuminated from point S. Surface orientations are expressed as vectors, with directions specified with respect to the surface normal N, and these vectors can in turn be expressed in terms of what is known as gradient space.15 A frontoparallel surface has gradient (0, 0), since its normal is parallel to the line of sight; a surface with very high slant will have a gradient (x, y) where the absolute values of x and y are very high; and a surface perpendicular to the image plane has an infinite gradient.
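For concreteness, the definitions behind gradient space are the standard ones from Horn (1986), which the text only gestures at: if the visible surface is written as a depth map $z = f(x, y)$, then

$$ p = \frac{\partial z}{\partial x}, \qquad q = \frac{\partial z}{\partial y}, \qquad \mathbf{N} \propto (-p, -q, 1), $$

so a frontoparallel patch has $(p, q) = (0, 0)$ with its normal along the line of sight, and $|p|, |q| \rightarrow \infty$ as the patch turns edge-on to the viewer.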

Figure 4.

If (p, q) represent the co-ordinates of the surface-patch orientation in gradient space, and (ps, qs) the illumination orientation, then Horn (1986) shows that we can derive expressions relating the angles between the surface normal, the illumination direction, and the viewing direction in terms of gradient space, for arbitrary surface reflectance properties (note: the viewing direction is always at the origin of gradient space):

cos(θe) = f(p, q)
cos(θi) = f(p, q, ps, qs)

Here θi is the angle between the surface normal and the illumination direction, and θe the angle between the surface normal and the line of sight (see Figure 4, right panel). The luminance of a point on the surface (of a particular orientation), given a particular direction of illumination, can then be expressed purely as a function of the variables θi and θe. In other words, for any surface point whose orientation is expressed in terms of gradient space, given a particular direction of illumination to that point, we can derive the luminance as observed from a particular viewpoint, as long as we "know" or assume its reflectance behavior. In order to do this we have to assume what is known as the bi-directional reflectance distribution function (BRDF) for the material that makes up the surface. The BRDF is the ratio of the amount of light reflected from a surface patch in a particular direction to the amount of light arriving at the patch from the illuminant; it specifies the reflectance characteristics of the surface material due to its particular surface microstructure.
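For the geometry of Figure 4, the two expressions have explicit forms (standard results reproduced from Horn (1986) for concreteness, with the viewing direction at the gradient-space origin):

$$ \cos\theta_e = \frac{1}{\sqrt{1 + p^2 + q^2}}, \qquad \cos\theta_i = \frac{1 + p\,p_s + q\,q_s}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + p_s^2 + q_s^2}}. $$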

Using the vector expressions for surface orientation and illumination direction, together with an assumed BRDF, we can derive the predicted luminance (radiance) R(p, q) for a surface patch of any orientation expressed in gradient space. In other words, we can plot the predicted surface luminance for all possible surface orientations under a particular set of illuminants. Such a plot is called a reflectance map (see Horn, 1986): essentially a plot of surface luminance as a function of surface orientation expressed in terms of gradient space, where the view vector orientation is represented by (0, 0). For a particular illumination angle with respect to the viewing direction, different surface-patch orientations can give rise to the same luminance; indeed, there may be a whole class of orientations that produce the same observed luminance. Any set of orientations giving rise to the same luminance appears as an iso-luminance contour in the reflectance map: points on a given contour are different surface orientations (p, q) that have the same luminance with respect to the viewer.

Three examples are shown in Figure 5. The first is a reflectance map for a Lambertian (matte) surface where the viewpoint and illumination direction are identical (Lambertian reflectance: reflection distributed such that the same brightness is seen from any viewing angle); here the surface orientation with the highest luminance with respect to the viewer is the one whose normal is parallel to the line of sight (and illumination). The second is for a Lambertian surface where the illumination direction and view vector are not the same; here the highest luminance is for a surface whose normal is parallel to the illumination direction. The third is a Lambertian surface with a specular (shiny) component (e.g. a semi-gloss surface), with the viewpoint different from the illumination direction; this more complex reflectance map is derived as a simple sum of the matte and specular contributions, and shows two areas of maximum luminance. For a purely specular surface (i.e. a mirror; specular reflectance: reflection such that surface brightness is measurable only when the angle of incidence equals the angle of reflection), luminance would be observed for only one surface orientation – the one that makes θe equal to θi – and at no other. Note that in all cases the assumption is a single infinite uniform light source (all rays parallel to the assumed direction of illumination and of the same intensity); adding light sources, or altering their spatial distribution properties, would change the nature of the iso-luminance contours.

Thus, if we can (1) assume a particular surface reflectance characteristic (BRDF) and (2) assume a particular orientation and distribution of the illumination, we can derive the expected luminance with respect to the line of sight (view vector) for a surface patch of a given orientation. For a given surface orientation (p, q) we can predict a particular luminance R, which in turn allows us to predict, up to a scaling factor, a particular measured image illuminance, since image illuminance (the amount of light recorded at the image) is directly proportional to surface luminance. This relationship is implicit in what is called the image irradiance equation (see Horn, 1986):

$$E(x, y) = R(p, q),$$

where E is the image illuminance (or irradiance) and R is the expected surface luminance (or radiance) as a function of surface orientation. Conversely, we could potentially use the inverse of this function to predict the surface orientation consistent with any given measured image illuminance value: we can derive the class of surface orientations consistent with the illuminance measured at a point in the image (under assumptions of viewpoint direction, illumination, and surface properties). But it is obvious that such a mapping is not unique. There are an infinite number of possible surface orientations that could give rise to a particular illuminance value. Moreover, a given set of iso-luminance contours could itself arise from entirely different sets of lighting and BRDF conditions.
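This non-uniqueness is easy to exhibit numerically. The sketch below is a minimal illustration (not from the original paper), assuming Horn's standard Lambertian reflectance map as the function R(p, q); the illumination gradient (ps, qs) and the measured illuminance value are arbitrary choices for the example.

```python
import numpy as np

def lambertian_R(p, q, ps, qs):
    # Horn's Lambertian reflectance map: R(p, q) = cos(theta_i), the cosine of
    # the angle between surface normal (-p, -q, 1) and source (-ps, -qs, 1).
    num = 1.0 + p * ps + q * qs
    den = np.sqrt(1.0 + p**2 + q**2) * np.sqrt(1.0 + ps**2 + qs**2)
    return np.clip(num / den, 0.0, None)  # clip self-shadowed orientations to zero

# Evaluate the reflectance map over a grid in gradient space.
p, q = np.meshgrid(np.linspace(-3, 3, 601), np.linspace(-3, 3, 601))
R = lambertian_R(p, q, ps=0.5, qs=0.3)  # assumed illumination direction

# A single measured image illuminance E picks out an iso-luminance contour,
# not a point: every orientation on the contour is an equally valid "solution".
E, tol = 0.8, 0.005
print(f"{np.sum(np.abs(R - E) < tol)} grid orientations are consistent with E = {E}")
```

Every (p, q) flagged by the last line reproduces the same pixel brightness; nothing in the measurement itself selects among them. This is the weak constraint discussed below.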

It is important to note that we have not yet used any information from the image per se. To summarize, in constructing an inferential model we have, up to this point, assumed: (1) a particular physical and ontological model of the world containing the relevant entities (light and reflecting surfaces), including assumptions such as that light travels in straight lines, does not dissipate over space, and reflects off surfaces; and (2) that we are interested in only one aspect of surfaces, namely orientation with respect to the line of sight. Then, in order to set up the relationships between surface orientation, view direction, and illumination direction, we assume (3) the illumination direction with respect to the view vector and (4) the surface reflectance characteristics (e.g. Lambertian). Finally, we have to use (5) a Euclidean projective geometry in arriving at a particular reflectance map. Note that if we were to construe the analysis thus far as demonstrative of some real process in the human visual system, we would implicitly be stating that the inference has to have a range of built-in assumptions about the environment – the very structure we are trying to recover.

Once we have a particular image description, all we are able to do is put a weak constraint on the local surface orientation in the environment given the measured illuminance at a point in the image: any given image illuminance value signals a class of possible surface orientations (i.e. we can only pick a point on the appropriate iso-luminance contour given the pixel brightness in the image). The constraint is weak because, despite it, there are an infinite number of possible environmental configurations that could be inferred. Even if we could somehow specify a unique value for local surface orientation, we still would not know the nature of the connectivity between the piecewise orientations, and so would still have an infinite number of possible solutions for the surface structure. Furthermore, in accepting a limited description of surface perception (orientation), we are already missing large swaths of the informational content of the percept of surfaces, since the entire informational content of the percept here is reduced to local surface patch orientation. So we are still very far from getting any handle on inferring what the actual surface configurations are. For example, in Figure 6, the image on the left is consistent with both a single connected surface and a set of disconnected

surface patches (some small ones 1 m away from the observer, others large ones even a mile away!). Either of these two configurations would result in identical retinal images, and both are thus valid interpretations of the image (Figure 6).

So in order to infer the overall surface structure, we additionally need to (1) limit the inferred orientation to a single value given a particular image illuminance value – i.e., pick a point on the given contour in the reflectance plots – and (2) figure out the actual location and size of the local surface patches (assuming these can be recovered by some other independent mechanism). This procedure in classical computational vision is known as surface interpolation, and it involves making two assumptions:

1. Surface consistency: there is minimal variation in the geometry of surfaces in the environment.
2. Boundary conditions of the surface given in the image must be satisfied (the surface must conform to edges).

In other words, after we have picked the possible set of orientations that matches the observed surface luminance (a contour in Figure 5), we want to find, at each image location, the orientation or gradient (f, g)16 such that the resulting hypothesized surface (quoting from Leyton, 1992):

1. Conforms to the image intensities (illuminance);
2. Is as smooth as possible;
3. Conforms to the boundary values.

The first constraint is satisfied by picking a surface orientation (f, g) at each image location that minimizes the variation between the derived surface luminance (from the reflectance plots) and the measured image illuminance, and may be given by the following expression:

$$e_i = \iint \big(E(x, y) - R(f, g)\big)^2 \, dx \, dy.$$

This can be thought of as the "error" between the luminance derived from the reflectance plots and the observed luminance given by the image. The second constraint is satisfied by minimizing the variation in the surface normal orientations hypothesized for each location on the image (derived using the reflectance–gradient relationship above), and may be expressed as

$$e_s = \iint \big((f_x^2 + f_y^2) + (g_x^2 + g_y^2)\big) \, dx \, dy.$$

To satisfy both, we minimize the two terms simultaneously via the expression

$$e_s + \lambda e_i,$$

where λ, in effect a regularization term, is adjusted depending on how much noise we assume in the image. These equations are in effect the expression for a thin membrane clamped to an assumed boundary and allowed to settle to minimum variation, subject to conforming to the image intensity values. This model was developed further by Grimson (1981), Terzopoulos (1983), and Blake and Zisserman (1987) into more sophisticated models in which the environmental surfaces are assumed to be thin plates, the reflectance error term and the minimization of surface orientation variation are modeled using "spring" models drawn from the physics of membranes or thin-plate surfaces, and additional cost functions may be applied to determine whether the inferred surface should "break" or "bend" at high image-contrast boundaries (see Leyton (1992) for a fuller discussion).
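A minimal numerical sketch of this regularization scheme follows (my illustration, not code from any of the cited models). It minimizes a discretized version of e_s + λe_i over a small grid; the hypothetical Lambertian reflectance map from the earlier sketch is redefined here so the block is self-contained, and boundary conditions and the thin-plate refinements of Grimson and Terzopoulos are omitted.

```python
import numpy as np
from scipy.optimize import minimize

def lambertian_R(p, q, ps=0.5, qs=0.3):
    num = 1.0 + p * ps + q * qs
    return num / (np.sqrt(1.0 + p**2 + q**2) * np.sqrt(1.0 + ps**2 + qs**2))

N = 12
u, v = np.meshgrid(np.linspace(0, 1, N), np.linspace(0, 1, N))
E_img = lambertian_R(0.4 * u, 0.2 * v)  # synthetic image from a known smooth surface

lam = 10.0  # regularization weight: trust in the image vs. the smoothness prior

def energy(x):
    f, g = x[:N * N].reshape(N, N), x[N * N:].reshape(N, N)
    e_i = np.sum((E_img - lambertian_R(f, g)) ** 2)  # reflectance "error" term
    e_s = sum(np.sum(np.diff(a, axis=ax) ** 2)       # discrete smoothness term
              for a in (f, g) for ax in (0, 1))
    return e_s + lam * e_i

x0 = np.zeros(2 * N * N)  # start from a frontoparallel guess, (f, g) = (0, 0)
sol = minimize(energy, x0, method="L-BFGS-B")
print("converged:", sol.success, "final energy:", round(sol.fun, 4))
```

The point to notice is that the recovered (f, g) field is fixed once E_img, the reflectance map, and λ are fixed: the "inference" is a deterministic function of the image plus the built-in assumptions, which is exactly the issue taken up in the next section.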

3.3. Hume and shape from shading

In conclusion, we see that the standard computational vision model for inferring surface orientation from image brightness values (shape from shading) relies on (a) assuming a particular objective model of observer–environment interaction, and (b) assuming a particular physical model of the environment. If we take it to be a workable model of inference, then we must assume that the 2D image and the 3D environment are two distinct entities, and that we want to infer the latter from the former, where the inferred 3D environment has some additional contingent information that is missing in the 2D image. But the computational rendering of the inference reveals that there is a fixed relationship between the 2D image and the inferred 3D structure, such that no information gain is possible: the transformation can be thought of as merely changing the frame of reference within a fixed-dimension information space. Note how, in such a model, the surface structure we infer is fixed given the assumptions that are incorporated into the transformation; for such a mechanism, the inferred environmental structure is deterministic on the image. The result is a model in which the two states are necessarily connected, rendering the inferential transformation non-informative. Since there is no information gain due to such a "translation" – it is like a lookup table, or a translation from one language to another – it cannot genuinely constitute an "inference". All that is being done is that the information in the image is transformed into a different data format using a fixed and known set of transformations internal to the perceptual system: the information constituted in the inferred 3D environment is informationally identical with the 2D image, and no new information results from making the inference from 2D image to 3D environment. No knowledge of the world is gained by mere application of the 3D inferential transformation.

Another way of thinking about it is that inferential theories implicitly suggest that there is a gain in dimensionality – and by extension a gain in information – in the process of inference from a 2D image to the 3D environment. But in defining a mechanism that guarantees the inference, we have removed the very contingent identity that makes any inference interesting or informative. This is precisely what Hume told us will happen if we try to induce state B from state A: the acceptance of the validity of the inferential process negates the very distinction that makes the inference interesting, and in the process we have to abandon the distinction that makes the inference informative. This is not a problem for the machine-vision type of application for which this analysis was developed, but it is a problem if we construe it as a model of perceptual inference.

There is no more knowledge gained here than in counting five objects as a set of two and a set of three and then adding them to make five, rather than counting them together as a set of five: from the point of view of knowledge of the objects, two and three is the same as five.17 The five objects (or the 2D image) remain the only contingent truth. Any synthetic knowledge contained in the "+" operator (or in the 3D transformation function) about "the way things behave" is already internal to the perceiver; application of the operator (or transform) does not add information to the original set of five objects (or the 2D image), or change their objective status in any way. If one were to insist that the inference process is valid, and that it results in a gain of contingent truth, then one would be embracing, as the empiricists warned, a naïve realist epistemology.

Note that the derivation of the transformation functions from the 3D environment to the 2D projection (i.e. the forward-optics functions) in the brightness domain constitutes genuine synthetic knowledge of the behavior of things in the environment (captured in terms of our perceptual ontology, as in any physical science). Establishing the mathematics of forward optics therefore constitutes a valid scientific enterprise in and of itself, with application to domains in machine vision, to aspects of physiology, and to the assessment of the perceptual capacities of human vision. However, it does not bear on the question of how the visual system parses the sensory flux into the apparent brightness variation that is perceived, how the phenomenology of the continuous surface that appears to support those brightness variations is achieved, or why it does so in the way that it does.

3.4. Perception as Bayesian inference

Let us now look at a more recent approach to perception as the inference of 3D structure from a 2D image. There has been enormous interest in the last decade in reformulating the perceptual inference problem as one of probabilistic inference. Its ostensible strength comes from the realization that inferences occur in a probabilistic, and not deterministic, domain. The approach attempts to get around some of the problems of standard computational vision by focusing on the stochastic nature of the sensory flux and on the organization and structure of the external world itself. The idea is that the image (or set of images) is a statistical sample of some objective external environment, which itself is made

up of a stochastic distribution of properties and elements (say, points, lines, angles, surfaces, objects, etc.). The problem of perception, then, is to infer from the properties of the image (F) which specific configuration of the external environment (C) most likely gave rise to that image. (Note: for now we will skip over the problem that the properties of the image are usually described in terms of the very features that the percept itself delivers.) We will use the generic, and notably clear, formulation of perception as Bayesian inference presented by Jepson and Richards (1992).

To paraphrase Jepson and Richards (1992): there are certain properties or configurations P that occur in the environment or "world context" C with probability p(P|C). Considering some collection of measurements F on that environment – also referred to as image features – we can express the probability of obtaining a particular image feature F given knowledge of P, namely p(F|P). The problem of perceptual inference is to determine the probability of the property P – some environmental configuration (a surface, point, object, what have you) – in a "world context" C, given a measured feature F. The important addition that Bayesian theory (as opposed to standard probability theory) brings to the formulation is that the probability of the occurrence of P in the world, p(P|C), or simply p(P), is a critical factor in the inferential step. These prior probabilities can be thought of as the learned objective structure and behavior of the world, or as a priori idealizations of the world: we know or establish p(P|C) because we have a priori knowledge or experience that the world behaves in a certain way.

Not surprisingly, Bayesian analysis of perceptual inference breaks down into the same two components enumerated for the standard computational vision example: (1) probabilities relating to observer–environment interaction, and (2) probabilities for the inference of observer-independent structure. The observer–environment interaction phase is what is typically referred to in the Bayesian literature as observer analysis, or ideal-observer analysis: the probability of an image feature given an environmental configuration is pre-determined by the fact that we assume a particular structure for the interaction between the observer, the environment, and the medium (light), within a Euclidean projective framework. This phase is the derivation of the likelihood function: the probability of observing a particular image configuration given a

particular environmental configuration. In the shape-from-shading domain, this derivation would essentially be isomorphic to what we saw as the iso-luminance function in Horn's analysis. And as in that case, the likelihood does not by itself underwrite the validity of any particular interpretation; it only constrains the possible space of environmental configurations that could have given rise to the observed image configuration. In order to arrive at a probabilistic inference of a single solution, this class of solutions must be further constrained by making assumptions about the actual structure of the world. This is the second phase: the inference of observer-independent structure. In a Bayesian framework these assumptions are introduced as the prior probability p(P|C). The two components, the likelihood function (observer–environment interaction) and the prior distribution (observer-independent environment properties), are then combined to produce the posterior distribution. Once again, the structure of the image and the behavior of the world are assumed; these assumptions, in effect, underwrite the validity of any particular interpretation.

Jepson and Richards (1992) present a formulation that does not directly rely on probabilities but rather on ratios of probabilities. They propose a posterior probability ratio

$$R_{post} = \frac{p(P \mid F, C)}{p(\lnot P \mid F, C)}.$$

Using Bayes' rule, this can be rewritten as

$$R_{post} = \frac{p(F \mid P, C)}{p(F \mid \lnot P, C)} \cdot \frac{p(P, C)}{p(\lnot P, C)},$$

which can be written simply as

$$R_{post} = L \cdot R_{prior},$$

where L, the likelihood ratio, is called the measurement likelihood condition, and R_prior, the ratio of priors, is called the genericity condition, because this ratio is an expression of how "generic" the property P is in C. The perceptual system infers the property P that generates the largest value of R_post. Note that in this formulation F and C are fixed, and P is the inferred property that we are trying to select so as to maximize the ratio. Whereas in the shape from shading model of Ikeuchi and Horn (1981) the "most valid" interpretation was the one in which we chose the configuration that minimized the size of an error expression, here the most valid interpretation is the one that maximizes the size of the expression.

Jepson and Richards now consider the perceptual inference from a simple image of a "V", shown in Figure 7 (from Jepson and Richards, 1992). The image "feature" or property (F) is the co-termination (within some resolution ε) of the two lines in the image. From this we seek to infer the structure P in the environment: two sticks in 3D space attached at one end. Obviously, F holds whenever P is true, namely

$$p(F \mid P, C) = 1.$$

If P does not hold in the world, then, normalizing the entire image to unit area, the probability of observing F in the image is equal to the area included within some region of size ε, where ε is an arbitrarily small dimension that can be thought of as the resolution of the image (note that ε is akin to the image noise parameter λ in the Ikeuchi and Horn model):

$$p(F \mid \lnot P, C) = \varepsilon^2.$$

From this we see that the likelihood ratio is a very large number (assuming some reasonable degree of resolution, i.e. ε suitably small):

$$L = \frac{p(F \mid P, C)}{p(F \mid \lnot P, C)} = \frac{1}{\varepsilon^2} \gg 1.$$

This would lead one to think that inferring P from F would be reasonable. However, if we assume a random world (and normalize the environmental context to unit volume), then the probability of P in that world – two "sticks" co-terminating in 3D – will be proportional to ε³. Thus we have

$$p(P) = \varepsilon^3, \qquad R_{prior} = \frac{p(P, C)}{p(\lnot P, C)} = \frac{\varepsilon^3}{1 - \varepsilon^3} \approx \varepsilon^3,$$

so that, combining, for the final posterior ratio we have

$$R_{post} = \frac{1}{\varepsilon^2} \times \varepsilon^3 = \varepsilon \ll 1$$

(again, assuming a reasonable resolution ε). Since R_post is a small number less than 1, the inference of P from F is actually not supported; as Richards and Jepson correctly inform us, in a random world it is actually guaranteed to be wrong from a probabilistic standpoint. This is precisely what we came up against in the Ikeuchi and Horn model: without any surface constraints, the set of possible orientations is uniformly distributed over the respective "iso-luminance" contour in gradient space, so if we assumed a particular orientation of a surface patch for a particular image illuminance (based on our observer–environment analysis alone), in a probabilistic sense we would be guaranteed to be wrong. In order to achieve a singular solution, we saw in the Ikeuchi and Horn model the need to make assumptions about the behavior of entities in the world. In the present example, we need to make R_post arbitrarily large for "V-type" configurations in the world, by assigning them some value d less than 1 that represents a discrete probability mass.

In the Bayesian framework, we would say that what we have done is assume a "model" environment in which the property P is generic – namely, one in which the property has an arbitrary non-zero probability mass in the probability distribution over possible configurations. This could be thought of as a prior for connectivity, just like the assumption that surface patches are connected smoothly in the Horn model for the shape from shading domain. Then we get

$$R_{post} = \frac{1}{\varepsilon^2} \times d \gg 1$$

(with the reasonable assumption that ε is at least as small a fraction as the probability mass d). Now R_post is very large and so supports the inference to P; indeed, the inference is guaranteed to be "correct". Given a positive mode in the prior distribution for property P in C, the inference to P is inexorable, since the probability of the image feature F in the absence of the property P is infinitesimally small (see the expression for p(F | ¬P, C) above); the likelihood ratio is, in effect, assumed to be infinite. Thus the prior component, when configured with an effectively infinite likelihood ratio, ensures a deterministic rather than a probabilistic inference from F to P. In other words, given the feature in the image, the inference to P is certain to occur: the image and the percept are no longer distinct – their connection has become a necessary one. So once again, in a Bayesian inference device, what underwrites the inference is the model of the world (C), or the assumptions about the world, that the perceiver must assume in making the inference.

Since our perceptual judgments are categorical, Richards and Jepson suggest that there must exist a framework to reduce the prior distribution to a single mode that can be selected for a perceptual inference; indeed, they propose that the model of the world (C) may be considered to be just a collection of modes, presumably hardwired into the observer. These priors are assumptions about structure in the world (right angles, connectivity, regularity assumptions, or generic-viewpoint assumptions). The notion of having to assume priors in order to constrain observer-independent properties is directly related to the regularity constraints for surfaces in the shape from shading models.
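The arithmetic of the "V" example is compact enough to step through directly. The sketch below is my own illustration of Jepson and Richards' ratios, with arbitrary values chosen for ε and d; it shows how the same likelihood ratio yields R_post ≪ 1 under the random-world prior but R_post ≫ 1 once a discrete probability mass is assumed for "V-type" configurations.

```python
eps = 1e-3  # image resolution: tolerance within which the two lines co-terminate

# Likelihood ratio L = p(F|P,C) / p(F|~P,C) = 1 / eps^2
L = 1.0 / eps**2

# Random world: p(P) ~ eps^3, so R_prior ~ eps^3 and R_post = eps << 1.
print("random world: R_post =", L * eps**3)   # 0.001 -> inference not supported

# Modal world: assume a discrete probability mass d for co-terminating sticks.
d = 0.1
print("modal world:  R_post =", L * d)        # 100000.0 -> inference "inexorable"
```

Once d is in place, the outcome no longer depends on the data in any interesting sense: for any ε small enough to count as an image resolution, R_post exceeds 1, and the inference from F to P is fixed in advance.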



Indeed, many other similarities abound between Bayesian models and the standard computational vision models. For example, some have suggested a variation on prior-distribution models called competitive priors, in which the observer-independent environment constraint is not limited to a single prior distribution but extends to a set of competing prior distributions (Yuille and Bülthoff, 1996). This is equivalent to what is captured by the weighting parameters in the multiple-component standard computational vision models of Terzopoulos (1983), where the weighting parameters are adjusted based on the relative significance and cost of each of the environmental assumptions (e.g. boundary conditions, criteria for breaking surfaces, etc.). Thus, from the standpoint of understanding perception as a problem in inference, Bayesian models of inference are not essentially distinct from standard computational vision models. Though Bayesian analysis provides purchase in those areas where the probabilistic nature of the input is relevant (e.g. estimation of spatial properties), and is also useful in applications for artificial inference devices (as are the standard computational models), it is unclear what the redefinition of the problem of inference in a probabilistic domain contributes. As we have seen, the models, though initially construed in terms of probabilities, have to be configured such that the inference given an image is deterministic on that image, and the original formulations based on stochastic models of image sampling or distributions of properties in the environment become moot.

It should be noted, however, that in Bayesian models of spatial estimation, metric estimates of the scene are indeed genuinely probabilistic; but those models have to be carefully distinguished from the ones discussed here, because they are ultimately about sensor co-calibration, not perceptual inference. Several studies have provided empirical evidence showing that estimates of metric properties of the sensorium can be well modeled by Bayesian statistics, and shown to behave like maximum-likelihood estimators (Landy et al., 1995; Ernst and Banks, 2002; Alais and Burr, 2004; Hillis et al., 2002). It has to be noted, however, that the most successful of these Bayesian studies have been exactly the ones that have not attempted to apply the theory to shape recovery or inference. They have not attempted an application of Bayesian theory to so-called "mid-level" vision, e.g. to the recovery of surface brightness or surface shape, or to ecological-statistics approaches to deriving or explaining Gestalt "grouping" phenomena. Again, approaches such as ecological statistics can prove effective in providing compact descriptions of the external environment for use in artificial applications, such as object recognition, line detectors, etc.; or to establish



the statistical relationships between metric environmental properties and the properties of the perceptual structures they give rise to. This latter type of analysis provides precisely the kind of useful synthetic knowledge that forward-optics analysis provides in the shape-from-shading or shape-from-texture domains of standard computational vision. But it does not directly address the nature of perceptual representation, and thus what the information format of our percepts is. We discuss this further in part 5.

3.5. The problem of induction in any computational model of learning

The central problem with empiricism, evident in the models presented above, has cropped up in many domains, not the least of which have been efforts to develop artificial inferential devices, e.g. machine-learning algorithms. Machine-learning research has repeatedly shown that "induction" is not possible in a concept-learning space without the introduction of an inductive bias to constrain the search space (see Pratt, 1994). The most critical point is not that the system has to have a bias, but that once such a bias has been chosen, the rule that will be "induced" via learning is already determined by the nature of the training examples: the rule is already a property of the image set, given the inductive bias. In other words, given the computational procedure, the training image set and the computed rule have a necessary (and not contingent) connection, just as the inferred 3D interpretation achieved by applying a fixed transformation to a 2D image has a necessary connection to that image itself. It is no longer an inductive solution but rather a deductive one. And to boot, there is no independent objective basis for determining whether one inductive bias is better than another (see Pratt, 1994; Mitchell, 1997). The requirement to assume prior distributions in Bayesian models, or regularity assumptions in standard computational vision models, is precisely this sort of selection of inductive bias. In the perception literature, the bias claimed to be the right one is naturally the one that explains the data. This leads to the implication that the solution is already determined in the image, rather than being an inference to the external environment.
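A toy numerical illustration of the point (my own, intended only to make the logic concrete): given the same training examples, two different inductive biases – a linear hypothesis space versus a quadratic one – each deterministically fix the "induced" rule, and nothing internal to the data adjudicates between them.

```python
import numpy as np

# Training examples: three observations from some unknown "world".
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 4.0])

# Bias 1: hypothesis space of lines.  Bias 2: hypothesis space of quadratics.
line = np.polyfit(x, y, deg=1)
quad = np.polyfit(x, y, deg=2)

# Given a bias, the "induced" rule is a deterministic function of the training
# set; the two biases generalize differently, and the data cannot choose.
x_new = 3.0
print("linear bias predicts   :", np.polyval(line, x_new))   # ~5.67
print("quadratic bias predicts:", np.polyval(quad, x_new))   # 9.0
```

The learning step here is deduction in disguise: fix the bias and the training set, and the rule follows necessarily – the analogue of the fixed 2D-to-3D transformation discussed above.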



This confound – the need to define the image and the inferred external world as contingently separate, while at the same time attempting to understand how the latter may be inferred from the former without falling into a naïve realist trap – has not been lost on all researchers in the field, as is evident from the following quotes from two prominent Bayesian theorists:
"...uncertainties associated with how closely any chosen model for a world fact matches Mother Nature's model...requires measures on knowledge which may not be explicit rule-based 'knowing'...It is not enough simply to give...a probabilistic structure and then to incorporate them into theoretical frameworks in a like manner... We must know specifically just how the structure of our cognitive models (and their underlying assumptions) force particular conclusions about the world." (Richards, 1996, p. 227; my italics)

"Is there nevertheless an observer-independent world out there? I think so... But does this observer-independent world resemble what I see, hear, feel or smell? That is more than I can know. But I suspect it does not." (Hoffman, 1996, p. 220)

However, the next statement by Hoffman indicates just how difficult it is to shake loose of the notion of perception as inference:
"In sum, for good Bayesian analysis we need appropriate priors (and likelihoods). But when we look we see only our posteriors. What are we to do? Well, what we in fact do is to fabricate those priors (and likelihoods) which best square with our posteriors. And we are happy when the three are finally consistent..." (Hoffman, 1996, p. 220; my italics)

But Hoffman does not go on to tell us that fabricating these priors and likelihoods by squaring them with our posteriors (e.g. choosing the discrete probability mass d for "connectedness" in the example above) results in a model where the image and the perceived external world lose their distinction, and where the inferential process is reduced to a naïve-realist one. It does so because generating the right posterior depends on choosing the right prior and likelihood function, which are themselves based on that very posterior; i.e., the priors and likelihoods are meaningless without being defined in terms of the posterior they are supposed to generate. Unfortunately, despite the careful and thorough analyses of perception by Hering, Mach, and later the Gestaltists, the appeal of the Helmholtzian and Gibsonian approaches, which maintain an intrinsic naïve realism, has unwittingly discouraged a more epistemologically sound approach to the study of why our perceptions force particular conclusions about the world. Naïve realism in all its guises is so potent an idea that a concerted effort has to be made to recognize it, lest sophistication of analysis provide false cover.

4. GESTALT THEORY AND GIBSON


Two approaches of the last century that resisted the constructivist approach to perception championed by the Helmholtz school were those of the Gestalt school and the psychologist J.J. Gibson. Gibson himself was influenced by the Gestalt school and drew heavily on their insights, but eventually sought to distance himself from their approach. We discuss here briefly how these two approaches bear on the analysis of computational theories provided in the previous section.

4.1. Gestalt theory and its current popularity

There has been a great resurgence of interest in Gestalt theory in the last two decades, as researchers have begun to realize that traditional approaches in computational vision, such as those proposed by Marr (1982), and neurophysiological theories aligned with Hubel and Wiesel's model of cortical detectors, have failed to deliver everything they promised (see, for example, Westheimer (1999), Nakayama and Shimojo (1995)). There is a move to codify and quantify Gestalt theory within the general framework of Neoconstructivism. The popularity of a sub-field called perceptual organization, or grouping, suggests that Gestalt theory is alive and well and enjoys wide application. But that may be too hasty a conclusion, because on closer inspection it is precisely those things that the Gestaltists held to be of paramount importance – the ontological and epistemological concerns that underlie their formulation – that appear to be most often sidestepped. Meanwhile, the observed effects in perception, such as the Gestalt grouping principles and the contextual effects of the whole on the part, have come to constitute the entirety of the modern notion of Gestalt. In a sense, the current interpretation of Gestalt is probably what Gibson would have ended up with were it not for his visceral distaste for the Gestalt notion of grouping, a distaste the following quote betrays:
"Wertheimer's drawings were nonsense patterns of the extreme type, far removed from the images of a material world." (Gibson, 1950, p. 196)

4.2. Grouping principles

The common interpretation of Gestalt theory is that it specifies a set of rules, heuristics, or processes used by the visual system to recover the shape of objects in the world from elementary units available early in the visual stream. The application of grouping processes to elementary visual units – say line segments or points, the actual elementary units in the retinal image – gives rise to observed groupings, such as detected line segments being grouped into contours, e.g. closer elements appearing to group together (for a description of the Gestalt rules see Sekuler and Blake, 1986; Palmer, 1999). The end product of this process is a set of 2D entities (which resolve ambiguities of figure/ground, occlusion, connectivity, etc.), and the final 3D interpretation (disposition and shape of objects) is constrained by the various cues available (e.g. binocular disparity). The organizational rules that can take us from the retinal atoms to the perceived 3D space are ones that explain the observed phenomenology of object perception. Added to this is the generic rule of grouping that says: "the part is defined by its relation to the whole". This has led in recent times to empirical and quantitative analyses of what has become known as "global vs. local" processes, particularly in the lightness and brightness perception domain (see, for example, Sekuler et al., 2002).

The idea is that the grouping principles we observe in our percepts are derived from the nature of the grouping of elements (say line segments) that obtains statistically in a set of images of an objective external world (environment). That is, the grouping rules are themselves assumed to be instantiated in the organization of the external world, or in objective images of the external world: the elementary units have a particular organizational structure due to the special nature of the external environment, and the Gestalt rules of grouping are the visual system's learned instantiations of that structure. In some recent approaches to grouping, the grouping principles are derived from statistical analysis of the distribution of properties or elements in photographs of natural scenes; this idea has developed into an approach called ecological statistics (see, for example, Geisler et al., 2001; Elder and Goldberg, 2002). In other words, grouping principles that the visual system uses to "bind" elementary units available at early stages of visual processing can

The Gestalt argument about . The vertical grouping constitutes a qualitative content that is missing in the ‘‘raw’’ objects before us. perceptually we naturally group the dots into vertical lines. it still does not explain the grouping phenomena itself. etc) given in some objective description and arrived at by the appropriate application of Gestalt rules.440 DHANRAJ VISHWANATH be understood by analyzing sets of images produced of an objective external ecology (environment)18. The grouping principles are considered to be effective because they end up detecting things that are out there in the world in the way we end up seeing them. Once again. since an external criteria that says more distant dots should be grouped is equally valid. etc. the final product is a neural code that signals the existence of a real external property or behavior (closure. In typical computational renderings of such theories. there is no objective external criteria that specifies that we must group them as such. or even the objects given to us in perception. grouping principles are assumed to be rules or heuristics used by the visual system to ‘‘group’’ and then detect or identify in the image. yet nothing in the stimulus itself – or for that matter the perceptual objects before us (the dots) – provides information that the vertically aligned dots should be seen as related. In the classic example shown in Figure 4. shortest distance) but that such a perceptual thing as grouping even occurs. Even if we uncover some basic quantitative rules that specify under what conditions certain groupings occur. These qualities are not re-presentations of the selfsame qualities in the external world because there are no such qualities that can be defined on the external world independent of perception. lines. continuation. objects. It is not so much that grouping seems to follow certain rules (e. For the Gestaltists. Note. In other words there are qualities (or informational content) that are purely in the perceptual domain and cannot be attributed to the objective external world.g. But this is an interpretation that is precisely what the grouping principles appear to have been arguing against. The primary purpose was to show that our perceptions over specify any descriptions that we might assign ‘‘objectively’’ to the percept or the retinal image. even though it is the case that the measured distance between dots in vertical column are smaller. curves. surfaces. the grouping principles were a description of the nature of the spontaneous organization of elements that can be observed in a percept despite the fact that no indication of that organization is present in any objective spatial description of the ‘‘raw’’ stimulation.

in some objective descriptive mode. It is not a re-presentation of real entities out in the world. pointing out. Psychophysical isomorphism allows us to explain the fact that we do indeed experience the perceptual world directly. These two considerations lead to what one might claim are the two most important ideas at the core of Gestalt theory: (1) Perception is the presentation19 of an organized image. by definition. Enumerating or identifying ‘‘rules’’ of organization is just a descriptive exercise on the way to understanding what information is contained in the organized percept. and is also perceived to be twice as long. rather than experience a symbolic representation of a real world. . that information exists at the perceptual level that cannot be explained in an inferential model of perception (Figure 8).THE EPISTEMOLOGY OF VISION 441 observable rules of grouping is thus ultimately an epistemological one. (2) Psychophysical isomorphism: the organized percept is isomorphic to the ‘‘state’’ of the neural circuitry in response to the image. The scientific problem of perception is to determine objective principles independent of the perceived image that can be Figure 8. This very important idea provides a basis for why we experience our perception in exactly the same terms that we describe their content. The Gestalt ‘‘rules’’ of grouping are merely indications of how one might begin to construct a description of image organization to understand the informational content of the percept. A line 4 ft long is measured to be twice the length of a line 2 ft long. is achieved through an organization that takes into account the entire image. as Leyton (1992) has vividly demonstrated. The percept.

shading. directed by a rigid behaviorist and naı¨ ve realist ethic.. etc. and replaces it with the novel idea that perception can be explained by examining the invariants between the visual field (retinal image) and the external visual world (Gibson. that percepts are nothing more than the state of the organismic entities. Gibson proposed that . 1950. this second principle was promoted most vehemently by Kohler. The problem of how it occurs can then be tackled by empirical research in neurophysiology. He provided a novel analysis of sets of gradients that can be defined on surface properties such as texture.. Note that from an epistemological point of view there is nothing substantively different between his notion of cues and the ones he starts out riling against in traditional space perception approaches. 4. In the Berlin school of Gestaltist.. which had to wait over half a century to be finally empirically vindicated. The essential difference is that while the older space perception model suggested that ‘‘cues’’ could be used to infer distance and direction of points in space. my italics) Gibson approach is a strange combination of Gestalt intuitions with an aversion to Helmholtzian inferential models.3. The question is not how the percept gets organized but why it is always organized like the particular entity toward which the eye happens to be pointing’’ (Gibson.20 He starts out by criticizing a ‘‘cue’’ based inferential model on the basis that it is not explicative of the perceptual process or experience. It was this thesis that allowed him to develop the opponent process model of color. 1947). p. This principle is also central to Hering’s thinking. as measures defined over the extent of surfaces rather than the classical notion as measures over points in space. In the modern era the only quantitative theory of perception that has incorporated both these crucial Gestalt ideas has been Leyton’s theory of perception (1992).’’ (Kohler. Gibson’s direct perception ‘‘the characteristic of perception is that the result is not so much spontaneous as it is faithful to the thing perceived. 25. Gibson essentially redefines the nature of the term ‘‘cues’’. As he remarks ‘‘this last application of the principle (of isomorphism) has perhaps the greatest importance for Gestalt psychology.442 DHANRAJ VISHWANATH brought to bear in explaining what is occurring in such an organization. 1950).

THE EPISTEMOLOGY OF VISION 443 ‘‘gradients’’ in the image contain information about the external layout of surfaces. can be done without even considering the workings of the human brain! Gibson’s claim that the ‘‘total stimulation contains all that is needed for perception’’ is on closer inspection essentially an admission that there is no informational distinction between the 2D retinal image and the perceived 3D visual world in an invariants-based approach. As we saw in the previous section. an idea that rightfully has had an enormous impact on computational approaches to vision. and that the visual field is essentially a description of the retinal pattern of stimulation. His analysis of invariants then reduces to a description of forward optics in the external world (see Figure 11) given in terms of certain pre-specified measures that he calls gradients. The novel idea is that space perception be thought of a process of inference of surface layout. First. seems to have a phenomenal reality to it. Yet he ends up concluding that the percept is essentially a copy of the external world. . there is both a novel idea. a conclusion consistent with the inferential model of perception. He starts off with the idea that we should examine directly the percept and it’s correlate. suggesting that the analysis can ignore the external world and the actual physical stimulation. There is also something prescient and phenomenologically correct with respect to his claim about surfaces from an ontological standpoint. we see as a surface. This of course. and a sleight of hand. as he rightfully states at the outset. the visual field. a view sympathetic to Gestalt principles. any computational rendering of perception as inference results in the erasure of the informational or causal distinction that such models start out with. as we have seen in the previous section. Second. rather than the inference of point layout. Here. and thus inadvertently confirms that the transformation from 2D image to 3D percept under an inference model does not tell us anything about perceptual processes. but just describes relationships between two ways in which the perceptual product can be measured. 1996). Gibson’s sleight of hand is in packaging his idea of surface perception as somehow fundamentally distinct from the inferential model of space perception that preceded it. a range of empirical results in human discriminative capacities has supported the primacy of surfaces as perceptual entities (see Nakayama and Shimojo. Gibson’s notion that anything we see.

Gibson was keenly aware that this was problematic: a re-presentation cannot provide an experience of the thing it is representing. Gibson’s Gestalt roots and his exceptional phenomenological sense did not allow him to abandon experience. (2) One that reduces to the naı¨ ve-realist camera-like model where the external world is directly painted onto the mind (Figure 9). Conjoining this distaste for a phenomenal world with his fearless embrace of naı¨ ve realism.444 DHANRAJ VISHWANATH Figure 9. as much as the inference of the passage of a bear by paw patterns constitutes experience of the bear passage. He correctly saw that in a schema where the perceptual inferences are supposed to be describing an objective external world. in his later work. must have seen that the inexorable conclusion of his theory of invariants/gradients was one of two mutual exclusive models: (1) an inferential model in which the percept is a re-presentation and so cannot be experienced. that neural code cannot appear to us as the thing itself. because of the nagging reality of perceptual . Gibson. In a scheme where the neural code is merely a symbolic code for the existence of a real external property or behavior. lead him to the almost evangelical notion of direct perception. that description would have to be a re-presentation. Perception as inference. Yet at the same time Gibson was apparently unable to rid his distaste for the merely ‘‘phenomenal world’’ of the Gestaltists. Though Gibson realizes that he would rather accept conclusion (2) above.

However. It is precisely because standard computational vision and Bayesian approaches have attained such a foothold that there needs to be recognition that they are not viable as models for the explanation of perceptual representation. Gibson. As asserted before.THE EPISTEMOLOGY OF VISION 445 experience. Table 1). But the obscurity of the notion of direct perception. but only how we make judgments about the world. 5. a perceptual theory where perceptual experience is brushed under the rug.21 Neoconstructivist theories have. he achieves a model where the perceptual state is a direct experience of the external world. both approaches are valid for a range of basic research in sensory physiology as well as in the development of artificial systems (object recognition. 1950 The analysis we have provided in part 3 is meant to highlight shortcomings of two approaches in computational vision only insofar as they apply to research in perceptual inference or perceptual representation. he also realizes a theory of perception must be something more than just a mere camera (with its associated homunculus). etc. both of which are quantitative instantiations of Gibsonian and . Bayesian approaches and standard computational vision approaches to perceptual inference.). SHORTCOMINGS OF NEOCONSTRUCTIVISM [A] theory of cues could never really explain how we see the world. So he develops what came to be seen as the rather obscure notion of ‘‘direct pickup’’ where the objective information in the external world is directly picked up by the senses. and indeed some have advocated this crossover. is what presumably won it disfavor in modern perceptual science. tended to accept conclusion (1) above. or why it looks the way it does. and where the invariants (gradients) in the percept directly reflect the invariants (gradients) in the world. image processing. a considerable amount of this work has crossed over into the research program that is directed at understanding how the visual system constructs and encodes a percept (column B. for better or worse. and by introducing a new piece of terminology ‘‘direct perception’’ he skillfully brushes the camera and homunculus under the carpet. robotics. Thus. and it’s obvious resemblance to the camera-like model.

In Neoconstructivism. Figure 9 shows the standard Neoconstructivist model of perception as inference. so that a 3D description can be inferred from the 2D image. The 2D image is considered to be an entity distinct from the internal visual processes and the 3D percept they give rise to. perception is the inversion of this optical process. the quantitative description of forward optics is assumed to be an objective description independent of perception. The crucial point is that the 3D description that we arrive in our inference at has the same objective description that we can apply to the objective external world. Gibsonian naı¨ ve realism. we place a homuncular ‘‘direct perceptor’’ in between the external and psychological domain. In both the Gibsonian and Neoconstructivist inferential model. the main reason it is rejected is because it clearly involves an all-knowing homunculus who has to interpret the image created by the camera. it requires that the perceptual process have built in biases to constrain the inversion.446 DHANRAJ VISHWANATH Figure 10. Certain aspects of the perceptual . share the same set of problems. Helmholzian versions of perceptual inference. Usually. Figure 10 shows Gibson’s model. Since this inductive process is underdetermined. which are summarized here. The image formation process is assumed to occur – and to have an objective description – external to the perceptual system. the main difference being that since Gibson does not want any representational or inferential schema to be specified. Figure 1 showed the naı¨ ve realist camera model of perception that is typically rejected in the introduction to texts on perception. which is essentially a combination of Figures 1 and 9.

distance. independent of perception. are not thought to necessarily reflect properties of the external world under the same descriptive mode (the percept of a color is hard to describe quantitatively.THE EPISTEMOLOGY OF VISION 447 description. 1999). wavelength of impinging light. as Kant would say. and to have objective informational content. But. can be described spatio-temporally). The goal of the non-perceptual sciences is to understand. Rather. or the information content of percepts.) is a valid scientific endeavor that describes functional relationships between entities in the external world in terms of our perception of them. or more . but it’s external correlate. surfaces. etc. In the same way. Thus. under the same geometric descriptive mode. forward optics does not provide any knowledge of the perceptual representational structure. or motor-perceptual extensions of it. because it is only the most important aspect of the percept (the spatial description) that is thought to directly reflect properties of the external world. and modes of measurement specified by our extended perception (perception+measurement) we label as light rays.g. Instead it can be considered to be only weakly naı¨ ve realist. distance between image elements. ideal observers. what regularities hold in that world as described by our perception. As long as we are trying to attain synthetic knowledge of the external world (as in all other sciences) they can be considered to be objective in the same way as an electron orbit around an atom can be considered objective. orientation.g. This is because all the sciences can be agnostic (for the most part) as to the source of concepts such as objects. Note that both the term ‘‘line’’ and the term ‘‘length’’ are considered to be objective geometric descriptors. persistence over time. the sensation of color. etc. that percept is veridical when there exists a line that is X meters long in the world. which in terms of the ontology. Thus. Are these attributes. e.22 One might argue therefore that Neoconstructivism does not hold to a purely naı¨ ve realist view of this process. wavelength. understanding forward optics (likelihood functions. forward optics gives us synthetic knowledge of the invariant relations among entities in the external world. intensity of impinging light. size. in this model. measures and entities correctly thought of as objective descriptors? Yes and no. e. (Later we will explain how the notion of external measurement is merely an extension of perception. the 2D image is assumed to have an objective description independent of perception. etc. as has been most clearly pointed out by Leyton (1992. given our perceptual window into the world. measurement. attributes. when I see a line that is X meters long.

Such an assumption is fine for computer vision systems. ‘‘collinearity’’ of lines. In other words.) This knowledge that we gain from understanding forward optics is useful since it can be used to determine how the sensory systems calibrate. The problem is that the atoms and regularities are opportunistically defined based on how well they fit with the perceptual product. is the problematic assumption made by Neoconstructivist-type theories that the percept is descriptively equivalent to some actual external object. the validity of assuming persistence over time.448 DHANRAJ VISHWANATH correctly visuo-motor action. Yet the atoms and regularities are assumed in these theories to have an objective external basis. Yet. and what Leyton has more explicitly and quantitatively pointed out.g. it’s description given spatially. e. or to understand the physiology of sensory system. Fechner. Hering. Furthermore. The lack of . there is a direct lineage of breakthroughs in modern physics to early perceptual science through Mach. astoundingly. Inferential theories allow for the description to be erroneous or sparser. surfaces. that there are three receptors with differential sensitivity to wavelength. in contemporary perceptual science some of the very assumptions are taken as objective! What Gestalt theory suggested. but in no way to contain more information. regularities of the external world are captured in terms of the same descriptive atoms that are identified in the perceptual product – e.24 According to Neoconstructivist theories the informational content of the percept of a line (for example). they have pointed out the flaw in the assumption (of inferential theories) that there is nothing descriptively present in the percept that constitutes more information than what is available in the external object itself.23 In modern physics some of the very assumptions about modes of measurement. Indeed. ‘‘parallelism’’ of lines. ‘‘isotropy’’ of texture elements etc. But the visual system itself does not have an external interpreter telling it what is objective and what is not. lines etc. have already been explicitly challenged and incorporated into modern theories of physics. because there is no other external basis to determine what constitutes a regularity or an atom or a property in the external world apart from falling back on the phenomenological senses of the computer scientist that implicitly rely on the perceptual differentiation of things like objects. is parasitic on the objective informational content that the real object that constitutes a line in the external world posseses. all the way to Kant.g. given in the same type of spatial description.

5. PERCEPTION AS SENSORY CO-CALIBRATION

5.1. Shape as calibration map

One way out of the problems just characterized, while staying within an inverse-optics framework, is to take a strong behaviorist position and be explicitly agnostic to the existence of perceptual representations. In this approach, a percept can be thought of simply as being that which will quantitatively predict particular behavior (motor response or perceptual judgment). A report of a perceptual quantity (say length or slant), through a discrimination or magnitude estimation paradigm, implies nothing more than that the reported estimate can quantitatively predict what the magnitude of a related motor action will be, or predict the magnitude of an external motor measurement of the object giving rise to the stimulation. One might use a quantity such as horizontal disparity under this model, but there is no claim that such a quantity is actually represented in the system in the way the quantitative theory specifies. The quantities themselves don't need to be "experienced" per se, and notions such as slant, perceived surface curvature, or other constitutive properties don't need to be thought of as being represented symbolically in the visual system, because they are defined solely for the purposes of investigating sensory co-calibration. Such quantities are merely operationalized variables that allow us to determine how, and if, sensory-motor co-calibration occurs; they can be thought of as useful quantitative devices used by the researcher to relate the perceptual estimates and the predicted motor action or measurement, and can be thrown out once that process is understood. Taking this approach has the simultaneous effect of banishing the homunculus as well as rendering moot the question of what descriptive modes the percept is based on. One no longer needs to make a claim about whether the descriptive modes and measures are objective properties of the external world, or merely perceptual constructs, or what their information content might be; in such a model there is no need to discuss what the informational content of either the perceptual representation or the actual object in the world is. The ultimate question in such a model becomes: are the perceived locations of points in space consistent with measured locations?

Under this model, perceptual space or perceptual shape can be thought of simply as a calibration map: a map of the distance and direction of a set of points with respect to some frame of reference, that can be used for motor localization. For a collection of points in space that constitute an object, the only information that is required is the specification of their distance and direction in some co-ordinate frame, along with higher-order quantities that are based on these, such as size, curvature, collinearity, etc., that are operationalized by the researcher in order to study the co-calibration. This approach is illustrated in Figure 11, which shows the output of the perceptual system to be estimates of metric environmental properties on which behavior can be based, and which behavior can in turn change.

Figure 11. Perception as sensory co-calibration.

By avoiding the issue of representation, and by focusing only on calibration space, this approach gets around some of the problems of an inferential version of Neoconstructivism, which eventually has to define how the various aspects of the percept are symbolically represented. In a very important sense, this restrictive version of perception as inverse optics is very effective in understanding basic issues in 3D space perception, binocular vision, etc.
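To make this restricted calibration model concrete, consider the following minimal sketch (ours, purely illustrative; the data structure and function names are hypothetical and not drawn from any of the models cited). A percept is nothing more than a table of direction-and-distance estimates for points, derived quantities are fixed transforms of those estimates, and co-calibration is a check of the estimates against a motor "measurement" of the same points:

    # Minimal sketch of a calibration map (illustrative only).
    from dataclasses import dataclass
    import math

    @dataclass
    class PointEstimate:
        azimuth: float    # direction, in radians (a perceptual mode of description)
        elevation: float  # direction, in radians
        distance: float   # distance, as calibrated to motor space

    def to_cartesian(p):
        # Derived quantities add no new information content:
        # they are fixed transforms of the stored estimates.
        x = p.distance * math.cos(p.elevation) * math.sin(p.azimuth)
        y = p.distance * math.cos(p.elevation) * math.cos(p.azimuth)
        z = p.distance * math.sin(p.elevation)
        return (x, y, z)

    def co_calibrated(percept, motor_probe, tol=0.01):
        # Co-calibration: do the perceived locations agree with the
        # locations returned by a motor test (e.g. reaching) of the
        # same points? 'motor_probe' stands in for that motor act.
        return all(math.dist(to_cartesian(p), motor_probe(p)) < tol
                   for p in percept)

Nothing in this sketch represents a line, a surface, or an object; that is exactly the model's point, and, as we argue below, its limitation.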

In such studies the goal is to determine how various sources of information are co-calibrated (be they "cues" defined on the retinal image, or tactile and proprioceptive feedback), and whether estimates of spatial properties (in terms of perceptual descriptors such as distance, size, etc.) are consistent with a motor test of the external environment (we will later define measurement as nothing more than an "extended" motor test on the sensorium). Additionally, such an approach is adequate for many aspects of neurophysiology, e.g. finding out which pathways of the visual system are binocularly coded. Note, however, that Figure 11 is not a correct representation of the calibration model, since, strictly speaking, there should be no external world in a calibration model: the model operates only at the level of comparing measures on sensory responses. A more correct diagram, showing the true locus of calibration approaches, is given in Figure 17. Figure 12 shows an extension of the calibration model of perception that specifies other quantitative and qualitative measures that the perceptual system might estimate and explicitly represent. This model is essentially the core Neoconstructivist model, and it is this one that suffers from the problems we have been addressing thus far.

Figure 12. Representational Neoconstructivism: perception as calibration and recognition in an inverse-optics paradigm.

5.2. Measurement as visuo-motor action

There has been a conflation, in representational versions of Neoconstructivism, of what might be called motor measurement space with perceptual representation. Under this conflation, perceptual representation is considered to be nothing more than a collection of estimates of metric properties of the visual scene. When the domain is geometric shape or space, the properties that are perceptually represented are usually considered to be spatial quantities such as length, orientation, depth, curvature, etc., together with Euclidean 3D space. In such representational models, not acknowledging the epistemic source of the descriptive modes, attributes, or informational content of the "quantities" and "qualities" being represented leads to the secondary problems that are described below.

Now, it is correct to say that the visual system has implicit spatial estimates of properties of the visual scene, but a more careful epistemology would reveal that those estimates are specified within the terms of the representational structures of perception.25 The notion of measurement as some independent test of the external environment is essentially meaningless outside of the structures, attributes and information content of perception, which include notions such as distance, depth, curvature, and the persistence of objects over time; outside of these, measurements are meaningless in the same way that the idea of motion is meaningless without the assumption of persistence over time. Instead, measurement is better thought of as nothing more than an extension of motor action. When we apply a motor action within our perceptual sensorium, the result (touch, proprioception, or a new visual layout) is essentially a measurement on that sensorium; in other words, the measurements are of the perceptual sensorium, and not objective measures of the external world. In that sense, motor action is an implicit measurement test operating over the perceptual representation to determine whether the representation is coherent and predictive. Our percepts of objects seem vivid and real because they coincide with our measurement of them: usually, these measurements coincide with those values already predicted within the percept, and most deviations from this coincidence can be assigned to limitations in resources, erroneous assumptions, or damage to the perceptual/motor plant. Repeatability and predictability of motor-action-as-measurement is what defines whether a spatial property estimate available to perception is reliable and "veridical".

When the reliability or predictability is compromised, the system adapts to bring the percept and measurement space back into coincidence: the sensors and motor plant can recalibrate within the visuo-motor representation space when motor-actions-as-measurement are not predictive. Device-based measurement is just an extension of such motor action. Measurement, whether it involves a ruler, a micrometer screw gauge, or a laser interferometer, is no less dependent on perceptual structure and attributes than are our motor actions; the informational content derived from making such measurements using external devices is merely an extension of motor action as measurement. Thus, when we say that the visual system has spatial estimates of the visual scene, we have to assume that it is only in terms of visuo-motor representational space that we have such estimates.

Two erroneous conclusions come out of the conflation of measurement space with representation space in Neoconstructivism. (1) The implication that, since perceptual estimates of spatial properties are more or less matched by external measurement of the objects themselves, our perceptual representation in some important sense reflects the objective spatial properties of objects in the world. This in turn leads to the idea that objects in the world (as they actually exist) are descriptively much like the object's description given by our percepts: namely, that they can, and should, be described by specifying a set of spatial measures and nothing more. (2) The ability of the visuo-motor system to recalibrate (or, in the case of neonates, to calibrate) has usually been taken as evidence that supports empiricist theories, because it shows that learning can occur. Some of these examples involve cases where the visual system has putatively learned statistical regularities of certain spatial properties of the environment, which might have the effect of biasing otherwise inherently ambiguous spatial estimates toward expected values in the environment (e.g. Purves, 2003). Since some of these modes of calibration or recalibration appear to go beyond merely adjusting spatial estimates, this is seen as even more impressive evidence of the learning of perceptual representations and structures.
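How modest such recalibration actually is can be seen in a toy sketch (ours, hypothetical; not a model from the literature cited here): a single gain relating a perceptual distance estimate to its motor measurement is nudged whenever motor-action-as-measurement stops being predictive, bringing percept and measurement space back into coincidence without touching the representation itself:

    # Toy recalibration of a perceptual-to-motor gain (illustrative only).
    def recalibrate(gain, trials, rate=0.1):
        # Each trial pairs a perceived distance with the distance actually
        # traversed by the motor act (e.g. a reach). A persistent mismatch,
        # as after donning distorting prisms, drives the gain back toward
        # coincidence. Only a metric mapping changes; no representational
        # structure is learned.
        for perceived, reached in trials:
            error = reached - gain * perceived
            gain += rate * error * perceived  # gradient step on squared error
        return gain

    # e.g. trials = [(1.0, 1.2), (2.0, 2.4)] pushes the gain toward 1.2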

In other examples, the calibration might not even involve a metric adjustment, but instead a learned rule that biases the overall perceived configuration when the percept is potentially ambiguous, an example being the well-known shadow illusion shown in Figure 13. This "illusion" is usually attributed to a learned assumption that light impinges from above. But on closer inspection, in both these examples the learning or calibration merely biases a spatial estimate toward a particular value, given a discrete or continuous set of ambiguous spatial estimates that are already available at the level of the percept.26

5.3. Object recognition as calibration

Analogous to spatial calibration as a form of learning within the visuo-motor domain is the notion of learning in object recognition. Object recognition – to be carefully distinguished from object representation – provides numerous examples where an otherwise ambiguous stimulus generates an unambiguous percept that is consistent with a learned stimulus configuration. Recognition learning may manifest through exposure to particular stimulus sets, priming, or might imply natural learning of experientially common configurations. An example of the former is a degraded display of text, which might have an ambiguous interpretation at the level of the individual letters, but which will nevertheless be recognized as a particular letter in a familiar word, depending on the context of that word in a sentence. Another example would be a Mooney-type figure, such as the one in Figure 13 (right panel), where no object is initially perceived, but after being primed one sees the outline of a Dalmatian dog (see Van Tonder and Ejima, 2000, for an interesting empirical analysis of this illusion).

An example of naturally learned recognition might be biological motion, but here one can also make a distinction between seeing coherent motion, and recognizing what that motion is. In biological motion there may be a component of such a spatial (or motion) ambiguity resolution. Note that the example we gave in Figure 14 (left panel) is better thought of as belonging to the spatial calibration domain rather than to recognition, since, unlike biological motion, learning in the shadow example only results in resolving an ambiguity in metric spatial structure. In all the cases of recognition learning as a form of calibration, the epistemic issues are moot, since the learning, while useful and robust, is occurring within a given perceptual representation; none of these phenomena indicate that the representational structure itself is being altered.

5.4. Why learning in sensory calibration cannot be used to "learn" perceptual representations

There is sometimes the temptation to extend the notion of calibration-as-learning from the object recognition and spatial estimation domains to the perceptual representation domain (Figure 14).

Figure 14. Experiential Neoconstructivism.

Such learning might be suggested as a way in which the visual system might learn "generic" configurations of the environment that can be incorporated into the perceptual inference mechanism. An example might be the learning of the existence and ubiquity of continuous surfaces in the environment, and the incorporation of this learning into a bias to parse the image into continuous surfaces when possible. Evidence of such a bias might be cited in the tendency to perceive illusory surfaces even when there is no direct visual evidence in the image for such a surface (e.g. the Kanizsa triangle). Another example might be the learning of the ubiquity of orthogonal surface relations through exposure to carpentered environments, which in turn gives rise to rectangularity constraints and related illusions (the Ames room, or the tilted balcony illusion (Griffiths and Zaidi, 2000)). (This is precisely the kind of bias needed in the shape-from-shading domain discussed earlier.)

But there is a problem with the notion of learning constraints through calibration in a Neoconstructivist paradigm: it is precisely these constraints (biases, assumptions, priors) that are required in order for a unitary perceptual inference to occur (as we reviewed in the quantitative examples provided in part 3). Assumptions such as smoothness or rectangularity cannot be deduced by calibration in an inverse-optics model, because those very assumptions have to be already embedded in the perceptual structures that are being calibrated! For example, without a rectangularity constraint, the projection of two orthogonal surfaces specifies an infinite class of possible surface configurations under inverse optics, which means that there is no unitary percept, or set of percepts, of the surfaces that can be used for the learning of the rectilinearity constraint via calibration. Without these constraints, no unitary percept, or set of percepts, can be achieved onto which a calibration process can act. In other words, we have a circularity: co-calibration across senses is used to learn the very constraints that are required in order to infer a singular spatial structure onto which the calibration can be applied. One might suggest that a unitary percept is not required for the calibration to occur, because the calibration might involve a different sense that does have a unitary estimate; in the example above, one might say that touching the orthogonal surfaces provides the disambiguation needed to calibrate the visual percept. But this is not workable either.

This is because such an explanation still does not explain how the particular percept that is "seen" as being a right angle is matched with the "felt" right angle, when there is no unitary "seen" percept available in the first place! As we saw in part 3, assuming that the perceptual representation itself is generated using assumptions that are learned from the environment leads us inexorably to a naïve-realist camera model of perception.

So to summarize: in the case of spatial estimation within a perceptual representation, learning via calibration merely adjusts the metric value of an estimate. In the case of the shadow example (Figure 13, left panel), the calibration merely resolves a metric ambiguity; in the case of recognition (Figure 13, right panel), the "calibration" merely resolves the ambiguity based on matching to some visual memory. In none of these cases is any representational structure, or mode of inference, itself learned. Crucially, learning through calibration has no bearing on how notions such as causality, connectedness, etc., get embedded in the perceptual system, and indeed no empirical finding has ever established that learning of such aspects exists. Some infant learning literature has explored at what age neonates might begin to show evidence of perceptual notions such as individuation, uniformity, object persistence, causal connectedness, etc. (see Spelke et al., 1995). But this work is more about determining when causal inferences might be measurably identified in neonates, and less about how the causal notions or constraints might themselves be encoded at the level of perceptual representation. Such studies cannot tell us whether a change in measured response in babies is due to "learning" the notion of causation, or is merely evidence of the timeline of the calibration process that brings other perceptual apparatus online – apparatus that is required to support the sensory parsing of events into objects. Hopefully, this will make clear the distinction between the theoretically implausible mode of learning "rules" for perceptual representation, and the empirically established mode of learning as a process of calibration. This distinction is crucial!

5.5. How can simple to complex perceptual representations be possible without learning?

We have been arguing against the possibility that learning can alter the nature of perceptual representation, and instead pointing out that most empirically demonstrable examples of learning are more correctly thought of as a process of calibration, either phylogenetically (the shadow example) or ontogenetically (the degraded letter example).

But if a mode of learning that alters perceptual representation is not plausible, how might one explain the different degrees of complexity in perceptual representation that are surely present across species? It is intuitively clear that the perceptual representations of the primate visual systems are more complex than those of the cat, which in turn are more complex than those of a fly. How, one might ask, does one get a continuum of complexity in perceptual systems if the learning of perceptual representations is not viable? How can one account for the fact that a continuum of complexity in perceptual systems exists, while at the same time accepting the notion that the core aspects of perceptual representation cannot be learned? Leyton's theory proposes an answer to this dilemma by defining a perceptual representation schema based on basic rules for the conversion of image asymmetries into nested representations of perceived symmetries, where the image may be in any sensory domain, and symmetries are defined abstractly in terms of algebraic group structures.

6. EXPERIENCING THE PERCEPT

The ecology is the minimal order in which image asymmetries can be changed back to symmetries.
Michael Leyton, 1999

When we get beyond the issues of calibration, recognition and learning, and into the realm of perceptual representation within a Neoconstructivist model, we find that a profound aspect of perception is entirely missing from such theories – namely, they do not explain the reality of the perceptual experience. The need for a theory of perception to be able to explain the realities of perceptual experience has been put forward through history by a range of vision scientists, from Mach and Hering, to the Gestalt movement, to Gibson, and recently, in a more systematic and quantitative way, by Leyton. For our purposes we will distinguish three aspects of the perceptual experience. The first, and the one most often discussed by both the Gestaltists and Gibson, is why we seem to experience the objects of our perceptions so directly, rather than having the impression that we only have a collection of symbols, features and measurements of the objects bound up in some hierarchy, giving us indirect perceptual access to the objects and their properties via these symbolic codes.

The second is the question – implicit in the Gestalt arguments on grouping, and analyzed more explicitly by Leyton (1984, 1986a, 1986b, 1986c, 1987, 1992, 2001) – of why it seems as though the informational content of the percept goes not only beyond what is available in the 2D retinal image, but beyond the information content that seems to reside in the very objects before us (for example, perceiving an aligned set of objects as constituting a group or meta-object). The third, which may appear more subtle to the vision scientist but is omnipresent for the artist and designer, is that our percepts themselves seem to provide a visceral subjective judgment of the visual objects, distinct from any cognitive or experience-based effect. Let us look at each separately and explain why Neoconstructivism cannot account for any of these three aspects.

6.1. Experiencing the percept in conventional theories of perception

Previously we argued that the only viable version of the inverse-optics model of perception is the restricted "calibration" model shown in Figure 11. This model, as we discussed, is suitable for a range of problems in vision science that can remain agnostic to representational issues (column B, Table 1). In the calibration model, all we have in essence are distances and directions of sets of points (where distance, direction and point are perceptual modes of description). No claims are made about the representation of any other quantities based on these spatial measures. Since the only claim in this model is that a set of spatial estimates of points can be measured in the percept, it leaves open, deliberately, the question of how the percept is represented or might be experienced, and it is thus not viable as a theory of perceptual representation. The full Neoconstructivist model, as we discussed, assumes that there is a whole range of quantities and attributes that might be derived from these sets of points representing the objects – including things like relative size, orientation, shape, part structure, surface continuity, etc. – where these quantities and attributes are represented as neural symbols (Figure 12).

Yet in real visual experience we instead seem to have a direct, vivid perceptual representation of the objects themselves, where such quantities don't seem to be giving rise to the percept, but instead seem to be derived from the percept. We are conscious of things like the continuity of surfaces, groupings, etc., that just cannot seem to be exhaustively captured by a set of symbolic codes, however complex a hierarchical structure they might be embedded in; indeed, it seems that no set of descriptors, however complex, can come anywhere near to exhausting the range of measures and attributes that we can arbitrarily assign to our percept of the object. What we see, as even Gibson claims, is not a description of a picture, but the picture itself. In order to achieve such a "direct" percept (in Gibson's words) in the standard model of Neoconstructivism, we'd have to add a special kind of homuncular draftsman (shown in Figure 14) to look at and interpret the symbolic (neural) output of estimates, and then "draw" them out and color them in as objects onto the canvas of our conscious percepts. Notice that this act of "drawing" by a homunculus – which captures our experience of the connectedness of the points, the continuity of the surfaces made up by those points – is precisely what the regularity assumptions, priors, etc., that are introduced ad hoc in computational models are in effect trying to underwrite (see Leyton (1992) for a detailed analysis). But even after we have such a "draftsman", he still can't draw out for us why we attain other rudimentary aspects of our percepts, such as continuation behind an occluder, or inferring the presence of points that are not visible: we have a sense of the form of objects even where they are not visible, such as in the self-occluded portions of a solid object. Indeed, the development of computational models that are designed for highly constrained environments usually stops right when issues such as occlusion rear their ugly head. Thus any viable model of perceptual representation would need to explain how the representational scheme itself embeds notions such as continuity and connectedness; in other words, as Leyton has intimated in his theory, a valid perceptual representation must be able to draw itself out.27 Thus a representationalist model of perception as inference, which we have argued Neoconstructivism to be, reduces to what we might call a sophisticated naïve realist model (Figure 15).

Figure 15. Sophisticated naïve realism.

6.2. The information content of the percept

We have been making the point that in conventional theories of perception, a key component of the information that is normally ascribed to perceptual representation – indeed the most important one – is spatial measurement. We have explained how such a view of perception is a useful but limited one for understanding some important aspects of visual processing, particularly sensor co-calibration. We have also tried to point out that the descriptors and modes of measurement specified in such models are inextricably linked to the nature of perceptual representation itself, rather than being objective measures of the external world. Let us take the example of the line again, and look at the possible modes of information content of a line under the various incarnations of Neoconstructivism. Under a naïve realist model (to which, as we have seen, all Neoconstructivist theories of perceptual representation reduce), a line is, informationally speaking, a description of a "line-object" in the world; the descriptions that the line is given in our percept are exactly those objective descriptions that we give to the real line in the world (namely its position, length, orientation, etc.). The descriptors we can assign it might include: a line is a continuous one-dimensional entity that spans two points in 3D space; the current line exists between measured points A[x1, y1, z1] and B[x2, y2, z2]; the orientation of the line with respect to the observer is [theta, phi]; etc.

In the restricted "calibration" version of inverse-optics approaches, the only informational content of the percept is the position, in Euclidean co-ordinates, of all measurable points that lie on that line with respect to the observer, i.e. an estimate of their direction and distance. There is no need to posit an entity called a line within the representation, because such a theory is not concerned with entities other than points in space; it is agnostic as to what the object might be. Thus the information content of the percept in a calibration model of perception is solely a metric estimate of the position of "points", where these estimates are given in terms of the descriptive structures of perception (distance, direction). Though we might conjecture that certain other quantities might be derived from these measurements, such quantities (e.g. curvature, slant) do not constitute additional information content in the percept from the standpoint of the calibration model, because they can be derived from the point estimates by mere application of a fixed transform that is already part of the interpretive perceptual apparatus. Within calibration theory, in terms of our philosophical discussion earlier, there is no distinction between the measured spatial locations of points and any quantity derived from them by application of an internal transform: the point estimates constitute all the content we can genuinely speak of.

But, as pointed out by Gestalt theory and Leyton (1992), when we examine our phenomenology there seems to be additional information content available in the percept, such as groupings, connectedness of points, continuity of surfaces, etc. In standard Neoconstructivist theory these are captured in the "built-in" assumptions that are referred to variously as gestalt grouping rules, regularity assumptions, biases, priors, minima principles, heuristics, genericity assumptions, uniformity assumptions, etc. But again, since these assumptions are already part of the perceptual mechanism in Neoconstructivist models, their application onto the inferential process does not add any additional information content over and above the metric information in the percept as specified in the calibration model. (Note that saying that the notion of the continuity of surfaces is somehow encoded in memory just makes the problem circular.) The final percept would still need to be "read out" and "drawn out" onto the phenomenological canvas, and when one asks what the representational scheme of the drawn-out percept is, we return to the problem we started out with.

This is probably the most subtle but crucial point of the analysis we have been through.

6.3. Perceiving a square reduced to checking if spatial measurements correlate with those of a square

As we have seen, the information content of the percept, in both a full-fledged model of perception as inverse optics and in the restricted calibration model, is by definition solely metric. The percept itself is just a set of measurements. This would imply that all non-metric information in any percept of an object is not part of the percept, but instead part of the perceptual apparatus (think of a list of x, y coordinates of a set of points that make a square, compared to a drawing of a square). In order to ascertain whether the set of co-ordinates constitutes a square, one needs an interpreter that already has knowledge of all the geometric, connective and topological properties of a square (e.g. the transformational structure of a square; see Leyton, 1992). In Neoconstructivist theories such properties are assumed instead to be encoded in a hierarchical symbolic representation of the square built into the visual system, and invoked when the measurements of the collection of points tally with those of a square; or indeed, as some have suggested, there could be "innate shape primitives" such as "square", or volumetric ones (Marr and Nishihara, 1978), often called "geons" (Biederman, 1987). Thus in the Neoconstructivist model, perceiving a square is reduced to checking if spatial measurements correlate with those of a square, which implies that the entire non-metric information content of the square is already part of the perceptual apparatus (so, in this schema, the set of points themselves contains no such information). But our phenomenology, like the "picture" of the square, clearly points to measurement-free information content in the percept (see Leyton, 1992 and 2001). This suggests that a very important requirement for any viable theory of perceptual representation is that the core of such a representation be "metric-free" (precisely what Leyton's theory achieves by specifying representational schemas in terms of abstract algebraic structures). That is why metric models are viable for studies primarily investigating aspects of vision that can be captured purely by metric descriptions, and not for those aspects that need to capture the nature of the representation of shape.
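The reduction can be made vivid in a small sketch (ours, purely illustrative; the predicate below is not an algorithm proposed by any of the theories discussed). The bare data is just a coordinate list; every geometric, connective and topological fact about squares lives in the interpreting predicate, exactly where the Neoconstructivist schema places the non-metric content:

    # "Perceiving" a square as checking measurements (illustrative only).
    import itertools, math

    def is_square(points, tol=1e-6):
        # The interpreter: all knowledge of what a square is (four equal
        # sides, two equal diagonals, diagonal/side ratio sqrt(2)) is
        # embedded here, not in the point list itself.
        if len(points) != 4:
            return False
        d = sorted(math.dist(a, b)
                   for a, b in itertools.combinations(points, 2))
        side, diag = d[0], d[5]
        if side < tol:
            return False  # degenerate configuration
        return (all(abs(x - side) < tol for x in d[:4])     # four equal sides
                and abs(d[4] - diag) < tol                  # two equal diagonals
                and abs(diag - side * math.sqrt(2)) < tol)  # right angles

    # The data carries no "squareness":
    # is_square([(0, 0), (1, 0), (1, 1), (0, 1)])  ->  True

One could go further and test invariance under the square's symmetry group (the dihedral group D4), which would be closer in spirit to Leyton's algebraic characterization; the moral is the same either way: on the metric view, the structure sits in the interpreter, never in the percept.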

In any perceptual representation, the metric information contained in the percept is meaningful only from the standpoint of the co-calibration of sensor measurements (including device-based measurement). The metric information itself does not constitute any objective knowledge of the world, but is instead a set of measures relating one aspect of the percept to others. Thus, building a representational schema based on metric information alone – as in "cue combination" methods, or Gibsonian "gradient computation" methods – does not provide purchase on important properties of perception outside of calibration.

6.4. The problem of a stable percept

One of the most vexing problems that constructivist theories generate is how the visual system maintains a stable percept of the world despite the constantly and rapidly shifting sensory stimulation brought about by eye movements, blinks, self motion, etc. Many solutions suggest that a sort of memory of the prior scene may be maintained across fixations, which can be used to establish correspondence. Additionally, efferent or "outflow" signals of the eye-movement command may be coupled with this memory representation to allow for this correspondence to be achieved. Gibson (1950), on the other hand, maintained that the problem of the stability of the visual world across fixations was moot, because the "objective external environment" was itself stable, and the visual system need only depend on that assumption to maintain its own stability; thus stability across eye movements is not a critical problem (though his insistence on the stability being due to the "external world" makes his position epistemologically untenable). As he says: "The perception of a unitary visual world over time might be explained by the assumption that unchanging information underlies the changing sequence of obtained stimulation, and that gets attended to" (Gibson, 1966). A similar position, but one that was epistemologically correct, was also held by Hering in opposition to the Helmholtz school (see Turner, 1994). Gibson's alternative proposal, like his others, does contain a modicum of truth, because one interpretation of his proposal is that eye-movements are occurring within the framework of a stable percept.

The problem of the stability of the visual world across eye-movements arises from the naïve realist notion that the visual system is passively recording information that impinges on the retina from an objective "external" world, and that it is therefore the perceptual system that has to make adjustments to account for the movement of the eyes, located in the external world (whatever that may be!). On the other hand, if we maintain the proposal that motor actions are occurring within the framework of an organized image – and only within that framework – the notion of the stability of the visual world no longer appears to be a central problem of vision. In a model where the organism is viewed as being located within the organized image (a perceptual space or sensorium), the organism is merely exploring this image (or representation) using what might be referred to as an internal eye (which might be thought of as shifts of attention); the internal eye is merely exploring what is a stable perceptual space. The result of this exploration alters the deployment of the parallel sensory device, the retina. This device is entirely under the control of the exploratory internal eye, and has been calibrated through early development to move in such a way as to result in successive retinal patterns of stimulation that are already predicted within the image; it is the role of the adaptive mechanisms that control the actual eye-movements to make sure that retinal stimulation maintains registration with the scanning of the internal eye. Organisms learn to calibrate motor behavior such that it operates successfully within the perceptual representation (or sensorium). This idea is crucially distinct from the notion that sensory and motor processes operate independently, but in a coordinated way, on a physically objective world. There is some empirical evidence for this alternate view of eye-movements. It has been shown, for example, that artificially adapting eye-movement mechanisms leads to perceptual "illusions", showing that the eye-movement apparatus is designed to be calibrated to the perceived image rather than vice-versa (Bahcall and Kowler, 1999). Other neurophysiological evidence shows shifts in the receptive field characteristics of cortical cells that occur due only to changes in attentional locus (e.g. Duhamel et al., 1992). Indeed, a large area of research looking into how visuo-motor coordination, attentional control and perception interact has established the need for fully specified internal models of the external sensory-motor plant (see Kawato, 1999, for a review).

6.5. Summary

So, let us now summarize the essential problems with the Neoconstructivist, inverse-optics approach to perceptual representation:

(1) The 2D image and the 3D percept are defined to be distinct entities, the former existing in the external world and the latter in the psychological domain, and perception is supposed to provide the causal inferential link between the two. But an inferential model of perception – even when placed within a sophisticated quantitative framework – removes the very distinction that makes the inferential link informative. Any model of perception as inference reduces inexorably to one of two trivial models: (1) a naïve realist model (Figure 15), where the perceptual system doesn't involve an inductive (inferential) step between the external and the psychological, but has instead an all-knowing homunculus examining an objective image on a camera; or (2) an idealistic model in which the external world does not exist. In all such theories we end up losing the very distinction that we are trying to explain, and with it most of the information content in the percept. This essential problem is nothing more than Hume's argument against any empiricist theory of knowledge that wants to retain the notion of an objective physical world.

(2) The external world, the 2D image, and the 3D percept are all defined in terms of the same descriptive parameters, which are erroneously considered to be objective parameters independent of perception. In these models, perception is assumed to be a process that involves the successful detection of objects that have distinct objective existences in the external world, and which have been imaged onto an objective sensory field. A central aspect of these models is the assumption that spatial descriptions are objective from an informational point of view: in the same way that objects and surfaces are erroneously taken to be objective things in the world, geometric descriptors like length, position, and orientation are taken to be objective measures or properties of the world. Similarly, the image is thought to have objective descriptors, as well as objective "cues" which are assumed to have informational content distinct from the perceptual machinery and the final percept.

(3) Inference requires a re-presentational scheme, and re-presentation precludes direct experience. This is because Neoconstructivism regards perception as a re-presentation of properties of the external world, and any re-presentational scheme that specifies the existence of an external world reduces to a naïve realist scheme.

(4) Though many recent Neoconstructivist approaches claim to be aligned with Gestalt theories of perception, nearly all maintain epistemological positions exactly opposite to those espoused by Gestalt theory.

(5) Neoconstructivist theories conflate the notion of calibration with perceptual representation. One way they do so is by creating an unnecessary distinction between what we call measurement as motor action, and device-based measurement. In these models the device-based measurement is considered, erroneously, to be objective and informationally distinct from perception and motor action.

(6) Limiting the notion of inverse optics to a calibration domain, in which shape is viewed simply as a calibration map, allows for workable inverse-optics models for exploring a range of issues in vision. Note, however, that such a model is epistemologically a behaviorist model, in which there is only an "idealized" world of stimulus and response. Such approaches implicitly posit that the only information contained in the percept is spatial estimates of points in the visual field.

(7) The non-metric informational content of a percept, such as the connectedness of points, the continuity of surfaces, etc., is assumed to be encoded as "learned" biases, constraints, priors, etc. There are two problems with this: (1) such constraints cannot be learned, since a unique percept (on which the learning can take place) is impossible without these biases already being in place; (2) it implies that the only information content in the percept itself is metric – all non-metric information is not part of the percept, but part of the inferential device.

(8) Neoconstructivist theories do not provide a satisfactory explanation of perceptual experience. This is the most critical aspect of any theory of perception.

Though we have been using the term perceptual representation throughout the text, we did introduce the notion of perception as a presentation of the sensory flux: perception is most correctly defined not as a system of re-presentation, but of presentation.

Thinking in such terms, color is not thought of as a re-presentation of a property in the external world; it is, rather, the presentation of chromatic differences, where "chromatic difference" is a property of the percept and not of the external world. It is certainly not a re-presentation of differences in electromagnetic wavelength. There is naturally a correlation between the states of the external world and the percept, as determined by the flux at the sensory interface; but any correlation between wavelength and perceived color is a synthetic description of relations between perceptual states and measurements in visuo-motor space. In other words, the information content of color is parasitic on the percept, and not on objective properties of the external world. Figure 16 illustrates what an epistemologically correct model of perception would look like, where the information content of the percept (both the image description and the perceived object description) is part of the perceptual apparatus and not the external world.28 Figure 17 illustrates the correct construal of the calibration domain in perception, where, again, any metric descriptions of this domain are based on perceptual entities and metrics.

Figure 16. Perception as the presentation of an external sensory flux.

Figure 17. Calibration domain in an epistemologically correct model of perception.

7. PERCEPTUAL EXPERIENCE, AESTHETICS AND DESIGN

Objects project possibilities for action as much as they project that they themselves were acted upon – the former allows for certain subtle identifications and orientations. The work is such that materials are not so much brought into alignment with static a priori forms, as that the material itself is being probed for openings that allow the artist behavioral access; the latter, if emphasized, is a recovery of the time that welds together ends and means.
Robert Morris, Sculptor29

Perhaps the most enduring puzzle in the history of perception is its relationship to visual aesthetics. The range of phenomena that can be ascribed to visual aesthetics is very broad, and often involves personal, sociological, religious, ritualistic, and convention-based aspects. Though many of these are best discussed within the framework of art history and criticism, a much more enduring puzzle in aesthetics has been the question of what the purely perceptual dimensions of aesthetics are. This has been a particularly vexing and important question in the domain of the design of artifacts, namely architecture and product design. A central feature of perception, and the one most evident in the process and products of design, is that some visual configurations appear, often spontaneously, as perceptually more coherent than others (e.g. a particular arrangement of objects or parts).

In architecture, certain relationships of elements are universally acknowledged to be more aesthetically coherent than others. There have been many historic efforts to codify design rules that embody those visual parameters of design, from Alberti to Le Corbusier. In addition to such classical "rules" of architecture – the arrangement of elements (e.g. alignment, repetition), the treatment of materials, structural validity – there are much more subtle rules of form-making, which inform and shape every step of the design process. Such written and unwritten rules abound in all fields of design, particularly architecture, and are invariably and implicitly manifest in the process of design at all scales. Design styles that explore the deliberate breaking of implicit and explicit design "rules" (e.g. late 70's and 80's postmodernism and deconstructivism) underscore their import and ubiquity. We might define the aesthetic underwritten by these rules as pre-cognitive reflexive perceptual judgments on perceived visual configurations. There are strong reasons to believe that such reflexive perceptual preferences, rather than being merely experience-dependent or cognitively applied, are a result of the very nature of the underlying perceptual mechanisms. This idea has been proposed through history by a range of philosophers, artists, and scientists such as Kant, Klee, and Arnheim. The very trajectory of modern art has been a dialogue with these qualitative effects of perception on painting and sculpture, culminating in quite explicit investigation of the linkage between art and perception in contemporary art, particularly Minimalism. Recent theoretical and empirical studies have sought to establish more clearly what these links might be. Leyton (1992, 2001) has explained why qualitative effects in perception and aesthetics have to be explicable within the representational structure of perception, and has provided a comprehensive view of art and aesthetics that comes out of his generative theory of shape. Albertazzi has provided extensive analysis of the views on the relation between aesthetics and perception held by the Gestalt School, particularly in the Brentano and Graz traditions. Other empirical work has pointed to connections between possible representational schemas and their aesthetic import (e.g. Van Tonder et al. (2002), Taylor et al. (1999)). Also, the notion that aesthetics in art may be a window into underlying neural mechanisms has now made connecting the visual arts and brain science a respectable topic, almost a fashion, in scientific circles previously skeptical of "soft" issues like art and design.

Our purpose here will be limited to investigating the qualitative implications of contemporary approaches to perception, rather than aesthetics in general. We will specifically be interested in their implications for the design of artifacts. The foundations of Gestalt theory originating in Ehrenfels' work, and more recently Leyton's theory of shape, propose that the most important aspect of qualitative perceptual judgments is that they are not judgments of external configurations, but rather of the internal perceptual state. For the Gestaltist, the qualitative perceptual strength of aligned objects cannot be ascribed to some objective notion of alignment in the external configuration, but to the natural strength of aligned perceptual objects inside the sensorium.

7.1. Qualitative implications of calibration and recognition theories of perception

Conventional theories of perception cannot account for reflexively perceived subjective differences precisely because they define perception as a problem of inference. What these theories essentially imply is that no qualitative judgment can be made on the external configuration, because it is never experienced independently of perception; Neoconstructivist theories of perceptual representation imply that all perceived configurations should, at the perceptual level, be qualitatively equivalent. We have seen that there are two plausible models of perception as inference via inverse optics: perception as calibration, or perception as recognition. It is abundantly clear that in a pure calibration model no subjective differences should obtain in the perception of any physically plausible external configuration. The product of a percept in a calibration model is a set of estimates of the spatial position of the points that make up the viewed configuration; we have characterized such a set of spatial estimates as a calibration map. For external configurations (objects) that are physically plausible, there should be agreement across all the various estimators (cues) of spatial position, and so they will, in combination, provide the best possible estimates for the points. For another physically plausible configuration, a different but equally valid set of estimates will obtain.

Thus a set of estimates for one spatial configuration will be no different, qualitatively, from the estimates for a different configuration: both sets of estimates are merely metric values. Now, there are two possible ways in which the existence of additional information in the calibration map might allow for a qualitative judgment on the estimates. The first is that the sets of point estimates (that constitute the calibration map) have a concomitant indicator of their reliability. For example, spatial estimates for a visual configuration that is viewed under sufficient illumination and close up (best binocular disparity cues) would be more reliable than for one viewed in dim conditions further away, and could thus be thought of as being "better", since better viewing conditions mean better estimates. But note that such a reliability signal is in effect an indicator of the quality of the estimate being made, and not an indicator of the quality of the configuration. Thus, for any set of configurations that are viewed under the same "conditions", no qualitative differences can be inferred, regardless of the nature of the configuration, since the reliabilities will all be the same.

The second way is that the estimates from one set of "sensors" do not correspond to the estimates from another set of sensors; for example, the calibration map based on disparity may be different from the calibration map specified by motion parallax. It is possible to artificially stimulate the visual system so that the estimates from one set of "cues" conflict with those from another set of "cues", giving rise to a cue conflict. This is a state of affairs that is very familiar to the vision researcher in what is known as a "cue-conflict" study. Cue-conflict stimuli are not physically plausible, but are situations created artificially in the laboratory in order to study the action of the various sensor estimates that give rise to the overall percept. It is plausible that the visual system could assign qualitative differences to the percept based on the level of cue-conflict; thus high cue-conflict percepts might be seen as inferior to those with low or zero conflict. But what we started out by claiming is that calibration theories cannot assign qualitative differences in situations where the external physical configuration is a physically plausible one, i.e. where the sensory stimulation is consistent with a real object and thus has no cue conflict. Thus a system that can assess differences between real and artificial stimulation does not provide a means to distinguish qualitative differences between physically plausible (zero cue-conflict) configurations.
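These two possibilities can be given a minimal quantitative sketch (ours, in the spirit of standard reliability-weighted cue-combination accounts; the function names and numbers are illustrative, not a model advanced in this text). Each cue delivers an estimate with a reliability (inverse variance); combination weights estimates by reliability, and "cue conflict" is the residual disagreement among the cues. Both quantities, note, describe the estimates, not the external configuration:

    # Reliability-weighted cue combination and a conflict measure
    # (illustrative only).
    def combine_cues(estimates, variances):
        # Weight each cue's estimate by its inverse variance (its
        # reliability); return the combined estimate and its variance.
        weights = [1.0 / v for v in variances]
        total = sum(weights)
        combined = sum(w * e for w, e in zip(weights, estimates)) / total
        return combined, 1.0 / total

    def cue_conflict(estimates, variances):
        # Conflict = reliability-scaled spread of the individual cue
        # estimates around the combined estimate. It is zero for any
        # physically plausible (consistent) stimulus, however the
        # configuration itself is arranged.
        combined, _ = combine_cues(estimates, variances)
        return sum((e - combined) ** 2 / v
                   for e, v in zip(estimates, variances))

    # disparity says 2.0 m, motion parallax says 2.6 m: nonzero conflict;
    # both say 2.0 m: conflict = 0, whatever the configuration looks like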

That cue-conflict situations do provide a means to distinguish between physically plausible and physically implausible configurations underscores the fact that cue-conflict stimuli are ideal for studying how co-calibration across sensory measurements is maintained: a calibration process that is designed to remove detected conflicts when possible. Cue-conflict methods are perfect for exploring those areas where issues of perceptual representation are not of relevance, namely how metric estimates within the visuo-motor sensorium are maintained and correlated.

We can extend the argument above to the full-fledged form of Neoconstructivism, which we have been claiming reduces to a theory of "perception as recognition". The argument against the plausibility of qualitative differences at the perceptual level in such a recognition model is similar to that for the calibration model. We argued that in such a model the final percept becomes a set of symbolic codes for the existence of pre-defined measures, attributes, or spatial entities, and possibly some hierarchical combination thereof. Representation in a recognition model is just a set of symbolic codes that indicate that such and such a thing has been recognized as being out there, in the way that it is out there. Since the representational schema in such a recognition model is essentially a learned instantiation of the objective physical world (in some neural symbology), the inferences themselves are equally appropriate for any recognized configuration: a functioning recognition system only faithfully infers what is out in the world (up to limits on hardware). The final product of the percept in such a model is the activation of a set of symbolic codes in what is essentially a look-up table, or visual dictionary, and the activation of any set of symbols merely indicates that such and such an attribute or shape has been identified in the sensorium. From a perceptual standpoint, a visual configuration that lights up one set of symbols in the symbolic look-up table is not qualitatively different from another visual configuration that lights up a different set of symbols. So all physically plausible configurations should, from the standpoint of perceptual representation, yield the same perceptual quality. It may be the case that some symbols (or sets of symbols) are predefined to be "better", more "useful", "common", "rare", "of hedonic value", etc., or may have attributes that have become encoded through experience. But under such a model these "values" are crucially not properties of the percept itself: they do not inhere in the perceived configuration, but rather are properties of the symbols (or hierarchy of symbols) that make up the "look-up" table of the recognition device.

It is interesting, and predictive, to note that the Neoconstructivist would want to claim that all aesthetic qualities are learned and have no basis in raw perceptual experience, because such a position is completely consistent with what an empiricist, naïve realist theory of perception would predict. Obviously, it is possible that a "recognition" may be erroneous; but since there is no marker on the percept telling us this, the erroneous percept is, from perception's point of view, just as valid. Since the configuration is physically valid (we have already made this caveat), there is nothing else that can be said about its perceptual representation in terms of quality. Under a Neoconstructivist model, such cognitive factors would be encoded in memory via experience, and might color the cognitive experience of the perceived object, but the perceptual experience remains neutral.

7.2. Representational conflict

The sort of reflexive qualitative judgments that our perception seems to apply to visual configurations, and that are most acutely observed in the process and product of design (and art), are also strongly indicated in our common phenomenology. The very act of arranging or combining elements, in everyday tasks that require it, involves choices and changes of physical configuration that are deeply connected to perceiving differences in the quality of the configurations. Certain configurations just don't "look" right! This suggests that there is something in the perceptual product akin to the "cue-conflict" configurations in artificial stimulation. If perception is viewed as a particular type of representation – or, more correctly, presentation – of the sensory flux (Figure 15), then we can think of a situation beyond the calibration domain of cue-conflict, which we will call representational conflict. Representational conflict can be thought of as a measure of the degree of internal conflict in the perceptual organization, in terms of its predefined representational schema and ontological entities. This conflict has nothing to do with the physical validity or invalidity of the external configuration, or with its coherence based on some objective external measure: it is a measure of the internal structural coherence of the percept. Neither does it have anything to do with qualitative judgments applied as a result of memory or cognition.

7.2. Representational conflict

The sort of reflexive qualitative judgments that our perception seems to apply to visual configurations, such as those listed above, are most acutely observed in the process and product of design (and art), but are also strongly indicated in our common phenomenology. The very act of arranging or combining elements, in everyday tasks that require it, involves choices and changes of physical configuration that are deeply connected to perceiving differences in the quality of the configurations. Certain configurations just don't "look" right! This suggests that there is something in the perceptual product akin to the "cue-conflict" configurations in artificial stimulation, which we will call representational conflict.

If perception is viewed as a particular type of representation, or more correctly, presentation, of the sensory flux (Figure 15), then we can think of a situation beyond the calibration domain of cue-conflict. Representational conflict can be thought of as a measure of the degree of internal conflict in the perceptual organization. This conflict has nothing to do with the physical validity or invalidity of the external configuration, or its coherence based on some objective external measure. Neither does it have anything to do with qualitative judgments applied as a result of memory or cognition. It is instead a measure of the internal structural coherence of the percept. Such conflict will be present even when the external stimulation is perfectly physically valid and cue-conflict free. The perceptual system is constrained to present the percept in terms of certain spatial and ontological modes; thus the sensory stimulation generated by different external entities results in perceptual configurations with different degrees of internal perceptual conflict.

For a given visual configuration, we can think of the idealized ground state as one where there is no representational conflict. We also propose that some degree of representational conflict is always present in any perceived visual configuration: changes may shift the percept toward "better", lower-conflict states, but the idealized ground state is never attained. Any given percept is thus "worse" in terms of representation than this idealized ground state, but such a state is never achieved.

Another implication of the idea of representational conflict is that qualitative perceptual judgments are internal to the object, and are not comparisons across objects. Perceptually, a particular configuration is not more or less qualitatively superior to a different visual configuration; rather each has its own measure of internal coherence that, strictly speaking, cannot be compared. One building cannot be said to be more aesthetically coherent than another. In other words, the measure of its qualitative status is self-referential, and thus no metric exists to compare it to a different object.

We might also predict that the special property of representational conflict is that it can never be "corrected out" by calibration, as in the cue-conflict situation. It may be possible for representational conflicts to be masked by learned cognitive attributes, but the crucial distinction between representational conflict and cue conflict is that the former is always available at the perceptual level, even when masked by cognitive factors, while the latter (cue-conflict) might be erased through adaptation or re-calibration. Crucially, representational conflict cannot be resolved at the perceptual level; it is the very qualitative product of a perceptual organization.

This is in contrast to cognitive theories of aesthetic quality, where one would posit a neutral perceptual ground state, and where the given percept is rendered better or worse by the application of an extra-perceptual evaluation based on memory or other cognitive factors. In such theories there can be direct comparison across entities, since the quality accrues to the symbols, and not directly to the percept. In other words, a Neoconstructivist or inferential theory of perception would imply a neutral perceptual product modulated by appetitive and aversive cognitive factors accrued through experience. One cannot deny that such a cognitive aesthetic is also a central aspect of the psychological experience; our aim here is to show that such aspects are not correctly to be thought of in the purview of perceptual representation. Finally, an inferential theory cannot explain how we perceive qualitative differences in configurations that we cannot even recognize or have any cognitive knowledge of.
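To fix intuitions about the claims above (that conflict is internal to a configuration, strictly positive, and only locally improvable), here is a toy numerical sketch. It is entirely illustrative: the conflict function and its parameters are hypothetical stand-ins of ours, not a model of any actual perceptual quantity proposed in the text.

```python
import random

def representational_conflict(x):
    """A hypothetical internal-coherence cost for a configuration
    parameterized by a single number x. It is strictly positive (the
    idealized zero-conflict ground state is never attained) and has two
    local minima of different depths."""
    return 0.15 + (x**2 - 1.0)**2 + 0.1 * x

def adjust(x, steps=5000, step_size=0.02):
    """Greedy local adjustment: accept a perturbation only if it lowers
    conflict. This caricatures shifting a configuration toward 'better',
    lower-conflict states: it settles into a nearby local minimum and
    never reaches zero."""
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if representational_conflict(candidate) < representational_conflict(x):
            x = candidate
    return x

start = random.uniform(-2.0, 2.0)
final = adjust(start)
print(f"settled at x = {final:+.3f}, "
      f"residual conflict = {representational_conflict(final):.3f}")

# The residual is always > 0, and which basin is reached depends on the
# starting configuration. The score is also self-referential: conflict
# values computed from *different* hypothetical cost functions (different
# objects) would not be comparable, matching the claim that no metric
# exists across objects.
```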

We can now add one more item to the list of problems with Neoconstructivist theories that we gave in Section 6.4: (9) The implicit assumption of an objective external world, and the view of perception as inferring the contents of that world, do not allow for qualitative differences to be experienced at the perceptual level. They can only imply a cognitive aesthetic applied onto the neutral product of perception. Thus, such theories cannot explain the ubiquitous phenomenology of design.

7.3. Representational conflict, design, and visual homeostasis

The notion of cue conflict plays a role in the ergonomic development of display devices, because of inherent cue conflicts in displaying stereo images on 2D computer displays. In such displays the cue conflict cannot be "calibrated out". This usually leads to visual stress or the inability to make consistent metric judgments. A major goal of the development of stereo display devices is to remove such conflicts, making the stimulating light array more spatially consistent with that emanating from a real (physically plausible) object.

Since design involves the manipulation of real objects, and such objects do not have cue-conflict, it suggests that the notion of representational conflict may instead play a greater role in the visual efficacy of design. Since inherent representational conflicts in the perceived configuration cannot be "corrected" out, they most likely lead to states of perceptual stress akin to the visual stress induced by cue-conflicts in displays. Thus, perceptual aesthetics (in contrast to cognitive aesthetics) can be thought of as a form of visual homeostasis. One can think of poor visual configurations (in other words, poor design) as ones where the homeostasis of perceptual organization is disrupted. Such disruptions of homeostasis can naturally be cloaked by cognitive factors, and in common design often are! Good designers can then be thought of as those who intuitively configure artifacts toward a deep local minimum of representational conflict.
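To make the display cue-conflict concrete, here is a minimal geometric sketch of its best-known instance in stereo displays: vergence follows the simulated depth of a fused point while accommodation stays locked to the physical screen. The viewing parameters below are illustrative assumptions, not values from the text.

```python
import math

IPD = 0.063    # interocular distance in meters (a typical adult value)
SCREEN = 0.6   # physical viewing distance to the display, in meters

def vergence_angle(distance_m):
    """Vergence angle (radians) needed to binocularly fixate a point
    at the given distance."""
    return 2.0 * math.atan((IPD / 2.0) / distance_m)

for depth in (0.3, 0.6, 1.2, 5.0):
    mismatch = vergence_angle(depth) - vergence_angle(SCREEN)
    print(f"simulated depth {depth:4.1f} m -> "
          f"vergence/accommodation mismatch {math.degrees(mismatch):+6.2f} deg")

# Only a point depicted exactly at the screen distance (0.6 m) yields a
# zero mismatch. For every other depicted depth the conflict is built into
# the stimulus itself, which is why the viewer cannot "calibrate it out".
```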

It is interesting to note that the concept of physiological homeostasis was central to the figures that first began the anti-constructivist view of vision, namely Mach and Hering.

8. EPILOGUE: EMPIRICISM, NATIVISM, AND REPRESENTATION

"In the case of one observation almost all psychologists agree, that local sensory experience is determined by more than merely local stimulation. The case is that of color contrast, which at the present time most psychologists suppose to be an effect of interaction in the nervous system. Here the point-to-point correlation between retinal stimuli and sensory experience is no longer defended, because the determination of local experience by conditions in a larger area is too evident. It took science some time to accept obvious evidence even in this case. But after this concession, how can we proceed as though nothing serious has happened?" Wolfgang Kohler (1947)

By now, the reader will hopefully begin to see that the underlying problem with all inferential inverse-optics approaches to perceptual representation is nothing more than the deceptively simple problem first posed to us by Hume in 1739. The problem is that they attempt to be causal theories in an empiricist framework. But establishing external causal linkages between distinct existences is the central goal of any empiricist theory of perception, and Hume's analysis implies that if one wants to establish causal linkages outside of perception, the distinct entities that we seek to establish those links between lose their identity, and make those very causal links meaningless. As we have seen, such an approach reduces to either a naïve realist model of perception, or an idealistic one where we have lost the very external world we are trying to explain. We cannot conclude from this that a causal theory is not possible, but rather that it is not plausible under an empiricist theory. Hume himself betrays his implicit belief that a causal theory is only plausible within a nativist framework, when he essentially says that the mind "wraps" causality around objects. Within perception research, Leyton has laid the foundation for such a model, by arguing that the very schema of perception must be a causal one.

Indeed, it often seems like the debate from Descartes through Hume to Kant is merely a pleasant philosophical interlude that, though interesting and of relevance to historic developments in perception research, is ultimately of minimal bearing to contemporary research.

Research in perceptual representation has suffered because the debate between empiricists and nativists, exemplified by the debate between Hering and Helmholtz (see Turner, 1994), has always been characterized as an anachronism, futile and unnecessary (as Gibson states, quoting Boring: "the long and barren controversy over nativism and empiricism" (Gibson, 1950, p. 9)). Invariably, researchers have assumed that there was a safe middle ground that could be occupied. Empiricism typically gets aligned with the notion of perception as learning through association, the notion of a "blank-slate". Nativism typically gets aligned either with Descartes, or with the idea that perception does not involve learning of any kind, or the idea that perception can be explained without recourse to representation (behaviorism). Naturally, when the argument is characterized in that way, a natural, but illusory, middle ground between empiricism and nativism appears. The assumption being that the argument reduces to one where nativism means that the modes of perceptual processing are already established at birth, and empiricism, that they are all learned. Thus most Neoconstructivists will see themselves as neither empiricist nor nativist, but rather as holding to a scientific methodology that is orthogonal to the debate between the two, and that rejects both these notions. Indeed, the denouement of the story of color perception and of space perception are often held to be the hallmarks of this successful compromise.

The problem has been that the epistemological incommensurability of nativism and empiricism, as originally formulated by Hume, gave way to a supposed incommensurability of the nature vs. nurture debate as it came to be played out in the Helmholtz-Hering controversy (see Turner, 1994), which is why so many have been drawn to take the natural middle road on the nature/nurture divide. But the latter incommensurability is false, merely because the protagonists of each approach regarded themselves as being on the opposite sides of the divide (for a wonderful historical account of this, see Turner, 1994).

The desire for an exactness matching that of physics, which is taken to be on more solid scientific ground, is what is usually held up by Neoconstructivism as a basis for rejecting merely philosophical "musings" on the epistemology of perception. What is ironic about this is that it is these very epistemological concerns in perception, starting with Hume and Kant, through Brentano, Fechner, Mach, and Hering, and the associated muddles in the notion of self, being, rationality, etc., that form the direct lineage to modern physics.

Thus, most characterizations of current theory would claim that Neoconstructivism is partly nativist, because it is fairly well established that there are constraints at birth. Neoconstructivists have assumed that perceptual inference can be split up into two parts, the ontogenetic component and the phylogenetic component. The typical reasoning is that an "empiricist"-like learning process occurs in ontogeny, utilizing "nativist"-like constraints that have evolved through phylogeny. In other words, nativism and empiricism just represent two ways in which information might come to be encoded, and the time frame involved, rather than mutually exclusive theories of the nature of the information or knowledge that our percepts provide. Yet, in neither domain (color or spatial perception) has any support been found for a genuine empiricist theory of representation. But for the real nativist position (e.g. Hering and the Gestaltists), the least interesting aspect of nativism is that it predicts constraints at birth. Hering evidently saw that any distinction between nativist and empiricist portions of a compromise theory dissolves once one posits that nativist constraints are merely ones that have been learned through evolution (see Turner, 1994). Even though Hering has been cast, in the apparent conflation of Hering and Helmholtz, in the role of a person against any form of learning, much of his theorizing on color and binocular vision still stands on solid epistemological ground, and indeed his theories have been redeemed.

The later introduction of nativism by the Gestalt theorists, particularly the Berlin school, took a stronger epistemic stance, but was against the "atomistic approach" of both Helmholtz and Hering. This distinction introduced another red herring into the debate, by defining empiricists as those who worked atomistically, and Gestaltists as those who held to the paramount importance of the whole. I believe this distinction hurt the Gestalt movement, on epistemological grounds, because it precluded precisely the kind of so-called "atomistic" work followed by the Hering school, which, unlike the Helmholtz school, had a very clear understanding of its epistemological claims. Arguably, we lost an early opportunity to redefine the entire gamut of vision research, from receptor physics to perceptual representation, on strong epistemological grounds. The Gestalt challenge, and the original thrust of Gestalt theory, is entirely commensurate with the Hering school.

Hering's exceptional phenomenological and epistemological sense led to the elegant solution of the opponent color process, which redefined color more strongly as a perceptual construct and only very weakly as a correlate of wavelength; meanwhile, the empiricist and behaviorist approaches naturally followed from Helmholtz. It is possible that the Gestalt school lost its influence for several decades precisely because it didn't provide a continuum with the solid scientific base that Hering's school had already established, and in hindsight, the differences may only be ones of levels of abstraction and function. The increasing interest in Gestalt in the last two decades again threatens to further bury the critical epistemic foundations spelled out by Hering, by focusing on secondary aspects of the theory, such as enumerating the rules of grouping. An alternative theory of representation proposed by Leyton (1992, 2001) has explicitly and implicitly resolved many of these challenges of Gestalt theory, by providing a model whose very schema is a causal one, and one where the informational content and qualitative aspects of the perception of shape are explicable within a non-metric algebraic description. A fruitful enterprise for future work in vision would be a redefinition of research problems in space and object perception, combining the careful, epistemologically correct, empirical approach taken by Hering with the quantitative theoretical basis provided by Leyton. Such research might more usefully shed light on issues in Design.

NOTES

1 This term has been introduced previously in the literature in the context of general approaches to cognition (see Harnad, 1982). We merely use the term as shorthand for contemporary constructivist theories that view perception as a process of inference. Our usage here is restricted to visual perception, and no commentary for or against any existing interpretation of the term is intended.

2 The term representation has varied use in the literature. Without a hyphen, the term representation (or perceptual representation) will refer generically to how the percept is actually encoded or instantiated in some structural model (e.g. a particular symbolic hierarchy). When the standard English usage of the term "representation" is implied (as something that stands in for, or symbolizes, something else) the word will be hyphenated to read "re-presentation". A more correct term for the actual output of perception, from an epistemological standpoint, would be "perceptual presentation" or "presentation". See Note 19.

3 Barry Flanagan is one of a group of British sculptors whose work since the mid 60's has explored the interface between sculptural convention and the exigencies of perception. The two other major artistic movements of the time that worked in a similar vein were Minimalism (US) and Arte Povera (Italy). The image is used with permission from Waddington Galleries, London.

4 I am deeply indebted to Michael Leyton for the development of this paper. It was an understanding of his theory, and his analysis of the critical issues in visual phenomenology and the information content of perception, that crystallized for me the epistemological argument I present here. I thank him both for introducing me to his work and for his personal communications.

5 We use the term standard computational vision, following Leyton (1992), for computational analysis of vision originating in Artificial Intelligence research.

6 See notes 2 and 19.

7 From Kant, Immanuel (1783), Prolegomena zu einer jeden künftigen Metaphysik, die als Wissenschaft wird auftreten können, Riga: Johann Friedrich Hartknoch: "I confess frankly, it was the warning voice of David Hume that first, years ago, roused me from dogmatic slumbers and gave a new direction to my investigations in the field of speculative philosophy". Translation in Müller, F. Max and Noiré, Ludwig (1881), Immanuel Kant's Critique of Pure Reason, London: Macmillan.

8 Hume's argument is essential for the cognitive and perceptual sciences in a way that it is not for the physical and biological sciences.

9 Also anticipated by Al-Ghazali in "The Incoherence of Philosophy" (Scruton, 2002).

10 There have been several attempts to re-ignite the discourse (for example, Varela et al. (1991)). There has been a resurgence of interest in perception research on some of these foundational notions, particularly the psychological primacy of the notion of object (see Pylyshyn, 2001; Scholl, 2002; Behavioral and Brain Sciences Issue 24 (2001)), yet no comprehensive analysis of the foundational issues and current scientific approaches is available. Another area of much analysis in epistemology and ontology has been that of color perception (for a review and analysis see Byrne and Hilbert (2003), Mausfeld (2002)). It is generally accepted that there are four distinct positions on the status of color as a mental and/or physical entity: externalism, eliminativism, dispositionalism, and functionalism. It is unclear, however, what the status of the distinctions made among these theories is, once one assumes (a) the existence of an external world but without an objective description, and (b) that perceptual information is correlated non-trivially with the external flux. Externalism appears to tow a line close to Neoconstructivism, while eliminativism appears to betray an idealist position, at least on one reading. The other two theories seem to hedge their bets with regard to whether properties inhere to the percept or the external world.

11 An additional drawback has been the professionalization of analytic philosophy, which considers itself to be the true heir to philosophical treatments of perception. The increasingly technical bulwark that analytic philosophy has put up has shifted epistemological discourse more into the domain of language and concepts. Thus, in cognitive psychology, it appears that both philosophers and psychologists of perception have been eager to relegate issues of epistemology to the analysis of the nature of reference. This is especially problematic from the point of view of the training of new perception scientists and philosophers of perception.

12 From the German "Ding an sich", or thing-in-itself (Kant, from Critique of Pure Reason (1781)).

13 I am very grateful to Michael Leyton for providing me with his lecture notes for the course on computational vision that he taught at Rutgers. Most of the section is based on his review of computational theories.

14 Luminance (or radiance) is the amount of visible light that comes to the image from a surface. Illuminance (or irradiance) is the amount of light incident on a surface (or the retinal image). Reflectance is the proportion of light reflected from a surface in a particular direction. All these are physical properties that can be measured using physical devices. Brightness is the perceptual correlate of luminance, i.e. perceived surface luminance. Lightness is the perceptual correlate of reflectance, i.e. perceived surface reflectance. Horn uses the terms radiance and irradiance, though we will use the terms luminance and illuminance. See Horn (1986).

15 Any vector orientation is expressed as a coordinate (p, q), which are the partial derivatives (along a flat surface patch normal to the vector) dz/dx and dz/dy, in a co-ordinate system where the (x, y) plane is parallel to the image plane, and the line of sight passes through the origin. Thus, vectors parallel to the line of sight have gradient (0, 0), and vectors perpendicular to the line of sight have gradient (±∞, ±∞). See Horn (1986).

16 The gradient space co-ordinates (f, g) here refer to a projected version of the original gradient space co-ordinates (p, q).

17 A possible objection to the reasoning above is that the transformation is informative because it generates a 3D interpretation of the data that might be useful for motor programming. But on closer consideration it is clear that if the translation from a 2D data set to a 3D data set is accomplished by a fixed transformation, then that same transformation can be directly applied to the image data set to program the motor commands, without any need for perceptual access to it. Indeed, there doesn't seem to be a reason for us to have perceptual access to a translated 3D space when the visual information, as well as any motor program, can be adequately expressed in the lower-dimensional (2D) space of the image.

18 Many researchers may object to this characterization. The claim here is that, whatever may be the "intent" of ecological statistics, its internal assumptions are consistent with an assumption of an image and external environment that can be objectively and "correctly" described.

19 I am indebted to Liliana Albertazzi for introducing the usage of this term to me. It derives from the usage in German of "Vorstellung" (as opposed to "Darstellung", or representation) by the Gestaltists, particularly in the Brentano and Graz schools of Gestalt perception. We have used the term perceptual representation, or just representation, throughout the text. Here we imply the actual structural model that encodes percepts.

20 It is perhaps the reason why Gibson is so many different things to so many people: a person who can be berated and admired in the same breath by those with either Gestalt or empiricist leanings, for exactly the opposite reasons.

21 Many contemporary descriptions of "direct pickup" or "direct perception" are notoriously difficult to understand, and the difficulties evidently arise from the fact that the concept itself is, at some level, incoherent.

22 See note 10.
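As a worked illustration of the gradient-space coordinates defined in notes 15 and 16 (the definitions are standard and follow Horn (1986); the particular plane below is our own illustrative example, not one from the text):

```latex
% Gradient space (p, q) for a surface z(x, y); definitions as in Horn (1986).
\[
  p = \frac{\partial z}{\partial x}, \qquad
  q = \frac{\partial z}{\partial y}, \qquad
  \mathbf{n} \;\propto\; (p,\; q,\; -1).
\]
% Illustrative example: a planar patch.
\[
  z = 2x + 3y + 5
  \;\Longrightarrow\;
  (p, q) = (2, 3) \ \text{at every point of the patch.}
\]
% Limiting cases from note 15: a frontoparallel patch has normal along the
% line of sight; as the patch tilts until its normal is perpendicular to
% the line of sight, n_z -> 0 and the gradient recedes to infinity.
\[
  z = \mathrm{const} \;\Rightarrow\; (p, q) = (0, 0),
  \qquad
  n_z \to 0 \;\Rightarrow\; \lVert (p, q) \rVert \to \infty .
\]
```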

23 Wavelength is itself an "extended perception" concept.

24 Note that we exclude cognitive associations, or categorizations, that the perceiver might have with the object that might be construed as additional information, since this does accrue to the perceptual process that delivers the percept of an object.

25 Naturally, the relationships among those measurements have some correlation to the actual physical flux external to the perceptual system, the actual nature and dimensionality of which perception has no access to.

26 Recently there has been evidence that this particular assumption of lighting direction is amenable to learning on a short time scale, and can be altered within laboratory conditions (Adams et al., 2004).

27 It is precisely the capacity of the visual system to "draw out" the percept that is central to Leyton's proposal for a nested, causal representational schema (see Leyton, 1992).

28 See note 10.

29 Robert Morris was one of the pre-eminent artists of the Minimalist movement. His work most directly explored issues underlying the nature of perception and its relation to sculpture, to such a degree that his mode of work was criticized as being "theater" in the now controversial essay by Michael Fried, "Art and Objecthood".

REFERENCES

Adams, W. J., E. W. Graf and M. O. Ernst: 2004, 'Experience Can Change the "Light-from-Above" Prior', Nature Neuroscience 7(10), 1057–1058.
Alais, D. and D. Burr: 2004, 'The Ventriloquist Effect Results from Near-Optimal Bimodal Integration', Current Biology 14(3), 257–262.
Albertazzi, L.: 2001, 'Presentational Primitives: Parts, Wholes and Psychophysics', in L. Albertazzi (ed.), The Dawn of Cognitive Science: Early European Contributors, Kluwer: Dordrecht, 29–60.
Albertazzi, L. (ed.): 2002, Unfolding Perceptual Continua, Benjamins: Amsterdam.
Albertazzi, L., M. Libardi and R. Poli (eds): 1996, The School of Franz Brentano, Kluwer: Dordrecht.
Bahcall, D. and E. Kowler: 1999, 'Illusory Shifts in Perceived Visual Direction Accompany Adaptation of Saccadic Eye Movements', Nature 400, 864–866.
Barlow, H.: 1961, 'Possible Principles Underlying the Transformation of Sensory Messages', in W. Rosenblith (ed.), Sensory Communication, MIT Press: Cambridge, MA.
Beal, G.: 1987, A Quiet Revolution: British Sculpture since 1965, Thames and Hudson: London.
Biederman, I.: 1987, 'Recognition-by-Components: A Theory of Human Image Understanding', Psychological Review 94, 115–147.
Blake, A. and A. Zisserman: 1987, Visual Reconstruction, MIT Press: Cambridge, MA.

Boselie, F. and E. Leeuwenberg: 1986, 'A Test of the Minimum Principle Requires a Perceptual Coding System', Perception 15, 331–354.
Brady, M.: 1983, 'Criteria for Representation of Shape', in A. Rosenfeld and J. Beck (eds), Human and Machine Vision, Erlbaum: Hillsdale, NJ.
Buhmann, J., J. Malik and P. Perona: 1999, 'Image Recognition: Visual Grouping, Recognition and Learning', Proceedings of the National Academy of Sciences 96(25), 14203–14204.
Byrne, A. and D. Hilbert: 2003, 'Color Realism and Color Science', Behavioral and Brain Sciences 26, 3–21.
Dennett, D.: 1991, Consciousness Explained, Little Brown and Co: Boston.
Duhamel, J., C. Colby and M. Goldberg: 1992, 'The Updating of the Representation of Visual Space in Parietal Cortex by Intended Eye Movements', Science 255, 90–92.
Elder, J. and R. Goldberg: 2002, 'Ecological Statistics of Gestalt Laws for the Perceptual Organization of Contours', Journal of Vision 2(4), 324–353.
Ernst, M. and M. Banks: 2002, 'Humans Integrate Visual and Haptic Information in a Statistically Optimal Fashion', Nature 415, 429–433.
Fodor, J.: 1983, Modularity of Mind, MIT Press: Cambridge, MA.
Geisler, W. S., J. S. Perry, B. J. Super and D. P. Gallogly: 2001, 'Edge Co-occurrence in Natural Images Predicts Contour Grouping Performance', Vision Research 41, 711–724.
Gibson, J. J.: 1950, The Perception of the Visual World, Houghton Mifflin: Boston.
Gibson, J. J.: 1966, The Senses Considered as Perceptual Systems, Houghton Mifflin: Boston.
Gibson, J. J.: 1979, The Ecological Approach to Visual Perception, Houghton Mifflin: Boston.
Griffiths, A. and Q. Zaidi: 2000, 'Perceptual Assumptions and Projective Distortions in a Three-Dimensional Shape Illusion', Perception 29(2), 171–200.
Grimson, W. E. L.: 1981, From Images to Surfaces, MIT Press: Cambridge, MA.
Harnad, S.: 1982, 'Neoconstructivism: A Unifying Theme for the Cognitive Sciences', in T. Simon and R. Scholes (eds), Language, Mind and Brain, Erlbaum: Hillsdale, NJ.
Hatfield, G. and W. Epstein: 1985, 'The Status of the Minimum Principle in the Theoretical Analysis of Visual Perception', Psychological Bulletin 97(2), 155–186.
Hillis, J. M., M. O. Ernst, M. S. Banks and M. S. Landy: 2002, 'Combining Sensory Information: Mandatory Fusion within, but not between, Senses', Science 298, 1627–1630.
Hochberg, J. and E. McAlister: 1953, 'A Quantitative Approach to Figural "Goodness"', Journal of Experimental Psychology 46(5), 361–364.
Hoffman, D.: 1996, 'What Do We Mean by "the Structure of the World"?', commentary in D. Knill and W. Richards (eds), Perception as Bayesian Inference, Cambridge University Press: Cambridge.
Horn, B.: 1986, Robot Vision, MIT Press: Cambridge, MA.
Hume, D.: 1748, A Treatise of Human Nature, Vol. 1: Of the Understanding, in The Empiricists, London: MacMillan Press.

Ikeuchi, K. and B. Horn: 1981, 'Numerical Shape from Shading and Occluding Boundaries', Artificial Intelligence 17, 141–184.
Jepson, A. and W. Richards: 1992, 'What Makes a Good Feature?', in L. Harris and M. Jenkin (eds), Spatial Vision in Humans and Robots, Cambridge University Press: Cambridge.
Kanade, T.: 1981, 'Recovery of the Three-Dimensional Shape of an Object from a Single View', Artificial Intelligence 17, 409–460.
Kawato, M.: 1999, 'Internal Models for Motor Control and Trajectory Planning', Current Opinion in Neurobiology 9, 718–727.
Knill, D.: 1998, 'Surface Orientation from Texture: Ideal Observers, Generic Observers and the Information Content of Texture Cues', Vision Research 38, 1655–1682.
Knill, D. and W. Richards (eds): 1996, Perception as Bayesian Inference, Cambridge University Press: Cambridge.
Kohler, W.: 1947, Gestalt Psychology, Liveright: New York.
Landy, M. S., L. T. Maloney, E. B. Johnston and M. Young: 1995, 'Measurement and Modeling of Depth Cue Combination: In Defense of Weak Fusion', Vision Research 35, 389–412.
Leyton, M.: 1984, 'Perceptual Organization as Nested Control', Biological Cybernetics 51, 141–153.
Leyton, M.: 1986a, 'Principles of Information Structure Common to Six Levels of the Human Cognitive System', Information Sciences 38, 1–120.
Leyton, M.: 1986b, 'A Theory of Information Structure I: General Principles', Journal of Mathematical Psychology 30, 103–160.
Leyton, M.: 1986c, 'A Theory of Information Structure II: A Theory of Perceptual Organization', Journal of Mathematical Psychology 30, 257–305.
Leyton, M.: 1987, 'Nested Structures of Control: An Intuitive View', Computer Vision, Graphics, and Image Processing 37, 20–53.
Leyton, M.: 1992, Symmetry, Causality, Mind, MIT Press: Cambridge, MA.
Leyton, M.: 1999, 'New Foundations for Perception', in E. Lepore and Z. Pylyshyn (eds), What is Cognitive Science?, Blackwell: Malden.
Leyton, M.: 2001, A Generative Theory of Shape, Springer-Verlag: Heidelberg.
Marr, D.: 1982, Vision, Freeman Press: San Francisco.
Mausfeld, R.: 2002, 'The Physicalist Trap in Perception Theory', in Perception and the Physical World, Wiley.
Mitchell, T.: 1997, Machine Learning, McGraw Hill.
Nakayama, K. and S. Shimojo: 1996, 'Experiencing and Perceiving Visual Surfaces'.
Palmer, S.: 1999, Vision Science, MIT Press: Cambridge, MA.
Pinker, S.: 1997, How the Mind Works, Norton: New York.
Pylyshyn, Z.: 2001, 'Visual Indexes, Preconceptual Objects, and Situated Vision', Cognition 80, 127–158.
Richards, W., A. Jepson and J. Feldman: 1996, 'Priors, Preferences, and Categorical Percepts', in D. Knill and W. Richards (eds), Perception as Bayesian Inference, Cambridge University Press: Cambridge.
Rock, I.: 1984, Perception, Scientific American Books: New York.

Scholl, B. (ed.): 2002, Objects and Attention, MIT Press: Cambridge, MA.
Scruton, R.: 2002, A Short History of Modern Philosophy, 2nd edition, Routledge: London.
Sekuler, A. B., S. E. Palmer and C. Flynn: 1994, 'Local and Global Processes in Visual Completion', Psychological Science 5, 260–267.
Sekuler, R. and R. Blake: 1986, Perception.
Shiffman, H.: Sensation and Perception, John Wiley and Sons: New York.
Spelke, E., G. Gutheil and G. Van de Walle: 1995, 'The Development of Object Perception', in S. Kosslyn and D. Osherson (eds), An Invitation to Cognitive Science, Vol. 2: Visual Cognition, 2nd edition, MIT Press: Cambridge, MA.
Stevens, K.: 1981, 'The Information Content of Texture Gradients', Biological Cybernetics 42, 95–105.
Taylor, R., A. Micolich and D. Jonas: 1999, 'Fractal Analysis of Pollock's Drip Paintings', Nature 399, 422.
Terzopoulos, D.: 1983, 'Multilevel Computational Processes for Visual Surface Reconstruction', Computer Vision, Graphics, and Image Processing 24, 52–96.
Turner, R. M.: 1994, In the Eye's Mind: Vision and the Helmholtz-Hering Controversy, Princeton University Press: Princeton, NJ.
Van Tonder, G. and Y. Ejima: 2000, Perception 29, 149–157.
Van Tonder, G., M. Lyons and Y. Ejima: 2002, 'Visual Structure of a Japanese Zen Garden', Nature 419, 359–360.
Varela, F., E. Thompson and E. Rosch: 1991, The Embodied Mind: Cognitive Science and Human Experience, MIT Press: Cambridge, MA.
Watanabe, S.: 1969, Knowing and Guessing: A Quantitative Study of Inference and Information, John Wiley and Sons: New York.
Westheimer, G.: 1999, 'Gestalt Theory Reconfigured: Max Wertheimer's Anticipation of Recent Developments in Visual Neuroscience', Perception 28(1), 5–15.
Witkin, A. and A. Tannenbaum: 1983, 'On the Role of Structure in Vision', in A. Rosenfeld and J. Beck (eds), Human and Machine Vision, Vol. 2, Erlbaum: Hillsdale, NJ.
Yang, Z. and D. Purves: 2003, 'A Statistical Explanation of Visual Space', Nature Neuroscience 6, 632–640.
Yuille, A. and H. Bulthoff: 1996, 'Bayesian Decision Theory and Psychophysics', in D. Knill and W. Richards (eds), Perception as Bayesian Inference, Cambridge University Press: Cambridge.