This action might not be possible to undo. Are you sure you want to continue?
A Neurobiological Model of Visual Attention and Invariant Pattern Recognition Based on Dynamic Routing of Information
Bruno A. OIshausen,1,3 Charles H. Anderson,1,2,3 and David C. Van Essenla
‘Computation and Neural Systems Program, California Institute of Technology, Pasadena, California 91125, 2Jet Propulsion Laboratory, Pasadena, California 91109, and 3 Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, Missouri 63110
We present a biologically plausible model of an attentional mechanism for forming positionand scale-invariant representations of objects in the visual world. The model relies on a set of control neurons to dynamically modify the synaptic strengths of intracortical connections so that information from a windowed region of primary visual cortex (VI) is selectively routed to higher cortical areas. Local spatial relationships (i.e., topography) within the attentional window are preserved as information is routed through the cortex. This enables attended objects to be represented in higher cortical areas within an object-centered reference frame that is position and scale invariant. We hypothesize that the pulvinar may provide the control signals for routing information through the cortex. The dynamics of the control neurons are governed by simple differential equations that could be realized by neurobiologically plausible circuits. In preattentive mode, the control neurons receive their input from a lowlevel “saliency map” representing potentially interesting regions of a scene. During the pattern recognition phase, control neurons are driven by the interaction between topdown (memory) and bottom-up (retinal input) sources. The model respects key neurophysiological, neuroanatomical, and psychophysical data relating to attention, and it makes a variety of experimentally testable predictions. [Key words: visual attention, recognition, model, sating, visual cortex, pulvinar, control]
Of all the visual taskshumanscan perform, pattern recognition is arguably the most computationally difficult. This can be attributed primarily to two major factors. The first is that in order to recognize a particular object, the brain must go through a matching processto determine which of the countlessobjects it hasseen beforebest matchesa particular object under scrutiny. The second factor is that any particular object can appear at different positions, sizes, and orientations on the retina, thus giving riseto very different neural representations early stages at of the visual system.
Received Dec. 3, 1992; revised Apr. 29, 1993; accepted May 6, 1993. We thank Christof Koch, John Tsotsos, Andreas Hers, Harold Hawkins, Ed Connor, and Bill Press for critical commentary and helpful suggestions. This work was supported by ONR Grant N00014-89-J-1192andU.S. PHSGrant MH1913802 (T32). Portions of this work were described in CNS Memo 18, California Institute of Technology, August 1992. Present address of all authors, Department of Anatomy and Neurobiology, Box 8 108, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, MO 63110. Copyright 0 1993 Society for Neuroscience 0270-6474/93/134700-20$05.00/O
Researchon associativememorieshasprovided someinsight as to how the problem of pattern matching can be solved by neural networks (e.g., Hopfield, 1982; Kanerva, 1988). However, it is far lessclear how the brain solvesthe secondproblem to produce object representations that are invariant with respect to the dramatic fluctuations that occur on the sensory inputs. Our goal here is to propose a neurobiological solution to this problem that is detailed enoughin its structure to generateuseful experimental predictions. Our basic proposal is similar to a psychological theory put forth by Palmer (1983), in which it wasproposedthat the process of attending to an object placesit into a canonical, or objectbased, reference frame. It was suggested that the position and size of the referenceframe could be set by the position and size of the object in the scene(assumingit was roughly segmented), and that the orientation of the reference frame could be estimated from relatively low-level cues,suchaselongation or axis of symmetry (seealso Marr, 1982). The computational advantage of such a system is obvious: only one or a few versions of an object need to be stored in order for the object to be recognized later under different viewing conditions. The disadvantage, of course, is that a scenecontaining multiple objects requiresa serialprocess attend to one object at a time. However, to psychophysical evidence suggests the brain indeedemploys that such a sequential strategy for pattern recognition (Bergen and Julesz, 1983; Treisman, 1988). Palmer made no attempt to describea neural mechanismfor transformingan object’srepresentation from one referenceframe to another, becausehis was primarily a psychological model. Various other modelshave been proposedfor transforming referenceframesusingneural circuitry (Pitts and McCulloch, 1947; Hinton, 198la; Hinton and Lang, 1985; von der Malsburg and Bienenstock, 1986). Of these, only the proposal of Pitts and McCulloch can be viewed truly as a neurobiological model. However, their proposal- that the brain averagesover all possible transformations of an object via a scanningprocess-cannot be reconciled with our current understandingof the visual cortex. In this article we propose a neurobiological mechanism for routing retinal information so that an object becomesrepresentedwithin an object-basedreferenceframe in higher cortical areas.The mechanismis modified and expandedfrom an earlier proposal(Anderson and Van Essen,1987)for dynamically shifting the alignment of neural input and output arrays without loss of spatial relationships. The model presentedhere allows both shifting and scalingbetweeninput and output arrays, and it also provides a solution for controlling the shift and scale in an
Object-centered reference t7ame (position and scale invtiant) fl
Figure 1. Shiftingand resealing the window of attention. The image within the window of attention in the retina is remapped onto an array of sample nodes in an object-centered reference frame. a, In the simplest scheme, each “pixel” in the object-centered reference frame represents image luminance. b, More realistically, each pixel should presumably correspond to a feature vector that integrates over a somewhat larger spatial region and represents orientation, depth, texture, and so on.
autonomous fashion. While the model is clearly an oversimplification in some respects, it respectskey neuroanatomical constraints and is consistent with neurophysiological and psychophysical data relating to directed visual attention. We begin with a description of the basicmodel-the dynamic routing circuit-and its autonomous control. Subsequentsections describe the proposed neurobiological substrates and
sponsibility is to dynamically route information through successivestages the cortical hierarchy. of A dynamic routing circuit Figure 2a showsa simplified, one-dimensionaldynamic routing circuit (the next section discusses how this circuit can be scaled up asa model of the visual cortex). It consistsof an input layer of 33 nodes, an output layer of five nodes, and two layers in between.Additionally, a setof control units makemultiplicative contacts onto the feedforward pathways in order to changeconnection strengths. This network hasbeen constructed so that 1. the fan-in (number of inputs) on any node is the samein this case5, 2. the spacingbetweeninputs doublesat eachsuccessive stage, and 3. the number of nodeswithin a layer is suchthat the spread of its total input field just covers the layer below.
This connection scheme has the attractive property of keeping
mechanisms,predictions of the model, and a comparison with other modelsthat have been proposedfor visual attention and recognition.
The goalof our model is to provide a neurobiologically plausible mechanismfor shifting and resealingthe representation of an object from its retinal reference frame into an object-centered reference frame. Information in the retinal reference frame is representedon a neural map (the topographic representationin VI), and we hypothesize that information in the object-based reference frame is also representedon a neural map, as illustrated in Figure 1. This does not necessarily imply that only “pixels” can be routed into the high level areas, as drawn in Figure la; each sample node in the high level map could be expandedinto a feature vector representingvarious local image properties, suchasorientation, texture, and depth, that are made explicit along the way (Fig. lb). In order to map topographically an arbitrary section of the input onto the output, the neurons in the output stageneed to have dynamic access neurons in the input stage.In the brain, to this access must necessarilybe obtained via the physical hardwareof axons and dendrites. Sincethesepathwaysare physically fixed for the time scaleof interest to us (< 1 set), there needsto be a way of dynamically modifying their strengths.We propose that the efficacy of transmissionalong thesepathways is modulated by the activity of control neurons whose primary re-
the fan-in on any node fixed to a relatively low number while allowing the nodesin the output layer access any part of the to
input layer. This property will be important in scaling up the
model. An example of how the weights might be set for different positionsand sizesof the window of attention is shownin Figure 2, b and c. When the window is at its smallestsize (sameresolution as the input stage,Fig. 2b), the weights are set so as to establish a one-to-one correspondencebetween nodes in the output and the attended nodesin the input. When the window is at a larger size, the weights must be setsothat multiple inputs converge onto a single output node, resulting in a lower-resolution representationof the contents of the window of attention on the output nodes.If the input representationwere to contain nodestuned for different spatial frequencies,then the low-frequency nodes would be primarily used when the window of attention is large, whereasthe high-frequency nodes would be
d. is given by WI. . An “x ” at coordinate (j. + cu. I4 window of attention window of attention H Figure 2.4702 Olshausen et al.d. Note that the lowest layers are best suited for small. = (~2 u. fine-scale adjustments to the position and size of the attentional window. i) implies that . while the upper layers are better suited for large. Some examples of how connection strengths would be set for different positions and sizes of the window of attention. Connections are shown for the leftmost node in each layer. This diagram showsthe connection matrix for a simple one-dimensionalrouting circuit composedof two layers-an input layer and an output layer. + (Y&J: + c+J:). and I denotes the layer number. but merely shifted.cqi . In general. where the parameters d.. A simple.and u) is then given by d = d. (Y. The connections for the other nodes are the same. essentially interpolates from the nodes below by forming a linear weighted sum of its inputs: where W: denotes the strength of the connection from node j in level 1 to node i in level 1 + 1. respectively. + cr.(d. Thus. in the transformation from level 1 to level I + 1. coarse-scale adjustments.(Y~. Each node. Control Our analysisof how information flow can be controlled is aided by visualizing the routing circuit in “connection space.= exp (j . much of the image smoothing could be accomplishedby using a set of hardwired filters. Zf. scaling. I=1 N=B Input b. the the vertical axis representsthe nodes constituting the output layer.Model of Visual Attention and Recognition a. A set of control units (not explicitly shown) provide the necessary signals for modulating connection strengths so that the image within the window of attention in the input is mapped onto the output nodes. The challengein controlling the routing circuit lies in properly setting the synaptic weights to yield the desired position and size of the window of attention. denote the amount of translation. a! = LY~(Y. the lack of an “ x ” at (j. there are an infinite number of possiblesolutionsin terms of the combinations of weights that could achieve any particular input-output transformation.though.). Low levels of the circuit are well suitedfor making fine adjustmentsin the position and scale of the window of attention. one-dimensional dynamic routing circuit. 01. a. scaling. i) in connection spacedenotes that a physical connection exists from node j in the input to node i in the output. used when the window is small.and Q. The overall translation. N denotes the number of nodes within each layer. b and c. and blurring of the entire circuit (d. If a gaussian is used as the interpolation then wt.”asshown in Figure 3a. The gray level of each connection denotes its strength. and blurring. whereashigher levels are best suited for coarsecontrol. and then switching between these filters depending on the size of the attentional window. The horizontal axis represents nodesconstituting the input layer of the network.)> 24 function..
The stippled area represents those connections that are enabled. as shown in Figure 3c.. no connection pathway exists between those nodes.b and c. IV. in which the control blocks form the basis functions and the activations of the corresponding control units form the coefficients.@kxxx xxxxamxxx x x x l eeeieeeeeeeeeeee I Input * window of attention c l OUtpUt l l l l l l l l X l &W l l l l l l l l l c window of attention window of attention (aliased) . would be determined according to where c. j.g~xxx x x x a&% x x x )(xX)&3$(. i). November 1993. Note that for a two-dimensional routing circuit the connection matrix would require four dimensionsto display. but could also be usedto approximate scalingsas well. w. i) specifiesthe shapeof the kth control block in connection space.. then the strength of each connection. Figure3. then the band of enabled connections must translate acrossthe connection matrix.Another possibility would be for the control units to gate connections globally so that each unit is responsible effectinga singleposition and scaleof the window for of attention. Figure 3b showshow this would look in connection spacefor an attentional window centeredwithin the input array with a scale factor of one (i. aliasingwould occur.and if it were off then the connection would be disabled.However. but the concepts developed here are readily extendible to two dimensions. (j.We denote the strength of the connection at (j.the remaining connections are effectively disabled by mechanisms discussed below.e. The stippled region indicates thoseconnections that needto beenabled > 0) in orderto (wU mapthe regionwithin the windowof attention onto the output nodes.rather. That is. It may well be possibleto optimize the shapeof the control blocks using appropriate learning algorithms. asshownin Figure 4b. If d. If the window of attention is to shift to the left or right. An alternative way of expressing Equation 1that will beuseful later is rllk where r. otherwise. as shown in Figure 4d. 13(11) 4703 a. Note that the band of open connections must also be widened as it is tilted (corresponding to blur)..Shapingthe control blocks as in Figure 4c would be most optimal for realizing translations.. it can be seenthat the problem of setting the position. since each control neuron modulates only a small fraction of the many possible synapses i).The Journal of Neuroscience. needsto be set appropriately. the width of the enabled regionis too small.How this might be acin complished by the control units dependson how they are connected to the feedforward synapses the routing circuit. If a given control unit were “on.. leading to spuriouspatterns in the output representation(Fig.Weshalldenote the effective strength theconnection of from nodej in the input to node i in the output asw. . By viewing the routing circuit in this way. This could cause implementation difficulties and render the circuit neurobiologically implausible. An illustrationof “connection space. they should have a gaussianlike taper and overlap one another somewhat. One of possible scenariois for eachcontrol unit to modulatethe strength of a singlephysical connection (j. #J\ xxxxxxxxx. l i xxxlg~.k = *Jj. denotes the activity of the kth control unit. i).. Our proposedsolution to the control problem minimizes both the number of control units and the fan-out required by having each control unit modulate a local group of synapses-or a control block in connection space (Fig.Nearly any remapping could then be accomplished by simply activating the control units correspondingto the connectionswe wish to enable. 4~).the control blocks should not have sharpboundaries. this scheme would require a large fan-out for each control unit in a scaled-upsystem. If the window of attention is to be of a certain position and size. leadingto spurious patternsin the output). and blur of the window of attention amounts to one of generating the proper patterns of active synapses connection space.. T l l l l l l l xxxxxxxxx xxxxxxxxx xxxxxxxxx xxxxxxxxx xxxxxxxxx l ee*e**e*eee****e Input i x x x x&k xxxy. 3d). xxxeCh@W xxxq#%Xxx l x x x x:&t x x x xxxx. case some output nodes be lacking will any input. no magnification). size.this schemewould arguably constitute a waste of computational resources. Since the set of remappings we wish to accomplish (translationsand scalings) but a minute fraction of all possible is remappings.” inputcontains samThe 17 ple nodes the output contains and nine samplenodes.then aliasing result. The problem of forming the desiredpatterns in connection spacethen becomes an approximation problem... and the function \Ilk(j.an exwill aggerated is illustratedhere(i. Each x denotes a physical connection fromaninputnode to an outputnode.e. Note that rrlk = 0 for most combinations of i.” then its correspondingconnection would be enabled. i) as w. Changingthe size of the window of attention correspondsto tilting the band of open connections. and k. However. but the strategy illustrated here will suffice for our immediate purposes.~‘p~’ xxxxxxxxx XXXXXXXXX XXXXXXXXX 0 l l l l l l b. asillustrated in Figure 4a. denotesthe weight with which ck modulates the strength of synapse(j. i).In order to facilitate their ability to approximate patterns in connection space. We will usethe one-dimensional routing circuit for easeof illustration. In this sense. this scheme would require an enormous number of control units for a scaled-up system.a. the connection strengthsw!...
or a “control block. Figure 5. Move on to the next blob and repeat.4704 Olshausen et al. . 3. 0 l 0 l l 0 l *ea* xxxxxx l ooo*a***a*mm~oe~ l Input Input 1. Note the location. lower left” 5. . c. Each control unit modulates the strength of a large number of connections in order to effect a global position and scale of the window of attention.htput xxxxx . b. s-ii ] cl Recognition c0ritr01 b. x . size. the contents of the window of attention are fed to an associative memory for recognition.” d. : Inplt Figure 4. Blur objects into blobs. xxxxx. This process is then repeated ad infinitum..Model of Visual Attention and 4. Once an object has been localized.OutPut xxxx xxx xx X l . Objects are preattentively segmented via low-pass filtering. or until all interesting locations have been attended. 5 0 l 0 . . Feed the high resolution 4. a. I-+ “large ‘A’. Each control unit modulates a local group ofconnections. Each control unit modulates the strength of a single connection. xxxxxxxx:& l o*eaoe**m**o**** xxxxxx xx xxxxxxx x xxxxxxx x xxxxxxx 2 xxxxxx xxxxxxx E= $$*9 x x . d. . of attention to an associative memory. contents within the window and identity of the object. 2. Some possible control scenarios. A simple attentional strategy for an autonomous visual system. Approximating a desired position and scale of the window of attention using control blocks. . Focus the window of attention on a blob.
We can accomplish the first part of the objective by letting ck follow the gradient of an objective function. and size in the scene.z. We begin by formulating a solution for a simple one-dimensional routing circuit with one or more gaussian blobs presented to the input units. Focusing attention on a blob. Salient areas can often be defined on the basis of relatively low-level cues-such as pop-out due to motion. whereas step 4 is a highlevel problem beyond the scope of this article (cf. Anderson et al. 1985). but in general this need not be the case. 5). One possible choice for E.g. The control units then compete among each other. attention can also be directed via voluntary or cognitive influences. as shown in Figure 6a. depth.P)~/u~]. or potentially informative. areas of the visual input. 3. November 1993. 1987. via a.. (5) The second part of the objective (maintaining topography) can be accomplished by letting ck follow the gradient of a constraint function. via negatively weighted interconnections. 7.) In reality. 8) are denoted by the ampliJier symbol. 2.. that provides a measure of how well a blob is focused on the output units. = 1 and c. corresponds to a global position of the window of attention. are computed from the input units.The Journal of Neuroscience. = c. 1992). G. the network’s “goal” is to fill the output units with a blob while maintaining a topographic correspondence between the input and output (Fig. = 0. System objective. (3) Note that Equation 4 is obtained by substituting Equation 2 into Equation 3. Form a low-pass-filtered version of the scene so that objects are blurred into blobs. 1. Since the dynamic variables in this network are the ck. step 2). Mumford. 1985.. A simple one-dimensional routing circuit with a gaussian blob presented to the input units. The combined leaky integrator and squashing function (Eqs. we need to formulate an equation governing the dynamics of ck that accomplishes this objective. For example.. 5. and the desired blob shape. Feed the high-resolution contents of the window of attention to an associative memory for recognition.. f3(11) 4705 Autonomous control Up to now we have described an essentially “open loop” model of visual attention. E. u. (A blob may be defined simply as a contiguous cluster of activity within an image. The same circuit with control circuitry added to autonomously focus the window of attention on a blob in the input.). one could manually set the activity of the control units of the network so that the image within the window is remapped onto the output units of the network. The values on the output units. location. That is. then consider the object to be unknown. the values on the control units should be c. 1( Figure vob map 6. Here. = L: I 2 k CJ#Z/“. 5. b. Carpenter and Grossberg. we utilize a very simple measure of salience based on luminance pop-out in which attention is attracted to “blobs” in a low-pass-filtered version of a scene. That is. given a desired position and size for the window of attention. and 5. In order to focus the window of attention on a blob in the input. The following three subsections describe the details for carrying out steps 2. it would make sense for the attentional window to be automatically guided to salient. Thus. or right (c. Now inhibit this part of the scene and go to step 2 (find the next most salient blob).). Each control unit essentially has a gaussian receptive field in the input layer. 3. such that only the control unit corresponding to the strongest blob in the input prevails.. If there is not a good match. texture..O. center (c.. If a match with one of the memories is close enough (by some as yet unspecified criterion). Step 1 is trivial. q. or color (e. note its identity. Koch and Ullman. zy = 2 w. We now describe how the network may be autonomously controlled when provided only with visual input and no external commands beyond the initial task specification. In this simple circuit the Trlk are set so that each control unit c. p. is the correlation between the actual values on the output units.). then consider the object to have been recognized. The purpose of attention is to focus the neural resources for recognition on a specific region within a scene. G. in order to accomplish the remapping shown. those that . = exp[-(i . Each control unit corresponds to a different position of the window of attention: left (c. E constrant~ favors valid control states-that is. 4. I?‘. either learn it or disregard it. Select one of the blobs from the low-pass image-whichever is brightest or largest-and set the position and size of the window of attention to match the position and size of the blob. but these are not incorporated into our present model. We propose the following simple but useful strategy for an autonomous visual system (see Fig.
P. become additional dynamic variables hidden in the last term of Em.G.. Thus. . then we can replaceE. v..Equation 8 statesthat the input to each c. All of theseunits would then compete with one another so that the window of attention is constrained to a singleposition and scale..(Seeexample in the next section.z . while control units correspondingto a small window of attention would derive their inputs from a fine-grained (high-resolution) blob map. + zpem.. V. < 0). . c. (The more generalcaseusingcontrol blocks is describedbelow. essentiallyderive their inputs directly from a “blob map. the only dynamic variables are the V. would be computed by correlating the gaussianvalues.) In a more biologically plausible scenario..and this smaller setof control neurons is then usedto constrain the activities of the fine-grained control neurons routing the highresolution information. Olshausen. Also...k). like those shown in Figure 4c. 1984) is usedfor recognition.. Zymdenotesthe inputs to the memory. However.‘(V)d?’ .. .) Sincethe inputs of the associativememory are to be obtained directly from the outputs of the routing circuit (Zym = I?‘). In this case. routing is at first performed by a small number of control neuronson a low-pass-filteredversion of the image. the control units would be configured into control blocks. (Eq. One possible choice for EcOnStraln. is the “leaky integrator term. and this may causeproblemsfor matchingthe high-resolutioninformation. of the gradient of the energy. the term Z. that are “connected” via that control unit (specifiedby Ill.. g. is E cO”S. then. * Model of Visual Attention and Recognition corresponding to translations or scalings of the input-output transformation.. (6) where the constraint matrix Tc is chosen so as to couple the control neuronsappropriately. (9 In this equation the V. c.. the underlying high-resolution information can also be fed through the routing circuit and into the input of an associative memory for recognition. which evolve by following a monotonically increasing function.1 0 = -1 kzl k=l.). SeeAppendix. and g..2 KZym.. are constants that determine the integration time constant of each neuron. The second term is computed by forming a weighted sum of the activities on the other control units. Thesetwo results are then summed together and passedthrough a leaky integrator and squashingfunction to form the output of the control unit. A sigmoidal squashingfunction (u) of Econstrainf is usedto limit ck to the interval [0. each control neuron correspondsto a different position of the window of attention. The dynamics of Equations 10 and 11can be implementedin simple. with the associativememory’s “energy” function. Once the window of attention has been focused on a blob. This schemehas the effect of introducing many local minima. with a shifted version of the input (the amount of shift depends on the index k). denotesthe connection strength betweenneuronsi andj.. is a squashingfunction such as tanh(x). E. it is likely that the initial estimation of position and size made by routing the blob would be only approximately correct.j matrix (first term of in E.Io. Recognition. 11. and so the control neuronsneed to be more tightly constrained in order to converge on statesthat preserve local spatial relationships... Normally. By letting the c. That is. follow the gradient of&m. The circuit of Figure 6 could easily be modified to allow for different sizesof the window of attention by adding another set of control units for eachdesiredsize of the window of attention. and the constant p determinesthe contribution relative to E. and E ConStralnt given by Ck = d”k). (The secondterm of E.(A derivation of Eq.. In this scheme. I (11) where C. The first term on the right of Equation 8 is computed by correlating the gaussian. (7) where the constantsT and 7 determine the rate of convergence of the system. that simultaneously minimizes is both E.y. 8 is given in the Appendix. which is defined as Y g. while control units that are not part of the sametransformation inhibit each other (Pk....). would need to be modified in this casesothat those control units correspondingto a common translation or scalereinforce eachother (Pk. shall the associativememory be incorporated into the control of the routing circuit? If a Hopfield associativememory (Hopfield. T.) A neural circuit for computing Equations 7 and 8 is shown in Figure 6b. and thus forces a winner-take-all solution. denote the output voltages on the associative memory neurons. correspondingto the strongestblob prevails. and the input values. and R. The control units correspondingto a large window of attention would then derive their inputs from a coarse-grained(low-resolution) blob map. For the simple circuit of Figure 6a. Note that sincethe Gi are fixed. Note that the effect of minimizing E. We have accomplishedthis by utilizing a coarse-to-fine control architecture (B.. it would be desirable to have the associative memory help adjust the position and scaleof the attentional window while it converges.. This hasthe effect of punishing any state in which two or more control units are active simultaneously. the control neurons.= 2 ck %c. and (2) the similarity between the v and the inputs Zyrn (last term of E.. so we could define Tc as Pk.ral”.. is to simultaneouslymaximize (1) the similarity between the neuron voltages. unpublished observations).) A dynamical equation for c.” which is unimportant for now. How. however. the constraint matrix. the c.. G. as illustrated in Figure 7.4706 Olshausen et al.. and one of the stored patterns superimposed the T.neural-like circuitry.” and then compete among each other so that the c. the combined associativememory/ routing circuit should relax to the closeststored pattern and to the correct position and size of the window of attention simultaneously.. = 2 K.. 8) can essentially be considered a fixed weight. > 0). Thus. Gi. along with the V.
13(11) 4707 memory x XX XXX xxxx 0 l oooooooooooooooo Input Figure 7. when a group of control units are active for sometime (long enough for recognition to take place) they should begin to shut off.. A crude form of such a coarse-to-fine strategy has been utilized in the computer simulation below. during blob search. We assumed that (A derivation is given in the Appendix.g. the network has settled on the “A. that simultaneously minimizes both E. Thus. An autonomous routing circuit for recognition. Buhmann et al. This then puts the next interestingobject at a competitive advantage in attracting the window of attention so that it may also be recognized (Fig. the main qualitative difference betweenthis circuit and the “blob finder” (Fig.neural-like circuits for shifting and resealingthe information from an input array into a higher level. The current control state is then self-inhibited and the network switchesback into blob search mode. shifting. the window of attention should move on to another interesting part of the scene. Summary of the model By using control neuronsto modulate connection strengthsdynamically. whoseconnection pathways are influenced by control unit c. p. 9b). November 1993. it would be advantageousto perform the combined processof pattern matching..control units compute the correlation between memory outputs and retinal inputs and set their activation correspondingly. The control unit corresponding to the block shown (stippled region) should have excitatory connections ( Fk. those along the “-” directions. Witkin et al. In order to avoid local minima. and EcOnSfraln. V. we have bypassedthe step of prefiltering them into blobs. object-centered referenceframe. 1985). The network beginsin blob searchmode. attempting to fill the output of the routing circuit with something interesting. A dynamical equation for c. This tends to maximize the similarity between the outputs of the memory and the outputs of the routing circuit. the low-passinformation can be usedinitially to send the memory into the right part of its searchspace. . 1990). This will then allow other blobs or interesting items to compete successfully for control of the window of attention (seealso Koch and Ullman.” since it has the greatest overall brightnessin the input.) A neural circuit for computing Equations 12 and 13 is shown in Figure 8. the network is switched into recognition mode and the output of the routing circuit is fed to an associative memory. The blurred version of the object initially drives the inputs of the associativememory to begin the pattern search. Thus. Two patterns-“A” and ‘C-have been previously stored in the associative memory.an object is low-passfiltered by the routing circuit itself. > 0) to other control units whose blocks form a consistent position and size of the window of attentionthat is. Computer simulation Figure 9 showsthe resultsof a computer simulation of a simple attentional system for recognizing objects.. Control unit interactions when configured into control blocks. This scheme is somewhat analogous to the way constraints are imposed in the Marr/Poggio stereo algorithm (Marr and Poggio. basedon the ptincipleselucidatedabove. As the associative memory converges. we have derived simple. (specifiedby I’& The other terms are computed as before. Thus. those blocks lying along the “f” directions. the blurred version of the object is not affected much and still sendsthe memory searching in the correct direction. Hence. 1987. Inhibitory connections (T. < 0) should be formed with control units whose blocks are inconsistent with this one-that is. and outputs. The outputs of the associative memory are then fed back and correlated with the inputs to drive the control units. each node of the associative memory has dynamic connections to many input nodes. the simulation states the position and presumed identity of the object.. 1976). is given by Figure8.) After settling on a potentially interesting object.. (Since the shapes usedin this example are so compact and simple. . which will also refine the position of the attentional window so that the high-resolution componentscan be properly matched (Fig. Once an object has been recognized. After allowing a fixed amount of time for the associativememory to converge (another time constant or two).. Each node of the associative memory receives its external input from an output node of the routing circuit.If the position of the window of attention is slightly off.One way this could be accomplishedwould be for the control units to be self-inhibited through a delay.In this way. and scalingin a coarse-to-fine manner by utilizing information at multiple scales(e. 6) is that the control is guided by the interaction between top-down and bottom-up signalsrather than purely bottom-up sources.The Journal of Neuroscience. In Figure 9a. 9~4.the initial output of the associative memory can then be used to better refine the position and scaleof the window of attention before allowing in higher-resolution information. Shifting attention. The first term on the right of Equation 13 is computed by correlating the inputs.
i . . Although these circuits have been greatly oversimplified for the purpose of illustration. Cortical areas pathway.some of which may be devoted to specialized aspectsof pattern recognition.\. The following sections describehow we envision the dynamic routing circuit mapping onto this collection of neural hardware. The blurring function of the routing circuit has been facilitated in this case by setting the constraint matrix so that neighboring positions of the window of attention only weakly inhibit each other.. The C is now at a competitiveadvantage attractingthe windowof attention(c) and in is subsequently recognized the associative by memory(d).. Information from the retino-geniculo-striate pathway entersthe visual cortex through areaV 1 in the occipital lobe and proceedsthrough a hierarchy of visual areasthat can be subdivided into two major functional streams(Ungerleider and Mishkin. see Fig. * Model of Visual Attention and Recognition a. magno and parvo streams.in the simplestsense.regardless of their identity. The remaining areas-V2. Memory . The network begins in blob search mode. Memory b. each node would correspond to a feature vector that is representedby the activity profile on a large group (hundreds or thousands) of neurons in each visual area. and large (16 x 16)]. each one corresponding to a different size of the window of attention [small (8 x 8). which is rea- . Each control neuron within a set corresponds to a particular position of the window of attention. we derived equations for governing the dynamics of the control neurons in both “preattentive” (blob search) and “attentive” (recognition) modes..000 samples the retinal image(. 1982). The different stages the network correspondto of the major cortical areasin the “form” pathway. and IT-occupy one stage apiece. From this basic assumption. Robinson and Petersen. The so-called “where” pathway leads dorsally into the posterior parietal complex (PP). a subcortical nucleusof the thalamus. onenodefor eachoutputof the routingcircuit). 1982). . The network has settled on the A since it has the greatest overall brightness. . Figure lob showsthe scaled-uprouting circuit that we propose as a model of attentional processingin visual cortex. but we contend that these details can safely be neglectedfor now without losing the predictive value of the model. sampleof imageluminance. a More realistically. Computer simulation a simple of attentional system recognizing for objects. a useful strategy for an autonomous visual systemwould be to focus its attention on interesting regions within a scene and attempt to recognize whatever is there.\..\\. in After a fixed amountof time. . The sizes of the first four layers scaleroughly with the relative sizesof each correspondingcortical area (V 1 = 1120 mm*. Only a relatively small portion of IT would be required to representthe actual contents of the window of attention.e. Each node within a stage represents. input to the routingcircuit consists a 22 x 22array The of of sample nodes the output of the routingcircuit isan 8 x 8 array of sample and nodes. b..\\ .4708 Olshausen etal..in a small region of visual space.and different spectralbands(Van Essen and Anderson. scaled-up routing circuits composedof multiple stages. The number of nodes in the other layers is dictated by the rulesspecifiedin the previous section. The so-called“form” pathway leadsventrally through V4 and inferotemporal cortex (IT) and is mainly concerned with object identification. now turn to the issue We of how such circuits may possibly be implementedin the brain. There are three sets of control units. Felleman and Van Essen. The fan-in for each node is about 1000 inputs. perhaps becauseit includes a complex of multiple areas. medium (11 x 1 l)... Neurobiological Substrates and Mechanisms Figure lOa shows the major visual processingcenters of the primate brain. attempting to fill the output of the routing circuit with something interesting. and Vlb correfor sponding to superficial layers. The dashed and into outline within the input array denotes the position and size of the window of attention.lzl +-‘. the currentcontrol stateis self-inhibited the and networkis switched backinto blob search mode. The Hopfield associative memory network (“Mem output”. since Vl has about twice the density of neuronsper unit surfaceareaas the rest of neocortex (O’Kusky and Colonnier. The input layer of the network (V 1. The position and size of the objectare encoded the activities of the control neurons. This correspondsroughly with the number of complete spatial samples delivered by the lo6 optic nerve fibers when one takes into account the fact that information is divided into on. 1990). IT is disproportionately large.1991)...550 nodes of acrossin one dimension).c andd.+: lizl Memory Pl output P-l output Pl output . The pulvinar. 8) is composed of 64 units. fully interconnected arranged an 8x 8 grid (i.:. a. the basic principles can be extended to larger. in Vl. regardless position or of size. d.:. .It is impractical at this stageto include these characteristicsexplicitly in our model. each group would include cells selective for various orientations. and spatial frequencies..::::::.A.1992). The network is then switched into recognition mode and settles on the identification of the object. makes reciprocal connections with all of these cortical areas(cf. layer 4C) contains approximately 300. 0I Control Input Input Control Input Input Figure 9. V4 = 540 mm2. For example. There are two The ‘fform” stages Vl: Vla corresponding to layer 4C.and off-channels.given a fan-in of 1000 inputs per node (-30 inputs in one dimension).‘. V4. V2 = 1190 mm*... and seems be concernedwith to the locations and spatial relationshipsamong objects.:. Memory C.
which is several orders of magnitude beyond what is neuroanatomitally plausible. --L-q-L f Superior Colliculus u”uu’u” Figure b.e. contrast sensitivity to gratings. b.The Journal of Neuroscience. 1992). Thus. the resulting receptive field sizes (in the all-connections-open state) are consistent with the observed increase in the size of classical receptive fields as one proceeds upward through the form pathway (Gattass et al. Connections are shown for the center node in each layer. which represents the contents of the window of attention. 198 l).. This is also supported by lesion studies that show that damage to the parietal lobe in humans hinders the ability of other objects in the field of view to attract the attentional window away from the currently attended location (Posner et al... 199 1. 13(11) 4709 Occipital lobe a. 1985. Koch and Ullman. VII-PP) are not shown. Also. a fanin of nearly lo6 would be required for the neurons in IT. 198 1). While we certainly allow for some give and take on all of these numbers.. These latter results suggest that PP may be representing the locations of potential attentional targets. Steinmetz et al. These neurons would then drive the control neurons that compete for control of the window of attention. The number of sample nodes in two dimensions is approximately the square of this number. we believe this circuit contains the essential components to explain how information can be routed from a shiftable and scalable window of attention in Vl into IT while preserving spatial relationships. November 1993.. Others have reported a threefold enhancement for unattended targets when the animal is in an attentive state (Mountcastle et al. manually controlled). 199 1). even when no eye movements are made (Bushnell et al. This then corresponds to the spatial resolution of the window of attention in our model. and recognition (Campbell. Figure 11 shows some example out- puts of the simulation when attention is focused on different items within a scene. or even a relative suppression for attended targets as opposed to unattended targets (Robinson et al. because there is no need to gate the inputs selectively to a neuron if it is not being attended to. Major visual processing pathways of the primate brain.. contains approximately 1000 sample nodes. To avoid clutter. -go0 -30° -8” -2O 0” 2’ go 30’ 90’ 10.. The “where” pathway.g. or a window size of about 30 x 30 nodes. This estimate is roughly consistent with several lines of psychophysical evidence. Note that regions outside the window of attention in each cortical area are blurred. The label beside each layer indicates the corresponding cortical area and the number of sample nodes in one dimension. 1990a). the program appropriately gates the feedforward connections at each stage in the routing circuit so that only the contents of the window of attention are routed to IT. sonable for cortical neurons (Cherniak. we have created a computer simulation of an “open loop” version of the model (i. The specific predictions generated by this circuit will be discussed in the next section. The output of the network. 1990. 1985).g.) Control signals originate from the pulvinar to effectively gatethe feedforward synapses. 1985) analogous to the blob map utilized in the simple attentional system described previously. Proposed neuroanatomical substrates for dynamic routing. In order to better visualize the operation of this circuit. we propose that PP may act as a “saliency map” (e. At the bottom is shown a scale of the approximate eccentricity of the input nodes to the circuit.. Given a user-specified position and size for the window of attention. Douglas and Martin. a. . 1984).. as opposed to targets already being attended. Some studies have reported that neurons in this area show an enhanced response to attended targets within their receptive fields. (Individual nodes are indistinguishable here becauseof their density. The posterior parietal cortex (PP) is known to play an important role in attentional processes. including studies of spatial acuity. Note that without the multistage hierarchy. many known connection pathways (e. Van Essen et al..
Corbetta et al. the bottom left image shows hypothetical a retinalimage. The relative proximity of pulvinar neuronsto eachother would facilitate the competitive and cooperative interactions among the control neurons. One possibledrawback is that PP neurons typically have relatively long latencies. Bender. in u. Duhamel et al.4710 Olshausan et al.. suchasthrough the superior colliculus. V2.In eachcase. The pulvinar also receives a massiveprojection from the superior colliculus. which . morerealisticsimulation the A utilizing a log-polar latticehasalsobeen constructed. or filtering out unattended stimuli. 1990. 1989. 1991). the dashed outline within this image and indicates positionandsizeof the windowof attention. 1972). This proposalcontainsat leasttwo potential weaknesses. Ogren and Hendrickson (1979) have reported the existence of interneurons with elaborate dendritic trees approaching 600 pm in diameter. however. Goldberg and Wurtz. the but essential predictions the modelaremoreeasilyconveyed of with this simpler versionof thecircuit. of corticaldynamicroutingcircuit (noautonomous control). However. 1978. 1985. thus making it a plausible candidate for modulating information flow from VI to IT.andIT (the output). b.e.100 msec (Robinson et al..b. connections between input and output are 1:1).. 1992)-which is hard to reconcile with psychophysical data that imply that attention takes . A possible solution to this dilemma is that the superior colliculus may supplementPP by acting asa crude saliency map. which are necessary enforce the to constraint of maintaining spatial relationshipswithin the attentional window.a.. Attention is focused the letter Tat highest a on resolution (i..50 msecto move to a new location in the visual field (Nakayama and Mackeben. The other drawback of this proposal is that currently available anatomical data seem to offer relatively few direct pathways by which PP could influencethe control neuronsfor modulating information flow in the “form” pathway. In addition. and positron emission tomography studies(LaBergeand Buchsbaum. 1987. The pulvinar is reciprocally connected to all areasin the form pathway. [The image used this example obtained in was from Anstis(1974).The image the above this shows output of the routingcircuit-the contents the 30x 30 windowof attention. but with a quicker response time due to its direct retinal input (the latency of neurons in the superficial layers of the superior colliculus is in the rangeof 4050 msec. receptivefield of a hypotheticalIT cellis shown the (smallin a and largein b). 1987). soresolution sacrificed on and is within the windowof attention. 1991. neurophysiological studies (Petersenet al. the receptive field of a V4 celloutside windowof attentionisalsoshown. Subcortical areas We hypothesize that the pulvinar complex plays an important role in providing the control signalsrequired for the routing circuit. l Model of Visual Attention and Recqnition a.. 1990). V4.] Figure II. Although it is not known whether such interactions exist among pulvinar neurons. 1992). Gattass and Desimone.In both a andb. 1991) of the pulvinar suggest that it plays a role in engagingvisual attention. there do exist indirect pathways.Desimone et al. lesionstudies(Rafal and Posner.Thefour images to the right showfour stages the of of theroutingcircuit: VI (essentially copyof theretina). 1990. ‘IT-t 4 UINDOU OF RTTENTION YI ON RETINA 'VI' Computer simulation a sealed-up. that may provide viable alternatives (seebelow). Saarinen and Julesz. which is known to encode the direction of saccadetargets and may also be involved in setting up attentional targets (Posner and Petersen. A subcortical nucleussuch as the pulvinar also has the important property of being spatially localized while at the same time being able to communicate with vast areasof the visual cortex. Attention is focused a largerregionof the scene. 1988..
” the pulvinar might be influenced primarily from a saliency map of targets in the parietal lobe or superior colliculus. insofar as the inferior pulvinar projects mainly to lower areas (Vl. where thalamic relay cells exhibit two distinct response modes: a relay mode. The simple autonomous routing circuits of Figures 6 and 8 suggest an interesting role for the projections to the pulvinar from the parietal and temporal lobes and the superior colliculus. there is some overlap near the border between the lateral and medial portions of the pulvinar where these two streams intermingle. this would constitute a reasonable lower bound for the number of neurons in the pulvinar. or object-based. For example.The Journal of Neuroscience. 1992) and some against (Douglas et al. which is an inhibitory structure through which pulvinar neurons project on their way to the cortex. A somewhat different form of gating seems to take place in the LGN.lo6 projection neurons. 1992). An alternative means by which IT could supply top-down guidance to the control neurons would be via corticocortical feedback pathways. each pulvinar control neuron would require an additional fanout for controlling the inputs to all the neurons corresponding to an output node. 1978). Volgushev et al. Thus. which would be desirable for a winner-takeall type circuit. Theoretically. however. One study in G&go (Conley and Diamond. November 1993. During recognition. spatial frequency. Since there may be hundreds of neurons for each node.000 control neurons would be required for the first stage.) However. This could possibly be subserved by neurons residing in the deeper layers (5 and 6) of the cortex (see Van Essen and Anderson.000 for pulvinar neurons is probably too large to be plausible). During “blob search. guidance from IT. Evidence for this type of mechanism playing a role in orientation or direction tuning is mixed. as in the spinal cord. Postsynaptically. Another possible postsynaptic gating mechanism could be realized via the combined voltage. parietal cortex may also communicate with IT-recipient pulvinar indirectly through the superior colliculus. for example. the reticular nucleus of the thalamus is thought to be the source of the signal that switches the LGN into the nonrelay burst mode.. Assuming that the control blocks comprise approximately 1000 synapses each. and so on. However. An analogous function might also be served by the reticular nucleus of the thalamus. such as orientation. 1990). Gating mechanisms Neural gating mechanisms are believed to play an important role in many aspects of nervous system function. The pulvinar would then alternate between these two modes of input as attention moves from one object to the next. then the number of control neurons required for each stage of the routing circuit would be about the same as the number of output nodes of each stage (since the fan-in per node is also about 1000). is that the anatomical evidence suggests that PP and IT project mostly to segregated portions of the pulvinar (Baleydier and Morel. which is well within the estimated number of neurons in the pulvinar. 1988). On the other hand. 1992. in which cells tend to replicate retinal input more or less faithfully. the minimal number of control neurons is given by # of control neurons = (# of output nodes) x (fan-in per node) (# of synapses per control block) . it would make sense for each stage of the routing circuit to have its own set of control neurons.. Under this scenario.175. The control neurons for the lower stages would need to compete only locally. Gating mechanisms are also thought to play an important role in sensorimotor coordination. The pulvinar’s role would thus be analogous to that of a general in an army-coarsely specifying a plan of action. 1992). In addition. top-down influences from IT might then take over to refine the position and size of the attentional window for object matching. 1986). IT). since these stages would be more concerned with making local adjustments in the position and scale of the window of attention. which the cortical control neurons refine into a concise remapping under top-down. would probably provide the most localized gating effect. there are several possible biophysical mechanisms that would allow control neurons to gate synapses along the VI-IT pathway. -250. (The pulvinar has somewhat lower neuronal density than the LGN. . in which cells burst in a rhythmic pattern that bears little resemblance to the retinal input (Sherman and Koch. which is suggestive of lateral inhibitory connections within the pulvinar. Control would then be implemented in a hierarchical fashion. since these stages are setting the position and scale of the window of attention for the entire scene. In this instance. where descending fibers from the raphe nuclei form part of a control system that modulates pain transmission via presynaptic inhibition in the dorsal horn (Fields and Basbaum. and so on. to date there exists no morphological evidence for this type of synapse in the visual cortex (Berman et al. Since the LGN contains . Control neurons at the highest stage would need to compete globally. with some for (Pei et al.and ligand-gated NMDA . there are many instances in which spinal cord central pattern generators gate sensory inputs according to the phase of the movement cycle in which the input occurs (Sillar. and cortical control neurons specifying how information is routed between the neurons belonging to each node. control neurons within the cortex would be driven by feedback signals emanating from IT once the pulvinar neurons have roughly set the position and size of the window of attention. Although there is as yet no explicit evidence for gating mechanisms in the visual cortex. To first order. (1987) have shown that enhancing or depressing inhibition within the pulvinar can respectively slow down or speed up attentional shifts. with each pulvinar neuron specifying how information is routed between nodes. a control neuron could decrease or possibly nullify the efficacy of a corticocortical synapse via shunting inhibition. each output node in the circuit actually corresponds to a multitude of neurons representing various features. 199 1). the extent to which a noxious stimulus is perceived as painful varies greatly as a function of one’s emotional state and other external factors.. As noted already. 1990) has shown that the pulvinar projects quite diffusely into the reticular nucleus. 73(11) 4711 could mediate communication among pulvinar neurons. Thus. Presynaptic inhibition. but also is several times larger. A potential weakness of this proposal. neuropharmacological experiments by Petersen et al. the pulvinar neurons would need to amplify their fan-out via other neurons (a fan-out of 100. V2) and the lateral and medial pulvinar to higher areas (V4. and a non-relay burst mode.. The number of control neurons that would be required for the routing circuit depends on how many cortical synapses are modified by each control neuron.000 for the second stage. This is subserved at least in part by gating mechanisms in the spinal cord. The anatomical subdivisions ofthe pulvinar correspond roughly with this scheme.
4). In this case. which can then be summed together within a single cell. Threshold adjustment could perhaps be subserved by chandelier cells. 1982. (See also Desimone. would presumably be interested in the information from regions lying outside of the attentional beam. Also.. should dramatically degrade attention and pattern recognition abilities. the unboosted synapses will be essentially suppressed to a relatively low strength. 1988. 1992). Nagel-Leiby et al. and then probing the receptive field with a neutral (behaviorally irrelevant) stimulus to measure its extent. Under an excitatory gating scheme. 1987. gating of inputs within individual dendrites provides a much higher degree of flexibility than would merely gating the outputs of pyramidal cells. Preliminary results using such a paradigm suggest that the receptive fields of V4 cells do indeed translate toward attentional foci in or near the classical receptive field (Connor et al. Pettet and Gilbert. the hypothesized control center. This is what one would expect from our routing circuit. such as those in PP. While Moran and Desimone’s findings offer some support for attentional modulation effects predicted by the model. because once a V4 cell lies outside the region of interest in V4 it no longer needs to restrict its inputs (Fig. 1972.. 1988) that could provide nonlinear coupling between inputs. 1992). Predictions Neurophysiology.. Chalupa et al. respectively. All of these mechanisms. as well as the differences between our model and other network models that have been proposed for visual at- tention and invariant pattern recognition. some pattern recognition abilities appear to be relatively unimpaired by pulvinar lesions (Mishkin. 1987. 1993). however.) Discussion Because of its detailed neurobiological correlates. 1990). they did not attempt to map receptive fields under different attentional conditions with any precision. When control signals are present to boost the gain of individual synapses. 1992) and provides a higher degree of flexibility in sculpting patterns in connection space (see Fig. which make strong inhibitory connections exclusively onto the axon initial segment of pyramidal cells (Douglas and Martin. 1987). 12~). there remains a pressing need to resolve empirically the extent to which cortical receptive fields can dynamically change-position and size. the routing circuit model makes a number of interesting predictions that can be tested experimentally. a control neuron could effectively boost the gain of a corticocortical synapse by locally depolarizing the membrane in the vicinity of the synapse.. Indeed. they found that if two bar-shaped stimuli were placed within the classical receptive field (CRF) of a V4 cell. shifting to high spatial frequency for a small window of attention. our model predicts that V4 receptive fields could become up to 1OO-fold smaller than the CRF (in one dimension) when attention is at highest resolution. One would expect a cortical receptive field to shift as the attentional window is translated. Some support for this prediction comes from the neurophysiological findings of Moran and Desimone (1985) in areas V4 and IT of primate visual cortex.. and the firing threshold of pyramidal cells should be lowered to let all information through. .. Gating inputs within the dendrites. This way. and the animal was trained to attend to only one of them. such as shunting or presynaptic inhibition. any modulation of the cell’s output will simply be duplicated at all these subsequent input points. This results in a computational structure that is orders of magnitude richer (Mel. 1993). the threshold should be raised. one would need to hypothesize the existence of a gain control mechanism working in concert with the control neurons. Under an inhibitory gating scheme scheme. The finer the resolution desired within the window of attention. Desimone et al. While this extreme is unlikely. on the other hand.. and possibly others. Since the output of a pyramidal cell may branch to several cortical areas and make synaptic connections to a multitude of neurons. and briefly outline the unresolved issues that remain as topics for future research. and to expand or shrink as the attentional window is made larger or smaller.. This effect is also predicted by the model. They also found that the V4 cell responded to an unattended stimulus anywhere within its CRF when the animal attended a stimulus outside the CRF. As schematized in Figure 12. We believe the demonstrable computational advantage of dendritic gating mechanisms for visual processing motivates the need to specifically look for such mechanisms experimentally. 1984. 1989. 1990b). other targets of V4. Gallant et al. given the evidence for complex receptive fields in V4 (Desimone and Schein. there exist voltage-gated Ca2+ channels in dendrites (Llinas.. We predict that the optimal spatial frequency for the cell should change as well. These predictions can be tested by giving the animal a task that forces it to attend to a region of a specific size and location. 1992. Evidence that gain control mechanisms indeed exist in visual cortex has been established in previous physiological studies (Ohzawa et al. 1992). Bender and Butter. Desimone et al. While there is substantial evidence linking pulvinar lesions to attentional defects (Rafal and Posner. The most obvious prediction of the dynamic routing circuit model is that the receptive fields of cortical neurons should change their position or size as attention is shifted or resealed. 1972. One possible reason for the apparent sparing of pattern recognition is that the tasks used in these studies generally . Evidence for nonlinear interactions of this type have been reported for synaptic inputs into layer 1 of neocortex (Cauller and Connors. for a discussion of output vs input gating mechanisms. 1984). such as via NMDA receptors. since the pathways between the cell and the unattended stimulus would be effectively disabled in this case (Fig. their results do not address the more specific effects predicted by the model. In this section we discuss these predictions. and to low spatial frequency for a large window of attention. then the cell’s response to the unattended stimulus was substantially attenuated. When no control signals are provided. 12d). 1992). offer a multiplicative-type effect that is suitable for gating information flow through the cortex (see also Koch and Poggio.4712 Olshausen et al. in which case neurons in IT would exhibit the very large receptive fields observed in anesthetized or inattentive animals (Gross et al. We also describe some generalizations of the model. which has been shown to play an important role in normal visual function (Miller et al. In its present simple form. cortical input would be rather weak. Bender. thus. allows the nonlinear computation of many intermediate results (& cJukq) within the postsynaptic membrane. This effect should be especially pronounced in higher cortical areas. Another physiological prediction of the model is that lesions to the pulvinar. the more the control neurons would need to be engaged. The absence of any activity on the control neurons would correspond to all connections being open (the inattentive state). Nelson and Sur. 1976. the control neurons would need to become active only when attention is actively engaged on an object. From a computational viewpoint.Model of Visual Attention and Recognition receptor channel.
The Journal of Neuroscience. since the distribution expect a retrograde of connections injection in the routing comesmore patchy at higher levels (seeFig.----Classical receptive field “ineffective” stimulus “effective” stimulus c. The routing circuit model predicts that the lesionson recognition abilities. The hatched region indicates those connections to the cell that are enabled. (1985) have reported suchan enhancementeffect for neurons in the dorsomedial portion of the pulvinar (which size of the cortical region from which a cell receives its input should increase by about a factor of 2 at each stage in the hierarchy of visual areas in the form pathway. The physiological responsesto be expected from pulvinar neuronsdepend on how they are configuredto gate information flow in the cortex. were very simple. 1992). c. In an inhibitory gating scheme. In an excitatory gating scheme. and it would be interesting to repeat this experiment while reversibly deactivating the pulvinar. would one expect to find enhancedresponses from pulvinar neurons projecting to areasof the cortex within the attentional beam only. but not in the inferior or lateral portion (which is connected to VI-IT). 1991. It is conceivable that such a task could be carried out even when the fidelity of the remapping process hasbeen compromised. and little or no response from pulvinar neurons projecting to those areasof the cortex substantially outside the attentional beam. some evidence in support of this prediction-for While there is example. one would in V4 or IT to result in a patchy distribution in the lower level. 13(11) 4713 a. b. In the nonattentive state. 1991)-more accurate and higher resolution data are circuit be- neededin order to confirm or contradict this prediction.one would expect enhancedresponses from pulvinar neuronsprojecting to areas of the cortex within and immediately surrounding the attentional beam. as this would require the greatest participation from the control neurons in gating out irrelevant information. November 1993. The node in layer V4 indicates the cell under scrutiny. A more rigorous test using stimuli that demand the full spatial resolution capacity of the window of attention would be better suited to test the effect of pulvinar these latter areasmay be due to the fact that the task usedin this experiment was very simple (detecting the dimming of a spot of light). 1986. the cell’s response should decrease substantially since the neural pathways to the effective stimulus are gated out. Attend to effective stimulus Vlb Vlb I. which indeed has beenreported (Fellernan and McClendon. Neuroanatomy. the others are disabled. Pulvinar lesionswould also be expected to diminish the result found by Moran and Desimone (1985). No attention (all connections ape b. 1976). When attending to the effective stimulus.. the cell’s response should be unaltered since the neural pathways to the stimulus are still open. there is no need to gate the cell’s inputs since it is no longer taking part in the process of routing information within the window of attention. . The bounds of the window of attention in each area are shown by the stippled lines. d. Again. The dynamicrouting circuit interpretation of the Moran and Desimone (1985) experiment. The lack of enhancement in (Chalupa et al. Also.. Felleman et al. such as distinguishing a large “N” from a “Z” is connected with PP). Attend outside k------j rI Vlb Figure 12. Attend to ineffectivestimulus d. a. When attending to the ineffective stimulus. 1990. con- nections betweenV4 and IT are more diffuse than connections between Vl and V2 (Van Essenet al. lob). When attending outside the cell’s CRF. all connections will be open and the effective stimulus can excite the cell anywhere within its CRF. DeYoe and Sisola.. Petersenet al. a more appropriate task would be one that fully taxes the capacity of the attentional window.
This prediction is partially supported by the existence of interneurons within the pulvinar (Ogren and Hendrickson. 1990). A more appropriate experiment might be one that tested pattern discrimination ability as a function of the position. For example. There is some evidence for such a mechanism. the model predicts that there should exist lateral inhibitory and excitatory connections within the pulvinar in order to enforce the constraint of preserving spatial relationships within the window of attention.g. James. it is not known whether the pulvinar afferents make synapses with inhibitory interneurons or directly onto the dendrites of pyramidal cells.. Note . once a location has been attended to in the visual field it should be difficult to stay there or immediately revisit the site. or to what extent the reticular nucleus of the thalamus might subserve this role. On the other hand. they studied only two tasks-detecting a nonmoving target among moving distracters. This prediction is most consistent with Remington and Pierce’s (1984) study showing time-invariant shifts of visual attention. Unlike eye saccades. and it has proved to be a powerful computational strategy for object recognition systems (Hinton.100 pixels total) is somewhat lower than ours. various studies have reported evidence for a “zoom lens” model of attention in which the density of processing resources decreases as the size of the attentional window increases (Eriksen and St. and resolution of an object. Also. 1986. 1987). size.. 3) of Vl and to both the input and output layers (3. * Model of Visual Attention and Recognition Another anatomical prediction of the model is that the terminations of pulvinar-cortical projections should be suitably positioned for effective modulation of intercortical synaptic strengths. our present model predicts *that performance would drop off sharply once the spatial frequency content of the stimulus exceeded approximately 15 x 15 cycles per object. 1979). 6). including studies of spatial acuity. 1987. 199 1). We contend that a key disadvantage of such approaches is that information about the effective connection state at any one point in time is not explicitly encoded anywhere in the system. von der Malsburg and Bienenstock. 1983) are in disagreement (but see Eriksen and Murphy.. We speculate that the progression of activity across the control neurons is what underlies one’s perception of motion in such cases. The model also makes some interesting predictions with regard to the dynamics of visual attention. On the other hand. because the control neurons corresponding to that part of the visual field would be transiently inhibited from firing. Rezak and Benevento. In our model. those experiments that have been directed at studying the amount of “resources” allocated during visual attention have largely ignored the issue of spatial resolution. The amount of time that it takes the attentional window to shift from one location to another would be expected to be roughly independent of the distance between locations. Lowe. Ogren and Hendrickson. this information is encoded explicitly in the activities of the control neurons. it would also be possible for the control neurons to warp the reference frame transformation in order to form object representations that are invariant to distortion (e. 1979). During object recognition. In addition. 1987). 1985. 1984). One way that information about connectivity can be utilized is in constraining the active connections between retinal. This constraint is incorporated in our model via the competitive and cooperative interactions among the control neurons (Eq. However. Shulman and Wilson. This prediction shares a basic similarity to Nakayama’s (199 1) “iconic bottleneck” theory. However. 1989) and appear to be inhibited from return (Posner and Cohen. handwritten digits). this is known as the viewpoint consistency constraint.g. In this case. and pattern recognition (Campbell. Finally. Crick. Rather. Comparison with other models Control versus synchronicity. 1976. most of the experiments had display times long enough to permit multiple shifts of attention (although we doubt that this would have been a major contaminating factor in most cases). but it remains to be seen if the axons of projection neurons have collaterals that spread horizontally within the pulvinar. However. Tsal. A number of other models of visual attention and pattern recognition have been proposed that rely on the synchronous firing of neurons in order to change connection strengths (e. Psychophysics.. Rezak and Benevento. 1986. which they conclude to have an upper bound of about 50 bits. in that involuntary attentional fixations tend to be transient (Nakayama and Mackeben. Verghese and Pelli (1992) have attempted to measure the information capacity of the window of attention. in which case information about the particular shape of the object (e. or detecting a nonflashing square among flashing squares-neither of which is well suited for measuring spatial resolution. Van Essen et al. The pulvinar is known to project to the output layers (2. 1979). although his estimate (.. 1984. For example. information about the position and size of an object can be obtained by simply reading out the state of the control neurons. and IT (Benevento and Rezak.g. its slant or style) could also be preserved. if attention were actually to track a stimulus. 1981b. The number of sample nodes in the top layer of our routing circuit is predicated on the notion that the spatial resolution of the window of attention is limited to the equivalent of about 30 x 30 pixels. The 30 x 30 estimate is roughly consistent with several lines of psychophysical evidence. although other studies (e. 1977. Another advantage of having knowledge of the active connection state readily available is that the ensemble of control neurons together form a neural code for the current position and size of the window of attention. which then allows it to be utilized advantageously in a number of ways. for a critical commentary on these and other studies). These synapses are suspected to be excitatory since they are of the asymmetric type (in layers 1 and 2.g. It is interesting to note that Cavanagh (1992) has discovered some forms of visual stimuli that produce a motion percept only when tracked with attention. V4. Crick and Koch. because once a few point-to-point correspondences have been established. Therefore. contrast sensitivity to gratings. However. In machine vision. this constraint drastically reduces the number of degrees of freedom in matching points between the retinal and object-centered reference frames.and object-based reference frames to be in accordance with a global shift and scale transformation. there is no obvious reason why the control neurons should sequence through all intervening positions of the attentional window. these experiments were not designed to measure spatial resolution explicitly.4714 Olshausen et al. In particular. one problem with this analysis is that the critical data were derived from experiments in which visual attention was not explicitly controlled. the number of potential matches between other pairs of points is greatly reduced. moving the locus of attention would require merely inhibiting the current control state and activating a new one. then one would indeed expect a smooth transition of activity across the control neurons. 4) of extrastriate areas V2.
Ahmad (1992) and Posner et al. but we have more or lessignored the feedback pathways that are known to exist in abundancein the visual cortex. Tsotsos (199 1) and Mozer and Behrmann (1992) have proposed somewhat more abstract connectionist models that utilize gating units to control attention. Vl. 1987) to account for translational invariance in visual object priming (Biederman and Cooper. 13(11) 4715 that such information is typically lost in networks that utilize feature hierarchies of complex cells (Fukushima. including topdown (template-driven) control. for example... to the extent that those areas ofthe brain having access to the control neurons (such as parietal cortex) can directly influence where attention is directed. its implementation would seem to be less straightforward. we believe that the strategy of utilizing explicit control neurons may be a useful computational principle employed by the brain in other domains as well. Here we outline some of the more important unresolved issues that remain as topics for future research. modulate each of these components.. dynamic routing need not necessarilybe restricted to the space domain. the computation Control w. 1985) for forming position-. November 1993. the output of a neuron is computed by forming the inner product of a weight vector. Although these models attempt to explain various psychophysical data.” or by other modalities. Unresolvedissues The dynamic routing circuit as described in this article is intended as a “zero-th order” model. (1993). and then passing the result through a nonlinearity. since it is no longer necessaryto have dedicated. LeCun et al. and c.Gallant et al. 1987. This added degree of flexibility reduces the neural resourcesrequired for solving a complicated task. A different perspective of dynamic control is illustrated in Figure 13. have proposed an interesting solution to controlling a hierarchical shifter circuit based on a series of stages of local.the weightvector maybeableto occupy any region within the circularoutlinein order to optimize the network for the particular input and task at hand. More generally. as mentioned earlier. Pollen et al. A more general way of viewing control. However. Most notably. Control-based network models. 1992). and or distortion-invariant representations. This model shares many similarities to the model presented here. w.. being carried out by the network can be dynamically reconfigured and optimized for-the particular task at hand. winner-take-all circuits. 1988) have proposed control-based models that do preserve spatial relationships within the window of attention and share the same basic principle as the model presented here. Thus. one key neurobiological characteristic neglected in the present model is the known preponderance of feature-selective cells in the visual cortex. This also provides a convenient format for mediating the access to control among various competing demands. and from high-spatial-frequency cells when the window of attention is small. KJ. but it differs in the specifics of the control structure.g. and w. but typically in remains fixed over the relatively short time in which the task is actually performed (e. Niebur et al. 1993). By having control neurons available to modify ++on a short time scale. (1988) among others. Control as a general computational strategy Besides being advantageous for the control of visual attention.How does this affect the routing process?One possiblestrategy. Desimone (1992) LaBerge (1990. The weight vector may change on a slow time scale in order to optimize the network for performing a certain task. none of these models preserve spatial relationships within the window of attention. In most neural network models.that is. (1992) have proposed a neural model based upon the original shifter circuit proposal (Anderson and Van Essen. 1980. Postma et al. 1992). Our model can also explain how attention may be directed “at will.with the inputs to the neuron. is shown. Mumford (1992) has sketched a theory proposing that the role of these feedback pathways is to relay the interpretations of higher cortical areas to lower cortical areasin order to verify the high-level interpretation of . Postma et al. 1989. which we consider to be a critical component of the routing process. Control neurons c.The Journal of Neuroscience. While such forms of top-down control are not impossible to incorporate in models based on synchronicity-gated connections. and V2 and V4 contain cells that seemto be tuned for more complex stimuli (von der Heydt and Peterhans. We have describedhow information can be routed in the feedforward pathways. 1971. In addition. remapping object representations from retinal into object-centered reference frames via a third set of units (equivalent to control in our model). 1978. and as such many details have been neglected or oversimplified. Features instead of pixels. < 1 set). A number of other network models of attention and recognition have also utilized the concept of control neurons for directing information flow. A weightvector with two components. they do not contain the necessary level of neurobiological detail to give them strongly predictive value in biology.g. Feedbackpathways. respectively.. Cavanagh. would be to route information primarily from low-spatial-frequencycellswhen the window of attention is large. scale-. specializednetworks with fixed connectionsto deal with each variation of a task (Van Essenet al. is known to contain cells tuned for various orientations and spatial frequencies. but could work across feature domains as well. to change the weight vector dynamically. Hinton and Lang (1985) and Sandon (1990. 1990) or Fourier transforms (e.. As already noted. 1993). have proposed models that involve the pulvinar as a control site for routing information from a select portion of the visual scene.. v Figure 13.
there is a clear need to devise learning rules for networks with control-like structures. Such a mechanism would obviously be of use for step 4 of our proposed strategy for an autonomous visual system. 1980). respect to C. and EcOnStrain. Appendix: Derivation of Autonomous Control Dynamics Blob search The total energy functional we wish to minimize is E total= Em + P%nstra. 1987). within the window of attention). or three-way interactions. In this case. handwritten characters). (A3) (A4) (A% This has the effect of limiting ck to the interval [0.Model of Visual Attention and Recognition a scene. and EcOnStralnt will also be unbounded and the network will not be guaranteed to converge. How might other salient features-such as pop-out due to motion or texture gradients-be incorporated into the preattentive system? How would the demands among these different saliencies be mediated? Integration across multiple attentional sh$s. We have demonstrated how this feat may be accomplished by model neural circuits that are largely consistent with our current knowledge of neurophysiology and neuroanatomy. . We can ameliorate this problem by letting c. or will suggest where it is wrong and how it might be revised. with yields and so the dynamical equation for u. 11. Taking the derivative of E. Another possible role for information flow in the feedback pathways may be to refine the tuning characteristics of lower-level cortical cells based upon the interpretations made in higher cortical areas (see.. beginning with only roughly appropriate connections. As it stands. the limitation does not present a problem. Learning. depending on the position and size of the attentional window as well as the orientation of the eyes. perceptual stability would be desired in IT. connectivity. Moreover. which may facilitate the routing.. and the control neurons would need to learn how to configure themselves to maintain a stable percept as an attended object moves or changes size on the retina.” In our model.. or measuring the spatial resolution of the window of attention-that may not have been obvious otherwise.-ldc. we obtain =- ' --TJP-jy’ ac. As these experiments are carried out. It is through this combined process of computational modeling and experimentation that we hope to understand how visual attention and recognition are actually implemented in the brain. is thus . dt .’ U(X) = [l + exp(-Xx)]-‘. That is. Under this scenario. “blobs” were the only salient features used to attract the window of attention. = 4%Jr du. Letting c. is that a compact representation of each object is maintained in the form of the activities on a set of neurons within a “scene buffer. since this would just involve another form of routing. and how is information in the retinal reference frame transformed to match this representation? One possibility..” Each attentional fixation would then write its contents into a different part of the buffer. head. it remains to be seen whether such a system can self-organized or fine tune itself with experience. that actually follows the gradient. be a monotonically increasing function of another analog variable. More generally.. follow the gradient of this functional. c. How are three-dimensional objects represented neurally.. for foveated objects the log-polar representation in V 1 would convert rotations into approximate linear shifts on the cortex (Schwartz. Pop-out in multiple dimensions. Our model accounts for how reference frames can be shifted and resealed. c. but since we know a priori that the desired minimum of E. The model suggests several experiments. a-&k+ a-Llnstraint k where rr is a constant determining the rate of gradient descent. the routing circuit would be required to reposition and rescale the object properly so that the interpolation could take place. and computational mechanisms required. A hint as to how this may be accomplished has been described by Foldiak (199 l). is that three-dimensional objects are actually represented by a few characteristic two-dimensional views. who has demonstrated how a complex cell can learn translation invariance using the objective function of “perceptual stability. hence E. is unbounded. Rotation and warp. and 0 is a constant determining the relative contribution of the constraint term..4716 Olshausen et al. and body with respect to the environment (see also Baron. as outlined by Hinton (198 1b). as advanced by Poggio and Edelman (1990).nt> (Al) are where Lob and EcOnstralnt defined in Equations 5 and 6. uk.. Although the model we have presented here is neurobiologically plausible in terms of the number of neurons. In the simple autonomous visual system we have proposed. The ability to rotate or warp reference frames could probably be included in the model without much difficulty. the results will either help to increase our confidence in the model... it would be necessary to route information flow within the feedback pathways as well to ensure that the high-level interpretation is matched against the appropriate region within the cortical area below (i.such as measuring attentional modulation of receptive field position and size. but it does not address rotation and other distortions (e. How are the various “snapshots” obtained by the window of attention incorporated to form an overall percept of a scene? One possibility.e..g. e.. Tsotsos. Concluding remarks In order for us to make sense of the visual world. _ a&a. and that a match to the retinal representation is achieved by interpolating among these views. rather than simple perceptron-type networks with two-way interactions only.g. Three-dimensional objects. and EconSfrainf lies in this range. the brain must be capable of forming object representations that are invariant with respect to the dramatic fluctuations occurring on the retina. 1991).
Butter CM (1987) Comparison of the effects of superior colliculus and pulvinar lesions on visual search and tachistoscopic pattern discrimination in monkeys.. DeYoe EA. New York: Elsevier. MA: MIT Press. Buhmann J. Thomas L. Sisola LC (199 1) Distinct pathways link anatomical sub- .). Vis Neurosci 8:391405. 6412) -dt + + z I 1 I where the time constant r is defined as llqcu. Desimone R (1992) Neural circuits for visual attention in the primate brain. Lades M. pp 199-229. is now du. Hillsdale. Semin Neurosci 2:263-275. Comput Vision Graphics Image Process 37:54-l 15. In: Models of the visual cortex (Rose D. Schein SJ (1987) Visual properties of neurons in area V4 of the macaque: sensitivity to stimulus form. 7278. eds). pp 83-95. eds).. Davis JL. Cooper EE (1992) Evidence for complete translational and reflectional invariance in visual object priming. Crick F. is replaced with I’. In: Single neuron computation (McKenna T. Cold Spring Harbor Symp Quant Biol 55:963-97 1. Desimone R. and speed: functional anatomy by positron emission tomography. NJ: Erlbaum. Bruce C (1984) Stimulus-selective properties of inferior temporal neurons in the macaque.r. Julesz B (1983) Parallel versus serial processing in rapid pattern discrimination. Nature 303:696-698. Biederman I. [The (1984). San Mateo. eds). von der Malsburg C (1990) Size and distortion invariant object recognition by hierarchical graph matching. J Neurophysiol39:354369. Dobmeyer S. Petersen SE (199 1) Selective and divided attention during visual discriminations of shape. Cauller LJ. San Diego.. yields and so the final dynamical c. Science 257: 1563-1565.Jy sP EC. Grossberg S. pp 443-476. pp 343-364. In: Neural networks for vision and image processing (Carpenter GA. which may cause implementation difficulties. pp 55-65. J Neurophysiol46:755-772.zy + q/3 2 T&c. in press. eds). Goldberg ME. Bergen JR. Connor CE. Cambridge. It depending Taking constant O( determines the relative contribution of effect of adding this term is discussed in Hopfield essentially pushes c. Vision Res 14:589-592...Jr + T-l& = 1IL: L: v. Bender DB (1988) Electrophysiological and behavioral experiments on the primate pulvinar. Vo175 (Hicks TP. eds). Lindsley DB (1976) Effect of pulvinar lesions on visual pattern discrimination in monkeys. Exp Brain Res 69: 140-l 54. New York Wiley. Gross C. New York: Elsevier. In: Progress in brain research. equation for c.= J%. Wessinger M. Brain Res 108: l-24. Bender DB. J Cognit Neurosci 2:58-68.. Robinson DL (198 1) Behavioral enhancement of visual responses in monkey cerebral cortex. Chalupa LM. . Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Eur J Neurosci 2:2 1 l-226. Paper presented at the International Joint Conference on Neural Networks. We can convert the integrator to a more biologically plausible leaky integrator by adding to Etota. (A16) Note that this result is just the same as Equation exception that G. Shulman GL. Gattass R. Diamond IT (1990) Organization of the visual sector of the thalamic reticular nucleus in G&go. J Neurosci 4:205 l-2062. replaced by E ITlUllTaking the derivative of Em. Chemiak C (1990) The bounded brain: toward quantitative neuroanatomy.1 the derivative of Eleak with respect to c. Gallant JL. J Neurophysio157: 835-868. Gross CG. MA: Academic. is defined as in Equation 9... Carpenter G. Benevento LA.. Modulation in posterior parietal cortex related to selective visual attention. Hanson SJ. 7-N=~2 2 G. Sot Neurosci Abstr. except with E. Marc RE. eds). Anstis SM (1974) A chart demonstrating variations in acuity with retinal position. is now defined as (AlO) where the Eleah. with respect to c. pp 42w27. Van Essen DC (1987) Shifter circuits: a computational strategy for dynamic aspects of visual processing. I. Cavanagh P (1985) Local log polar frequency analysis in the striate cortex as a basis for size and orientation invariance. Zometzer SF. Morel A (1992) Segregated thalamocortical pathways to inferior parietal and inferotemporal cortex in macaque monkey. Crick F (1984) Function of the thalamic reticular complex: the searchlight hypothesis. eds). Campbell FW (1985) How much of the information falling on the retina reaches the visual cortex and how much is stored in the visual memory? In: Pattern recognition mechanisms (Chagas C. Proc Nat1 Acad Sci USA 8 1:45864590. Silito AM. the term E leak = F The total energy functional j-u-‘(c) dc. Vol 90 (Mize RR. Coyle D. Lippman RP. Baleydier C. Koch C (1990) Towards a neurobiological theory of consciousness. Perception 7:167-177. Bushnell C..0. Douglas RJ. = 4u!J. Anderson CH. Conley M.. Perception 20: 585-593. is (Al3) + PE constraint 4eais + where E. Schneider W (1990) Attentional control of visual perception: cortical and subcortical mechanisms. is thus da. June. Dobson VG. Proc Nat1 Acad Sci USA 84:6297-6301. Benedek G. November 1993. Miezin FM. Martin KAC (1992) GABA-mediated inhibition in the neural networks of visual cortex. Berlin: Springer. pp 85-95. Burt PJ. Rezak M ( 1976) The cortical projections of the inferior pulvinar and adjacent lateral pulvinar in the rhesus monkey: an autoradiographic study. van der Wall GS (1985) Change detection and tracking. Baron RJ (1987) The cerebral computer. Connors BW (1992) Functions of very distal dendrites: experimental and computational studies of layer I synapses on neocortical pyramidal cells. Van Essen DC (1993) Effects of focal attention on receptive field profiles in area V4. on the value of (Yand X. equation for c.The Journal of Neuroscience. Note that Equation Al 3 is just the same as Equation AlO. CA: Kaufmann. yields (Al4) and so the new dynamical Ck = du.. Desimone R. Cavanagh P (1978) Size and position invariance in the visual system. In: Progress in brain research. A12. Corbetta M. Recognition Now the total energy functional E rota. J Neurosci 11:2383-2402. slightly away from 0 and 1. Berman NJ. Cambridge. SPIE Vol579-Intelligent Robots and Computer Vision. with the References Ahmad S (1992) VISIT: a neural model of covert visual attention. Albright TD.r. Anderson CH. In: Advances in neural information processing systems 4 (Moody JE. Desimone R. 13(11) 4717 One remaining problem is that uk must be computed via pure integration. Cavanagh P (1992) Attention-based motion perception. color.
Marr D (1982) Vision. Desimone R (1992) Stimulation of the superior colliculus (SC) shifts the focus of attention in the macaque. Niebur E. Proc Nat1 Acad Sci USA 79: 2554-2558. Science 259:100-103. Percept Psychophys 401225-240. eds). Science 229:782-784. Vancouver. MA: MIT Press. ed). J Cognit Neurosci 2:358-372. Cereb Cortex 1: 147. Neural Comput 3: 194-200. Hopfield JJ (1982) Neural networks and physical systems with emersent collective comnutational abilities. Hubbard W. pp 3 15-345. LaBerge D (1990) Thalamic and cortical mechanisms of attention suggested by recent positron emission tomographic experiments. Desimone R (1985) Selective attention gates visual processing in the extrastriate cortex. Sur M (1992) NMDA recenters in sensory information processing. Ohzawa I. Petersen SE. Mumford D (1992) On the computational architecture of the neocortex. Hopfield JJ (1984) Neurons with graded response have collective comnutational nronerties like those of two-state neurons. J Comp Neurol210:178-290. Cambridge: Cambridge UP. Rocha-Miranda CE. . Rosin C (1993) An oscillation-based model for the neuronal basis of attention. Poggio T (1992) Multiplying with synapses and neurons. Sot Neurosci Abstr 17: 1282. Kanerva P (1988) Sparse distributed memory. Ogren MP. J Neurophysiol 35:96-l 11. Gallant JL. Neural Comput 4:502-5 17. Van Essen DC (199 1) Distributed hierarchical processing in the primate cerebral cortex. Wurtz RH (1972) Activity of superior colliculus in behaving monkey. In: Synaptic organization of the brain (Shepard GM. O’Kusky J. Denker JS. Stryker MP (1989) Visual responses in adult cat visual cortex depend on iV-methyl-o-aspartate receptors. Appl Optics 26: 4985-4992. Gattass R. Hillsdale. Gattass R. Van Essen DC (1993) Selectivity for polar. In: Connectionist approaches to natural language processing (Reilley RG. Ullman S (1985) Shifts in selective visual attention: towards the underlvina neural circuitrv. Visual receptive fields of single neurons. Proc Nat1 Acad Sci USA 81:3088-3092. Carter M. pp 409460. eds). Gross C. Paper presented at the Seventh International Joint Conference on Artificial Intelligence 2. Nature 298:266-268. Duhamel J. Sot Neurosci Abstr 17: 1282. Proc Nat1 Acad Sci USA 86:5183-5187. Eriksen CW. Nakayama K (1991) The iconic bottleneck and the tenuous link between early visual processing and perception. pp 389-438. Eccles JC. Mozer MC. Sot Neurosci Abstr 18:390.4718 Olshausen et al. J Neurosci 10:613-619. Brain Res 300:295-303. Hendrickson AE (1979) The structural organization of the inferior and lateral subdivisions of the Mucucu monkey pulvinar. Science 194:283-287. Boser B. and synapses in the visual cortex (area 17) of adult macaque monkeys. Koch C. I. glia. Neural Comput 4:3 18331. Braun J. Hope B. in press. Howard RE. Martin KAC. eds).. New York: Freeman. Ogren MP. St James JD (1986) Visual attention within and around the field of focal attention: a zoom lens model. Davis JL. Volgushev M. Mackeben M (1989) Sustained and transient compounds of focal visual attention. Canada. Mishkin M (1972) Cortical visual areas and their interactions.C. Los Angeles. II. Sot Neurosci Abstr 18:703. Sousa APB. Hendrickson AE (1977) The distribution of pulvinar terminals in visual areas 17 and 18 of the monkey. Goldberg ME. Curr dpinion Neurobiol 2:484-488. Foldiak P (199 1) Learning invariance from transformation sequences. Morris JD (1987) Contributions of the pulvinar to visual spatial attention. Vision Res 29: 163 1-1647. Gross CG. Bender DB. New York: Oxford UP. eds). Llinas RR (1988) The intrinsic electrophysiological properties of mammalian neurons: insights into central nervous system function. J Neurophysiol 35:542-559. pp 269-339. In: Brain and human behavior (Karczmar AG. Whitteridge D (1988) Selective responses of visual cortical cells do not depend on shunting inhibition. Colby L. Poggio T (1976) Cooperative computation of stereo disparity. NJ: Erlbaum. Hinton GE. Paper presented at the Seventh International Joint Conference on Artificial Intelligence 2. Vision Res. Bender DB (1972) Visual properties of neurons in inferotemporal cortex of the macaque. Neural Comput 1:541-55 1. Marr D. Sclar G. Butter CM -( 1984) Eifects of kainic acid and radiofrequency lesions of the pulvinar on visual discrimination in the monkey. Keys W (1985) Pulvinar nuclei of the behaving rhesus monkey: visual responses and their modulation. Desimone R (1991) Attention-related responses in the superior colliculus of the macaque. Paper presented at the Ninth International Joint Conference on Artificial Intelligence. Robinson DL. J Neurosci 1: 12 18-l 235. Cambridge. Berlin: Springer. Lang KJ (1985) Shape recognition and illusory conjunctions. McClendon E. Mel BW (1992) NMDA-based pattern discrimination in a modeled cortical neuron. Creutzfeldt 0 (1992) A comparison ofdirectional sensitivity with the excitatory and inhibitory field structure in cat striate cortical simple cells. B. . In: Pattern recognition mechanisms (Chagas C. Nagel-Leiby S. Douglas RJ. pp l-20. pp 187208. Motter BC (198 1) The influence of attentive fixation upon the excitability of the light-sensitive neurons of the posterior parietal cortex. Murphy TD (1987) Movement of attention focus across the visual field: a critical look at the evidence. Pei X. Behrmann M (1992) Reading with attentional impairments: a brain-damaged model of neglect and attentional dyslexias. Science 255:90-92. Hinton GE (198 la) A parallel computation that assigns canonical object-based frames of reference. The role of cortico-cortical loons. Biol Cvbem 66:241-25 1. Miller KD. Buchsbaum MS (1990) Positron emission tomographic measurements of pulvinar activity during an attention task. Annu Rev Physiol40:217-248. Martin KAC (1990b) Control of neuronal output by inhibition at the axon initial segment. Petersen SE. Andersen RA. Felleman DJ. Nature 332642-644. Douglas RJ. In: Vision coding and efficiency (Blakemore C. Sharkey NE. B. Sot Neurosci Abstr 17:545. Percep Psychophys 42: 299-305. hyperbolic. In: Human and machine vision (Beck J. Henderson D. Goldberg ME (1992) The updating of the representation of visual space in parietal cortex by intended eye movements. Martin KAC (I 990a) Neocortex. Jackel LD (1990) Backpropagation applied to handwritten Zip code recognition. Canada.. McClendon E (199 1) Modular connections between area V4 and temporal lobe area PITv in macaque monkeys. Gattass R. Lowe DG (1987) The viewpoint consistency constraint. Robinson DL. Hinton GE (198 1b) Shape representation in parallel systems. Chapman B. Freeman RD (1982) Contrast gain control in the cat visual cortex. Hum Neurobiol 4:2 19-227. Cambridge. Felleman DJ. LaBerge D. Nelson SB. and Cartesian gratings in macaque visual cortex. Colonnier M (1982) A laminar analysis of the number of neurons. Eriksen CW. Nakayama K. Koch C. Neural Comput 2:283-292. J Comp Neurol 188:147-178.Model of Visual Attention and Recognition divisions of V4 with V2 and temporal cortex in the macaque monkey. Koch C. New York: Springer. Brown V (1992) A network simulation of thalamic circuit operations in selective attention. Orlando: Academic. Basbaum AI (1978) Brainstem control of spinal paintransmission neurons. J Neurophysiol 54:867-886. LaBerge D. eds). Zometzer SF. pp 41 l-422. Lin K (1992) Modular segregation of visual pathways in occipital and temporal lobe visual areas in the macaque monkey. Science 242~1654-1663. Felleman DJ. Int J Comput Vision 1157-72. Moran J. Brain Res 137:343350. Neuropsychologia 25:97-105. Perception 21 [Suppl 2]:26. Palmer SE (1983) The psychology of perceptual organization: a transformational approach. Douglas RJ. ed). Gattass R. Rosenfeld A. Fukushima K (1987) Neural network model for selective attention in visual pattern recognition and associative recall. Mountcastle VB. Fields HL. LeCun Y. Covey E (1985) Cortical visual areas of the macaque: possible substrates for pattern recognition mechanisms.C. In: Single neuron computation (McKenna T. Biol Cybem 36: 193-202. Vancouver. MA: Academic. Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position.
WA: SPIE. Lau C. Pei X. Lee JR. Sandon PA (1990) Simulating visual attention. eds). Pitts W. Rezak M. Burlingame. Newsome WT. McCulloch WS (1947) How we know universals: the perception of auditory and visual forms. Olshausen BA (1993) Dynamic routing strategies in sensory. Perception 16:89-l 0 1. Cambridge. Raichle ME (1988) Localization of coanitive ooerations in the human brain. Proc Nat1 Acad Sci USA 88:1812-1814. Disordered systems and biological organization (Bienenstock E. Koch C. Petersen SE (1990) The attention system of the human brain. Science 240: 1627-l 63 1. Stanton GB (1978) Parietal association cortex in the primate: sensory mechanisms and behavioral modulations. MA: MIT Press. August. Wilson J (1987) Spatial frequency and selective attention to local and global information. Tsotsos JK (1991) Localizing stimuli in a sensory field using an inhibitory attention beam. Posner MI (1987) Deficits in human visual spatial attention following thalamic lesions. ’ Robinson DL. Posner MI. July. Sandon PA. Sot Neurosci Abstr 18: 148. J Neurosci 9: 173 l-l 763. In: Analysis of visual behavior (Ingle DJ. Felleman DJ. eds). area1 boundaries. Bouwhuis DG. Petersen SE (1992) The oulvinar and visual salience. MacLeod KM (1992) Focal spatial attention suppresses responses of visual neurons in monkey posterior parietal cortex. Anderson CH. Robinson DL. Sot Neurosci Abstr 17:442. Van Essen DC. Pelli DG (1992) The information capacity of visual attention. Paper presented at the 10th Annual Conference ofthe Cognitive Science Society. In: Large scale neuronal theories of the brain (Koch C. DeYoe EA. Montreal. Taylor JH (1971) How does the striate cortex begin the reconstruction of the visual world? Science 173:74-77. Hudson PTW (1992) The gating lattice: a neural substrate for dynamic gating. pp 17-28. Friedrich FJ. Fogelman Solie F. Connor CE. motor. Vol 1473 (Mathur BP. Proc Nat1 Acad Sci USA 84:7349-7353. Posner MI. control of language processes (Bouma H. Witkin AP. Davis JL. Ungerleider LG. Vision Res 32:983-995. Sherman SM. In: Attention and performance. Schwartz EL (1980) Comoutational aeometrv and functional architecture of striate cortex. Olavarria J. Treisman A (1988) Features and objects: the fourteenth Bartlett memorial lecture. Fox PT. Weisbach G. In: Proceedings of the SPIE Conference on Visual Information Processing: from neurons to chips. Percept Psychophys 35: 393-399. Peterhans E (1989) Mechanisms of contour perception in monkey visual cortex. X. Vidyasagar TR. eds). Gallant JL (199 1) Pattern recognition. Gilbert CD (1992) Dynamic changes in receptive-field size in cat primary visual cortex. Rafal RD (1984) Effects of parietal in. Mishkin M (1982) Two cortical visual systems. Exp Brain Res 63: l-20. Creutzfeldt OD (1992) Orientation-selective inhibition in cat visual cortex: an analysis of postsynaptic potentials. Tsal Y (1983) Movements of attention across the visual field. Pollen DA. Paper presented at CNS*92. Trends Neurosci 15: 127-132. Postma EO. Knierim J (1990) Modular and hierarchical organization of extrastriate visual cortex in the macaque monkey. Van Essen DC. Walker JA. Vol F20. Maunsell JHR. eds). Julesz B (199 1) The speed of attentional shifts in the visual field. Q J Exp Psycho1 40A:201-237. Petersen SE. and information bottlenecks in the primate visual system. San Francisco. Volgushev M. Bull Math Biophys 9:127-147. Bixby JL (1986) The projections from striate cortex (V 1) to areas V2 and V3 in the macaque monkey: asymmetries. von der Heydt R. Davis J. J Comp Neurol 244:451480. von der Malsburg C. Cambridge. J Exp Psycho1 [Hum Percept] 9:523-530. Bowman EM. Kass M (1987) Signal matching through scale space. Proc Nat1 Acad Sci USA 89:83668370. Van Essen DC. in press. November 1993. Bienenstock E (1986) Statistical coding and shortterm synaptic-plasticity: a scheme for knowledge representation in the brain. and cognitive processing. Canada. Goldberg ME. Van Essen DC. Cold Spring Harbor Symp Quant Biol55:679696. Pierce L (1984) Moving attention: evidence for timeinvariant shifts of visual selective attention. Cohen Y (1984) Components of visual orienting.jury on covert orienting of attention. J Neurosci 4: 1863-l 874. J Cognit Neurosci 2:213-231. Steinmetz MA. Poggio T. CA. ed). Technical report RBCV-TR-91-37. attention. Vision Res 20:645-669. Saarinen J. Van Essen DC. J Neurophysiol4 1:9 1O-932. In: An introduction to neural and electronic networks (Zometzer SF. Anderson CH ( 1990) Information processing strategies and pathways in the primate retina and visual cortex. University of Toronto. van den Herik HJ. pp 549-586. Posner MI. II. pp 247-272. Hillsdale. New York: Academic. Remington R. Robinson DL. Sillar KT (199 1) Spinal pattern generation and sensory gating mechanisms. and patchy connections. Berlin: Springer. Edelman S (1990) A network that learns to recognize threedimensional objects. pp 531-556. Annu Rev Neurosci 13:25-42. Perception 21 [Suppl 2]:26. MA: MIT Press. Kertzman C (1991) Convert orienting of attention in Macaque. 73(11) 4719 Pettet MW. Shulman GL. Department of Computer Science. Olshausen B. Verghese P. the inferior pulvinar and adjacent lateral pulvinar to primary visual cortex (area 17) in the macaque monkey. Posner MI. Anderson CH. Brain Res 167: 19-40.The Journal of Neuroscience. pp 43-72. Terzopoulos D. Rafal RD. A signal in parietal cortex to disengage attention. Int J Comput Vision 1:133-l 44. Curr Opin Neurobiol 1:583-589. eds). Benevento LA (1979) A comparison of the organization of the projections of the dorsal lateral geniculate nucleus. Uhr LM (1988) An adaptive model for viewpoint-invariant object recognition. . In: NATO AS1 series. NJ: Erlbaum. Science 3431263-266. Koch C (1986) The control of retinogeniculate transmission in the mammalian lateral geniculate nucleus.
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue listening from where you left off, or restart the preview.