and reverse-contrast Ternus motion properties arise. A key property of the motion-sensitive cells in Layers 4 and 5 of Figure 8.25 is that they are activated by transient cells. As their name suggests, transient cells are activated by changes in inputs, and respond transiently to onsets and offsets of inputs. Given that motion perception is a perception of change, or transiency, it is perhaps not surprising that the brain has developed specialized detectors for sensing such events. Recall the importance of motion transients in discussing percepts like change blindness (Chapter 7).

Transient cells play an important role in explaining how element motion (Figure 8.23) is perceived at a short ISI. As in the discussion of change blindness, there are two types of transient cells to consider: ON-transient cells that are activated by the onset of an input, and OFF-transient cells that are activated by the offset of an input. Recall from the explanation of the traveling wave in Figure 8.16 that the waning activation due to the first flash sums with the waxing activation due to the second flash in the long-range Gaussian filter to cause the G-wave. The model proposes that the waning of activation due to the first flash turns on OFF-transient cells, and the waxing of activation due to the second flash turns on ON-transient cells. The model further predicts that a G-wave can occur when the OFF-transient cell activation adds to the ON-transient cell activation through the Gaussian filter.

With this background, consider how the transient cells behave when the Ternus display is presented with a short ISI. In response to the first frame of the Ternus display (Figure 8.23), the ON-transient cells respond at the locations of all three flashes. Now consider how the transient cells respond to the three flashes in the second frame of the display. Recall that two of the three positions are the same in the two sets of flashes. If the ISI is sufficiently short, then the waning activation to the first flash and the waxing activation to the second flash tend to cancel out at these positions. As a result, neither the OFF-transient cells nor the ON-transient cells can be activated at the shared positions when the second frame occurs, as illustrated by the computer simulation summarized in Figure 8.27 (left column). Only the leftmost flash in the first frame and the rightmost flash in the second frame can substantially activate transient cells at this time. The leftmost flash activates OFF-transient cells, and the rightmost flash activates ON-transient cells. These transients combine in a G-wave that travels from the leftmost flash to the rightmost flash, yielding a percept of element motion. This traveling wave does not "bump into" the shared flash locations because it is caused by summed signals in the long-range directional filter, which activates a processing stage beyond the one where the transient cell responses are computed.

How about group motion? When the ISI is longer, then OFF-transient cells can be activated at all flash positions in Frame 1, and ON-transient cells can be activated at all flash positions in Frame 2 (Figure 8.27, right column). The three flash activations in each frame can then add across space via the Gaussian filter to produce a summed gradient with its maximum in the middle of its three flashes (Figure 8.28, left column). When spatial competition acts on this sum, it selects a single peak of activation that occurs at the middle of the three flashes. This happens to the sum of the OFF-transient cells in response to the first frame, and to the sum of the ON-transient cells in response to the second frame. The group motion percept arises from the traveling wave that occurs between these peaks, as in Figure 8.28 (right column). In summary, the group motion percept can be explained by the same Gaussian long-range directional filter that enables directional responses and long-range apparent motion to occur at all, and the element motion percept calls attention to the importance of transient cells at an earlier motion processing stage.

FIGURE 8.27. The distribution of transients through time at onsets and offsets of Ternus display flashes helps to determine whether element motion or group motion will be perceived.

Why is element motion eliminated when the flashes of the first and second frames have different contrasts relative to the background, as in Figure 8.24? To fix ideas, suppose that the three stimulus dots in Frame 1 are more luminous than their background, whereas the three dots in Frame 2 are less luminous than their background. The explanation is intuitively simple, once one realizes the key
308 | Conscious MIND Resonant BRAIN
role of ON- and OFF-transient cells in the motion filtering process. Onset of Frame 1 activates ON-transient cells, as before. However, onset of Frame 2 activates OFF-transient cells, no matter how small the ISI is chosen, due to the reversal of contrast between Frames 1 and 2. The offset of Frame 1 can therefore activate OFF-transient cells at all dot positions, just as in the case of a large ISI without contrast-reversed frames. All three locations in both frames can therefore influence the formation of a unimodal motion signal, centered at the middle of each grouping of dots, thereby leading to a percept of group motion at ISIs which, in the same-contrast case, led to element motion.

It should also be noted that, when complete boundary groupings from V2 interact with motion cells in MT and MST due to formotion interactions, as in Figure 8.26, then the illusory contours that are formed between the dots also undergo apparent motion, just as in the percept caused by the apparent motion of illusory contours (Figure 8.11). This kind of formotion interaction will also play an important role below in my explanation of how we see objects that are moving while some of their parts may also be moving relative to them, as in the Duncker illusion.

After reading an explanation like this, one might still think: That is all well and good. But how do I know that this explanation, albeit simple and based on elementary mechanisms that are known to occur, is correct? Nothing is certain in science but, as I remarked above, if the same small combination of mechanisms can explain lots of different data properties in a principled and unified way, then one's confidence in them increases accordingly. It would also help to further test these explanations, and to guide

FIGURE 8.28. The Gaussian distributions of activity that arise from the three simultaneous flashes in a Ternus display add to generate a maximum value at their midpoint. The motion of this group gives rise to group motion.

Spatial attention shifts and tracking by the Where cortical stream

Is there evidence to support the hypothesis that spatial attention shifts may sometimes also be controlled by a G-wave? If this were the case, then spatial attention, just like long-range apparent motion, should be able to adapt its speed to the distance between two successive events. The experiments of several investigators, including Roger Remington and Leslie Pierce in 1984, and Ho-Wan Kwak, Dale Dagenbach, and Howard Egeth in 1991, have, in fact, shown that spatial attention can speed up to cover variable distances in equal time (Kwak, Dagenbach, and Egeth, 1991; Remington and Pierce, 1984).

The hypothesis that G-waves may give rise to both apparent motion and spatial attention shifts is also consistent with anatomical evidence. For example, G-waves for apparent motion are predicted to occur within cortical area MT, or at least no later than MST, which is part of the motion cortical processing stream that is labeled magnocellular in Figure 1.21; also see Figure 3.1. The motion stream is, in turn, part of the larger Where cortical processing stream that includes cortical regions MT, MST, and posterior parietal cortex (PPC), as well as their projections to the prefrontal cortex (PFC) (Figure 3.1). Higher processing stages in the Where stream, including the parietal cortex, compute the positions of targets with respect to an observer and direct spatial attention and action towards them (Goodale and Milner, 1992; Mishkin, Ungerleider, and Macko, 1983; Ungerleider and Mishkin, 1982).

Data consistent with the prediction that an interpolative process, such as a G-wave, occurs in the Where stream have been reported from the PPC of monkeys (Albright, 1995; Assad and Maunsell, 1995). In particular, in 1995, John Assad and John Maunsell reported that there are
How We See and Recognize Object Motion | 309
motion-sensitive cells in PPC that respond to motion in image regions where no physical motion occurred, but where motion was implied by the temporary disappearance of the stimulus, as occurs when a moving object is temporarily occluded. This is just what would be expected if these cells were influenced by G-waves that arise either in the parietal cortex itself, or in the MT cortex whose outputs can influence parietal activation in the same processing stream. In 1986, William Newsome, Akichika Mikami, and Robert Wurtz described neurophysiological evidence for apparent motion selectivity in MT for higher speed stimuli in macaque monkeys (Newsome, Mikami, and Wurtz, 1986). In 1998, Rainer Goebel, Darius Khorram-Sefat, Lars Muckli, Hans Hacker, and Wolf Singer (Goebel et al., 1998) provided evidence from echoplanar functional magnetic resonance imaging in humans that MT and MST were the first areas of visual cortex to respond with an increase in signal intensity as compared with flickering control conditions.

In evaluating the hypothesis that long-range apparent motion provides inputs to spatial attention mechanisms, it needs to be kept in mind that a spatially continuous motion signal is generated only under certain spatiotemporal conditions, that the speed of the motion signal is nonuniform in time (e.g., Figure 8.21), and that spatially discrete jumps in activation may occur in cases where continuous motion is not observed; for example, if the distance between "flashes", or intermittent appearances of a target, is too great to be spanned by the spatial scales that subserve the waves (e.g., Figure 8.19). These considerations may help to clarify some of the variability in data about how quickly attention may shift under different conditions, and whether it does so continuously or discretely.

Solving the Aperture Problem: Global capture of object direction and speed

Given this background, we can now outline the Motion BCS explanation of how the motion stream computes an object's direction and speed of motion, even under many of the challenging lighting conditions that occur in the wild. The model also clarifies how attention can be focused on moving objects, and how they may become consciously visible.

The example of a leaping leopard in Figure 8.4 illustrates the fact that, when an object moves under real-world conditions, only a small subset of its image features, such as the object's bounding contours, may generate motion direction cues that accurately describe the object's direction of motion. Despite this fact, object percepts often seem to pop out with a well-defined motion direction and speed. As I noted above, the ambiguity of most local motion signals is, in fact, faced by all motion detectors with a finite receptive field due to the aperture problem. Only when a contour within an aperture contains unambiguous features—such as line terminators, object corners, or high contrast blobs or dots—can a local motion detector accurately measure the direction and speed of motion. The brain is capable of using these feature tracking signals to impose their directions on other parts of an object which are receiving ambiguous motion signals.

How a brain may do this is illustrated by the percept generated by the classical barberpole display shown in Figure 8.5. As I noted above, in this display, lines are seen moving within an invisible rectangular aperture; that is, within an aperture that is the same color as the paper. In this situation, all the motion directions within the interior of the display are ambiguous, as illustrated by the "fan" of possible motion directions within the interior of each line in Figure 8.5, but the unambiguous feature tracking signals that are computed at the line ends that bound the barberpole can propagate to its interior to select, or capture, motion signals that code the same direction, while suppressing signals that code different directions. In particular, the feature tracking signals that are computed at the barberpole's long axis dominate the motion percept. A number of investigators have presented supportive psychophysical data for this viewpoint, including Ellen Hildreth in 1984, and Ken Nakayama and Gerald Silverman in 1988 (Hildreth, 1984; Nakayama and Silverman, 1988a, 1988b).

From ambiguous local motion to correct object motion: Integration and segmentation

The barberpole illusion illustrates how the brain can generate an unambiguous percept of an object's direction and speed of motion even if most of its local motion signals do not code this direction and speed. This illusion illustrates a brain process that is often called motion integration, by which the brain joins spatially separated motion signals into a single object motion percept. Motion integration is complemented by a process of motion segmentation that determines when spatially separate motion signals belong to different objects. To accomplish either process, the visual system uses the relatively few feature tracking signals to capture the same motion direction signal at positions where aperture ambiguity holds, while suppressing other directions. As I will explain below, this motion capture process involves cooperative and competitive interactions across space, and the balance between cooperation
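The aperture problem discussed in this section can be made concrete with a small vector calculation. The sketch below is a generic textbook illustration, not part of the Motion BCS model; the function name and the specific numbers are invented for this example. A detector that sees only a straight contour inside its aperture can recover just the velocity component perpendicular to that contour:

```python
import math

def aperture_reading(true_velocity, contour_orientation_deg):
    """A detector that sees only a straight contour inside its aperture
    can measure just the component of motion perpendicular to that
    contour: the aperture problem. Returns the measured (vx, vy)."""
    vx, vy = true_velocity
    theta = math.radians(contour_orientation_deg)
    nx, ny = -math.sin(theta), math.cos(theta)   # unit normal to the contour
    normal_speed = vx * nx + vy * ny             # projection onto the normal
    return (normal_speed * nx, normal_speed * ny)

# A vertical line (90 degrees) translating rightward is measured veridically,
# because its normal happens to point along the true direction of motion.
veridical = aperture_reading((1.0, 0.0), 90)
# A 45-degree line moving rightward yields an oblique, slower reading:
# one member of the ambiguous "fan" of directions inside the aperture.
oblique = aperture_reading((1.0, 0.0), 45)
```

Only unambiguous features such as line terminators or corners escape this projection, which is why the relatively few feature tracking signals computed at such features must capture the ambiguous interior signals.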
overlapping responses at each speed, determines the perceived speed of an object after it is further processed by the circuits in Figure 8.31. This speed estimate enabled Jon Chey and me to accurately simulate how visual speed perception and discrimination in humans are affected by stimulus contrast, duration, density, and spatial frequency of the moving stimuli.

The fourth design principle concerns how nearby object contours of different contrast polarity and orientation that are moving in the same direction can all pool their signals via the directional long-range filter (Figure 8.31) to generate a pooled motion direction signal, as in Figure 8.1. Recall that this pooling process differentiates the direction-sensitive motion cortical stream from the orientation-sensitive form cortical stream, and is a key difference in their complementary computations. As I have already noted, these long-range directional filter cells are the first true direction-selective cells in the model. They are analogous to the complex cells of the form perception system, as in Figures 4.17 and 4.19. This is the same long-range directional Gaussian filter that generates the G-wave which simulates data about long-range apparent motion, as in Figure 8.16, including beta, gamma, delta, reverse, split, Ternus, and reverse-contrast Ternus motion, and Korte's laws.

Explaining properties of cells in cortical area MT

The Motion BCS model hereby explains how long-range apparent motion and target tracking properties may arise from more basic constraints on generating direction-selective "complex cells" in the motion system. This analysis suggests that the earliest anatomical stage at which the long-range directional filter can occur is cortical area MT in the Where cortical processing stream (Figures 1.21 and 3.1). One important experimental fact about MT cells that is also exhibited by the long-range directional filter is that they each combine inputs from both eyes, as shown in experiments in 1983 by John Maunsell and David Van Essen, and in 1995 by David Bradley, Ning Qian, and Richard Andersen (Bradley, Qian, and Andersen, 1995; Maunsell and Van Essen, 1983). The model hypothesis that the long-range directional filter carries the G-wave, and also receives inputs from both eyes, is consistent with data showing that long-range apparent motion occurs with dichoptically presented stimuli. Thus, if Flash 1 is presented to one eye and Flash 2 to the other eye, then long-range apparent motion may still be perceived, as was reported in 1948 by J. A. Gengerelli and in 1968 by Irwin Spigel (Gengerelli, 1948; Spigel, 1968).

This last property is one of the reasons that the original ideas of Gestalt psychologists about apparent motion being due to a "field theory" were rejected. It was not clear what sort of continuous field could react to different stimuli to the two eyes. A binocular long-range directional filter has no problem with these data, since it can generate a G-wave from dichoptically presented stimuli because inputs from both eyes activate the directional long-range filter (Figure 8.31).

Other data about the properties of MT cells are also consistent with this model prediction. For example, Akichika Mikami published neurophysiological data in 1991 that was recorded from macaque monkeys whose MT cells responded to long-range apparent motion (Mikami, 1991). Using a multi-channel biomagnetometer, the human homolog of MT was also shown to respond to long-range apparent motion in 1997 by Yoshiki Kaneoke, Masahiko Bundou, Sachiko Koyama, Hiroyuki Suzuki, and Ryusuke Kakigi (Kaneoke et al., 1997). Thomas Albright and Hillary Rodman published neurophysiological data in 1984 and 1987 showing that most MT cells are directionally selective for a wide variety of stimulus forms (Albright, 1984; Rodman and Albright, 1987), consistent with the model hypothesis that the long-range filter pools over multiple orientations moving in the same direction (Figure 8.31).

In a similar vein, the orientational selectivity of MT neurons appears to be weaker than their directional selectivity, in the sense that movement in the preferred direction will usually evoke a response even from a bar aligned in a non-preferred orientation (Albright, 1984). In addition, John Maunsell and David Van Essen showed in 1983 that responses to oriented bars are transient even for preferred orientations (Maunsell and Van Essen, 1983), consistent with the idea that MT cells are fed by transient cells from lower processing stages (Figure 8.31). These various authors also reported that MT cells exhibit well-defined speed tuning curves (Maunsell and Van Essen, 1983; Rodman and Albright, 1987), consistent with the hypothesis that they come after the multi-scale short-range filter (Figure 8.31). Thus, all the model stages of processing by transient cells, a speed-tuned directional short-range filter, and a long-range directional filter are consistent with neurophysiological data about MT cells.

Solving the Aperture Problem: Cycling bottom-up motion grouping and top-down priming

The fifth design principle concerns how motion capture occurs. The processing stages up to the long-range directional filter create direction-selective cells that can interpolate continuous motion trajectories through spatially discrete appearances, or "flashes", of motion cues.
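The interpolation of spatially discrete flashes by a Gaussian long-range filter can be illustrated with a minimal numerical sketch of a G-wave-like traveling peak. This is an illustration of the summation idea only, not the Motion BCS equations; the flash positions, the Gaussian width, and the linear waxing and waning amplitudes are all arbitrary choices made for this example:

```python
import math

def gwave_peak(t, x1=0.0, x2=10.0, sigma=6.0):
    """Location of the maximum of a waning Gaussian activation centered
    at the first flash plus a waxing one centered at the second, sampled
    on a spatial grid. All numbers are illustrative, not model values."""
    a1, a2 = max(0.0, 1.0 - t), min(1.0, t)   # waning and waxing amplitudes
    xs = [i * 0.1 for i in range(-50, 151)]   # grid spanning both flashes
    activity = [a1 * math.exp(-(x - x1) ** 2 / (2 * sigma ** 2)) +
                a2 * math.exp(-(x - x2) ** 2 / (2 * sigma ** 2))
                for x in xs]
    return xs[activity.index(max(activity))]

# As the first flash wanes and the second waxes, the single summed maximum
# travels continuously from x1 toward x2: a wave-like interpolation between
# two discrete flashes, with no motion signal ever physically presented.
trajectory = [gwave_peak(t / 10.0) for t in range(1, 10)]
```

Because the Gaussians overlap (their separation is less than twice their width), the summed activity has a single peak whose position shifts smoothly as the amplitude balance changes, which is the essence of the interpolation property described above.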
But these stages cannot solve the aperture problem. They can amplify the strengths of feature tracking signals relative to ambiguous motion signals using the directional short-range filter and the spatial and directional competition (Figure 8.31), but they cannot remove the ambiguous motion signals. The key remaining problem is to explain how feature tracking signals can select, or capture, the correct object motion direction from among the large array of ambiguous motion directional signals (see Figure 8.4), while suppressing incorrect motion directions, without distorting or destroying the speed estimate of the selected direction. Thus, motion capture is a process of capturing motion-direction-and-speed. This hypothesis is supported by many psychophysical data showing that direction and speed are selected together, as is vividly illustrated during switches between percepts of incoherent and coherent plaid motion (Figure 8.29), which can be seen in the video https://www.youtube.com/watch?v=WhbZesV2GRk.

A single feedback loop controls motion capture and attentional priming of motion direction. My proposed solution to the global aperture problem leads to a surprising prediction that is consistent with neurophysiological and functional neuroimaging experiments, but has not yet been directly tested. The prediction is that motion capture and attentional priming can both be achieved by the same process; namely, a single long-range grouping and priming network, as in Figure 8.31. Attentional priming is the process whereby recent exposures to a stimulus or to other stimuli that are similar to, or associated with, that stimulus can decrease the time it takes to find the stimulus when it is presented among distracting stimuli. Thus, if I say: "Look for your brother John," you may detect John more quickly in a crowd than if you did not activate the prime. The prediction that motion capture and attentional priming may use the same process is surprising because motion capture is an automatic and preattentive process. Why should this preattentive process also be used by volition to focus, or prime, attention to selectively look for motions in a desired direction, and to amplify any such motions that may occur? In this regard, several authors have shown that attention can focus selectively on a desired motion direction (Groner, Hofer, and Groner, 1986; Sekuler and Ball, 1977; Stelmach and Herdman, 1991; Stelmach, Herdman, and McNeil, 1994).

How can a preattentive process, which typically uses bottom-up circuits that relay signals from peripheral processing stages to ones deeper in the brain, share the same mechanisms as an attentive process, which typically uses top-down circuits that relay signals from deeper in the brain towards the periphery? The Motion BCS model predicts that preattentive motion capture and attentive motion priming are achieved by bottom-up and top-down pathways, respectively, that define a feedback circuit between the same pair of processing stages (Figure 8.31). These processing stages are proposed to be cortical areas MT and MST. More specifically, they occur in a particular substream of MT, called MT minus (MT−), and of MST, called ventral MST (MSTv). This prediction is consistent with the fact that MSTv has large directionally-tuned receptive fields that are specialized for detecting moving objects, as was reported in 1993 by K. Tanaka, Y. Sugita, M. Moriya, and H. Saito (Tanaka, Sugita, Moriya, and Saito, 1993).

FIGURE 8.34. The ViSTARS model for visually-based spatial navigation. It uses the Motion BCS as a front end and feeds its output signals into two computationally complementary cortical processing streams for computing optic flow and target tracking information.

Complementary motion streams for motion capture and visually-based navigation. Just as MT− and MSTv are specialized to detect and track moving objects, a parallel cortical stream through regions MT+ and dorsal MST, or MSTd, is specialized to carry out visually-based navigation based on optic flow signals (Figure 8.34). Optic flow is the pattern of motion that is generated in an observer's brain when objects in a visual scene move relative to the observer. How optic flow can be used to guide
navigation using the MT−-to-MSTv and MT+-to-MSTd cortical streams will be discussed in Chapter 9. For now, let me just note that these two parallel processing streams use computationally complementary processes. These complementary computations are illustrated in Figure 8.34 by the assertion that additive processing enables the brain to determine the direction of heading, or a navigator's self-motion direction, whereas subtractive processing is used to determine the position, direction, and speed of a moving object. These complementary types of processing enable the computation of an observer's heading while moving relative to a scene, and of an object's movements relative to the observer. This latter information, in turn, can be used to avoid collisions with objects in a scene while moving through it.

Chapter 9 will discuss how navigation based on optic flow signals works. Then Chapter 16 will discuss how navigation based on path integration signals works. Whereas optic flow navigation computations occur in cortical areas MT and MST, among others, path integration navigation computations occur in the entorhinal and hippocampal cortices. Chapter 16 will propose why path integration computations occur there, as part of an analysis which suggests that time and space—or, more specifically, adaptively-timed learning and spatial navigation—both occur in the entorhinal-hippocampal system because they both exploit variations of the same circuit designs. Chapter 16 will also discuss work that remains to be done to unify our understanding of how vision and path integration work together to control navigation under multiple conditions, including in the dark.

controls object attention: This kind of circuit dynamically stabilizes the memory of learned receptive fields. In this case, the receptive fields that are learned are directional grouping cells in MSTv. Directional grouping cells are thus a kind of directional recognition category within an ART feedback circuit.

ART again: Dynamically stabilizing learned directional cells also solves the aperture problem. By dynamically stabilizing the learning of directional grouping cells in MSTv, the ART Matching Rule also automatically solves the global aperture problem! In other words, evolution did not have to design a specialized solution of the aperture problem. All it had to do was to implement the usual ART feedback dynamics to dynamically stabilize the memory of learned categories, in this case learned directional categories. I will return to this issue below, right after using the MT−-MSTv feedback network to solve the global aperture problem.

The Castet et al. (1993) data in Figure 8.30 about the perceived speeds of moving tilted lines provide a vivid example of how the global aperture problem is solved by this feedback circuit. Recall that the feature tracking signals at the ends of the tilted moving lines are amplified by the directional short-range filter and the competitive stages in Figure 8.31. As a result, these feature tracking signals are the ones that typically win the directional competition in MSTv and feed back their winning directional signals to MT−. Due to the ART Matching Rule, the feature tracking
directions in MT− are selected by these feedback signals, while the other ambiguous motion signals are suppressed (Figure 8.36). Thus, after MT−-MSTv feedback acts in a given region of space, the only cells that remain active at positions that previously computed ambiguous motion directions are those that code the feature tracking direction. All the active MT− cells in this region now code for the feature tracking direction, and cells that had ambiguous signals in that region will no longer be active (Figure 8.36). During the next cycle, the bottom-up signals from MT− near the border of this region can enhance the activities of MSTv cells that are outside the original region, but near it. Then these MSTv cells can win the directional competition at their position, and feed back to MT− the winning direction to again suppress different directional signals. The cycle can iterate in this way and propagate the feature tracking direction across the entire object by selecting directionally-compatible ambiguous motion signals and suppressing cells that code for different directions.

What is remarkable about the propagation across space of the feature tracking direction is that it arises through a feedback cycle that just activates the same cells between MT− and MSTv over and over again. There is no explicit lateral motion mechanism, just the cyclic action of bottom-up and top-down feedback exchanges. The effect, though, is one of a laterally propagating wave of directional selection, as occurs in the percepts and simulations of tilted moving lines (Figures 8.30 and 8.36) that were reported by Castet et al. (1993).

Figure 8.37 summarizes a computer simulation showing how motion capture occurs as a diagonally oriented line moves to the right. The left column shows responses of the directionally-selective transient cells and short-range filter cells. The top row of the right column shows how enhanced rightward-pointing feature tracking signals at the line ends initially coexist with ambiguous aperture signals within the line interior at the directional long-range filter cells. The second and third rows of this column show how the rightward feature tracking direction propagates towards the middle of the line through time, selecting consistent directional signals, while suppressing inconsistent ones, as it does so.

Stable learning unifies preattentive motion capture and attentive directional priming

Such a feedback circuit also helps to understand how motions in the world that automatically attract attention via a bottom-up route interact with volitional top-down attentive motion priming. The bottom-up route activates transient signals, such as those that occur during change blindness (see Chapter 7), and during long-range apparent motion. The ART feedback circuit creates an interface wherein bottom-up automatic attention and top-down volitional attention can cooperate and compete until they reach a consensus. Processes of learning, expectation, and attention are hereby parsimoniously unified within a single circuit. As Chapter 5 noted, these three processes are a subset of the CLEARS design whereby processes of Consciousness, Learning, Expectation, Attention, Resonance, and Synchrony work together.

FIGURE 8.37. Processing stages that transform the transient cell inputs in response to a tilted moving line into a global percept of the object's direction of motion. The orientations of the lines denote the directional preferences of the corresponding cells, whereas line lengths are proportional to cell activities.
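The cyclic, feedback-driven propagation of the feature tracking direction can be caricatured in a few lines of code. The sketch below is schematic: it reduces the MT−-MSTv exchange to a boolean "committed" flag per cell and ignores activities, speeds, and learning, so it should be read as an intuition pump rather than as the published Motion BCS equations:

```python
def motion_capture_cycles(n_cells):
    """Toy version of the cyclic feedback described above: only the two
    line-end cells start with an unambiguous (committed) feature tracking
    direction; on each bottom-up/top-down cycle, any cell adjacent to a
    committed cell wins the directional competition and commits too.
    Returns how many cycles it takes to capture the whole line."""
    committed = [i in (0, n_cells - 1) for i in range(n_cells)]
    cycles = 0
    while not all(committed):
        # The comprehension reads the previous cycle's state, so the
        # captured region grows by one cell per cycle from each end,
        # even though no lateral connections are modeled explicitly.
        committed = [c
                     or committed[max(i - 1, 0)]
                     or committed[min(i + 1, n_cells - 1)]
                     for i, c in enumerate(committed)]
        cycles += 1
    return cycles
```

For example, an 11-cell line is fully captured after 5 cycles, the capture time growing with line length, in the spirit of the gradual propagation seen in the tilted-line simulations summarized in Figure 8.37.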
MSTv. For example, in 1992, Patrick Cavanagh (Cavanagh, 1992) described psychophysical data about what he called an attention-based motion process that he proposed exists in addition to low-level or automatic motion processes. He showed that this attention-based process enables human subjects to make accurate velocity judgments. The Motion BCS model explains these data by showing how the attentive process solves the aperture problem while it selects the correct speeds for the captured motion directions. In 1996, Stefan Treue and John Maunsell (Treue and Maunsell, 1996) published neurophysiological data showing that attention can modulate motion processing in cortical areas MT and MST of macaque monkeys. In 1997, Kathleen O'Craven, Bruce Rosen, Kenneth Kwong, Anne Treisman, and Robert Savoy (O'Craven et al., 1997) showed by using magnetic resonance imaging that attention can modulate the MT/MST complex in humans.

In addition, there is growing evidence that the ART Matching Rule regulates attention during form and motion perception, visual object recognition, auditory streaming, speech perception, and cognitive working memory and list chunking, among other processes, as I have reviewed in several articles (Grossberg, 1999, 2013a, 2017b, 2018). Even within the context of motion processing per se, this hypothesis, when suitably modeled, enabled me in 1997 to publish an article with my PhD student Aijaz Baloch (Baloch and Grossberg, 1997) in which we simulated many data that were collected in psychophysical experiments with humans, including percepts that are called the line motion illusion (Hikosaka, Miyauchi, and Shimojo, 1993a, 1993b), motion induction (Faubert and von Grünau, 1992, 1995; von Grünau and Faubert, 1994; von Grünau, Racette, and Kwas, 1996), and transformational apparent motion (Tse and Cavanagh, 1995; Tse, Cavanagh, and Nakayama, 1996). I will now explain these percepts to add additional experimental evidence for the role of the ART Matching Rule, as well as to articulate various perceptual issues that are of interest in their own right.

Line motion illusion: Attentive or preattentive?

The line motion illusion illustrates how attention can influence the perception of preattentive motion signals. In this percept, which was described in 1993 by Okihide Hikosaka, Satoru Miyauchi, and Shinsuke Shimojo (Hikosaka et al., 1993a, 1993b), a line or bar is presented
318 | Conscious MIND Resonant BRAIN
prediction about how long it takes for the brain to solve the aperture problem (Pack and Born, 2001). They reported that MT neurons initially respond primarily to the component of motion perpendicular to a contour's orientation, which is the expected ambiguous direction response due to the aperture problem. Over a period of approximately 170 milliseconds, the responses of MT cells gradually shifted to code the true feature tracking direction, regardless of orientation, again a much longer time than a feedforward process could explain. Their experiment also tested whether these MT cells controlled the direction of eye movements. Their behavioral data showed that the initial velocity of pursuit eye movements also deviated in a direction perpendicular to local contour orientation.

The Pack and Born data from 2001 are plotted in Figure 8.38 alongside the 1997 simulation that I did with Jonathan Chey about this transition (Chey, Grossberg, and Mingolla, 1997). This simulation also fit the Castet et al. (1993) data on the apparent speed of a tilted line moving to the right (Figure 8.30). As I noted above, Castet et al. displayed each line for 167 milliseconds. The time scale that led to our good fits to those data in Figure 8.30 also led to the perceived direction simulation in Figure 8.38 (right column). In this simulation, the perceived direction gradually shifts from the tilted aperture-ambiguous initial direction towards the correct object motion direction of zero degrees. The time scale in the simulation is shown in arbitrary units. When converted to the time scale of the Castet et al. speed data, this arbitrary time scale corresponds to approximately the 170 millisecond time scale that was reported in the neurophysiological recordings of Pack and Born (2001) that are summarized in Figure 8.38 (left column). The fact that these two time scales so closely agree, especially given that the 1997 simulations of psychophysical data anticipated the 2001 recordings from MT cells, provides additional support for the model hypothesis of how the aperture problem is solved using feedback interactions between MT- and MSTv.

ART Matching Rule explains induced motion

The same modulatory property whereby the ART Matching Rule chooses the feature tracking direction without distorting the speed estimates of the surviving MT cells also clarifies why motion signals do not appear to propagate throughout regions of empty space. Induced motion, which was described by Hans Wallach in 1976 (Wallach, 1976), illustrates this property. During induced motion, when a square outline moves to the right, this motion can make an enclosed stationary dot appear to move to the left. However, the intervening space does not appear to move. Likewise, during motion capture, randomly scintillating dots may be captured by unambiguous motion signals, but empty space is not, as was shown in 1985 by V. S. Ramachandran and V. Inada (Ramachandran and Inada, 1985).

These data properties are consistent with the model prediction that top-down excitatory feedback from MSTv to MT- is directional and modulatory: Top-down, modulatory on-center signals can help to select a motion direction that has already been activated by the bottom-up long-range filter, but cannot, by itself, activate a cell.

Although the top-down, modulatory feedback cannot create a suprathreshold motion signal in empty space, it can do so at the position of the stationary dot, which receives a bottom-up input, thereby generating the percept of induced motion.

An exception to this rule may seem to occur during
long-range apparent motion, where
motion is perceived at positions
between the offset of one flash and
the onset of another flash. However,
this can be explained by a G-wave
of bottom-up activation that trav-
els between the flashes as it propa-
gates across the long-range filter in
MT- before this wave can activate the
circuits that trigger top-down ART
Matching Rule feedback from MSTv.
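The modulatory on-center property of the ART Matching Rule can be sketched in a few lines of code. This is my own minimal illustration, assuming a simple multiplicative gating rule; the function name, the gain parameter, and the numbers are hypothetical, not the model's actual equations:

```python
import numpy as np

def art_match(bottom_up, top_down, gain=1.0):
    """Modulatory on-center matching (illustrative sketch):
    a top-down expectation can enhance a direction that already
    receives bottom-up input, but top-down input alone cannot
    create suprathreshold activity."""
    return bottom_up * (1.0 + gain * top_down)

# Directional cell activities: [left, right, up, down]
bottom_up = np.array([0.0, 0.6, 0.3, 0.0])  # empty space: no 'left' input
top_down  = np.array([1.0, 1.0, 0.0, 0.0])  # expectation primes left and right

out = art_match(bottom_up, top_down)
# 'left' stays at 0: priming empty space creates no motion signal.
# 'right' is enhanced because it already has bottom-up support.
# 'up' is neither primed nor suppressed; it keeps its bottom-up value.
```

Because the rule multiplies rather than adds, a primed position with zero bottom-up input remains silent, which is the property that prevents induced motion from filling the empty space between the square and the dot.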
FIGURE 8.38. The neurophysiological data from MT (left image) confirms the prediction embodied in the simulation of MT (right image) concerning the fact that it takes a long time for MT to compute an object's real direction of motion.

In summary, the feedback exchange of signals between MT- and MSTv can support the multiple tasks of stable learning of directionally-tuned motion-selective cells, preattentive motion capture, directional attentional priming, and conscious perception of moving
objects. The original modeling articles with my students Aijaz Baloch, Julia Berzhanskaya, Jonathan Chey, Praveen Pilly, and Lavanya Viswanathan provide explanations and computer simulations of many more motion percepts (Baloch and Grossberg, 1997; Baloch et al., 1999; Berzhanskaya, Grossberg, and Mingolla, 2007; Chey, Grossberg, and Mingolla, 1997, 1998; Grossberg, Mingolla, and Viswanathan, 2001; Grossberg and Pilly, 2008). Here I will select a few more percepts for explanation that further demonstrate how key design principles of motion perception work.

Barberpole motion

The next level of complexity in simulating the motion of simple forms arises when one or more moving oriented contours are viewed through an aperture. We already know that the motion of a contour is ambiguous when it moves within a circular aperture, due to the aperture problem (Figure 8.5, left image). The barberpole display has a long and a short axis that break the symmetry of the circular aperture (Figure 8.5, right image). If a single diagonal line moves in this rectangular aperture at any time, then it is perceived to move diagonally in the corner regions and horizontally in the long central region. The presence of multiple moving lines tends to lock the percept into a single direction of motion along the length of the aperture. A computer simulation of these effects is summarized in Figure 8.39 (right column), with the top row showing the diagonal direction that is perceived when the line first enters the rectangular aperture from the left, and the bottom row showing the horizontal direction that is perceived when the line is moving in the middle of the rectangular aperture.

There are at least two possible explanations for the perceived diagonal motion in the corner of the barberpole display: It could be that the diagonal motion is the result of a compromise, or average, between the horizontal and vertical feature tracking signals that are present at each end of the line at the corner positions. Alternatively, it could be that the competing feature tracking signals may cancel each other out, leaving the ambiguous motion signals with their propensity to favor the direction normal to the orientation of the line.

A simple way to distinguish between these two explanations is to alter the orientation of the line. According to the first explanation, this should result in, at most, a small shift towards the direction normal to the new orientation. According to the second explanation, such a change in orientation should have little effect until the orientation becomes sufficient to favor one of the feature tracking directions, at which time the motion of the line may be entirely captured by that direction. Informal observations of barberpole displays suggest that the second of these explanations is correct: In barberpole displays containing lines whose orientation is not exactly intermediate between the two aperture edges, the percept in the corner region seems to move in the direction equal to that of the aperture edge whose orientation is most nearly perpendicular to the line.

This result suggests that the diagonal motion seen in the corner of barberpole displays is a result of inconclusive competition between different feature tracking signals, in the sense that no one feature tracking signal is able to gain dominance and propagate to the rest of the moving line. Interference between multiple feature tracking signals is attributed in the model to the grouping filter that occurs from cortical areas MT- to MSTv. The receptive fields of this filter are sufficiently large to overlap several feature tracking signals.

The difficulty of integrating feature tracking signals that code different directions of motion was also demonstrated in psychophysical experiments by Jean Lorenceau and Maggie Shiffrar in 1992 (Lorenceau and Shiffrar, 1992) and again later in 2001 in an article by Lorenceau with David Alais (Lorenceau and Alais, 2001). Lorenceau and Shiffrar studied the integration of motion information between multiple apertures, each of which revealed a portion of a translating diamond shape. One can think of apertures as the result of occlusion of the moving object via a surface with holes of an irregular shape. Under some conditions, observers could perceive the diamond's rigid motion, but under other conditions,

FIGURE 8.39. Simulation of the barberpole illusion direction field at two times. Note that the initial multiple directions due to the feature tracking signals at the contiguous vertical and horizontal sides of the barberpole (upper image) get supplanted by the horizontal direction of the two horizontal sides (lower image).
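The contrast between the two candidate explanations for corner motion, vector averaging versus winner-take-all competition among feature tracking signals, can be sketched numerically. The functions and numbers below are my own illustrative assumptions, not the model's actual equations:

```python
import numpy as np

def vector_average(signals):
    """Explanation 1: the percept is a compromise (average) of the
    competing feature tracking signals."""
    return np.mean(signals, axis=0)

def winner_take_all(signals, strengths):
    """Explanation 2: competition; the strongest signal captures the
    percept. With equal strengths the competition is inconclusive
    (returns None), leaving the ambiguous normal-direction signals
    to dominate, as in the corner of the barberpole display."""
    order = np.argsort(strengths)
    if np.isclose(strengths[order[-1]], strengths[order[-2]]):
        return None  # cancellation: no single winner
    return signals[order[-1]]

horiz = np.array([1.0, 0.0])  # signal along the long aperture edge
vert  = np.array([0.0, 1.0])  # signal along the short aperture edge

avg = vector_average([horiz, vert])                      # diagonal compromise
tie = winner_take_all(np.array([horiz, vert]),
                      np.array([1.0, 1.0]))              # inconclusive
win = winner_take_all(np.array([horiz, vert]),
                      np.array([1.2, 1.0]))              # horizontal captures
```

Changing the line's orientation biases the strengths: averaging predicts a gradual rotation of the perceived direction, while winner-take-all predicts little change until one signal wins outright, matching the informal observations described above.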
How We See and Recognize Object Motion | 321
also asymmetric competition within direction from larger to smaller scales, as illustrated by Figure 8.41 (bottom half). Such an asymmetry between near and far was also used to explain how percepts of occluded objects can get completed behind their occluders, as summarized in Figure 4.51.

How does the asymmetry between near and far abet transparency if two fields of dots are otherwise symmetrically processed by the visual system, as in Figure 8.41 (top half, left image)? Fluctuations within the visual system, whether due to small asymmetries in activation, or to momentary attentional biases, can break the symmetry of activation of opponent directional cells by the two arrays of dots moving in opposite directions. Such an asymmetry can render one direction of motion momentarily more active than another. This asymmetry may be implemented computationally by randomly selecting one of the two active directions of motion (say, rightward motion within a given scale, such as the big scale in Figure 8.41, bottom) inside the foveal region. Then the rightward direction would tend to be enhanced in the near depth. Because the big scale inhibits the rightward directional cells of the small scale at the corresponding position in MSTv, the leftward directional cells would be disinhibited at the further depth. This would begin the process of perceiving two coherently moving sheets of dots moving in opposite directions at different depths.

In order to achieve such a coherent percept, interactions between MT- and MSTv must propagate across space, as they always do to solve the aperture problem; cf. Figure 8.36. In particular, the chosen directions, at the chosen depths, in MSTv would select the corresponding directions and depths due to MSTv-to-MT- attentional signals. Such an operation is consistent with data showing attentional enhancement in MT/MST in both monkeys and humans (O'Craven et al., 1997; Treue and Martinez Trujillo, 1999; Treue and Maunsell, 1996, 1999). As this feedback cycle continues between MT- and MSTv, two sheets of coherently moving dots at different depths will emerge. In the computer simulations of this motion capture process that I published in 2001 with Lavanya Viswanathan, the attentional change was applied only within the selected direction and scale and inside an attentional locus that is 6.25% of the total display area, and propagated outwards from there to envelop the entire display.

(Figure 8.26), in order to process visual signals that are ambiguous in both their form and motion properties. In other words, formotion interactions between the form and motion cortical processing streams are often needed to generate an unambiguous percept of objects moving in depth. The apparent motion of illusory contours that is described in Figure 8.11 illustrates such a formotion interaction. I will now explain another example that emphasizes new properties of how formotion processes interact with figure-ground processes: namely, the chopsticks illusion that Stuart Anstis reported in 1990 (Anstis, 1990). Here, as with all of the examples that I discuss, my conclusions are not restricted to stimuli as simple as chopsticks. These simpler stimuli illustrate general perceptual problems and the mechanisms that can solve them in response to far more complex form and motion stimuli.

In the displays that generate the chopsticks illusion (Figure 8.42), two overlapping diagonal bars of the same luminance (the "chopsticks") move in opposite directions, to the left or to the right. The display contains two kinds of features: the terminators of each line and the intersection of the two lines. The two line terminators of one bar move leftward while the two line terminators of the other bar move rightward. The intersection of the two lines moves upward.

When the lines are viewed behind visible occluders, as in Figure 8.42 (left column, top row), they appear to move together as a welded unit in the upward direction, as in Figure 8.42 (left column, bottom row). When the occluders are made invisible, as in Figure 8.42 (right column, top row), the lines no longer cohere. Instead, they appear to slide one on top of the other in opposite directions, as in