
Attention in Psychology: I Historical Background

  Attention was one of the first concepts to appear in
Psychology texts (ca 1730) – e.g., Ebbinghaus, Titchener, …
 Early discussions (Hatfield, 1998) focused on properties such as
 Narrowing (Aristotle, 4th century BC)
  Active Directing (Lucretius, 1st century BC)
  Involuntary shifts (Augustine of Hippo, 400 AD)
 Clarity (Buridan, 14th century)
 Fixation over time (Descartes, 17th century)
 Effector sensitivity (Descartes)
  All the above phenomena (William James, 1890)
 More recent studies have been concerned with
 The view of attention as selection
 The analysis of attention as a process of resource allocation
 The study of the relation between voluntary and involuntary
control of attention

Attention as Selection
We will concentrate on the Selection or Filtering aspects of
attention. We will ask:
1. Why do we need to select anyway?
 Because our processing capacity is limited?
The Big Question: In what way is it limited? (Miller, 1956)
 We will return to this core question after some preliminaries on
the early study of attention as selection and the filter theory.
2. On what basis do we select? Some alternatives:
 We select according to what is important to us (e.g., affordances)
 We select what can be described physically (i.e., “channels”)
 We select based on what can be encoded without accessing LTM
 We “pick out” things to which we subsequently attach concepts:
i.e., we pick out objects (or regions?)
3. What happens to what we have not selected? A largely
unsolved mystery (though in some cases there are plausible
answers).

Big Question #1: Why do we
need to select information?
Along which dimensions is human information
processing capacity limited?
 Channel capacity: Shannon-Hartley Theorem

 Capacity measured in some sort of “chunks” (Miller)


 Capacity measured in terms of the number of
arguments that can be simultaneously bound to
cognitive routines (Newell)
To what things in the world can the arguments of
visual predicates be bound?
Early studies: Colin Cherry’s
“Cocktail Party Problem”
 What determines how well you can select one
conversation among several? Why are we so good at it?
 The more controlled version of this study used dichotic
presentations – one “channel” per ear.
  Cherry found that when attention is fully occupied in
selecting information from one ear (through use of the
“shadowing” task), almost nothing is noticed in the
“rejected” ear (apart from whether or not it was speech).
  More careful observations show this was not quite true
 Change in spectral properties (pitch) is noticed
 You are likely to notice your name spoken
 Even meaning is extracted, as shown by involuntary ear
switching and disambiguating effect of rejected channel content
Visual analogues illustrating the
two-channel selection problem

In these examples you are to read only the
text in shadows and ignore the rest. Read as
quickly as you can and when you are
finished, close your eyes or look away from
the text.
Visual analogue #1 illustrating the
two-channel selection problem

In performing an experiment like this one on
man attention car it house is boy critically hat
important she that candy the old material
horse that tree is pen being phone read cow by
book the hot subject tape for pin the stand
relevant view task sky be read cohesive man
and car grammatically house complete boy but
hat without shoe either candy being horse so
tree easy pen that phone full cow attention
book is hot not tape required pin in stand
order view to sky read red it nor too difficult.
Visual analogue #2 illustrating the
two-channel selection problem

It is important that the subject man be car
pushed slightly boy beyond hat his normal
limits horse of tree competence pen for be only
in phone this cow way book can hot one tape be
pin certain stand that snaps he with is his paying
teeth attention in to the the empty relevant air
task and hat minimal shoe attention candy to
horse the tree second or peripheral task.
Broadbent’s Filter Theory
[Diagram of the model. Components shown: Senses, Very Short Term Store, Filter, Limited Capacity Channel, Store of conditional probabilities of past events (in LTM), Rehearsal loop, Motor planner, Effectors.]

Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.


Problems with the Filter Theory
  The filter “leaks.” Work by Treisman, Lackner, and many
others shows that the filter could not be eliminating parts of the
input using a physically-defined channel, because the properties
on the basis of which the input is filtered require a high level of
processing (e.g., determination of meaning). Consequently such
information must have gotten through the filter!
  Many solutions to this conundrum have been proposed, ranging
from replacing the filter with an attenuator to various complex
(and highly incomplete) proposals such as those of Deutsch &
Deutsch (1963), Norman (1968), Morton (1969), and
Neisser (1967), none of which is satisfactory, but each of which
embodies some ideas that may be part of the story.
 What all these alternatives do is assume that the filter is
responsive to top-down expectancy and prediction effects. But
the evidence is against this sort of knowledge-based selection as a
general property of perception (Pylyshyn, 1999), although it is
possible within such modular domains as language processing.
Stroop Effect
Baseline: Name the colors of the ink




Stroop Effect in Portuguese
Name the colors of the ink
VERMELHO VERDE AZUL MARROM ROSA
ALARANJADO VERDE ROSA VERMELHO AMARELO
VERDE AMARELO VERMELHO MARROM
VERMELHO AZUL MARROM VERDE VERMELHO
ALARANJADO VERMELHO AZUL AMARELO ROSA
ALARANJADO VERDE AZUL MARROM ROSA
VERMELHO AMARELO VERDE AMARELO
VERMELHO MARROM ROSA VERMELHO AMARELO
VERDE AMARELO VERMELHO ROSA ALARANJADO
VERDE AZUL MARROM ROSA VERMELHO
AMARELO VERDE AMARELO VERMELHO MARROM
VERMELHO AZUL MARROM VERDE AMARELO
VERDE AMARELO VERMELHO ROSA ALARANJADO
VERDE VERMELHO AZUL MARROM VERDE
VERMELHO ALARANJADO VERMELHO AZUL
Stroop Effect in English
Name the colors of the ink
RED GREEN BLUE PINK BROWN ORANGE GREEN
PINK RED YELLOW GREEN YELLOW RED BROWN
RED BLUE BROWN GREEN RED ORANGE RED
BLUE YELLOW PINK ORANGE GREEN BLUE
BROWN PINK RED YELLOW GREEN YELLOW RED
BROWN PINK RED YELLOW GREEN YELLOW RED
PINK ORANGE GREEN BLUE BROWN PINK RED
YELLOW GREEN YELLOW RED BROWN RED
BLUE GREEN BROWN YELLOW GREEN YELLOW
RED PINK ORANGE GREEN RED BLUE BROWN
GREEN RED ORANGE RED BLUE YELLOW
YELLOW GREEN YELLOW RED BROWN PINK RED
YELLOW GREEN PINK RED YELLOW
Degree of interference with the attended
message, as well as its interpretation, shows
that the rejected message was understood
  Moral: Although the rejected channel appears to be ignored, it
is being processed enough to understand the words!
  The semantic interpretation of the attended message depends on
the meaning content of the rejected message. Subjects were
asked to paraphrase the attended message in:
– Channel 1 (attended): “I think I will go down to the bank but I will be
back for dinner”
– Channel 2 (rejected): “The election results will depend on the value of
the dollar against the Euro and on the state of the domestic economy”
– OR Channel 2 (rejected): “The rain has resulted in erosion by the
overflowing river”

Lackner, J. R., & Garrett, M. F. (1972). Resolving ambiguity: Effects
of biasing context in the unattended ear. Cognition, 1, 359-372.
Amount of information in terms of the
Information-theoretic measure (entropy)
 Amount of information in a signal depends on how much one’s
estimate of the probability of events is changed by the signal.
$H = -\sum_i p_i \log_2 p_i$ … information in bits
  “One if by land, two if by sea” contains one bit of information if the
two possibilities were equally likely, less if they were not (e.g., if one
was twice as likely as the other the information in the message would
be $-(\tfrac{1}{3}\log_2 \tfrac{1}{3} + \tfrac{2}{3}\log_2 \tfrac{2}{3}) \approx 0.92$ bits)
 The amount of information transmitted depends on the potential
amount of information in the message and the amount of correlation
between message sent and message received. So information
transmitted is a type of correlation measure.
 The information measure assumes an “ideal receiver”. It is the
maximum information that could be transmitted, given the statistical
properties of messages, assuming that the sender and receiver know the
code. This maximum depends on physical properties of the channel –
its Channel Capacity.
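
The entropy measure on this slide can be checked numerically. The following is a minimal sketch, not part of the original lecture; the function name `entropy_bits` is an illustrative choice.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits.

    Events with zero probability contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Two equally likely signals ("by land" vs "by sea").
equal = entropy_bits([0.5, 0.5])
# One signal twice as likely as the other.
skewed = entropy_bits([1 / 3, 2 / 3])
print(f"equal: {equal:.2f} bits, skewed: {skewed:.2f} bits")
```

The skewed case carries less than one bit because the more likely message is less surprising when it arrives.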
Information transmitted in a typical
absolute judgment experiment

  Information transmitted in an experiment in which subjects were presented with
tones drawn from a known practiced set (of a given size, which determines the
value of input information) and had to name the tones from a learned name set.
  The information transmitted was always around 2.5 bits, equivalent to about 6
equiprobable alternatives (2^2.5 ≈ 5.7)!
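
Information transmitted in an absolute judgment task is usually computed from a stimulus-by-response confusion matrix as T = H(stimulus) + H(response) - H(stimulus, response). The sketch below assumes that standard definition; the function name and the two tiny example matrices are made up for illustration and are not data from any experiment.

```python
import math


def transmitted_bits(confusions):
    """Information transmitted, in bits, from a confusion matrix.

    confusions[i][j] is the number of trials on which stimulus i
    received response j.
    """
    total = sum(sum(row) for row in confusions)

    def h(counts):
        # Entropy of the distribution obtained by dividing counts by total.
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    row_totals = [sum(row) for row in confusions]          # stimulus counts
    col_totals = [sum(col) for col in zip(*confusions)]    # response counts
    cells = [c for row in confusions for c in row]         # joint counts
    return h(row_totals) + h(col_totals) - h(cells)


# Hypothetical data: four tones named perfectly, and two tones named at chance.
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
guessing = [[5, 5], [5, 5]]
print(transmitted_bits(perfect), transmitted_bits(guessing))
```

Perfect naming of four equiprobable tones transmits all of the input information, while chance responding transmits none, which is why the measure behaves like a correlation between message sent and message received.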
The channel capacity hypothesis implies that
the amount of information retained in STM is
constant and independent of the type of items
 But it turns out that much more information
is retained when the items are drawn from a
larger set (e.g., more information can be
retained when the input is numerals rather
than binary digits, more for letters,
more for words, etc).
Why can we retain vastly different amounts
of information just by using a different
encoding vocabulary?
• Answer: The architecture of the cognitive system has the
property that it can deal with a fixed maximum number of
items, regardless of what the items are.
• This property can be exploited to get around the bottleneck of
the short-term memory. We do this by recoding the input into
a smaller number of discrete units, called chunks.
• There is also evidence that it takes additional time to encode
and decode chunks, so the recoding technique is a case of
time-capacity tradeoff or what is known in CS as a
compute-vs-store tradeoff.
• Newell has a model of the time taken in the Sternberg memory
scan experiment that attributes the observed RT to encoding or
chunking.
Example of the use of chunking

• To recall a string of binary bits – e.g., 00101110101110110101001
• People can recall a string of about 8 binary digits. If they learn a 2:1
binary encoding rule (00→0, 01→1, 10→2, 11→3) they can recall
about 8 such chunks or 16 binary bits. If they learn a 3:1 chunking rule
(called the Octal number system) they can recall a 24 bit string, etc.
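
The recoding idea can be sketched in a few lines. This is an illustrative example only: the function name `recode` is hypothetical, and padding the string with trailing zeros so that it divides evenly is an arbitrary assumption.

```python
def recode(bits, chunk_size):
    """Recode a binary string into chunks, naming each chunk by its numeric value."""
    # Pad on the right with zeros so the length is a multiple of chunk_size
    # (an arbitrary choice made for this sketch).
    padded = bits + "0" * (-len(bits) % chunk_size)
    return [
        str(int(padded[i:i + chunk_size], 2))
        for i in range(0, len(padded), chunk_size)
    ]


bits = "00101110101110110101001"   # the 23-bit string from the slide
pairs = recode(bits, 2)            # 2:1 rule: each pair of bits becomes one digit
triples = recode(bits, 3)          # 3:1 rule: each triple becomes one octal digit
print(len(bits), len(pairs), len(triples))
print("".join(triples))
```

The same 23 bits become 12 items under the 2:1 rule and 8 items under the 3:1 rule, so a fixed span of about 8 items covers the whole string only after octal recoding.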
Does the evidence support this idea?

Memory span can be greatly increased through chunking! Yet
chunking has also been used to explain things it cannot explain. It is
only explanatory if you have an account of how chunking occurs and
what rules in LTM are being used (and what counts as a chunk).
What does visual attention select?
(What are the bases for selection?)
 If attention is selection, what does visual attention select?
 An obvious answer is places. We can select places by moving
our eyes so our gaze lands on different places.
 When places are selected, are they selected automatically?
 Must we always move our eyes to change what we attend to?
 Studies of Covert Attention-Movement: Posner (1980).
 How does attention switch from one place to another?
  Is it always the case that we attend to places? Can we attend to
any other property? Can we select on the basis of color, depth,
spatial frequency, affordances, or the property a painting has of
having been painted by Da Vinci (a property to which Bernard
Berenson was able to attend extremely well)? Cf. Gibson.

How else can visual attention select?
 Can we control the size and shape of the region that is selected, or
is selection always punctate and data-driven?
 Zoom Lens model of spatial attention (Eriksen & St James, 1986).
 What controls where attention moves:
 Is this automatic or voluntary?
  How do we know where to direct our attention? How do we
specify a location prior to attending to it?
  We need a way to specify where or what prior to attending to it!
Keep this conundrum in mind – we will return to it later!
 How narrowly can we focus our attention? Can we make it pick
out one out of several objects?
 Are there special conditions under which we are able to pick out
individual things? We will return to “attentional resolution” or the
minimum spacing for selecting individual things.
Covert movements of attention

Example of an experiment using a cue-validity paradigm for showing that the
locus of attention moves without eye movements and for estimating its speed.
Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Extension of Posner’s demonstration of attention switch
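[Figure: sequence of display frames labelled Fixation frame, Target-cue, Cue interval, and Detection target. The detection target can appear at the Cued location, at an Uncued location, or at a location Along the path.]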

Does the improved detection in intermediate locations entail that the “spotlight of
attention” moves continuously through empty space?
Exogenous vs endogenous
control of attention
 In the Posner paradigm illustrated in the last slide, attention was automatically
grabbed by the onset of a spot (exogenous attention allocation). Other
experiments showed that this could be done under voluntary (endogenous)
control – e.g., by providing an arrow at fixation indicating what direction to
move attention.
 Posner, Tsal and others showed that when attention goes from A to B,
intermediate locations are maximally sensitive to detecting a signal at
intermediate times.
  Both exogenous and endogenous control produce movement of attention, but
they differ in some of their effects.
 Endogenously moved attention does not lead to Inhibition of Return (we
will turn to this next)
  Endogenously controlled movement does not appear to affect detection
sensitivity, but it does affect discrimination
  Exogenously controlled effects are stronger and appear earlier
 Although the evidence suggests a continuously moving “spotlight” of attention,
there are other models that claim that this is a side-effect of an attentional
activation that fades at the starting place and grows at the target place, creating
an overlap in intermediate locations (Sperling).
We can select a shape even when it is
intertwined among other similar shapes

Are the green items the same? On a surprise test at the
end, subjects were not able to recognize or recall shapes
that had been present but had not appeared in green.
The time-course of attention:
Inhibition of return
 If we vary the time between the cue and target in a modified Posner
paradigm, we find that when the Cue-Target-Onset-Asynchrony
(CTOA) gets to around 300-900 ms, reaction time to the target begins to
increase. This is called Inhibition-of-return (Klein, 2000).
 To get this effect we actually have to attract attention to the target location and
then attract it back to the origin. IOR is one of many examples of an inhibition
effect being produced by attention.
Other examples of attentionally
induced inhibition
  Negative Priming (Treisman & DeSchepper, 1996).

 Is there a figure on the right that is the same as the figure on the left?
 When the figure on the left is one that had appeared as an ignored
figure on the right, RT is long and accuracy poor.
 This “negative priming” effect persisted over 200 intervening trials
and lasted for a month!
Another negative
attention effect:
Inattentional
Blindness
Inattentional Blindness
  The background task is to report which of two arms of the + is
longer. One critical trial per subject, after about 3 or 4 background
trials. Another “critical” trial presented as a divided attention
control.
 25% of subjects failed to see the square when it was presented in
the parafovea (2° from fixation).
 But 65% failed to see it when it was at fixation!
 When the background task cross was made 10% as large,
Inattentional Blindness increased from 25% to 66%.
 It is not known whether this IB is due to concentration of
attention at the primary task, or whether there is inhibition of
outside regions.
In what other ways might our
information capacity be limited?
 We have limitations on the input side that depend on
the acuity of the sensors and the range of physical
properties to which they respond.
 But there is a limitation beyond that of acuity: The
perceptual system is limited in what it can individuate
and how many of these individuals it can deal with at
one time. The capacity to individuate is different from
the capacity to discriminate.
 This notion of individuating and of individuals may be
related to Miller’s “chunks”, but it has a special role in
vision which we will explore in the next lecture
  First, some reasons for thinking that individuating is a
distinct process
Individuating is different from discriminating
Individuating as a distinct process
 Individuating has its own psychometric function: The
minimum distance for individuating is much larger than for
discriminating.
 It may be that in vision our attention is limited in the
number of things we can individuate and simultaneously
access (more on this later). But how do you determine what
counts as a “thing”? See next lecture.
 Individuating is a prerequisite for recognition of patterns
and other properties defined among a number of individual
parts
 An example of how we can easily detect patterns if they are defined
over a small enough number of parts is in subitizing
 Another area where the concept of an individual has become
important is in cognitive development, where it is clear that babies
are sensitive to the numerosity of individual things in a way that is
distinct from their perceptual abilities but is limited in its capacity
Pick out 3 dots and keep track of them

  You can follow instructions to “move one up” or “move 2 right” etc. so
long as at no time do you have to hold on to more than 4 dots
 You can pick out 4 dots and then search through those 4 locations if
all dots change to search items (Burkell & Pylyshyn, 1997)
 You can count up to 4 dots without error (Trick & Pylyshyn, 1994)
 You can keep track of 4 dots through saccades (Irwin, 1996)
 You can detect such basic patterns as inside(dot, contour),
Collinear(x1,x2,x3,x4), or Online(dot, contour) so long as there are a
small number of the relevant arguments to hold on to at one time.
Next: Objects and Attention
Are there collinear items (n>3)?
Several objects must be picked out at
once in making relational judgments
 The same is true for other relational judgments like
inside or on-the-same-contour… etc. We must pick
out the relevant individual objects first.
When items cannot be individuated,
predicates over them cannot be evaluated
 Do these figures contain one or two distinct curves?
 Individuating these curves requires a “curve tracing”
operation, so Number_of_curves (C1, C2, …) takes time
proportional to the length of the shortest curve.
The figure on the left is one continuous
curve, the one on the right is two distinct
curves – as shown in color.
Another example: Subitizing vs Counting.
How many squares are there?
Subitizing is fast, accurate and only slightly
dependent on how many items there are. Only the
squares on the right can be subitized.

Concentric squares cannot be subitized because individuating
them requires curve tracing, just as it did in the spiral example.
Signature subitizing phenomena only appear when objects
are automatically individuated and indexed

Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A
limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
Example of subitizing popout
and non-popout features
(Count Pink vs. Count Online)
What is attention for?
Treisman’s Attention as Glue Hypothesis
 The purpose of visual attention
is to Bind properties together in
order to recognize objects
How are conjunctions of features detected?

Read the vertical line of digits in the following display

Under these conditions Conjunction Errors are very frequent


Rapid visual search (Treisman)
Find the following simple figure in the next slide:
Rapid visual search (conjunction)
Find the following simple figure in the next slide:
Find the unique item in this slide
Serial vs parallel search?
 Finding an object that differs from all others in a scene
by a single feature – called a single-feature search – is
fast, error-free and almost independent of how many
nontargets there are;
 Finding an object that differs from all others by a
conjunction of two or more features (and that shares at
least one feature with each object in the scene) – called a
conjunction search – is usually slow, error-prone, and is
worse the more nontargets there are in the scene*.
 These results suggest that in order to find a conjunction,
which requires solving the binding problem, attention has
to be scanned serially to all objects.

* This way of putting it simplifies things. Under certain
conditions the serial-parallel distinction breaks down.
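
The contrast between the two kinds of search is often summarized as a reaction time that is flat in set size versus one that grows linearly with it. The toy model below is a sketch under simplifying assumptions (target always present, serial scan stops when the target is found); the function name and the constants `base_ms` and `per_item_ms` are made-up illustrative values, not fitted data.

```python
def predicted_rt_ms(set_size, search="serial", base_ms=400.0, per_item_ms=50.0):
    """Toy reaction-time prediction for a target-present display."""
    if search == "parallel":
        # Single-feature search: no dependence on the number of items.
        return base_ms
    # Serial self-terminating scan: on average (n + 1) / 2 items are
    # visited before the target is reached.
    return base_ms + per_item_ms * (set_size + 1) / 2


for n in (4, 8, 16, 32):
    print(n, predicted_rt_ms(n, "parallel"), predicted_rt_ms(n, "serial"))
```

In this sketch the serial slope is half of `per_item_ms` per added item, which is the signature usually looked for in plots of reaction time against display size.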
The attention-as-glue hypothesis has a
converse: In addition to requiring
attention to recognize objects
Attention is primarily directed at Objects
 Instead of being like a spotlight beam that
can be scanned around a scene and can be
zoomed to cover a larger or smaller area,
perhaps attention can only be directed
towards occupied places – i.e., to visual
objects. (This is compatible with both kinds
of attention allocation occurring).
Evidence for attentional selection
based on Objects
 Single Object Advantage: pairs of judgments are
faster when both apply to the same perceived object
 Entire objects acquire enhanced sensitivity from focal
attention to a part of the object
 Single-Object advantage occurs even with
generalized “objects” defined in feature space
  Simultanagnosia and hemispatial neglect show
object-based effects
 Studies with Moving Objects
 IOR
 Object Files
 MOT
Some single-object superiority studies

Duncan (1984) showed that two judgments made about the same object
are faster even when the distances and areas are controlled. He concluded
“Findings support a view in which parallel, preattentive processes
serve to segment the field into separate objects, followed by a process
of focal attention that deals with only 1 object at a time.”
Single-object superiority even
when the shapes are controlled
More controls for the Baylis study…
(Baylis, 1994)
Controls for
separability,
convexity, area…
“Objects” endure over time

  Several studies have shown that what counts as
an object (as the same object) endures over
time and over changes in location;
 Certain forms of disappearances in time and
changes in location preserve objecthood.
 This gives what we have been calling a “visual
object” a real physical-object character and
partly justifies our calling it an “object”.
Inhibition of return appears to be object-
based (as well as location-based)
 Recall that Inhibition-of-return is the phenomenon
whereby an object that has been attended (and
then attention is moved away from it) is less likely
to attract attention again in a period of 300 ms to
900 ms after it is first attended. The attended item
is said to be inhibited.
 This is thought to help in visual search since it prevents
previously visited objects from being revisited
  The original study used static objects. Then
Tipper, Driver & Weaver (1991) showed that IOR
moves with the inhibited object.
Object Based Inhibition of Return
Objects appear to carry their history with them
Object-specific priming of objects and contents
Object File Theory
Kahneman, Treisman & Gibbs (1992)

Letters are faster to read if they appear in the same box where they
appeared initially. Priming travels with the object. According to the theory,
when an object first appears, a file is created for it and the properties of the
object are encoded and subsequently accessed through this object-file.
Visual neglect syndrome is object-based

When a right neglect patient is shown a dumbbell that rotates,
the patient continues to neglect the object that had been on the
right, even though it is now on the left (Behrmann & Tipper, 1999).
Simultanagnosic (Balint Syndrome) patients
only attend to one object at a time

Simultanagnosic patients cannot judge the relative length of two
lines, but they can tell that a figure made by connecting the ends
of the lines is not a rectangle but a trapezoid (Holmes & Horrax, 1919).
Balint patients can only attend to one object
at a time even if they are overlapping

Luria, 1959
End ? (for now)
 Multiple Object Tracking is a methodology
for studying Object-Based attention.
Multiple Object Tracking
 One of the clearest cases illustrating object-based
attention is Multiple Object Tracking
 Keeping track of individual scene objects requires a
mechanism for individuating, selecting, accessing
and maintaining the identity of individuals over time
 These are the functions we have proposed are carried out by
the mechanism of visual indexes (FINSTs)
 We have been using a variety of methods for studying visual
indexing, including subitizing, subset selection for search,
and Multiple Object Tracking (MOT).
Multiple Object Tracking
 In a typical experiment, 8 simple identical objects are
presented on a screen and 4 of them are briefly
distinguished in some visual manner – usually by
flashing them on and off.
 After these 4 “targets” have been briefly identified, all
objects resume their identical appearance and move
randomly. The subjects’ task is to keep track of which
ones had earlier been designated as targets.
 After a period of 5-10 seconds the motion stops and
subjects must indicate, using a mouse, which objects
were the targets.
 People are very good at this task (80%-98% correct).
The question is: How do they do it?
Keep track of the objects that flash
How do we do it? What properties
of individual objects do we use?
Keep track of the objects that flash
How do we do it? What properties
of individual objects do we use?
Explaining Multiple Object Tracking
 Basic finding: People (even 5 year old children, though
not most senior professors!) can track 4 to 5 individual
objects that have no unique visual properties
 How is it done?
 We have shown that it is unlikely that the tracking is done by
keeping a record of target locations, and updating them while
serially visiting the objects.
 I have proposed that individuating and keeping track of certain
kinds of individuals is a primitive visual operation and uses
the mechanism of visual indexes or FINSTs.
 Tracking is preconceptual* and preattentive§
(* § explanation is left for another occasion)
A possible location-based tracking algorithm
1. While the targets are visually distinct, scan
attention to each target in turn and encode its
location on a list.
2. When targets begin to move, check the n’th position
in the list and go to the location encoded there:
Call it Loc(n).
3. Find the closest element to Loc(n).
4. Update the actual location of the element found in
#3 in position n in the list: this becomes the new
value of Loc(n).
5. Move attention to the location encoded in the next
list position, Loc(n+1).
6. Repeat from #3 until elements stop moving.
7. Report elements whose locations are on the list.

Use of the above algorithm assumes (1) focal attention is required
to encode locations (i.e., encoding is not parallel), (2) focal
attention is unitary and has to be scanned continuously from
location to location. It assumes no encoding (or dwell) time at each
Predicted performance for the serial tracking algorithm as
a function of the speed of movement of attention
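
The serial location-updating steps listed above can be sketched as follows. This is a rough illustration, not the simulation behind the predicted-performance figure: the random-walk motion, the rule that exactly one list entry is visited per motion step (a stand-in for the speed of attention), and all names are assumptions made for this example.

```python
import math
import random


def track_serially(objects, n_targets, steps, step_size, rng):
    """Serial location-based tracking; returns indices of the objects reported.

    objects: list of [x, y] positions; the first n_targets are the true targets.
    """
    # Step 1: encode each target's location on a list.
    loc = [tuple(objects[i]) for i in range(n_targets)]
    reported = list(range(n_targets))
    n = 0
    for _ in range(steps):
        # Every object moves a little (random walk, assumed for illustration).
        for obj in objects:
            obj[0] += rng.uniform(-step_size, step_size)
            obj[1] += rng.uniform(-step_size, step_size)
        # Steps 2-3: go to Loc(n) and find the element closest to it.
        nearest = min(range(len(objects)),
                      key=lambda i: math.dist(objects[i], loc[n]))
        # Step 4: store that element's actual location as the new Loc(n).
        loc[n] = tuple(objects[nearest])
        reported[n] = nearest
        # Step 5: move attention to the next list position.
        n = (n + 1) % n_targets
    # Step 7: report the elements whose locations are on the list.
    return reported


rng = random.Random(0)
display = [[float(x), float(y)] for x in (0, 10, 20, 30) for y in (0, 10)]
print(track_serially(display, n_targets=4, steps=200, step_size=0.5, rng=rng))
```

Any reported index of 4 or more is a nontarget that captured a list entry. Increasing `step_size` relative to the spacing between objects should make such captures more likely, which is the kind of speed dependence the serial account predicts.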
If we are not using and updating objects’
locations, then how are we tracking them?
 Since objects are identical, location is the only unique object
property, yet we do not appear to be using locations to track.
  Other ideas of how we track (e.g., that we view objects as vertices
of a deforming polygon), even if in some sense true descriptions,
do not explain how we do it.
 We could be splitting attention, with each attentional beam
moving independently (but if so they act differently from focal
attention – e.g., subjects do not notice properties of targets).
 The explanation we prefer, which is independently motivated, is
that there are a small number of primitive indexes or pointers,
each of which can pick out a particular individual object qua
object and keeps providing access to the object as it changes its
properties and its location.
Additional examples of MOT
 MOT with occlusion
 MOT with virtual occluders
 "Rubber band" displays
Summary of some properties of indexing
revealed by our recent experiments
1. Targets can be tracked even when they disappear
behind an occluder and, under certain conditions,
even when all objects disappear from view (Scholl &
Pylyshyn, 1999; Keane & Pylyshyn, VSS2003). Demo:
MOT with occlusion
2. Properties of targets are not encoded during MOT nor
are they used in tracking. Changes in target
properties are not even noticed (Scholl, Pylyshyn & Franconeri,
1999).

3. Not all well-defined clusters of features can be
tracked: Only ones that correspond to objects (Scholl,
Pylyshyn & Feldman, 2001). Demo: "Rubber band" displays
Summary of some properties of indexing
revealed by our recent experiments
4. Indexes are assigned primarily exogenously
(involuntarily). They can also be assigned
endogenously (voluntarily) but only by moving focal
attention to each target serially (Annan & Pylyshyn, VSS2003).
5. Index maintenance in tracking appears to be non-predictive
and non-attentive (Keane & Pylyshyn, VSS2003; Leonard &
Pylyshyn, VSS2003).

6. Target-target confusions are much more numerous than
target-nontarget confusions. The reason appears to be
that nontargets are inhibited, which may prevent them
from being swapped with targets (Pylyshyn & Leonard, VSS2003).
So what are FINSTs?
  They are a primitive reference mechanism that refers to
individual objects in the world (FINGs?)
 Objects are picked out and referred to without using any
encoding of their properties, including their location.
Picking out objects is prior to encoding their locations!
 Indexing is nonconceptual because it does not represent the
individuals as members of some conceptual category.
 FINSTs serve as visual demonstratives, much like the terms
this or that do in language, by picking out and referring to
individuals without using their properties.
 The central function of FINST indexes is to bind
arguments of visual predicates or of motor commands to
things in the world to which they must refer. Only
predicates with bound arguments can be evaluated.
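
A loose programming analogy may help: an index behaves like a pointer that gives access to an object without storing a description of it, and a predicate can only be evaluated once each of its arguments is bound to such a pointer. The sketch below is only an analogy under that reading; the class `IndexPool`, its methods, the capacity of 4, and the `collinear` test are all hypothetical names and choices made for this example.

```python
class IndexPool:
    """A small fixed pool of pointers to objects (capacity of 4 is assumed)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.bound = {}   # index name -> the object itself, not its properties

    def grab(self, name, obj):
        """Bind an index to an object; fails when the pool is exhausted."""
        if name not in self.bound and len(self.bound) >= self.capacity:
            raise RuntimeError("no free index")
        self.bound[name] = obj

    def evaluate(self, predicate, *names):
        """Evaluate a predicate whose arguments are index names."""
        if any(name not in self.bound for name in names):
            raise LookupError("predicate has an unbound argument")
        # Properties are read through the index only at evaluation time.
        return predicate(*(self.bound[name] for name in names))


def collinear(a, b, c):
    """True if three [x, y] points lie on one line (exact for integers)."""
    return (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0])


pool = IndexPool()
p1, p2, p3 = [0, 0], [1, 1], [2, 2]
for name, obj in (("x1", p1), ("x2", p2), ("x3", p3)):
    pool.grab(name, obj)
print(pool.evaluate(collinear, "x1", "x2", "x3"))
p3[1] = 5   # the object changes location; the index still refers to it
print(pool.evaluate(collinear, "x1", "x2", "x3"))
```

Because the pool stores references rather than copies, the second evaluation sees the object's new location without any re-encoding step, which is the property the analogy is meant to highlight.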
Schema for how FINSTs
function in visual-motor control
The binding hypothesis of the
visual-cognitive bottleneck
 Going back to Newell’s binding hypothesis we are
hypothesizing that the bottleneck between vision
and cognition is in the number of objects that can
be simultaneously bound to the arguments of
cognitive routines
 Another way to put this is that visual cognition
can simultaneously attend to only about 4 objects.
 There is direct evidence for the limit of about 4 visual
objects in visual working memory:
Luck, S. and E. Vogel (1997). "The capacity of visual working
memory for features and conjunctions." Nature 390: 279-281.
Information processing capacity appears to be limited to
7 ± 2 “chunks” rather than to a number of bits or baud

  Experiments on “short term memory” STM
(or “working memory”)
– Miller, G. A. (1956). The magical number seven,
plus or minus two: Some limits on our capacity
for processing information. Psychological Review,
63, 81-97.
 Experiments on the capacity of Visual
Working Memory (Luck & Vogel, 1997)
Studies of the capacity of Visual
Working Memory (Luck & Vogel, 1997)
 People appear to be able to retain about 4
properties of an object (4 colors, 4 shapes, 4
orientations, etc) over a short time
 People can also retain the identity of 4 objects for
a short time.
 Luck and Vogel found that as long as there are
not more than 4 properties per object, people can
retain large numbers of properties (a phenomenon
that is reminiscent of Miller’s “chunking
hypothesis” except the chunks are objects).
