
Features to objects Sjoerd Stuit

Elements of vision

2
Cortical visual processing

❖ Retina & LGN: spatial frequency
❖ V1: increasing complexity

3
V1

❖ V1: new
response
properties

4
Orientation tuning

Orientation selectivity can be built up from spatial interactions between different center-surround cells: the neuron responds to co-activation of all these cells by an oriented line/bar.
This produces a Gabor-filter RF structure, with many different combinations of spatial frequency, orientation, and phase.
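The Gabor-like RF and its orientation tuning can be sketched in a few lines of numpy. This is a toy model, not fitted to any data; the size, spatial frequency, and envelope width are arbitrary illustrative choices:

```python
import numpy as np

def gabor_rf(size, sf, theta, phase, sigma):
    """Gabor receptive field: an oriented sinusoid (spatial frequency sf,
    orientation theta, phase) under an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinate
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * sf * xr + phase)

def grating(size, sf, theta, phase=0.0):
    """Full-field sinusoidal grating used as a probe stimulus."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return np.cos(2 * np.pi * sf * (x * np.cos(theta) + y * np.sin(theta)) + phase)

# Orientation tuning curve: linear response of one RF to gratings at its
# preferred spatial frequency but varying orientation.
rf = gabor_rf(size=21, sf=0.1, theta=0.0, phase=0.0, sigma=4.0)
angles = np.deg2rad(np.arange(0, 180, 15))
tuning = [np.sum(rf * grating(21, 0.1, th)) for th in angles]
```

The response peaks at the RF's preferred orientation (0 deg here) and falls off as the grating is rotated away, giving the bell-shaped tuning curve of the next slide.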
Tuning curves

Orientation selectivity

[Fig. 2 from van der Smagt, Wehrhahn & Albright (2005): influence of surround masks on the responses of a V1 neuron to a bright target line. The neuron was sharply tuned to an orientation slightly clockwise from vertical; all surround masks suppressed the response relative to the target alone, and suppression was greatest for masks of the same orientation and contrast polarity as the target. PSTHs show the time course of the suppression, diverging roughly 80-90 ms after stimulus onset.]

6

The icons depict the specific stimulus conditions used. First of all, it is abundantly clear that a surround mask of any variety used suppressed the response to the target [repeated-measures ANOVA (F = 7.40, P < 0.001) with LSD post hoc comparisons, P < 0.01]. Second, consider the effects of a surround mask that was of the same polarity as the target (i.e., bright). A mask oriented similarly to the target yielded suppression (Rnc bar in Fig. 2B) that was significantly stronger than suppression resulting from a mask with differently oriented line segments (Roc bar; P < 0.05). This pronounced orientation-specific masking effect corroborates previous claims (Knierim and van Essen 1992; Li et al. 2000).

Third, turning the tables, consider the effects of polarity for a surround mask that was of the same orientation as the target. A mask with the same polarity as the target yielded suppression (Rnc bar in Fig. 2B) that was significantly greater than that resulting from a mask with a different polarity (Rpc bar; P < 0.05). This novel finding constitutes a contrast-polarity analog of the established orientation-masking phenomenon.

Finally, our experiment allowed us to examine the conjoint effects of orientation and contrast polarity cues. Figure 2B reveals that the degree of response suppression for a double-cue mask (Rdc bar) was no different from the effects of either cue alone (Roc and Rpc bars; P > 0.1 for each comparison). Thus, although masking responses remained suppressed relative to that elicited by the target alone (Rto bar), we saw no evidence for an additional suppression consistent with additive effects of the two cues; on the contrary, the pattern of suppressive effects bears the signature of a highly interactive system. [The remainder of the excerpt, describing the PSTHs in Fig. 2, C and D, is truncated in the extraction.]
J Neurophysiol • VOL 94 • JULY 2005 • www.jn.org
Orientation columns

7
Orientation columns

8
Ocular dominance columns

Each little patch of cortex represents the content of a single receptive field (map location) with many different response properties (columns). Blobs are sensitive to colour as well; this segregation is maintained in V2.
Increasing complexity

❖ V1: complex cells

10

Selectively combining the outputs of several cells gives more complex response preferences (the cell must respond to this and this and this).
A complex cell takes several spatially overlapping inputs; some separation and summation lead to larger receptive fields with more processing. Complex cells respond to both light and dark bars of the same orientation, showing some spatial invariance. Larger receptive fields, but more complex responses: the signature of hierarchical convergent processing.
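The convergence described above is often sketched as an "energy model": a complex cell pools a quadrature pair of Gabor subunits, and squaring-then-summing discards phase. A minimal sketch, not a fitted model; all parameter values are illustrative:

```python
import numpy as np

def gabor(size, sf, theta, phase, sigma):
    """Gabor subunit: oriented sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * sf * xr + phase)

def complex_cell(image, sf, theta, sigma=4.0, size=21):
    """Energy model: linear responses of two subunits 90 deg apart in
    phase, squared and summed. Biologically, the squaring would be
    carried by pairs of half-rectified simple cells of opposite polarity."""
    even = np.sum(image * gabor(size, sf, theta, 0.0, sigma))
    odd = np.sum(image * gabor(size, sf, theta, np.pi / 2, sigma))
    return np.sqrt(even**2 + odd**2)

# A vertical bar: the model responds equally to light and dark bars,
# but only weakly to the orthogonal orientation.
bar = np.zeros((21, 21))
bar[:, 9:12] = 1.0
r_light = complex_cell(bar, sf=0.1, theta=0.0)
r_dark = complex_cell(-bar, sf=0.1, theta=0.0)
r_orth = complex_cell(bar, sf=0.1, theta=np.pi / 2)
```

The light-bar and dark-bar responses are identical (polarity invariance), while the orthogonal bar gives a much smaller response (orientation selectivity is preserved).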
Increasing complexity

❖ Complex cell:
❖ Orientation selective

❖ Not selective for absolute

position
❖ Not selective for ON or OFF

(responds to onset of either


or both)
❖ No response to small spots

or diffuse illumination
❖ increased RF size

11

Here information on spatial position is thrown out


Still represented in input neurons, and still accessible to awareness
Increasing complexity

❖ V1: hypercomplex cells

❖ Combine complex and/or simple cells
❖ End-stopping: length preferences
❖ Increased RF size

12
The aperture problem

❖ Early receptive fields are


rather small.
❖ Limited information:
multiple solutions

13

Early in the visual pathway, RFs are still small. This leads to certain problems.
Integration

❖ Orientation
integration
from outside of
the receptive
field!

14

After orientation detection, structure over the whole image must be analyzed, a much more complex process.
Initially this relies on examining differences between the contents of nearby receptive fields.
Extra-classical RF

from: Bakin et al (2000) 15

Monkey V1/2
Flank facilitation signals stimulus contours. A, It is easier to detect a continuous contour composed of individual elements as the
number of component elements increases. B, Adding collinear flanking lines outside of the RF (dashed line-outlined square)
increases, or facilitates, the response of the neuron to a target stimulus located inside of the cell's RF. However, interrupting the
path of the smooth contour by inserting a bar oriented orthogonal to the path of the contour blocks the flank-induced neural
facilitation (adapted from Kapadia et al., 1995). In this, and all other quantified response plots, the response rate [spikes/second
(s/s)] of the neuron is plotted on the y-axis. C, Quantified responses recorded from a V2 neuron that exhibited flank facilitation
are shown. Placing the orthogonal bar either in the same plane as the flank and target stimulus or in the far depth plane (0.16°
uncrossed disparity) blocked the flank-induced facilitation of the neural response to the target stimulus. However, when the
orthogonal bar was placed in the near depth plane (0.16° crossed disparity; arrow), the flank facilitated the neuron's response to
the target stimulus, suggesting that flank facilitation can signal contours even when they are partially occluded. Fix refers to
plane of fixation; dot in diagram at left is fixation plane.
Extra-classical RF
❖ Interaction of
receptive field
content with
information from
neighbouring
receptive fields
❖ Increases response
to receptive field
content
(facilitation)
❖ Intermediate stage
to further form
processing
16
Not always facilitation

❖ Center-Surround Suppression

17
Extra-classical RF
Similarity needed for integration

❖ Not innate, learned


❖ Develops slowly as we learn
regularities in our
environments: Statistical
learning

18
Statistical Learning

Freeman et al, 2013


19

After V1, it becomes more and more difficult to figure out the response preferences of neurons.
This is a neural-network answer: V2 has learned from the real world that images normally have local correlations in colour, shape, and orientation;
i.e. it responds when inputs have the correlations that are common in the real world.
Scene statistics from the real world: if it fires together, it wires together.
So neural response selectivity gets quite complex very quickly.
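The "fires together, wires together" idea can be sketched with Oja's Hebbian learning rule: a unit fed two correlated inputs ends up tuned to their common component. The data and learning rate below are toy choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "scene statistics": two input channels (e.g. neighbouring feature
# detectors) whose activity is strongly correlated.
n_samples = 5000
shared = rng.normal(size=n_samples)
inputs = np.stack([shared + 0.3 * rng.normal(size=n_samples),
                   shared + 0.3 * rng.normal(size=n_samples)], axis=1)

# Oja's rule: Hebbian growth ("fire together, wire together") plus a
# decay term that keeps the weight vector bounded.
w = rng.normal(size=2)
lr = 0.01
for x in inputs:
    y = w @ x                      # postsynaptic response
    w += lr * y * (x - y * w)      # Hebbian term minus normalising decay

# The unit ends up tuned to the shared, correlated component: w converges
# to the leading eigenvector of the input correlation matrix.
w_unit = w / np.linalg.norm(w)
```

After training, the weight vector points along the correlated direction [1, 1], so the unit has learned the statistical regularity of its inputs rather than any single input.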
Statistical Learning

Freeman et al, 2013


20

Figure 2.
Neuronal responses to naturalistic textures differentiate V2 from V1 in macaques. (a) Time course of firing rate for three single units in V1 (green) and V2 (blue) to images of naturalistic texture (dark) and spectrally matched noise (light). Thickness of lines indicates s.e.m. across texture families. Black horizontal bar indicates the presentation of the stimulus; gray bar indicates the presentation of the subsequent stimulus. (b) Time course of firing rate averaged across neurons in V1 and V2. Each neuron's firing rate was normalized by its maximum before averaging. Thickness of lines indicates s.e.m. across neurons. (c) Modulation index, computed as the difference between the response to naturalistic texture and the response to noise, divided by their sum. Modulation was computed separately for each neuron and texture family, then averaged across all neurons and families. Thickness of blue and green lines indicates s.e.m. across neurons. Thickness of gray shaded region indicates the 2.5th and 97.5th percentiles of the null distribution of modulation expected at each time point due to chance. (d) Firing rates for three single units in V1 (green) and V2 (blue) to naturalistic textures (dark dots) and noise (light dots), separately for the 15 texture families. Families are sorted according to the ranking in panel e. Gray bars connecting points are only for visualization of the differential response. Modulation indices (averaged across texture families) are reported in the upper right of each panel. Error bars indicate s.e.m. across the 15 samples of each texture family. (e) Diversity in modulation across texture families.

Nat Neurosci. Author manuscript; available in PMC 2014 January 01.
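The modulation index from panel (c) of the Freeman et al. figure is simple to compute; a sketch with made-up firing rates (not data from the paper):

```python
import numpy as np

def modulation_index(r_tex, r_noise):
    """(naturalistic - noise) / (naturalistic + noise), per neuron and
    texture family: +1 means texture-only responses, 0 no preference,
    -1 noise-only responses."""
    r_tex = np.asarray(r_tex, dtype=float)
    r_noise = np.asarray(r_noise, dtype=float)
    return (r_tex - r_noise) / (r_tex + r_noise)

# Hypothetical firing rates (spikes/s) for one neuron across four
# texture families: naturalistic texture vs spectrally matched noise.
tex = np.array([30.0, 24.0, 18.0, 12.0])
noise = np.array([20.0, 20.0, 18.0, 16.0])

mi = modulation_index(tex, noise)   # per-family indices
mean_mi = mi.mean()                 # average across families, as in panel c
```

A V2-like neuron yields a clearly positive mean index (it prefers the naturalistic statistics), whereas a V1-like neuron hovers near zero.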
Spatial Interaction at multiple scales
❖ Photo-receptor level (horizontal cells)
❖ On/Off organization of ganglion cells and LGN
❖ On/Off organization of simple cells
❖ End-stopped behavior of hypercomplex cells (orthogonal with respect
to simple-cell interactions)

❖ EXTRA-receptive field interactions in early visual cortex
(e.g. collinear facilitation)
❖ Complex statistical learning in V2

21
Retinotopy

❖ Lateral interactions
require structure
❖ Retinotopic maps:
intact map of the
visual field on the
cortex

22
Visual Maps
❖ We have 30+
visual areas, each
with a complete
visual field map
❖ Different
specialisations
and different
response
properties

23
Beyond early visual cortex

❖ Increasing receptive field


size
❖ Increasingly complex
response properties

24
Beyond early visual cortex

25

In IT, which is part of the ‘what’ pathway, we do find highly specialised neurons: cells that respond only to hands, for example.
How specific does it get, and what do we need for such specificity?
Beyond early visual cortex
❖ Computational problems with
specialized cells for every
object? (i.e. ‘grandmother cells’)

❖ Error prone…
❖ Susceptible to cell death…
❖ What about new (never seen)
objects?
❖ Very uneconomical
❖ How big should our head be….?
Alternative:
Ensemble coding/
Population coding
26
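The ensemble-coding alternative on this slide can be sketched directly: no single unit is sharply selective, yet the population pins down the stimulus, and values between preferred ones are handled for free. The tuning widths and preferred directions below are arbitrary toy choices:

```python
import numpy as np

# A bank of 12 broadly tuned units, preferred directions every 30 deg.
prefs = np.deg2rad(np.arange(0, 360, 30))

def population_response(stim_deg, kappa=2.0):
    """Von Mises tuning: every unit gives a graded response, peaking
    when the stimulus matches its preferred direction."""
    s = np.deg2rad(stim_deg)
    return np.exp(kappa * (np.cos(s - prefs) - 1.0))

def population_vector(rates):
    """Decode by vector averaging: each unit 'votes' for its preferred
    direction, weighted by how strongly it fires."""
    x = np.sum(rates * np.cos(prefs))
    y = np.sum(rates * np.sin(prefs))
    return np.rad2deg(np.arctan2(y, x)) % 360

rates = population_response(75.0)   # 75 deg: no unit prefers it exactly
decoded = population_vector(rates)  # yet the ensemble recovers it
```

This illustrates why population codes sidestep the grandmother-cell problems: losing one unit barely shifts the vector average, and a never-seen stimulus still produces a decodable activity pattern.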
Binding problem

❖ How do we ‘bind’ all these features together to form a


coherent percept?
❖ What does behaviour tell us?
❖ Gestalt: the whole is more than the sum of its parts

27
Gestalt grouping

❖ Any process that builds a


representation of larger-scale
shapes must integrate local
information. On what basis?

❖ Gestalt psychologists formulated a


set of rules or "laws" of perceptual
organization.

28
Gestalt grouping

❖ Symmetry

29
Gestalt grouping
❖ Proximity

30
Gestalt grouping

❖ Similarity

31
Gestalt grouping

❖ Good continuation

32

Good continuation facilitates contours: remember collinear facilitation


Gestalt grouping

❖ Combining cues

33
Gestalt grouping
❖ They probably reflect assumptions about the nature of
real-world surfaces and objects:
❖ Objects are made of cohesive, opaque, uniform
materials (proximity, similarity).
❖ When objects change or move, all their parts change
or move (common fate).
❖ Object contours tend to vary smoothly (good
continuation).

34
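As a toy illustration of the proximity law listed above, dots can be grouped by linking any pair closer than a threshold and taking connected components. This is only a sketch; the radius is a free parameter I chose arbitrarily, which is exactly why proximity is a heuristic rather than a full theory:

```python
import numpy as np

def group_by_proximity(points, radius):
    """Toy Gestalt proximity rule: link any two dots closer than
    `radius`, then return connected components as perceptual groups
    (union-find with path halving)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.hypot(*(points[i] - points[j])) < radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Two clusters of dots, far apart: proximity predicts two groups.
pts = np.array([[0, 0], [1, 0], [0, 1],     # cluster 1
                [10, 10], [11, 10]])        # cluster 2
groups = group_by_proximity(pts, radius=2.0)
```

With the radius at 2.0 the five dots fall into the two intuitive groups; shrink it below 1.0 and every dot becomes its own group, mirroring how grouping depends on relative spacing.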
Object perception

❖ Gestalt grouping tells us something about the rules the
system uses to integrate information across space (and
time) to form objects
❖ But we still need to recognise them!

35
Object perception
❖ detection
❖ Is there an object?

❖ recognition
❖ Is it a mug?

❖ discrimination
❖ My mug versus

your mug
❖ interaction
❖ Pour the milk from

a jug into a mug

36
Object perception

❖ Why is this difficult?

37
Object perception
❖ Stage 1 builds a piecemeal
representation of local image
properties.

❖ Stage 2 builds a representation of


larger-scale shapes and surfaces.

❖ Stage 3 matches shapes and
surfaces with stored object
representations: recognition.

38

Overall plan of processing for object recognition/identification


Object perception

❖ Can we implement
these stages?
❖ Problems?

39

This is one way different views of an object can be associated, or different individual examples
Component shapes are relatively easy to extract, and will generally have the same spatial relationships to each other
Object perception

❖ Same relative features,


very different conclusions

40
Object perception
❖ The problem with object perception:

❖ The fundamental problem in building object representations is


that images of objects reflect a combination of intrinsic factors
and extrinsic factors.

❖ Intrinsic factors define the character of a specific object—its


shape, surfaces, and parts.

❖ Extrinsic factors relate to variation in viewing conditions, such


as position, lighting, and occlusion.

41
Object perception

❖ Solutions:
❖ View-independent theories try to remove extrinsic
variation and build a representation that captures
the intrinsic character of each object.

❖ View-dependent theories do not remove extrinsic


effects, but build in ways of accommodating them.

42
View Independent

❖ Properties of view independent theories:


❖ Objects are represented in terms of a symbolic,
structural description of their component parts,
and the relations between them.

❖ Components are specified using a limited set of


parts descriptors.

43
2 view independent theories

❖ Marr and Nishihara (1978) use 3-D generalised cones as parts


descriptors; their structural description is hierarchical (parts can
be decomposed into parts).

❖ Biederman (1987) uses 2-D representations of "geons" as parts


descriptors; basic geometric shapes such as cylinders, cones, and
blocks. Geons are detected on the basis of certain image
properties such as linearity, parallelism, and curvature.

44
Marr: Computational vision
❖ David Marr was very influential on modern approaches to modelling
vision
❖ Marr 1982. Vision: A computational investigation into the human
representation and processing of visual information

❖ 3 Stages
❖ Primal Sketch: Multi-scale Edge Detection
❖ 2.5D Sketch: Viewer centred Scene Representation
❖ 3D Sketch: Object Centred Representation

45
Marr: Stage 1

❖ Edge detection

46

Primal Sketch: Multi-scale Edge Detection

V1 filters
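The primal-sketch stage can be sketched with Marr and Hildreth's classic recipe: a difference of Gaussians approximates the Laplacian-of-Gaussian filter, and edges are read out as its zero-crossings at more than one scale. The 1.6 sigma ratio is the standard DoG approximation; the image and other numbers are toy choices:

```python
import numpy as np

def gaussian_kernel(sigma):
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur: convolve rows, then columns."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, tmp)

def edges(img, sigma):
    """Marr-Hildreth style: a difference of Gaussians (ratio 1.6)
    approximates the Laplacian of a Gaussian; edges are the
    zero-crossings of the filtered image."""
    dog = blur(img, sigma) - blur(img, 1.6 * sigma)
    zc = np.zeros_like(dog, dtype=bool)
    zc[:-1, :] |= np.signbit(dog[:-1, :]) != np.signbit(dog[1:, :])
    zc[:, :-1] |= np.signbit(dog[:, :-1]) != np.signbit(dog[:, 1:])
    return zc

# A vertical step edge: dark left half, bright right half.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
fine, coarse = edges(img, 1.0), edges(img, 2.0)   # two spatial scales
```

Both scales localise the step between columns 15 and 16; real images differ at the two scales, which is why the primal sketch keeps a multi-scale description.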
Marr: Stage 2
❖ 2.5D, a 3D representation on a 2D surface

47

2.5D Sketch: Viewer centred Scene Representation


Assumptions: surface parsing
In the 2.5D sketch, local 2D info is used to create a 3D representation of the object's parts. Info is integrated to create the 3D cones. The object is represented from the viewpoint of the observer for these computations.
Marr: Stage 3
❖ 3D object centred representation

Human

Arm
Forearm
Hand

48

3D Sketch: Object Centred Representation

Hierarchical over scale.


Very simple cylinder/cone objects
Biederman

❖ Geons

49

A very influential theory.

The geons used do not preserve exact dimensions, just rough shape.
Limitations of view-independent theories
❖ With a limited set of parts descriptors, it is difficult to
make fine discriminations between objects in the same
class.

❖ Some objects may have several possible


decompositions into parts; others may have none at all.

❖ The theories have been difficult to implement


successfully in computer vision systems.
(i.e. find objects in photographic images automatically)

50
View dependent theories

❖ Each object is stored as a small number of discrete


prototypical views.

❖ Recognition of a novel object view is achieved by


comparing it against stored prototypical views
(Bülthoff, Tarr et al.).

51
View dependent theories
❖ Prototypical view:

52
View dependent theories

53
View dependent theories
❖ The geometric structure of objects is usually not explicit
in the representation, making it unsuitable for tasks
requiring this information, such as physical interaction.

❖ Present implementations involve only a relatively small


number of prototypes; it is not clear how well the
theories can deal with a realistically large set of object
prototypes.

❖ How to deal with novel objects?

54
‘Special’ objects
❖ People are very good at reading faces for information
about age, gender, and emotional state

❖ Face-inversion effect: an upside-down (inverted) face is
very difficult to recognize.

55
Are faces objects?

56
Are faces special?

57
How good are we really?
❖ Happy or sad?

58
How good are we really?
❖ Happy or sad?

59
How good are we really?
❖ Discrimination: is this player winning or losing?

60
How good are we really?

❖ Mix and match: what


source do we find
most informative/
reliable?

61
How good are we really?
❖ Judgements based on body, not face

62
How good are we really?

63
More Face Stuff

64

Russell R. (2009). A sex difference in facial contrast and its exaggeration by cosmetics. Perception. 38 (8), 1211-9.
Fusiform Face Area (FFA)

65
FFA stimulation

66
FFA stimulation
❖ Low-level features did not change
❖ ‘You’re still wearing a suit and tie’
❖ ‘Only your face changed, everything else was the same’
❖ Subject has trouble describing the changes
❖ ‘They shifted to a side, and maybe stretched. But they didn’t get larger or smaller.’
❖ ‘It was more of a perception, how I perceived your face’
❖ Recognition corrupted, but activated
❖ ‘You were you, and then you weren’t you’
❖ ‘You had similar eyes to Dr. Parvizi, but you were someone else’
❖ ‘You look like someone I’ve seen before’

67
Faces only?
❖ Greebles are three-
dimensional,
computer-generated
creatures that differ
by gender, family
membership, and
individual identity.

❖ People trained to
identify greebles
show increased
activation of “face”
areas of the brain
when recognizing
new greebles.

68

- Normal subjects without any experience with these objects appear to process Greebles in a part-based fashion comparable to
the way they recognize other non-face objects.
- However, research with Greebles has shown that after people become experts at recognizing individual Greebles, they then
process Greebles in a holistic and configural fashion, similar to the way we tend to treat upright faces (Gauthier & Tarr, 1997;
Gauthier & Tarr, 2002).
- Greeble experts also activate their "fusiform face area" in the brain more than novices do (Gauthier, Tarr, Anderson, Skudlarski
& Gore, 1999): this suggests that this part of the brain may be specialized for faces because of our experience with them, not
because of some innate bias.
Practice makes perfect
❖ The brain’s temporal lobe stream makes synaptic connections
in several areas of the temporal lobe’s lower half.
❖ This is known as the Inferior Temporal (IT) cortex.
❖ Neurons in the IT have complex selectivity and huge
receptive fields.
❖ Practice improves visual recognition.
❖ Visual training is associated with changes in the responses of
IT cortical neurons.

69
Practice makes perfect

❖ from: Messinger et al, PNAS 2001


70

monkey neural recordings, learn associations, cue ball, correct response plane, targets both in receptive field
Practice makes perfect

❖ from: Messinger et al, PNAS 2001


71

Activity-pattern similarity increases over time when associations between objects are learned.
A) ABCD: different stimuli; A paired with B, C with D. Initially different spike rates; over trials the rates become more similar.
B) Linear increase in firing-rate correlation.
C) Both positive and negative changes, but overall more positive.
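The analysis idea behind the Messinger et al. result can be illustrated with made-up numbers: as responses to paired stimuli converge, the across-neuron correlation of their rate patterns rises. Everything below is hypothetical, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical responses of 50 IT neurons to two paired stimuli (A, B).
base_a = rng.normal(10.0, 3.0, size=50)   # spikes/s to stimulus A
base_b = rng.normal(10.0, 3.0, size=50)   # spikes/s to stimulus B

def pattern_correlation(mix):
    """mix = 0: unrelated responses; mix -> 1: responses converge on a
    shared pattern, as if the association were fully learned."""
    mean = (base_a + base_b) / 2
    a = (1 - mix) * base_a + mix * mean
    b = (1 - mix) * base_b + mix * mean
    return np.corrcoef(a, b)[0, 1]   # across-neuron pattern correlation

r_early = pattern_correlation(0.0)   # before learning: near zero
r_late = pattern_correlation(0.8)    # after learning: strongly positive
```

The same correlation measure, computed on real firing rates across trial blocks, is what rises roughly linearly in panel B as the monkey learns the pairings.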
Recap
❖ Retinotopic representations allow for lateral interactions to
integrate information
❖ Gestalt grouping describes the rules we use to integrate
❖ Multiple ways to represent objects: all with their own issues
❖ Specialised processing areas for faces
❖ This may reflect our massive experience with faces
❖ Higher processing areas are well equipped to learn associations
between objects by synchronising activity for different objects

72
