Features To Objects: Sjoerd Stuit
Elements of vision
Cortical visual processing
V1
❖ V1: new response properties
Orientation tuning
Orientation selectivity can be built up from spatial interactions between different center-surround cells: the cell responds to the co-activation of all its inputs by an oriented line or bar.
This produces a Gabor-filter-like RF structure, and V1 contains many different combinations of spatial frequency, orientation and phase.
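The first note can be sketched in a few lines. This is a minimal model, assuming ON-centre cells with difference-of-Gaussians (DoG) receptive fields and a hypothetical downstream cell that simply sums nine of them whose centres lie on a vertical line (the Hubel and Wiesel wiring idea). The DoG widths, sampling grid and bar width are illustrative choices, not values from the lecture. The summed field is an elongated excitatory strip flanked by inhibition, much like the Gabor-shaped RF described above, so the response is largest when the bar is aligned with the row of centres.

```python
import math

def dog(x, y, cx, cy, sigma_c=1.0, sigma_s=2.0):
    """ON-centre difference-of-Gaussians sensitivity at (x, y) for a cell centred at (cx, cy)."""
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    centre = math.exp(-d2 / (2 * sigma_c ** 2)) / (2 * math.pi * sigma_c ** 2)
    surround = math.exp(-d2 / (2 * sigma_s ** 2)) / (2 * math.pi * sigma_s ** 2)
    return centre - surround

def in_bar(x, y, theta, half_width=0.75):
    """True if (x, y) lies inside a bright bar through the origin at angle theta (radians)."""
    nx, ny = -math.sin(theta), math.cos(theta)  # unit normal to the bar's long axis
    return abs(x * nx + y * ny) <= half_width

# Hypothetical wiring: nine ON-centre cells with centres on a vertical line.
CENTRES = [(0.0, float(k)) for k in range(-4, 5)]
STEP = 0.5
GRID = [(i * STEP, j * STEP) for i in range(-24, 25) for j in range(-24, 25)]

def simple_cell(theta):
    """Rectified sum of the DoG responses to a bar at orientation theta."""
    drive = sum(dog(x, y, cx, cy)
                for x, y in GRID if in_bar(x, y, theta)
                for cx, cy in CENTRES)
    return max(drive * STEP * STEP, 0.0)  # STEP^2 = pixel area; negative drive -> no firing

for deg in (0, 30, 60, 90, 120, 150):
    print(deg, round(simple_cell(math.radians(deg)), 3))
```

The printed values trace an orientation tuning curve that peaks at 90°, the orientation of the row of centres, and falls off roughly symmetrically on either side.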
Tuning curves
Orientation selectivity
Excerpt from M. J. van der Smagt, C. Wehrhahn and T. D. Albright, J Neurophysiol 94, July 2005, p. 580 (www.jn.org):
The icons depict the specific stimulus conditions used. First of all, it is abundantly clear that a surround mask of any variety used suppressed the response to the target [repeated measures ANOVA (F = 7.40, P < 0.001) with LSD posthoc comparisons, P < 0.01]. Second, consider the effects of a surround mask that was of the same polarity as the target (i.e., bright). A mask oriented similarly to the target yielded suppression (Rnc bar in Fig. 2B) that was significantly stronger than suppression resulting from a mask with differently oriented line segments (Roc bar; P < 0.05). This pronounced orientation-specific masking effect corroborates previous claims (Knierim and van Essen 1992; Li et al. 2000).
Third, turning the tables, consider the effects of polarity for a surround mask that was of the same orientation as the target. A mask with the same polarity as the target yielded suppression (Rnc bar in Fig. 2B) that was significantly greater than that resulting from a mask with a different polarity (Rpc bar; P < 0.05). This novel finding constitutes a contrast polarity analog of the established orientation-masking phenomenon.
Finally, our experiment allowed us to examine the conjoint effects of orientation and contrast polarity cues. Figure 2B reveals that the degree of response suppression for a double-cue mask (Rdc bar) was no different from the effects of either cue alone (Roc and Rpc bars; P > 0.1 for each comparison). Thus, although masking responses remained suppressed relative to that elicited by the target alone (Rto bar), we saw no evidence for an additional suppression consistent with additive effects of the two cues; the pattern of suppressive effects instead bears the signature of a highly interactive system.
In the same passage, the authors report that an orientation difference reduced mask suppression only when mask and target had the same luminance contrast polarity (Rnc and Roc bars), and a polarity difference reduced it only when they had the same orientation (Rnc and Rpc bars). In the PSTHs (Fig. 2C, D), the cell responded strongly to the target alone (Rto); the sustained component of the response was smaller with a same-polarity mask of different orientation (Roc) and smaller still with a same-orientation mask (Rnc), whereas the responses with different-polarity masks of the same (Rpc) or different (Rdc) orientation were indistinguishable throughout the recording epoch.
Orientation columns
Ocular dominance columns
Each small patch of cortex represents the content of a single receptive field (map location), with many different response properties arranged in columns. The blobs are also sensitive to colour, and this segregation is maintained in V2.
Increasing complexity
Selectively combining the outputs of several cells gives more complex preferences (the cell must respond to this and this and this).
Taking several spatially overlapping inputs, with some separation and summation, leads to larger receptive fields with more processing. Complex cells respond to both light and dark bars of the same orientation, with some spatial invariance: larger receptive fields, but more complex responses. This is the signature of hierarchical convergent processing.
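The light/dark invariance of complex cells is often modelled with the "energy model" (Adelson and Bergen, 1985), a standard textbook model rather than something stated on the slide. The sketch below uses a 1-D retina and illustrative parameters to compare a half-wave-rectified simple cell (an even-symmetric Gabor) with a complex cell that sums the squared outputs of an even and an odd Gabor sharing the same preferred orientation and spatial frequency.

```python
import math

XS = [i / 10 for i in range(-50, 51)]   # 1-D "retina" sampled every 0.1 (arbitrary units)
DX = 0.1

def gabor(x, phase, sigma=1.0, freq=0.25):
    """1-D Gabor profile: Gaussian envelope times a cosine carrier."""
    return math.exp(-x * x / (2 * sigma ** 2)) * math.cos(2 * math.pi * freq * x + phase)

def bar(position, contrast, width=0.6):
    """Light (contrast > 0) or dark (contrast < 0) bar on a mid-grey background (0)."""
    return [contrast if abs(x - position) <= width / 2 + 1e-9 else 0.0 for x in XS]

def linear(stim, phase):
    """Linear receptive-field response: stimulus weighted by the Gabor profile."""
    return sum(gabor(x, phase) * s for x, s in zip(XS, stim)) * DX

def simple_cell(stim):
    """Even-symmetric simple cell: linear filter followed by half-wave rectification."""
    return max(linear(stim, 0.0), 0.0)

def complex_cell(stim):
    """Energy model: sum of squares of a quadrature pair (even and odd Gabor)."""
    return linear(stim, 0.0) ** 2 + linear(stim, -math.pi / 2) ** 2

for label, stim in [("light @0", bar(0.0, 1.0)), ("dark @0", bar(0.0, -1.0)),
                    ("light @1", bar(1.0, 1.0))]:
    print(label, round(simple_cell(stim), 3), round(complex_cell(stim), 3))
```

The simple cell responds to the light bar but not the dark one, and loses most of its response when the bar shifts by a quarter period; the complex cell gives identical responses to light and dark bars and keeps responding after the shift.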
Increasing complexity
❖ Complex cell:
❖ Orientation selective, with some tolerance to position
❖ Not selective for ON or OFF or diffuse illumination
❖ Increased RF size
The aperture problem
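Early in the visual pathway, RFs are still small. This leads to certain problems: a cell that sees only a small patch of a moving edge can measure only the component of motion perpendicular to that edge, not its true direction.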
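A minimal sketch of why small RFs are ambiguous, assuming ideal motion measurements: through an aperture, a straight edge only reveals its speed along the edge normal, so very different true velocities produce the same local signal. The second function shows one standard way out, the intersection-of-constraints idea (not from the slide): combining two differently oriented edges of the same object pins down the 2-D velocity. Function names are illustrative.

```python
import math

def unit_normal(edge_angle):
    """Unit vector perpendicular to an edge whose orientation is edge_angle (radians)."""
    return (-math.sin(edge_angle), math.cos(edge_angle))

def normal_speed(velocity, edge_angle):
    """The only motion signal available through a small aperture: speed along the edge normal."""
    nx, ny = unit_normal(edge_angle)
    return velocity[0] * nx + velocity[1] * ny

def intersect_constraints(angle1, speed1, angle2, speed2):
    """Recover the 2-D velocity from two differently oriented edges (Cramer's rule)."""
    (ax, ay), (bx, by) = unit_normal(angle1), unit_normal(angle2)
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("edges are parallel: velocity remains ambiguous")
    vx = (speed1 * by - ay * speed2) / det
    vy = (ax * speed2 - speed1 * bx) / det
    return vx, vy

# Three different true motions look identical through an aperture on a 45-degree edge.
edge = math.radians(45)
for v in [(1.0, 0.0), (0.0, -1.0), (0.5, -0.5)]:
    print(v, round(normal_speed(v, edge), 3))
```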
Integration
❖ Orientation integration from outside of the receptive field!
After orientation detection, structure over the whole image must be analyzed, a much more complex process.
It initially relies on examining differences between the contents of nearby receptive fields.
Extra-classical RF
Monkey V1/2
Flank facilitation signals stimulus contours. A, It is easier to detect a continuous contour composed of individual elements as the
number of component elements increases. B, Adding collinear flanking lines outside of the RF (dashed line-outlined square) core
increases, or facilitates, the response of the neuron to a target stimulus located inside of the cell's RF. However, interrupting the
path of the smooth contour by inserting a bar oriented orthogonal to the path of the contour blocks the flank-induced neural
facilitation (adapted from Kapadia et al., 1995). In this, and all other quantified response plots, the response rate [spikes/second
(s/s)] of the neuron is plotted on the y-axis. C, Quantified responses recorded from a V2 neuron that exhibited flank facilitation
are shown. Placing the orthogonal bar either in the same plane as the flank and target stimulus or in the far depth plane (0.16°
uncrossed disparity) blocked the flank-induced facilitation of the neural response to the target stimulus. However, when the
orthogonal bar was placed in the near depth plane (0.16° crossed disparity; arrow), the flank facilitated the neuron's response to
the target stimulus, suggesting that flank facilitation can signal contours even when they are partially occluded. Fix refers to
plane of fixation; dot in diagram at left is fixation plane.
Extra-classical RF
❖ Interaction of receptive field content with information from neighbouring receptive fields
❖ Increases response to receptive field content (facilitation)
❖ Intermediate stage to further form processing
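The flank-facilitation results in the figure caption can be summarised as a toy rule. The sketch below is hypothetical throughout (the baseline rate, gain and depth labels are made up for illustration): collinear flanks outside the RF multiply the response to the target, an orthogonal bar between a flank and the RF removes that flank's contribution when it lies in the fixation or far plane, and a bar in the near plane does not, mimicking the occlusion result.

```python
from dataclasses import dataclass
import math

@dataclass
class Flank:
    x: float        # position along the contour axis (deg); the RF centre is at 0
    angle: float    # orientation relative to the target (rad); 0 = collinear

@dataclass
class Occluder:
    x: float        # position of an orthogonal bar along the contour axis
    depth: str      # "near", "fix" (fixation plane) or "far"

def response(flanks=(), occluders=(), target_alone=10.0, gain=0.5):
    """Hypothetical firing rate (spikes/s) to a target in the RF plus flanks outside it."""
    facilitation = 0.0
    for f in flanks:
        interrupted = any(
            min(0.0, f.x) < o.x < max(0.0, f.x)   # occluder lies between flank and RF
            and o.depth != "near"                  # a near occluder hides the contour, not breaks it
            for o in occluders)
        if not interrupted:
            facilitation += gain * math.cos(f.angle) ** 2   # collinear flanks help most
    return target_alone * (1.0 + facilitation)

flanks = [Flank(-2.0, 0.0), Flank(2.0, 0.0)]
print("target alone", response())
print("with flanks", response(flanks))
for depth in ("fix", "far", "near"):
    print("orthogonal bar,", depth, response(flanks, [Occluder(1.0, depth)]))
```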
Not always facilitation
❖ Center-Surround Suppression
Extra-classical RF
Similarity needed for integration
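One way to express "similarity needed for integration" is divisive surround suppression whose strength depends on how similar the surround is to the centre. In this hypothetical sketch the suppression has an untuned part (any surround suppresses) plus a part that applies only when orientation and contrast polarity both match, combined multiplicatively. That choice reproduces the qualitative pattern in the van der Smagt, Wehrhahn and Albright (2005) excerpt earlier in these notes: strongest suppression for the matching mask (Rnc), and weaker, equal suppression for masks that differ in orientation (Roc), polarity (Rpc) or both (Rdc). The weights, and the use of a 90° difference for "different orientation", are illustrative assumptions, not fitted values.

```python
import math

def surround_similarity(d_theta, same_polarity):
    """1 when the surround matches the centre in orientation AND polarity, else close to 0."""
    orientation_match = math.cos(d_theta) ** 2     # 1 = same orientation, 0 = orthogonal
    polarity_match = 1.0 if same_polarity else 0.0
    return orientation_match * polarity_match      # multiplicative: cues interact, not add

def response(drive, surround=None, w_untuned=0.3, w_tuned=0.7):
    """Divisive surround suppression (hypothetical weights); surround = (d_theta, same_polarity)."""
    if surround is None:
        return drive
    s = w_untuned + w_tuned * surround_similarity(*surround)
    return drive / (1.0 + s)

conditions = {
    "Rto (target only)": None,
    "Rnc (same orientation, same polarity)": (0.0, True),
    "Roc (different orientation, same polarity)": (math.pi / 2, True),
    "Rpc (same orientation, opposite polarity)": (0.0, False),
    "Rdc (different orientation, opposite polarity)": (math.pi / 2, False),
}
for name, surround in conditions.items():
    print(f"{name}: {response(10.0, surround):.2f}")
```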
Statistical Learning
After V1 it becomes more and more difficult to figure out the response preferences of neurons.
This is a neural network answer: V2 has learned from the real world that images normally have local correlations in colour, shape and orientation, i.e. it responds when its inputs have the correlations that are common in the real world.
Scene statistics from the real world: cells that fire together, wire together.
So neural response selectivity gets quite complex very quickly.
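"Fire together, wire together" can be made concrete with a Hebbian learning rule. The sketch below uses Oja's rule (a Hebbian update with a built-in decay term that keeps the weights bounded), a standard textbook choice rather than the lecture's own model. Two inputs are driven by a shared cause plus independent noise, standing in for the local correlations of natural images; all parameters are illustrative assumptions.

```python
import math
import random

def correlated_inputs(n, noise=0.3, seed=1):
    """Pairs of input rates driven by a shared cause (e.g. one edge spanning both inputs)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        samples.append((z + rng.gauss(0.0, noise), z + rng.gauss(0.0, noise)))
    return samples

def oja(samples, eta=0.005, seed=0):
    """Hebbian learning with Oja's normalisation: dw = eta * y * (x - y * w)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    for x in samples:
        y = w[0] * x[0] + w[1] * x[1]                    # postsynaptic activity
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

w = oja(correlated_inputs(10000))
print([round(v, 2) for v in w], round(math.hypot(*w), 2))
```

The two weights converge to roughly equal values with unit length (up to sign): the unit has become tuned to the direction of strongest correlation in its input, i.e. to the pattern that usually occurs together.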
Statistical Learning
Figure 2.
Neuronal responses to naturalistic textures differentiate V2 from V1 in macaques. (a) Time course of firing rate for three single units in V1 (green) and V2 (blue) to images of naturalistic texture (dark) and spectrally-matched noise (light). Thickness of lines indicates s.e.m. across texture families. Black horizontal bar indicates the presentation of the stimulus; gray bar indicates the presentation of the subsequent stimulus. (b) Time course of firing rate averaged across neurons in V1 and V2. Each neuron's firing rate was normalized by its maximum before averaging. Thickness of lines indicates s.e.m. across neurons. (c) Modulation index, computed as the difference between the response to naturalistic and the response to noise, divided by the sum. Modulation was computed separately for each neuron and texture family, then averaged across all neurons and families. Thickness of blue and green lines indicates s.e.m. across neurons. Thickness of gray shaded region indicates the 2.5th and 97.5th percentiles of the null distribution of modulation expected at each time point due to chance. (d) Firing rates for three single units in V1 (green) and V2 (blue) to naturalistic (dark dots) and noise (light dots), separately for the 15 texture families. Families are sorted according to the ranking in panel e. Gray bars connecting points are only for visualization of the differential response. Modulation indices (averaged across texture families) are reported in the upper right of each panel. Error bars indicate s.e.m. across the samples of each texture family. (e) Diversity in modulation across texture families,
Nat Neurosci. Author manuscript; available in PMC 2014 January 01.
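The modulation index in panel (c) is a simple ratio, MI = (R_nat − R_noise) / (R_nat + R_noise), which ranges from −1 to +1 for non-negative rates; positive values mean a stronger response to naturalistic texture than to noise. A minimal sketch, computed per neuron and texture family and then averaged as the caption describes; the zero-response convention and the example rates are assumptions for illustration only.

```python
def modulation_index(r_nat, r_noise):
    """(naturalistic - noise) / (naturalistic + noise); rates in spikes/s, assumed non-negative."""
    total = r_nat + r_noise
    if total == 0:
        return 0.0   # assumption: no response to either stimulus counts as no modulation
    return (r_nat - r_noise) / total

def mean_modulation(pairs):
    """Average MI over (r_nat, r_noise) pairs, one pair per neuron x texture family."""
    indices = [modulation_index(r_nat, r_noise) for r_nat, r_noise in pairs]
    return sum(indices) / len(indices)

# Hypothetical rates for illustration only (not data from the figure).
print(modulation_index(15.0, 5.0))                   # → 0.5
print(mean_modulation([(15.0, 5.0), (5.0, 5.0)]))    # → 0.25
```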
Spatial Interaction at multiple scales
❖ Photoreceptor level (horizontal cells)
❖ On/Off organization of ganglion cells and LGN
❖ On/Off organization of simple cells
❖ End-stopped behavior of hypercomplex cells (orthogonal with respect to simple-cell interactions)
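End-stopping (last item) can be illustrated with a hypothetical 1-D cell measured along its preferred orientation: an excitatory central zone flanked by inhibitory end zones. Lengths are in arbitrary units, and the zone sizes and inhibition weight k are illustrative assumptions. Setting k = 0 gives an ordinary cell whose response simply saturates with bar length.

```python
def overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def hypercomplex_response(length, centre_half=1.0, end_zone=2.0, k=0.8):
    """Hypothetical end-stopped cell: central excitation minus k times end-zone inhibition."""
    half = length / 2                      # bar centred on the receptive field
    excitation = overlap(-half, half, -centre_half, centre_half)
    inhibition = (overlap(-half, half, centre_half, centre_half + end_zone)
                  + overlap(-half, half, -centre_half - end_zone, -centre_half))
    return max(0.0, excitation - k * inhibition)   # firing rates cannot go negative

for length in (0.5, 1.0, 2.0, 3.0, 4.0, 6.0):
    print(length, hypercomplex_response(length), hypercomplex_response(length, k=0.0))
```

With the defaults, the response peaks when the bar just fills the central zone (length 2) and falls to zero for long bars, the length-tuning signature of end-stopped cells, while the k = 0 cell plateaus.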
Retinotopy
❖ Lateral interactions require structure
❖ Retinotopic maps: an intact map of the visual field on the cortex
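A common quantitative model of a retinotopic map is the complex-logarithm mapping w = k·log(z + a) (Schwartz, 1977), where z is the visual-field position written as a complex number. It is not on the slide, and the constants k and a below are illustrative placeholders rather than fitted human values. The map keeps neighbouring points neighbouring (an intact map) while devoting far more cortex to the fovea than to the periphery.

```python
import cmath
import math

def cortical_position(ecc_deg, polar_deg, k=15.0, a=0.7):
    """Complex-log retinotopic map w = k * log(z + a); k (mm) and a (deg) are illustrative."""
    z = cmath.rect(ecc_deg, math.radians(polar_deg))   # visual-field position as a complex number
    w = k * cmath.log(z + a)
    return w.real, w.imag

def cortical_distance(p, q):
    """Distance on the modelled cortex between two (eccentricity, polar angle) positions."""
    (x1, y1), (x2, y2) = cortical_position(*p), cortical_position(*q)
    return math.hypot(x2 - x1, y2 - y1)

# Cortex spent on a 1-degree step near the fovea versus in the periphery.
print(round(cortical_distance((1, 0), (2, 0)), 2), round(cortical_distance((20, 0), (21, 0)), 2))
```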
Visual Maps
❖ We have 30+ visual areas, each with a complete visual field map
❖ Different specialisations and different response properties
Beyond early visual cortex
In IT, which is part of the ‘what’ pathway, we do find highly specialised cells, e.g. cells that respond only to hands.
How specific does it get, and what do we need for such specificity?
Beyond early visual cortex
❖ Computational problems with specialised cells for every object? (i.e. ‘grandmother cells’)
❖ Error prone…
❖ Susceptible to cell death…
❖ What about new (never-seen) objects?
❖ Very uneconomical
❖ How big should our head be…?
Alternative: ensemble coding / population coding
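The cell-death argument can be checked with a toy simulation. The sketch below is hypothetical in every detail (random ±1 population patterns, a nearest-template decoder, 30% cell loss): with a grandmother code, every object whose dedicated cell dies becomes unrecognisable, whereas a distributed population code still decodes every object from the surviving cells.

```python
import random

def make_codes(n_objects, n_cells, kind, rng):
    """Response pattern of every cell to every object."""
    if kind == "grandmother":   # one dedicated cell per object
        return [[1.0 if c == o else 0.0 for c in range(n_cells)] for o in range(n_objects)]
    # distributed code: every object drives every cell, with a random +1/-1 pattern
    return [[rng.choice((-1.0, 1.0)) for _ in range(n_cells)] for _ in range(n_objects)]

def decode(response, codes, alive):
    """Nearest-template readout using only surviving cells; None if nothing is recognised."""
    scores = [sum(r * t for r, t, ok in zip(response, code, alive) if ok) for code in codes]
    best = max(range(len(codes)), key=scores.__getitem__)
    return best if scores[best] > 0 else None

def accuracy_after_cell_death(kind, n_objects=10, n_cells=200, frac_dead=0.3, seed=0):
    """Fraction of objects still correctly decoded after a random subset of cells dies."""
    rng = random.Random(seed)
    if kind == "grandmother":
        n_cells = n_objects
    codes = make_codes(n_objects, n_cells, kind, rng)
    dead = set(rng.sample(range(n_cells), round(frac_dead * n_cells)))
    alive = [c not in dead for c in range(n_cells)]
    correct = 0
    for obj, code in enumerate(codes):
        response = [r if ok else 0.0 for r, ok in zip(code, alive)]  # dead cells stay silent
        if decode(response, codes, alive) == obj:
            correct += 1
    return correct / n_objects

print("grandmother:", accuracy_after_cell_death("grandmother"))
print("population: ", accuracy_after_cell_death("population"))
```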
Binding problem
Gestalt grouping
❖ Symmetry
Gestalt grouping
❖ Proximity
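Grouping by proximity (and, with a different link rule, by similarity) can be sketched as connected components: two elements are linked if they satisfy the rule, and a group is any chain of links. The union-find implementation, the distance threshold and the example dots are illustrative assumptions, not a model from the lecture.

```python
import math

def group(elements, linked):
    """Connected components: elements joined by any chain of pairwise links form one group."""
    parent = list(range(len(elements)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            if linked(elements[i], elements[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(elements)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def near(a, b, threshold=1.5):
    """Proximity rule: link elements closer than threshold (hypothetical units)."""
    return math.dist(a[:2], b[:2]) <= threshold

def same_colour(a, b):
    """Similarity rule: link elements with the same colour."""
    return a[2] == b[2]

dots = [(0, 0, "red"), (1, 0, "blue"), (0, 1, "red"), (10, 10, "blue"), (11, 10, "red")]
print(group(dots, near))         # → [[0, 1, 2], [3, 4]]
print(group(dots, same_colour))  # → [[0, 2, 4], [1, 3]]
```

The same dots fall into different groups depending on which cue is used, which is why combining cues matters.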
Gestalt grouping
❖ Similarity
Gestalt grouping
❖ Good continuation
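Good continuation is often modelled with an "association field" (Field, Hayes and Hess, 1993): nearby oriented elements link when both are roughly aligned with the line joining them. The rule and thresholds below are a simplified, hypothetical version of that idea, with elements given as (x, y, orientation in radians).

```python
import math

def orientation_difference(a, b):
    """Smallest difference between two orientations (axial, period pi), in radians."""
    d = (a - b) % math.pi
    return min(d, math.pi - d)

def continues_smoothly(e1, e2, max_dist=2.0, max_bend=math.radians(20)):
    """Hypothetical association-field rule for elements (x, y, orientation)."""
    (x1, y1, o1), (x2, y2, o2) = e1, e2
    if math.hypot(x2 - x1, y2 - y1) > max_dist:
        return False                          # too far apart to be linked
    joining = math.atan2(y2 - y1, x2 - x1)    # direction of the line between the elements
    return (orientation_difference(o1, joining) <= max_bend
            and orientation_difference(o2, joining) <= max_bend)

print(continues_smoothly((0, 0, 0.0), (1.5, 0, 0.0)))                  # collinear → True
print(continues_smoothly((0, 0, 0.1), (1.5, 0.15, 0.1)))               # gently curving → True
print(continues_smoothly((0, 0, math.pi / 2), (1.5, 0, math.pi / 2)))  # parallel, side by side → False
print(continues_smoothly((0, 0, 0.0), (1.5, 0, math.pi / 2)))          # orthogonal → False
```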
❖ Combining cues
Gestalt grouping
❖ They probably reflect assumptions about the nature of real-world surfaces and objects:
❖ Objects are made of cohesive, opaque, uniform materials (proximity, similarity).
❖ When objects change or move, all their parts change or move (common fate).
❖ Object contours tend to vary smoothly (good continuation).
Object perception
❖ Detection: is there an object?
❖ Recognition: is it a mug?
❖ Discrimination: my mug versus your mug
❖ Interaction: pour the milk from
Object perception
❖ Stage 1 builds a piecemeal representation of local image properties.
❖ Can we implement these stages?
❖ Problems?
This is one way in which different views of an object, or different individual examples, can be associated.
Component shapes are relatively easy to extract, and will generally have the same spatial relationships to each other.
Object perception
❖ The problem with object perception:
Object perception
❖ Solutions:
❖ View-independent theories try to remove extrinsic variation and build a representation that captures the intrinsic character of each object.
View Independent
Two view-independent theories
Marr: Computational vision
❖ David Marr was very influential in modern approaches to modelling vision
❖ Marr (1982). Vision: A computational investigation into the human representation and processing of visual information
❖ 3 Stages
❖ Primal Sketch: Multi-scale Edge Detection
❖ 2.5D Sketch: Viewer-centred Scene Representation
❖ 3D Sketch: Object-centred Representation
Marr: Stage 1
❖ Edge detection
V1 filters
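Marr's first stage was implemented by Marr and Hildreth (1980) as zero-crossings of the image filtered with a Laplacian of Gaussian at several scales, filters that resemble V1-like centre-surround profiles. A 1-D sketch under simplifying assumptions: a single luminance step, border samples repeated at the ends, and an illustrative slope threshold that ignores tiny numerical sign flips.

```python
import math

def log_kernel(sigma):
    """Sampled second derivative of a Gaussian: the 1-D analogue of the Laplacian of Gaussian."""
    r = int(4 * sigma)
    return [(x * x - sigma ** 2) / sigma ** 4 * math.exp(-x * x / (2 * sigma ** 2))
            for x in range(-r, r + 1)]

def filter_signal(signal, kernel):
    """Apply a symmetric kernel, repeating the border samples at the ends."""
    r, n = len(kernel) // 2, len(signal)
    return [sum(w * signal[min(max(i + k - r, 0), n - 1)] for k, w in enumerate(kernel))
            for i in range(n)]

def zero_crossings(values, min_slope=0.01):
    """Indices i where the response changes sign between i and i+1 with a steep enough slope."""
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0 and abs(values[i + 1] - values[i]) > min_slope]

step = [0.0] * 50 + [1.0] * 50          # luminance step between samples 49 and 50
for sigma in (1.0, 2.0, 4.0):           # multi-scale: the same edge appears at every scale
    print(sigma, zero_crossings(filter_signal(step, log_kernel(sigma))))
```

A real edge produces a zero-crossing at the same location at every scale, which is the cue the primal sketch uses to mark it.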
Marr: Stage 2
❖ 2.5D, a 3D representation on a 2D surface
Figure: Human → Arm → Forearm → Hand
❖ Geons
View dependent theories
❖ Prototypical view:
View dependent theories
❖ The geometric structure of objects is usually not explicit in the representation, making it unsuitable for tasks requiring this information, such as physical interaction.
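A toy sketch of a view-dependent scheme, assuming each object is stored as a small set of familiar views (here just viewing angles) and that recognising a novel view costs extra time in proportion to its rotation from the nearest stored view. Both the representation and the timing constants are hypothetical; they only show why view-dependent theories predict viewpoint costs, and why such a store says nothing explicit about 3-D structure.

```python
def angular_distance(a, b):
    """Shortest rotation (degrees) between two viewing angles."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def nearest_view(test_angle, stored_views):
    """Return the stored view closest to the test view and the rotation needed to reach it."""
    best = min(stored_views, key=lambda v: angular_distance(test_angle, v))
    return best, angular_distance(test_angle, best)

def predicted_rt(test_angle, stored_views, base_ms=500.0, ms_per_deg=2.0):
    """Hypothetical cost: recognition time grows with distance to the nearest stored view."""
    _, rotation = nearest_view(test_angle, stored_views)
    return base_ms + ms_per_deg * rotation

mug_views = [0, 90]   # hypothetical familiar views of one object, in degrees
for angle in (0, 45, 90, 180, 315):
    print(angle, predicted_rt(angle, mug_views))
```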
‘Special’ objects
❖ People are very good at reading faces for information about age, gender, and emotional state
Are faces objects?
Are faces special?
How good are we really?
❖ Happy or sad?
How good are we really?
❖ Discrimination: is this player winning or losing?
How good are we really?
❖ Judgements based on body, not face
More Face Stuff
Russell, R. (2009). A sex difference in facial contrast and its exaggeration by cosmetics. Perception, 38(8), 1211–1219.
Fusiform Face Area (FFA)
FFA stimulation
❖ Low-level features did not change
❖ ‘You’re still wearing a suit and tie’
❖ ‘Only your face changed, everything else was the same’
❖ Subject has trouble describing the changes
❖ ‘They shifted to a side, and maybe stretched. But they didn’t get larger or smaller.’
❖ ‘It was more of a perception, how I perceived your face’
❖ Recognition corrupted, but activated
❖ ‘You were you, and then you weren’t you’
❖ ‘You had similar eyes to Dr. Parvizi, but you were someone else’
❖ ‘You look like someone I’ve seen before’
Faces only?
❖ Greebles are three-dimensional, computer-generated creatures that differ by gender, family membership, and individual identity.
❖ People trained to identify greebles show increased activation of “face” areas of the brain when recognizing new greebles.
- Normal subjects without any experience with these objects appear to process Greebles in a part-based fashion comparable to
the way they recognize other non-face objects.
- However, research with Greebles has shown that after people become experts at recognizing individual Greebles, they then
process Greebles in a holistic and configural fashion, similar to the way we tend to treat upright faces (Gauthier & Tarr, 1997;
Gauthier & Tarr, 2002).
- Greeble experts also activate their "fusiform face area" in the brain more than novices do (Gauthier, Tarr, Anderson, Skudlarski
& Gore, 1999): this suggests that this part of the brain may be specialized for faces because of our experience with them, not
because of some innate bias.
Practice makes perfect
❖ The brain’s temporal lobe stream makes synaptic connections in several areas of the temporal lobe’s lower half.
❖ This is known as the Inferior Temporal (IT) cortex.
❖ Neurons in the IT have complex selectivity and huge receptive fields.
❖ Practice improves visual recognition.
❖ Visual training is associated with changes in the responses of IT cortical neurons.
Practice makes perfect
Monkey neural recordings: the monkey learns associations (e.g. cue: ball; correct response: plane), with both targets in the receptive field.
Practice makes perfect
Activity pattern similarity increases over time when associations between objects are learned
A) A, B, C, D: different stimuli, with A paired with B and C with D. Spike rates are initially different, but become more similar over trials.
B) Linear increase in the correlation of firing rates
C) Both positive and negative changes, but overall more positive
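Panel B's measure is the correlation between the population responses to paired stimuli across learning. A toy simulation, assuming (hypothetically) that learning gradually mixes the response pattern for B towards the one for A; the rates and the mixing schedule are invented for illustration and are not the recorded data.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_pair_learning(n_neurons=50, n_blocks=5, seed=0):
    """Correlation between population responses to paired stimuli A and B, block by block."""
    rng = random.Random(seed)
    rates_a = [rng.uniform(0.0, 40.0) for _ in range(n_neurons)]    # spikes/s to A
    rates_b0 = [rng.uniform(0.0, 40.0) for _ in range(n_neurons)]   # initial spikes/s to B
    correlations = []
    for block in range(n_blocks):
        alpha = block / (n_blocks - 1)   # hypothetical learning progress, 0 -> 1
        rates_b = [(1 - alpha) * b + alpha * a for a, b in zip(rates_a, rates_b0)]
        correlations.append(pearson(rates_a, rates_b))
    return correlations

print([round(r, 2) for r in simulate_pair_learning()])
```

In this toy model the correlation rises steadily from its chance level towards 1 across blocks, i.e. the activity patterns for associated objects become more similar.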
Recap
❖ Retinotopic representations allow lateral interactions to integrate information
❖ Gestalt grouping describes the rules we use to integrate information
❖ There are multiple ways to represent objects, each with its own issues
❖ Specialised processing areas for faces
❖ This may reflect our massive experience with faces
❖ Higher processing areas are well equipped to learn associations between objects by synchronising activity for different objects