
Features to objects Sjoerd Stuit

Elements of vision

2
Cortical visual processing

❖ Retina & LGN: spatial frequency
❖ V1: increasing complexity

3
V1

❖ V1: new
response
properties

4
Orientation tuning

Orientation selectivity can be built up from spatial interactions between different center-surround cells: the neuron responds to co-activation of all these cells by an oriented line/bar.
This produces a Gabor-filter RF structure, with many different combinations of spatial frequency, orientation, and phase.
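The Gabor-like RF and its orientation tuning can be sketched in a few lines of numpy. This is a toy model, not fitted to any data; the size, spatial frequency, and envelope width are arbitrary illustrative choices:

```python
import numpy as np

def gabor_rf(size, sf, theta, phase, sigma):
    """Gabor receptive field: an oriented sinusoid (spatial frequency sf,
    orientation theta, phase) under an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinate
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * sf * xr + phase)

def grating(size, sf, theta, phase=0.0):
    """Full-field sinusoidal grating used as a probe stimulus."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return np.cos(2 * np.pi * sf * (x * np.cos(theta) + y * np.sin(theta)) + phase)

# Orientation tuning curve: linear response of one RF to gratings at its
# preferred spatial frequency but varying orientation.
rf = gabor_rf(size=21, sf=0.1, theta=0.0, phase=0.0, sigma=4.0)
angles = np.deg2rad(np.arange(0, 180, 15))
tuning = [np.sum(rf * grating(21, 0.1, th)) for th in angles]
```

The response peaks at the RF's preferred orientation (0 deg here) and falls off as the grating is rotated away, giving the bell-shaped tuning curve of the next slide.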
Tuning curves

Orientation selectivity

[Fig. 2 from van der Smagt, Wehrhahn & Albright (2005): influence of surround masks on the responses of a V1 neuron to a bright target line. The neuron was sharply tuned to an orientation slightly clockwise from vertical; all surround masks suppressed the response relative to the target alone, and suppression was greatest for masks of the same orientation and contrast polarity as the target. PSTHs show the time course of the suppression, diverging roughly 80-90 ms after stimulus onset.]

6

The icons depict the specific stimulus conditions used. First of all, it is abundantly clear that a surround mask of any variety used suppressed the response to the target [repeated-measures ANOVA (F = 7.40, P < 0.001) with LSD post hoc comparisons, P < 0.01]. Second, consider the effects of a surround mask that was of the same polarity as the target (i.e., bright). A mask oriented similarly to the target yielded suppression (Rnc bar in Fig. 2B) that was significantly stronger than suppression resulting from a mask with differently oriented line segments (Roc bar; P < 0.05). This pronounced orientation-specific masking effect corroborates previous claims (Knierim and van Essen 1992; Li et al. 2000).

Third, turning the tables, consider the effects of polarity for a surround mask that was of the same orientation as the target. A mask with the same polarity as the target yielded suppression (Rnc bar in Fig. 2B) that was significantly greater than that resulting from a mask with a different polarity (Rpc bar; P < 0.05). This novel finding constitutes a contrast-polarity analog of the established orientation-masking phenomenon.

Finally, our experiment allowed us to examine the conjoint effects of orientation and contrast polarity cues. Figure 2B reveals that the degree of response suppression for a double-cue mask (Rdc bar) was no different from the effects of either cue alone (Roc and Rpc bars; P > 0.1 for each comparison). Thus, although masking responses remained suppressed relative to that elicited by the target alone (Rto bar), we saw no evidence for an additional suppression consistent with additive effects of the two cues; on the contrary, the pattern of suppressive effects bears the signature of a highly interactive system. [The remainder of the excerpt, describing the PSTHs in Fig. 2, C and D, is truncated in the extraction.]
J Neurophysiol • VOL 94 • JULY 2005 • www.jn.org
Orientation columns

7
Orientation columns

8
Ocular dominance columns

Each little patch of cortex represents the content of a single receptive field (map location) with many different response properties (columns). Blobs are sensitive to colour as well; this segregation is maintained in V2.
Increasing complexity

❖ V1: complex cells

10

Selectively combining the outputs of several cells gives more complex response preferences (the cell must respond to this and this and this).
A complex cell takes several spatially overlapping inputs; some separation and summation lead to larger receptive fields with more processing. Complex cells respond to both light and dark bars of the same orientation, showing some spatial invariance. Larger receptive fields, but more complex responses: the signature of hierarchical convergent processing.
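The convergence described above is often sketched as an "energy model": a complex cell pools a quadrature pair of Gabor subunits, and squaring-then-summing discards phase. A minimal sketch, not a fitted model; all parameter values are illustrative:

```python
import numpy as np

def gabor(size, sf, theta, phase, sigma):
    """Gabor subunit: oriented sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * sf * xr + phase)

def complex_cell(image, sf, theta, sigma=4.0, size=21):
    """Energy model: linear responses of two subunits 90 deg apart in
    phase, squared and summed. Biologically, the squaring would be
    carried by pairs of half-rectified simple cells of opposite polarity."""
    even = np.sum(image * gabor(size, sf, theta, 0.0, sigma))
    odd = np.sum(image * gabor(size, sf, theta, np.pi / 2, sigma))
    return np.sqrt(even**2 + odd**2)

# A vertical bar: the model responds equally to light and dark bars,
# but only weakly to the orthogonal orientation.
bar = np.zeros((21, 21))
bar[:, 9:12] = 1.0
r_light = complex_cell(bar, sf=0.1, theta=0.0)
r_dark = complex_cell(-bar, sf=0.1, theta=0.0)
r_orth = complex_cell(bar, sf=0.1, theta=np.pi / 2)
```

The light-bar and dark-bar responses are identical (polarity invariance), while the orthogonal bar gives a much smaller response (orientation selectivity is preserved).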
Increasing complexity

❖ Complex cell:
❖ Orientation selective

❖ Not selective for absolute

position
❖ Not selective for ON or OFF

(responds to onset of either


or both)
❖ No response to small spots

or diffuse illumination
❖ increased RF size

11

Here information on spatial position is thrown out


Still represented in input neurons, and still accessible to awareness
Increasing complexity

❖ V1: hypercomplex cells

❖ Combine complex and/or simple cells
❖ End-stopping: length preferences
❖ Increased RF size

12
The aperture problem

❖ Early receptive fields are


rather small.
❖ Limited information:
multiple solutions

13

Early in the visual pathway, RFs are still small. This leads to certain problems.
Integration

❖ Orientation
integration
from outside of
the receptive
field!

14

After orientation detection, structure over the whole image must be analyzed, a much more complex process.
Initially this relies on examining differences between the contents of nearby receptive fields.
Extra-classical RF

from: Bakin et al (2000) 15

Monkey V1/2
Flank facilitation signals stimulus contours. A, It is easier to detect a continuous contour composed of individual elements as the
number of component elements increases. B, Adding collinear flanking lines outside of the RF (dashed line-outlined square)
increases, or facilitates, the response of the neuron to a target stimulus located inside of the cell's RF. However, interrupting the
path of the smooth contour by inserting a bar oriented orthogonal to the path of the contour blocks the flank-induced neural
facilitation (adapted from Kapadia et al., 1995). In this, and all other quantified response plots, the response rate [spikes/second
(s/s)] of the neuron is plotted on the y-axis. C, Quantified responses recorded from a V2 neuron that exhibited flank facilitation
are shown. Placing the orthogonal bar either in the same plane as the flank and target stimulus or in the far depth plane (0.16°
uncrossed disparity) blocked the flank-induced facilitation of the neural response to the target stimulus. However, when the
orthogonal bar was placed in the near depth plane (0.16° crossed disparity; arrow), the flank facilitated the neuron's response to
the target stimulus, suggesting that flank facilitation can signal contours even when they are partially occluded. Fix refers to
plane of fixation; dot in diagram at left is fixation plane.
Extra-classical RF
❖ Interaction of
receptive field
content with
information from
neighbouring
receptive fields
❖ Increases response
to receptive field
content
(facilitation)
❖ Intermediate stage
to further form
processing
16
Not always facilitation

❖ Center-Surround Suppression

17
Extra-classical RF
Similarity needed for integration

❖ Not innate, learned


❖ Develops slowly as we learn
regularities in our
environments: Statistical
learning

18
Statistical Learning

Freeman et al, 2013


19

After V1, it becomes more and more difficult to figure out the response preferences of neurons.
This is a neural-network answer: V2 has learned from the real world that images normally have local correlations in colour, shape, and orientation;
i.e. it responds when inputs have the correlations that are common in the real world.
Scene statistics from the real world: if it fires together, it wires together.
So neural response selectivity gets quite complex very quickly.
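The "fires together, wires together" idea can be sketched with Oja's Hebbian learning rule: a unit fed two correlated inputs ends up tuned to their common component. The data and learning rate below are toy choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "scene statistics": two input channels (e.g. neighbouring feature
# detectors) whose activity is strongly correlated.
n_samples = 5000
shared = rng.normal(size=n_samples)
inputs = np.stack([shared + 0.3 * rng.normal(size=n_samples),
                   shared + 0.3 * rng.normal(size=n_samples)], axis=1)

# Oja's rule: Hebbian growth ("fire together, wire together") plus a
# decay term that keeps the weight vector bounded.
w = rng.normal(size=2)
lr = 0.01
for x in inputs:
    y = w @ x                      # postsynaptic response
    w += lr * y * (x - y * w)      # Hebbian term minus normalising decay

# The unit ends up tuned to the shared, correlated component: w converges
# to the leading eigenvector of the input correlation matrix.
w_unit = w / np.linalg.norm(w)
```

After training, the weight vector points along the correlated direction [1, 1], so the unit has learned the statistical regularity of its inputs rather than any single input.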
Statistical Learning

Freeman et al, 2013


20

Figure 2.
Neuronal responses to naturalistic textures differentiate V2 from V1 in macaques. (a) Time course of firing rate for three single units in V1 (green) and V2 (blue) to images of naturalistic texture (dark) and spectrally matched noise (light). Thickness of lines indicates s.e.m. across texture families. Black horizontal bar indicates the presentation of the stimulus; gray bar indicates the presentation of the subsequent stimulus. (b) Time course of firing rate averaged across neurons in V1 and V2. Each neuron's firing rate was normalized by its maximum before averaging. Thickness of lines indicates s.e.m. across neurons. (c) Modulation index, computed as the difference between the response to naturalistic texture and the response to noise, divided by their sum. Modulation was computed separately for each neuron and texture family, then averaged across all neurons and families. Thickness of blue and green lines indicates s.e.m. across neurons. Thickness of gray shaded region indicates the 2.5th and 97.5th percentiles of the null distribution of modulation expected at each time point due to chance. (d) Firing rates for three single units in V1 (green) and V2 (blue) to naturalistic textures (dark dots) and noise (light dots), separately for the 15 texture families. Families are sorted according to the ranking in panel e. Gray bars connecting points are only for visualization of the differential response. Modulation indices (averaged across texture families) are reported in the upper right of each panel. Error bars indicate s.e.m. across the 15 samples of each texture family. (e) Diversity in modulation across texture families.

Nat Neurosci. Author manuscript; available in PMC 2014 January 01.
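The modulation index from panel (c) of the Freeman et al. figure is simple to compute; a sketch with made-up firing rates (not data from the paper):

```python
import numpy as np

def modulation_index(r_tex, r_noise):
    """(naturalistic - noise) / (naturalistic + noise), per neuron and
    texture family: +1 means texture-only responses, 0 no preference,
    -1 noise-only responses."""
    r_tex = np.asarray(r_tex, dtype=float)
    r_noise = np.asarray(r_noise, dtype=float)
    return (r_tex - r_noise) / (r_tex + r_noise)

# Hypothetical firing rates (spikes/s) for one neuron across four
# texture families: naturalistic texture vs spectrally matched noise.
tex = np.array([30.0, 24.0, 18.0, 12.0])
noise = np.array([20.0, 20.0, 18.0, 16.0])

mi = modulation_index(tex, noise)   # per-family indices
mean_mi = mi.mean()                 # average across families, as in panel c
```

A V2-like neuron yields a clearly positive mean index (it prefers the naturalistic statistics), whereas a V1-like neuron hovers near zero.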
Spatial Interaction at multiple scales
❖ Photo-receptor level (horizontal cells)
❖ On/Off organization of ganglion cells and LGN
❖ On/Off organization of simple cells
❖ End-stopped behavior of hypercomplex cells (orthogonal with respect
to simple-cell interactions)

❖ EXTRA-receptive field interactions in early visual cortex
(e.g. collinear facilitation)
❖ Complex statistical learning in V2

21
Retinotopy

❖ Lateral interactions
require structure
❖ Retinotopic maps:
intact map of the
visual field on the
cortex

22
Visual Maps
❖ We have 30+
visual areas, each
with a complete
visual field map
❖ Different
specialisations
and different
response
properties

23
Beyond early visual cortex

❖ Increasing receptive field


size
❖ Increasingly complex
response properties

24
Beyond early visual cortex

25

In IT, which is part of the ‘what’ pathway, we do find highly specialised neurons: cells that respond only to hands, for example.
How specific does it get, and what do we need for such specificity?
Beyond early visual cortex
❖ Computational problems with
specialized cells for every
object? (i.e. ‘grandmother cells’)

❖ Error prone…
❖ Susceptible to cell death…
❖ What about new (never seen)
objects?
❖ Very uneconomical
❖ How big should our head be….?
Alternative:
Ensemble coding/
Population coding
26
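The ensemble-coding alternative on this slide can be sketched directly: no single unit is sharply selective, yet the population pins down the stimulus, and values between preferred ones are handled for free. The tuning widths and preferred directions below are arbitrary toy choices:

```python
import numpy as np

# A bank of 12 broadly tuned units, preferred directions every 30 deg.
prefs = np.deg2rad(np.arange(0, 360, 30))

def population_response(stim_deg, kappa=2.0):
    """Von Mises tuning: every unit gives a graded response, peaking
    when the stimulus matches its preferred direction."""
    s = np.deg2rad(stim_deg)
    return np.exp(kappa * (np.cos(s - prefs) - 1.0))

def population_vector(rates):
    """Decode by vector averaging: each unit 'votes' for its preferred
    direction, weighted by how strongly it fires."""
    x = np.sum(rates * np.cos(prefs))
    y = np.sum(rates * np.sin(prefs))
    return np.rad2deg(np.arctan2(y, x)) % 360

rates = population_response(75.0)   # 75 deg: no unit prefers it exactly
decoded = population_vector(rates)  # yet the ensemble recovers it
```

This illustrates why population codes sidestep the grandmother-cell problems: losing one unit barely shifts the vector average, and a never-seen stimulus still produces a decodable activity pattern.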
Binding problem

❖ How do we ‘bind’ all these features together to form a


coherent percept?
❖ What does behaviour tell us?
❖ Gestalt: the whole is more than the sum of its parts

27
Gestalt grouping

❖ Any process that builds a


representation of larger-scale
shapes must integrate local
information. On what basis?

❖ Gestalt psychologists formulated a


set of rules or "laws" of perceptual
organization.

28
Gestalt grouping

❖ Symmetry

29
Gestalt grouping
❖ Proximity

30
Gestalt grouping

❖ Similarity

31
Gestalt grouping

❖ Good continuation

32

Good continuation facilitates contours: remember collinear facilitation


Gestalt grouping

❖ Combining cues

33
Gestalt grouping
❖ They probably reflect assumptions about the nature of
real-world surfaces and objects:
❖ Objects are made of cohesive, opaque, uniform
materials (proximity, similarity).
❖ When objects change or move, all their parts change
or move (common fate).
❖ Object contours tend to vary smoothly (good
continuation).

34
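As a toy illustration of the proximity law listed above, dots can be grouped by linking any pair closer than a threshold and taking connected components. This is only a sketch; the radius is a free parameter I chose arbitrarily, which is exactly why proximity is a heuristic rather than a full theory:

```python
import numpy as np

def group_by_proximity(points, radius):
    """Toy Gestalt proximity rule: link any two dots closer than
    `radius`, then return connected components as perceptual groups
    (union-find with path halving)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.hypot(*(points[i] - points[j])) < radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Two clusters of dots, far apart: proximity predicts two groups.
pts = np.array([[0, 0], [1, 0], [0, 1],     # cluster 1
                [10, 10], [11, 10]])        # cluster 2
groups = group_by_proximity(pts, radius=2.0)
```

With the radius at 2.0 the five dots fall into the two intuitive groups; shrink it below 1.0 and every dot becomes its own group, mirroring how grouping depends on relative spacing.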
Object perception

❖ Gestalt grouping tells us something about the rules the
system uses to integrate information across space (and
time) to form objects
❖ But we still need to recognise them!

35
Object perception
❖ detection
❖ Is there an object?

❖ recognition
❖ Is it a mug?

❖ discrimination
❖ My mug versus

your mug
❖ interaction
❖ Pour the milk from

a jug into a mug

36
Object perception

❖ Why is this difficult?

37
Object perception
❖ Stage 1 builds a piecemeal
representation of local image
properties.

❖ Stage 2 builds a representation of


larger-scale shapes and surfaces.

❖ Stage 3 matches shapes and
surfaces with stored object
representations: recognition.

38

Overall plan of processing for object recognition/identification


Object perception

❖ Can we implement
these stages?
❖ Problems?

39

This is one way different views of an object can be associated, or different individual examples
Component shapes are relatively easy to extract, and will generally have the same spatial relationships to each other
Object perception

❖ Same relative features,


very different conclusions

40
Object perception
❖ The problem with object perception:

❖ The fundamental problem in building object representations is


that images of objects reflect a combination of intrinsic factors
and extrinsic factors.

❖ Intrinsic factors define the character of a specific object—its


shape, surfaces, and parts.

❖ Extrinsic factors relate to variation in viewing conditions, such


as position, lighting, and occlusion.

41
Object perception

❖ Solutions:
❖ View-independent theories try to remove extrinsic
variation and build a representation that captures
the intrinsic character of each object.

❖ View-dependent theories do not remove extrinsic


effects, but build in ways of accommodating them.

42
View Independent

❖ Properties of view independent theories:


❖ Objects are represented in terms of a symbolic,
structural description of their component parts,
and the relations between them.

❖ Components are specified using a limited set of


parts descriptors.

43
2 view independent theories

❖ Marr and Nishihara (1978) use 3-D generalised cones as parts


descriptors; their structural description is hierarchical (parts can
be decomposed into parts).

❖ Biederman (1987) uses 2-D representations of "geons" as parts


descriptors; basic geometric shapes such as cylinders, cones, and
blocks. Geons are detected on the basis of certain image
properties such as linearity, parallelism, and curvature.

44
Marr: Computational vision
❖ David Marr was very influential on modern approaches to modelling
vision
❖ Marr 1982. Vision: A computational investigation into the human
representation and processing of visual information

❖ 3 Stages
❖ Primal Sketch: Multi-scale Edge Detection
❖ 2.5D Sketch: Viewer centred Scene Representation
❖ 3D Sketch: Object Centred Representation

45
Marr: Stage 1

❖ Edge detection

46

Primal Sketch: Multi-scale Edge Detection

V1 filters
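The primal-sketch stage can be sketched with Marr and Hildreth's classic recipe: a difference of Gaussians approximates the Laplacian-of-Gaussian filter, and edges are read out as its zero-crossings at more than one scale. The 1.6 sigma ratio is the standard DoG approximation; the image and other numbers are toy choices:

```python
import numpy as np

def gaussian_kernel(sigma):
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur: convolve rows, then columns."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, tmp)

def edges(img, sigma):
    """Marr-Hildreth style: a difference of Gaussians (ratio 1.6)
    approximates the Laplacian of a Gaussian; edges are the
    zero-crossings of the filtered image."""
    dog = blur(img, sigma) - blur(img, 1.6 * sigma)
    zc = np.zeros_like(dog, dtype=bool)
    zc[:-1, :] |= np.signbit(dog[:-1, :]) != np.signbit(dog[1:, :])
    zc[:, :-1] |= np.signbit(dog[:, :-1]) != np.signbit(dog[:, 1:])
    return zc

# A vertical step edge: dark left half, bright right half.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
fine, coarse = edges(img, 1.0), edges(img, 2.0)   # two spatial scales
```

Both scales localise the step between columns 15 and 16; real images differ at the two scales, which is why the primal sketch keeps a multi-scale description.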
Marr: Stage 2
❖ 2.5D, a 3D representation on a 2D surface

47

2.5D Sketch: Viewer centred Scene Representation


Assumptions: surface parsing
In the 2.5D sketch, local 2D info is used to create a 3D representation of the object's parts. Info is integrated to create the 3D cones. The object is represented from the viewpoint of the observer for these computations.
Marr: Stage 3
❖ 3D object centred representation

Human

Arm
Forearm
Hand

48

3D Sketch: Object Centred Representation

Hierarchical over scale.


Very simple cylinder/cone objects
Biederman

❖ Geons

49

A very influential theory.

The geons used do not preserve exact dimensions, just rough shape.
Limitations of view-independent theories
❖ With a limited set of parts descriptors, it is difficult to
make fine discriminations between objects in the same
class.

❖ Some objects may have several possible


decompositions into parts; others may have none at all.

❖ The theories have been difficult to implement


successfully in computer vision systems.
(i.e. find objects in photographic images automatically)

50
View dependent theories

❖ Each object is stored as a small number of discrete


prototypical views.

❖ Recognition of a novel object view is achieved by


comparing it against stored prototypical views
(Bülthoff, Tarr et al.).

51
View dependent theories
❖ Prototypical view:

52
View dependent theories

53
View dependent theories
❖ The geometric structure of objects is usually not explicit
in the representation, making it unsuitable for tasks
requiring this information, such as physical interaction.

❖ Present implementations involve only a relatively small


number of prototypes; it is not clear how well the
theories can deal with a realistically large set of object
prototypes.

❖ How to deal with novel objects?

54
‘Special’ objects
❖ People are very good at reading faces for information
about age, gender, and emotional state

❖ Face-inversion effect: an upside-down (inverted) face is
very difficult to recognize.

55
Are faces objects?

56
Are faces special?

57
How good are we really?
❖ Happy or sad?

58
How good are we really?
❖ Happy or sad?

59
How good are we really?
❖ Discrimination: is this player winning or losing?

60
How good are we really?

❖ Mix and match: what


source do we find
most informative/
reliable?

61
How good are we really?
❖ Judgements based on body, not face

62
How good are we really?

63
More Face Stuff

64

Russell R. (2009). A sex difference in facial contrast and its exaggeration by cosmetics. Perception. 38 (8), 1211-9.
Fusiform Face Area (FFA)

65
FFA stimulation

66
FFA stimulation
❖ Low-level features did not change
❖ ‘You’re still wearing a suit and tie’
❖ ‘Only your face changed, everything else was the same’
❖ Subject has trouble describing the changes
❖ ‘They shifted to a side, and maybe stretched. But they didn’t get larger or smaller.’
❖ ‘It was more of a perception, how I perceived your face’
❖ Recognition corrupted, but activated
❖ ‘You were you, and then you weren’t you’
❖ ‘You had similar eyes to Dr. Parvizi, but you were someone else’
❖ ‘You look like someone I’ve seen before’

67
Faces only?
❖ Greebles are three-
dimensional,
computer-generated
creatures that differ
by gender, family
membership, and
individual identity.

❖ People trained to
identify greebles
show increased
activation of “face”
areas of the brain
when recognizing
new greebles.

68

- Normal subjects without any experience with these objects appear to process Greebles in a part-based fashion comparable to
the way they recognize other non-face objects.
- However, research with Greebles has shown that after people become experts at recognizing individual Greebles, they then
process Greebles in a holistic and configural fashion, similar to the way we tend to treat upright faces (Gauthier & Tarr, 1997;
Gauthier & Tarr, 2002).
- Greeble experts also activate their "fusiform face area" in the brain more than novices do (Gauthier, Tarr, Anderson, Skudlarski
& Gore, 1999): this suggests that this part of the brain may be specialized for faces because of our experience with them, not
because of some innate bias.
Practice makes perfect
❖ The brain’s temporal lobe stream makes synaptic connections
in several areas of the temporal lobe’s lower half.
❖ This is known as the Inferior Temporal (IT) cortex.
❖ Neurons in the IT have complex selectivity and huge
receptive fields.
❖ Practice improves visual recognition.
❖ Visual training is associated with changes in the responses of
IT cortical neurons.

69
Practice makes perfect

❖ from: Messinger et al, PNAS 2001


70

monkey neural recordings, learn associations, cue ball, correct response plane, targets both in receptive field
Practice makes perfect

❖ from: Messinger et al, PNAS 2001


71

Activity-pattern similarity increases over time when associations between objects are learned.
A) ABCD: different stimuli; A paired with B, C with D. Initially different spike rates; over trials the rates become more similar.
B) Linear increase in firing-rate correlation.
C) Both positive and negative changes, but overall more positive.
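The analysis idea behind the Messinger et al. result can be illustrated with made-up numbers: as responses to paired stimuli converge, the across-neuron correlation of their rate patterns rises. Everything below is hypothetical, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical responses of 50 IT neurons to two paired stimuli (A, B).
base_a = rng.normal(10.0, 3.0, size=50)   # spikes/s to stimulus A
base_b = rng.normal(10.0, 3.0, size=50)   # spikes/s to stimulus B

def pattern_correlation(mix):
    """mix = 0: unrelated responses; mix -> 1: responses converge on a
    shared pattern, as if the association were fully learned."""
    mean = (base_a + base_b) / 2
    a = (1 - mix) * base_a + mix * mean
    b = (1 - mix) * base_b + mix * mean
    return np.corrcoef(a, b)[0, 1]   # across-neuron pattern correlation

r_early = pattern_correlation(0.0)   # before learning: near zero
r_late = pattern_correlation(0.8)    # after learning: strongly positive
```

The same correlation measure, computed on real firing rates across trial blocks, is what rises roughly linearly in panel B as the monkey learns the pairings.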
Recap
❖ Retinotopic representations allow for lateral interactions to
integrate information
❖ Gestalt grouping describes the rules we use to integrate
❖ Multiple ways to represent objects: all with their own issues
❖ Specialised processing areas for faces
❖ This may reflect our massive experience with faces
❖ Higher processing areas are well equipped to learn associations
between objects by synchronising activity for different objects

72
