
Visual Perception: Neural Basis

The precondition for vision is the existence of light. Light is electromagnetic radiation that
can be described in terms of wavelength. Humans can perceive only a small range of the
wavelengths that exist; the visible wavelengths are from 380 to 750 nanometers (Figure 3.6;
Starr, Evers, & Starr, 2007). Vision begins when light passes through the protective covering
of the eye (Figure 3.7). This covering, the cornea, is a clear dome that protects the eye. The
light then passes through the pupil, the opening in the center of the iris. It continues through
the crystalline lens and the vitreous humor. The vitreous humor is a gel-like substance that
comprises the majority of the eye. Eventually, the light focuses on the retina where
electromagnetic light energy is transduced—that is, converted—into neural electrochemical
impulses (Blake, 2000). Vision is most acute in the fovea, which is a small, thin region of the
retina, the size of the head of a pin. When you look straight at an object, your eyes rotate so
that the image falls directly onto the fovea. Although the retina is only about as thick as a
single page in this book, it consists of three main layers of neuronal tissue (Figure 3.8). The
first layer of neuronal tissue—closest to the front, outward-facing surface of the eye—is the
layer of ganglion cells, whose axons constitute the optic nerve. The second layer consists of
three kinds of interneuron cells. Amacrine cells and horizontal cells make single lateral (i.e.,
horizontal) connections among adjacent areas of the retina in the middle layer of cells.
Bipolar cells make dual connections forward and outward to the ganglion cells, as well as
backward and inward to the third layer of retinal cells.
The third layer of the retina contains the photoreceptors, which convert light
energy into electrochemical energy that is transmitted by neurons to the brain.
There are two kinds of photoreceptors—rods and cones. Each eye contains roughly
120 million rods and 8 million cones. Rods and cones differ not only in shape but
also in their compositions, locations, and responses to light. Within the rods and
cones are photopigments, chemical substances that react to light and transform
physical electromagnetic energy into an electrochemical neural impulse that can be
understood by the brain. The rods are long and thin photoreceptors. They are more
highly concentrated in the periphery of the retina than in the foveal region. The
rods are responsible for night vision and are sensitive to light and dark stimuli. The cones
are short and thick photoreceptors and allow for the perception of colour. They are more
highly concentrated in the foveal region than in the periphery of the retina (Durgin, 2000).
The neurochemical messages processed by the rods and cones of the retina travel via
the bipolar cells to the ganglion cells (see Goodale, 2000a, 2000b). The axons of the ganglion
cells in the eye collectively form the optic nerve for that eye. The optic nerves of the two eyes
join at the base of the brain to form the optic chiasma (see Figure 2.8 in Chapter 2). At this
point, the ganglion cells from the inward, or nasal, part of the retina—the part closer to your
nose—cross through the optic chiasma and extend to the opposite hemisphere of the brain.
The ganglion cells from the outward, or temporal, area of the retina, the part closer to your
temple, go to the hemisphere on the same side of the body. The lens of each eye naturally inverts the
image of the world as it projects the image onto the retina. In this way, the message sent to
your brain is literally upside-down and backward. After being routed via the optic chiasma,
about 90% of the ganglion cells then go to the lateral geniculate nucleus of the thalamus.
From the thalamus, neurons carry information to the primary visual cortex (V1 or striate
cortex) in the occipital lobe of the brain. The visual cortex contains several processing areas.
Each area handles different kinds of visual information relating to intensity and quality,
including colour, location, depth, pattern, and form.

Pathways for What and Where


In general, a pathway is the route visual information takes from its entry into the human
perceptual system through the eyes until it is fully processed. Researchers generally agree
that there are two such pathways.
Why are there two pathways? Information from the primary visual cortex in the occipital
lobe is forwarded through two fasciculi (fiber bundles):
One ascends toward the parietal lobe (along the dorsal pathway), and one descends
to the temporal lobe (along the ventral pathway). The dorsal pathway is also called
the where pathway and is responsible for processing location and motion information;
the ventral pathway is called the what pathway because it is mainly responsible for
processing the color, shape, and identity of visual stimuli (Ungerleider & Haxby,
1994; Ungerleider & Mishkin, 1982).
An alternative interpretation of the visual pathways has been suggested. This
interpretation is that the two pathways refer not to what things are and to where
they are, but rather, to what they are and to how they function. This view is known
as the what/how hypothesis (Goodale & Milner, 2004; Goodale & Westwood, 2004).
This hypothesis argues that spatial information about where something is located in space is
always present in visual information processing. What differs between the two pathways is
whether the emphasis is on identifying what an object is or, instead, on how we can situate
ourselves so as to grasp the object. The what pathway can be found in the ventral stream and
is responsible for the identification of objects. The how pathway is located in the dorsal
stream and controls movements in relation to the objects that have been identified through the
“what” pathway. Ventral and dorsal streams both arise from the same early visual areas
(Milner & Goodale, 2008).
Bottom-Up and Top-Down
There are two ways to understand how we perceive the world: top-down and bottom-up
theories.
Perception is a complex cognitive process central to our understanding of the world. It
involves interpreting sensory information, and two essential mechanisms, top-down and
bottom-up processing, play pivotal roles in shaping our perceptual experiences.

Bottom-Up Perception
Definition and Characteristics
Definition: Bottom-up perception, often referred to as data-driven perception, begins with
raw sensory input. It's a process where perception is constructed from the ground up, guided
primarily by sensory data.
Characteristics:
It follows a "building-block" approach, starting with the analysis of basic sensory features
and progressively building toward more complex interpretations.
There's minimal influence from prior knowledge or expectations at the initial stages.
Real-Life Examples
When you look at a jigsaw puzzle, your brain processes the colors, shapes, and edges of
individual puzzle pieces without any preconceived notions.
Detecting sudden movements, such as a car suddenly swerving on the road.
Applications:
Crucial for processing new or unfamiliar information.
Fundamental in early stages of perceptual processing, such as detecting edges and basic
visual features.
Top-Down Perception
Definition and Characteristics
Definition: Top-down perception, also known as concept-driven perception, is an approach
where prior knowledge, context, and cognitive processes influence the interpretation of
sensory input.
Characteristics:
Prior knowledge and context significantly shape perception, often guiding the interpretation
of sensory data.
It involves using existing concepts and mental models to understand sensory information.
Real-Life Examples
Reading a book where you automatically correct a typo because of your knowledge of
grammar and language rules.
Recognizing a friend's face in different lighting or clothing due to your familiarity with their
facial features.
Applications
Vital for understanding language, context, and social cues.
Allows us to quickly recognize familiar objects and make predictions based on prior
knowledge.
Interaction Between Bottom-Up and Top-Down Perception
Complementary Process: In reality, perception often involves a dynamic interplay between
bottom-up and top-down processing. The two processes complement each other, helping us
make sense of the world.
Role of Attention: Attention plays a crucial role in the interaction. It directs our focus to
specific sensory input, and top-down processes can influence where we allocate our attention.
Example: Consider being in a crowded room, and someone calls your name (top-down
processing), guiding your attention to that specific voice. Simultaneously, your sensory input
(bottom-up) helps you identify the source of the voice.
Real-Life Significance
Everyday Perception: We engage in both bottom-up and top-down perception continuously
throughout our daily lives. Understanding these processes is essential for grasping how we
navigate and interact with the world.
Cognitive Psychology: Top-down and bottom-up perception are at the core of cognitive
psychology, which delves into the intricacies of how our mental processes shape our
perceptions and behaviors.
Limitations and Ongoing Research
Ongoing research continues to explore the complexities of how bottom-up and top-down
processes interact. Researchers aim to uncover how these mechanisms can be more
effectively harnessed in various applications, including improving artificial intelligence,
enhancing human-computer interactions, and gaining a deeper understanding of human
cognition.
In conclusion, the interplay between bottom-up and top-down perception is pivotal to how we
make sense of the world. Understanding these processes illuminates the core principles of
how humans process information, and ongoing research further unravels the intricacies of
these mechanisms, guiding developments in various scientific and applied fields.

Template Matching Model (Sternberg Pg. 99-100)


The template-matching theory of perception proposes that a retinal image of an object is
faithfully transmitted to the brain, and the brain attempts to compare the image directly to
various stored patterns, called templates. Templates are highly detailed models for patterns
we potentially might recognize. The basic idea is that the perceptual system tries to compare
the image to the templates it has and then reports the template that gives the best match.
We see examples of template matching in our everyday lives. Fingerprints are matched in this
way. Machines rapidly process imprinted numerals on checks by comparing them to
templates. Increasingly, products of all kinds are identified with universal product codes
(UPCs or “bar codes”). They can be scanned and identified by computers at the time of
purchase. Chess players who have knowledge of many games use a matching strategy in line
with template theory to recall previous games (Gobet & Jackson, 2002).
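To make the comparison process concrete, the following is a minimal Python sketch of template matching, assuming tiny 3 x 3 binary "letter" patterns invented purely for illustration (the templates, names, and scoring rule are not from the text): the input is compared pixel by pixel against each stored template, and the best-scoring template is reported.

    # A minimal sketch of template matching: compare a binary input image
    # against stored templates pixel by pixel and report the best match.
    # The 3x3 "letter" patterns below are hypothetical toy templates.

    TEMPLATES = {
        "L": [(1, 0, 0),
              (1, 0, 0),
              (1, 1, 1)],
        "T": [(1, 1, 1),
              (0, 1, 0),
              (0, 1, 0)],
    }

    def match_score(image, template):
        """Fraction of pixels on which image and template agree exactly."""
        cells = [(i, j) for i in range(3) for j in range(3)]
        agree = sum(image[i][j] == template[i][j] for i, j in cells)
        return agree / len(cells)

    def recognize(image):
        """Report the stored template that gives the best match."""
        return max(TEMPLATES, key=lambda name: match_score(image, TEMPLATES[name]))

    # A slightly noisy "L" (one pixel flipped) still matches "L" best.
    noisy_L = [(1, 0, 0), (1, 0, 1), (1, 1, 1)]
    print(recognize(noisy_L))  # -> "L"

Note how literal the comparison is: a shifted, rotated, or resized "L" would score poorly against every template, which anticipates the limitations described next.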
Template models nonetheless have significant limitations.
Rigidity and Lack of Flexibility: Template models are rigid and inflexible, making it
challenging to account for variations in objects or patterns. They rely on stored templates and
often fail to recognize objects with slight deviations or contextual changes.
Memory Storage and Computational Inefficiency: To recognize diverse objects, template
models require extensive memory storage for templates, which can be impractical.
Additionally, the process of comparing input to numerous templates can be computationally
inefficient.
Limited Generalization and Ambiguity Handling: Template models struggle with
generalizing from known examples to recognize new ones and have difficulty handling
ambiguous stimuli. They rely heavily on exact matches and do not naturally consider
contextual information.
Inadequate Explanation of Complex Perception: These models lack mechanistic explanations
for how recognition occurs in the brain, oversimplifying the hierarchical and contextual
processes involved in visual perception.
Feature-Matching Theories (Sternberg Pg. 101-105)
Yet another alternative explanation of pattern and form perception may be found in
feature-matching theories. In this model, stimuli are thought of as combinations of elemental
features. According to these theories, we attempt to match features of a pattern to features
stored in memory, rather than to match a whole pattern to a template or prototype
(Stankiewicz, 2003).
You might wonder how feature analysis represents an advance beyond the template model,
since features in the feature-analysis model are essentially simplified templates. The model
nonetheless offers clear advantages. It avoids some of the recognition difficulties faced by the
template model, and it allows us to specify the crucial relationships among features within a
pattern. For example, in recognizing the letter A, the critical point is the presence of three
lines: two diagonals and one horizontal. Many other details are irrelevant. Using features
instead of complete patterns also reduces the number of templates required, because the same
features appear in multiple patterns, significantly reducing the number of distinct entities to
represent.
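As a contrast with the template sketch above, here is a toy illustration of feature matching in the same style; the feature inventory and the three stored letters are invented for illustration only. Each letter is stored as a small set of elemental features, and recognition picks the stored pattern whose features best overlap those extracted from the input.

    # A toy sketch of feature matching: letters are stored as sets of
    # elemental features rather than as full pixel templates. The feature
    # inventory below is hypothetical, chosen only to make the idea concrete.

    LETTER_FEATURES = {
        "A": {"diagonal_left", "diagonal_right", "horizontal_bar"},
        "H": {"vertical_left", "vertical_right", "horizontal_bar"},
        "V": {"diagonal_left", "diagonal_right"},
    }

    def recognize(observed_features):
        """Pick the letter whose stored feature set best overlaps the input
        (Jaccard overlap: shared features / all features involved)."""
        def overlap(letter):
            stored = LETTER_FEATURES[letter]
            return len(stored & observed_features) / len(stored | observed_features)
        return max(LETTER_FEATURES, key=overlap)

    # Two diagonals plus a crossbar identify "A"; irrelevant details such
    # as exact size or slant never enter the comparison.
    print(recognize({"diagonal_left", "diagonal_right", "horizontal_bar"}))  # -> "A"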
Object Recognition Model (Sternberg Pg. 106-107)
One position, viewer-centered representation, is that the individual stores the way the object
looks to him or her. Thus, what matters is the appearance of the object to the viewer, not the
actual structure of the object. The shape of the object changes depending on the angle from
which we look at it. A number of views of the object are stored, and when we try to
recognize an object, we have to rotate that object in our mind until it fits one of the stored
images.
The second position, object-centered representation, is that the individual
stores a representation of the object, independent of its appearance to the viewer.
In this case, the shape of the object will stay stable across different orientations.
One potential reconciliation of these two approaches to mental representation
suggests that people may use both kinds of representations. According to this approach,
recognition of objects occurs on a continuum (Burgund & Marsolek, 2000;
Tarr, 2000; Tarr & Bülthoff, 1995). At one end of this continuum are cognitive
mechanisms that are more viewpoint-centered. At the other end of the continuum
are cognitive mechanisms that are more object-centered. For example, suppose you
see a picture of a car that is inverted. How do you know it is a car? Object-centered
mechanisms would recognize the object as a car, but viewpoint-centered mechanisms
would recognize the car as inverted.
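A schematic sketch may help contrast the two positions. Everything in it (the orientation encoding, the stored views, the 45-degree rotation limit) is a hypothetical illustration rather than a model from the literature.

    # A schematic contrast between the two positions, using a hypothetical
    # encoding of a view as (object name, orientation in degrees).

    STORED_VIEWS = {"car": [0, 30, 60]}   # viewer-centered: specific past views
    CANONICAL_OBJECTS = {"car"}           # object-centered: orientation-free

    def angular_distance(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)

    def viewer_centered_recognize(name, orientation, rotation_limit=45):
        """Succeeds only if some stored view can be "mentally rotated" to fit;
        the 45-degree rotation limit is an invented illustrative threshold."""
        return any(angular_distance(orientation, v) <= rotation_limit
                   for v in STORED_VIEWS.get(name, []))

    def object_centered_recognize(name):
        """Orientation never enters the comparison at all."""
        return name in CANONICAL_OBJECTS

    # The inverted car from the text: the object-centered mechanism still
    # says "car", while the viewer-centered mechanism notices that no
    # stored view fits the upside-down input.
    print(object_centered_recognize("car"))        # True
    print(viewer_centered_recognize("car", 180))   # False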
Recognition by Components
Overview: Proposed by Irving Biederman, this theory suggests that we recognize objects by
perceiving their basic 3D shapes called "geons."
1. The object is segmented into a set of basic subobjects via a process that reflects the output
of early visual processing.
2. Once an object has been segmented into basic subobjects, one can classify the category of
each subobject. Biederman (1987) suggested that there are 36 basic categories of subobjects,
which he called geons (an abbreviation of geometric ions).
3. Having identified the pieces from which the object is composed and their configuration,
one recognizes the object as the pattern formed by these pieces. Thus, recognizing an object
from its components is like recognizing a word from its letters. The crucial assumption in this
theory is that object recognition is mediated by the recognition of its components.
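The following toy sketch illustrates step 3 under invented assumptions (the geon labels, the relations, and the two-object "memory" are all hypothetical): an object is recognized when its already-classified subobjects and their configuration match a stored structural description.

    # A toy sketch of recognition by components: each known object is stored
    # as a set of (geon, relation) parts, and an input is recognized when its
    # segmented, classified parts form the same configuration.

    KNOWN_OBJECTS = {
        "mug":      frozenset({("cylinder", "body"), ("curved_tube", "side")}),
        "suitcase": frozenset({("brick", "body"), ("curved_tube", "top")}),
    }

    def recognize(segmented_parts):
        """Step 3 of the theory: match the configuration of classified
        subobjects (geons plus relations) against stored descriptions."""
        parts = frozenset(segmented_parts)
        for name, configuration in KNOWN_OBJECTS.items():
            if parts == configuration:
                return name
        return None

    # The same geons in different relations are different objects, just as
    # the same letters in different orders are different words.
    print(recognize([("cylinder", "body"), ("curved_tube", "side")]))  # -> "mug"
    print(recognize([("brick", "body"), ("curved_tube", "top")]))      # -> "suitcase"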

Context and Pattern Recognition


When context or general knowledge of the world guides perception, we refer to the
processing as top-down processing, because high-level general knowledge contributes to the
interpretation of the low-level perceptual units. A general issue in perception is how such
top-down processing is combined with the bottom-up processing of information from the
stimulus itself, without regard to the general context.

1. Introduction
The human perceptual system is remarkably adept at making sense of the world around us.
Even when presented with limited or ambiguous information, our brains can often quickly
and accurately recognize objects, events, and contexts. The Pattern and Context Recognition
Theory posits that perception is an active process where the brain matches incoming sensory
information to stored patterns, and then interprets this information within a broader context.

2. Basic Concepts
a) Pattern Recognition: This refers to the ability of the brain to identify regularities in sensory
input. For instance, upon seeing the letters "c", "a", and "t" in sequence, our brain recognizes
the pattern and interprets it as the word "cat".

b) Context Recognition: This is the process by which the brain uses surrounding information
to aid perception. For example, the word "lead" can be pronounced as "leed" or "led"
depending on the context in which it's used.

3. The Role of Top-Down and Bottom-Up Processing


a) Bottom-Up Processing: This starts with the sensory input and builds up to the final
perception. It's data-driven and relies heavily on the external stimuli.

b) Top-Down Processing: This involves using prior knowledge and expectations to interpret
sensory input. It's conceptually driven.

For example, reading handwritten notes relies on both processes. While the shapes of the
letters guide our recognition (bottom-up), our knowledge of the language and the context
helps fill in gaps or interpret ambiguous letters (top-down).
4. Studies and Experiments
a) The Word Superiority Effect (Reicher, 1969): Participants were quicker at recognizing
letters presented within a word compared to those presented in isolation or within a
non-word, highlighting the influence of context in recognition.

b) Contextual Cueing (Chun & Jiang, 1998): Participants were faster at locating a target in a
scene when the scene's configuration remained consistent across trials, even if they weren't
consciously aware of the repeating patterns.
Facial Recognition (Sternberg Pg. 116-118)

Introduction
Facial recognition is a specialized process in which the human visual system identifies and
processes faces differently from other objects. Faces convey a wealth of information, from
identity to emotions, intentions, and health. Given the social nature of humans, accurately
recognizing and interpreting faces is crucial for successful interpersonal interactions.
The Configural System
Facial recognition relies heavily on the configural system, which refers to the spatial
relationships between facial features. Unlike other objects, slight changes in the configuration
of facial features can lead to significant changes in perceived identity.
There are three primary components of configural processing:
a) First-order relational information: This is the basic configuration of facial features that are
common to all faces (e.g., two eyes above a nose, which is above a mouth).
b) Second-order relational information: This pertains to the specific spatial relationships
between features on an individual's face (e.g., the distance between the eyes or the size of the
nose relative to the mouth).
c) Holistic processing: This is the ability to perceive the face as a whole, rather than as a
collection of individual features. When we view a face, we don't just see eyes, a nose, and a
mouth; we see an integrated whole, and this gestalt perception is central to recognizing
familiar faces.

The Fusiform Face Area (FFA)


The FFA, located in the fusiform gyrus of the brain, is crucial for facial recognition.
Neuroimaging studies have shown increased activity in the FFA when participants view faces
as compared to other objects. Damage to this area can result in prosopagnosia (face
blindness), where individuals struggle to recognize familiar faces, underscoring its
importance in facial processing.
The Face Inversion Effect
A key piece of evidence for the configural system's role in facial recognition is the face
inversion effect. When faces are turned upside down, the ability to recognize them is
impaired more than the recognition of other objects when they are inverted. This suggests
that our specialized system for recognizing faces relies heavily on the typical upright
configuration of facial features.
Composite Face Effect
Further evidence for holistic processing comes from the composite face effect. When the top half of
one face is combined with the bottom half of another, and the resulting composite is viewed
as a whole, it is difficult to recognize the individual halves as coming from two different
faces. However, when the halves are misaligned, recognition becomes easier. This
demonstrates the tendency of the visual system to process faces as integrated wholes.

Challenges in Facial Recognition


While the human visual system is remarkably adept at recognizing faces, there are
challenges. For instance:
a) Similar-looking faces: Some faces have features that are closely matched in configuration,
making them harder to differentiate.
b) Variable conditions: Changes in lighting, angle, and facial expression can affect recognition
accuracy.
Facial recognition is a complex and specialized aspect of human perception, relying heavily
on the configural system. The ability to accurately recognize and interpret faces has
significant evolutionary and social implications, allowing humans to navigate social
hierarchies, detect threats, and form interpersonal bonds.

Massaro’s Fuzzy Logical Model of Perception (FLMP)


The Fuzzy Logical Model of Perception (FLMP) was developed by Dominic Massaro in the
late 1980s and early 1990s. The model is essentially a theory of how multiple sources of
information (or "cues") are integrated when making perceptual decisions. Massaro has
argued that the perceptual information and the context provide two independent sources of
information about the identity of the stimulus and that they are just combined to provide a
best guess of what the stimulus might be. One of the most distinguishing features of the
FLMP is its assertion that this integration is "independent," meaning that the influence of one
cue doesn't depend on the presence or strength of another cue. The model assumes three
operations in visual perception: feature evaluation, feature integration, and decision.
Continuously valued features are evaluated, integrated, and matched against prototype
descriptions in memory, and an identification decision is made on the basis of the relative
goodness of match of the stimulus information with the relevant prototype descriptions.
Basic Principle: Fuzzy Logic
Traditional logic asserts that things are either true or false, with no in-betweens. In contrast,
fuzzy logic acknowledges that reality often falls somewhere in the middle, allowing for
degrees of truth.
Real-life Example: Imagine deciding whether to bring an umbrella based on the weather.
Traditional logic might say: if it's raining, then bring an umbrella. But what if there's only a
slight drizzle? Or if the forecast says there's a 50% chance of rain? Fuzzy logic would help
you evaluate these "in-between" scenarios by giving a degree of truth to each condition.
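A tiny sketch of the umbrella example, with an invented membership function and an invented decision threshold, shows how degrees of truth replace the strict true/false decision:

    # A tiny illustration of the umbrella decision with degrees of truth
    # instead of strict true/false. The membership function and the 0.5
    # decision threshold are hypothetical.

    def rain_degree(mm_per_hour):
        """Map rainfall intensity to a degree of truth for "it is raining",
        from 0.0 (definitely not) to 1.0 (definitely yes)."""
        return min(mm_per_hour / 5.0, 1.0)   # saturates at 5 mm/h

    def bring_umbrella(mm_per_hour, forecast_chance):
        # Combine current drizzle and forecast into one graded value
        # (max is the standard fuzzy-logic "or").
        truth = max(rain_degree(mm_per_hour), forecast_chance)
        return truth >= 0.5                  # act once the degree is high enough

    print(bring_umbrella(1.0, 0.5))   # light drizzle + 50% forecast -> True
    print(bring_umbrella(0.5, 0.2))   # faint drizzle, low forecast  -> False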
Multiple Cues:
The FLMP argues that when we perceive something, we don't just use one piece of
information. We integrate multiple cues to form a perception. Each cue contributes a "degree
of membership" to potential perceptual categories.

Example: In a busy cafe, as we speak to someone, their words are cues; so are their lip
movements and gestures. Amid the café's ambient noise, understanding them isn't just about
hearing. You're unconsciously triangulating meaning from audio, visual, and even contextual
cues.
Evaluation of Cues
Each cue provides a level of evidence for or against each potential perceptual category. Using
fuzzy logic, each cue is given a degree of membership to each category, ranging from 0 (not a
member) to 1 (full member).
Real-life Example: When your friend's lips form the word "coffee", but the noise drowns
out some sound, the lip-reading might offer a 0.90 confidence they said "coffee", while the
sound alone might only provide 0.60.
Independent Integration
The magic of the FLMP is in how it combines these cues. The model asserts that cues are
combined independently, which contrasts with other models that suggest cues interact.

Real-life Scenario: Think of combining ingredients for a unique café recipe. The lip
movement (0.90) and voice sound (0.60) contribute independently, blending to shape the
perception. Mathematically, their integration multiplies them, giving a combined confidence
of 0.90 x 0.60 = 0.54 for the word "coffee."

Making the Decision


Once all cues are evaluated and integrated, the perceptual category with the highest value is
typically chosen as the final perception.
Continuing the Example: If "coffee" registers a confidence of 0.54, but another word like
"toffee" rates lower, your brain will lean towards "coffee" as your friend's word choice.
Implications and Limitations
Massaro's FLMP offers a unique perspective on perception, emphasizing the independent
combination of multiple cues using principles from fuzzy logic. By understanding this model,
we gain insights into the complex processes underlying our seemingly straightforward
perceptual experiences. The FLMP has been particularly influential in studies of speech
perception, especially in situations where auditory and visual information (e.g., lip
movements) are combined.
However, like all models, it's a simplification of reality. While FLMP provides an intricate
model, reality can be even richer in complexity. There's debate over whether cues are always
as independent as FLMP posits. Some researchers have argued that the strict independence
assumption might not always hold, and real-world perceptions might involve more intricate
interactions between cues.

Deficits in Perception
Sternberg, Pg. 127-131

https://www.youtube.com/watch?v=QV59aAtznSg
https://www.visioncenter.org/conditions/motion-blindness/

https://medlineplus.gov/genetics/condition/achromatopsia/#causes

https://my.clevelandclinic.org/health/diseases/23421-visual-agnosia

https://www.britannica.com/science/optic-ataxia

https://www.britannica.com/science/prosopagnosia

https://www.britannica.com/science/color-blindness
https://byjus.com/biology/colour-blindness/
