Introduction:
Spoken words and music notes are nothing more than very familiar complex
sounds
However, music and speech can be distinguished from most other environmental
sounds
Music and speech are created with perceivers in mind
o Both music and speech serve to communicate, and both can convey
emotion and deeper meanings
Music:
The oldest discovered musical instruments (flutes carved from vulture bones) are
at least 30,000 years old
Musical Notes:
o One of the most important characteristics of any acoustic signal is
frequency
o Brain structures for processing sounds are tonotopically organized to
correspond to frequency
o The psychological quality of perceived frequency is pitch
Pitch:
The psychological aspect of sound related mainly to
perceived frequency
o Figure 11.1: Illustrates the extent of the frequency range
of musical sounds in relation to human hearing
The sounds of music extend across a frequency
range from about 35 to 4200 Hz
o Chords:
Music is further defined by richer, complex sounds called chords,
which are created when 3 or more notes are played simultaneously
Chord:
A combination of 3 or more musical notes with
different pitches played simultaneously
The simultaneous playing of 2 notes is called a “dyad”
The major distinction between chords is whether they are
consonant or dissonant
Perceived to be most pleasing, consonant chords are
combinations of notes in which the ratios between the note
frequencies are simple
One of the consonant relationships is the octave, in
which the frequencies of the 2 notes are in the simple
ratio of 2:1
Others include: the perfect fifth (3:2); the perfect
fourth (4:3)
Dissonant intervals are defined by less elegant ratios and do
not sound very pleasing
Ex) Minor second (16:15); augmented fourth (45:32)
Because chords are defined by the ratios of the note frequencies
combined to produce them, they are named the same no matter
what octave they’re played in
In the musical helix in Figure 11.3, the relationships between the
notes of the chord on the helix stay the same; the only thing that
changes is the pattern’s height
Chords are made up of 3 or more notes and can be played
with different tone heights while their chromatic relationships
are maintained. Here the G major chord (purple shading) is
shown being played at different heights
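The ratio definitions above lend themselves to a quick numeric sketch. A minimal example, assuming a base note of A4 = 440 Hz (an illustrative choice, not from these notes):

```python
# Consonant and dissonant intervals from the notes above, expressed as
# frequency ratios. Base pitch A4 = 440 Hz is an assumed example value.
INTERVALS = {
    "octave": (2, 1),             # consonant
    "perfect fifth": (3, 2),      # consonant
    "perfect fourth": (4, 3),     # consonant
    "minor second": (16, 15),     # dissonant
    "augmented fourth": (45, 32), # dissonant
}

def interval_frequency(base_hz, ratio):
    """Frequency of the note that forms the given ratio above base_hz."""
    num, den = ratio
    return base_hz * num / den

for name, ratio in INTERVALS.items():
    print(f"{name}: {interval_frequency(440.0, ratio):.1f} Hz")

# Chords are defined by ratios, so moving the base note up an octave
# (doubling it) rescales every frequency but leaves every ratio intact.
assert interval_frequency(880.0, (3, 2)) / 880.0 == interval_frequency(440.0, (3, 2)) / 440.0
```

This also illustrates why a chord keeps its name in any octave: only the pattern's height changes, never the ratios between its notes.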
o Cultural Differences:
Musical scales and intervals vary widely across cultures
Although potent relationships between notes such as octaves are
relatively universal, different musical traditions use different
numbers of notes and spaces between notes within an octave
Heptatonic Scale: a 7-note scale
Pentatonic Scale: 5 notes per octave
Traditional in many parts of the world, especially Asia
Speakers of tone languages (Chinese, Thai, Vietnamese) use
changes in voice pitch (fundamental frequency) to distinguish
different words in their language
It appears that language may affect the types of musical scales
people use
Changes in pitch direction are larger and more frequent in
spoken tone languages
Parallel acoustic differences are found between the
pentatonic music of China, Thailand, and Vietnam and the
heptatonic music of Western cultures
Pitch changes in Asian music are larger and occur more
often
In scales for which fewer notes comprise an octave, notes may be
more loosely tuned than are notes in the heptatonic Western scale
When there are fewer notes to distinguish, a wider range of
pitches could qualify for a given note
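One way to see the "looser tuning" point is to divide the octave evenly, a simplifying assumption for illustration (real scales space notes unevenly). An octave spans 1200 cents, a standard log-frequency unit:

```python
# Fewer notes per octave leaves a wider band of pitches per scale step,
# so each note can be more loosely tuned. Even division is an assumption.
OCTAVE_CENTS = 1200

def cents_per_note(notes_per_octave):
    """Average width (in cents) of one scale step under even division."""
    return OCTAVE_CENTS / notes_per_octave

print(f"heptatonic: ~{cents_per_note(7):.0f} cents per scale step")
print(f"pentatonic: ~{cents_per_note(5):.0f} cents per scale step")
```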
Because people around the world have very different listening
experiences, you might expect them to hear musical notes in
different ways
Furthermore, infants seem equipped to learn whatever scale
is used in their environment
STUDY: Lynch & Eilers (1990)
Tested the degree to which 6-month-old infants
noticed inappropriate notes within both the traditional
Western scale and the Javanese pelog scale
The infants appeared to be equally good at detecting
such “mistakes” within both Western and Javanese
scales, but adults from Florida were reliably better at
detecting deviations from the Western scale
o Absolute Pitch:
Absolute Pitch (AP) is often referred to as "perfect pitch"
Absolute Pitch (AP):
A rare ability whereby some people are able to very
accurately name or produce notes without
comparison to other notes
In contrast, most people identify notes relative to one another
AP relates specifically to musical notes, because people with AP
share the same basic sense of hearing as other people
Their auditory systems are no more sensitive than normal,
and they are no better than usual at detecting differences
between sounds
AP is very rare, occurring in less than one in 10,000 people
Among musicians, this rare skill is often considered to be
highly desirable
Researchers disagree as to how some people come to have AP
One idea is that people might acquire AP following a great
deal of practice;
However, it is very difficult for adults to acquire AP
even through a great deal of practice
In contrast to learning AP, others have suggested that some
people are born with AP, and this is consistent with the fact
that AP appears to run in families
However, children in the same family share more than
genetics
o In a home in which one child receives music
lessons, it is likely that siblings also receive
music lessons
There is variation in AP, and this variation presents a
challenge for identifying possible genetic components
Most traits that are influenced by genetics, such as
height, show variation (they are not all or none) and
variation in expertise also occurs for skills that are
learned
The best explanation for AP might be that it is acquired
through experience, but that experience must occur at a
young age
In studies measuring abilities across many people,
the age at which musical training begins is well
correlated with future possession of AP
AP might not be so absolute, and adults with AP might be more
flexible in their listening than typically thought
STUDY: Hedger, Heald & Nusbaum (2013)
After first testing the identification of notes by
listeners, the researchers had their participants listen
to Johannes Brahms’s Symphony No. 1
What the researchers did not tell their listeners was
that, during the first part of the piece, they very
gradually “detuned” the music by making all the notes
a bit lower (flat) before playing the music at these
lower frequencies for a while longer
After the musical passage was complete, AP listeners
were more likely to judge flat notes as being in-tune
and to judge in-tune notes as being mistuned
While it may be very difficult to train adults to become
AP listeners, perception of musical notes by adults
with AP can be shifted by musical experience
o Sensation & Perception in Everyday Life: Music and Emotion
Listening to music affects people's moods and emotions
When listeners hear pleasant-sounding chords preceding a word,
they are faster to respond that a word such as charm is positive,
and they are slower to respond that a word such as evil is negative
Given the powerful effects of music on mood and emotion, some
clinical psychologists practice music therapy.
Music has deep physiological effects
Music can promote positive emotions, reduce pain, and
alleviate stress, and it may even improve resistance to
disease
Evidence across many studies suggests that music can have
a positive impact on pain, anxiety, mood and overall quality
of life for cancer patients
When people listen to highly pleasurable music, they
experience changes in heart rate, muscle electrical activity,
and respiration, as well as blood flow in brain regions that
are thought to be involved in reward and motivation
Music Making:
o Notes or chords can form a melody, a sequence of sounds perceived as a
single coherent structure
Melody:
A sequence of notes or chords perceived as a single
coherent structure
Note that a melody is defined by its contour (the pattern of rises
and declines in pitch) rather than by an exact sequence of sound
frequencies
o One simple demonstration that a melody is not a sequence of specific
sounds: shift every note of a melody up by one octave, and the
resulting melody is perceived as the same
o In addition to varying in pitch, notes and chords vary in duration
The average duration of a set of notes in a melody defines the
music’s tempo
Tempo:
o The perceived speed of the presentation of sounds
Any melody can be played at either a fast or slow tempo
o But the relative durations within a sequence of notes
are a critical part of the melodies themselves
If the notes of a given sequence are played with different
relative durations, we will hear completely different melodies
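The tempo point above can be sketched in code; the duration values are arbitrary illustration, not from the text:

```python
# Scaling every note duration by the same factor changes tempo but
# preserves the relative durations that define the melody itself.
durations = [0.5, 0.5, 1.0, 2.0]  # seconds per note at a slow tempo

def at_tempo(durs, speedup):
    """Play the same note sequence `speedup` times faster."""
    return [d / speedup for d in durs]

fast = at_tempo(durations, 2.0)
ratios_slow = [d / durations[0] for d in durations]
ratios_fast = [d / fast[0] for d in fast]
print(ratios_slow == ratios_fast)  # same relative durations, same melody
```

Changing the *relative* durations instead of scaling them all would produce a different melody, which is the distinction the notes draw.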
o Rhythm:
In addition to varying in speed, music varies in rhythm
Many (or even most) activities have rhythm
Perhaps it is the very commonness of rhythm that causes us to
hear nearly all sounds as rhythmic, even when they’re not
Thaddeus Bolton
Conducted experiments in which he played a sequence of
identical sounds perfectly spaced in time; they had no
rhythm
Nevertheless, his listeners readily reported that the sounds
occurred in groups of 2, 3, or 4
They reported hearing the first sound of a group as
“accented” or “stressed”, while the remaining sounds were
unaccented, or unstressed
Listeners are predisposed to grouping sounds into rhythmic
patterns
Several qualities contribute to whether sounds will be heard
as accented (stressed) or unaccented (unstressed)
o Sounds that are longer, louder, and higher in pitch all
are more likely to be heard as leading their groups
o The timing relationship between one sound and the
others in a sequence also helps determine accent
Listeners prefer, or at least expect, sequences of notes to be fairly
regular and this tendency provides opportunities for composers to
get creative with their beats
One way to be creative in deviating from a bland succession
of regular beats is to introduce syncopation
o Syncopation:
Any deviation from a regular rhythm
Ex) Accenting a note that is expected to be
unaccented or not playing a note (replacing it
with a rest) when a note is expected
o Syncopation has been used for centuries and can be
found in compositions by all the great composers,
including Bach, Beethoven, and Mozart
o Ex) Syncopated Auditory Polyrhythms
When 2 different rhythms are overlapped, they
can collide in interesting ways
If one rhythm is based on 3 beats
[AaaAaaAaaAaa] and the other on 4
[BbbbBbbbBbbbBbbb], the first
accented sound for both rhythms will
coincide only once every 12 beats
Across the 11 intervening beats, the 2
rhythms will be out of sync
When we listen to syncopated polyrhythms,
one of the 2 rhythms becomes the dominant or
controlling rhythm, and the other rhythm tends
to be perceptually adjusted to accommodate
the first (Figure 11.5)
In particular, the accented beat of the
subordinate rhythm shifts in time
Thus, syncopation is the perception that beats
in the subordinate rhythm have actually
travelled backward or forward in time
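The 3-against-4 example above can be checked numerically. A small sketch; the A/B accent notation comes from the notes, while the 0-indexed beat numbering is an assumption for illustration:

```python
from math import lcm  # available in Python 3.9+

# Accents fall on every 3rd beat in one rhythm and every 4th in the
# other; the two accented beats coincide only at multiples of lcm(3, 4).
def accent_pattern(period, length):
    """True on beats (0-indexed) where this rhythm is accented."""
    return [i % period == 0 for i in range(length)]

three = accent_pattern(3, 24)
four = accent_pattern(4, 24)
together = [i for i in range(24) if three[i] and four[i]]
print(together)   # beats where both rhythms are accented at once
print(lcm(3, 4))  # the period after which the alignment repeats
```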
Rhythm is in large part psychological
We can produce sequences of sounds that are rhythmic and
are perceived as such
But we also hear rhythm when it does not exist, and notes
effectively travel in time to maintain the perception of
consistent rhythm
o Melody Development:
Melody is essentially a psychological entity
It is our experience with a particular sequence of notes or with
similar sequences that helps us perceive coherence
STUDY: Saffran et al. (1999)
Studies of 8-month-old listeners reveal that learning of
melodies begins quite early
Created 6 simple and deliberately novel “melodies”
composed of sequences of 3 tones
Infants sat on their parents' laps while hearing only 3 minutes
of continuous random repetitions of the melodies
Next, infants heard both the original melodies and series of
new three-tone sequences
o The new sequences contained the same notes as the
originals, but one part of the sequence was taken
from one melody and another part from another
melody
Because the infants responded differently to the new
melodies, we can deduce that they had learned something
about the original melodies
The ability to learn new melodies is not limited to simple sequences
of tones
STUDY: Saffran, Loman, & Robertson (2000)
o Study with 7-month-old infants, parents played a
recording of 2 Mozart sonata movements to their
infants every day for 2 weeks
o After another 2 weeks had passed, the infants were
tested in the laboratory to see whether they remembered
the movements
o Infant listeners responded differently to the original
movements than to similar Mozart movements
introduced to them for the first time in the laboratory
Speech:
Humans are capable of producing an incredible range of distinct speech sounds
(the 5000 or so languages across the world use over 850 different speech
sounds)
This flexibility arises from the unique structure of the human vocal tract:
o Vocal Tract:
The airway above the
larynx used for the
production of speech
The vocal tract includes the
oral tract and nasal tract
o (Figure 11.6)
a) Speech production has 3
basic components:
respiration (lungs)
phonation (vocal folds), and
articulation (vocal tract)
b) The vocal tract is made up of the oral and nasal tracts
o Unlike that of other animals, the human larynx is positioned quite low in
the throat
One notorious disadvantage of such a low larynx is that humans
choke on food more easily than any other animal does
The fact that these life-threatening anatomical liabilities were
evolutionarily trumped by the survival advantage of oral
communication is a testament to the importance of language to
human life
Speech Production:
o The production of speech has 3 basic components
Respiration (lungs)
Phonation (vocal folds)
Articulation (vocal tract)
o Speaking fluently requires an impressive degree of coordination among
these components
o Respiration and Phonation:
To initiate a speech sound, air must be pushed out of the lungs,
through the trachea and up to the larynx
The diaphragm flexes to draw air into the lungs, and elastic recoil
forces air back out
At the larynx, air must pass through the 2 vocal folds, which are
made up of muscle tissue that can be adjusted to vary how freely
air passes through the opening between them
These adjustments are described as types of phonation
o Phonation:
The process through which vocal folds are
made to vibrate when air pushes out of the
lungs
The rate at which vocal folds vibrate depends on their stiffness and
mass
Vocal folds become stiffer and vibrate faster as their tension
increases, creating sounds with higher pitch
Children, who have relatively small vocal folds, have higher
pitched voices than adults do
Adult men generally have lower-pitched voices than women
have, because one of the effects of testosterone during
puberty is to increase the mass of the vocal folds
By varying the tension of vocal folds (stiffness) and the
pressure of airflow from the lungs, individual talkers can vary
the fundamental frequency of voiced sounds
If we were to measure the sound right after the larynx, we would
see that vibration of the vocal folds creates a harmonic spectrum
Figure 11.7
o (a) The spectrum of sound coming from the vocal folds is
a harmonic spectrum
o (b) After passing through the vocal tract, which has
resonances based on the vocal tract's shape at the
time,
o (c) there are peaks (formants) and troughs of energy at
different frequencies in the sounds that come out of
the mouth
If we could listen to just this part of speech, it would sound
like a buzz
The first harmonic corresponds to the actual rate of physical
vibration of the vocal folds (the fundamental frequency)
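A harmonic spectrum like the one described above is easy to sketch numerically; the F0 of 120 Hz is an assumed example value, not from the text:

```python
# Energy in a harmonic spectrum sits at integer multiples of the
# fundamental frequency (F0); the 1st harmonic is F0 itself, which is
# the physical vibration rate of the vocal folds.
def harmonics(f0, n):
    """First n harmonic frequencies (Hz) for fundamental f0."""
    return [f0 * k for k in range(1, n + 1)]

print(harmonics(120.0, 5))  # [120.0, 240.0, 360.0, 480.0, 600.0]
```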
Talkers can make interesting modifications in the way their
vocal folds vibrate and singers can vary vocal fold tension
and air pressure to sing notes with widely varying
frequencies
o However, the really extraordinary part of producing
speech sound occurs above the larynx and vocal
folds
o Articulation:
The area above the larynx (the oral tract and the nasal tract
combined) is referred to as the “vocal tract”
Humans have an unrivaled ability to change the shape of the vocal
tract by manipulating the jaw, lips, tongue body, tongue tip, velum
(soft palate), and other vocal-tract structures
These manipulations are referred to as articulation:
o Articulation:
The act or manner of producing a speech
sound using the vocal tract
Changing the size and shape of the space through which sound
passes increases and decreases energy at different frequencies
We call these effects “resonance characteristics” and the
spectra of speech sounds are shaped by the way people
configure their tracts as resonators
Peaks in the speech spectrum are referred to as formants, and
formants are labeled by number, from lowest frequency to highest
Formants:
o A resonance of the vocal tract
o Formants are specified by their center frequency and
are denoted by integers that increase with relative
frequency
These concentrations in energy occur at different
frequencies, depending on the length of the vocal tract
o For shorter vocal tracts (in children and smaller
adults), formants are at higher frequencies than for
longer vocal tracts
o Because absolute frequencies change depending on
who’s talking, listeners must use the relationships
between formant peaks to perceive speech sounds
o Only the first 3 formants are shown in Figure 11.7c
For the most part, we can distinguish almost all
speech sounds on the basis of energy in the
region of these lowest 3 formants
However, additional formants do exist, at
higher frequencies with lower amplitudes
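The length-formant relationship above can be sketched with a standard simplification that these notes do not spell out: modeling the vocal tract as a uniform tube, closed at the larynx and open at the lips. In that model the resonances (the formants) fall at odd multiples of c / (4L), so a shorter tract yields higher formant frequencies:

```python
# Uniform-tube sketch of formant frequencies (an assumed model, not the
# notes' own method). Shorter tube length L -> higher formants.
SPEED_OF_SOUND = 350.0  # m/s, approximate value in warm, moist air

def formants(tract_length_m, n=3):
    """Center frequencies (Hz) of the first n formants of a uniform tube."""
    return [(2 * k - 1) * SPEED_OF_SOUND / (4 * tract_length_m)
            for k in range(1, n + 1)]

print(formants(0.175))  # adult-length tract (~17.5 cm): roughly 500, 1500, 2500 Hz
print(formants(0.125))  # shorter, child-like tract: every formant shifts upward
```

Because the absolute values scale with tract length, it is the relationships between formant peaks, not their absolute frequencies, that listeners can rely on.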
Constant frequency spectra:
Ex) If a sound started with a 5-dB, 100-Hz component and a
60-dB, 200-Hz component, these frequencies would continue at
these amplitudes for the duration of the sound
One of the most distinctive characteristics of speech sounds is that
their spectra change over time
To represent this third dimension (time) in addition to the
dimensions of frequency and amplitude represented
in frequency spectra of static sounds, auditory researchers
use a type of display called a spectrogram
o Spectrogram:
In sound analysis, a 3D display that plots time
on the horizontal axis, frequency on the vertical
axis and amplitude (intensity) on a colour or
gray scale
o Figure 11.8
A) For sounds that do not vary over time,
frequency spectra can be represented by
graphs that plot amplitude on the y-axis and
frequency on the x-axis
B) To graphically show sounds whose spectra
change over time, we rotate the graph so that
frequency is plotted on the y-axis
C) Then we plot time on the x-axis, with the
amplitude of each frequency during each time
sliver represented by colour (redder showing
greater intensity)
The spectrogram in part (c) shows the
acoustic signal produced by a male
uttering the sentence “we were away a
year ago”
o Formants show up clearly in spectrograms as bands
of acoustic energy that undulate up and down,
depending on the speech sounds being produced
Speech Perception:
o Speech production is very fast
In casual conversation we produce about 10-15 consonants and
vowels per second
o To achieve this acoustic feat, our articulators must do many different
things very quickly
However, articulators can move only so fast, and inertia keeps
articulators from getting all the way to the position for the next
consonant or vowel
Experienced talkers also adjust their production in anticipation of
where articulators need to be next
In these ways, production of one speech sound overlaps
production of the next
o This overlap of articulation in space and time is called
coarticulation:
Coarticulation:
The phenomenon in speech whereby
attributes of successive speech units
overlap in articulatory or acoustic
patterns
Although coarticulation does not cause much
trouble for listeners in understanding speech, it
has made it harder for speech perception
researchers to explain how we do it
o Coarticulation and Lack of Invariance:
Context sensitivity is a signature property of speech
One might expect context sensitivity to cause a significant
problem for speech perceivers, because it means there are
no “invariants” that we can count on to uniquely identify
different speech sounds
Explaining how listeners understand speech despite all this
variation has been one of the most significant challenges for
speech researchers
Context sensitivity due to coarticulation also presents one of the
greatest difficulties in developing computer recognition of
speech
o We cannot program or train a computer to recognize a
speech sound (consonant or vowel) without also
taking into consideration which speech sounds
precede and follow that sound
o And we cannot identify those preceding and following
sounds without also taking into consideration which
sounds precede and follow them, and so on.
o Categorical Perception:
Something in the acoustic signal must lead children to perceive the
‘d’ in deem, doom, and dam as the same consonant
Shortly after WWII, researchers invented machines that could
produce speech-like sounds, and they started testing listeners to try
to determine exactly which acoustic cues enabled them to
distinguish different speech sounds
They found, for example, that by varying the transitions of F2
and F3, they could produce sounds that listeners reliably
reported hearing as 'bah', 'dah', or 'gah'
o Figure 11.13
The sound spectrograms at the bottom of this
figure indicate auditory stimuli that change
smoothly from a clear 'bah' on the left through
'dah' to a clear 'gah' on the right
Perception, however, does not change
smoothly
All the sounds on the left sound like a
‘bah’ (blue curve) until we reach a sharp
‘bah’-‘dah’ border
Listeners are much better at discriminating a
‘bah’ from a ‘dah’ than at discriminating 2 ‘bah’s
or 2 ‘gah’s
The dashed black line indicates discrimination
performance
The same is true for 'dah' (red curve) and 'gah'
(green curve)
STUDY:
These researchers knew that making small, incremental
changes to simple acoustic stimuli such as pure tones leads
to gradual changes in people’s perception of these stimuli
o Ex) Tones sound just a little higher in pitch with each
small step in frequency
Surprisingly, speech sounds were not
perceived in this way
o When researchers started with a synthesized 'bah'
and gradually varied the formant transitions moving
toward 'dah' and then 'gah', listeners' responses did
not gradually change from 'bah' to 'bah'-ish 'dah' to
'dah' to 'dah'-ish 'gah' to 'gah'
Instead, listeners identified these sounds as
changing abruptly from one consonant to
another
Furthermore, listeners appeared incapable of hearing that
much of anything was different when 2 sounds were labeled
as the same consonant
o In this second part of the experiments, researchers
played pairs of synthesized speech sounds and asked
listeners to tell them apart
o Listeners performed almost perfectly when detecting
small differences between 2 sounds if one was ‘bah’
and the other was ‘dah’
o But if both sounds were ‘bah’ or both were ‘dah’,
performance dropped to nearly chance levels (dashed
line), even though the differences in formant
transitions in the first and second pairs of stimuli were
equally large
This pattern of results has come to be called categorical perception
Categorical Perception:
o For speech as well as other complex sounds and
images, the phenomenon by which the discrimination
of items is no better than the ability to label items
3 qualities define categorical perception
o (1) A sharp labeling (identification) function
o (2) Discontinuous discrimination performance
o (3) Researchers can predict discrimination
performance on the basis of labelling data
Listeners report hearing differences between sounds only
when those differences would result in different labels for the
sounds, so the ability to discriminate sounds can be
predicted by how listeners label the sounds
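The three defining qualities above can be sketched in a toy model; the logistic labeling function and its parameters are illustrative assumptions, not the studies' actual data:

```python
import math

# A sharp sigmoid stands in for the labeling (identification) function
# along a synthetic 'bah'-to-'dah' continuum; boundary and sharpness
# values are assumptions chosen for illustration.
def p_dah(step, boundary=5.0, sharpness=3.0):
    """Probability of labeling a continuum step as 'dah'."""
    return 1.0 / (1.0 + math.exp(-sharpness * (step - boundary)))

def predicted_discrimination(step_a, step_b):
    """Discrimination predicted purely from labels: the probability that
    the two stimuli receive different labels."""
    pa, pb = p_dah(step_a), p_dah(step_b)
    return pa * (1 - pb) + pb * (1 - pa)

# Within-category pair vs. cross-boundary pair, equal acoustic spacing:
print(predicted_discrimination(1, 3))  # both 'bah': near chance
print(predicted_discrimination(4, 6))  # straddles the boundary: near perfect
```

The sketch captures the key prediction: discrimination tracks labeling, so equal acoustic steps are heard as different only when they cross a category boundary.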
Learning to Listen:
o Experience is incredibly important for visual perception, particularly the
higher-level perception of objects and events in the world
o Experience is every bit as important for auditory perception, especially for
the perception of speech
Experience with speech begins very early in development
Babies gain significant experience with speech even before
they’re born
o Measurements of heart rate as an indicator of the
ability to notice change between speech sounds have
revealed that late-term fetuses can discriminate
between different vowel sounds
o Prenatal experience with speech sounds appears to
have considerable influence on subsequent
perception
A newborn prefers hearing her mother’s voice
over other women’s voices
Newborns prefer hearing particular children’s
stories that were read aloud by their mothers
during the third trimester of pregnancy
o Learning Words:
The whole point of producing and perceiving these speech sounds
is to put them together to form words, which are the units of
language that convey meaning
How do novice language learners (infants) make the leap from
streams of meaningless speech sounds to the meaningful groups of
speech sounds that we call words?
No string of speech sounds is inherently meaningful
o Infants in different places must learn the words that
are specific to their native languages
A series of consonants and vowels within a single word tend
to “run into” one another because of coarticulation
o It turns out the situation is not much better for a
series of words forming a sentence
Our perception usually seems at odds with this acoustic reality
When we listen to someone talking to us in our native
language, individual words seem to stand out quite clearly as
separate entities
But listening to someone speak in a different language is
often a very different experience
STUDY: Saffran, Aslin & Newport (1996)
To study how infants learn words from the continuous
streams of speech that they encounter in their environment,
the researchers created an artificial "language" of novel words
Each of the words of this new language had 3 syllables
Next they created randomized sequences of these novel
words, with words running together just as they do in fluently
produced sentences
o Figure 11.22
8-month-old infants can learn to pick out words
from streams of continuous speech based on
the extent to which successive syllables are
predictable or unpredictable
While sitting on their parents' laps, infants
heard 2-minute sequences of syllables (a)
In the second part of the experiment, infants
recognized 3-syllable sequences (tokibu,
gopila, gikoba, tipolu) that they had heard
before (b), but noticed that they had never
heard other syllable combinations (poluto,
bugopi, kobati)
8-month-old infants listened to a 2-minute sequence such as this
while sitting on their parents' laps
After this brief period of learning, infants heard either “tupiro”
(one of the novel words) or “pabiku” (a new combination of
the same syllables used to produce the “real” novel words)
The infants listened longer to the nonwords than to the
words, indicating that after 2 minutes of exposure, they had
already begun to learn the words in this new “language”
The researchers suggest that the infants in their study
learned the words by being sensitive to the statistics of the
sequences of sounds that they had learned in the first part of
the experiment
o In the real world of language, words are simply
sequences of speech sounds that tend to occur
together
Other sequences occur together less often
They suggest that infants learn to pick words out of the
speech stream by accumulating experience with sounds that
tend to occur together;
o These are words (at least to babies)
Ex) Infants eventually split the “word” allgone
into 2 as they acquire more linguistic
experience
When sounds that are rarely heard together occur in
combination, that’s a sign that there is a break between 2
words
Figure 11.23
o The superior temporal gyrus extends along the upper
surface of each temporal lobe
o A1 and the adjacent belt area are hidden in the
Sylvian fissure, with only the parabelt area exposed
toward the posterior superior temporal gyrus
o Portions of the superior temporal gyri (left and right)
are active when listening to complex sounds
These nearby areas of auditory cortex are called belt and
parabelt areas
o They are often referred to as “secondary” or
“association” areas
o We see these areas activated when listeners hear
speech and music
At these relatively early levels of cortical processing, activity in
response to music, speech, and other complex sounds is relatively
balanced across the 2 hemispheres
o Because we know that language is typically lateralized to one hemisphere
(usually the left for right handed people), processing of speech should
become more lateralized at some point because perceiving speech is part
of understanding language
o Listeners are very good at understanding speech even under adverse
circumstances when some parts of the signal are missing or distorted;
this capability makes it difficult to construct stimuli that are as
complex as speech without their being heard as speech
o STUDY: Rosen et al. (2001)
Developed a particularly clever way to tease apart cortical
responses to acoustic complexity from responses to speech per se
Figure 11.24:
o Stimuli created to measure cortical responses to
acoustic complexity versus responses to speech
o a) This spectrogram represents an intact intelligible
sentence with natural changes in frequency and
amplitude
o b-e) In these spectrograms, the same sentence is
now unintelligible because either frequency changes
were removed (b) or amplitude changes were
removed (c)
o The spectrogram in (d) has frequency changes from
(a) but amplitude changes from (e)
o While (d) is equal to (a) in acoustic complexity, it is
unintelligible because of the mismatch between
changes in amplitude and frequency
They began with complete sentences that included natural changes
in both amplitude and frequency (a), although they replaced voicing
vibrations with noise
Although using noise made the sentences sound whispered,
they were perfectly intelligible
Next the researchers took away changes in frequency, leaving
behind only changes in amplitude (b)
Then they took away changes in amplitude, leaving only changes in
frequency (c)
Finally, to match the amount of acoustic complexity in speech, they
created hybrid “sentences” by adding the amplitude changes of one
sentence (e) to the frequency changes in another (a)
These hybrids (d) were just as complex as the sentence in
(a) but they were completely unintelligible
The researchers played these 4 types of sentences (a-d) to
listeners while their brains were being scanned using PET
Figure 11.25:
o Substantial neural activity was observed in both left
and right superior temporal lobes (red) when listeners
heard hybrid sentences such as the one shown in (d),
for which amplitude and frequency changes came
from different sentences
o An increase in activity in only the language-dominant
left anterior temporal lobe (purple) was evident when
the sentences were intelligible (a), with matching
amplitude and frequency
Neural activity in response to unintelligible hybrid “sentences” (d)
was found bilaterally, with a little more activity along the superior
temporal gyrus running along the top of the right temporal lobe
Responses in the left superior temporal gyrus became dominant
only when sentences were intelligible because amplitude and
frequency changes coincided properly (a)
From these findings, it appears that language-dominant
hemisphere responses depend on listeners using speech for
linguistic understanding, and not on acoustic complexity alone
o Evidence suggests that as sounds become more complex, they are
processed in more anterior and ventral regions of the superior temporal
cortex farther away from A1
When speech sounds become more clearly a part of language, they
are processed more anteriorly (forward) and in the left temporal
lobe
However, this back-to-front path may not be all there
is to speech perception in the brain
Other brain regions are likely involved
o Figure 11.26
Prior to performing brain surgery to remove a tumor or
to reduce the effects of very severe epilepsy,
neurosurgeons often place electrodes directly on the
surface of the brain to localize regions of neural
activity
Before surgery, researchers can take advantage of
these electrodes to record responses to experimental
materials
o STUDY: Mesgarani et al. (2014)
Played 500 sentences spoken by 400 different people to 6 human
participants while recording directly on the surface of their superior
temporal gyri
At different single electrodes, they found that responses were
selective for certain classes of speech sounds, such as fricatives,
stop consonants, or vowels
In some cases, responses were even more selective
Ex) Responses to stop consonants might be selective for only
voiceless stops (p, t, k) or for one place of articulation (d) versus
others (b, g)
o STUDY: Chang et al. (2010)
Played a series of synthesized syllables varying incrementally from
‘bah’ to ‘dah’ to ‘gah’ to patients who were awaiting surgery
The researchers found that, across populations of thousands of
neurons recorded by their surface electrodes, neural responses
were much like listening data for the same syllables
Neural responses were very similar for pairs of stimuli that listeners
would label as the same, as ‘bah’ or ‘dah’ or ‘gah’
Responses were quite distinct for pairs of syllables that were
acoustically separated by the same extent but were labeled as
different, as 'bah' and 'dah' or as 'dah' and 'gah'
o As with perception in general, cortical organization depends critically on
experience, and the perception of speech sounds is tremendously
experience-dependent
There may be a place in the left temporal lobe where experience
with English 'b' and 'd' shapes neural activity
o Cortical processes related to speech perception could be distinguished
from brain processes that contribute to the perception of other complex
sounds in 2 other ways
o Listeners have a wealth of experience simultaneously hearing speech and
viewing talkers’ faces, and the McGurk effect is evidence of the profound
effects that visual cues can have on the way speech sounds are perceived
o STUDY: Reale et al. (2007)
Used grids of electrodes on the surface of posterior temporal cortex
in their experiments investigating cortical processing of audio and
visual speech
Found that neural responses to auditory speech stimuli in the
language-dominant hemisphere were influenced substantially by
simultaneous viewing of the lower half of the face of a person either
producing audible speech or carrying out a meaningless mouth
motion
o Because visual information combines with auditory experience when we’re
perceiving speech, you may wonder what happens for people who are
deaf
STUDY: Zatorre (2001)
Studied a group of people with cochlear implants who
previously had been deaf
These listeners exhibited increased brain activity in the
visual cortex when listening to speech
Zatorre hypothesized that this activation of visual areas of
the brain is the result of increased experience and ability
with lip-reading for these previously deaf individuals
o Whenever people talk, they both hear the sounds they’re producing and
experience the movements of speech production in their own vocal tracts