
Describe the mechanisms involved in sound localization
When analysing the different cues for sound localization, it is useful to distinguish between those requiring both ears and those requiring only one. With both ears, differences in both time and intensity can be used to determine the position of the sound source, although which cue is useful depends on the sound frequency. High frequency sounds, with a short wavelength, are attenuated by the head, allowing them to be localized by their level, that is to say the logarithm of their amplitude, normally measured in dB. Lower frequency sounds are localized using the phase difference in the waveform between the two ears. But these cannot be the only methods of localising sounds, because binaural comparisons can only localise sounds relative to the midline. For vertical localization, and indeed to establish whether a sound is behind or in front of you, it is necessary to use each ear separately. The horizontal axis, along which we use both our ears to localise sounds, is also known as the azimuthal axis.
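
As a minimal sketch of what "level" means here, in Python (the reference amplitude is an arbitrary choice for illustration, not a value from the sources):

    import math

    def level_db(amplitude, reference=1.0):
        # Level is proportional to the logarithm of amplitude:
        # a tenfold increase in amplitude corresponds to +20 dB.
        return 20 * math.log10(amplitude / reference)

    print(level_db(10.0))   # 20.0 dB re the (arbitrary) reference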

The external ear, consisting of the pinna with its whorls and bumps and the external auditory canal, adds colouration to the sound: it emphasises and de-emphasises certain frequencies, altering the sound spectrum. This allows us to determine in particular the vertical direction of the sound, since which frequencies are emphasised and de-emphasised depends strongly on the sound's elevation. Whether the stimulus comes from in front of or behind us must also be determined by this method, since these positions produce no difference between what the two ears hear.

The binaural methods of localisation rely on the central nervous system, and localisation is in fact one of the most important functions of the auditory cortex. It is useful to take a step back and remember the problems to be overcome. In the visual system, maps of the visual field are present at the lower levels of the pathway, in the retina and in V1; concepts like spatial frequency and edge detection only appear as we go higher. In the auditory system things are, in a manner of speaking, the other way round. The cochlea is not organised by position; rather, any "maps" are restricted to frequency[1]. From here, the auditory nerve terminates in the cochlear nucleus, from which fibres project to the superior olive and inferior colliculus. This time, the processing for localisation occurs at these higher levels. As already mentioned, two cues are used: interaural time differences and interaural level (amplitude) differences. Differences in sound amplitude occur because the head gets in the way of the sound, attenuating it on one side. This only works if the wavelength is sufficiently small and the frequency sufficiently high; otherwise the sound waves bend round the head without any attenuation. This is the same mechanism by which some houses in valleys have problems receiving high frequency TV signals but can still pick up longer wavelength radio signals. In practice, level sensitivity is useful for localising sounds with frequencies above 3 kHz, and time sensitivity is useful for sounds below 1.5 kHz.

[1] Indeed, at the higher levels of the auditory pathway, namely the medial geniculate nucleus of the thalamus and the auditory fields of the cortex (AI; AII shows responses to more complex stimuli), a common feature is tonotopic organization. There are isofrequency laminae, namely sheets of cortex where the neurons all respond to the same characteristic frequency.
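
To put round figures on the head-shadow argument above (the 0.2 m head width and the bare wavelength-versus-head-size criterion are assumptions for illustration):

    # Sketch: the head only casts a sound "shadow" when the wavelength
    # is smaller than the head itself; longer waves diffract around it.
    SPEED_OF_SOUND = 343.0   # m/s, in air
    HEAD_WIDTH = 0.2         # m, a round-figure assumption

    for freq_hz in (500, 1500, 3000, 8000):
        wavelength_m = SPEED_OF_SOUND / freq_hz
        shadowed = wavelength_m < HEAD_WIDTH
        print(f"{freq_hz} Hz: wavelength {wavelength_m:.2f} m, shadowed: {shadowed}")
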
Perhaps the easiest way to pigeonhole (and hopefully explain) the binaural method of sound localization is to take a brief tour of the main anatomical structures on the auditory pathway, and explain their relevance to localization. The cochlear nucleus is the termination of the auditory nerve from the cochlea, and is the best understood and studied area of the pathway. It is the point at which the processing starts. The cochlear nuclei contain a number of different cells, classified by their morphology and their "poststimulus time histogram" (PST), that is to say the pattern of their response (spikes) after a stimulus. Cell types include pyramidal cells, octopus cells, globular bushy cells, multipolar cells and spherical bushy cells, and these cells predominate in different regions of the cochlear nucleus. There is some correlation between a cell's morphology and its response type (cells can be labelled with horseradish peroxidase so that the two can be matched up). There are three divisions of the cochlear nucleus, and functionally the cells of the dorsal and anteroventral parts are very different, with those of the intermediate posteroventral part occupying a halfway house[2]. The dorsal part has cells which are highly specialised, responding to quite complex stimuli. Since quite a lot of processing has already occurred, these cells project directly to the inferior colliculus. The anteroventral part has a variety of cells which show responses more like those of the neurons innervating them. Since these responses are less sophisticated, cells from this nucleus project to the superior olive for a further stage of processing before going on to the inferior colliculus. Since the superior olive is responsible for much of the binaural processing for sound localisation, it "follows" that these cells might be expected to play a prominent role in azimuthal sound localization. And indeed they do!
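
Since the PST histogram does so much classificatory work here, a minimal sketch of how one is built from recorded spike times may help (the spike times and bin width below are invented purely for illustration):

    # Sketch: building a poststimulus time histogram (PSTH).
    # Spike times (ms after stimulus onset) pooled over repeated trials;
    # the values here are invented purely for illustration.
    spike_times_ms = [1.2, 1.4, 1.3, 2.8, 5.1, 1.5, 2.9, 5.0, 1.3, 2.7]
    bin_width_ms = 1.0
    n_bins = 6

    histogram = [0] * n_bins
    for t in spike_times_ms:
        b = int(t // bin_width_ms)
        if b < n_bins:
            histogram[b] += 1

    # A "primarylike" cell would show a sharp onset peak followed by a
    # sustained response, much like the auditory nerve fibre driving it.
    for i, count in enumerate(histogram):
        print(f"{i}-{i + 1} ms: {'#' * count}")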

Two types of cell are found in the ventral cochlear nucleus: the bushy cells and the stellate cells, which show responses at characteristic frequencies[3]. The bushy cells further subdivide into the spherical and globular bushy cells. The spherical bushy cells, for example, show responses similar to those of the primary auditory fibres, so their PSTs are known as "primarylike". These cells show an early response to stimulation, called a "prepotential", which appears on the record as an early, very marked depolarisation before the eventual discharge. Anatomically, these neurons synapse with a very large auditory nerve terminal, the endbulb of Held, and this is thought to give rise to the prepotential. Functionally, whenever there is a spike in the auditory nerve fibre it is followed by the prepotential. By picking up the spike pattern of the auditory nerve fibres so faithfully, the spherical bushy cells relay the phase-locked signals[4], signalling the time of the stimulus very accurately. Thus they contribute to time localization of the sound[5].

To ignore the dorsal nucleus totally, however, would be to do it an injustice. It has many cells, and among them are the fusiform cells. These show excitatory and inhibitory responses to a broad range of stimuli, and it has been suggested that these cells, which together show a spatial firing pattern, help establish the vertical location of the sound.

[2] At least according to Neurophysiology, R. H. S. Carpenter, although unfortunately my sources seem to part company somewhat in their descriptions of the cochlear nuclei.
[3] So the ventral nucleus seems, in a gross oversimplification, to be coding for frequency in these two different ways, as well as contributing to sound localization?
[4] Which will be discussed in more detail later.
[5] This is their most obvious functional specialisation, but the mere fact that they show some response to the level of the sound allows them also to take their place on the pathway for level localization, as will be explained below.

In many species there are other cells that seem to respond over a wide dynamic range at the expense of frequency selectivity. These cells, which are found in an anatomically distinct area of the cochlear nuclei, thus appear to respond to the sound level. So in these species (particularly birds) there may be some anatomical separation of the pathways responsible for localisation by time sensitivity and localisation by level sensitivity. In humans, however, it seems that the globular bushy cells feature in both.

We now move on to the superior olive, which divides into medial and lateral parts. The structure in the CNS responsible for level sensitivity is the lateral superior olive. This area shows its greatest response to high frequency sounds, since in general it receives its input from neurons of high characteristic frequency. Input to the lateral superior olive from the ipsilateral side comes from the spherical bushy cells of the cochlear nucleus and is excitatory, but input from the contralateral side is inhibitory: the excitatory globular bushy cells of the contralateral side synapse on the medial nucleus of the trapezoid body[6], whose neurons use glycine as a neurotransmitter and inhibit the lateral superior olive. Thus the neurons of the lateral superior olive can compare the sound from each ear, and will be excited when the ipsilateral ear detects a sound greater in amplitude than that at the contralateral ear. When the two ears hear the sound at the same level, the inhibitory response seems to outweigh the excitatory response, and there is little output. The lateral superior olive is tonotopically organised, so that in addition to this information about localisation, the localisation is "organised" by sound frequency. The lateral superior olive then gives out fibres that cross to the contralateral inferior colliculus, so the initial ipsilateral excitatory response becomes a contralateral excitatory response.
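
A minimal sketch of this excitatory-inhibitory comparison (the gain and the rectified-subtraction form are illustrative assumptions, not parameters from the sources):

    # Sketch: a lateral superior olive (LSO) unit as a rectified subtraction.
    # Ipsilateral input excites; contralateral input (via the glycinergic
    # medial nucleus of the trapezoid body) inhibits.
    def lso_response(ipsi_level_db, contra_level_db, gain=2.0):
        drive = gain * (ipsi_level_db - contra_level_db)
        return max(0.0, drive)  # firing rates cannot be negative

    print(lso_response(60, 50))  # louder at the ipsilateral ear -> response
    print(lso_response(50, 50))  # equal levels -> inhibition cancels excitation
    print(lso_response(50, 60))  # louder contralaterally -> silenced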

The time difference method compares the phase of the same sound arriving at each ear. The amount by which the two signals are out of phase tells us the difference in time between the sound reaching each ear; since time is just distance divided by speed, and the speed of sound is known, the auditory system can "work out" the direction and thus localise the sound. This method is most effective for sounds below 1.5 kHz. Above this frequency the waveform repeats too often, so that even when a phase difference is detected, it is not clear whether the true time difference is just that, or whether a whole number of extra cycles is involved as well. So there is a nice parallelism between the two binaural cues: level sensitivity localises high frequency sounds above 3 kHz, while time sensitivity localises sounds at frequencies of less than 1.5 kHz. And indeed, there is a reduction in the ability to localise sounds in the azimuthal plane between 1.5 kHz and 3 kHz.
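
The 1.5 kHz limit can be made plausible with round figures: once the period of the tone is shorter than the largest possible interaural delay (about 700 μs, a figure used later in this essay), a measured phase difference no longer pins down a unique time difference. A sketch:

    # Sketch: why phase becomes ambiguous above roughly 1.5 kHz.
    MAX_ITD_S = 700e-6  # maximum interaural time difference (~700 microseconds)

    for freq_hz in (500, 1000, 1500, 3000):
        period_s = 1.0 / freq_hz
        ambiguous = period_s < MAX_ITD_S  # a whole cycle could "fit inside" the ITD
        print(f"{freq_hz} Hz: period {period_s * 1e6:.0f} us, ambiguous: {ambiguous}")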

Time sensitivity relies on the phase-locking response of the low frequency auditory receptors in the cochlea. Consider the response to a pure tone: auditory fibres tend to discharge at a particular phase of the repeating waveform, thus providing a temporal code of the frequency. They will generally not discharge on every cycle, but when they do discharge it is at the same point in the cycle[7].

[6] The trapezoid body is the most prominent outflow tract from the anteroventral cochlear nucleus. It extends at the level of the pons to the three nuclei of the superior olive.
Since phase locking is so crucial to time localisation, a brief description of its mechanism follows. As the tectorial membrane vibrates in sympathy with the sound it moves relative to the basilar membrane (since the axes of their oscillations differ), the stereocilia of the inner hair cells are bent backwards and forwards, and the mechanically gated channels open and shut. So the membrane potential oscillates in phase with the sound waveform[8]. The auditory receptor does not fire action potentials; instead it releases glutamate in proportion to its membrane potential. Assuming that the amount of glutamate needed to trigger an action potential in the cochlear ganglion cell is constant, this threshold will be reached at the same point on the oscillating membrane potential, and thus at the same point on the waveform of the sound. The timing of the action potentials therefore gives a temporal code of the frequency of the sound. For a pure tone of a given frequency, the cells with that characteristic frequency (CF) will show the strongest response, but cells with similar characteristic frequencies will also respond, albeit more weakly. Crucially, these other cells also show phase locking, and every cell fires its action potentials at the same point in the cycle. So for coding sound frequency there is a spatial code (which cells are firing, that is, where along the cochlea they come from) and there is the phase-locking temporal code. Phase locking is extremely important at low sound frequencies, where it codes the frequency of the sound and is used to localise it, but it declines from about 1 kHz to 3 kHz, since the ac receptor potential itself is smaller at these frequencies. At higher frequencies, therefore, frequency is coded by the spatial code, while location is derived from the relative amplitudes at the two ears.
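
A minimal simulation of the resulting spike pattern (the tone frequency, preferred phase and firing probability are all invented for illustration): because spikes are confined to one phase of the cycle, the interspike intervals fall at whole multiples of the period even though the fibre skips cycles.

    import random

    # Sketch: phase-locked firing to a 400 Hz pure tone. The fibre fires at a
    # fixed phase of the cycle but skips cycles at random, so the interspike
    # intervals come out as whole multiples of the stimulus period.
    random.seed(0)
    freq_hz = 400.0
    period_s = 1.0 / freq_hz
    preferred_phase = 0.25           # fraction of a cycle; an arbitrary choice

    spike_times = [
        (cycle + preferred_phase) * period_s
        for cycle in range(40)       # 40 cycles = 100 ms of tone
        if random.random() < 0.3     # fire on roughly 30% of cycles
    ]

    intervals = [b - a for a, b in zip(spike_times, spike_times[1:])]
    print([round(i / period_s) for i in intervals])   # e.g. [1, 2, 1, 3, ...]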

So, as already discussed, the information about phase locking is picked up by the spherical bushy cells of the anteroventral cochlear nucleus. From here, the information is passed to the medial superior olive. This structure allows us to tell differences in time as small as 10 μs. Given that the width of the head and the speed of sound give a maximum possible interaural delay of about 700 μs, this resolution is clearly of the right order of magnitude. The minimum detectable time delay gives rise to the "cone of uncertainty", a recognition of the fact that we can only localise a sound into a particular (fairly narrow) cone in space, with its apex at the ear.
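
Putting rough numbers on this (the ear separation and the plane-wave formula ITD = (d/c)·sin θ are textbook-style assumptions for illustration; the 10 μs figure comes from the text above):

    import math

    # Sketch: angular resolution implied by a 10 us time resolution.
    # For a distant source at azimuth theta, a simple plane-wave model gives
    # ITD = (d / c) * sin(theta), with d the ear separation and c sound speed.
    # (The ~700 us maximum quoted above is larger, presumably because sound
    # must travel around the head rather than straight through it.)
    d = 0.2          # m, assumed ear separation (round figure)
    c = 343.0        # m/s, speed of sound
    dt = 10e-6       # s, minimum detectable time difference (from the text)

    # Near the midline (theta ~ 0), sin(theta) ~ theta, so:
    dtheta_rad = dt * c / d
    print(f"{math.degrees(dtheta_rad):.1f} degrees")   # about 1 degree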

On each side of the head, the same sound is picked up at slightly different times, and then undergoes the same processes. There is nothing inherently complex about the way the medial superior olive codes for the time difference: it is just like a race[9], relying on the conduction delay of the action potentials. The technical term put forward by Jeffress is the "neural delay line", although it now seems that only the contralateral input is actually delayed. In the medial superior olive there are lines of neurons, each receiving contralateral and ipsilateral input. Because they lie in series along the delay lines, some receive input relatively more rapidly from the ipsilateral side than from the contralateral side, while others receive it relatively more rapidly from the contralateral side. But these neurons need to receive input from both sides at (nearly) the same time in order to fire. So the idea is that, depending on the time delay, a specific neuron will fire: the one for which the difference in conduction time between the two sides exactly compensates for the difference in the time at which the sound reached each ear. Thus medial superior olive neurons are tuned to a specific interaural time difference (ITD). There seems to be a definite mapping along the anterior-posterior axis: neurons responding best to zero ITD are found more anteriorly, and those responding best to a greater lead of the contralateral stimulus are found more posteriorly. Thus there is a form of decussation, with neurons responding best to sounds on the other side of the head. This information is then passed to the inferior colliculus, where it is used to establish the azimuthal location of the sound.

[7] How often the fibres discharge is approximately linearly related to sound level: thus there is a logarithmic relation between firing frequency and amplitude. Phase locking becomes important when the level becomes sufficient to saturate this relationship (owing to the refractory period of the action potential). Though the frequency of firing is affected by these extraneous factors, exactly when the nerve fibres fire, if they fire at all, is determined by phase locking.
[8] Referring specifically to the ac component of the mechanical current. There is also a dc component, a fixed deflection from the resting potential due to stimulation, but for phase locking this does not concern us.
[9] But a very politically correct one, where one only wins when one comes in at exactly the same time as one's rival!
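
Returning to the delay-line scheme itself, a minimal sketch of Jeffress-style coincidence detection (the set of internal delays and the coincidence window are invented for illustration):

    # Sketch: a Jeffress-style delay line of coincidence detectors.
    # Each unit delays the contralateral input by a fixed amount and fires
    # only if the two inputs then arrive within a narrow coincidence window.
    COINCIDENCE_WINDOW_S = 10e-6                               # 10 us, from the text
    internal_delays_s = [0.0, 100e-6, 200e-6, 300e-6, 400e-6]  # invented values

    def firing_units(ipsi_arrival_s, contra_arrival_s):
        return [
            delay for delay in internal_delays_s
            if abs((contra_arrival_s + delay) - ipsi_arrival_s) <= COINCIDENCE_WINDOW_S
        ]

    # Sound arrives at the contralateral ear 200 us before the ipsilateral ear,
    # so only the unit whose internal delay compensates for that ITD fires:
    print(firing_units(ipsi_arrival_s=200e-6, contra_arrival_s=0.0))  # [0.0002]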

The inferior colliculus receives input from the dorsal part of the cochlear nuclei and from the superior olive; the pathway from the superior olive is the lateral lemniscus. The inferior colliculus has several subdivisions, with a dorsal component divided into four areas and the large, laminated central nucleus, which is the best studied area. In some species, such as the barn owl[10], there is a map of the location of sounds in relation to the head, although probably not in humans. The only place where there is undoubtedly such a map is the superior colliculus, but there it is related to the integration of auditory input to enable responses to visual stimuli. The inferior colliculus has to combine not only the level comparisons but also the time differences into a picture of where the sound is coming from. The ventromedial part of the inferior colliculus receives input from the high characteristic frequency neurons of the lateral superior olive, and deals with the level of the sound as described above. The dorsal part receives input from the low characteristic frequency neurons of the medial superior olive, and measures differences in time. There does not seem to be much anatomical integration between the two, since the central nucleus is tonotopically organised and our two methods of localisation are relevant for sounds of very different frequencies. Lesions of the inferior colliculus do have effects on localisation, but fail to abolish level difference localisation completely. This implies that higher levels must also be involved in processing ("rediscovering"?) these stimuli.

From here, the auditory information passes to the medial geniculate nucleus of the thalamus, and thence to AI, the primary auditory cortex, corresponding to Brodmann's areas 41 and 42. In AI, many neurons are sensitive to interaural time and level differences, and their response depends on the azimuth of the source. Receptive fields are varied, and can be very wide. Owing to the decussation that occurred earlier, many neurons respond to sounds on the contralateral side, and receptive fields can include the whole of the contralateral side, or even the entire auditory field! There are also cells that show much narrower receptive fields. Just as in the visual cortex, the auditory cortex displays a columnar organisation, with elements in the same column having similar receptive fields and responses. For the moment at least there is no evidence of a map of the location of the sound, and this is a major difference between this pathway and the two discussed in the other two essays, vision and somatosensation. Despite the lack of maps, however, experiments suggest that AI plays an important role in sound localization[11]. Cats with unilateral lesions of AI display difficulties with sound localization on the contralateral side, whereas lesions in other areas do not affect this[12]. The deficit is particularly marked for certain frequencies, depending on where exactly in AI the lesion was, and it is most pronounced when the cat is required actually to move toward the sound source, rather than just indicating (by pressing a bar on the left or right) which side it is on. So the relatively simple task of lateralization is performed at lower levels such as the superior olive, but actually placing the sound precisely in space is reserved for AI. Many structures, then, are involved in the localisation of sound, and this is in turn one of the most important functions of the ear.

[10] They rely on it to hunt mice, since they do this in the dark and their eyesight is poor anyway; humans don't have this need. Other specialisations of the owl, in addition to this map, include an over-representation of the high frequencies that it uses to localise objects: a bit like the over-representation of the fovea in the lateral geniculate nucleus in humans?
[11] Fundamental Neuroscience, Zigmond and Bloom.
[12] Thereby displaying "necessity and sufficiency".

Sources

Neurophysiology, R. H. S. Carpenter
Principles of Neural Science, Kandel and Schwartz
Fundamental Neuroscience, Zigmond and Bloom

