
SIXTH FRAMEWORK PROGRAMME

PRIORITY 2
INFORMATION SOCIETY TECHNOLOGIES

Contract for:

SPECIFIC TARGETED RESEARCH OR INNOVATION PROJECT

Annex I - Description of Work


Project acronym: EmCAP
Project full title: Emergent Cognition through Active Perception
Proposal/Contract no.: 013123
Related to other Contract no.: (to be completed by Commission)
Date of preparation of Annex I: 11 March 2005 (draft); 8 April 2005, 28 April
2005; 27 May 2005 (revised)
Operative commencement date of contract: 1 October 2005

TABLE OF CONTENTS

                                                                         Page
1. Project summary                                                          3
2. Project objectives                                                       4
3. Participant list                                                         7
4. Relevance to the objectives of the specific programme and/or
   thematic priority                                                        8
   a) Current state of the art and intended advances                        8
   b) Summary of innovative aspects of the project                         17
   c) Relevance of the project to IST and FET objectives                   18
5. Potential Impact                                                        19
6. Project management and exploitation/dissemination plans                 21
   6.1 Project management                                                  21
   6.2 Plan for using and disseminating knowledge                          24
   6.3 Raising public participation and awareness                          25
7. Workplan for whole duration of the project                              26
   7.1 Introduction, general description and milestones                    26
   7.2 Workplanning and timetable                                          33
   7.3 Graphical presentation of work packages                             34
   7.4 Work package list/overview                                          39
   7.5 Deliverables list                                                   40
   7.6 Work package descriptions                                           43
8. Project resources and budget overview                                   70
   8.1 Efforts for the project                                             70
   8.2 Overall budget for the project                                      71
   8.3 Management level description of resources and budget                71
9. Ethical issues                                                          75
Appendix A  Consortium description                                         78
   A.1 Participants and consortium                                         78
   A.2 Sub-contracting                                                     92
   A.3 Third parties                                                       92
   A.4 Funding of third country participants                               92
Appendix B  References                                                     93

013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 2 of 101

1. Project Summary
Our goal is to investigate how complex cognitive behaviour in artificial systems can
emerge through interacting with an environment, and how, by becoming sensitive to the
properties of the environment, such systems can autonomously develop effective
representations. The underlying hypothesis is that perception is an active process; even in the
absence of overt behaviour, perception involves prediction, and the need for making better
predictions is what drives the development of useful representations and cognitive structures.
We will explore these issues within the realm of music cognition. Music is an ideal domain in
which to investigate cognitive behaviour, since it is a universal phenomenon containing
complex abstractions and temporally extended structures. As music is self-referential there are
no externally determined semantics; the appropriate segmentation of the stream of sounds
depends upon the structure of the signal itself, rather than the need to individuate objects in
the external world. By focusing on music cognition we can directly address problems such as
the autonomous development of representations and processes that support the
characterisation of events and event sequences, the development of categories and useful
abstractions, the representation of situational context, interactions between long-term
knowledge structures and working memory, the role of attention in optimising processing
with respect to the current object of interest, the representation of temporal expectancies, and
the integration of events across many different time scales. We will investigate music
cognition through perceptual experiments and computational modelling studies, embodying
our understanding in the construction of an emergent interactive music system, which will
learn to develop representations and expectations in response to the music it experiences, and
will use these predictions to generate actions in the form of appropriately timed and pitched
sounds.


2. Project objectives
a. Introduction
The goal of this project is to investigate how complex cognitive behaviour in artificial
systems can emerge through interacting with an environment, and how, by becoming sensitive
to the properties of the environment, such systems can develop effective representations and
processing structures autonomously.
The central hypothesis underlying this project is that perception is essentially an active
process. Recent work on auditory processing has shown that the auditory system, far from
being a passive receptor of sounds, is constantly adjusting its processing to reflect the current
acoustic context and task demands [1, 2]. The idea is that perception, even in the absence of
overt behaviour, involves a process of prediction, and that the need for making better
predictions is what drives the development of useful representations and cognitive structures;
ultimately giving rise to intelligent cognition. Conversely, perceptual phenomena and the
representations and processing structures in the brain can only be understood in relation to the
structure of the environment.
We intend to explore these issues within the realm of music cognition. Music is an ideal
domain in which to investigate complex cognitive behaviour, since music, like language, is a
universal phenomenon containing complex abstractions and temporally extended structures,
whose organisation is constrained by underlying rules or conventions that participants need to
understand for effective cognition and interaction. Music shares many other characteristics
with language; perception evolves in time, and the acoustic stimulus is processed within the
context of locally determined expectations, long-term knowledge and the focus of attention.
However, since music is self-referential there are no externally determined semantics; the
appropriate segmentation of the stream of sounds depends upon the structure of the signal
itself, rather than the need to individuate objects in the external world. By focusing our
investigations on music cognition we can directly address problems such as the autonomous
development of representations and processes that support the characterisation of events and
event sequences, the development of categories and useful abstractions, the representation and
evaluation of situational context, interactions between long-term knowledge structures and
working memory, the role of attention in optimising processing with respect to the current
object of interest, the representation of time and temporal expectancies, and the integration of
events across many different time scales.
We will investigate the development of music cognition by combining the
complementary approaches of perceptual experiments using human subjects, functional and
neurocomputational modelling, and the implementation of an interactive embodied cognitive
system. Experimental studies using neonates will determine for the first time whether certain
basic perceptual abstractions, such as pitch, are innate, or whether they develop through early
experience. Adult studies will also explore how the perception of musical form is influenced
by the characteristics of natural language. The results of these experiments will inform
neurocomputational modelling studies. Most existing models of auditory perception are based
upon adult data; however, here we will use the new experimental results to constrain our
models so that they account for the emergence of adult processes and representations through
experience. The computational models will, as far as possible, be consistent with current
understanding of the neurobiology of the human auditory system. This is important both as a
valuable source of guidance and constraint, and to ensure the relevance of predictions made
by our work to further understanding of auditory neuroscience, particularly in providing
useful insights into the properties of higher levels in the human auditory system. From the
modelling studies we will derive theoretical insights into computational processes underlying
active perception and cognition, from which we will distil the essential elements in the
formulation of a generic theoretical model of cognition. These computational principles will
also be applied in the implementation of an interactive music processing system, the Music
Projector, in which the autonomous development of internal musical codes and expectancies,
and phenomena such as categorization, similarity ranking, and streaming will be investigated.
The system will synthesize, as musical output, the expectancies generated in response to
musical stimuli, which will allow us to compare the musical perceptions and expectancies of
the artificial system with music cognition in humans; thereby closing the experimental loop,
as depicted in the diagram below.

[Figure: the experimental loop. Perceptual Experiments establish innate and adult abilities,
which inform Neurocomputational Modelling; the modelling yields computational principles
embodied in the Emergent Music System, whose artificial behaviour feeds back into the
perceptual experiments.]
b. Objectives
The principal objectives of the project fall into three categories: i)
experimental investigations into the perception of musically relevant stimuli; ii)
neurocomputational modelling of auditory processes subserving music cognition; and iii)
identification of theoretical principles and implementation of an emergent music cognition
system. Detailed, verifiable and timed objectives are specified within each of the
workpackages.
Experimental investigations into the perception of musically relevant stimuli
- Compare the processing of musically meaningful sounds in neonates and adults in
  order to distinguish innate from learned levels of abstraction in auditory
  processes underlying music perception.
- Investigate the role of timbre, in particular the timbres of language, in adult
  perception of musical form.

Neurocomputational modelling of auditory processes subserving music cognition
- Develop an integrated neurocomputational architecture for auditory processing and
  music cognition.
- Investigate the role of attention and the computational principles underlying an
  active listening system.
- Investigate the emergence of representations, processing strategies and
  perceptual categories through experience.

Theoretical principles and implementation of an emergent music cognition system
- Extract theoretical insights into the computational processes underlying active
  perception and music cognition, and formulate principles for a generic model of
  cognition.
- Implement an emergent interactive music processing system based upon these
  cognitive principles.


3. Participant list
List of Participants
Partic.  Partic.  Participant name              Participant  Country      Date enter  Date exit
role*    no.                                    short name                project     project

CO                University of Plymouth        UoP          UK           Month 1     Month 36
                                                                                      (end of project)
CR                Universitat Pompeu Fabra      FUPF         Spain        Month 1     Month 36
                                                                                      (end of project)
CR                Magyar Tudományos Akadémia,   MTAPI        Hungary      Month 1     Month 36
                  Pszichológiai Kutatóintézet                                         (end of project)
CR                Universiteit van Amsterdam    UvA          Netherlands  Month 1     Month 36
                                                                                      (end of project)

*CO = Coordinator
 CR = Contractor

4. Relevance to the objectives of the specific programme and/or


thematic priority
a. Current state of the art and intended advances
Investigate innate levels of abstraction
It has been widely assumed that a large part of the auditory processing underlying
music perception is automatic and universal, i.e., independent of culture and training. The
assumed automatic and universal processes include not only basic sound analysis functions
but also higher-level operations, such as temporal grouping [3, 4]. On this basis
it is reasonable to hypothesize that these processes may be innate. However, due to the
difficulty of establishing what neonates perceive of their environment, these assumptions
have seldom been tested [5].
Using a method based on an electrophysiological measure, the mismatch negativity
(MMN) response, we are now able to objectively assess what regularities newborns can detect
in the acoustic input. In this project, we will, for the first time, test the operation of abstract
auditory sensory processes in neonates. Using musically meaningful stimulus material, we
will ask whether in neonates, grouping by pattern repetition, timbre-independent processing
of pitch, and the perception of relative pitch, operate similarly to adults. In more general
terms, we will study whether humans possess inborn processes abstracting higher-level
constructs from the acoustic input. In the following, we outline the basis of the method and
the current state of the art of the field.
MMN is a component of the event-related brain potentials (ERP) which is elicited by
sounds violating some regular aspect of the preceding stimulus sequence [6, 7]. The simplest
way to elicit the MMN is to occasionally present a different sound (termed the 'deviant')
within a sequence of a repeating sound (termed the 'standard'). Deviation in any acoustic
feature triggers the MMN. However, stimulus repetition is not a prerequisite of MMN
elicitation; the MMN is also elicited by violations of complex acoustic rules [8]. For
example, when most tones in a sequence complied with the rule 'the higher the tone frequency,
the lower the amplitude', occasional tones with high frequency and high amplitude, or low
frequency and low amplitude, elicited the MMN [9]. It has been established that the
elicitation of the MMN is based on the detection of a mismatch between the incoming sound and
what is extrapolated (predicted) from the preceding sound sequence [10, 11]. Therefore,
finding that a given sound following a regular sequence elicits the MMN in a subject
indicates that the auditory system of this subject has formed some representation of the
regularity that was violated by the deviant.
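The stimulus logic of the simplest oddball paradigm described above can be sketched in a few lines. This is an illustrative sketch only: the function name, tone frequencies and deviant probability are arbitrary choices, not parameters taken from any of the cited studies.

```python
import random

def oddball_sequence(n_tones=400, deviant_prob=0.1,
                     standard_hz=1000.0, deviant_hz=1200.0, seed=0):
    """Generate tone frequencies for a simple auditory oddball sequence.

    A repeating 'standard' tone is occasionally replaced by a 'deviant';
    two deviants never occur back to back, since at least one standard is
    needed to re-establish the regularity between violations.
    """
    rng = random.Random(seed)
    sequence = []
    prev_was_deviant = True          # force the sequence to open with a standard
    for _ in range(n_tones):
        if not prev_was_deviant and rng.random() < deviant_prob:
            sequence.append(deviant_hz)
            prev_was_deviant = True
        else:
            sequence.append(standard_hz)
            prev_was_deviant = False
    return sequence
```

The same scaffold extends to rule-based designs such as the frequency-amplitude rule cited above, by replacing the single deviant tone with a generator that occasionally violates the rule.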
Because the elicitation of MMN does not require subjects to perform some task or even
to attend to the sounds [12, 13] and MMN can be recorded in newborn infants [14-16], MMN
is an ideal tool for studying the innate capabilities of the human brain for dealing with
complex auditory scenes. As the first step in this line of research, we have recently shown that
building a model of the acoustic environment that separately represents the regularities of
concurrent sound sequences is an innate function of the human auditory system [17].
Music perception and cognition depend on the ability of the auditory system to detect
acoustic regularities. The MMN response has previously been used to study how human
adults represent the sound features relevant for music perception, such as relative pitch [18],
rhythm [19, 20], and tone-grouping [21, 22]. Learning affects the sensory resolution of
auditory stimulus representations as well as what type of regularities can be detected. MMN
studies have shown the effects of general auditory perceptual learning [23, 24] as well as that
of musical training [25-27] on the representation of acoustic regularities and the detection of
regularity violations.
Within the scope of the current project we will take a significant step forward,
determining whether or not some important perceptual processes underlying music perception
are innate. In adults, it has been established that the features of the auditory representations
indexed by MMN closely match perception [11]. As described in the previous section, the
formation of predictions is thought to underlie music cognition. Because the MMN method
determines whether the auditory system of a subject population can produce predictions on
the basis of a given rule, the elicitation of MMN in neonates would suggest that they perceive
the corresponding aspect of music. In contrast, finding differences between the results
obtained in neonates and adults would suggest that the given function develops through
maturation and/or learning. Thus, the experiments of the current proposal will provide new
insights into 1) innate vs. learned processing of musically relevant abstract acoustic features,
2) the operation of model-based auditory prediction at birth, and 3) the perception of music by
newborn babies.
Investigate the influence of timbre on the perception of musical form
Another approach for distinguishing between functions which are innate and those that
develop through experience is to analyse to what extent particular perceptual predispositions
correlate with the properties of important classes of sounds in the environment. The most
significant class of sounds to which humans are exposed during early development is speech.
Speech and other communication sounds are characterised by time-varying spectral patterns,
i.e. by changing timbre, and also by smoothly changing pitch. In most European languages,
timbral patterns generally convey the majority of the semantic information, while pitch tends
to convey complementary information such as intonation and mood. Recent work has shown
that many idiosyncratic phenomena of general pitch perception, as well as common musical
intervals, and the perception of consonance can be predicted from speech spectra and from the
characteristics of pitch in speech [28, 29]; suggesting that these aspects of perception may
develop through normal early experience. This approach can also be used to investigate the
perception of musical form.
The perception of musical form arises from our ability to organise sequences of sounds
into coherent global structures; and interestingly, while musical training may enhance this
ability, it appears to be part of the normal perceptual development of the general population.
We propose to investigate whether typical sequential relationships such as chord progressions,
melodic contours, tension profiles and preferred rhythmic patterns can similarly be shown to
arise from speech, and whether, in this way, the perception of musical form can be predicted
from the characteristics of subjects' native language. This project also benefits from having
access to people from a range of contrasting language backgrounds, which will allow
comparative perceptual experiments to be conducted in order to verify the findings of this
analysis.
In further experiments the influence of musical timbre per se on the perception of form
will be investigated. Given the huge variety of musical instruments, and the unlimited
possibilities of electronic synthesis, composers have a very rich palette of musical timbres, or
tone colours, at their disposal. However, the combination of different timbres can have
unforeseen consequences on form perception; for example, it has been shown to be far more
difficult to compare pitches of different timbres than those of the same timbre [30]. Although
there have been a number of studies of the use of timbre in music [31-33], the influence of
timbre on the perception of large-scale musical form has not been systematically investigated.
Development of an integrated neurocomputational architecture for music cognition
Despite some evidence for specialisation, it is clear that music cognition involves a very
widespread network of processing structures in the brain [34]. Contrary to the case in the
visual system, where very detailed and biologically realistic computational models exist [35],
there are few large-scale models of auditory cortical processing [36]. Instead, auditory
modelling has largely focussed on specific aspects of subcortical auditory processing, e.g.
sound localisation, pitch perception or spectral decomposition in the cochlea. There is an
urgent need for systems-level models of auditory processing to support investigations into
important aspects of audition such as cortical representations and processes, the effects of
early experience, the formation of perceptual categories, and the influence of attention in
modulating the processing of incoming stimuli; none of which have been addressed to any
great extent in previous modelling studies. We therefore propose in this project to develop an
integrated neurocomputational model of music cognition, constrained by the known
architecture of the auditory system, and incorporating neurobiologically realistic models of
processing in the cochlea and sub-cortical auditory nuclei, thalamus, primary auditory cortex,
superior temporal gyrus and prefrontal cortex.
The modelling of auditory cortical processing will be based upon our previous work in
the visual system [37-42], where a powerful theoretical framework was developed and shown
to be able to account simultaneously for empirical evidence from experimental measurements
at three different levels of cognitive neuroscience, namely: microscopic (single-cells) [35, 38],
mesoscopic (fMRI, EEG, neuroanatomy) [39, 43], and macroscopic (psychophysics,
neuropsychology) [44-46]. This approach to the computational modelling of cortex was
motivated by the need to employ a level of description accurate enough to allow the relevant
mechanisms at the level of neurons and synapses to be properly taken into account, while at
the same time simple enough, so that inferences regarding the relevant principles underlying
perception and cognition could be made.
A common assumption is that a proper level of description at the microscopic level is
captured by the spiking and synaptic dynamics of one-compartment, point-like models of
neurons, such as integrate-and-fire models [47]; these dynamics allow the use of realistic
biophysical constants (like conductances and delays) and a thorough study of the actual time
scales and firing rates involved in the evolution of the neural activity underlying cognitive
processes for comparison with experimental data. However, the integrate-and-fire model,
although in itself a simplification of the original work by Hodgkin and Huxley [48], is
actually too elaborate to be simulated completely for a whole network of thousands of
neurons with current technology. One solution to this problem is to simplify the dynamics
using a mean-field approximation, at least for the stationary conditions, and to use this to
exhaustively analyse the bifurcation behaviour of the dynamics. This analysis enables the
selection of parameter regions that show the emergent behaviour of interest. Full
non-stationary simulations, using the true dynamics of the full integrate-and-fire scheme,
may then be run using these parameters [41, 47, 49].
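As a concrete illustration of the point-neuron level of description referred to above, a leaky integrate-and-fire unit can be simulated with forward-Euler integration. The membrane constants below are generic textbook values chosen for illustration, not parameters of the proposed model.

```python
def simulate_lif(input_current, dt=1e-4, tau_m=0.02,
                 v_rest=-0.070, v_thresh=-0.050, v_reset=-0.070, r_m=1e7):
    """Leaky integrate-and-fire neuron, forward-Euler integration.

    Integrates tau_m * dV/dt = -(V - v_rest) + R_m * I(t); whenever V
    crosses threshold, the spike time is recorded and V is reset.
    Returns the list of spike times in seconds.
    """
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(input_current):
        v += (-(v - v_rest) + r_m * i_in) * dt / tau_m
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
    return spike_times

# a constant 3 nA drive for one second produces regular tonic firing
spikes = simulate_lif([3e-9] * 10000)
```

A mean-field treatment replaces each such population of spiking units by its average firing rate, which is what makes the bifurcation analysis described above tractable.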
In order to model each cortical area a network of interconnected excitatory and
inhibitory neurons is defined. Within this structure the strength of connectivity can be
adjusted in order to allow the organisation of functional clusters [43]. This allows inputs from
other regions to be processed in the context of neuronal reverberation, cooperation and
competition biased by task-relevant information. Networks, representing each of the cortical

013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 10 of 101

areas involved in auditory processing will be connected according to the known architecture
of the auditory system. In order that the cortical model receives realistic inputs, it will be built
upon an existing modelling system for peripheral auditory processing [50], which includes
well-established models of cochlear and subcortical processing.
The proposed large-scale model of auditory processing will be far more extensive than
any previously developed, and will provide us with a common framework within which other
modelling advances emanating from this project can be incorporated.
Attention and active listening
An important objective of neurocomputational modelling studies within the project will
be to investigate attentional modulation of auditory processing. The integrated neurocomputational model, described above, will be used to investigate the role of attentional, or
top-down control, on bottom-up, stimulus-driven processing within the peripheral auditory
system, and in auditory streaming.
Auditory streaming is the phenomenon in which a sequence of sounds perceptually
splits into separate streams, the auditory analogue of visual figure-ground segregation. Once
streaming occurs, subjects lose the ability to recognise relationships between sounds falling
into different streams, and only within-stream patterns can be recognised [51]. Composers
typically take account of the features that cause streaming to ensure the perception of coherent
melodic or rhythmic patterns, or to create perceptual ambiguities. Auditory streaming appears
to be an innate function [17]; we therefore propose to investigate streaming using the
large-scale model described above, before the development of experience-dependent
representations.
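To make the notion of streaming concrete, the following sketch groups a tone sequence into streams by pitch proximity alone. This is a deliberate simplification (real streaming also depends on presentation rate, timbre and attention), and the function name and threshold value are purely illustrative.

```python
import math

def segregate_streams(tone_freqs_hz, threshold_semitones=5.0):
    """Group a tone sequence into streams by pitch proximity.

    Each tone joins the stream whose most recent tone is nearest in pitch,
    unless every stream is more than `threshold_semitones` away, in which
    case a new stream is started. Returns streams as lists of
    (position, frequency) pairs.
    """
    def distance(f1, f2):
        return abs(12.0 * math.log2(f1 / f2))

    streams = []
    for i, f in enumerate(tone_freqs_hz):
        best = min(streams, key=lambda s: distance(s[-1][1], f), default=None)
        if best is not None and distance(best[-1][1], f) <= threshold_semitones:
            best.append((i, f))
        else:
            streams.append([(i, f)])
    return streams
```

An alternating 400/800 Hz sequence (a 12-semitone separation) splits into two streams, whereas a 400/440 Hz alternation (under 2 semitones) remains a single stream, mirroring the frequency-separation effect in the streaming literature.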
The role of attention in auditory streaming is somewhat controversial, with some
arguing that streaming is a pre-attentive phenomenon [51], and others that it depends
crucially upon the involvement of attention [52].
restricted to simple stimuli, and only very rudimentary models of attention have been used
[53-57].
Attention is much better understood in the visual than in the auditory system. The
notions of object-based attention, and the attentional modulation of low-level processing in
the form of biased competition, are important concepts in current theories of visual attention
[58-62]; however, these ideas have not so far been explored in audition. We have previously
shown how a biased competition model of visual object-based attention can explain many of
the phenomena of visual attention [38, 39, 42, 43, 49]. In this work, a large-scale hierarchical
model of the visual cortex, incorporating biased competition mechanisms at the neuronal
level, was used to simulate and explain visual attention in a wide variety of tasks. The
proposed model of auditory cortex is based upon this visual model, therefore it will allow us
to investigate to what extent the computational principles of biased competition can also
account for the attentive processing of auditory objects, and in particular, whether these
principles can account for auditory streaming.
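The biased-competition principle discussed above can be illustrated with a minimal two-population firing-rate sketch, in which a small top-down bias decides which population wins a mutually inhibitory competition. All names and constants here are illustrative rather than drawn from the cited models.

```python
def biased_competition(input_a, input_b, bias_a=0.0, bias_b=0.0,
                       w_inhib=2.0, tau=0.01, dt=0.001, steps=2000):
    """Two mutually inhibitory firing-rate populations with top-down bias.

    Each population is driven by its bottom-up input plus an attentional
    bias, and suppressed by the other's activity (rectified-linear
    dynamics, forward Euler). Returns the final firing rates (a, b).
    """
    rate_a = rate_b = 0.0
    relu = lambda x: x if x > 0.0 else 0.0
    for _ in range(steps):
        drive_a = input_a + bias_a - w_inhib * rate_b
        drive_b = input_b + bias_b - w_inhib * rate_a
        new_a = rate_a + (-rate_a + relu(drive_a)) * dt / tau
        new_b = rate_b + (-rate_b + relu(drive_b)) * dt / tau
        rate_a, rate_b = new_a, new_b
    return rate_a, rate_b
```

With equal bottom-up inputs, a small bias toward one population is enough to suppress the other almost completely, which is the signature behaviour of biased competition.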
The same modelling framework will be used to investigate the implications of top-down
modulation of subcortical processing. There is growing evidence that processing in the
peripheral auditory system is under attentional control. It has been known for some time that
there are extensive feedback projections to, and within, the subcortical auditory system [63],
but investigation of the effect of this feedback has only recently begun. The auditory cortex
can directly influence subcortical processing [64]; and there are observable effects on
responses, even at the level of the cochlea [64, 65].


Although there are numerous models of peripheral processing, these models often exist
only in isolation and generally process stimuli in a strictly feed-forward manner. To our
knowledge, there has not been any previous study of the neurocomputational implications of
an active peripheral system. In bats at least four effects of cortical control on peripheral
processing have been identified, including short term egocentric selection, long term
egocentric selection, gain control and shaping or retuning of responses properties [64]. These
properties will be investigated by including known feedback projections within the peripheral
auditory model. In the integrated model, the top-down signals will result from the attentional
modulation of processing in primary auditory cortex, which in turn modulate subcortical
processing; allowing us to investigate cognitive control of peripheral auditory processing. In
doing so this project will directly address a strategic goal of the IST programme, namely that
of creating an intelligent sensory periphery.
This will also influence auditory streaming, in that once the system has begun to
segregate a subset of the total incoming acoustic input, top-down signals associated with this
subset can be used to enhance peripheral processing selectively, causing increased cortical
activity in response to that subset; a possible explanation of the increase in perceived
loudness, or pop-out, of the foreground stream [51].
Development of experience-dependent abstractions
Further computational modelling studies will consider three aspects of auditory
perception particularly relevant to music cognition; namely timbre, rhythm and pitch. We
propose to investigate the development of experience-dependent representations and
processing strategies, and the emergence of perceptual categories in each of these cases.
i. Timbre in the thalamocortical system

Timbre is not a well-defined concept in audition but the consensus is that it is related to
the spectral and temporal envelope properties of sounds. Here we propose to investigate how
the response fields of cells in primary auditory cortex support the representation of timbre. In
doing so we will also consider the computational properties of this network, since the
primary auditory cortex and thalamus are tightly linked in a stereotypical network architecture
through feed-forward and feedback connections.
The thalamocortical network is an important computational hub in the auditory system;
it is the point at which there is a sudden loss of phase locking to rapid fluctuations in the
stimulus and the neural code appears to change. The stimulus-determined response of the
thalamocortical network is frequently specified in terms of the spectrotemporal receptive
fields (STRFs) of the principal thalamic and cortical neurons involved in the network. STRFs
indicate influential factors determining the response properties of the cell, and are usually
measured at the level of a single neuron using the technique of reverse correlation [66-71].
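In its simplest first-order form, the reverse-correlation technique mentioned above amounts to spike-triggered averaging of the stimulus spectrogram. The following sketch (illustrative name, plain-Python data structures) recovers the average spectrotemporal pattern preceding each spike.

```python
def strf_by_reverse_correlation(spectrogram, spike_counts, n_lags=10):
    """First-order STRF estimate: the spike-triggered stimulus average.

    spectrogram  : list of n_freq rows, each a list of n_time power values.
    spike_counts : list of n_time spike counts aligned to the frames.
    Returns an n_freq x n_lags matrix holding the average spectrotemporal
    pattern in the n_lags frames preceding each spike.
    """
    n_freq, n_time = len(spectrogram), len(spike_counts)
    sta = [[0.0] * n_lags for _ in range(n_freq)]
    total_spikes = 0
    for t in range(n_lags, n_time):
        count = spike_counts[t]
        if count > 0:
            for f in range(n_freq):
                row = spectrogram[f]
                for lag in range(n_lags):
                    sta[f][lag] += count * row[t - n_lags + lag]
            total_spikes += count
    if total_spikes:
        sta = [[v / total_spikes for v in row] for row in sta]
    return sta
```

For a neuron whose spiking is driven by energy in one frequency channel at a fixed delay, the estimate peaks at that channel and lag; as noted below, such a linear estimate captures only a fraction of real cortical responses.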
It is important to note that the STRFs, by the nature of their analysis, only represent the
linear relationship between the stimulus and the response. Auditory cortical neurons have
highly nonlinear responses, in particular to natural stimuli, where the linearly derived STRFs
were found to predict, on average, only about 11% of the response power [71]. The
spectrotemporal response pattern of a single neuron is governed by the spatiotemporal activity
of the network in which the neuron is situated. The effect of the network on an individual
neuron's STRF is likely to be: (a) nonlinear, in that the effect of any two neurons on a third is
unlikely to be linear owing to the nature of synaptic and dendritic integration mechanisms; (b)
nonstationary, in that the synaptic connections between neurons are not constant but change
as a function of time; and (c) adapting, in that repetition of stimuli may cause the synaptic
connections between neurons in the network to change. In addition, the local network is
subject to influences from non-local sources of activity that are determined by factors such as
stimulus context, attentional requirements and task demands.
In animal studies it has been found that the STRFs in primary auditory cortex are
formed during an early critical period through normal exposure to the acoustic environment
[72-74]. Although the nature of STRFs in human primary auditory cortex is obviously not
known, we have found suggestive evidence that they too might develop primarily through
early experience of speech sounds; in a recent study we found that ensembles of STRFs
constructed from fragments of speech stimuli can support the robust classification of other
sounds, and furthermore, the spectrotemporal properties of useful speech fragments had
similar characteristics to those measured experimentally in animals [75].
In this project, we propose to construct a detailed computational model of the
thalamocortical system, and to investigate how the dynamical properties of this network can
give rise to the experimentally observed spectrotemporal response fields, and the mechanisms
underlying the development of such response fields through exposure to different auditory
experiences [72, 76, 77]. A major advance will be obtained by incorporating this more
detailed model of the thalamocortical network within the large-scale cortical model, allowing
the development of experience-dependent representations in the integrated model. The ability
of the enhanced model to support the categorisation of ongoing auditory stimuli will be
investigated; in particular, the contextual role of intracortical signals in facilitating the
segmentation and categorisation of ongoing stimuli, and the attentional control of this process.
ii. Rhythm and temporal expectancies

It is clear that any system interacting with an external world must become sensitive to
the timing of events in that world and the timescales appropriate to understanding different
events. In music abstracting the regularities due to rhythmic patterns allows the formation of
temporal expectancies which can facilitate perceptual processing, the integration of events
occurring at different time scales, and the generation of well-timed predictions, or actions. In
addition, it is known that attention can be focussed in time [78-80], and there is evidence for
the periodic predictive engagement of attention entrained to rhythmic stimuli [81].
Research in music perception has shown that time, as a subjective structuring of events
in music, is quite different from the concept of time in physics [82]. Listeners to music do not
perceive rhythm on a continuous scale. Instead, rhythmic categories are recognized and
function as a reference relative to which the deviations in timing are appreciated [83, 84]. In
fact, temporal patterns in music combine a number of time scales that are essentially different:
the discrete rhythmic durations as symbolized by, for example, the half and quarter notes in a
musical score; the continuous timing variations that characterize an expressive musical
performance; and tempo, the impression of the speed (or changes thereof) of the performed
pattern, which is related to the music theoretical notion of tactus [85] and the cognitive
process of beat induction [86].
A knowledge representation that makes the relationships between rhythmic structure,
tempo and timing explicit, and shows how expressive timing can be expressed in terms of the
temporal structure and global tempo, has been previously proposed [87] and will form the
formal basis for the current study. A central idea in this approach is the notion of rhythm
space [83, 88], i.e. the space of all possible performances of a small number of time intervals.
In this n-dimensional space every point constitutes a different temporal pattern. This infinite
set contains musical and unmusical rhythmic patterns, rhythms often encountered in music,
and those rarely used. The rhythm space captures, in principle, all possible expressive
interpretations in any musical style of any rhythm of n+1 onsets. The cognitive process of
extracting a discrete categorical representation from a performance can be described as a
mapping from a performance space into a score space, and has been studied by determining
which sets of performances are considered interpretations of the same rhythmic pattern. This
rhythm space representation has been used to analyse the results of a series of experiments in
which musicians were asked to notate a large set of rhythmic patterns representing a
systematic sampling of the performance space. The results of this analysis demonstrated a
clear relationship between performed rhythm and the rhythmic categories recognized in
perception [89].
Even though the computational modelling of beat and meter induction has been
researched for some time, humans still outperform existing computational models of
cognition in assigning metrical information. Humans are not only very precise in finding
structural information, they can also do it quickly and flexibly; e.g. they can easily
distinguish between rhythmic and tempo changes. We propose to investigate the emergence of representations and
categories in a model of rhythmic perception based on the notion of likelihood [90]. In this
approach, previously encountered patterns can be used to derive interpretations of new data in
terms of the most probable encoding. The model will be formulated to account for the
formation of rhythmic categories (rhythmic categorization), the influence of temporal context
(such as metrical structure, tempo and previous exposure) and the formation of rhythmic
projections or expectancies. This work will support extensions to the attentional model to
facilitate the processing of an attended object at appropriate times, and will also comprise a
key component in the interactive music system, allowing it to engage interactively with the
acoustic environment.
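The likelihood-based mapping from a performance space into a score space described above can be sketched as follows. The candidate categories, the Gaussian timing-noise assumption and the `sigma` value are illustrative choices for this sketch, not the model to be developed in the project.

```python
import numpy as np

# Candidate rhythmic categories as integer duration ratios (score-level patterns)
categories = {
    "1-1-1-1": [1, 1, 1, 1],
    "2-1-1":   [2, 1, 1],
    "1-2-1":   [1, 2, 1],
    "3-1":     [3, 1],
}

def categorize(iois, sigma=0.05):
    """Return the most likely category for a performed inter-onset-interval sequence.

    Each category is evaluated at its best-fitting tempo; performed intervals
    are assumed to deviate from the scaled score durations with Gaussian noise
    of standard deviation `sigma` (in seconds).
    """
    iois = np.asarray(iois, float)
    best, best_ll = None, -np.inf
    for name, ratios in categories.items():
        r = np.asarray(ratios, float)
        if len(r) != len(iois):
            continue
        tempo = (iois @ r) / (r @ r)          # least-squares tempo fit
        resid = iois - tempo * r
        ll = -0.5 * np.sum((resid / sigma) ** 2)
        if ll > best_ll:
            best, best_ll = name, ll
    return best

# A slightly "expressive" performance of long-short-short
print(categorize([0.98, 0.52, 0.50]))   # → 2-1-1
```

Fitting the tempo separately for each candidate reflects the point made above that rhythmic categories function as references relative to which timing deviations are appreciated.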
iii. Relative pitch

Pitch is a fundamental perceptual attribute of communication sounds and music, and is
also a powerful cue for the grouping and segregation of sounds within a mixture [91].
Although there are many models of pitch perception, they have some significant limitations.
Pitch relationships are an essential aspect of pitch perception. We generally experience
pitches not in isolation but in sequences and as part of higher-level cognitive structures. Many
experiments have shown that perceptual judgements of pitch are influenced by context;
judgements can be facilitated if the context is a melodic sequence [30, 92, 93], or a chord
sequence with tonality consistent with the target [93], but can be impaired by the presence of
a non-matching context [93]. Current models of pitch perception do not account for the
influence of context on perception.
Models of pitch perception generally focus on the representation of absolute pitch, an
ability that very few people actually possess [94], although most people do have
a good sense of relative pitch, e.g. judging whether a note is in tune or not. There is a
tendency across all cultures for the categorisation of pitch relationships [95]; octaves play a
special role and tend to be subdivided into between 7 and 12 discrete categories,
usually unevenly spaced. There is also a clear preference for sub-divisions corresponding to
simple ratio relationships, octave 2:1, perfect fifth 3:2, perfect fourth 4:3, and so on. In
contrast to rhythm, the categorical perception of pitch relationships is widely recognised and
is implicit in the identification of perceptual tonal hierarchies that reflect the perceived degree
of similarity between pitches in Western music [96, 97]. There have, however, been very few
neurocomputational studies of relative pitch perception although [97] have proposed that the
filtering and phase locking properties of the peripheral auditory system together with the pitch
relationships between the notes comprising a chord can account for the harmonic preferences
in Western music. A different interpretation is suggested by results from a recent analytical
study, in which it was shown that many aspects of pitch perception, such as the pitch of the
missing fundamental, and the spectral dominance region, as well as interval sensitivities and
the perception of consonance could be predicted from the characteristics of speech [29, 98].
The claim here is that if developmental influences are considered, and the problem is
reformulated in terms of predictions based on previous experience, then many of the
phenomena of pitch perception, which have previously been difficult to explain, are a natural
consequence. What is currently lacking though is a neurocomputational model of this process.
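The categorical perception of simple-ratio pitch relationships described above can be illustrated by snapping a performed frequency ratio to the nearest just-intonation category. The interval table and the nearest-neighbour rule in cents are simplifying assumptions for this sketch.

```python
import math

# Simple-ratio interval categories (just intonation), as mentioned above
intervals = {
    "unison":         (1, 1),
    "perfect fourth": (4, 3),
    "perfect fifth":  (3, 2),
    "octave":         (2, 1),
}

def cents(f2, f1):
    """Size of the interval between two frequencies, in cents (1200 per octave)."""
    return 1200.0 * math.log2(f2 / f1)

def nearest_interval(f1, f2):
    """Snap a performed frequency ratio to the nearest simple-ratio category."""
    performed = cents(f2, f1)
    return min(intervals,
               key=lambda name: abs(performed - cents(*intervals[name])))

# A slightly sharp fifth (ratio 1.51 rather than 1.5) is still heard as a fifth
print(nearest_interval(440.0, 440.0 * 1.51))   # → perfect fifth
```

The logarithmic cents scale makes the categories tempo-of-pitch analogues of the rhythmic categories discussed in the previous section: deviations are appreciated relative to a discrete reference.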
A fundamental problem is that it is not yet clear how pitch is represented in the brain
[99, 100]. There is evidence for the extraction and representation of periodicity within each
frequency channel, subcortically [101, 102], and in AI [103]. There is also evidence that
cortex is involved in pitch-related processing; for example, a pitch onset response has been
identified in Heschl's gyrus [104]; fMRI evidence has been found for pitch-sensitive regions
in cortex [105, 106], and for the separate representation of relative and absolute pitch [107];
and lesion studies have shown that interactions between prefrontal and temporal regions of
cortex are necessary for the perception of tonality [108]. However, a more detailed theoretical
account of the representation of pitch at the neuronal and network level is lacking.
In this project, the modelling of pitch perception will be informed by the neonate
experiments, which will identify for the first time those aspects of pitch perception that are
innate, and those which develop through experience. The problem of pitch perception will be
considered in terms of active perception, motivated by the idea that the brain is constantly
trying to abstract regularities from the stimuli it experiences. We propose to apply this
approach to the problem of extracting the regularities associated with the abstraction of pitch,
and aim to formulate a model which can account for the development of discrete pitch
categories, preferred pitch interval relationships and contextual influences on pitch
judgement, in addition to the well-documented characteristics of the perception of individual
pitches.
Extraction of theoretical insights
The fundamental hypothesis that cognition emerges through active perception of the
environment is a guiding principle of this project. The idea that music cognition, even in the
absence of overt behaviour, depends upon the development of expectations, conditioned by
the current musical context and by previous musical experience, is well accepted, but a
detailed theoretical understanding of this process has yet to be developed. We expect
therefore that the proposed experimental and computational studies will suggest many
important theoretical and computational principles of music cognition, and its autonomous
development through experience. By extracting and explicitly identifying these essential
principles, the project will contribute significantly to furthering understanding of the
emergence of autonomous complex cognitive behaviour and its realisation in artificial
systems.
Although cognitive musicology is a comparatively new discipline, and until fairly
recently music was seldom studied scientifically, it is now recognized, along with vision and
language, as an important and informative domain in which to study a variety of aspects of
cognition, including expectation, emotion, perception, and memory [109-111]. Much of the
early research in the field was criticized for focusing too much on low-level issues of
sensation, often using impoverished stimuli (e.g., small rhythmic fragments) or music
restricted to the Western classical repertoire, as well as for a general lack of awareness of the
role of music in its wider social and cultural context [109], and it is only recently that the
neuroscientific basis for music cognition has begun to be explored [34]. However, this is
hampered by the lack of comprehensive, detailed theoretical or neurocomputational models of
auditory processing.
The dependence of higher-level cognitive functions such as music or language cognition
on experience and cultural exposure is obvious, but it is usually assumed that more basic
auditory functions are innate. However, recent work in natural scene statistics suggests that
the nervous system might exploit the statistical structure of natural scenes to form useful
representations of the environment even at a more fundamental level. In addition, the work of
Winkler, Näätänen and their colleagues has shown that the creation and maintenance of
predictive models of the acoustic environment is a generic and innate function in the auditory
system [10, 17, 112]. In this project we consider development as an active exploration of the
environment in which the naïve system gradually develops useful representations and
abstractions by using mismatches between the expectations of its predictive model and
incoming stimuli to drive learning. This approach has not previously been applied in
neurocomputational modelling studies of auditory processing, and is likely to lead to many
new theoretical insights.
In order to ensure that we take advantage of the benefits offered by the diversity of the
work in this project we propose to explicitly identify and document the theoretical insights we
gain from each of the investigations and to use them to formulate a generic functional
computational architecture for intelligent perception and cognition. Furthermore, the utility of
the computational principles derived through neurobiological modelling studies, will be
assessed through the implementation of an emergent music processing system, thereby
allowing us to verify and refine our conclusions.
An emergent interactive music processing system
An essential ingredient of interactive communication is the ability to predict the
communication sequences of others, and it has been argued that a process of model alignment
is fundamental to human interactions [113]. Similarly, the success of interactive music
improvisation depends to a large extent on consistency between the models created by each
participant, and therefore the degree to which they can predict each other's behaviour. While
the creation of a fully-fledged artificial music improviser is some way off, work in this project
addresses directly the fundamental problems of creating and maintaining predictive models of
the acoustic environment, and the role of developmental experience in shaping these models.
For this reason investigations into the role of attention in segregating incoming sounds into
perceptual streams are important as this ability would underlie the formation of multiple
models; essential for interactions in a realistic music environment.
Most work in interactive music systems has so far focussed on music performance, i.e.
at the behavioural level. Robert Rowe's book Interactive Music Systems [114] lays out a
conceptual framework which is still considered today as the key reference for the discussion
and evaluation of such systems. In this view, interactive computer music systems are those
whose behaviour changes in response to musical input, allowing them to participate in live
performances of both notated and improvised music. A number of interactive music systems
have been developed. One of the first, Cypher [114], was based upon the society of mind
theory [115] and modelled music cognition as the cooperative activity of different agents;
each programmed to perform a specific predetermined musical task. More recently, the
Continuator system [116] has achieved notable success in real-time interaction with other
musicians. This system generates predictions using a hidden Markov modelling approach
commonly used in speech recognition systems, demonstrating the feasibility of learning to
generate stylistically and contextually consistent musical passages. Other recent work in the
development of interactive music systems has focused on understanding the interactions that
occur during performance with acoustic musical instruments. These interactions are very
complex and engage several communication channels (tactile, haptic and kinesthetic in
addition to sonic) [117]. In this approach the idea is that interactive music systems should be
able to affect and modify the performer's expected actions, thus sustaining an ongoing
dialogue between the performers and the system.
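As a toy illustration of the Markov-style prediction underlying systems such as the Continuator, the following sketch learns first-order note-transition counts from example phrases and samples a continuation of a new opening. This is a deliberately minimal first-order model, not the Continuator's actual variable-order algorithm; the phrase data are invented.

```python
import random
from collections import defaultdict

def train(sequences):
    """Count first-order transitions between successive notes."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def continue_phrase(counts, seed, length, rng):
    """Extend `seed` by sampling each next note from the learned transitions."""
    out = list(seed)
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break                      # no continuation learned for this note
        notes = list(nxt)
        weights = [nxt[n] for n in notes]
        out.append(rng.choices(notes, weights=weights)[0])
    return out

# Train on two phrases (MIDI note numbers) and continue a new opening
phrases = [[60, 62, 64, 65, 67], [60, 62, 64, 62, 60]]
model = train(phrases)
print(continue_phrase(model, [60, 62], 4, random.Random(1)))
```

Sampling in proportion to transition counts yields continuations that are stylistically consistent with the training phrases, which is the essential property exploited by such interactive systems.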
The focus in this project is rather different in that we are interested in understanding the
mental processes underlying music cognition, the factors which determine the creation of
contextual models, the controlling of attention in time, and experience-dependent
developmental processes, all of which are necessary to support the emergence of autonomous
intelligent behaviour. We propose to apply the computational principles derived through the
perceptual and computational modelling studies in the implementation of an emergent music
processing system, the Music Projector. In this system the autonomous development of
internal musical codes and expectancies, and phenomena such as categorization, similarity
ranking, and streaming will be investigated. The system will also synthesize, as musical
output, sounds corresponding to the expectancies generated in response to musical stimuli.
This will allow us to compare the musical perceptions and predictions of the artificial system
with music cognition in humans; thereby closing the experimental loop outlined in the
introduction.
b. Summary of innovative aspects of the project
The findings from the experimental investigations will inform the computational
modelling studies, by indicating which aspects of perception should be learnt through
exposure to an acoustic environment, and which aspects can reasonably be hard-wired into
the models, a priori. Perceptual experiments will provide new insights into:
Innate processing of musically relevant abstract acoustic features, such as pitch,
rhythm and sequential grouping;
Innate abilities to create model-based predictions in response to acoustic stimuli;
Relationships between language and formal structures in music;
Timbral influences on the perception of musical form.
The extensive neurocomputational modelling studies, formulated with the aim of
furthering understanding of music processing in the biological system will provide us with
many new insights into the processing strategies underlying cognition. Innovative aspects of
the modelling studies include:
The development of a large-scale model of auditory processing, far more extensive
than any previously developed, which incorporates active control of peripheral
auditory processing mediated through a detailed model of the thalamocortical
system, and a model of prefrontal cortex which supports aspects of working
memory;
Investigations into whether the computational principles of biased competition can
also account for the attentive processing of auditory objects and for auditory
streaming;
Investigations into the attentional control of peripheral auditory processing;
Investigations into the development of experience-dependent representations and
processing strategies, and the emergence of perceptual categories of timbre, rhythm
and pitch;
Investigations into the development of spectrotemporal response fields through
exposure to auditory experiences, and the resulting ongoing segmentation of
auditory stimuli and categorisation of timbral patterns;
Development of a predictive model of rhythmic perception which can account for
the formation of rhythmic categories, the influence of temporal context and the
formation of temporal expectancies;
Formulation of an active model of pitch perception, in which representations emerge
through the abstraction of regularities present in the acoustic stimuli experienced.

The compilation of theoretical insights and the implementation of an artificial system
that embodies our theoretical understanding will allow us to evaluate the utility of these
principles through comparisons between the performance of the artificial system and that of
human subjects. Innovative aspects of this project include the:
Identification of important computational principles underlying cognition;
Formulation of a generic functional computational architecture for perception and
cognition;
Implementation of an emergent interactive music processing system, based upon the
principles identified, which develops autonomously through experiencing musical
stimuli.
c. Relevance of the project to IST and FET objectives
Communication and cooperation between humans and computer systems in a shared
task is a largely unsolved problem, and creating an autonomous system capable of complex
interactive behaviour remains one of the major challenges facing information systems today.
In addressing these issues, this project clearly meets the objectives
of FET in fostering innovative, high-risk research. The project also addresses the more
specific objectives of a number of FET initiatives such as Neuro-IT, Bio-i3, Life-like
Perception Systems and Beyond Robotics; for which the major source of inspiration for the
development of new computational paradigms lies in neuroscience, and in the understanding
of life-like perception and cognitive processes. Our proposal is fully consistent with these
ideals. The project benefits from having a clearly defined focus in music cognition, yet
simultaneously far-reaching implications for a more general understanding of perceptual and
cognitive processes in the brain, and for the creation of artificial cognitive and interactive
systems.
As we will focus primarily on basic research topics, we do not anticipate immediately
tangible returns in the form of commercial applications. However, the theoretical advances
that the project requires will certainly be of interest to others in music and information
technology and healthcare applications. The interactions between neuroscience and computer
science central to this project have strong potential for spin-off technologies and applications.
There are many applications which would benefit from more autonomous, intelligent and
flexible processes, not adequately provided by present technology. The ability of artificial
systems to autonomously learn to interact with others meaningfully and productively in
realistic environments goes well beyond the current state of the art. Approaching human
capabilities even on a small scale and for a restricted class of natural stimuli is likely,
therefore, to prove of immense value in this respect.
By providing new insights into biological perception and cognitive systems, our results
could fertilise new developments in artificial intelligent information systems, significantly
enhancing their ability to interact with people. A fundamentally significant advance in this
respect will result from the ability of an artificial system to create and maintain abstract
models, at many different levels, of its own behaviour and that of other parties, and to learn to
attribute meaning to patterns of sensory inputs and to generate meaningful sequences of
actions in response to others.
Finally, an important objective of IST Call 3 is that of facilitating the participation of
organisations from New Member States in the activities of IST. A leading participant in this
project is based in Hungary and funding for the project will support the establishment of a
field laboratory for electrophysiological experiments on newborn babies. The cognitive
abilities of neonates are extremely difficult to assess and the proposed method is one of the
very few feasible ways of doing so; therefore this work is likely to prove foundational for
developing unique capabilities in a new member state, particularly for developing
screening programmes for the early detection of problems in auditory perception which go
beyond simple audiometrical measurements. For this reason, the proposal clearly also relates
to IST-NMP-2 and the development of health monitoring systems.

5. Potential Impact
If the project achieves its ambitious aims, its impact will be large. Major contributions
will relate to enhancing scientific understanding of the cognitive capabilities of neonates and
the computational principles that underlie intelligent perception and cognition. A number of
important technological advances could also stem from this work, particularly in the area of
hearing prosthesis, and enhanced functionality for artificial systems. Finally, the work also
has the potential for significant societal impact in the development of improved hearing
screening programmes, and in the sphere of music education and entertainment. In this
section we expand upon some of the ways in which this project could have an impact, but
firstly we consider why it should be conducted at a European level.
The scope of the project requires that scientists from a range of disciplines collaborate,
including those active in theoretical and computational neuroscience, sensory perception and
cognition, experimental psychology, music technology, musicology and composition. The
proposed project requires participation at the European level because the requisite range of
expertise and critical mass of resources cannot be found at the national level. Furthermore, in
the next decade artificial systems and devices with perceptual capabilities will become an
important part of many people's lives. The development of perceptual and cognitive systems
that are sufficiently flexible to operate in the same environment and the same conditions as
humans do, will therefore have a strong impact on the future organisation of daily life. If
Europe is to help shape this future, it is important that such developments occur in Europe and
that critical expertise and know-how is assembled in its member states. In addition, the project
will make a strong contribution to training scientific specialists in Europe in interdisciplinary
topics, following the trends and fulfilling the requests made by an increasing number of
academic institutions. The young investigators taking part in this project will gain skills in
areas that are of fundamental importance to European scientific and economic success.
The cognitive capabilities of neonates are not well understood and are difficult to
investigate. This project will significantly advance understanding of these capabilities, and
may therefore provide important knowledge for helping to devise effective hearing screening
programmes in the future. The early detection of perceptual dysfunction is likely to have
profound effects, since prompt treatment at this time generally has a better chance of success
due to higher brain plasticity in early life, and also minimises related problems such as
deficient communication, late onset of speech, or difficulties in social interaction. Hearing aid
technology could take advantage of the processing mechanisms we find to be important for
intelligent perception; for example, by automatically detecting and emphasizing unexpected
and potentially significant sounds in the environment. A more ambitious possibility lies in
improving processing strategies for cochlear and brain stem implants. There is currently
considerable interest in the brain control of prosthetic devices, for example robot control by
paralysed people [118, 119]. Recent findings indicating that attention can affect processing in the
cochlea [64] suggest that cochlear and brainstem implants similarly have the potential to be
responsive to active control. By exploring the functional implications of active control of
peripheral auditory processing, this project will uncover ways in which processing strategies
for cochlear and brainstem implants could be improved; making these devices responsive to
their wearers, and thereby enhancing their quality of life.
There is now a growing interest amongst auditory neuroscientists in investigating the
representations and processes involved in processing complex sounds in the auditory cortex.
Because of the complexity of the biological system, research in auditory perception, as in
other modalities, urgently needs theoretical guidance from systems-level models. Most
existing models in audition have focussed on aspects of subcortical processing, and so far
modelling studies have not addressed to any great extent the representations and processing
employed in cortex, the influence of experience, or the role of attention in auditory
perception. The proposed large-scale model of auditory processing will constitute a huge step
forward in this respect by simultaneously allowing direct comparison with human perceptual
and cognitive behaviour, as well as with physiological data at the neuronal and network
levels, using complex, meaningful and realistic stimuli.
In formalising the principles of cognitive processing the project has the potential to
make a large technological impact. The theoretical insights into the processes underlying
intelligent perception and cognitive systems could fertilise new developments in artificial
intelligent information systems, significantly enhancing their ability to interact with people.
An important breakthrough would result from discovering how artificial systems could
automatically create and maintain predictive models of their own behaviour and that of other
agents in their environment; a fundamental requirement for autonomous behaviour. Such an
ability would allow artificial systems to learn to attribute meaning to patterns of observations,
and to generate meaningful sequences of actions in response to others. There are many
potential applications for this technology including in the immediately obvious area of
musical education and entertainment systems, as well as in more general applications that
involve humans and computers working together on shared tasks.
In summary, we have identified a number of potential routes for exploitation, and an
important goal will be to make effective and practical use of the results of this project. For
this reason, in addition to actively participating in scientific meetings, we have explicitly
planned to communicate the findings of the project to potentially interested parties, beyond
those that we would normally expect to encounter at such meetings; including, for example,
healthcare professionals, interested in developing hearing screening programmes or
improvements to prosthetic devices; technologists interested in exploiting this work in
developing commercial applications which require intelligent autonomous or interactive
behaviour; and educationalists who see possible applications in musical education or in
extending the public awareness of science.

6. Project management and exploitation/dissemination plans


6.1 Project management
The University of Plymouth, which has had extensive prior experience in the
management of research projects both at the national and international level, will be
responsible for the overall scientific and administrative management of the project. The
project coordinator will be Dr Susan Denham of the Centre for Theoretical and
Computational Neuroscience, University of Plymouth, assisted by an administrative
coordinator, Dr John Martin of the Research Support and Development Office, who will
provide financial management and control. The project is organised into workpackages, each
of which will be the responsibility of a designated workpackage coordinator.
Project Steering Committee
The project coordinator and workpackage coordinators will constitute a steering
committee responsible for major decision-making in the project and for ensuring that the
consortium fulfils its contractual obligations. This committee will meet at regular intervals (at
least every six months) to discuss, plan and monitor project progress, as well as before key
review and deliverable dates; the first meeting will take place within the first three months of
the commencement of the project. The project coordinator will convene all meetings of the
committee, circulate the agenda beforehand, and draft and communicate the minutes of each
meeting. The purpose of these meetings will be to maintain the focus of the project, to assess
the deliverables due, to review the progress to date and outstanding tasks, and to provide
coordinated direction for the following period. The committee will also be responsible for
agreeing remedial actions to be taken in the case of a default by any member of the
consortium, and for deciding upon any major changes in workpackages. Decisions of the
committee will be made on the basis of consensus. However, if this is not possible, decisions
will be made on the basis of a majority vote, with one vote per workpackage coordinator, and
the project coordinator having the casting vote. In order for the decision to be valid, a
minimum quorum of five committee members must be present.
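The decision procedure above (consensus where possible, otherwise a majority vote with one vote per workpackage coordinator, the project coordinator holding the casting vote, subject to a quorum of five) can be written out as a small decision function. This is purely illustrative: the function name and interface are ours, and for simplicity the sketch assumes every member present casts a vote.

```python
def committee_decision(votes_for, votes_against, chair_vote, consensus=False):
    """Sketch of the Steering Committee decision rule described above.

    votes_for / votes_against: workpackage-coordinator votes cast.
    chair_vote: the project coordinator's position, used only as the
    casting vote in the event of a tie.
    consensus: True if the committee has already reached consensus.
    Returns True if the motion passes; raises if the quorum is not met.
    """
    QUORUM = 5  # minimum number of committee members present
    if votes_for + votes_against < QUORUM:
        raise ValueError("quorum of five committee members not met")
    if consensus:                       # consensus is always preferred
        return True
    if votes_for != votes_against:      # simple majority, one vote each
        return votes_for > votes_against
    return chair_vote                   # tie: coordinator's casting vote
```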
Project coordinator
The project coordinator will be the intermediary between the project and the
Commission, and will have overall responsibility for meeting the project objectives and for
ensuring that deliverables are achieved. The project coordinator will remain the focal point of
the project and will liaise with any other related projects as appropriate. Specific
responsibilities of the project coordinator include:
- submitting reports and deliverables to the Commission;
- receiving and disseminating deliverables to and between workpackage leaders;
- ensuring efficient management of tasks within the consortium;
- mediating and managing conflict between partners;
- establishing efficient communications within the consortium;
- monitoring the quality and timeliness of deliverables, together with the workpackage leaders;
- monitoring risk elements and adjusting manpower assignment, together with workpackage leaders;
- coordinating the consortium's representation at major meetings that are likely to generate useful feedback;
- coordinating bi-annual meetings of the Project Steering Committee and irregular meetings of smaller groups;
- coordinating publications and ensuring that authorships conform to accepted standards.

Administrative coordinator
The administrative coordinator will provide:
- ongoing monitoring of budget use of all partners;
- receipt of all payments made by the Commission, and transfer of funds to members of the consortium according to the agreed budget;
- liaison with workpackage leaders to ensure consistency of project progress and project expenditure;
- coordination of progress reports after each 6-month period;
- coordination of the mid-term report, interim report, and final report;
- coordination of all contractual issues, such as the project contract, amendments, collaboration agreements, and audit certificates.
Workpackage coordinators
The project is divided into nine workpackages, described in detail in section 7, each of
which will be led by the workpackage coordinator designated below. Workpackage
coordinators will be responsible for:
- planning the scientific and technical work of the workpackage;
- monitoring workpackage progress relative to the project plan in order to ensure that project timescales are maintained;
- reporting any problems or slippages promptly to the project coordinator;
- initiating remedial action plans in the event of project deviations;
- ensuring that relevant information regarding their work is communicated to the project coordinator (and to other members of the consortium where appropriate) promptly and accurately;
- ensuring that the objectives and milestones of the workpackage are achieved;
- ensuring that deliverables are available on time.
WP number | WP description | WP leader
- | Management, communication and documentation | Susan Denham (UoP)
WP1 | Higher level auditory functions underlying music perception: Innate vs. learned operations | István Winkler (MTAPI)
WP2 | Perception of musical form | Eduardo Miranda (UoP)
WP3 | Prefrontal cortical function in the control of attention and short term memory | Gustavo Deco (FUPF)
WP4 | Spectrotemporal response fields in the thalamocortical system | Michael Denham (UoP)
WP5 | Perception and categorisation of rhythmic patterns | Henkjan Honing (UvA)
WP6 | Active perception, relative pitch and the emergence of tonality | Susan Denham (UoP)
WP7 | Theoretical insights into music cognition | Susan Denham (UoP)
WP8 | Interactive Music System | Xavier Serra (FUPF)
Ensuring effective integration


Since we recognise that the goals of the project are very challenging and its success will
depend upon the proper interactions between the different levels of work being conducted by
each of the members of the consortium, the essential points of interaction have been explicitly
identified in the work plan. The timing and nature of the interaction are defined, the
participants identified, and where appropriate an associated deliverable has been specified. In
the event that the requisite interaction fails to occur, the workpackage leaders responsible will
notify the project coordinator, and a revised schedule will be agreed. In addition, the agenda
for meetings of the Project Steering Committee will always contain an explicit item for the
reporting of collaborative work and interactions during the previous period.
Communication mechanisms
The primary means of communication will be the six-monthly meetings of the Project
Steering Committee, supplemented by the reporting system, irregular bilateral meetings, and
day-to-day email exchanges. Reports will include quarterly summaries by the project
coordinator as well as the formal project reports. Communication and liaison throughout the
consortium will also be maintained by additional contacts between workpackage leaders, as
required by the project work plan.
A dedicated web site will assist communication between consortium members and
between the consortium and the public. It will be used for publishing the periodic project
reports, updating meeting schedules, and for sharing the material produced in the
workpackages. Depending on the nature of the information, this will be accessible to the
public (scientific results, publications, etc.) or accessible only to the consortium members
(partial results, software in development, etc.).
Finally, some workpackages involve contributions from more than one participant. In
these cases it may prove advantageous to send junior members of the collaboration to spend
some weeks or months at one of the other institutions. Funds have been allocated in the
budget for these activities, and we will organise economical accommodation (in student
housing facilities) so that we can decide on such visits in a flexible manner. We also plan to
install a voice-over-IP system, such as Skype, to facilitate multiparty discussions between
collaborating consortium members.


Communication with the European Commission


Progress will be reported to the Commission at the end of each 12-month reporting
period. Progress reports will summarise the progress within each workpackage, with
particular attention to deliverables and milestones achieved. A timetable of deliverables is
included in section 7.2 of this document. A final report will be submitted at the end of the
project.
Risk management and quality assurance
The work plan is broken down into nine workpackages and for each a clear set of
objectives and deliverables has been defined. This division, which reflects the research
interests and expertise of the consortium and has been agreed by all members, provides the
basis for efficient control and follow-up on the progress of the project. The coordinator will
monitor progress together with the workpackage leaders, and inform the Project Steering
Committee of risks, delays, or other factors that appear likely to lead to departures from the
work plan. The Project Steering Committee will then determine the appropriate response
actions, which might include the redistribution of resources (manpower) within the
consortium, the negotiation of modified targets, and/or the proposal of changes to the work
plan to be negotiated with the Commission.
Conflict management
In order to provide a clear legal basis for the entire project life cycle, the consortium
will agree and sign a consortium agreement to regulate intellectual and industrial ownership,
authorship conventions, publicity, and confidentiality. In the event of any disagreement
between partners on the project, which the partners concerned cannot resolve, the project
coordinator will attempt to mediate. If any disagreement remains, then the matter will be
presented to the full Project Steering Committee. The partners in dispute will be governed by
the majority decision of all members of the Project Steering Committee not involved in the
dispute.
Consortium agreement
A consortium agreement will be signed by all of the partners upon commencement of
the project. It will detail issues such as Intellectual Property Rights, prior knowledge, rights to
exploitation, and the responsibilities of all of the parties involved in the project. Legal
advisors of the Research Support and Development Office at the University of Plymouth will
circulate a draft agreement to all consortium members for examination and amendment during
the negotiation period.

6.2 Plan for using and disseminating knowledge


Plans for Exploitation and Dissemination
The work in this project will be disseminated primarily through presentations at relevant
scientific conferences and meetings, and through the publication of papers in the leading peer-reviewed journals. The scientific communities who are likely to be interested in this project
include developmental and cognitive neuroscientists and psychologists, experimental
neuroscientists and theoreticians interested in the computational principles underlying
auditory perception and cognition and in theoretical models of brain function, cognitive
musicologists and music technologists, and technologists interested in enhancing the
capabilities of artificial systems in interacting with people. In addition, aspects of the project
which are likely to interest the general public will be disseminated through press releases and
articles in popular scientific magazines, e.g. New Scientist; and, as described in section 6.3,
we plan to produce a multimedia promotional kit in order to facilitate communication with the
public.
A web site will also be established; partly to facilitate communication between project
participants and partly to communicate to the wider scientific community and interested
members of the public important aspects of the work we are undertaking, as well as our
findings. The project will culminate in the organisation of a workshop to which we will invite
leading scientists working in the field, as well as other parties whom we identify as having a
potential interest in the exploitation of our work. We also intend to publish a collection of
papers stemming from the workshop in the form of a book in order to disseminate our
findings as widely as possible.
In planning this project, we have identified a number of potential routes for exploitation,
and an important activity in WP7 will be to take account of the achievements of the project in
order to make effective and practical use of our results. Our intention is to seek out and
communicate with those whom we identify as having a potential interest in taking advantage
of our findings, beyond those we would normally expect to encounter at scientific meetings.
Such parties might include healthcare professionals, interested in developing hearing
screening programmes or improvements to prosthetic devices (hearing aids, cochlear
implants, etc); technologists interested in exploiting this work in developing commercial
music systems, or more generic applications which require intelligent autonomous or
interactive behaviour; and educationalists who see possible applications in musical education
or in extending the public awareness of science.

6.3 Raising public participation and awareness


Although this project is very challenging scientifically, many of the issues that we will
address are also likely to be of interest to the public at large. For this reason we will acquire,
prior to the start of the project, a specific internet domain that will clearly identify the project
(e.g. www.emcap.org or www.activeperception.org), which we will use as the base for
intensive dissemination of knowledge and activities emanating from the project. The public
website will contain:
- a description of the project and its goals, and the project consortium (with links to other sites and associated information);
- all public project documentation;
- a discussion forum for the whole EmCAP community (including project partners, interested members of the research community, and members of the public potentially interested in the project deliverables);
- a news repository that will cover not only news from the project, but also related news from around the world;
- an electronic compass which will introduce newcomers to the field of musical neuro-cognition, with links to relevant tutorials, papers, researchers and projects;
- a special section offering non-technical explanations of the goals and achievements of the project.
In order to foster the public appreciation of the work of this project we will produce an
EmCAP Multimedia promotional kit, which will be presented at professional, scientific and
popular meetings, such as the European Science Week. It will contain an interactive
presentation of the project as well as other elements such as those above, also downloadable
from the project's web site. The production of version 1 of this kit is scheduled for month 6,
by which time the project is expected to be well under way and the first deliverables
available; it will subsequently be updated periodically throughout the remainder of the
project.
All members of the consortium are affiliated to educational institutions, and during the
course of the project they will offer graduate courses (for PhD and Masters students) about
topics related to the project. In addition, the holding of highly focussed short courses (either
industrially or academically oriented) will be considered as the project matures.
Other aspects of the project which are likely to be of interest to the wider public will be
publicised and communicated as and when sufficient information becomes available; these
include the findings relevant to infant hearing screening programmes, and the possibilities for
early detection and intervention; cultural and language influences on music perception,
which could be used to guide the selection of
suitable music for Europe-wide media (e.g. in advertisements); possibilities for improving and
enhancing the learning experience in musical education; and the relevance to contemporary
music practice which may be of interest to composers, and which we will showcase during the
regular public Contemporary Music Weekends which are held biannually under the auspices
of Peninsula Arts in Plymouth.

7. Workplan for whole duration of the project


7.1 Introduction - general description and milestones
i. Research, technological development and innovation related activities

The ultimate goal of understanding how complex cognitive behaviour can emerge
through experience, and how useful representations and processes result from the need to
interact effectively within an environment, is a very challenging one. In this project we will
restrict ourselves to musical environments and will address these questions with regard to
musically meaningful stimuli. The project is therefore structured so as to allow us to
investigate the perception of musically relevant sounds and sound features experimentally and
through theoretical and modelling studies, and also to distil from these the essential elements
for an artificial interactive system. The research work falls into three distinct areas as
described in the scientific case for support, and this is also reflected in the organisation of the
workpackages. In this section the major milestones and information flows are identified,
while the details of timing, the parties involved and the nature of the interactions are specified
in each of the workpackages.
Experimental investigations into the perception of musically relevant stimuli
WP1: Higher level auditory functions underlying music perception: Innate vs. learned
operations.
Music perception is subserved by auditory processing, a large part of which, including
some higher-level analyses, is regarded as being possibly innate [5]. In previous work, we
have established that several higher-level auditory processes can operate without attention
being focused on sound [12, 120] and that the operation of these processes can be studied
with the mismatch negativity (MMN) event-related brain potential [112]. Because the MMN
method can be employed to study auditory processing in neonates [14, 15], it allows us to test
whether some of the higher-level auditory functions underlying music perception in humans
are innate. Specifically, we will ask the following questions: 1) Do neonates form groups
from repeating pitch patterns (milestone 1, month 14)? 2) Do neonates process pitch
independently of other spectral sound features (timbre) (milestone 2, month 20)? 3) Do
neonates process relative pitch, i.e., equate equal musical steps independently of absolute
pitch (milestone 3, month 28)? And 4) Are pitch steps that have a special relevance in
western music processed preferentially in neonates (milestone 4, month 36)? Because
studying question 4 depends on the results of the experiment testing question 3, an
alternative question is also considered: Do neonates form groups on the basis of repeating
rhythmic patterns?
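The MMN method referred to above is typically based on oddball sequences, in which infrequent deviant sounds violate a regularity established by repeated standards. A minimal sequence generator can illustrate the idea; the stimulus labels, probabilities and spacing constraint below are hypothetical, not those of the planned experiments.

```python
import random

def oddball_sequence(standard, deviant, n_tones=400, p_deviant=0.1,
                     min_gap=2, seed=0):
    """Generate an oddball stimulus sequence for an MMN-style experiment.

    standard, deviant: labels for the two stimulus types.
    p_deviant: probability of a deviant at each eligible position.
    min_gap: minimum number of standards between successive deviants,
    a common constraint in oddball designs.
    """
    rng = random.Random(seed)
    seq, since_last = [], min_gap  # a deviant is allowed immediately
    for _ in range(n_tones):
        if since_last >= min_gap and rng.random() < p_deviant:
            seq.append(deviant)
            since_last = 0
        else:
            seq.append(standard)
            since_last += 1
    return seq

# the MMN is then the deviant-minus-standard difference in the averaged ERP
seq = oddball_sequence("A4", "C5")
```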
These studies are detailed in WP1. The stimuli for the experiments will be designed in
consultation with the other groups. By identifying those aspects of perception which can
reasonably be hard-wired and those which should be learnt through experience of sounds,
these findings will provide fundamental information to guide computational modelling; in
particular, the models of pitch perception (WP6) and rhythm (WP5). In general, in trying to
design an artificial system capable of autonomous behaviour, it is important to understand the
extent to which representations and processes should reflect the stimuli experienced during
development; these results will therefore provide important theoretical insights into design
principles for artificial cognitive systems (WP7), and guidelines for the development of the
interactive music system (WP8).
WP2. Perception of musical form
Musical form arises from the sequential combination of a coherent set of sounds. The
perception of musical form is commonly thought to arise from the interplay between
expectation and surprise [121], normally investigated in terms of pitch, tonality and rhythm,
e.g. [89, 122-124]. Although there is a great deal of understanding about expectations in
response to sound sequences, the strategies the brain employs to follow pieces of music on a
larger scale are not yet well understood. In these studies we will investigate the role of the
timbral properties of sounds in the formation of long-term expectations, and will relate this to
the concepts of predictive modelling and abstraction. We will investigate the role of timbre
through an analytical study of the relationships between typical musical structures and the
timbral patterns in speech, and by means of perceptual experiments.
These studies are detailed in WP2. The findings will inform computational modelling
studies, particularly those relating to the perception of pattern sequences and streaming (WP3)
and pitch perception (WP6). They will also provide theoretical insights (WP7) into how prior
experience with other sound classes, such as speech, can affect the development of
expectancies in music. The experiments and the musical stimuli used will be formulated in
collaboration with researchers working on the interactive music system (WP8) since the
results will be directly relevant to the functionality of the system as well as for setting
performance goals.
Computational modelling of important components subserving music cognition
WP3. Prefrontal cortical function in the control of attention and short term memory

The notions of object-based attention, and attentional modulation of low-level
processing in the form of biased competition, are important concepts in current theories of
visual attention; however, these ideas have so far been little explored in audition. In
previous work we have shown how a biased competition model of visual object-based
attention can explain many of the phenomena of visual attention [38, 39, 42, 43, 49]. This
theoretical framework has allowed us to integrate empirical evidence provided by
experimental measurements at different levels of cognitive neuroscience, including single-cell
recordings, network activity in measurements of fMRI and EEG, and behavioural responses.
Here we will develop a similar large-scale neurocomputational architecture for auditory
processing, to include a neurobiologically realistic model of the primate auditory cortex,
auditory areas I and II, the superior temporal gyrus and subcortical auditory processing. We
will use the cortical model to investigate to what extent the computational principles of biased
competition can account for the attentive processing of auditory objects within a musical
context, and whether these processes can account for auditory streaming [51].
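As a minimal illustration of the biased-competition principle (not the neurobiologically realistic model proposed here), two mutually inhibiting rate-coded populations receiving equal bottom-up input can be made to compete, with a small top-down bias deciding the winner; all parameter values below are arbitrary assumptions.

```python
def biased_competition(input1, input2, bias1=0.0, bias2=0.0,
                       steps=500, dt=0.01, tau=0.1, w_inh=2.0):
    """Minimal firing-rate sketch of biased competition: each population
    receives bottom-up input plus a top-down (attentional) bias and
    inhibits the other. Returns the final rates (r1, r2)."""
    r1 = r2 = 0.0
    for _ in range(steps):
        drive1 = max(0.0, input1 + bias1 - w_inh * r2)  # rectified drive
        drive2 = max(0.0, input2 + bias2 - w_inh * r1)
        r1 += dt / tau * (-r1 + drive1)                 # Euler integration
        r2 += dt / tau * (-r2 + drive2)
    return r1, r2

# equal bottom-up inputs: a small bias towards population 1 decides the
# competition, and population 2 is almost completely suppressed
r1, r2 = biased_competition(1.0, 1.0, bias1=0.2)
```

With strong mutual inhibition the symmetric state is unstable, so even a weak bias produces winner-take-all behaviour; this is the sense in which attention "biases" an otherwise stimulus-driven competition.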
In order to investigate the role of working memory in contextual processing, the large-scale cortical model described above will also be extended to include an explicit model of
cortical prefrontal brain areas (dorsolateral and inferior frontal regions) [40], and their
coupling with the superior temporal gyrus [125, 126]. This model will then provide the basis
for investigations into the role of working memory in perceptual constancy and perceptual
categorization in audition.
These studies are detailed in WP3. The large-scale neurocomputational model will form
the basis for collaborative work with other modelling studies, and will be extended and
enhanced to incorporate modelling advances emanating from other workpackages in the
project. The baseline model of peripheral auditory processing used in the investigations into
active peripheral processing (WP6) will be used initially to supply realistic patterns of activity
to the cortical model, and subsequently the attentional modulation of auditory processing in
cortex will be used to generate top-down control signals, thereby also providing attentional
control of peripheral processing. The detailed model of the thalamocortical network, and the
representation of dynamic receptive fields within this network (WP4), will also be included in
the large-scale model towards the end of the project. In addition, the role of working memory
in maintaining global pitch information to support the extraction of relative pitch and the
emergence of tonality will form the basis for a collaborative modelling study with those
investigating relative pitch perception (WP6). The working memory study will also be
informed by the perceptual data regarding spectral expectations (WP2). Important theoretical
insights (WP7) will be gained from understanding the essential principles of attentive
processing in audition, and these will also inform the design of the interactive music system
(WP8).
WP4. Spectrotemporal response fields in the thalamocortical system
The stimulus-determined response of the thalamocortical network is frequently
specified in terms of the spectrotemporal receptive fields (STRFs) of the principal thalamic
and cortical neurons involved in the network, usually measured at the single-neuron level
using reverse correlation [66-69, 71]. Traditionally, receptive field properties of neurons have
been related to features in the stimulus, as in the visual system in the seminal work of Hubel
and Wiesel [127]. This view is also prevalent in attempts to understand auditory processing.
However, for assigning meaning to an auditory stimulus, e.g. identifying that the stimulus is
part of a melody or is related to some coherent component of a complex auditory scene,
categorisation may play a more important part than feature analysis. Thus whilst it may be
important to understand how specific features in a stimulus drive the neuronal response, it is
arguably more important to understand how the stimulus-induced activity is combined with
intrinsic cortical activity, which is directly related to the brain's current understanding of the
meaning of the stimulus, as determined by its context and its role in the ongoing perceptual
task.
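For reference, the reverse-correlation estimate of an STRF mentioned above is, in its simplest form, a spike-triggered average of the stimulus spectrogram preceding each spike. The sketch below is illustrative only (real STRF estimation additionally corrects for stimulus correlations), and the toy neuron is an assumption of ours.

```python
import numpy as np

def strf_by_reverse_correlation(spectrogram, spike_times, n_lags):
    """Estimate an STRF as the spike-triggered average of the stimulus.

    spectrogram: array (n_freq, n_time) of stimulus energy.
    spike_times: time-bin indices at which the neuron fired.
    n_lags: bins of stimulus history preceding each spike to average.
    Returns an (n_freq, n_lags) array; column -1 is the bin just
    before the spike.
    """
    n_freq, _ = spectrogram.shape
    sta = np.zeros((n_freq, n_lags))
    count = 0
    for t in spike_times:
        if t >= n_lags:                  # need a full history window
            sta += spectrogram[:, t - n_lags:t]
            count += 1
    return sta / max(count, 1)

# toy neuron: fires whenever frequency channel 2 was intense one bin ago,
# so the estimated STRF should peak at channel 2, shortest lag
rng = np.random.default_rng(0)
spec = rng.random((4, 1000))
spikes = [t for t in range(1, 1000) if spec[2, t - 1] > 0.9]
strf = strf_by_reverse_correlation(spec, spikes, n_lags=3)
```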
In order to explore these ideas, we will construct a detailed computational model of the
thalamocortical and associated intracortical network, and replicate within the model the
experimentally observed STRFs for a selected subset of cortical and thalamic neurons. We
will use the model to investigate mechanisms of self-organisation and plasticity in the
development of STRFs through exposure to different auditory experiences, in a manner
consistent with experimentally observed modifications of STRFs in early development [72,
76, 77]. The ability of the model STRFs to support the categorisation of ongoing auditory
stimuli will also be studied; in particular, the role of intracortical signals in the model in
modifying the STRFs in a way which facilitates and improves the categorisation of ongoing
stimuli.
This work is detailed in WP4. As discussed before, there is clearly considerable overlap
between the detailed modelling of the thalamocortical networks underlying auditory STRFs in
WP4 and the large-scale modelling of auditory cortex in WP3. The studies in WP4 will
provide the opportunity for a detailed investigation of a very important auditory processing
component, and this will allow us later to refine this component within the large-scale model.
WP4 will be informed by the statistical analysis into relationships between music and
language (WP2), and will provide further insights into the development of effective
representations and the way in which context can influence ongoing processing, contributing
directly to our understanding of the processes underlying music cognition (WP7). The work in
WP4 will also suggest useful representations and processing strategies for the interactive
music system (WP8).
WP5. Perception and categorisation of rhythmic patterns
Research in music perception has shown that time, as a subjective structuring of events
in music, is quite different from the concept of time in physics [82]. Listeners to music do not
perceive rhythm on a continuous scale. Instead, rhythmic categories are recognized which
function as a reference relative to which the deviations in timing can be appreciated [83, 84].
In fact, temporal patterns in music combine two time scales which are essentially different:
the discrete rhythmic durations as symbolized by, for example, the half and quarter notes in a
musical score, and the continuous timing variations that characterize an expressive musical
performance. Here we will investigate the formation of rhythmic categories (rhythmic
categorization) and the influence of temporal context (such as metrical structure, tempo and
previous exposure) in active perception. Three computational modelling approaches to
categorization will be evaluated using existing empirical data. Based on these modelling
studies, a more comprehensive model of rhythmic expectation will be formulated. Such a
model would form the key temporal component in any system that engages interactively with
an environment in time.
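The two time scales described above can be illustrated by a toy categoriser that snaps continuously timed inter-onset intervals to the nearest small-integer duration category relative to the beat, leaving the residual as expressive timing. The category set and interface below are a hypothetical example, not one of the three models to be evaluated.

```python
from fractions import Fraction

# hypothetical set of rhythmic duration categories, in beats
CATEGORIES = [Fraction(1, 4), Fraction(1, 3), Fraction(1, 2),
              Fraction(2, 3), Fraction(1, 1), Fraction(3, 2), Fraction(2, 1)]

def categorize_rhythm(onsets, beat):
    """Map continuously timed onsets to discrete rhythmic categories.

    onsets: onset times (seconds) of an expressive performance.
    beat: beat period in seconds.
    Returns (categories, deviations): the nearest category for each
    inter-onset interval, and the residual expressive timing in beats.
    """
    cats, devs = [], []
    for a, b in zip(onsets, onsets[1:]):
        ioi = (b - a) / beat                      # interval in beats
        nearest = min(CATEGORIES, key=lambda c: abs(float(c) - ioi))
        cats.append(nearest)
        devs.append(ioi - float(nearest))         # expressive deviation
    return cats, devs

# a slightly rushed quarter-eighth-eighth pattern at 120 BPM (beat = 0.5 s)
cats, devs = categorize_rhythm([0.0, 0.48, 0.72, 0.99], beat=0.5)
```

A nearest-category rule of this kind is context-free; the models to be studied in WP5 go further by letting metrical context, tempo and prior exposure shape the category boundaries.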
These studies are detailed in WP5. The functionality of the model will be informed by
the relevant findings from previous perceptual experiments, as well as those to be performed
in WP1 and WP2. From the model of rhythmic expectation, processing principles will be
derived and formalised in WP7, and these will form a central component in the interactive
music system (WP8). Rhythmic expectation is also important in the controlling of attention in
time [3, 128]; the model will therefore also influence the way in which attentional processing
is implemented within the large-scale cortical model (WP3).

WP6. Active perception, relative pitch and the emergence of tonality
Pitch is an important perceptual attribute of many sounds; communication sounds and
music, in particular, are characterised by prominent pitches. Pitch is also a powerful cue for
the grouping and segregation of sounds within a mixture, reviewed in [91]. There are many
models of pitch perception, generally formulated to extract the absolute pitch of individual
notes as accurately as possible, consistent with human performance. However, paradoxically,
while absolute pitch is clearly an attribute of the stimulus, very few people actually possess a
conscious sense of absolute pitch [94], although most are able to make good judgements of
relative pitch; for example, in judging whether a target note is higher than a reference note, or
whether a note is in tune or not. There is furthermore a tendency across all cultures for pitch
categorisation, the octave relationship playing a special role. Octaves tend to be subdivided
into between 7 and 12 discrete categories, and there are also clear preferences for subdivisions corresponding to simple ratio relationships: octave 2:1, perfect fifth 3:2, perfect
fourth 4:3, and so on. It has been suggested that the phenomena of pitch perception and the
interval sensitivities can be predicted from the characteristics of speech [29, 98]. In this
computational modelling study we will investigate the development of discrete pitch
categories through the experience of pitch sequences in musical stimuli, and will extend
current models of absolute pitch perception to account for the perception of relative pitch,
preferred pitch interval relationships and contextual influences on pitch judgement.
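The simple-ratio preferences noted above can be made concrete by expressing frequency ratios in cents and comparing them with a 12-fold equal subdivision of the octave; a small illustrative computation (the function names are ours):

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

def nearest_equal_step(ratio, steps_per_octave=12):
    """Nearest step in an equal subdivision of the octave for a given
    frequency ratio, and the mistuning in cents."""
    step_size = 1200 / steps_per_octave
    c = cents(ratio)
    step = round(c / step_size)
    return step, c - step * step_size

# the simple ratios from the text against 12 equal steps per octave:
# the perfect fifth (3:2) lies within 2 cents of 7 equal-tempered semitones
fifth_step, fifth_err = nearest_equal_step(3 / 2)    # perfect fifth 3:2
fourth_step, fourth_err = nearest_equal_step(4 / 3)  # perfect fourth 4:3
octave_step, _ = nearest_equal_step(2 / 1)           # octave 2:1
```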
There is growing evidence that processing in the peripheral auditory system is under
attentional control; the subcortical auditory system acts as a tuneable active listening device
[1, 2], with feedback pathways descending in stages
from the cortex all the way back down to the cochlea [64]. There has been very little work
exploring the computational implications of influences on peripheral processing from higher
levels; an issue we will address here, focussing in particular on those aspects of relevance to
pitch processing.
This work is detailed in WP6. The model of peripheral processing will form a common
basis for generating inputs for the other modelling studies (WP3, WP4 and WP5), and for the
interactive music system (WP8). Once the peripheral model has been reformulated to allow
active control of low level processing, it will be combined with the large-scale cortical model
(WP3) in order to make use of the attentional control signals identified in WP3. The model of
relative pitch perception will be informed by experimental findings aimed at determining
which aspects of pitch perception should be learnt through experience (WP1). Collaborative
work will explore the role of working memory (WP3) in relative pitch perception and the
emergence of tonality. The modelling studies of WP6 will contribute further theoretical
insights into the processes underlying music cognition (WP7), which will be used to inform
the way in which pitch is processed in the interactive music system (WP8).
Theoretical principles and implementation of an emergent music cognition system
WP7. Theoretical insights into music cognition
We expect that all of the workpackages described above will give rise to a large number
of important theoretical principles of music cognition and its autonomous development
through experience. By explicitly extracting and identifying these essential principles, the
project will contribute significantly to the major goal of furthering our understanding of the
emergence of complex cognitive behaviour and the specification of a generic functional
computational architecture for cognition. In order that the key theoretical principles are

013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 30 of 101

effectively disseminated beyond the project participants, we propose a formal mechanism for
collecting and documenting them, which will form the basis of WP7. The culmination of this
aspect of WP7 will be a scientific workshop where the theoretical principles will be
presented. The proceedings of the workshop will also be published in the form of a book.
This workpackage will form the theoretical focal point of the project, the purpose of
which will be to integrate the theoretical advances emanating from all of the previously
described workpackages, both experimental (WP1, WP2), and computational (WP3, WP4,
WP5, WP6). The computational principles underlying music cognition will be used to
formulate a generic functional computational architecture for intelligent perception and
cognition, and the utility of this architecture will be assessed through the implementation of
an emergent music processing system (WP8), allowing us to verify and refine our
conclusions; work in WP8 will also therefore contribute to the theoretical model.
WP8. Interactive music system
The fundamental principles and generic cognitive architecture formalised in WP7 will
guide the design and development of the proposed interactive music system. Although there
has been an enormous amount of experimental work in music cognition, there are very few
artificial systems in existence, and none of these to our knowledge has been motivated by the
desire to gain a deeper understanding of the neural processes underlying cognition. Most
music analysis systems exploit domain-specific knowledge and are usually formulated to
maximise performance goals; hence they are generally designed without strong constraints on
their perceptual or cognitive plausibility. In contrast, in this project we propose to implement
a system that will initially have limited a priori knowledge. It will however process musical
stimuli using a biologically realistic low-level acoustic analysis front-end and will learn
through experience to derive a multi-faceted representation of its content. Inspired by the idea
that active perception is the key to understanding self-organisation and autonomous
development, the system will be designed to develop musical expectancies with which it will
compare incoming sounds. By being immersed in a continuous musical environment, the
system should thereby discover useful features and processing strategies. In order to
support interactive behaviour, these expectancies will also be used to synthesize output
signals.
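As a toy illustration of the expectancy-comparison loop just described (our own minimal sketch, not the proposed architecture), consider a first-order model that learns transition statistics from exposure and scores each incoming event by how expected it is:

```python
from collections import defaultdict

class ExpectancyModel:
    """Toy first-order expectancy model: learns event-to-event transition
    counts from exposure and scores how expected each incoming event is."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def expectedness(self, event):
        """Probability of `event` given the previous event, under current counts."""
        if self.prev is None:
            return 0.0
        total = sum(self.counts[self.prev].values())
        return self.counts[self.prev][event] / total if total else 0.0

    def listen(self, event):
        """Score an incoming event against current expectancies, then update."""
        p = self.expectedness(event)
        if self.prev is not None:
            self.counts[self.prev][event] += 1
        self.prev = event
        return p
```

After repeated exposure to a repeating pattern, the pattern's continuations score high and violations score low; in the proposed system this expectedness signal would be far richer, but the compare-then-update loop is the same.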
The system will provide the opportunity, both within the project and subsequently, to
study the effects of different patterns of exposure to music on the internal high-level
representations the system generates, and the extent to which behavioural phenomena in music
perception and cognition (WP1, WP2) can be replicated within an artificial system. This work
is detailed in WP8, and forms a major focus for collaboration within the project. A functional
system using current state-of-the-art algorithms will form the basis for the proposed system.
This system will then be incrementally enhanced and refined by incorporating the models and
theoretical insights derived from the experimental and modelling workpackages (WP1-6). It is
expected that the process of developing and experimenting with the artificial system in more
realistic musical environments will provide additional significant theoretical insights into
music cognition (WP7).
ii. Risks and contingency plans
Here we consider the main scientific risks arising from the three parts of the project:
experimental investigations, computational modelling, formalisation and embodiment.
The experimental investigations will determine some of the challenges to be faced by
the modelling studies and the interactive system, and hence to some extent the complexity of
the project. These experiments will define what aspects of processing should be learnt from
experience, and devising models which can learn and self-organise may prove to be more
difficult than simply building in the known functionality of the adult system. On the other
hand a deeper understanding of how the properties of the environment can be used to
determine the representations and processes within a system would open up many prospects
for the development of more powerful autonomous systems. Since the scientific questions are
clear and the methodology is well established, in themselves the experimental investigations
comprise a relatively low-risk part of the project.
The computational modelling to be undertaken in workpackages WP3-6 presents
significant scientific challenges. The theoretical understanding of audition lags behind that of
vision, and this gap hampers the development of large-scale models of auditory processing. However,
there is a rapidly growing neuroscientific literature on the perception and processing of
complex sounds, and we believe that incorporating current data within computational models
provides a powerful means for formalising current theories, and for pointing out new
experimental questions and exposing inconsistencies. While the development of a large-scale
cortical model is difficult, and hence a high-risk component of the project, the investigator
responsible has a great deal of experience in computational modelling; in particular, in the
development of large-scale cortical models of vision. Detailed modelling of the
thalamocortical system will similarly derive much from more advanced work in vision, and in
considering the development of representations through experience we will be able to draw
upon previous studies such as [129-131], and our own recent work [132]. The modelling
studies on the perception of rhythm and pitch will both have a considerable base upon which
to build, and therefore although we expect to extend current models, these studies do not carry
as much risk as the others.
The scientific goal of formalizing and communicating the important theoretical insights
into music cognition that we gain during this project does not carry any significant risk.
However, the embodiment of this understanding within an interactive music system, which
can develop effective representations and processes through exposure to musical stimuli, is
very challenging, and carries considerable risk. In order to reduce this risk we will build upon
existing work. An initial baseline system will be developed and then enhanced incrementally,
as we make theoretical advances in the modelling studies. The group responsible for this work
has a great deal of experience in music technology, and the work plan is structured explicitly
so as to allow significant periods of time for collaborative work.
iii. Management and consortium activities
Efficient and effective management of the project will be essential to ensure the
integration of the work carried out by the consortium members; workpackage WP0 will be
devoted to this aspect of the project. Management activities will include organisational,
technical, administrative, and financial co-ordination, the monitoring of progress on the
project, the enforcement of quality standards, and the facilitation of effective communications
and information flow. All partners will contribute to this workpackage; however, the majority
of this activity, and the management of the project, will be the primary responsibility of the
Project Coordinator, aided by an Administrative Assistant.


7.2 Workplanning and timetable


7.3 Graphical representation of work packages


Planned interactions between each of the workpackages and the others are illustrated in
the diagrams below for each of the workpackages in turn, together with the month(s) by
which time the interaction should have occurred.
[Workpackage interaction diagrams not reproduced: one diagram per workpackage, showing the inputs (experimental designs, stimuli, models and results) it receives from, and the outputs (results, models and theories) it delivers to, each of the other workpackages, annotated with the month(s) by which each interaction should have occurred.]

7.4 Work package list /overview

Work package list


No    Workpackage title                                        Lead   Person-  Start  End   Deliverable
                                                               contr. months   month  month No
WP1   Higher level auditory functions underlying music           -      125      1     36   D1.1-D1.4
      perception: Innate vs. learned operations
WP2   Perception of musical form                                 -       38      1     36   D2.1-D2.3
WP3   Prefrontal cortical function in the control of             -       68      -     36   D3.1-D3.6
      attention and short term memory
WP4   Spectrotemporal response fields in the                     1       38      -     36   D4.1-D4.4
      thalamocortical system
WP5   Perception and categorisation of rhythmic patterns         -       54      -     36   D5.1-D5.4
WP6   Active perception, relative pitch and the emergence        -       50      -     36   D6.1.1-D6.3
      of tonality
WP7   Theoretical insights into music cognition                  -       30      -     36   D7.1-D7.4
WP8   Interactive Music System                                   -       68      -     36   D8.1.1-D8.5
WP0   Management, communication and documentation                -       25      -     36   D0.1-D0.3
TOTAL                                                                   496


7.5 Deliverables list

Deliverables list

No (1)   Deliverable title                                                Delivery  Nature  Dissemination
                                                                          date (2)  (3)     level (4)
D1.1     Manuscript for WP1/Exp. 1                                           14       -      PU
D1.2     Manuscript for WP1/Exp. 2                                           20       -      PU
D1.3     Manuscript for WP1/Exp. 3                                           28       -      PU
D1.4     Manuscript for WP1/Exp. 4                                           36       -      PU
D2.1     Manuscript for WP2/analytical study                                 12       -      PU
D2.2     Manuscript for WP2/Exp. 1                                           24       -      PU
D2.3     Manuscript for WP2/Exp. 2                                           36       -      PU
D3.1     Large-scale cortical model of auditory processing                   12      R, P    PU, PP
D3.2     Investigations into the role of auditory attention in the           18       -      PU
         formation of auditory streams
D3.3     Extensions to large-scale cortical model to include                 30      R, P    PU, PP
         prefrontal brain areas, and the results of other
         modelling studies
D3.5     Investigations into the role of working memory in                   30       -      PU
         contextual processing
D3.6     Computational principles relevant to a general model of             36       -      PP
         music cognition
D4.1     Thalamocortical auditory model                                      12      R, P    PU, PP
D4.2     Developmental mechanisms for self-organisation in                   24      R, P    PU, PP
         response to auditory experience
D4.3     Investigations into the role of intracortical inputs in             32      R, P    PU, PP
         relation to categorising auditory stimuli
D4.4     Computational principles relevant to a general model of             36       -      PP
         music cognition
D5.1     Evaluation of existing approaches to rhythmic                       12       -      PU
         categorization on shared data (paper)
D5.2     Comparison of rhythm perception models based on                     24       -      PU
         simplicity vs. likelihood (paper)
D5.3     Prototype of rhythmic expectation (model)                           30       -      PP
D5.4     Computational principles relevant to a general model of             36       -      PP
         music cognition
D6.1.1   Comparison between temporal and spectral models of                  12      R, P    PU, PP
         pitch for extraction of relative pitch
D6.1.2   Formulation of an active model of pitch perception, which           32      R, P    PU, PP
         can account for contextual influences and the emergence
         of discrete pitch intervals
D6.2.1   Baseline peripheral model, distributed to partners                   -       -      PP
D6.2.2   Study of the computational and functional implications of           18      R, P    PU, PP
         active control of peripheral sensory processing
D6.2.3   Integration of enhanced peripheral model into                       32       -      PP
         large-scale cortical model
D6.3     Computational principles relevant to a general model of             36       -      PP
         music cognition
D7.1     Establish the project web-site                                       -       -      PU
D7.2     Produce multimedia promotional kit (v1)                              -       -      PU
D7.3     Initial prototype of the generic model of intelligent               12       -      PP
         perception
D7.4     Revisions to the generic computational architecture for             24      R, P    PP
         intelligent perception and multimedia kit
D7.5     Compilation of the theoretical insights into music                  36       -      PU
         cognition
D7.6     Formulation of a generic model for intelligent                      36       -      PU
         perception and cognition
D7.7     Scientific workshop, and publication                                36      O, R    PU
D8.1.1   Survey and evaluation of existing auditory software                  -       -      PP
         components and cognitive architectures
D8.1.2   Overview of saliency in music processing                            12       -      PP
D8.2.1   Mock-up of music analysis system                                    10       -      PP
D8.2.2   First version of music projection system                            18       -      PP
D8.3.1   Music Projector incorporating other partners'                       24       -      PP
         innovative contributions
D8.3.2   Final Music Projector                                               36       -      PU
D8.4.1   First results from experimental simulations                         30       -      PU
D8.4.2   Results from advanced experimental simulations                      36       -      PU
D8.5     Computational principles relevant to a general model of             36       -      PP
         music cognition
D0.1     Progress report, year 1                                             12       -      PP, PU
D0.2     Progress report, year 2                                             24       -      PP, PU
D0.3     Final project report                                                36       -      PP, PU
D0.4     Project catalogue (CD-ROM)                                          36       -      PP, PU

(1) Deliverable numbers in order of delivery dates: D1 ... Dn.
(2) Month in which the deliverables will be available, with month 0 marking the start of the project and all delivery dates relative to this start date.
(3) Nature of the deliverable: R = Report; P = Prototype; D = Demonstrator; O = Other.
(4) Dissemination level: PU = Public; PP = Restricted to other programme participants (including the Commission Services); RE = Restricted to a group specified by the consortium (including the Commission Services); CO = Confidential, only for members of the consortium (including the Commission Services).


7.6 Work package descriptions

WP1. Higher level auditory functions underlying music perception: Innate vs. learned operations

Workpackage number: 1                Start date or starting event: month 1
Participant id:                 UoP    FUPF    MTAPI    UvA
Person-months per participant:    2       2      120      1

Objectives
1. Distinguish innate and learned levels of abstraction in auditory processes underlying music
perception
2. Compare auditory model-based predictive functions between neonates and adults
3. Test whether neonates experience pitch similarly to adults

Description of work
Methodology
Event-related brain potentials are measured between electrodes attached to various scalp
locations and a reference electrode. The measurement of these potentials has been standardized
(see [133]). For typical recording parameters and electrode montage in neonates as well as in
adults, see [134]. Epochs, time-locked to the test stimuli, are extracted from the continuous EEG
record. Epochs are then aligned with each other with respect to stimulus onset and averaged in
groups formed according to the role of the stimuli in the stimulus sequence (in the present case,
typically, standard stimuli form one group and deviant stimuli form another group). Averaged
responses are compared between the different stimulus groups and across conditions and/or subject
groups by means of parametric statistical tests, such as the ANOVA or MANOVA including
planned comparisons and possible post-hoc tests.
The parameters (amplitude, latency, etc.) of MMN can be best assessed by subtracting from
the response elicited by the deviant stimulus the response elicited by a control stimulus, which
shares as many features as possible with the deviant stimulus but does not violate any regularity
within its own context. The simplest method is to subtract the response elicited by the standard
stimulus from the deviant-stimulus response. When possible, we shall use a control condition, in
which stimuli that are acoustically identical to the deviant appear with the same probability as the
deviant is presented in the test condition, but the control stimulus is a regular (standard) stimulus
within the control sequences (see [135]).
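The epoching-and-subtraction procedure described above can be sketched as follows. This is a simplified, single-channel illustration under our own naming; real analyses would add filtering, baseline correction and artefact rejection:

```python
import numpy as np

def epoch_average(eeg, onsets, pre, post):
    """Average stimulus-locked epochs cut from a continuous single-channel record.

    eeg    -- 1-D array of continuous EEG samples
    onsets -- sample indices of stimulus onsets for one stimulus group
    pre    -- samples kept before each onset (pre-stimulus baseline)
    post   -- samples kept after each onset
    """
    epochs = np.stack([eeg[t - pre:t + post] for t in onsets])
    return epochs.mean(axis=0)

def mmn_difference(eeg, standard_onsets, deviant_onsets, pre, post):
    """Deviant-minus-standard (or deviant-minus-control) difference wave;
    the MMN appears as a negative deflection in this difference."""
    return (epoch_average(eeg, deviant_onsets, pre, post)
            - epoch_average(eeg, standard_onsets, pre, post))
```

The control-condition variant described in the text simply substitutes onsets of the acoustically identical control stimuli for the standard onsets before subtraction.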
Because MMN elicitation requires the normal functioning of the auditory system, we
regularly perform audiometry on our subjects. This can be done with a simple behavioural
procedure in adults. In neonates, we will employ the objective audiometry procedure that measures
the parameters of the brainstem auditory evoked responses (BAER). BAERs are widely used as an
objective screening for hearing deficits (e.g. [136]). BAER waveforms indicate the functionality of
different stages of the auditory pathway from the cochlea up to the thalamic level. Because
latencies of BAER waveforms reflect maturation at an early age (e.g. [137]), data may serve as a
basis of further studies investigating effects of maturation on higher-level auditory functions in
healthy pre-term infants. Early detection of possible hearing deficits may be an additional benefit
because corrective measures are more effective when the plasticity of the brain is still higher.
In infants, MMN can be elicited in sleep [138]. MMN is elicited in adults while they perform
a visual primary task (such as an n-back task, see [17]). In all experiments, adult subjects will
perform a visual primary task. Recordings in neonates will be carried out during quiet sleep.
Description of the experiments
Experiment 1: Grouping by periodic pitch pattern
In adults, Sussman et al. [22] found that at short inter-stimulus intervals (ISI), no MMN was
elicited by the infrequent tone in a tone sequence having the AAAABAAAAB structure (where
A and B are two tones differing from each other in frequency), although MMN was elicited when
the order of A and B tones was randomised while retaining their ratio (4:1). The authors
interpreted their results in terms of temporal grouping: when the unit of the stimulus sequence
becomes the AAAAB tonal group, the B tone does not violate any rule, because it is part of
the repeating standard and, therefore, no MMN is elicited. This interpretation has been confirmed
when, using a longer ISI, MMN was elicited by the B tones as long as subjects were not informed
about the structure of the tone sequence [139] (see WP1-Figure 1). Since then, the ISI below which
automatic grouping of the AAAABAAAAB sequence occurs (i.e., no MMN is elicited when
subjects do not attend to the sounds) has been established as ca. 400 ms (Sussman, unpublished
data).
WP1-Figure 1. ERP responses to A (Tone 1)
and B (Tone 2) tones in five different
experimental conditions. Randomised sequences
presented 80% A and 20% B tones in a
randomised order. The structure of the patterned
sequences was AAAABAAAAB. Subjects read a
book in the Ignore conditions. In the
Attend-Pitch condition, subjects were instructed
to press a response key when they heard the T
tone, which occurred very infrequently in each
sequence (2.5%) and was lower in pitch than
either A or B. In the Attend-Pattern
condition, subjects were instructed to press the
response key when the repeating AAAAB pattern
of the sequence was violated (that is, they
pressed the key again for the T tones, which
broke the regular structure of the patterned
sequence). The Attend-Pattern condition was
administered after the Attend-Pitch condition.

In the current experiment, we will test whether neonates automatically group this
periodically presented tone pattern and whether the ISI limit of automatic grouping is similar to
that in adults. Control sequences will present the same tones in randomised order. On the basis of
previous studies [14], we expect MMN to be elicited in the control sequences.
Results of this study will provide constraints for models of rhythm perception (WP5) and
music cognition (WP7). The experiment will also provide task design and empirical data to guide
some of the simulations planned for developing the interactive music system (WP8).
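The two sequence types contrasted in this design can be sketched as follows (A and B stand for the two tone frequencies; function names are ours, and ISI control is omitted):

```python
import random

def patterned_sequence(n_groups):
    """Periodic AAAABAAAAB... structure: every fifth tone is the B tone."""
    return ["A", "A", "A", "A", "B"] * n_groups

def randomised_sequence(n_groups, seed=0):
    """Control: the same 4:1 ratio of A to B tones, but in random order."""
    seq = ["A"] * (4 * n_groups) + ["B"] * n_groups
    random.Random(seed).shuffle(seq)
    return seq
```

In the patterned case the B tone is fully predictable from the repeating AAAAB group, so at short ISIs no MMN is expected; in the randomised control the same B tones remain deviant and should elicit the MMN.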
Experiment 2: Timbre-independent extraction of pitch
Adults are able to equate the pitch of spectrally very different sounds (e.g., to tell that a sound
produced by a flute has the same pitch as another sound produced by a violin), which is an
important prerequisite of music perception. Using the missing-fundamental pitch phenomenon
[140], in which pitch is retained even though the fundamental (lowest) harmonic of a complex
tone is removed from the sound, we have shown that MMN is elicited by a change in (virtual) pitch even
when no spectral component of the pitch-deviant sound was infrequent in the sound sequence
[141, 142]. For these studies, nine complex tones were created, all of which had the same
(removed) fundamental frequency (standard tones). One complex tone was composed of
harmonics, which were also present in four of the other tones, but based on the (removed)
fundamental whose frequency was two times the fundamental frequency of the other nine tones
(deviant tone). This resulted in the deviant tone being perceptually an octave higher than the nine
standard tones. In sequences containing all ten complex tones with equal probability, the deviant
tone elicited the MMN response. This result suggested that 1) the MMN response is based on
perceived (pitch) rather than physical (frequency) stimulus properties and 2) that MMN is elicited
by pitch change even when other spectral parameters vary in the sound sequence. (Note that the
memory representations involved in the MMN-generating process also contain timbre information;
see [143].)
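The stimulus construction described above, complex tones whose pitch is carried by a removed fundamental, can be sketched as follows. The frequencies and harmonic numbers here are illustrative only, not the actual stimulus parameters of [141, 142]:

```python
import numpy as np

def complex_tone(f0, harmonics, dur=0.3, sr=44100, omit_fundamental=True):
    """Sum of sinusoidal harmonics of f0. With the fundamental omitted,
    a (virtual) pitch at f0 is still heard: the missing-fundamental effect."""
    t = np.arange(int(dur * sr)) / sr
    ks = [k for k in harmonics if not (omit_fundamental and k == 1)]
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in ks) / len(ks)

# A standard tone on f0 = 220 Hz (components at 440, 660, 880, 1100 Hz) and a
# deviant built on 2 * f0 (fundamental again removed; components at 880 and
# 1320 Hz): the deviant shares the 880 Hz component with the standard while
# its virtual pitch sits an octave higher.
standard = complex_tone(220.0, harmonics=[1, 2, 3, 4, 5])
deviant = complex_tone(440.0, harmonics=[1, 2, 3])
```

As in the studies cited above, the key property is that the deviant differs in virtual pitch without introducing any spectral component that is itself rare in the sequence.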
The current experiment will use an extension of the above procedure. Sounds of identical
pitch produced by various musical instruments will be presented in a sequence (standards).
Responses to occasional sounds produced by the same instruments as the standards but having a
different pitch (deviants) will be checked for signs of MMN elicitation. In control sequences,
responses to the same deviants will be recorded when these follow homogeneous sequences of the
corresponding (same-instrument) standard sound. Behavioural studies in adults have shown that
timbre-independent extraction of pitch is significantly facilitated by placing the sounds within a
musical phrase [30]. In the second condition of this experiment, the same standard and deviant
sounds will be delivered within short, musical phrases. Results will be compared across the
musical-context and isolated-sound conditions. This experiment will reveal whether neonates
separate pitch from other spectral sound features, and whether musical context helps pre-attentive
processing of pitch or the advantage found in behavioural experiments is the product of later
processes (e.g., decision).
The experimental design and stimulus materials will be developed in conjunction with UoP
(WP6) and FUPF (WP8). The results will provide a crucial constraint for models of pitch
perception (WP6): they will tell whether the separation of pitch and timbre should be regarded as a
basic feature of the system or learning mechanisms should be postulated to account for this
separation. Results will also be used in modelling music cognition (WP7), and to provide task
design and empirical data to guide some of the simulations planned for developing the interactive
music system (WP8).
Experiment 3: Representation of relative pitch
Relative pitch is more important for music than absolute pitch, since it carries the melody
contour. Thus it is important to test whether there exist innate operations extracting frequency
ratio (the physical parameter underlying relative spectral pitch) independently of the absolute
frequency (absolute spectral pitch) level. In a previous study, Paavilainen and his colleagues [144]
have shown that occasional ascending-pitched tone pairs with a frequency increment that was
higher or lower than that of the majority of tone pairs elicited the MMN, despite the variation of
the absolute frequency level (WP1-Figure 2).
A version of this paradigm will be tested in new-born babies. To maximise the size of the
response, we will use version 2a with the tones presented in parallel as in 3b (see WP1-Figure 2).
Instead of tones, valid musical sounds (e.g., guitar) and musical intervals will be used. The effect
of musical context can be tested by embedding the sound-pairs in full chords. The experimental
design and stimulus materials will be developed in conjunction with UoP (WP6) and FUPF (WP8).
The results will be used in theoretical models of music cognition (WP7) and they will be especially
important for constraining computational modelling of the perception of relative pitch (WP6) and
for guiding some of the simulations of the interactive music system (WP8).
WP1-Figure 2. Left side: Schematic illustration of the stimulus sequences. The tone-pairs are depicted by small
connected boxes. The y axis represents frequency, the x axis time. Tone-pairs were either delivered sequentially (2a
and b, 3a) or in parallel (3b). Frequent regular (standard) tone-pairs are marked with the letter s, infrequent deviant
tone-pairs with d. Right side: ERP responses elicited by the standard and deviant tone pairs (overplotted),
separately for the different conditions and directions of the deviance (2nd row: decreased within-pair frequency
difference; 3rd row: increased within-pair frequency difference). The MMN response appears as the difference
between the standard and deviant ERPs; this difference was statistically significant for all conditions and
deviance directions, and is depicted by shading of the difference between the responses.

Experiment 4: Preferential processing of pitch intervals typically used in western music, or representation of rhythmic patterns
If the results of Experiment 3 suggest that relative pitch is extracted by newborn babies, we
will test whether the MMN response elicited by deviants when the standard pitch interval is a
typical one in western music (e.g., 2:1, 4:3, 5:4) differs from the response when the same amount
of deviance occurs within a sequence whose regularly repeating pitch interval does not often
occur in western music (e.g., 2.4:1). A comparison between the responses elicited in
newborns and adults can reveal whether aspects of western music accommodate perceptual
abilities or plastic changes in the brain adapt the human perceptual system to stimuli of cultural
relevance. Direct links between this experiment and other workpackages are the same as those
described for Experiment 3.
If no innate encoding of relative pitch is indicated by the results of Experiment 3, an
alternative plan will be followed. It has been shown in adults that violating rhythmic regularities
triggers the MMN response [19]. However, the representation of abstract rhythmic regularities has
not been tested before. We propose to test whether newborns and adults detect regularities based
on the rhythm of a short pattern when the speed of presenting these patterns varies (i.e., a
syncopated rhythm presented at various paces). The experimental design and stimuli will be
developed in consultation with UvA (WP5) and FUPF (WP8). Results of this experiment will
inform theoretical models of music cognition (WP7), the modelling of rhythm perception (WP5),
and the design of the interactive music system (WP8).

Deliverables
D1.1 Manuscript for WP1 experiment 1 (month 14)
D1.2 Manuscript for WP1 experiment 2 (month 20)
D1.3 Manuscript for WP1 experiment 3 (month 28)
D1.4 Manuscript for WP1 experiment 4 (month 36)

Milestones and expected results


1) Hospital field lab is ready for experiments, the nurse, the doctoral student, and the post-doc
fellow have been trained, and recruitment of neonates is established (month 6).
2) The manuscript describing the results of Experiment 1 (grouping by periodic stimulus pattern)
is submitted for publication (month 14).
3) The manuscript describing the results of Experiment 2 (timbre-independent pitch perception)
is submitted for publication (month 20).
4) The manuscript describing the results of Experiment 3 (processing of relative pitch) is
submitted for publication (month 28).
5) The manuscript describing the results of Experiment 4 (encoding of pitch intervals
characteristic of western music, or rhythmic patterns) is submitted for publication (month 36).


WP2. Perception of musical form

Workpackage number: 2                Start date or starting event: month 1
Participant id:                 UoP    FUPF    MTAPI    UvA
Person-months per participant:   36       2        0      0


Objectives
1. Investigate relationships between the timbres of language and musical structures
2. Determine whether the phonetics of native languages bias musical expectation
3. Investigate whether timbral pattern sequences can affect the perception of musical form

Description of work
Introduction
In this workpackage we will investigate whether musical expectations can be created and
influenced by timbral patterns. Specifically, the following questions will be asked: a) Can the
statistical structure of timbral patterns in speech of specific languages predict the typical musical
structures, such as tension profiles, or idiomatic phrases, found in music of that culture? b) Is it
possible to detect musical sequence universals across a number of different linguistic groups? c)
Can the phonetics of different languages bias musical expectation? d) Can spectral relationships
bring about expectation in music? The statistical experiment will be performed on linguistic and
musical corpora. The psychoacoustic experiments will be performed with realistic experimental
pieces of music composed and synthesised to address the issues in question. These examples will
be prepared in collaboration with the FUPF team and will be informed by the ongoing work
developed in WP8.
Description of the experiments
Experiment 1: The role of speech systems in musical expectation
Statistical analysis of speech and music corpora
It has been demonstrated experimentally that language and music share brain resources; they
can be studied in parallel to address questions of neural specificity in cognitive processing [145,
146], and general cognitive principles are involved when aspects of syntactic processing in
language are compared with aspects of harmonic processing in music [147]. The music of most, if
not all, human cultures shares a number of characteristics believed to be musical universals; e.g.,
division of the continuous dimension of pitch into iterated sets of intervals defining a musical scale
and the preferential use in musical composition of particular subsets of these intervals [148, 149].
Lieberman [150] argues that the sounds of human speech also share a number of features across
most, if not all, human languages. Prosody and intonation may have originated very early in the
course of human evolution, perhaps even before we evolved the neural apparatus to deal with
language and music as two distinct phenomena [151, 152]. The notion that speech plays an
important role in music perception is shared by a number of researchers [85, 153, 154]. However,
although it has recently been demonstrated that the probability distribution of amplitude-frequency
combinations in human utterances of a number of different languages matches the structure of the
chromatic scale intervals [28], the notion that the probability distribution of sequential patterns in
speech may also predict the structure of musical sequences has not yet been investigated.
Therefore this study will address the following questions: Can the statistical structure of timbral
sequences in specific languages predict the structure of musical sequences within music of that
culture? Is it possible to detect musical sequencing universals across a number of different
linguistic groups?
Speech corpora comprising a number of different living languages will be segmented using
variable resolutions [155], and the spectrotemporal pattern of each segment will be characterised.
A corpus of musical pieces (without singing) typical from the cultures that speak the respective
languages will be segmented and analysed in the same way. The probability distribution of the
speech and corresponding musical patterns will be compared in order to identify timbral structures
common to both.
The results of the analysis will reveal whether one can infer rules about musical form based
on the statistical structure of different speech systems, and will contribute to the debate concerning
possible influences of musical experience on speech perception.
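The comparison of sequential-pattern statistics described above can be sketched as follows. This is an illustrative outline only: the `quantise`, `bigram_distribution` and `jensen_shannon` helpers, the codebook size, and the random placeholder corpora are assumptions standing in for the project's actual segmentation and feature pipeline.

```python
import numpy as np

def quantise(features, codebook):
    """Assign each feature frame to its nearest codebook entry (a crude
    stand-in for the variable-resolution segmentation described above)."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def bigram_distribution(symbols, k):
    """Smoothed probability distribution over symbol bigrams
    (the sequential patterns whose statistics are to be compared)."""
    counts = np.zeros((k, k))
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1
    p = counts.flatten() + 1e-12
    return p / p.sum()

def jensen_shannon(p, q):
    """Symmetric divergence in bits between two distributions (0 = identical)."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 13))      # 8 hypothetical timbral classes
speech = rng.normal(size=(500, 13))      # placeholder corpora: in practice,
music = rng.normal(size=(500, 13))       # spectrotemporal features per segment
d = jensen_shannon(bigram_distribution(quantise(speech, codebook), 8),
                   bigram_distribution(quantise(music, codebook), 8))
print(f"divergence between speech and music pattern statistics: {d:.3f} bits")
```

A low divergence between a language's speech corpus and its culture's music corpus, relative to cross-culture pairs, would be the kind of evidence the study seeks.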
Expectations from formant contexts
The findings of the study described above will be verified in perceptual experiments using
musical stimuli. This experiment will address the question: Can the phonetics of different
languages bias musical expectation? Subjects from different linguistic backgrounds will listen to
musical pieces within which musical passages appear in different contexts. They will be asked to
judge the degree of similarity between the passages. The contexts will be derived so that they
correspond to the formant configurations of the vowel systems of different languages. In addition,
the sequential structures identified in the analytical study will be used to suggest typical and
atypical rhythmic patterns, phrases and chord sequences.
The musical passages, designed in consultation with FUPF (WP8), will be carefully crafted
for the following scenarios:
a) The pitches of the passage in question will be derived from the centre frequencies of the first
three formants of a typical vowel of the mother tongue of the subject, and the larger scale form
will be derived from typical sequential structures found in the analytical study;
b) The pitches of the passage in question will be derived from the centre frequencies of the first
three formants of a vowel that is not characteristic of the mother tongue of the subject, and
the larger scale form will contain atypical sequential patterns.
The ability to rank the similarity of musical passages embedded in different contexts will
measure the influence of the phonetic and structural properties of different languages in musical
expectation.
The results from the analytical study and this perceptual experiment will inform the
implementation of the music system (WP8) and the theoretical models of music cognition (WP7),
and will also add further insights into the distinction between innate and learned levels of
abstractions in the perception of pitch (WP6) and rhythmic patterns (WP5), and into the likely
nature of learned representations (WP4).
Experiment 2: Expectations from spectral relationships
There have been a number of studies aimed at characterising musical similarity in terms of
pitch and rhythm, including a model to compute approximate repetitions of musical sequences in
terms of pitch distances [156], and a method to measure melodic similarity based on psychological
rating tests with subjects [157]. Recently, a theory of musical understanding has been proposed,
which suggests how musical structure may be processed in the brain, and within which similarity,
derivation, categorisation and schematisation function in an integrated way [158]. Although a few
composers have considered timbre to be the main musical attribute in structuring musical form
[159-161], there is a lack of supporting experimental data. In order to address this problem we will
consider in this experiment whether spectral context can characterise similarity of musical content.

This experiment will address the following question: Can spectral relationships bring about
expectations in music? Subjects will listen to pairs of musical passages that are presented one after
the other, separated by a short delay, and will be asked to judge the degree of similarity between
the passages. The second passage will be slightly changed in a number of ways, including
manipulation of the spectral envelope and altering the relationships between partials, transposition,
as well as rhythmic and pitch changes. Details of the experimental design and stimuli will be
decided in consultation with FUPF (WP8).
The ability to rank the similarity of spectral transpositions will measure the expectations of
the subjects with relation to the spectral dynamics of a musical piece. The experiment will also
reveal whether timbre can mask the perception of small variations in tonality and rhythm. If this is
the case then it will provide experimental data for the investigation into the role of working
memory in perceptual categorization (WP3). The results from this experiment will inform the
implementation of the music system (WP8) and will have implications for the design of the
theoretical model of musical cognition (WP7).
Deliverables
D2.1 Manuscript for WP2 analytical study (month 12)
D2.2 Manuscript for WP2 experiment 1 (month 24)
D2.3 Manuscript for WP2 experiment 2 (month 36)

Milestones and expected results


1) The manuscript describing the results of the analytical study (Statistical analysis of speech and
music corpora) is submitted for publication (month 12).
2) The manuscript describing the results of Experiment 1 (The role of speech systems in musical
expectation) is submitted for publication (month 24).
3) The manuscript describing the results of Experiment 2 (Expectations from spectral
relationships) is submitted for publication (month 36).

WP3. Prefrontal cortical function in the control of attention and short term
memory
Workpackage number: 3                    Start date or starting event: 1
Participant id:                 UoP    FUPF    MTAPI    UvA
Person-months per participant:    6     60        0       2

Objectives
1. Develop a large-scale cortical model of auditory processing.
2. Investigate the role of auditory attention in the formation of auditory streams.
3. Extend the large-scale cortical model to include prefrontal brain areas, and investigate the
role of working memory in contextual processing.
4. Integrate models from other modelling studies within the large-scale cortical model.
5. Extract computational principles relevant for a general model of music cognition, and for
technical music applications.
Description of work

Introduction
Contrary to the case in visual perception, where very detailed and biologically realistic
computational models exist (see for example [35] for a review), there are very few models of
auditory cortical processing, in particular, for the analysis of attention and memory in auditory
perception. We have previously developed a theoretical framework, incorporating mathematically
explicit spiking and synaptic dynamics, which enables single neuron responses, fMRI activations,
psychophysical results, the effects of pharmacological agents and the effects of damage to parts of
the neural system under study, to be explicitly simulated and predicted. This framework is
consistent with the leading theory of visual attention, namely the hypothesis of biased competition,
which postulates that populations or pools of activated neurons engage in inhibition-induced
competitive interactions which can be biased toward a specific population by an external input
representing attention or context [58, 59, 61, 62]. In a generalized version of this hypothesis,
neural populations are combined in such a way as to model an individual brain structure (e.g. a
cortical area) and engage in competitive and cooperative interactions, through which they try to
represent their input in a context-dependent way. Different model areas bias each other, through
which interaction different aspects of the environment are represented by different areas, leading to
a more complete percept [35]. In this workpackage we will develop a similar large-scale model of
auditory processing, and use it to investigate the role of attention in auditory streaming, and in the
active control of peripheral processing. Extensions to the model to include prefrontal regions will
also allow investigations into aspects of working memory and perceptual constancy.
Develop a large-scale cortical model of auditory processing
Develop a large-scale, neurobiologically realistic cortical model of the primate auditory
cortex, including the medial geniculate nucleus, the auditory areas I and II, and the superior
temporal gyrus. Processes occurring at the AMPA, NMDA and GABA synapses will be
dynamically modelled in an integrate-and-fire implementation to produce realistic spiking
dynamics. We assume a hierarchically organized set of different attractor network pools in the
primate auditory cortex consistent with the brain areas mentioned above. The hierarchical structure
will be organized within the general framework of the biased competition model of attention. In
this approach each cortical area is modelled as a network of interconnected excitatory and
inhibitory neurons, with the strength of connectivity adjusted to reflect the organisation of
functional clusters [43]. This allows inputs from other regions to be processed in the context of
neuronal reverberation, cooperation and competition biased by task-relevant information.
Networks, representing each of the cortical areas involved in auditory processing will be connected
according to the known architecture of the auditory system. In order that the cortical model
receives realistic inputs, it will be built upon an existing modelling system for peripheral auditory
processing [50], which includes well-established models of cochlear and subcortical processing.
The common model of peripheral processing will be implemented in WP6 and also used as the
basis of investigations in WP4 and WP6. The enhanced peripheral model with feedback
connections will be incorporated into this modelling framework later in the project.
The proposed large-scale model of auditory processing will be far more extensive than any
previously developed, and will provide us with a common integrating modelling framework for the
project.
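The biased-competition principle underlying this architecture can be illustrated with a deliberately minimal rate-based sketch: two excitatory pools compete through a shared inhibitory pool, and a small external "attentional" bias to one pool decides which wins. All connection weights and the threshold-linear transfer function are illustrative choices, not parameters of the proposed spiking model.

```python
def biased_competition(bias_a=0.0, bias_b=0.0, steps=2000, dt=1e-3, tau=0.02):
    """Two excitatory pools competing via a shared inhibitory pool;
    an external bias to one pool tips the competition in its favour."""
    ra = rb = ri = 0.0                    # firing rates: pools A, B, inhibition
    f = lambda x: max(x, 0.0)             # threshold-linear transfer function
    for _ in range(steps):
        ia = 1.0 + bias_a + 1.5 * ra - 2.0 * ri   # recurrent drive minus inhibition
        ib = 1.0 + bias_b + 1.5 * rb - 2.0 * ri
        ii = ra + rb                               # inhibition pools the excitation
        ra += dt / tau * (-ra + f(ia))
        rb += dt / tau * (-rb + f(ib))
        ri += dt / tau * (-ri + f(ii))
    return ra, rb

ra, rb = biased_competition(bias_a=0.2)   # 'attend' to pool A
print(f"pool A: {ra:.2f}  pool B: {rb:.2f}")
```

With equal inputs and a small bias to pool A, the unstable difference mode amplifies the asymmetry until pool B is silenced, which is the winner-take-all behaviour that biased competition exploits.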
Investigate the role of auditory attention in the formation of auditory streams
Using the large-scale cortical model we will investigate to what extent the computational
principles of biased competition can also account for the attentive processing of auditory objects,
and in particular, whether these principles can account for auditory streaming. We will analyse the
stationary states in the model via mean-field techniques, and non-stationary transient states via the
full spiking simulations; for a review of this approach see [41, 49]. In particular, we will
concentrate on the process of auditory streaming in response to musical stimuli. Most experiments
on auditory streaming have used simple stimuli, and the relationship between the spectra of
successive sounds and the inter-stimulus interval have been shown to dominate the formation of
streams [51]. However, pitch can also influence stream formation [162]. Pitch is thought to arise
from sub-cortical processing which extracts the dominant periodicities within each frequency
channel [102, 163], although the formation of a global pitch percept and its representation in
cortex remains controversial [99, 164, 165]. Here we will include a simple model of pitch
processing, which will also form the starting point for the model of relative pitch perception
(WP6), in which the autocorrelation of the activity in each channel is used to form a correlogram
[163, 166]. The results of these investigations will have implications for the theoretical model of
musical cognition (WP7), the modelling of relative pitch perception (WP6), and the
implementation of the interactive music system (WP8).
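The correlogram idea referred to above [163, 166] can be sketched minimally as follows. The channel decomposition here is simulated directly from harmonics of a missing fundamental rather than by a cochlear filterbank, and the pitch search range is an arbitrary choice.

```python
import numpy as np

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
f0 = 200.0                                    # fundamental, absent from the signal
channels = [np.cos(2 * np.pi * h * f0 * t) for h in (3, 4, 5)]  # harmonics only

def autocorr(x):
    """Unnormalised autocorrelation at non-negative lags."""
    r = np.correlate(x, x, mode="full")
    return r[len(x) - 1:]

# Summary autocorrelogram: half-wave rectify each channel (a crude stand-in
# for hair-cell transduction), autocorrelate, and sum across channels.
summary = sum(autocorr(np.maximum(ch, 0.0)) for ch in channels)

lo, hi = int(fs / 500), int(fs / 80)          # search pitches between 80 and 500 Hz
lag = lo + int(np.argmax(summary[lo:hi]))
print(f"estimated pitch: {fs / lag:.1f} Hz")  # recovers the missing fundamental
```

The summary peaks at the lag where all channels' periodicities align, which is the period of the (physically absent) fundamental: the classic missing-fundamental result that motivates this class of model.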
Extensions to the large-scale cortical model
We will collaborate with researchers from WP4 in order to incorporate a more detailed
model of the thalamocortical system within the large-scale cortical model. This will provide the
large-scale model with the possibility of developing time-varying receptive fields in response to
acoustic stimuli. We will also work with researchers from WP4 and WP6 in implementing top-down attentional control of peripheral processing (mediated through the thalamocortical system),
and in the investigations into active perception. In collaboration with researchers from WP5 the
model will also be extended to account for the generation of rhythmic expectancies in response to
musical stimuli, and through this the control of attention in time.
Extend the large-scale cortical model to include prefrontal brain areas, and investigate the role
of working memory in contextual processing
We will extend the large-scale cortical model described above to include an explicit model
of cortical prefrontal brain areas (dorsolateral and inferior frontal regions) [40], and its coupling
with the superior temporal gyrus [125, 126]. This will allow us to investigate the role of working
memory in auditory perception, in particular the phenomena of perceptual constancy and
perceptual categorisation in the context of music cognition [92, 167, 168]. This work will be
conducted in collaboration with researchers developing a model of relative pitch perception
(WP6), since the perception of relative pitch and the emergence of tonality has been found to
involve both prefrontal cortex and superior temporal gyrus [30, 92, 169], and will be informed by
early results from the perceptual study (WP2: Experiment 2 Expectations from spectral
relationships).
Extraction of computational principles relevant for a general model of music cognition, and for
technical music applications
From each of these modelling studies we will extract the computational principles that we
have found to be important in music cognition. These will be included in the formulation of a
generic architecture for cognition (WP7), and in designing more powerful algorithms for use in the
interactive music system (WP8).

Deliverables
D3.1 Report describing the large-scale cortical model of auditory processing (month 12)
D3.2 Report describing the investigations into the role of auditory attention in the formation of
auditory streams (month 18)
D3.4 Report describing extensions to large-scale model beyond those documented in year 1
(month 30)
D3.5 Report describing the investigations into the role of working memory in contextual
processing (month 30)
D3.6 Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (month 36)

Milestones and expected results


1) Large-scale model of auditory processing (month 12)
2) Investigations into auditory streaming and auditory attention (month 18)
3) Extensions to large-scale model to include models from WP4, WP5 and WP6 (month 30)
4) Extension to the large-scale model to include prefrontal areas; investigations into the role of working
memory in contextual processing (month 30)
5) Computational principles emanating from the modelling studies relevant to a general model of
music cognition (month 36)

WP4. Spectrotemporal response fields in the thalamocortical system


Workpackage number: 4                    Start date or starting event: 1
Participant id:                 UoP    FUPF    MTAPI    UvA
Person-months per participant:   36      2        0       0

Objectives
1. Construct a computational model of the thalamocortical and associated intracortical
network and replicate within the model spectrotemporal response fields (STRFs) such as
those observed experimentally.
2. Investigate mechanisms of self-organisation and plasticity in the development of STRFs
through exposure to different auditory experiences.
3. Investigate the ability of the model STRFs to support the categorisation of ongoing auditory
stimuli within the large-scale cortical modelling framework.
4. Extract computational principles relevant for a general model of music cognition, and for
technical music applications
Description of work
The aim of this workpackage is to understand how the thalamocortical network of the
auditory system contributes to the representation of timbre and the extraction of meaning from an
auditory stimulus, where the stimulus forms part of a continuous stream of stimuli, such as in
music. Our approach is to develop a detailed computational model of the auditory thalamocortical
network (TCN) and its associated intracortical networks and to investigate its response to auditory
stimuli and how this response is modified by factors which assist in specifying the meaning of a
stimulus. Such factors include the temporal context in which the stimulus appears; the brain's
expectation of the presence of the stimulus in this context; the attentional status of the stimulus
(whether it is being attended to or not at the current time); and the role the stimulus plays in the
current goals of the listener.
The proposed plan of work is presented below in three stages:
Stage 1: Detailed computational model of the auditory thalamocortical system
Construction of a computational model of the thalamocortical and associated intracortical
network; replication within the model of experimentally observed STRFs for a selected subset of
cortical and thalamic neurons, i.e. those with relatively simple receptive field structures with a
small number of excitatory and inhibitory fields distributed both spectrally and temporally.
The proposed computational model will comprise mathematical descriptions of the dynamic
properties of the principal populations of excitatory and inhibitory neurons and their connections,
in the following thalamic and cortical areas: medial geniculate nucleus of the thalamus (MGN);
nucleus reticularis of the thalamus (NRT); and layers 2/3, 4, 5 and 6 of the primary auditory
cortex (AI2-6). The models will be based on neurobiological data on the cellular and synaptic
mechanisms involved, and initially constructed at a range of levels of description, including
population-based models and conductance-based spiking neural network models, the latter
incorporating the membrane currents considered necessary to adequately describe the
spatiotemporal dynamics of the network. If necessary, e.g. to take account of the spatial
distribution of synaptic inputs onto cortical pyramidal cells, multi-compartmental neuron models
will be incorporated into the network model. From these initial models, the final model will be
constructed, as the simplest possible model which is capable of replicating the major dynamical
features of the experimentally observed STRFs of principal thalamic and cortical neurons. This
detailed model of the auditory thalamocortical network will be incorporated into the large-scale
cortical model (WP3), and will provide the interface whereby cortical processing can generate
appropriate top-down signals in order to influence subcortical processing (WP6).
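As a point of reference for this modelling, the standard linear view of an STRF, in which a neuron's response is a rectified inner product between its spectrotemporal kernel and the recent stimulus history, can be sketched as follows. The toy STRF shape and stimulus are illustrative, not drawn from the experimental data the model will be fitted to.

```python
import numpy as np

def strf_response(spectrogram, strf):
    """Linear STRF model: at each time step, correlate the recent
    spectrotemporal input with the kernel and half-wave rectify."""
    n_freq, n_time = spectrogram.shape
    _, n_lag = strf.shape
    r = np.zeros(n_time)
    for t in range(n_lag, n_time):
        window = spectrogram[:, t - n_lag:t]   # recent stimulus history
        r[t] = max(0.0, float(np.sum(window * strf)))
    return r

# Toy STRF: a short-latency excitatory field in one frequency channel,
# preceded by an inhibitory field in a neighbouring channel (a simple,
# experimentally common receptive-field shape).
strf = np.zeros((8, 5))
strf[4, 3:] = 1.0     # excitation, channel 4, most recent lags
strf[3, :2] = -0.5    # inhibition, channel 3, earlier lags

spec = np.zeros((8, 50))
spec[4, 20:30] = 1.0  # a tone in the excitatory channel
resp = strf_response(spec, strf)
print("peak response:", resp.max())
```

The workpackage's central question is precisely how contextual and top-down signals modify such a kernel over time, which this static linear picture cannot capture on its own.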
Stage 2: Mechanisms of self-organisation and plasticity
Investigation into the mechanisms of self-organisation and plasticity in the development of
the STRFs of selected neurons in the model through exposure to different auditory experiences, in
a manner which is consistent with experimentally observed modification of STRFs in early
development.
It is likely that the main features of the STRF properties of selected thalamic and cortical
neurons develop using self-organising mechanisms as a result of exposure to natural auditory
environments during early developmental stages. Exposure to abnormal auditory environments has
been shown experimentally to lead to disruption of normal STRF properties. Because the
mechanisms which determine the early development of STRF properties are presently unknown,
this stage of the workpackage is speculative and based largely on hypothetical mechanisms. It is
thought however that sharpening of tuning curves and the refinement of tonotopicity in AI is
dependent on appropriately patterned input activity during development [76, 77, 170, 171] and
there is evidence that modification of neural circuits in AI can be induced by abnormal patterns of
neural activity [76, 77, 170, 171]. These investigations will also be informed by the results from
the statistical study of the relationship between language and music in WP2. The results will
inform the development of the theoretical model of musical cognition (WP7), and suggest suitable
representations for the interactive music system (WP8).
Stage 3: Categorisation of ongoing auditory stimuli
Investigation of the ability of the model STRFs to support the categorisation of
ongoing auditory stimuli, e.g. relating the stimuli to particular components of a complex auditory
scene; the form of intracortical signals in the model which relate to the meaning of the stimulus,
and their ability to modify the STRFs in a way which facilitates and improves the categorisation of
ongoing stimuli.
Sensory information which might be used to create the meaning of a stimulus must be
interpreted in the context of the internal knowledge related to the present task. Neurons and local
neuronal circuits respond to external sensory-evoked input and to internally generated input within
the context of a background network or population activity which exists at the time. In this way the
neuronal network, within which an individual neuron or neuronal circuit participates, can exercise
a fine control over the individual neuron's response, in accordance with the contextual knowledge
represented by the activity of the network. Such a neuronal control mechanism seems to provide a
potential link between context at the neural response level and context at the cognitive response
level.
In this stage of the workpackage we will investigate the way in which contextual
intracortical activity in the primary auditory cortex is created, and how this activity affects
neuronal response properties. It appears that, in the case of a single neuron, contextual network
activity can create spatiotemporal patterns of subthreshold membrane potential activity across the
dendritic tree of the neuron. The pattern of activity at any time will have a strong influence on the
synaptic integration properties of the cell and therefore on its STRF. Theoretical and
computational studies of these properties in auditory cortex and the thalamocortical network,
aimed at determining how contextual cortical activity affects neuronal response properties so as
to improve the role of STRFs in categorisation, may provide a link between the neuronal and the
cognitive levels regarding the role of context in understanding the meaning of sensory stimuli in
the brain. In order to conduct this study effectively it will be necessary to integrate the
detailed thalamocortical model into the large-scale cortical model developed in WP3. The results
will inform the development of the theoretical model of musical cognition (WP7), and suggest
suitable representations for the interactive music system (WP8).
Stage 4: Extraction of computational principles relevant for a general model of music cognition,
and for technical music applications
From these modelling studies we will extract the computational principles that we have
found to be important in music cognition. These will be included in the formulation of a generic
architecture for cognition (WP7), and in designing more powerful algorithms for use in the
interactive music system (WP8).

Deliverables
D4.1 Report describing the thalamocortical model (month 12)
D4.2 Report on developmental mechanisms for self-organisation in response to auditory
experience (month 24)
D4.3 Report on investigations into the role of intracortical inputs in modifying the STRF
properties in relation to categorising auditory stimuli (month 32)
D4.4 Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (month 36)

Milestones and expected results


1) A computational model of the thalamocortical and associated intracortical network which
captures the major features of the STRF properties of selected thalamic and cortical neurons
(month 12).
2) Extensions to the computational thalamocortical model to incorporate the main developmental
mechanisms of self-organisation and adaptation to auditory experience (month 24).
3) Investigation into the role of intracortical inputs in modifying the STRF properties in relation
to categorising auditory stimuli (month 32).
4) Computational principles emanating from the modelling studies relevant to a general model of
music cognition (month 36)

WP5. Perception and categorisation of rhythmic patterns


Workpackage number: 5                    Start date or starting event: 1
Participant id:                 UoP    FUPF    MTAPI    UvA
Person-months per participant:    0      2        0      52

Objectives
1. Rhythmic categorization. Study the formation of rhythmic categories (rhythmic
categorization) and the influence of temporal context (such as metrical structure, tempo and
previous exposure) in active perception.
2. Model selection. Three computational modeling approaches to categorization will be
evaluated on existing empirical data. One approach is based on the Gestalt principles of
perception, simplicity or ease of encoding being a key aspect. An alternative approach,
called memory-based, is based on the notion of likelihood. Here, models try to explain
structural interpretations in terms of the most probable encoding, the probabilities being
extracted from previously heard examples. A third approach is based on the laws of
kinematics, modeling rhythm directly in terms of action, using the apparent similarities
between physical and musical motion.
3. Rhythmic expectation. Based on the results of 2) a model of rhythmic expectation can be
formulated that will be the key temporal component in a model of emergent cognition.
4. Extract computational principles relevant for a general model of music cognition, and for
technical music applications
Description of work
Introduction
Research in music perception has shown that time, as a subjective structuring of events, is
quite different from the concept of time in physics [82]. Listeners to music do not perceive rhythm
on a continuous scale. Instead, rhythmic categories are recognized which function as a reference
relative to which the deviations in timing can be appreciated [83, 84]. In fact, temporal patterns in
music combine two time scales which are essentially different: the discrete rhythmic durations as
symbolized by, for example, the half and quarter notes in a musical score, and the continuous
timing variations that characterize an expressive musical performance. In this workpackage we
will evaluate current theories of rhythmic perception and formulate a model which can generate
rhythmic expectancies in response to musical stimuli using a categorical representation of
rhythmic patterns.
Investigate the perceptual formation of rhythmic categories
Honing [87] proposed a knowledge representation that makes these three aspects (i.e.
rhythmic structure, tempo and timing) explicit by introducing a way in which expressive timing
can be expressed in terms of the temporal structure and global tempo. This representation will
form the formal basis for the current study in trying to disentangle these components by studying
systematically obtained empirical data. The key idea in this approach is the notion of rhythm space
[83, 88]. Instead of using the more common method of studying a corpus of typical examples
[172], we consider the space of all possible performances of a small number of time intervals (or
note durations). In this n-dimensional space every point constitutes a different temporal pattern.

This infinite set contains musical and unmusical rhythmic patterns; rhythms often encountered in
music, and those rarely used. This rhythm space captures, in principle, all possible expressive
interpretations in any musical style of any rhythm of n+1 onsets. For example, in considering
rhythmic patterns of four onsets, any pattern can be represented in three dimensions, with the three
axes representing the three inter-onset intervals (IOIs). All patterns that add up to a fixed total
duration form a diagonal triangular slice in such a space (see Figure WP5-1a). Looking from
above, towards the origin, the triangle can be presented as a ternary plot (see Figure WP5-1b), and
any point in this space can be interpreted as a particular rhythmic pattern (see Figure WP5-1c for
two examples).

Figure WP5-1. Rhythm space (A), ternary plot (B), and two example patterns (C) (see text for details).
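As a concrete illustration of this representation, the mapping from a performed onset sequence to a point on the fixed-total-duration slice can be sketched as follows (a minimal illustration in Python, not project code; the function name is ours):

```python
import numpy as np

def to_rhythm_space(onset_times):
    """Map a performance of n+1 onsets to a point in n-dimensional rhythm
    space. The coordinates are the inter-onset intervals (IOIs); normalizing
    them to sum to 1 projects the performance onto the fixed-total-duration
    slice (the ternary plot, in the case of four onsets / three IOIs)."""
    iois = np.diff(np.asarray(onset_times, dtype=float))
    return iois / iois.sum()

# Two performances of the same nominal rhythm (1-1-2) at different tempi
# project to the same point on the normalized slice.
fast = to_rhythm_space([0.0, 0.25, 0.50, 1.00])
slow = to_rhythm_space([0.0, 0.50, 1.00, 2.00])
```

Normalizing the IOIs removes global tempo, which is why performances of the same pattern at different tempi land on the same point of the slice.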

The cognitive process of extracting a symbolic representation from a performance, as studied
in music perception (i.e. categorization), can be described as a mapping from a performance space
into a score space. This can be studied by determining which sets of performances are considered
interpretations of the same rhythmic pattern. This rhythm space representation was used to analyse
the results of a series of experiments in which musicians were asked to notate a large set of
rhythmic patterns presenting a systematic sampling of the performance space [89]. This revealed
the relation between performed rhythm and the rhythmic categories recognized in perception. We
will use this performance space representation to study the influence of the other components of
rhythm as well: expressive timing, the influence of absolute tempo and tempo change (e.g. rubato),
trying to disentangle their respective contributions in the formation of rhythmic categories in
active perception, and how the emergent categories influence further incoming stimuli and their
rhythmic interpretation.
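The categorization step described above, mapping a point in performance space to a point in score space, can be sketched as a nearest-category quantization (a deliberately simplified illustration; the category set and the distance measure are our assumptions, not the model under study):

```python
import numpy as np

# Hypothetical set of rhythmic categories: integer-ratio IOI patterns
# (normalized to sum to 1), standing in for points in score space.
CATEGORIES = {
    "1-1-1": np.array([1, 1, 1]) / 3,
    "1-1-2": np.array([1, 1, 2]) / 4,
    "1-2-1": np.array([1, 2, 1]) / 4,
    "2-1-1": np.array([2, 1, 1]) / 4,
}

def categorize(performance_iois):
    """Map a performed IOI pattern (a point in performance space) to the
    nearest rhythmic category (a point in score space)."""
    p = np.asarray(performance_iois, dtype=float)
    p = p / p.sum()   # remove global tempo
    return min(CATEGORIES, key=lambda k: np.linalg.norm(p - CATEGORIES[k]))
```

An expressively timed 1-1-2 performance such as [0.26, 0.22, 0.52] is recognized as the 1-1-2 category; determining which regions of the space map to which categories is exactly what the notation experiments probe.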
Modelling approaches to rhythmic categorization
Even though the computational modelling of beat and meter induction has been researched
for some time now [173, 174], humans still outperform existing computational models in
assigning metrical structure. Humans are not only very precise in finding structural
information; they can also do it quickly and are very flexible; for example, they can easily
distinguish between rhythmic and tempo changes. There is a considerable amount of literature on
modelling the phenomenon of meter induction, using a large variety of computational paradigms.
One class of models is based on Gestalt principles of perception, simplicity or ease of encoding
being a key aspect [175]. An alternative approach, called memory-based, is based on the notion of
likelihood [90]. Here, models try to explain structural interpretations in terms of the most
probable encoding. The probabilities are extracted from previously seen examples. Instead of
generating the metrical structure using a simple model, previously encountered structures drive the
analysis of new data.
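The contrast between the two classes of model can be illustrated with toy scoring functions (our own simplified stand-ins, not the cited models): a simplicity score that prefers the interpretation that is easiest to encode, and a memory-based score that prefers the interpretation most probable given previously seen examples:

```python
import math
from collections import Counter

def simplicity_score(pattern):
    """Gestalt-style score: prefer the interpretation that is easiest to
    encode; here, crudely, the one with the fewest distinct IOI values."""
    return -len(set(pattern))

def likelihood_score(pattern, corpus):
    """Memory-based score: prefer the interpretation seen most often in
    previously encountered examples (add-one smoothing for unseen ones)."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return math.log((counts[pattern] + 1) / (total + 1))

# A toy corpus in which the 1-1-2 interpretation is more frequent.
corpus = [("1", "1", "2")] * 8 + [("1", "2", "1")] * 2
candidates = [("1", "1", "2"), ("1", "2", "1")]
best_by_likelihood = max(candidates, key=lambda p: likelihood_score(p, corpus))
```

The simplicity score is fixed in advance, whereas the likelihood score changes with exposure; this is the sense in which previously encountered structures drive the analysis of new data.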
The experimental data showed that the performed rhythm identified most often as a certain
rhythmic category, and therefore apparently the most communicative rendition of the rhythm, was
directly related to the structure of the rhythmic category [89]. Generally, these so-called
performance centroids are not expected to coincide with the mechanical rendition of the rhythm,
but will for most rhythms be slightly off-centre. We will use these data to formulate a model that
predicts the timing of these centroids (i.e. the most communicative rendition) from the rhythms
identified.
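As a sketch of what such a centroid is, assuming performances are represented as normalized IOI vectors (our simplification), it is simply the mean point in performance space of the performances notated as the same category, which need not coincide with the mechanical rendition:

```python
import numpy as np

def performance_centroid(performances):
    """Centroid of the performances that listeners notated as the same
    rhythmic category: the mean point in performance space, taken here as a
    stand-in for the most communicative rendition of that category."""
    p = np.asarray(performances, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)   # project onto the ternary slice
    return p.mean(axis=0)

# Hypothetical performances all notated as 1-1-2; the centroid sits slightly
# off the mechanical rendition [0.25, 0.25, 0.5].
perfs = [[0.27, 0.23, 0.50], [0.26, 0.22, 0.52], [0.28, 0.24, 0.48]]
centroid = performance_centroid(perfs)
```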
Using models based on simplicity, or minimum description length (MDL), as a point of
comparison, we will explore a number of memory-based approaches (see [176] for an overview)
to determine the
factors that contribute to these timing patterns, aiming at a generative model of rhythmic
categorization and its emergent cognitive structure. Based on these investigations we will
formulate a model of rhythmic expectation which will be used as the key temporal component in
models of emergent cognition. This will form the basis for collaborative work with researchers
developing the large-scale cortical model (WP3), and the emergent interactive music system
(WP8).
Extraction of computational principles relevant for a general model of music cognition, and for
technical music applications
From the modelling studies we will extract the computational principles that we have found
to be important in music cognition. These will be included in the formulation of a generic
architecture for cognition (WP7), and in designing more powerful algorithms for use in the
interactive music system (WP8).

Deliverables
D5.1 Evaluation of existing approaches to rhythmic categorization on shared data (paper)
(month 12)
D5.2 Comparison of rhythm perception models based on simplicity vs. likelihood (paper)
(month 24)
D5.3 Prototype of rhythmic expectation (model) (month 30)
D5.4 Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (month 36)
Milestones and expected results
1) Evaluation of the predictive power of kinematic, memory-based and perception-based models of
rhythm perception and production (month 12).
2) Construct and evaluate rhythm perception models based on simplicity and on likelihood
(month 24).
3) Model of rhythmic expectation, computational principles (month 36).


WP6. Active perception, relative pitch and the emergence of tonality


Workpackage number: 6
Start date or starting event: month 1
Person-months per participant: UoP 48, FUPF 2, MTAPI 0, UvA 0

Objectives
1. Active perception and the emergence of relative pitch. Investigate the development of
discrete pitch categories through the experience of pitch sequences, and the extension of
current models of absolute pitch perception, to account for the perception of relative pitch,
pitch interval relationships and contextual influences on pitch judgements.
2. Computing with active sensors. Investigate the computational and functional implications
of active control of peripheral sensory processing through gain control and filter retuning.
3. Active pitch perception. Investigate the properties of a model of relative pitch perception,
reformulated to actively modify peripheral processing.
4. Extract computational principles relevant for a general model of music cognition, and for
technical music applications.

Description of work
Introduction
Pitch is a fundamental perceptual attribute of communication sounds and music, and is also a
powerful cue for the grouping and segregation of sounds within a mixture (reviewed in [91]). Here
we propose to formulate a model of pitch perception which can account for contextual influences
on pitch perception and for the emergence of discrete pitch categories. Pitch relationships are an
essential aspect of pitch perception, and we generally experience pitches not in isolation but in
sequences and as part of higher-level cognitive structures. Perceptual judgements of pitch are
influenced by context; judgements can be facilitated if the context is a melodic sequence [30, 92,
93], or a chord sequence with tonality consistent with the target, but can be impaired by the
presence of a non-matching context [93]. However, current models of pitch perception do not
account for the influence of context on perception. In addition, most models of pitch perception
focus on the representation of absolute pitch, an ability very few people actually
possess [94], although most people do have a good sense of relative pitch, e.g. judging whether a
note is in tune or not. We propose to formulate an active model of pitch perception that develops
representations by modelling the regularities in the stimuli it experiences, at increasingly higher
levels of abstraction. The investigations into predictive modelling of incoming stimuli will also be
applied to the development of an intelligent sensory periphery and the attentional modulation of
bottom-up processing.
Active perception and the emergence of relative pitch
We will investigate the development of discrete perceptual pitch categories through the
experience of pitch sequences, and formulate extensions to current models of absolute pitch
perception to account for the perception of relative pitch, pitch interval relationships and
contextual influences on pitch judgements.
Models of pitch perception fall into two broad classes, spectral or place models which rely
upon the distribution of energy across the basilar membrane to isolate individual harmonics in a
complex tone, and temporal models which analyse the periodicities in the firing patterns of nerve
fibres. Both sources of information are available in the auditory nerve activity, and the competing
models have different strengths and weaknesses, so there remains some controversy over which is
correct. We will consider a representative model of each class, and investigate their response in the
presence of multiple pitches, and their ability to represent interval relationships and to account for
contextual effects.
An influential temporal model, the summary autocorrelation function (SACF) model [163],
has been shown to account for a wide range of pitch phenomena. It consists of a model of cochlear
processing which includes spectral decomposition by a bank of band-pass filters, half-wave
rectification and low-pass filtering to model inner hair cell processing, periodicity estimation
within each frequency channel using an autocorrelation function (ACF), and a linear sum of the
ACFs to give the SACF. The incorporation of adaptive delays into this model would give it contextual
sensitivity. We will investigate whether an adaptive SACF model shows sensitivities to pitch
similarities that account for the preference for simple pitch relationships.
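The SACF pipeline can be illustrated with a minimal single-channel sketch in Python (numpy only). For brevity it collapses the multi-channel cochlear filterbank into a single broadband channel and uses half-wave rectification as a crude stand-in for hair-cell processing; it is an illustration of the principle, not the published model:

```python
import numpy as np

def sacf_pitch(x, fs, fmin=80.0, fmax=500.0):
    """Single-channel sketch of the SACF idea: half-wave rectify, autocorrelate,
    and read the pitch off the strongest peak in the candidate lag range. The
    full model does this per cochlear channel and sums the per-channel ACFs."""
    r = np.maximum(x, 0.0)                                # half-wave rectification
    acf = np.correlate(r, r, mode="full")[len(r) - 1:]    # one-sided ACF
    lo, hi = int(fs / fmax), int(fs / fmin)               # lag search range
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return fs / lag

# A harmonic complex (harmonics 1-5 of 200 Hz): the ACF peaks at the
# fundamental period, so the estimate recovers the 200 Hz pitch.
fs, f0 = 16000, 200.0
t = np.arange(0, 0.1, 1 / fs)
tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))
estimate = sacf_pitch(tone, fs)
```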
Although successful, spectral models have previously been criticised for their assumption of
the existence of harmonic templates used to group harmonics to derive pitch. However, recently it
has been shown how such templates could arise simply through correlations between auditory
nerve spike trains [177]. The model of [178] is a spectral model based upon the idea of a harmonic
sieve and it too explains a great deal of perceptual data. Processing in the model results in a pattern
of activity across a pitch map in which energy levels indicate the degree to which a particular pitch
is activated. There is potential in this model too for the inclusion of contextual effects and
facilitated processing of related pitches, and for combining it with place-time peripheral processing
[177].
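The harmonic-sieve idea can likewise be illustrated with a toy implementation (our simplification of the principle, not the cited model): each candidate f0 defines a sieve of harmonic slots, and the score rewards peaks that fall through a slot while penalizing subharmonic candidates, which leave slots empty:

```python
import numpy as np

def sieve_pitch(peak_freqs, f0_grid, tol=0.03):
    """Toy harmonic sieve: score each candidate f0 by the fraction of spectral
    peaks matching a harmonic slot times the fraction of slots filled (up to
    just above the highest peak); break ties by closeness of fit."""
    peaks = list(peak_freqs)
    fmax = max(peaks) * 1.1
    best_key, best_f0 = None, None
    for f0 in f0_grid:
        n_slots = int(fmax // f0)
        devs, filled = [], set()
        for p in peaks:
            n = max(1, int(round(p / f0)))        # nearest harmonic number
            dev = abs(p - n * f0) / (n * f0)      # relative mistuning
            if dev < tol:
                devs.append(dev)
                filled.add(n)
        score = (len(devs) / len(peaks)) * (len(filled) / max(n_slots, 1))
        key = (score, -sum(devs) / max(len(devs), 1))
        if best_key is None or key > best_key:
            best_key, best_f0 = key, f0
    return best_f0

peaks = [200.0, 400.0, 600.0, 800.0]   # harmonic complex with f0 = 200 Hz
estimate = sieve_pitch(peaks, np.arange(80.0, 520.0, 1.0))
```

Normalizing by the number of slots is what keeps a subharmonic candidate such as 100 Hz, whose sieve matches every peak but leaves half its slots empty, from beating the true fundamental.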
Our investigations will evaluate which of the candidate models is better able to represent
pitch intervals, the influence of tonality and the preference for discrete relationships. Both models
extract absolute pitch, and so we will formulate extensions to support pitch-invariant interval
representations. We will use this to inform the formulation of a model of pitch perception in terms
of active perception, motivated by the idea that the brain is constantly trying to abstract regularities
from the stimuli it experiences.
Researchers on WP6 will collaborate with MTAPI (WP1) in order to design suitable stimuli
for distinguishing between learnt and innate levels of pitch perception. The model of pitch
perception developed here will be informed and constrained by the results of the perceptual
experiments (WP1). The model will be integrated into the large-scale modelling framework, in
collaboration with researchers in WP3, in order to investigate the role of interactions between
prefrontal cortex and superior temporal gyrus in the perception of tonality found to be necessary
experimentally [108, 179, 180].
Computing with active sensors
We will investigate the computational and functional implications of active control of
peripheral sensory processing through gain control and filter retuning, and the properties of a
model of relative pitch perception, reformulated to actively modify peripheral processing.
In the bat it has been shown that cortical stimulation can modulate cochlear tuning in a
number of ways [181]; these include a simple gain increase at the site with matching best
frequency, as well as adjustments to the tuning of the basilar membrane filtering properties at non-matched sites. Depending on the site of cortical stimulation, the retuning can move the best
frequency of the filter towards or away from the stimulating frequency, and can alter the
bandwidth of the filter, and increase or decrease its gain. There are many models of cochlear
processing, and we will base our experiments on that of Meddis and colleagues, since it has
been shown to account for many aspects of cochlear processing [182, 183]. We will extend this
model to incorporate active control of the filter properties, and conduct a systematic empirical
investigation into the computational implications for models operating on its output. In particular,
we will consider the impact on the pitch models, described above, of dynamic adjustments to
cochlear processing.
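The kind of top-down retuning we have in mind can be sketched with a toy resonator standing in for one basilar-membrane channel (the retuning rule and all parameter names are our assumptions, loosely inspired by the corticofugal effects described above, not the Meddis model itself):

```python
import numpy as np

def bandpass_gain(f, cf, q, gain=1.0):
    """Magnitude response of a simple second-order band-pass resonator, a toy
    stand-in for one cochlear channel with best frequency cf and sharpness q."""
    x = f / cf
    return gain * (x / q) / np.sqrt((1 - x**2) ** 2 + (x / q) ** 2)

def retune(cf, q, gain, target_f, alpha=0.5, gain_boost=2.0):
    """Hypothetical top-down command: pull the channel's best frequency a
    fraction alpha of the way towards the stimulating frequency and boost
    its gain, mimicking one of the reported corticofugal effects."""
    return cf + alpha * (target_f - cf), q, gain * gain_boost

# A probe at 1000 Hz drives the channel more strongly after retuning.
probe = 1000.0
before = bandpass_gain(probe, cf=1400.0, q=4.0)
cf2, q2, g2 = retune(1400.0, 4.0, 1.0, probe)
after = bandpass_gain(probe, cf2, q2, g2)
```

The interesting question for the pitch models above is how such dynamic adjustments reshape the representation their output is computed from.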
In collaboration with researchers in WP3 and WP4, the active peripheral model will be
incorporated into the large-scale cortical model in order to simulate the attentional control of
peripheral processing.
Extraction of computational principles relevant for a general model of music cognition, and for
technical music applications
From these modelling studies we will extract the computational principles that we have found to
be important in music cognition. These will be included in the formulation of a generic
architecture for cognition (WP7), and in designing more powerful algorithms for use in the
interactive music system (WP8).

Deliverables
D6.1.1 Comparison between temporal and spectral models of pitch in relationship to the
representation of pitch intervals, the influence of tonality, and categorical perception of relative
pitch (report) (month 12).
D6.1.2 Formulation of an active model of pitch perception, which can account for contextual
influences and the emergence of discrete pitch intervals (model, report) (month 24).
D6.2.1 Baseline peripheral model, distributed to partners (model) (month 3).
D6.2.2 Study of the computational and functional implications of active control of peripheral
sensory processing through gain control and filter retuning (report) (month 18).
D6.2.3 Integration of enhanced peripheral model into large-scale cortical model to investigate
attentional control of peripheral processing (model, report) (month 32).
D6.3 Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (report) (month 36)
Milestones and expected results
1) Comparison between computational models of relative pitch (month 12).
2) Formulation of an active model of pitch perception, which can account for contextual
influences and the emergence of discrete pitch intervals (month 24).
3) Integration of the pitch model into the large-scale cortical model, and investigations into the
role of working memory in extracting pitch relationships (month 30).
4) Empirical investigation of the computational and functional implications of active control of
peripheral sensory processing through gain control and filter retuning (month 18).
5) Integration of enhanced peripheral model into large-scale cortical model to investigate
attentional control of peripheral processing (month 30).
6) Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (month 36).


WP7. Theoretical insights into music cognition


Workpackage number: 7
Start date or starting event: month 1
Person-months per participant: UoP 10, FUPF 12, MTAPI 4, UvA 4

Objectives
1. Extract and document the essential computational principles and key insights into music
cognition gained through the project as a result of behavioural experiments,
neurocomputational modelling, and the development of an artificial system for real time
interaction.
2. Define a generic functional computational architecture for cognition.
3. Identify potential beneficiaries of project outcomes.
4. Communicate the outcomes of the project to people outside our immediate research
community.

Description of work
The fundamental hypothesis that cognition emerges through active perception of the
environment has been a guiding principle in structuring this proposal. The idea that music
cognition, even in the absence of overt behaviour, depends upon the development of expectations,
conditioned by the current musical context and by previous musical experience, is well accepted,
but a detailed theoretical understanding of this process has yet to be developed. We expect
therefore that the proposed experimental and computational studies will suggest many important
theoretical and computational principles of music cognition, and its autonomous development
through experience. This workpackage will receive as input reports produced during the course of
the work on the other workpackages. Associated with each of the experiments and modelling
studies, workpackage leaders will produce reports for WP7 in which they highlight the theoretical
insights derived during that stage of their work.
In order to ensure that we take advantage of the benefits offered by the diversity of the work
in the project, one of the tasks here will be to compile the theoretical insights we gain from each of
the investigations into a coherent report, and use this work to formulate a generic functional
computational architecture for intelligent perception and cognition. In this way we will contribute
significantly to furthering understanding of the emergence of autonomous complex cognitive
behaviour and its realisation in artificial systems. However, in order to begin the project with a
consensus view of the generic architecture for which we are aiming, an initial prototype will be
formulated through discussions at the project meetings during the first year, and used as a basis for
WP8 during the initial stages of the project.
Communication with the wider public and with potential users and beneficiaries of the
technology developed through this project is an important objective. To this end we will in the
early stages of the project produce a promotional multimedia kit, which will include an interactive
presentation of the project and many of the other interesting introductory and non-technical
elements which are also available on the web-site. This kit will be updated as the project proceeds.
The project web-site will also be established early in the project, and will be a major source of
useful information for people outside the consortium. This web-site will contain a description of
the project and its goals, and the project consortium (with links to other sites and associated
information); all public project documentation; a discussion forum for all the EmCAP community
(including project partners, interested members of the research community, and members of the
public potentially interested in the project deliverables); a news repository that will cover not only
news from the project, but also related news from around the world; an electronic compass which
will introduce newcomers to the field of musical neurocognition with links to relevant tutorials,
papers, researchers and projects; a special section offering non-technical explanations of the goals
and achievements of the project.
A public workshop will be organised and held at the end of the project. This will allow us to
present the work of this project to other scientists in the field. We will also seek out and
communicate with other parties who we identify as having a potential interest in taking advantage
of our findings; such as healthcare professionals, interested in developing hearing screening
programmes or improvements to prosthetic devices (hearing aids, cochlear implants, etc);
technologists interested in exploiting this work in developing commercial music systems, or more
generic applications which require intelligent autonomous or interactive behaviour; and
educationalists who see possible applications in musical education or in extending the public
awareness of science. In addition, the proceedings of the workshop will be published as a book in
order to disseminate the work on this project as widely as possible.

Deliverables
D7.1 Establish the project web-site and populate it with information available initially (FUPF,
month 3).
D7.2 Produce version 1 of the multimedia promotional kit (FUPF, month 6).
D7.3 Initial prototype for the generic computational architecture for intelligent perception (month
12)
D7.4 Updates to prototype for the generic computational architecture for intelligent perception and
multimedia kit (month 24)
D7.5 Compilation of the theoretical insights into music cognition (month 36)
D7.6 Definition of a generic model for intelligent perception and cognition (month 36)
D7.7 Organisation of scientific workshop, and publication of proceedings (month 36)

Milestones and expected results


1) Establishment of communication mechanisms, and formulation of the initial prototype of the
generic model of intelligent perception (month 12)
2) Refinements to the prototype for a generic model of intelligent perception (month 24)
3) Compilation of the theoretical insights into music cognition (month 36)
4) Definition of a generic model for intelligent perception and cognition (month 36)
5) Organisation of scientific workshop, and publication of proceedings (month 36)


WP8. Interactive music system


Workpackage number: 8
Start date or starting event: month 1
Person-months per participant: UoP 5, FUPF 61, MTAPI 0, UvA 2

Objectives
1. Build a music processing system to study the development of internal musical codes,
music expectancies and phenomena of music cognition such as music categorization,
similarity ranking and streaming. The system will be able to synthesize, as musical output,
the expectancies generated after processing short musical excerpts provided as input.
2. Investigate the performance of the music processing system experimentally, considering
such aspects as attention, categorization, similarity, and stream segregation.
3. Integrate, into the music processing system, enhanced algorithms emanating from the work
of other project partners.

Description of work
Introduction
The methodology underlying most applications of music technology, such as music content
processing through the automatic analysis and description of musical sounds, or musically
expressive synthesis and voice performance enhancement, depends upon analysis algorithms
which exploit domain-specific knowledge. These algorithms are generally designed without
strong constraints on their neurobiological plausibility and aim to maximise computational
performance without explicitly emulating any known perceptual or cognitive processes. The
focus in this project is rather different in that we are interested in understanding the mental
processes underlying music cognition, e.g. the factors which determine the creation of contextual
models, the control of attention in time, and experience-dependent developmental processes; all
of which are necessary to support the emergence of autonomous intelligent behaviour. The
success of interactive music improvisation depends to a large extent on consistency between the
predictive models created by each participant, and therefore the degree to which they can predict
each other's behaviour. While the creation of a fully-fledged artificial music improviser is some
way off, we will address the fundamental problems of creating and maintaining predictive models
of the acoustic environment, and the role of developmental experience in shaping these models.
Methodology
We propose to devise and implement a music analysis prototype that initially has minimal
hard-wired musical knowledge (the initial capabilities will be informed by the experiments
conducted in WP1 and WP2). The system, which we have tentatively nicknamed the Music
Projector, will take music data processed by a low-level acoustic analysis front-end and will
elaborate a multi-faceted description of its content, including symbolic and sub-symbolic
representations. The idea is that the system will learn useful abstractions by forming predictive
models of the musical input, i.e. the abstractions that allow it to make accurate predictions. In this
way, the system will develop representations that can be interpreted as music expectancies, which
will be translated into audible music by means of synthesis, thereby providing a kind of
interactive behaviour to the system. The fundamental goal is to include the ability for
self-organization in the implementation: i.e. the system will find useful features and ways for
processing them by being immersed in a continuous musical environment. We also envisage
that such a system could bootstrap itself by forming a hierarchy of increasingly higher-level
concepts.
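The predictive core of such a system can be sketched, in its simplest form, as a model that accumulates transition statistics over symbolic events and emits an expectancy, i.e. a distribution over likely continuations (a minimal illustration; the real system would operate over learned, multi-dimensional representations rather than hand-given symbols):

```python
from collections import defaultdict, Counter

class ExpectancyModel:
    """Minimal sketch of the predictive core: a bigram model over symbolic
    events that learns transition statistics from the musical input and emits
    an expectancy (a distribution over likely continuations) at each step."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def listen(self, events):
        # Accumulate counts of which event follows which.
        for prev, nxt in zip(events, events[1:]):
            self.transitions[prev][nxt] += 1

    def expect(self, event):
        # Normalize the counts into a predictive distribution.
        counts = self.transitions[event]
        total = sum(counts.values())
        return {e: c / total for e, c in counts.items()} if total else {}

model = ExpectancyModel()
model.listen(["C", "E", "G", "C", "E", "G", "C", "E", "A"])
expectancy = model.expect("E")   # G has occurred twice after E, A once
```

Sonifying such expectancies, rather than the input itself, is what would give the Music Projector its interactive character.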
The basic starting point for the system will need to contain some inbuilt functionality. Our
brains generally exhibit innate properties that constrain the range of feasible processing strategies
and representations. Since the goal of the Music Projector is to emulate human perception at the
functional or algorithmic level, some of those architectural constraints will also be included. In
addition, it is well known that the human brain is particularly malleable during early
development. This means that early experience may affect the tuning of representations and
processes, facilitating or hindering the development of certain knowledge structures,
or the processing of specific types of stimuli (e.g. our exposure to occidental music makes it
difficult for us to understand Indian or Chinese music). The early development of representational
structures is considered in some detail in WP4, and this work will help to guide developments in
the Music Projector. In summary, the goal of the system is to study the effects of different
patterns of exposure to music on the internal high-level representations it generates, and the
approximate replication of music perceptual and cognitive phenomena. The system will detect
regularities across several perceptual dimensions, and organize its internal representations in
order to account for them.
The architecture of the system will include:
- An acoustic front-end that exhibits an acceptable degree of perceptual plausibility. This
  will be the same front-end used in WP3, 4, 5 & 6.
- Simple detectors such as those that have been found or hypothesized in the auditory
  pathway and cortex. For example, detectors of noise, continuity, change, harmonicity,
  correlation across channels, etc.
- Specific detectors for pitch and temporal information, although these will be refined later
  in response to work in WP5 and WP6.
- One or more memory systems with a resonance mechanism that is capable of maintaining
  a fading trace of the input for some time, for long-term storage, and also for generating
  expectancies.
- One or more learning components that make possible auto-association and also learning
  by external teaching.
- A stream generation component that makes it possible to decompose a complex
  combination of stimuli into a series of simpler auditory streams.
- An attentional component that makes it possible to change the saliency of specific musical
  dimensions or of specific streams, according to the ongoing outputs of the analysis
  processes and the goals given to the system (if any).
- A categorization or chunking mechanism to associate complex streams and sub-streams
  with simpler representations that may act as labels for them.
- A music generation component that is capable of synthesizing musical sounds that can act
  as auditory feedback of the expectancies generated by the system.
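These components could be chained as in the following skeleton (purely illustrative; the class and stage names are placeholders of ours, and every stage is a stub to be replaced by the real components):

```python
class MusicProjector:
    """Hypothetical skeleton of the Music Projector pipeline; each stage is a
    callable, so trivial stand-ins can be swapped for real components."""

    def __init__(self, front_end, detectors, memory, streamer, attention,
                 categorizer, generator):
        self.front_end = front_end      # perceptually plausible acoustic analysis
        self.detectors = detectors      # noise / continuity / pitch / timing detectors
        self.memory = memory            # fading trace, long-term store, expectancies
        self.streamer = streamer        # decompose the mixture into auditory streams
        self.attention = attention      # re-weight dimensions or streams by saliency
        self.categorizer = categorizer  # chunk streams into symbolic labels
        self.generator = generator      # synthesize expectancies as audible output

    def step(self, audio_frame):
        features = self.front_end(audio_frame)
        cues = [d(features) for d in self.detectors]
        streams = self.streamer(cues)
        focus = self.attention(streams)
        symbols = self.categorizer(focus)
        expectancy = self.memory(symbols)
        return self.generator(expectancy)
```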
Plan of work
The work to be undertaken will be organized around the following tasks:
T1. Reviews of existing literature and software components. The review will focus on: (1)
acoustic front ends, (2) auditory memory and cognition processes, (3) music similarity and
saliency, and (4) music streaming. (months 1-12)
T2. System development. The development of the music projector is expected to follow four
phases:
a. Elaboration of a mock-up of music analysis system, (based upon the initial
architecture formulated in WP7), used as a proof of concept and as a way to elicit
feedback for further development, but where some functionalities may be absent or
very limited (months 6-12)
b. Elaboration and testing of the first version of music projector (months 13-24)
c. Elaboration and testing of the final version of music projector, incorporating, as
input, theoretical and empirical findings from other partners, and software
developments from them, if feasible; this version will also be interactive (i.e., it
will generate audible music outputs corresponding to music predictions) (months
25-36)
T3. Simulation experiments. (months 12-36) Different simulation experiments are envisioned:
a. Novelty detection
b. Perceptual learning and categorization
c. Similarity
d. Saliency
e. Stream segregation
f. Implicit learning by exposure to different music cultures
Deliverables
D8.1.1 Survey and evaluation of existing auditory software components and cognitive
architectures (Report, month 7)
D8.1.2 Overview of saliency in music processing (Report, month 12)
D8.2.1 Mock-up of music analysis system (Software prototype, Month 10)
D8.2.2 First version of music projection system (Software prototype, Month 18)
D8.3.3 Music Projector incorporating other partners innovative contributions (Software
prototype, Month 24)
D8.3.4 Final Music Projector (Software prototype, Month 36)
D8.4.1 First results from experimental simulations (Report, Month 30)
D8.4.2 Results from advanced experimental simulations (Report, Month 36)
D8.5 Computational principles relevant to a general model of music cognition (Report, Month 36)

Milestones and expected results


1) Identification of existing theoretical and computational elements to be considered, included,
or recoded in the music projector, and testing of the available software components.
Implementation of a mock-up of the music processing system that can be used as proof of
concept. At least 3 papers presented in conferences (month 12).
2) A basic music analysis system is running; it will mostly include previously existing
components from partners and from elsewhere. At least 4 papers presented (1 of them in a
journal) (month 24).
3) An interactive music listening system is running, and has been extensively tested.
Simulations of several processes and phenomena are achievable. Two PhD Theses are well
advanced. Chapter contributions to the planned book. At least 4 papers presented (2 of them
in journals) (month 36).


WP0. Management, communication and documentation


Workpackage number: 0
Start date or starting event: month 1
Person-months per participant: UoP 22, FUPF 1, MTAPI 1, UvA 1

Objectives
1. Ensure effective project coordination and administration.
2. Facilitate effective collaboration, integration, and communication.
Description of work
Project coordination
Liaise with the Commission: prepare and submit reports and deliverables, inform the
Commission about any circumstances that may alter project goals, and negotiate changes in goals.
Receive and disseminate deliverables to and between workpackage leaders, while monitoring their
quality and timeliness. Ensure efficient management of tasks and the establishment of effective
communications between members of the consortium. Mediate and manage in the event of conflict
arising between partners. Monitor risk elements and identify problems or delays; take appropriate
actions and adjust manpower assignments if necessary. Monitor milestone decision points.
Coordinate the consortium's representation at major meetings that are likely to generate useful
feedback. Identify any developments outside the collaboration that may impact the project.
Coordinate joint publications, ensuring consistent quality standards and that authorships conform
to the consortium agreement. Organise and coordinate twice-yearly meetings of the Project Steering
Committee and ad hoc meetings of smaller groups; together with workpackage leaders, set the
agenda for consortium meetings; record, circulate and agree the minutes of each meeting.
Administrative coordination
Monitor budget and manpower use of all partners. Receive payments from the Commission
and transfer to consortium members according to the agreed budget. Liaise with workpackage
coordinators to ensure consistency of project progress and to assemble six-monthly progress
reports, mid-term reports and the final report. Coordinate travel to consortium and international
meetings. Coordinate organisation of final workshop, including venue, invitations to participants,
publicity and subsequent publication of proceedings. Coordinate contractual issues such as
amendments to the project contract, collaboration agreements and audit certificates.
Ensuring integration, collaboration and scientific and technological progress
Monitor and report on workpackage progress in relation to the agreed project plan,
paying particular attention to project timescales and integration points. Report problems and
slippages promptly and plan remedial actions. Ensure the timely and accurate communication of
experimental results and modelling advances to all members of the consortium. Ensure that the
objectives and milestones of each workpackage are achieved, and that deliverables are available on
time.
Deliverables
D0.1 Progress report, year 1 (month 12)


D0.2 Progress report, year 2 (month 24)


D0.3 Final project report (month 36)
D0.4 Project catalogue - a CD-ROM which will include the key results of the project; all the key
project deliverables, reports, demos, publications and press releases (month 36)

Milestones and expected results


1) Establish schedules and procedures for meetings, integration and collaboration (month 3).
2) Monitor progress and activity through the first year, and modify project organisation and
decide upon remedial actions in the event that any deficiencies are found (month 12).
3) Monitor progress and activity through the second year, and modify project organisation and
decide upon remedial actions in the event that any deficiencies are found (month 24).
4) Monitor progress and activity through the final year and decide upon remedial actions in the
event that problems are found (month 36).


8. Project resources and budget overview


8.1 Efforts for the project (STREP/STIP Efforts Form in Appendix 1)

Person-months per partner and activity:

                                  UOP   FUPF   MTAPI   UVA   TOTAL PARTNERS
Research/innovation activities
WP1                                 2      2     120     1        125
WP2                                36      2       0     0         38
WP3                                 6     60       0     2         68
WP4                                36      2       0     0         38
WP5                                 0      2       0    52         54
WP6                                48      2       0     0         50
WP7                                10     12       4     4         30
WP8                                 5     61       0     2         68
Total research/innovation         143    143     124    61        471

Management activities
WP0                                22      1       1     1         25
Total management                   22      1       1     1         25

TOTAL ACTIVITIES                  165    144     125    62        496
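Purely as an editorial cross-check (not part of the proposal), the table's row, column and grand totals can be verified with a short script; the figures below are transcribed directly from the effort table above:

```python
# Editorial sanity check of the effort table in Section 8.1.
# Person-months per partner for WP1-WP8, transcribed from the table.
efforts = {
    "UOP":   [2, 36, 6, 36, 0, 48, 10, 5],
    "FUPF":  [2, 2, 60, 2, 2, 2, 12, 61],
    "MTAPI": [120, 0, 0, 0, 0, 0, 4, 0],
    "UVA":   [1, 0, 2, 0, 52, 0, 4, 2],
}
management = {"UOP": 22, "FUPF": 1, "MTAPI": 1, "UVA": 1}  # WP0

# Column sums: total research/innovation effort per partner.
research_totals = {p: sum(wps) for p, wps in efforts.items()}
assert research_totals == {"UOP": 143, "FUPF": 143, "MTAPI": 124, "UVA": 61}

# Row sums: total effort per workpackage across partners.
wp_totals = [sum(col) for col in zip(*efforts.values())]
assert wp_totals == [125, 38, 68, 38, 54, 50, 30, 68]

# Grand total: 471 research/innovation + 25 management = 496 person-months.
total = sum(research_totals.values()) + sum(management.values())
assert total == 496
print("Effort table consistent:", total, "person-months")
```

All internal sums agree with the quoted totals.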


8.2 Overall budget for the project


CPF Form 3.1

CPF Form 3.2

8.3 Management-level description of resources and budget


The project will last for three years. Project resources include the cost of employing
postdoctoral and postgraduate researchers within each team, essential consumables, travel
expenses, participation in scientific meetings, and the installation of essential equipment.
Within each partner's budget, the requested equipment budget is limited, indicating that
the partners are well equipped and have the infrastructure and resources to carry out the work.
The total project budget is allocated as follows:
RTD and innovation-related activities (incl. overheads): €1,864,427
Management activities, at 4.3% of the total RTD budget: €85,573
Total requested: €1,950,000
A detailed breakdown of the requested resources and their justifications is given below.
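The top-level split and the itemised sums can be cross-checked arithmetically; as an illustration (an editorial sanity check only, with all amounts in euro as quoted in this section):

```python
# Top-level budget split (Section 8.3).
rtd = 1_864_427          # RTD and innovation-related activities (incl. overheads)
management = 85_573      # management activities
total_requested = 1_950_000
assert rtd + management == total_requested

# Example itemised check: the UoP human-resources request
# (2 postdocs + 1 research student + half-post RA + subject payments).
uop_hr = 2 * 155_547 + 105_784 + 52_892 + 5_760
assert uop_hr == 475_530

print("Budget totals consistent")
```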
Human Resources:
UoP: €475,530
Two postdoctoral researchers (RF/L Pt13, €155,547 each), one research student (RA Pt6,
€105,784) and one half-post research assistant (RA Pt6, €52,892); total 126 man-months.
Payments for subjects in perceptual experiments (€8/hour × 4 hours × 30 subjects × 6
experiments = €5,760). In addition, the principal investigators will devote a total of 24
man-months to the project (a contribution of personnel resources at an approximate value
of €108,000).
Total: 150 man-months
FUPF: €492,480
Supervision by the Research Director (1 day/week = 7.2 man-months in total @ €4,500 per
month), two full-time postdoctoral researchers (32.4 man-months in total @ €3,600 per
month each) and two research students (32.4 man-months in total @ €3,000 per month each).
Total: 144 man-months
MTAPI: €139,800
One postdoctoral researcher (€50,000), one doctoral student (€40,000), one nurse
(€32,500), and a part-time (2 hours/day) statistical expert (€12,500); total 117
man-months. Payments for subjects in the adult part of the EEG experiments (€6/hour ×
5 hours × 40 subjects × 4 experiments = €4,800). In addition, the principal investigator
will devote a total of 8 man-months to the project (a contribution of personnel resources
at an approximate value of €36,000).
Total: 125 man-months
UvA: €216,549
One postdoctoral researcher (BBRA 11.0, €158,525) and one half-post research assistant
(BBRA 10.0, €58,024); total 54 man-months. In addition, the principal investigator will
devote a total of 8 man-months to the project (a contribution of personnel resources at
an approximate value of €36,000).
Total: 62 man-months
Equipment:
UoP: €12,000
Three high-performance PCs for modelling and simulation, at €4,000 each, one per
researcher. The existing sound equipment in the auditory laboratory (approximate value
€15,000) will be used for the perceptual experiments.
FUPF: €16,000
Four high-performance PCs for modelling and simulation, plus one portable, at €4,000
per PC.
MTAPI: €61,000
EGI mobile EEG measuring system for the field lab in the hospital (including pre- and
main amplifiers, data-collection PC, and software; €54,000), and a stimulation unit for
the field lab (including PC, loudspeakers/baby headphones, and software; €7,000). The
existing EEG and stimulation equipment in the adult auditory EEG laboratory of MTAPI
(approximate value €300,000) will be used for the adult experiments.
UvA: €10,000
Two high-performance PCs for theoretical modelling and simulation, at €4,000 each, one
per researcher, and €2,000 for Audio/MIDI hardware.
Consumables:
UoP: €7,000
To cover services such as communication, publishing, photocopying and printing, and
equipment maintenance; software licenses and media storage needed during the lifespan
of the project.
FUPF: €7,000
To cover services such as communication, publishing, photocopying and printing, and
equipment maintenance; software licenses and media storage needed during the lifespan
of the project.
UvA: €7,000
To cover services such as communication, publishing, photocopying and printing, and
equipment maintenance; software licenses and media storage needed during the lifespan
of the project.
MTAPI: €28,000
Electrode caps (one adult, one baby; €3,000 each), consumables for the EEG experiments
(disposable electrodes, electrode paste, cleaning materials, headphone inserts, etc.;
€8,000), computer-related consumables, communication, publishing, photocopying, printing
costs and equipment maintenance (€10,000), and software licenses and media storage
needed during the lifespan of the project (€4,000).

Travel and subsistence:


For each partner we envisage roughly the same pro-rata expenditure on travel and
subsistence.
UoP: €44,000
For meetings and conferences for the six researchers involved in the project, assuming
an average of two meetings or conferences per year per person over three years at €1,000
per meeting: €36,000. Additionally, as the mobility and professional training of
researchers is seen as important for the success of this project, funds of €8,000 are
requested to support exchange trips for collaborative work on aspects of the project.
FUPF: €44,000
For meetings and conferences for the six researchers involved in the project, assuming
an average of two meetings or conferences per year per person over three years at €1,000
per meeting: €36,000. Exchange trips to allow collaborative work on aspects of the
project: €8,000.
MTAPI: €38,000
For meetings and conferences for the three researchers involved in the project, assuming
an average of two meetings or conferences per year per person over three years at €1,000
per meeting: €18,000. Exchange trips to allow collaborative work on aspects of the
project: €8,000. Finally, €12,000 is requested for training the postdoctoral fellow and
the nurse in running experiments on newborn babies in a laboratory with long-standing
experience in this type of work: the Cognitive Brain Research Unit and the collaborating
Hospital for Children and Adolescents of the University of Helsinki.
UvA: €22,000
For meetings and conferences for the three researchers involved in the project, assuming
an average of two meetings or conferences per year per person over three years at €1,000
per meeting: €18,000. Exchange trips to allow collaborative work on aspects of the
project: €4,000.

Other
UoP: €20,400
For organisation of the final public workshop on music cognition, and the printing of a
collection of papers: €12,000. For a share of the final project publication, plus the
group's project publications: €3,000. Conference/seminar fees (6 researchers in the
project at an average of 1 conference annually; attendance at 18 conferences/seminars
@ €300 per conference = €5,400 in total).
FUPF: €12,000
Establishing and maintaining the web site: €3,600. For a share of the final project
publication, plus the group's project publications: €3,000. Conference/seminar fees
(6 researchers in the project at an average of 1 conference annually; attendance at 18
conferences/seminars @ €300 per conference = €5,400 in total).
MTAPI: €5,200
For a share of the final project publication, plus the group's project publications:
€2,500. Conference/seminar fees (3 researchers in the project at an average of 1
conference annually; attendance at 9 conferences/seminars @ €300 per conference =
€2,700 in total).
UvA: €5,200
For a share of the final project publication, plus the group's project publications:
€2,500. Conference/seminar fees (3 researchers in the project at an average of 1
conference annually; attendance at 9 conferences/seminars @ €300 per conference =
€2,700 in total).
Management activities
UoP: €51,451
An administrative coordinator will be employed for 2 days per week, at a cost of
€36,376, to assist with project coordination. Other costs requested include
(subcontracted) audit costs of €6,000, coordination consumables of €1,500, and
overheads of €7,575.
Total: 15 man-months
FUPF: €14,050
This includes management/administrative personnel costs of €10,000, and (subcontracted)
audit costs of €4,050.
MTAPI: €10,000
This includes project management and administrative personnel costs of €5,050, and
(subcontracted) audit costs of €4,050.
UvA: €10,000
Includes project management and administrative personnel costs of €5,950, and
(subcontracted) audit costs of €4,050.

9. Ethical issues
WP1: Higher level auditory functions underlying music perception: Innate vs. learned
operations

The work package includes electrophysiological experimental measurements in


neonates and young healthy adult subjects. In conducting the experiments we will strictly
adhere to the applicable national, EU-wide, and international laws, treaties, and ethical
guidelines.
Measurement safety issues
The measurements are non-invasive and represent minimal risk to subjects. Electrodes
are attached to the scalp with an electrode paste and the electric potential (EEG) is measured.
The electrodes, the electrode paste, and the equipment to be employed are certified for use in
medical and experimental settings within the EU and, specifically in Hungary. Safety
measures include shock protection (no galvanic contact between the subject and the devices
connected to electric outlets; insulation standards, etc.) and the use of skin-friendly materials.
These are periodically checked in the laboratories of MTAPI and certification will be acquired
before experiments in the new field unit can start at Clinic I for Gynaecology of the
Semmelweis Medical School (CIGSMS). In the case of any health-related after-effects (e.g., skin
irritation), consultation will be provided after the experiment.
General ethical permissions
Preliminary permission has been acquired from the local ethical committee of MTAPI
(for the adult experiments) and from the CIGSMS (experiments on neonates) for submitting
the current application (these do not permit conducting experiments, only submitting the
proposal). Before starting the research we will acquire full ethical permission for each of the
experiments from both institutions.
Subject recruitment and rights
All subjects will volunteer for the experiments. Adult subjects will be recruited through
part-time employment agencies and they will be paid an hourly fee for their participation.
Newborn babies will be recruited in the hospital by the physician asking the parents to
volunteer for the experiment. Neither the babies nor their parents will receive any
compensation (financial, extra treatment, or other service advantages) for their participation,
except that the parents will learn the results of the objective audiometry conducted as an
integral part of the experiment on neonates. Audiometry can indicate possible hearing
problems.


Prior to the experiment, written consent will be obtained from the subject (adult
subjects) or the parent (neonates), after the goal, procedures, and potential risks of the
experiment, the subjects' rights, and data management issues have been explained to them in
detail. The information will consist of a written part (included in the signed consent form)
and consultation with the experimenter and, in the case of the babies, with the physician. The
parent (or both parents) will be present at the experiments conducted on newborns. The subject
(or the parent present) can terminate the experiment at any point without needing to give a
reason.
Subjects will be excluded from participation only if they do not meet the pre-set health
and age criteria (i.e., no neurological diseases or hearing problems; 18-30 years of age for
adults; full-term birth for newborns). No gender or race criteria will be used, though we will
aim at an approximate balance between genders. (Previous research found no sex- or
race-related differences regarding the method used in the experiments.)
Experimenters (both for adults and for neonates) will receive training regarding all
ethical issues (subjects' rights, data management, etc.), as well as the optimal ways to
communicate with subjects.
Data management
Information that allows subject identification will be treated according to the privacy act
of Hungary (i.e., not disclosed to anyone outside the research team directly involved in the
experiments and kept in safe records only for the time required by the evaluation of the results
and retention of records for the dissemination of the results). The experimental results, which
do not allow subject identification (since we do not collect genetic material or other
identifiable biological information), will be disseminated within the scientific community.
Subjects (or their parents, for newborns) will be given the possibility to learn the results of the
experiment conducted on them (or their child).
Populations possibly benefiting from the results of the research
Beyond the benefits for basic science, results of the research on newborns may be later
applied to a) develop new screening methods for hearing deficits in newborns, b) provide
early corrective measures for hearing deficits, and c) monitor the effects of such corrective
measures. At this point, we see no gender or race issues involved; however, should a
normative database be set up on the basis of our research, these issues will have to be
considered.
WP2: Perception of music form

The work package includes perceptual experiments in young healthy adult subjects. In
conducting the experiments we will strictly adhere to the applicable national, EU-wide, and
international laws, treaties, and ethical guidelines.
Subject recruitment and rights
All subjects will volunteer for the experiments. They will be recruited from the student
population and they will be paid an hourly fee for their participation. Prior to the experiment,
written consent will be obtained from the subject after the goal, procedures, and potential
risks of the experiment, their rights, and data management issues have been explained to them in detail.
The information will consist of a written part (included in the signed consent form) and
consultation with the experimenter. The subject can terminate the experiment at any point
without the need to give a reason for the termination.

Subjects will be excluded from participation only if they do not meet the pre-set health
and age criteria (i.e., no neurological diseases or hearing problems; 18-30 years of age).
No gender or race criteria will be used, though we will aim at an approximate
balance between genders.
Experimenters will receive training regarding all ethical issues (subjects' rights, data
management, etc.), as well as the optimal ways to communicate with subjects.
Data management
Information that allows subject identification will be treated according to the privacy act
of the U.K. (i.e., not disclosed to anyone outside the research team directly involved in the
experiments and kept in safe records only for the time required by the evaluation of the results
and retention of records for the dissemination of the results). The experimental results, which
do not allow subject identification, will be disseminated within the scientific community.
Subjects will be given the possibility to learn the results of the experiment conducted on them.
Populations possibly benefiting from the results of the research
Beyond the benefits for basic science, results of the research will be useful to those
interested in language learning deficits in suggesting new ways in which music could enhance
language learning.


Appendix A - Consortium description


A.1 Participants and consortium
All consortium leaders are internationally recognised and/or based at internationally
leading institutions in their respective fields.
Prof István Winkler (MTAPI) will lead WP1: Higher level auditory functions
underlying music perception: Innate vs. learned operations. He is a leading expert in the
investigation of auditory perception, memory and sound organization using
electrophysiological methods. This project will provide him with a unique opportunity for
extending his work on auditory model-based predictive functions to neonates.
Dr Eduardo Miranda (UoP) will lead WP2: Perception of musical form. He is a
practicing composer and an expert in computer music. The project will allow him to extend
his work on music cognition, and to explore the role of timbre in the cognition of musical
form.
Prof Gustavo Deco (FUPF) will head WP3: Prefrontal cortical function in the control
of attention and working memory. He has extensive experience in studying and modelling the
emergence of attentional mechanisms in hierarchical neural systems and their role in visual
object recognition. The project will give him the opportunity to extend his theories to the
auditory modality and to focus on temporal aspects of attention.
Prof Michael Denham (UoP) will lead WP4: Spectrotemporal response fields in the
thalamocortical system. His expertise lies in characterising neuron and neuronal circuit
response properties and the interplay between stimulus-evoked and network-based
activations. This project will provide him with the opportunity to extend his work on the
thalamocortical visual system to investigate the development of representations within the
thalamocortical auditory system.
Dr Henkjan Honing (UvA) will head WP5: Perception of rhythmic patterns. He is a
leading researcher in the perception and representation of musical time and
temporal structure. Within this project he will have the opportunity to significantly extend
current models of rhythmic perception.
Dr Susan Denham (UoP) will lead WP6: Active perception, relative pitch and the
emergence of tonality and WP7: Theoretical insights into music cognition. Her expertise lies
in auditory perception and cognition and in computational modelling of auditory processing.
This project will provide Dr Denham with the opportunity to apply her current work on
model-based active perception to investigate the development of representations through the
abstraction of regularities in sounds.
Prof Xavier Serra (FUPF) will lead WP8: Interactive music system. Prof Serra is an
expert in music technology and in the analysis and synthesis of sounds and content-based
retrieval of audio. This project will allow him to extend his work on musical similarity to
incorporate advances in the theoretical understanding of processes underlying music
cognition.
Below we include a more detailed description of the institutions and the investigators
involved in this project.


Partner 1: University of Plymouth (UoP)


Institute: Centre for Theoretical and Computational Neuroscience
Work at the Centre for Theoretical and Computational Neuroscience, University of
Plymouth, is aimed at applying rigorous quantitative approaches, including mathematical and
computational modelling and psychophysics, to study how information is represented,
processed, and stored in the brain, in perception and action. Special areas of study include:
visual and auditory perception; sensorimotor control, in particular oculomotor control; and
mathematical and computational modelling of the neural circuitry underlying perception and
action. Academic staff in the Centre currently include: Prof Roman Borisyuk; Prof Mike
Denham; Dr Susan Denham; Dr Daniel Durstewitz; Prof Chris Harris; and Dr Thomas
Wennekers. The research in this area was deemed to have attained a level of international
excellence in the 2001 UK Research Assessment Exercise, as part of the Psychology and
Computer Science submissions from the University, which were both rated 5.
The Centre has dedicated laboratories for psychophysical experiments in vision,
audition and sensorimotor control, and a 64-channel system for the measurement and analysis
of high-resolution electroencephalography (EEG) and evoked potential (EP) signals, together
with visual and auditory stimulus presentation equipment. It also has access to the 1.5T fMRI
research facility within the Peninsula Medical School, which is equipped to carry out visual
and auditory experiments. The Centre has a number of dedicated high-performance computers
for modelling and simulation, including a Compaq ES45 AlphaServer multiprocessor
computer running Tru64 UNIX, with NEURON and Matlab software. The Centre is
collocated with the Department of Psychology, the School of Biological Sciences and the
Peninsula Medical School and has its own seminar and library space, as well as offices for
academic staff, research students and visiting academics.
Coordinator, Participant 1: Dr Susan Denham
Dr Denham is a Principal Lecturer at the University of Plymouth. Her research interests
lie in developing theoretical and computational models of auditory perception and learning,
focussing on the representation of complex sounds and the segregation and grouping of
sounds within natural acoustic environments. She graduated with distinction from the
University of South Africa in Physics (1980) and in Computer Science (1992). In 1995 she
received a PhD in computing for work on neural models of sub-cortical auditory processing.
Dr Denham's aim is to combine insights gained from detailed neurocomputational modelling
with computationally efficient implementations for practical applications. Dr Denham is a
co-founder and director of NeuVoice Ltd, a university spin-out company established in November
1999. NeuVoice provides leading-edge noise-robust speech and sound recognition
technology for embedded applications, using neurobiologically inspired algorithms based on
Dr Denham's research to achieve very high levels of noise robustness. Dr Denham's recent
work has been directed towards characterising the features of sounds that support the
recognition of acoustic objects. Dr Denham's work has been supported by a Leverhulme
Trust Research Fellowship and by the EU (2002-2005: Braun J, Denham SL, Del Giudice P,
Deco G, Indiveri G, Fusi S, Olshausen B. Attend-to-learn and Learn-to-attend with
Neuromorphic, Analogue VLSI, European Commission, Information Society Technologies
Programme, IST-FET Open (RTD), €2,277,158).
Role in present project


Dr Denham will be responsible for overall coordination of the EmCAP project. She will
also be responsible for WP6 (Active perception, relative pitch and the emergence of tonality),
which will primarily involve theoretical work and computational modelling. She will
coordinate activities in WP7 (Theoretical insights into music cognition), principally the
organisation of the workshop and subsequent publication of a collection of theoretical papers
derived from the project. Dr Denham will devote 33% of her time to this project. In addition,
one postdoctoral fellow and half-post research assistant will be appointed. The postdoctoral
fellow will be responsible for formulating and developing a neurobiologically realistic model
of relative pitch perception and collaborative work with researchers in WP3 to include the
model of working memory in this process. The research assistant will carry out investigations
into active peripheral processing.
Relevant publications
Denham, S.L. (2005). "Dynamic Iterated Ripple Noise: further evidence for the importance
of temporal processing in auditory perception", BioSystems, 79(1-3),199-206.
Khurshid A, Denham SL (2004). "A Temporal Analysis Based Pitch Estimation System
for Noisy Speech with a Comparative Study of Performance of Recent Systems"
IEEE Transactions on Neural Networks, Vol. 15(5), 1112-1124.
Lanyon L J, Denham SL (2004) A model of object-based attention that guides active visual
search to behaviourally relevant locations. Lecture Notes in Computer Science,
Paletta L et al. (eds), Vol. 3368, 42-56.
Lanyon L J, Denham SL (2004). "A model of active visual search with object-based
attention guiding scan paths". Neural Networks, Vol. 17(5-6), 873-897
Lanyon L J, Denham SL (2004). "A biased competition computational model of spatial and
object-based attention mediating active visual search". Neurocomputing, Vol. 58-60, 655-662.
Denham SL (2003). "Perception of the direction of frequency sweeps in moving ripple
noise stimuli", in Plasticity of the Central Auditory System and Processing of
Complex Acoustic Signals, Merzenich M, Syka J (eds.), Kluwer Plenum, New York,
273-278.
Packham ISJ & Denham SL (2003), "Visualisation Methods for Supporting the
Exploration of High Dimensional Problem Spaces in Engineering
Design", Proceedings of International Conference on Coordinated & Multiple Views
in Exploratory Visualization (CMV2003), Roberts J. (ed.), London, UK, 15 July
2003, IEEE Computer Society, pp. 2-13.
Denham SL (2001). "Cortical synaptic depression and auditory perception". In
Computational Models of Auditory Function, Greenberg S, Slaney M (ed.s), NATO
ASI Series, IOS Press, Amsterdam, 281-296.
Denham SL, Denham MJ (2001). "An investigation into the role of cortical synaptic
depression in auditory processing". In Emergent Neural Computational Architectures
based on Neuroscience, Wermter S, Austin J, Willshaw D (ed.s), Lecture Notes in
Artificial Intelligence, Springer, 494-506.
Borisyuk R, Denham MJ, Denham SL, Hoppensteadt F (1999). Computational models of
predictive and memory-related functions of the hippocampus, Reviews in the
Neurosciences, 10, 213-232.
McCabe SL, Denham MJ (1997). "A model of auditory streaming", J. Acoust. Soc. Am.,
101(3), 1611-1621.


Participant 2: Prof Michael Denham


Professor Mike Denham is Professor of Neural Systems and Head of the Centre for
Theoretical and Computational Neuroscience at the University of Plymouth. He obtained his
first degree in Electronic and Electrical Engineering in 1968 and his PhD in Mathematical
Systems and Control Theory in 1972. He held positions as a postdoctoral researcher, and then
as lecturer at Imperial College, London before joining Kingston Polytechnic, where he
became Reader in 1981, and Professor in 1986. From 1984 to 1988 he served as Head of the
School of Computing. He joined Plymouth Polytechnic (now the University of Plymouth) as a
Research Professor in 1988. He has published over 100 research papers in control systems
theory and applications, computer aided control systems design, and neural networks and
computational neuroscience. Prof Denham was recently involved in the Foresight Cognitive
Systems Project, for which he co-authored the scientific research review on Sensory
Processing, and had prominent roles in the UK Computing Research Committee Grand
Challenge on the Architecture of Brain and Mind and in the British Computer Society Grand
Challenges in Research Conference. He was a member of the Computer Science Panel for
both the 1996 and 2001 Higher Education Funding Council for England (HEFCE) Research
Assessment Exercises. In 2001 he was invited to join the UK Computing Research
Committee. He reviews grants for the US National Science Foundation (NSF) and was invited
in 2001 to sit on the review panel for the NSF programme in Revolutionary Computing.
In his current research, Prof Denham aims at characterising neuron and neuronal circuit
response properties in response to stimulus-evoked synaptic input, in the presence of specific
patterns of contextual background membrane potential network activity. In particular he is
interested in the role that specific types of dendritic morphology and membrane conductance
distributions play in determining the interplay between stimulus-evoked and network-based
activation in local neural circuits in neocortex and hippocampus. Prof Denham's work is
currently supported by a national research grant: Denham MJ (PI), Denham SL, Dudek
P, Furber S, Hausser M, Panzeri S, Roth AE, Schnupp JW, Thomson AM, van Rossum M,
Wennekers T, Willshaw DJ (2005-2010). A novel computing architecture for cognitive
systems based on the laminar microcircuitry of the neocortex, EPSRC, £1,812,000.
Role in present project
Prof Denham will be responsible for WP4 (Spectrotemporal response fields in the
thalamocortical system), which will involve computational modelling of the thalamocortical
and associated intracortical networks to replicate development of experimentally observed
spectrotemporal response fields and the categorisation of auditory stimuli. Prof Denham will
devote 10% of his time to this project. In addition, one postdoctoral fellow will be appointed,
who will be responsible for the detailed modelling of thalamocortical processing, and for
collaboration with researchers involved in developing the large-scale auditory model (WP3).
Relevant publications
Yousif NAB & Denham MJ (2004). Action potential backpropagation in a model
thalamocortical relay cell. Neurocomputing, 58-60: 393-400.
Borisyuk R, Denham MJ, Hoppensteadt F, Kazanovich Y and Vinogradova O (2001).
Oscillatory model of novelty detection. Network: Computation in Neural Systems,
12: 1-20.
Denham MJ (2001). The dynamics of learning and memory: lessons from neuroscience. In:
Emergent Neural Computational Architectures based on Neuroscience. S. Wermter,
J. Austin, D. Willshaw (eds.), Lecture Notes in Artificial Intelligence (LNAI 2036),
Springer, 333-347.

013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 81 of 101
Denham MJ & Borisyuk RM (2000). A model of theta rhythm production in the
septal-hippocampal system and its modulation by ascending brain stem pathways.
Hippocampus, 10: 698-716.
Borisyuk R, Denham MJ, Denham SL and Hoppensteadt F (1999). Computational models
of predictive and memory-related functions of the hippocampus. Reviews in the
Neurosciences, 10: 213-232.
McCabe SL & Denham MJ (1997). A model of auditory streaming, J. Acoust. Soc. Am.,
101(3), 1611-1621.
Participant 3: Dr Eduardo Reck Miranda
Dr Eduardo Reck Miranda received an MSc in Music Technology in 1991 from the
University of York and a PhD in Music in 1995 from the University of Edinburgh. He
subsequently served as a research fellow at the Edinburgh Parallel Computing Centre and
lecturer in Music Technology at the University of Glasgow. In 1997, Dr. Miranda moved to
Paris, France, to take up a research position at the Sony Computer Science Laboratory, where
he conducted research aimed at gaining a better understanding of the fundamental cognitive
mechanisms employed in sound-based communication systems, with particular focus on the
evolution of the human ability to speak, and the role of musical cognition in the development
of spoken languages. He was one of the developers of the speech and emotion recognition
technologies for Sony's commercial robot AIBO, and authored five patents in the field of
speech processing. In 2000 he was made Visiting Professor of Interactive Media Arts at
MECAD (Barcelona) and in 2003 he was appointed Reader in Artificial Intelligence and
Music at the University of Plymouth. He is a member of the editorial board of Leonardo
Music Journal (MIT Press, USA), Contemporary Music Review (Routledge, USA) and
Organised Sound (Cambridge University Press, UK). He is also a member of the reviewing
panel of the European Union's post-graduate scholarships programme Alban. In addition, Dr
Miranda is a practising composer of international reputation. His compositions are regularly
broadcast and performed in concerts and festivals worldwide.
Role in present project
Dr Miranda will have responsibility for WP2 (Perception of music form), in which the
relationship between language and musical form, and the role of timbre in the perception of
form will be investigated. This will involve analytical studies and perceptual experiments, and
the composition of musical stimuli, which will also be used in WP8. Dr Miranda will devote 20% of his
time to this project. In addition, one PhD student will be appointed as a research assistant. The
PhD student will be responsible for conducting the perceptual experiments, and for data
collection and analysis.
Relevant publications
Miranda ER (2004). At the Crossroads of Evolutionary Computation and Music: Self-Programming
Synthesizers, Swarm Orchestras and the Origins of Melody.
Evolutionary Computation, Vol. 12, No. 2, pp. 137-158.
Westerman G & Miranda ER (in press, 2004). A New Model of Sensorimotor Coupling in
the Development of Speech. Brain and Language.


Miranda ER, Kirby S and Todd P (2003). On Computational Models of the Evolution of
Music: From the Origins of Musical Taste to the Emergence of Grammars.
Contemporary Music Review, Vol. 22, No. 3, pp. 91-111.
Westerman G & Miranda ER (2003). Modelling the Development of Mirror Neurons for
Auditory-Motor Integration. Journal of New Music Research, Vol. 31, No. 4, pp.
367-375.
Miranda ER (2003). On the evolution of music in a society of self-taught digital creatures.
Digital Creativity, Vol. 14, No. 1, pp. 29-42.
Miranda ER (2003). On the Music of Emergent Behaviour: What can Evolutionary
Computation Bring to the Musician?, Leonardo, Vol. 36, No. 1, pp. 55-58.
Westerman G & Miranda ER (2002). Integrating Perception and Production in a Neural
Network Model, J. A. Bullinaria and W. Lowe (Eds.), Connectionist Models of
Cognition and Perception, Progress in Neural Processing Vol. 14. London: World
Scientific.
Miranda ER (2002). Emergent Sound Repertoires in Virtual Societies. Computer Music
Journal, Vol. 26, No. 2, pp. 77-90.
Miranda ER (2002). Mimetic Development of Intonation. In C. Anagnostopoulou, M.
Ferrand and A. Smaill (Eds.), Music and Artificial Intelligence, Lecture Notes in
Computer Science (LNAI 2445), pp. 107-118. Berlin: Springer Verlag.
Miranda ER (2002). Generating Source Streams for Extralinguistic Utterances. Journal of
the Audio Engineering Society (AES), Vol. 50, No. 3, pp. 165-172.
Miranda ER (2001). Automatic Sound Identification based on Prosodic Listening.
Proceedings of the 17th International Congress on Acoustics, Rome, Italy. Rome
(Italy): ICA.
Miranda ER (2001). Synthesising Prosody with Variable Resolution. Proceedings of the
110th Audio Engineering Society Convention, Amsterdam, The Netherlands. New
York (NY): AES.
Miranda ER (2001). Improved Synthesis of Ultra-Linguistic Utterances. SONY Research
Forum Technical Digests, Tokyo, Japan. Tokyo (Japan): SONY Corporation.

Partner 2: Universitat Pompeu Fabra, Barcelona


Institute: Fundació Universitat Pompeu Fabra (FUPF) www.ec.upf.es
FUPF is a non-profit-making organisation established by Universitat Pompeu Fabra at
the beginning of 2002. The object of the Fundació is to cooperate in the accomplishment of
the objectives of the Universitat Pompeu Fabra. The Estació de la Comunicació, supported
by the Fundació UPF, groups together on a single site the Escola Superior Politècnica,
delivering degrees in Computer Science and Telematics; the Departament de Tecnologia,
specialised in digital communication technology (both at www.upf.es/esup); and the Institut
Universitari de l'Audiovisual (www.iua.upf.es), an interdisciplinary centre which provides a
meeting point for the traditionally separate fields of engineering, science, design, computing
and communication, running several postgraduate programmes in Digital Media. The
Department and the Institute are running for the fifth year a successful PhD programme on
Informàtica i Comunicació Digital, now attracting over 60 students from all over the world.
The current research groups are: Music Technology, Interactive Audiovisual, Distributed
Multimedia Applications, Mathematical Image Processing, Computational Linguistics Group
and Interactive Technology Group.

The Computational Neuroscience Group, directed by Prof. Gustavo Deco, investigates
mechanisms of human visual cognition underlying the intelligent and flexible analysis of
complex visual scenes. The group builds models of the neuronal mechanisms of visual
attention and investigates how attention is controlled by the interplay between conflict
detection/management and short term memory. This approach follows the hypothesis that
cognitive phenomena like visual information selection, reasoning and decision making about
which information to select, are generated by the mutual recurrent influences and interactions
between brain areas related to visual perception, conflict detection, learning and memory. The
group also applies computational neuroscience based techniques in other cognitive areas, like
psycholinguistics.
The Music Technology Group (MTG) will also participate in this project. MTG was
created in 1994 by its current director, Dr. Xavier Serra, as one of the research groups of the
Audiovisual Institute (IUA), a centre for interdisciplinary research in the different areas of
Audiovisual Communication. The MTG is currently working on many publicly funded
projects, both at the EU and national levels, and privately funded projects with companies like
Yamaha, SGAE, Telefónica I+D, Roland, Tape Gallery and DUY. The MTG participates in
the following European projects: CUIDADO (development of technologies for content-based
products and services using the emerging MPEG-7 standard), RAA (automatic audio
recognition methodology for broadcast monitoring), OPENDRAMA (definition, development
and integration of a novel platform to author rich cross-media digital objects of lyric opera
and other vocal dramatic music), AGNULA (a GNU/Linux audio distribution with software
applications and development platforms related to sound and music, to be part of the Red Hat
and Debian Linux distributions), and MOSART (a research network in the area of sound and
music computing). Other relevant projects are AUDIOCLAS (sound effects classification and
retrieval, Eureka E! 2668), TABASCO (content-based audio transformation,
TIC-2000-1094-C02-01), AIDA (Audio Analysis and Identification, SGAE: Spanish Authors' Society),
SIMAC (Semantic Interaction with Music Audio Contents), SEMANTIC HIFI (Browsing,
Listening, Interacting, Performing, Sharing on Future HIFI Systems), HARMOS (European
Multilingual Digital Data Collection for Multimedia Content in Music Heritage) and S2S2
(Sound to Sense, Sense to Sound). The MTG is currently organizing ISMIR 2004 (5th
International Conference on Music Information Retrieval) and next year will organize the
ICMC (International Computer Music Conference).
Participant 4: Prof Gustavo Deco
Prof. Dr. phil. Dr. rer. nat. habil. Gustavo Deco is Research Professor at the Institució
Catalana de Recerca i Estudis Avançats at the Pompeu Fabra University (Barcelona), where he
is head of the Computational Neuroscience group at the Department of Technology and also
the co-director of the doctoral programme in Computer Science and Digital Communication.
He studied Physics at the National University of Rosario (Argentina) and in 1987
received his Ph.D. degree in Physics for his thesis on Relativistic Atomic Collisions. From
1987 to 1990 he held postdoctoral positions at the University of Bordeaux and then, as an
Alexander von Humboldt Foundation fellow, at the University of Giessen, Germany. From
1990 to 2003, he led the Computational Neuroscience Group in the Neural Computing Section
at the Siemens Corporate Research Centre in Munich, Germany. In 1997, he obtained his
habilitation (the highest academic degree in Germany) in Computer Science (Dr. rer. nat.
habil.) at the Technical University of Munich for his thesis on Neural Learning. In 2001, he
received his PhD in Psychology (Dr. phil.) for his thesis on Visual Attention at the
Ludwig-Maximilian-University of Munich. Since 1998 he has been Associate Professor at the
Technical University of Munich and Honorary Professor at the University of Rosario, and
since 2001 Invited Lecturer at the Ludwig-Maximilian-University of Munich. Since 2001 he
has also been McDonnell-Pew Visiting Fellow at the Centre for Cognitive Neuroscience,
University of Oxford. In 2001 he was awarded Siemens' international "Inventor of the Year"
prize for his contributions to statistical learning, models of visual perception, and fMRI-based
diagnosis of neuropsychiatric diseases.
His research interests include computational neuroscience, neuropsychology,
psycholinguistics, biological networks, statistical formulation of neural networks, and chaos
theory. He has published three books, more than 90 papers in international journals, 130
papers in international conferences and 14 book chapters. He also holds 44 patents in Europe,
the USA, Canada and Japan.
Role in present project
Prof Deco will be responsible for WP3 (Prefrontal cortical function in the control of
attention and working memory), which will involve the development of a large-scale,
neurobiologically realistic cortical model of the primate auditory cortex and medial geniculate
nucleus in order to investigate temporal aspects of attention in auditory perception, and the
role of working memory in music cognition. Prof Deco will devote 20% of his time to this
project. In addition, one postdoctoral fellow and one PhD student will be appointed. The
postdoctoral fellow will be responsible for the large-scale model of auditory cortex and
thalamus, and for interacting with researchers involved in investigating the detailed
functionality of the thalamocortical network (WP4), and with the attentional control of
peripheral processing (WP6). The PhD student will work on incorporating a model of
prefrontal cortex to investigate the role of working memory in perceptual constancy and
perceptual categorization in audition; and will also liaise with researchers in WP5 and WP6 in
modelling the role of working memory in maintaining global pitch and timing information
in order to support the extraction of relative pitch and timing.
Relevant publications
Rolls E & Deco G (2001). Computational Neuroscience of Vision. Oxford University
Press, Oxford.
Deco G & Schürmann B (2000). Information Dynamics: Foundations and Applications.
Springer Verlag, New York.
Deco G & Obradovic D (1996). An Information-Theoretic Approach to Neural
Computation. Springer Verlag, New York.
Deco G & Rolls E (in press, 2004). Synaptic and Spiking Dynamics underlying Reward
Reversal in the Orbitofrontal Cortex. Cereb. Cortex.
Szabo M, Almeida R, Deco G, Stetter M (2004). Cooperation and Biased Competition
Model Can Explain Attentional Filtering in the Prefrontal Cortex. European Journal
of Neuroscience, 19, 1969-1977.
Corchs S & Deco G (2004). Feature Based Attention in Human Visual Cortex: Simulation
of fMRI Data. NeuroImage, 21, 36-45.
Deco G & Rolls E (2004). A Neurodynamical Cortical Model of Visual Attention and
Invariant Object Recognition. Vision Research, 44, 621-642.
Deco G, Rolls E, Horwitz B (2004). 'What' and 'Where' in Visual Working Memory: A
Computational Neurodynamical Perspective for Integrating fMRI and Single-Neuron
Data. Journal of Cognitive Neuroscience, 16, 683-701.


Deco G & Rolls E (2002). Attention and Working Memory: A Dynamical Model of
Neuronal Activity in the Prefrontal Cortex. European Journal of Neuroscience, 18,
2374-2390.
Corchs S & Deco G (2002). Large-scale Neural Model for Visual Attention: Integration of
Experimental Single Cell and fMRI Data. Cerebral Cortex, 12, 339-348.
Deco G & Rolls E (2002). Object-Based Visual Neglect: A Computational Hypothesis.
European Journal of Neuroscience, 16, 1994-2000.
Deco G, Pollatos O, Zihl J (2002). The Time Course of Selective Visual Attention: Theory
and Experiments. Vision Research, 42, 2925-2945
Deco G & Zihl J (2001). A Neurodynamical Model of Visual Attention: Feedback
Enhancement of Spatial Resolution in a Hierarchical System. Computational
Neuroscience, 10, 231-251.
Deco G & Zihl J (2001). Top-down Selective Visual Attention: A Neurodynamical
Approach. Visual Cognition, 8, 119-140.
Deco G & Schürmann B (1997). Information Transmission and Temporal Code in Central
Spiking Neurons. Physical Review Letters, 79, 4697-4700.
Participant 5: Prof Xavier Serra
Professor Dr Xavier Serra is the head of the Music Technology Group, Director of the
Audiovisual Institute (IUA) and Director of the Department of Technology of the Pompeu
Fabra University (UPF) in Barcelona, where he has been Professor since 1994. He holds a
Master's degree in Music from Florida State University (1983), a Ph.D. in Computer Music
from Stanford University (1989) and worked for two years as Chief Engineer in Yamaha
Music Technologies USA, Inc. His research interests are in sound analysis and synthesis for
music and other multimedia applications. Specifically, he is working with spectral models and
their application to synthesis, processing and high quality coding, as well as other music
related problems such as: sound source separation, performance analysis and content-based
retrieval of audio.
Dr. Serra is an editor for a number of international journals, a reviewer for several
international conferences and for the 6th Framework Programme of the European Commission,
and a member of a number of professional organisations; he is often invited to speak at
conferences and workshops. He is the principal investigator of more than 10 major research
projects funded by the European Commission and other public and private institutions. He has
more than 30 patents, most of them submitted in Japan and the USA, he has published more
than 30 articles in international journals and proceedings of conferences and he has
contributed to several books. Dr. Serra also maintains an active music activity by playing the
cello and teaching at the Escola Superior de Msica de Catalunya (ESMUC), where he is the
head of the Department of Sonology.
Current international activities of Dr. Serra include being the coordinator of the
European project SIMAC (Semantic Interaction with Music Audio Contents), chair of the
ISMIR 2004 (5th International Conference on Music Information Retrieval) and research
chair of the ICMC 2005 (International Computer Music Conference).
Role in present project
Dr. Serra will be responsible for WP8 (Interactive music system), which will involve
the development of a music processing system (the music projector) to study the development
of internal musical codes, music expectancies and music cognition phenomena such as music
categorization, similarity ranking, or streaming. The system will be able to synthesize, as
musical output, the expectancies generated after processing short musical excerpts provided
as input, and will be used to study basic processes of music perception, learning and
categorization.
By incorporating improved contextual processing, attentional modulation of processing,
and enhancements to the processing of pitch and rhythmic patterns (WP3-6), this
workpackage will be the initial beneficiary of contributions to WP7. Prof Serra will devote
20% of his time to this project. In addition, one postdoctoral fellow and one PhD student will
be appointed. The postdoctoral fellow will be responsible for devising and testing the system,
and for planning and analysing the simulation experiments. The PhD student will work to
implement the required software components and functions. The group will also contribute its
expertise to assist other partners' software development and the design of experiments.
Relevant publications
Amatriain X, Bonada J, Loscos A, Serra X (2002). Spectral Processing. Udo Zölzer (Ed.),
DAFX: Digital Audio Effects, 554, John Wiley & Sons.
Serra X (2002). The Musical Communication Chain and its Modeling. Assayag, Gerard;
Feichtinger, Hans-Georg; Rodrigues, Jose Francisco Ed., Mathematics and Music. A
Diderot Mathematical Forum, 243, Springer Verlag.
Bonada J, Celma O, Loscos A, Ortolà J, Serra X (2001). Singing Voice Synthesis
Combining Excitation plus Resonance and Sinusoidal plus Residual Models.
Proceedings of International Computer Music Conference 2001, Havana, Cuba.
Amatriain X, Bonada J, Loscos A, Serra X (2001). Spectral Modeling for Higher-level
Sound Transformation. Proceedings of MOSART Workshop on Current Research
Directions in Computer Music, Barcelona.
Bonada J, Loscos A, Cano P, Serra X (2001). Spectral Approach to the Modeling of the
Singing Voice. Proceedings of 111th AES Convention New York, USA
Cano P, Loscos A, Bonada J, de Boer M, Serra X (2000). Voice Morphing System for
Impersonating in Karaoke Applications. Proceedings of International Computer
Music Conference 2000, Berlin, Germany.
Herrera P, Amatriain X, Batlle E, Serra X ( 2000). Towards Instrument Segmentation for
Music Content Description: a Critical Review of Instrument Classification
Techniques. Proceedings of International Symposium on Music Information
Retrieval, Plymouth, MA (USA).
Wright M, Beauchamp J, Fitz K, Rodet X, Roebel A, Serra X, Wakefield G (2000).
Analysis/synthesis comparison. Organized Sound, 5(3),173-189.
Amatriain, X. Arumi, P. Ramírez, M. 2002. 'CLAM, Yet Another Library for Audio and
Music Processing?' Proceedings of 17th Annual ACM Conference on Object-Oriented
Programming, Systems, Languages and Applications. Seattle, WA, USA.
Amatriain, X. Bonada, J. Loscos, A. Arcos, J. Verfaille, V. 2003. 'Content-based
Transformations' Journal of New Music Research, Vol. 32, No. 1.
Amatriain, X. de Boer, M. Robledo, E. Garcia, D. 2002. 'CLAM: An OO Framework for
Developing Audio and Music Applications' Proceedings of 17th Annual ACM
Conference on Object-Oriented Programming, Systems, Languages and Applications.
Seattle, WA, USA.
Durand, N. Gómez, E. 2001. 'Periodicity Analysis using An Harmonic Matching method
and Bandwise Processing' Proceedings of MOSART Workshop on Current Research
Directions in Computer Music. Barcelona.
Gómez, E. Klapuri, A. Meudic, B. 2003. 'Melody Description and Extraction in the
Context of Music Content Processing' Journal of New Music Research, Vol. 32, No. 1.
Gouyon, F. Herrera, P. 2003. 'Determination of the meter of musical audio signals:
Seeking recurrences in beat segment descriptors' Proceedings of the Audio
Engineering Society, 114th Convention. Amsterdam, The Netherlands.
Gouyon, F. Meudic, B. 2003. 'Towards Rhythmic Content Processing of Musical Signals:
Fostering Complementary Approaches' Journal of New Music Research, Vol. 32, No. 1.
Herrera, P. Peeters, G. Dubnov, S. 2003. 'Automatic Classification of Musical Instrument
Sounds' Journal of New Music Research, Vol. 32, No. 1.
Herrera, P. Yeterian, A. Gouyon, F. 2002. 'Automatic classification of drum sounds: a
comparison of feature selection methods and classification techniques'. Proceedings
of Second International Conference on Music and Artificial Intelligence. Edinburgh,
Scotland.

Partner 3: Magyar Tudományos Akadémia Pszichológiai Kutatóintézet (MTAPI)

Institute: MTAPI Általános Lélektani Osztály (Department of General Psychology of
the Institute for Psychology, Hungarian Academy of Sciences)
MTAPI (Director: Prof. Dr. István Czigler; http://www.mtapi.hu/bemutatkozas.htm) is a
non-profit centralized research institute financed by the Hungarian Academy of Sciences. The
research program of the Department of General Psychology (the Department; Head: Dr.
István Winkler; http://www.mtapi.hu/altalanos.htm) includes human visual and auditory
perception, attention, and memory with the focus on object-related perceptual processing. The
central questions addressed in both sensory modalities are: 1) How do humans structure
incoming sensory information (temporal integration, feature binding, sequential and spatial
grouping, stream segregation and object formation, contextual effects); 2) What kind of
resources are used in various stages of the processes leading to the veridical perception of
distal objects (reliance on sensory, working, and long-term memory, pre-attentive and
attentive processes, the effects of task-load and time constraints); 3) Developmental aspects of
the above questions (are there innate processes in organizing the sensory input, changes in
grouping processes in childhood and with aging). Investigation of the above-described topics
is mainly conducted with behavioural and electrophysiological methods. Presently, the
department includes four senior researchers, a post-doc researcher, two PhD students, a
technician, and two research assistants.
The Department has two fully equipped laboratories for behavioural and
electrophysiological research. The laboratories have electrically shielded and
sound-attenuated chambers, equipment for auditory and visual stimulation, 32/40-channel
state-of-the-art EEG recording (NeuroScan), general-purpose computers and both commercial
and custom-made ERP analysis software. One of the laboratories will be available with
priority to the proposed project. The research assistants will recruit subjects and the technician
will provide any help needed to conduct the experiments.
Participant 6: Prof István Winkler
Dr. Winkler is a leading expert in the investigation of auditory memory and sound
organization (auditory grouping, stream segregation, auditory context processing) using
electrophysiological methods. He received his Master's degrees in electrical engineering at the
Budapest University of Technology and in psychology at the Eötvös Loránd University.
He completed his PhD work at the Helsinki University and defended his Doctor of
Academy (Academy Professor) thesis in psychology at the Hungarian Academy of Sciences.
He has received the Samuel Sutton Award and served as Leibniz professor at the Leipzig
University. Winkler has authored and co-authored over 60 papers in international
peer-reviewed journals. In addition to heading the Department, he is also a docent of the
Helsinki University and a part-time teacher at the Pázmány Péter Catholic University. His
research interests focus on everyday auditory perception; sound organization; auditory stream
segregation; auditory sensory memory; speech perception; auditory change detection;
automatic, default, and attentive processing; the brain mechanisms of all the above;
developmental aspects of sound organization and sensory memory, especially the question of
what functions are innate.
His current projects include: sound organization and attention (stages in sound
organization), innate functions of sound organization, processing and perceptual units in
audition, representing complex auditory scenes, pre-attentive conjunction of auditory features,
language context and phonetic analysis, audio-visual integration in speech perception, change
detection in natural auditory environments, task-independent ERP paradigms for measuring
auditory functions.
Role in present project
Dr. Winkler will be responsible for WP1 (Higher level auditory functions underlying
music perception: Innate vs. learned operations), testing the innateness of higher-level
stimulus-driven auditory functions. This includes experimental design, piloting, theoretical
work, and organization of the experiments in newborn infants and the corresponding control
studies in adults. He will devote 20% of his time to this project. The post-doc fellow will be
responsible for overseeing the neonate experiments, data analysis and evaluation, and will do
theoretical work. The doctoral student will oversee the adult experiments; and carry out the
data analysis and evaluation. The nurse will run the experiments in newborn babies at the
hospital (Clinic I for Gynaecology of the Semmelweis Medical School). The research
assistant will run the adult experiments.
Relevant publications
Korzyukov, O., Winkler, I., Gumenyuk, V., Alho, K., & Näätänen, R. (2003). Processing
abstract auditory features in the human auditory cortex. NeuroImage, 20, 2245-2258.
Kushnerenko, E., Čeponienė, R., Fellman, V., Huotilainen, M., & Winkler, I. (2001).
Event-related potential correlates of sound duration: Similar pattern from birth to
adulthood. NeuroReport, 12, 3777-3781.
Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001).
'Primitive intelligence' in the auditory cortex. Trends in Neurosciences, 24, 283-288.
Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in
cognitive neuroscience. Psychological Bulletin, 125, 826-859.
Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R., & Winkler, I. (2001). Preattentive
extraction of abstract feature conjunctions from auditory stimulation as reflected by
the mismatch negativity (MMN). Psychophysiology, 38, 359-365.
Winkler, I., Cowan, N., Csépe, V., Czigler, I., & Näätänen, R. (1996). Interactions between
transient and long-term auditory memory as reflected by the mismatch negativity.
Journal of Cognitive Neuroscience, 8, 403-415.


Winkler, I., Karmos, G., & Näätänen, R. (1996). Adaptive modeling of the unattended
acoustic environment reflected in the mismatch negativity event-related potential.
Brain Research, 742, 239-252.
Winkler, I., Kushnerenko, E., Horváth, J., Čeponienė, R., Fellman, V., Huotilainen, M.,
Näätänen, R., & Sussman, E. (2003). Newborn infants can organize the auditory
world. Proceedings of the National Academy of Sciences USA, 100, 11812-11815.
Winkler, I., Schröger, E., & Cowan, N. (2001). The role of large-scale perceptual
organization in the mismatch negativity event-related brain potential. Journal of
Cognitive Neuroscience, 13, 59-71.
Winkler, I., Sussman, E., Tervaniemi, M., Ritter, W., Horváth, J., & Näätänen, R. (2003).
Pre-attentive auditory context effects. Cognitive, Affective, & Behavioral
Neuroscience, 3 (1), 57-77.
Winkler, I., Tervaniemi, M., & Näätänen, R. (1997). Two separate codes for missing
fundamental pitch in the auditory cortex. Journal of the Acoustical Society of
America, 102, 1072-1082.
van Zuijen, T.L., Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2004).
Pre-attentive grouping of sequential sounds - an event-related potential study
comparing musicians and non-musicians. Journal of Cognitive Neuroscience, 16,
331-338.

Partner 4: University of Amsterdam (UvA)


Institute: Institute for Logic, Language and Computation (ILLC)
The Institute for Logic, Language and Computation (ILLC) is a research institute of the
University of Amsterdam (UvA), in which researchers from the Faculty of Science, the
Faculty of Humanities and the Faculty of Social and Behavioural Sciences collaborate.
ILLC's central research area is the study of fundamental principles of encoding, transmission
and comprehension of information. Emphasis is on natural and formal languages, but other
information carriers, such as images and music, are studied as well. Research at ILLC is
interdisciplinary, and aims at bringing together insights from various disciplines concerned
with information and information processing, such as linguistics, logic, computer science,
cognitive science, artificial intelligence and philosophy. The group's expertise is in topics
such as rhythmic categorization, tempo tracking and beat induction: temporal aspects that are
fundamental to any system of musical cognition. The concepts, methods, and visualizations
that have been developed at UvA/ILLC will provide significant support for the research aims
of this proposal.
Participant 7: Dr Henkjan Honing
Dr Honing is currently affiliated to the Department of Musicology and the Institute for
Logic, Language and Computation (ILLC) of the University of Amsterdam (UvA), where he
conducts research in music cognition, focusing on the perception and representation of
musical time and temporal structure, commonly identified as a research area crucial to the
understanding of the complex processes that enable us to enjoy and perform music. His aim is
to arrive at a cognitive science of music, with a special focus on its temporal aspects, such as
rhythm, tempo and timing, by applying an interdisciplinary approach that builds on
musicology, psychology and computer science to better understand music cognition as a
whole.

013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 90 of 101
Dr Honing studied electronic music and composition at the Institute for Sonology
(1981-84) and at the Center for Computer Research in Music and Acoustics (CCRMA),
Stanford University (1984). He received his PhD in Music (on the representation of time and
temporal structure in music) from City University, London (1991). He conducted research on
connectionist models and knowledge representation at the Centre for Knowledge Technology
(1986-92) and the Music Department, City University, London (1988-90), funded by the
British Economic and Social Research Council (ESRC). Subsequently, from 1992 to 1997 he was
affiliated to the Institute for Logic, Language and Computation (ILLC), University of
Amsterdam as a Research Fellow of the Royal Netherlands Academy of Arts and Sciences,
conducting research in the formalisation of musical knowledge. He has been visiting
researcher at the Department of Music and Performing Arts Professions, New York
University (1992, 2002), Thomas J. Watson Center, IBM (1996), Institute of Research and
Coordination in Acoustics/Music, IRCAM (1997), Center for Computer Research in Music
and Acoustics (CCRMA), Stanford University (1995), and visiting professor at the School of
Music, Northwestern University (2004). He has published over 90 refereed articles in
international journals and books on music representation, music cognition and music
technology. Dr Honing is advisory editor of the Journal for New Music Research. With Peter
Desain he founded the Music, Mind, Machine group supported by an NWO-PIONIER grant,
NICI / University of Nijmegen, Faculty of Humanities / University of Amsterdam, and a
number of companies. Until 1992 Henkjan Honing was active as a musician and a composer
and performer of new music, composing computer music and constructing sound installations
for museums and galleries in the Netherlands and abroad. He is currently chairman of the
Luna's Fridge foundation for new opera productions, and board member of the Jazz in Motion
Foundation.
Role in present project
Dr Honing will be responsible for WP5 (Perception and categorisation of rhythmic
patterns) and will be primarily concerned with modelling the temporal aspects of music
perception and performance such as rhythm, timing and tempo. Dr Honing will devote 20% of
his time to this project. In addition, one postdoctoral fellow and one PhD student will be
appointed. The postdoctoral fellow will be responsible for evaluating the predictive
capabilities of competing kinematic, memory-based and perception-based models of rhythm
perception and production and for developing a model of rhythmic perception and expectancy
based on the notion of likelihood. The student will undertake work in formulating and
developing a model of rhythmic expectation.
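The model-comparison task described above can be illustrated with a minimal, hypothetical sketch: competing inventories of rhythmic categories are scored by the likelihood they assign to a performed sequence of inter-onset intervals, and the better-fitting model wins. The category inventories, interval values and tolerance parameter below are illustrative assumptions only, not the project's actual models.

```python
import math

def gaussian_loglik(x, mu, sigma):
    # Log density of a normal distribution N(mu, sigma^2) at x.
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def rhythm_loglik(iois, categories, sigma=0.05):
    """Log-likelihood of observed inter-onset intervals (in seconds) under a
    categorical model: each IOI is scored by its best-matching rhythmic
    category (a max approximation to a uniform mixture of Gaussians)."""
    total = 0.0
    for ioi in iois:
        total += max(gaussian_loglik(ioi, mu, sigma) for mu in categories)
    return total

# A slightly "swung" performance of a nominal [0.5, 0.25, 0.25] pattern:
performed = [0.52, 0.26, 0.23, 0.51, 0.27, 0.24]
duple = [0.5, 0.25]            # hypothetical duple-meter category inventory
triplet = [0.5, 1/3, 1/6]      # hypothetical triplet-based inventory
# The duple model fits the data better, so it earns the higher likelihood.
assert rhythm_loglik(performed, duple) > rhythm_loglik(performed, triplet)
```

The same scoring scheme extends naturally to expectancy: the category that maximises the likelihood of the intervals heard so far also defines the most expected continuation.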
Relevant publications
Honing H (in press, 2005). Is there a perception-based alternative to kinematic models of
tempo rubato? Music Perception.
Honing H (2004). Computational modeling of music cognition: a case study on model
selection. ILLC Prepublication, PP-2004-14.
Honing H (2003). The final ritard: on music, motion, and kinematic models. Computer
Music Journal, 27(3), 66-72.
Honing H (2002). Structure and interpretation of rhythm and timing. Tijdschrift voor
Muziektheorie. 7(3), 227-232.


Honing H (2001). From time to time: The representation of timing and tempo. Computer
Music Journal, 25(3), 50-61.
Desain P & Honing H (2003). The formation of rhythmic categories and metric priming.
Perception, 32(3), 341-365.
Timmers R & Honing H (2002). On music performance, theories, measurement and
diversity. In M.A. Belardinelli (ed.). Cognitive Processing (International Quarterly
of Cognitive Sciences), 1-2, 1-19.
Desain P, Honing H, van Thienen H & Windsor WL (1998). Computational Modeling of
Music Cognition: Problem or Solution? Music Perception, 16(1), 151-166.
Desain P & Honing H (1998). A reply to S. W. Smoliar's "Modelling Musical Perception:
A Critical View". N. Griffith, & P. Todd (eds.), Musical Networks, Parallel
Distributed Perception and Performance, 111-114.

A.2 Sub-contracting
Each partner will obtain the required audit certificates through sub-contracts agreed
with recognised local auditors.

A.3 Third parties


None

A.4 Funding of third country participants


None


Appendix B References
1. Middlebrooks, J.C., The acquisitive auditory cortex. Nat Neurosci, 2003. 6(11): p. 1122-3.
2. Fritz, J., et al., Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat Neurosci, 2003. 6(11): p. 1216-23.
3. Drake, C. and D. Bertrand, The quest for universals in temporal processing in music. Annals of the New York Academy of Sciences, 2001. 930: p. 17-27.
4. Trehub, S., Human processing predispositions and musical universals, in The Origins of Music, N.L. Wallin, B. Merker, and S. Brown, Editors. 2000, The MIT Press: Cambridge, MA. p. 427-448.
5. Imberty, M., The question of innate competencies in musical communication, in The Origins of Music, N.L. Wallin, B. Merker, and S. Brown, Editors. 2000, The MIT Press: Cambridge, MA. p. 449-462.
6. Picton, T.W., et al., Mismatch negativity: different water in the same river. Audiol Neurootol, 2000. 5(3-4): p. 111-39.
7. Näätänen, R. and K. Alho, Mismatch negativity--the measure for central sound representation accuracy. Audiol Neurootol, 1997. 2(5): p. 341-53.
8. Winkler, I., Change detection in complex auditory environment: beyond the oddball paradigm, in Detection of Change: Event-Related Potential and fMRI Findings, J. Polich, Editor. 2003, Kluwer Academic Publishers: Boston. p. 61-81.
9. Paavilainen, P., et al., Preattentive extraction of abstract feature conjunctions from auditory stimulation as reflected by the mismatch negativity (MMN). Psychophysiology, 2001. 38(2): p. 359-65.
10. Winkler, I., G. Karmos, and R. Näätänen, Adaptive modeling of the unattended acoustic environment reflected in the mismatch negativity event-related potential. Brain Res, 1996. 742(1-2): p. 239-52.
11. Näätänen, R. and I. Winkler, The concept of auditory stimulus representation in cognitive neuroscience. Psychol Bull, 1999. 125(6): p. 826-59.
12. Näätänen, R., The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behavioral and Brain Sciences, 1990. 13: p. 201-288.
13. Sussman, E., I. Winkler, and W.J. Wang, MMN and attention: competition for deviance detection. Psychophysiology, 2003. 40: p. 430-435.
14. Alho, K., et al., Event-related brain potential of human newborns to pitch change of an acoustic stimulus. Electroencephalogr Clin Neurophysiol, 1990. 77(2): p. 151-5.
15. Kurtzberg, D., et al., Developmental studies and clinical application of mismatch negativity: problems and prospects. Ear Hear, 1995. 16(1): p. 105-17.
16. Leppänen, P.H.T., K.M. Eklund, and H. Lyytinen, Event-related brain potentials to change in rapidly presented acoustic stimuli in newborns. Developmental Neuropsychology, 1997. 13: p. 175-204.
17. Winkler, I., et al., Newborn infants can organize the auditory world. Proc Natl Acad Sci U S A, 2003. 100: p. 1182-1185.
18. Paavilainen, P., et al., Neuronal populations in the human brain extracting invariant relationships from acoustic variance. Neurosci Lett, 1999. 265(3): p. 179-82.
19. Imada, T., et al., Mismatch fields evoked by a rhythm passage, in The 9th International Conference on Biomagnetism. 1993. Vienna.
20. Nordby, H., W.T. Roth, and A. Pfefferbaum, Event-related potentials to time-deviant and pitch-deviant tones. Psychophysiology, 1988. 25: p. 249-261.


21. Winkler, I. and E. Schröger, Neural representation for the temporal structure of sound patterns. Neuroreport, 1995. 6: p. 690-694.
22. Sussman, E., W. Ritter, and H.G. Vaughan, Jr., Predictability of stimulus deviance and the mismatch negativity. Neuroreport, 1998. 9(18): p. 4167-70.
23. Näätänen, R., et al., Development of a memory trace for a complex sound in the human brain. Neuroreport, 1993. 4: p. 503-506.
24. Kraus, N., et al., Neurophysiologic bases of speech discrimination. Ear Hear, 1995. 16(1): p. 19-37.
25. Brattico, E., R. Näätänen, and M. Tervaniemi, Context effects on pitch perception in musicians and non-musicians: evidence from ERP recordings. Music Perception, 2002. 19: p. 1-24.
26. Koelsch, S., E. Schröger, and M. Tervaniemi, Superior attentive and pre-attentive auditory processing in musicians. NeuroReport, 1999. 10: p. 1309-1313.
27. van Zuijen, T.L., et al., Grouping of sequential sounds--an event-related potential study comparing musicians and nonmusicians. J Cogn Neurosci, 2004. 16(2): p. 331-8.
28. Schwartz, D.A., C.Q. Howe, and D. Purves, The statistical structure of human speech sounds predicts musical universals. J Neurosci, 2003. 23(18): p. 7160-8.
29. Schwartz, D.A. and D. Purves, Pitch is determined by naturally occurring periodic sounds. Hearing Research, 2004. 194(1-2): p. 31-46.
30. Warrier, C.M. and R.J. Zatorre, Influence of tonal context and timbral variation on perception of pitch. Percept Psychophys, 2002. 64(2): p. 198-207.
31. Riotte, A., Quelques réflexions sur le contrôle formel du timbre, in Timbre: Métaphore pour la Composition, J.-B. Barrière, Editor. 1991, Christian Bourgois: Paris.
32. Smalley, D., Spectromorphology: explaining sound-shapes. Organised Sound, 1997. 2(2): p. 107-126.
33. Lerdahl, F., Les hiérarchies de timbres, in Timbre: Métaphore pour la Composition, J.-B. Barrière, Editor. 1991, Christian Bourgois: Paris.
34. Peretz, I. and R.J. Zatorre, The Cognitive Neuroscience of Music. 2003, Oxford: Oxford University Press.
35. Rolls, E.T. and G. Deco, Computational Neuroscience of Vision. 2002, Oxford: Oxford University Press.
36. Husain, F.T., et al., Relating neuronal dynamics for auditory object processing to neuroimaging activity: a computational modeling and an fMRI study. Neuroimage, 2004. 21(4): p. 1701-20.
37. Deco, G. and J. Zihl, A neurodynamical model of visual attention: feedback enhancement of spatial resolution in a hierarchical system. J Comput Neurosci, 2001. 10(3): p. 231-53.
38. Deco, G. and T.S. Lee, A unified model of spatial and object attention based on intercortical biased competition. Neurocomputing, 2002. 44-46: p. 775-781.
39. Corchs, S. and G. Deco, Feature-based attention in human visual cortex: simulation of fMRI data. Neuroimage, 2004. 21(1): p. 36-45.
40. Deco, G. and E.T. Rolls, Attention and working memory: a dynamical model of neuronal activity in the prefrontal cortex. Eur J Neurosci, 2003. 18(8): p. 2374-90.
41. Deco, G., E.T. Rolls, and B. Horwitz, "What" and "where" in visual working memory: a computational neurodynamical perspective for integrating fMRI and single-neuron data. J Cogn Neurosci, 2004. 16(4): p. 683-701.
42. Deco, G. and E.T. Rolls, A neurodynamical cortical model of visual attention and invariant object recognition. Vision Res, 2004. 44(6): p. 621-42.


43. Corchs, S. and G. Deco, Large-scale neural model for visual attention: integration of experimental single-cell and fMRI data. Cereb Cortex, 2002. 12(4): p. 339-48.
44. Deco, G. and E.T. Rolls, Object-based visual neglect: a computational hypothesis. Eur J Neurosci, 2002. 16(10): p. 1994-2000.
45. Deco, G., O. Pollatos, and J. Zihl, The time course of selective visual attention: theory and experiments. Vision Res, 2002. 42(27): p. 2925-45.
46. Heinke, D., et al., A computational neuroscience account of visual neglect. Neurocomputing, 2002. 44-46: p. 811-816.
47. Brunel, N. and X.J. Wang, Effects of neuromodulation in a cortical network model of object working memory dominated by recurrent inhibition. J Comput Neurosci, 2001. 11(1): p. 63-85.
48. Hodgkin, A.L. and A.F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol, 1952. 117(4): p. 500-44.
49. Szabo, M., et al., Cooperation and biased competition model can explain attentional filtering in the prefrontal cortex. Eur J Neurosci, 2004. 19(7): p. 1969-77.
50. Meddis, R. and L.P. O'Mard, DSAM: Development System for Auditory Modelling. Centre for the Neural Basis of Hearing, Essex University. http://www.essex.ac.uk/psychology/hearinglab/dsam.
51. Bregman, A.S., Auditory Scene Analysis. 1990, Cambridge, MA: MIT Press.
52. Carlyon, R.P., et al., Effects of attention and unilateral neglect on auditory stream segregation. J Exp Psychol Hum Percept Perform, 2001. 27(1): p. 115-27.
53. McCabe, S.L. and M. Denham, A model of auditory streaming. J Acoust Soc Am, 1997. 101(3): p. 1611-1621.
54. Wrigley, S. and G. Brown, A neural oscillator model of auditory attention. Lecture Notes in Computer Science, 2001: p. 1163-1170.
55. Wrigley, S. and G. Brown, A neural oscillator model for auditory selective attention, in Advances in Neural Information Processing Systems 14, T.G. Dietterich, S. Becker, and Z. Ghahramani, Editors. 2002, MIT Press.
56. Beauvois, M.W. and R. Meddis, A computer model of auditory stream segregation. Quarterly Journal of Experimental Psychology, 1991. 43A(3): p. 517-541.
57. Beauvois, M.W. and R. Meddis, Computer simulation of auditory stream segregation in alternating-tone sequences. J Acoust Soc Am, 1996. 99(4 Pt 1): p. 2270-80.
58. Moran, J. and R. Desimone, Selective attention gates visual processing in the extrastriate cortex. Science, 1985. 229(4715): p. 782-4.
59. Chelazzi, L., et al., A neural basis for visual search in inferior temporal cortex. Nature, 1993. 363(6427): p. 345-7.
60. Chelazzi, L., Serial attention mechanisms in visual search: a critical look at the evidence. Psychol Res, 1999. 62(2-3): p. 195-219.
61. Duncan, J., Cooperating brain systems in selective perception and action, in Attention and Performance XVI, T. Inui and J.L. McClelland, Editors. 1996, MIT Press: Cambridge, MA. p. 433-458.
62. Reynolds, J.H. and R. Desimone, The role of neural mechanisms of attention in solving the binding problem. Neuron, 1999. 24(1): p. 19-29, 111-25.
63. Spangler, K.M. and W.B. Warr, The descending auditory system, in Neurobiology of Hearing: The Central Auditory System, R.A. Altschuler, et al., Editors. 1991, Raven Press.
64. Suga, N., et al., The corticofugal system for hearing: recent progress. Proc Natl Acad Sci U S A, 2000. 97(22): p. 11807-14.
65. Maison, S., C. Micheyl, and L. Collet, Influence of focused auditory attention on cochlear activity in humans. Psychophysiology, 2001. 38(1): p. 35-40.


66. Kowalski, N., D.A. Depireux, and S.A. Shamma, Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra. J Neurophysiol, 1996. 76(5): p. 3503-23.
67. deCharms, R.C., D.T. Blake, and M.M. Merzenich, Optimizing sound features for cortical neurons. Science, 1998. 280(5368): p. 1439-43.
68. Klein, D.J., et al., Robust spectrotemporal reverse correlation for the auditory system: optimizing stimulus design. J Comput Neurosci, 2000. 9(1): p. 85-111.
69. Miller, L.M., et al., Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. J Neurophysiol, 2002. 87(1): p. 516-27.
70. Linden, J.F., et al., Spectrotemporal structure of receptive fields in areas AI and AAF of mouse auditory cortex. J Neurophysiol, 2003. 90(4): p. 2660-75.
71. Machens, C.K., M.S. Wehr, and A.M. Zador, Linearity of cortical receptive fields measured with natural sounds. J Neurosci, 2004. 24(5): p. 1089-100.
72. Zhang, L.I., S. Bao, and M.M. Merzenich, Persistent and specific influences of early acoustic environments on primary auditory cortex. Nat Neurosci, 2001. 4(11): p. 1123-30.
73. Zhang, L.I., S. Bao, and M.M. Merzenich, Disruption of primary auditory cortex by synchronous auditory inputs during a critical period. Proc Natl Acad Sci U S A, 2002. 99(4): p. 2309-14.
74. Chang, E.F. and M.M. Merzenich, Environmental noise retards auditory cortical development. Science, 2003. 300(5618): p. 498-502.
75. Coath, M. and S.L. Denham, Robust sound classification through the representation of similarity using response fields derived from stimuli during early experience. Biological Cybernetics, 2004. submitted.
76. Kral, A., et al., Congenital auditory deprivation reduces synaptic activity within the auditory cortex in a layer-specific manner. Cereb Cortex, 2000. 10(7): p. 714-726.
77. Kral, A., et al., Postnatal cortical development in congenital auditory deprivation. Cereb Cortex, 2004: bhh156.
78. Jones, M.R., Time, our lost dimension. Psychol Rev, 1976. 83(5): p. 323-55.
79. Coull, J.T. and A.C. Nobre, Where and when to pay attention: the neural systems for directing attention to spatial locations and to time intervals as revealed by both PET and fMRI. J Neurosci, 1998. 18(18): p. 7426-35.
80. Coull, J.T., et al., Functional anatomy of the attentional modulation of time estimation. Science, 2004. 303(5663): p. 1506-8.
81. Jones, M.R., et al., Temporal aspects of stimulus-driven attending in dynamic arrays. Psychol Sci, 2002. 13(4): p. 313-9.
82. Michon, J.A. and J.L. Jackson, Time, Mind and Behaviour. 1985, Berlin: Springer.
83. Desain, P. and H. Honing, Music, Mind and Machine: Studies in Computer Music, Music Cognition and Artificial Intelligence. 1992, Amsterdam: Thesis Publishers.
84. Clarke, E.F., Rhythm and timing in music, in The Psychology of Music, 2nd edition, D. Deutsch, Editor. 1999, Academic Press: New York. p. 473-500.
85. Lerdahl, F. and R. Jackendoff, A Generative Theory of Tonal Music. 1983, Cambridge, MA: MIT Press.
86. Povel, D.J. and P. Essens, Perception of temporal patterns. Music Perception, 1985. 2(4): p. 411-440.
87. Honing, H., From time to time: the representation of timing and tempo. Computer Music Journal, 2001. 25(3): p. 50-61.
88. Honing, H., Structure and interpretation of rhythm and timing. Tijdschrift voor Muziektheorie, 2002. 7(3): p. 227-232.


89. Desain, P. and H. Honing, The formation of rhythmic categories and metric priming. Perception, 2003. 32(3): p. 341-65.
90. Honing, H., The final ritard: on music, motion and kinematic models. Computer Music Journal, 2003. 27(3): p. 66-72.
91. Darwin, C.J. and R.P. Carlyon, Auditory grouping, in Handbook of Perception and Cognition, Volume 6: Hearing, B.C.J. Moore, Editor. 1995, Academic Press: Orlando, FL. p. 387-424.
92. Warrier, C.M. and R.J. Zatorre, Right temporal cortex is critical for utilization of melodic contextual cues in a pitch constancy task. Brain, 2004. 127(Pt 7): p. 1616-25.
93. Tillmann, B. and E. Bigand, Further investigation of harmonic priming in long contexts using musical timbre as surface marker to control for temporal effects. Percept Mot Skills, 2004. 98(2): p. 450-8.
94. Zatorre, R.J., Absolute pitch: a model for understanding the influence of genes and development on neural and cognitive function. Nat Neurosci, 2003. 6(7): p. 692-5.
95. Krumhansl, C.L., Rhythm and pitch in music cognition. Psychol Bull, 2000. 126(1): p. 159-79.
96. Krumhansl, C.L. and E.J. Kessler, Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychol Rev, 1982. 89(4): p. 334-68.
97. Tramo, M.J., et al., Neurobiological foundations for the theory of harmony in western tonal music. Ann N Y Acad Sci, 2001. 930: p. 92-116.
98. Schwartz, D.A., C.Q. Howe, and D. Purves, The statistical structure of human speech sounds predicts musical universals. J Neurosci, 2003. 23(18): p. 7160-7168.
99. Nelken, I., et al., Primary auditory cortex of cats: feature detection or something else? Biol Cybern, 2003. 89(5): p. 397-406.
100. Griffiths, T.D., et al., Cortical processing of complex sound: a way forward? Trends Neurosci, 2004. 27(4): p. 181-5.
101. Langner, G., Periodicity coding in the auditory system. Hear Res, 1992. 60(2): p. 115-42.
102. Wiegrebe, L. and R. Meddis, The representation of periodic sounds in simulated sustained chopper units of the ventral cochlear nucleus. J Acoust Soc Am, 2004. 115(3): p. 1207-18.
103. Langner, G., et al., Frequency and periodicity are represented in orthogonal maps in the human auditory cortex: evidence from magnetoencephalography. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 1997. 181(6): p. 665-676.
104. Krumbholz, K., et al., Neuromagnetic evidence for a pitch processing center in Heschl's gyrus. Cereb Cortex, 2003. 13(7): p. 765-72.
105. Griffiths, T.D., Functional imaging of pitch analysis. Ann N Y Acad Sci, 2003. 999: p. 40-9.
106. Patterson, R.D., et al., The processing of temporal pitch and melody information in auditory cortex. Neuron, 2002. 36(4): p. 767-76.
107. Warren, J.D., et al., Separating pitch chroma and pitch height in the human brain. Proc Natl Acad Sci U S A, 2003. 100(17): p. 10038-42.
108. Zatorre, R.J., A.C. Evans, and E. Meyer, Neural mechanisms underlying melodic perception and memory for pitch. J Neurosci, 1994. 14(4): p. 1908-19.
109. Huron, D., Foundations of Cognitive Musicology. 1999, Berkeley: University of California. [http://www.music-cog.ohio-state.edu/Music220/Bloch.lectures/].
110. Juslin, P.N. and J.A. Sloboda, Music and Emotion: Theory and Research. 2001, Oxford: Oxford University Press.


111. Levitin, D.J., Foundations of Cognitive Psychology: Core Readings. 2002, Cambridge, MA: MIT Press.
112. Näätänen, R., et al., "Primitive intelligence" in the auditory cortex. Trends Neurosci, 2001. 24(5): p. 283-8.
113. Garrod, S. and M.J. Pickering, Why is conversation so easy? Trends Cogn Sci, 2004. 8(1): p. 8-11.
114. Rowe, R., Interactive Music Systems: Machine Listening and Composing. 1994, Cambridge, MA: MIT Press.
115. Minsky, M., The Society of Mind. 1988, New York: Simon & Schuster.
116. Pachet, F., Interacting with a musical learning system: the Continuator, in Music and Artificial Intelligence, C. Anagnostopoulou, M. Ferrand, and A. Smaill, Editors. 2002, Lecture Notes in Artificial Intelligence 2445, Springer Verlag. p. 119-132.
117. Bongers, B., Physical interfaces in the electronic arts: interaction theory and interfacing techniques for real-time performance, in Trends in Gestural Control of Music, M.M. Wanderley and M. Battier, Editors. 2000, Ircam-Centre Pompidou: Paris.
118. Wessberg, J. and M.A. Nicolelis, Optimizing a linear algorithm for real-time robotic control using chronic cortical ensemble recordings in monkeys. Journal of Cognitive Neuroscience, 2004. 16(6): p. 1022-1035.
119. Musallam, S., et al., Cognitive control signals for neural prosthetics. Science, 2004. 305(5681): p. 258-262.
120. Sussman, E., I. Winkler, and W. Wang, MMN and attention: competition for deviance detection. Psychophysiology, 2003. 40(3): p. 430-5.
121. McAdams, S. and A.S. Bregman, Hearing musical streams, in Foundations of Computer Music, C. Roads and J. Strawn, Editors. 1985, The MIT Press: Cambridge, MA.
122. Bigand, E., The influence of implicit harmony, rhythm and musical training on the abstraction of tension-relaxation schemas in tonal musical pieces. Contemporary Music Review, 1993. 9: p. 123-137.
123. Sloboda, J.A., The Musical Mind: The Cognitive Psychology of Music. 1985, Oxford: Oxford University Press.
124. Temperley, D., The Cognition of Basic Musical Structures. 2001, Cambridge, MA: The MIT Press.
125. Pandya, D. and E. Yeterian, Architecture and connections of cortical association areas, in Cerebral Cortex, A. Peters and E. Jones, Editors. 1985, Plenum Press: New York. p. 3-61.
126. Chiry, O., et al., Patterns of calcium-binding proteins support parallel and hierarchical organization of human auditory areas. Eur J Neurosci, 2003. 17(2): p. 397-410.
127. Hubel, D.H. and T.N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J Physiol, 1962. 160: p. 106-54.
128. Drake, C., M.R. Jones, and C. Baruch, The development of rhythmic attending in auditory sequences: attunement, referent period, focal attending. Cognition, 2000. 77(3): p. 251-88.
129. Singh, N.C. and F.E. Theunissen, Modulation spectra of natural sounds and ethological theories of auditory processing. J Acoust Soc Am, 2003. 114(6 Pt 1): p. 3394-411.
130. Theunissen, F.E., et al., Song selectivity in the song system and in the auditory forebrain. Ann N Y Acad Sci, 2004. 1016: p. 222-45.


131. Lewicki, M.S., Efficient coding of natural sounds. Nat Neurosci, 2002. 5(4): p. 356-63.
132. Coath, M. and S.L. Denham, Robust sound classification through the representation of similarity using response fields derived from stimuli during early experience. Biological Cybernetics, 2004. submitted.
133. Davidson, R.J., D.C. Jackson, and C.L. Larson, Human electroencephalography, in Handbook of Psychophysiology (second edition), J.T. Cacioppo, L.G. Tassinary, and G.G. Bernston, Editors. 2000, Cambridge University Press: Cambridge. p. 27-52.
134. Winkler, I., et al., Preattentive auditory context effects. Cogn Affect Behav Neurosci, 2003. 3(1): p. 57-77.
135. Jacobsen, T., et al., Mismatch negativity to pitch change: varied stimulus proportions in controlling effects of neural refractoriness on human auditory event-related brain potentials. Neurosci Lett, 2003. 344(2): p. 79-82.
136. Messner, A.H., et al., Volunteer-based universal newborn hearing screening program. Int J Pediatr Otorhinolaryngol, 2001. 60(2): p. 123-30.
137. Karmel, B.Z., et al., Brain-stem auditory evoked responses as indicators of early brain insult. Electroencephalogr Clin Neurophysiol, 1988. 71(6): p. 429-42.
138. Friederici, A.D., M. Friedrich, and C. Weber, Neural manifestation of cognitive and precognitive mismatch detection in early infancy. Neuroreport, 2002. 13(10): p. 1251-4.
139. Sussman, E., et al., Top-down effects can modify the initially stimulus-driven auditory organization. Brain Res Cogn Brain Res, 2002. 13(3): p. 393-405.
140. de Boer, E., On the residue and auditory pitch perception, in Handbook of Sensory Physiology: Vol. 3, W.D. Keidel and W.D. Neff, Editors. 1976, Springer: New York. p. 479-583.
141. Winkler, I., et al., From objective to subjective: pitch representation in the human auditory cortex. Neuroreport, 1995. 6(17): p. 2317-20.
142. Winkler, I., M. Tervaniemi, and R. Näätänen, Two separate codes for missing-fundamental pitch in the human auditory cortex. J Acoust Soc Am, 1997. 102(2 Pt 1): p. 1072-82.
143. Tervaniemi, M., I. Winkler, and R. Näätänen, Pre-attentive categorization of sounds by timbre as revealed by event-related potentials. Neuroreport, 1997. 8(11): p. 2571-4.
144. Paavilainen, P., et al., Neuronal populations in the human brain extracting invariant relationships from acoustic variance. Neurosci Lett, 1999. 265(3): p. 179-82.
145. Besson, M. and D. Schön, Comparison between language and music. Ann N Y Acad Sci, 2001. 930: p. 232-58.
146. Patel, A.D., et al., Processing syntactic relations in language and music: an event-related potential study. J Cogn Neurosci, 1998. 10(6): p. 717-33.
147. Maess, B., et al., Musical syntax is processed in Broca's area: an MEG study. Nat Neurosci, 2001. 4(5): p. 540-5.
148. Krumhansl, C.L. and R.N. Shepard, Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology, 1979. 5: p. 579-594.
149. Deutsch, D., Octave generalization of specific interference effects in memory for tonal pitch. Percept Psychophys, 1973. 13: p. 271-275.
150. Lieberman, P., Uniquely Human: The Evolution of Speech, Thought and Selfless Behaviour. 1991, Cambridge, MA: Harvard University Press.
151. Nazzi, T., C. Floccia, and J. Bertoncini, Discrimination of pitch contour by neonates. Infant Behavior and Development, 1998. 21: p. 543-554.


152. Laitman, J.T. and J.S. Reidenberg, Advances in understanding the relationship between the skull base and larynx, with comments on the origins of speech. Human Evolution, 1988. 3: p. 101-111.
153. Miranda, E.R., The role of speech synthesis in Requiem per una veu perduda. Organised Sound, 1998. 3(3): p. 235-240.
154. Palmer, C. and M. Kelly, Linguistic prosody and musical meter in song. Journal of Memory and Language, 1992. 31: p. 525-542.
155. Miranda, E.R., Synthesising prosody with variable resolution, in Proceedings of the 110th Audio Engineering Society Convention. 2001. Amsterdam.
156. Cambouropoulos, E., et al., Algorithms for computing approximate repetitions in musical sequences, in Proceedings of the Australasian Workshop on Combinatorial Algorithms. 1999. Perth, Australia.
157. Müllensiefen, D. and K. Frieler, Measuring melodic similarity: human vs. algorithmic judgements, in Proceedings of the Conference on Interdisciplinary Musicology. 2004. Graz, Austria.
158. Ockelford, A., On similarity, derivation and the cognition of musical structure. Psychology of Music, 2004. 32(1): p. 23-74.
159. Grisey, G., Structuration des timbres dans la musique instrumentale, in Timbre: Métaphore pour la Composition, J.-B. Barrière, Editor. 1991, Christian Bourgois: Paris.
160. Saariaho, K., Timbre et harmonie, in Timbre: Métaphore pour la Composition, J.-B. Barrière, Editor. 1991, Christian Bourgois: Paris.
161. Wishart, T. and S. Emmerson, On Sonic Art. 1996, London: Routledge.
162. Vliegen, J., B.C. Moore, and A.J. Oxenham, The role of spectral and periodicity cues in auditory stream segregation, measured using a temporal discrimination task. J Acoust Soc Am, 1999. 106(2): p. 938-45.
163. Meddis, R. and L. O'Mard, A unitary model of pitch perception. J Acoust Soc Am, 1997. 102(3): p. 1811-20.
164. Bernstein, J.G. and A.J. Oxenham, Pitch discrimination of diotic and dichotic tone complexes: harmonic resolvability or harmonic number? J Acoust Soc Am, 2003. 113(6): p. 3323-34.
165. Oxenham, A.J., J.G. Bernstein, and H. Penagos, Correct tonotopic representation is necessary for complex pitch perception. Proc Natl Acad Sci U S A, 2004. 101(5): p. 1421-5.
166. Brown, G. and M. Cooke, Computational auditory scene analysis. Comp Speech Lang, 1994. 8: p. 297-336.
167. Deutsch, D., The processing of pitch combinations, in The Psychology of Music, D. Deutsch, Editor. 1982, Academic Press: New York. p. 271-316.
168. Deutsch, D., Effect of repetition of standard and comparison tones on recognition memory for pitch. J Exp Psychol, 1972. 93(1): p. 156-62.
169. Zatorre, R.J., Neural specializations for tonal processing, in The Cognitive Neuroscience of Music, I. Peretz and R.J. Zatorre, Editors. 2003, Oxford University Press: Oxford.
170. Rauschecker, J.P., Auditory cortical plasticity: a comparison with other sensory systems. Trends Neurosci, 1999. 22(2): p. 74-80.
171. Sharma, J., A. Angelucci, and M. Sur, Induction of visual orientation modules in auditory cortex. Nature, 2000. 404(6780): p. 841-7.
172. Widmer, G., Using AI and machine learning to study expressive music performance: project survey and first report. AI Communications, 2001. 14(3): p. 149-162.


173. Desain, P., et al., Computational modeling of music cognition: problem or solution? Music Perception, 1998. 16(1): p. 151-166.
174. Large, E.W. and M.R. Jones, The dynamics of attending: how people track time-varying events. Psychological Review, 1999. 106(1): p. 119-159.
175. Desain, P. and H. Honing, Computational models of beat induction: the rule-based approach. Journal of New Music Research, 1999. 28(1): p. 29-42.
176. Bod, R., R. Scha, and K. Sima'an, Data-Oriented Parsing. 2003, University of Chicago Press: CSLI Publications.
177. Shamma, S. and D. Klein, The case of the missing pitch templates: how harmonic templates emerge in the early auditory system. J Acoust Soc Am, 2000. 107(5 Pt 1): p. 2631-44.
178. Cohen, M.A., S. Grossberg, and L.L. Wyse, A spectral network model of pitch perception. J Acoust Soc Am, 1995. 98(2 Pt 1): p. 862-79.
179. Zatorre, R.J., Sound analysis in auditory cortex. Trends Neurosci, 2003. 26(5): p. 229-30.
180. Zatorre, R.J. and A.R. Halpern, Effect of unilateral temporal-lobe excision on perception and imagery of songs. Neuropsychologia, 1993. 31(3): p. 221-32.
181. Xiao, Z. and N. Suga, Modulation of cochlear hair cells by the auditory cortex in the mustached bat. Nat Neurosci, 2002. 5(1): p. 57-63.
182. Lopez-Poveda, E.A. and R. Meddis, A human nonlinear cochlear filterbank. J Acoust Soc Am, 2001. 110(6): p. 3107-18.
183. Sumner, C.J., et al., A nonlinear filter-bank model of the guinea-pig cochlear nerve: rate responses. J Acoust Soc Am, 2003. 113(6): p. 3264-74.
