Multilevel rhythms in multimodal communication

Wim Pouw1,2, Shannon Proksch3, Linda Drijvers1,2,†, Marco Gamba4,†, Judith Holler1,2,†, Christopher Kello3,†, Rebecca S. Schaefer5,6,† and Geraint A. Wiggins7,8,†

1Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
2Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
3Cognitive and Information Sciences, University of California, Merced, CA, USA
4Department of Life Sciences and Systems Biology, University of Turin, Turin, Italy
5Health, Medical and Neuropsychology unit, Institute for Psychology, and 6Academy for Creative and Performing Arts, Leiden University, Leiden, The Netherlands
7Vrije Universiteit Brussel, Brussels, Belgium
8Queen Mary University of London, London, UK

†These authors contributed equally to this review.

WP, 0000-0003-2729-6502; SP, 0000-0002-8737-9396; CK, 0000-0003-1588-9474; GAW, 0000-0002-1587-112X

Review. Cite this article: Pouw W, Proksch S, Drijvers L, Gamba M, Holler J, Kello C, Schaefer RS, Wiggins GA. 2021 Multilevel rhythms in multimodal communication. Phil. Trans. R. Soc. B 376: 20200334. https://doi.org/10.1098/rstb.2020.0334

Accepted: 5 April 2021

© 2021 The Author(s) Published by the Royal Society. All rights reserved.
One contribution of 17 to a theme issue 'Synchrony and rhythm interaction: from the brain to behavioural ecology'.

Subject Areas: behaviour, cognition

Keywords: multimodal communication, rhythm, multimodal signalling, cross-species, interaction

Author for correspondence: Wim Pouw, e-mail: wim.pouw@donders.ru.nl

It is now widely accepted that the brunt of animal communication is conducted via several modalities, e.g. acoustic and visual, either simultaneously or sequentially. This is a laudable multimodal turn relative to traditional accounts of the temporal aspects of animal communication, which have focused on a single modality at a time. However, the fields that are currently contributing to the study of multimodal communication are highly varied and still largely disconnected, given their sole focus on a particular level of description or their particular concern with human or non-human animals. Here, we provide an integrative overview of converging findings that show how multimodal processes occurring at neural, bodily, as well as social interactional levels each contribute uniquely to the complex rhythms that characterize communication in human and non-human animals. Though we address findings for each of these levels independently, we conclude that the most important challenge in this field is to identify how processes at these different levels connect.

This article is part of the theme issue 'Synchrony and rhythm interaction: from the brain to behavioural ecology'.

1. Introduction
The rhythms animals can sustain in communicative perception and action characterize in great part their social-ecological niche. It is only recently that disparate research fields have focused on the study of temporal aspects of communication as a truly multimodal process [1–3]. Lessons about the different scales or levels at which multimodal processes happen are, however, still scattered over different fields, such as psycholinguistics [3], neuroscience [4] and evolutionary biology [5]. The goal of this paper is to align some of the important findings of these fields concerning the different ways in which the brain, body and social interaction each contribute uniquely to the temporal structure of multimodal communication (see figure 1 for an overview). Although we overview findings at each level (neural, bodily, social) independently, we hope to stimulate investigation into potential interactions between levels. We provide some broad terminology for the phenomenon of multilevel rhythms in multimodal communication (§2), and then overview rhythmic multimodal processes at the neural-cognitive (§3), peripheral bodily (§4) and social interactional (§5) levels.

2. Concepts and terminology

Multimodal processes interest researchers from largely disparate fields, and consequently terminology varies [1,5,8], where related meanings potentially get lost in translation.
[Figure 1 appears here: a graphical overview of five nested levels (neural processes; (sensory) features; biomechanics; multimodal signals/communication; social interactions), each annotated with its unique contribution to multimodal rhythms, e.g. neural processing of multimodal signals recruits coordinated neural-behavioural channels, bodily constraints embed temporal information about one modality into the signal of another, signals can induce spontaneous movement and physiological synchrony, and the quasi-rhythmic structure of vocal and gestural signals aids turn-taking.]
Figure 1. Multilevel rhythm in multimodal communication. Graphical overview of how each level contributes uniquely to the rhythms sustained in multimodal
communication. Figures are adapted from [6,7] and inspired by Gilbert Gottlieb’s (1929–2006) view on epigenesis.

In box 1, we have marked the terms and their senses that occur throughout our overview. This glossary also aims to capture a very general meaning of specialist terms offered to address a particular process in perception or production, or at a neural, structural or behavioural level. The definitions are as general as possible, for instance so as to underline a continuity of the perception and production of multimodal signals, or so as to include phenomena not traditionally treated as multimodal in nature. For example, in sign languages, both the hands as well as facial and labial expressions are combined in complex utterances [9]. Such complex signs are designed to be received through one sensory channel and are thus unimodal by common definitions (but see [10]). In our view, signed languages are an example of multimodal production in virtue of combining several otherwise independent signalling modes/signal features. Similarly, neural processes can be multimodal in our view, in virtue of coupling neural ensembles that independently would be tuned to differently structured information in the environment. Note that we cannot address all the rich and varied (temporal) functions of complex multimodal signalling [5,8,10]. But in our review, the common thread resonates with a recent overview by Halfwerk et al. [10], who suggest that multimodal signalling functions are not exhausted by simply (i) providing redundant backup information or (ii) combining multiple independent messages. Instead, what is central to the temporal functioning of multimodal systems is that the resulting perception or production of a signal 'is qualitatively different from the sum of the properties of its components' [10, p. 2], i.e. has emergent properties [8].
Box 1. Definitions. Each entry gives the term, the context(s) in which it is used, and examples; the numbers indicate whether the phenomenon concerns the receiver (1) and/or the producer (2).

Phenomenon: a distinct measurable aspect of a system, which can be measured independently of other aspects.
- component (ethology): frequency or duration of a signal; an intellectual instance determining behaviour
- component (mathematics; electronic engineering (EE)): partial at frequency x; regions of energy concentration
- feature (EE; computer science (CS)): spectral centroid; signal onset/offset; duration of a signal
- feature (music; current paper): pitch; fundamental frequency; rhythm; harmony

Phenomenon: a unitary communication event X which is informative about a state of affairs Y to a receiver (1) and/or producer (2).
- cue (1) (ethology): size of an animal, not intentionally communicated
- natural sign (1) ((Peircian) semiotics): footsteps in the sand, not intentionally communicated
- sign (1 and 2) ((Peircian) semiotics; current paper): word or gesture, intentionally communicated; understood in a three-place relation of sign, referential target and the user of the sign

Phenomenon: a sensory and/or effector communication channel conventionally treated as functionally separable from others.
- modality (neuroscience; current paper): specific neural ensembles associated with processing of a specific sensory channel or structure
- modality (psycholinguistics; psychomusicology; ethology; current paper): audition; vision; touch (usually ascribed to senses of the receiver: the receiver processes light signals via the sense of vision)
- mode (movement science; current paper): whispering, phonating; in-phase, anti-phase synchrony; resonance; punching, kicking

Phenomenon: a measurable aspect of a signal-producing system, changing in time, which is used by a receiver system.
- signal (mathematics; EE; CS; current paper): frequency, voltage, amplitude
- signal (ethology): a (sequence of) vocalization(s) or movement(s), etc., intentionally produced for a receiver, e.g. a specific mating call

Phenomenon: informational, temporal and/or mechanical coupling between two or more measurable aspects, the coupling of which benefits communicative purposes; the benefit can be for the recipient (1) and/or the producer (2).
- multimodal cue (1) (ethology; current paper): information about body movement or size from vocal patterning; indexical cues
- multimodal signal (1 and 2) (ethology; psycholinguistics; current paper): sonic communication with facial and/or manual gesture
- multi-component signal (1 and 2) (ethology): combined vocal and visual signalling
- coordination of modes (1 and/or 2) (movement science; current paper): entrainment of neural ensembles for sensory integration; coordination of respiratory, jaw and articulatory modes for speaking; gesture (person 1) and speech (person 2) interactions
3. Neural level: multimodal neural-cognitive processes

Here, we present an overview of how temporal coupling in the production and perception of multimodal signals can be constrained by neural ensembles that are independently tuned towards specifically structured information in the environment. In their multimodal arrangement, they yield unique stabilities for tuning to the rhythms of multimodal communication. Furthermore, some neural ensembles are uniquely specialized to attune to multisensory information.

When integrating a cascade of sensory signals to form a unified, structured percept of the environment, the brain faces two challenges. First, integrating different sensory signals into a unified percept relies on solving the 'binding problem': whether signals need to be integrated or segregated. Second, these sensory signals require integration with prior and contextual knowledge to weigh their uncertainty.

The neural integration of multiple sensory signals is describable at several neural levels and measurable using wide-ranging methods (e.g. single-unit recordings, optogenetics, EEG, MEG, fMRI, combined with psychophysical experiments [11–13]). Although the potential multisensory integration mechanisms are debated, the integration likelihood of two signals seems highly dependent on the degree of spatio-temporal coherence between those signals: unisensory signals that are closer in time and space have a greater likelihood of being integrated (cf. [3] and §5).
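This dependence can be made concrete with a toy 'temporal binding window': the probability that two unisensory signals are bound into one percept falls off with their temporal offset. The sketch below is a minimal illustration of that idea, not a model from the cited studies; the Gaussian form and the 100 ms window width are illustrative assumptions.

```python
import numpy as np

def integration_likelihood(asynchrony_ms, window_sd_ms=100.0):
    """Toy model: likelihood that two unisensory signals are bound into
    one percept, falling off with their temporal offset. The Gaussian
    form and 100 ms width are illustrative assumptions, not estimates."""
    return np.exp(-0.5 * (asynchrony_ms / window_sd_ms) ** 2)

# A visual cue lagging the sound by 50 ms is far more likely to be
# integrated than one lagging by 300 ms.
for lag in (0, 50, 150, 300):
    print(f"lag {lag:>3} ms -> p(integrate) ~ {integration_likelihood(lag):.2f}")
```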
Both human and non-human animal research demonstrates that multisensory neurons in the superior colliculus respond more robustly to spatio-temporally congruent audio-visual cues than to individual sensory cues [14–16]. For example, in macaques (Macaca mulatta), single-unit activity measurements in one specific area of the superior temporal sulcus (the anterior fundus) show unique sensitivity to facial displays when these are temporally aligned with vocal information, while other areas (the anterior medial) are sensitive to facial displays alone [17]. Behavioural evidence of multisensory integration is shown in the territorial behaviour of dart-poison frogs (Epipedobates femoralis), which aggress conspecifics more when auditory and visual cues are sufficiently spatio-temporally aligned [18]. Note, though, that multimodal temporal alignment need not entail synchronization but can specifically involve structured sequencing (i.e. alignment at a lag). This is evidenced by research on a flycatcher species (Monarcha castaneiventris), which is uniquely responsive to long-range-emitted song followed by sight of the plumage colour of potential territorial rivals, as opposed to the reversely ordered, synchronized or unimodal presentation of these cues [19]. Integration by temporally aligned presentation can be a developmentally acquired disposition, as research in cats shows that the development of multisensory integration in the superior colliculus depends on exposure to spatio-temporally coherent visual and auditory stimuli early in life [12].

Although the lower level and higher level multimodal integration mechanisms are not well understood, both feedback and feed-forward interactions between early and higher level cortices might be relevant for integration. Specifically, it has been hypothesized that synchronized neural oscillations provide a mechanism for multisensory binding and for selecting information that matches across sensory signals [20]. Here, coherent oscillatory signals are thought to allow for functional connectivity between spatially distributed neuronal populations, where low-frequency neural oscillations provide temporal windows for cross-modal influences [21]. This synchronization can occur through neural entrainment and/or phase resetting, which might be relevant for phase reorganization of ongoing oscillatory activity, so that high-excitability phases align to the timing of relevant events [21]. New methods, such as rapid invisible frequency tagging [22–24], might clarify how multisensory signals are neurally integrated, and what the role of low-frequency oscillations is in this process over time. Moreover, novel approaches focusing on moment-to-moment fluctuations in oscillatory activity, combined with methods with increased spatial resolution (e.g. ECoG/depth-electrode recordings), could significantly advance our knowledge of the role of oscillatory activity in routing and integrating multisensory information across different neural networks [21]. This will be especially relevant in more complex, higher level multimodal binding scenarios, such as (human) communication.
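The phase-resetting idea can be sketched with a single phase oscillator whose phase is partially pulled towards its high-excitability point by each stimulus onset, so that excitable phases come to align with a rhythmic input. The oscillator frequency, reset strength and stimulus rate below are arbitrary illustrative choices, not parameters from the cited work.

```python
import numpy as np

# A 4 Hz 'neural' oscillator receives stimulus onsets every 250 ms;
# each onset pulls the phase towards 0 (its high-excitability phase).
dt, f, reset_strength = 0.001, 4.0, 0.3
t = np.arange(0.0, 2.0, dt)
onsets = np.arange(0.25, 2.0, 0.25)            # isochronous stimulus train
phase = np.zeros_like(t)
phase[0] = 2.0                                  # start misaligned
for i in range(1, len(t)):
    phase[i] = phase[i - 1] + 2 * np.pi * f * dt
    if np.any(np.abs(onsets - t[i]) < dt / 2):  # onset: partial phase reset
        phase[i] -= reset_strength * np.sin(phase[i])
phase = np.angle(np.exp(1j * phase))            # wrap to (-pi, pi]

# The phase at successive onsets drifts towards 0: entrainment by resetting.
for on in onsets[-3:]:
    print(f"onset at {on:.2f} s: phase = {phase[int(on / dt)]:+.2f} rad")
```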
Communicative signals in naturalistic settings arguably include multiple features that work together to maximize their effectiveness. Different sensory modalities may operate at different timescales, with specific well-matched combinations of features across modalities leading to common cross-modal mappings that are intuitively associated (e.g. visual size and auditory loudness, cf. [25–27]). Prominent well-matched cross-modal mappings (see §§4 and 5) are sensorimotor mappings: signals transmitted to, from, or within visuomotor and auditory-motor systems. Given the high sensitivity of the auditory system to periodic signals aligned with motor periodicities [28,29], auditory signals often entrain movement, with examples seen in various kinds of joint action (e.g. marching or other timed actions). Less commonly, visual signals serve this purpose, as seen with musical conductors. Moreover, perception of both auditory and visual rhythms shares neural substrates with the motor system in terms of timing mechanisms [30]. While the auditory modality seems better suited than the visual modality to guide movement [31], it appears that within different sensory modalities, different features may be better suited to cue movement [32]. For example, movement is most easily cued by discrete events in the auditory domain (e.g. beeps), followed by continuously moving objects in the visual domain (e.g. moving bars) [32]. For discrete visual stimuli (e.g. flashes) or continuous auditory stimuli (e.g. a siren), sensorimotor synchronization is less stable (see [33] for similar results in audio-visual speech). By contrast with humans, rhesus macaques (Macaca mulatta) more easily synchronize to discrete visual cues [34], perhaps due to weaker audiomotor connections in the macaque brain [35]. These findings indicate that multimodal perception is not simply a matter of adding more modalities; rather, it is the combination of temporal structure and signal content that affects behavioural performance and neural activations [36,37]. Moreover, compelling arguments based on multimodal mating signals in a range of species, as reviewed by Halfwerk et al. [10], suggest that exactly this integration of signals, leading to a multimodal percept rather than a main and a secondary modality, is what makes them informative.
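Synchronization stability of the kind compared across modalities above is typically quantified by the variability of tap-stimulus asynchronies. A minimal sketch of that measure follows, with fabricated tap times rather than data from [32–34]; the slightly negative mean asynchrony mimics the well-known tendency for taps to precede the beat [38].

```python
import numpy as np

# Metronome onsets every 500 ms, plus two fictitious tap series:
# a stable one (small asynchrony jitter) and an unstable one.
rng = np.random.default_rng(1)
onsets = np.arange(0.5, 10.5, 0.5)
stable_taps = onsets - 0.03 + rng.normal(0, 0.01, onsets.size)
unstable_taps = onsets - 0.03 + rng.normal(0, 0.05, onsets.size)

def asynchrony_stats(taps, onsets):
    """Mean and SD of tap-minus-onset asynchronies (s); a lower SD
    indicates more stable sensorimotor synchronization."""
    asyn = taps - onsets
    return asyn.mean(), asyn.std()

for name, taps in (("stable", stable_taps), ("unstable", unstable_taps)):
    m, sd = asynchrony_stats(taps, onsets)
    print(f"{name:>8}: mean asynchrony {m * 1000:+.0f} ms, SD {sd * 1000:.0f} ms")
```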
Behavioural and neural studies show that temporal structures in one sensory domain can affect processing in another. For example, auditory [38] or even multisensory rhythmic cues such as music or a metronome [28] not only regularize movement (i.e. change motion trajectories as compared to uncued movements) but also entrain visual attention [37,39], by increasing visual sensitivity at time points predicted to be salient by an auditory stimulus. The neural underpinnings of such interactions are largely unclear. Music-cued versus non-cued movement leads to additional neural activation in motor areas, specifically the cerebellum [40,41], suggesting that the neural activations related to multimodal processing are synergetic. This may explain findings of enhanced learning with multimodal cues, for instance when auditory feedback of movement (or sonification) is provided [42,43]. Even when multimodal embedding of motor learning does not show clear behavioural increases, differences in learning-related neural plasticity have been reported for novices learning a new motor sequence to music as compared to without music [44], suggesting that the learning process is implemented qualitatively differently [45].

Taken together, different sensory modalities, and the features embedded in their signals, have different sensitivities for specific timescales, making some features especially suitable for cross-modal combinations. When investigating features that naturally combine, behavioural and neural responses emerge which amount to more than a simple addition of multiple processes.

4. Body level: multimodal signalling and peripheral bodily constraints

Understanding rhythmic multimodal communication also requires a still underdeveloped understanding of peripheral bodily constraints (henceforth, biomechanics) in the production of multimodal signals. Here, we overview findings which show how multimodal signalling sometimes exploits physical properties of the body in the construction of temporally complex signals.
Speech is putatively a superordinate mode of coordination between what were originally stable, independent vocal and mandibular action routines [46]. In chimpanzees (Pan troglodytes), non-vocal lip smacking occurs in the theta range (approx. 3–8 Hz) typical of the speech envelope and labial kinematics of human speech [47]. Marmosets (Callithrix jacchus) occupy bistable modes of vocal-articulatory coordination, where mandibular oscillation is only synchronized at the characteristic theta range with vocal modulations at the final, but not the starting, segments of the call [48]. Similarly, in the zebra finch (Taeniopygia guttata), respiratory pulses are timed with syrinx activity and rapid beak movements, the coordination of which is held to sustain the highly varied vocalization repertoire of this bird species [49]. Human speech is characterized by even more hierarchically nested levels of such coordinated periodicities of effectors, and is in this sense multimodal [50].

Human communicative hand gestures have acceleration peaks co-occurrent with emphatic stress in speech, and the two remain tightly and dynamically coupled under adverse conditions, though with more temporal variability for more complex symbolizing gestures [51]. This coupling of gestures' acceleration-induced forces and speech can arise biomechanically from upper limb–respiratory coupling, e.g. by soliciting anticipatory muscle adjustments to stabilize posture during gesture [52], adjustments which also include the respiratory-controlling muscles supporting speech vocalization [53]. Comparable biomechanical interactions and synergies were found in other animals long before such associations were raised to explain aspects of human multimodal prosody. In brown-headed cowbirds (Molothrus ater), vocalizations are produced with specific respiratory-related abdominal muscle activity. Such modulations are reduced during vocalizing while moving the wings for visual display, even though air sac pressure is maintained. This suggests that visual displays in cowbirds biomechanically interact with the respiratory dynamics supporting vocalization [54]. During their more vigorous wing displays, these birds are vocally silent, likely so as to avoid the biomechanical instability of singing and moving vigorously at the same time. Such biomechanical interactions are consistent with findings on the wing beats of flying bats (e.g. Pteronotus parnellii), which are synchronized with echo vocalizations due to locomotion–respiratory biomechanical synergies [55]. The echo vocalizations during flight are often isochronously structured (at 6–12 Hz), and this rhythmic ability is attributed to locomotion–respiratory couplings as they share a temporal structure. However, isochrony (at 12–24 Hz) has also been observed in stationary bats producing social vocalizations [56]. In this way, biomechanical stabilities from one domain may have scaffolded the rhythmic vocal capabilities that are sustained in social vocal domains [57].
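One common way to quantify the gesture-speech coupling described above is to cross-correlate a kinematic time series (e.g. hand acceleration) with the speech amplitude envelope and read off the lag at which their correlation peaks. The sketch below does this on synthetic signals; the sampling rate, signal shapes and the 50 ms lag are invented for illustration, not taken from [51,53].

```python
import numpy as np

# Synthetic 100 Hz time series: 'acceleration' bursts that lead
# 'envelope' bursts by 50 ms, standing in for gesture and speech data.
fs = 100
t = np.arange(0, 5, 1 / fs)
burst = lambda centre: np.exp(-0.5 * ((t - centre) / 0.05) ** 2)
acceleration = burst(1.0) + burst(2.2) + burst(3.7)
envelope = burst(1.05) + burst(2.25) + burst(3.75)   # lags by 50 ms

def peak_lag(x, y, fs, max_lag_s=0.5):
    """Lag (s) at which y best correlates with x; positive means y lags x."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    lags = np.arange(-int(max_lag_s * fs), int(max_lag_s * fs) + 1)
    r = [np.corrcoef(x[max(0, -k):len(x) - max(0, k)],
                     y[max(0, k):len(y) - max(0, -k)])[0, 1] for k in lags]
    return lags[int(np.argmax(r))] / fs

print(f"envelope lags acceleration by ~{peak_lag(acceleration, envelope, fs):.2f} s")
```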
Rhesus macaques assume different facial postures with particular vocalizations. Lips usually protrude when emitting coos or grunts (e.g. during mother–infant contact or group progression). During the emission of screams (e.g. copulation or threats), lips retract [58]. In macaques, facial gestures are associated with peculiar vocal tract shapes, which influence acoustic signals during phonation [59] and can be discriminated by conspecific listeners [60]. Relatedly, in humans, perceiving lip postures allows the perceiver to derive a /ba/ or /pa/ from an auditory signal. It is this auditory–visual–motor co-regularity that makes the visual or haptic perception of articulatory gestures possible in the classic McGurk effect [61]. Recently, a 'manual gesture McGurk effect' has been discovered [62]. When asked to detect a particular lexical stress in a uniformly stressed speech sequence, participants who see a hand gesture's beat timed with a particular speech segment tend to hear a lexical stress for that segment [62]. We think it is possible that the gesture–speech–respiratory link reviewed above is actually important for understanding the manual McGurk effect, as listeners attune to features of the visual–acoustic signal that are informative about such coordinated modes of production [63]. Similarly, communicative gestures can also influence the heard duration of musical notes. For example, the absolute duration of a percussive tone sounds longer to an audience member who sees a long versus a short percussion gesture [64,65].

Furthermore, spontaneous movements are naturally elicited by music. Whether this spontaneous movement stems from generalizable cross-modal associations is debated, but such associations might be identified when properly related to biomechanics. For instance, hierarchical bodily representations of metre can be elicited in spontaneous music-induced movement, with different aspects of the metre embodied in hand, torso or full-arm movements [66]. Additionally, specific coordination patterns emerge between different body parts of interacting musicians during musical improvisation [67]. Thus, what one hears in music might be constrained by which body part can be optimally temporally aligned with a feature in the music.

Detecting multimodal cues in this way may be very closely related to detecting indexical signals, such as hearing the potential strength of a conspecific from vocal qualities [60,68]. Indexical signals are often the result of perceptual and/or morphological specialization to detect or convey features from multimodal couplings. For example, frogs (Physalaemus pustulosus) and frog-eating bats (Trachops cirrhosus) have learned to attune to frog calls in relation to the water ripples produced by the calling frog's vocal sac deformations [69]. Similarly, crested pigeons (Ochyphaps lophotes) are alarmed by the sounds of high-velocity wing beats of conspecifics, whose feathers turn out to have morphologically evolved to produce the aeroelastic flutter needed to sustain these unique alarm calls during fleeing locomotion [70]. In broad-tailed hummingbirds (Selasphorus platycercus), the characteristic high-speed courtship dives seem to be driven to attain exactly the right speeds to elicit sonification from aeroelastic flutter, synchronized with attaining the correct angle of transition relative to the to-be-impressed perceiver, so that the gorget dramatically changes colour during sound production [71]. In sum, multimodal communication sometimes involves a specialized exploitation of, or attunement to, physics that constrains particular (combined) modes of acting (with the environment).

Note that the multimodal information embedded in communicative acoustic signals can have impacts on complex communication in humans too. Speakers who cannot see but only hear each other tend to align patterns of postural sway, suggesting that vocal features are used to coordinate a wider embodied context [72]. These emergent coordinations are found to increase social affiliation and can align bodily processes [73]. For example, synchronized drumming in groups synchronizes physiology, aligning participants' heartbeats [74]. Further, visual observation of interpersonal synchronous movement between others may lead observers
to rate higher levels of rapport (liking) between the interacting individuals [75], and increase an audience's affective and aesthetic enjoyment of a group dance performance [76].

To conclude, we have overviewed examples of peripheral bodily constraints which influence the perception and production of multimodal signals across species. Specifically, these biomechanical processes mediate the temporal structuring of multimodal communicative signals.

5. Social level: complex rhythms in interactive multimodal communication

In this section, we overview how social interaction complexifies the rhythms which are sustained in communication, relative to the rhythms that would arise out of simpler sending or receiving of signals.

Temporal structure is often rhythmic, but studies have also found quasi-rhythmic structure in the sounds of speech, music and animal communication [77], and likewise in the movements produced while talking, singing or performing music [78]. The multiscale character of these sounds and movements is readily illustrated in speech: phonemes of varying durations combine to create longer syllabic units with more variability in duration, which combine to form phrases with even more variability in length, and so on, thus creating quasi-rhythmicity at each timescale.

The durations of linguistic units like phonemes and syllables are difficult to measure in the acoustic speech signal, but they generally correspond to modulations in a specific feature of the acoustic signal, called the amplitude envelope. Within the amplitude envelope, units are expressed in terms of bursts and lulls of energy, and their temporal patterning can be distilled in the timing of bursts via peak amplitudes. Speech analysis [79] shows that smaller bursts cluster to form larger bursts, and larger bursts cluster to form even larger bursts, across timescales that roughly correspond with (phonemic, syllabic, phrasal) units of language. Musical recordings also exhibit degrees of multiscale structure whose specifics depend on the genre of music or type of speech performance [77]. Even recordings of animal vocalizations have been found to exhibit multiscale structure using those same analysis methods. While we do not have access to the underlying units, recordings of communicative vocalizations produced by killer whales were found to have a quasi-rhythmic structure across timescales surprisingly similar to that of human speech interactions [77].
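This burst-and-cluster analysis can be sketched as a pipeline: extract the amplitude envelope (e.g. via the Hilbert transform), take burst times from envelope peaks, and quantify clustering across timescales. One measure used for such multiscale event clustering is the Allan factor; the code below is a schematic version on a synthetic amplitude-modulated noise signal, not the exact procedure of [77,79].

```python
import numpy as np
from scipy.signal import hilbert, find_peaks

# Toy 'speech' signal: noise amplitude-modulated at a syllable-like
# rate (5 Hz); stands in for a real recording.
fs = 1000
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(3)
signal = rng.standard_normal(t.size) * (1 + np.sin(2 * np.pi * 5 * t))

# Amplitude envelope via the Hilbert transform, lightly smoothed.
envelope = np.abs(hilbert(signal))
envelope = np.convolve(envelope, np.ones(50) / 50, mode="same")

# Burst timing distilled from peak amplitudes in the envelope.
peaks, _ = find_peaks(envelope, height=envelope.mean())
event_times = t[peaks]

def allan_factor(events, window_s, total_s):
    """Variance-to-mean measure of event counts in windows of one size;
    growth across window sizes indicates nested clustering of bursts."""
    edges = np.arange(0, total_s + window_s, window_s)
    counts, _ = np.histogram(events, bins=edges)
    return np.mean(np.diff(counts) ** 2) / (2 * counts.mean())

for w in (0.05, 0.2, 1.0, 5.0):
    print(f"window {w:>4.2f} s: Allan factor = {allan_factor(event_times, w, 60):.2f}")
```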
Multiscale structure in speech and music is also multimodal. Analyses of sounds and movements in video recordings have found coordinated multiscale structures in the amplitudes of co-speech face, head and body movements [78], and the degree of coordination in speech sounds and movements depends on the communicative context. Studies of rhythmic structure have also found that visual communicative signals are tightly coordinated with the acoustic signals of speech [3]. However, while gestures with a beating quality coincide closely with pitch peaks, on the semantic level object- or action-depicting gestures frequently precede corresponding lexical items by several hundred milliseconds [80]. Facial signals, too, can precede the speech they relate to [81]. Variable timing is most obvious if we consider multimodal utterances in their entirety, where speech is embedded in a rich infrastructure of visual signals coming from the hands, head, face, torso, etc. [3]. These different signals are typically not aligned in time but distributed over the entire length of utterances and beyond, with varying onsets and offsets.

Typically, multimodal utterances in human social interaction are produced within a scaffold of speaking turns. Sacks et al. [82] propose that interlocutors abide by a clear set of rules which, combined with linguistic information (semantics, pragmatics, prosody and syntax), afford precise timing of turns, yielding minimal gaps and overlaps. Indeed, quantitative cross-language analyses support this tight temporal coupling [83], in line with a putative 'interaction engine' providing cognitive-interactional predispositions for this human ability [84], though gestural turn exchanges in bonobos point towards an evolutionary precursor [85].

The rhythmical structure may further facilitate the temporal coupling of turns. Wilson & Wilson [86] specify a mechanism by which interlocutors' endogenous oscillators are anti-phase coupled, allowing next speakers to launch their turn 'on time' while decreasing the chance of overlap. This may be enhanced through temporal projections derived from linguistic information [87], but the rhythmical abilities grounding this mechanism are evolutionarily basic [88]. Wild non-human primates, like indris, gibbons and chimpanzees, show coordination during joint vocal output, suggesting the ability to coordinate to auditory rhythms [89,90]. The captive chimpanzee Ai was able to synchronize her keyboard tapping with an acoustic stimulus [91], and captive macaques can flexibly adjust their tapping in anticipation of the beat of a visual metronome [34]. Moreover, cotton-top tamarins and marmosets have been observed to avoid initiating, and to adjust the duration and onset of, their calls such that they avoid interfering noise [92].
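The anti-phase coupled-oscillator proposal of Wilson & Wilson can be illustrated with two phase oscillators whose attractive coupling is offset by half a cycle, so that each speaker's high-readiness phase alternates with the other's. The sketch below is a minimal Kuramoto-style rendering of the idea; the frequencies, coupling strength and noise level are illustrative choices, not parameters from [86].

```python
import numpy as np

rng = np.random.default_rng(0)
dt, steps = 0.01, 2000
k = 2.0                                   # anti-phase coupling strength
freq = np.array([3.0, 3.1])               # syllable-rate oscillators (Hz)
theta = np.array([0.0, 0.5])

history = np.zeros((steps, 2))
for i in range(steps):
    # The '+ pi' makes the attractive coupling settle in anti-phase,
    # so the two speakers' high-readiness phases alternate in time.
    coupling = k * np.sin(theta[::-1] - theta + np.pi)
    theta = theta + (2 * np.pi * freq + coupling) * dt + rng.normal(0, 0.05, 2)
    history[i] = theta

rel = np.angle(np.exp(1j * (history[:, 0] - history[:, 1])))
print(f"mean |relative phase| over last 5 s: {np.abs(rel[-500:]).mean():.2f} rad "
      "(close to pi = anti-phase)")
```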
However, conversational turn-taking is also characterized by temporal variation, including periods of overlap and gaps ranging up to hundreds of milliseconds [83,93]. The full breadth of factors influencing turn transition times remains opaque, but turn duration, syntactic complexity, word frequency and social action are some of them [94]. A coupled-oscillator turn-taking mechanism can accommodate this large variation in turn timing, since entrained interlocutors could begin speaking at any new anti-phased periodic cycle [86,88]. A recent study based on telephone interactions shows a quasi-rhythmic structure regulated by turn-by-turn negative autocorrelations [95]. The coupled-oscillator mechanism that may form the basis for dealing with quasi-rhythmicity at the interactional level may also govern communication in non-human species, such as the interactional synchronization of non-isochronous call patterns in the katydid Mecopoda [96].
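The negative lag-1 autocorrelation reported in [95], where a long gap tends to be followed by a short one and vice versa, is straightforward to compute from a series of floor-transfer offsets. A sketch with fabricated turn-gap data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Fabricated floor-transfer offsets (s): each gap partially compensates
# the previous one, producing turn-by-turn alternation around 200 ms.
gaps = [0.2]
for _ in range(199):
    gaps.append(0.2 - 0.5 * (gaps[-1] - 0.2) + rng.normal(0, 0.05))
gaps = np.array(gaps)

def lag1_autocorrelation(x):
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# A negative value indicates compensation: long gaps tend to be
# followed by short ones, as in the telephone corpus of [95].
print(f"lag-1 autocorrelation of turn gaps: {lag1_autocorrelation(gaps):.2f}")
```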
To conclude, the temporal organization of intentional communication is an intricate matter, characterized, on the one hand, by synchrony serving the amplification of signals or specific features/components thereof, as well as semantic enhancement and smooth coordination between interlocutors. On the other hand, the temporal organization is characterized by quasi-rhythmic, multiscale structure within and across modalities, serving complex communication and coordination patterns that are widespread in communicative animal vocalizations, human speech and even music.

6. Conclusion

We have argued that to understand the communicative rhythms which characterize animal communication, a multimodal
perspective is necessary and multiple levels need to be examined. The current overview takes the first step towards a multilevel multimodal approach, showing how each level (neural, bodily and interactive) uniquely contributes to the communicative rhythms of animals. We think that when processes on these levels are understood, we can productively investigate why the rhythms of, for example, human conversation are so complexly varied. Though we have addressed the unique contributions at each level independently, the biggest challenge is understanding how levels intersect.

A historic lesson in this regard comes from early theories about human vocalization. Early theories held that phonation was actively neurally driven, such that active muscle contractions would be needed to complete each vocal fold cycle [97]. This hypothesis was soon refuted in favour of a biomechanical theory [98], which correctly posited that vocal fold oscillation arises out of more neurally passive dynamics. Namely, vocal fold oscillations arise due to air pressure flux around a tensed elastic material (i.e. the vocal folds).
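The contrast can be made vivid with a toy self-sustained oscillator: given only a constant, non-oscillatory energy source (standing in for steady subglottal air pressure), the system settles into a stable limit cycle on its own, with no cycle-by-cycle driving. The van der Pol equation below is a generic stand-in for such dynamics, not the vocal fold model of [98].

```python
# Van der Pol oscillator: x'' - mu*(1 - x^2)*x' + x = 0. Here 'mu' acts
# like a constant energy supply (there is no periodic input), yet a
# stable oscillation emerges and saturates: a limit cycle, not
# cycle-by-cycle driving.
mu, dt = 1.0, 0.001
x, v = 0.01, 0.0                        # tiny perturbation, no rhythmic drive
window_max, window_s = 0.0, 10.0
for step in range(1, int(60.0 / dt) + 1):
    a = mu * (1.0 - x * x) * v - x      # acceleration from the vdP equation
    x, v = x + v * dt, v + a * dt
    window_max = max(window_max, abs(x))
    if step % int(window_s / dt) == 0:
        print(f"peak |x| in 10 s window: {window_max:.2f}")
        window_max = 0.0
# Output grows from near zero and settles around 2.0: self-sustained
# oscillation from a constant (non-oscillatory) energy source.
```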


Similarly, neurally passive dynamics have been discovered in the infrasonic phonations of elephants (Loxodonta africana) [99]. But interestingly, it turns out that for several cat species low-frequency purring is actively neuro-muscularly driven to complete each cycle [100]. The lesson is that the neural-cognitive mechanisms that are invoked in our explanations of rhythmic communication will crucially depend on our knowledge of biomechanics, and any redundancies present biomechanically can completely reshape the type of neural-cognitive control mechanisms that need be invoked. In the same way, understanding the unique neural constraints can lead to the discovery that neural-cognitive mechanisms need to be in place to exploit certain bodily capacities [101]. A recent integrative approach has been proposed in the understanding of beat perception and motor synchronization, where it is suggested that a network of biological oscillators is at play when moving to a rhythm, involving both neurally passive dynamic biomechanics and neural processes [28]. Finally, social interactions allow for new rhythmic stabilities that are simply absent or qualitatively different in nature from non-interactive set-ups [102]. Indeed, there are increasingly loud calls for understanding neural processes as sometimes softly assembling into a wider distributed multi-person system in social interactions [103–106]. The current contribution further underlines a call for such a multiscale investigation of the temporal rhythms of multimodal communication, where neural processes are properly embedded in bodily processes unfolding in social interaction.

Data accessibility. This article has no additional data.
Authors' contributions. W.P. led the writing of the manuscript. S.P. designed the figure. W.P., S.P., L.D., M.G., J.H., C.K., R.S. and G.A.W. wrote the manuscript.
Competing interests. We declare we have no competing interests.
Funding. W.P. is supported by a Donders Fellowship and is financially supported by the Language in Interaction consortium project 'Communicative Alignment in Brain & Behaviour' (CABB). L.D. is supported by a Minerva Fast Track Fellowship from the Max Planck Society. L.D. and J.H. are supported by the European Research Council (CoG grant no. 773079, awarded to J.H.).
Acknowledgements. We would like to thank the organizers of the Lorentz workshop 'Synchrony and rhythm interaction' for their leadership in the field.

References
1. Partan SR, Marler P. 2005 Issues in the classification of multimodal communication signals. Am. Nat. 166, 231–245. (doi:10.1086/431246)
2. Fröhlich M, Sievers C, Townsend SW, Gruber T, van Schaik CP. 2019 Multimodal communication and language origins: integrating gestures and vocalizations. Biol. Rev. 94, 1809–1829. (doi:10.1111/brv.12535)
3. Holler J, Levinson SC. 2019 Multimodal language processing in human communication. Trends Cogn. Sci. 23, 639–652. (doi:10.1016/j.tics.2019.05.006)
4. Stein BE, Stanford TR. 2008 Multisensory integration: current issues from the perspective of the single neuron. Nat. Rev. Neurosci. 9, 255–266. (doi:10.1038/nrn2331)
5. Hebets EA, Papaj DR. 2005 Complex signal function: developing a framework of testable hypotheses. Behav. Ecol. Sociobiol. 57, 197–214. (doi:10.1007/s00265-004-0865-7)
6. Dimensions.Guide | Database of Dimensioned Drawings. See https://www.dimensions.guide (accessed 1 May 2019).
7. Chilton J. 2020 Brain outline. Zenodo. (doi:10.5281/zenodo.3925989)
8. Partan SR, Marler P. 1999 Communication goes multimodal. Science 283, 1272–1273. (doi:10.1126/science.283.5406.1272)
9. Sandler W. 2018 The body as evidence for the nature of language. Front. Psychol. 9, 1782. (doi:10.3389/fpsyg.2018.01782)
10. Halfwerk W, Varkevisser J, Simon R, Mendoza E, Scharff C, Riebel K. 2019 Toward testing for multimodal perception of mating signals. Front. Ecol. Evol. 7, 124. (doi:10.3389/fevo.2019.00124)
11. Stein BE, Meredith MA. 1993 The merging of the senses. Cambridge, MA: MIT Press.
12. Xu J, Yu L, Stanford TR, Rowland BA, Stein BE. 2015 What does a neuron learn from multisensory experience? J. Neurophysiol. 113, 883–889. (doi:10.1152/jn.00284.2014)
13. Yu L, Cuppini C, Xu J, Rowland BA, Stein BE. 2019 Cross-modal competition: the default computation for multisensory processing. J. Neurosci. 39, 1374–1385. (doi:10.1523/JNEUROSCI.1806-18.2018)
14. Meredith MA, Stein BE. 1986 Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. J. Neurophysiol. 56, 640–662. (doi:10.1152/jn.1986.56.3.640)
15. Stein BE, Meredith MA, Huneycutt WS, McDade L. 1989 Behavioral indices of multisensory integration: orientation to visual cues is affected by auditory stimuli. J. Cogn. Neurosci. 1, 12–24. (doi:10.1162/jocn.1989.1.1.12)
16. Burnett LR, Stein BE, Perrault TJ, Wallace MT. 2007 Excitotoxic lesions of the superior colliculus preferentially impact multisensory neurons and multisensory integration. Exp. Brain Res. 179, 325–338. (doi:10.1007/s00221-006-0789-8)
17. Khandhadia AP, Murphy AP, Romanski LM, Bizley JK, Leopold DA. 2021 Audiovisual integration in macaque face patch neurons. Curr. Biol. 31, 1826–1835. (doi:10.1016/j.cub.2021.01.102)
18. Narins PM, Grabul DS, Soma KK, Gaucher P, Hödl W. 2005 Cross-modal integration in a dart-poison frog. Proc. Natl Acad. Sci. USA 102, 2425–2429. (doi:10.1073/pnas.0406407102)
19. Uy JAC, Safran RJ. 2013 Variation in the temporal and spatial use of signals and its implications for multimodal communication. Behav. Ecol. Sociobiol. 67, 1499–1511. (doi:10.1007/s00265-013-1492-y)
20. Senkowski D, Schneider TR, Foxe JJ, Engel AK. 2008 Crossmodal binding through neural coherence: implications for multisensory processing. Trends Neurosci. 31, 401–409. (doi:10.1016/j.tins.2008.05.002)
21. Bauer A-KR, Debener S, Nobre AC. 2020 Synchronisation of neural oscillations and cross-modal influences. Trends Cogn. Sci. 24, 481–495. (doi:10.1016/j.tics.2020.03.003)
22. Drijvers L, Spaak E, Jensen O. 2020 Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual semantic information. bioRxiv 067454. (doi:10.1101/2020.04.29.067454)
23. Zhigalov A, Herring JD, Herpers J, Bergmann TO, Jensen O. 2019 Probing cortical excitability using rapid frequency tagging. Neuroimage 195, 59–66. (doi:10.1016/j.neuroimage.2019.03.056)
24. Duecker K, Gutteling TP, Herrmann CS, Jensen O. 2020 No evidence for entrainment: endogenous gamma oscillations and rhythmic flicker responses coexist in visual cortex. bioRxiv 279497. (doi:10.1101/2020.09.02.279497)
25. Eitan Z, Granot RY. 2006 How music moves: musical parameters and listeners' images of motion. Music Percept. Interdiscip. J. 23, 221–248. (doi:10.1525/mp.2006.23.3.221)
26. Küssner MB, Tidhar D, Prior HM, Leech-Wilkinson D. 2014 Musicians are more consistent: gestural cross-modal mappings of pitch, loudness and tempo in real-time. Front. Psychol. 5, 789. (doi:10.3389/fpsyg.2014.00789)
27. Schaefer RS, Beijer LJ, Seuskens W, Rietveld TCM, Sadakata M. 2016 Intuitive visualizations of pitch and loudness in speech. Psychon. Bull. Rev. 23, 548–555. (doi:10.3758/s13423-015-0934-0)
28. Damm L, Varoqui D, De Cock VC, Dalla Bella S, Bardy B. 2020 Why do we move to the beat? A multi-scale approach, from physical principles to brain dynamics. Neurosci. Biobehav. Rev. 112, 553–584. (doi:10.1016/j.neubiorev.2019.12.024)
29. Morillon B, Baillet S. 2017 Motor origin of temporal predictions in auditory attention. Proc. Natl Acad. Sci. USA 114, E8913–E8921. (doi:10.1073/pnas.1705373114)
30. Schubotz RI, Friederici AD, von Cramon DY. 2000 Time perception and motor timing: a common cortical and subcortical basis revealed by fMRI. Neuroimage 11, 1–12. (doi:10.1006/nimg.1999.0514)
31. Repp BH, Penel A. 2004 Rhythmic movement is attracted more strongly to auditory than to visual rhythms. Psychol. Res. 68, 252–270. (doi:10.1007/s00426-003-0143-8)
32. Hove MJ, Fairhurst MT, Kotz SA, Keller PE. 2013 Synchronizing with auditory and visual rhythms: an fMRI assessment of modality differences and modality appropriateness. Neuroimage 67, 313–321. (doi:10.1016/j.neuroimage.2012.11.032)
33. Peña M, Langus A, Gutiérrez C, Huepe-Artigas D, Nespor M. 2016 Rhythm on your lips. Front. Psychol. 7, 1708. (doi:10.3389/fpsyg.2016.01708)
34. Gámez J, Yc K, Ayala YA, Dotov D, Prado L, Merchant H. 2018 Predictive rhythmic tapping to isochronous and tempo changing metronomes in the nonhuman primate. Ann. N.Y. Acad. Sci. 1423, 386–414. (doi:10.1111/nyas.13671)
35. Merchant H, Honing H. 2014 Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Front. Neurosci. 7, 274. (doi:10.3389/fnins.2013.00274)
36. Drijvers L, Özyürek A, Jensen O. 2018 Alpha and beta oscillations index semantic congruency between speech and gestures in clear and degraded speech. J. Cogn. Neurosci. 30, 1086–1097. (doi:10.1162/jocn_a_01301)
37. Pearce MT, Ruiz MH, Kapasi S, Wiggins GA, Bhattacharya J. 2010 Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation. Neuroimage 50, 302–313. (doi:10.1016/j.neuroimage.2009.12.019)
38. Repp BH. 2005 Sensorimotor synchronization: a review of the tapping literature. Psychon. Bull. Rev. 12, 969–992. (doi:10.3758/BF03206433)
39. Escoffier N, Herrmann CS, Schirmer A. 2015 Auditory rhythms entrain visual processes in the human brain: evidence from evoked oscillations and event-related potentials. Neuroimage 111, 267–276. (doi:10.1016/j.neuroimage.2015.02.024)
40. Brown S, Martinez MJ, Parsons LM. 2006 The neural basis of human dance. Cereb. Cortex 16, 1157–1167. (doi:10.1093/cercor/bhj057)
41. Schaefer RS, Morcom AM, Roberts N, Overy K. 2014 Moving to music: effects of heard and imagined musical cues on movement-related brain activity. Front. Hum. Neurosci. 8, 774. (doi:10.3389/fnhum.2014.00774)
42. Brown RM, Penhune VB. 2018 Efficacy of auditory versus motor learning for skilled and novice performers. J. Cogn. Neurosci. 30, 1657–1682. (doi:10.1162/jocn_a_01309)
43. van Vugt FT, Kafczyk T, Kuhn W, Rollnik JD, Tillmann B, Altenmüller E. 2016 The role of auditory feedback in music-supported stroke rehabilitation: a single-blinded randomised controlled intervention. Restor. Neurol. Neurosci. 34, 297–311. (doi:10.3233/RNN-150588)
44. Moore E, Schaefer RS, Bastin ME, Roberts N, Overy K. 2017 Diffusion tensor MRI tractography reveals increased fractional anisotropy (FA) in arcuate fasciculus following music-cued motor training. Brain Cogn. 116, 40–46. (doi:10.1016/j.bandc.2017.05.001)
45. Schaefer RS. 2014 Auditory rhythmic cueing in movement rehabilitation: findings and possible mechanisms. Phil. Trans. R. Soc. B 369, 20130402. (doi:10.1098/rstb.2013.0402)
46. MacNeilage PF. 1998 The frame/content theory of evolution of speech production. Behav. Brain Sci. 21, 499–511. (doi:10.1017/S0140525X98001265)
47. Pereira AS, Kavanagh E, Hobaiter C, Slocombe KE, Lameira AR. 2020 Chimpanzee lip-smacks confirm primate continuity for speech-rhythm evolution. Biol. Lett. 16, 20200232. (doi:10.1098/rsbl.2020.0232)
48. Risueno-Segovia C, Hage SR. 2020 Theta synchronization of phonatory and articulatory systems in marmoset monkey vocal production. Curr. Biol. 30, 4276–4283.e3. (doi:10.1016/j.cub.2020.08.019)
49. Goller F, Cooper BG. 2004 Peripheral motor dynamics of song production in the zebra finch. Ann. N.Y. Acad. Sci. 1016, 130–152. (doi:10.1196/annals.1298.009)
50. Kelso JAS, Tuller B, Harris K. 1983 A 'dynamic pattern' perspective on the control and coordination of movement. In The production of speech (ed. PF MacNeilage), pp. 137–173. Berlin, Germany: Springer-Verlag.
51. Pouw W, Dixon JA. 2019 Entrainment and modulation of gesture–speech synchrony under delayed auditory feedback. Cogn. Sci. 43, e12721. (doi:10.1111/cogs.12721)
52. Hodges PW, Richardson CA. 1997 Feedforward contraction of transversus abdominis is not influenced by the direction of arm movement. Exp. Brain Res. 114, 362–370. (doi:10.1007/pl00005644)
53. Pouw W, Harrison SJ, Esteve-Gibert N, Dixon JA. 2020 Energy flows in gesture–speech physics: the respiratory-vocal system and its coupling with hand gestures. J. Acoust. Soc. Am. 148, 1231–1247. (doi:10.1121/10.0001730)
54. Cooper BG, Goller F. 2004 Multimodal signals: enhancement and constraint of song motor patterns by visual display. Science 303, 544–546. (doi:10.1126/science.1091099)
55. Lancaster WC, Henson OW, Keating AW. 1995 Respiratory muscle activity in relation to vocalization in flying bats. J. Exp. Biol. 198, 175–191. (doi:10.1242/jeb.198.1.175)
56. Burchardt LS, Norton P, Behr O, Scharff C, Knörnschild M. 2019 General isochronous rhythm in echolocation calls and social vocalizations of the bat Saccopteryx bilineata. R. Soc. Open Sci. 6, 181076. (doi:10.1098/rsos.181076)
57. Larsson M, Richter J, Ravignani A. 2019 Bipedal steps in the development of rhythmic behavior in humans. Music Sci. 2, 2059204319892617. (doi:10.1177/2059204319892617)
58. Hauser MD, Evans CS, Marler P. 1993 The role of articulation in the production of rhesus monkey, Macaca mulatta, vocalizations. Anim. Behav. 45, 423–433. (doi:10.1006/anbe.1993.1054)
59. Fitch WT. 1997 Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. J. Acoust. Soc. Am. 102, 1213–1222. (doi:10.1121/1.421048)
60. Ghazanfar AA, Turesson HK, Maier JX, van Dinther R, Patterson RD, Logothetis NK. 2007 Vocal-tract resonances as indexical cues in rhesus monkeys. Curr. Biol. 17, 425–430. (doi:10.1016/j.cub.2007.01.029)
61. Fowler CA, Dekle DJ. 1991 Listening with eye and hand: cross-modal contributions to speech perception. J. Exp. Psychol. Hum. Percept. Perform. 17, 816–828. (doi:10.1037/0096-1523.17.3.816)
62. Bosker HR, Peeters D. 2020 Beat gestures influence which speech sounds you hear. Proc. R. Soc. B 288, 20202419. (doi:10.1101/2020.07.13.200543)
63. Pouw W, Paxton A, Harrison SJ, Dixon JA. 2020 Acoustic information about upper limb movement in voicing. Proc. Natl Acad. Sci. USA 117, 11 364–11 367. (doi:10.1073/pnas.2004163117)
64. Schutz M, Lipscomb S. 2007 Hearing gestures, seeing music: vision influences perceived tone duration. Perception 36, 888–897. (doi:10.1068/p5635)
65. Cai ZG, Connell L, Holler J. 2013 Time does not flow without language: spatial distance affects temporal duration regardless of movement or direction. Psychon. Bull. Rev. 20, 973–980. (doi:10.3758/s13423-013-0414-3)
66. Toiviainen P, Luck G, Thompson MR. 2010 Embodied meter: hierarchical eigenmodes in music-induced movement. Music Percept. 28, 59–70. (doi:10.1525/mp.2010.28.1.59)
67. Walton AE, Washburn A, Langland-Hassan P, Chemero A, Kloos H, Richardson MJ. 2018 Creating time: social collaboration in music improvisation. Top. Cogn. Sci. 10, 95–119. (doi:10.1111/tops.12306)
68. Pisanski K, Cartei V, McGettigan C, Raine J, Reby D. 2016 Voice modulation: a window into the origins of human vocal control? Trends Cogn. Sci. 20, 304–318. (doi:10.1016/j.tics.2016.01.002)
69. Halfwerk W, Jones PL, Taylor RC, Ryan MJ, Page RA. 2014 Risky ripples allow bats and frogs to eavesdrop on a multisensory sexual display. Science 343, 413–416. (doi:10.1126/science.1244812)
70. Murray TG, Zeil J, Magrath RD. 2017 Sounds of modified flight feathers reliably signal danger in a pigeon. Curr. Biol. 27, 3520–3525.e4. (doi:10.1016/j.cub.2017.09.068)
71. Hogan BG, Stoddard MC. 2018 Synchronization of speed, sound and iridescent color in a hummingbird aerial courtship dive. Nat. Commun. 9, 5260. (doi:10.1038/s41467-018-07562-7)
72. Shockley K, Baker AA, Richardson MJ, Fowler CA. 2007 Articulatory constraints on interpersonal postural coordination. J. Exp. Psychol. Hum. Percept. Perform. 33, 201–208. (doi:10.1037/0096-1523.33.1.201)
73. Wiltermuth SS, Heath C. 2009 Synchrony and cooperation. Psychol. Sci. 20, 1–5. (doi:10.1111/j.1467-9280.2008.02253.x)
74. Gordon I, Gilboa A, Cohen S, Milstein N, Haimovich N, Pinhasi S, Siegman S. 2020 Physiological and behavioral synchrony predict group cohesion and performance. Sci. Rep. 10, 8484. (doi:10.1038/s41598-020-65670-1)
75. McEllin L, Knoblich G, Sebanz N. 2020 Synchronicities that shape the perception of joint action. Sci. Rep. 10, 15554. (doi:10.1038/s41598-020-72729-6)
76. Vicary S, Sperling M, Zimmermann Jv, Richardson DC, Orgs G. 2017 Joint action aesthetics. PLoS ONE 12, e0180101. (doi:10.1371/journal.pone.0180101)
77. Kello CT, Dalla Bella S, Médé B, Balasubramaniam R. 2017 Hierarchical temporal structure in music, speech and animal vocalizations: jazz is like a conversation, humpbacks sing like hermit thrushes. J. R. Soc. Interface 14, 20170231. (doi:10.1098/rsif.2017.0231)
78. Alviar C, Dale R, Dewitt A, Kello C. 2020 Multimodal coordination of sound and movement in music and speech. Discourse Process. 57, 682–702. (doi:10.1080/0163853X.2020.1768500)
79. Falk S, Kello CT. 2017 Hierarchical organization in the temporal structure of infant-directed speech and song. Cognition 163, 80–86. (doi:10.1016/j.cognition.2017.02.017)
80. Ter Bekke M, Drijvers L, Holler J. 2020 The predictive potential of hand gestures during conversation: an investigation of the timing of gestures in relation to speech. PsyArXiv. (doi:10.31234/osf.io/b5zq7)
81. Kaukomaa T, Peräkylä A, Ruusuvuori J. 2013 Turn-opening smiles: facial expression constructing emotional transition in conversation. J. Pragmat. 55, 21–42. (doi:10.1016/j.pragma.2013.05.006)
82. Sacks H, Schegloff EA, Jefferson G. 1974 A simplest systematics for the organization of turn-taking for conversation. Language 50, 696–735. (doi:10.1353/lan.1974.0010)
83. Stivers T et al. 2009 Universals and cultural variation in turn-taking in conversation. Proc. Natl Acad. Sci. USA 106, 10 587–10 592. (doi:10.1073/pnas.0903616106)
84. Levinson SC. 2019 Interactional foundations of language: the interaction engine hypothesis. In Human language: from genes and brain to behavior (ed. P Hagoort), pp. 189–200. Cambridge, MA: MIT Press.
85. Fröhlich M, Kuchenbuch P, Müller G, Fruth B, Furuichi T, Wittig RM, Pika S. 2016 Unpeeling the layers of language: bonobos and chimpanzees engage in cooperative turn-taking sequences. Sci. Rep. 6, 25887. (doi:10.1038/srep25887)
86. Wilson M, Wilson TP. 2005 An oscillator model of the timing of turn-taking. Psychon. Bull. Rev. 12, 957–968. (doi:10.3758/BF03206432)
87. Levinson SC. 2016 Turn-taking in human communication: origins and implications for language processing. Trends Cogn. Sci. 20, 6–14. (doi:10.1016/j.tics.2015.10.010)
88. Takahashi DY, Narayanan DZ, Ghazanfar AA. 2013 Coupled oscillator dynamics of vocal turn-taking in monkeys. Curr. Biol. 23, 2162–2168. (doi:10.1016/j.cub.2013.09.005)
89. Geissmann T, Orgeldinger M. 2000 The relationship between duet songs and pair bonds in siamangs, Hylobates syndactylus. Anim. Behav. 60, 805–809. (doi:10.1006/anbe.2000.1540)
90. Gamba M, Torti V, Estienne V, Randrianarison RM, Valente D, Rovara P, Bonadonna G, Friard O, Giacoma C. 2016 The indris have got rhythm! Timing and pitch variation of a primate song examined between sexes and age classes. Front. Neurosci. 10, 249. (doi:10.3389/fnins.2016.00249)
91. Hattori Y, Tomonaga M, Matsuzawa T. 2013 Spontaneous synchronized tapping to an auditory rhythm in a chimpanzee. Sci. Rep. 3, 1566. (doi:10.1038/srep01566)
92. Egnor SER, Wickelgren JG, Hauser MD. 2007 Tracking silence: adjusting vocal production to avoid acoustic interference. J. Comp. Physiol. A 193, 477–483. (doi:10.1007/s00359-006-0205-7)
93. Holler J, Kendrick KH, Levinson SC. 2018 Processing language in face-to-face conversation: questions with gestures get faster responses. Psychon. Bull. Rev. 25, 1900–1908. (doi:10.3758/s13423-017-1363-z)
94. Roberts SG, Torreira F, Levinson SC. 2015 The effects of processing and sequence organization on the timing of turn taking: a corpus study. Front. Psychol. 6, 509. (doi:10.3389/fpsyg.2015.00509)
95. Pouw W, Holler J. 2020 Timing in conversation is dynamically adjusted turn by turn: evidence for lag-1 negatively autocorrelated turn taking times in telephone conversation. PsyArXiv. (doi:10.31234/osf.io/b98da)
96. Nityananda V, Balakrishnan R. 2020 Synchrony of complex signals in an acoustically communicating katydid. J. Exp. Biol. 224, 241877. (doi:10.1242/jeb.241877)
97. Husson R, Garde ÉJ, Richard A. 1952 L'acoustique des salles du point de vue du chanteur et de l'orateur. Ann. Télécommunications 7, 58–74. (doi:10.1007/BF03017103)
98. Titze IR. 1973 The human vocal cords: a mathematical model. Phonetica 28, 129–170. (doi:10.1159/000259453)
99. Herbst CT, Stoeger AS, Frey R, Lohscheller J, Titze IR, Gumpenberger M, Fitch WT. 2012 How low can you go? Physical production mechanism of elephant infrasonic vocalizations. Science 337, 595–599. (doi:10.1126/science.1219712)
100. Peters G. 2002 Purring and similar vocalizations in mammals. Mamm. Rev. 32, 245–271. (doi:10.1046/j.1365-2907.2002.00113.x)
101. Fitch WT. 2018 The biology and evolution of speech: a comparative analysis. Annu. Rev. Linguist. 4, 255–279. (doi:10.1146/annurev-linguistics-011817-045748)
102. Tognoli E, Zhang M, Fuchs A, Beetle C, Kelso JAS. 2020 Coordination dynamics: a foundation for understanding social behavior. Front. Hum. Neurosci. 14, 317. (doi:10.3389/fnhum.2020.00317)
103. De Jaegher H, Di Paolo E, Gallagher S. 2010 Can social interaction constitute social cognition? Trends Cogn. Sci. 14, 441–447. (doi:10.1016/j.tics.2010.06.009)
104. Dumas G, Kelso JAS, Nadel J. 2014 Tackling the social cognition paradox through multi-scale approaches. Front. Psychol. 5, 882. (doi:10.3389/fpsyg.2014.00882)
105. Redcay E, Schilbach L. 2019 Using second-person neuroscience to elucidate the mechanisms of social interaction. Nat. Rev. Neurosci. 20, 495–505. (doi:10.1038/s41583-019-0179-4)
106. Bolis D, Schilbach L. 2020 'I interact therefore I am': the self as a historical product of dialectical attunement. Topoi 39, 521–534. (doi:10.1007/s11245-018-9574-0)
