

Prosody is used for real-time exercising of other bodies

Article in Language & Communication · January 2023


DOI: 10.1016/j.langcom.2022.11.002


2 authors:

Emily Hofstetter Leelo Keevallik


Linköping University Linköping University



Language & Communication 88 (2023) 52–72



journal homepage: www.elsevier.com/locate/langcom

Prosody is used for real-time exercising of other bodies


Emily Hofstetter, Leelo Keevallik*
Linköping University, Department of Culture and Society, 581 83 Linköping, Sweden

Article info

Article history: Available online 28 November 2022

Keywords: Prosody in interaction; Embodiment; Pilates instruction; Joint agency; Distributed agency; Directives

Abstract

While the lexico-grammatical and embodied practices in various instructional activities have been explored in-depth (Keevallik, 2013; Simone & Galatolo, 2020), the vocal capacities deployed by instructors have not been in focus. This study looks at how a Pilates instructor coaches student bodies by modulating the prosodic production of verbal instructions and adjusting vocal quality in reflexive coordination with the students’ ongoing movements. We show how the body of one participant can be expressed and enhanced by another’s voice in a simultaneous assembly of action and argue for the dialogical conceptualization of a speaker. These voice-body assemblies constitute evidence of how actions were brought about jointly rather than constructed individually.

© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

In this study, we discuss an arrangement wherein one person is vocalizing while other co-present participants are performing
exercises in an instructor-led Pilates class. We lay out the relationship between the vocal and bodily practices and
demonstrate how ‘sounding for other bodies’ (see Introduction, this issue) forms a part of Pilates instruction. The very existence
of this practice constitutes a challenge for theories of communication that rely on the idea of a linear transfer of
information. Instead, we will show how actions are brought about in a dialogic manner and should thus be seen as distributed
between individuals. This has relevance for our understanding of the fundamental organization of human communication, as
well as speakership and agency, as impinging on intersubjectivity. The analytic focus will be on specific uses of prosody.

1.1. The sequential organization of vocal communication

Traditional models of vocal communication, language included, assume that speakers are separate units, individually
constructing speaking turns for a receiver (e.g. from de Saussure, 1916; to Chomsky, 1965; to Bavelas and Chovil, 2000; to
Townsend et al., 2017; for discussion see Daylight, 2017; Planer and Godfrey-Smith, 2021). Even within the field of naturally
occurring interaction, the earliest studies highlighted the way in which interaction participants organized the
one-person-speaks-at-a-time principle (Sacks et al., 1974), and subsequent research has revealed ways in which individual speakership
is maintained, such as the minimization of overlapping speech (Jefferson, 1986; Sidnell, 2001; Stivers et al., 2009), and the
individual rights of speakers to certain actions (Heritage and Raymond, 2005; Stevanovic and Peräkylä, 2012). Collaborative
utterances, likewise, have been shown to involve methods of organizing speaking that avoid competition for simultaneous
speakership but provide opportunities for multiple participants to contribute to the turn (Dressel and Kalkhoff, 2019; Dressel
and Satti, 2021; Iwasaki, 2009), including phonetic co-construction (Local, 2003). Several studies have, however, pointed out
methods for simultaneous participation in interaction (Sidnell, 2001; Meyer, 2010; Pfänder and Couper-Kuhlen, 2019), and
that these are understudied in a research context that emphasizes individualized contributions to action (Bassetti and
Liberman, 2021). Our phenomenon reveals how simultaneous engagement can be done through distribution of resources.
This distribution still typically involves one speaker at a time, but other forms of simultaneous co-participation are organized
such that the event in question occurs across multiple participants.

* Corresponding author.
E-mail address: leelo.keevallik@liu.se (L. Keevallik).

Few studies have yet fully acknowledged the degree to which semiotic contextures are organized by all co-participants (see
Goodwin, 2018). In particular, calls for less monologic analysis of prosody have been around for a while (Couper-Kuhlen and
Selting, 1996). Recent work is increasingly recognizing and investigating the complex distribution of resources and agency in
natural language use; the individual notion of a speaker is being questioned from different perspectives (Goodwin, 2004;
Paterson, 2020), clauses have been shown to emerge across individuals and bodies (Keevallik, 2013, 2018; Maschler et al.,
2020), and within evolutionary linguistics, naturalistic observation of great apes has shown communicative actions arising
from joint engagement (Genty et al., 2020). The current study examines how such engagement comes into being
and discusses the consequences for the theory of communication.

1.2. Speakership as a distributed endeavor

Interactional research has documented a myriad of ways in which speakership is distributed across multiple participants. Most
critically, the way a turn-at-talk unfolds is contingent on the ongoing, emergent work of ‘recipients’; the actions of non-speaking
participants contribute to what material gets spoken and with what syntax (Auer and Pfänder, 2011; C. Goodwin, 1979, 1980a; M. H.
Goodwin, 1980b; Hayashi, 2005; Lerner, 1991; Maschler et al., 2020). In other words, utterances are jointly produced, even when
only one party actively speaks (C. Goodwin, 1979), though additionally, speakership itself can be jointly or even chorally accomplished
(Lerner, 1996). While theories of polyphony (Bakhtin, 1981) have long argued that a whole history’s worth of speakers are
layered into any given utterance, the above interactional research, and its theoretical formulation in dialogism (Du Bois, 2014;
Linell, 2009), shows that speakership is distributed in an empirical and praxeological sense: that is, speakers themselves actively
organize utterances as distributed products, and this distribution is not just a theoretical understanding of language.
The distribution of speakership has consequences for participants’ expressed agency and accountability (Enfield, 2017); for
instance, speaking on others’ behalf can seriously impinge on individuals’ self-expression and self-determination, as has been
documented in situations involving speech impairments (Aaltonen and Laakso, 2011; C. Goodwin, 2004; Norén et al., 2013)
and disability (Robillard, 1996). Utterances and actions index, through the distribution and design of their components, who is
accountable for what happens and who has rights to speak or act concerning any given issue at hand (e.g. Cekaite, 2016;
Jenkins, 2015; Wiggins, 2013). In instructional settings, the rights are enacted in an asymmetric manner, with one (or a few)
participants enacting rights to speak. For example, one party often enacts primary rights to extended turns, e.g., in school
classrooms (Seedhouse, 2004), driving instruction (Deppermann, 2018), crochet instruction (Ekström and Lindwall, 2014),
and dance classes (Keevallik, 2010, 2017). The instructing party typically displays entitlement to issue directives, though this
has largely been studied verbally (Antaki and Kent, 2012; Craven and Potter, 2010; Curl and Drew, 2008; Heinemann, 2006;
Lindström, 2005; and Reed, 2021 also shows how instructors claim rights over the internal workings of another body), with
less attention to date given to studying directives as embodied activities (M. H. Goodwin and Cekaite, 2013, 2018). The focus of
research in both verbal and embodied instruction research has been on sequential ordering, especially the accomplishment of
instruction-compliance(-evaluation). Such sequences are even possible in the coordination of rapid actions, e.g., when
playing virtual football, performing surgery or learning to drive a race circuit (Mondada, 2013, 2014, 2017). Especially in these
latter contexts, the micro-sequential affordances of embodied responses may occur ‘early’ (Deppermann et al., 2021), as they
do not claim the turn-space that verbal turns do. In contrast to the sequential focus, we are here exploring a setting in which
achieving a fitted, embodied alignment concurrently with instructions is an unfolding members’ concern. A members’ solution
we document is to create a cross-participant and cross-body configuration that is temporally tied into an evolving joint performance.
Synchronization has been shown to be actively pursued by, e.g., Tango teachers, when both correcting and
instructing new material (Ehmer, 2021), while vocalizations by Lindy Hop teachers have been shown to be systematically
used for body coordination and rhythm synchronization (Keevallik, 2021). We demonstrate the coordination of cross-
participant voice-body configurations using recordings of Pilates classes.

1.3. Real-time instructing of other bodies

The study further differs from much of the instruction literature in examining, not pre-posed directives or demonstrations,
but what happens when an instructor’s speech accompanies students’ bodily motions in real time. This kind of in-course
instruction has previously been described as requests to extend the currently ongoing action, which can be accomplished
with repetition of imperatives (Keevallik, 2010; Mondada, 2017; Okada, 2018) and other lexical items (Simone and Galatolo,
2021). Repetition in particular is used for occasioning serial actions, where the directive and response may end up being performed
with increased simultaneity (Mondada, 2014, pp. 294–297), or for orienting to a lack of sufficient response and thus
acting as a repair (Mondada, 2014, pp. 297–299). Incitements in sports (Reynolds, 2021) or encouragements in video games
(Baldauf-Quilliatre and Carvajal, 2019) are similar practices, in that onlookers comment on how embodied actions should
occur (typically, that they should continue with additional effort); however, such comments are not organized as typical
instructions or requests, as the bodies under observation are already undertaking the suggested action. While some prosodic
features are mentioned in this work (for instance, Mondada (2014: 299) describes the accelerated production of the repeated
instructions, and Huhtamäki and Grahn (2022) demonstrate how prosody can be part of the distinction between activity
transition markers and assessments), the full capacity of prosody has not yet been analyzed in these settings, so the role of
different prosodic resources in accomplishing immediacy and organizing duration remains to be scrutinized.

1.4. Intersubjectivity and intercorporeality

The process involved here, wherein coordinated action is accomplished through and by (re)enacting mutual understanding
or empathy among the co-present bodies, has been given a number of terms, including mutual incorporation (Fuchs
and De Jaegher, 2009), compresence (Merleau-Ponty, 1964), and empathic insertion (Goffman, 1974), all of which express
different nuances of how interacting participants build intersubjectivity, especially concerning the awareness of each other’s
bodies, upon the scaffold of embodied experience. Kesselheim and Brandenberger (2021) have recently shown how
perception can be socially shared through reproducing the events a co-participant just experienced. We, too, are looking at
how the instructor is making relevant her own prior bodily experience and just-prior demonstrations to be intercorporeally
(Fr. intercorporéité, Merleau-Ponty and Smith, 2002) involved in the moves in which the students are engaged. The burgeoning
literature on the interactional understanding of intercorporeality has shown how participants accomplish ‘inhabiting’
others’ bodies, and how they use this inhabiting as a method for managing intersubjectivity and accountability (Meyer
et al., 2017). While intercorporeality may be attributable when members make sense of each other’s bodies (e.g., Due, 2021), it
is also a phenomenon achieved through interactional work such as action design, and making sensation available (e.g. M. H.
Goodwin, 2017; Kuroshima, 2020; Mondada, 2019). It is therefore an accountable phenomenon: co-participants can make
each other accountable for displayed intercorporeal projections, or for absences. Stukenbrock (2017) has analyzed deixis as
one way that intercorporeality is actualized into utterances, and Cantarutti (2020, this issue) has also demonstrated how
participants can use co-animations to mutually embody understanding. This paper explores a different way in which
intercorporeal connection between participants is made relevant: through prosody.

1.5. Prosody and action

Most work on interactional prosody has focused on its sequential organization (e.g. Szczepek Reed, 2020), targeting next
speakers’ matching or diverging choice of prosodic resources, while the use of prosody in instruction has received much less
attention. Haddington et al. (2012) state that higher pitch on imperatives provides a sense of urgency. Reynolds (2021, p. 37)
argues that lexical repetition, pitch increase, and loudness combine to enact the most intense moment in the course of a
weight lift. The concurrent co-participation in joint action, also through prosody, was first discussed in Goodwin and
Goodwin (1987), who showed how assessments are contingent, partially simultaneous accomplishments by different
speakers who mutually calibrate their use of vocal resources, such as loudness and duration. Charles Goodwin later developed
his multimodal analysis to show how actions are collaboratively built by operating on materials provided in previous turns
and by deploying pitch shape, pitch height, duration and amplitude, especially when the one acting has very limited access to
lexical resources (C. Goodwin, 1995). We can thus distribute interactional resources, including prosody, across actors and
thereby construct a joint action, as different materials with complementary properties elaborate each other. In particular,
congruent understandings are shown to be accomplished through the exact timing of laminated semiotic resources, such as
embodied nods and distinctive prosody (C. Goodwin, 2018, pp. 157–158). Importantly for our activity setting, it has been
shown how the guides of visually impaired sports climbers deploy prosodically calibrated lexical repetition to adjust the
duration and direction of the climbers’ ongoing movements (Simone and Galatolo, 2020, 2021).
Building on the above insights, we will show that the Pilates instructor designs the prosody of her utterances in ways that
sound for the students’ current body movements. We argue that the instructor and students orient to the maintenance of
synchrony and that the voice-body configurations are accountable for being together. The instructor laminates vocal materials
on the progressing and projectable body movements made by the students, and students time their movements reflexively to
the instructor’s voice. In our analysis, we highlight how pitch, loudness, and voice quality are used to organize this synchrony
and accomplish the voice and bodies as mutually implicated. First, the instructor uses a generic pitch shape to display the
structure of the exercise, such as from start position to a strenuous position and back. Second, the instructor times this pitch
shape to accompany the students’ ongoing movements, both instructing and following multiple students’ bodies. Third, the
instructor alters the pitch shape in order to instruct and ‘inhabit’ various qualities of movements, specifically increased strain
or effort, and slowing down or extending the body (which also results in more strain).

2. The data and collection

The data come from a collection of four video-recordings of hour-long Pilates classes in Estonian, given by a single
female instructor (whose pitch range in the data is 85–594 Hz, with an average across the data of 320.8 Hz¹). The classes include
students who have somewhat different levels of experience, though none are absolute beginners. Besides knowledge about
how instructions are given, the students may have some idea of the customary progression of the class. The exercises
consist of repetitive moves, and several rounds of the same move on different sides, with opposite limbs, etc. There is no
music playing, as is otherwise often the case in, for instance, yoga classes, and this makes the activity setting especially
interesting for the analysis of language and prosody. The pace of the class and rhythms of the exercise are created here
entirely in the interplay between the instructor’s voice and the students’ moving bodies. In order to scrutinize that
particular interface, we chose to work with segments of the class when the instructor is not fully performing the exercise
alongside the students. While she uses several intonation contours habitually, we here chose to focus on one very frequent
one to be able to showcase the relevant details and variations. The sound is only captured through camera microphones, so
as to not inhibit the instructor’s movements. There is some echo in the space, which can potentially be used interactionally,
though this also affects the pitch traces shown below. All participants have agreed to the use of the data for research
purposes, including the presentation of anonymised pictures, while the instructor has allowed herself to be both heard and
seen in research contexts.

¹ The instructor’s average pitch refers to her average f0 in the data corpus, rather than her everyday speaking voice.

The pitch traces were made using Praat, with f0 detection limits between 50 and 600 Hz, and formatted according to
Walker (2017). Manual corrections consisted of devoicing sections where no vocalization was audible, and shifting pitch in a
few instances where Praat’s pitch trace did not match the audible melody. The sizes of the pitch shifts are given in
semitones, as these better approximate how we typically perceive pitch changes.
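
As a reading aid for the semitone figures reported in the analysis, the following minimal Python sketch shows the standard conversion between a frequency ratio and a semitone interval (12 semitones per octave, that is, per doubling of f0). The function names are illustrative assumptions; they are not part of Praat or of the authors' workflow.

```python
import math


def hz_to_semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Size of the pitch shift from f_from_hz to f_to_hz, in semitones.

    Positive values are upward shifts, negative values downward shifts.
    """
    if f_from_hz <= 0 or f_to_hz <= 0:
        raise ValueError("frequencies must be positive")
    # One octave (a doubling of frequency) corresponds to 12 semitones.
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def shift_by_semitones(f_hz: float, semitones: float) -> float:
    """Frequency reached when starting at f_hz and moving by `semitones`."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return f_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # The instructor's reported pitch range in the data (85-594 Hz).
    print(round(hz_to_semitones(85.0, 594.0), 1))
```

Applied to the reported range of 85–594 Hz, `hz_to_semitones` gives roughly 33.7 semitones, a little under three octaves. Because the scale is logarithmic, a step of a given size in semitones corresponds to the same frequency ratio wherever in the range it starts, which is why semitones are closer to perceived pitch change than raw differences in Hz.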

3. An exercise-structuring device: the UP/DOWN contour pair

To begin, let us demonstrate the design of what we call the UP/DOWN contour pair, which the instructor regularly deploys
in accompanying the exercises. We use capitals and the term contour pair, to capture that the phenomenon involves pitch
movement across the instructor’s word(s), but we emphasize that this practice is different from the prosodic patterns in
typical speech; it consists of a step-up and a step-down contour, it is highly stylized, and it can even become song-like, when
applied in rhythmical series. The exercises the contour pair accompanies are patterns of bodily movements that are repeated
multiple times. An exercise can be very simple, such as rolling up to a seated position from a lying-down position and back
again, or more complicated, with a long chain of components. In the UP/DOWN contour pair, the instructor’s pitch rises
sharply while speaking words that indicate moving into an exercise, and falls sharply back down (to a pitch close to the
starting pitch) when moving out of an exercise, or, alternatively, to chunk components of an exercise. We include both sharp
‘steps’ as well as glides (gradual increase/decrease in pitch) in describing this phenomenon (corresponding roughly to stylized
versions of what Bolinger (1985) calls ‘up/down-skips’ and ‘up/down-glides’).² The peak and the bottom pitches are close to
the extremes of the pitch range of the speaker. The contour pair is thus recognizable through these salient features, as well as
through its repetitive use. We indicate the UP contour with highlighted green text in transcripts, and the DOWN contour with
highlighted blue text.
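
The shape described above can be approximated computationally. The sketch below is hypothetical and is not the authors' procedure (the analysis in this paper is auditory and interactional). It flags an f0 track as UP/DOWN-like when the pitch rises from its first voiced value to an interior peak and falls again by at least a threshold number of semitones, ending near the starting pitch. The function names and both thresholds (`min_step_st`, `return_tol_st`) are assumptions chosen purely for illustration.

```python
import math
from typing import List, Optional


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Signed interval from f_from_hz to f_to_hz in semitones."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def looks_like_up_down_pair(
    f0_hz: List[Optional[float]],
    min_step_st: float = 6.0,    # assumed minimum size of each step
    return_tol_st: float = 4.0,  # assumed tolerance for "close to the start"
) -> bool:
    """Rough screen for a rise-then-fall pitch shape.

    f0_hz holds one f0 value per analysis frame; None marks unvoiced frames
    (e.g. pauses, or sections devoiced during manual correction).
    """
    voiced = [f for f in f0_hz if f is not None]
    if len(voiced) < 3:
        return False

    peak_index = max(range(len(voiced)), key=lambda i: voiced[i])
    # The peak must lie strictly inside the track: a rise followed by a fall.
    if peak_index == 0 or peak_index == len(voiced) - 1:
        return False

    start, peak, end = voiced[0], voiced[peak_index], voiced[-1]
    rise = semitones(start, peak)   # upward step (positive)
    fall = semitones(end, peak)     # downward step, expressed as a positive size
    back_near_start = abs(semitones(start, end)) <= return_tol_st
    return rise >= min_step_st and fall >= min_step_st and back_near_start


if __name__ == "__main__":
    # Invented f0 values in Hz, for illustration only (not from the corpus).
    print(looks_like_up_down_pair([300.0, 310.0, 560.0, 570.0, 320.0, 300.0]))
```

Since the absolute step sizes vary with interactional contingencies (see footnote 2), any fixed threshold of this kind would at best serve as a rough screening aid and could not replace the sequential analysis.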
First of all, UP/DOWN contour pairs occur during demonstrations of an upcoming exercise. In Extract 1, the instructor
demonstrates that the next exercise involves doing double kicks with one leg, and then changing to the other leg. Though she
is standing, and the students are lying on their mats, she partially enacts the body shape of the students, holding her hands in
front of her to mimic their sphinx-like pose on their stomachs on the floor (see Fig. 1.2).

Extract 1: Pilates 2011 pt1_6.28_double kicks demo

² The absolute shifts in pitch are not consistent for two reasons. First, the instructor often inserts talk in between the opening and closing of the contour
(see e.g. Extract 4), and the pitch is not maintained perfectly during these incursions. Second, the instructor often starts or ends at different pitches according
to how the class has progressed in the series of exercises (e.g., a penultimate or ultimate repeat includes prosodic changes that indicate the
incipient end to the series of exercises, which affects the UP/DOWN pair, see Extract 1). Each of these inconsistencies is produced in order to accomplish
further interactional work, which we discuss in the analysis. As long as the shift is salient and thereby obviously marked as not doing other interactional
work, the contour pair is able to function recognizably.

Fig. 1.1. Pitch trace showing the UP/DOWN contour pair in Extract 1.³

Fig. 1.2. Multimodal presentation of L1–2 in Extract 1.

The instructor always starts the exercise-accompanying utterance lower, often around or just above her average pitch in
the corpus, and then jumps steeply up, in one or very few steps, to a markedly higher pitch. These steps up and then down can
be seen in the pitch trace (Fig. 1.1). The exercise starts with kicking, and the pitch rises while describing and enacting the kicks.
Subsequently, the next step in the exercise is to switch feet, and while the instructor’s first foot returns to the floor and she
steps forward to make the other foot available to kick, she says vahetus ‘switch’ and lowers the pitch back down, close to the
starting pitch. Accordingly, the pitch shift accompanies moving into and out of the exercise, here to accomplish the kicks and
return to home position. The UP/DOWN pitch shapes are then continued during student practice, repeated on the established
and successively shortened verbal formula given throughout the exercise (Keevallik, 2020b). The absolute steps up and down
are different; the first contour pair ends higher, the second lower, which accomplishes a list-like prosodic format, wherein the
first contour pair projects more to come, and the second contour pair projects the closing of the instruction. These
contingencies affect the size of the pitch steps (11 vs 9 semitones up in first and second pair, respectively), but the UP/DOWN
form remains consistent.

³ Note that the room (a wood floor and unpadded walls) echoes, resulting in pitch traces marked between the word/phrase boundary indications, even
though the instructor is not actively vocalizing. Although the instructor may in fact make active use of the echo to achieve a continuous soundscape, our
analysis focuses on moments where the instructor is actively vocalizing.

The pitch changes are not iconically mapped to up or down motions, but to any exercise starting and then ending, or even
to smaller segments within complex exercises. In Extract 2 below, the exercise is to circle the legs in one direction (L1 ühele
poole ‘to one side’), and then back in the other direction (L2 teisele poole ‘to the other side’), while balancing on one’s buttocks
(see Figs. 2.1 and 2.2). The instructor is again only marking the exercise with her body, but in this case, she is also sounding for
the simultaneously exercising students.

Extract 2: Pilates 2011 pt2_21.15_clockwisecounterclockwise

Fig. 2.1. Pitch trace showing the UP/DOWN contour pair in Extract 2.

Fig. 2.2. Multimodal presentation of L1–2 in Extract 2. Students circle legs in one direction, then another. They reach the peak of a leg circle at imgB.

Here we see the same pitch movement, a steep step upwards (13 semitones) when describing moving into the exercise,
and again stepping back downwards (17 semitones) when describing the second half of the exercise trajectory (Fig. 2.1).
The second (DOWN) contour begins at the same height as the UP finishes, thereby gluing the second to the first and
projecting a return for the legs. While one could imagine the pitch accompanying a single circle (for instance, with the
pitch shift up accompanying movement to a halfway point on the circle, and then a down shift accompanying the second
half of the circle), each shape instead covers a full circle. This is because the UP/DOWN contour pair is not designed to
iconically represent moves in space but to accompany and organize the repetition of the exercise in manageable chunks. The
circles one way, then the other way, prosodically and lexically constitute a whole repetition, making for a projectable
sequence that is easy for the students to join in. The pitch ‘chunks’ different movements into patterns, in this case gluing
the two circular movements of the exercise into prosodic couplets. The instructor thereby facilitates reflexive synchrony
with the students.
The chunking effect can also be seen in more complex movements, as follows (Extract 3). The exercise (a non-strenuous,
end-of-class exercise) is to first bend the knees down, then straighten them, then rise up onto one’s toes, and then sink back
down onto flat feet. Here too, the instructor uses two UP/DOWN shapes to cover the full exercise, thus chunking it into
prosodic couplets.

Extract 3: Pilates 2011 pt2_on toes 34.15

Fig. 3.1. Two consecutive UP/DOWN contour pairs in Extract 3, chunking a complex exercise.

The pitch rises as the knees are described as bending and students move down (the green section in L1; which also
demonstrates that the pitch shift is not used iconically), and then returns to the approximate starting pitch with the
knees described as straightening (Fig. 3.1), thus returning to a natural base body position. This projected return is
matched in pitch, from the middle of the instructor’s used pitch range, to its upper reaches (11 and 9 semitones for the
first and second UP), and back again (21 and 23 semitones for the first and second DOWN, respectively), which could be seen
as an indexical marking of moving from and back to base position. The students move accordingly during the emerging
vocal production. The instructor then reopens the same pitch shape for rising onto the toes (the green section in L3). The
second half of this instance holds a slight variation, wherein the instructor jumps to an even higher pitch and adds a
comment to the exercise-accompanying phrase. Instead of describing and prosodically reflecting the bodies as merely
needing to go upwards, the instructor adds a comment about the students’ needing to move even more upwards albeit
lowering their heels. The comment is uttered with a considerably increased volume and speed, and a held high pitch
plateau (Fig. 3.1). The insertion nature of this comment is furthermore evident in that at least one student, StudentD,
misprojected the timing of the return to flat feet, lowering much earlier than the instructor’s accompanying phrase
(Extract 3, L3). This case illustrates the transformative power of the instructor’s vocal production – it is not merely
providing a routine rhythm for the exercise, and thus accompanying the students, but also deploying more or less
extreme prosodic effects to collaboratively achieve extended positions in the students’ bodies at this very moment. The
down motion is also eventually accompanied by a drop in pitch at the end of üles ‘up’⁴ (L3), concluding the pitch shape
pair.
We have thus shown that the UP/DOWN pitch shape follows the trajectory of the exercise, accomplishing a rhythmical and
melodic whole for the exercise’s overall development. The pitch is not established so formally in every exercise, and not every
exercise is formally demonstrated before it is undertaken, as the instructor may be relying on the students’ familiarity with
certain exercises. The contour pair, however, makes the structure of the exercise predictable, providing a means to chunk any
given exercise into couplets of movement. By having the pitch manage the exercise’s trajectory, moving together with the
exercising bodies, the instructor is able to use words to do other work, including indexing which body parts to use,
mentioning qualities for the body to sense or enact, and even providing individual commentary. We will discuss this
affordance further in the next section, where we will also highlight the usefulness of other prosodic resources, including
duration, loudness, and tempo.

3.1. Instructor orientation to prosodic coherence: inserting additional instructions

As we have shown, the exercises are accompanied by the UP/DOWN contour pairs on lexical formulae, such as
löök löök/vahetus ‘kick kick/switch’, ühele poole/teisele poole ‘to one side/to the other side’ (see Keevallik, 2020a,
2020b for discussion on how the formulae are contingently established and for further variations, e.g., ‘forward/
backward’, ‘in/out’, ‘ceiling/knee’). The Pilates instructor regularly interposes commentary between the rhythmically
deployed formulae, using prosody that marks the commentary as distinct, for instance by returning to her average
pitch range for instruction, employing tempo changes and rush-throughs, or maintaining a high pitch at the ‘extremity’
of the exercise trajectory. These insertions can, at times, result in discoordination, as the contour pair that
was projected becomes altered or paused (as was briefly shown in Extract 3, L3). The students and instructor then
work together to re-establish synchrony, which demonstrates participants’ close orientation to the instructor’s voice
and the exercising bodies as accountably accomplishing collaborative assemblies. The instructor and students are
accountable to each other for attuning to each other’s bodies and voice, as displayed through their public attempts
to achieve synchrony.
For instance, in Extract 4, the instructor uses the UP/DOWN contour pair (Fig. 4.1) while accompanying a series of
movements conducted from a side plank position. The students reach over their heads, then twist under their supporting
arm, back again, and finally arch backwards, before coming back to a side plank (see images in Extract 4). The
instructor segments these into pairs of UP/DOWN contours (e.g. keera/otse ‘twist/straight’). Instead of producing the
second half of the pair that fits with ava ‘open’ (the arching backwards movement) (L1-2), she makes a comment about
how the students’ posture ought to be while holding side plank, namely with the hips staying in place (L3).

4
The instruction to go ‘up’ in this case means straightening the posture.

Extract 4: Pilates 2011 pt2_side arc planks_25.15

Fig. 4.1. UP/DOWN contour pairs in Extract 4 and an insertion puusad jäävad paigale ‘hips stay in place’.5

5
The pitch traces between word markers indicate echoes, rather than active vocalizing, see fn.1.

The comment puusad jäävad paigale ‘hips stay in place’ has a sustained high pitch, higher than her average instructional
pitch range (Fig. 4.1). The contour breaks with the usual UP/DOWN contour form that has been used, and that had
projectably started on ja ava ‘and open’ (L2). Instead, the instructor holds the pitch height approximately stable at the level it
had reached by the end of the first half of the shape, at ava ‘open’ (sustained 10 semitones higher than where ja ava began). In
other words, the down-part of the overall UP/DOWN is missing at this moment; whether the contour is on hold or will be
omitted is yet to be seen. Further evidence of this absence is that, when the instructor ends her inserted commentary, she
returns to the prosody relevant not for the suspended contour pair (i.e. not immediately downward) but for the current shape
of the students’ bodies. They are in the process of stretching over their heads, and the instructor matches both the description
and the contour so that she can produce the UP part with the stretch and the DOWN part alongside the termination of the
exercise.
The students must manage the insertion as well, in order to stay synchronized with the instructor; that is, they also work
to achieve the instructor’s exercise-accompanying utterances as sounding alongside their movements. In this extract, Stu-
dentA adds an extra stretch during the inserted comment, lengthening the arm opening time so that when she then reaches
over her head, she is more in time with the instructor’s utterance (L3). StudentB instead continues into the next movement in
the sequence, potentially anticipating that the inserted comment is a replacement for the verbal description of the current
movement. When the instructor produces the description after all, StudentB holds her position, reaching over her head, all
throughout the descriptive üle ‘over’ and until ja ‘and’ begins to emerge (this conjunction is systematically used by Pilates
students to project the ultimate move in a series (Keevallik, 2020b)). In other words, StudentB delays exercise progressivity
slightly later than StudentA. It is here we see an additional potential rationale for the instructor to open a new UP/DOWN
contour pair, rather than have üle ‘over’ as the DOWN half of the pair initiated with ava ‘open’. Opening a new pair, and
treating the ava ‘open’ contour as abandoned, allows for reestablishment of synchrony across students, and they can all
complete the last movement together (the UP contour of ja üle has the same semitone change as ja ava as well, 10 semitones).
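The semitone values reported for these contours are intervals on a logarithmic pitch scale. As a minimal sketch, assuming the standard conversion of 12 × log2(f_end / f_start), an interval can be computed from two fundamental-frequency measurements as shown below. The function names are illustrative, and no frequency values from the recordings are implied; how the reported figures were actually measured is not specified in this section.

```python
import math


def semitone_interval(f_start_hz: float, f_end_hz: float) -> float:
    """Return the signed pitch interval in semitones between two frequencies.

    Positive values are rises (UP), negative values are falls (DOWN).
    Assumes the standard equal-tempered conversion 12 * log2(f_end / f_start).
    """
    if f_start_hz <= 0 or f_end_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def frequency_after(f_start_hz: float, semitones: float) -> float:
    """Return the frequency reached after moving `semitones` from f_start_hz."""
    if f_start_hz <= 0:
        raise ValueError("frequency must be positive")
    return f_start_hz * 2.0 ** (semitones / 12.0)
```

On this scale a rise of 12 semitones corresponds to a doubling of frequency, so two contours with the same semitone change (such as the 10 semitones reported for both ja ava and ja üle) represent the same relative pitch movement regardless of the starting pitch.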
Cancelling of one half of the general UP/DOWN pattern is a useful compromise in that it maintains the progressivity of the
exercise while intermittently talking about other urgent matters, primarily to achieve a higher quality exercise. The instructor
uses prosody to do the work of demonstrating attunement to students’ bodies, and lexical commentary to instruct the quality
and shape of their movements. She adapts to the physical needs of the students’ bodies as they tire, while students
demonstrate their ability to continue and their attention to the ongoing exercise through keeping their body in time with her
voice.

3.2. Student orientation to the structuring prosody: Varying synchronies

The instructor and students coordinate their vocal and bodily resources to arrive in synchrony. They jointly produce
the instructor’s utterances as vocalizing for the students’ bodies, rather than the instructor simply mirroring the students
or conversely merely providing a template for students to follow. Both instructor and students use elongations and
contractions of sound and movement, as well as holds and pauses, to produce and maintain their mutual synchrony.
However, even though the students may take cues from their fellows, they do not necessarily act in complete synchrony
with each other (they do not appear as synchronized swimmers or soldiers might); instead, Pilates students may syn-
chronize with different micro-moments of the instructor’s utterances and pitch shapes. For example, some students may
align with pitch peaks to reach maximum extensions while others seem to use them for bouncing back from these
extended positions.
Extract 5 demonstrates the variable moments of synchrony and methods students use to come into and maintain linked
timing. The exercise at hand is a swimming motion, with students on their stomachs, in which they reach forward with their
arms, then pull back while lifting their chests, and repeat (Fig. 5.1). The reach forward involves lowering their heads and
chests closer to, though still hovering above, the ground. Prosodically, the instructor treats the reach forward and chest
lowering as the opening trajectory of the exercise and thus receiving the UP half of the contour pair, and the lifting and pulling
arms back as the second half, receiving the DOWN half of the contour pair.

Extract 5: Pilates 2011 pt1_swimming_7.35


((only chest movements are marked, to simplify the transcript))

Fig. 5.1. Variable student synchronizations to the instructor utterances in L1–2 in Extract 5.

At several points, participants coordinate to better synchronize with the instructor’s utterances. Let us begin with Stu-
dentA, who misprojects the exercise timing, and does a full arm circle instead of half a circle to go with the UP contour and half
a circle with the DOWN contour. As a result, she mistimes the initial portion of the exercise such that she does the opposite
posture for the instructor’s utterance: she lowers her chest and reaches forward when the instructor utters ülesse ‘up’ (L1).
Across the ensuing utterances from the instructor, StudentA holds her position, slowly getting closer to the instructor’s
timing. By the third full repetition in the transcript (L5), StudentA is in synchrony with the instructor’s descriptions and
continues so.
As a further example, compare StudentC and StudentD. StudentC extends her movements such that each motion takes the
entire duration of the instructor’s utterances. In contrast, StudentD regularly holds her position at the peak of a motion,
especially the lower, reaching-forward posture (L4, L5-6, L6-7). This spaces out her motions so that she initiates the next
motion in better synchrony with the instructor. The students thus can be seen to control their motions in various ways to stay
in synchrony with the instructor, so that the instructor’s vocalizations are sounding out their current movements. In this
exercise, as in most others, the movements are either simple enough, or repeated enough or both, that the students could, in
theory, progress at their own tempo and squeeze in more (or fewer) repetitions. Keeping with the instructor, however,
achieves a ‘being with’ the utterances, wherein the students are doing the class with the instructor as well as the fellow
students, forming an assembly of joint movement.
However, the transcript above indicates only where the students start and stop their motions, making it difficult to see
where they reach the peak or gestural stroke of the motions. The transcript in Fig. 5.1 brackets each of the instructor’s syllables
on which the students time their peak of motion.
In Fig. 5.1, it is easier to see that the students vary in their timings among each other. During this specific exercise, StudentA
tends to peak on the first or second syllable of the word, StudentD usually on the second last or last syllable, and StudentC
slows down across the exercise repetitions, starting with peaks on the first syllable, then the second, then in the beat after the
described motion. We can thus see that students tie their motions to different parts of the utterance, though each working to
maintain some form of alignment with the instructor’s voice and doing the same number of repetitions within the overall
timeframe. Thereby they accomplish the voice-bodies’ assembly of exercising together in achieved, albeit not minute, syn-
chrony. By continuing synchrony, students demonstrate attentiveness to the progressing exercises and competence to
continue, and the instructor can monitor energy and adjust pace.
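The bracketing in Fig. 5.1 can be thought of as assigning each student's motion peak to the syllable during which it occurs. The following is a minimal sketch of that assignment, assuming that syllable boundaries and peak times are available as timestamps in seconds. The function name and data layout are hypothetical illustrations and do not describe how the transcript was actually produced.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# A syllable is (label, start_s, end_s); the list must be sorted by start
# time and the intervals must not overlap.
Syllable = Tuple[str, float, float]


def syllable_at(peak_s: float, syllables: List[Syllable]) -> Optional[str]:
    """Return the label of the syllable whose interval contains peak_s.

    Intervals are treated as half-open [start, end). Returns None when the
    peak falls in a gap between syllables or outside the utterance, e.g. in
    the beat after the described motion.
    """
    starts = [start for _, start, _ in syllables]
    index = bisect_right(starts, peak_s) - 1
    if index >= 0 and peak_s < syllables[index][2]:
        return syllables[index][0]
    return None
```

Tallying the returned labels per student across repetitions would give a summary of the kind described above, where one student tends to peak early in the word and another late.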

4. Instructor working with specific bodies

During the class, the instructor sometimes stops producing the exercise-accompanying utterances, to check her notes,
chat with the class, or give individual feedback, after which she needs to reengage with the exercising students in a way
that does not disrupt the ongoing rhythm of their practice. We can examine the instructor’s vocal adjustments when she
temporarily orients to one or two students at a time, using a variety of prosodic means. Extracts 6–8 show three instances
of the instructor focusing on individual students, while the other students continue repeating the exercise in the room,
also timed with the instructor’s vocal production. She first focuses on StudentF, then StudentD, and then, partially in
overlap with the focus on StudentD, she orients to StudentG. Each of the adjustments happens at different moments in the
overall pitch pattern and the exercise, therefore featuring attention to a variety of bodily issues that occur at those
moments. We will discuss these one at a time and present the extract in consecutive transcripts with continuous line
numbering.
In these Extracts (6–8) the students are moving into and out of boat pose, that is, balancing on the buttocks or sacrum while
lifting the torso and both feet, using the abdominal muscles to make a V shape with the body. They are currently rolling into
and out of the pose with their torsos (see Fig. 6.1).

Fig. 6.1. Example of the physical shapes of the boat pose sequence. Students use their abdomen to lift up into a V-shape (imgA), then again to slowly roll down
(imgB) to flat backs (imgC).

4.1. Achieving peak effort and controlled return with StudentF

We will begin by looking at one repetition of the boat pose exercise, getting up and lowering down (Extract 6), with the
pitch accordingly moving up and down in the UP/DOWN contour pair, as shown in Fig. 6.2.

Extract 6: Pilates 2011 pt2_boat pose_14.05

Fig. 6.2. Pitch trace in Extract 6.

In line 1, the instructor encourages the students to keep rising, repeating üles üles üles üles üles üles ‘up up up up up up’ (L1-2)
in rapid succession. The words are first rushed and uttered with flat pitch, thus conveying that the current body
movement should be continued (cf. Mondada, 2017; Simone and Galatolo, 2021 on lexical repetition; see Baldauf-Quilliatre
and Carvajal, 2019 and Reynolds, 2021 on encouragement/incitement). As the instructor’s utterance becomes focused on
StudentF, through facing the student to the exclusion of others and eventually touching her (Fig. 6.3), the student lifts her
arms higher. This treats the instructor’s comments as particularly addressing her, and thus requiring some response. Though
she may not be able to lift higher with her torso, lifting the arms shows attention to participating in an attempt to do so. The
elongated last iteration of üles ‘up’ (L2) is uttered exactly as the student starts lifting her arms and ends when she gives up,
thus emerging as sounding precisely alongside her effort. Furthermore, during this word the instructor reaches the highest
pitch frequency in the series of repetitions, making a more dramatic and elongated UP contour (16 semitones total from the
beginning of the üles repetitions). This contour, and its indexing of a temporary ‘peak’ of pitch, aligns with this particular
student’s ‘peak’ of effort. This is not an instruction but an incitement (Reynolds, 2021), that is, an encouragement to keep
going and push harder, and simultaneously an affirmation of the students’ struggle. In managing the emergent need to
accompany the extra peak of effort, a deviant contour is produced on the last üles (L2), with a lower ü and the extra high les
before the elongated falling pitch. The contour pair is reestablished with a particularly elongated DOWN portion on ja rulli alla
‘and roll down’ (L4, 15 semitones down), providing an unambiguous conclusion to this iteration of the exercise.

Fig. 6.3. Multimodal presentation of Extract 6. Instructor lifting the student’s feet (note instructor’s unbending of the knee in imgB) and the student falling back.

StudentF is already in the V shape when the instructor moves her, and the higher foot position makes holding the pose
more difficult, so she falls back slightly and utters a response cry (L3, Fig. 6.3). She also grabs hold of her thighs and uses her
hands to hold herself up, which displays inability to produce further effort. The instructor then begins the next part of the
exercise repetition, the roll down. The ja ‘and’ projects moving forward, and the student joins the utterance with her body,
rolling down at the appropriate moment. The student extends the roll down with the very elongated alla ‘down’ (L4), and the
vocalization and body movement emerge in perfect synchrony; alla ends precisely when the student’s hands finish the
downward roll. The instructor releases the student’s feet and moves away, closing the particular focus. The instructor, in this
sequence, uses not only pitch movement (the UP/DOWN contour pair) but also pitch range, with the extreme height in L2,
volume and duration (L4), to accomplish sounding with the student’s body, also for the benefit of everybody participating in
the difficult exercise. The student, meanwhile, contributes primarily through the body, only involving her voice at a point of
failure.

4.2. Achieving abdominal strain with StudentD

The instructor next undertakes a similar sequence with the adjacent student, StudentD. This time the instructor lowers the
student’s feet into a more difficult position, which makes it much more cumbersome for the student to lift into the V shape.
The student begins to roll up in synchrony with the instructor’s loud üles ‘up’ utterance (L7-8); however, she slows down and
eventually uses her arms to help lift her higher, seemingly unable to do so with solely her abdomen as the exercise requires
(see Extract 7). By this point in time, with the student struggling to lift higher, the instructor’s üles ‘up’ is over. To continue
sounding for the student, the instructor utters the confirmation mhm ‘uhuh’ (L7), infused with strain in her voice, especially
on the latter lengthened syllable. This synchronizes closely with the student’s own strain grunt (L8), so that they co-produce
the strain, with the instructor sounding for and with the student’s efforts (see also Keevallik, in press). As for the contour pair,
the instructor is yet to utter an UP contour while StudentD is getting into the proper posture. While the pitch on üles falls
during those initial struggles, it emerges that the mhm instead coincides with the student’s pulling up and is produced with a
distinctly rising pitch (L7) (12 semitones, see Fig. 7.1), featuring the first half of the UP/DOWN contour. This illustrates how the
instructor vocally tunes in on minute details of the student’s attempt. Thus, the sounding for others is targeted to very precise
moments of muscular engagement, while also indexing that engagement with strain-infused voice (by this we mean breathy,
tense or creaky voice, or a combination of those, marked with S in the transcript).

Extract 7 (continued from Extract 6): Pilates 2011 pt2_boat_pose_14.05

Fig. 7.1. Pitch trace in Extract 7 (note that the pitch traces of L7 and 9 are partially compromised through the overlapped vocalizations).

The next instruction, also lexically fitted to what StudentD is not doing, i.e. stretching (siruta, L9), prosodically matches the
mhm but moves even higher before the final fall, thus continuing and extending the UP/DOWN contour pair to fit the
particular needs of a single student, while others are struggling along.
The repeat and extension of the UP half of the contour pair reflects the continued need for further effort by StudentD (L7),
while the lexical choice features both recognition (a loud mhm ‘uhuh’) and further advice to her (ja siruta ‘and stretch’),
which repeats a part of the UP half (4 semitones), even though then falling back to low pitch. Importantly, the strain is
iconically represented not only in the response cries by the student but also in the strained voice quality of the instructor,
showing how she ‘inhabits’ another currently struggling body through an empathetic vocal display. The instructor may also
tense her body empathetically, or to some degree as part of holding the student’s legs still; either way, this contributes to the joint
straining. The movement extension – and thus further effort – is accompanied by various sound stretches and an excep-
tionally loud acknowledgment mhm.

4.3. Accomplishing withheld return with StudentG

The instructor’s gaze now shifts to the next student, StudentG (L9). She again uses loud and speedy incitements, saying vel
vel vel veel veel ‘more more more more more’ as StudentG holds the V-shaped posture steady (Extract 8). The student begins
the slightest movement downward in the middle of the incitements, but starts gaining speed during the instructor’s comment
to nüüd rulli läbi ‘now roll through’, thus synchronously accomplishing what is being instructed, rather than organizing it as a
sequence of instruction and response. The instructor incrementally adds a further description of rolling down in a rapidly
produced ja mine alla ‘and go down’ as StudentG and others in the room roll further down (L14). However, with both of these
rapidly produced utterances, the pitch remains very high, and the UP/DOWN contour pair has not yet progressed. It is only
with the final, very elongated ja alla ‘and down’ (L14) that the DOWN half is achieved (down 13 semitones on alla, see Fig. 8.1).
Both the ja and alla are extended to precisely fit the maximum duration of what the students are visibly able to withstand.
While StudentG arrives in the middle of the last word, the very end of the utterance emerges in tidy alignment with StudentD
(and other students’) final dropping of the shoulders and arms, with the initial loudness of the utterance fading away together
with the relaxing bodies.

Extract 8 (Continued from Extract 7)

Fig. 8.1. Pitch trace in Extract 8.

Like in earlier extracts, in Extracts 6–8 we documented the generic UP/DOWN contour pair but with specific adjustments
to local contingencies, such as being ‘with’ one student at a time and adjusting to what each of the students is visibly capable
of doing in terms of extending strenuous positions. By moving from student to student, the instructor is also skillfully creating
togetherness not only with the particular students she is gazing at or touching, but also with the exercising group as a whole.
To summarize what we saw in Extracts 6–8, vocal resources such as increased tempo, lengthenings, strained voice, and
loudness are used to index particular other bodies in their movement trajectories and empathically assumed effort, thus
sounding in minute coordination with those bodies, incrementally adding verbal instructions, and adjusting repetitions,
speaking pace, and vocal extensions moment by moment. In effect, the instructor is thereby ‘performing the same exercise’,
albeit primarily through vocal resources. The students and instructor mostly organize their contributions to be
asymmetrical, which accomplishes the phenomenon as distributed across participants and resources.

5. Discussion: prosodic resources for joint exercising

In the above analysis, we have shown how a Pilates instructor effectively performs the exercises alongside students. We
focused on moments where her co-inhabiting of the physical exercise was accomplished by vocal resources and identified several
repetitive prosodic patterns, from among which we chose to work with one, here called the UP/DOWN contour pair, i.e., relatively
steep, stylized pitch movements that support the rhythmical nature of the exercises. The pitch shifts occur as ‘skips’ (Bolinger,
1985) or as glides between lower and higher plateaus. The contour pair accompanies student movements, and all participants
actively work to maintain synchronic coordination between the instructor’s voice and the students’ bodies. While this stylized
pattern already constitutes a way of being with the exercising students in real time, we have shown a range of further prosodic
resources that are deployed to achieve various forms of jointness in the Pilates exercises, across voices and bodies.

5.1. Pitch movement

The synchronizing work analyzed above has focused largely on temporally coordinating the students’ bodies with the
instructor’s pitch movements. We argued that the basic UP/DOWN contour pair did not highlight iconicity in movements, but
instead accompanied a variety of body motions. The pair could furthermore achieve a chunking of complex exercises into
smaller steps, and/or co-inhabit the moving bodies. These latter situations can involve some iconic affordances, e.g., using
smooth pitch movements with body elongations and other fluid moves; however, these aspects are beyond the scope of this
paper. Overall, pitch movement patterns serve to provide predictability through exercise repetitions, project their comple-
tions through starting high as well as project transitions to next exercises through recognizable melodic productions. In this
paper, we used the exercises carried out through the UP/DOWN contour pair to demonstrate participants’ joint orientation to
those, and to have a baseline for showing how further prosodic resources were exploited for various specific purposes.

5.2. Tempo

The pacing of lexical phrases, words, syllables and sounds was neatly tied to the local demands of the exercising bodies –
which constitutes what we mean by tempo in this study. We showed how the instructor sped up her intervening talk (Extract
4) in order to maintain the coherence of the established pitch pattern and paced the formula to the tempos possible for each
exercise (Extracts 1, 2, 4, 5, 6, 7). In addition, the instructor adjusted the vocal tempos to what the students’ bodies were able
to perform, such as slowing down in the face of displayed fatigue (Extract 7, L7). In instructions inserted into the rhythmical
formulae, pitch could stay level or rise even further. Precision timing has been discussed with the focus on directives
(Mondada, 2013, 2014, 2017). In our case, the instructor’s vocal actions are not easily categorizable as directives but rather
constitute a way of practicing Pilates together in the class.

5.3. Duration

The duration of sounds and the ongoing contour pairs were matched primarily to the currently displayed extension
abilities of the students, as shown with the colon sign roughly corresponding to 0.1 s in the verbal lines of transcripts (e.g.,
Extract 6, L2, 4, Extract 7, L7). We saw lengthenings of various sounds in the established instructive formulae, such as in üles
‘up’ and alla ‘down’ to accompany students during strenuous and preferably long movements. Repetition of lexical items (e.g.,
in Extract 8) is yet another tool for extending the current move and effort, which has also been described in earlier literature
(Mondada, 2014; Okada, 2018; Reynolds, 2021). In particular, Simone and Galatolo (2021) have shown how the prosody of
lexical repetitions iconically conveys a sense of continuity, with minimal pitch changes between units. As we saw above, the
pattern we described is quite the contrary, featuring extreme and contrastive pitch movements up and down. In Pilates,
repetition is not only used to make relevant continuing the current movement, or urgently performing it, but also to make
relevant increasing bodily effort since, in contrast to guides mapping climbs, the instructor is here coaching the students into
the most beneficial moves. Furthermore, in our setting repetitions do not necessarily accomplish directive-like initiation-
compliance sequences; the vocal and embodied components of exercising are instead performed in parallel (cf. Mondada on
‘sequentially ordered simultaneities’, 2017, p. 91). We showed how synchronized duration is achieved in collaboration be-
tween participants, thus effacing the contrast between an initiator and a responder, and instead emphasizing continuous
reflexivity across participants.
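The transcription convention mentioned above, in which each colon roughly corresponds to 0.1 s of lengthening, allows an approximate lengthening to be read off a transcribed token. A minimal sketch, assuming that convention holds uniformly, could look like this. The function name and the example token in the docstring are hypothetical and are not quoted from the extracts.

```python
def estimated_lengthening_s(transcribed_token: str,
                            seconds_per_colon: float = 0.1) -> float:
    """Estimate the sound stretch marked in a transcribed token, in seconds.

    Counts the colons used to mark lengthening and multiplies by the
    approximate duration per colon (0.1 s by the convention described in
    the text). A hypothetical token such as "a:::lla" contains three colons.
    The result is only as precise as the transcription itself.
    """
    if seconds_per_colon < 0:
        raise ValueError("seconds_per_colon must be non-negative")
    colon_count = transcribed_token.count(":")
    # Round to milliseconds to avoid floating-point noise such as 0.30000000000000004.
    return round(colon_count * seconds_per_colon, 3)
```

Such estimates are coarse by design; they are useful for comparing relative stretches across repetitions rather than for precise duration measurement, for which the audio signal itself would be needed.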

5.4. Loudness

Loudness co-varies with pitch movement in a manner that provides the basic rhythm for the exercises. Louder (stressed)
syllables are interchanged with softer ones to create and maintain pace and achieve predictability of progress. This
enhanced production of stress is in contrast with how the inserted instructions are produced (e.g., puusad jäävad paigale
‘hips stay in place’ in Extract 4 is inserted by maintaining the high pitch level at the end of an UP), which reflects the fact that
loudness is also deployed to achieve the sounding in close collaboration with others’ bodies. Occasionally, loudness is
furthermore suggestive of variable strenuous body moves (see the wave forms in Figs. 6.2, 7.1, and 8.1, especially in com-
parison with Fig. 1.1 where the relaxed demonstration is accompanied by a considerably lower voice), though this requires
further investigation.

5.5. Voice quality

Perhaps most strikingly, the instructor’s voice can also embody strain, displaying physical effort through its effects on the
vocal tract (Keevallik, in press), which occurs in Extract 7 above. Alongside the student, the instructor also produces strain,
despite ‘merely’ positioning the student’s feet. The instructor is not undertaking the same strain, and indeed can reproduce this posture
without visible strain herself (as evidenced at other points in the class). The strain-infused voice times precisely with when
the student’s abdomen is at maximum effort (evidenced in the need to support herself with arms to get higher, as well as the
overlapping strain grunt). The instructor’s sounding in this way affirms the need for additional strain, and at the same time
sanctions it as valid in the context of trying to do the pose well. Additionally, the vocal strain produces an incitement,
encouraging further effort. While pitch, tempo, duration, and loudness all constitute illustratively sounding to accompany or
instruct others, the strained voice quality most clearly ‘inhabits’ another body through configuring her own body to accomplish a
vocal effect that needs a strained body to be performed.

6. Conclusion

In this paper we documented a way of deploying prosody to achieve a joint Pilates practice, focusing on one recurrent
prosodic contour that we called an UP/DOWN pair. The existence of the above body-voice assemblies supports both the
understanding of language as co-produced across various participants, and the role of a speaker and agent being accom-
plished jointly rather than individually. A participant may sound with and for others’ current movements. Prosody in
particular was shown to achieve intersubjectivity in bringing the participants in the class to accomplish a joint exercise,
where one party, the instructor, is entitled to sound for other participants. The instructor relies on intercorporeality, grounded
in her own previous experience of the same exercises, and regularly in just having demonstrated them, to bring this off. By
closely monitoring prosodic resources deployed in relation to the moving bodies, we are able to expand on our knowledge of
prosody as an interactional and intersubjective resource. We showed how pitch movement in the form of an UP/DOWN
contour pair, as well as tempo, duration, loudness, and voice quality are used to ‘inhabit’ other bodies, coaching, expressing
and enhancing them in real time. The existence of this joint sounding and performing practice underlines the fact that
humans at times strive to achieve behavioral synchrony across participants, rather than always organizing interaction
sequentially.
Although it is not the aim of this paper to specify why the students and instructor orient to and put effort into achieving
sounding for others, there are advantages. The class needs to establish a balance between extending the moves in the ex-
ercises, which makes them more difficult (though also more beneficial for practicing stability, balance, and endurance), and
establishing a pace and duration that everybody can successfully manage. The students allow the instructor to encourage a
tempo, while also making visible their limitations for the instructor to adapt to. Furthermore, the synchrony need not be
universal, allowing for different shapes, speeds, and peaks of motion. Although the students can see and monitor each other’s
relative timing, they are not accountable to be identical, letting students of different levels coexist in the class. They orient
instead to accountably maintaining synchronization with the instructor’s voice, thereby demonstrating their attentiveness
and effort. Through her vocalizing, the instructor at the same time demonstrates her attunement with the students’ bodies
and can monitor the class’ energy and progress, adjusting the ‘metronome’ of the contour reflexively with the students. More
generally, this practice illustrates how people establish simultaneous co-experiencing, and can do being together across
multiple bodies and asymmetrically deployed modalities. Its existence challenges conceptualizations of communication as a
linear, speaker-to-hearer process, and indicates that language and agency are multiparticipant, multimodal, and multi-
sensorial phenomena. There are times when participants actively attempt to overcome individualistic sequential contribu-
tions to interaction and participate as synchronously and jointly as possible, albeit through asymmetrical communicative
resources.

Acknowledgements

A huge thank you to our three anonymous reviewers for their insightful and helpful comments. The authors are also
indebted to Beatrice Szczepek Reed for her insights on prosody, and to Sally Wiggins, Agnes Löfgren, and Hannah Pelikan for
additional comments. Finally, a big thank you to the Sounding for Others special issue authors, who contributed their insights
to the paper in development multiple times across several workshops. The study was funded by the Swedish Research Council
grant VR2016-00827 “Vocal Practices for coordinating human action” and by Riksbankens Jubileumsfond’s grant P21-0447
“Sounding for others: Distributed agency in action”.

References

Aaltonen, T., Laakso, M., 2011. Halting Aphasic interaction. Creation of intersubjectivity and Spousal relationship in situ. Commun. Med. 7 (2), 95–106.
https://doi.org/10.1558/cam.v7i2.95.
Antaki, C., Kent, A., 2012. Telling people what to do (and, sometimes, why): contingency, entitlement and explanation in staff requests to adults with
intellectual impairments. J. Pragmat. 44 (6), 876–889. https://doi.org/10.1016/j.pragma.2012.03.014.
Auer, P., Pfänder, S. (Eds.), 2011. Constructions: Emerging and Emergent. De Gruyter. https://www.degruyter.com/view/title/37335.
Bakhtin, M., 1981. Discourse in the Novel. In: Holquist, M., Emerson, C. (Eds.), The Dialogic Imagination: Four Essays by M.M. Bakhtin. University of Texas
Press, Austin, pp. 259–297.
Baldauf-Quilliatre, H., Carvajal, I. C. de, 2019. Encouragement in videogame interactions. Soc. Interact. Video-Based Study Hum. Soc. 2 (2). https://doi.org/10.
7146/si.v2i2.118041. Article 2.
Bassetti, C., Liberman, K., 2021. Making talk together: simultaneity and rhythm in mundane Italian conversation. Lang. Commun. 80, 95–113. https://doi.org/
10.1016/j.langcom.2021.06.002.
Bavelas, J.B., Chovil, N., 2000. Visible acts of meaning: an integrated message model of language in face-to-face dialogue. J. Lang. Soc. Psychol. 19 (2), 163–
194. https://doi.org/10.1177/0261927X00019002001.
Bolinger, D., 1985. Intonation and its parts: Melody in spoken English. Stanford University Press, Stanford.
Cantarutti, M.N., 2020. The Multimodal and Sequential Design of Co-Animation as a Practice for Association in Interaction. PhD Thesis. University of York.
Cekaite, A., 2016. Touch as social control: haptic organization of attention in adult–child interactions. J. Pragmat. 92, 30–42. https://doi.org/10.1016/j.
pragma.2015.11.003.
Chomsky, N., 1965. Aspects of the Theory of Syntax. MIT Press.
Couper-Kuhlen, E., Selting, M., 1996. Towards an interactional perspective on prosody and a prosodic perspective on interaction. In: Couper-Kuhlen, E.,
Selting, M. (Eds.), Prosody in Conversation: Interactional Studies. Cambridge University Press, pp. 11–56.
Craven, A., Potter, J., 2010. Directives: entitlement and contingency in action. Discourse Stud. 12 (4), 419–442. https://doi.org/10.1177/1461445610370126.
Curl, T.S., Drew, P., 2008. Contingency and action: a comparison of two forms of requesting. Res. Lang. Soc. Interact. 41 (2), 129–153. https://doi.org/10.1080/
08351810802028613.
Daylight, R., 2017. Saussure and the model of communication. Semiotica 2017 (217), 173–194. https://doi.org/10.1515/sem-2016-0038.
de Saussure, F., 1916. Cours de linguistique générale. Payot.
Deppermann, A., 2018. Instruction practices in German driving lessons: differential uses of declaratives and imperatives. Int. J. Appl. Ling. 28 (2), 265–282.
https://doi.org/10.1111/ijal.12198.
Deppermann, A., Mondada, L., Doehler, S.P., 2021. Early responses: an introduction. Discourse Process 58 (4), 293–307. https://doi.org/10.1080/0163853X.
2021.1877516.
Dressel, D., Kalkhoff, A.T., 2019. Co-constructing utterances in face-to-face-interaction: a multimodal analysis of collaborative completions in spoken
Spanish. Soc. Interact. Video-Based Study Hum. Soc. 2 (2). https://doi.org/10.7146/si.v2i2.116021.
Dressel, D., Satti, I., 2021. Embodied coparticipation practices in collaborative storytelling. Gesprächsforschung 22, 54–86.
Du Bois, J.W., 2014. Towards a dialogic syntax. Cognit. Ling. 25 (3), 359–410. https://doi.org/10.1515/cog-2014-0024.
Due, B.L., 2021. Distributed perception: Co-operation between sense-able, actionable, and accountable semiotic agents. Symbolic Interact. 44 (1), 134–162.
https://doi.org/10.1002/symb.538.
Ehmer, O., 2021. Synchronization in demonstrations. Multimodal practices for instructing body knowledge. Linguistics Vanguard 7 (s4). https://doi.org/10.
1515/lingvan-2020-0038.
Ekström, A., Lindwall, O., 2014. To follow the materials: the detection, diagnosis and correction of mistakes in craft education. In: Nevile, M., Haddington, P.,
Heinemann, T., Rauniomaa, M. (Eds.), Interacting with Objects: Language, Materiality, and Social Activity. John Benjamins, pp. 227–239.
Enfield, N.J., 2017. Elements of agency. In: Enfield, N.J., Kockelman, P. (Eds.), Distributed Agency. Oxford University Press. https://oxford.
universitypressscholarship.com/view/10.1093/acprof:oso/9780190457204.001.0001/acprof-9780190457204.
Fuchs, T., De Jaegher, H., 2009. Enactive intersubjectivity: participatory sense-making and mutual incorporation. Phenomenol. Cognitive Sci. 8 (4), 465–486.
https://doi.org/10.1007/s11097-009-9136-4.
Genty, E., Heesen, R., Guéry, J.-P., Rossano, F., Zuberbühler, K., Bangerter, A., 2020. How apes get into and out of joint actions: shared intentionality as an
interactional achievement. Interact. Stud. 21 (3), 353–386. https://doi.org/10.1075/is.18048.gen.
Goffman, E., 1974. Frame Analysis: an Essay on the Organization of Experience. The Maple Press.
Goodwin, C., 1979. The interactive construction of a sentence in natural conversation. In: Psathas, G. (Ed.), Everyday Language: Studies in Ethnomethod-
ology. Irvington Publishers, pp. 97–121.
Goodwin, C., 1980a. Restarts, pauses, and the achievement of a state of mutual gaze at turn-beginning. Socio. Inq. 50 (3–4), 272–302. https://doi.org/10.1111/
j.1475-682X.1980.tb00023.x.
Goodwin, C., 1995. Co-constructing meaning in conversations with an aphasic man. Res. Lang. Soc. Interact. 28 (3), 233–260. https://doi.org/10.1207/
s15327973rlsi2803_4.
Goodwin, C., 2004. A competent speaker who can’t speak: the social life of aphasia. J. Ling. Anthropol. 14 (2), 151–170. https://doi.org/10.1525/jlin.2004.14.2.
151.
Goodwin, C., 2018. Co-operative Action. Cambridge University Press.
Goodwin, C., Goodwin, M.H., 1987. Concurrent operations on talk: notes on the interactive organization of assessments. IPrA Papers in Pragmatics 1 (1), 1–
54.
Goodwin, M.H., 1980b. Processes of mutual monitoring implicated in the production of description sequences. Socio. Inq. 50 (3–4), 303–317. https://doi.org/
10.1111/j.1475-682X.1980.tb00024.x.
Goodwin, M.H., 2017. Haptic sociality: the embodied interactive constitution of intimacy through touch. In: Meyer, C., Streeck, J., Jordan, J.S. (Eds.),
Intercorporeality: Emerging Socialities in Interaction. Oxford University Press, pp. 73–102.
Goodwin, M.H., Cekaite, A., 2013. Calibration in directive/response sequences in family interaction. J. Pragmat. 46 (1), 122–138. https://doi.org/10.1016/j.
pragma.2012.07.008.
Goodwin, M.H., Cekaite, A., 2018. Embodied Family Choreography: Practices of Control, Care, and Mundane Creativity. Routledge.
Haddington, P., Nevile, M., Keisanen, T., 2012. Meaning in motion: sharing the car, sharing the drive. Semiotica 2012 (191), 101–116. https://doi.org/10.1515/
sem-2012-0057.
Hayashi, M., 2005. Joint turn construction through language and the body: notes on embodiment in coordinated participation in situated activities.
Semiotica 2005 (156), 21–53. https://doi.org/10.1515/semi.2005.2005.156.21.
Heinemann, T., 2006. ‘Will you or can’t you?’: displaying entitlement in interrogative requests. J. Pragmat. 38 (7), 1081–1104.
Heritage, J., Raymond, G., 2005. The terms of agreement: indexing epistemic authority and subordination in talk-in-interaction. Soc. Psychol. Q. 68 (1), 15–
38. https://doi.org/10.1177/019027250506800103.
Huhtamäki, M., Grahn, I.-L., 2022. Explicit positive assessments in personal training: their design and sequential and embodied environment. J. Pragmat.
188, 108–128. https://doi.org/10.1016/j.pragma.2021.12.001.
Iwasaki, S., 2009. Initiating interactive turn spaces in Japanese conversation: local projection and collaborative action. Discourse Process 46 (2–3), 226–246.
https://doi.org/10.1080/01638530902728918.
Jefferson, G., 1986. Notes on ‘latency’ in overlap onset. Hum. Stud. 9 (2), 153–183. https://doi.org/10.1007/BF00148125.
Jenkins, L., 2015. Negotiating pain: the joint construction of a child’s bodily sensation. Sociol. Health Illness 37 (2), 298–311. https://doi.org/10.1111/1467-
9566.12207.
Keevallik, L., 2010. Bodily quoting in dance correction. Res. Lang. Soc. Interact. 43 (4), 401–426. https://doi.org/10.1080/08351813.2010.518065.
Keevallik, L., 2013. The interdependence of bodily demonstrations and clausal syntax. Res. Lang. Soc. Interact. 46 (1), 1–21. https://doi.org/10.1080/08351813.
2013.753710.
Keevallik, L., 2017. Linking performances: the temporality of contrastive grammar. In: Laury, R., Etelämäki, M., Couper-Kuhlen, E. (Eds.), Linking Clauses and
Actions in Social Interaction. Finnish Literature Society, pp. 54–73.
Keevallik, L., 2018. What does embodied interaction tell us about grammar? Res. Lang. Soc. Interact. 51 (1), 1–21.
https://doi.org/10.1080/08351813.2018.1413887.
Keevallik, L., 2020a. Grammatical coordination of an embodied action: the Estonian ja “and” as a temporal coordinator of Pilates moves. In: Maschler, Y.,
Pekarek Doehler, S., Lindström, J. (Eds.), Emergent Syntax for Conversation: Clausal Patterns and the Organization of Action. John Benjamins, pp. 221–
244.
Keevallik, L., 2020b. Linguistic structures emerging in the synchronization of a Pilates class. In: Taleghani-Nikazm, C., Betz, E., Golato, P. (Eds.), Mobilizing
Others: Grammar and Lexis within Larger Activities. John Benjamins, pp. 147–173.
Keevallik, L., 2021. Vocalizations in dance classes teach body knowledge. Linguistics Vanguard 7 (s4). https://doi.org/10.1515/lingvan-2020-0098.
Keevallik, L., in press. Strain grunts and the organization of participation. In: Mondada, L., Peräkylä, A. (Eds.), Body, Participation and the Self: New
Perspectives on Goffman in Language and Interaction. Routledge.
Kesselheim, W., Brandenberger, C., 2021. The social construction of embodied experiences: two types of discoveries in the science centre. Linguistics
Vanguard 7 (s4). https://doi.org/10.1515/lingvan-2020-0101.
Kuroshima, S., 2020. Therapist and patient accountability through tactility and sensation in medical massage sessions. Soc. Interact. Video-Based Study
Hum. Soc. 3 (1). https://doi.org/10.7146/si.v3i1.120251. Article 1.
Lerner, G.H., 1991. On the syntax of sentences-in-progress. Lang. Soc. 20 (3), 441–458. https://doi.org/10.1017/S0047404500016572.
Lerner, G.H., 1996. On the “semi-permeable” character of grammatical units in conversation: conditional entry into the turn space of another speaker. In:
Ochs, E., Schegloff, E.A., Thompson, S.A. (Eds.), Interaction and Grammar. Cambridge University Press, pp. 238–276.
Lindström, A., 2005. Language as social action: a study of how senior citizens request assistance with practical tasks in the Swedish home help service. In:
Hakulinen, A., Selting, M. (Eds.), Syntax and Lexis in Conversation: Studies on the Use of Linguistic Resources in Talk-in-Interaction.
John Benjamins, pp. 209–230.
Linell, P., 2009. Rethinking Language, Mind, and World Dialogically: Interactional and Contextual Theories of Human Sense-Making. Information Age
Publishing.
Local, J., 2003. Variable domains and variable relevance: interpreting phonetic exponents. J. Phonetics 31 (3), 321–339. https://doi.org/10.1016/S0095-
4470(03)00045-7.
Maschler, Y., Pekarek Doehler, S., Lindström, J., Keevallik, L. (Eds.), 2020. Emergent Syntax for Conversation: Clausal Patterns and the Organization of Action.
John Benjamins.
Merleau-Ponty, M., 1964. The philosopher and his shadow. In: McCleary, R.C. (Ed.), Signs. Northwestern University Press, pp. 159–181.
Merleau-Ponty, M., 2002. The Phenomenology of Perception (C. Smith, Trans.). Routledge.
Meyer, C., 2010. Self, Sequence and the Senses: Universal and Culture-specific Aspects of Conversational Organization in a Wolof Social Space. Habil-
itationsschrift, Bielefeld.
Meyer, C., Streeck, J., Jordan, J.S. (Eds.), 2017. Intercorporeality: Emerging Socialities in Interaction. Oxford University Press.
Mondada, L., 2013. Coordinating mobile action in real time: the timely organisation of directives in video games. In: Haddington, P., Mondada, L., Nevile, M.
(Eds.), Interaction and Mobility: Language and the Body in Motion. De Gruyter, pp. 300–342. https://doi.org/10.1515/9783110291278.300.
Mondada, L., 2014. Requesting immediate action in the surgical operating room: time, embodied resources and praxeological embeddedness. In: Drew, P.,
Couper-Kuhlen, E. (Eds.), Requesting in Social Interaction. John Benjamins, pp. 269–302.
Mondada, L., 2017. Precision timing and timed embeddedness of imperatives in embodied courses of action: examples from French. In: Sorjonen, M.-L.,
Raevaara, L., Couper-Kuhlen, E. (Eds.), Imperative Turns at Talk. John Benjamins Publishing, pp. 65–101. https://www.jbe-platform.com/content/books/
9789027265524-slsi.30.03mon.
Mondada, L., 2019. Rethinking bodies and objects in social interaction: a multimodal and multisensorial approach to tasting. In: Kissmann, U.T., van Loon, J.
(Eds.), Discussing New Materialism: Methodological Implications for the Study of Materialities. Springer Fachmedien, pp. 109–134. https://doi.org/10.
1007/978-3-658-22300-7_6.
Norén, N., Svensson, E., Telford, J., 2013. Participants’ dynamic orientation to folder navigation when using a VOCA with a touch screen in talk-in-interaction.
Augment. Altern. Commun. 29 (1), 20–36. https://doi.org/10.3109/07434618.2013.767555.
Okada, M., 2018. Imperative actions in boxing sparring sessions. Res. Lang. Soc. Interact. 51 (1), 67–84. https://doi.org/10.1080/08351813.2017.1375798.
Paterson, G., 2020. Group speakers. Lang. Commun. 70, 59–66. https://doi.org/10.1016/j.langcom.2019.02.002.
Pfänder, S., Couper-Kuhlen, E., 2019. Turn-sharing revisited: an exploration of simultaneous speech in interactions between couples. J. Pragmat. 147, 22–48.
https://doi.org/10.1016/j.pragma.2019.05.010.
Planer, R.J., Godfrey-Smith, P., 2021. Communication and representation understood as sender–receiver coordination. Mind Lang. 36 (5), 750–770. https://
doi.org/10.1111/mila.12293.
Reed, D.J., 2021. Situating embodied instruction – proxemics and body knowledge. Linguistics Vanguard 7 (s4). https://doi.org/10.1515/lingvan-2020-0131.
Reynolds, E., 2021. Emotional intensity as a resource for moral assessments: the action of “incitement” in sports settings. In: Robles, J.S., Weatherall, A.
(Eds.), How Emotions Are Made in Talk. John Benjamins Publishing Company, pp. 27–50.
Robillard, A.B., 1996. Anger in-the-social-order. Body Soc. 2 (1), 17–30. https://doi.org/10.1177/1357034X96002001002.
Sacks, H., Schegloff, E.A., Jefferson, G., 1974. A simplest systematics for the organization of turn-taking for conversation. Language 50 (4), 696. https://doi.
org/10.2307/412243.
Seedhouse, P., 2004. The interactional architecture of the language classroom: a conversation analysis perspective. Lang. Learn. 54 (Suppl. 1), x–300. https://
doi.org/10.1111/j.1467-9922.2004.00266.x.
Sidnell, J., 2001. Conversational turn-taking in a Caribbean English creole. J. Pragmat. 33 (8), 1263–1290. https://doi.org/10.1016/S0378-2166(00)00062-X.
Simone, M., Galatolo, R., 2020. Climbing as a pair: instructions and instructed body movements in indoor climbing with visually impaired athletes. J.
Pragmat. 155, 286–302. https://doi.org/10.1016/j.pragma.2019.09.008.
Simone, M., Galatolo, R., 2021. Timing and prosody of lexical repetition: how repeated instructions assist visually impaired athletes’ navigation in sport
climbing. Res. Lang. Soc. Interact. 54 (4), 397–419. https://doi.org/10.1080/08351813.2021.1974742.
Stevanovic, M., Peräkylä, A., 2012. Deontic authority in interaction: the right to announce, propose, and decide. Res. Lang. Soc. Interact. 45 (3), 297–321.
https://doi.org/10.1080/08351813.2012.699260.
Stivers, T., Enfield, N.J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., Hoymann, G., Rossano, F., de Ruiter, J.P., Yoon, K.-E., Levinson, S.C., 2009. Universals
and cultural variation in turn-taking in conversation. Proc. Natl. Acad. Sci. USA 106 (26), 10587–10592. https://doi.org/10.1073/pnas.0903616106.
Stukenbrock, A., 2017. Intercorporeal phantasms: kinesthetic alignment with imagined bodies in self-defense training. In: Meyer, C., Streeck, J., Jordan, J.S.
(Eds.), Intercorporeality: Emerging Socialities in Interaction. Oxford University Press, pp. 237–263.
Szczepek Reed, B., 2020. Reconceptualizing mirroring: sound imitation and rapport in naturally occurring interaction. J. Pragmat. 167,
131–151. https://doi.org/10.1016/j.pragma.2020.05.010.
Townsend, S.W., Koski, S.E., Byrne, R.W., Slocombe, K.E., Bickel, B., Boeckle, M., Goncalves, I.B., Burkart, J.M., Flower, T., Gaunet, F., Glock, H.J., Gruber, T.,
Jansen, D.A.W.A.M., Liebal, K., Linke, A., Miklósi, Á., Moore, R., Schaik, C. P. van, Stoll, S., Manser, M.B., 2017. Exorcising Grice’s ghost: an empirical
approach to studying intentional communication in animals. Biol. Rev. 92 (3), 1427–1433. https://doi.org/10.1111/brv.12289.
Walker, G., 2017. Visual representations of acoustic data: a survey and suggestions. Res. Lang. Soc. Interact. 50 (4), 363–387. https://doi.org/10.1080/
08351813.2017.1375802.
Wiggins, S., 2013. The social life of ‘eugh’: disgust as assessment in family mealtimes. Br. J. Soc. Psychol. 52 (3),
489–509. https://doi.org/10.1111/j.2044-8309.2012.02106.x.

Emily Hofstetter is a Research Fellow at Linköping University. She has studied social interaction in a variety of settings, including rock climbing and pol-
iticians’ encounters with their constituents.

Leelo Keevallik is a Professor in Language and Culture at Linköping University. She specializes in language in interaction and has targeted language use at
dance classes, sheep stables and multilingual construction sites.
