Conversation Analysis

Conversation analysis (CA) is recognized for challenging the
traditional way we understand and analyze communication.
From: Sticky Creativity, 2020

Related terms: Phenomenology, Social Interaction, Artificial Intelligence,
Autism Spectrum Disorder, Communicative Interaction, Ethnomethodology,
Mealtimes, Public Health, Structured Interview, Shared Decision Making

Conversation Analysis
Danielle Lavin-Loucks, in Encyclopedia of Social Measurement, 2005

Conclusion: Advantages, Limitations, and Future Directions
Advantages
What is distinct about CA is its approach to the phenomenon of interest.
The type of analysis that is produced can formally specify structures of
talk, locate endogenous order, and systematically illuminate the patterns
that characterize everyday interactions. What is most notable about the
conversation analytic approach is its appreciation of and attention to
detail. Each detail is treated as an analytical resource. Moreover, it is
through the careful analysis of detail that conversation analysts come to
an appreciation of how institutions are created, sustained, identified,
conveyed, and altered and how relationships are formed through social
interaction. Although conversation analysis may seem unsuited to
investigations of institutions, it can identify minute details of interaction
that comprise large-scale enterprises, as with the justice system or the
medical field.

Limitations
The main disadvantage of CA lies in the limitations it imposes on the type
of data suitable for analysis: recorded (video or audio) data. Although this
constraint guarantees the veracity of the data, it severely limits the scope
of examinable phenomena. In addition, some of the language surrounding
CA and the related literature is highly specialized and difficult to
understand for inexperienced practitioners. Although the transcription
system is relatively standardized, it too can appear obscure or difficult
and is likewise time-consuming to learn, sometimes giving the
impression of a foreign language.

Moreover, just as other social scientific methods are suitable for
answering specific types of questions, so too is CA. Researchers interested
primarily in the distribution of phenomena, or large-scale
macrosociological questions, find that the line-by-line analysis
characteristic of CA may not be well suited to their research topic.
Likewise, those large-scale macroprocesses that are not linguistically
based fall outside of the realm of appropriate topics for CA research.
Future Directions
The initial findings of conversational rules and the structure present in
conversation have already been extended into studies of institutional
settings and workplace environments including the realms of survey
research, human/computer interaction, news interviews, medical and
educational settings, and justice system operations. Some researchers
have used CA to examine how decisions are made, how news is delivered,
how diagnoses are communicated, and how disagreements and
arguments are resolved. Although CA here has been presented as a
method of analysis, or analytical technique, it can also be characterized as
an ideology; CA represents a way of thinking about the world and the
interactions that comprise lived social experiences. As a
microinteractional approach to order, CA is in its infancy. With the
variation in CA research, the potential for extension into the realm of
previously ignored social interaction is limitless. Although a great deal of
progress has been made in describing the organization of talk, a great deal
remains unexplained and unidentified.

URL: https://www.sciencedirect.com/science/article/pii/B0123693985004382

Conversation Analysis: Sociological
J. Heritage, in International Encyclopedia of the Social & Behavioral Sciences, 2001

1 Background
CA emerged from the confluence of two theoretical initiatives in
sociology. The first derives from Erving Goffman who, in a long series of
theoretical writings, argued that social interaction forms a distinct
institutional order comprised of normative rights and obligations that
regulate conduct in interaction, and that functions as the medium for the
operation of other societal institutions. From Goffman, CA adopted the
essentially Durkheimian perspective that these normative conventions
are autonomous and independent of the social and psychological
characteristics of persons and their particular motivations and projects,
and indeed are the vehicles through which the particular characteristics
of interactants are made manifest in conduct. The second influence
derives from Harold Garfinkel's ethnomethodology, which stresses the
contingent and socially constructed nature of both action and the
understanding of action in the social world. From Garfinkel, CA adopted
the perspective that a common body of normative conventions and
practices are the basic resources for the methodical production and
recognition of action, and for the achievement of common
understandings of joint activities in a dynamic social context that is
maintained or altered with each successive contribution.

These two perspectives were fused into a methodology that focuses on
the sequential production of interaction. Analysis of the normative
practices through which interaction and its outcomes are built is possible
because participants display their understandings and analyses of one
another's conduct in each successive contribution to interaction. The
systematic ‘choices’ involved in each successive move are resources for
grasping the practices—conceived as a domain of massive order and
empirical regularity—through which these choices are implemented.

URL: https://www.sciencedirect.com/science/article/pii/B0080430767020015

Co-constructivism in Educational Theory and Practice
K. Reusser, in International Encyclopedia of the Social & Behavioral Sciences, 2001

1.4 Context of Discourse Linguistics: Grounding


From the perspective of communication or conversation analysis, co-
constructive or collaborative learning requires individuals to establish,
maintain, and update some degree of mutual understanding. The basic
process by which this is accomplished between individuals is called
grounding (Clark and Brennan 1991). Grounding as a basic form of
collaboration means the moment-by-moment coordination and
synchronization of the content-specific as well as the procedural aspects
and steps of co-constructive activity. There is no need, however, to fully
ground every aspect of an utterance. Clark and Brennan (1991, p. 148)
frame a pragmatic criterion for grounding: The conversants ‘mutually
believe that they have understood what [they] meant well enough for
current purposes.’ Thus, the techniques that are used for grounding are
shaped by the goal and the medium of communication. That is, the
criterion of grounding and the techniques exploited for its maintenance
dramatically change according to the purpose of communication (e.g.,
planning a party, swapping gossip, or gaining deep understanding) and
the constraints of its medium (copresence and visibility in face-to-face
communication; sequentiality and reviewability in letter communication,
e-mail, or computer-supported collaborative work).

URL: https://www.sciencedirect.com/science/article/pii/B0080430767024086

Ethnomethodology in Education Research
D. Macbeth, in International Encyclopedia of Education (Third Edition), 2010

The Mid-Century
Ethnomethodology (EM) was a development of mid-century sociology. Harold Garfinkel was
party to an extraordinary generation of conceptual innovation in matters
of social science, including critical commentaries on its very possibility.
The registers ranged from sociology to anthropology to philosophy, and
how they were all unavoidably joined at the hip of natural language study.
Lists will not do those developments justice, and space will not permit a
fair accounting, but between the mid-1950s and early 1970s, Anglo-
American social science experienced a profusion of excavations,
proposals, and demonstrations from Herbert Blumer to Erving Goffman to
Clifford Geertz, and from Thomas Kuhn to Charles Taylor to Peter Winch
and Wittgenstein.

From these various directions, the formal structures and distinctions that
underwrite the natural-science model for social science became the
objects of critical inquiry that yielded sociology’s qualitative turn of the
1950s and 1960s. Central to the rethinking was the question of what
purchase the model could have for understanding meaningful social
worlds (or science, for that matter) in any actual case. One need not insist
on understanding actual cases to see how the familiar empiricism of
social science and educational research can say little, if anything, about
them. As a familiar example, for all kinds of practical administrative
purposes, it can be very useful to know that the average American family
produces 2.2 children. For program-planning purposes, for the
professional administrative offices of modern life, this can be very useful
stuff to know; and it is entirely descriptive, although in a peculiar way: It
tells us – and can tell us – nothing of any actual family. It tells us nothing
of the formative histories that yield its aggregate finding. It can be of no
use for understanding the very affairs it reports (though the report can be
of interest to other affairs). For these reasons, should we take interest in
the affairs that produce such data in the aggregate, we will have to look to
them, and do our looking in a very different fashion.

Within social science, Garfinkel’s EM was perhaps the most analytically
radical proposal among the commentaries, radical in that it recast the
exercise of sociological analysis and its relationship to the worlds we
study. The formulation of member methods broke the conceptual
monopoly on where and how we might find methods of inquiry, and who
a methodologist or analyst could be. His work drew on an eclectic yet
thematic collection of prior work: Husserlian and Heideggerian
phenomenology, conversations and readings with Alfred Schutz and
Aron Gurwitsch, readings in philosophical pragmatism, the liminal works
of Merleau-Ponty, and the formal architectures of Talcott Parsons, his advisor
at Harvard, alongside the sociological canon. He set out to topicalize, and
then write an alternative to, sociology’s consensus project of formal,
synthesizing analyses whose aim and achievement would be expressed in
synoptic theory. Parsons’ theory of social action became an exemplar for
the explication of what Garfinkel refers to as “formal analysis” and its
identifying exercise of “generic representational theorizing” (Garfinkel,
1996).

As for the promise of formal analysis and its synoptic field of view – the
macro-end of the familiar macro–micro divide, and the confident
panoptics it delivers – it was observed in return that every observation of
a distant landscape is a local reckoning. To see a learning disability, best
practice, or hegemonic complicity is to collect local reckonings by the
basketful, to rely on them, arrange them, submit them to certification
procedures, and render them as practical evidences of a structure that
was there all along. When we see this work, the comforts of distinctions
such as form and content, structure and function, and macro and micro,
collapse into fields of reflexive relations of co-constitution. Content
becomes constitutive of form. Structure lives in function’s local occasions.
Rather than genealogies, we find constitutive relations. Rather than causal
chains, we find demonstrable sense. (Notions of reflexivity are now
familiar in the literature. We have the reflexive practitioner and the
reflexive ethnography. These formulations trade on the wisdoms of the
early-moderns, who counseled know thyself. The counsel is beyond
reproach. We should. But the reflexivity of constitutive relations is quite
different, as a topic and task of analysis (Garfinkel, 1967).)

This is analytically radical fare, in that to take it seriously is to re-think
our received analytic appointments and ambitions. Central to this account
is the place of constitutive detail for the analysis and understanding of
social action, or what Garfinkel refers to as the concreteness of the
plenum (Garfinkel, 1996). Husserl and Schutz spoke of the lebenswelt.
Each refers to the life world, the profusion of sense, meaning, action, and
others that fills our lives in the natural attitude of everyday life. Neither
idiosyncratic nor subjective, the plenum refers to the meaningful
character of evident – and thus objective – worlds. EM proposes to
describe and understand the production of this public, witnessable order.
Yet it is a central premise of our familiar research methods that the social
world as we find it naturalistically, in its mundane presenting forms as,
for example, the chatter of a classroom during third period, is without
useful analytic possibilities. If we are to make good on the promise of a
social science, we cannot work with it that way, in its actual, temporal
durations and material productions. The contextual coherence of students
engaged in chit chat defeats the promise of science.

For that reason, we rely on formal methods to craft stable representations
of those ordinary worlds, by the familiar devices of code and variable
constructions, set in play within fields of mathematical – or theoretical –
relations, that might then yield models or logics that can, in turn, reveal
unseen order or structure. This is the family of practices Garfinkel refers
to as formal analysis, and by the expression he means to topicalize the
unspoken premise that there is no order in the concreteness of actual
occasions. Order, on the formal account, lies elsewhere. But one need not
doubt that there are indeed structures unnoticed or out of view – as in the
vector of a flu transmission or the collapse of a housing market – in order
to take interest in the local enactments that yield such things as vectors or
collapses, or the work of finding them.

Conversation Analysis
The subsequent and closely aligned development of conversation analysis
(CA) (Sacks, 1992; Sacks et al., 1974) gave these arguments a vivid field of
demonstrations, vivid in the profusion of circumstantial detail recorded in
tapes and on transcript. Conversation shows itself as a primordial
members’ method. In the organization of natural conversation, turn by
turn, we can see what such vernacular analyses could be. (The formative
history and conceptual relations between EM and CA is a regular topic in
the EM/CA literature, for example, Heritage, 1984.)

Sequential analysis gave detail to the proposition that meaning owns a
practical–analytic fabric, as in how every next turn to a conversation –
without exception – displays an understanding of the turn that has gone
before, and shapes the sequential horizon, or context, of next turns (Sacks
et al., 1974). Indeed, to speak next, whether at the dinner table or in the
classroom, is to evidence an understanding of the speaking just done,
what action it produces, what horizon of relevant next actions it projects,
and, of all things, where the ongoing turn might end, so that we might
begin our own. To speak next is to analyze all these things, whose
analyses are revealed in the production of an apt next turn, on time
(Moerman and Sacks, 1971/1988). The very formulation of turn taking in
classroom discourse studies owes a very large debt to these first studies.

EM studies thus aim to show that there is indeed order in the plenum, an
ignored, constitutive orderliness. To understand and describe it is to have
use for the circumstantiality of meaning’s productions. The point is not
that somehow one should have use for this worldliness, because it has
been ignored. It is rather and only that the order and structure of ordinary
worlds turns upon it. In our everyday lives, we do not remotely ignore this
circumstantiality. We live by it.

URL: https://www.sciencedirect.com/science/article/pii/B9780080448947015384

The Self-Organization of Human Interaction
Rick Dale, ... Daniel C. Richardson, in Psychology of Learning and Motivation, 2013

5.2.2 Interactional Patterns


Beyond this basic rhythm of interaction, conversation analysis has
persuasively shown how speech turns are often organized in functionally
structured sequences of turns, such as adjacency pairs: questions are
ordinarily responded to with an answer, not with another question; offers
and invitations are ordinarily followed by acceptances or declinations,
and so on (Schegloff, 1986). Turns and adjacency pairs are themselves not
free-floating entities, but often fulfill a role in larger interactional
patterns, locally unfolding routines that scaffold and constrain the
possibilities of actions and interpretation in joint activities (Clark, 1996;
Levinson, 1983). Interactional patterns are typically conceived of as
normative static phenomena already shared—or assumed to be shared—
by interlocutors (Sacks, Schegloff, & Jefferson, 1974; Schank & Abelson,
1977). The synergy approach, however, implies that these elements are
part of a dynamic context-sensitive interaction. Interactional patterns
vary in formality and flexibility from free and relatively unconstrained
conversation over the morning coffee to tightly structured and sometimes
even explicitly codified task-oriented conversations (Hutchins, 1995a,
1995b; Perry, 2010). Interactional patterns work to reduce the overall
degrees of freedom of the system in a functionally driven way and enable
a smoother flow of the interaction.
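
The adjacency-pair idea can be illustrated with a small sketch. It is only a toy under stated assumptions: the action labels, the `EXPECTED_SECONDS` mapping, and the function name `find_adjacency_pairs` are hypothetical and are not taken from the cited literature, and real conversation analytic work relies on detailed transcripts rather than pre-labeled turns.

```python
# Hypothetical mapping from first pair parts to the second pair parts
# that ordinarily follow them (labels are illustrative assumptions).
EXPECTED_SECONDS = {
    "question": {"answer"},
    "offer": {"acceptance", "declination"},
    "invitation": {"acceptance", "declination"},
}


def find_adjacency_pairs(turns):
    """Return (first_index, completed) for every first pair part in `turns`.

    `turns` is a list of (speaker, action) tuples. A pair counts as
    completed only when the very next turn comes from a different
    speaker and carries one of the expected second pair parts.
    """
    results = []
    for i, (speaker, action) in enumerate(turns):
        expected = EXPECTED_SECONDS.get(action)
        if expected is None:
            continue  # not a first pair part in this toy scheme
        completed = False
        if i + 1 < len(turns):
            next_speaker, next_action = turns[i + 1]
            completed = next_speaker != speaker and next_action in expected
        results.append((i, completed))
    return results


# Invented example exchange (not drawn from any corpus).
example_turns = [
    ("A", "question"),
    ("B", "answer"),
    ("A", "invitation"),
    ("B", "question"),  # a question where an acceptance or declination is expected
    ("A", "answer"),
    ("B", "acceptance"),
]
pairs = find_adjacency_pairs(example_turns)
```

Because the check looks only at the immediately following turn, the invitation at index 2 is reported as not completed even though an acceptance arrives later. That rigidity is one way to see why a static lookup of expected responses is too coarse for the dynamic, context-sensitive patterns discussed above.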

A number of recent studies indirectly show that ad hoc interactional
patterns emerge and are maintained in task-related interactions. In a
version of “the maze game” (Healey & Mills, 2006; Mills &
Gregoromichelaki, 2010), it was observed that, over the course of 12
games, participants radically structured and shortened their linguistic
exchanges from more than 150 turns to brief and efficient exchanges.
Through a shared history of interaction, the structure of their interaction
was stabilized. This enabled participants to smoothly produce and interpret
highly elliptical and fragmentary utterances without much negotiation or
clarification. Extending this work, Mills (2011) systematically investigated
how these interaction patterns emerge and spread in a small speech
community. Each participant played a number of games with shifting
partners within a “community”. Then, in a critical test trial, half of the
participants were paired with a member from another community. This
perturbation seriously disrupted the interaction in the affected groups.
Participants were found to edit their utterances to a much higher degree,
were observed to explicitly acknowledge each other’s utterances more
often, and overall performed less accurately. The findings suggest that
interactional patterns emerge from a shared history of interaction and
come to implicitly constrain the degrees of freedom of the interlocutors,
diminishing ambiguity and supporting a smoother and more effective
flow of the coordination (for a more comprehensive discussion of these
issues, cf. Mills, in press, and Fusaroli, Raczaszek-Leonardi, & Tylén, in
press).

URL: https://www.sciencedirect.com/science/article/pii/B9780124071872000022

Document Analysis
T. Rapley, K.N. Jenkings, in
International Encyclopedia of Education (Third Edition), 2010

Ethnomethodological Ethnography: Documents-in-Interactions
Research on documents within the tradition of ethnomethodological
ethnography – and we include conversation analysis within this area –
focuses on describing in great detail the moment-by-moment social
organization of interactions with, around, and about documents (see
Table 1, fourth column). Like ethnographic work outlined in the last
section, it is not so much interested in the content of documents, but in
the organization of interactions where documents take some role.
However, in general terms, there are two central differences. First, for
ethnomethodological work, the immediate, here-and-now, to-and-fro of
the participants’ interactions is the central object of interest, description,
and analysis. Broader contextual factors, like participants’ backgrounds or
organizational policies, are generally not drawn on as a resource to
explain why an interaction happens in a specific way. Instead, the focus is
on what the participants in the interaction show themselves to actually
do, the resources they draw on in just that moment of interaction. Second,
ethnomethodological work increasingly relies on audio or video
recordings of naturally occurring interactions. These form the centerpiece
of analysis, are subjected to repeated viewings, and sequences of
interaction – including verbal and nonverbal features – are transcribed in
very fine detail, analyzed, and revised in conjunction with the original
recordings.

An example of this method in practice is Heap's various analyses of
audiotapes of young children reading in classrooms (e.g., Heap, 1985,
1991). Rather than assuming we already know what reading looks like, or
when it occurs in classrooms – through some a priori theory or criteria –
he focuses on the broad range of interactional activities that produce the
ongoing activity as reading. For him, children learn what counts as
reading criterially, by learning what counts as reading procedurally, in
and through taking part in interactions where good reading is shown to
be taking place. For example, Freebody and Freiberg (2001) show how in a
moment of a parent–child interaction, where a child is reading a book
aloud, the parent works to praise, instruct the child to sound the letters,
correct a sounding, etc. In this moment of reading a document, the central
interactional task is reading out loud a written text, and good reading is
orientated to and produced as correct word-saying. In other contexts, say
a university seminar on a specific article, in and through debating and
discussing, students learn what good academic reading involves – finding
fault, contrasting with other documents, quoting and referencing from
this and other documents, reading footnotes, following up references, etc.

Such work focuses researchers’ attention on unpacking, in detail, how
specific tasks are undertaken.

Documents are not only paper-based, single-user objects, and
ethnomethodological ethnographic research has focused on how
mundane and digital technologies shape the world of educational
activities. Such research has focused on how people interact, work with,
and draw on such document-related technologies as white- and
blackboards, overheads, computer-based presentations, and videos.
Rendle-Short's (2006) study of computer-science seminar presentations
focuses on the academic monolog. She shows us how presenters work to
interact with the audience not only through their talk, but also through
their gestures, gaze, and bodily movements. Using a collection of
videotapes of seminars and transcribing in great detail the moment-by-
moment verbal and nonverbal behavior, she focuses on how presenters
coordinate a range of documents and technologies as they seek to engage
with the audience. As presenters talk, they time their slides to focus,
support, and supplement the issues they are raising. The text-based and
visual images on the slides work to illustrate some aspects of their
ongoing talk, and presenters, through their hand gestures, work to direct
the audience’s attention to a specific aspect or issue. In this way, we begin
to see how what appears to be a relatively simple task – giving a seminar –
is saturated with a complex range of interactional work. The documents
the presenters work with, be it notes available to them, or slides available
to all, are brought to life and shaped through the concerted, moment-by-
moment, organization of talk, gesture, and technology.

Such ethnomethodological ethnographic research findings can seem
focused on the quite obvious, in that they can show us what we take for
granted. However, researchers in this tradition argue that it is only
through an understanding of the details of what we actually do (the details
of which can largely escape our notice), over what people tell us they do,
or what we think they do, that we can then develop theories or
interventions that are actually relevant to current practice.

URL: https://www.sciencedirect.com/science/article/pii/B9780080448947015220

Sticky ideas
Sille Julie J. Abildgaard, in Sticky Creativity, 2020

Displays of idea ownership


The current exploration into idea ownership during brainstorming
employs tools from ethnomethodology and multimodal conversation
analysis (multimodal EMCA), as the main theoretical and methodological
framework. Conversation analysis (CA) is recognized for challenging the
traditional way we understand and analyze communication. Interaction is
accomplished on a turn-by-turn basis, where the participants are
reflective and oriented toward one another's context-shaped and context-
renewing actions (Due, 2016). Thus, speech is shaped in complex and
reflective ways by the structure of interaction (Sidnell, 2010, p. 157) – and
the same goes for presenting and proposing ideas in a group. When an
idea is proposed it is performed by the use of various communicative and
semiotic resources such as talk, bodily movement, gestures, materials (e.g.
a sticky note), and surroundings (Due, 2014, p. 208). Drawing on CA it
becomes possible to study the conversational turns in brainstorming
sessions, that is, how the speaking participant formats or designs her or his
turn to implement some action, in this case, the idea, for the other
participants as recipients (Hoey & Kendrick, 2017).

Getting an idea or talking about one's idea is a familiar situation to which
most people can relate. In addition, when people engage in teamwork or
collaboration, it is a common thing to attribute ownership to one's idea
such as referring to an idea as “Cecilia's idea” or “my idea” in order to
identify ideas as belonging to someone. Some idea proposals, like any other
proposals, may be met with disaffiliated responses (critique) and others
with affiliated responses (agreement) (Due, 2014). Moreover, when
presenting one's idea to others, the idea proposal might be met with
comments, suggestions for improvement, or new ideas. Thus, we tend to
protect our idea and argue for its existence (Baer & Brown, 2012).

URL: https://www.sciencedirect.com/science/article/pii/B9780128165669000045

Classroom Discourse and Student Learning
L. Hemphill, in International Encyclopedia of Education (Third Edition), 2010

Definition/Scope
Classroom discourse research arose in the 1970s as a special focus within
the new discipline of conversation analysis and early on identified some
of the distinctive formal characteristics and social purposes of talk in
schools. Research has generated descriptions of the forms of talk that are
specific to academic contexts, characterizations of the rules that govern
teacher–student talk, and accounts of the ways in which the development
of language skills can be fostered or hindered in classrooms. As analytic
tools have been applied to different kinds of school settings, the field has
contributed to better understanding of varied issues such as the
socialization of academic language, gender roles in classrooms, and
processes of second-language learning in school.

URL: https://www.sciencedirect.com/science/article/pii/B9780080448947005157

Ethnomethodology: General
S.E. Clayman, in International Encyclopedia of the Social & Behavioral Sciences, 2001

4.3 Talk and Social Interaction


Perhaps the most widespread contemporary variant of ethnomethodology
is what has come to be known as conversation analysis. This burgeoning
field was developed by Harvey Sacks, originally Garfinkel's student and
colleague, in collaboration with Emanuel Schegloff and Gail Jefferson (e.g.,
Sacks 1992, Atkinson and Heritage 1984).

Conversation analysis (henceforth CA) involves the study of practical
reasoning as it is put to use in the conduct of spoken interaction. The
domain of interaction—what Erving Goffman referred to as the interaction
order—is more general than any of the specialized institutional domains
investigated by ethnomethodologists, for interaction lies at the heart of
virtually all of these institutions and extends as well to informal
encounters between persons. Moreover, in the history of the human
species, interaction developed long before other societal institutions came
into being, and is in this respect ‘the primordial site of sociality’ (Schegloff
1988).

CA differs from other lines of ethnomethodological research not only in
substance but in methodology. Conversation analysts rely exclusively on
audio- and video-recordings of interactional data, and transcripts that
capture the details of interaction as it actually occurs. Such data have
numerous advantages—they can be examined repeatedly, analyzed at an
unprecedented level of detail, and reproduced in published works so that
readers can independently assess the validity of analytic claims.

The resulting research enterprise has generated an impressive array of
interlocking and cumulative findings on a wide range of subjects. These
include the organization of turn taking, action sequences, lexical choice,
the relationship between talk and nonvocal activities, and the
collaborative management of various interactional activities (e.g., giving
advice, delivering good and bad news, telling troubles, etc.). More
recently, researchers have applied the analytic resources of CA to various
phenomena that intersect with, and can be informed by, the study of talk-
in-interaction. These include how talk is organized in various institutional
settings, and how it serves as a medium for the accomplishment of
occupational tasks such as medical examinations, classroom lessons,
journalistic interviews, trial examinations, and so on (e.g., Boden and
Zimmerman 1991, Drew and Heritage 1992). Researchers have also begun
to explore how the study of talk-in-interaction can illuminate linguistic
phenomena such as grammar (Ochs et al. 1996), as well as medical
disorders such as aphasia that manifest themselves at the level of speech
(Goodwin 1995, Heeschen and Schegloff 1999).

URL: https://www.sciencedirect.com/science/article/pii/B008043076700766X

Speaking, Writing and Communicating
Ruth E. Corps, in Psychology of Learning and Motivation, 2023

4.2 Incrementality and disfluency in dialog


Language production may also be difficult in laboratory tasks because
speakers are typically encouraged to produce well-formed utterances,
which are syntactically complete and do not contain any disfluencies,
such as uh or um. As a result, participants are encouraged to plan a full
utterance before speaking—if they do not, then they risk producing
disfluent utterances. Planning in this way may make production difficult,
given that there is much evidence for incremental planning in monolog
(see Section 2.2). One consequence of this incrementality is that speakers
are often disfluent, producing filled pauses such as uh or um (see Section
2.2.1). Although there is much research showing that utterances are
disfluent, this disfluency has been underestimated by theories of dialog,
and particularly by Levinson and Torreira (2015), because participants in
laboratory tasks are discouraged from producing disfluent utterances. In
particular, they have focused on fluent, idealized utterances, with the
implicit assumption that disfluencies need to be excluded to study the
mechanisms of conversation in their purest form (e.g., Bögels, Magyari, &
Levinson, 2015). This point is important because it suggests that speech
elicited in laboratory tasks designed to understand the mechanisms of
language production in dialog may be very different from, and in fact more
difficult to produce than, speech as it naturally occurs.

To investigate how much conversational speech deviates from laboratory
speech, I conducted further analyses of the Santa Barbara Corpus
described in Section 4.1, focusing again on the 11 face-to-face
conversations between two people. This corpus has already been used to
study disfluency in an analysis by Tottie (2014), but Tottie focused solely
on the occurrence of uh and um. These filled pauses are thought to mark
hesitations by the speaker, and could be used to hold the floor while
further planning occurs (see Section 2.2.1). I was interested in these filled
pauses, but when analyzing the corpus for instances of parallel talk, I
noticed that utterances could be disfluent in a number of different ways.
For example, speakers often produced discourse markers, such as well or
you know, which are “sequentially dependent elements which bracket
units of talk” (Schiffrin, 1987). They can be removed from an utterance
without altering its meaning or grammaticality (Schourup, 1999). Much
like filled pauses, speakers may produce these discourse markers as
hesitations, to buy time for further planning. Research suggests that
different filled pauses and discourse markers likely have different
functions (e.g., Clark & Fox Tree, 2002; Fox Tree & Schrock, 2002; Fuller,
2003). However, my aim was not to determine the different uses of these
filled pauses and discourse markers, but rather to illustrate that they
occur and contribute to the (dis)fluency of dialog.

Additionally, utterances were often incomplete (6) or contained
repetitions (often referred to as self-repairs in the Conversation Analysis
literature; e.g., Schegloff, Jefferson, & Sacks, 1977), taking many attempts
before successful articulation (7). In these instances, speakers had likely
planned part of their utterance, and finished articulating it before they
had planned the next part of their utterance. As a result, they abandoned
or reformulated their utterance. In other words, incomplete and repeated
utterances provide further evidence that planning is incremental. Table 1
provides counts and percentages for the different disfluency categories I
considered. I will discuss each of these categories in more detail below,
but the full coding criteria (along with examples) can be found at
https://osf.io/7aphq/.

Table 1. Frequencies (n) and proportions (%) of backchannels, incomplete
segments, repetitions, resumptions, discourse markers, and filled pauses
for segments in the Santa Barbara Corpus of Spoken American English.

Total segments: N = 3190

Category                                                  n       %
Backchannels                                            533   16.71
Incomplete segments                                     912   28.59
Interruptions                                           300    9.40
Repetitions (or self-repairs)                           571   17.90
Resumptions (after interruption)                         69    2.16
Segments containing at least one filled pause           531   16.65
Segments containing at least one discourse marker       879   27.55
Disfluent segments, containing at least one category   1854   58.12

Note that these categories were not mutually exclusive, and so a
segment could belong to more than one category (i.e., it could contain
both a filled pause and a discourse marker). The final row in the table
shows the number of segments that were disfluent, and contained at
least one category. In particular, a segment was disfluent if it was
incomplete, interrupted, repeated, resumed, or contained a filled
pause or discourse marker, regardless of how many of these
phenomena occurred in the segment.
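
The percentages in Table 1 follow from dividing each count by the total number of segments (N = 3190). A minimal sketch of that arithmetic in Python is given here; the names `table1_counts` and `proportion` are illustrative and are not part of the original analysis, while the counts themselves are copied from Table 1.

```python
# Counts copied from Table 1; the total number of segments is N = 3190.
TOTAL_SEGMENTS = 3190

table1_counts = {
    "Backchannels": 533,
    "Incomplete segments": 912,
    "Interruptions": 300,
    "Repetitions (or self-repairs)": 571,
    "Resumptions (after interruption)": 69,
    "At least one filled pause": 531,
    "At least one discourse marker": 879,
    "Disfluent segments": 1854,
}


def proportion(count, total=TOTAL_SEGMENTS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Illustrative helper output: one percentage per Table 1 category.
table1_percentages = {name: proportion(n) for name, n in table1_counts.items()}

for name, pct in table1_percentages.items():
    print(f"{name}: {pct:.2f}%")
```

Because a segment can belong to several categories, these percentages are not expected to sum to 100.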

(6) Lynne: Cause y- I mean you get so tired.

(7) Lenore: I thought they used the horsehooves in .. for gelatin.


Although previous research has extensively quantified the frequency of
discourse markers, repetitions, and filled pauses in corpora (e.g., Crible,
2019; Crible, Degand, & Gilquin, 2017; Crible, Dumont, Grosman, &
Notarrigo, 2019; Crible & Pascual, 2020), this work has not considered
these findings in the context of theories of dialog, such as Levinson and
Torreira (2015). Furthermore, these corpora have often been based on
highly restricted tasks, such as describing a route around a map (e.g.,
Branigan et al., 1999), and have tended to focus on limited disfluency
types. Knowing what people say and how they speak in natural dialog is
not only critical for determining whether laboratory speech is a good
proxy for natural speech, but also for generating theories of speaking in
dialog.

Before I discuss the coding criteria I used for identifying disfluent
utterances, it is worth noting that previous research has shown that
backchannels are common in spontaneous conversation (e.g., Knudsen,
Creemers, & Meyer, 2020). The forms and functions of backchannels have
been widely discussed from linguistic and psychological perspectives
(e.g., Bangerter & Clark, 2003; Clark & Krych, 2004; Tolins & Fox Tree,
2014). They indicate to the present speaker that they should continue
talking either by proceeding in their narrative or elaborating it (e.g.,
Schegloff, 1982, 2000; Tolins & Fox Tree, 2016). These backchannels are
unlikely to contribute to disfluency—in fact, they likely contribute to the
flow of dialog by allowing the listener to respond without planning a full
utterance. However, I still quantified their occurrence because some
discourse markers (such as hmm) could be produced as backchannels.
Table 1 shows that 17% of the segments were backchannels (calculated as
the number of segments containing a backchannel divided by the total
number of segments).

Incomplete segments were those that contained an incomplete word or
were abandoned by the speaker and were not resumed in any of the
surrounding segments (i.e., the whole segment was incomplete).
Incomplete segments also included those in which the speaker was
interrupted by their partner and so did not finish their utterance. I also
considered segments in which the speaker repeated themselves (e.g., you
have to-to graduate) to be incomplete because the initial portion was
incomplete and subsequently repeated. Note that segments could be
incomplete in more than one way. For example, a segment could contain an
incomplete word, be resumed, and then subsequently be abandoned by
the speaker so the whole segment is incomplete. I did not determine how
many times each segment was incomplete—it was considered incomplete
if it belonged to any of these categories. In total, 29% of the segments
were incomplete, with 9% of them being incomplete because the other
speaker interrupted.

When segments were incomplete, speakers often began a new segment
by repeating part of their earlier, incomplete segment. To determine how
often speakers repeated part of their segment, I identified segments that
contained repetitions or that were resumed after an interruption by
another speaker. Again, I did not determine how many times each
segment was repeated. Rather I considered an utterance to be a repetition
if it was repeated at least once. In total, 18% of the segments were
repetitions, while 2% were resumptions of an earlier, interrupted segment.

When coding the discourse markers and filled pauses, I considered words
(such as well or you know) and sounds (such as uh or um) to be discourse
markers or filled pauses if they could be removed from the segment
without altering the speaker's meaning. For example, you know would be
considered a discourse marker in a segment such as And doing it and stuff
you know, but not in a segment such as Do you know what I mean?. Table 2
shows the counts and percentages for the individual filled pauses and
discourse markers. Segments could contain multiple occurrences of the
same filled pause or discourse marker. For example, the speaker could
produce uh multiple times in the same segment. But since I was
interested in how many segments contained at least one occurrence of
each type of filled pause or discourse marker, Table 1 shows the number
of times the speaker produced a particular type of filled pause or
discourse marker at least once in a segment. In total, 17% of the segments
contained at least one filled pause, and 28% of the segments contained at
least one discourse marker.
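
The distinction between counting every occurrence and counting segments that contain at least one occurrence can be sketched as follows. This is only an illustration: the segments and annotations are invented, and the sketch assumes the markers have already been hand-coded, since deciding whether you know is a discourse marker depends on the removability judgment described above, which simple string matching cannot make.

```python
from collections import Counter

# Hypothetical hand-coded annotations: one list of coded markers per segment.
coded_segments = [
    ["uh", "uh", "you know"],  # uh occurs twice in the same segment
    ["well"],
    [],                        # fluent segment, nothing coded
    ["uh", "well"],
]


def segments_with_marker(segments):
    """For each marker type, count the segments that contain it at least once."""
    counts = Counter()
    for markers in segments:
        counts.update(set(markers))  # set() collapses repeats within a segment
    return counts


def total_occurrences(segments):
    """For each marker type, count every occurrence across all segments."""
    counts = Counter()
    for markers in segments:
        counts.update(markers)
    return counts


per_segment = segments_with_marker(coded_segments)
per_token = total_occurrences(coded_segments)

print(per_segment["uh"], per_token["uh"])
```

In this toy data uh is produced three times but appears in only two segments, which is the per-segment count that the text describes.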

Table 2. Frequencies (n) and proportions (%) of different types of filled
pauses and discourse markers in the Santa Barbara Corpus of Spoken
American English.

Filled pause                   n       %
Uh                           326   10.22
Oh                           134    4.20
Hm                            66    2.07
Huh                           23    0.72
Ah                            11    0.35
Uhuh                           4    0.13
Aw                             7    0.22
Total filled pauses          571   17.42

Discourse markers              n       %
You know                     315    9.88
Well                         252    7.90
So                           170    5.33
Like                         164    5.14
I mean                       115    3.61
Kinda                         74    2.32
Geez                          59    1.85
Man                           59    1.85
Oh God                        34    1.07
Right                         33    1.04
Pretty                        28    0.88
See                           27    0.85
Really                        19    0.60
Now                           17    0.53
Sorta                         15    0.47
Anyway                        13    0.41
Total discourse markers     1394   43.70

Note that these categories were not mutually exclusive, and so a
segment could contain more than one filled pause or discourse
marker.

To determine how often segments were disfluent, I determined how many
were incomplete, interrupted, repeated, resumed, or contained a filled
pause or a discourse marker. Segments were considered disfluent if they
fell into any one of these categories. In total, 58% of the segments were
disfluent, and so only around 40% of the segments contained no
disfluency and were similar to the idealized utterances elicited in
laboratory tasks studying the mechanisms of speaking in dialog.
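
The classification rule described here (a segment is disfluent if it falls into any one of the categories, and is counted once however many categories apply) can be sketched in Python. The `Segment` class, its field names, and the five example segments are hypothetical and serve only to illustrate the rule; backchannels are left out because they do not enter the disfluency criterion.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical record of the categories coded for one segment."""
    incomplete: bool = False
    interrupted: bool = False
    repeated: bool = False
    resumed: bool = False
    filled_pause: bool = False
    discourse_marker: bool = False

    def category_count(self) -> int:
        """Number of disfluency categories this segment falls into."""
        return sum((self.incomplete, self.interrupted, self.repeated,
                    self.resumed, self.filled_pause, self.discourse_marker))

    def is_disfluent(self) -> bool:
        """A segment is disfluent if it falls into at least one category."""
        return self.category_count() > 0


# Invented example data, not taken from the corpus.
segments = [
    Segment(filled_pause=True, discourse_marker=True),  # two categories, one segment
    Segment(incomplete=True, interrupted=True),
    Segment(),                                          # fluent
    Segment(repeated=True),
    Segment(),                                          # fluent
]

n_disfluent = sum(s.is_disfluent() for s in segments)
percent_disfluent = 100 * n_disfluent / len(segments)
category_hits = sum(s.category_count() for s in segments)

print(n_disfluent, percent_disfluent, category_hits)
```

Because the categories overlap, the number of category memberships exceeds the number of disfluent segments, which is why the individual category percentages add up to more than the overall disfluency rate.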

These findings add to an existing body of research that has shown that
spontaneous speech is disfluent (see Section 2.2.1), and suggest that
speech planning is incremental. Speakers are likely incremental in this
way because planning while comprehending is cognitively demanding
(e.g., Oomen & Postma, 2001). Although corpus analyses do not allow us
to draw conclusions about the direction of causality, there is some
evidence that the fluency of speech is affected when speakers dual-task
production and comprehension. For example, Boiteau, Malone, Peters, and
Almor (2014) had participants conduct a visuomotor tracking task while
simultaneously interacting with a confederate. Participants’ tracking
performance declined towards the end of the confederate's turn,
suggesting they began response planning at this point. Participants’
speech rate was also affected by concurrent tracking when they had to
plan a response compared to when they just had to listen, but there was
no evidence that planning while listening increased the number of
disfluencies participants produced. However, the authors considered only
ums and uhs, and it is clear from Tables 1 and 2 that there are many other
types of disfluencies.

This incrementality (and disfluencies, by extension) invites parallel talk.
Speakers (Speaker A) do not plan their full utterance before they speak,
and so they may often pause or hesitate while they plan later parts of
their utterance, leading to disfluent speech. This hesitation allows the
other speaker (Speaker B) to jump in and articulate their own increment.
Speaker A then articulates the rest of their utterance, and so they do not
directly respond to the immediately preceding utterance of Speaker B.
Thus, incrementality, disfluencies, and parallel talk are closely related to
each other.

These findings have important consequences for the way we think about
language during dialog. First, they suggest that the utterances we study in
the laboratory are very different from the utterances speakers actually
produce in natural conversation. This point may seem obvious, but it has
important consequences for Levinson and Torreira's (2015) theory, which
has been used to motivate many studies investigating the mechanisms of
speaking during dialog. In particular, Levinson and Torreira (2015) claim
that next speakers must complete all stages of response planning as early
as possible (i.e., as soon as they can identify the gist of the current
speaker's utterance) if they are to achieve timely turn-taking and respond
within 200 ms. But such early planning may not be necessary in natural
conversation—speakers could use disfluencies to hold their turn while
planning their utterance, thus minimizing the overlap between
production and comprehension processes.

Relatedly, experimental studies investigating production in dialog likely
make production harder than it needs to be. First, participants are often
encouraged to plan well-formed utterances, and any utterances
containing disfluencies are often excluded from analyses. Participants
may thus be discouraged from planning incrementally, and may instead
plan their complete utterance before they speak in an effort to ensure
they produce well-formed utterances. Second, our corpus analyses
(Corps et al., 2022) have demonstrated that speakers do not always
directly respond to each other—instead, they develop their utterances in
parallel and continue an utterance they produced previously. This
situation is very different from laboratory tasks, where participants often
need to directly respond to the previous speaker and the content of their
own utterance depends on the content of the previous speaker's
utterance. As a result, speakers likely engage in more extensive advance
planning (resulting in a larger overlap between production and
comprehension) in laboratory tasks than is needed in natural
conversation, thus contributing to turn gaps longer than 200 ms.

In sum, it is clear that these theories are missing an important part of
natural speech—namely, that speakers are highly disfluent. Thus, these
results have important methodological and theoretical consequences, and
suggest that we need to study production both in highly controlled
laboratory tasks and in natural conversation if we are to build a clear
picture of the mechanisms of speaking during dialog (see also De Ruiter &
Albert, 2017). In particular, future experimental work could take excerpts
from speech corpora and test how disfluencies affect how accurately
speakers time their responses. Such work could also
test how disfluencies affect how participants distribute their attention
between response planning and simultaneous comprehension. Finally,
research could investigate whether parallel talk is more common in
instances where speakers hesitate and produce disfluencies. Testing these
hypotheses would provide insight into how comprehension, response
planning, and articulation are interwoven during conversation, and would
allow researchers to develop theories of language production in natural
dialog.

What these findings demonstrate, however, is that we currently do not
have a clear picture of speaking in dialog, as we do in monolog, because
these studies have tended to focus on highly idealized utterances, often
ignoring the fact that production is highly incremental, flexible, and far
from perfect.

URL: https://www.sciencedirect.com/science/article/pii/S0079742123000026
