Coreference Delays in Psychotic Discourse: Widening the Temporal Window

Article in Schizophrenia Bulletin · March 2023


DOI: 10.1093/schbul/sbac102


Schizophrenia Bulletin vol. 49 suppl. 2 pp. S153–S162, 2023
https://doi.org/10.1093/schbul/sbac102

SUPPLEMENT ARTICLE

Coreference Delays in Psychotic Discourse: Widening the Temporal Window

Downloaded from https://academic.oup.com/schizophreniabulletin/article/49/Supplement_2/S153/7083526 by guest on 23 March 2023


Claudio Palominos*,1, Alicia Figueroa-Barra2,3, and Wolfram Hinzen1,4
1Department of Translation and Language Sciences, Universitat Pompeu Fabra, Barcelona, Spain; 2Department of Psychiatry and Mental Health, Translational Psychiatry Laboratory - Psiquislab, School of Medicine, Universidad de Chile, Santiago, Chile; 3Millennium Nucleus to Improve the Mental Health of Adolescents and Youths (IMHAY), Santiago, Chile; 4Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
*To whom correspondence should be addressed; Universitat Pompeu Fabra, Roc Boronat 138, 08018 Barcelona, Spain; tel: (34) 93 542
2275, fax: (34) 93 542 1617, e-mail: claudio.palominos@upf.edu

Background and Hypothesis: Any form of coherent discourse depends on saying different things about the same entities at different times. Such recurrent references to the same entity need to predictably happen within certain temporal windows. We hypothesized that a failure of control over reference in speakers with schizophrenia (Sz) would become manifest through dynamic temporal measures. Study Design: Conversational speech with a mean of 909.2 words (SD: 178.4) from 20 Chilean Spanish speakers with chronic Sz, 20 speakers at clinical high risk (CHR), and 20 controls were collected. Using directed speech graphs with referential noun phrases (NPs) as nodes, we studied deviances in the topology and temporal distribution of such NPs and of the entities they denote over narrative time. Study Results: The Sz group had a larger density of NPs (number of NPs divided by total words) relative to both controls and CHR. This related to topological measures of distance between recurrent entities, which revealed that the Sz group produced more recurrences, as well as greater topological distances between them, relative to controls. A logistic regression using five topological measures showed that Sz and controls can be distinguished with 84.2% accuracy. Conclusions: This pattern indicates a widening of the temporal window in which entities are maintained in discourse and co-referenced in it. It substantiates and extends earlier evidence for deficits in the cognitive control over linguistic reference in psychotic discourse and informs both neurocognitive models of language in Sz and machine learning-based linguistic classifiers of psychotic speech.

Key words: speech graphs/reference/narrative/topological distances

© The Author(s) 2023. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center. All rights reserved. For permissions, please email: journals.permissions@oup.com

Introduction

Words in spontaneous speech are used to reference entities in the world. Once referenced, these entities remain available in the universe of discourse for further referencing. Such recurrence to the same entities is a fundamental precondition for any kind of narrative or coherence, in which the same entities will need to be available for (re-)referencing across a sequence of different events in which they end up occurring. Unsurprisingly, therefore, natural language grammar features a rich set of devices allowing speakers to navigate the universes of their discourse, such as the determiners a, the, or some in English, which regulate whether reference is to a new entity, an entity mentioned before or contextually present, or a set of entities identified by some description.1 Determiners are applied to content words such as the noun car, which as such only express a general lexical-conceptual meaning. We would be unable to distinguish between reference to a specific entity, an entity mentioned before, or particular instances of some type of entity, when using single content words alone: grammar is needed to generate referential meaning of these kinds. We here hypothesize that the latter type of meaning, as distinct from conceptual meaning, can help to understand a crucial link between language and psychosis2 and contribute to a linguistic model of it.3

Language anomalies clinically define criterial symptoms of schizophrenia (Sz) such as formal thought disorder (FTD).4 In FTD, they specifically include referential anomalies, such as the loss of a train of thought or topic, or unclear references or pronouns.5,6 Crucially, pronouns lack conceptual meaning altogether, as does a word like the, which governs whether noun phrases (NPs) are definite (eg, the car) or not. Recent linguistic studies have
shown that usage of definite NPs distinguishes Sz groups from controls, and Sz groups with and without FTD from each other, in Spanish and English,7,8 with related patterns recently found in Turkish.9 Tovar et al.10 studied a rare sample of highly thought-disordered patients with Sz and report that word-level (lexical-conceptual) anomalies were very rare, while referential anomalies were pervasive. Interestingly, determiners have also made a repeated appearance as parts of several highly accurate computational classifiers of psychosis based on natural language processing (NLP) tools.11–13 Importantly, neither pronouns, determiners, nor other referential devices such as proper names enter the semantic similarity spaces that also have been used to classify psychotic speech semantically.11,13,14 Semantic similarity can track whether a scooter is more similar to a car than a bear, but it does not track referential meaning,12 which defines a critically distinct dimension of linguistic organization.

Our aim here was to study referential meaning in psychotic speech from the viewpoint of the structure of the space of the entities referenced as a narrative progresses. Every such entity needs to be connected to every other in the universe of discourse in some way for a narrative to be coherent. A subset of these entities needs to be recurrent, and such recurrence is subject to timing constraints: making the temporal windows of recurrence too large will incur a processing cost, likely leading to perceived incoherence. Inspired by previous work using speech graphs to identify FTD15 or to predict a diagnosis in individuals at high risk,16 we here also used graph theory, yet employed it to study differences in connectedness at the level of the entities referenced by NPs. In the previous work, words in connected speech are represented by nodes, which are sequentially connected through directed edges. In particular, in Mota et al.,16 the largest connected component (LCC) and the largest strongly connected component (LSC) were calculated, both defined as the largest set of nodes directly or indirectly linked by some path, with the difference that for LSC this path must be reciprocal (that is, for any pair of nodes “a” and “b” in the path, “a” reaches “b” and “b” reaches “a”). Note that while this measure captures a certain aspect of word recurrence, it has no direct sensitivity to either conceptual or referential meaning. Graphs of Sz patients had fewer edges (E) and smaller LCC and LSC. An ad hoc disorganization index, a linear combination of these connectedness variables (E, LCC, LSC, LSCz), was able to classify Sz and negative symptomatology with accuracies exceeding 90%. Although these connectedness variables do not attempt to directly measure coherence or track its mechanisms, Mota et al.16 reported a negative correlation (R = −0.4) between LSC and semantic incoherence.

We here approached connectedness in psychotic speech in a more hypothesis-driven way. We reasoned that speech graphs capturing the structure of referential meaning directly and the topology of the entities referenced over narrative time might provide an analytical substrate for understanding language dysfunction in psychosis. By letting nodes represent referential NPs rather than words, speech graphs contribute to a linguistic model of psychosis that can illuminate future computational studies and neurocognitive models alike. Our basic hypothesis was that referential anomalies previously noted in Sz would be manifest at the level of topological measures of distance in such speech graphs.

Methods

Participants
Clinical phenomenological interviews of 20 patients diagnosed with Sz and of 20 people at clinical high risk (CHR) were used in this study. CHR participants were recruited from the University Psychiatric Clinic of the University of Chile and met ultra-high clinical risk criteria as assessed with the Structured Interview for Psychosis-Risk Syndromes and the Scale of Psychosis-Risk Symptoms.17 Sz patients were recruited from the Clinical Hospital Barros Luco Trudeau (CABL). The diagnosis of the Sz group was made by a mental health team according to Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria,18 the PANSS positive and negative symptom scale,19 and the guidelines of the Chilean national health guarantee program.20 CHR and Sz subjects were administered the Global Assessment of Functioning scale (GAF, DSM-IV-TR21). Psychiatric interviews ranged from 15 to 90 min (mean 52.5). All patients received medication (for details, see Table S1 in Supplementary Materials). Twenty control subjects were selected from the ESECH (Estudio Sociolingüístico del Español de Chile) study (Sociolinguistic Study of Chilean Spanish22). The length of the control interviews ranged from 32 to 83 min (53.5 ± 10.2 min) with open-ended questions. Some examples of questions during the clinical interviews are: (1) What is the reason you are coming to the consultation? (2) What are you doing lately? Examples of questions in the control group include: (1) What have you been up to lately? (2) Tell me if you have ever felt embarrassed about something that happened to you. Why? All participants were native Chilean-Spanish speakers and gave written consent for this study. Among the Sz group, eight subjects (40%) were hospitalized at the time of the study. Among the CHR participants, there were no hospitalized subjects. Demographic and clinical data of the sample are summarized in Table 1.

Ethical Statement
The Human Research Ethics Committee of the Faculty of Medicine of the University of Chile approved this study and its application protocol (DNI 163). Each patient read and signed a written informed consent authorized by the CEISH (parental consent).

Table 1. Demographic and Clinical Characteristics of the Sample

| Group | Sz | CHR | Control | Sz vs. CHR: t | Sz vs. CHR: P | CHR vs. Control: t | CHR vs. Control: P | Sz vs. Control: t | Sz vs. Control: P |
|---|---|---|---|---|---|---|---|---|---|
| N | 20 | 20 | 20 | | | | | | |
| Gender (% female) | 40% | 40% | 40% | | | | | | |
| Age | 35.7 ± 7.5 | 18.4 ± 2.8 | 32.6 ± 11.8 | 9.62 | .000* | −5.23 | .000* | 0.99 | .330 |
| Education (years) | 12.4 ± 1.9 | 11.2 ± 1.7 | 15.1 ± 1.4 | 2.11 | .042 | −8.04 | .000* | −5.17 | .000* |

Clinical Scores

| Sz: PANSS | Score | CHR: SIPS/SOPS | Score |
|---|---|---|---|
| total | 136.8 ± 15.6 | general | 7.7 ± 4.8 |
| positive | 32.4 ± 4.5 | positive | 11.7 ± 3.6 |
| negative | 35.4 ± 5.3 | negative | 16.2 ± 6.5 |
| general | 69.1 ± 8.0 | disorganization | 8.9 ± 4.7 |
| GAF | 32.9 ± 6.5 | GAF | 54.8 ± 11.5 |

Note: PANSS, Positive and Negative Syndrome Scale; SIPS, Structured Interview of Prodromal Syndromes; SOPS, Scale of Prodromal Symptoms; SIPS/SOPS, Diagnoses of prodromal syndromes based on SIPS; GAF, Global Assessment of Functioning; Sz, schizophrenia; CHR, clinical high risk.
*Significance level < .05.

Speech Graph Analysis
All clinical interviews were recorded on audio and transcribed by a team from the LEPSI (Language, Psychosis and Intersubjectivity, University of Chile) group. The interviews of the ESECH corpus were already transcribed. All NPs were identified and annotated, as well as their relative position with respect to the consecutive words of the speech. As in the work of Mota et al.,15,16 we adopted a speech graph approach, representing the speech produced by each individual as a graph composed of nodes and directed edges. The nodes were the NPs, which allows us to study the temporal and topological structure of referential meaning. They are connected by edges, which represent the subsequent occurrence of these NPs in the narrative. The number of words between nodes was calculated as a variable of interest (defining distance). As a surjective assignment of NPs to nodes, different NPs can be assigned to the same node when the same entity is co-referenced by these NPs. In other words, each node represents an entity present in the discourse, which can be instantiated through different NPs.

To make longer interviews comparable to shorter ones, the interviews were limited to a maximum of 1000 words (fully including the last utterance before reaching 1000 words). Figure 1 exemplifies a graph for a speech fragment. Each node in the graph represents a specific entity. Recurring entities are the man and the woman.

Graph-Theoretical Analysis
A distinction between recurrent and nonrecurrent entities was made, the former being those that appear at least twice during the speech, while nonrecurrent entities are referenced only once. As can be deduced from the graph representation, the total number of nodes coincides with the sum of recurrent and nonrecurrent entities. Our analysis proceeded from basic descriptive characteristics of the graphs (1), to topological measures relating these descriptive measures to narrative time (2), and finally measures of topological distance in all referential chains (ie, sequences of recurrences to the same entity) (3). Descriptive characteristics (1) were the number of nodes, number of NPs (equal to the number of edges plus one), number of recurrent (and nonrecurrent) entities, and the average degree (number of connections through edges with other nodes) of recurrent entities. The average degree of nonrecurrent entities was not calculated, since these can be connected to maximally two nodes, and only connected to one in case they occupy the first or the last position of the graph. The normalized measures (2) involved the density of NPs (number of NPs/number of words), which we broke down into the three elements that make it up, namely the normalized numbers of recurrent entities, of their recurrences, and of nonrecurrent entities. The number of recurrences was measured in two ways, first as the average number of recurrences per entity, then as the total number of recurrences throughout the speech. In group (3), we measured the production of NPs over narrative time. We calculated first the distance between NPs as the number of words between consecutive NPs.

Since we were mainly interested in the position of the NPs corresponding to recurrent entities, we measured the topological distance between recurrent NPs, defined as the number of nodes between a recurrent entity node and the next node referring to the same entity, or equivalently, the number of edges in a directed path that starts at a recurrent entity node and ends at the same


Fig. 1. Speech graph representation of a discourse.

(A) General speech graph representation of a discourse with 22 nodes and 30 edges. Letters represent entities as referenced by NPs (eg,
a man, the street, etc.). (B) A fragment of the same speech graph with 6 nodes and 7 edges: There was [a man][m] on [the street][st]
waiting for [someone][so]. Later, [a woman][w] met [him][m]. Although [they][th] went to [a cafe][ca], [she][w] seemed to be busy.
(C) Four edges in black (and three nodes in between) depicting a referential chain for the entity m. (D, E) Two depictions of referential
chains. After identifying re-appearances of the same entity, we calculate the topological distance as the number of edges in between
them: here there are four edges between a man and him (D), and four (different) edges between a woman and she (E). Nodes [m] and [w]
are divided in (D) and (E) only to visualize topological distances.
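
The construction in panel (B) can be reproduced in a few lines. The following is a minimal sketch, not the authors' code: it assumes each NP has already been annotated by hand with an entity label (the bracketed letters in the caption), links consecutive NPs with directed edges, and counts the edges between successive mentions of the same entity. The function name `chain_distances` is hypothetical.

```python
from collections import defaultdict

# Entity labels of the NPs in the Fig. 1B fragment, in order of mention.
nps = ["m", "st", "so", "w", "m", "th", "ca", "w"]

nodes = set(nps)                 # one node per entity
edges = list(zip(nps, nps[1:]))  # one directed edge per pair of consecutive NPs


def chain_distances(sequence):
    """Map each recurrent entity to the edge counts between its successive mentions."""
    positions = defaultdict(list)
    for index, entity in enumerate(sequence):
        positions[entity].append(index)
    return {
        entity: [later - earlier for earlier, later in zip(idx, idx[1:])]
        for entity, idx in positions.items()
        if len(idx) > 1  # keep recurrent entities only
    }


distances = chain_distances(nps)
print(len(nodes), len(edges))  # 6 7
print(distances)               # {'m': [4], 'w': [4]}
```

For this fragment the sketch gives 6 nodes and 7 edges, and a distance of four edges for both m and w, in line with panels (B), (D), and (E).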

node. The final variables were obtained by calculating the maximum and mean values of the topological distances of each entity (see Table 2 for a summary of these measures).

Statistical Analysis
Basic descriptive characteristics were compared using a Mann-Whitney U test. To compare the values of normalized and distance variables between groups, a one-way ANCOVA test was run controlling for possible confounding variables such as age and years of education, followed by Tukey post hoc (simultaneous) pairwise comparisons correcting for family-wise error rate. P-values and honestly-significant-difference (HSD) statistics are reported, with the significance level of the HSD test set to .05. Only significant results are graphed. In addition, a sensitivity analysis considering different windows of narrative time (as measured in number of words) was run. Windows from 200 to 800 words (with increments of 100) were considered (with window length equalized across groups). For each window length, 20 random samples of that length were selected and we calculated the percentage of these samples in which there were significant results between groups. Details are reported in the Supplementary Materials (Table S2). Next, a logistic regression was run to distinguish between Sz and control groups based on only


Table 2. List of Variables and Definitions

Basic Descriptive Characteristics

| Variable | Definition |
|---|---|
| Number of nodes | Total number of nodes of the speech graph. |
| Number of NPs | Total number of NPs along the speech (equal or greater than the number of nodes). |
| Number of recurrent entities | Total number of recurrent entities during narrative time. |
| Number of nonrecurrent entities | Total number of nonrecurrent entities (equal to the number of NPs that do not refer to a recurrent entity). |
| Degree of recurrent entities | Mean number of in-edges plus out-edges of nodes representing a recurrent entity. |

Topological Measures Over Narrative Time

| Variable | Definition |
|---|---|
| Density of NPs | Number of NPs/number of words. |
| Density of recurrent entities | Number of recurrent entities/number of words. |
| Density of nonrecurrent entities | Number of nonrecurrent entities/number of words. |
| Average number of recurrences by entity | Average number of recurrences (equivalent to number of times the entity was referred through an NP, minus one)/number of entities. |
| Total number of recurrences by words | Total number of recurrences (equivalent to the number of times all the entities together were referenced through an NP, minus one)/total number of words. |

Topological Distance Measures

| Variable | Definition |
|---|---|
| Mean distance between NPs | Mean distance (calculated as the number of words) between consecutive NPs. |
| Mean-mean entities distance | Mean value of the mean topological distance (number of nodes) of each recurrent entity. |
| Max-mean entities distance | Maximum value of the mean topological distance (number of nodes) of each recurrent entity. |
| Mean-max entities distance | Mean value of the maximum topological distance (number of nodes) of each recurrent entity. |
| Max-max entities distance | Maximum value of the maximum topological distance (number of nodes) of each recurrent entity. |

Note: NPs, noun phrases.
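
Several of the Table 2 variables can be derived from a single annotated transcript. The sketch below is illustrative only and rests on stated assumptions: the function name `table2_measures` and its input format are hypothetical, each NP is assumed to carry an entity label and the index of its first word, the word distance between consecutive NPs is taken as the difference of those indices, and the topological distance is counted in edges as in the Fig. 1 caption. It covers the descriptive counts, the densities, and the four entity-distance summaries, and leaves out the recurrence counts.

```python
from collections import Counter, defaultdict
from statistics import mean


def table2_measures(entities, word_positions, n_words):
    """Compute a subset of the Table 2 variables for one annotated transcript.

    entities       -- entity label of each NP, in order of mention
    word_positions -- word index at which each NP starts (annotation assumption)
    n_words        -- total number of words in the transcript
    Assumes at least two NPs and at least one recurrent entity.
    """
    mentions = Counter(entities)
    recurrent = [e for e, count in mentions.items() if count >= 2]
    nonrecurrent = [e for e, count in mentions.items() if count == 1]

    # Topological distance: edges between successive mentions of the same entity.
    where = defaultdict(list)
    for index, entity in enumerate(entities):
        where[entity].append(index)
    chains = [[b - a for a, b in zip(where[e], where[e][1:])] for e in recurrent]
    entity_means = [mean(chain) for chain in chains]
    entity_maxima = [max(chain) for chain in chains]

    # Word distance between consecutive NPs (difference of start indices, an assumption).
    word_gaps = [b - a for a, b in zip(word_positions, word_positions[1:])]

    return {
        "nodes": len(mentions),
        "nps": len(entities),
        "recurrent_entities": len(recurrent),
        "nonrecurrent_entities": len(nonrecurrent),
        "density_nps": len(entities) / n_words,
        "density_recurrent": len(recurrent) / n_words,
        "density_nonrecurrent": len(nonrecurrent) / n_words,
        "mean_distance_between_nps": mean(word_gaps),
        "mean_mean": mean(entity_means),
        "max_mean": max(entity_means),
        "mean_max": mean(entity_maxima),
        "max_max": max(entity_maxima),
    }


# Hypothetical transcript: 7 NPs over 20 words; entities "a" and "b" recur.
result = table2_measures(
    entities=["a", "b", "a", "c", "d", "a", "b"],
    word_positions=[0, 3, 5, 9, 12, 14, 18],
    n_words=20,
)
print(result["mean_mean"], result["max_max"])  # 3.75 5
```

In the toy input, entity "a" has chain distances 2 and 3 and entity "b" has a single distance of 5, so the per-entity means are 2.5 and 5 and the per-entity maxima are 3 and 5; the four summaries then differ only in whether the mean or the maximum is taken at each of the two levels.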

three variables of interest: density of recurrent entities, density of nonrecurrent entities, and average number of recurrences by entity, which together determine the density of NPs; after which we added the mean distance between NPs and max-max entities distance as a second step in the regression. Post hoc, a preliminary and exploratory analysis of semantic similarity was added to further elucidate our topological distance results. For this, we used a fastText word embedding from Spanish Unannotated Corpora.23 The embedding used contains 1 313 423 vectors of dimension 300. All the words that matched the embedding were included. In each case, the semantic similarity was calculated as the cosine similarity of the vectors for a moving window of 10 words. Subsequently, the value was averaged across all windows. Results are provided in Supplementary Figure S4. Statistical analysis was performed using Stata and Python (Python 3.9.4), using the SciPy and Sklearn libraries. The libraries Seaborn and NetworkX were used to generate the graphics.

Results

Results for group comparisons across all variables are summarized in Table 3 and Figure 2. There was a significant difference in the density of NPs, with a higher density in Sz than in both CHR and controls (Figure 2A). This higher density of NPs in Sz could be caused by a higher number of (1) recurrent entities, (2) nonrecurrent entities, or (3) recurrences, or some combination of these three. However, there were no significant group differences in any of these numbers. On the other hand, when considering the total number of recurrences by words, there were significant differences between Sz and controls (Figure 2B). Furthermore, the effect of the higher density of NPs in Sz than controls was also reflected in a significantly smaller average distance between NPs in Sz (Figure 2C). Although no significant differences in the mean distance between NPs were observed between Sz and CHR, the mean distance was smaller in both Sz (4.7 words) and CHR (5.2 words) relative to controls (6.6 words).

Furthermore, on average, distances between NPs referring to the same entity, counted in terms of the number of nodes in between them, were larger in Sz than in controls (Table 3 and Figure 2D and E). Figure 2D and E show the difference in the mean-max and max-max distance between recurring entities, which indicates a widening of the temporal window between two links of a referential chain when maximal values of these chains are taken. Figure 2F shows the cumulative average distribution of distances between recurrent entities across groups. Interestingly, the control group had a larger proportion of distances less than ten nodes than both Sz and CHR. Thus, on average, 93% of the distances were ten or fewer nodes in controls, while it was 86% and 83% for Sz and CHR, respectively. This indicates that for these last two groups, there is a higher proportion of


Table 3. Comparisons of Topological Measures Across Groups

Descriptive Topological Measures

| Measure | Sz, Mean (SD) | CHR, Mean (SD) | Control, Mean (SD) | Sz vs. CHR: U | Sz vs. CHR: P | CHR vs. Control: U | CHR vs. Control: P | Sz vs. Control: U | Sz vs. Control: P |
|---|---|---|---|---|---|---|---|---|---|
| Number of nodes | 84.6 (32.1) | 73.1 (21.7) | 86.1 (37.4) | 158 | .188 | 130 | .109 | 169.5 | .385 |
| Number of NPs | 118.1 (38.2) | 99.2 (26.9) | 112.1 (37.8) | 131 | .050 | 133.5 | .130 | 167.5 | .363 |
| Number of recurrent entities | 9.9 (3.9) | 8.3 (3.0) | 9.3 (3.5) | 158 | .186 | 144 | .209 | 170.5 | .396 |
| Number of nonrecurrent entities | 74.8 (31.8) | 64.7 (21.3) | 76.8 (37.9) | 166 | .255 | 127.5 | .096 | 170 | .391 |
| Average degree of recurrent entities | 8.6 (2.8) | 8.2 (4.6) | 7.2 (1.9) | 152.5 | .149 | 163 | .410 | 135.5 | .099 |

Normalized Topological Measures

| Measure | Sz, Mean (SD) | CHR, Mean (SD) | Control, Mean (SD) | Sz vs. CHR: Mean Difference | Sz vs. CHR: HSD-Test | CHR vs. Control: Mean Difference | CHR vs. Control: HSD-Test | Sz vs. Control: Mean Difference | Sz vs. Control: HSD-Test |
|---|---|---|---|---|---|---|---|---|---|
| Density of NPs | 0.142 (0.037) | 0.114 (0.03) | 0.114 (0.038) | 0.034 | 4.1747* | 0.000 | 0.0472 | 0.034 | 4.1275ᵃ |
| Density of recurrent entities | 0.012 (0.005) | 0.010 (0.004) | 0.009 (0.003) | 0.003 | 2.7048 | 0.000 | 0.2432 | 0.003 | 2.9480 |
| Density of nonrecurrent entities | 0.090 (0.033) | 0.075 (0.025) | 0.078 (0.039) | 0.021 | 2.7367 | 0.004 | 0.46371 | 0.017 | 2.2696 |
| Average number of recurrences by entity | 4.3 (1.4) | 4.2 (2.3) | 3.7 (1.0) | 0.138 | 0.3794 | 0.500 | 1.3706 | 0.638 | 1.7500 |
| Total number of recurrences by words | 0.052 (0.021) | 0.039 (0.021) | 0.036 (0.018) | 0.014 | 2.8864 | 0.003 | 0.6721 | 0.017 | 3.5585ᵃ |

Topological Distance Measures

| Measure | Sz, Mean (SD) | CHR, Mean (SD) | Control, Mean (SD) | Sz vs. CHR: Mean Difference | Sz vs. CHR: HSD-Test | CHR vs. Control: Mean Difference | CHR vs. Control: HSD-Test | Sz vs. Control: Mean Difference | Sz vs. Control: HSD-Test |
|---|---|---|---|---|---|---|---|---|---|
| Mean distance between NPs | 4.7 (0.8) | 5.2 (0.8) | 6.6 (2.6) | 0.584 | 1.5356 | 1.399 | 3.6799ᵃ | 1.983 | 5.2155ᵃ |
| Mean-mean entities distance | 7.2 (4.1) | 6.3 (4.2) | 4.8 (4.0) | 1.156 | 1.1239 | 1.555 | 1.5110 | 2.711 | 2.6349 |
| Max-mean entities distance | 28.2 (23.4) | 18.9 (16.2) | 15.0 (16.6) | 10.865 | 2.3461 | 3.896 | 0.8414 | 14.761 | 3.1874 |
| Mean-max entities distance | 15.4 (9.7) | 12.7 (8.7) | 8.3 (6.7) | 3.336 | 1.5932 | 4.454 | 2.1272 | 7.790 | 3.7204ᵃ |
| Max-max entities distance | 52.0 (27.0) | 35.1 (23.1) | 28.7 (24.1) | 20.145 | 3.3095 | 6.383 | 1.0486 | 26.528 | 4.3581ᵃ |

Note: NPs, noun phrases; Sz, schizophrenia; CHR, clinical high risk; HSD, honestly-significant-difference.
ᵃ HSD-test for significance level < 0.05.
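
The U values in the first block of Table 3 come from Mann-Whitney tests. As a reminder of what that statistic measures, here is a minimal pure-Python sketch on hypothetical data; it is not the authors' analysis (they report using SciPy). It ranks the pooled values, assigns tied values their average rank, and returns the smaller of the two U statistics, which is one common reporting convention and an assumption here. The function name `mann_whitney_u` is illustrative.

```python
def mann_whitney_u(x, y):
    """Return the smaller Mann-Whitney U statistic for two samples."""
    x, y = list(x), list(y)
    pooled = sorted(x + y)

    # Assign each distinct value the average of the ranks it occupies (handles ties).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sum_x = sum(rank_of[value] for value in x)
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


# Hypothetical node counts for two small groups.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0 (complete separation)
print(mann_whitney_u([1, 2, 2], [2, 3, 4]))  # 1.0 (ties count as half)
```

With 20 participants per group, U ranges from 0 (complete separation) to 200 (fully interleaved ranks) under this convention.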

long distances greater than ten nodes. A post hoc sensitivity analysis using different windows of words showed that some significant differences between groups were already found in samples of 300 or more words, but also that some other topological measures required at least 800 words until differences could be observed between groups (for details, see Supplementary Table S2).

In the logistic regression analysis, the accuracy of classifying Sz (compared to controls) based on three analytical variables (density of recurrent entities, density of nonrecurrent entities, and average number of recurrences by entity) was 63%. In a post hoc analysis, the area under the curve for the receiver operating characteristics curve (ROC) was 76.9. In a second step, adding the mean distance between NPs and max-max entities distance as variables for the regression, the accuracy improved to 84.2% and the ROC was 87.7. This same analysis allowed us to distinguish between CHR and controls with 83.8% accuracy, and between Sz and CHR with 74.4% accuracy. Finally, after applying a 10-fold cross-validation in each comparison, the latter accuracies changed as follows: 80% (Sz and controls), 71.7% (CHR and controls), and 57.5% (Sz and CHR).

Discussion

This is the first study to target the referential structure of meaning in psychotic discourse directly, using graph theory. Results confirmed that speakers with Sz deviate


Fig. 2. (A) Density of NPs (number of NPs by number of words). (B) Total number of recurrences by number of words. (C) Mean
distance between NPs (distance in number of words). (D) Mean of max distances between recurrent entities (mean-max) (distance in
number of nodes). (E) Max of max distances between recurrent entities (max-max) (distance in number of nodes). (F) Cumulative
distribution of distances.
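
The cumulative distribution in panel (F) amounts to asking, for each threshold, what share of the distances between recurrent mentions is at or below that threshold. The sketch below shows this on hypothetical distances; the function names `cumulative_share` and `cumulative_curve` are illustrative, and the "at most" reading of the threshold is an assumption.

```python
def cumulative_share(distances, threshold):
    """Fraction of distances that are at most `threshold` nodes."""
    if not distances:
        raise ValueError("at least one distance is required")
    return sum(d <= threshold for d in distances) / len(distances)


def cumulative_curve(distances):
    """Cumulative share for every threshold from 1 up to the largest distance."""
    return [cumulative_share(distances, t) for t in range(1, max(distances) + 1)]


# Hypothetical distances (in nodes) between successive mentions of recurrent entities.
sample = [1, 2, 3, 12, 30]
print(cumulative_share(sample, 10))  # 0.6
```

A speaker whose distances are concentrated at small values reaches a share close to 1 early on the curve, whereas long coreference delays push the curve to the right.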

from controls as well as people at CHR in this particular dimension of meaning, confirming previous suggestions,8–10 yet substantiating them at the level of topological measures of distance between coreferential NPs. Results specifically showed that the Sz group produced more NPs over narrative time than both CHR and control groups; and at the same time, there was a widening of the temporal window of coreference between recurring entities in Sz compared to controls. Put differently, not merely the NP-type, as shown in previous studies, but the timing of NPs matters, to understand the language profile of Sz.

As noted, coreference is a necessary condition for coherence and narrative of any kind: maintaining an entity in the computational workspace of language as a discourse is generated, is fundamental. What we specifically show here is that, in Sz, entities “linger around” for


longer. As referential chains are established, pairs of links in these chains are intervened by the production of other NPs/entities, and mainly by recurrences of other entities, as can be inferred from the higher number of recurrences and higher topological distances in Sz. Even though, by contrast to these recurrences, the density of recurrent and nonrecurrent entities, and the average number of recurrences by entity, were not significant between groups, the regression result showed that these three variables together allowed us to distinguish the Sz group from controls, and accuracy improved substantially when distance measures were added. This pattern suggests that, in Sz, more referential chains involving different recurrent entities overlap, potentially causing an entanglement that is a promising analytical candidate for understanding incoherence in Sz. The result serves to isolate a specific difference in how language production is cognitively controlled by the speaker. Such cognitive control involves interacting subsystems such as verbal working memory, episodic memory (of what has been said and referenced), attention, and grammar. This invites further explorations at a neural level of how these subsystems connect with each other during language processing in Sz, giving rise to the overall clinical impression of disorganized speech.2

We were not able to find significant results in this regard in CHR as compared to controls, with the exception of the mean distance between NPs, where CHR differed from controls. However, in several cases (ie, density of recurrent entities, average number of recurrences by entity, total number of recurrences by words, and topological distance measures), the means were in between those of Sz and control. Moreover, before correcting for multiple comparisons there were significant differences between Sz and CHR in the max-max distance (the maximum topological distance of all distances between recurrent entities) (P = .025), between CHR and controls in the mean-max distance (mean value of all maximum topological distances between all recurrent entities) (P = .035), and between Sz and CHR in the total number of recurrences by words (P = .019), suggesting a possibility of using these variables to further discriminate between these three groups in larger samples. Additionally, a logistic regression allowed us to distinguish between CHR and controls with 71.7% accuracy after a 10-fold cross-validation.

It is interesting to note the differences and similarities between the max-max distance and the LSC used in Mota et al.,15,16 since both are topological measures, though in the latter works the graph nodes represent words. While LSC considers the total number of nodes in the largest

entity can be more delayed in Sz, as shown when using NPs as nodes, while, at the same time, the recurrence to a certain word can be closer (in number of nodes that are words), even considering the longest component (LSC). This apparent contradiction is explained by considering that speech graphs are constructed at different scales (using NPs versus words as nodes). It is possible that there is a relationship between both measures and it remains to be discovered how they are related. In particular, wider temporal windows of coreference, as documented here, could lead to less connected speech at the word level. It is also crucial to consider that the length of our speech samples was significantly larger than in previous studies. Taking this into account, a post hoc sensitivity analysis with different windows of words showed that some significant differences between groups were already found when considering 300 words, and some other measures required at least 800 words until differences could be observed between groups (see Supplementary Table S2 and Figures S1–S3). This indicates that the length of the speech is relevant to the results and has to be considered in future work.

Although we did not use automatized methods to identify referential chains, our methodology lends itself to exploration with current state-of-the-art computational coreference-resolution tools (eg, Hugging Face24), which could also determine the distances between recurrent entities automatically. Apart from this, further research should go in at least two directions. First, the measure of greater distance and the entanglement of different referential chains referred to above could specifically explain the clinical phenomenon of derailment, which is related to reference going astray. Second, this work has not explored yet why the nodes are ordered the way they are and what the direction of discourse contributes to distinguishing groups. That is, to what degree does the order matter, and how does the accumulation of certain NPs determine the predicate related to the entities already introduced? Ultimately this is related to the irreversibility of the speech, something that was not addressed in the work of Mota et al.15,16 either, given that their measures do not take advantage of the direction of the speech and, therefore, the graphs are completely reversible. Finally, we raise the question of how the coreference delay in Sz relates to measures of semantic similarity using NLP tools, which purport to index (in-)coherence. Previous works on this front have succeeded in classifying psychosis based on lower semantic similarity scores11,14 using vector representations
subset of nodes that are mutually reachable, the max- of words; yet a recent study found higher semantic simi-
max distance considers only subsets that start and end larity scores to characterize a first episode of Sz group.25
at the same node. This means that, by construction, the As noted in the introduction, semantic similarity at the
max-max distance is always less than or equal to the LSC lexical-conceptual level is a fundamentally different di-
(in the way the speech graphs are defined here). Contrary mension of meaning than the referential one targeted
to this previous work, we found that Sz had longer paths here, yet their relationships are of crucial interest for
than controls. This shows that the recurrence to a certain neurocognitive models of language processing in Sz.
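The relation between a return distance of this kind and the LSC can be illustrated with a minimal Python sketch. It builds a directed graph from a made-up sequence of entity mentions and computes both quantities. The function names, the toy mention sequence, and the simplified definition of the return distance (the number of distinct entities spanned between two consecutive mentions of the same entity) are assumptions made for illustration only; they are not the measures as implemented in this study or in Mota et al.

```python
def build_graph(mentions):
    """Directed graph with one edge from each mention to the next mention."""
    graph = {m: set() for m in mentions}
    for src, dst in zip(mentions, mentions[1:]):
        graph[src].add(dst)
    return graph


def reachable(graph, start):
    """All nodes reachable from `start` (including `start` itself)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def largest_scc_size(graph):
    """Size of the largest set of mutually reachable nodes (LSC)."""
    reach = {n: reachable(graph, n) for n in graph}
    return max(
        (sum(1 for m in reach[n] if n in reach[m]) for n in graph),
        default=0,
    )


def max_return_distances(mentions):
    """Per recurrent entity: the largest number of distinct entities spanned
    between two consecutive mentions of that entity (assumed definition)."""
    last_seen = {}
    max_dist = {}
    for pos, entity in enumerate(mentions):
        if entity in last_seen:
            span = set(mentions[last_seen[entity]:pos])
            max_dist[entity] = max(max_dist.get(entity, 0), len(span))
        last_seen[entity] = pos
    return max_dist


# Hypothetical sequence of entity mentions, in order of appearance.
mentions = ["dog", "park", "ball", "dog", "owner", "park", "dog", "vet"]

graph = build_graph(mentions)
per_entity = max_return_distances(mentions)          # {"dog": 3, "park": 4}
max_max = max(per_entity.values())                   # 4
mean_max = sum(per_entity.values()) / len(per_entity)  # 3.5
lsc = largest_scc_size(graph)                        # 4

print(per_entity, max_max, mean_max, lsc)
```

In this toy example the max-max value equals the LSC. Under the assumptions above, every stretch between two mentions of the same entity forms a closed walk in the graph, so all entities it covers are mutually reachable and fall inside a single strongly connected component, which is why the bound holds.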
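The dependence of a distance measure on the length of the speech sample can be sketched as follows. The example recomputes a simple measure using only the first N words of a transcript, for several values of N. The measure (mean gap in words between consecutive mentions of the same entity), the function name, the window sizes, and the toy data are hypothetical and are not taken from the sensitivity analysis reported in the supplementary material.

```python
from statistics import mean


def mean_recurrence_gap(mentions, window):
    """Mean word distance between consecutive mentions of the same entity,
    using only mentions located within the first `window` words.

    `mentions` is a list of (word_position, entity) pairs sorted by position.
    Returns None when no entity recurs inside the window.
    """
    last_pos = {}
    gaps = []
    for pos, entity in mentions:
        if pos >= window:
            break  # mentions are sorted, so nothing later is in the window
        if entity in last_pos:
            gaps.append(pos - last_pos[entity])
        last_pos[entity] = pos
    return mean(gaps) if gaps else None


# Hypothetical annotated transcript: word position of each entity mention.
mentions = [
    (2, "dog"), (10, "park"), (40, "dog"),
    (120, "park"), (350, "dog"), (900, "park"),
]

# Recompute the measure for increasingly long windows of words.
profile = {w: mean_recurrence_gap(mentions, w) for w in (100, 300, 800, 1000)}
print(profile)
```

With these toy data the measure grows as the window widens, because long-range recurrences only become visible once enough words are included. Short samples may therefore hide differences that depend on late recurrences.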

According to a preliminary analysis using a semantic similarity measure (for details, see Supplementary Figure S4), there were no significant differences between groups in this measure. In principle, this could mean that our variables are more sensitive, as they did discriminate between groups. To take this issue further in future work, a foundational hypothesis is needed that relates semantic similarity to our measures. This would be a way of exploring how conceptual and referential meaning interact.

Limitations

Limitations of the present study include a limited sample size and the inability to control for the effects of medications. There were also age and education differences between groups, but these are difficult to control for, especially in a CHR sample, and were considered in the statistical analyses.

Conclusion

This study has been the first to study the structure of referential meaning in psychotic discourse directly, using graph theory. It adds to previous clinical findings of referential anomalies, and linguistic studies of anomalous distributions of NPs, by reporting a difference in their density and timing over narrative time. People with Sz manifest a difference in their cognitive control over the process of generating and maintaining reference, as is measurable in the distances between recurrent entities. This finding provides an explanatory target for neurocognitive models of language processing in Sz, and itself provides a possible mechanism for why psychotic discourse often appears less coherent.

Supplementary Material

Supplementary material is available at https://academic.oup.com/schizophreniabulletin/.

Funding

This work was supported by grants SGR-1265 (Generalitat de Catalunya), PID2019-105241GB-I00/AEI/10.13039/501100011033, Ministerio de Ciencia, Innovación y Universidades (MCIU), and Agencia Estatal de Investigación (AEI) (to WH); and by the Agencia Nacional de Investigación y Desarrollo Fondecyt program (grant number 11191122) (to AFB).

Acknowledgments

The authors have declared that there are no conflicts of interest in relation to the subject of this study.

References

1. Hinzen W, Sheehan M. The Philosophy of Universal Grammar. Oxford: Oxford University Press; 2013.
2. Palaniyappan L. Dissecting the neurobiology of linguistic disorganisation and impoverishment in schizophrenia [published online ahead of print, 2021 Sep 7]. Semin Cell Dev Biol. 2021;S1084-9521(21)00235-4. doi:10.1016/j.semcdb.2021.08.015.
3. Hinzen W. Reference across pathologies: a new linguistic lens on disorders of thought. Theor Linguist. 2017;43:3–4.
4. McKenna PJ, Oh T. Schizophrenic Speech: Making Sense of Bathroots and Ponds that Fall in Doorways. Cambridge: Cambridge University Press; 2005.
5. Rochester S, Martin JR. Crazy Talk: A Study of the Discourse of Schizophrenic Speakers. New York: Plenum Press; 1979.
6. Andreasen NC, Grove WM. Thought, language, and communication in schizophrenia: diagnosis and prognosis. Schizophr Bull. 1986;12(3):348–359. doi:10.1093/schbul/12.3.348.
7. Sevilla G, Rosselló J, Salvador R, et al. Deficits in nominal reference identify thought disordered speech in a narrative production task. PLoS One. 2018;13(8):e0201545. doi:10.1371/journal.pone.0201545.
8. Çokal D, Sevilla G, Jones WS, et al. The language profile of formal thought disorder. NPJ Schizophr. 2018;4(1):18. doi:10.1038/s41537-018-0061-9.
9. Çokal D, Palominos-Flores C, Yalınçetin B, Türe O, Emre B, Hinzen W. Referential noun phrases distribute differently in Turkish speakers with schizophrenia [published online ahead of print, 2022 Jul 21]. Schizophr Res. 2022;S0920-9964(22)00259-6. doi:10.1016/j.schres.2022.06.024.
10. Tovar A, Schmeisser WS, Garí A, Morey C, Hinzen W. Language disintegration under conditions of severe formal thought disorder. Glossa. 2019;4(1):1–24. doi:10.5334/gjgl.720.
11. Bedi G, Carrillo F, Cecchi GA, et al. Automated analysis of free speech predicts psychosis onset in high-risk youths. NPJ Schizophr. 2015;1:15030. doi:10.1038/npjschz.2015.30.
12. Iter D, Yoon J, Jurafsky D. Automatic detection of incoherent speech for diagnosing schizophrenia. In: Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, June 2018. New Orleans, LA. Association for Computational Linguistics; 2018:136–146.
13. Corcoran CM, Carrillo F, Fernández-Slezak D, et al. Prediction of psychosis across protocols and risk cohorts using automated language analysis. World Psychiatry. 2018;17(1):67–75. doi:10.1002/wps.20491.
14. Elvevåg B, Foltz PW, Rosenstein M, et al. Thoughts about disordered thinking: measuring and quantifying the laws of order and disorder. Schizophr Bull. 2017;43(3):509–513. doi:10.1093/schbul/sbx040.
15. Mota NB, Vasconcelos NA, Lemos N, et al. Speech graphs provide a quantitative measure of thought disorder in psychosis. PLoS One. 2012;7(4):e34928. doi:10.1371/journal.pone.0034928.
16. Mota NB, Copelli M, Ribeiro S. Thought disorder measured as random speech structure classifies negative symptoms and schizophrenia diagnosis 6 months in advance. NPJ Schizophr. 2017;3:18. doi:10.1038/s41537-017-0019-3.
17. McGlashan TH, Miller TJ, Woods SW, Hoffman R, Davidson L. Instrument for the assessment of prodromal symptoms and states. In: Miller T, Mednick SA, McGlashan TH, Libiger J, Johannessen JO, eds. Early Intervention in Psychotic Disorders. NATO Science Series (Series D: Behavioural and Social Sciences), vol 91. Dordrecht, Netherlands: Springer; 2001.
18. First MB, Gibbon M, Spitzer RL, Williams JBW, Benjamin LS. Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II). Washington, DC: American Psychiatric Press, Inc.; 1997.
19. Kay SR, Opler LA, Fiszbein A. Manual of the Evaluation of Psychiatric Symptoms Used the Positive and Negative Syndrome Scale (PANSS). Tokyo: Seiwa Shoten Co., Ltd.; 1991.
20. Ministerio de Salud. Guía Clínica: Para el tratamiento de personas desde el primer episodio de Esquizofrenia. Santiago, Chile: MINSAL; 2016.
21. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders – Text Revision (DSM-IV-TR). Washington, DC: American Psychiatric Association; 2000.
22. San Martín A, Guerrero S. Estudio Sociolingüístico del Español de Chile (ESECH): recogida y estratificación del corpus de Santiago. Boletín de filología. 2015;50(1):221–247. doi:10.4067/S0718-93032015000100009.
23. Cañete J. Spanish Word Embeddings [Data set]. Zenodo; 2019. doi:10.5281/zenodo.3255001.
24. NeuralCoref 4.0. Coreference Resolution in spaCy with Neural Networks. GitHub. https://github.com/huggingface/neuralcoref.
25. Alonso-Sánchez MF, Limongi R, Gati J, Palaniyappan L. Language network self-inhibition and semantic similarity in first-episode schizophrenia: a computational-linguistic and effective connectivity approach [published online ahead of print, 2022 May 11]. Schizophr Res. 2022;S0920-9964(22)00160-8. doi:10.1016/j.schres.2022.04.007.
