1 Introduction
In language comprehension, words are not processed in isolation. Rather, it
has been suggested that sentence processing is influenced by prediction.
Among other things, expectations are driven by knowledge of typical patterns
in language use and contextual information (Kuperberg and Jaeger 2016).
Consider, for example, the bigram vast majority: where vast appears, there
is a strong likelihood that majority will follow. Words that co-occur more
frequently in language use than would be expected by mere chance (under
the null hypothesis that their component words are independent) are called
collocations, and their component words are referred to as collocates. In the
corpus-linguistic and psycholinguistic communities, different metrics have
been put forward to quantify association strengths between collocates.
Certain metrics have predominantly served as tools in theoretical linguistics,
lexicography, and applied linguistics, whereas others have mainly been used
to model probabilistic language processing in information-theoretic terms
(Evert 2009; Hale 2016). However, there is no consensus on which of these
metrics are the most cognitively realistic, since, to the best of our knowledge,
they have never been pitted against each other in any online processing
experiment.
In order to test the cognitive realism of multiple available measures, we
present an online behavioral experiment involving collocation reading. One
hundred and ten native speakers of English participated in a self-paced reading
study designed to assess the predictive power of competing collocation metrics
in terms of reading times for the second word in collocated modifier–noun
bigrams like vast majority. Six of the most common corpus-linguistic association
scores – MI, MI3, Dice coefficient, T-score, Z-score, and log-likelihood (presented
in Section 1.1) – were pitted against the three following measures:
1) (log-transformed) forward transition probability is an information-theoretic
metric that can be considered the current gold standard in psycholinguistics
(Smith and Levy 2013);
2) (log-transformed) backward transition probability measures the likelihood
of the first word given the second and thus specifically excludes the
possibility of anticipatory processing (McCauley and Christiansen 2017);
3) (log-transformed) bigram frequency is a measure insensitive to collocation
strength but which has emerged as the method of choice in recent cogni-
tive linguistic work of usage-based inspiration (Christiansen and Arnon
2017).
It is worth noting that all six corpus-linguistic association scores are
bidirectional. In other words, they do not indicate whether the first word is
more predictive of the second word or vice versa. A more in-depth mathematical comparison of the
metrics lies beyond the scope of the present paper, but excellent overviews can
be found in Evert (2009), Wiechmann (2008), Gries and Ellis (2015), and
Levshina (2015: Ch. 10).
Given the assumed naturalistic nature of corpora, it is surprising that
evidence in support of the cognitive realism of corpus-derived association scores
has, so far, been relatively scarce and inconsistent. For example, Dąbrowska
(2014) showed that popular measures of association strength (Z-score, T-score,
bigram frequency, and MI) did not predict the choices participants made when
presented with sets of five bigrams and asked to pick the bigram that sounded
the most familiar to them. These findings contrast, however, with a series of
reading experiments by Ellis, Simpson‐Vlach, and Maynard (2008) focusing on
different facets of language processing (e.g. recognizing correct forms, accessing
pronunciation, comprehending in context), which found that raw frequency and
MI do correlate with native speakers’ accuracy and fluency. Durrant and Doherty
(2010) similarly report a significant influence of corpus-based association
metrics (T-score and MI) on overtly primed lexical decisions, but only on pairs
categorically binned as strongly collocated based on very high T-score and MI
values. Collocations with moderate or low T-score and MI values did not show
any priming effect. In a masked priming condition, only very strong collocates
that were additionally psychological associates (i.e. received a high score from a
task where participants listed the first three words to come to their mind when
shown the prompt) showed a priming effect.
One potential explanation for the restricted cognitive plausibility of associa-
tion scores is their bidirectionality, which glosses over the fact that mutual
expectations between words need not be symmetric. For example, in the collo-
cation bonsai tree, bonsai arguably attracts tree much more strongly than vice
versa (Levshina 2015: 224). As a result, bidirectional measures such as those
commonly calculated on the BNC might be cognitively unrealistic in light of
incremental language processing, whereby the first word of a collocation
becomes available before the second word. If, as many psycholinguists assume,
lexical processing is strongly supported by anticipatory processes, “forward-
looking” measures should have more bearing on online processing than bidir-
ectional measures. However, lexical processing cost may also (partly or fully) be
modulated by backward integration difficulty (i.e. the difficulty of connecting
words back to the prior discourse) without the need for prediction (Smith and
Levy 2013: 309–311). If this is the case, “backward-looking” measures should be
superior to forward-looking or bidirectional measures (Gries 2013).
Directionality is addressed in the current experiment through the two infor-
mation-theoretic metrics: log-transformed forward transition probability
between the first and the second word of a bigram (logForwardTP), often
referred to as “surprisal”, and log-transformed backward transition probability
(logBackwardTP), which gauges the probability of the first word of the bigram
given the second word. Both measures are relatively simple in computation;
their calculation includes only dividing the raw frequency of the whole bigram
by the raw frequency of its first word (in the case of logForwardTP) or by the raw
frequency of the second word (in the case of logBackwardTP). It may therefore
come as a surprise that over the last few years, logForwardTP has emerged as
the psycho- and neurolinguistic gold standard for measuring processing load in
lexical processing; less predictable words (in terms of logForwardTP) are more
difficult to process and elicit a “surprise” response in the brain, as has been
shown in a range of neurolinguistic (Frank et al. 2015) and behavioral reading
experiments (Boston et al. 2008; Demberg and Keller 2008; Frank and Bod 2011;
Levy 2008; Smith and Levy 2013; Frank 2013; Hale 2016; Lowder et al. 2018).
Although logBackwardTP has not received the same amount of scholarly atten-
tion, both adults and children have also been shown to be sensitive to this
metric in language processing (McCauley and Christiansen 2017).
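The two directional measures can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function name, the choice of the natural logarithm, and the frequency counts are assumptions made for the example (the counts are invented, not taken from any corpus).

```python
import math


def log_transition_probs(bigram_freq, first_word_freq, second_word_freq):
    """Return (logForwardTP, logBackwardTP) for one bigram.

    Forward TP  = freq(w1 w2) / freq(w1): how likely w2 is, given w1.
    Backward TP = freq(w1 w2) / freq(w2): how likely w1 is, given w2.
    The natural logarithm is an assumption; the text does not state the base.
    """
    forward_tp = bigram_freq / first_word_freq
    backward_tp = bigram_freq / second_word_freq
    return math.log(forward_tp), math.log(backward_tp)


# Invented counts for a modifier-noun bigram (hypothetical values).
log_fwd, log_bwd = log_transition_probs(
    bigram_freq=500, first_word_freq=2_000, second_word_freq=10_000
)
print(f"logForwardTP  = {log_fwd:.3f}")
print(f"logBackwardTP = {log_bwd:.3f}")
```

With these invented counts the modifier predicts the noun more strongly than the noun predicts the modifier, which is the kind of asymmetry that a single bidirectional score cannot express.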
A crucial conceptual similarity between corpus-linguistic association scores
and (forward/backward) transition probabilities is that they treat individual words
as the primitive building blocks between which tightness of association is com-
puted. However, this approach has been cast into doubt by certain cognitive
linguists, who have argued that processing units in the mental lexicon need not
be coextensive with the primitive units posited in theoretical and descriptive
approaches to language. They have put forward the alternative hypothesis
that the primitive building blocks of the mental lexicon are complex, unanalyzed
n-grams (i.e. sequences of n adjacent words or morphemes like don’t have to
worry, I don’t know, a state of emergency, at the same time). This suggestion
follows from the usage-based assumption that every single string encountered
in natural language use, no matter how complex, leaves a trace in memory. In this
view, repeated exposure to a string strengthens its memory trace, thereby reinfor-
cing its status as a primitive cognitive unit, with lexemes and morphemes repre-
senting derivative phenomena arising from partial overlap between different
strings (Abbot-Smith and Tomasello 2006; Bybee and McClelland 2005; O’Grady
2008; Gurevich et al. 2010; Hay and Baayen 2005; Bybee 2010; Blumenthal-Dramé
2016a; Blumenthal-Dramé et al. 2017; Siyanova-Chanturia et al. 2017).
The view that n-grams, rather than words or morphemes, represent the
natural units of language processing has received support from an increasing
number of neuro- and psycholinguistic studies attesting to negative correlations
between processing load and log-transformed n-gram frequency¹ (Caldwell-Harris
and Morris 2008; Jiang and Nekrasova 2007; Tremblay and Baayen
2009; Tremblay and Tucker 2011; Siyanova-Chanturia 2015; Blumenthal-Dramé
2016a; Tremblay et al. 2011; Jacobs et al. 2016; Bannard and Lieven 2012). These
studies have exploited different tasks (self-paced reading, decision tasks, mem-
ory tests), technologies (pen-and-paper tasks, eye-tracking, brain mapping
methods such as electroencephalography and functional magnetic resonance
imaging), and dependent variables (response times [RTs], response accuracies,
eye movement measures, measures of neural activity). At least some of these
studies have shown that their results are not attributable to transition probabil-
ities between morphemes or individual lexical frequencies (Arnon and Cohen
Priva 2013; Arnon and Snider 2010; Bannard 2006). Thus, we consider bigram
frequency in addition to the corpus-derived association scores and the direc-
tional transition probability measures.
In summary, the abovementioned failure of corpus-derived association
scores to yield consistent correlations with cognitive tasks might be due to
theoretical premises that are not cognitively realistic, including but not neces-
sarily limited to the assumption of bidirectionality and the idea that individual
words are the building blocks of the mental lexicon.
1 In psycholinguistic research, raw frequencies are usually log-transformed, for two major
reasons. First, the usage frequencies of linguistic units follow a Zipfian distribution, and
log-transformation reduces the risk of overly influential outlier stimuli exerting distorting effects
(see Levshina 2015: 64; Baayen 2008). Second, the relationship between raw frequency and
mental entrenchment is not linear but follows a logarithmic scale. In other words, a given
frequency difference has a greater cognitive impact among low frequencies than among high
frequencies: the difference between 1 and 10 is cognitively more relevant than the difference
between 10,000 and 10,010 (Ellis 2002; Smith and Levy 2013).
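The compression described in this footnote can be checked with a short calculation. The sketch below assumes base-10 logarithms purely for readability; the footnote does not specify a base.

```python
import math

# Difference on the log10 scale between two pairs of raw frequencies.
low_pair = math.log10(10) - math.log10(1)            # 1 vs. 10
high_pair = math.log10(10_010) - math.log10(10_000)  # 10,000 vs. 10,010

print(f"log10 difference, 1 vs. 10:          {low_pair:.4f}")
print(f"log10 difference, 10,000 vs. 10,010: {high_pair:.4f}")
```

On the log scale the first pair is a full unit apart, whereas the second pair is separated by less than a thousandth of a unit.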
prediction: that is, when self-paced reading trials were interleaved with trials in
which participants had to name pictures that were made more or less predictable
from auditorily presented sentence onsets (see also Gollan et al. 2011). By con-
trast, no effects of lexical predictability emerged when the naming task was kept
separate from the self-paced reading task.
An event-related potentials study by Wlotko and Federmeier (2015) showed
that when everything else was held constant, longer exposure times to words in
online reading boosted predictive processing, as suggested by increased seman-
tic integration problems for unexpected words. However, this effect was modu-
lated by the order of experimental conditions: when an experimental block
featuring long word exposure times (500 ms per word) preceded an experimental
block with shorter exposure times (250 ms), effects of semantic predictability
were found in both blocks. By contrast, when the order of experimental blocks
was reversed, no effects of semantic predictability were found in the block
involving speeded presentation times.
It is important to highlight that the two studies named above do not warrant
direct extrapolation to collocation processing, since they gauge lexical predic-
tivity in terms of cloze probability rather than corpus-derived association
strength. However, like studies on collocation processing, they are concerned
with the extent to which associative knowledge tied to lexical items is capita-
lized on in language processing.
We therefore decided to explore whether such task effects extend to the
processing of corpus-extracted collocations. More precisely, we decided to set
the stakes high and to test whether a minimal difference in task – answering
multiple-choice questions versus free response questions – can affect language
users’ online processing of collocations in self-paced reading. Multiple-choice
questions require subjects to choose a response from a set of provided options,
whereas free response formats require subjects to reconstruct knowledge. In L1
reading, free response formats are known to be significantly more difficult than
multiple-choice formats (measured, e.g. in terms of test performance in reading
tasks) (In’nami and Koizumi 2009; Rodriguez 2006). We therefore expected
word-by-word reading times to be higher in the free response condition than
in the multiple-choice condition.
However, we were agnostic as to potential effects of task differences on
collocation processing. On the one hand, increased word-by-word reading
times might be expected to boost associative processing, if they index an
increased involvement of the distributional knowledge stored along with
entries in the mental lexicon (see Wlotko and Federmeier 2015, above). On
the other hand, a stronger focus on individual words may disrupt the
2 Experimental design
2.1 Stimuli
Figure 1: Spearman correlation matrix between all collocation metrics tested in this study as
well as log-transformed frequencies for modifiers and nouns (henceforth: logModifierFreq and
logNounFreq) displayed using the R package corrplot version 0.84 (Wei and Simko 2017).
logForwardTP: Log-transformed forward transition probability; logBackwardTP: log-transformed
backward transition probability.
2.2 Participants
One hundred and twenty-three adult native speakers of English located in the
United Kingdom, Ireland, the United States, Canada, and Australia were recruited
via Prolific (http://www.prolific.ac/), a crowdsourcing platform, and paid for their
participation in the experiment. Of these, 13 were not considered in the analysis,
for different reasons: 9 participants did not meet the accuracy threshold of 80% in
the comprehension questions, 3 reported being early bilinguals, and 1 participant
attempted to complete the experiment on a mobile device. The remaining 110
participants ranged in age from 18 to 70 (median: 31, mean: 34) and were 54%
female. Participants also self-reported education level (35% high school diploma
or less, 54% bachelor’s or associate degree, 11% master’s or higher).
Participants were instructed that they would be reading sentences one word at
a time and that pressing the spacebar would advance to the next word. They were
asked to read as normally as possible, not to rush, and to answer comprehension
questions as accurately as possible. They were also informed that they would not
receive feedback on their answers. In the free response block, they were told not to
worry about typing speed, capitalization, punctuation, or other typing errors.
Linear mixed effects models were fitted to log-transformed RTs with the package
lme4 from the statistical processing software R (version 3.5.1) (Bates et al. 2015).
The numerical predictor variables in all models were centered. p-Values for fixed
effects were calculated by means of the R-package lmerTest (Kuznetsova et al.
2017). Model comparison between non-nested models (all fitted to the same data
points) was always performed by comparing AIC (Akaike information criterion)
values; model comparison for nested models was done using the anova() func-
tion. For all reported models, model assumptions were checked on the basis of
diagnostic plots. A detailed script containing the entire analysis and the outcome
of all models is available as a supplementary file.
3.2 Preprocessing
Following standard psycholinguistic practice, raw RTs under 100 ms and above
2,000 ms were excluded as outliers. Moreover, log-transformed RTs falling out-
side of three standard deviations from each subject’s mean were rejected. This
resulted in 1.85% data loss. A baseline model was fitted to the remaining log-
transformed RTs. This model contained SENTENCE NUMBER and WORD LENGTH as
fixed effects and a by-SUBJECT intercept as a random effect. The rationale behind
fitting such a baseline model was to correct the response variable for effects
which are known to modulate RTs in self-paced reading experiments but are
independent of the experimental manipulation (Linzen and Jaeger 2015). The
per-word residual RTs of the baseline model are the corrected RTs that were used
for further analysis.²
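The two exclusion steps described above can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the function name, the use of the sample standard deviation, the natural logarithm, and the decision to compute per-subject statistics after the absolute cutoffs have been applied are all assumptions. The residualization against the baseline model is not shown.

```python
import math
import statistics
from collections import defaultdict


def trim_rts(trials, low_ms=100, high_ms=2000, sd_cutoff=3.0):
    """Apply the two exclusion steps to (subject, rt_ms) pairs.

    Step 1 drops raw RTs below low_ms or above high_ms.
    Step 2 drops log RTs lying more than sd_cutoff standard deviations
    from the mean log RT of the same subject.
    Returns the (subject, log_rt) pairs that survive both steps.
    """
    by_subject = defaultdict(list)
    for subject, rt in trials:
        if low_ms <= rt <= high_ms:
            by_subject[subject].append(math.log(rt))

    kept = []
    for subject, log_rts in by_subject.items():
        mean = statistics.mean(log_rts)
        # With fewer than two observations no SD can be estimated; keep the data.
        sd = statistics.stdev(log_rts) if len(log_rts) > 1 else 0.0
        for log_rt in log_rts:
            if sd == 0.0 or abs(log_rt - mean) <= sd_cutoff * sd:
                kept.append((subject, log_rt))
    return kept


# Invented data: 20 typical RTs, one slow but admissible RT, two inadmissible RTs.
demo_trials = [("s1", 300)] * 20 + [("s1", 1900), ("s1", 50), ("s1", 2500)]
demo_kept = trim_rts(demo_trials)
loss = 1 - len(demo_kept) / len(demo_trials)
print(f"kept {len(demo_kept)} of {len(demo_trials)} trials ({loss:.1%} data loss)")
```

In the invented data, the 50 ms and 2,500 ms trials fall to the absolute cutoffs, and the 1,900 ms trial falls to the per-subject criterion because it lies more than three standard deviations from that subject's mean log RT.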
2 To ensure that our results were not an artifact of residualizing (Wurm and Fisicaro 2014), we
conducted a follow-up analysis in which non-residualized log-transformed RTs were used in
both winning models presented in Tables 4 and 5, with the addition of word length and
(centered) trial order as fixed effects. The results from these models showed the same significant
patterns as the models with residualized RTs. See Section 1.6 of the Supplementary Material for
more detail.
3.3 Results
Based on these criteria, two models – those involving the collocation metrics
logBigramFreq and logBackwardTP – were retained for further analysis and
considered “provisional winning models”.⁴
It is interesting to note that two models satisfying criterion (1) did not meet
criterion (2): those containing logForwardTP and MI. In both of these models,
higher values in the association metrics correlate with significantly higher RTs.
In order to avoid drawing conclusions based on statistical significance that does
not derive from an informed hypothesis, we do not continue with the analysis of
these metrics at this stage. We return to this issue in the discussion.
Next, we tested whether adding an interaction by CONDITION to the two
provisional winning models significantly improves their fit. This turned out to be
the case: In both models, the collocation metric elicited a significantly weaker
facilitation in the typed condition.
3 Putting several collocation metrics into one model would not have worked for obvious
collinearity reasons (cf. Figure 1). Moreover, it would have defeated the purpose of identifying
which individual collocation metric is most cognitively realistic.
4 T-score, which produces a bigram ranking identical to logBigramFreq (cf. Section 2.1), did not
pass the Bonferroni-corrected threshold (p-value for the T-score coefficient = 0.0214).
This left us with two winning models containing a COLLOCATION METRIC and
CONDITION as fixed effects.⁵ Table 1 compares these two models in terms of AIC
and marginal R² (computed using the function r.squaredGLMM() from the
MuMIn package) (Barton 2018). As the table shows, the model featuring
logBigramFreq shows slight AIC and R² advantages (i.e. a lower AIC value and
a higher R² value) over the model featuring logBackwardTP.
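For readers unfamiliar with AIC-based comparison, the logic can be sketched as follows. The model labels, log-likelihoods, and parameter counts are invented placeholders and do not reproduce the values in Table 1; only the formula AIC = 2k − 2 ln L is standard.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L).

    Lower values indicate a better trade-off between fit and complexity.
    """
    return 2 * n_params - 2 * log_likelihood


# Hypothetical log-likelihoods and parameter counts for two non-nested
# models fitted to the same data points.
candidates = {
    "model_a": {"log_likelihood": -1200.0, "n_params": 8},
    "model_b": {"log_likelihood": -1203.5, "n_params": 8},
}
scores = {name: aic(**spec) for name, spec in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print(f"preferred model: {best}")
```

Because both hypothetical models have the same number of parameters, the comparison here reduces to a comparison of log-likelihoods; with unequal parameter counts, the 2k term penalizes the more complex model.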
Table 1: Model comparison for the two models of best fit, all fitted to the same
data points (i.e. the corrected RTs for the critical words).
Tables 2 and 3 present the fixed-effects coefficients for the two models.
Table 2: Fixed effects of the linear mixed-effect model of best fit involving the
(centered) COLLOCATION METRIC logBigramFreq and the factor CONDITION (mul-
tiple choice vs. typed free response).
5 Effects of figurative versus literal meaning (main effects and interactions) were tested for
exploratory reasons but were not found in any winning model.
Table 3: Fixed effects of the linear mixed-effect model of best fit involving the
(centered) COLLOCATION METRIC logBackwardTP and the factor CONDITION
(multiple choice vs. typed free response).
Table 4: Fixed effects of the linear mixed-effect model of best fit involving the (centered)
COLLOCATION METRIC logBigramFreq, the factor CONDITION (multiple choice vs. typed free
response), and the factor POSITION (critical, spillover1, spillover2).
Note: The syntax for this model was lmer(corrected_RT ~ logBigramFreq*cond*position + (1|participant) + (1|word) + (1|sentence)).
The model is fitted to corrected log-transformed RTs.
Table 5: Fixed effects of the linear mixed-effect model of best fit involving the (centered)
COLLOCATION METRIC logBackwardTP, the factor CONDITION (multiple choice vs. typed free
response), and the factor POSITION (critical, spillover1, spillover2).
Note: The syntax for this model was lmer(corrected_RT ~ logBackwardTP*cond*position + (1|participant) + (1|word) + (1|sentence)).
The model is fitted to corrected log-transformed RTs.
Figure 2: Effects of logBigramFreq on corrected word-by-word RTs from the critical noun until
the second spillover word as a function of required answer format (multiple choice vs. typed
free response). For the model syntax and coefficients, cf. Table 4.
Figure 3: Effects of logBackwardTP on corrected word-by-word RTs from the critical noun until
the second spillover word as a function of required answer format (multiple choice vs. typed
free response). For the model syntax and coefficients, cf. Table 5.
Further complicating the picture, two metrics significantly predict RTs, but
in a direction that cannot be theoretically justified: MI and logForwardTP.
Higher values in these two metrics lead to significantly slower reading times
on the critical word. In the case of MI, it is generally well understood that the
metric has a strong low-frequency bias, as very few co-occurrences of two low-
frequency words can lead to a high MI score for the bigram (Evert 2009: 19). For
this reason, bigrams ranking high in MI tend to be highly specialized terms
which are likely to occur in a restricted range of contexts (e.g. epileptic fit,
ulterior motive, cosmetic surgery). By contrast, bigrams ranking low in MI are
made up of frequent words of everyday language (e.g. wild child, free time, tidy
room). The suspicion thus arises that a single-word frequency bias might under-
lie the unexpected result for MI, an intuition that is further supported by the fact
that the cubed variant MI3, which was designed to reduce the low-frequency
bias of MI, was not significant in any direction.
Might a single-word frequency bias also be driving the unexpected result
for logForwardTP? A closer look at Figure 1 confirms that both MI and
logForwardTP are strongly negatively correlated with logModifierFreq. That
is, higher values in these two association scores are correlated with lower
modifier frequencies (Pearson’s correlation between MI and logModifierFreq:
−0.69, p < 0.0001; Pearson’s correlation between logForwardTP and
logModifierFreq: −0.62, p < 0.0001).
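The correlation coefficients reported here follow the standard Pearson formula, sketched below for illustration. The helper name and the example values are assumptions; the data are invented and do not reproduce the stimulus set.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values: bigrams with rarer modifiers receive higher association scores.
log_modifier_freq = [2.1, 3.0, 3.8, 4.5, 5.2]
association_score = [9.5, 8.1, 7.7, 5.0, 4.2]
r = pearson_r(log_modifier_freq, association_score)
print(f"r = {r:.2f}")
```

A negative coefficient of this kind means that an apparent effect of the association score on reading times cannot be separated from an effect of modifier frequency without further analysis.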
As the stimuli consisted of collocated modifier–noun bigrams, the modifier
always directly preceded the critical word (the noun). It has generally been estab-
lished that word recognition effects may be delayed in self-paced reading as the
paradigm relies on a secondary, behavioral response, i.e. pressing the spacebar
(Smith and Levy 2013; Just et al. 1982). Thus, it is not unusual that effects on the first
spillover word reflect processing of the critical word, or effects on the critical word
reflect processing of the precritical word. The finding that both high MI and high
logForwardTP slow down RTs to the critical word, along with the fact that both
metrics correlate with low logModifierFreq, suggests that logModifierFreq might be
underlying the unpredicted effects of MI and logForwardTP.
The assumption that single-word frequency effects show up on the
following word would also have the potential to explain the patterns of
results found for the winning metrics: LogBigramFreq is positively correlated
with both logModifierFreq and logNounFreq (Pearson’s correlation between
logBigramFreq and logModifierFreq: 0.54, p < 0.0001; Pearson’s correlation
between logBigramFreq and logNounFreq: 0.42, p < 0.0001). Accordingly,
Figure 2 shows facilitation on both the critical word and the word immedi-
ately following the critical word (henceforth Spillover1). LogBackwardTP
is positively correlated with logModifierFreq (Pearson’s correlation: 0.38,
The comparison of different collocation metrics is not the only goal of the
current experiment. It also includes the variable of CONDITION: the difference
between multiple-choice and typed free response answer formats. As shown in
Section 3.3, both of the winning metrics (logBigramFreq and logBackwardTP)
provide better fit to the data when an interaction by condition is added. When
we explore the effect of this interaction by position (cf. Figures 2 and 3), three
patterns emerge. First, we see significantly faster reading times in the
multiple-choice condition across both models. This is in line with our prediction
that typed free response formats are more cognitively taxing than multiple-choice
formats and thus elicit slower processing. Second, we find that the effects of
both collocation metrics on RTs to critical words are stronger in the
multiple-choice condition than in the typed free response condition. Third, in
the multiple-choice condition, the effect of logBigramFreq is more sustained than
in the typed free response condition, since it carries over more strongly into
the first spillover word (see Figure 2). By contrast, in the typed free response
condition, the effect of logBigramFreq dissipates more quickly after the critical word.
First, we found that logModifierFreq predicts RTs to the critical word better
than logNounFreq and either of the winning collocation metrics (logBigramFreq
and logBackwardTP). Similarly, logNounFreq predicted RTs on Spillover1 better
than logModifierFreq or either of the winning collocation metrics. See Section 2
of the Supplementary Materials for these (and the following) models and
coefficients.
We suspected that the unexpected effects of MI and logForwardTP (see
Sections 3.3 and 3.4.1) might disappear after factoring out previous word fre-
quency. To assess this, we first ran two separate models to partial out the effects
of previous word frequency from RTs to critical words and Spillover1, respec-
tively. The residuals from these two models were then joined into one data frame
and used as the new dependent variable (“cleanedlogRT”).
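The partialling-out step can be illustrated with a simplified sketch. It uses ordinary least squares with a single predictor as a stand-in for the models fitted in the study, whose exact specification is given in the supplementary script; the function name and all data values are hypothetical.

```python
def residualize(y, x):
    """Return the residuals of y after a least-squares regression on x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Invented corrected RTs and previous-word log frequencies.
# For the critical noun the previous word is the modifier;
# for Spillover1 the previous word is the noun.
rt_critical = [0.10, -0.05, 0.02, -0.08]
log_modifier_freq = [2.0, 4.0, 3.0, 5.0]
rt_spillover1 = [0.04, -0.01, 0.03, -0.06]
log_noun_freq = [3.0, 4.5, 3.5, 6.0]

# One model per position; the residuals are then pooled into one variable.
cleaned_rt = (residualize(rt_critical, log_modifier_freq)
              + residualize(rt_spillover1, log_noun_freq))
print([round(value, 4) for value in cleaned_rt])
```

By construction, the pooled residuals are uncorrelated with previous word frequency within each position, so any remaining effect of a collocation metric cannot be attributed to that frequency.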
With the effect of previous word frequency removed from the residualized
RTs, we then ran two models testing whether MI and logForwardTP were still
predictive on the cleaned RTs. This was not the case (p = 0.9547 and p =
0.7898, respectively).
By contrast, two analogous models involving logBigramFreq and logBackwardTP
showed that the metrics remain significant even after partialling out previous
word frequency (see Tables 6 and 7). Indeed, with previous word frequency
removed, the winning metrics now show an even more pronounced difference
by condition. In the multiple-choice condition, the metrics correlate with lower
RTs, but in the typed condition, this effect is reversed.
Table 6: Fixed effects of the linear mixed-effect model of best fit involving the (centered)
COLLOCATION METRIC logBigramFreq, the factor CONDITION (multiple choice vs. typed free
response), and the factor POSITION (critical, spillover1).
Note: The syntax for this model was: lmer(cleaned_RT ~ logBigramFreq*cond*position + (1|participant) + (1|word) + (1|sentence)).
The model is fitted to corrected log-transformed RTs,
after partialling out the effects of previous word frequency.
Table 7: Fixed effects of the linear mixed-effect model of best fit involving the (centered)
COLLOCATION METRIC logBackwardTP, the factor CONDITION (multiple choice vs. typed free
response), and the factor POSITION (critical, spillover1).
Note: The syntax for this model was: lmer(cleaned_RT ~ logBackwardTP*cond*position + (1|participant) + (1|word) + (1|sentence)).
The model is fitted to corrected log-transformed RTs,
after partialling out the effects of previous word frequency.
4 General discussion
The first and most straightforward conclusion is that traditional corpus-based
association measures were not cognitively realistic in predicting the reading
times of the second word in collocated modifier–noun bigrams. None of
the six bidirectional measures provided by the BNC (MI, MI3, Dice coefficient,
T-score, Z-score, and log-likelihood) correlated with a significant facilitation in
collocation processing. The fact that these metrics did not predict reading times
is in line with previous research such as Dąbrowska (2014), which showed that
native speaker intuitions about collocation status do not correspond to T-score,
Z-Score, or MI values (see Section 1.1).
Surprisingly, logForwardTP, also known as surprisal, did not predict RTs in
the expected direction. This result was unexpected due to the support previous
research has given the metric and the widely shared assumption that
logForwardTP captures forward-looking processes that are at play during online
language processing (see Section 1.1). However, this effect seems to be driven
largely by individual word frequencies, as it disappears when previous word
frequency is partialled out. It would be interesting to explore whether this result
is replicated by follow-up studies.
The two winning metrics identified in the present study include one metric
insensitive to the association strength between component words (logBigramFreq)
and one measure of association strength that is unidirectional from right to left
(logBackwardTP).
5 Conclusion
Overall, the results of the present study suggest caution when applying tradi-
tional corpus-linguistic collocation metrics to online language processing.
Measures like log-transformed bigram frequency may be more informative
than measures assuming words are the individual component parts combined
by associations of different strengths. On the other hand, individual word
frequencies are also critical, as they emerge as the best predictors of reading times.
Access to collocational knowledge also appears to be task-contingent, with
slight modulations in experimental design having the potential to alter the
relationship between collocation metrics and the unfolding of effects across
the spillover region. We suggest that this task contingency may account for
the lack of convergence between earlier studies tracking the cognitive correlates
of corpus-derived collocations.
At the same time, the results highlight that we should refrain from jumping
to conclusions on the basis of single experiments: the predictivity of a given
metric in a certain modality (reading) and under certain task demands (e.g.
answering multiple-choice questions) need not generalize to other tasks or
modalities (e.g. typing free response answers).
Further research is underway to address collocation status independently of
individual word frequency through a design that allows for the direct compar-
ison of the usage frequency of the whole and usage frequency of each compo-
nent part. We hope this and other research on task pressures will contribute to
elucidating the open questions that remain.
References
Abbot-Smith, Kirsten & Michael Tomasello. 2006. Exemplar-learning and schematization in a
usage-based account of syntactic acquisition. The Linguistic Review 23(3). 275–290.
Aijmer, Karin & Bengt Altenberg. 2014. English corpus linguistics. New York & London:
Routledge.
Arnon, Inbal & Uriel Cohen Priva. 2013. More than words: The effect of multi-word frequency
and constituency on phonetic duration. Language and Speech 56(3). 349–371.
doi:10.1177/0023830913484891.
Arnon, Inbal & Neal Snider. 2010. More than words: Frequency effects for multi-word phrases.
Journal of Memory and Language 62(1). 67–82. doi:10.1016/j.jml.2009.09.005.
Baayen, R. Harald. 2008. Analyzing linguistic data: A practical introduction to statistics using R.
New York & Cambridge: Cambridge University Press.
Bannard, Colin. 2006. Acquiring phrasal lexicons from corpora. University of Edinburgh dissertation.
Bannard, Colin & Elena Lieven. 2012. Formulaic language in L1 acquisition. Annual Review of
Applied Linguistics 32. 3–16. doi:10.1017/S0267190512000062.
Barton, Kamil. 2018. MuMIn: Multi-Model Inference. https://CRAN.R-project.org/package=MuMIn.
Bates, Douglas, Martin Mächler, Ben Bolker & Steve Walker. 2015. Fitting linear mixed-effects
models using lme4. Journal of Statistical Software 67(1). 1–48. doi:10.18637/jss.v067.i01.
Biskup, Danuta. 1992. L1 influence on learners’ renderings of English collocations: A Polish/
German empirical study. In Vocabulary and applied linguistics, 85–93. London: Palgrave
Macmillan. doi:10.1007/978-1-349-12396-4_8.
Blumenthal-Dramé, Alice. 2012. Entrenchment in usage-based theories: What corpus data do and do
not reveal about the mind (Topics in English Linguistics 83). Berlin: de Gruyter Mouton.
Blumenthal-Dramé, Alice. 2016a. 6. Entrenchment from a psycholinguistic and neurolinguistic
perspective. In Entrenchment and the psychology of language learning: How we reorganize
and adapt linguistic knowledge. Berlin, Boston: De Gruyter. doi:10.1515/9783110341423-007.
Blumenthal-Dramé, Alice. 2016b. What corpus-based Cognitive Linguistics can and cannot
expect from neurolinguistics. Cognitive Linguistics 27(4). doi:10.1515/cog-2016-0062.
Blumenthal-Dramé, Alice, Volkmar Glauche, Tobias Bormann, Cornelius Weiller, Mariacristina
Musso & Bernd Kortmann. 2017. Frequency and chunking in derived words: A parametric
fMRI study. Journal of Cognitive Neuroscience 29(7). 1162–1177. doi:10.1162/jocn_a_01120.
Blumenthal-Dramé, Alice & Evie Malaia. 2018. Shared neural and cognitive mechanisms in
action and language: The multiscale information transfer framework. Wiley
Interdisciplinary Reviews: Cognitive Science e1484. doi:10.1002/wcs.1484.
Boston, Marisa, John Hale, Reinhold Kliegl, Umesh Patil & Shravan Vasishth. 2008. Parsing
costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence
Corpus. Journal of Eye Movement Research 2(1). 1, 1–12.
Bybee, Joan. 2010. Language, usage and cognition. Cambridge; New York: Cambridge
University Press.
Bybee, Joan & James L. McClelland. 2005. Alternatives to the combinatorial paradigm of
linguistic theory based on domain general principles of human cognition. The Linguistic
Review 22(2–4). 381–410.
Caldwell-Harris, Catherine L. & Alison L. Morris. 2008. Fast Pairs: A visual word recognition
paradigm for measuring entrenchment, top-down effects, and subjective phenomenology.
Consciousness and Cognition 17(4). 1063–1081. doi:10.1016/j.concog.2008.09.004.
Carreiras, Manuel, Blair C. Armstrong, Manuel Perea & Ram Frost. 2014. The what, when, where,
and how of visual word recognition. Trends in Cognitive Sciences 18(2). 90–98.
doi:10.1016/j.tics.2013.11.005.
Chater, Nick & Morten H. Christiansen. 2018. Language acquisition as skill learning. Current
Opinion in Behavioral Sciences (The Evolution of Language) 21. 205–208.
doi:10.1016/j.cobeha.2018.04.001.
Christiansen, Morten H. & Inbal Arnon. 2017. More than words: The role of multiword sequences in
language learning and use. Topics in Cognitive Science 9(3). 542–551. doi:10.1111/tops.12274.
Christiansen, Morten H. & Nick Chater. 2016. The Now-or-Never bottleneck: A fundamental
constraint on language. Behavioral and Brain Sciences 39. doi:10.1017/S0140525X1500031X.
Clark, Andy. 2013. Whatever next? Predictive brains, situated agents, and the future of cognitive
science. Behavioral and Brain Sciences 36(3). 181–204. doi:10.1017/S0140525X12000477.
Clark, Andy. 2016. Surfing uncertainty: Prediction, action, and the embodied mind. New York:
Oxford University Press.
Conklin, Kathy & Norbert Schmitt. 2012. The processing of formulaic language. Annual Review
of Applied Linguistics 32. 45–61. doi:10.1017/S0267190512000074.
Croft, William. 2001. Radical construction grammar: Syntactic theory in typological perspective.
New York: Oxford University Press.
Dąbrowska, Ewa. 2014. Words that go together: Measuring individual differences in native
speakers’ knowledge of collocations. The Mental Lexicon 9(3). 401–418.
doi:10.1075/ml.9.3.02dab.
Demberg, Vera & Frank Keller. 2008. Data from eye-tracking corpora as evidence for
theories of syntactic processing complexity. Cognition 109(2). 193–210.
doi:10.1016/j.cognition.2008.07.008.
Deuter, Margaret, James Greenan, Joseph Noble, Janet Phillips & Diana Lea. 2002. Oxford
collocations dictionary. Oxford: Oxford University Press.
Drummond, Alex. 2016. Ibex Farm. http://spellout.net/ibexfarm/.
Durrant, Philip & Alice Doherty. 2010. Are high-frequency collocations psychologically real?
Investigating the thesis of collocational priming. Corpus Linguistics and Linguistic Theory
6(2). doi:10.1515/cllt.2010.006.
Ellis, Nick C. 2002. Frequency effects in language processing: A review with implications for
theories of implicit and explicit language acquisition. Studies in Second Language
Acquisition 24(2). 143–188. doi:10.1017/S0272263102002024.
Ellis, Nick C., Rita Simpson-Vlach & Carson Maynard. 2008. Formulaic language in native and
second language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL
Quarterly 42(3). 375–396. doi:10.1002/j.1545-7249.2008.tb00137.x.
Evert, Stefan. 2009. Corpora and collocations. In Anke Lüdeling & Merja Kytö (eds.), Corpus
linguistics: An international handbook, vol. 2. 1212–1248. Berlin, New York: Mouton de Gruyter.
Frank, Stefan L. 2013. Uncertainty reduction as a measure of cognitive load in sentence
comprehension. Topics in Cognitive Science 5(3). 475–494. doi:10.1111/tops.12025.
Frank, Stefan L. & Rens Bod. 2011. Insensitivity of the human sentence-processing system to
hierarchical structure. Psychological Science 22(6). 829–834. doi:10.1177/0956797611409589.
Frank, Stefan L., Leun J. Otten, Giulia Galli & Gabriella Vigliocco. 2015. The ERP response to the
amount of information conveyed by words in sentences. Brain and Language 140. 1–11.
doi:10.1016/j.bandl.2014.10.006.
Gollan, Tamar H., Timothy J. Slattery, Diane Goldenberg, Eva Van Assche, Wouter Duyck & Keith
Rayner. 2011. Frequency drives lexical access in reading but not in speaking: The
Linzen, Tal & T. Florian Jaeger. 2015. Uncertainty and expectation in sentence processing: Evidence
from subcategorization distributions. Cognitive Science 40(6). doi:10.1111/cogs.12274.
Lowder, Matthew W., Wonil Choi, Fernanda Ferreira & John M. Henderson. 2018. Lexical
predictability during natural reading: Effects of surprisal and entropy reduction. Cognitive
Science. doi:10.1111/cogs.12597.
Martyńska, Małgorzata. 2004. Do English language learners know collocations? Investigationes
Linguisticae 11. 1–12. doi:10.14746/il.2004.11.4.
McCauley, Stewart M. & Morten H. Christiansen. 2017. Computational investigations of multiword
chunks in language learning. Topics in Cognitive Science 9(3). 637–652.
doi:10.1111/tops.12258.
O’Grady, William. 2008. The emergentist program. Lingua 118(4). 447–464.
doi:10.1016/j.lingua.2006.12.001.
Payne, Brennan R. & Kara D. Federmeier. 2017. Pace yourself: Intraindividual variability in
context use revealed by self-paced event-related brain potentials. Journal of Cognitive
Neuroscience 29(5). 837–854. doi:10.1162/jocn_a_01090.
Rodriguez, Michael C. 2006. Construct equivalence of multiple-choice and constructed-
response items: A random effects synthesis of correlations. Journal of Educational
Measurement 40(2). 163–184. doi:10.1111/j.1745-3984.2003.tb01102.x.
Siyanova, Anna & Norbert Schmitt. 2008. L2 learner production and processing of collocation: A
multi-study perspective. Canadian Modern Language Review. doi:10.3138/cmlr.64.3.429.
Siyanova-Chanturia, Anna. 2015. On the ‘holistic’ nature of formulaic language. Corpus
Linguistics and Linguistic Theory 0(0). doi:10.1515/cllt-2014-0016.
Siyanova-Chanturia, Anna, Kathy Conklin, Sendy Caffarra, Edith Kaan & Walter J. B. van Heuven.
2017. Representation and processing of multi-word expressions in the brain. Brain and
Language 175. 111–122. doi:10.1016/j.bandl.2017.10.004.
Smith, Nathaniel J. & Roger Levy. 2013. The effect of word predictability on reading time is
logarithmic. Cognition 128(3). 302–319. doi:10.1016/j.cognition.2013.02.013.
Tremblay, Antoine & Harald Baayen. 2009. Holistic processing of regular four-word sequences.
Perspectives on Formulaic Language in Acquisition and Production. London and New York:
Continuum.
Tremblay, Antoine, Bruce Derwing, Gary Libben & Chris Westbury. 2011. Processing advantages of
lexical bundles: Evidence from self-paced reading and sentence recall tasks: Lexical bundle
processing. Language Learning 61(2). 569–613. doi:10.1111/j.1467-9922.2010.00622.x.
Tremblay, Antoine & Benjamin V. Tucker. 2011. The effects of N-gram probabilistic measures on
the recognition and production of four-word sequences. The Mental Lexicon 6(2). 302–324.
doi:10.1075/ml.6.2.04tre.
Wei, Taiyun & Viliam Simko. 2017. R package “corrplot”: Visualization of a correlation matrix.
https://github.com/taiyun/corrplot.
Wiechmann, Daniel. 2008. On the computation of collostruction strength: Testing measures of
association as expressions of lexical bias. Corpus Linguistics and Linguistic Theory 4(2).
doi:10.1515/CLLT.2008.011.
Wlotko, Edward W. & Kara D. Federmeier. 2015. Time for prediction? The effect of presentation
rate on predictive sentence comprehension during word-by-word reading. Cortex 68.
20–32. doi:10.1016/j.cortex.2015.03.014.
Wurm, Lee H. & Sebastiano A. Fisicaro. 2014. What residualizing predictors in regression
analyses does (and what it does not do). Journal of Memory and Language 72. 37–48.
doi:10.1016/j.jml.2013.12.003.
Table A1
Full sentence | Word | MI | MI3 | Z-score | T-score | Log-likelihood | Dice | FTP | BTP | ModFreq | NounFreq | BigramFreq
[Numeric cells are illegible in this copy; rows list the sentence and target word only.]
Katy was surrounded by foreign accents on the train. | Accents
There existed a strong argument against the bill. | Argument
Emma discerned the bad attitude of her client. | Attitude
Tanner purchased | Band
Ava documented the majestic bird in her journal. | Bird
The moldy bread was thrown away. | Bread
Ryan chose a fast car at the dealership. | Car
Phoebe started a brief chat with the postman. | Chat
Bentley mentioned the wild child and his mother. | Child
This vicious circle seems unbreakable sometimes. | Circle
Due to the mitigating circumstances Sarah was released. | Circumstances
Today civilian clothes are being washed. | Clothes
Kyra always had decaffeinated coffee with her toast. | Coffee
Last year provided favorable conditions for job creation. | Conditions
Robert listened to his guilty conscience when making decisions. | Conscience
Tarek’s firm conviction persuaded the politicians. | Conviction
Ty commented on the petty crime plaguing the city. | Crime
Gregory predicted the grave danger associated with lead. | Danger
Connor was informed about the great deal on designer jeans. | Deal
Barbara remembered the heated debate at the meeting. | Debate
Tris watched the crushing defeat unfold on TV. | Defeat
Trevor was fascinated by the murky depths of the ocean. | Depths
Something about the rich dessert made David ill. | Dessert
Brenda prioritized a balanced diet and regular exercise. | Diet
Rosa spoke about reckless driving to the kids. | Driving
That is a prime example of Renaissance art. | Example
A characteristic feature defined Larry’s face. | Feature
It is well known that itchy feet drive people crazy. | Feet
Ahmad’s son anticipated the epileptic fit before it happened. | Fit
Mohammed took note of the good food at the pub. | Food
Kevin envied the small fortune his brother inherited. | Fortune
Lyssa spotted her close friend in the crowd. | Friend
Luca smelled the rotten fruit on the counter. | Fruit
The kid played on the green grass near the school. | Grass
Clarissa observed the stunted growth of the plant. | Growth
Sage’s tiresome habit quickly became annoying. | Habit
Katrina discovered blonde hair in the bathroom. | Hair
Clara offered a helping hand to the workers. | Hand
Bartholomew praised the magnificent house and its owners. | House
It was not a bright idea to visit Crystal. | Idea
Chris was aware of the debilitating illness and its consequences. | Illness
Floyd criticized the direct impact of the pollution. | Impact
The object’s vital importance cannot be overstated. | Importance
Henry grabbed the blunt instrument and appraised it. | Instrument
Laura knew about the government’s vested interest in the change. | Interest
Carla was a born leader her teachers said. | Leader
Courtney acknowledged that communal living had many benefits. | Living
Brianna understood that unrequited love could be painful. | Love
Paul had a light lunch before the interview. | Lunch
Saul realized that the vast majority had voted incorrectly. | Majority
Louise complimented the old man in her neighborhood. | Man
Nellie pinpointed the ulterior motive of the banker. | Motive
Tess ensured the classical music was showcased correctly. | Music
Achim dismissed the preconceived notion with a sigh. | Notion
Christian wrote about the auspicious occasion in his memoir. | Occasion
Priya found the commissioned officer sitting around outside. | Officer
The senior officials ultimately decided everything. | Officials
Vladimir maintained a brisk pace throughout the walk. | Pace
Philippa experienced excruciating pain in her legs. | Pain
The business was based on common people and their desires. | People
Pablo regularly tested the moral principles of his employees. | Principles
Clemence was intrigued by the peaceful protest in the capital. | Protest
Hank noticed the pouring rain and stayed inside. | Rain
Chandler lost the avid reader in the library. | Reader
Heather adjusted to the harsh reality after the war. | Reality
The group thought that human rights were very important. | Rights
Brian examined the tidy room and was satisfied. | Room
Neveah took in the incredible scenery all around her. | Scenery
Charlie gave a speech about military service in the eighties. | Service
Redmond considered it a crying shame to be poor. | Shame
Ginny made sure that a fair share was allocated today. | Share
Benny figured a quick shower would be nice. | Shower
Alaina followed the putrid smell to the kitchen. | Smell
Ahmed checked the fertile soil for invasive insects. | Soil
Gabe is a brave soul for going skydiving. | Soul
Taylor reacted to | Speed
Sherry had cosmetic surgery done too often. | Surgery
Brendon got a glimpse of the losing team before they left. | Team
Dominic rejoiced about the free time he now had. | Time
Carl sensed that precious time was running out. | Time
Shelly was interested in ancient times and faraway lands. | Times
Tom heard the heavy traffic from his window. | Traffic
Hal questioned the narrow victory of his opponent. | Victory
Bella’s loud voice carried the choir. | Voice
Matthew felt the tepid water with his toe. | Water
Sam recalled the mild winter three years ago. | Winter
Gertrude passed by a disillusioned youth on the corner. | Youth
Range, Mean, Standard deviation: [values illegible]
Note: For greater ease of readability, the five last columns of this table are not log-transformed. FTP: Forward transition probability; BTP: backward
transition probability; ModFreq: modifier frequency; NounFreq: noun frequency; BigramFreq: bigram frequency.
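The relation between the frequency columns and the two transition probabilities named in the note can be sketched as follows. This is a minimal illustration that assumes the standard definitions (FTP as the probability of the noun given the modifier, BTP as the probability of the modifier given the noun); the function name, the counts, and the choice of the natural logarithm are assumptions made here for illustration and are not taken from the table.

```python
import math


def transition_probabilities(mod_freq, noun_freq, bigram_freq):
    """Return (FTP, BTP) for a modifier-noun bigram from raw corpus counts.

    Assumed definitions (standard, but not confirmed by the table itself):
      FTP = BigramFreq / ModFreq   -> P(noun | modifier)
      BTP = BigramFreq / NounFreq  -> P(modifier | noun)
    """
    if min(mod_freq, noun_freq, bigram_freq) <= 0:
        raise ValueError("all counts must be positive")
    if bigram_freq > min(mod_freq, noun_freq):
        raise ValueError("a bigram cannot be more frequent than its parts")
    ftp = bigram_freq / mod_freq
    btp = bigram_freq / noun_freq
    return ftp, btp


# Hypothetical counts, invented for illustration only.
ftp, btp = transition_probabilities(mod_freq=2000, noun_freq=8000, bigram_freq=500)

# Log transformation (natural log is an assumption about the base).
log_ftp = math.log(ftp)
log_btp = math.log(btp)

print(f"FTP={ftp:.4f}  BTP={btp:.4f}  logFTP={log_ftp:.3f}  logBTP={log_btp:.3f}")
```

Because both probabilities share the bigram count as numerator, they differ only in which component frequency serves as the denominator.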
Authenticated
Download Date | 11/30/19 5:21 PM
Brought to you by | Göteborg University - University of Gothenburg
Online collocation reading 43
Figure A1: Scatterplot representing the relation between log-transformed bigram frequency
(logBigramFreq) and log-transformed backward transition probability (logBackwardTP) in the 91
collocations used in the present study (Pearson’s correlation: 0.7029, p < 0.0001).
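A correlation of the kind reported in the caption can be computed as in the sketch below. The item counts are hypothetical and serve only to show the procedure; the helper `pearson_r`, the natural logarithm, and the definition of backward transition probability as bigram frequency divided by noun frequency are assumptions, not the authors' analysis code.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (bigram frequency, noun frequency) pairs, invented for illustration.
items = [(500, 8000), (120, 3000), (40, 20000), (900, 10000), (15, 600)]

log_bigram_freq = [math.log(bigram) for bigram, _ in items]
# Assumed definition: backward TP = bigram frequency / noun frequency.
log_backward_tp = [math.log(bigram / noun) for bigram, noun in items]

r = pearson_r(log_bigram_freq, log_backward_tp)
print(f"Pearson's r = {r:.4f}")
```

Since the logarithm of the backward transition probability equals the log bigram frequency minus the log noun frequency, some positive association between the two plotted variables is expected whenever noun frequency does not vary too strongly with bigram frequency.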
Supplementary Material: The online version of this article offers supplementary material
(https://doi.org/10.1515/cllt-2018-0030).
Bionotes
Kyla McConnell
Kyla McConnell is a Ph.D. candidate in Linguistics at the English Department of the University of
Freiburg (Germany). Her primary research interests center around her ongoing dissertation
“Individual Differences and Task Effects in Predictive Coding”. This research focuses on topics
such as the extent to which quantitative and corpus-derived variables can reflect the cognition
of individual speakers and how speaker- and task-based variables can modulate language
processing. In this, she works with various psycho- and neurolinguistic experimental
paradigms and statistical methods to align large-scale data with real-time language
comprehension.
She previously studied English Language and Linguistics at the University of Freiburg
(Germany), and Hispanic Linguistics and German Language and Literature at the University of
North Carolina at Chapel Hill (USA).
Alice Blumenthal-Dramé
Her publications exploit behavioral and functional neuroimaging methods to explore the
extent to which statistical generalizations across “big data” (notably, large-scale text corpora
and databases derived from such corpora) have the potential to offer realistic insights into
language users’ cognition. Major motivations behind this research have been: (1) to put to the
test the cognitive reality of cognitive linguistic assumptions, and (2) to gain a better
understanding of the size and nature of the cognitive building blocks that are utilized in natural
language use.
Further research interests include morphological theories, psycholinguistic models, Gestalt
psychology, usage-based linguistics, language typology, and statistical methods.