28

Formulaicity
Fanny Forsberg Lundell

Introduction
One of the major contributions of corpus linguistics has been the discovery that collocations and
other types of multi-word combinations are ubiquitous in natural language. Sinclair (1991) intro-
duced the “idiom principle” to refer to the fact that “a language user has available to him or her
a large number of semi-preconstructed phrases that constitute single choices, even though they
might appear to be analyzable into segments. To some extent, this may reflect the recurrence of
similar situations in human affairs; it may illustrate a natural tendency to economy of effort; or
it may be motivated in part by the exigencies of real-time conversation” (Sinclair, 1991, p. 110).
Erman and Warren (2000) were among the first to quantify proportions of prefabricated language
in corpora (a spoken and a written corpus of English) and found that 58.6% of
the spoken production consisted of preconstructed multi-word combinations, compared
to 52.3% of the written production. These figures suggest that prefabricated language is pervasive
in both spoken and written language. Another important characteristic of prefabricated
language is that it is often idiomatic or language specific, a phenomenon also referred to as
‘nativelike selection’ (cf. Pawley & Syder, 1983; Warren, 2005, for the notion of ‘idiomaticity’).
For example, English uses the word combination have a beer, whereas French uses prendre une
bière [‘take a beer’]. Such language-specific conventions are inherently difficult for L2 learners, which is one of the reasons why corpus
linguists have become increasingly interested in prefabricated language and often use learner
corpora to examine learners’ use and development of multi-word combinations, most particularly
collocations and lexical bundles (see Paquot & Granger, 2012, and Ellis, Simpson-Vlach, Römer,
O’Donnell, & Wulff, 2015 for overviews).
Language learning theories have also increasingly emphasized the role of prefabricated or
formulaic language. Both first language and SLA scholars are interested in whether language
learning is rule-based or whether language is acquired through chunks. Usage-based models of
language acquisition (i.e., models that rely on the principle that language is acquired through use
and exposure to natural language) reserve a key role for formulaicity and chunks (see e.g., Ellis,
O’Donnell & Römer, 2016; Chapter 13, this volume). Research has also shown that an L2 user’s
problems with formulaic language have different implications from problems with a grammatical
feature or the pronunciation of a phoneme. As Wray (2002) proposed, formulaic language is
linked to processing advantages on both a cognitive and a social level. By conveying a message
in a conventional manner, the speaker signals familiarity with, and a sense of belonging to, a
specific linguistic community. As such, the use of formulaic language is a means of conforming
to social norms and expectations and is accordingly of high importance for the L2 learner.
In the field of SLA, work on formulaicity can be divided into two main strands:

1) Research on L2 learners’ ability to acquire formulaicity in a target language (i.e., nativelike
selection), often with a focus on whether learners use targetlike forms (e.g., French prendre une
bière [‘take a beer’] rather than a literal rendering of English have a beer);
2) Research on L2-specific formulaicity (i.e., the non-analyzed chunks mostly visible in beginner
learner production) and its contribution to the acquisition of grammatical rules (e.g., the
use of je voudrais [‘I would like’] in oral production that otherwise contains only
present-tense forms and no other conditional forms).

These diverging strands have created terminological ambiguity: what is considered formulaic
language in the two strands overlaps at times, but not always.
Importantly, in the SLA literature, unlike in learner corpus research, the focus has often been
less on recurrent multi-word combinations such as collocations and lexical bundles than
on the broader category of formulaic sequences, i.e. “sequence[s], continuous or discontinu-
ous, of words or other elements, which [are], or appear to be, prefabricated: that is, stored and
retrieved whole from memory at the time of use, rather than being subject to generation or analy-
sis by the language grammar” (Wray, 2002, p. 9). Accordingly, formulaic language is an umbrella
term for a variety of different linguistic structures, including idioms (grab the bull by its horns),
collocations (throw a party), discourse devices (on the one hand), routine formulae (nice to meet
you), and also L2-specific formulaicity (as described above), whose common denominator is
recurrence in production and supposed holistic storage in the speaker’s mind. However, the issue
of whether or not these multi-word combinations are memorized as whole units is still a matter of
debate. This is the reason why some researchers, especially corpus linguists, prefer more general,
or descriptive terms such as multi-word units, multi-word combinations, multi-word sequences,
or multi-word structures. In this chapter, the terms formulaic language and formulaic sequences
will nevertheless be used, as they cover both targetlike and L2-specific sequences and are also
established terms in SLA, where they are often used as synonyms for phraseology, multiword
sequences, multiword combinations, etc.

Core Issues and Topics


Three major lines of investigation with respect to formulaic language in learner corpora will be
discussed in this section: the extent to which L2 learners acquire targetlike formulaicity, the role
of cross-linguistic influence, and L2-specific formulaicity.

To What Extent Do Learners Acquire Targetlike Idiomaticity?


One of the most important findings from SLA research using corpora to investigate formulaic-
ity is that formulaic language takes a long time to acquire. There are also differences between
the acquisition of formulaic language in spoken and written production. In addition, the time
it takes to acquire formulaic language in a target language will depend on the L1/L2 pairings
investigated. Many studies have investigated language pairings that are relatively similar, such
as L1 French/L2 English. However, English is a language that enjoys special status, given its
pervasiveness in media and culture around the world. In the Swedish context, for example, where
English has an important presence in society, Wiktorsson (2003) found that the only difference
in written production between advanced university students of English (Swedish L1) and native
speakers was that the L2 students overused formulaic sequences that were typical of the spoken
register, whereas the native speakers used more adjective + noun collocations or verb + noun col-
locations typical of written language.
Studies that have investigated other language pairs, including more distant language pairs
such as English-Hebrew, have reported rather different results. Levitzky-Aviad and Laufer
(2013), for example, analyzed L1 Hebrew learners’ use of English collocations in both elicited
tests and free writing over the course of 8 years. Two of their key findings were that collocations
take a particularly long time to develop in free writing and that learners seem to avoid the use of
collocations in free production. Forsberg (2010) investigated both a different mode and a differ-
ent L2: spoken L2 French. Learners ranged from beginner to near-native levels (very advanced
L2 users in a second language context). The general finding was that formulaic language in
spoken production is a feature that develops late in the learning process: Only the most advanced
learners used formulaic language to a degree approaching native-speaker use, and even they still
differed from native speakers in the quantity of lexical collocations they used.
Studies have also started to explore the relationship between proficiency as evaluated using
the CEFR (Common European Framework of Reference) levels and the use of formulaic lan-
guage. Forsberg and Bartning (2010) and Paquot (2018, 2019) found that formulaic language
developed between the B2–C2 levels in L2 French and L2 English, respectively. These results
support the view that formulaic language is a good indicator of second language proficiency,
especially at advanced and very advanced proficiency levels.
As will become clear in the Main Research Methods section, however, the collocation stud-
ies discussed above have relied on different definitions of the term ‘collocation’, with the use
of statistical methods for the identification of formulaic language in corpora becoming more
frequent over the last decade. Several studies have used association measures (predominantly
MI score and t-score) to extract statistically significant word co-occurrences from learner
corpora and shed light on how collocations in learner production differ from those in L1
production. MI score tends to highlight low-frequency, highly cohesive sequences, whereas
t-score favors highly frequent collocations. Durrant and Schmitt (2009) examined the use of
frequent English collocations by native speakers (NSs) and non-native speakers (NNSs) on the
basis of written texts. Collocations were measured with MI scores, t-scores, and raw frequency
of occurrence. The results showed that the native writers used more low-frequency combinations
(MI-defined collocations) than the non-native writers. Granger and Bestgen (2014) also studied
MI-defined and t-score-defined collocations in the International Corpus of Learner
English (ICLE; Granger, Dagneaux, Meunier, & Paquot, 2009). Their results pointed in the same
direction: learners produced more high-frequency collocations than highly cohesive infrequent
ones, and this was particularly characteristic of lower proficiency levels. Finally, O’Donnell,
Römer, and Ellis (2013) studied four different corpus-analytic measures in first and second
language writing. They found that infrequent, highly cohesive MI-defined formulas were less
frequent in texts by L2 writers, and even in texts by non-expert native writers, than in expert
writing. It therefore seems that producing infrequent, cohesive word combinations is a challenge
not only for non-native speakers in general, but also for non-expert native speakers writing in
specialist genres.

Is L2 Formulaicity Influenced by L1 Patterns?


L1 entrenchment is one of the main obstacles in learning formulaic language in the L2 (Ellis,
2006). L1 entrenchment is a notion from cognitive psychology that refers to the fact that the
learners’ linguistic system is organized as an emergent network that has been tuned to the cues
of the L1 through thousands of hours of L1 processing (Ellis & Wulff, 2015, p. 82). Thus, the
preferred word combinations in the L1 will be strongly entrenched in the mental lexicon, and a
lot of training will be necessary to modify the already established connections between words if
they are not congruent with the L2. Nesselhauf (2005), for example, investigated a subcorpus of
the ICLE, the ICLE-GE, which contains argumentative essays from learners with German as L1.
One of the major findings of her study was that learners’ collocation errors were often not due to
the form of the collocation, for example its degree of restriction (i.e., how many words the
component parts can combine with; e.g., in the sequence dial a number, dial can hardly combine
with any other nouns). Rather, the errors were often related to L1 influence (e.g., *make homework
instead of do homework, possibly influenced by German Hausaufgaben machen).
L1 entrenchment will not always lead to errors; it can also be facilitative. For example, some
experimental studies have shown that the processing of collocations is facilitated by the con-
gruence phenomenon, i.e., when the collocation is identical in the L1 and the L2 (cf. Wolter &
Gyllstad, 2011; Peters, 2016). L1 entrenchment and the frequency effects associated with it can also
be the source of learners’ preferences for specific formulaic sequences (e.g., Ellis, O’Donnell, &
Römer, 2015; Chapter 13, this volume). A number of studies by Paquot (e.g., 2013, 2014, 2017)
have focused on cross-linguistic effects observed through corpora. For example, Paquot (2014)
investigated transfer effects in French English as a Foreign Language (EFL) learners’ use of
recurrent word sequences, in particular two- to four-word lexical bundles overrepresented in the
French component of ICLE, compared with nine other ICLE learner sub-corpora. Lexical bun-
dles are “recurrent expressions, regardless of their idiomaticity, and regardless of their structural
status” (Biber et al., 1999, p. 990). The learners’ idiosyncratic use of lexical bundles was traced
back to various properties of French words and word combinations, including their functions in
discourse and their frequency of use (see the Representative Corpora and Research section below for more
details).

Is Formulaicity in First Language Use and Second Language Use the Same Thing?

Myles, Hooper, and Mitchell have explored L2-specific formulaicity in several corpus stud-
ies focusing exclusively on sequences that appear to have psycholinguistic status as formulaic
sequences in the L2 learner’s system. Myles, Mitchell, and Hooper (1998) examined the relation-
ship between formulaic language and rule acquisition in instructed learners who are encouraged
by course activities to memorize unanalyzed units (p. 328). In their longitudinal study of English
L1 students of French aged 11–14, Myles et al. (1998) quickly identified j’aime [I like] and
j’adore [I love] as possible examples of L2-specific formulaicity. Learners said, for example, la
Monique j’aime le tennis [*‘Monique I like tennis’] instead of Monique aime le tennis [Monique
likes tennis], which suggests that the internal structure of j’aime is not analyzed and the sequence
is produced as a formulaic sequence. At first, learners were very reliant on formulaic sequences,
but when their communicative needs increased, they began to break down the sequences and
eventually were able to use parts of them productively (e.g., tu aimes [you like]). Myles, Mitchell,
and Hooper (1999) examined the same phenomenon, among the same group of learners, but this
time by studying interrogative formulas in French L2 and their contribution to the development
of grammatical rules. In this study, they observed the same relationship. In the 1998 article, the
authors reported that the formulas, once decomposed and analyzed, often disappeared from the
learners’ repertoire. In Myles et al. (1999), the authors placed even more emphasis on formulas
as material serving the acquisition of grammatical rules. They argued
that rules and formulas co-exist and that formulas do not necessarily disappear once they
have served the purpose of figuring out grammatical rules.
Drawing on the work of Myles et al. (1998, 1999), Forsberg (2008) examined similar, learner-
idiosyncratic, non-analytic constructions in the InterFra corpus (https://www.su.se/romklass/interfra),
another corpus of L2 French, but with Swedish high school students aged 16 and 17.
The results revealed very few sequences of the kind described in Myles et al. (1998, 1999),
and they disappeared completely after the beginner level. To explain this divergence
in results, it is important to consider differences related to pedagogical practices and cognitive
maturity. It is possible that the students in the studies of Myles and colleagues had been exposed
to more drilling of formulaic sequences and very little teaching of grammatical rules. It is also
likely that the age difference between the 11–14-year-olds in the studies of Myles and colleagues
and the 16–17-year-olds in Forsberg (2008) influenced their capacity to process grammar.
The abovementioned studies show that when speaking of formulaicity or formulaic language,
researchers occasionally refer to non-analyzed utterances, which can be particularly frequent in
beginner productions (e.g., j’aime [I like], je voudrais [I would like]). Most of the time, however,
researchers mean targetlike combinations of words, such as ‘give a speech’, which often do not
appear until intermediate to advanced levels of L2 acquisition. Thus, it is important to clarify
exactly what kind of formulaic language is in focus in a particular study.

Main Research Methods


Two main types of research methods are typically distinguished when it comes to the identifica-
tion and analysis of (targetlike) formulaic language in corpora: the phraseological approach and
the frequency-based approach (see Granger & Paquot, 2008). These methods are discussed in
more detail below based on Erman, Forsberg Lundell, and Lewis (2016). See, for example, Wray
and Namba (2003) for more information about methods used in studies that focus on L2-specific
formulaicity.

Phraseological Approaches to Formulaicity


One key characteristic of phraseological studies is that they usually rely on manual analysis of
linguistic characteristics to define and identify different types of formulaic sequences. Proponents
of the phraseological approach have adopted the ideas of the Russian phraseologists from the
1940s, 1950s, and 1960s. Russian phraseology was largely oriented towards fixed and semi-fixed
expressions, using criteria such as degree of restrictedness of the expression’s component parts.
Following the Russian approach, scholars such as Cowie (1981) categorized word combinations
along a continuum of idiomaticity, beginning with free combinations (e.g., ‘blow a trumpet’) at
one end, passing through restricted collocations (e.g., ‘blow a fuse’), which are partly compositional,
and figurative idioms (e.g., ‘blow your own trumpet’), which are non-compositional, and ending
with pure idioms (e.g., ‘blow the gaff’), which are completely opaque, at the other end. Only the last
three types can be labeled phraseological, because at least one member of the word
combination has a specialized or figurative sense.
Nesselhauf (2003) argued that the most common identification criterion for phraseologists is
that of “arbitrary restriction on substitutability” (2003, p. 225). In her own work on phraseologi-
cally defined verb-noun collocations, she made use of two criteria, of which at least one needs to
apply to the word combination for it to be labeled a collocation:

1) The sense of the verb (or the noun) is so specific that it only allows its combination with a
small set of nouns (or, if the noun is the most specific, a small set of verbs) (e.g. ‘dial’ as in
‘dial a number’);
2) The verb (or the noun) cannot be used in this sense with all nouns (or in the case of specific
nouns, verbs) that are syntactically and semantically possible (e.g. ‘take a picture/photo’, but
not ‘*take a film’) (2003, p. 225).

Nesselhauf’s definition is similar to the criterion of ‘restricted exchangeability’, presented by
Erman and Warren (2000) as one criterion to identify what they call ‘prefabs’ (a synonym for
formulaic sequences). Restricted exchangeability means that at least one member of the prefab
cannot be replaced by a synonymous item without causing a change of meaning or function and/
or idiomaticity.
While most phraseologists have adopted an approach based on, e.g. restricted collocability or
degree of compositionality to identify and categorize multiword combinations, scholars such as
Mel’cuk (1998) have also proposed categorizations based on the function of phraseological units
in context.

Frequency-based Approaches to Formulaicity


Researchers working within the frequency-based approach to the study of word combinations
rarely focus on formulaic sequences in general (in fact, they would hardly use the term at all)
but rather are interested in recurrent word combinations that can be identified on the basis of
frequency only (typically lexical bundles) or association measures (i.e. [statistical] collocations).
As first discussed in the Core Issues and Topics section, lexical bundles are multi-word
sequences that recur frequently and are distributed widely across different texts (e.g., as far as,
at the end of) (Biber, 2010, p. 170). The lexical bundles framework is largely data-driven, and
retrieval can be fully automated and applied to sizeable text corpora. Three parameters need
to be set: the length of the word combinations (many studies have analyzed 4-word lexical bundles),
a minimum frequency threshold, and a dispersion criterion (see Chen & Baker, 2010, for more
details about these parameters). Although they do not necessarily represent complete structural
units, lexical bundles have been found to serve important functions in L1 and L2, for instance,
in corpora of academic discourse (see Chen & Baker, 2010; Ädel & Erman, 2012). Differences
found between L1 and L2 use of lexical bundles typically include overuse of speech-like lexical
bundles (e.g. ‘all over the world’), underuse of academic-like bundles built around nouns (e.g.
‘the way in which’) as well as a less varied repertoire of types of bundles.
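The three parameters just described can be made concrete in a short sketch. The Python function below is a hypothetical illustration, not the procedure used in any of the studies cited: it counts contiguous n-grams in each text, then keeps those that reach a minimum raw frequency and occur in a minimum number of different texts (a simple range-based dispersion criterion). The tokenizer, the function name `extract_bundles`, the thresholds, and the toy texts are all assumptions chosen for demonstration; published studies typically normalize frequencies (e.g., per million words) and work with far larger corpora.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Crude lowercase word tokenizer (an assumption; real studies use proper tokenizers)
    return re.findall(r"[a-z']+", text.lower())


def extract_bundles(texts, n=4, min_freq=2, min_range=2):
    """Return n-grams meeting a raw-frequency threshold and a dispersion
    threshold (the number of distinct texts in which the n-gram occurs)."""
    freq = Counter()               # total occurrences of each n-gram
    text_ids = defaultdict(set)    # indices of the texts containing each n-gram
    for i, text in enumerate(texts):
        tokens = tokenize(text)
        for j in range(len(tokens) - n + 1):
            gram = tuple(tokens[j:j + n])
            freq[gram] += 1
            text_ids[gram].add(i)
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(text_ids[gram]) >= min_range
    }


# Toy texts, invented for illustration
texts = [
    "At the end of the day we went home.",
    "They left at the end of the meeting.",
    "As a matter of fact it rained.",
]
bundles = extract_bundles(texts, n=4, min_freq=2, min_range=2)
print(bundles)
```

With these toy settings, only at the end of and the end of the survive; the 4-gram as a matter of occurs in a single text and is filtered out, which mirrors the point that infrequent but conventional sequences can escape n-gram extraction in small corpora.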
As mentioned above, the use of association measures is another approach to identifying collo-
cations. Researchers working with statistical measures often analyze relational collocations (i.e.
collocations in a specific grammatical relation), investigating, for instance, adjective + noun col-
locations (e.g., severe cold) or verb + noun collocations (throw a party). Here, the term ‘colloca-
tion’ should not be understood as arbitrarily restricted word combinations as in the phraseological
approach to word combinations but as words that co-occur significantly more than by chance.
In most recent collocation studies, association measures have been computed on the basis of
frequencies extracted from large reference corpora (e.g. the British National Corpus [BNC], the
Corpus of Contemporary American English [COCA]), not from learner corpora. There are
certainly a number of technical reasons that support this methodological decision, including reasons
related to the minimum corpus size needed to compute association measures, or the fact that
association measures that rely on corpus size cannot be directly compared across corpora. However,
the most important reason why researchers use association measures computed from reference
native or expert corpora to measure collocational strength in learner language is that such scores
represent a native-like benchmark against which researchers can rank co-occurrences in learner
corpora and judge their acceptability (e.g. Siyanova & Schmitt, 2008; Durrant & Schmitt, 2009;
Granger & Bestgen, 2014)11.
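The reference-corpus procedure can be sketched as follows, loosely in the spirit of the studies just cited but not reproducing any of them. Each bigram extracted from a learner text is looked up in a table of association scores assumed to have been precomputed from a native reference corpus; bigrams absent from the table are kept apart as unattested, and the proportion of attested bigrams reaching a threshold (here an MI score of 3, the conventional cut-off discussed below) is reported. The table values, the bigrams, and the function name `score_learner_bigrams` are hypothetical placeholders, not figures from the BNC, COCA, or any published study.

```python
# Hypothetical MI scores standing in for values precomputed from a native
# reference corpus; they are invented for illustration only.
REFERENCE_MI = {
    ("hard", "work"): 4.1,
    ("long", "way"): 3.2,
    ("big", "problem"): 2.4,
    ("good", "thing"): 1.9,
}


def score_learner_bigrams(bigrams, reference, threshold=3.0):
    """Rank learner bigrams by their reference-corpus MI score.

    Returns (ranked, unattested, proportion): `ranked` lists attested bigrams
    from strongest to weakest, `unattested` lists bigrams missing from the
    reference table, and `proportion` is the share of attested bigrams whose
    score reaches `threshold`.
    """
    attested = [(bg, reference[bg]) for bg in bigrams if bg in reference]
    unattested = [bg for bg in bigrams if bg not in reference]
    ranked = sorted(attested, key=lambda pair: pair[1], reverse=True)
    strong = sum(1 for _, score in attested if score >= threshold)
    proportion = strong / len(attested) if attested else 0.0
    return ranked, unattested, proportion


# Hypothetical bigrams extracted from a learner essay
learner_bigrams = [("hard", "work"), ("big", "problem"), ("good", "thing"), ("strong", "rain")]
ranked, unattested, proportion = score_learner_bigrams(learner_bigrams, REFERENCE_MI)
print(ranked[0], unattested, round(proportion, 2))
```

Counting tokens rather than types, or combining MI with t-score, would be equally plausible design choices; the sketch only illustrates the general logic of benchmarking learner co-occurrences against native-corpus scores.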
In learner corpus research, two association measures – Mutual Information and t-score – have
been used extensively. Mutual information (MI) is a measure derived from information theory
that quantifies the mutual dependency between two variables. The MI-score uses a logarithmic
scale to express the ratio between the frequency of the collocation and the frequency of random
co-occurrence of the two words in the combination (Church & Hanks, 1990). An MI score of 3 or
higher is a commonly used threshold for a word combination to be considered a statistically
significant collocation (see Hunston, 2002). The t-score, on the other hand, “is calculated as an adjusted value of
collocation frequency based on the raw frequency from which random co-occurrence frequency is
subtracted. This is then divided by the square root of the raw frequency” (Gablasova, Brezina, &
McEnery, 2017, p. 162). It is often suggested that MI tends to give prominence to low-frequency
collocations whose component words are not often found apart (e.g. ultimate arbiter, immortal
souls, and tectonic plates), whereas the t-score prioritizes combinations of more frequent words
(e.g. long way and hard work) (Durrant & Schmitt, 2009, p. 67). For this reason, researchers
have advocated using both measures (Durrant & Schmitt, 2009; Granger & Bestgen, 2014) (see
Core Issues and Topics section). Note that the two measures have also been heavily criticized, and
there have been calls in the field to broaden the range of statistical techniques used to establish the
degree of association between words (e.g., Gries, 2013; see also Wulff, Gries, & Lester, 2018 for
a study that relies on Delta-P as an association measure to quantify directional attraction).
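The two measures, together with the Delta-P measure mentioned above, can be computed from four counts: the observed co-occurrence frequency O of a word pair w1 w2, the corpus frequencies f1 and f2 of the two words, and the corpus size N. The sketch below assumes the common simplified bigram formulation (adjacent words only, expected frequency E = f1 × f2 / N), so that MI = log2(O / E), t = (O - E) / √O, in line with the description quoted above, and ΔP(w2 | w1) = O / f1 - (f2 - O) / (N - f1). The class name `WordPair` and the counts for the two pairs are invented to contrast a rare, cohesive pair with a frequent one; they are not drawn from any corpus. Actual tools differ in details such as window size and directionality, so this is a sketch, not a reference implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class WordPair:
    """Counts for an adjacent word pair w1 w2 in a corpus of n tokens."""
    o: int   # observed frequency of the pair w1 w2
    f1: int  # corpus frequency of w1
    f2: int  # corpus frequency of w2
    n: int   # corpus size in tokens

    def expected(self) -> float:
        # Frequency expected if w1 and w2 co-occurred only by chance
        return self.f1 * self.f2 / self.n

    def mi(self) -> float:
        # Mutual information: log2 of observed over expected frequency
        return math.log2(self.o / self.expected())

    def t_score(self) -> float:
        # Observed minus expected, divided by the square root of observed
        return (self.o - self.expected()) / math.sqrt(self.o)

    def delta_p(self) -> float:
        # P(w2 | w1) - P(w2 | not w1): how strongly w1 predicts w2
        return self.o / self.f1 - (self.f2 - self.o) / (self.n - self.f1)


# Invented counts for a one-million-token corpus
rare_pair = WordPair(o=10, f1=20, f2=15, n=1_000_000)              # rare, cohesive
frequent_pair = WordPair(o=200, f1=5_000, f2=4_000, n=1_000_000)   # frequent words

for name, pair in [("rare", rare_pair), ("frequent", frequent_pair)]:
    print(f"{name}: MI={pair.mi():.2f}  t={pair.t_score():.2f}  dP={pair.delta_p():.3f}")
```

With these counts the rare pair obtains the higher MI (about 15.0 vs. about 3.3), while the frequent pair obtains the higher t-score (about 12.7 vs. about 3.2), which illustrates the contrast between the two measures described above.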

Comparing Phraseological and Frequency-based Approaches


Not all automatically extracted sequences are necessarily interesting from the perspective
of the acquisition of targetlike formulaicity. This is, for example, the case with lexical bundle
analysis, where manual analysis is required to separate sequences of interest (i.e., targetlike
formulaicity) from those that are not (e.g., as a matter of fact vs. the age of the).
Automatic extraction also fails to take into account semantic and pragmatic aspects, which can
be very important when identifying conventional sequences, given that conventionality is a
question not only of form but also of function. An example is the ‘clausal’
sequences described in Erman et al. (2015): in everyday and professional interactions, there are
many sequences whose form might not be detected as formulaic,
but which are conventionally associated with a typical pragmatic function. Examples are I do
realize that in English or Je vous rassure in French [‘I assure you’]. If conventional sequences are
not frequent enough, techniques such as n-gram (i.e. lexical bundle) extraction will not extract
them despite their idiomatic nature, especially in small learner corpora.
By contrast, the more manual methods typically used by phraseologists make
it possible to identify such sequences. But manual analysis has its own limitations. First,
there is a limit to the amount of text that can be analyzed manually; as a result, phraseological
studies are often small-scale. Second, identification and categorization criteria are open to
subjective interpretation, which may lead to problems of internal consistency and replicability.
In conclusion, given the advantages and limitations described above, the phraseological
approach and the frequency-based approach to formulaicity should be viewed as complementary.
Future research would benefit from more studies such as Erman, Lewis, and Fant
(2013) that compared different approaches on the same corpora to investigate what each method
could reveal about native and non-native use of formulaic language.

Representative Corpora and Research


A substantial amount of corpus research into EFL learners’ use of formulaic sequences is based
on the International Corpus of Learner English (ICLE), which consists of 3.7 million words
of argumentative essays by L2 English university students, organized into sixteen subcorpora
according to the learners’ L1 (e.g., Spanish, Italian, French, and Russian; cf. Granger, Dagneaux,
Meunier, & Paquot, 2009). For example, Paquot (2017) made use of the Integrated Contrastive
Model, a combination of Contrastive Analysis (CA) and Contrastive Interlanguage Analysis
(CIA) (see Chapters 8 and 26, this volume), to investigate L1 effects on French and Spanish
EFL learners’ preferred use of three-word lexical bundles with a discourse or stance-oriented
function. Word combinations were extracted from the French and Spanish ICLE sub-corpora, and
the frequency of their translation equivalents in the learners’ native languages was analyzed on the
basis of French and Spanish L1 corpora. By studying two Romance L1s,
Paquot showed that even when congruent forms exist in French and Spanish (dans le cas de (fr.) / en
el caso de (sp.) [in the case of]), learners with different L1s differ in their propensity to use ‘in the case of’ in
L2 English: Spanish EFL learners use the English word combination much more often than
French EFL learners. This difference can be related to the different frequencies of the bundle in the
two L1s, with en el caso de being about five times as frequent in Spanish as dans le cas de in French.
Paquot (2017) thus calls for more studies investigating L1 (frequency) effects on learners’
use of formulaic language.
The Stockholm Multi-Task Corpus (Erman, Denke, Fant, & Forsberg Lundell, 2015) is a
spoken corpus that includes comparable data produced by learners of three different L2s, i.e.,
English, French, and Spanish (all with Swedish as the L1). The corpus was compiled for the
research programme High Proficiency in Second Language Use (cf. Hyltenstam,
Bartning, & Fant, 2018), and data were collected from participants not often included in mainstream
SLA studies, i.e., long-residency L2 users, often highly educated, who had resided in the
target-language community for at least five years. The corpus contains five different oral activities
(one interview, two role plays, and two retellings; hence the label ‘Multi-Task’) with 10 L1
speakers and 10 L2 speakers for each language, totaling 60 speakers. Data were recorded in Paris
(2007), Santiago de Chile (2007), and London (2008). One of the studies based on this corpus, which
deals with multi-word sequences in all three L2s, is Erman et al. (2015). Two different com-
municative tasks from the Multi-Task Corpus were investigated: (1) a role play over the phone
and (2) an online retelling of a film clip. Nativelike expression in L2 speech was investigated by
comparing quantity and distribution of different types of multiword structures in the speech of
highly advanced L2 speakers with that of native speakers. The study showed that two L2 groups
(English and Spanish L2) were nativelike in their use of one category of multiword structures,
i.e., social routines (e.g., do you want to have a think about it?), in the role play. Collocations, the
dominant category in the retelling task, were underrepresented in all three L2 groups compared
to the native groups. Furthermore, the English NNSs were nativelike on more measurements of
multiword structures than the French and Spanish NNSs. These results confirm the status of col-
locations as being particularly difficult for L2 learners, even after many years of immersion in
the target language.

Future Directions
In my view, the future of research on formulaic language in SLA is dependent on diversity at
two levels. The first level deals with the languages investigated. The entire field of SLA suffers
from a bias towards English as a second language. Although the grammar of L2s other than English
has been investigated relatively thoroughly, research on formulaicity is still strikingly limited to the
English language. I would therefore suggest that the most needed step forward is the investiga-
tion of L2s other than English. Research has already been conducted to some extent on L2 French
(Forsberg, 2010), L2 Spanish (Erman, Denke, Fant, & Forsberg Lundell, 2015; Stengers, Boers,
& Housen, 2011), and L2 Polish (Jaworski, 1990), but much remains to be done. The study
on Polish, although small-scale, is thought-provoking because Jaworski (1990) suggested that
Polish speakers use less formulaic language than American English speakers in everyday small
talk. Such claims underline the importance of crosslinguistic investigations of formulaicity.
Greater diversity in this respect of course requires the development of learner corpora in L2s
other than English. The French Learner Language Oral Corpora (FLLOC) (Myles & Mitchell,
2016) and Spanish Learner Language Oral Corpora (SPLLOC) (Mitchell, Dominguez, Arche,
Myles, & Marsden, 2008) initiatives for French and Spanish L2s, for example, constitute laud-
able efforts to fill this gap, and these corpora could be further explored when it comes to the use
of formulaic language.
It will also be important to diversify the language pairs investigated. As seen in the Core
Issues and Topics section, work has already been done on several languages, but a larger vari-
ety of languages needs to be compared. It is possible that formulaic language is one of the
language features where L1 influence plays a particularly strong role, including both posi-
tive and negative transfer, even at the more advanced proficiency levels. One related research
avenue worth investigating would be L3 influence in the acquisition of formulaic language. Is
the acquisition of formulaic language affected more by the L1 than by other L2s, or are all the
languages that a learner knows of equal importance? This kind of research would also have
important implications for language teaching, given that contrastive teaching, which draws on
the L1 and possibly other L2s, could be beneficial for teaching formulaic language, as shown
by, e.g., Laufer and Girsai (2008).
The second level of diversification refers to the discursive genres investigated (see Myles,
2015). Learner corpora are often composed of written productions in the form of argumentative
essays or spoken productions in the form of interviews and retellings. As a result, we currently
know a good deal about how learners, particularly EFL learners, use formulaic language in
argumentative essays, in academic writing, and in life-story interviews. However, we know little
about how learners perform in other tasks and situations. One exception is Erman et al. (2015), who attempted to diversify
tasks by also including phone calls and different retellings for spoken language and showed that
formulaic language use differed depending on the communicative genre. If we want to further
our understanding of L2 learners’ acquisition of formulaic language, the issue of variation across
genres needs to be addressed.
In recent years, “language learning in the wild” (Wagner, 2015) has become an important
concept within interactional approaches to SLA. In spite of the practical difficulties surrounding
the collection and analysis of authentic data of this kind, a welcome addition to the study of L2
formulaic language would be learner corpora documenting naturally occurring speech. Such
corpora would be of particular interest to scholars investigating formulaic language, given its
social importance (see Burdelski & Cook, 2012). An increasing number of people will need to
learn a second language in order to function in society. In this societal
context, the efficient use of formulaic language in spoken production, especially in professional
communication, is an area that requires much more research. Wray (2002, p. 94) described the
use of formulaic language as a means to further speakers’ “promotion of self”. If this is the case,
then it is a vital linguistic feature for all speakers around the world who need to start a new life
in a new language. Applied linguists need materials that allow the investigation of ecologically
valid questions. This suggests that future research into formulaic language in second language
learner corpora should be more “wild”.

Further Reading
Ellis, N. C., Simpson-Vlach, R., Römer, U., O’Donnell, M. B., & Wulff, S. (2015). Learner corpora and
formulaic language in second language acquisition research. In S. Granger, G. Gilquin, & F. Meunier
(Eds.), The Cambridge handbook of learner corpus research (pp. 357–378). Cambridge: Cambridge
University Press.
This is a useful overview which connects research on formulaic language in corpora with second language
learning theories. It gives valuable pointers with respect to identification issues and how they relate to corpus
design matters.
Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in corpus-based language learning research:
Identifying, comparing, and interpreting the evidence. Language Learning, 67(S1), 155–179.
This paper constitutes a critical evaluation of learner corpus research into collocations, with useful discus-
sions on identification procedures and the impact of different genres.
Paquot, M., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of Applied
Linguistics, 32, 130–149.
This is a state-of-the-art article, with a focus on statistical methods of investigation.

Related Topics
Chapters 8, 13, and 25.

Note
1 Note that automatic extraction of collocations and lexical bundles from reference corpora has also
been used to create collocation tests in applied linguistics research (cf., e.g., Forsberg
Lundell, Lindqvist, & Edmonds, 2018; Gyllstad, 2007).

References
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and non-native
speakers of English: A lexical bundles approach. English for Specific Purposes, 31(2), 81–92.
Biber, D. (2010). Longman student grammar of spoken and written English. India: Pearson Education.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and
written English. Harlow: Longman.
Burdelski, M., & Cook, M. (2012). Formulaic language in language socialization. Annual Review of Applied
Linguistics, 32, 173–188.
Chen, Y., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language Learning and
Technology, 14(2), 20–49.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography.
Computational Linguistics, 16(1), 22–29.
Cowie, A. P. (1981). The treatment of collocations and idioms in learners’ dictionaries. Applied Linguistics,
2(3), 223–235.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations?
IRAL (International Review of Applied Linguistics in Language Teaching), 47(2), 157–177.
Ellis, N. C. (2006). Selective attention and transfer phenomena in L2 acquisition: Contingency, cue
competition, salience, interference, overshadowing, blocking and perceptual learning. Applied
Linguistics, 27(2), 164–194.
Ellis, N. C., O’Donnell, M. B., & Römer, U. (2015). Usage-based language learning. In B. MacWhinney &
W. O’Grady (Eds.), The handbook of language emergence (pp. 163–180). Malden: Blackwell Publishing.
Ellis, N. C., Simpson-Vlach, R., Römer, U., O’Donnell, M. B., & Wulff, S. (2015). Learner corpora and
formulaic language in second language acquisition research. In S. Granger, G. Gilquin, & F. Meunier
(Eds.), The Cambridge handbook of learner corpus research (pp. 357–378). Cambridge: Cambridge
University Press.
Ellis, N. C., & Wulff, S. (2015). Usage-based approaches to SLA. In B. VanPatten & J. Williams (Eds.),
Theories in second language acquisition (2nd ed., pp. 75–93). New York: Routledge.
Erman, B., Denke, A., Fant, L., & Forsberg Lundell, F. (2015). Nativelike expression in the speech of long-
residency L2 users: A study of multiword structures in the speech of L2 English, French and Spanish.
International Journal of Applied Linguistics, 25(2), 160–182.
Erman, B., Forsberg Lundell, F., & Lewis, M. (2016). Formulaic language – theories, methodologies
and implications for second language acquisition. In K. Hyltenstam (Ed.), Advanced proficiency and
exceptional ability in second languages (pp. 111–147). Berlin: Mouton de Gruyter.
Erman, B., Lewis, M., & Fant, L. (2013). Multiword structures in different materials, and with different
goals and methodologies. In J. Romero-Trillo (Ed.), Yearbook of corpus linguistics and pragmatics (vol.
1, pp. 77–103). Dordrecht: Springer.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29–62.
Forsberg, F. (2008). Le langage préfabriqué – formes, fonctions et fréquences en français parlé L2 et L1.
Oxford: Peter Lang Publishing.
Forsberg, F. (2010). Using conventional sequences in L2 French. IRAL (International Review of Applied
Linguistics in Language Teaching), 48(1), 25–50.
Forsberg, F., & Bartning, I. (2010). Can linguistic features discriminate between the CEFR-levels? A pilot
study on written L2 French. In I. Bartning, M. Martin, & I. Vedder (Eds.), Communicative proficiency
and linguistic development. Intersections between SLA and language testing research (pp. 133–158).
Eurosla monograph series 1.
Forsberg Lundell, F., Lindqvist, C., & Edmonds, A. (2018). Productive collocation knowledge at advanced
CEFR levels. Evidence from the development of a test for advanced L2 French. Canadian Modern
Language Review, 74(2), 627–649.
Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in corpus-based language learning research:
Identifying, comparing, and interpreting the evidence. Language Learning, 67(S1), 155–179.
Granger, S., & Bestgen, Y. (2014). The use of collocations by intermediate vs. advanced non-native writers:
A bigram-based study. IRAL, 52(3), 229–252.
Granger, S., Dagneaux, E., Meunier, F., & Paquot, M. (2009). ICLE: International corpus of learner English.
Louvain-la-Neuve: Presses universitaires de Louvain.
Granger, S., & Paquot, M. (2008). Disentangling the phraseological web. In S. Granger & F. Meunier (Eds.),
Phraseology: An interdisciplinary perspective (pp. 27–49). Amsterdam: John Benjamins.
Gries, S. Th. (2013). 50-something years of work on collocations: What is or should be next. International
Journal of Corpus Linguistics, 18(1), 137–165.
Gyllstad, H. (2007). Testing English collocations: Developing receptive tests for use with advanced Swedish
learners. Doctoral dissertation. English Department. Lunds Universitet.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.
Hyltenstam, K., Bartning, I., & Fant, L. (Eds.) (2018). High-level language proficiency in second language and
multilingual contexts. Cambridge: Cambridge University Press.
Jaworski, A. (1990). The acquisition and perception of formulaic language and foreign language teaching.
Multilingua, 9(4), 397–411.
Laufer, B., & Girsai, N. (2008). Form-focused instruction in second language vocabulary learning: A case
for contrastive analysis and translation. Applied Linguistics, 29(4), 694–716.
Levitzky-Aviad, T., & Laufer, B. (2013). Lexical properties in the writing of foreign language learners over
eight years of study: Single words and collocations. In C. Bardel, C. Lindqvist, & B. Laufer (Eds.), L2
vocabulary acquisition, knowledge and use: New perspectives on assessment and corpus analysis
(pp. 127–148). Eurosla monograph series 2.
Mel’čuk, I. (1998). Collocations and lexical functions. In A. P. Cowie (Ed.), Phraseology: Theory, analysis,
and applications (pp. 23–53). Oxford: Clarendon Press.
Mitchell, R., Dominguez, L., Arche, M., Myles, F., & Marsden, E. (2008). SPLLOC: A new database for
second language acquisition research. EUROSLA Yearbook, 2008, 287–304.
Myles, F. (2015). Second language acquisition theory and learner corpus research. In S. Granger, G. Gilquin,
& F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 309–332). Cambridge:
Cambridge University Press.
Myles, F., & Mitchell, R. (2016). French learner language oral corpora (FLLOC). Retrieved from
http://www.flloc.soton.ac.uk/index.htm
Myles, F., Mitchell, R., & Hooper, J. (1998). Rote or rule? Exploring the role of formulaic language in
classroom foreign language learning. Language Learning, 48(3), 323–363.
Myles, F., Mitchell, R., & Hooper, J. (1999). Interrogative chunks in French L2: A basis for creative
construction? Studies in Second Language Acquisition, 21, 49–80.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for
teaching. Applied Linguistics, 24(2), 223–242.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: John Benjamins.
O’Donnell, M. B., Römer, U., & Ellis, N. C. (2013). The development of formulaic sequences in first and
second language writing. Investigating effects of frequency, association and native norm. International
Journal of Corpus Linguistics, 18(1), 83–108.
Paquot, M. (2013). Lexical bundles and L1 transfer effects. International Journal of Corpus Linguistics,
18(3), 391–417.
Paquot, M. (2014). Cross-linguistic influence and formulaic language: Recurrent word sequences in French
learner writing. EUROSLA Yearbook 2014, 240–261.
Paquot, M. (2017). L1 frequency in foreign language acquisition: Recurrent word combinations in French
and Spanish EFL learner writing. Second Language Research, 33(1), 13–32.
Paquot, M. (2018). Phraseological competence: A missing component in university entrance language tests?
Insights from a study of EFL learners’ use of statistical collocations. Language Assessment Quarterly,
15(1), 29–43.
Paquot, M. (2019). The phraseological dimension in interlanguage complexity research. Second Language
Research, 35(1), 121–145.
Paquot, M., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of Applied
Linguistics, 32, 130–149.
Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike
fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–227).
London: Longman.
Peters, E. (2016). The learning burden of collocations: The role of interlexical and intralexical factors.
Language Teaching Research, 20(1), 113–138.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: A multi-study
perspective. Canadian Modern Language Review, 64(3), 429–458.
Stengers, H., Boers, F., & Housen, A. (2011). Formulaic sequences and L2 oral proficiency. International
Review of Applied Linguistics in Language Teaching, 49(4), 321–343.
Wagner, J. (2015). Designing for language learning in the wild: Creating social infrastructures for second
language learning. In T. Cadierno & S. Eskildsen (Eds.), Usage-based perspectives on second language
learning (pp. 75–104). Berlin: Mouton de Gruyter.
Warren, B. (2005). A model of idiomaticity. Nordic Journal of English Studies, 4(1), 35–54.
Wiktorsson, M. (2003). Learning idiomaticity. Lund studies in English 105. Doctoral dissertation. English
Department. Lunds Universitet.
Wolter, B., & Gyllstad, H. (2011). Collocational links in the L2 mental lexicon and the influence of L1
intralexical knowledge. Applied Linguistics, 32(4), 430–449.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Wray, A., & Namba, K. (2003). Formulaic language in a Japanese-English bilingual child: A practical
approach to data analysis. Japanese Journal of Multilingualism and Multiculturalism, 9(1), 24–51.
Wulff, S., Gries, S. Th., & Lester, N. A. (2018). Optional that in complementation by German and Spanish
learners. In A. Tyler, L. Huang, & H. Jan (Eds.), What is applied cognitive linguistics? Answers from
current SLA research (pp. 99–120). Berlin: De Gruyter Mouton.