Dirk Siepmann
Universität Osnabrück
1. Introduction
International Journal of Lexicography, Vol. 19 No. 1. Advance access publication 29 November 2005
© 2005 Oxford University Press. All rights reserved. For permissions,
please email: journals.permissions@oxfordjournals.org
2 Dirk Siepmann
(a) Colligation (you can stick your + NP, far be it from me to + INF, ignorer
tout de + N, il n’y a qu’à + INF, ce/cette N [tradition, etc.] est resté(e),
NP dans l’âme, typisch + N, etc.); note that this definition of colligation is
different from Firth’s (1957) or Hoey’s (1998; see note 1), since it concerns not only
the grammatical preferences of individual words, but also those of longer
syntagms. Thus, the syntagm tu n’avais qu’à can be said to be in colligation
with an infinitive clause.
(b) Collocation between lexemes or phrasemes (just as + clause . . . so / in the
same manner + clause, levy charges, briser ses chaussures, c’est-à-dire en
l’occurrence, regarde où tu vas, bon ben, à la fin, etc.).
(c) Collocation between lexemes and semantic-pragmatic (contextual)
features (beautifully + [result of creative activity], [uncertainty] + not so,
[question] + eh bien, [expectation] + duly, [negative contextual aspect] +
(not) detract from s.o.’s enjoyment, help! [on such one-word collocations,
cf. González-Rey 2002: 95, 101]).
(d) Collocation between semantic-pragmatic features (e.g. long-distance
collocations, Siepmann 2005).
This typology and the notational conventions that go with it present two
major advantages with a view to lexicographic applications: they allow us to
capture the full range of collocational phenomena, and they dispense almost
Collocation, Colligation and Encoding Dictionaries 3
2.1 Rationale
The rationale behind the Bilexicon project proceeds from a paradox about
foreign language learning in higher education: language teaching specialists
have long demanded that university graduates in modern languages should
have a native-like lexical competence in their L2 (e.g. Meißner et al. 2001);
in practice, however, such a competence is seldom attained, and few serious
So far little research effort has been expended upon describing the extent
of native-like lexical competence in the L2. There is only one study for the
language pair German-French (Hausmann, forthcoming), whose aim it is to list
a large section of the receptive vocabulary of French which is ‘intransparent’
from a German perspective.
What Hausmann has achieved for the receptive side the Bilexicon project
aims to do for the productive side: to draw up a near-comprehensive list of
those collocations (including colligations) which may be considered to make
up a native-like vocabulary. The compilation of the native-like vocabulary
proceeds from two premises:
(a) Any attempt to determine basic and advanced vocabularies must start
from a list of all native-speaker signs (perhaps even including manual and
facial gestures), i.e. the entire lexicon of the language. The approach is thus
essentially top-down.
(b) It is from such a list that a near-native vocabulary can then be constructed.
Thus, rather than asking, as the traditional frequency approach did, ‘which
are the most frequent words in the language, and which words do we need
to add to these to obtain a good working vocabulary?’, this approach poses
the question ‘what are the meaning units that native speakers use, and which
of these have to be mastered to be able to perform at a near-native (or lower)
proficiency level?’. It is based on the simple observation that some adult
learners can pass as native speakers of the L2 because they have perfect
pronunciation and a command of lexico-grammar which is sufficient to express
any communicative need in a correct and natural manner. Nevertheless these
learners have not normally attained the same level of lexical competence as
a native; even for them, the framing of ideas in the foreign language is
conditioned by linguistic proficiency. It is the level of vocabulary knowledge
achieved by such learners that can be described as ‘near-native’.
It should have become clear that, despite its deficiencies, the second
alternative is more promising than an approach based on frequency alone,
especially if the point of departure is a clearly delimited area of the vocabulary,
such as the language of motoring or the vocabulary relating to feelings. First,
a very large corpus of subject-specific material is assembled from Internet and
other sources, such as corpora and published dictionaries. In constructing
such a corpus, it is important to include Internet genres that are lexically close
to real-life speech, such as news forums, e-mail, fan fiction, film and soap
opera scenarios. A further means of reducing the inevitable bias towards
writing in corpus construction is to elicit judgements from native speakers on
(1) Corpora and dictionary sources are tapped to identify all the individual
word-forms and words belonging to the vocabulary area in question. This
involves making a corpus-based ‘word list’, using, for example,
the WordSmith tool of the same name, and consulting dictionaries which
allow full-text searches or searches by subject area, such as TLF, DO, PR
or CIDE.
(2) In the next step, programs such as WordSmith and Collocate are used
to determine the collocations and patterns entered by the items on the
word list.
(3) The third step is to eliminate redundant collocations on the basis of the
aforementioned economy effects.
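Steps (1) and (2) can be approximated without the tools named above. The following Python sketch is only an illustration under stated assumptions: the three-sentence sample corpus, the function names, the window size and the use of raw co-occurrence counts are all hypothetical stand-ins, and nothing here reproduces the behaviour of WordSmith or Collocate. Step (3) is not attempted.

```python
import re
from collections import Counter

# Hypothetical miniature corpus standing in for a large subject-specific one.
SAMPLE_CORPUS = [
    "We joined the motorway at junction four.",
    "They closed the motorway after the accident.",
    "Leave the motorway at the next exit.",
]


def tokenize(text):
    """Lower-case the text and return its word-forms (letters, with
    internal hyphens or apostrophes allowed)."""
    return re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())


def build_word_list(sentences):
    """Step (1): count every word-form in the corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


def collocates(sentences, node, window=3):
    """Step (2): count the word-forms occurring within `window` tokens
    to the left or right of each occurrence of `node`."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    print(build_word_list(SAMPLE_CORPUS).most_common(5))
    print(collocates(SAMPLE_CORPUS, "motorway", window=2).most_common(5))
```

Raw counts of this kind are dominated by function words; in practice a stop list or an association measure would be applied before the candidate collocations are inspected by hand.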
2.3 Macrostructure
The project stands in the long tradition of what, borrowing from McArthur
(1986), we might call ‘thematic learner lexicography’ – a tradition that goes
setting up and internal structuring of sub-areas and situation types. This stands
in contrast with traditional approaches to thesaurus building, where terms were
inserted into a fully pre-determined ontological structure. There are, of course,
obvious limitations to such an approach in that some words and collocations
have both general and topic-specific uses. A case in point is the vocabulary
relating to damage, which is important in such situation types as ‘car accidents’
but may also apply to a wide range of other situations (any kind of accident,
intention to harm, legal terminology, etc.).
Underlying this thematic organization in the electronic version will be a layer
of semantic links inspired by such work as Francis, Hunston and Manning
tradition of recording single words which has existed at least since Babylonian
antiquity.
There is, of course, no denying the fact that speakers can isolate words
from context and thus arrive at a definition of ‘word meanings’. However, since
the definition of word meaning requires the speaker to engage in a process of
abstraction, it is at least debatable whether it is ‘word meanings’ that underlie
the speaker’s competence. Even the elicitability of paradigmatic relations
between the meanings of individual words does not allow us to conclude
that word meanings are stored in paradigmatic networks in what is often
called the ‘mental lexicon’ (cf. Aitchison 1994). It is equally conceivable that
The second of these possibilities would partially solve the difficulties users
have in locating collocations because of their ‘directionality’; two-item
collocations are still normally recorded at the entry for the collocate rather
than for the base (i.e. the semantically most important word). Thus, users will
find meet a criterion under meet rather than criterion, although their
formulation process starts with the noun. One wonders, however, whether
the second and third of these schemas will always lead to an unequivocal
solution, as lexicographers’ and users’ views on what is semantically and
grammatically ‘most important’ may differ. The fourth solution reflects user
type of syntactic relationship, after the manner of OC, for example. But then
again such clustering may be difficult to justify with clearly motivated
multi-word units like there is good reason to + INF; there is a strong case here for
treatment under the relevant sense division of reason.
There are, of course, equally good reasons for giving main entry to
collocations as there are for recording them under a sub-entry, whether this be
a separate entry or a sense division of a particular headword (cf. Burger 1998:
172 on multi-word units). However, if we decide to give collocations main
entry status, this will entail an even more complex macrostructure. To take but
one example, multi-word collocations serving a pragmatic or text-structuring
for users – but it becomes one in the case of collocations which appear to have
been ‘freely’ put together by the application of general semantic and syntactic
rules. This can be illustrated with two examples, one from an unabridged
monolingual dictionary (GR) and one from a monolingual learner’s dictionary
(CCED).
GR, which offers a sprinkling of ‘extended’ collocations, will serve to
illustrate the haphazard nature of current practice (for further detail, see
Siepmann 2005). Thus, the exemplificatory infinitive clause pour n’en citer
qu’un exemple – a collocation of type 2 common in academic writing – is found
as the second example under sub-entry II.2:
This example sentence may, however, not be very useful to learners, since it
neglects to highlight that we are dealing with a transitional device that can be
employed in both spoken and written English rather than an ad-hoc formation.
The drawbacks of such practice should by now be obvious. For one thing,
neither the native nor the non-native user will be sensitised to the holistic
nature of multi-word units. For another, the non-native user in particular
will find it difficult to find variants of a particular collocation, such as pour ne
donner qu’un exemple or pour prendre un seul exemple in the case of the example
from GR – this is due to the lack of synonymic links in the mediostructure
already touched upon. One reason for the lack of cross-referencing with
regard to synonyms is what may be termed the ‘alphabetical framework
approach to dictionary making’. In the compilation of large-scale dictionaries
one commonly starts by drawing up an alphabetical list, or ‘framework’
of the major sense divisions before assigning one small section of the
alphabetical list to the individual lexicographer, who will identify and enter
collocations of individual lexemes without much regard to the findings of his
or her colleagues.
As can also be inferred from the above examples, another serious
disadvantage of current practice is that common collocations tend to be
4. (XIVe). Admettre pour vrai après avoir nié, ou après avoir douté,
accepter malgré des réticences. ⇒ Admettre, avérer, déclarer . . . On a fini
par reconnaître son innocence. ⇒ Croire (à); → aussi Rendre hommage*
à . . . On est forcé de reconnaître des divergences (cit. 1) entre certains
textes . . . Maintes fois, il le reconnaît lui-même, il manquait de bon sens
(→ Grain, cit. 26). Reconnaître la supériorité de qqn. ⇒ Céder (3.: le
céder à); proclamer . . . Amener qqn à reconnaître. ⇒ Convaincre.
Note that such treatment is doubly limiting. For one thing, it conceals the
generativity of the patterns as well as the limits of such generativity; for
another, it omits to signal typical textual embeddings. Thus, a colligational
pattern such as NP/ADJ + à ses heures tends to occur as an appositive (often
clause-initial), and this information must be made available to the dictionary
user. Cf. for example:
German compound noun ‘Blechlawine’. See the entry from the projected
English-German bilingual thesaurus in Table 2.
To take but one more example, neither the ‘big four’ monolingual learners’
dictionaries (see note 8) nor CR recognize the specific sense that wait assumes in the area
of traffic; a bilingual methodology would reveal this sense since it requires non-
literal renditions such as rester en stationnement in French and stehen or halten
in German (see Table 3). This shows that, in a bilingual thesaurus, explicitness
can be achieved quasi-automatically by recording all possible variants of
a collocation along with its topic-specific or situation-specific translations,
e.g. magazine féminin / magazine pour femmes = women’s magazine.
Likewise, the principle of internal coherence (Mel’čuk et al. 1995: 36 ff.) can
be readily adhered to in a bilingual thesaurus based on collocations rather than
lexemes (or lexemes and collocations). This principle states that there should
be perfect correspondence between the definition (i.e., in the case of a bilingual
thesaurus, the translation), the syntactic patterns and the lexical patterns
entered by a lexeme or phraseme; the only problem here is the directionality
of translation, which may lead to a larger number of entries in a bilingual
dictionary, as illustrated by the aforementioned collocation stream of traffic.
When used on its own, this collocation can be translated almost literally into
German in the form of the compound nouns Verkehrsstrom or Verkehrsflut.
When modified by the adjective endless, however, it can be rendered more
elegantly by the colloquial compound Blechlawine.
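The effect of modification on the choice of equivalent can be pictured as a lookup that prefers the longest recorded collocation. The sketch below is hypothetical: `ENTRIES` and `translate` are illustrative names, the table holds only the equivalents mentioned above, and the longest-match strategy is an assumption made for the sake of the example rather than a description of the projected thesaurus.

```python
# Hypothetical mini-thesaurus: English collocation -> German equivalents.
ENTRIES = {
    "stream of traffic": ["Verkehrsstrom", "Verkehrsflut"],
    "endless stream of traffic": ["Blechlawine"],
}


def translate(phrase):
    """Return (matched collocation, equivalents) for the longest
    recorded collocation contained in `phrase`, or (None, [])."""
    words = phrase.lower().split()
    # Try longer word sequences first, so that a modified collocation
    # takes precedence over the bare collocation it contains.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            candidate = " ".join(words[start:start + length])
            if candidate in ENTRIES:
                return candidate, ENTRIES[candidate]
    return None, []


if __name__ == "__main__":
    print(translate("a stream of traffic"))
    print(translate("an endless stream of traffic"))
```

Because each variant is a separate key, the modified collocation needs its own entry, which is exactly why directionality can increase the number of entries.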
donner + exemple, which can be used in three different types of situation with
two different meanings (see Siepmann 2003):
(1) a situation where the speaker/writer wishes to cite another author: Miller
(1995) donne un exemple de . . .
(2) a situation where the speaker/writer introduces an example of his or her
own: pour donner un exemple, je vais vous donner un exemple
(3) a situation where the speaker/writer gives an actual example: l’Arabie
Saoudite donne un exemple d’Etat islamique moderne (= ‘is an example’)
qn fait un appel du pied à qn | jd gibt jdm einen Wink mit dem Zaunpfahl
qn conduit qn/un animal/qc quelque part | jd bringt jdn/ein Tier/etw irgendwohin; (à pied) jd führt jdn/ein Tier/etw irgendwohin; (en voiture) jd fährt jdn/ein Tier/etw irgendwohin
jd schlachtet | qn tue [o abat] un animal/des animaux
un animal butine | ein Tier sammelt Nektar [o Blütenstaub]
une abeille butine (quelque part: sur les fleurs des artichauts / dans les pissenlits) | a bee gathers / collects / sucks (up) nectar / pollen (from artichoke blossoms / from dandelions); a bee gathers / collects honey (see note 11)
une abeille butine une plante (pour qqc: pour le nectar) | a bee visits a plant (to collect nectar); collects nectar from a plant; sucks (up) nectar from a plant
une abeille butine le pollen / le nectar / le miel (quelque part) | a bee sucks up nectar / a bee collects pollen (somewhere)
une branche / une articulation craque | ein Ast / ein Gelenk knackt
la chaussure / le toit / le fauteuil / le parquet craque | der Schuh / das Dach / der Sessel / das Parkett knarrt
la neige craque | der Schnee knirscht
qqc / qqn craque de qqc {bruits, matériaux de construction, . . .; jointures} | (etwa:) bei j-m knackt es irgendwo / an einem Ort knarrt etw.
il craquait de toutes ses jointures | alle seine Gelenke knackten / bei ihm knackte es in allen Gelenken
la maison craque de bruits de radiateurs et de boiseries | im Haus knackt und knarrt es aus der Heizung und der Holztäfelung
c’est un pavé dans la mare | das schlägt ein wie eine Bombe
qqn jette un pavé dans la mare / qqn envoie un pavé dans la mare / qqn lance un pavé dans la mare | j-d sorgt für Aufregung / j-d erregt die Gemüter / j-d wirbelt einigen Staub auf / j-d sorgt für Wirbel / j-d läßt die Wellen der Aufregung hoch schlagen
an empty parking space, a tight parking spot, a traffic jam clears, double
bend, avoid a traffic jam, the motorway (road) links (Paris) with
(Bordeaux), close a motorway, come off the motorway, open a (new)
motorway, motorway journeys, a clear motorway, a valid driving licence,
take one’s driving test, nothing coming (etc.)
Table 8 compares the results for the English noun motorway with the
list of ‘motorway’ collocations given in OC. The comparison shows that a
large number of collocations which an active user (i.e. a translator or language
learner) might need have been missed out. Numerically best represented in
this example as well as in traditional dictionaries generally are noun + noun,
adjective + noun and noun + verb collocations. Equally well covered in
traditional dictionaries are fully fixed expressions such as proverbs or idioms.
Among the collocations of type 2, three-item collocations or ‘triples’
(Hausmann 2003) are patchily covered, probably because both monolingual
N + ADJ: busy, four-lane (etc.), orbital, urban | N + ADJ: big, large, major (→ Fr. grande autoroute); clear (→ G. frei); clogged; congested; controlled; deserted; elevated; empty; toll-free (→ G. gebührenfrei, mautfrei)
N + V: join, leave, turn off, build
N + N: driving, traffic, network,
Turning to depth of coverage, we find that three areas in particular are in need
of improvement, viz. (a) triples, (b) collocational synonymy, (c) complementation
a busy road / a busy street; a much used road | eine stark befahrene Straße / eine viel befahrene Straße / eine verkehrsreiche Straße
on the open road; on clear roads / on clear motorways (etc.) | auf freier Strecke; auf offener Straße
outside lane hogging / blocking the fast lane / sitting in the outside lane | das Blockieren der Überholspur
English | French
the car swerved (1) across the road and (2) into the ditch | la voiture (1) a traversé la route et (2) a fini dans le fossé
the car veered (1) off the side of the road and (2) several yards down an embankment | la voiture (1) s’est déportée sur le côté de la route et (2) a dévalé à plusieurs mètres en contrebas
As seen above, a useful distinction can be established between four major types
of collocational relationship. However, the distinction cannot be transferred
as such to the dictionary for a number of reasons:
of each collocation should be given for the user to form a correct
understanding of its use and to be able to use it productively in a new context.
Accordingly, unabridged dictionaries of the future should contain at least
the three major types of lemmas (‘one-item lemmas’, ‘multi-item lemmas’
and ‘morphematic lemmas’; see note 10); to this we might add ‘separable lemmas’ as
representations of long-distance collocations and some collocations of type 3
(see Table 14). As seen in Tables 5 and 6, complementation patterns can be
shown using placeholders such as so or sth or typical representatives of the
semantic class which can be inserted into a particular slot, such as abeille
in Table 5.
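The two ways of filling a slot can be made concrete with a small data structure. The following Python sketch is purely illustrative: the `Slot` and `Pattern` classes and the `render` method are hypothetical names, and the single pattern shown uses only the fillers un animal and une abeille cited for butiner.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    placeholder: str      # generic filler, e.g. "un animal"
    representative: str   # typical member of the semantic class


@dataclass
class Pattern:
    template: str         # str.format-style template with named slots
    slots: dict           # slot name -> Slot

    def render(self, typical=False):
        """Return the citation form, filled either with generic
        placeholders or with typical representatives."""
        fillers = {
            name: (slot.representative if typical else slot.placeholder)
            for name, slot in self.slots.items()
        }
        return self.template.format(**fillers)


# Hypothetical entry for the verb butiner.
BUTINER = Pattern(
    template="{subject} butine",
    slots={"subject": Slot(placeholder="un animal", representative="une abeille")},
)

if __name__ == "__main__":
    print(BUTINER.render())              # generic placeholder form
    print(BUTINER.render(typical=True))  # typical-representative form
```

Keeping both fillers in one record would let a dictionary display whichever form suits the user, without duplicating the pattern itself.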
he must have found his licence in a lucky bag / (AE) he must have got his licence from a lucky dip | il a dû avoir son permis dans une pochette surprise | er hat wohl den Führerschein im Lotto gewonnen / er hat wohl seinen Führerschein bei Neckermann gekauft
they’ve got nothing against me | mon dossier est vide | man hat nichts gegen mich in der Hand
rather than citation forms. The same goes for collocations where one language
uses an implicit form of words which the other tends to make explicit. Thus,
imagine a car parked alongside a fence, so that little space is left between the
passenger door and the fence. The typical question German drivers put to their
passengers in such a situation will go something like this: Soll ich ein Stück
vorsetzen? An English driver might prefer a more explicit wording along
the lines of: Do you want me to move the car / it forward a bit? (alongside Shall I
go forward a bit?)
Exceptions of type 2 occur when the languages under survey do not offer
the same number of collocations for some particular idea. Such difference
8. Conclusion
Notes
1. Hoey (1998) defines colligation thus: (a) the grammatical company a word keeps
(or avoids keeping) either within its own group or at a higher rank; (b) the grammatical
functions that the word’s group prefers; (c) the place in a sequence that a word prefers
(or avoids).
2. Note, however, that there is much less non-native material to be found on the
Internet for languages such as French, German or Italian, so that a more reliable picture
of native language use can be built up.
3. Of course, meaning arises through the interaction of mother and child long before
it can be represented linguistically (cf. Nelson 1998, Stern 1998). It is commonly
assumed that babies who are not yet able to speak assign meaning to the different
References
1. Dictionaries
Atkins, B. T. et al. 1993. Collins Robert French-English English-French Dictionary.
Unabridged. (3rd ed.). Glasgow: HarperCollins. (CR)
Atkins, B. T. et al. 1994. Le Robert & Collins. Vocabulaire anglais et américain. Paris:
Le Robert. (VAEA)
Binon, J. et al. 2000. Dictionnaire d’apprentissage du français des affaires. Paris: Didier.
(DAFA)
Dendien, J. 2004. Trésor de la Langue Française Informatisé. Paris: CNRS. (TLF)
Cop, M. et al. 2001. PONS Großwörterbuch Englisch. Stuttgart: Klett. (PGE)
Corréard, M. (ed.) 1994. Oxford/Hachette French Dictionary. French-English/
English-French. Oxford: Oxford University Press. (OH)
Crowther, J. et al. 2002. Oxford Collocations Dictionary for Students of English. Oxford:
Oxford University Press. (OC)
Chapman, R. L. (ed.) 1996. Roget’s International Thesaurus. Glasgow: HarperCollins.
(RO)
Collins Cobuild English Dictionary for Advanced Learners (3rd ed. 2001). Glasgow:
HarperCollins. (CCED)
Dornseiff, F. and Quasthoff, U. 2004. Der deutsche Wortschatz nach Sachgruppen. Berlin:
De Gruyter. (DO)
Hamblock, D. and Wessels, D. 1999. Großwörterbuch Wirtschaftsenglisch
Deutsch-Englisch/Englisch-Deutsch (5th ed.). Berlin: Cornelsen. (GW)
Knight, L. S. et al. 1999. Collins German-English English-German Dictionary. Unabridged
(4th ed.). Glasgow: HarperCollins. (CG)
McArthur, T. 1981. Longman Lexicon of Contemporary English. London: Longman.
(LLCE)
Quasthoff, U. (ed.) 2003. Franz Dornseiff: Der deutsche Wortschatz nach Sachgruppen
(CD-ROM). (DO)
Procter, P. (ed.) 2001. Cambridge International Dictionary of English on CD-ROM.
Cambridge: Cambridge University Press. (CIDE)
Rey, A. (ed.) 1993. Le nouveau Petit Robert. Paris: Le Robert. (PR)
Rey, A. (ed.) 1985. Le Grand Robert de la langue française sur CD-ROM. Paris:
Le Robert. (GR)
Schnorr, V. et al. 1996. PONS Großwörterbuch Französisch. Stuttgart: Klett. (PGF)
Schemann, H. 1991. Synonymwörterbuch der deutschen Redensarten. Stuttgart:
Klett. (SR)
Walter, E. (ed.) 1994. Cambridge Word Routes. Anglais-Français. Cambridge:
Cambridge University Press. (CW)
Wehrle, H. and Eggers, H. 2001. Deutscher Wortschatz. Stuttgart: Klett. (WE)
2. Other literature
Aitchison, J. 1994. Words in the mind. An Introduction to the Mental Lexicon. Oxford:
Blackwell.
Arnaud, P. J. L. 1992. ‘La connaissance des proverbes français par les locuteurs natifs
et leur sélection didactique.’ Cahiers de Lexicologie 1: 195–238.
Baker, M., Francis, G. and Tognini-Bonelli, E. 1993. Text and Technology: In Honour