Collocation, Colligation and Encoding Dictionaries. Part II: Lexicographical Aspects

Article in International Journal of Lexicography · November 2005

DOI: 10.1093/ijl/eci051 · Source: OAI



doi:10.1093/ijl/eci051

COLLOCATION, COLLIGATION AND ENCODING DICTIONARIES. PART II:
LEXICOGRAPHICAL ASPECTS

Dirk Siepmann: Universität-GH Siegen, Fachbereich 3, Adolf-Reichwein-Straße,
D-57068 Siegen, Germany (dsiepmann@t-online.de)

Downloaded from http://ijl.oxfordjournals.org/ at Universitätsbibliothek Osnabrück on April 19, 2015


Abstract

The present article starts from a broad definition of collocations as holistic
lexico-grammatical or semantic units (see Part I for full details), asking how such
units can be adequately represented in bilingual and monolingual encoding dictionaries.
It is found that an onomasiological approach to dictionary making is better suited to
this task than a semasiological, framework-based methodology whereby individual
lexicographers work on small, alphabetically classified sections of the dictionary.
Typically, semasiological dictionaries and corresponding methodologies have difficulty
in arranging items in a clear and memorable way, give patchy or inadequate coverage to
semantic-pragmatic collocations, cannot provide adequate cross-referencing between
synonymous items and are prone to translation errors. It is shown how onomasiological
dictionaries and methodologies can remedy such deficiencies. The Bilexicon project,
which aims to create thematic learners’ dictionaries, is the main source drawn on
to illustrate the suggestions made.

1. Introduction

There is growing recognition that both structurally simple (i.e. (bound)
morphemes, lexemes) and structurally complex units (i.e. collocations or
colligational patterns) are linguistic signs (Feilke 2003). If the dictionary is
meant to be a record of such signs, the task of the lexicographer is to gather
together evidence of both types of sign. So far it has been lexemes, non-
compositional idioms and morphemes that have received the bulk of
lexicographic attention, but the future clearly belongs to collocation and
colligation in the widest possible sense. However, most linguistic models of
collocation are too limited (e.g. Hausmann 1999), too formalist (e.g. Mel’čuk
1998) or too broad (e.g. Kjellmer 1994) to be readily adaptable to lexicographic
practice (see the first part of this article, IJL 18/4).

International Journal of Lexicography, Vol. 19 No. 1. Advance access publication 29 November 2005
© 2005 Oxford University Press. All rights reserved. For permissions,
please email: journals.permissions@oxfordjournals.org

A viable lexicographic definition of collocation can be based on the notions
of ‘Gebrauchsnorm’, or ‘usage norm’ (Steyer 2000: 108), reflected in concepts
such as ‘minimal recurrence’ (Kocourek 1991, Siepmann 2003) or ‘statistical
significance’ (Sinclair 1991), on the one hand, and the notion of ‘inhaltliche
Geschlossenheit’ or ‘holisticity’, on the other hand (Siepmann 2003). ‘Holisticity’
here refers to the facts that native speakers can ascribe meaning to general-
language collocations even if these are divorced from context and that such
units are intuitively considered as self-contained ‘wholes’. We thus arrive at the
following definition of ‘collocation’:

a collocation is any holistic lexical, lexico-grammatical or semantic
unit which exhibits minimal recurrence within a particular discourse
community.

It should also be taken to include colligation with a particular grammatical
category, such as a noun phrase. Thus, the collocations the future belongs to
(die Zukunft gehört, l’avenir appartient à) or l’autoroute file would be felt to
be incomplete by most speakers, requiring as they do a prepositional object.
This variable complement is conceived of as part of the collocation.
With this definition in mind, it becomes possible to suggest a four-way
typology of collocation along the following lines (see Part I):

(a) Colligation (you can stick your + NP, far be it from me to + INF, ignorer
tout de + N, il n’y a qu’à + INF, ce/cette N [tradition, etc.] est resté(e),
NP dans l’âme, typisch + N, etc.); note that this definition of colligation is
different from Firth’s (1957) or Hoey’s (1998)¹, since it concerns not only
the grammatical preferences of individual words, but also those of longer
syntagms. Thus, the syntagm tu n’avais qu’à can be said to be in colligation
with an infinitive clause.
(b) Collocation between lexemes or phrasemes (just as + clause . . . so / in the
same manner + clause, levy charges, briser ses chaussures, c’est-à-dire en
l’occurrence, regarde où tu vas, bon ben, à la fin, etc.).
(c) Collocation between lexemes and semantic-pragmatic (contextual)
features (beautifully + [result of creative activity], [uncertainty] + not so,
[question] + eh bien, [expectation] + duly, [negative contextual aspect] +
(not) detract from s.o.’s enjoyment, help! [on such one-word collocations,
cf. González-Rey 2002: 95, 101])
(d) Collocation between semantic-pragmatic features (e.g. long-distance
collocations, Siepmann 2005).

This typology and the notational conventions that go with it present two
major advantages with a view to lexicographic applications: they allow us to
capture the full range of collocational phenomena, and they dispense almost
entirely with complicated metalanguage such as that used in Mel’čuk’s
‘lexicologie combinatoire et explicative’ (Mel’čuk et al. 1995).
In what follows, I shall discuss some of the demands the full-scale integration
of lexico-grammatical units of the type just discussed places upon commercial
monolingual and bilingual encoding dictionaries. My main concern therefore
is with the reference needs of active users, such as the native French speaker
trying to write, speak or translate into English. My thesis is that the bilingual
onomasiological rather than the semasiological dictionary constitutes the ideal
repository for the collocational and colligational units required by active
users. After a brief description of the Bilexicon project aimed at producing
near-comprehensive thematic learners’ dictionaries, I shall go on to marshal
various sorts of evidence on the weaknesses of the semasiological and the
strengths of the onomasiological approach. This will lead to the conclusion
that the traditional dictionary-making process should be turned on its head:
rather than starting from an alphabetical framework it should proceed from
a bilingual or multilingual onomasiological research base.
I shall then proceed to discuss coverage of collocations in current bilingual
and monolingual dictionaries, together with suggestions for improvement.
The last two sections will be devoted to types of lemmas and limits on the
translatability of collocations.

2. A brief outline of the Bilexicon project

The Bilexicon project pursues a theoretical as well as a practical aim. On the
theoretical side, the aim is to provide a sound basis for the production of
unabridged onomasiological bilingual learners’ dictionaries which focus on
collocation. On the practical side, such dictionaries are to be developed for the
language pairs English/French, English/German and French/German, both in
print and electronic form.
The project can be sketched in rough outline only. What is said here should
not be taken to suggest that the problem of describing the native-speaker
lexicon or specific sections thereof is easily solved (for a fuller account,
see Siepmann, in preparation; for a sample chapter, see the author’s website
www.dirk-siepmann.de).

2.1 Rationale

The rationale behind the Bilexicon project proceeds from a paradox about
foreign language learning in higher education: language teaching specialists
have long demanded that university graduates in modern languages should
have a native-like lexical competence in their L2 (e.g. Meißner et al. 2001);
in practice, however, such a competence is seldom attained, and few serious
efforts have been made to improve attainment levels. De Florio-Hansen
(2004: 83f.) sums up the situation at German universities by stating that
students’ linguistic competence does not increase significantly between the
beginning of their course of study and its successful completion.
However, to sustain a prolonged learning effort, students must be told how
many and which lexical items they have to learn before they can confidently
claim to be competent users of the foreign languages of their choice (cf. Council
of Europe 2001: 6.4.7.2). Only once this material basis for vocabulary learning
has been laid do methodological factors come into play and can realistic
assimilation targets be set.

2.2 The compilation of a native-like vocabulary

So far little research effort has been expended upon describing the extent
of native-like lexical competence in the L2. There is only one study for the
language pair German-French (Hausmann, forthcoming), whose aim it is to list
a large section of the receptive vocabulary of French which is ‘intransparent’
from a German perspective.
What Hausmann has achieved for the receptive side the Bilexicon project
aims to do for the productive side: to draw up a near-comprehensive list of
those collocations (including colligations) which may be considered to make
up a native-like vocabulary. The compilation of the native-like vocabulary
proceeds from two premises:

(a) Any attempt to determine basic and advanced vocabularies must start
from a list of all native-speaker signs (perhaps even including manual and
facial gestures), i.e. the entire lexicon of the language. The approach is thus
essentially top-down.
(b) It is from such a list that a near-native vocabulary can then be constructed.
Thus, rather than asking, as the traditional frequency approach did, ‘which
are the most frequent words in the language, and which words do we need
to add to these to obtain a good working vocabulary?’, this approach poses
the question ‘what are the meaning units that native speakers use, and which
of these have to be mastered to be able to perform at a near-native (or lower)
proficiency level?’. It is based on the simple observation that some adult
learners can pass as native speakers of the L2 because they have perfect
pronunciation and a command of lexico-grammar which is sufficient to express
any communicative need in a correct and natural manner. Nevertheless these
learners have not normally attained the same level of lexical competence as
a native; even for them, the framing of ideas in the foreign language is
conditioned by linguistic proficiency. It is the level of vocabulary knowledge
achieved by such learners that can be described as ‘near-native’.
In theory, therefore, it should be fairly easy to establish a procedure that
might be used in compiling a near-native vocabulary. In practice, however,
such a procedure still comes up against considerable, if not insuperable,
difficulties. The procedure might look something like this. In a first step a
full-size lexico-grammar of at least one language would have to be compiled.
The main problem at this stage is to give a definition of multi-word units that is
sophisticated enough to distinguish these from ‘lexical bundles’ (Biber et al.
1999) or ‘n-grams’, i.e. mere strings of word forms which occur more than once
in a corpus. Such a definition has been attempted in Part I of this article. Thus,
for example, ‘at the end of the’ is an n-gram retrievable from any medium-sized
corpus, but underlying it is the colligation ‘at the end of the NP’.
The frequency approach is an adaptation, to linguistic units beyond the
word level, of the traditional procedure for determining core vocabularies.
At its simplest, it uses a very large corpus to determine the frequency of
each meaning unit; units whose frequency is below a minimum threshold are
discarded. It is not difficult to see why this approach, if used exclusively,
is more or less unworkable. The main reason is that there is no such thing as
a representative corpus, and there are no very large corpora available which
can provide accurate guidance on spoken usage. Even the Internet – or sections
of it, such as google.co.uk with the option ‘pages from the UK’ – is neither
representative nor reliable as a corpus. Apart from being skewed towards the
written language, it contains large amounts of outdated and non-native speaker
material²; it is also uninformative on range and distribution, i.e. the extent to
which an item appears in several different text types.
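The frequency approach described above can be sketched in a few lines. The following is a minimal illustration only, assuming whitespace tokenisation, a fixed n-gram length and an arbitrary threshold; the function names and the three toy sentences are invented for the purpose and do not come from any corpus used in the project.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count every contiguous string of n word forms, sentence by sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequent_units(sentences, n, min_freq):
    """Discard the n-grams whose frequency is below the minimum threshold."""
    counts = ngram_counts(sentences, n)
    return {gram: c for gram, c in counts.items() if c >= min_freq}


# Toy sentences invented for illustration; not real corpus data.
toy_corpus = [
    "at the end of the day we left",
    "at the end of the road we stopped",
    "at the end of the film we cried",
]

kept = frequent_units(toy_corpus, n=5, min_freq=2)
print(kept)
```

On this toy input the only string that survives the threshold is the bare n-gram at the end of the, which shows why frequency alone does not deliver meaning units: the variable noun phrase that completes the colligation is exactly what the count throws away.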
In an alternative approach, each collocational or colligational unit could
be subjected to a test for ‘economy effects’. As explained above, foreign-born
speakers who pass as natives have not normally developed the same lexical
competence as native speakers; they succeed in giving a native-like impression
by recycling or creatively recombining items from what is admittedly
a vast repertoire. This repertoire, however, need not contain the hundreds of
thousands of rough formulaic synonyms that native speakers have at their
disposal. In other words, the native-like speaker can achieve considerable
economies in learning effort by acquiring just one expression for each com-
municative need. Siepmann (in preparation) suggests that such economies
manifest themselves in at least eight different ‘economy effects’ resulting in the
elimination of a collocation or lexeme from the near-native vocabulary.
To take but one example, a native English speaker wishing to describe the
state of being stationary in traffic can choose from among a number of
synonymic expressions, such as be / get caught in a traffic jam, be / get caught
up in a traffic jam, be / get stuck in a traffic jam, sit in traffic, sit in a traffic jam,
be stationary, etc. For the non-native, knowledge of just one of these expres-
sions will do; when it comes to choosing which, the criteria of frequency,
availability and learnability may be invoked.
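The selection of one expression per communicative need might be sketched as follows. This is a hedged illustration, not the procedure of Siepmann (in preparation): the three criteria are those named above, but the way they are ranked (frequency first, then availability, then learnability) is an assumption, and all the scores are invented placeholders rather than measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    text: str
    frequency: int     # placeholder corpus count (invented)
    availability: int  # hypothetical 1-5 rating (invented)
    learnability: int  # hypothetical 1-5 rating (invented)


def pick_representative(synonyms):
    """Keep one expression for a communicative need.

    Assumed ranking: frequency, then availability, then learnability.
    """
    return max(
        synonyms,
        key=lambda e: (e.frequency, e.availability, e.learnability),
    )


# Expressions taken from the example above; every number is a placeholder.
stationary_in_traffic = [
    Expression("be / get stuck in a traffic jam", 120, 4, 5),
    Expression("be / get caught in a traffic jam", 80, 3, 4),
    Expression("sit in traffic", 60, 3, 3),
    Expression("be stationary", 15, 1, 2),
]

kept = pick_representative(stationary_in_traffic)
dropped = [e.text for e in stationary_in_traffic if e != kept]
print(kept.text)
print(dropped)
```

A different weighting of the criteria would of course select a different representative; the sketch only shows that the economy effect reduces a set of rough synonyms to a single entry in the near-native vocabulary.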

It should have become clear that, despite its deficiencies, the second
alternative is more promising than an approach based on frequency alone,
especially if the point of departure is a clearly delimited area of the vocabulary,
such as the language of motoring or the vocabulary relating to feelings. First,
a very large corpus of subject-specific material is assembled from Internet and
other sources, such as corpora and published dictionaries. In constructing
such a corpus, it is important to include Internet genres that are lexically close
to real-life speech, such as news forums, e-mail, fan fiction, film and soap
opera scenarios. A further means of reducing the inevitable bias towards
writing in corpus construction is to elicit judgements from native speakers on
the currency of particular words and collocations in speech. It is to be expected,
however, that such tests will produce tangible results in only a few vocabulary
areas, such as proverbs (Arnaud 1992) or idioms. In others, such as motoring,
the sheer size of the lexical material precludes any detailed investigation of
native-speaker judgements.
The third alternative is some sort of combination of the frequency-based
approach and the approach drawing on economy effects, which could,
for example, be applied in succession. Economy effects may also be taken
into consideration in determining proficiency levels below the near-native level.
The subsequent procedure involves three major steps:

(1) Corpora and dictionary sources are tapped to identify all the individual
word-forms and words belonging to the vocabulary area in question. This
involves the making of a corpus-based ‘word list’ using for example
the WordSmith tool of the same name and the use of dictionaries which
allow full-text searches or searches by subject area, such as TLF, DO, PR
or CIDE.
(2) In the next step, programs such as WordSmith and Collocate are used
to determine the collocations and patterns entered by the items on the
word list.
(3) The third step is to eliminate redundant collocations on the basis of the
aforementioned economy effects.

In a fourth, optional step various proficiency levels might be distinguished
on the basis of the frequency of collocations and single words or on the
basis of the transparency of items for particular user groups (cf. Hausmann,
forthcoming).
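The three major steps can be pictured as a small pipeline. The sketch below is an assumption-laden stand-in: it does not reproduce what WordSmith or Collocate actually compute, it uses raw co-occurrence counts within a short span instead of a statistical association measure, and the mapping from collocations to communicative needs (`need_of`) is a hypothetical, hand-made table. The toy sentences are invented.

```python
from collections import Counter


def build_word_list(sentences):
    """Step 1: list the word forms found in the subject-specific corpus."""
    return {w for s in sentences for w in s.lower().split()}


def find_collocations(sentences, word_list, span=2, min_freq=2):
    """Step 2: count pairs of word forms co-occurring within `span` tokens."""
    pairs = Counter()
    for s in sentences:
        tokens = s.lower().split()
        for i, node in enumerate(tokens):
            if node not in word_list:
                continue
            for collocate in tokens[i + 1:i + 1 + span]:
                pairs[(node, collocate)] += 1
    return {p: c for p, c in pairs.items() if c >= min_freq}


def drop_redundant(collocations, need_of):
    """Step 3: keep the most frequent collocation per communicative need."""
    best = {}
    for pair, count in collocations.items():
        need = need_of.get(pair)
        if need is None:
            continue  # pair not assigned to any need in the hand-made table
        if need not in best or count > best[need][1]:
            best[need] = (pair, count)
    return {need: pair for need, (pair, _) in best.items()}


# Invented toy sentences and a hypothetical need table, for illustration.
toy_corpus = [
    "we got stuck in traffic",
    "they got stuck in traffic again",
    "i got stuck in traffic",
    "she got caught in traffic",
    "he got caught in traffic",
]
need_of = {
    ("stuck", "traffic"): "being stationary in traffic",
    ("caught", "traffic"): "being stationary in traffic",
}

words = build_word_list(toy_corpus)
candidates = find_collocations(toy_corpus, words)
vocabulary = drop_redundant(candidates, need_of)
print(vocabulary)
```

The optional fourth step would then amount to sorting the surviving items into proficiency bands, for instance by the counts already gathered in step 2.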

2.3 Macrostructure

The project stands in the long tradition of what, borrowing from McArthur
(1986), we might call ‘thematic learner lexicography’ – a tradition that goes
back almost to the dawn of civilisation. Recent examples of this tradition
include LLCE, VAEA and CW, to name but a few.
As McArthur (1998: 153) puts it, ‘it is impossible to find an ultimate true
schema for ordering things and words in the world’, and the Bilexicon Project
lays no major claim to innovation in this respect. Its point of departure is
a fairly traditional division of the lexicon into topic areas such as ‘motoring’
and sub-areas such as ‘parking’. Where it does innovate is in the distinction
between topic areas and situation types and in cross-referencing between
syntactically and semantically similar patterns, which will be available only
in the electronic version.

The distinction between topic areas and situation types is not perfectly
clear-cut and merits a brief explanation. In a sense, every communicative
situation is of course unique, but it seems permissible to generalise across
specific situations to arrive at similar ‘situation-types’ (Lyne 1985) or ‘text-
types’ embedded in more general ‘topic areas’ (McArthur 1981). An exclusive
focus on either of these, as found in the works just cited, seems severely
limiting, as topic areas and situation-types are interdependent. One situation-
type, such as a court hearing, can involve widely varying topics. It may also
be subdivided into any number of sub-types, down to as narrow a discoursal
span as the conversational turn in the case of a simple exchange of greetings
(speaker A: hello, speaker B: hello); conversely, the same topic, such as an
account of an accident, can occur in several different situation-types or text-
types, such as general conversation, court hearings, newspaper reports or
insurance claims letters. Let us consider a few examples to illustrate the
possible categorisation of various types of collocation (see Table 1).
Table 1: Semantic categorization in a conceptually organised dictionary

Collocation                    Topic Area 1:                  Topic Area 2:
                               Situation Type 1               Situation Type 2
money/funds/a sum/etc. +       Banking                        –
  leave + account/bank/etc.
Tu craches ta valda ?          Road traffic: Traffic lights   Emotions: Impatience
                                                              (obsolescent)
regarde où tu vas!             Movement: Moving with care     Emotions: Care (or Caution)
make s.o. feel small           Emotions: Humiliation          –
I would give anything          Emotions: Cravings             –
  to + INF

What distinguishes the Bilexicon from other bilingual thesauri is that
allocation of entries to topic areas is essentially bottom-up, that is, it is
the collocations found in the subject-specific corpora which determine the
setting up and internal structuring of sub-areas and situation types. This stands
in contrast with traditional approaches to thesaurus building, where terms were
inserted into a fully pre-determined ontological structure. There are, of course,
obvious limitations to such an approach in that some words and collocations
have both general and topic-specific uses. A case in point is the vocabulary
relating to damage, which is important in such situation types as ‘car accidents’
but may also apply to a wide range of other situations (any kind of accident,
intention to harm, legal terminology, etc.).
Underlying this thematic organization in the electronic version will be a layer
of semantic links inspired by such work as Francis, Hunston and Manning
(1996, 1998), who have shown that words entering similar patterns usually
share an aspect of meaning. This will enable users to extend their vocabulary
along a non-thematic route and will raise their awareness of the close link
between sense and syntax.

3. Semasiological vs. onomasiological dictionaries

As noted in the previous section, the Bilexicon project aims at producing
bilingual onomasiological dictionaries whose main entry type will be of a
collocational nature. This represents a break with the word-based lexicography
still current in both semasiological and onomasiological approaches. Semasio-
logical dictionaries tend to consist of an alphabetical word list leading the user
from the word to its meaning, while onomasiological dictionaries allow the
user to proceed from a particular concept and find the most appropriate
word for it. Both types of dictionary are therefore mainly based on individual
words – although, perforce, including phraseology in sub-entries and examples.
This section begins with a brief critique of the notion of ‘word meaning’ before
discussing the effectiveness of the two types of dictionary in representing
collocation.

3.1 Meaning units beyond the word

The vast majority of today’s dictionaries are based on the Saussurean
paradigm that the basic unit of meaning in a language is the word; accordingly,
dictionaries are regarded as ‘word books’ (cf. German Wörterbücher) which
provide records of the various senses of individual words. So influential
has been this view of the dictionary that the bestsellers among present-day
monolingual and bilingual encoding dictionaries are small to medium-sized,
alphabetically organised pocket or desk dictionaries which list one-to-one
equivalents between words and provide only limited guidance on the
syntagmatics of language. Modern dictionaries thus perpetuate the time-honoured
tradition of recording single words which has existed at least since Babylonian
antiquity.
There is, of course, no denying the fact that speakers can isolate words
from context and thus arrive at a definition of ‘word meanings’. However, since
the definition of word meaning requires the speaker to engage in a process of
abstraction, it is at least debatable whether it is ‘word meanings’ that underlie
the speaker’s competence. Even the elicitability of paradigmatic relations
between the meanings of individual words does not allow us to conclude
that word meanings are stored in paradigmatic networks in what is often
called the ‘mental lexicon’ (cf. Aitchison 1994). It is equally conceivable that
subjects in psychological experiments respond with particular paradigmatic
associations because they have repeatedly met the associated items in
syntagmatic strings (cf. Rapp and Wettler 1992, Rapp 1995); as Jones (2002)
has shown, antonyms, for example, tend to co-occur syntagmatically (good or
bad, rich and poor).
The crucial factor in the acquisition of meanings thus seems to be the
primary association between lexical units of varying length³ and their
extralinguistic and/or intralingual context of occurrence rather than the secondary
paradigmatic connections between two or more words that speakers can
establish when prompted or the word meaning which they can ‘abstract out of
context’ when asked. Put another way, when ‘unprompted’, speakers produce
meanings by syntagmatically associating and/or modifying lexical chunks
which they have encountered before in contexts similar to the current one.
Our own practices of dictionary making have blinded us to the fact that we do
not communicate by stringing together individual words, but rather by means
of semi-prefabricated lexico-grammatical units.
This view, first proposed in outline by Bally (1909), has recently come to the
fore again in the Firthian tradition. Meaning is seen as residing in typical
combinations of lexical choices or ‘collocability’ on the one hand, and typical
combinations of grammatical choices or ‘colligation’ on the other (Hunston
2001). A crucial aspect of an item’s meaning is its ‘semantic prosody’, a term
which reflects the realisation that lexical items become infused with particular
connotations due to their typical linguistic environment (Sinclair 1991, Louw
1993, Stubbs 1995).
The implications of the above for lexicography, especially learner
lexicography, are clear: if a) meaning is considered to be inherent in collocation (under
which term I here subsume colligation) and b) the dictionary is intended to
provide a record of the units of meaning in a language, then future dictionaries
will have to provide a full account of collocational meaning units and their
typical contexts of occurrence.⁴ One of the most obvious desiderata, then, is for
collocations, as defined in the introduction, to be given entry status. Rather
than appear in the exemplificatory material, collocations of this type should
themselves be illustrated with examples as necessary.

3.2 Difficulties of the semasiological dictionary in recording and representing collocation

The foregoing considerations raise questions about the macrostructure,
microstructure and mediostructure (Hartmann 2001: 64–66) of a dictionary which
could adequately represent collocation. There are a variety of systematic
reasons why traditional semasiological print dictionaries, whether mono-
lingual or bilingual, will tend to fall short of this goal. Tersely stated, the main
reasons are:

(1) the difficulty of arranging items in a clear and memorable way;
(2) the inadequate coverage and representation of collocation between lexemes
and semantic-pragmatic features;
(3) insufficient discrimination between collocations and examples.

Let us deal with these in sequence.

3.2.1 Place of entry. Firstly, semasiological dictionaries arrange entries by the
alphabet. If collocations are to be given entry or sub-entry status in such
dictionaries, this will pose the age-old question about the word or word-form
under which the multi-word entry should appear. There is a wide range of
possibilities for resolving this question. The policy of many dictionaries is to
indicate some of the collocates of headwords in square brackets or in the
exemplificatory material and to enter (comparatively) fixed expressions such as
idioms at the first notional word. Thus, the idioms all hell breaks loose and
out of a clear blue sky would be found respectively at hell and clear. There are
a number of possible alternatives to this organizing schema (cf. Gates 1988).
For example:

(1) Collocations may be arranged alphabetically by their first components.
(2) Collocations may be entered at the semantically most important
component.
(3) Collocations may be entered at the grammatically most important
component.
(4) Collocations may be entered at the least frequent component if there is a
wide difference in frequency between the constituents (cf. Bogaards 1990).
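Options (1) and (4) are mechanical enough to be sketched in code, whereas (2) and (3) depend on a lexicographer's judgement. The following is an illustration under stated assumptions: the function names are invented, the frequency figures are placeholders rather than corpus counts, and the factor of ten used to decide what counts as a ‘wide difference’ is an arbitrary choice not found in Bogaards (1990).

```python
def entry_by_first_component(components):
    """Option (1): file the collocation under its first component."""
    return components[0]


def entry_by_least_frequent(components, frequency, min_ratio=10):
    """Option (4): file under the rarest component, but only if the gap
    between the most and least frequent component is wide enough
    (assumed here: at least `min_ratio` times); otherwise return None."""
    ranked = sorted(components, key=lambda w: frequency[w])
    rarest, commonest = ranked[0], ranked[-1]
    if frequency[commonest] >= min_ratio * frequency[rarest]:
        return rarest
    return None


# Placeholder frequencies, invented for illustration only.
frequency = {"meet": 5000, "criterion": 300, "levy": 40, "charges": 90}

print(entry_by_first_component(["meet", "criterion"]))
print(entry_by_least_frequent(["meet", "criterion"], frequency))
print(entry_by_least_frequent(["levy", "charges"], frequency))
```

With these placeholder figures, meet a criterion would be filed under meet by option (1) and under criterion by option (4), while levy charges yields no decision under option (4) because the two frequencies are too close.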

The second of these possibilities would partially solve the difficulties users
have in locating collocations because of their ‘directionality’; two-item
collocations are still normally recorded at the entry for the collocate rather
than for the base (i.e. the semantically most important word). Thus, users will
find meet a criterion under meet rather than criterion, although their
formulation process starts with the noun. One wonders, however, whether
the second and third of these schemas will always lead to an unequivocal
solution, as lexicographers’ and users’ views on what is semantically and
grammatically ‘most important’ may differ. The fourth solution reflects user
preferences identified in an empirical study, but seems only to apply to native
(French) dictionary users rather than language learners (Bogaards 1990).
For the sake of user convenience, it is desirable therefore to enter a col-
location under each of its meaning components and to cross-refer the user to
the place where the entry is found. Drawing on this insight, Petermann (1983)
has devised a consistent location policy for traditionally conceived ‘phrasemes’
(i.e. fixed expressions) which could also be applied to collocations. He suggests
that each phraseme should appear under each of its notional components
while being assigned only to one main entry. The choice of this entry is to be
determined by the following criteria: if the phraseme contains a noun, this
becomes the main entry; if there are several nouns, main entry is given to the
first. If there is no noun, main entry is given to the first adjective, etc., in the
following order: verb, adverb, pronoun, numeral, interjection. Consistent as
this policy may be in theory, the question is whether the average dictionary user
can be expected to comprehend it. Interestingly, however, it is in keeping with
the results of an empirical study (Bogaards 1990), which found that Dutch
language learners begin their searches with nouns, followed by adjectives
and verbs.
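Petermann's location policy, as summarised above, can be expressed as a simple priority rule. The sketch below is one reading of that summary, not Petermann's own formulation: the part-of-speech labels are supplied by hand, the decision as to which components count as ‘notional’ is left to the person tagging, and the function names are invented.

```python
# Priority order as summarised above: noun first, then adjective, verb,
# adverb, pronoun, numeral, interjection.
POS_PRIORITY = [
    "noun", "adjective", "verb", "adverb",
    "pronoun", "numeral", "interjection",
]


def main_entry(components):
    """Return the headword for the main entry.

    `components` is a list of (word, part_of_speech) pairs in text order,
    restricted to the notional components and tagged by hand.
    """
    for pos in POS_PRIORITY:
        for word, tag in components:
            if tag == pos:
                return word  # first word of the highest-ranking class
    return None


def entry_plan(components):
    """Main entry plus the components that only carry a cross-reference."""
    main = main_entry(components)
    cross_refs = [word for word, _ in components if word != main]
    return main, cross_refs


# Hand-tagged examples; the tagging itself is an assumption.
hell = [("hell", "noun"), ("breaks", "verb"), ("loose", "adjective")]
sky = [("clear", "adjective"), ("blue", "adjective"), ("sky", "noun")]

print(entry_plan(hell))
print(entry_plan(sky))
```

On this reading out of a clear blue sky would receive its main entry at sky, with cross-references at clear and blue, rather than at the first notional word clear as in the dictionary practice described earlier.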
Another common suggestion consists in recording different types of
phrasemes in different ways (Burger 1989: 595). Fully idiomatic phrasemes
are to be listed under one of their components only, with cross-references at
the entries for other components; the choice of the entry term should not be
governed by semantic considerations, as these require the largest amount of
previous knowledge on the part of the user. Partially idiomatic phrasemes
which are linked to a specific meaning of a headword are to be treated under
the relevant sense division. Non-idiomatic phrasemes have to be discussed
at each of their components, under the relevant senses. Although presenting
the clear advantage of highlighting connections of meaning, this arrangement
is theoretically unsound in that, rather than recognizing the holisticity of
collocations, it presupposes their semantic divisibility and may entail an
etymological re-motivation of what is only a partially motivated or
unmotivated fixed expression (see also Burger 1989: 595).
To compound matters, the nesting of collocations may make retrieval
difficult. A large number of syntactically well-formed collocations (cf. for
example regarde où tu vas or I’ve got [liquid, crumbs, etc.] all over/on [piece of
clothing, exercise book, etc.]) are made up of highly frequent individual lexemes
such as regarder, aller, have, haben, etc., a factor which contributes to heavily
inflating entries for such words. Current unabridged dictionaries bear ample
testimony to this, although they are still a long way from including the
totality of collocations. Thus, the entry for aller in PR, for example, runs to
three and a half columns.
One way of solving this problem would be to draw items together in blocks
at the end of the entry. Each block would present items exhibiting a particular
type of syntactic relationship, after the manner of OC, for example. But then
again such clustering may be difficult to justify with clearly motivated multi-word
units like there is good reason to + INF; there is a strong case here for
treatment under the relevant sense division of reason.
There are, of course, equally good reasons for giving main entry to
collocations as there are for recording them under a sub-entry, whether this be
a separate entry or a sense division of a particular headword (cf. Burger 1998:
172 on multi-word units). However, if we decide to give collocations main
entry status, this will entail an even more complex macrostructure. To take but
one example, multi-word collocations serving a pragmatic or text-structuring
function and beginning with the pronoun it (it behoves us + to + INF, it is worth
bearing in mind + that/wh-clause, etc.) or the preposition to (to give an example,
to this end, to return to + NP) would fill dozens of pages, and so would two-item
collocations beginning either with common nodes or common collocates
(such as increase or give).
From all this it seems reasonable to conclude, as most theorists do (cf. for
example Burger 1989: 595 on phrasemes), that there is no ready-made solution
for the positioning of collocational units in semasiological dictionaries.
Each case needs to be considered on its own merits, and the preferences of
particular user groups have to be taken into account (Bogaards 1990, 1991);
there should be neither consistent conflation into end-of-article nests nor
arbitrary allocation to a particular sense division. Rather, as with derivatives
and compounds (which have traditionally been conceived of as distinct from
collocations), a middle course must be steered between considerations
of semantic relatedness, user convenience and economy of treatment (cf. Cowie
1999: 150 on derivatives and compounds). In any case, collocations should
be highlighted typographically, and, if necessary, attention should be drawn
to their special pragmatic and/or text-structuring functions. However, given
the sheer size of the class of collocations, alphabetical access seems an
unmanageable solution in the long run.

3.2.2 Representation of semantic-pragmatic collocations. If we now ascertain the
relationship between types of collocations and the problems associated with
recording them, it turns out that the semasiological dictionary experiences
the greatest difficulty in adequately representing purely semantic-pragmatic
collocations occurring in specific situation-types or topic areas. A pertinent
example is afforded by semantic-pragmatic collocations based around mordre
sur (‘overlap into’, ‘go over into’, ‘cut into’, ‘veer off course into/onto’), which
occur in three main topic areas, viz. a) geography (e.g. une région mord sur une
autre), b) medicine (une partie du corps mord sur une autre) and c) motoring
(une voiture mord sur une partie de la route).
The bilingual semasiological encoding dictionary has two options for
representing such information: adapting PGF style, as in une voiture mord sur qc
(accotement, ligne médiane, etc.), or adapting CR style, as in [voiture] mordre sur
[accotement]. Of these, the first would seem to be immediately comprehensible
to the user, since it is very close to a natural language sentence. The
monolingual encoding dictionary could solve the problem by using Cobuild’s folk
definition style, which allows the lexicographer to place typical collocates in
the first part of the defining sentence:

lorsqu’une voiture mord sur une partie de la chaussée ou sur le bas-côté,
elle va au-delà de la voie de circulation qui lui est normalement attribuée

Unfortunately, apart from Cobuild, DAFA and, to a lesser extent, CIDE, none
of the available monolingual dictionaries have so far made any use of the above
procedures for representing collocational meaning.
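To make the difference between the two bilingual notations concrete, the following minimal Python sketch renders one stored collocation in both styles. The record layout (`Collocation`) and the function names are assumptions introduced for illustration; only the two output strings are taken from the examples given above.

```python
from dataclasses import dataclass


@dataclass
class Collocation:
    """Hypothetical record for a subject + verb + complement collocation."""
    subject: str            # typical subject, e.g. "voiture"
    verb_infinitive: str    # citation form, e.g. "mordre sur"
    verb_finite: str        # third-person form, e.g. "mord sur"
    objects: list[str]      # typical complements
    article: str = "une"    # article used in the sentence-like rendering
    placeholder: str = "qc"  # generalised object label


def pgf_style(c: Collocation) -> str:
    """Sentence-like rendering with typical complements in brackets."""
    complements = ", ".join(c.objects)
    return f"{c.article} {c.subject} {c.verb_finite} {c.placeholder} ({complements}, etc.)"


def cr_style(c: Collocation) -> str:
    """Bracketed rendering: [subject] infinitive [first typical complement]."""
    return f"[{c.subject}] {c.verb_infinitive} [{c.objects[0]}]"


MORDRE = Collocation("voiture", "mordre sur", "mord sur",
                     ["accotement", "ligne médiane"])
print(pgf_style(MORDRE))  # une voiture mord sur qc (accotement, ligne médiane, etc.)
print(cr_style(MORDRE))   # [voiture] mordre sur [accotement]
```

The sketch also shows why the first notation reads more naturally: it needs a finite verb form and an article, i.e. slightly more stored information, but yields something close to a sentence.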
One deficiency of the semasiological encoding dictionary which even Cobuild
has been unable to remedy is the impossibility of representing synonymy
between collocations in a space-saving and user-friendly manner. Let us
consider the following example of a collocation of type 3 and its possible
representation in a semasiological dictionary:
money/funds/a sum + leave + account/bank/fund/country

If we were to record this semantic-pragmatic collocation ([money] +
leave + [place where money is stored]) with a view to enabling the user to
comprehend and encode it in its entirety, we would have to make a minimum of
three entries (at money, funds and sum) and a maximum of eight entries (money,
funds, sum, account, bank, fund, country, leave), not to speak of the amount of
cross-referencing that would be required. Moreover, collocational attraction
between any two of the constituents in this semantic-pragmatic collocation
(e.g. funds + leave + country) may be too weak to show up in a concordance
based on mutual information (Church and Hanks 1990) or log likelihood
(Dunning 1993), thereby not warranting the inclusion of any specific
collocation. Yet the semantic-pragmatic collocation as a whole is clearly
frequent enough and of interest to language learners, especially since other
languages such as German may have slightly different ways of expressing
the same idea (e.g. money leaves an account – Geld geht von einem Konto ab /
[less commonly:] Geld verläßt ein Konto).
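The arithmetic behind the minimum and maximum number of entries can be checked with a short Python sketch. The slot names and helper functions are hypothetical; the fillers are those of the example above, with a sum filed under the headword sum.

```python
from itertools import product

# Slot fillers as (surface form, headword) pairs.
SLOTS = {
    "money": [("money", "money"), ("funds", "funds"), ("a sum", "sum")],
    "verb": [("leave", "leave")],
    "place": [("account", "account"), ("bank", "bank"),
              ("fund", "fund"), ("country", "country")],
}


def surface_variants(slots):
    """All subject + verb + place combinations licensed by the pattern."""
    return [" + ".join(form for form, _ in combo)
            for combo in product(*slots.values())]


def headwords(slots, slot_names=None):
    """Distinct headwords at which the pattern would have to be entered."""
    names = slot_names or list(slots)
    return sorted({hw for name in names for _, hw in slots[name]})


print(len(headwords(SLOTS, ["money"])))  # 3  (minimum: money, funds, sum)
print(len(headwords(SLOTS)))             # 8  (maximum: every constituent)
print(len(surface_variants(SLOTS)))      # 12 (3 x 1 x 4 surface variants)
```

Twelve surface variants thus have to be recoverable from between three and eight alphabetical entries, which gives a rough measure of the cross-referencing burden.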

3.2.3 Examples vs. collocations. Another problem with existing semasiological
dictionaries is that they fail to distinguish between examples and collocations,
i.e. they frequently record holistic units within the exemplificatory material
rather than assigning them entry status and exemplifying them in their turn.
This is not usually a serious problem with traditional two-word collocations
in which the collocate assumes a specific meaning – if we disregard for
the moment the fact that such collocations may still be difficult to locate
for users – but it becomes one in the case of collocations which appear to have
been ‘freely’ put together by the application of general semantic and syntactic
rules. This can be illustrated with two examples, one from an unabridged
monolingual dictionary (GR) and one from a monolingual learner’s dictionary
(CCED).
GR, which offers a sprinkling of ‘extended’ collocations, will serve to
illustrate the haphazard nature of current practice (for further detail, see
Siepmann 2005). Thus, the exemplificatory infinitive clause pour n’en citer
qu’un exemple – a collocation of type 2 common in academic writing – is found
as the second example under sub-entry II.2:

(XIVe). Cas, événement particulier, chose précise qui entre dans
(une catégorie, un genre . . .) et qui sert à confirmer, illustrer, préciser
(un concept). Voici un exemple de sa bêtise. Pour ne (n’en) citer qu’un
(seul) exemple. Aperçu, échantillon, spécimen. Ce cas offre un exemple
typique de telle maladie. ⇒ Type. C’est un bel exemple de présence
d’esprit! Alléguer, apporter des exemples à l’appui d’une assertion, d’une
affirmation. ⇒ Preuve. Exemple concret illustrant une idée abstraite.
Appuyer (cit. 5) d’un exemple. Exemples donnés dans un manuel de physique,
de chimie. Exemple bien, mal choisi. Donnez-moi un exemple de volcan
éteint, de plissement tertiaire. Exemples à l’appui d’un raisonnement,
d’une démonstration. Exemple qui prouve que . . . Il m’a cité l’exemple de
ce chanteur (→ 1. Basse, cit. 7). Puiser ses exemples dans l’histoire
(→ Égoïsme, cit. 1). (GR, s.v. exemple)

The multi-word collocation in question has been entered as an example
sentence followed by a full stop. This implies that the phrase can stand on its
own, thus obscuring its textual function of introducing an example, and
potentially leading at least the foreign-born user astray.
With a collocation such as we (now) turn (now) to the situation is even less
clear. In CCED it appears in the exemplificatory material at sub-entry 12 for
turn and is not explicitly marked as a collocational unit:

We turn now to the British news.

This example sentence may, however, not be very useful to learners, since it
neglects to highlight that we are dealing with a transitional device that can be
employed in both spoken and written English rather than an ad-hoc formation.
The drawbacks of such practice should by now be obvious. For one thing,
neither the native nor the non-native user will be sensitised to the holistic
nature of multi-word units. For another, the non-native user in particular
will find it difficult to find variants of a particular collocation, such as pour ne
donner qu’un exemple or pour prendre un seul exemple in the case of the example
from GR – this is due to the lack of synonymic links in the mediostructure
already touched upon. One reason for the lack of cross-referencing with
regard to synonyms is what may be termed the ‘alphabetical framework
approach to dictionary making’. In the compilation of large-scale dictionaries
one commonly starts by drawing up an alphabetical list, or ‘framework’
of the major sense divisions before assigning one small section of the
alphabetical list to the individual lexicographer, who will identify and enter
collocations of individual lexemes without much regard to the findings of his
or her colleagues.
As can also be inferred from the above examples, another serious
disadvantage of current practice is that common collocations tend to be
submerged amid a welter of detail. Thus, in GR, it takes a considerable amount
of searching to locate the concessive discourse marker il faut bien reconnaître
que within one of the sub-entries for reconnaître. The specific pragmatic
function of the marker is not made explicit; rather, it must be inferred from
the general definition given under sense division 4 of reconnaître or from its
synonymy with the evidence marker il faut se rendre à l’évidence, to which the
reader is cross-referred.

4. (XIVe). Admettre pour vrai après avoir nié, ou après avoir douté,
accepter malgré des réticences. ⇒ Admettre, avérer, déclarer . . . On a fini
par reconnaître son innocence. ⇒ Croire (à); → aussi Rendre hommage*
à . . . On est forcé de reconnaître des divergences (cit. 1) entre certains
textes . . . Maintes fois, il le reconnaît lui-même, il manquait de bon sens
(→ Grain, cit. 26). Reconnaître la supériorité de qqn. ⇒ Céder (3.: le
céder à); proclamer . . . Amener qqn à reconnaître. ⇒ Convaincre.

Reconnaître que. ⇒ Admettre, avouer, convenir (de); → Boiteux, cit. 7;
démarche, cit. 4; Dieu, cit. 47; malheur, cit. 39; oracle, cit. 4. Ils ont tous
reconnu qu’il a fait ce qu’il a pu. ⇒ Tomber (d’accord). Vous n’hésiterez
(cit. 14) pas à reconnaître que. . . Je reconnais que . . . ⇒ Accorder; entendre
(j’entends bien). - Quoi qu’on dise, on doit reconnaître que . . . (→ Canaille,
cit. 12). Force (cit. 58) lui était de reconnaître que . . . (→ Exciter, cit. 32).
Il faut bien, on doit reconnaître que . . . ⇒ Évidence (se rendre à l’évidence);
→ Mélodique, cit. 1.

Turning now to colligational patterns, we find that quite a number of these
have found their way into the dictionaries, but that they are usually treated by
way of lexical exemplification. Here are a few examples from PR:

un mécanicien en herbe (PR; underlying colligation: NP [‘vocation’] + en herbe)
de la graine de voyou (PR; underlying colligation: de la graine de + NP)
être musicien dans l’âme (PR; underlying colligation: NP + dans l’âme)

Note that such treatment is doubly limiting. For one thing, it conceals the
generativity of the patterns as well as the limits of such generativity; for
another, it omits to signal typical textual embeddings. Thus, a colligational
pattern such as NP/ADJ + à ses heures tends to occur as an appositive (often
clause-initial), and this information must be made available to the dictionary
user. Cf. for example:

Poète à ses heures, Guillaume improvisait des vers.
Nicolas, jardinier à ses heures, dispose d’une plantation qui lui fournit la
matière première de ses pétards.

3.2.4 Other deficiencies resulting from a semasiological methodology. Another point to
note (and one I shall expand upon in the section on translation equivalence
below) is that definitions and sense divisions in monolingual dictionaries
as well as translations in bilingual semasiological encoding dictionaries often
leave something to be desired. Again, this is primarily because bilingual
lexicographers who work on single letters or words often lack contextual,
or more accurately, subject-specific information; even if they have such
information in one language, they may still find it difficult to provide natural
textual equivalents because they fail to avail themselves of the time-honoured
strategy used by professional translators of comparing ‘parallel’ texts, i.e. texts
which deal with the same or similar subject matter in different languages.
To compound matters, bilingual dictionaries tend to exhibit an ‘empirical
dependency’ (Kromann 1991: 2714, Hausmann 2002: 16–19) on monolingual
dictionaries in the sense that the aforementioned alphabetical framework
is generally grounded on monolingual dictionaries. As a consequence,
interlingual divergences which could emerge from a contrastive analysis are
not normally taken account of.
A number of studies provide ample evidence of such dependencies.
Hausmann (2002: 16–19) shows that OH was the first dictionary to introduce
the notion of ‘tact’ into its French renderings of the English adjective
insensitive, for the simple reason that its compilers had at their disposal two
new monolingual dictionaries which used ‘tact’ in their definitions and
provided several examples of its use including several typical collocations.
In similar vein, Cummins and Desjardins (2002) demonstrate that there is
insufficient discrimination in a number of bilingual dictionaries between the
various senses of two English-French pairs (population/population and plus ou
moins/more or less) to enable correct encoding. For example, French population
has an affective use not paralleled by its direct English equivalent which is
better rendered by nouns or collocations such as people or the (general) public.
Again, it is reliance on monolingual dictionaries which appears to be the root
cause of such oversights.

Another example can be seen in GW (German-English), which renders
the German compound noun Bildungsangebot by the clumsily literal word
combination educational offer. As a study of parallel texts will reveal, however,
the intended meaning is idiomatically expressed in British English as
educational provision (see also Laffling 1991) or training provision, as the case
may be.
While such shortcomings could be remedied fairly easily by consulting
parallel texts available from corpora or the Internet or by developing
algorithms for the automatic extraction of traditionally-conceived bipartite
verb-noun or noun-adjective collocations (cf. Laffling 1991; Smadja,
McKeown and Hatzivassiloglou 1996; Fontenelle 2003), the situation is less
straightforward with extended collocations of the type far be it from me
to + INF, vieles spricht dafür, dass (see Siepmann 2005), regarde où tu vas or
tout se passe comme si (see Siepmann 2004). These collocations are either
absent from dictionaries or wrongly translated because there are usually no
node words on which either the human lexicographer or extraction software
could base their search for an equivalent (cf. regarde où tu vas = pass auf, wo du
hintrittst).5
Take, for example, the discourse marker far be it from me to . . . , which is
common in academic and journalistic prose. In CG this has been rendered by
es sei mir ferne, zu . . . The German expression is untypical of modern academic
or newspaper style and has a distinctly archaic ring to it. For lack of resources
in which to locate a workable equivalent, the lexicographer must have selected
one from the entry for fern(e) in an outdated monolingual German dictionary.
Greater familiarity with academic and newspaper German or reliance on
parallel texts would have thrown up solutions such as es liegt mir
fern + zu + INF or nichts liegt mir ferner, als + zu + INF.

4. Potential benefits of the onomasiological approach

My contention in this section is that the adoption of an onomasiological,
collocation-based approach is likely to make the dictionary compilation process
more reliable and more efficient, thereby ultimately leading to more reliable
final products. So far commercially available onomasiological dictionaries,
like their semasiological counterparts, have focussed on single words or
traditionally-conceived fixed expressions (e.g. RO, DO, WE) but they will
really come into their own when collocation is taken into account.
The principal reason why the onomasiological approach is superior to the
semasiological is not far to seek: as communicators, we do not start from
lists of individual words which we then go on to combine in a suitable fashion.
It is not ‘atomised single units, but concepts and processes’ (Götze 1999: 11)
that are represented in our brain. The concepts we wish to convey and the
communicative choices we make are normally expressed either by collocations or,
less commonly, by individual words.6 As pointed out above, collocations are
inextricably linked with, and usually restricted to, some particular topic area
and/or situation-type through what may be described as neuronal assemblies,
i.e. the repeated association of lexical units or semantic-pragmatic features with
a situational or syntagmatic context. In the same way, the lexicographer gains
considerable advantage from focussing on collocational choices within a
particular subject area.
Let us now consider the ways in which the onomasiological approach can
resolve the problems noted above for the semasiological approach.

4.1 General lexicographic principles and the onomasiological approach

We may start by looking at a number of lexicographical stringency criteria
proposed by Mel’čuk et al. (1995: 33 ff.). They point out, among other things,
that traditional dictionaries fail to describe semantically related lexemes in
a sufficiently uniform manner (Mel’čuk et al. 1995: 40). As an example they
cite nouns designating nationality. Whereas un Français is defined as ‘une
personne de nationalité française’ in one dictionary, un Chinois has no
definition, etc. Mel’čuk et al. (1995: 40) therefore posit the principle of
uniformity, which states that the articles representing phrasemes belonging to
one semantic field must be as closely similar as possible. It follows that,
although their ‘idealized’ dictionary is alphabetical for reasons of ease of use,
it is ultimately onomasiological since the central concept underpinning it is
the semantic field. Only an onomasiological methodology can guarantee
uniformity of treatment.
Another clear advantage of the onomasiological approach lies in its being
‘explicit’ in the sense that nothing is left to the user’s intuition. As Mel’čuk
et al. (1995: 35–36) point out, a collocation such as magazine féminin cannot be
entered as a mere example because it could theoretically mean either ‘magazine
about women’ or ‘magazine for women’. One wonders, however, whether full
explicitness can ever be achieved when using a monolingual methodology;
as mentioned in Section 2.1 above, many of the finer sense distinctions in
one language (such as the various meanings of French population) only come to
light against the background of another language. Thus, while monolingual
collocational dictionaries such as OC may well record stream of traffic or flow
of traffic, they do not differentiate between the two senses of the collocation
which become apparent when comparison is made with equivalent German
expressions (in German a distinction is made between ‘fließender Verkehr’
into which the road user merges and ‘Verkehrsströme’ or ‘Verkehrsfluten’
visualised as continuous lines of dense traffic).7 Nor do they take note of
triple collocations such as endless stream of traffic, which may, however,
become apparent from a contrastive search for a viable equivalent of the
German compound noun ‘Blechlawine’. See the entry from the projected
English-German bilingual thesaurus in Table 2.

Table 2: stream of traffic and its German equivalents

English | German
stream of traffic / flow of traffic / traffic flow | der Verkehrsstrom / die Verkehrsflut
the steady stream of traffic heading to St Sampsons | die kontinuierliche Verkehrsflut in Richtung St. Sampsons (die sich nach St Sampsons ergießende Blechlawine)
look behind early and move into the stream of traffic when safe | schauen Sie sich frühzeitig um und ordnen Sie sich bei einer günstigen Gelegenheit in den fließenden Verkehr ein
endless stream of traffic / solid line of cars / heavy traffic | die Blechlawine*
there is an endless stream of traffic from the Straße des 17. Juni going past the Brandenburg Gate | von der Straße des 17. Juni rollt eine Blechlawine am Brandenburger Tor vorbei
we go around a bend and there ahead of us is a solid line of cars as far as you can see | wir fahren um eine Kurve und vor uns ergießt sich eine Blechlawine soweit das Auge reicht

Table 3: wait and its French and German equivalents

English | French | German
I couldn’t wait very long | je ne pouvais pas rester en stationnement très longtemps | ich konnte nicht lange halten / ich konnte nicht lange anhalten

To take but one more example, neither the ‘big four’ monolingual learners’
dictionaries8 nor CR recognize the specific sense that wait assumes in the area
of traffic; a bilingual methodology would reveal this sense since it requires
non-literal renditions such as rester en stationnement in French and stehen or halten
in German (see Table 3). This shows that, in a bilingual thesaurus, explicitness
can be achieved quasi automatically by recording all possible variants of
a collocation along with its topic-specific or situation-specific translations,
e.g. magazine féminin / magazine pour femmes = women’s magazine.
Likewise, the principle of internal coherence (Mel’čuk et al. 1995: 36 ff.) can
be readily adhered to in a bilingual thesaurus based on collocations rather than
lexemes (or lexemes and collocations). This principle states that there should
be perfect correspondence between the definition (i.e., in the case of a bilingual
thesaurus, the translation), the syntactic patterns and the lexical patterns
entered by a lexeme or phraseme; the only problem here is the directionality
of translation, which may lead to a larger number of entries in a bilingual
dictionary, as illustrated by the aforementioned collocation stream of traffic.
When used on its own, this collocation can be translated almost literally into
German in the form of the compound nouns Verkehrsstrom or Verkehrsflut.
When modified by the adjective endless, however, it can be rendered more
elegantly by the colloquial compound Blechlawine.

The problems with the definition of lexemes which arise from the inclusion
of such collocations as célibataire endurci do not occur in bilingual dictionaries
and are in fact purely theoretical, since collocations should be considered as
holistic meaning units. As Mel’čuk et al. (1995: 37) rightly conclude, the lexeme
célibataire on its own can never have the meaning ‘homme en âge d’être marié
qui n’a jamais été marié et qui veut rester tel’ although the above collocation
would seem to suggest just that.
Two additional principles proposed by Mel’čuk et al. (1995) are the principle
of ‘exhaustiveness’ and that of ‘compulsory consultation of databases’.
As outlined in Section 2, the fulfilment of these principles can be greatly aided
through using a bilingual or multilingual approach which should proceed in an
iterative cycle:

compilation of subject-specific corpora in at least two languages →
compilation of subject-specific word and collocation lists → analysis of the
contextual embedding of collocations with the help of the Internet →
additions to corpora from Internet sources used in context analysis (etc.)

In summary, it could be said that future lexicography should pursue
a methodology which is diametrically opposed to the framework approach
outlined above. Rather than proceeding from alphabetical lists of individual
lexical units based on monolingual dictionaries, it would be grounded in topic-
specific lists of collocations. The methodology of monolingual dictionary
making would thus also be turned on its head, since monolingual dictionaries
would benefit from the more detailed sense divisions established by bilingual
onomasiological lexicography.

4.2 Other potential benefits

An onomasiological methodology allows us to solve the problem of separating
different meaning units which would normally be allocated to the same article
in a semasiological dictionary. An example of this is the French collocation
donner + exemple, which can be used in three different types of situation with
two different meanings (see Siepmann 2003):

(1) a situation where the speaker/writer wishes to cite another author: Miller
(1995) donne un exemple de . . .
(2) a situation where the speaker/writer introduces an example of his or her
own: pour donner un exemple, je vais vous donner un exemple
(3) a situation where the speaker/writer gives an actual example: l’Arabie
Saoudite donne un exemple d’Etat islamique moderne (= ‘is an example’)

The collocation would thus be given at least three entries in different
sub-sections of an onomasiological dictionary. Similar considerations hold true for
English collocations such as avoid an accident (cf. French empêcher un accident
vs. éviter un accident) or leave the road (cf. German von der Straße abfahren
[intentional] vs. von der Straße abkommen [accidental]). It is the contrastive
background of a foreign language that allows the lexicographer to uncover the
polysemy of such items.9
Another problem noted above was the placement of collocations within
the dictionary; this can be resolved quite elegantly in an onomasiological
dictionary (or a hybrid electronic dictionary) such as the projected English-
French Bilingual Thesaurus (Bilexicon), where topic area and situation type
are the decisive factors in determining place of entry.
Likewise, in an onomasiological dictionary semantically related or synonymic
expressions do not need to be cross-referenced, as they will appear at
the same place in the dictionary. Examples are given in Table 4.

Table 4: Synonymic collocations in an onomasiological dictionary

Synonymic or semantically related collocations | Topic Area: Situation Type
encore nommé / autrement appelé / qu’on appelle aussi | Discourse Markers: Reformulation
don’t say a word / don’t make a sound / be quiet / hush / quiet, please / shut up / wrap up / belt up / put a sock in it | Noise: Telling people to be quiet
Freizeit-N, Gelegenheits-N, Hobby-N | Hobbies: Describing amateurs
when the right moment has come, in due course, at the appropriate juncture, at the appropriate moment, when the time has come | Timing: Right moment
fahren auf / befahren / benutzen / fahren (trans.) (+ Straße) | Driving: Road use

The division of labour among various lexicographers can thus be by topic
area rather than the alphabet. For one thing, this solves the problem of missing
cross-references or missing translations for synonymic items; for another,
it allows an allocation of tasks to lexicographers by areas of real-world
expertise rather than the alphabet. Errors or infelicities such as those discussed
in Section 3 can thus be avoided.
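The arrangement illustrated in Table 4 can be sketched as a small data structure in which the place of entry is a (topic area, situation type) pair. The class and method names below are hypothetical and the sketch makes no claim about how the projected Bilexicon is actually implemented; it merely shows that synonyms need no cross-references once they share a place of entry.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal onomasiological store keyed by (topic area, situation type)."""

    def __init__(self):
        self._by_place = defaultdict(list)  # place -> collocations
        self._index = defaultdict(set)      # collocation -> places

    def add(self, topic, situation, collocations):
        place = (topic, situation)
        for item in collocations:
            self._by_place[place].append(item)
            self._index[item].add(place)

    def synonyms(self, collocation):
        """Collocations sharing a place of entry with `collocation`."""
        return {
            place: [c for c in self._by_place[place] if c != collocation]
            for place in sorted(self._index.get(collocation, ()))
        }


bilex = Thesaurus()
bilex.add("Noise", "Telling people to be quiet",
          ["don't say a word", "don't make a sound", "be quiet", "hush",
           "quiet, please", "shut up", "wrap up", "belt up",
           "put a sock in it"])
bilex.add("Timing", "Right moment",
          ["when the right moment has come", "in due course",
           "at the appropriate juncture", "at the appropriate moment",
           "when the time has come"])

for place, items in bilex.synonyms("hush").items():
    print(place, len(items))  # ('Noise', 'Telling people to be quiet') 8
```

A polysemous collocation such as donner + exemple would simply be added at several places, each with its own translations, and a lexicographer responsible for one topic area would see all of its synonymic items side by side.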
Turning now to the problems involved in adequately representing collocations
(especially of the semantic-pragmatic type), we note that the onomasiological
approach allows us to adapt and further develop PGF style, as already
sketched above. PGF style indicates possible collocates in both subject and
object position; sometimes generalised labels such as s.o. or s.th. are replaced
by more specific labels such as un animal. A few examples from PGF follow:

qn fait un appel du pied à qn jd gibt jdm einen Wink mit dem Zaunpfahl
qn conduit qn/un animal/qc quelque part jd bringt jdn/ein Tier/etw
irgendwohin; (à pied ) jd führt jdn/ein Tier/etw irgendwohin; (en voiture)
jd fährt jdn/ein Tier/etw irgendwohin
jd schlachtet qn tue [o abat] un animal/des animaux
un animal butine ein Tier sammelt Nektar [o Blütenstaub]

This practice can be further refined in onomasiological dictionaries. The
example of Table 5 illustrates the collocations entered by the French verb
butiner; this is a typical case where an individual word in French corresponds to
a collocation in English (for further evidence of interlingual correspondences
across morpho-syntactic levels, see Part I of this article).

Table 5: An entry for butiner

butiner {une abeille, un papillon, une guêpe, . . . butine} | to collect nectar / pollen {a bee, a butterfly, a wasp, . . . collects nectar}
une abeille butine (quelque part: sur les fleurs des artichauts / dans les pissenlits) | a bee gathers / collects / sucks (up) nectar / pollen (from artichoke blossoms / from dandelions); a bee gathers / collects honey11
une abeille butine une plante (pour qqc: pour le nectar) | a bee visits a plant (to collect nectar); collects nectar from a plant; sucks (up) nectar from a plant
une abeille butine le pollen / le nectar / le miel (quelque part) | a bee sucks up nectar / a bee collects pollen (somewhere)

For reasons of space and user convenience, typical subjects of butiner are
shown in the first line of the entry, so that they do not clutter up the following
lines, where the emphasis is on object complementation. In these lines the most
common specific subject abeille is used consistently, where PGF uses a
superordinate term such as animal. In the case of butiner subject and object
complementation could probably be dealt with in the same way for any number
of language pairs. With some verbs, however, the presentation of subject + verb
collocations and object + verb collocations may be determined by the target
language. Consider, for example, the French verb craquer and its German
equivalents in Table 6.
This second example shows that complex colligations of the type qqc craque
de qqc must be illustrated with examples to be comprehensible to the dictionary
user. PGF style can also be adapted to variable idioms. In the example of
Table 7, the core meaning is given as a noun entry, while the sentence entries
illustrate different collocations.

Table 6: An entry for craquer

craquer | knacken / knistern / knarren / krachen / knirschen
une branche / une articulation craque | ein Ast / ein Gelenk knackt
la chaussure / le toit / le fauteuil / le parquet craque | der Schuh / das Dach / der Sessel / das Parkett knarrt
la neige craque | der Schnee knirscht
qqc / qqn craque de qqc {bruits, matériaux de construction, . . .; jointures} | (etwa:) bei j-m knackt es irgendwo / an einem Ort knarrt etw.
il craquait de toutes ses jointures | alle seine Gelenke knackten / bei ihm knackte es in allen Gelenken
la maison craque de bruits de radiateurs et de boiseries | im Haus knackt und knarrt es aus der Heizung und der Holztäfelung

Table 7: An entry for un pavé dans la mare

un pavé dans la mare | eine „Bombe“ (die irgendwo einschlägt) (= überraschende und beunruhigende Nachricht)
c’est un pavé dans la mare | das schlägt ein wie eine Bombe
qqn jette un pavé dans la mare / qqn envoie un pavé dans la mare / qqn lance un pavé dans la mare | j-m sorgt für Aufregung / j-m erregt die Gemüter / j-m wirbelt einigen Staub auf / j-m sorgt für Wirbel / j-m läßt die Wellen der Aufregung hoch schlagen

In onomasiological dictionaries, additional economy of treatment may be
achieved by presenting collocations common to a particular semantic field at the
entry for the generic lexeme of the field, a suggestion that has already been
implemented by Mel’čuk and Wanner (1996: 233ff.) for the field of German
nouns denoting emotion. However, Mel’čuk and Wanner also draw attention to
the limitations of such an approach, given that even closely related nouns do not
share all their collocates (cf. Part I on the arbitrariness of collocation). For ease
of use and memorisation, it may in any case be preferable to give the entire set of
collocations for each concept or lexeme at the entry for that concept or lexeme.



5. Coverage

This section is meant to illustrate by example how the onomasiological
approach can close some of the gaps found in current encoding dictionaries.
It will be seen that even the best collocational dictionaries are far from covering
anything like the entire range of collocation described in Part I of this article.
The section is divided into three parts. The first deals with breadth of coverage,
the second with depth, while the third offers suggestions for improvement.

5.1 Breadth of coverage

Within the Bilexicon project, a detailed trilingual investigation was conducted
into general-language items peculiar to one area of the vocabulary familiar to
most native speakers, namely road traffic. It was found that, while offering
a fair number of collocations in this area, OC misses out some very common
ones, such as

an empty parking space, a tight parking spot, a traffic jam clears, double
bend, avoid a traffic jam, the motorway (road) links (Paris) with
(Bordeaux), close a motorway, come off the motorway, open a (new)
motorway, motorway journeys, a clear motorway, a valid driving licence,
take one’s driving test, nothing coming (etc.)

Table 8: Coverage of motorway in OC and in an ideal dictionary

Published dictionaries:
N + ADJ: busy, four-lane (etc.), orbital, urban
N + V: join, leave, turn off, build
N + N: driving, traffic, network, system, bridge, junction, service area, service station, crash, pile-up
N + Prep.: along the motorway, down the motorway, off the motorway, onto the motorway, on the motorway, up the motorway, motorway from, motorway to

Additional collocations from trilingual analysis:
N + ADJ: big, large, major (→ Fr. grande autoroute); clear (→ G. frei); clogged; congested; controlled; deserted; elevated; empty; toll-free (→ G. gebührenfrei, mautfrei)
N + V: block, come off, cruise, get onto, go onto, go on, turn off, get off, pull off, open, reopen
N + motorway: toll (→ Fr. à péage, G. gebührenpflichtig, mautpflichtig)
motorway + N: access, bridge, company (→ Fr. société d’autoroute), intersection, journey (→ G. Autobahnfahrt), lay-by, madness, maintenance, miles, project (→ Fr. projet d’autoroute), trip
N + Prep.: (be) beside the motorway (→ F. border l’autoroute)
triples: electronic motorway tolls (elektronische Mauterhebung), on a clear motorway, on clear motorway (→ G. auf freier (Auto-)Bahn, auf einer freien Autobahn), excellent motorway access, turn a trunk road into a motorway (enlarge a trunk road into a motorway) (→ G. eine Bundesstraße zur Autobahn ausbauen), widen a motorway to four lanes (→ G. vierspurig ausbauen), to do a lot of motorway driving, the motorway links A with B (→ F. relie A à B)

Table 8 compares the results for the English noun motorway with the
list of ‘motorway’ collocations given in OC. The comparison shows that a
large number of collocations which an active user (i.e. a translator or language
learner) might need have been missed out. Numerically best represented in
this example as well as in traditional dictionaries generally are noun + noun,
adjective + noun and noun + verb collocations. Equally well covered in
traditional dictionaries are fully fixed expressions such as proverbs or idioms.
Among the collocations of type 2, three-item collocations or ‘triples’
(Hausmann 2003) are patchily covered, probably because both monolingual
and collocational dictionaries such as OC exclude many common compound
nouns from their alphabetical framework. Thus, OC records parking as
a participial noun, but does not accord entry status to parking space, and so
misses out common triples such as empty parking space or look for a parking
space. It might be argued that empty parking space is not a collocation at all
but a free combination; this line of reasoning is contradicted by the fact that
the equivalent German collocation is freier Parkplatz (as opposed to leerer
Parkplatz, which corresponds to a deserted / empty car park; see Part I of this
article). This underscores again the importance of an onomasiological
approach, which does not pre-empt decisions on what to include on the basis
of a restricted starting list. To take another example, while all unabridged
French dictionaries enter the expressions c’est-à-dire and en l’occurrence, none
of them mentions the frequent co-occurrence of the two.
This brings us to one of the most severely neglected subsets of collocations,
which have been termed ‘second-level discourse markers’ (Siepmann 2005).
Second-level discourse markers are fixed expressions, restricted collocations
or colligational patterns usually composed of two or more printed words;
typical examples are it is argued that, the same goes for, strictly speaking, force
est de + INF, d’après ce qui précède or with this in mind. Although ubiquitous
in both academic and journalistic language, they have so far been paid scant
attention in lexicography. In PR, for example, there is no mention at all
of the various collocations based on the colligation force est de + INF
(force est de constater / reconnaître / ajouter / . . .). As in the case of c’est-à-dire
en l’occurrence, these collocations in turn form their own collocations, which,
unsurprisingly, also go unrecorded in current semasiological dictionaries.
Some examples:

with this in mind + let us turn to + NP
turning to NP + we find/note + that-clause
not + clause + any more than + clause

Patchy coverage is also given to conversational formulae of the type don’t
make a sound, do you hear me, I couldn’t agree more, look at the time. While
these four examples can all be located in CG or CR, those given in Table 9
are absent from at least one of the two.

Table 9: Conversational formulae

English | French | German
there’s no discussion | il n’y a rien à discuter | da gibt es nichts zu diskutieren
I wouldn’t wish it on anyone | c’est quelque chose que je ne souhaiterais pas à mon pire ennemi | das würde ich niemandem wünschen (wollen) / das würde ich nicht einmal meinem ärgsten Feind wünschen
just being friendly | j’ai seulement voulu être (me montrer) aimable avec (pour) toi/vous | ich meine es ja nur gut
this isn’t really about + NP | (pour toi) il ne s’agit pas de + INF / + NP | Dir geht es ja gar nicht um + NP
and Bob’s your uncle | et le tour est joué / et voilà le travail | und fertig ist die Laube
I wouldn’t kick him/her out of the bed. | Je ne coucherais pas dans le porte-savon. | Ich würde ihn/sie nicht von der Bettkante stoßen.

5.2 Depth of coverage

Turning to depth of coverage, we find that three areas in particular are in need
of improvement, viz. a) triples, b) collocational synonymy and c) complementation
patterns or semantic-pragmatic collocations. The deficiencies found in each
of these areas will now simply be illustrated with a few examples from the
investigation into motoring vocabulary. The investigation revealed that triples
have been severely underestimated by theoreticians of collocations. Again,
the sheer size of the class, not all of whose members have been reproduced
here, indicates the superiority of an onomasiological, multilingual approach.
Where triples can be used alongside two-item collocations the triples have been
underlined (see Table 10).
Similar observations can be made for colligational patterns. The items in
Table 11 are just a small sample of those which have not been given their fair
share of attention in current dictionaries. Detailed cross-linguistic investigation
also threw up evidence of a general difference in patterning between English
and French which could never have been detected in a monolingual
investigation: in English two prepositions are often used in sequence to
describe movement, whereas French must resort to two clauses and two
different verbs to express the same idea (see Table 12). Finally, it may not be
amiss to illustrate (see Table 13) how the onomasiological approach can reveal
that synonymy, whether perfect or approximate, is not at all rare in natural
languages at the level of complex signs (i.e. collocations).
Table 10: Examples of common triples not found in other dictionaries
(English-German)

a busy road / a busy street; a much used road | eine stark befahrene Straße / eine viel befahrene Straße / eine verkehrsreiche Straße
on the open road; on clear roads / on clear motorways (etc.) | auf freier Strecke; auf offener Straße
outside lane hogging / blocking the fast lane / sitting in the outside lane | das Blockieren der Überholspur
winter road clearance | der Winterdienst
s.o. changes into first gear / goes into first gear / engages first gear / puts the car into first gear / gets the car into first gear | j-m legt den ersten Gang ein
a good driving road | eine Straße, auf der es sich gut fährt
s.o. goes along a path / a road | j-m fährt (auf) einem Weg / einer Straße
the cab went along the coast road | das Taxi fuhr über die Küstenstraße (fuhr die Küstenstraße entlang)
s.o. uses a road as a rat-run | j-m nutzt eine Straße als einen Schleichweg
s.o. gets into the correct lane / s.o. selects the correct lane / s.o. moves into the correct lane | j-m ordnet sich ein

5.3 Improving coverage

How can coverage be improved in future? Since OC was based on a large
general corpus (the BNC), this question is intimately linked to another, namely
whether any corpus can ‘approach the collective linguistic experience of
a language community’ (Howarth 1996: 72). Clearly, the answer still has to be
in the negative at the moment of writing, especially since most of today’s major
corpora are narrowly synchronic, comprising only the last fifteen years or so.
Yet in future very large corpora may well be built which will reflect the
knowledge and experience of language accumulated over several generations.
Everything stands or falls by the size and diversity of the corpora consulted,
so that it would obviously be wrong at the present time to infer the non-
existence of a collocation from its absence from a corpus.
Table 11: Examples of common colligational patterns not found in other
dictionaries (English-German)

a car comes (+ verb of motion + ing) | ein Auto kommt (+ Bewegungsverb + Partizip Perfekt)
another car came careering around the corner | noch ein Wagen kam um die Ecke gerast
a road has a . . . mph speed limit | auf einer Straße ist die Geschwindigkeit auf . . . km/h begrenzt / auf einer Straße gilt eine Geschwindigkeitsbegrenzung von . . . km/h
there is a car somewhere | ein Auto fährt irgendwo
there was hardly a car on the streets | es fuhr kaum ein Auto
shall we go the [place name] way? | sollen wir über [Ortsname] fahren?
a road takes s.o. somewhere / a road takes s.o. [distance] somewhere (through / past / to / into / across s.th.) | eine Straße führt (j-mden) irgendwo hin / eine Straße geht irgendwo hin / über eine Straße erreicht man [(nach) Distanz] [Ort]
a gust of wind / a bend (etc.) forces a car / s.o. (somewhere: off the road, into the crash barrier, into the path of another vehicle, etc.); . . . forces a car to swerve (somewhere); causes a car to swerve; {wind, force of the impact} pushes a car somewhere | eine Windböe (usw.) drängt j-mden / ein Fahrzeug (irgendwohin) ab; der Wind drückt ein Fahrzeug aus der Fahrtrichtung; der Wind drückt ein Fahrzeug zur Seite; in einer Kurve wird ein Fahrzeug abgedrängt

Table 12: Cross-linguistic difference in verb patterning

English | French
the car swerved (1) across the road and (2) into the ditch | la voiture (1) a traversé la route et (2) a fini dans le fossé
the car veered (1) off the side of the road and (2) several yards down an embankment | la voiture (1) s’est déportée sur le côté de la route et (2) a dévalé à plusieurs mètres en contrebas

Table 13: Collocational synonymy in an onomasiological dictionary

English | German
driving standards / driving practice / driving behaviour / road manners | das Fahrverhalten / das Verhalten im Straßenverkehr
s.o. sticks to the speed limit / s.o. keeps to the speed limit / s.o. observes the speed limit | j-m hält sich an die Geschwindigkeitsbegrenzung / j-m beachtet die Geschwindigkeitsbegrenzung
a car turns over three times / rolls three times / somersaults three times / overturns three times | j-m / ein Fahrzeug überschlägt sich dreimal
s.o. / a car is stopped by the police (*s.o. is pulled by the cops) | j-m / ein Wagen wird von der Polizei angehalten (*wird von den Bullen gestoppt)
a car / a trailer swerves / goes out of control / wipes out / veers off its path | j-m / ein Wagen bricht aus; j-m gerät aus der Spur; j-m kommt von der Fahrtrichtung ab; j-m gerät ins Trudeln
a car gets trapped under another / a car is jammed under another / a car is left wedged under another / a car is left embedded under another | ein Fahrzeug verkeilt sich in einem anderen / ein Fahrzeug ist eingekeilt unter einem anderen

As already pointed out, one way to overcome the limitations of exclusive
reliance on a large general corpus is by using sizeable subject-specific
comparable corpora (this is the old principle of overall frequency vs. range first
applied by Thorndike 1921); in addition, all such corpora should be compiled
for several languages. This is exactly the procedure followed in the
aforementioned investigation of road traffic vocabulary, which used a specialist
trilingual corpus of around 200 million words and three large general corpora
of around 600 million words. Such breadth in corpus selection will usually
enable the lexicographer to fill gaps in the corpora of one language by
translating an item from another language (of course, the translation should
itself be checked against a very large corpus such as the Internet). To give
a simple example, the French collocation heurter de plein fouet is highly
common in newspaper reports on car accidents, but corresponding English
collocations such as hit with full force / at speed are extremely rare in
comparable English corpora.
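The distinction between overall frequency and range invoked above can be made concrete with a small sketch. It is offered only as an illustration: the helper `frequency_and_range`, the sub-corpus names and the sentences are all invented for the purpose and are not drawn from the corpora used in the investigation.

```python
def frequency_and_range(subcorpora, phrase):
    """Return (overall frequency, range) of a phrase.

    frequency: total number of occurrences across all sub-corpora
    range:     number of sub-corpora in which the phrase occurs at least once
    """
    needle = phrase.lower()
    counts = [text.lower().count(needle) for text in subcorpora.values()]
    return sum(counts), sum(1 for c in counts if c > 0)


# Invented toy data (assumption): three tiny "sub-corpora".
SUBCORPORA = {
    "news": "the lorry hit the car at speed . the van hit the wall at speed .",
    "forum": "he came off the motorway too late",
    "manual": "come off the motorway at the next junction . you come off the motorway here",
}

at_speed = frequency_and_range(SUBCORPORA, "at speed")
off_motorway = frequency_and_range(SUBCORPORA, "off the motorway")
```

In this toy data at speed occurs twice but in a single sub-corpus, whereas off the motorway is spread over two; a selection procedure sensitive to range as well as frequency would treat the two items differently.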
Such a procedure is also of great interest to contrastivists, since it enables
them to discover lexical gaps and divergences in colligational or clause patterns
(see above). Thus, the aforementioned study of motoring vocabulary showed
that there is no standard English equivalent for German aus der Kurve getragen
werden or French être déporté dans un virage; however, expressions such
as wipe out on the bend or veer off the road on the bend may fill the bill.
Similarly, monolingual German lexicography might well overlook such
colligational patterns as Geschwindigkeit auf der Autobahn or Straße, auf der
sich gut fahren läßt, whereas combinations such as the compound noun
motorway speed or the adjective-noun collocation a good driving road will be
readily detectable in an English corpus. Of course, such considerations are also
true for the other translation direction (cf. sick note on demand –
Gefälligkeitsattest – certificat de complaisance; accident involving . . . – accident
mettant en cause . . . – Unfall, an dem . . . beteiligt sind ).
Finally, it should be noted that, if the aim is to cover collocation as well as
colligation, then it will be impossible to fully automate the dictionary-making
process in the foreseeable future. The reason for this is that such colligational
patterns as NP/ADJ + dans l’âme / en herbe (etc.) cannot be located in even the
most sophisticated tagged corpora, since the retrieval software will also come
up with such sequences as NP/ADJ + dans la maison / dans la grotte / dans
l’hôtel (etc.). Human intervention will thus remain indispensable.
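The over-generation problem just described can be sketched in a few lines. The example assumes a hypothetical hand-tagged mini-corpus in word/TAG format (the tag labels are illustrative, not those of any real tagset) and a query stated purely over part-of-speech tags; the final filter stands in for the lexicographer's judgement and uses a hand-made list.

```python
import re

# Hypothetical mini-corpus, tagged by hand as word/TAG (illustrative tags only).
TAGGED = [
    "un/DET poète/NOUN dans/PREP l'/DET âme/NOUN",
    "un/DET chat/NOUN dans/PREP la/DET maison/NOUN",
    "un/DET ours/NOUN dans/PREP la/DET grotte/NOUN",
]

# Query for NP/ADJ + dans + determiner + noun, expressed over tags alone.
PATTERN = re.compile(r"(\S+)/(?:NOUN|ADJ) dans/PREP (\S+)/DET (\S+)/NOUN")


def find_candidates(lines):
    """Return every (head, determiner, noun) sequence that fits the tag pattern."""
    hits = []
    for line in lines:
        for match in PATTERN.finditer(line):
            hits.append(match.groups())
    return hits


candidates = find_candidates(TAGGED)

# Assumption: a manually compiled list of tails known to be idiomatic.
KNOWN_IDIOMATIC_TAILS = {("l'", "âme")}
confirmed = [c for c in candidates if (c[1], c[2]) in KNOWN_IDIOMATIC_TAILS]
```

All three lines satisfy the tag pattern, so the query alone cannot separate poète dans l’âme from the free combinations; only the manually supplied list does so here, which is the kind of human intervention the text has in mind.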

6. Collocation types, lemma types and citation forms

As seen above, a useful distinction can be established between four major types
of collocational relationship. However, the distinction cannot be transferred
as such to the dictionary for a number of reasons:

(1) Firstly, there is no one-to-one correspondence between collocation types
and the three traditional lemma types (one-item lemma, multi-item lemma,
morphematic lemma); long-distance collocations do not fall into any of
these three categories; they also cut across the boundary of categories 2
and 3, as do some two-item collocations.
(2) Any dictionary maker who aims at commercial viability and user
friendliness should at least be wary of representing collocations of type 3
by means of general semantic labels such as [uncertainty] + not so. In such
cases it may be wiser to exemplify rather than abstract away from actual
instances. For maximum user friendliness, the example should exhibit
prototypical features of the collocation to be recorded (cf. Harras 1989: 611
on entry words; on prototype theory, see Aitchison 1994). In learners’
dictionaries, the definition may help to introduce an element of generality
or abstraction that would be missing in other dictionaries, as witness the
example in Cobuild style (see Figure 1; Siepmann 2005: 318).

so /sou/
(...)
12 You can use not so to say that what you have just stated is untrue although
it may have seemed probable at first sight. This use is particularly common in
written English. Some might think Volkswagen, which now owns 70 per cent of the
Czech company, would have thought the Skoda’s identity problematic. Not so. VW
sees Skoda as one of the most recognised brand names in advertising.
[Extra column: PHR as sentence; PRAGMATICS]
Figure 1: A sample entry for ‘not so’ in Cobuild style

Note the pioneering use of broken underlines to illustrate the presence of
long-distance collocational attraction based on semantic features. The same
typographical presentation could be used in any bilingual dictionary. Since
bilingual dictionaries do not normally contain definitions, at least two examples
of each collocation should be given for the user to form a correct
understanding of its use and to be able to use it productively in a new context.
Accordingly, unabridged dictionaries of the future should contain at least
the three major types of lemmas (‘one-item lemmas’, ‘multi-item lemmas’
and ‘morphematic lemmas’)¹⁰; to this we might add ‘separable lemmas’ as
representations of long-distance collocations and some collocations of type 3
(see Table 14). As seen in Tables 5 and 6, complementation patterns can be
shown using placeholders such as so or sth or typical representatives of the
semantic class which can be inserted into a particular slot, such as abeille
in Table 5.

Table 14: Lemma types

Linguistic category | Lemma type | Example
morpheme | morphematic lemma | un micro-N, ein Hobby-N
lexeme | one-item lemma | une pomme
collocations of type 1, 2 and 3 | multi-item lemma: a) colligational b) collocational | a) N à ses heures b) une pomme de terre, tomber dans les pommes, reconnaître ses torts
long-distance collocations of type 3 | separable lemmas | de même que . . . de même ; turning to . . . we find / note ; it was hoped that . . . not so
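For readers who think in terms of data structures, the lemma types of Table 14 and the use of typical slot fillers might be modelled roughly as follows. This is a sketch under stated assumptions, not a description of any existing dictionary database: the class and field names (`LemmaType`, `Entry`, `typical_fillers`, `has_enough_examples`) are hypothetical, and the sample data are taken from Table 6.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class LemmaType(Enum):
    """The four lemma types listed in Table 14."""
    MORPHEMATIC = "morphematic lemma"
    ONE_ITEM = "one-item lemma"
    MULTI_ITEM = "multi-item lemma"
    SEPARABLE = "separable lemma"


@dataclass
class Entry:
    """One bilingual entry line (hypothetical structure)."""
    citation_form: str                      # pattern with placeholders, e.g. "qqc craque de qqc"
    lemma_type: LemmaType
    equivalents: List[str]                  # target-language renderings
    typical_fillers: Dict[str, List[str]] = field(default_factory=dict)
    examples: List[Tuple[str, str]] = field(default_factory=list)  # (source, target) pairs


def has_enough_examples(entry: Entry, minimum: int = 2) -> bool:
    """Check the 'at least two examples' recommendation for bilingual entries."""
    return len(entry.examples) >= minimum


craquer_de = Entry(
    citation_form="qqc / qqn craque de qqc",
    lemma_type=LemmaType.MULTI_ITEM,
    equivalents=["bei j-m knackt es irgendwo", "an einem Ort knarrt etw."],
    typical_fillers={"de qqc": ["bruits", "matériaux de construction", "jointures"]},
    examples=[
        ("il craquait de toutes ses jointures", "alle seine Gelenke knackten"),
        ("la maison craque de bruits de radiateurs et de boiseries",
         "im Haus knackt und knarrt es aus der Heizung und der Holztäfelung"),
    ],
)
```

Keeping placeholders, typical fillers and full examples in separate fields mirrors the point made above: the abstract pattern alone is rarely enough, so each entry line carries its own illustrative instances.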

7. The limits of translatability

Opponents of bilingual dictionaries or vocabulary lists for encoding purposes
have often argued that such learning materials encourage the erroneous
assumption of one-to-one equivalences between items. The argument is
clearly valid if we equate one-word items such as house and maison or
English population and French population, but it falls apart in the case of
monoreferential collocations. As the aforementioned investigation into road
traffic vocabulary in English, French and German has shown, the
overwhelming majority of collocations in this area are not culture-specific and have
direct equivalents in the other languages. Even colloquial idioms, which
might be intuited to be culture-specific, usually have perfect equivalents
(see Table 15). Translational equivalences may exist between any type of
construction, as witness the examples given in Table 16.
Table 15: Translational equivalences

English | French | German
he must have found his licence in a lucky bag / (AE) he must have got his licence from a lucky dip | il a dû avoir son permis dans une pochette surprise | er hat wohl den Führerschein im Lotto gewonnen / er hat wohl seinen Führerschein bei Neckermann gekauft
they’ve got nothing against me | mon dossier est vide | man hat nichts gegen mich in der Hand

Table 16: Translational equivalences between different types of item

English | French | German
A budding N | un N en herbe | ein angehender N / eine angehende N / ein angehendes N
an amateur N | un N à ses heures | ein Freizeit- (N) / ein Gelegenheits- (N)
similarly with NP | il en va semblablement pour NP | Ähnliches gilt für NP
an NP that exceeds expectations | un NP supérieur aux attentes | ein NP, der die Erwartungen übertrifft

There are, however, a few exceptions, which may arise from two types
of causes: 1) real-world constraints 2) language-internal developments
(cf. Siepmann 2003). Examples of type 1 are Trauspruch, which has no
equivalent in French wedding ceremonies, and Reißverschlussverfahren, which
refers to the procedure whereby cars alternately move into another lane when
a lane closure is ahead. It follows that any collocation based on these
nouns, such as Trauspruch + beten or nach dem Reißverschlussverfahren,
has to be rendered by means of a paraphrase (e.g. merge in turn). In such
cases, the lexicographer has no alternative but to record two example sentences
rather than citation forms. The same goes for collocations where one language
uses an implicit form of words which the other tends to make explicit. Thus,
imagine a car parked alongside a fence, so that little space is left between the
passenger door and the fence. The typical question German drivers put to their
passengers in such a situation will go something like this: Soll ich ein Stück
vorsetzen? An English driver might prefer a more explicit wording along
the lines of: Do you want me to move the car / it forward a bit? (alongside Shall I
go forward a bit?)
Exceptions of type 2 occur when the languages under survey do not offer
the same number of collocations for some particular idea. Such difference
has frequently been noted in the area of single-word lexemes: it has long
been known, for example, that English has more verbs of movement than
either French or German. Similar observations can now be made for
collocations. Thus, English resemblance collocates with a wider variety of
adjectives denoting ‘strangeness’ than its French and German counterparts
(cf. Siepmann 2003).
Both types of exceptions require special attention on the part of the
lexicographer. It is particularly dangerous to resort to intuitive translations, as
a number of defective translations from published dictionaries (e.g. weiträumige
Umleitung → *diversion covering a wide area [PGE]) readily attest.
Sometimes such translation errors occur because there are genuine
collocational gaps, but nevertheless the translator wishes to provide a
collocation at all costs. The best strategy to follow in such cases is to study
parallel texts and to offer a suitable paraphrase which should be marked
as such (e.g. by using the tilde).

8. Conclusion

The broadly-based definition of collocation on which this article is based
opens up new perspectives for both monolingual and bilingual lexicography.
Future dictionaries will need to record any type of structurally complex unit,
paying increased attention to collocational frameworks (my NP exactly) and
fixed expressions of regular syntactic composition (I’ve got eyes in my head,
there are good reasons for believing that, I couldn’t agree more, etc.). It has been
shown that bilingual or multilingual onomasiological lexicography is set to lead
the way in this endeavour, since it has obvious advantages over monolingual
and semasiological approaches; bilingual dictionaries should no longer be
based on monolingual dictionaries, but rather the other way round. It has
also emerged that the onomasiological dictionary of the future will constitute
a new kind of dictionary of synonyms to the extent that it will contain
collocational rather than one-word synonyms, along the lines of Schemann’s
(1991) dictionary of German idioms (SR).

Notes
1
Hoey (1998) defines colligation thus: (a) the grammatical company a word keeps
(or avoids keeping) either within its own group or at a higher rank (b) the grammatical
functions that the word’s group prefers (c) the place in a sequence that a word prefers
(or avoids).
2
Note, however, that there is much less non-native material to be found on the
Internet for languages such as French, German or Italian, so that a more reliable picture
of native language use can be built up.
3
Of course, meaning arises through the interaction of mother and child long before
it can be represented linguistically (cf. Nelson 1998, Stern 1998). It is commonly
assumed that babies who are not yet able to speak assign meaning to the different
phases of a proto-narrative sequence. The first meanings acquired in early language
acquisition are therefore of a holistic nature; the words ‘bath’ or ‘bathroom’, for
example, will be associated with the relevant proto-narrative sequence (entering the
room, opening the tap, feeling the warmth of the water, the stinging sensation of soap
in the baby’s eyes, etc.) rather than a room containing a toilet, a shower, a bathtub and
a washbasin. It thus appears that meaning is created by the repeated connection between
feelings and/or lexical units on the one hand and contexts on the other hand.
4
It will be noted that the underlying assumption here is that ‘more is better’. Active
users such as advanced foreign language learners and translators working into a foreign
language require the most detailed and comprehensive information possible. It might
be argued that such users should turn directly to corpora instead, but the advantage of
a good dictionary is that it provides a ready-made account of the significant features
of a lexical item in a clear and memorable way.
5
Another problem attendant upon automatic extraction is the lack of an adequate
corpus base for collocations typical of spoken language.
6
It may be noted in passing that most complete utterances which consist of an
individual word are, in fact, collocational in nature, cf. help!; blood!; bed!; they are
holistic, situation-specific units (cf. González-Rey 2002: 95, 101).
7
I do not wish to suggest that contrastive lexicology and bilingual or multilingual
lexicography can take account of all possible distinctions arising from cross-linguistic
comparison. As Hausmann (1995: 23) notes, such comparison could only be exhaustive
if it is restricted to lexical units with a relative degree of semantic autonomy; Hausmann
argues, for example, that lexical units exhibiting a high degree of context-dependence,
such as the French adjective sauvage, would give rise to an endless multiplication
of potential equivalences. Arbitrary limits must therefore be set on the number of
languages to be compared as well as on equivalences and sense distinctions. The number
of languages will usually be restricted to two, i.e. the language pair treated in the
dictionary, since sense distinctions that are useful to, say, Italians using English are
not relevant to a French-English dictionary. It should also be noted, however, that
Hausmann overstates his case by focussing too much on the language of literature,
where creativity is at a premium. We will soon be able to cover exhaustively the ordinary
patterns, collocations and sense distinctions found in conversation and pragmatic
text types.
8
OALD is the only monolingual dictionary to record a similar sense (‘stop a vehicle
at the side of the road’), which is too specific (cf. waiting at the traffic lights).
9
This does not mean that the question of an item’s polysemy is decided by applying
interlingual criteria; rather, cross-linguistic comparison should be viewed as a useful
heuristic for discovering language-internal polysemy which could theoretically also be
detected through monolingual investigation. It is also worth bearing in mind that
polysemy is an extremely relative notion, and that the spectrum of meanings covered by
a large number of words can give rise to an almost infinite number of context-dependent
sense divisions (cf. footnote 4 above).
10
It may be misleading to speak of ‘multi-word lemmas’, as Steyer (2000) does,
since colligational patterns contain slots filled by particular categories rather than
a specific word.
11
Technically, of course, bees do not collect honey, but the collocation is often
used in everyday language.

References
1. Dictionaries
Atkins, B. T. et al. 1993. Collins Robert French-English English-French Dictionary.
Unabridged. (3rd ed.). Glasgow: HarperCollins. (CR)
Atkins, B. T. et al. 1994. Le Robert & Collins. Vocabulaire anglais et américain. Paris:
Le Robert. (VAEA)
Binon, J. et al. 2000. Dictionnaire d’apprentissage du français des affaires. Paris: Didier.
(DAFA)
Dendien, J. 2004. Trésor de la Langue Française Informatisé. Paris: CNRS. (TLF)
Cop, M. et al. 2001. PONS Großwörterbuch Englisch. Stuttgart: Klett. (PGE)
Corréard, M. (ed.) 1994. Oxford/Hachette French Dictionary. French-English/
English-French. Oxford: Oxford University Press. (OH)
Crowther, J. et al. 2002. Oxford Collocations Dictionary for Students of English. Oxford:
Oxford University Press. (OC)
Chapman, R. L. (ed.) 1996. Roget’s International Thesaurus. Glasgow: HarperCollins.
(RO)
Collins Cobuild English Dictionary for Advanced Learners (3rd ed. 2001). Glasgow:
HarperCollins. (CCED)
Dornseiff, F. and Quasthoff, U. 2004. Der deutsche Wortschatz nach Sachgruppen. Berlin:
De Gruyter. (DO)
Hamblock, D. and Wessels, D. 1999. Großwörterbuch Wirtschaftsenglisch
Deutsch-Englisch/Englisch-Deutsch (5th ed.). Berlin: Cornelsen. (GW)
Knight, L. S. et al. 1999. Collins German-English English-German Dictionary. Unabridged
(4th ed.). Glasgow: HarperCollins. (CG)
McArthur, T. 1981. Longman Lexicon of Contemporary English. London: Longman.
(LLCE)
Quasthoff, U. (ed.) 2003. Franz Dornseiff: Der deutsche Wortschatz nach Sachgruppen
(CD-ROM). (DO)
Procter, P. (ed.) 2001. Cambridge International Dictionary of English on CD-ROM.
Cambridge: Cambridge University Press. (CIDE)
Rey, A. (ed.) 1993. Le nouveau Petit Robert. Paris: Le Robert. (PR)
Rey, A. (ed.) 1985. Le Grand Robert de la langue française sur CD-ROM. Paris:
Le Robert. (GR)
Schnorr, V. et al. 1996. PONS Großwörterbuch Französisch. Stuttgart: Klett. (PGF)
Schemann, H. 1991. Synonymwörterbuch der deutschen Redensarten. Stuttgart:
Klett. (SR)
Walter, E. (ed.) 1994. Cambridge Word Routes. Anglais-Français. Cambridge:
Cambridge University Press. (CW)
Wehrle, H. and Eggers, H. 2001. Deutscher Wortschatz. Stuttgart: Klett. (WE)

2. Other literature
Aitchison, J. 1994. Words in the mind. An Introduction to the Mental Lexicon. Oxford:
Blackwell.
Arnaud, P. J. L. 1992. ‘La connaissance des proverbes français par les locuteurs natifs
et leur sélection didactique.’ Cahiers de Lexicologie 1: 195–238.
Baker, M., Francis, G. and Tognini-Bonelli, E. (eds.) 1993. Text and Technology: In Honour
of John Sinclair. Amsterdam/Philadelphia: Benjamins.
Bally, C. 1909/1951. Traité de Stylistique Française (Vol. 1). Geneva: Librairie
Georg & Cie.
Biber, D. et al. 1999. Longman Grammar of Spoken and Written English. London:
Longman.
Bogaards, P. 1990. ‘Où cherche-t-on dans le dictionnaire?’ International Journal of
Lexicography 3: 79–102.
Bogaards, P. 1991. ‘Word frequency in the Search Strategies of French Dictionary
Users.’ Lexicographica 7: 202–212.
Burger, H. 1989. ‘Phraseologismen im allgemeinen einsprachigen Wörterbuch’
in F. J. Hausmann et al. (eds.), Wörterbücher: Ein internationales
Handbuch zur Lexikographie. Vol. 1 (Handbücher zur Sprach- und
Kommunikationswissenschaft; Vol. 5). Berlin/New York: De Gruyter, 593–599.
Burger, H. 1998. Phraseologie: Eine Einführung am Beispiel des Deutschen. Berlin:
Schmidt.
Church, K. W. and Hanks, P. 1990. ‘Word Association Norms, Mutual Information
and Lexicography.’ Computational Linguistics 1: 22–29.
Council of Europe 2001. Common European Framework of Reference for Languages:
Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
Cowie, A. 1999. English Dictionaries for Foreign Learners. Oxford: Oxford
University Press.
Cummins, S. and Desjardins, I. 2002. ‘A Case Study in Lexical Research for
Translation.’ International Journal of Lexicography 2: 139–156.
de Florio-Hansen, I. 2004. ‘Wortschatzerwerb und Wortschatzlernen von
Fremdsprachenstudierenden. Erste Ergebnisse einer empirischen Untersuchung.’
Fremdsprachen Lehren und Lernen 33: 83–113.
Dunning, T. E. 1993. ‘Accurate Methods for the Statistics of Surprise and Coincidence.’
Computational Linguistics 1: 61–74.
Feilke, H. 1996. Sprache als soziale Gestalt. Frankfurt: Suhrkamp.
Feilke, H. 2003. ‘Kontext – Zeichen – Kompetenz. Wortverbindungen unter
sprachtheoretischem Aspekt’ in K. Steyer (ed.), 41–64.
Firth, J. R. 1957. Papers in Linguistics. London: Oxford University Press.
Fontenelle, T. 2003. ‘Collocations et traitement automatique du langage naturel’
in F. Grossmann and A. Tutin (eds.), 75–88.
Francis, G., Hunston, S. and Manning, E. 1998. Collins Cobuild Grammar Patterns 2:
Nouns and Adjectives. London: HarperCollins.
Gates, E. 1988. ‘The treatment of multi-word lexemes in some current dictionaries of
English’ in M. Snell-Hornby (ed.), ZüriLEX’86 Proceedings. Papers
read at the Euralex International Congress. Tübingen: Francke, 99–106.
Götze, L. 1999. ‘Der Zweitspracherwerb aus der Sicht der Hirnforschung’ in Deutsch
als Fremdsprache 1: 10–16.
González-Rey, I. 2002. La phraséologie du français. Toulouse: Presses Universitaires
du Mirail.
Grossmann, F. and Tutin, A. (eds.) 2003. Les collocations: analyse et traitement. Travaux
et recherches en linguistique appliquée Série E. Amsterdam: De Werelt.
Harras, G. 1989. ‘Zu einer Theorie des lexikographischen Beispiels’ in Hausmann et al.
(eds.), 607–614.
Hartmann, R. R. K. 2001. Teaching and Researching Lexicography. London: Longman.
Hausmann, F. J. et al. (eds.) 1989–1991. Dictionaries: An International Encyclopedia
of Lexicography (3 Vols.). Berlin: Walter de Gruyter.
Hausmann, F. J. 1995. ‘Von der Unmöglichkeit der kontrastiven Lexikologie’ in
H.-P. Kromann and A. L. Kjaer (eds.), Von der Allgegenwart der Lexikologie.
Kontrastive Lexikologie als Vorstufe zur zweisprachigen Lexikographie. Tübingen:
Niemeyer, 19–23.
Hausmann, F. J. 1999. ‘Le dictionnaire de collocations – Critères de son organisation’
in N. Greiner et al. (eds.), Texte und Kontexte in Sprachen und Kulturen. Festschrift für
Jörn Albrecht. Trier: Wissenschaftlicher Verlag Trier, 121–140.
Hausmann, F. J. 2002. ‘La lexicographie bilingue en Europe: peut-on l’améliorer?’ in
La Lessicografia Bilingue tra presente e avvenire, Atti del Convegno Vercelli, 4–5
maggio 2000, a cura di Elena Ferrario e Virginia Pulcini, Vercelli: Mercurio, 11–32.
Hausmann, F. J. 2003. ‘Was sind eigentlich Kollokationen?’ in K. Steyer (ed.), 309–334.
Hausmann, F. J. forthcoming. Der undurchsichtige Wortschatz des Französischen.
Lernwortlisten für Schule und Studium.
Hoey, M. 1998. ‘ ‘‘Introducing Applied Linguistics’’: 25 Years On.’ Plenary Paper in the
31st BAAL Annual Meeting: ‘‘Language and Literacies’’, University of Manchester,
September 1998.
Howarth, P. 1996. Phraseology in English Academic Writing. Some Implications for
Language Learning and Dictionary Making. Tübingen: Niemeyer.
Hunston, S. 2001. ‘Colligation, Lexis, Pattern and Text’ in M. Scott and G. Thompson
(eds.), Patterns of Text. In honour of Michael Hoey. Amsterdam: Benjamins, 13–34.
Jones, S. 2002. Antonymy: A Corpus-based Perspective. London: Routledge.
Kjellmer, G. 1994. A Dictionary of English Collocations. Oxford: Clarendon Press.
Kocourek, R. 1991. La langue française de la technique et de la science. Wiesbaden:
Brandstetter.
Kromann, H.-P. 1991. ‘Principles of Bilingual Lexicography’ in F. J. Hausmann et al. (eds.),
2711–2728.
Laffling, J. 1991. Towards High-Precision Machine Translation. Based on Contrastive
Textology. Berlin: Foris Publications.
Louw, B. 1993. ‘Irony in the text or insincerity in the writer–the diagnostic potential
of semantic prosodies’ in M. Baker, G. Francis and E. Tognini-Bonelli (eds.),
157–176.
Lyne, A. A. 1985. The vocabulary of French business correspondence. Word frequencies,
collocations and problems of lexicographic method. Genève/Paris: Slatkine-Champion.
McArthur, T. 1981. Longman Lexicon of Contemporary English. London: Longman.
McArthur, T. 1986. ‘Thematic Lexicography’ in R. R. K. Hartmann (ed.), The History
of Lexicography. Papers from the Dictionary Research Centre Seminar at Exeter,
March 1986. Amsterdam: Benjamins, 157–166.
McArthur, T. 1998. Living Words: Language, Lexicography and the Knowledge
Revolution. Exeter: University of Exeter Press.
Meißner, F. J. et al. 2001. ‘Zur Ausbildung von Lehrenden moderner Fremdsprachen.
Ergebnisse einer Reflexionstagung zur Lehrerbildung (23./24. März 2000, Schloss
Rauischholzhausen).’ Französisch heute 32: 212–227.
Mel’čuk, I. 1998. ‘Collocations and Lexical Functions’ in A. Cowie (ed.), Phraseology.
Theory, Analysis and Applications. Oxford: Clarendon Press, 23–53.
Mel’čuk, I., Clas, A. and Polguère, A. 1995. Introduction à la lexicologie explicative
et combinatoire. Louvain-la-Neuve: Duculot.
Mel’čuk, I. and Wanner, L. 1996. ‘Lexical Functions and Lexical Inheritance for
Emotion Lexemes in German’ in L. Wanner (ed.), Lexical Functions in Lexicography
and Natural Language Processing. Amsterdam: Benjamins, 207–277.
Nelson, K. 1998. Language in Cognitive Development. The Emergence of the Mediated
Mind. Cambridge: Cambridge University Press.
Petermann, J. 1983. ‘Zur Erstellung ein- und zweisprachiger phraseologischer
Wörterbücher: Prinzipien der formalen Gestaltung und der Einordnung von
Phrasemen’ in J. Matesic (ed.), Phraseologie und ihre Aufgaben. Beiträge zum 1.
Internationalen Phraseologie-Symposium vom 12. bis 14. Oktober 1981 in Mannheim.
Heidelberg: Groos, 172–191.
Rapp, R. and Wettler, M. 1992. ‘Wie mit Hilfe des Assoziationsgesetzes freie
Wortverbindungen vorhergesagt werden können.’ Tagungsband der 34. Tagung
experimentell arbeitender Psychologen, Osnabrück, 401.
Rapp, R. 1995. Die Berechnung von Assoziationen. Hildesheim: Olms.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University
Press.
Siepmann, D. 2003. ‘Eigenschaften und Formen lexikalischer Kollokationen: Wider ein
zu enges Verständnis’ Zeitschrift für französische Sprache und Literatur 1: 260–283.
Siepmann, D. 2004. ‘Linguistische und didaktische Aspekte der Übersetzung
von Mehrwortgliederungssignalen am Beispiel der Suggestoren’ in B. Kovtyk and
G. Wendt (eds.), Ausbildung von Übersetzern im neuen geeinten Europa 2004–linguistische,
didaktische und psychologische Aspekte. Berlin: Logos, 123–142.
Siepmann, D. 2005. Discourse Markers across Languages. A contrastive study of
second-level discourse markers in native and non-native text. New York: Routledge.
Siepmann, D. (in preparation). Thematic Learner Lexicography. Linguistic and
User-Related Aspects.
Smadja, F., McKeown, K. R. and Hatzivassiloglou, V. 1996. ‘Translating collocations
for bilingual lexicons: A statistical approach.’ Computational Linguistics 1: 1–38.
Stern, D. 1998. Die Mutterschaftskonstellation. Eine vergleichende Darstellung
verschiedener Formen der Mutter-Kind-Psychotherapie. Stuttgart: Klett-Cotta.
Steyer, K. 2000. ‘Usuelle Wortverbindungen des Deutschen. Linguistisches Konzept
und lexikografische Möglichkeiten.’ Deutsche Sprache 2: 101–125.
Steyer, K. (ed.) 2003. Wortverbindungen–mehr oder weniger fest. (Jahrbuch des Instituts
für deutsche Sprache.) Berlin: De Gruyter.
Stubbs, M. 1995. ‘Corpus evidence for norms of lexical collocation’ in G. Cook and
B. Seidlhofer (eds.), Principle and Practice in Applied Linguistics: Studies in
Honour of H. G. Widdowson. Oxford: Oxford University Press, 245–256.
Thorndike, E. L. 1921. The Teacher’s Word Book. New York: Columbia University.
