Words and Their Neighbours

Oxford Handbooks Online

Michael Hoey
The Oxford Handbook of the Word
Edited by John R Taylor
Print Publication Date: Jun 2015
Subject: Linguistics, Morphology and Syntax
Online Publication Date: Aug 2014
DOI: 10.1093/oxfordhb/9780199641604.013.39
Abstract and Keywords
Words relate to their neighbours in a variety of ways. When we construct an utterance, or
interpret the utterances of others, we subconsciously note the collocations that a word on
its own makes with other words, as well as the collocations that the combination has with
other words or word combinations. We also note the syntactic environments of words and
word combinations (i.e. their colligations) and their association with words of particular
semantic sets. Parts of words also participate in these kinds of relation. Thus, a word can
be said to prime the contexts of its previous uses. These primings influence the way we
interpret a word in context as well as our future uses of the word; moreover, collocation
contributes to textual cohesion. The relation of a word to its neighbours thus lies at the
very core of a language and its use.
Keywords: collocation, colligation, semantic set, priming, cohesion
PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). (c) Oxford University Press, 2015. All Rights
Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in
Oxford Handbooks Online for personal use (for details see Privacy Policy).

Traditional theories of language used to separate grammatical descriptions from the
descriptions of the lexicon, regarding the former as complex and the latter as essentially
simple (albeit cumbersome). Such was true, for example, of the earliest versions of
transformational-generative grammar. A consequence of this divide was not, as might
appear from a superficial inspection of the literature, the neglect of lexis; the 20th
century was a golden era for lexical study, from the completion of the Oxford English
Dictionary in the first quarter of the century to the appearance in the last quarter of the
corpus-driven Collins COBUILD Advanced Learners’ Dictionary and its successors. But
this way of describing lexis inevitably meant that for much of that century discoveries
about vocabulary were presented in essentially granular, list-like fashion.
Amongst descriptive and theoretical linguists it is probably no longer necessary to argue
the case for a more integrated theory, but the separation of grammar and lexis is still
widespread in language-teaching programmes, and most of the traditional terms used to
describe the ways the lexicon is organized assume a particularity and separateness to our
vocabulary. Antonymy, co-hyponymy, and meronymy are all paradigmatic relationships: it
is hot as opposed to cold, it is a spaniel rather than a Chihuahua, it is a toe not a foot.
Hyponymy and synonymy also draw on paradigms, though with awareness of context. It is
sweltering, it is scorching; it is a dog, in fact it is a spaniel. Corpus linguistic work has
overturned that opposition, showing that the evidence points to a lexicon that is greatly
more complex than previously allowed for and, in the view of some at least (Hoey 2005;
Hoey et al. 2007: ch. 2; Hunston and Francis 2000), a grammar that is simpler or at the
very least less coherent. Above all else, the findings of corpus linguistics show that we
have viewed words too long as particles and as belonging to fields, and it is time to look
at them as belonging also to waves (Halliday 1982; Pike 1959); this wave-like function is
particularly focused on in Sinclair’s (2004) discussion of the lexical item. This chapter
seeks to show some of the features of a word’s context that contribute to the waves that
words are part of and to which they contribute.
A sentence such as I must have dozed straight off (uttered by my wife, 16.6.13, but hardly
original) illustrates the problems of treating lexis as uncontextualized. The off
may hint at a light switch but it is not in opposition to on. Dozed might be replaced by
drifted (though not by slept), but if drifted is used, straight ceases to be a natural option
as a modifier of off. Straight here is used grammatically in a manner similar to the usage
in straight ahead and straight up but has no close meaning association with either. Finally,
the modal choice must have has no parallel in, say, I must have just woken up. The lexical
choices in the sentence are much more readily explicable in contextualized terms, that is,
in terms of the effect they have on each other.
To show how necessary it is to look at a word’s neighbours, and to help identify the
different ways in which a word’s neighbours may influence our understanding of any
particular word, I want to start with a handful of sentences that ought not to be
intelligible together because they contain a significant lexical ‘mistake’, and yet are in
fact so immediately intelligible that the mis-selection is characteristically overlooked. The
text of which they are a part was found in the travel supplement for a Sunday newspaper;
the sentences in question are as follows:
In the village of Chilling there were more lessons, this time in metalwork. From
out of this tiny mud-hut hamlet comes the most beautiful beaten bronze, copper
and silver, found cladding traditional kitchen stoves across Ladakh. Smoke from
the crude forgeries rose over the village as I picked my way carefully down the
mountain between twisted trunks of willow trees. (The Independent on Sunday, 2
November 2008, p. 69)
The mis-selection is the choice of forgeries (= fraudulent copies) rather than forges (=
places for working on molten metal) in the third sentence, and the question I want to
address as a way of looking at how words relate to their neighbours is why the choice of
forgeries was not detected by the author, the editor, or any of the readers to whom I have
shown the passage. (I only noticed it because I was reading the text aloud.) In the process
of answering this question, I will sketch out some of the ways in which a word’s
neighbours affect (or are affected by) the choice and interpretation of that word.
The first and most obvious feature of the neighbourhood of any particular word (or, as we
shall see, any phrase or part of a word) is that of collocation. The term can be traced at
least as far back as Dr Samuel Johnson’s friend and companion, Sir William Jones (to
whom is usually attributed the notion of language families); his use of the term is cited in
Webster’s New International Dictionary, 1928 edition. The concept was reintroduced into
linguistic theory by Firth (1957), but actually the credit for discovering that collocation is
a ubiquitous and fundamental characteristic of the lexicon properly belongs to Sinclair
and his COBUILD team, much in the way that the Higgs boson was postulated by
Higgs but its discovery was the work of the combined research forces of CERN.
Collocation may be defined both statistically and psychologically. A statistical definition
might be that a collocation is the co-occurrence of two words within a defined close
proximity where the frequency of the co-occurrence is demonstrably greater than can be
explained in terms of random juxtaposition. A psychological definition would draw on
experimentation with word associations, and would note that the language user
more readily associates one of the words with the neighbour(s) in question than with any
randomly selected item. An example of collocation (defined either way) is indeed that of
crude with forgeries, as inappropriately illustrated in the passage above. This is the
strongest lexical collocation of forgeries in the Guardian corpus,1 accounting for 4 per
cent of the 99 occurrences of forgeries. Since, though, these data are sparse, I undertook
a Google search on crude forgeries which resulted (on 28.6.13) in approximately 18,900
hits, corresponding to marginally over 1 per cent of the hits for forgeries (1,790,000). The
same picture applied to crude forgery, where a Google search resulted in 40,500 hits,
corresponding to slightly under 0.5 per cent of hits for forgery. These data suggest that
the collocation with crude is robust for both forgeries and forgery.
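
The statistical definition can be made concrete with a small sketch. The Python below counts how often a candidate collocate falls within a fixed span of a node word and compares that count with the number that random juxtaposition alone would predict. The function name, the toy sentence, and the simple observed/expected comparison are assumptions made for illustration; this is not the procedure behind the Guardian or Google figures quoted above.

```python
from collections import Counter

def observed_expected(tokens, node, collocate, span=5):
    """Return (observed, expected) co-occurrences of `collocate` within
    `span` tokens on either side of each occurrence of `node`."""
    freq = Counter(tokens)
    observed = 0
    window_slots = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Tokens to the left and right of the node, excluding the node itself.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        window_slots += len(window)
        observed += window.count(collocate)
    # Under random juxtaposition, each slot holds the collocate with
    # probability freq[collocate] / len(tokens).
    expected = window_slots * freq[collocate] / len(tokens)
    return observed, expected

# Toy text (hypothetical, far too small for real inference).
TOKENS = ("police said the crude forgeries fooled nobody but experts found "
          "more crude forgeries in the archive of the old museum").split()

obs, exp = observed_expected(TOKENS, "forgeries", "crude", span=1)
print(f"observed={obs}, expected={exp:.2f}")
```

An observed count well above the expected one is what the statistical definition asks for. On real data a significance measure would replace the bare comparison, and the span would normally be the conventional five words rather than the one word used for this toy text.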
A word’s relationships with its neighbours also include its colligations, i.e. the
grammatical patterns it participates in or the grammatical relationships it forms (Sinclair
2004; Hoey 2005). In the case of crude and forgeries, the relationship is dominantly that
of [adj + N], rather than (for example) [the N BE adj]. In the first 100 hits from a Google
search for crude and forger*, there were 72 instances of crude forger* and 28 instances of
all other combinations.
On the other hand, crude does not collocate with forge. Instead, an example of collocation
with the word forge is blacksmith’s. There were 29 instances of the nominal use of forge
in the Guardian corpus (eliminating addresses, places, and company names, where there
was no reference to the function of a forge), and four of these occur with blacksmith’s, in
the phrase blacksmith’s forge. The fact that we have only 29 instances of forge in five
years of news tells us that manual crafts such as forging are rarely newsworthy. If,
however, one Googles the words blacksmith’s and forge, one gets (approximately) 304,000
hits, suggesting again that the words are a robust collocation rather than an accident of
the sparse data from the Guardian.
The first important observation about these facts, which is generally true for collocation,
is that each collocation holds only for the grammatical form quoted; blacksmith’s does not
collocate with forges (at least, in the phrase blacksmith’s forges). Fraud collocates with
forgery in my Guardian corpus but not with forgeries. Renouf (1986) and Sinclair (1991)
have both noted that collocation is a property of the word form, not the lemma. So the
collocational contexts of each grammatical form of a word need to be separately
described.
The second important observation is that the neighbours of a word may strongly affect
the sense of the word in question. Thus the strongest collocations of crude in my Guardian
data are with oil and price, where crude means ‘unprocessed’ in the first case and in the
second functions as a noun, denoting ‘unprocessed oil’. The third strongest collocation of
crude is attempt as in a crude attempt to or a crude attempt at, where crude means
‘unsubtle’. Its sense in crude forgeries is closely related to this last sense but also picks
up the idea of ‘poorly crafted’. Thus a word’s relationships with its different neighbours
result in different senses or shades of meaning for the word.
The third observation is that slight differences in the way the same two words
collocate may connote significant communicational differences. In other words, not only
do a word’s different neighbours affect the word’s likely sense but the different ways in
which the same neighbour is used may do the same. Thus in the first 100 Google hits for
the search on blacksmith’s and forge, we have 37 instances of blacksmith’s forge
(occasionally separated by a modifier or classifier), 15 instances of blacksmith forge
(with, again, an occasional intervening pre-modifier), and 48 instances of blacksmith +
forge where the words occur in the same immediate text but not as part of a single
nominal group. The first of these combinations behaves differently from the others in that
20 of the 37 instances (54 per cent) make reference to history, with the remainder
referring either to the craft of forging metal (14 instances) or to fantasy games of the
sword and sorcery kind (3). This contrasts markedly with the behaviour of the other two
types of combination, where historical reference is the exception, accounting for just
three (20 per cent) of the combination blacksmith forge and two (4 per cent) of
blacksmith + forge. By contrast, 11 of the 15 instances (73 per cent) of the second
combination (blacksmith forge) and 43 out of the 48 instances of the third (90 per cent)
refer to the craft of forging, which is true of only 38 per cent of instances of blacksmith’s
forge.
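
The proportions in this comparison can be recomputed from the counts given in the text. The short Python sketch below does so; the dictionary layout and the label "other" (the remainder the text leaves unstated for the second and third combinations) are assumptions introduced here for illustration.

```python
# Counts reported in the text for the first 100 Google hits.
# "other" is the unstated remainder for that row (an assumption).
counts = {
    "blacksmith's forge": {"history": 20, "craft": 14, "fantasy games": 3},
    "blacksmith forge": {"history": 3, "craft": 11, "other": 1},
    "blacksmith + forge": {"history": 2, "craft": 43, "other": 3},
}

def shares(row):
    """Percentage of each reference type within one combination,
    rounded to the nearest whole per cent."""
    total = sum(row.values())
    return {label: round(100 * n / total) for label, n in row.items()}

for combo, row in counts.items():
    print(f"{combo} (n={sum(row.values())}): {shares(row)}")
```

Laid out this way, the contrast is easy to see: historical reference dominates only in the possessive form, while reference to the craft dominates in the other two.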
The statistical definition of collocation treats collocates as occurring with higher than
random frequency within a span of five words (or fewer) on either side of the node word.
But a word may be affected by its less close neighbours as well. To investigate this
possibility, I created a mini-corpus of forge by searching on the word using Google, first
eliminating uses of forge where the word was serving as a verb and then taking the first
page of each website that referred to the craft of forging out of the first 300 sites listed
(plus two advertising sites offered by Google but set apart from the list). When a corpus
was created in this way (between 28.6.13 and 30.6.13), 69 of the sites listed (23 per cent)
matched the criteria given.2 The mini-corpus so created contained 5,345 words and the
average length of the ‘texts’ was 77 words. Within this tiny corpus of tiny texts, there
were 198 instances of forge, and traditional occurred six times within the conventional
five word environment. (It did not appear as a collocate of forge in the Guardian corpus.)
However, examination of the word list for the mini-corpus taken in its (small) entirety
showed there were a further 22 instances of traditional in the mini-texts, which means
that 26 instances of traditional occur in 5,345 words (slightly under one occurrence every
200 words). It is theoretically possible that almost 41 per cent of the 69 mini-texts of my
corpus contain both the word forge and the word traditional, though the likelihood is that
the proportion is a little lower, given the possibility of the words occurring more than
once in a mini-text.
Collocations are frequently evidence of a more general relationship that a word
might have with its context, namely that of semantic preference (Sinclair 1999; 2004) or
semantic association (Hoey 2005); I shall use the latter term here, but there is very little
difference between the concepts. A semantic association occurs whenever we find, within
a defined close proximity of a particular word (or phrase, or part of word), a lexical choice
drawn from a recognizable semantic set or field. In addition to the six instances of
traditional occurring within five words of forge, we also have old (one instance, excluding
non-temporal uses), modern (3), historic (2), historical (2), and steeped in history (2)
occurring within the immediate vicinity of forge. From this we can conclude that forge has
a semantic association with POSITIONING IN TIME. It seems generally to be the case
(though I am not aware that it has been tested whether it is always the case) that at least
one member of the semantic set forming a semantic association with the node word will
also be a collocation with the node word. Where such is the case, it is probable that we
are initially primed to associate the node word with its collocate(s) and then extrapolate
from this relationship to the more general relationship of semantic association on the
basis of further encounters with the word in the company of words semantically related
to the originally noted collocate(s). Our recognition of the collocate(s) in effect serves as
a bridge to our identifying the semantic association and then subsequently as a reminder
of its existence. In the case of the semantic association of forge with POSITIONING IN
TIME, it is traditional that forms the bridge between collocation and semantic association,
as to a weaker extent does modern.
Parallel to this, we noted above that forgery collocates with fraud in the Guardian corpus,
the latter word occurring 24 times in L2 position with respect to the node word (i.e. two
places prior to forgery), characteristically in the phrase fraud and forgery. But we also
find in the same position the following words: theft (12 occurrences), deception (5),
bribery (3), blackmail, burglary, counterfeiting, and murder, indicating that forgery has a
semantic association with CRIMINAL ACTIVITY. So, in this case, there are two collocates
—fraud and theft—that build the bridge to the semantic association with CRIMINAL
ACTIVITY and pave the way for recognition and acceptance of other members of the set.
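
The step from positional collocates to a semantic association can be sketched in a few lines of Python. The code tallies whatever stands two places to the left of the node word (the L2 position) and then keeps only those words that belong to a hand-listed semantic set. The function name, the toy sentence, and the membership list (taken from the words named above) are all assumptions for illustration, not a description of how the Guardian data were processed.

```python
from collections import Counter

def words_at_offset(tokens, node, offset):
    """Count the words found `offset` places from each occurrence of `node`
    (offset=-2 is the L2 position, two places to the left)."""
    hits = Counter()
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok == node and 0 <= j < len(tokens):
            hits[tokens[j]] += 1
    return hits

# Hypothetical semantic set, listed by hand from the words named in the text.
CRIMINAL_ACTIVITY = {"fraud", "theft", "deception", "bribery", "blackmail",
                     "burglary", "counterfeiting", "murder"}

# Toy text (hypothetical).
text = ("he was charged with fraud and forgery while she admitted "
        "theft and forgery and the letter itself was a forgery")

l2 = words_at_offset(text.split(), "forgery", -2)
in_set = {word: n for word, n in l2.items() if word in CRIMINAL_ACTIVITY}
print(in_set)
```

The frequent members of the filtered tally play the role of bridge collocates; the infrequent ones are recognized as belonging to the association only because the set has already been established.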
Both collocation and semantic association have a wider and less recognized dimension
than the one just described. The collocations and semantic associations we have been
describing all make use of the five-word environment to the left and right of the node
word. But there are reasons to believe that a word’s relationship with its neighbours is
not exhausted by this narrow span.
Returning to forge, this word has a further (relatively unsurprising) semantic association
with METAL. In the Guardian corpus, the data are too sparse to permit this to be
observed, though iron appears as a collocation and aluminium and metallurgist both occur
once within the conventional five-word environment. When, however, we turn to the forge
mini-corpus created from the web, we find that both iron and metal are collocates of
forge, iron occurring 7 times and metal 6 times in the 198-line concordance, accounting
between them for 6.6 per cent of the instances of forge. In addition, still within the
five-word environment to left and right of the node word, we find two instances each
of the words ironwork and ironworks and one of metalwork. So far, then, we have done no
more than give another example of collocation and semantic association, with iron and
metal as the bridge words between the two. However, a glance at the outer ends of the
concordance lines beyond the five-word limit suggests that there are other members of
the METAL semantic association seven or eight words away from the node word, as well
as further instances of the words already mentioned. In short, the glance suggests that
words within the five-word boundary are the tip of a metallic iceberg in the larger
environment of the node word. This is confirmed by an examination of the word list for
the 5,345-word mini-corpus, which reveals that 241 of the words of this mini-corpus are
members of, or relate to, the semantic set of METAL (see Table 8.1). This means that 4.5
per cent of the vocabulary of the forge mini-corpus is METAL-related, or, put another way,
that 1 in 22 of the words contained in the corpus is a member of the METAL semantic set
or can be paraphrased in such a way that the paraphrase includes a member of the set.
In many respects this should not be surprising. It is part of our knowledge of forges that
they are used to work metal, and my mini-corpus was, after all, constructed out of
websites that referred to forges. But it is striking how the statistics we have traditionally
used to identify collocation have characteristically denied collocational status to words in
the local but not immediate environment of the word under investigation. We are
reminded that Halliday and Hasan (1976) originally listed collocation as one of the
cohesive strategies available to language users, and were thinking of lexical relationships
that crossed sentential boundaries, not of relationships formed by items in close
proximity to each other.
Table 8.1 METAL-related words in the web-derived mini-corpus of texts retrieved with the search-term forge (after the removal of verbal forms, names, and addresses)

Word         Freq.    Word             Freq.    Word               Freq.
steel          52     metallurgical      5      aluminium            1
iron           45     bronze             4      pewter               1
metal          35     copper             4      silver               1
ironwork       33     ironmongery        3      gold                 1
alloy          16     alloys             2      iron-masters         1
metalwork      11     ironmasters        2      metallurgically      1
metals         10     irons              2      metallurgist         1
ironworks       7     magnesium          1      metalworking         1
steelwork       7     nickel             1      nickel-plated        1
brass           5     tinmill            1      wrought-iron         1
It may therefore be helpful to distinguish immediate collocation (collocation as
characteristically defined by corpus linguists, occurring within a narrow span of the node
word) and cohesive collocation (words that occur in the local textual environment
of the word under investigation but beyond the five-word span). Both, I would argue, are
essential to our interpretation of the word. Indeed, it is the presence and importance of
the latter type of collocation that partly accounts for the near-universal failure of readers
to notice the mistaken use of crude forgeries in the passage with which this chapter
started, and the corresponding and apparently universal success of these readers in
interpreting the passage in the way it must be assumed the writer intended.
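
The distinction between immediate and cohesive collocation amounts to a simple classification by distance, which the following Python sketch illustrates. Each occurrence of a candidate word in a text is counted as immediate if it lies within the span of some occurrence of the node word, and as cohesive otherwise. The function name and the toy sentence are assumptions for illustration; the five-word default follows the span described in this chapter.

```python
def split_collocates(tokens, node, collocate, span=5):
    """Classify each occurrence of `collocate` in one text as immediate
    (within `span` tokens of some occurrence of `node`) or cohesive
    (elsewhere in the same text). Returns (immediate, cohesive)."""
    node_positions = [i for i, tok in enumerate(tokens) if tok == node]
    immediate = cohesive = 0
    if not node_positions:
        return immediate, cohesive  # no node word, so nothing to relate to
    for i, tok in enumerate(tokens):
        if tok != collocate:
            continue
        if any(abs(i - p) <= span for p in node_positions):
            immediate += 1
        else:
            cohesive += 1
    return immediate, cohesive

# Toy text (hypothetical).
tokens = ("the traditional forge still burns coal and the smith who works "
          "there learned the traditional craft from his father").split()

print(split_collocates(tokens, "forge", "traditional"))
```

Summing the two counts over every text in a corpus gives the kind of contrast reported above, where the occurrences inside the span are only a fraction of those in the wider textual environment.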
The interpretation of forgeries as forges is further reinforced, albeit less strongly, by the
fact that in the forge mini-corpus, forge collocates (in the immediate sense) with coal (7
occurrences in 198 concordance lines), gas (7), and fire (6). Two of these belong to, and
build a bridge to, a semantic association with FUEL, with charcoal (2) and wood (1)
occurring in the immediate environment of forge in the mini-corpus and charcoal, oil,
electricity, and wood (again), each occurring once in the Guardian corpus in the
immediate environment of forge. This semantic association is greatly strengthened if we
take into account cohesive collocation. The word list for the forge mini-corpus reveals the
cohesive collocates listed in Table 8.2.
Table 8.2 FUEL-related words in the web-derived mini-corpus of texts retrieved with the search-term forge (after the removal of verbal forms, names, and addresses)

Word      Freq.    Word        Freq.    Word           Freq.
fuel        13     charcoal      4      woodburning      1
coal        12     firewood      2      coals            1
coke        11     kindling      2      diesel           1
wood         9     log           2      multifuel        1
gas          9     oil           2      multi-fuel       1
The combination of the FUEL semantic association, marked both in the immediate and
less immediate environments, and the collocation of forge with fire (which occurs 5 times
as immediate collocation and 42 times as cohesive collocation) is liable to make the
reader interpret smoke in the phrase smoke from the crude forgeries as referring to the
use of fuels in the forge rather than to the destruction of illegal copies by fire.
Finally, of course, the passage we have been examining contains the most common
manifestation of the POSITIONING IN TIME association—traditional—though again
beyond the five-word boundary. I repeat the passage below with the METAL, FUEL, and
POSITIONING IN TIME items marked in bold. It will be noted that the first emboldened
word, metalwork, occurs 11 times in the forge mini-corpus. Although it may not qualify as
an immediate collocate, it would appear to be a strong cohesive collocate.
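In the village of Chilling there were more lessons, this time in metalwork. From
out of this tiny mud-hut hamlet comes the most beautiful beaten bronze, copper
and silver, found cladding traditional kitchen stoves across Ladakh.
Smoke from the crude forgeries rose over the village as I picked my way carefully
down the mountain between twisted trunks of willow trees.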
In short, words are assigned their meanings by users on the basis of their immediate
collocations and semantic associations and on the basis of their cohesive collocations. In
the case of the passage we have been considering, it would appear that the cohesive
collocations and semantic associations—METAL, FUEL, POSITIONING IN TIME—
override the expectations of CRIMINAL ACTIVITY associated with crude forgeries. Or,
perhaps more plausibly, it is the presence of words from these semantic sets that sets up
the expectation of mention of forges and results in readers ignoring the fact that forgeries
has been used in its place; Emmott (1997) shows how a frame might be set up in a
narrative in such a way as to control interpretation of subsequent language that is
encountered.
I have tried so far to demonstrate how the different kinds of neighbour a word may have
(and the different kinds of relationships it may have with these neighbours) help explain
the interpretive process in at least one, otherwise inexplicable, textual encounter. But I
have only done part of the work. I have shown how the relationships that forge
characteristically has with its neighbours enable readers to recognize that forge is
intended. I have not shown why we overlook the mistake. For this, we must make
reference to two other claims about the ways words relate to, and gain significance from
their relationships with, their neighbours.
The first of these claims is that once a relationship is formed between a word and its
neighbour, the combination takes on a life of its own, such that it too may form
collocations, colligations, and semantic associations. These need not be the same as those
that the separate words might form.
To investigate, I created a second mini-corpus from the web by searching for crude
forgeries and extracting each sentence that contained the search phrase. A total of 106
instances were collected and then concordanced in the normal way. The following
characteristics were identified for the word combination. First, it has a strong
colligational tendency to occur as the complement of BE clauses. Slightly over half the
occurrences in my data (55 out of 106) occurred in this pattern. It is unsafe to compare
these results with those for the 96 instances of forgeries (excluding crude forgeries)
collected from the Guardian corpus because of the different ways in which these corpora
were created, but it is interesting to note that only a quarter of the latter (23 out of 96)
occurred as complement with BE.
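
For readers unfamiliar with concordancing, the following Python sketch shows one minimal way of producing key-word-in-context (KWIC) lines for a two-word search phrase. The function name, the context width, and the toy sentence are assumptions for illustration; dedicated concordancing software does considerably more.

```python
def kwic(tokens, phrase, width=5):
    """Return (left, keyword, right) context triples for each occurrence
    of the multi-word `phrase` in a list of tokens."""
    n = len(phrase)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(phrase), right))
    return lines

# Toy sentence (hypothetical).
sentence = "the documents were shown to be crude forgeries by two experts"
lines = kwic(sentence.split(), ["crude", "forgeries"])

for left, keyword, right in lines:
    # Right-align the left context so the keyword forms a central column.
    print(f"{left:>30}  {keyword}  {right}")
```

With the search phrase aligned in a central column, patterns to its left, such as a form of BE or a word like documents, become visible when the lines are scanned or sorted.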
Secondly, one of the immediate collocations of crude forgeries is documents (7 out of 106
—accounting for 6.6 per cent of cases). The proportion is very much higher if cohesive
collocation is included (and here we are only talking of instances visible in the KWIC
format but outside the five-word limit): including these there are 28 instances of
documents (accounting for over a quarter of the data). The collocation is even stronger
with the combination BE crude forgeries. Of the 28 instances of documents just
mentioned, 23 occur to the left of BE crude forgeries, accounting for almost exactly half
of the cases of the combination (50.9 per cent).
Thirdly, the word documents itself serves as a bridge into two semantic
associations that heavily overlap with each other—PAPER OR ELECTRONIC
DOCUMENTS and PAPER ARTEFACTS. The former occurs in 62 of the concordance lines
(58.5 per cent); the latter occurs in a further 11 lines. Taken together, the semantic
associations are manifested in 68.9 per cent of the data.
Fourthly, the combination crude forgeries has a semantic association with EXPOSURE.
This is manifested as shown (4 instances), revealed (3), exposed (3), turned out to be (3),
identify (2), proved (2), and detected (2) in the immediate environment, accounting for
17.8 per cent of the data; as before, there are considerably more instances if we take
account of instances falling just outside the five word boundary.
As before, the association is more precise than it first appears. EXPOSURE is particularly
associated with the nested combination PAPER OR ELECTRONIC DOCUMENTS/PAPER
ARTEFACTS BE crude forgeries (of which there are 55—the discrepancy with the figure
quoted above is explained by reference to the fact that there are a handful of cases where
BE is not present). There are 25 instances of semantic association with EXPOSURE to be
found among the 55 cases of the above combination, the great majority making use of the
reporting structure. A further 11 cases colligate with reporting verbs of CLAIM.
Altogether there are 36 cases of PAPER OR ELECTRONIC DOCUMENTS/PAPER
ARTEFACTS BE crude forgeries (65.5 per cent of the data) that are either associated with
claims of forgery or with claims of evidence of forgery.
These kinds of interconnecting phenomena were described by Sinclair (2004) in his
account of the lexical item, and are key to the way that lexical priming theory seeks to
account for the complexities of the language we use and interpret. Of course the account
given here is incomplete. There is a collocation of crude forgeries with as, for example, in
which as functions as a stand-in for BE and immediately follows either EXPOSURE or
CLAIM. There is also an association with COLLECTIBLES, especially stamps, notes, and
coins. Even so, very few of the instances of crude forgeries escape all of the features that
have been discussed so far in this chapter, and this explains why the reader overlooks the
substitution of forgeries for forges. On the one hand, the larger textual environment
offered by our chosen passage conforms wholly to our primings for forge; on the other
hand, the same textual environment conforms to none of our primings for forgeries. It is
no wonder that when we read the passage we ignore forgeries and think that we have
read forges (or that forgeries is another word for forges).
The different kinds of relationship that a word has with its neighbours—collocations,
colligations, semantic associations, and a number of other relationships not discussed
here (see Hoey 2005 for a fuller account)—amount to a corpus-driven account of
interpretation, in which the key facts are lexical in nature rather than grammatical, and
are specific to the item rather than widely generalizable across the language (though of
course the types of relationship posited by corpus linguists are themselves a kind of
generalization). It is not necessary to accept any underlying theory of language to admit
the explanatory value of the features described in this chapter. Nevertheless, though the
study of corpora has led to the discovery of these features, they are arguably best
understood as psycholinguistic phenomena rather than purely corpus-linguistic
phenomena.
Since the 1970s, psycholinguistic research has been demonstrating that exposure
to certain words may accelerate recognition of certain other words (semantic priming),
the inference being that the words in question are stored in close proximity (Meyer and
Schvaneveldt 1976; Neely 1976; 1977). Other psycholinguistic research, also of 35 years’
standing, has shown that earlier exposure to a particular combination of words will
result, in some cases after a considerable time, in accelerated recognition of the second
word after the first word has been shown, even where the combination originally shown
was highly likely to have been a unique occurrence (repetition priming) (Scarborough et
al. 1977). The conclusion to be drawn is that each encounter with any piece of language
(contra Pinker 1994) is stored as received (though of course it may also be processed in
the ways Pinker and others suggest), and that this store is capable of being accessed on
subsequent linguistic encounters (see Pace-Sigge 2013 for a far more thorough account of
this literature). The ability to recognize literary allusion (and, more wretchedly,
plagiarism) in part depends on this access to the original wordings that are being
borrowed.
Lexical priming theory (Hoey 2005) seeks to integrate this psycholinguistic research
(which because it has its origins in psychological concerns has, as far as I can see, been
little accessed by linguists) with the findings of corpus linguists. It argues that every
encounter we have with language, whether spoken or written, results in what we have
heard or read being stored. The mental store, as it accumulates data about any piece of
language, primes us to expect that piece of language to collocate with particular other
pieces of language because the store shows that they have been encountered repeatedly.
As we encounter linguistic expressions that only partly conform to our expectations,
though, we modify those expectations and become primed to associate the piece of
language in question with members of a particular semantic set, to which the collocations
originally identified belong. At the same time, our increasing collection of instances of
this piece of language will prime us to associate that piece of language with certain
grammatical functions, grammatical structures, and, most fundamentally, grammatical
categories. The grammar of a language, according to this view, is the cumulative and
inconsistent product of the local colligational primings of innumerable words.
As noted above, acceptance of the observable kinds of relationship that a word may form
with its neighbours does not entail acceptance of any particular theory that might
attempt to account for those observations. But if the theory just outlined were to be taken
seriously (and there is still a need to account for the psychological reality of the
collocations even if it is not), then it would place the relationships that a word forms with its
neighbours at the very core of what it is to be a linguistic being. An essay that is
ostensibly about the reasons why a linguistic mistake is ignored and a text correctly
interpreted is in reality an essay about how we make and find meaning.
The discourse conventions of academic writing ought to have meant that the previous
sentence was the final one of this chapter. It is after all a broad generalization derived
from the previous analyses and discussion, and it attempts to assign significance to the
argument of the chapter. But it is not the final sentence and it should not be. The reason
is that in the previous four paragraphs there has been linguistic sleight of hand. The title
of this chapter is ‘Words and Their Neighbours’ and the previous paragraph, as well as
my brief account of the psycholinguistic research into semantic and repetition
priming, also talks of words. But the intervening account of lexical priming talks vaguely
of ‘pieces of language’. The reason of course is that the word does not have hallowed
status as a category. Corpus linguists of languages that employ alphabetic systems tend
to make use of words because they appear to be orthographically distinct, but they are
markedly less so in a language such as Chinese, where combinations of characters may in
the view of Chinese speakers result in collocations, phrases, or single words. Young
learners, whatever language they are primed in and for, cannot during early encounters
with the speech of those around them have any thoughts about which pieces of the sound
stream are words and which are combinations or constituents of words. A child’s first
primings (and an adult’s L2 primings where learning is solely by immersion) must be the
primings that associate certain pieces of the sound stream with certain meanings or
speech acts.
The reason I raise these issues is that I want to argue that our correct interpretation of
the crude forgeries passage is contingent upon a further factor in the wording that comes
from an unexpected quarter. So far we have looked at a word’s relationships with its
neighbours at the level of the word and word combination; now we must look at the rank
below that of the word, or what might misleadingly be referred to as the level of
morphology. A word’s relationships to the surrounding text are partly affected by the
relationship of its sub-components to the surrounding text.
The difference between forges and forgeries lies in the piece of language -eries. I
therefore examined the ways that -eries is used in English, as represented in my corpora.
In the combined corpora of the Guardian and the BNC, there were 118,932 tokens of
words ending in -eries. Excluding series and queries, there were 142 separate types
making use of the ending, and these types could be grouped according to their
membership of a small number of distinct and unequal semantic sets. Two of these we
need not dwell on. The first group consists of Tuileries, shrubberies, rockeries, palmeries,
orangeries, and nurseries (in one of its senses) and appears to comprise gardens of
various kinds; these make up 4.2 per cent of the types. (Nurseries is the outlier of this
group, in that nurseries prepare plants for gardens rather than being gardens
themselves.) The other group is made up (somewhat curiously) of monasteries,
menageries, nurseries (in another sense), presbyteries, piggeries, nunneries, deaneries,
catteries, chancelleries, chanceries, and fisheries. This group seems to consist of the
residences of animals and the religious, with nurseries again the outlier, babies being
neither bestial nor saintly. They account for 11 (7.7 per cent) of the types ending in -eries.
The remaining groups are directly relevant to the way we interpret our crude forgeries
example. The first is concerned with sins and crimes. In the list are adulteries, trickeries,
treacheries, mockeries, snobberies, ruderies, bitcheries, skulduggeries, savageries,
debaucheries, robberies, quackeries, pruderies, lecheries, butcheries, enslaveries,
chicaneries, flatteries, and of course forgeries. These items account for 14.1 per cent, or 1
in 7, of the types ending in -eries. To these might be added a heterogeneous set of words
that, while not crimes or sins, seem to be used to express repugnance, unhappiness, or
disapproval: snotteries, splatteries, miseries, sludgeries, misdeliveries, slickeries,
grotesqueries, gaucheries, camperies, flummeries, and dysenteries. If these are
added to the tighter earlier list, they together comprise 21.8 per cent, or 1 in 5, of the
types found in my corpora. So we are characteristically primed to associate the ending of
forgeries with crimes, sins, and unpleasantness.
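The counting procedure behind these figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token list and the two small semantic sets are hypothetical stand-ins for the corpora and the hand-made groupings, and the function names are invented for the example. It is not the WordSmith procedure actually used.

```python
from collections import Counter

# Hypothetical toy token list; a real study would read tokens from a corpus.
tokens = (
    "the forgeries and robberies in the series of bakeries "
    "and forgeries raised queries"
).split()

# Types left out of the count, as in the chapter.
EXCLUDED = {"series", "queries"}


def eries_types(tokens, excluded=EXCLUDED):
    """Return token frequencies for each type ending in -eries."""
    lowered = (t.lower() for t in tokens)
    return Counter(
        t for t in lowered if t.endswith("eries") and t not in excluded
    )


# Hand-assigned semantic sets (a small hypothetical subset of the lists above).
SEMANTIC_SETS = {
    "sins and crimes": {"forgeries", "robberies", "treacheries"},
    "production": {"bakeries", "breweries", "wineries"},
}


def set_shares(type_counts, semantic_sets):
    """Share of distinct types (not tokens) falling in each semantic set."""
    types = set(type_counts)
    if not types:
        return {name: 0.0 for name in semantic_sets}
    return {
        name: len(types & members) / len(types)
        for name, members in semantic_sets.items()
    }


counts = eries_types(tokens)
shares = set_shares(counts, SEMANTIC_SETS)
```

The shares are computed over types rather than tokens, which mirrors the percentages reported here: a frequent word such as forgeries counts once, however often it occurs.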
However, there is an even more important (for this argument) -eries group. Consider the
following group of types:
wineries potteries tanneries
surgeries saddleries rotisseries
refineries perfumeries patisseries
ouzeries noodleries bakeries
meaderies smokeries hatcheries
haberdasheries fisheries distilleries
creameries collieries canneries
breweries butteries fromageries
boulangeries piggeries orangeries
creperies nurseries
All of these 29 items, a couple of which were included in earlier groupings, are concerned
with the production of something—food, drink, products—and they constitute 20.4 per
cent, or 1 in 5, of the list of types generated by my corpora. To them might be added
another group concerned with the provision of food and drink: boozeries, fast-fooderies,
hostelries, nosheries, kebaberies, chocolateries, and carveries. (Indeed noodleries,
brasseries, and creperies belong in both groups.) The two groups combined, all concerned
with the provision or production of some purchasable product, constitute 25.3 per cent,
or 1 in 4, of all the types. The implication is that we are primed to associate -eries with
products. In the light of this, consider again, for the last time, the passage we have been
using as the peg for our discussion of the ways words relate to their neighbours:
and silver, found cladding traditional kitchen stoves across Ladakh. Smoke from
the crude forgeries rose over the village as I picked my way carefully down the
mountain between twisted trunks of willow trees
We saw that the -eries of forgeries has a semantic association with products, and here we
find that the passage speaks of products (the most beautiful beaten bronze, copper and
silver) and talks of where the products are used. It is no wonder that forgeries
are (correctly) understood to mean forges.
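The proportions quoted for the -eries groups are simple shares of the 142 types, and the arithmetic can be rechecked with a short sketch. The group sizes below are taken from the counts stated in the text or counted from the printed lists (six garden types, eleven residence types, 29 production types, and the seven provision types added to them); the dictionary keys and function name are labels invented for the example.

```python
# Number of types ending in -eries, excluding series and queries.
TOTAL_TYPES = 142

# Type counts per group, as stated in or counted from the chapter's lists.
group_sizes = {
    "gardens": 6,
    "residences of animals and the religious": 11,
    "production": 29,
    "production or provision": 29 + 7,
}


def percentage(count, total=TOTAL_TYPES):
    """Express a type count as a percentage of all -eries types."""
    return 100 * count / total


shares = {name: percentage(n) for name, n in group_sizes.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

On these assumptions the combined production and provision group comes to just over 25.3 per cent of the types, in line with the 1 in 4 figure given above.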
To sum up, then, words relate to their neighbours in a variety of ways and at a variety of
levels of detail. When we construct our own utterances or interpret those of others, we
subconsciously note the collocations that the word on its own makes with other words; we
also note the collocations that the combination has with other words or combinations of
words. The same goes for the colligations that we note for the word or word combination,
and for the semantic associations that we make in our minds between the word or word
combination and particular semantic sets. There are other kinds of association that a
word or a combination of words may contract, which we have not touched upon, such as
pragmatic association, textual collocation, textual semantic association, or textual
colligation (Hoey 2005), as well as referential association, orthographic association, and
punctuation association (Salim 2012).
Furthermore, as we have seen above, there are grounds for investigating whether the
sub-components of words also regularly participate in all these kinds of relationship, and
here of course we need to take account of phonetics and phonology. To the ‘sins and
crimes’ list of -eries could be added burglaries if the sound image rather than the spelling
is accessed, and to the ‘provision and products’ list could be added factories and
foundries. As I said above, near the false first ending of this chapter, the relationships
that a word forms with its neighbours are at the very core of what it is to be a linguistic
being; it is therefore unsurprising that an investigation of these relationships should
ultimately incorporate all levels, ranks, and types of language description.
Notes:
(1) The data used here and subsequently are taken from a corpus of the Guardian 1990-5,
by their kind permission; here and elsewhere the software used is WordSmith 6.0 (Scott
2013).
(2) The other sites used forge in three ways without remainder. They incorporated forge
into a name but made no other reference to the craft that might once have been
associated with the name; they referred to a technical software service; or they were
associated with swords and sorceries gaming in a range of ways. Only one site was hard
to classify, with both gaming and real metalworking associations.
Michael Hoey
Michael Hoey, University of Liverpool