Linguistic Society of America

Morphological and Semantic Regularities in the Lexicon


Author(s): Ray Jackendoff
Source: Language, Vol. 51, No. 3 (Sep., 1975), pp. 639-671
Published by: Linguistic Society of America
Stable URL: http://www.jstor.org/stable/412891
Accessed: 26/05/2011 02:40

MORPHOLOGICAL AND SEMANTIC REGULARITIES IN THE LEXICON

RAY JACKENDOFF
Brandeis University
This paper proposes a theory of the lexicon consistent with the Lexicalist Hypoth-
esis of Chomsky's 'Remarks on nominalization' (1970). The crucial problem is to
develop a notion of lexical redundancy rules which permits an adequate description of
the partial relations and idiosyncrasy characteristic of the lexicon. Two lexicalist
theories of redundancy rules, each equipped with an evaluation measure, are com-
pared on the basis of their accounts of nominalizations; the superior one, the FULL-
ENTRY THEORY, is then applied to a range of further well-known examples such as
causative verbs, nominal compounds, and idioms.
The starting point of the Lexicalist Hypothesis, proposed in Chomsky's 'Remarks
on nominalization' (1970), is the rejection of the position that a nominal such as
Bill's decision to go is derived transformationally from a sentence such as Bill
decided to go. Rather, Chomsky proposes that the nominal is generated by the base
rules as an NP, no S node appearing in its derivation. His paper is concerned with
the consequences of this position for the syntactic component of the grammar. The
present paper will develop a more highly articulated theory of the lexical treatment
of nominals, show that it is independently necessary, and extend it to a wide range
of cases other than nominalizations.1
The goal of this paper is very similar to that of Halle 1973: the presentation of a
framework in which discussion of lexical relations can be made more meaningful.
I will not present any new and unusual facts about the lexicon; rather, I will try
to formulate a theory which accommodates a rather disparate range of well-known
examples of lexical relations. The theory presented here, which was developed
independently of Halle's, has many points of correspondence with it; I have,
however, attempted a more elaborate working out of numerous details. I will
mention important differences between the theories as they arise.

1. LEVELS OF ADEQUACY IN DESCRIPTION. In a theory of the lexicon, we can
distinguish three levels of adequacy in description, parallel to those discussed by
Chomsky 1965 for grammatical theory. The first level consists in providing each
lexical item with sufficient information to describe its behavior in the language.
This corresponds to Chomsky's level of observational adequacy, in which the
grammar is required to enumerate correctly the set of sentences in the language.
A theory of the lexicon meeting the second level of adequacy expresses the relation-
ships, sub-regularities, and generalizations among lexical items of the language,
e.g. the fact that decide and decision are related in a systematic fashion. This level
corresponds to Chomsky's level of descriptive adequacy, which requires the

1 My thanks go to John Bowers, François Dell, Noam Chomsky, Morris Halle, and to classes
at the 1969 Linguistic Institute and Brandeis University for valuable discussion. Earlier versions
of this paper were presented to the 1969 summer LSA meeting at the University of Illinois and
to the 1970 La Jolla Syntax Conference. Thanks also to Dana Schaul for many useful examples
scattered throughout the paper.

grammar to express correctly relationships between sentences, such as the active-
passive relation.
A theory of the lexicon meeting the third level of adequacy describes how the
particular relationships and sub-regularities in the lexicon are chosen: why the
observed relationships, and not other imaginable ones, form part of the description
of the lexicon in question. One of the questions that must be answered at this level
is, e.g., why decide rather than decision is chosen as the more 'basic' of the two
related items. This element of the theory takes the form of an 'evaluation measure'
which assigns relative values to competing lexical descriptions available within the
theory. This is the level of explanatory adequacy.
As Chomsky emphasizes, the evaluation measure does not decide between
competing THEORIES of the lexicon, but between competing descriptions within
the same theory. Each theory must provide its own evaluation measure, and a
comparison of competing theories must be based on their success in meeting all
three levels of adequacy.
Evaluation measures have typically been built into linguistic theories implicitly
as measures of length of the grammar, i.e. its number of symbols. One place where
such a measure is made explicit is in Chomsky & Halle 1968, Chapter 8. The
abbreviatory conventions of the theory (parentheses, braces, etc.) are designed so
as to represent linguistically significant generalizations in terms of reduced length
of grammatical description. Similarly, Chomsky & Halle develop the concept of
marking conventions in order to be able to distinguish more 'natural' (i.e. explana-
tory) rules from less 'natural' ones, in terms of the number of symbols needed to
write the rules.
In §2 below, I will present two theories of the lexicon compatible with the
Lexicalist Hypothesis. One has a traditional evaluation measure which is applied
to the number of symbols in the lexicon; the other has a more unusual measure
of complexity, referring to 'independent information content'. In §3 I will show
that the latter theory is preferable. It is hoped that such an example of a non-
traditional evaluation measure will lead to greater understanding of the issue of
explanatory adequacy, which has been a source of great confusion in the field.

2. FORMULATION OF TWO PRELIMINARY THEORIES. The fundamental linguistic
generalization that must be captured by any analysis of English is that words like
decision are related to words like decide in their morphology, semantics, and
syntactic patterning. For Lees 1960, it seemed very logical to express this relation-
ship by assuming that only the verb decide appears in the lexicon, and by creating
the noun decision as part of a transformational process which derives the NP
John's decision to go from the S John decided to go. However, for reasons detailed in
Chomsky 1970, this approach cannot be carried out consistently without expanding
the descriptive power of transformations to the point where their explanatory
power is virtually nil.
Without transformations to relate decide and decision, we need to develop some
other formalism. Chomsky takes the position that decide and decision constitute a
single lexical entry, unmarked for the syntactic feature that distinguishes verbs
from nouns. The phonological form decision is inserted into base trees under the
node N; decide is inserted under V. Since Chomsky gives no arguments for this
particular formulation, I feel free to adopt here the alternative theory that decide
and decision have distinct but related lexical entries. In regard to Chomsky's
further discussion, the theories are equivalent; the one to be used here extends
more naturally to the treatment of other kinds of lexical relations (cf. §5). Our
problem then is to develop a formalism which can express the relations between
lexical entries in accord with a native speaker's intuition.2
It is important to ask what it means to capture a native speaker's intuition of
lexical relatedness. It makes sense to say that two lexical items are related if knowing
one of them makes it easier to learn the other-i.e. if the two items contain less
independent information than two unrelated lexical items do. A grammar that
expresses this fact should be more highly valued than one that does not. The
advocate of a transformational relationship between decide and decision claims
that this intuitive sense of relatedness is expressed by his transformation, in that
it is unnecessary to state the shared properties of the words twice. In fact, it
is unnecessary to state the properties of decision at all, since they are predictable
from the lexical entry of decide and the nominalization transformation.3 Hence
a grammar containing the nominalization transformation contains less independent
information than one without it-since instead of listing a large number of
nominalizations, we can state a single transformation. Within such a grammar,
the pair decide-decision contains fewer symbols than a random pair such as decide-
jelly: given decide, there need be no lexical entry at all for decision, but jelly
needs a lexical entry whether or not decide is listed. Furthermore, the regularity
of decide-decision means that many pairs will be related by the transformation,
so a net reduction in symbols in the grammar is accomplished, and the evaluation
measure will choose a grammar including the transformation over one without it.
Since the Lexicalist Hypothesis denies a transformational relationship between
decide and decision, their relationship must be expressed by a rule within the
lexical component. Transformational grammar has for many years had a name
for the kind of rule that expresses generalizations within the lexicon: it is called a

2 Advocates of the theory of generative semantics might at this point be tempted to claim
that a formalism for separate but related lexical items is yet another frill required by lexicalist
syntax, and that generative semantics has no need for this type of rule. I hasten to observe that
this claim would be false. In the generative semantic theory of lexical insertion developed in
McCawley 1968 and adopted by Lakoff 1971a, lexical items such as kill and die have separate
lexical entries, and are inserted into distinct derived syntactic/semantic structures. For a
consistent treatment of lexical insertion, then, break in The window broke must be inserted onto
a tree of the form [v BREAK], while break in John broke the window must be inserted onto
[v CAUSE BREAK], which has undergone Predicate Raising; in other words, break has two distinct
lexical entries. Semantically, the two breaks are related in exactly the same way as die and kill;
but clearly break and break must be related in the lexicon in a way that die and kill are not.
A similar argument holds for rule and ruler vs. rule and king. Thus generative semantics requires
rules expressing lexical relations for exactly the same reasons that the Lexicalist Hypothesis
needs them. Only in the earlier 'abstract syntax' of Lees 1960 and Lakoff 1971b are such rules
superfluous.
3 Of course, it also is difficult to express the numerous idiosyncrasies of nominalizations, as
Chomsky 1970 points out at some length.
lexical redundancy rule; but little work has been done until now toward a formal-
ization of such rules.
The first question we must ask is: By what means does the existence of a lexical
redundancy rule reduce the independent information content of the lexicon? There
are two possibilities. The first, which is more obvious and also more akin to the
transformational approach, gives decide a fully specified entry; but the entry for
decision is either non-existent or, more likely, not fully specified. The redundancy
rule fills in the missing information from the entry of decide at some point in the
derivation of a sentence containing decision, perhaps at the stage of lexical insertion.
As in the transformational approach, the independent information content of
decide-decision is reduced, because the entry for decision does not have to be filled
in. The evaluation measure again can simply count symbols in the grammar. We
may call this theory the IMPOVERISHED-ENTRY THEORY.
Within such a theory, a typical lexical entry will be of the form given below. All
aspects of this form are traditional except for the 'entry number', which is simply
an index permitting reference to a lexical entry independent of its content:
(1) [ entry number
      /phonological representation/
      syntactic features
      SEMANTIC REPRESENTATION ]

For example, decide will have the form 2. The entry number is arbitrary, and the
semantic representation is a fudge standing for some complex of semantic markers.
The NP indices correlate the syntactic arguments of the verb to the semantic
arguments (cf. Jackendoff 1972, Chapter 2, for discussion of this):
(2) [ 784
      /decid/
      +V
      + [NP1 ___ on NP2]
      NP1 DECIDE ON NP2 ]
We now introduce a redundancy rule, 3, in which the two-way arrow may be read
as the symmetric relation 'is lexically related to'. The rule thus can be read: 'A
lexical entry x having such-and-such properties is related to a lexical entry w
having such-and-such properties.'
(3) [ x                              [ w
      /y + ion/                        /y/
      +N                       ↔       +V
      + [NP1's ___ (P) NP2]            + [NP1 ___ (P) NP2]
      ABSTRACT RESULT OF ACT           NP1 Z NP2 ]
      OF NP1'S Z-ING NP2 ]
Given the existence of 3, decision needs only the following lexical entry:
(4) [ 375
      derived from 784 by rule 3 ]
This theory thus reduces the lexical entry for decision to a cross-reference to the
related verb plus a reference to the redundancy rule. The entries of many other
nouns will be simplified similarly by the use of a reference to 3. The independent
information content of the lexicon can be determined straightforwardly by adding
up the information in lexical entries plus that in redundancy rules; hence the
evaluation measure can be stated so as to favor grammars with fewer symbols.
A second possible approach to lexical redundancy rules, the FULL-ENTRY THEORY,
assumes that both decide and decision have fully specified lexical entries, and that
the redundancy rule plays no part in the derivation of sentences, as it does in both
the transformational theory and the impoverished-entry theory. Rather, the
redundancy rule plays a role in the information measure for the lexicon. It designates
as redundant that information in a lexical entry which is predictable by the existence
of a related lexical item; redundant information will not be counted as independent.
In the full-entry theory, lexical entries again have the form of 1, except that an
entry number is unnecessary. Decide has the form of 2, minus the entry number.
Decision, however, will have the following entry:
(5) [ /decid + ion/
      +N
      + [NP1's ___ on NP2]
      ABSTRACT RESULT OF ACT OF
      NP1'S DECIDING NP2 ]
We evaluate the lexicon as follows: first, we must determine the amount of in-
dependent information added to the lexicon by introducing a single new lexical
entry; then, by adding up all the entries, we can determine the information content
of the whole lexicon.
For a first approximation, the information added by a new lexical item, given a
lexicon, can be measured by the following convention:
(6) (Information measure)
Given a fully specified lexical entry W to be introduced into the lexicon,
the independent information it adds to the lexicon is
(a) the information that W exists in the lexicon, i.e. that W is a word
of the language; plus
(b) all the information in W which cannot be predicted by the existence
of some redundancy rule R which permits W to be partially
described in terms of information already in the lexicon; plus
(c) the cost of referring to the redundancy rule R.
Here 6a is meant to reflect one's knowledge that a word exists. I have no clear
notion of how important a provision it is (it may well have the value zero), but I
include it for the sake of completeness. The heart of the rule is 6b; this reflects one's
knowledge of lexical relations. Finally, 6c represents one's knowledge of which
regularities hold in a particular lexical item; I will discuss this provision in more
detail in §6.
To determine the independent information content of the pair decide-decision,
let us assume that the lexicon contains neither, and that we are adding them one by
one into the lexicon. The cost of adding 2, since it is related to nothing yet in the
lexicon, is the information that a word exists, plus the complete information content
of the entry 2. Given 2 in the lexicon, now let us add 5. Since its lexical entry is
completely predictable from 2 and redundancy rule 3, its cost is the information
that a word exists plus the cost of referring to 3, which is presumably less than the
cost of all the information in 5. Thus the cost of adding the pair decide-decision
is the information that two words exist, plus the total information of the entry 2,
plus the cost of referring to redundancy rule 3.
Now note the asymmetry here: if we add decision first, then decide, we arrive at
a different sum: the information that two words exist, plus the information
contained in 5, plus the cost of referring to redundancy rule 3 (operating in the
opposite direction). This is more than the previous sum, since 5 contains more
information than 2: the four extra phonological segments +ion and the extra
semantic information represented by ABSTRACT RESULT OF ACT OF. To establish
the independent information content for the entire lexicon, we must choose an
order of introducing the lexical items which minimizes the sum given by successive
applications of 6. In general, the more complex derived items must be introduced
after the items from which they are derived. The information content of the
lexicon is thus measured as follows:
(7) (Information content of the lexicon)
Given a lexicon L containing n entries, W1, ..., Wn, each permutation P
of the integers 1, ..., n determines an order Ap in which W1, ..., Wn can
be introduced into L. For each ordering Ap, introduce the words one
by one and add up the information specified piecemeal by procedure 6,
to get a sum Sp. The independent information content of the lexicon
L is the least of the n! sums Sp, plus the information content of the
redundancy rules.
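Procedure 7 can be sketched as a brute-force computation. Everything below is an illustrative assumption rather than the paper's proposal: costs are counted in characters, EXISTENCE_COST stands in for 6a, RULE_REF_COST for 6c, a redundancy rule is modeled as a function that predicts one full entry from another, and the fixed information content of the rules themselves is omitted.

```python
from itertools import permutations

# Arbitrary stand-ins for provisions 6a and 6c of the information measure.
EXISTENCE_COST = 1   # the information that a word exists (6a)
RULE_REF_COST = 2    # the cost of referring to a redundancy rule (6c)

def entry_cost(entry):
    """Full (unreduced) information in an entry, crudely counted in characters."""
    return sum(len(v) for v in entry.values())

def add_cost(entry, listed, rules):
    """Measure 6: information a new entry adds, given the entries already listed."""
    cost = EXISTENCE_COST
    predicted = any(rule(old) == entry for rule in rules for old in listed)
    if predicted:
        cost += RULE_REF_COST    # 6c: entry fully predictable, so 6b adds nothing
    else:
        cost += entry_cost(entry)  # 6b: nothing is redundant
    return cost

def lexicon_information(entries, rules):
    """Procedure 7: minimize the summed cost of measure 6 over all n! orders."""
    best = None
    for order in permutations(entries):
        listed, total = [], 0
        for entry in order:
            total += add_cost(entry, listed, rules)
            listed.append(entry)
        if best is None or total < best:
            best = total
    return best
```

With toy entries for decide and decision and a toy nominalization rule, the minimum is reached by introducing decide first: decision then costs only EXISTENCE_COST + RULE_REF_COST, mirroring the asymmetry discussed above.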
Now consider how an evaluation measure can be defined for the full-entry
theory. Minimizing the number of symbols in the lexicon will no longer work,
because a grammar containing decide and decision, but not redundancy rule 3,
contains fewer symbols than a grammar incorporating the redundancy rule, by
exactly the number of symbols in the redundancy rule. Since we would like the
evaluation measure to favor the grammar incorporating the redundancy rule, we
will state the evaluation measure as follows:
(8) (Full-entry theory evaluation measure)
Of two lexicons describing the same data, that with a lower information
content is more highly valued.
The details of the full-entry theory as just presented are somewhat more complex
than those of either the transformational theory or the impoverished-entry theory.
However, its basic principle is in fact the same: the evaluation measure is set up
so as to minimize the amount of unpredictable information the speaker knows
(or must have learned). However, the measure of unpredictable information is no
longer the number of symbols in the lexicon, but the output of information
measure 7: this expresses the fact that, when one knows two lexical items related
by redundancy rules, one knows less than when one knows two unrelated items
of commensurate complexity.
I will argue that the full-entry theory, in spite of its apparent complexity, is
preferable to the impoverished-entry theory. As a prelude to this argument, I will
mention two other discussions of redundancy rules.
The formulation of morpheme-structure rules (those redundancy rules which
predict possible phonological combinations within the words of a language) is
also open to an impoverished-entry theory and a full-entry theory. The former is
used in Halle 1959 and in the main presentation of Chomsky & Halle, where the
redundancy rules are treated as part of the readjustment rules. However, Chapter 8
of SPE describes some difficulties in this theory pointed out by Stanley 1967. The
alternative theory presented is (I believe) a notational variant of the full-entry
theory: the redundancy rules do not play an active role in a derivation, but rather
function as part of the evaluation measure for the lexicon.4 If the full-entry theory
turns out to be correct for the closely related area of morpheme-structure rules, we
should be inclined to prefer it for the rules relating lexical items.
Halle 1973 proposes a variant of the full-entry theory for the processes of word
formation we are concerned with here. In his theory, the redundancy rules generate
a set of 'potential lexical items' of the language. He then uses the feature [+Lexical
Insertion] to distinguish actual words from non-existent but possible words.
A 'special filter' supplies unpredictable information, including the value of
[Lexical Insertion]. The filter thus contains all the information of 6a and 6b, but
has nothing that I can relate to 6c.
Consider the contents of Halle's filter, an unordered list of idiosyncratic informa-
tion. This list must include reference to every lexical item, including all potential but
non-existent ones. It is not rule-governed; rather, it is intended to state precisely
what is not rule-governed. It is clear why Halle sets up the lexicon in this way: he is
trying to retain a portion of the lexicon where the independent information can be
measured simply by counting features, and the filter is just such a place. Our
formulation of the information measure in the full-entry theory has freed us of the
necessity of listing the independent information separately, or of distinguishing it
extrinsically from the redundant information. Instead we have a lexicon containing
merely a set of fully specified lexical entries (giving exactly those words that exist),
plus the set of redundancy rules. (I will mention Halle's theory again briefly at the
end of §5.1.)
3. WHICH THEORY? The argument for fully specified entries comes from con-
sideration of words whose affixation is predictable by a redundancy rule, but whose
putative derivational ancestors are not lexical items of English. Examples are
aggression, retribution, and fission, which have the morphological and semantic
properties of the nouns described in redundancy rule 3, but for which there are no
corresponding verbs *aggress, *retribute, or *fiss. Our intuition about these items
is that they contain less independent information than comparable items which
cannot be partially described by a redundancy rule (e.g. demise and soliloquy),
but that they contain more than comparable items which are related to genuine
lexical items (e.g. decision, attribution).
4 Chomsky & Halle retain impoverished lexical entries, but only for the purpose of counting
up features not predicted by redundancy rules and listing what potential words actually exist.
Paired with each impoverished entry, however, is a fully specified entry, which is what actually
takes part in the derivation.
How can the three theories we have discussed describe these verbs? The trans-
formational theory must propose a hypothetical lexical item marked obligatorily
to undergo the nominalization transformation (cf. Lakoff 1971b). Thus the lexicon
must be populated with lexical items such as *fiss which are positive absolute
exceptions to various word-formation transformations. The positive absolute
exception is of course a very powerful device to include in grammatical theory
(see discussion in Jackendoff 1972). Furthermore, the use of an EXCEPTION feature
to prevent a lexical item from appearing in its 'basic' form is counter-intuitive: it
claims that English would be simpler if *fiss were a word, since one would not have
to learn that it is exceptional. Lakoff in fact claims that there must be a hypothetical
verb *king, corresponding to the noun king as the verb rule corresponds to the
noun ruler. Under his theory, the introduction of a real verb king would make
English simpler, in that it would eliminate an absolute exception feature from the
lexicon. In other words, the evaluation measure for the transformational theory
seems to favor a lexicon in which every noun with functional semantic information
has a related verb. Since there is little evidence for such a preference, and since it is
strongly counter-intuitive in the case of king, the transformational account (besides
requiring a very powerful mechanism, the absolute exception) is incorrect at the
level of explanatory adequacy.
Next consider the impoverished-entry theory. There are two possible solutions
to the problem of non-existent derivational ancestors. In the first, the entry of
retribution is as unspecified as that of decision (4); and it is related by redundancy
rule 3 to an entry retribute, which however is marked [-Lexical Insertion]. The
cost of adding retribution to the lexicon is the sum of the information in the entry
*retribute, plus the cost of retribution's references to the redundancy rule and to the
(hypothetical) lexical item, plus the information that one word exists (or, more
likely, two, and the information that one of those is non-lexical). Under the
reasonable assumption that the cost of the cross-references is less than the cost of
the phonological and semantic affixes, this arrangement accurately reflects our
initial intuition about the information content of retribution. Furthermore, it
eliminates the use of positive absolute exceptions to transformations, replacing
them with the more restricted device [-Lexical Insertion]. Still, it would be nice to
dispense with this device as well, since it is rather suspicious to have entries which
have all the properties of words except that of being words. The objections to
hypothetical lexical items in the transformational theory at the level of explanatory
adequacy in fact apply here to [-Lexical Insertion] as well: the language is always
simpler if this feature is removed.
We might propose eliminating the hypothetical lexical entries by building them
into the entries of the derived items:

(9) [ 511
      derived by rule 3 from
      [ /retribut/
        +V
        + [NP1 ___ for NP2]
        NP1 RETRIBUTE NP2 ] ]
The cost of 9 is thus the information that there is a word retribution, plus the
information within the inner brackets, plus the cost of referring to the redundancy
rule. Again, the assumption that the cross-reference costs less than the additional
information /ion/ and ABSTRACT RESULT OF ACT OF gives the correct description
of our intuitions. This time we have avoided hypothetical lexical items, at the
expense of using rather artificial entries like 9.
This artificiality betrays itself when we try to describe the relation between sets
like aggression-aggressive-aggressor, aviation-aviator, and retribution-retributive.
If there are hypothetical roots *aggress, *aviate, and *retribute, each of the members
of these sets can be related to its root by the appropriate redundancy rule 3,
10a, or 10b, where 10a and 10b respectively describe pairs like predict-predictive
and protect-protector (I omit the semantic portion of the rules at this point for
convenience; in any case, §4 will justify separating the morphological and
semantic rules):
(10) a. [ x                [ w
          /y + ive/   ↔      /y/
          +A ]               +V ]

     b. [ x                [ w
          /y + or/    ↔      /y/
          +N ]               +V ]
Suppose we eliminate hypothetical lexical items in favor of entries like 9 for
retribution. What will the entry for retributive look like? One possibility is:
(11) [ 65
       derived by rule 10a from
       [ /retribut/
         +V
         + [NP1 ___ for NP2]
         NP1 RETRIBUTE NP2 ] ]

But this solution requires us to list the information in the inner brackets twice, in
retribution and retributive: such an entry incorrectly denies the relationship between
the two words.
Alternatively, the entry for retributive might be 12 (I use 3' here to denote the
inverse of 3, i.e. a rule that derives verbs from -ion nouns; presumably the presence
of 3 in the lexical component allows us to use its inverse as well):
(12) [ 65
       derived by 3' and 10a from 511 ]

Thus retributive is related to retribution by a sequence of redundancy rules, and the
independent information content of the pair retribution-retributive is the informa-
tion that there are two words, plus the information within the inner brackets of 9,
plus the cost of referring to 3' once and 10a twice. This is closer to the intuitively
correct solution, in that it relates the two words. However, it is still suspicious,
because it claims retribution is more basic than retributive. Clearly the entries could
just as easily have been set up with no difference in cost by making retributive
basic. The same situation will arise with a triplet like aggression-aggressor-
aggressive, where the choice of one of the three as basic must be purely arbitrary.
Intuitively, none of the three should be chosen as basic, and the formalization of the
lexicon should reflect this. The impoverished-entry theory thus faces a choice:
either it incorporates hypothetical lexical items, or it describes in an unnatural
fashion those related lexical items which are related through a non-lexical root.
Consider now how the full-entry theory accounts for these sets of words,
beginning with the case of a singleton like perdition (or conflagration), which has
no relatives like *perdite, *perditive etc., but which obviously contains the -ion
ending of rule 3. We would like the independent information content of this item
to be less than that of a completely idiosyncratic word like orchestra, but more
than that of, say, damnation, which is based on the lexical verb damn. The impover-
ished-entry theory resorts either to a hypothetical lexical item *perdite or to an
entry containing another entry, like 9, which we have seen to be problematic.
The full-entry theory, on the other hand, captures the generalization without
extra devices. Note that 6b, the measure of non-redundant information in the
lexical entry, is cleverly worded so as to depend on the existence of redundant
information somewhere in the lexicon, but not necessarily on the existence of
related lexical entries. In the case of perdition, the only part of the entry which
represents a regularity in the lexicon is in fact the -ion ending, which appears as
part of the redundancy rule 3. What remains irregular is the residue described in
the right-hand side of 3, i.e. that part of perdition which corresponds to the non-
lexical root *perdite. Hence the independent information content of perdition is the
information that there is a word, plus the cost of the root, plus the cost of referring
to rule 3. Perdition adds more information than damnation, then, because it has a
root which is not contained in the lexicon; it contains less information than
orchestra because the ending -ion and the corresponding part of the semantic
content are predictable by 3 (presumably the cost of referring to 3 is less than the
information contained in the ending itself; see §6).
We see then that the full-entry theory captures our intuitions about perdition
without using a hypothetical lexical item. The root *perdite plays only an indirect
role, in that its COST appears in the evaluation of perdition as the difference between
the full cost of perdition and that of the suffix; nowhere in the lexicon does the root
appear as an independent lexical entry.
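The three-way cost comparison above can be sketched numerically. The constants below are invented placeholders, not part of the theory; the ordering holds only under the assumption just stated, namely that a rule reference costs less than the -ion ending itself.

```python
# Invented-cost sketch of the comparison: damnation (lexical root damn)
# < perdition (non-lexical root *perdite) < orchestra (fully idiosyncratic).
WORD = 1.0       # the information that a word exists
ROOT = 5.0       # phonology + semantics of a root
RULE_REF = 0.5   # cost of referring to redundancy rule 3
ION = 1.5        # information in the -ion ending itself (assumed > RULE_REF)

damnation = WORD + RULE_REF         # root damn is already in the lexicon
perdition = WORD + ROOT + RULE_REF  # root *perdite charged here; -ion is free
orchestra = WORD + ROOT + ION       # nothing is predictable by any rule

assert damnation < perdition < orchestra
```

The inequality reverses if RULE_REF exceeds ION, which is why the parenthetical assumption in the text matters.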
Now turn to the rootless pair retribution-retributive. Both words will have fully
specified lexical entries. To determine the independent information content of the
pair, suppose that retribution is added to the lexicon first. Its independent informa-
tion, calculated as for perdition above, is the information that there is a word, plus
the cost of the root *retribute, plus the cost of referring to 3. Note again that
*retribute does not appear anywhere in the lexicon. Now we add to the lexicon
the entry for retributive, which is entirely predictable from retribution plus redun-
dancy rules 3 and 10a. According to information measure 6, retributive adds the
information that it is a word, plus the cost of referring to the two redundancy rules.
The cost of the pair for this order of introduction is therefore the information that
there are two words, plus the information in the root *retribute, plus the cost of
referring to redundancy rules three times. Alternatively, if retributive is added to
the lexicon first, followed by retribution, the independent information content of the
pair comes out the same, though this time the cost of the root appears in the
evaluation of retributive. Since the costs of these two orders are commensurate,
there is no optimal order of introduction, and thus no reason to consider either
item basic.
Similarly, the triplet aggression-aggressor-aggressive will have, on any order of
introduction, an independent information content consisting of the information
that there are three words, plus the information content of the root *aggress, plus
the cost of referring to redundancy rules five times (once for the first entry in-
troduced, and twice for each of the others). Since no single order yields a signi-
ficantly lower information content, none of the three can be considered basic to the
others.
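The counting argument for rootless families can be sketched with invented unit costs; the point is only that the total is symmetric in the members, so no order of introduction is cheaper.

```python
# Order-independence sketch: an n-member rootless family costs the same
# whichever member is introduced first, since the formula does not care
# which entry carries the root. Numeric costs are placeholders.
WORD = 1.0       # "there is a word"
ROOT = 5.0       # information in the shared non-lexical root
RULE_REF = 0.5   # one reference to a redundancy rule

def family_cost(n):
    # first entry introduced: root + 1 rule reference;
    # each later entry: fully predictable via 2 rule references
    refs = 1 + 2 * (n - 1)
    return n * WORD + ROOT + refs * RULE_REF

assert family_cost(2) == 2 * WORD + ROOT + 3 * RULE_REF  # retribution pair
assert family_cost(3) == 3 * WORD + ROOT + 5 * RULE_REF  # aggression triplet
```

Because family_cost never mentions a distinguished member, the formalization reflects the intuition that none of the three is basic.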
Thus the full-entry theory provides a description of rootless pairs and triplets
which avoids either a root in the lexicon or a claim that one member of the group is
basic, the two alternatives encountered by the impoverished-entry theory. The full-
entry theory looks still more appealing when contrasted with the transformational
theory's account of these items. The theory of Lakoff 1971b introduces a positive
absolute exception on *perdite, requiring it to nominalize; but *aggress may
undergo either -ion nominalization, -or nominalization, or -ive adjectivaliza-
tion, and it must undergo one of the three. Lakoff is forced to introduce Boolean
combinations of exception features, together marked as an absolute exception, in
order to describe this distribution: patently a brute-force analysis.
In the full-entry theory, then, the lexicon is simply a repository of all information
about all the existing words; the information measure expresses all the relation-
ships. Since the full-entry theory escapes the pitfalls of the impoverished-entry
theory, without giving up adequacy of description, we have strong reason to
prefer the former, with its non-standard evaluation measure. From here on, the
term 'lexicalist theory' will be used to refer only to the full-entry theory.
Before concluding this section, let us consider a question which frequently
arises in connection with rootless pairs and triplets: What is the effect on the lexicon
if a back-formation takes place, so that a formerly non-existent root (say *retribute)
enters the language? In the transformational theory, the rule feature on the hypo-
thetical root is simply erased, and the lexicon becomes simpler, i.e. more regular.
In the lexicalist theory, the account is a bit more complex, but also more sophis-
ticated. If retribute were simply added without disturbing the previous order for
measuring information content, it would add to the cost of the lexicon the informa-
tion that there is a new word plus the cost of referring to one of the redundancy
rules. Thus the total cost of retribution-retributive-retributewould be the informa-
tion that there are three words, plus the information in the root retribute,plus the
cost of four uses of redundancy rules. But now that retribute is in the lexicon, a
restructuring is possible, in which retribute is taken as basic. Under this order of
evaluation, the information content of the three is the information that there are
three words, plus the information in retribute, plus only two uses of redundancy
rules. This restructuring, then, makes the language simpler than it was before
retribute was introduced, except that there is now one more word to learn than
before. What this account captures is that a back-formation ceases to be recognized
as such by speakers precisely when they restructure the evaluation of the lexicon,
taking the back-formation rather than the morphological derivatives as basic.
I speculate that the verb *aggress, which seems to have only marginal status in
English, is still evaluated as a back-formation, i.e. as a derivative of aggression-
aggressor-aggressive, and not as their underlying root. Thus the lexicalist theory
of nominalizations provides a description of the diachronic process of back-
formation which does more than simply erase a rule feature on a hypothetical
lexical item: it can describe the crucial step of restructuring as well.

4. SEPARATE MORPHOLOGICAL AND SEMANTIC RULES. At the outset of the dis-
cussion, I stated redundancy rule 3 so as to relate lexical items both at the mor-
phological and semantic levels. In fact, this formulation will not do. It claims that
there is a particular meaning, ABSTRACT RESULT OF ACT OF V-ING, associated with the
ending -ion. However, several different semantic relations obtain between -ion
nominals and their related verbs, and several nominalizing endings can express the
same range of meanings. Some of the morphological rules are stated in M1 (the
morphological part of 3), M2, and M3:

(13) M1: [/y+ion/, +N] ↔ [/y/, +V]
     M2: [/y+ment/, +N] ↔ [/y/, +V]
     M3: [/y+al/, +N] ↔ [/y/, +V]

Some of the semantic rules are S1 (the semantic part of 3), S2, and S3:

(14) S1: [+N, +[NP1's __ ((P) NP2)], ABSTRACT RESULT OF ACT OF NP1'S Z-ING NP2]
         ↔ [+V, +[NP1 __ ((P) NP2)], NP1 Z NP2]
     S2: [+N, +[__ (NP2)], GROUP THAT Z-S (NP2)]
         ↔ [+V, +[NP1 __ (NP2)], NP1 Z (NP2)]
     S3: [+N, +[(NP1's) __ ((P) NP2)], {ACT, PROCESS} OF (NP1'S) Z-ING NP2]
         ↔ [+V, +[NP1 __ ((P) NP2)], NP1 Z NP2]
An example of the cross-classification of the morphological and semantic
relations is the following table of nouns, where each row contains nouns of the
same semantic category, and each column contains nouns of the same morpho-
logical category.
(15)        M1            M2             M3
     S1: discussion    argument       rebuttal
     S2: congregation  government
     S3: copulation    establishment  refusal
That is, John's discussion of the claim, John's argument against the claim, and
John's rebuttal of the claim are semantically related to John discussed the claim,
John argued against the claim, and John rebutted the claim respectively in a way
expressed by the subcategorization conditions and semantic interpretations of S1;
the congregation and the government of Fredonia are related by S2 to they congregate
and they govern Fredonia; and John's copulation with Mary, John's establishment of
a new order, and John's refusal of the offer are related by S3 to John copulated with
Mary, John established a new order, and John refused the offer.5 There are further
nominalizing endings such as -ition (supposition), -ing (writing), and -Ø (offer); and
further semantic relations, such as ONE WHO Z's (writer, occupant) and THING IN
WHICH ONE Z's (residence, entrance). The picture that emerges is of a family of
nominalizing affixes and an associated family of noun-verb semantic relationships.
To a certain extent, the particular members of each family that are actually utilized
in forming nominalizations from a verb are chosen randomly. Insofar as the choice
is random, the information measure must measure independently the cost of
referring to morphological and semantic redundancy rules (cf. §6 for further
discussion).
How do we formalize the information measure, in light of the separation of
morphological rules (M-rules) and semantic rules (S-rules)? An obvious first point
is that a semantic relation between two words without a morphological relationship
cannot be counted as redundancy: thus the existence of the verb rule should render
the semantic content of the noun ruler redundant, but the semantic content of
king must count as independent information. Hence we must require a morpho-
logical relationship before semantic redundancy can be considered.
A more delicate question is whether a morphological relationship alone should
be counted as redundant. For example, professor and commission (as in 'a salesman's
commission') are morphologically related to profess and commit; but the existence
of a semantic connection is far from obvious, and I doubt that many English
speakers other than philologists ever make the association. What should the
information content of these items include? A permissive approach would allow
the phonology of the root to be counted as redundant information; the only
non-redundant part of professor, then, would be some semantic information like
TEACH. A restrictive approach would require a semantic connection before mor-
phology could be counted as redundant; professor then would be treated like
perdition, as a derived word with a non-lexical root, and both the phonology
/profess/ and the semantics TEACH would count as independent information.
5 I assume, with Chomsky 1970, that the of in nominalizations is transformationally inserted.
Note also that the semantic relations are only approximate; the usual idiosyncrasies appear in
these examples.
In §5.1 I will present two cases in which only morphological rules play a role
because there are no semantic regularities. However, where semantic rules exist,
it has not yet been established whether their use should be required in conjunction
with morphological rules. Therefore the following restatement of information
measure 6 has alternative versions:
(16) (Information measure)
Given a fully specified lexical entry W to be introduced into the lexicon,
the independent information it adds to the lexicon is
(a) the information that W exists in the lexicon; plus
(b) (permissive form) all the information in W which cannot be
predicted by the existence of an M-rule which permits W to be
partially described in terms of information already in the
lexicon, including other lexical items and S-rules; or
(b') (restrictive form) all the information in W which cannot be
predicted by the existence of an M-rule and an associated S-rule
(if there is one) which together permit W to be partially described
in terms of information already in the lexicon; plus
(c) the cost of referring to the redundancy rules.
Examples below will show that the permissive form of 16 is preferable.
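The difference between the permissive form 16b and the restrictive form 16b' can be made concrete with a toy calculation. The entry fields and numeric costs below are all invented; the sketch shows only that the restrictive form charges professor for /profess/, while the permissive form does not.

```python
# Toy version of information measure 16 (invented entries and costs).
def independent_info(entry, lexicon, permissive=True):
    cost = entry["word_cost"]               # (a) the word exists
    m_rule = entry["root"] in lexicon       # an M-rule links W to a listed root
    s_rule = entry["semantic_link"]         # an associated S-rule also holds
    if m_rule and (permissive or s_rule):
        cost += entry["residue"] + entry["rule_ref"]   # (b)/(b') plus (c)
    else:                                   # treated like perdition instead
        cost += entry["root_cost"] + entry["residue"] + entry["rule_ref"]
    return cost

professor = {"word_cost": 1.0, "root": "profess", "semantic_link": False,
             "residue": 2.0,   # roughly: the unpredicted meaning TEACH
             "rule_ref": 0.5, "root_cost": 4.0}
lexicon = {"profess"}

permissive_cost = independent_info(professor, lexicon, permissive=True)
restrictive_cost = independent_info(professor, lexicon, permissive=False)
assert permissive_cost < restrictive_cost  # /profess/ is free only permissively
```

For ruler (morphological and semantic link to rule) the two forms agree; they diverge exactly on the professor/commission cases.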
5. OTHER APPLICATIONS. The redundancy rules developed so far describe the
relation between verbs and nominalizations. It is clear that similar rules can describe
de-adjectival nouns (e.g. redness, entirety, width), deverbal adjectives (predictive,
explanatory), denominal adjectives (boyish, national, fearless), de-adjectival verbs
(thicken, neutralize, yellow), and denominal verbs (befriend, originate, smoke).
Likewise, it is easy to formulate rules for nouns with noun roots like boyhood,6
adjectives with adjective roots like unlikely, and verbs with verb roots like re-enter
and outlast.7 Note, by the way, that phonological and syntactic conditions such as
choice of boundary and existence of internal constituent structure can be spelled
out in the redundancy rules.
Note also that a complex form such as transformationalist can be accounted
for with no difficulty: each of the steps in its derivation is described by a regular
redundancy rule. No question of ordering rules or of stating rule features need ever
arise (imagine, by contrast, the complexity of Boolean conditions on exceptions
which would be needed in the entry for transform, if we were to generate this word
in a transformational theory of the lexicon). Transformationalist is fully specified
in the lexicon, as are transform, transformation, and transformational. The total
information content of the four words is the information that there are four words,
plus the information in the word transform, plus idiosyncratic information added
by successive steps in derivation (e.g. that transformation in this sense refers to a
component of a theory of syntax and not just any change of state, and that a
transformationalist in the sense used in this paper is one who believes in a particular
6 The abstractness of nouns in -hood, mentioned by Halle 1973 as an example, is guaranteed
by the fact that the associated semantic rule yields the meaning STATE (or PERIOD) OF BEING
A Z.
7 Cf. Fraser 1965 for an interesting discussion of this last class of verbs.
form of transformational theory), plus the cost of referring to the three necessary
redundancy rules. Note that the information measure allows morphologically
derived lexical items to contain more semantic information than the rule predicts; we
use this fact crucially in describing the information content of transformationalist.
More striking examples will be given below.
With these preliminary observations, I will now present some more diverse
applications of the redundancy rules.
5.1. PREFIX-STEM VERBS. Many verbs in English can be analysed into one of the
prefixes in-, de-, sub-, ad-, con-, per-, trans- etc. followed by one of the stems -sist,
-mit, -fer, -cede, -cur etc. Chomsky & Halle argue, for phonological reasons, that
the prefix and stem are joined by a special boundary =. Whether a particular
prefix and a particular stem together form an actual word of English seems to be
an idiosyncratic fact:
(17) *transist transmit transfer *transcede *transcur
persist permit prefer precede *precur
consist commit confer concede concur
assist admit *affer accede *accur
subsist submit suffer succeed *succur
desist *demit defer *decede *decur
insist *immit infer *incede incur
We would like the information measure of the lexicon to take into account the
construction of these words in computing their information content. There are two
possible solutions. In the first, the lexicon will contain, in addition to the fully
specified lexical entries for each actually occurring prefix-stem verb, a list of the
prefixes and stems from which the verbs are formed. The redundancy rules will
contain the following morphological rule, which relates three terms:

(18) [/x=y/, +V] ↔ [/x/, +Prefix], [/y/, +Stem]
The information content of a particular prefix-stem verb will thus be the informa-
tion that there is a word, plus the semantic content of the verb (since there is no
semantic rule to go with 18, at least in most cases), plus the cost of referring to
morphological rule 18. The cost of each individual prefix and stem will be counted
only once for the entire lexicon.
Since we have up to this point been unyielding on the subject of hypothetical
lexical items, we might feel somewhat uncomfortable about introducing prefixes
and stems into the lexicon. However, this case is somewhat different from earlier
ones. In the case of perdition, the presumed root is the verb *perdite. If perdite
were entered in the lexicon, we would have every reason to believe that lexical
insertion transformations would insert perdite into deep structures, and that the
syntax would then produce well-formed sentences containing the verb perdite.
In order to prevent this, we would have to put a feature in the entry for perdite to
block the lexical insertion transformations. It is this rule feature [-Lexical
Insertion] which we wish to exclude from the theory. Consider now the lexical
entry for the prefix trans-:
(19) [/trans/
      +Prefix ]
Trans- has no (or little) semantic information, and as syntactic information has only
the marker [+Prefix]. Since the syntactic category Prefix is not generated by the
base rules of English, there is no way for trans- alone to be inserted into a deep
structure. It can be inserted only when combined with a stem to form a verb, since
the category Verb does appear in the base rules. Hence there is no need to use the
offending rule feature [-Lexical Insertion] in the entry for trans-, and no need to
compromise our earlier position on *perdite.
However, there is another possible solution which eliminates even entries like 19,
by introducing the prefixes and stems in the redundancy rule itself. In this case, the
redundancy rule consists of a single term, and may be thought of as the simplest
type of 'word-formation' rule:
(20) [/ {trans, per, con, aD, suB, de, in} = {sist, mit, fer, cede, cur, tain} /
      +V ]

The information content of prefix-stem verbs is the same as before, but the cost of
the individual prefixes and stems is counted as part of the redundancy rule, not of
the list of lexical items. The two solutions appear at this level of investigation to be
equivalent, and I know as yet of no empirical evidence to decide which should be
permitted by the theory or favored by the evaluation measure.
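Under either solution the division of labor is the same: the rule defines the possible prefix=stem shapes, and the lexicon lists which of them are words. The decomposition below follows table 17 with assimilated spellings collapsed onto their row prefixes (e.g. ad=sist surfaces as assist, sub=fer as suffer); it is illustrative only, not a claim about the correct analysis of any individual verb.

```python
# The possible prefix-stem verbs defined by rules 18/20, versus the
# attested subset listed in table 17 (as prefix-stem pairs, not spellings).
from itertools import product

prefixes = ["trans", "per", "con", "ad", "sub", "de", "in"]
stems = ["sist", "mit", "fer", "cede", "cur"]
possible = set(product(prefixes, stems))

attested = {
    ("trans", "mit"), ("trans", "fer"),
    ("per", "sist"), ("per", "mit"), ("per", "fer"), ("per", "cede"),
    ("con", "sist"), ("con", "mit"), ("con", "fer"), ("con", "cede"), ("con", "cur"),
    ("ad", "sist"), ("ad", "mit"), ("ad", "cede"),
    ("sub", "sist"), ("sub", "mit"), ("sub", "fer"), ("sub", "cede"),
    ("de", "sist"), ("de", "fer"),
    ("in", "sist"), ("in", "fer"), ("in", "cur"),
}

assert attested <= possible      # every attested verb fits the rule
assert len(possible) == 35       # 7 prefixes x 5 stems
print(f"{len(attested)} attested of {len(possible)} possible")
```

Which cells of the grid are filled is exactly the idiosyncratic information each lexical entry must pay for.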
This is the case of morphological redundancy without semantic redundancy
promised in §4. Since, for the most part, prefixes and stems do not carry semantic
information, it is not possible to pair 18 or 20 with a semantic rule.8 Obviously
the information measure must permit the morphological redundancy anyway.
Besides complete redundancy, we now have three cases to consider: those in which
a semantic redundancy rule relates a word to a non-lexical root (e.g. perdition),
those in which the semantic rule relates a word incorrectly to a lexical root (e.g.
professor), and those in which there is no semantic rule at all. The three cases are
independent, and a decision on one of them need not affect the others. Thus the
decision to allow morphological redundancy for prefix-stem verbs still leaves open
the question raised in §4 of how to treat professor.
It should be pointed out that word-formation rules like 20 are very similar to
8 If they did carry semantic information, it would be more difficult, but not necessarily
impossible, to state the rule in the form of 20. This is a potential difference between the solutions,
concerning which I have no evidence at present.
Halle's word-formation rules (1973). The major difference here between his theory
and mine is that his lexicon includes, in addition to the dictionary, a list of all
MORPHEMES in the language, productive and unproductive. The present theory
lists only WORDS in the lexicon. Productive affixes are introduced as part of lexical
redundancy rules, and non-productive non-lexical morphemes (such as *perdite)
do not appear independently anywhere in the lexical component. Other than the
arguments already stated concerning the feature [Lexical Insertion], I know of little
evidence to distinguish the two solutions. However, since Halle has not formulated
the filter, which plays a crucial role in the evaluation measure for his theory of the
lexicon, it is hard to compare the theories on the level where the present theory
makes its most interesting claims.

5.2. NOUN COMPOUNDS. The compound nouns in 21 are all formed by conca-
tenating two nouns:
(21) a. garbage man, iceman, milkman, breadbasket, oil drum
b. snowman, gingerbread man, bread crumb, sand castle
c. bulldog, kettledrum, sandstone, tissue paper
Although the meaning of each compound is formed from the meanings of the two
constituent nouns, the way in which the meaning is formed differs from line to line.
Part of a speaker's knowledge of the English lexicon is the way in which the
meanings of compounds are related to the meanings of their constituents: thus we
would say that someone did not know English if he (seriously) used garbage man to
mean 'a man made out of garbage', by analogy with snowman.
If one brought Lees 1960 up to date, one would get an approach to compounds
which uses transformations to combine nouns randomly, controlled by exception
features so as to produce only the existing compounds with the correct meanings.
But how can such exception features be formulated? Either noun in a compound
can be changed, with a corresponding change in acceptability: we have garbage
man, garbage truck, but not *garbage gingerbread, *garbage tree; we also have
garbage man, gingerbread man, but not *ant man, *tissue man. Thus the use of
exception features will require each noun in the lexicon to be cross-listed with
every other noun for the compounding transformations. Furthermore, since
gingerbread is itself a compound, ginger, bread, and man will all somehow have to be
related by the exception features. In the end, the exception features appear to be
equivalent to a listing of all the existing compounds along with their meanings.
In the lexicalist theory, we can dispense with exception features in the description
of compounds. We simply give each actually occurring compound a fully specified
lexical entry, and in the list of redundancy rules we enter morphological rule
22 and semantic rules 23a,b,c, describing the data of 21a,b,c respectively. Of
course, there are a great number of additional semantic rules (cf. Lees, chapter 4);
I list only these three as a sample:

(22) [/[N x] [N y]/, +N] ↔ [/x/, +N], [/y/, +N]


(23) a. [Z THAT CARRIES W, +N] ↔ [W, +N], [Z, +N]
     b. [Z MADE OF W, +N] ↔ [W, +N], [Z, +N]
     c. [Z LIKE A W, +N] ↔ [W, +N], [Z, +N]

The redundancy rules thus define the set of possible compounds of English, and the
lexicon lists the actually occurring compounds.
The information measure 16 gives an intuitively correct result for the independent
information in compounds. For example, since the nouns garbage and man are
in the lexicon, all their information will be counted as redundant in evaluating the
entry for garbage man. Thus the independent information content of garbage man
will be the information that such a word exists, plus any idiosyncratic facts about
the meaning (e.g. that he picks up rather than delivers garbage), plus the cost of
referring to 22 and 23a. The information of a complex compound like gingerbread
man is measured in exactly the same way; but the independent information in its
constituent gingerbread is reduced because of its relation to ginger and bread.
Gingerbread man is thus parallel in its evaluation to the case of transformationalist
cited earlier.
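The way the rules pair each listed compound with one of several possible readings can be sketched as follows. The paraphrase templates and mini-lexicon are illustrative stand-ins, not Jackendoff's formalism; the point is that the rules define the possible readings, while the lexicon records which reading each actual compound has.

```python
# S-rules 23a-c as reading templates; each listed compound records which
# rule it idiosyncratically uses (that choice is the non-redundant part).
S_RULES = {
    "23a": "{z} that carries {w}",
    "23b": "{z} made of {w}",
    "23c": "{z} like a {w}",
}

LEXICON = {
    ("garbage", "man"): "23a",
    ("snow", "man"): "23b",
    ("bull", "dog"): "23c",
}

def reading(w, z):
    return S_RULES[LEXICON[(w, z)]].format(w=w, z=z)

assert reading("garbage", "man") == "man that carries garbage"
assert reading("snow", "man") == "man made of snow"
# A speaker who read "garbage man" via 23b ("man made of garbage")
# would simply not know the English lexicon.
```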
Now consider the problem of evaluating the following nouns:
(24) a. blueberry, blackberry
b. cranberry, huckleberry
c. gooseberry, strawberry
Blueberry and blackberry are obviously formed along the lines of morphological
rule 25 and semantic rule 26. This combination of rules also forms flatiron,
highchair, madman, drydock, and many others.

(25) [/[A x] [N y]/, +N] ↔ [/x/, +A], [/y/, +N]

(26) [Z WHICH IS W, +N] ↔ [W, +A], [Z, +N]

Thus blueberry and blackberry are evaluated in exactly the same way as garbage
man.
Cranberry and huckleberry contain one lexical morpheme and one non-lexical
morpheme. The second part (-berry) and its associated semantics should be redun-
dant, but the phonological segments /kræn/ and /hʌkl/, and the semantic charac-
teristics distinguishing cranberries and huckleberries from other kinds of berries,
must be non-redundant. Hence this case is just like perdition, where a non-lexical
root is involved, and the information measure formulated for the case of per-
dition will yield the intuitively correct result. One problem is that the lexical
categories of cran- and huckle- are indeterminate, so it is unclear which mor-
phological rule applies. Likewise, it is unclear which semantic rule applies. However,
I see nothing against arbitrarily applying the rules which cost least; this conven-
tion will minimize the information in the lexicon without jeopardizing the generality
of the evaluation procedure.
We observe next that gooseberry and strawberry contain two lexical morphemes
and are both berries, but gooseberries have nothing to do with geese and straw-
berries have nothing to do with straw. This case is thus like professor, which has
nothing to do semantically with the verb profess, and exactly the same question
arises in their evaluation: should they be intermediate in cost between the previous
two cases, or should they be evaluated like cranberry, with straw- and goose-
counted as non-redundant? The fact that there is pressure towards phonological
similarity even without semantic basis (e.g. gooseberry was once groseberry) is
some evidence in favor of the permissive form of 16, in which morphological
similarity alone is sufficient for redundancy.
Another semantic class of compound nouns (exocentric compounds) differs
from those mentioned so far in that neither constituent describes what kind of
object the compound is. For example, there is no way for a non-speaker of English
to know that a redhead is a kind of person, but that a blackhead is a kind of
pimple.9 Other examples are redwing (a bird), yellow jacket (a bee), redcoat (a
soldier), greenback (a bill), bigmouth (a person), and big top (a tent). The mor-
phological rule involved is 25; the semantic rule must be

(27) [THING WITH A Z WHICH IS W, +N] ↔ [W, +A], [Z, +N]

This expresses the generalization inherent in these compounds, but it leaves open
what kind of object the compound refers to. The information measure gives as the
cost of redhead, for example, the information that there is a word, plus the informa-
tion that a redhead is a person (a more fully specified form of THING in 27), plus
the cost of referring to 27. This evaluation reflects precisely what a speaker must
learn about the word.
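What rule 27 predicts and what must still be listed can be sketched as follows, with an invented mini-lexicon; the listed "kind" stands for the more fully specified form of THING that the speaker must learn.

```python
# Rule 27 supplies the frame "THING with a Z which is W"; each exocentric
# entry must still supply what kind of THING, which is the learned residue.
EXOCENTRIC = {          # (adjective, noun) -> kind of THING
    ("red", "head"): "person",
    ("black", "head"): "pimple",
    ("green", "back"): "bill",
}

def gloss(w, z):
    kind = EXOCENTRIC[(w, z)]           # the non-redundant information
    return f"{kind} with a {z} which is {w}"

assert gloss("red", "head") == "person with a head which is red"
assert gloss("black", "head") == "pimple with a head which is black"
```

The rule relates the entries only partially: everything except the value of `kind` is redundant, and that partiality is exactly what a transformation cannot express.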
A transformationaltheory of compound formation, on the other hand, encounters
severe complication with this class of compounds. Since a compounding trans-
formation must preserve functional semantic content, the underlying form of
redhead must contain the information that a redhead is a person and not a pimple,
and this information must be captured somehow in rule features (or derivational
constraints) which are idiosyncratic to the word redhead. I am sure that such con-
straints can be formulated, but it is not of much interest to do so. The need for
9 I am grateful to Phyllis Pacin for this example.
these elaborate rule features stems from the nature of transformations. Any
phrase-marker taken as input to a particular transformation corresponds to a set
of fully specified output phrase-markers. In the case of exocentric compounds, the
combination of the two constituent words by rule 27 does not fully specify the
output, since the nature of THING in 27 is inherently indeterminate.
We thus see an important empirical difference between lexical redundancy
rules and transformations: it is quite natural and typical for lexical redundancy
rules to relate items only partially, whereas transformations cannot express partial
relations. Several illustrations of this point have appeared already, in the mor-
phological treatment of perdition and cranberry and in the semantic treatment of
transformationalist. However, the case of exocentric compounds is perhaps the
most striking example, since no combination of exception features and hypothetical
lexical items can make the transformational treatment appear natural. The lexicalist
treatment, since it allows rules to relate exactly as much as necessary, handles
exocentric compounds without any remarkable extensions of the machinery.
5.3. CAUSATIVE VERBS. There is a large class of verbs which have both transitive
and intransitive forms;10 e.g.,
(28) a. The door opened.
b. Bill opened the door.
(29) a. The window broke.
b. John broke the window.
(30) a. The coach changed into a pumpkin.
b. Mombi the witch changed the coach from a handsome young man
into a pumpkin.
It has long been a concern of transformational grammarians to express the fact
that the semantic relations of door to open, of window to break, and of coach to
change are the same in the transitive and intransitive cases.
There have been two widely accepted approaches, both transformational in
character. The first, that of Lakoff 1971b, claims that the underlying form of the
transitive sentence contains the intransitive sentence as a complement to a verb of
causation-i.e., that the underlying form of 28b is revealed more accurately in the
sentence Bill caused the door to open. The other approach, case grammar, is that of
Fillmore 1968. It claims that the semantic relation of door to open is expressed
syntactically in the deep structures of 28a and 28b, and that the choice of subject
is a purely surface fact. The deep structures are taken to be 31a and 31b respectively:
(31) a. past open [Objective the door]
     b. past open [Objective the door] [Agentive by Bill]
These proposals and their consequences have been criticized on diverse syntactic
and semantic grounds (cf., e.g., Chomsky 1972, Fodor 1970, and Jackendoff
1972, Chap. 2); I do not intend to repeat those criticisms here. It is of interest to
note, however, that Lakoff's analysis of causatives is the opening wedge into the
generative semanticists' theory of lexicalization: if the causative verb break is the
result of a transformation, we would miss a generalization about the nature of

10 This class may also include the two forms of begin proposed by Perlmutter 1970.
agentive verbs by failing to derive the causative verb kill by the same transformation.
But since kill has as intransitive parallel not kill but die, and since there are many
such causative verbs without morphologically related intransitives, the only way
to avoid an embarrassing number of exceptions in the lexicon is to perform lexical
insertion AFTER the causative transformation, as proposed by McCawley 1968.
Again, the difficulty in this solution lies in the nature of transformations. There
are two cross-classifying generalizations which a satisfactory theory must express:
all causative verbs must share a semantic element in their representation; and the
class of verbs which have both a transitive causative form and an intransitive non-
causative form must be described in a general fashion. Expressing the second
generalization with a transformation implies a complete regularity, which in turn
loses the first generalization; McCawley's solution is to make a radical move to
recapture the first generalization.
There remains the alternative of expressing the second generalization in a way
that does not disturb the first. Fillmore's solution is along these lines; but he still
requires a radical change in the syntactic component, viz. the introduction of case
markers.
The lexicalist theory can leave the syntactic component unchanged by using the
power of the lexicon to express the partial regularity of the second generalization.
The two forms of break are assigned separate lexical entries:
(32) a. [/brek/
         +V
         +[NP1 __]
         NP1 BREAK ]
     b. [/brek/
         +V
         +[NP2 __ NP1]
         NP2 CAUSE (NP1 BREAK) ]
The two forms are related by the following morphological and semantic rules:11

(33) a.  [+V]  ↔  [+V]

     b.  +[NP1 ___]          +[NP2 ___ NP1]
         NP1 W          ↔    NP2 CAUSE (NP1 W)
Thus the independent information contained in the two entries for break is the fact
that there are two words,12 plus the independent information in the intransitive
form 32a, plus the cost of referring to the redundancy rules. Hence the relation
between the (a) and (b) sentences in 28-30 is expressed in the lexicon and not in
the transformational component.

11 Since 33a is an identity rule, it is possibly dispensable. I have included it here for the sake of
explicitness, and also in order to leave the form of the information measure unchanged.
12 Perhaps the use of the identity rule 33a could make the two words count as one, if this were
desirable. I have no intuitions on the matter, so I will not bother with the modification.
660 LANGUAGE, VOLUME 51, NUMBER 3 (1975)

This solution permits us still to capture the semantic similarity of all causative
verbs in their lexical entries; thus die and kill will have entries 34a and 34b
respectively:
(34) a.  /day/
         +V
         +[NP1 ___]
         NP1 DIE

     b.  /kil/
         +V
         +[NP2 ___ NP1]
         NP2 CAUSE (NP1 DIE)
Die and kill are related semantically in exactly the same way as the two entries of break:
one is a causative in which the event caused is the event described by the other.
However, since there is no morphological rule relating 34a-b, the information
measure does not relate them; the independent information contained in the two
entries is the fact that there are two words, plus all the information in both entries.
Thus the lexicalist theory successfully expresses the relation between the two
breaks and their relation to kill and die, without in any sense requiring kill and die
to be exceptional, and without making any radical changes in the nature of the
syntactic component.
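The bookkeeping this account performs over entries like 32 and 34 can be sketched in code. The fragment below is a minimal illustrative model, not part of the theory itself: the `Entry` record, its field names, and the two helper functions are my own stand-ins for the morphological rule 33a and the semantic rule 33b.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    phon: str     # phonological form
    subcat: str   # strict subcategorization frame
    sem: str      # semantic representation

# Separate entries for the two breaks (32) and for die and kill (34).
break_i = Entry("brek", "+[NP1 __]",     "NP1 BREAK")
break_t = Entry("brek", "+[NP2 __ NP1]", "NP2 CAUSE (NP1 BREAK)")
die     = Entry("day",  "+[NP1 __]",     "NP1 DIE")
kill    = Entry("kil",  "+[NP2 __ NP1]", "NP2 CAUSE (NP1 DIE)")

def semantically_causative(b, a):
    """Semantic half of rule 33b: b's reading embeds a's under CAUSE."""
    return (a.subcat == "+[NP1 __]"
            and b.subcat == "+[NP2 __ NP1]"
            and b.sem == "NP2 CAUSE (%s)" % a.sem)

def morphologically_related(b, a):
    """Morphological half (rule 33a): here simply identity of form."""
    return b.phon == a.phon

# break/break: both halves of 33 apply, so most of the transitive
# entry counts as redundant under the information measure.
assert semantically_causative(break_t, break_i)
assert morphologically_related(break_t, break_i)
# kill/die: the semantic relation holds, but with no morphological
# rule relating them, the measure treats the entries as independent.
assert semantically_causative(kill, die)
assert not morphologically_related(kill, die)
```

The point of the sketch is only that the two relations can hold or fail independently, which is exactly what a transformation cannot express.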
A further possibility suggested by this account of causative verbs is that the
partial regularities of the following examples from Fillmore are also expressed in the
lexicon:
(35) a. Bees swarmed in the garden.
We sprayed paint on the wall.
b. The garden swarmed with bees.
We sprayed the wall with paint.
Fillmore seeks to express these relationships transformationally, but he encounters
the uncomfortable fact that the (a) and (b) sentences are not synonymous: the (b)
sentences imply that the garden was full of bees and that the wall was covered with
paint, but the (a) sentences do not carry this implication. Anderson 1971 shows
that this semantic difference argues against Fillmore's analysis, and in favor of one
with a deep-structure difference between the (a) and (b) sentences. A lexical
treatment of the relationship between the two forms of swarm and spray could express
the difference in meaning, and would be undisturbed by the fact that some verbs,
such as put, have only the (a) form and meaning, while others, such as fill, have
only the (b) form and meaning. This is precisely parallel to the break-break vs.
die-kill case just discussed.
Consider also some of the examples mentioned in Chomsky 1970. The relation of
He was amused at the stories and The stories amused him can be expressed in the
lexicon, and no causative transformation of the form Chomsky proposes need be
invoked. The nominalization his amusement at the stories contrasts with *the stories'
amusement of him because amusement happens to be most directly related to the
adjectival amused at rather than to the verb amuse. Other causatives do have
nominalizations, e.g. the excitation of the protons by gamma rays. I take it then that
the existence of only one of the possible forms of amusement is an ad-hoc fact,
expressed in the lexicon.
Chomsky also cites the fact that the transitive use of grow, as in John grows
tomatoes, does not form the nominalization *the growth of tomatoes by John.
Rather the growth of tomatoes is related to the intransitive tomatoes grow. Again
we can express this fact by means of lexical relations. This time, the relation is
perhaps more systematic than with amusement, since nouns in -th, such as width and
length, are generally related to intransitive predicates. Thus the meaning of growth
can be predicted by the syntactic properties of the redundancy rule which introduces
the affix -th. The transitive grow does in fact have its own nominalization: the
growing of tomatoes by John. Thus Chomsky's use of causatives as evidence for the
Lexicalist Hypothesis seems incorrect, in that causatives do have nominalizations,
contrary to his claim. But we can account for the unsystematicity of the nominaliza-
tions, as well as for what regularities do exist, within the present framework.
Note also that our account of causatives extends easily to Lakoff's class of
inchoative verbs (1971b). For example, the relation of the adjective open to the
intransitive verb open ('become open') is easily expressed in a redundancy rule
similar to that proposed for causatives.
As further evidence for the lexicalist theory, consider two forms of the verb
smoke:

(36) a. The cigar smoked.
        The chimney smoked.
     b. John smoked the cigar.
        *John smoked the chimney.
The intransitive verb smoke means 'give off smoke'; it is related to the noun smoke
by a redundancy rule that applies also to the appropriate senses of steam, smell,
piss, flower, and signal. The transitive form of smoke in the sense of 36b is partially
related to the intransitive form by 33 in that it means 'cause to give off smoke',
but it contains additional information-something like 'by holding in the mouth
and puffing'. This information is not predictable from the redundancy rule, but it
provides the clue to the anomaly of *John smoked the chimney (of course, if John were
a giant, he might well use a chimney like a pipe, and then the sentence might be
acceptable). A transformational theory has no way to capture this partial generaliza-
tion without artificiality. The lexicalist theory simply counts the unpredictable
information as non-redundant, and the predictable information as redundant.
While we are on the subject of smoke, it may be interesting to point out some
other senses of smoke as illustration. Call the noun smoke1, and the intransitive
and transitive senses just discussed smoke2 and smoke3 respectively. There is
another transitive verb smoke4, which means 'permeate or cover with smoke' as
in John smoked the ham. The redundancy rule relating smoke4 to smoke1 is also
seen in verbs like paint, another sense of steam, water (in water the garden), and
powder (as in powder your nose), flour, and cover. There is another intransitive
smoke5, meaning 'smoke3 something'. The ambiguity in John is smoking is between
smoke2 and smoke5. Smoke5 is related to smoke3 by a redundancy rule that also
handles two forms of eat, drink, draw, read, cook, and sing. From smoke3 we also
get the nominalization smoke6, 'something that is smoked3' (e.g. A cigar is a good
smoke) by the redundancy rule that also gives the nouns drink, desire, wish, dream,
find, and experience. The verb milk (as in milk a cow) is related to the noun as
smoke1 and smoke3 are related, but without an intermediate *The cow milked
('The cow gave off milk'); the relation between the two milks requires two sets of
redundancy rules used together. We thus see the rich variety of partial regularities
in lexical relations: their expression in a transformational theory becomes hard to
conceive, but they can be expressed quite straightforwardly in the lexicalist
framework.
5.4. IDIOMS. Idioms are fixed syntactic constructions which are made up of words
already in the lexicon, but which carry meanings independent of the meanings of
their constituents. Since the meanings are unpredictable, the grammar must rep-
resent a speaker's knowledge of what constructions are idioms and what they
mean. The logical place to list idioms is of course in the lexicon, though it is not
obvious that the usual lexical machinery will suffice.
Fraser 1970 discusses three points of interest in the formalization of idioms.
First, they are constructed from known lexical items; the information measure,
which measures how much the speaker must learn, should reflect this. Second,
they are for the most part constructed in accordance with known syntactic rules
(with a few exceptions such as by and large), and in accordance with the syntactic
restrictions of their constituents. Third, they are often resistant to normally applic-
able transformations; e.g., The bucket was kicked by John has only the non-idio-
matic reading. I have nothing to say about this third consideration, but the first
two can be expressed in the present framework without serious difficulty.
Let us deal first with the question of the internal structure of idioms. Since we
have given internal structure to items like compensation and permit, there seems
to be nothing against listing idioms too, complete with their structure. The only
difference in the lexical entries is that the structure of idioms goes beyond the word
level. We can thus assign the lexical entries in 37 to kick the bucket, give hell to, and
take to task.13
(37) a.  NP1 [VP [V kik] [NP [Art ðə] [N bʌkət]]]
         NP1 DIE

     b.  NP1 [VP [V giv] [NP [N hel]] [PP [P tu] NP2]]
         NP1 YELL AT NP2

     c.  NP1 [VP [V tek] NP2 [PP [P tu] [NP [N tæsk]]]]
         NP1 CRITICIZE NP2
The lexical insertion rule will operate in the usual way, inserting the lexical entries
onto deep phrase markers that conform to the syntactic structure of the lexical
entries. Since the structure of the entries goes beyond the word level, the idiom
must be inserted onto a complex of deep-structure nodes, in contrast to ordinary
words which are inserted onto a single node.
13 The normal notation for strict subcategorization restrictions is difficult to apply in this
case, so I have for convenience adopted a notation in which the strict subcategorization
conditions are combined with the phonological and syntactic representations, in an obvious
fashion. No particular theoretical significance is intended by the change in notation. This
proposal, which appears to be much like that of Katz 1973, was arrived at independently.

As with ordinary lexical entries, the strictly subcategorized NP's must have a
specific grammatical relation with respect to the entry, and this is indicated in the
entries of 37. In the case of take NP to task, the strictly subcategorized direct
object is in fact surrounded by parts of the idiom; i.e., the idiom is discontinuous.
But in the present theory, this appears not to be cause for despair, as our formalisms
seem adequate to accommodate a discontinuous lexical item.
This last observation enables us to solve a puzzle in syntax: which is the under-
lying form in verb-particle constructions, look up the answer or look the answer
up? The standard assumption (cf. Fraser 1965) is that the particle has to form a
deep-structureconstituent with the verb in order to formulate a lexical entry; hence
look up the answer is underlying, and the particle movement transformation is a
rightward movement. But Emonds 1972 gives strong syntactic evidence that the
particle movement rule must be a leftward movement. He feels uncomfortable about
this result because it requires that look ... up be discontinuous in deep structure;
he consoles himself by saying that the same problem exists for take ... to task,
but does not provide any interesting solution. Having given a viable entry for
take ... to task, we can now equally well assign discontinuous entries to idiomatic
verb-particle constructions, vindicating Emonds' syntactic solution.
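A viable entry for a discontinuous item can be modeled schematically. The sketch below is my own illustration, not the paper's formalism: an idiom entry is a sequence of fixed pieces with an open slot, a strictly subcategorized NP is crudely treated as a single word, and insertion succeeds only when the fixed pieces match in order.

```python
# Discontinuous idiom entry: fixed pieces with an open slot (None)
# standing in for the strictly subcategorized direct object NP2.
TAKE_TO_TASK = ["take", None, "to", "task"]

def matches_idiom(entry, words):
    """Insertion onto a complex of nodes: every fixed piece must
    match in order; the None slot accepts any single constituent."""
    if len(entry) != len(words):
        return False
    return all(piece is None or piece == word
               for piece, word in zip(entry, words))

assert matches_idiom(TAKE_TO_TASK, ["take", "Bill", "to", "task"])
assert not matches_idiom(TAKE_TO_TASK, ["take", "Bill", "to", "church"])
```

The same representation would serve for an idiomatic verb-particle entry such as look ... up, with the particle as a fixed piece following the slot.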
By claiming that the normal lexical-insertion process deals with the insertion of
idioms, we accomplish two ends. First, we need not complicate the grammar in
order to accommodate idioms. Second, we can explain why idioms have the
syntactic structure of ordinary sentences: if they did not, the lexical insertion rules
could not insert them onto deep phrase markers. Our account of idioms thus has
the important virtue of explaining a restriction in terms of already existing
conventions in the theory of grammar: good evidence for its correctness.
Now that we have provided a way of listing idioms, how can we capture the
speaker's knowledge that idioms are made up of already existing words? To relate
the words in the lexicon to the constituents of idioms, we need morphological
redundancy rules. The appropriate rules for kick the bucket must say that a verb
followed by a noun phrase forms a verb phrase, and that an article followed by a
noun forms a noun phrase. But these rules already exist as phrase-structure rules
for VP and NP. Thus, in the evaluation of idioms, we must use the phrase-structure
rules as morphological redundancy rules. If this is possible, the independent in-
formation in kick the bucket will be the information that it is a lexical entry, plus
the semantic information DIE, plus the cost of referring to the phrase-structure rules
for VP and NP.
Though mechanically this appears to be a reasonable solution, it raises the
disturbing question of why the base rules should play a role in the information
measure for the lexical component. Some discussion of this question will appear
in §7. At this point I will simply note that this solution does not have very drastic
consequences for grammatical theory. Since the base rules can be used as redun-
dancy rules only if lexical entries go beyond the word level, no descriptive power
is added to the grammar outside the description of idioms. Therefore the proposal
is very limited in scope, despite its initially outrageous appearance.
If the base rules are used as morphological redundancy rules for idioms, we
might correspondingly expect the semantic projection rules to be used as semantic
redundancy rules. But of course this cannot be the case, since then an idiom would
have exactly its literal meaning, and cease to be an idiom. So we must assume that
the permissiveversion of the information measure is being used: both morphological
and semantic redundancy rules exist, but only the morphological rules apply in
reducing the independent information in the idiom. This is further evidence that the
permissive version of the information measure must be correct.
Note, by the way, that a transformational theory of nominalization contains
absolutely no generalization of the approach that accounts for idioms. Thus the
lexicalist hypothesis proves itself superior to the transformational hypothesis in a
way totally unrelated to the original arguments deciding between them.

6. THE COST OF REFERRING TO REDUNDANCY RULES. In evaluating the independent
information of lexical entries, we have continually included the cost of referring to
redundancy rules. We have not so far specified how to calculate this cost, or how to
relate it quantitatively to other costs in the lexicon. In this section I will propose
some preliminary answers to those questions.
In the discussion of the full-entry theory in §2, I said that the cost of referring to a
redundancy rule in evaluating a lexical entry represents one's knowledge of which
regularities hold in that particular lexical entry. In order to be more specific, let us
reconsider the meaning of the information measure in the full-entry theory. In
measuring the independent information contained in a lexical entry, we are in
effect measuring how much new information one needs in order to learn that
lexical item. If the lexical item is totally unrelated to anything else in the lexicon,
one must learn it from scratch. But if there is other lexical information which helps
one know in advance some of the properties of the new word, there is less to learn;
this is captured in clause (b) of the information measure.
In learning that a new lexical item can be formed on the basis of an old lexical
item and a redundancy rule, however, something must be learned besides the
identity of the old lexical item: namely, which redundancy rule to apply. For
example, part of one's knowledge of the lexicon of English is the fact that the
nominalizations of refuse and confuse are refusal and confusion, not *refusion and
*confusal, although in principle the latter forms could exist. That is, in learning the
words refusal and confusion, one must learn the arbitrary fact that, of the choice of
possible nominal affixes, refuse uses -al and confuse uses -ion. Clause (c) of the
information measure, the cost of referring to the redundancy rule, is meant to
represent this knowledge. I am claiming therefore that the evaluation of refusal
must take into account the fact that it, and not *refusion, is the proper
nominalization of refuse.
For a clear case of the use of clause (c), let us turn to another example. Botha
1968 discusses the process of nominal compounding in Afrikaans, which contains
many compounds which are morphologically simple concatenations of two nouns,
as in English. But there are also many compounds in which the two nouns are
joined by a 'link phoneme' s or ə. Botha demonstrates at great length that there
is no phonological, morphological, syntactic, or semantic regularity in the use
of link phonemes; i.e., the link phoneme must be learned as an idiosyncrasy of
each individual compound.

In the present theory, the Afrikaans lexicon contains three morphological rules
for noun compounds:

(38) a.  /[N x] [N y]/           /x/      /y/
         [+N]               ↔    [+N]  ,  [+N]

     b.  /[N x] s [N y]/         /x/      /y/
         [+N]               ↔    [+N]  ,  [+N]

     c.  /[N x] ə [N y]/         /x/      /y/
         [+N]               ↔    [+N]  ,  [+N]
Since all the morphological information of a particular compound is predicted by
one of the three rules in 38, clause (b) of the information measure contributes
nothing to the information content of the compound. But since the speaker
must learn which of the three is appropriate, clause (c) must contribute the cost of
the information involved in making this choice.
A third example involves inflectional morphology. Halle 1973 argues that
paradigmatic information should be represented in the dictionary, and in fact that
only and all fully inflected forms should be entered. As a consequence, the lexical
insertion rules must enter partial or complete paradigms into deep structures, and
the rules of concord must have the function of filtering out all but the correct
forms, rather than that of inserting inflectional affixes.14 Under Halle's proposal,
part of the task of the lexical component of English is to list the correspondences
between the present and past tense forms of verbs. Accordingly, we can state a
few morphological redundancy rules relating present to past tense forms in English:

(39) a.  /x/                  /x+d/
         +[V +pres]     ↔     +[V +past]

     b.  /C0VC0/              /C0VC0+t/
         +[V +pres]     ↔     +[V +past]

     c.  /C0 [αback, αround] C0/         /C0 [-αback, -αround] C0/
         +[V +pres]                 ↔    +[V +past]

     d.  /C0VC0/              /C0ɔ+t/
         +[V +pres]     ↔     +[V +past]
14 This of course requires rules of concord to be of a different formal nature than ordinary
transformations. But perhaps this is not such a bad result, considering that the most convincing
cases for Lakoff's global rules seem to be in this area. An independent argument that concord
rules differ formally from transformations could serve as evidence that transformations need
not be global: only the very limited class of concord rules, which are no longer transformations
at all, need information from various levels of derivation. This more highly structured theory
reduces the class of possible grammars.

Here 39a is the regular rule for forming past tenses, and the other three represent
various irregular forms: 39b relates keep-kept, dream-dreamt, lose-lost, feel-felt
etc.; 39c relates tell-told, cling-clung, hold-held, break-broke etc.; the very marginal
and strange 39d relates just the six pairs buy-bought, bring-brought, catch-caught,
fight-fought, seek-sought, and think-thought. Note that 39b-c take over the
function of the 'precyclic re-adjustment rules' described by Chomsky & Halle
(209-10).15
A final preliminary point in this example: in the evaluation of a paradigm by the
information measure, I assume that the information that a word exists is counted
only once for the entire paradigm. Although one does have to learn whether a verb
has a nominalization, one knows for certain that it has a past tense, participles, and
a conjugation. Therefore the information measure should not count knowledge
that inflections exist as anything to be learned.
Now let us return to the problem of measuring the cost of referring to a redun-
dancy rule. Intuitively, the overwhelmingly productive rule 39a should cost virtually
nothing to refer to; the overwhelmingly marginal rules 39b-d should cost a great
deal to refer to, but less than the information they render predictable. The disparity
in cost reflects the fact that, in choosing a past tense form, 39a is ordinary and
unremarkable, so one must learn very little to use it; but the others are unusual or
'marked' choices, and must be learned. We might further guess that 39b-c, which
each account for a fair number of verbs, cost less to refer to than 39d, which applies
to only six forms (but which is nevertheless perceived as a minor regularity). Still,
the pair buy-bought contains less independent information than the totally irregular
pair go-went, which must be counted as two independent entries.
These considerations lead to a formulation of the cost of reference something
like
(40) The cost of referring to redundancy rule R in evaluating a lexical entry
     W is IR,W x PR,W, where IR,W is the amount of information in W
     predicted by R, and PR,W is a number between 0 and 1 measuring the
     regularity of R in applying to the derivation of W.
For an altogether regular rule application, such as the use of 39a with polysyllabic
verbs, PR,W will be zero. With monosyllabic verbs and 39a, PR,W will be almost but
not quite zero; the existence of alternatives means that something must be learned.
For 39b-d, PR,W will be close to 1; their being irregular means that their use does
not reduce the independent information content of entries nearly as much as 39a.
In particular, 39d will reduce the independent information content hardly at all.
In fact, it is quite possible that the total information saved by 39d in the evaluation
of the six relevant pairs of lexical entries is less than the cost of stating the rule.
Our evaluation measure thus reflects the extremely marginal status of this rule. In
other cases, perhaps the nominalizing affixes and Afrikaans compounds, the various
possible derived forms are in more equal competition, and PR,W will have a value
of, say, 0.3.

15 I have not considered the question of how to extend the phonological generalization of
39c to other alternations such as mouse-mice, long-length. Perhaps the only way to do this is to
retain the rule in the phonology, and simply let the lexical redundancy rule supply a rule feature.
But a more sophisticated account of the interaction of the morphological rules might capture
this generalization without a rule feature; e.g., one could consider factoring morphological rules
into phonological and syntactic parts, as we factored out separate morphological and semantic
rules in §4. In any event, I am including all the phonology in 39 because many people have been
dissatisfied with the notion of re-adjustment rules: I hope that bringing up an alternative may
stimulate someone to clarify the notion.
I will not suggest a precise method of calculating PR,W, as I believe it would be
premature. However, the general concept of how it should be formulated is fairly
clear. Count a lexical pair related by R as an ACTUAL use of R. Count a lexical
entry which meets one term of the structural description of R, but in whose
evaluation R plays no role, as a NON-USE of R. For example, confuse counts as a non-use
of the rule introducing the -al nominal affix, since it meets the structural description
of the verbal term of the rule, but there is no noun confusal. The sum of the actual
uses and the non-uses is the number of POTENTIAL uses of R. PR,W should be near
zero when the number of actual uses of R is close to the number of potential uses;
PR,W should be near 1 when the number of actual uses is much smaller than the
number of potential uses; and it should rise monotonically from the former
extreme to the latter.
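The intended behavior of PR,W can be made concrete with a small computation. In the sketch below, the function names and the particular monotone function (1 minus the ratio of actual to potential uses) are my own illustrative choices; the text commits itself only to the endpoint behavior and the monotone rise between them.

```python
def regularity_penalty(actual_uses, potential_uses):
    """PR,W: near 0 when almost every potential use of R is actual,
    near 1 when actual uses are a small fraction of potential uses.
    One simple monotone choice: 1 - actual/potential."""
    if potential_uses == 0:
        return 1.0
    return 1.0 - actual_uses / potential_uses

def cost_of_reference(info_predicted, actual_uses, potential_uses):
    """Clause (c) of the information measure, as in 40: IR,W x PR,W."""
    return info_predicted * regularity_penalty(actual_uses, potential_uses)

# A fully productive rule like 39a: virtually every verb uses it,
# so referring to it is almost free.
assert regularity_penalty(9990, 10000) < 0.01
# A marginal rule like 39d: six actual uses among many monosyllabic
# verbs, so a reference charges back most of the predicted information.
assert regularity_penalty(6, 600) > 0.95
```

On this model, tightening the structural description of a rule (as with the mid-vowel condition on 39b) shrinks the pool of potential uses and so lowers the penalty, exactly as described in the following paragraph.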
If phonological conditions can be placed on the applicability of a redundancy
rule, PR,W decreases; i.e., the rule becomes more regular. For example, if the actual
uses of 39b all contain mid vowels (as I believe to be the case), then this specification
can be added to the vowel in 39b, reducing the potential uses of the rule from the
number of monosyllabic verbs to the number of such verbs with mid vowels. Since
the number of actual uses of the rule remains the same, PR,W is reduced; and,
proportionately, so is the cost of referring to 39b in the derivations where it is
involved.
It is obvious that this concept of PR,W must be refined to account for derivations
such as perdition with non-lexical sources; for compounding, where the number of
potential uses is infinite because compounds can form parts of compounds; and for
prefix-stem verbs, where the lexical redundancy rule does not relate pairs of items.
Furthermore, I have no idea how to extend the proposal to the evaluation of idioms,
where the base rules are used as lexical redundancy rules. Nevertheless, I believe
the notion of regularity of a lexical rule and its role in the evaluation measure for
the lexicon is by this point coherent enough to satisfy the degree of approximation
of the present theory.

7. CREATIVITY IN THE LEXICON AND ITS IMPLICATIONS. The accepted view of the
lexicon is that it is simply a repository of learned information. Creativity is taken to
be a product of the phrase-structure rules and transformations. That is, the ability
of a speaker to produce and understand new sentences is ascribed to his knowledge
of a productive set of rules which enable him to combine a fixed set of memorized
words in infinitely many ways.
If we were to adhere to this view strictly, it would be difficult to accept the
treatment of the lexicon proposed here. For example, it is quite common for
someone to invent a new compound noun spontaneously and to be perfectly under-
stood. This creative use of the compound rule, we would have to argue, is evidence
that compounding must be a transformational process rather than a lexical one.
This conclusion would fly in the face of all the evidence in §5.3 against a
transformational account of compounds.
The way out of the dilemma must be to follow the empirical evidence, rather
than our preconceived notions of what the grammar should be like. We must
accept the lexicalist account of compounds, and change our notion of how creativity
is embodied in the grammar.
The nature of the revision is clear. Lexical redundancy rules are learned from
generalizations observed in already known lexical items. Once learned, they make it
easier to learn new lexical items: we have designed them specifically to represent
what new independent information must be learned. However, after a redundancy
rule is learned, it can be used generatively, producing a class of partially specified
possible lexical entries. For example, the compound rule says that any two nouns
N1 and N2 can be combined to form a possible compound N1N2. The semantic
redundancy rules associated with the compound rule provide a finite range of
possible readings for N1N2. If the context is such as to disambiguate N1N2, any
speaker of English who knows N1 and N2 can understand N1N2 whether he has
heard it before or not, and whether it is an entry in his lexicon or not. Hence the
lexical rules can be used creatively, although this is not their usual role.
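Schematically, the generative use of the compound rule amounts to producing a finite candidate set of readings for a novel N1N2 and letting context pick one. The templates in the sketch below are invented for illustration; the theory requires only that the range of readings supplied by the semantic redundancy rules be finite.

```python
# A finite stand-in for the semantic redundancy rules associated
# with the compound rule.  (Templates invented for illustration.)
TEMPLATES = [
    "{n2} that is part of {n1}",
    "{n2} made of {n1}",
    "{n2} for {n1}",
    "{n2} that resembles {n1}",
]

def candidate_readings(n1, n2):
    """Readings a hearer can construct for a novel compound N1 N2,
    whether or not it is an entry in the hearer's lexicon."""
    return [t.format(n1=n1, n2=n2) for t in TEMPLATES]

readings = candidate_readings("garbage", "man")
assert "man for garbage" in readings   # the contextually apt reading
assert len(readings) == len(TEMPLATES)
```

Disambiguation by context then reduces to selecting one member of the generated set; no new lexical entry need be learned.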
In §5.4, I proposed that the description of idioms uses the phrase-structure rules
as lexical redundancy rules. In broader terms, the rules normally used creatively
are being used for the passive description of memorized items. Perhaps this change
in function makes more sense in light of the discussion here: it is a mirror image to
the creative use of the normally passive lexical redundancy rules.
We have thus abandoned the standard view that the lexicon is memorized and
only the syntax is creative. In its place we have a somewhat more flexible theory of
linguistic creativity. Both creativity and memorization take place in both the
syntactic and the lexical component. When the rules of either component are used
creatively, no new lexical entries need be learned. When memorization of new
lexical entries is taking place, the rules of either component can serve as an aid to
learning. However, the normal mode for syntactic rules is creative, and the normal
mode for lexical rules is passive.
Is there, then, a strict formal division between phrase-structure rules and
morphological redundancy rules, or between the semantic projection rules of deep
structure and the semantic redundancy rules? I suggest that perhaps there is not,
and that they seem so different simply because of the differences in their normal
mode of operation. These differences in turn arise basically because lexical rules
operate inside words, where things are normally memorized, while phrase-structure
rules operate outside words, where things are normally created spontaneously.
One might expect the division to be less clear-cut in a highly agglutinative language,
where syntax and morphology are less separable than in English.
To show that the only difference between the two types of rules is indeed in their
normal modes of operation, one would of course need to reconcile their somewhat
disparate notations and to show that they make similar claims. Though I will
not carry out this project here, it is important to note, in the present scheme, that
the syntactic analog of a morphological redundancy rule is a phrase-structure rule,
not a transformation. This result supports the lexicalist theory's general trend
toward enriching the base component at the expense of the transformational
component.16
8. SUMMARY. This paper set out to provide a theory of the lexicon that would
accommodate Chomsky's theory of the syntax of nominalizations. This required a
formalization of the notion 'separate but related lexical entries'. The formalization
developed uses redundancy rules not for part of the derivation of lexical entries, but
for part of their evaluation. I take this use of redundancy rules to be a major
theoretical innovation of the present approach.
In turn, this use of redundancy rules entails the formulation of a new type of
evaluation measure. Previous theories have used abbreviatory notations to reduce
the evaluation measure on the grammar to a simple count of symbols. But we have
seen that the usual notational conventions cannot capture the full range of general-
izations in the lexicon. Accordingly I have formulated the evaluation measure as
a minimization of independent information, measured by the rather complex
function 16 and its refinement in 40. The abandonment of the traditional type of
evaluation measure is a second very crucial theoretical innovation required for an
adequate treatment of the lexicon.17
The concept of lexical rules that emerges from the present theory is that they are
separated into morphological and semantic redundancy rules. The M-rules must
play a role, and the S-rules may, in every lexical evaluation in which entries are
related. Typically, the redundancy rules do not completely specify the contents of
one entry in terms of another, but leave some aspects open. This partial specification
of output is a special characteristic of lexical redundancy rules not shared by other
types of rules; I have used this characteristic frequently in arguing against trans-
formational solutions.
In the discussion of nominalizations, I have taken great pains to tailor the
information measure to our intuitions about the nature of generality in the lexicon.
In particular, attention has been paid to various kinds of lexical derivatives with
non-lexical sources, since these form an important part of the lexicon which is not
accounted for satisfactorily in other theories.
While our solutions were developed specifically with nominalizations in mind,
there is little trouble in extending them to several disparate areas in the lexicon.
I have shown that parallel problems occur in these other areas, and that the solution
for nominalizations turns out to be applicable. Insofar as the success of a theory
is measured by how easily it generalizes to other problems, this theory thus seems
quite successful for English. A more stringent test would be its applicability to
languages where morphology plays a much more central role.
Another measure of a theory's success is its salutary effect on other sectors of the
16 Halle 1973 argues for a view of lexical creativity very similar to that proposed here, on similar grounds.
17 One might well ask whether the traditional evaluation measure has inhibited progress in other areas of the grammar as well. I conjecture that the approach to marking conventions in SPE (Chapter 9) suffers for this very reason: Chomsky & Halle set up marking conventions so that more 'natural' rules save symbols. If, instead, the marking conventions were used as part of an evaluation measure on a set of fully specified rules, a great deal of their mechanical difficulty might well be circumvented in expressing the same insights.
theory of grammar. The most important effect of the present theory is to eliminate
a major part of the evidence for Lakoff's theory of exceptions to transformations
(1971b): the lexicon has been set up to accommodate comfortably both regular
and ad-hoc facts, with no sense of absolute exceptionality; and transformations
are not involved in any event. Since (in Jackendoff 1972) I have eliminated another
great part of Lakoff's evidence, virtually all of Lakoff's so-called exceptions are now
accounted for in a much more systematic and restricted fashion. We also no longer
need hypothetical lexical entries, a powerful device used extensively by Lakoff.
With practically all of Lakoff's evidence dissolved, we see that the theory of
exceptions plays a relatively insignificant role in lexicalist grammar. A small
dent has also been made in the highly controversial area of idiosyncratic phono-
logical re-adjustment rules, though much further work is needed before we know
whether they are eliminable.
There are three favorable results in syntax as well. First and most important, the
analysis of causative verbs, which supposedly provides crucial evidence for the
generative semantics theory of lexicalization, can be disposed of quietly and without
fuss, leaving the standard theory of lexical insertion intact. Second, idioms can be
listed in the lexicon and can undergo normal lexical insertion; some of their syn-
tactic properties emerge as an automatic consequence of this position. Third, the
direction of the English particle movement transformation can finally be settled
in favor of leftward movement.
Thus a relatively straightforward class of intuitions about lexical relations has
been used to justify a theory of the lexicon which has quite a number of significant
properties for linguistic theory. Obviously, many questions remain in the area of
morphology. I would hope, however, that this study has provided a more congenial
framework in which to pose these questions.
REFERENCES
ANDERSON, S. 1971. On the role of deep structure in semantic interpretation. Foundations of Language 7.387-96.
BOTHA, RUDOLF P. 1968. The function of the lexicon in transformational generative grammar. The Hague: Mouton.
CHOMSKY, N. 1965. Aspects of the theory of syntax. Cambridge, Mass.: MIT Press.
CHOMSKY, N. 1970. Remarks on nominalization. In Jacobs & Rosenbaum, 184-221.
CHOMSKY, N. 1972. Some empirical issues in the theory of transformational grammar. Goals of linguistic theory, ed. by S. Peters, 63-130. Englewood Cliffs, N.J.: Prentice-Hall.
CHOMSKY, N., and M. HALLE. 1968. The sound pattern of English. New York: Harper & Row.
EMONDS, J. E. 1972. Evidence that indirect object movement is a structure-preserving
rule. Foundations of Language 8.546-61.
FILLMORE, C. 1968. The case for case. Universals in linguistic theory, ed. by E. Bach &
R. Harms, 1-88. New York: Holt, Rinehart & Winston.
FODOR, JERRY. 1970. Three reasons for not deriving 'kill' from 'cause to die'. Linguistic
Inquiry 1.429-38.
FRASER, B. 1965. An examination of the verb-particle construction in English. MIT dissertation.
FRASER, B. 1970. Idioms within a transformational grammar. Foundations of Language 6.22-42.
HALLE, M. 1959. The sound pattern of Russian. The Hague: Mouton.
HALLE, M. 1973. Prolegomena to a theory of word formation. Linguistic Inquiry 4.3-16.
JACKENDOFF, R. S. 1972. Semantic interpretation in generative grammar. Cambridge,
Mass.: MIT Press.
JACOBS, R., and P. ROSENBAUM (eds.) 1970. Readings in English transformational
grammar. Waltham, Mass.: Blaisdell.
KATZ, J. J. 1973. Compositionality, idiomaticity, and lexical substitution. A Festschrift
for Morris Halle, ed. by S. Anderson & P. Kiparsky, 357-76. New York: Holt,
Rinehart & Winston.
LAKOFF, G. 1971a. On generative semantics. Semantics: an interdisciplinary reader, ed. by D. Steinberg & L. Jakobovits, 232-96. Cambridge: University Press.
LAKOFF, G. 1971b. Syntactic irregularity. New York: Holt, Rinehart & Winston.
LEES, R. B. 1960. The grammar of English nominalizations. Bloomington: Indiana
University.
MCCAWLEY, J. 1968. Lexical insertion in a transformational grammar without deep
structure. Papers from the 4th Regional Meeting, Chicago Linguistic Society,
71-80.
PERLMUTTER, D. 1970. The two verbs 'begin'. In Jacobs & Rosenbaum, 107-19.
STANLEY, R. 1967. Redundancy rules in phonology. Lg. 43.393-436.
[Received 1 July 1974.]