Linguistic Society of America is collaborating with JSTOR to digitize, preserve and extend access to Language.
MORPHOLOGICAL AND SEMANTIC REGULARITIES IN THE LEXICON
RAY JACKENDOFF
Brandeis University
This paper proposes a theory of the lexicon consistent with the Lexicalist
Hypothesis of Chomsky's 'Remarks on nominalization' (1970). The crucial problem is to
develop a notion of lexical redundancy rules which permits an adequate description of
the partial relations and idiosyncrasy characteristic of the lexicon. Two lexicalist
theories of redundancy rules, each equipped with an evaluation measure, are
compared on the basis of their accounts of nominalizations; the superior one, the
FULL-ENTRY THEORY, is then applied to a range of further well-known examples such as
causative verbs, nominal compounds, and idioms.
The starting point of the Lexicalist Hypothesis, proposed in Chomsky's 'Remarks
on nominalization' (1970), is the rejection of the position that a nominal such as
Bill's decision to go is derived transformationally from a sentence such as Bill
decided to go. Rather, Chomsky proposes that the nominal is generated by the base
rules as an NP, no S node appearing in its derivation. His paper is concerned with
the consequences of this position for the syntactic component of the grammar. The
present paper will develop a more highly articulated theory of the lexical treatment
of nominals, show that it is independently necessary, and extend it to a wide range
of cases other than nominalizations.1
The goal of this paper is very similar to that of Halle 1973: the presentation of a
framework in which discussion of lexical relations can be made more meaningful.
I will not present any new and unusual facts about the lexicon; rather, I will try
to formulate a theory which accommodates a rather disparate range of well-known
examples of lexical relations. The theory presented here, which was developed
independently of Halle's, has many points of correspondence with it; I have,
however, attempted a more elaborate working out of numerous details. I will
mention important differences between the theories as they arise.
1 My thanks go to John Bowers, Francois Dell, Noam Chomsky, Morris Halle, and to classes
at the 1969 Linguistic Institute and Brandeis University for valuable discussion. Earlier versions
of this paper were presented to the 1969 summer LSA meeting at the University of Illinois and
to the 1970 La Jolla Syntax Conference. Thanks also to Dana Schaul for many useful examples
scattered throughout the paper.
640 LANGUAGE, VOLUME 51, NUMBER 3 (1975)
node N; decide is inserted under V. Since Chomsky gives no arguments for this
particular formulation, I feel free to adopt here the alternative theory that decide
and decision have distinct but related lexical entries. In regard to Chomsky's
further discussion, the theories are equivalent; the one to be used here extends
more naturally to the treatment of other kinds of lexical relations (cf. §5). Our
problem then is to develop a formalism which can express the relations between
lexical entries in accord with a native speaker's intuition.2
It is important to ask what it means to capture a native speaker's intuition of
lexical relatedness. It makes sense to say that two lexical items are related if knowing
one of them makes it easier to learn the other-i.e. if the two items contain less
independent information than two unrelated lexical items do. A grammar that
expresses this fact should be more highly valued than one that does not. The
advocate of a transformational relationship between decide and decision claims
that this intuitive sense of relatedness is expressed by his transformation, in that
it is unnecessary to state the shared properties of the words twice. In fact, it
is unnecessary to state the properties of decision at all, since they are predictable
from the lexical entry of decide and the nominalization transformation.3 Hence
a grammar containing the nominalization transformation contains less independent
information than one without it-since instead of listing a large number of
nominalizations, we can state a single transformation. Within such a grammar,
the pair decide-decision contains fewer symbols than a random pair such as decide-jelly:
given decide, there need be no lexical entry at all for decision, but jelly
needs a lexical entry whether or not decide is listed. Furthermore, the regularity
of decide-decision means that many pairs will be related by the transformation,
so a net reduction in symbols in the grammar is accomplished, and the evaluation
measure will choose a grammar including the transformation over one without it.
Since the Lexicalist Hypothesis denies a transformational relationship between
decide and decision, their relationship must be expressed by a rule within the
lexical component. Transformational grammar has for many years had a name
for the kind of rule that expresses generalizations within the lexicon-it is called a
2 Advocates of the theory of generative semantics might at this point be tempted to claim
that a formalism for separate but related lexical items is yet another frill required by lexicalist
syntax, and that generative semantics has no need for this type of rule. I hasten to observe that
this claim would be false. In the generative semantic theory of lexical insertion developed in
McCawley 1968 and adopted by Lakoff 1971a, lexical items such as kill and die have separate
lexical entries, and are inserted to distinct derived syntactic/semantic structures. For a
consistent treatment of lexical insertion, then, break in The window broke must be inserted onto
a tree of the form [v BREAK], while break in John broke the window must be inserted onto
[v CAUSE BREAK], which has undergone Predicate Raising; in other words, break has two distinct
lexical entries. Semantically, the two breaks are related in exactly the same way as die and kill;
but clearly break and break must be related in the lexicon in a way that die and kill are not.
A similar argument holds for rule and ruler vs. rule and king. Thus generative semantics requires
rules expressing lexical relations for exactly the same reasons that the Lexicalist Hypothesis
needs them. Only in the earlier 'abstract syntax' of Lees 1960 and Lakoff 1971b are such rules
superfluous.
3 Of course, it also is difficult to express the numerous idiosyncrasies of nominalizations, as
Chomsky 1970 points out at some length.
lexical redundancy rule; but little work has been done until now toward a formal-
ization of such rules.
The first question we must ask is: By what means does the existence of a lexical
redundancy rule reduce the independent information content of the lexicon? There
are two possibilities. The first, which is more obvious and also more akin to the
transformational approach, gives decide a fully specified entry; but the entry for
decision is either non-existent or, more likely, not fully specified. The redundancy
rule fills in the missing information from the entry of decide at some point in the
derivation of a sentence containing decision, perhaps at the stage of lexical insertion.
As in the transformational approach, the independent information content of
decide-decision is reduced, because the entry for decision does not have to be filled
in. The evaluation measure again can simply count symbols in the grammar. We
may call this theory the IMPOVERISHED-ENTRY THEORY.
Within such a theory, a typical lexical entry will be of the form given below. All
aspects of this form are traditional except for the 'entry number', which is simply
an index permitting reference to a lexical entry independent of its content:
(1) [entry number
     /phonological representation/
     syntactic features
     SEMANTIC REPRESENTATION]
For example, decide will have the form 2. The entry number is arbitrary, and the
semantic representation is a fudge standing for some complex of semantic markers.
The NP indices correlate the syntactic arguments of the verb to the semantic
arguments (cf. Jackendoff 1972, Chapter 2, for discussion of this):
(2) [784
     /decid/
     +V
     +[NP1 ___ on NP2]
     NP1 DECIDE ON NP2]
We now introduce a redundancy rule, 3, in which the two-way arrow may be read
as the symmetric relation 'is lexically related to'. The rule thus can be read: 'A
lexical entry x having such-and-such properties is related to a lexical entry w
having such-and-such properties.'
(3) [x, /y + ion/, +N, +[NP1's ___ (P) NP2], ABSTRACT RESULT OF ACT OF NP1'S Z-ING NP2]
        ↔
    [w, /y/, +V, +[NP1 ___ (P) NP2], NP1 Z NP2]
Given the existence of 3, decision needs only the following lexical entry:
(4) [375
     derived from 784 by rule 3]
This theory thus reduces the lexical entry for decision to a cross-reference to the
related verb plus a reference to the redundancy rule. The entries of many other
nouns will be simplified similarly by the use of a reference to 3. The independent
information content of the lexicon can be determined straightforwardly by adding
up the information in lexical entries plus that in redundancy rules; hence the
evaluation measure can be stated so as to favor grammars with fewer symbols.
A second possible approach to lexical redundancy rules, the FULL-ENTRY THEORY,
assumes that both decide and decision have fully specified lexical entries, and that
the redundancy rule plays no part in the derivation of sentences, as it does in both
the transformational theory and the impoverished-entry theory. Rather, the
redundancy rule plays a role in the information measure for the lexicon. It designates
as redundant that information in a lexical entry which is predictable by the existence
of a related lexical item; redundant information will not be counted as independent.
In the full-entry theory, lexical entries again have the form of 1, except that an
entry number is unnecessary. Decide has the form of 2, minus the entry number.
Decision, however, will have the following entry:
(5) [/decid + ion/
     +N
     +[NP1's ___ on NP2]
     ABSTRACT RESULT OF ACT OF NP1'S DECIDING NP2]
We evaluate the lexicon as follows: first, we must determine the amount of in-
dependent information added to the lexicon by introducing a single new lexical
entry; then, by adding up all the entries, we can determine the information content
of the whole lexicon.
For a first approximation, the information added by a new lexical item, given a
lexicon, can be measured by the following convention:
(6) (Information measure)
Given a fully specified lexical entry W to be introduced into the lexicon,
the independent information it adds to the lexicon is
(a) the information that W exists in the lexicon, i.e. that W is a word
of the language; plus
(b) all the information in W which cannot be predicted by the existence
of some redundancy rule R which permits W to be partially
described in terms of information already in the lexicon; plus
(c) the cost of referring to the redundancy rule R.
Here 6a is meant to reflect one's knowledge that a word exists. I have no clear
notion of how important a provision it is (it may well have the value zero), but I
include it for the sake of completeness. The heart of the rule is 6b; this reflects one's
knowledge of lexical relations. Finally, 6c represents one's knowledge of which
regularities hold in a particular lexical item; I will discuss this provision in more
detail in §6.
To determine the independent information content of the pair decide-decision,
let us assume that the lexicon contains neither, and that we are adding them one by
one into the lexicon. The cost of adding 2, since it is related to nothing yet in the
lexicon, is the information that a word exists, plus the complete information content
of the entry 2. Given 2 in the lexicon, now let us add 5. Since its lexical entry is
completely predictable from 2 and redundancy rule 3, its cost is the information
that a word exists plus the cost of referring to 3, which is presumably less than the
cost of all the information in 5. Thus the cost of adding the pair decide-decision
is the information that two words exist, plus the total information of the entry 2,
plus the cost of referring to redundancy rule 3.
Now note the asymmetry here: if we add decision first, then decide, we arrive at
a different sum: the information that two words exist, plus the information
contained in 5, plus the cost of referring to redundancy rule 3 (operating in the
opposite direction). This is more than the previous sum, since 5 contains more
information than 2: the four extra phonological segments +ion and the extra
semantic information represented by ABSTRACT RESULT OF ACT OF. To establish
the independent information content for the entire lexicon, we must choose an
order of introducing the lexical items which minimizes the sum given by successive
applications of 6. In general, the more complex derived items must be introduced
after the items from which they are derived. The information content of the
lexicon is thus measured as follows:
(7) (Information content of the lexicon)
Given a lexicon L containing n entries W1, ..., Wn, each permutation P
of the integers 1, ..., n determines an order AP in which W1, ..., Wn can
be introduced into L. For each ordering AP, introduce the words one
by one and add up the information specified piecemeal by procedure 6,
to get a sum SP. The independent information content of the lexicon
L is the least of the n! sums SP, plus the information content of the
redundancy rules.
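Measures 6 and 7 can be made concrete in a small computational sketch. Everything below is illustrative rather than part of the proposal: words are modeled as sets of content 'symbols', the unit costs for provisions 6a and 6c are arbitrary, and rule_3 is a toy rendering of redundancy rule 3.

```python
from itertools import permutations

WORD_EXISTS = 1   # provision (6a): the information that a word exists
RULE_REF = 1      # provision (6c): the cost of one rule reference

def entry_cost(word, lexicon, rules):
    """Independent information added by one fully specified entry, per (6)."""
    predictable, refs = set(), 0
    for rule in rules:
        covered = rule(word, lexicon)
        if covered - predictable:          # this rule predicts something new
            predictable |= covered
            refs += 1
    return WORD_EXISTS + len(word - predictable) + refs * RULE_REF

def lexicon_cost(words, rules):
    """Content of the whole lexicon, per (7): the minimum over all n!
    orders of introducing the entries (rule content itself not included)."""
    def cost_in_order(order):
        lexicon, total = [], 0
        for w in order:
            total += entry_cost(w, lexicon, rules)
            lexicon.append(w)
        return total
    return min(cost_in_order(order) for order in permutations(words))

# decide (entry 2) and decision (entry 5), reduced to illustrative symbol sets
decide   = frozenset({"DECID", "+V", "V-subcat", "DECIDE"})
decision = frozenset({"DECID", "ION", "+N", "N-subcat", "DECIDE", "ABSTRACT-RESULT"})

# the affix material that rule 3 itself supplies on the noun side
ION_EXTRA = {"ION", "+N", "N-subcat", "ABSTRACT-RESULT"}

def rule_3(word, lexicon):
    """Toy version of the -ion rule (3), read in both directions of the arrow."""
    for other in lexicon:
        shared = word & other
        if shared and "+N" in word and "+V" in other:   # noun from verb
            return shared | (word & ION_EXTRA)
        if shared and "+V" in word and "+N" in other:   # verb from noun
            return shared
    return set()
```

On these assumptions, introducing decide first costs 5 (one word plus four unpredicted symbols) and decision then costs 2 (one word plus one rule reference); the reverse order totals 11, so measure 7 returns 7, mirroring the asymmetry discussed above.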
Now consider how an evaluation measure can be defined for the full-entry
theory. Minimizing the number of symbols in the lexicon will no longer work,
because a grammar containing decide and decision, but not redundancy rule 3,
contains fewer symbols than a grammar incorporating the redundancy rule, by
exactly the number of symbols in the redundancy rule. Since we would like the
evaluation measure to favor the grammar incorporating the redundancy rule, we
will state the evaluation measure as follows:
(8) (Full-entry theory evaluation measure)
Of two lexicons describing the same data, that with a lower information
content is more highly valued.
The details of the full-entry theory as just presented are somewhat more complex
than those of either the transformational theory or the impoverished-entry theory.
However, its basic principle is in fact the same: the evaluation measure is set up
so as to minimize the amount of unpredictable information the speaker knows
(or must have learned). However, the measure of unpredictable information is no
longer the number of symbols in the lexicon, but the output of information
measure 7: this expresses the fact that, when one knows two lexical items related
by redundancy rules, one knows less than when one knows two unrelated items
of commensurate complexity.
I will argue that the full-entry theory, in spite of its apparent complexity, is
How can the three theories we have discussed describe these verbs? The trans-
formational theory must propose a hypothetical lexical item marked obligatorily
to undergo the nominalization transformation (cf. Lakoff 1971b). Thus the lexicon
must be populated with lexical items such as *fiss which are positive absolute
exceptions to various word-formation transformations. The positive absolute
exception is of course a very powerful device to include in grammatical theory
(see discussion in Jackendoff 1972). Furthermore, the use of an EXCEPTION feature
to prevent a lexical item from appearing in its 'basic' form is counter-intuitive: it
claims that English would be simpler if *fiss were a word, since one would not have
to learn that it is exceptional. Lakoff in fact claims that there must be a hypothetical
verb *king, corresponding to the noun king as the verb rule corresponds to the
noun ruler. Under his theory, the introduction of a real verb king would make
English simpler, in that it would eliminate an absolute exception feature from the
lexicon. In other words, the evaluation measure for the transformational theory
seems to favor a lexicon in which every noun with functional semantic information
has a related verb. Since there is little evidence for such a preference, and since it is
strongly counter-intuitive in the case of king, the transformational account-besides
requiring a very powerful mechanism, the absolute exception-is incorrect at the
level of explanatory adequacy.
Next consider the impoverished-entry theory. There are two possible solutions
to the problem of non-existent derivational ancestors. In the first, the entry of
retribution is as unspecified as that of decision (4); and it is related by redundancy
rule 3 to an entry retribute, which however is marked [-Lexical Insertion]. The
cost of adding retribution to the lexicon is the sum of the information in the entry
*retribute, plus the cost of retribution's references to the redundancy rule and to the
(hypothetical) lexical item, plus the information that one word exists (or, more
likely, two, and the information that one of those is non-lexical). Under the
reasonable assumption that the cost of the cross-references is less than the cost of
the phonological and semantic affixes, this arrangement accurately reflects our
initial intuition about the information content of retribution. Furthermore, it
eliminates the use of positive absolute exceptions to transformations, replacing
them with the more restricted device [-Lexical Insertion]. Still, it would be nice to
dispense with this device as well, since it is rather suspicious to have entries which
have all the properties of words except that of being words. The objections to
hypothetical lexical items in the transformational theory at the level of explanatory
adequacy in fact apply here to [-Lexical Insertion] as well: the language is always
simpler if this feature is removed.
We might propose eliminating the hypothetical lexical entries by building them
into the entries of the derived items:
(9) [511
     derived by rule 3 from
     [/retribut/
      +V
      +[NP1 ___ for NP2]
      NP1 RETRIBUTE NP2]]
The cost of 9 is thus the information that there is a word retribution, plus the
information within the inner brackets, plus the cost of referring to the redundancy
rule. Again, the assumption that the cross-reference costs less than the additional
information /ion/ and ABSTRACT RESULT OF ACT OF gives the correct description
of our intuitions. This time we have avoided hypothetical lexical items, at the
expense of using rather artificial entries like 9.
This artificiality betrays itself when we try to describe the relation between sets
like aggression-aggressive-aggressor, aviation-aviator, and retribution-retributive.
If there are hypothetical roots *aggress, *aviate, and *retribute, each of the members
of these sets can be related to its root by the appropriate redundancy rule 3,
10a, or 10b, where 10a and 10b respectively describe pairs like predict-predictive
and protect-protector (I omit the semantic portion of the rules at this point for
convenience; in any case, §4 will justify separating the morphological and
semantic rules):
(10) a. [x, /y + ive/, +A] ↔ [w, /y/, +V]
     b. [x, /y + or/, +N] ↔ [w, /y/, +V]
Suppose we eliminate hypothetical lexical items in favor of entries like 9 for
retribution. What will the entry for retributive look like? One possibility is:
(11) [65
     derived by rule 10a from
     [/retribut/
      +V
      +[NP1 ___ for NP2]
      NP1 RETRIBUTE NP2]]
But this solution requires us to list the information in the inner brackets twice, in
retribution and retributive: such an entry incorrectly denies the relationship between
the two words.
Alternatively, the entry for retributive might be 12 (I use 3' here to denote the
inverse of 3, i.e. a rule that derives verbs from -ion nouns; presumably the presence
of 3 in the lexical component allows us to use its inverse as well):
(12) [65
     derived by 3' and 10a from 511]
But this entry too is unsatisfactory, because it claims retribution is more basic
than retributive. Clearly the entries could just as easily have been set up with no
difference in cost by making retributive
basic. The same situation will arise with a triplet like aggression-aggressor-
aggressive, where the choice of one of the three as basic must be purely arbitrary.
Intuitively, none of the three should be chosen as basic, and the formalization of the
lexicon should reflect this. The impoverished-entry theory thus faces a choice:
either it incorporates hypothetical lexical items, or it describes in an unnatural
fashion those related lexical items which are related through a non-lexical root.
Consider now how the full-entry theory accounts for these sets of words,
beginning with the case of a singleton like perdition (or conflagration), which has
no relatives like *perdite, *perditive etc., but which obviously contains the -ion
ending of rule 3. We would like the independent information content of this item
to be less than that of a completely idiosyncratic word like orchestra, but more
than that of, say, damnation, which is based on the lexical verb damn. The
impoverished-entry theory resorts either to a hypothetical lexical item *perdite or to an
entry containing another entry, like 9, which we have seen to be problematic.
The full-entry theory, on the other hand, captures the generalization without
extra devices. Note that 6b, the measure of non-redundant information in the
lexical entry, is cleverly worded so as to depend on the existence of redundant
information somewhere in the lexicon, but not necessarily on the existence of
related lexical entries. In the case of perdition, the only part of the entry which
represents a regularity in the lexicon is in fact the -ion ending, which appears as
part of the redundancy rule 3. What remains irregular is the residue described in
the right-hand side of 3, i.e. that part of perdition which corresponds to the
non-lexical root *perdite. Hence the independent information content of perdition is the
information that there is a word, plus the cost of the root, plus the cost of referring
to rule 3. Perdition adds more information than damnation, then, because it has a
root which is not contained in the lexicon; it contains less information than
orchestra because the ending -ion and the corresponding part of the semantic
content are predictable by 3 (presumably the cost of referring to 3 is less than the
information contained in the ending itself; see §6).
We see then that the full-entry theory captures our intuitions about perdition
without using a hypothetical lexical item. The root *perdite plays only an indirect
role, in that its COST appears in the evaluation of perdition as the difference between
the full cost of perdition and that of the suffix; nowhere in the lexicon does the root
appear as an independent lexical entry.
Now turn to the rootless pair retribution-retributive. Both words will have fully
specified lexical entries. To determine the independent information content of the
pair, suppose that retribution is added to the lexicon first. Its independent
information, calculated as for perdition above, is the information that there is a word, plus
the cost of the root *retribute, plus the cost of referring to 3. Note again that
*retribute does not appear anywhere in the lexicon. Now we add to the lexicon
the entry for retributive, which is entirely predictable from retribution plus
redundancy rules 3 and 10a. According to information measure 6, retributive adds the
information that it is a word, plus the cost of referring to the two redundancy rules.
The cost of the pair for this order of introduction is therefore the information that
there are two words, plus the information in the root *retribute, plus the cost of
referring to redundancy rules three times. Alternatively, if retributive is added to
the lexicon first, followed by retribution, the independent information content of the
pair comes out the same, though this time the cost of the root appears in the
evaluation of retributive. Since the costs of these two orders are equal,
there is no optimal order of introduction, and thus no reason to consider either
item basic.
Similarly, the triplet aggression-aggressor-aggressive will have, on any order of
introduction, an independent information content consisting of the information
that there are three words, plus the information content of the root *aggress, plus
the cost of referring to redundancy rules five times (once for the first entry
introduced, and twice for each of the others). Since no single order yields a
significantly lower information content, none of the three can be considered basic to
the others.
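The counting in the last two paragraphs generalizes: a family of k words sharing a non-lexical root costs k word-existence units, the root once (in whichever member happens to be introduced first), and 2k - 1 rule references. A sketch, with unit costs as placeholder assumptions:

```python
def rootless_family_cost(k, root_cost, word_cost=1, ref_cost=1):
    """Cost of k words over a shared non-lexical root: the first member
    introduced pays the root plus one rule reference; each later member
    pays two references (e.g. 3' followed by 10a). Unit costs are assumed."""
    return k * word_cost + root_cost + (2 * k - 1) * ref_cost

# perdition alone: one word, the root, one reference to rule 3
# retribution-retributive: two words, the root, three references
# aggression-aggressor-aggressive: three words, the root, five references
```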
Thus the full-entry theory provides a description of rootless pairs and triplets
which avoids either a root in the lexicon or a claim that one member of the group is
basic, the two alternatives encountered by the impoverished-entry theory. The full-
entry theory looks still more appealing when contrasted with the transformational
theory's account of these items. The theory of Lakoff 1971b introduces a positive
absolute exception on *perdite, requiring it to nominalize; but *aggress may
undergo -ion nominalization, -or nominalization, or -ive adjectivalization,
and it must undergo one of the three. Lakoff is forced to introduce Boolean
combinations of exception features, together marked as an absolute exception, in
order to describe this distribution: patently a brute-force analysis.
In the full-entry theory, then, the lexicon is simply a repository of all information
about all the existing words; the information measure expresses all the relation-
ships. Since the full-entry theory escapes the pitfalls of the impoverished-entry
theory, without giving up adequacy of description, we have strong reason to
prefer the former, with its non-standard evaluation measure. From here on, the
term 'lexicalist theory' will be used to refer only to the full-entry theory.
Before concluding this section, let us consider a question which frequently
arises in connection with rootless pairs and triplets: What is the effect on the lexicon
if a back-formation takes place, so that a formerly non-existent root (say *retribute)
enters the language? In the transformational theory, the rule feature on the hypo-
thetical root is simply erased, and the lexicon becomes simpler, i.e. more regular.
In the lexicalist theory, the account is a bit more complex, but also more sophis-
ticated. If retribute were simply added without disturbing the previous order for
measuring information content, it would add to the cost of the lexicon the informa-
tion that there is a new word plus the cost of referring to one of the redundancy
rules. Thus the total cost of retribution-retributive-retributewould be the informa-
tion that there are three words, plus the information in the root retribute,plus the
cost of four uses of redundancy rules. But now that retribute is in the lexicon, a
restructuring is possible, in which retribute is taken as basic. Under this order of
evaluation, the information content of the three is the information that there are
three words, plus the information in retribute, plus only two uses of redundancy
rules. This restructuring, then, makes the language simpler than it was before
retribute was introduced, except that there is now one more word to learn than
before. What this account captures is that a back-formation ceases to be recognized
as such by speakers precisely when they restructure the evaluation of the lexicon,
taking the back-formation rather than the morphological derivatives as basic.
I speculate that the verb *aggress, which seems to have only marginal status in
English, is still evaluated as a back-formation, i.e. as a derivative of aggression-
aggressor-aggressive, and not as their underlying root. Thus the lexicalist theory
of nominalizations provides a description of the diachronic process of back-
formation which does more than simply erase a rule feature on a hypothetical
lexical item: it can describe the crucial step of restructuring as well.
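The restructuring arithmetic can be laid out directly. The two unit costs below are placeholder assumptions; only the comparison between the sums matters:

```python
ROOT = 4   # assumed content of the root retribute (phonology plus meaning)
REF = 1    # assumed cost of one reference to a redundancy rule

# retribution-retributive alone: two words, the root, three rule references
pair = 2 + ROOT + 3 * REF

# back-formed retribute added without reordering: one more word, one more reference
unrestructured = pair + 1 + REF

# restructured with retribute taken as basic: it pays its full content (the root),
# and each derivative pays a single rule reference
restructured = 3 + ROOT + 2 * REF

assert unrestructured - restructured == 2 * REF   # two fewer rule references
assert restructured - 3 < pair - 2                # simpler, word-existence costs aside
```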
(13) M:  [x, /y + ion/, +N] ↔ [w, /y/, +V]
     S1: [+N, +[NP1's ___ ((P) NP2)], ABSTRACT RESULT OF ACT OF NP1'S Z-ING (NP2)]
         ↔ [+V, +[NP1 ___ ((P) NP2)], NP1 Z (NP2)]
     S2: [+N, +[ ___ (NP2)], GROUP THAT Z-S (NP2)]
         ↔ [+V, +[NP1 ___ (NP2)], NP1 Z (NP2)]
     S3: [+N, +[NP1's ___ (NP2)], ACT OF NP1'S Z-ING (NP2)]
         ↔ [+V, +[NP1 ___ (NP2)], NP1 Z (NP2)]
An example of the cross-classification of the morphological and semantic
relations is the following table of nouns, where each row contains nouns of the
same semantic category, and each column contains nouns of the same morphological
category.
(15)      M1             M2              M3
     S1:  discussion     argument        rebuttal
     S2:  congregation   government
     S3:  copulation     establishment   refusal
That is, John's discussion of the claim, John's argument against the claim, and
John's rebuttal of the claim are semantically related to John discussed the claim,
John argued against the claim, and John rebutted the claim respectively in a way
expressed by the subcategorization conditions and semantic interpretations of S1;
the congregation and the government of Fredonia are related by S2 to they congregate
and they govern Fredonia; and John's copulation with Mary, John's establishment of
a new order, and John's refusal of the offer are related by S3 to John copulated with
Mary, John established a new order, and John refused the offer.5 There are further
nominalizing endings such as -ition (supposition), -ing (writing), and -Ø (offer); and
further semantic relations, such as ONE WHO Z's (writer, occupant) and THING IN
WHICH ONE Z's (residence, entrance). The picture that emerges is of a family of
nominalizing affixes and an associated family of noun-verb semantic relationships.
To a certain extent, the particular members of each family that are actually utilized
in forming nominalizations from a verb are chosen randomly. Insofar as the choice
is random, the information measure must measure independently the cost of
referring to morphological and semantic redundancy rules (cf. §6 for further
discussion).
How do we formalize the information measure, in light of the separation of
morphological rules (M-rules) and semantic rules (S-rules)? An obvious first point
is that a semantic relation between two words without a morphological relationship
cannot be counted as redundancy: thus the existence of the verb rule should render
the semantic content of the noun ruler redundant, but the semantic content of
king must count as independent information. Hence we must require a morpho-
logical relationship before semantic redundancy can be considered.
A more delicate question is whether a morphological relationship alone should
be counted as redundant. For example, professor and commission (as in 'a salesman's
commission') are morphologically related to profess and commit; but the existence
of a semantic connection is far from obvious, and I doubt that many English
speakers other than philologists ever make the association. What should the
information content of these items include? A permissive approach would allow
the phonology of the root to be counted as redundant information; the only
non-redundant part of professor, then, would be some semantic information like
TEACH. A restrictive approach would require a semantic connection before mor-
phology could be counted as redundant; professor then would be treated like
perdition, as a derived word with a non-lexical root, and both the phonology
/profess/ and the semantics TEACH would count as independent information.
5 I assume, with Chomsky 1970, that the of in nominalizations is transformationally inserted.
Note also that the semantic relations are only approximate; the usual idiosyncrasies appear in
these examples.
652 LANGUAGE, VOLUME 51, NUMBER 3 (1975)
In §5.1 I will present two cases in which only morphological rules play a role
because there are no semantic regularities. However, where semantic rules exist,
it has not yet been established whether their use should be required in conjunction
with morphological rules. Therefore the following restatement of information
measure 6 has alternative versions:
(16) (Information measure)
Given a fully specified lexical entry W to be introduced into the lexicon,
the independent information it adds to the lexicon is
(a) the information that W exists in the lexicon; plus
(b) (permissive form) all the information in W which cannot be
predicted by the existence of an M-rule which permits W to be
partially described in terms of information already in the
lexicon, including other lexical items and S-rules; or
(b') (restrictive form) all the information in W which cannot be
predicted by the existence of an M-rule and an associated S-rule
(if there is one) which together permit W to be partially described
in terms of information already in the lexicon; plus
(c) the cost of referring to the redundancy rules.
Examples below will show that the permissive form of 16 is preferable.
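The bookkeeping behind measure 16 can be made concrete with a small computational sketch. The entry representation, cost units, and function name below are illustrative inventions, not part of the theory; only the branching between the permissive form (b) and the restrictive form (b') follows the text.

```python
# Toy sketch of information measure (16). An entry carries a phonological
# and a semantic cost (arbitrary units); an M-rule, optionally paired with
# an S-rule, may render parts of a new entry predictable.

def independent_info(entry, m_rule_applies, s_rule_applies, rule_cost,
                     permissive=True):
    """Independent information added by `entry`, per measure (16)."""
    total = 1  # clause (a): the information that the word exists
    phon, sem = entry["phon_cost"], entry["sem_cost"]
    if permissive:
        # clause (b): an M-rule alone licenses phonological redundancy
        if m_rule_applies:
            total += rule_cost            # clause (c): citing the rule
            if not s_rule_applies:
                total += sem              # semantics still independent
        else:
            total += phon + sem
    else:
        # clause (b'): morphology is redundant only with an associated S-rule
        if m_rule_applies and s_rule_applies:
            total += rule_cost
        else:
            total += phon + sem           # full cost, as for perdition
    return total

# professor: morphologically related to profess, but meaning TEACH is learned
professor = {"phon_cost": 5, "sem_cost": 3}
print(independent_info(professor, True, False, 1, permissive=True))   # 5
print(independent_info(professor, True, False, 1, permissive=False))  # 9
```

On this toy accounting, the permissive form charges professor only for its idiosyncratic semantics plus the rule reference, while the restrictive form charges it as if it were wholly unrelated to profess.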
5. OTHER APPLICATIONS. The redundancy rules developed so far describe the
relation between verbs and nominalizations. It is clear that similar rules can describe
de-adjectival nouns (e.g. redness, entirety, width), deverbal adjectives (predictive,
explanatory), denominal adjectives (boyish, national, fearless), de-adjectival verbs
(thicken, neutralize, yellow), and denominal verbs (befriend, originate, smoke).
Likewise, it is easy to formulate rules for nouns with noun roots like boyhood,6
adjectives with adjective roots like unlikely, and verbs with verb roots like re-enter
and outlast.7 Note, by the way, that phonological and syntactic conditions such as
choice of boundary and existence of internal constituent structure can be spelled
out in the redundancy rules.
Note also that a complex form such as transformationalist can be accounted
for with no difficulty: each of the steps in its derivation is described by a regular
redundancy rule. No question of ordering rules or of stating rule features need ever
arise (imagine, by contrast, the complexity of Boolean conditions on exceptions
which would be needed in the entry for transform, if we were to generate this word
in a transformational theory of the lexicon). Transformationalist is fully specified
in the lexicon, as are transform, transformation, and transformational. The total
information content of the four words is the information that there are four words,
plus the information in the word transform, plus idiosyncratic information added
by successive steps in derivation (e.g. that transformation in this sense refers to a
component of a theory of syntax and not just any change of state, and that a
transformationalist in the sense used in this paper is one who believes in a particular
6 The abstractness of nouns in -hood, mentioned by Halle 1973 as an example, is guaranteed
by the fact that the associated semantic rule yields the meaning STATE (or PERIOD) OF BEING
A Z.
7 Cf. Fraser 1965 for an interesting discussion of this last class of verbs.
MORPHOLOGICAL AND SEMANTIC REGULARITIES IN THE LEXICON 653
form of transformational theory), plus the cost of referring to the three necessary
redundancy rules. Note that the information measure allows morphologically
derived lexical items to contain more semantic information than the rule predicts; we
use this fact crucially in describing the information content of transformationalist.
More striking examples will be given below.
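The accounting just described for the transform family can be sketched as follows. The numeric values are arbitrary placeholders; only the bookkeeping itself (the word count, plus per-step idiosyncratic increments, plus rule-reference costs) reflects the text.

```python
# Illustrative cost computation for the family transform -> transformation
# -> transformational -> transformationalist. Each tuple gives a word, the
# idiosyncratic information its derivational step adds (invented units),
# and the cost of referring to the redundancy rule for that step.

family = [
    ("transform",           10, 0),  # base entry: full content, no rule
    ("transformation",       2, 1),  # 'component of a syntactic theory'
    ("transformational",     0, 1),  # fully regular -al step
    ("transformationalist",  2, 1),  # believer in a particular theory
]

total = len(family)  # the information that there are four words
for _word, idiosyncratic, rule_cost in family:
    total += idiosyncratic + rule_cost
print(total)  # 4 + 10 + (2+1) + (0+1) + (2+1) = 21
```

Note that the derived items transformation and transformationalist are charged for semantic information beyond what their rules predict, exactly as the text requires.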
With these preliminary observations, I will now present some more diverse
applications of the redundancy rules.
5.1. PREFIX-STEM VERBS. Many verbs in English can be analysed into one of the
prefixes in-, de-, sub-, ad-, con-, per-, trans- etc. followed by one of the stems -sist,
-mit, -fer, -cede, -cur etc. Chomsky & Halle argue, for phonological reasons, that
the prefix and stem are joined by a special boundary =. Whether a particular
prefix and a particular stem together form an actual word of English seems to be
an idiosyncratic fact:
(17) *transist transmit transfer *transcede *transcur
persist permit prefer precede *precur
consist commit confer concede concur
assist admit *affer accede *accur
subsist submit suffer succeed *succur
desist *demit defer *decede *decur
insist *immit infer *incede incur
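The idiosyncrasy of table 17 can be restated computationally: the morphological rule generates every prefix-stem pairing, and the lexicon's job is simply to record which cells happen to be filled. The sketch below merely transcribes the starred cells of 17 as gaps (the bare prefix labels, ignoring assimilation, are my own simplification):

```python
# Every (prefix, stem) pair is a possible word; the attested subset is an
# arbitrary lexical fact, transcribed here from table (17).
from itertools import product

prefixes = ["trans", "per", "con", "ad", "sub", "de", "in"]
stems = ["sist", "mit", "fer", "cede", "cur"]

# Starred (nonexistent) cells of table (17):
gaps = {
    ("trans", "sist"), ("trans", "cede"), ("trans", "cur"),
    ("per", "cur"),
    ("ad", "fer"), ("ad", "cur"),
    ("sub", "cur"),
    ("de", "mit"), ("de", "cede"), ("de", "cur"),
    ("in", "mit"), ("in", "cede"),
}

possible = set(product(prefixes, stems))
actual = possible - gaps
print(len(possible), len(actual))  # 35 possible pairings, 23 attested
```

No rule of grammar predicts which 23 of the 35 cells are filled; that residue is precisely what the lexicon must list.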
We would like the information measure of the lexicon to take into account the
construction of these words in computing their information content. There are two
possible solutions. In the first, the lexicon will contain, in addition to the fully
specified lexical entries for each actually occurring prefix-stem verb, a list of the
prefixes and stems from which the verbs are formed. The redundancy rules will
contain the following morphological rule, which relates three terms:
(18) [ /x/ , +Prefix ] , [ /y/ , +Stem ]  ↔  [ /x=y/ , +V ]
The information content of a particular prefix-stem verb will thus be the informa-
tion that there is a word, plus the semantic content of the verb (since there is no
semantic rule to go with 18, at least in most cases), plus the cost of referring to
morphological rule 18. The cost of each individual prefix and stem will be counted
only once for the entire lexicon.
Since we have up to this point been unyielding on the subject of hypothetical
lexical items, we might feel somewhat uncomfortable about introducing prefixes
and stems into the lexicon. However, this case is somewhat different from earlier
ones. In the case of perdition, the presumed root is the verb *perdite. If perdite
were entered in the lexicon, we would have every reason to believe that lexical
insertion transformations would insert perdite into deep structures, and that the
syntax would then produce well-formed sentences containing the verb perdite.
In order to prevent this, we would have to put a feature in the entry for perdite to
block the lexical insertion transformations. It is this rule feature [-Lexical
Insertion] which we wish to exclude from the theory. Consider now the lexical
entry for the prefix trans-:
(19) [ /trans/ , +Prefix ]
Trans- has no (or little) semantic information, and as syntactic information has only
the marker [+Prefix]. Since the syntactic category Prefix is not generated by the
base rules of English, there is no way for trans- alone to be inserted into a deep
structure. It can be inserted only when combined with a stem to form a verb, since
the category Verb does appear in the base rules. Hence there is no need to use the
offending rule feature [-Lexical Insertion] in the entry for trans-, and no need to
compromise our earlier position on *perdite.
However, there is another possible solution which eliminates even entries like 19,
by introducing the prefixes and stems in the redundancy rule itself. In this case, the
redundancy rule consists of a single term, and may be thought of as the simplest
type of 'word-formation' rule:
(20) [ / {trans, per, con, ad, sub, de, ...} = {sist, mit, fer, cede, tain, ...} / , +V ]
The information content of prefix-stem verbs is the same as before, but the cost of
the individual prefixes and stems is counted as part of the redundancy rule, not of
the list of lexical items. The two solutions appear at this level of investigation to be
equivalent, and I know as yet of no empirical evidence to decide which should be
permitted by the theory or favored by the evaluation measure.
This is the case of morphological redundancy without semantic redundancy
promised in §4. Since, for the most part, prefixes and stems do not carry semantic
information, it is not possible to pair 18 or 20 with a semantic rule.8 Obviously
the information measure must permit the morphological redundancy anyway.
Besides complete redundancy, we now have three cases to consider: those in which
a semantic redundancy rule relates a word to a non-lexical root (e.g. perdition),
those in which the semantic rule relates a word incorrectly to a lexical root (e.g.
professor), and those in which there is no semantic rule at all. The three cases are
independent, and a decision on one of them need not affect the others. Thus the
decision to allow morphological redundancy for prefix-stem verbs still leaves open
the question raised in §4 of how to treat professor.
It should be pointed out that word-formation rules like 20 are very similar to
8 If they did carry semantic information, it would be more difficult, but not necessarily
impossible, to state the rule in the form of 20. This is a potential difference between the solutions,
concerning which I have no evidence at present.
Halle's word-formation rules (1973). The major difference here between his theory
and mine is that his lexicon includes, in addition to the dictionary, a list of all
MORPHEMES in the language, productive and unproductive. The present theory
lists only WORDS in the lexicon. Productive affixes are introduced as part of lexical
redundancy rules, and non-productive non-lexical morphemes (such as *perdite)
do not appear independently anywhere in the lexical component. Other than the
arguments already stated concerning the feature [Lexical Insertion], I know of little
evidence to distinguish the two solutions. However, since Halle has not formulated
the filter, which plays a crucial role in the evaluation measure for his theory of the
lexicon, it is hard to compare the theories on the level where the present theory
makes its most interesting claims.
(23) a. [ Z THAT CARRIES W , +N ]  ↔  [ Z , +N ] , [ W , +N ]
     b. [ Z MADE OF W , +N ]  ↔  [ Z , +N ] , [ W , +N ]
     c. [ Z LIKE A W , +N ]  ↔  [ Z , +N ] , [ W , +N ]
The redundancy rules thus define the set of possible compounds of English, and the
lexicon lists the actually occurring compounds.
The information measure 16 gives an intuitively correct result for the independent
information in compounds. For example, since the nouns garbage and man are
in the lexicon, all their information will be counted as redundant in evaluating the
entry for garbage man. Thus the independent information content of garbage man
will be the information that such a word exists, plus any idiosyncratic facts about
the meaning (e.g. that he picks up rather than delivers garbage), plus the cost of
referring to 22 and 23a. The information of a complex compound like gingerbread
man is measured in exactly the same way; but the independent information in its
constituent gingerbread is reduced because of its relation to ginger and bread.
Gingerbread man is thus parallel in its evaluation to the case of transformationalist
cited earlier.
Now consider the problem of evaluating the following nouns:
(24) a. blueberry, blackberry
b. cranberry, huckleberry
c. gooseberry, strawberry
Blueberry and blackberry are obviously formed along the lines of morphological
rule 25 and semantic rule 26. This combination of rules also forms flatiron,
highchair, madman, drydock, and many others.
Thus blueberry and blackberry are evaluated in exactly the same way as garbage
man.
Cranberry and huckleberry contain one lexical morpheme and one non-lexical
morpheme. The second part (-berry) and its associated semantics should be redundant,
but the phonological segments /kræn/ and /hʌkl/, and the semantic charac-
teristics distinguishing cranberries and huckleberries from other kinds of berries,
must be non-redundant. Hence this case is just like perdition, where a non-lexical
root is involved, and the information measure formulated for the case of perdition
will yield the intuitively correct result. One problem is that the lexical
categories of cran- and huckle- are indeterminate, so it is unclear which mor-
phological rule applies. Likewise, it is unclear which semantic rule applies. However,
I see nothing against arbitrarily applying the rules which cost least; this conven-
tion will minimize the information in the lexicon without jeopardizing the generality
of the evaluation procedure.
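The convention just stated (when the root's category is indeterminate, arbitrarily apply whichever applicable rules cost least) amounts to a one-line minimization. The rule names and costs below are hypothetical:

```python
# Least-cost convention for indeterminate roots like cran- and huckle-:
# among the rule pairs that could apply, cite the cheapest.

def cheapest_rule(candidate_rule_costs):
    """candidate_rule_costs: {rule_name: cost of referring to that rule}."""
    return min(candidate_rule_costs, key=candidate_rule_costs.get)

# Hypothetical reference costs for the three compound S-rules:
print(cheapest_rule({"23a": 3, "23b": 2, "23c": 5}))  # 23b
```

Since the choice is arbitrary by stipulation, it cannot affect any other entry's evaluation; it only keeps the total measure minimal.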
We observe next that gooseberry and strawberry contain two lexical morphemes
and are both berries, but gooseberries have nothing to do with geese and straw-
berries have nothing to do with straw. This case is thus like professor, which has
nothing to do semantically with the verb profess, and exactly the same question
arises in their evaluation: should they be intermediate in cost between the previous
two cases, or should they be evaluated like cranberry, with straw- and goose-
counted as non-redundant? The fact that there is pressure towards phonological
similarity even without semantic basis (e.g. gooseberry was once groseberry) is
some evidence in favor of the permissive form of 16, in which morphological
similarity alone is sufficient for redundancy.
Another semantic class of compound nouns (exocentric compounds) differs
from those mentioned so far in that neither constituent describes what kind of
object the compound is. For example, there is no way for a non-speaker of English
to know that a redhead is a kind of person, but that a blackhead is a kind of
pimple.9 Other examples are redwing (a bird), yellow jacket (a bee), redcoat (a
soldier), greenback (a bill), bigmouth (a person), and big top (a tent). The mor-
phological rule involved is 25; the semantic rule must be
(27) [ THING WITH A Z WHICH IS W , +N ]  ↔  [ W , +A ] , [ Z , +N ]
This expresses the generalization inherent in these compounds, but it leaves open
what kind of object the compound refers to. The information measure gives as the
cost of redhead, for example, the information that there is a word, plus the informa-
tion that a redhead is a person (a more fully specified form of THING in 27), plus
the cost of referring to 27. This evaluation reflects precisely what a speaker must
learn about the word.
A transformational theory of compound formation, on the other hand, encounters
severe complication with this class of compounds. Since a compounding trans-
formation must preserve functional semantic content, the underlying form of
redheadmust contain the information that a redhead is a person and not a pimple,
and this information must be captured somehow in rule features (or derivational
constraints) which are idiosyncratic to the word redhead. I am sure that such con-
straints can be formulated, but it is not of much interest to do so. The need for
9 I am grateful to Phyllis Pacin for this example.
these elaborate rule features stems from the nature of transformations. Any
phrase-marker taken as input to a particular transformation corresponds to a set
of fully specified output phrase-markers. In the case of exocentric compounds, the
combination of the two constituent words by rule 27 does not fully specify the
output, since the nature of THING in 27 is inherently indeterminate.
We thus see an important empirical difference between lexical redundancy
rules and transformations: it is quite natural and typical for lexical redundancy
rules to relate items only partially, whereas transformations cannot express partial
relations. Several illustrations of this point have appeared already, in the mor-
phological treatment of perdition and cranberryand in the semantic treatment of
transformationalist. However, the case of exocentric compounds is perhaps the
most striking example, since no combination of exception features and hypothetical
lexical items can make the transformational treatment appear natural. The lexicalist
treatment, since it allows rules to relate exactly as much as necessary, handles
exocentric compounds without any remarkable extensions of the machinery.
5.3. CAUSATIVE VERBS. There is a large class of verbs which have both transitive
and intransitive forms;10 e.g.,
(28) a. The door opened.
b. Bill opened the door.
(29) a. The window broke.
b. John broke the window.
(30) a. The coach changed into a pumpkin.
b. Mombi the witch changed the coach from a handsome young man
into a pumpkin.
It has long been a concern of transformational grammarians to express the fact
that the semantic relations of door to open, of window to break, and of coach to
change are the same in the transitive and intransitive cases.
There have been two widely accepted approaches, both transformational in
character. The first, that of Lakoff 1971b, claims that the underlying form of the
transitive sentence contains the intransitive sentence as a complement to a verb of
causation-i.e., that the underlying form of 28b is revealed more accurately in the
sentence Bill caused the door to open. The other approach, case grammar, is that of
Fillmore 1968. It claims that the semantic relation of door to open is expressed
syntactically in the deep structures of 28a and 28b, and that the choice of subject
is a purely surface fact. The deep structures are taken to be 31a and 31b respectively:
(31) a. past open [Objective the door]
     b. past open [Objective the door] [Agentive by Bill]
These proposals and their consequences have been criticized on diverse syntactic
and semantic grounds (cf., e.g., Chomsky 1972, Fodor 1970, and Jackendoff
1972, Chap. 2); I do not intend to repeat those criticisms here. It is of interest to
note, however, that Lakoff's analysis of causatives is the opening wedge into the
generative semanticists' theory of lexicalization: if the causative verb break is the
result of a transformation, we would miss a generalization about the nature of
10 This class may also include the two forms of begin proposed by Perlmutter 1970.
agentive verbs by failing to derive the causative verb kill by the same transformation.
But since kill has as intransitive parallel not kill but die, and since there are many
such causative verbs without morphologically related intransitives, the only way
to avoid an embarrassing number of exceptions in the lexicon is to perform lexical
insertion AFTER the causative transformation, as proposed by McCawley 1968.
Again, the difficulty in this solution lies in the nature of transformations. There
are two cross-classifying generalizations which a satisfactory theory must express:
all causative verbs must share a semantic element in their representation; and the
class of verbs which have both a transitive causative form and an intransitive non-
causative form must be described in a general fashion. Expressing the second
generalization with a transformation implies a complete regularity, which in turn
loses the first generalization; McCawley's solution is to make a radical move to
recapture the first generalization.
There remains the alternative of expressing the second generalization in a way
that does not disturb the first. Fillmore's solution is along these lines; but he still
requires a radical change in the syntactic component, viz. the introduction of case
markers.
The lexicalist theory can leave the syntactic component unchanged by using the
power of the lexicon to express the partial regularity of the second generalization.
The two forms of break are assigned separate lexical entries:
(32) a. [ /brek/ , +V , +[NP1 ___] , NP1 BREAK ]
     b. [ /brek/ , +V , +[NP2 ___ NP1] , NP2 CAUSE (NP1 BREAK) ]
The two forms are related by the following morphological and semantic rules:11
(33) a. [ /x/ , +V ]  ↔  [ /x/ , +V ]
     b. [ NP1 Z ]  ↔  [ NP2 CAUSE (NP1 Z) ]
11 Since 33a is an identity rule, it is possibly dispensable. I have included it here for the sake of
explicitness, and also in order to leave the form of the information measure unchanged.
12 Perhaps the use of the identity rule 33a could make the two words count as one, if this were
desirable. I have no intuitions on the matter, so I will not bother with the modification.
This solution permits us still to capture the semantic similarity of all causative
verbs in their lexical entries; thus die and kill will have entries 34a and 34b
respectively:
(34) a. [ /day/ , +V , +[NP1 ___] , NP1 DIE ]
     b. [ /kil/ , +V , +[NP2 ___ NP1] , NP2 CAUSE (NP1 DIE) ]
Die and kill are related semantically in exactly the same way as the two entries of break:
one is a causative in which the event caused is the event described by the other.
However, since there is no morphological rule relating 34a-b, the information
measure does not relate them; the independent information contained in the two
entries is the fact that there are two words, plus all the information in both entries.
Thus the lexicalist theory successfully expresses the relation between the two
breaks and their relation to kill and die, without in any sense requiring kill and die
to be exceptional, and without making any radical changes in the nature of the
syntactic component.
A further possibility suggested by this account of causative verbs is that the
partial regularities of the following examples from Fillmore are also expressed in the
lexicon:
(35) a. Bees swarmed in the garden.
We sprayed paint on the wall.
b. The garden swarmed with bees.
We sprayed the wall with paint.
Fillmore seeks to express these relationships transformationally, but he encounters
the uncomfortable fact that the (a) and (b) sentences are not synonymous: the (b)
sentences imply that the garden was full of bees and that the wall was covered with
paint, but the (a) sentences do not carry this implication. Anderson 1971 shows
that this semantic difference argues against Fillmore's analysis, and in favor of one
with a deep-structure difference between the (a) and (b) sentences. A lexical treat-
ment of the relationship between the two forms of swarm and spray could express
the difference in meaning, and would be undisturbed by the fact that some verbs,
such as put, have only the (a) form and meaning, while others, such as fill, have
only the (b) form and meaning. This is precisely parallel to the break-break vs.
die-kill case just discussed.
Consider also some of the examples mentioned in Chomsky 1970. The relation of
He was amused at the stories and The stories amused him can be expressed in the
lexicon, and no causative transformation of the form Chomsky proposes need be
invoked. The nominalization his amusement at the stories contrasts with *the stories'
amusement of him because amusement happens to be most directly related to the
adjectival amused at rather than to the verb amuse. Other causatives do have
nominalizations, e.g. the excitation of the protons by gamma rays. I take it then that
the existence of only one of the possible forms of amusement is an ad-hoc fact,
expressed in the lexicon.
Chomsky also cites the fact that the transitive use of grow, as in John grows
tomatoes, does not form the nominalization *the growth of tomatoes by John.
Rather the growth of tomatoes is related to the intransitive tomatoes grow. Again
we can express this fact by means of lexical relations. This time, the relation is
perhaps more systematic than with amusement, since nouns in -th, such as width and
length, are generally related to intransitive predicates. Thus the meaning of growth
can be predicted by the syntactic properties of the redundancy rule which introduces
the affix -th. The transitive grow does in fact have its own nominalization: the
growing of tomatoes by John. Thus Chomsky's use of causatives as evidence for the
Lexicalist Hypothesis seems incorrect, in that causatives do have nominalizations,
contrary to his claim. But we can account for the unsystematicity of the nominaliza-
tions, as well as for what regularities do exist, within the present framework.
Note also that our account of causatives extends easily to Lakoff's class of
inchoative verbs (1971b). For example, the relation of the adjective open to the
intransitive verb open ('become open') is easily expressed in a redundancy rule
similar to that proposed for causatives.
As further evidence for the lexicalist theory, consider two forms of the verb
smoke:
smoke) by the redundancy rule that also gives the nouns drink, desire, wish, dream,
find, and experience. The verb milk (as in milk a cow) is related to the noun as
smoke1 and smoke3 are related, but without an intermediate *The cow milked
('The cow gave off milk'); the relation between the two milks requires two sets of
redundancy rules used together. We thus see the rich variety of partial regularities
in lexical relations: their expression in a transformational theory becomes hard to
conceive, but they can be expressed quite straightforwardly in the lexicalist frame-
work.
5.4. IDIOMS. Idioms are fixed syntactic constructions which are made up of words
already in the lexicon, but which carry meanings independent of the meanings of
their constituents. Since the meanings are unpredictable, the grammar must rep-
resent a speaker's knowledge of what constructions are idioms and what they
mean. The logical place to list idioms is of course in the lexicon, though it is not
obvious that the usual lexical machinery will suffice.
Fraser 1970 discusses three points of interest in the formalization of idioms.
First, they are constructed from known lexical items; the information measure,
which measures how much the speaker must learn, should reflect this. Second,
they are for the most part constructed in accordance with known syntactic rules
(with a few exceptions such as by and large), and in accordance with the syntactic
restrictions of their constituents. Third, they are often resistant to normally applic-
able transformations; e.g., The bucket was kicked by John has only the non-idio-
matic reading. I have nothing to say about this third consideration, but the first
two can be expressed in the present framework without serious difficulty.
Let us deal first with the question of the internal structure of idioms. Since we
have given internal structure to items like compensation and permit, there seems
to be nothing against listing idioms too, complete with their structure. The only
difference in the lexical entries is that the structure of idioms goes beyond the word
level. We can thus assign the lexical entries in 37 to kick the bucket, give hell to, and
take to task.13
(37) a. [NP1 [VP [V /kik/] [NP [Art /ðə/] [N /bʌkət/]]]]
As with ordinary lexical entries, the strictly subcategorized NP's must have a
specific grammatical relation with respect to the entry, and this is indicated in the
entries of 37. In the case of take NP to task, the strictly subcategorized direct
object is in fact surrounded by parts of the idiom; i.e., the idiom is discontinuous.
But in the present theory, this appears not to be cause for despair, as our formalisms
seem adequate to accommodate a discontinuous lexical item.
This last observation enables us to solve a puzzle in syntax: which is the under-
lying form in verb-particle constructions, look up the answer or look the answer
up? The standard assumption (cf. Fraser 1965) is that the particle has to form a
deep-structure constituent with the verb in order to formulate a lexical entry; hence
look up the answer is underlying, and the particle movement transformation is a
rightward movement. But Emonds 1972 gives strong syntactic evidence that the
particle movement rule must be a leftward movement. He feels uncomfortable about
this result because it requires that look ... up be discontinuous in deep structure;
he consoles himself by saying that the same problem exists for take ... to task,
but does not provide any interesting solution. Having given a viable entry for
take ... to task, we can now equally well assign discontinuous entries to idiomatic
verb-particle constructions, vindicating Emonds' syntactic solution.
By claiming that the normal lexical-insertion process deals with the insertion of
idioms, we accomplish two ends. First, we need not complicate the grammar in
order to accommodate idioms. Second, we can explain why idioms have the
syntactic structure of ordinary sentences: if they did not, the lexical insertion rules
could not insert them onto deep phrase markers. Our account of idioms thus has
the important virtue of explaining a restriction in terms of already existing
conventions in the theory of grammar: good evidence for its correctness.
Now that we have provided a way of listing idioms, how can we capture the
speaker's knowledge that idioms are made up of already existing words? To relate
the words in the lexicon to the constituents of idioms, we need morphological
redundancy rules. The appropriate rules for kick the bucket must say that a verb
followed by a noun phrase forms a verb phrase, and that an article followed by a
noun forms a noun phrase. But these rules already exist as phrase-structure rules
for VP and NP. Thus, in the evaluation of idioms, we must use the phrase-structure
rules as morphological redundancy rules. If this is possible, the independent in-
formation in kick the bucket will be the information that it is a lexical entry, plus
the semantic information DIE, plus the cost of referring to the phrase-structure rules
for VP and NP.
Though mechanically this appears to be a reasonable solution, it raises the
disturbing question of why the base rules should play a role in the information
measure for the lexical component. Some discussion of this question will appear
in §7. At this point I will simply note that this solution does not have very drastic
consequences for grammatical theory. Since the base rules can be used as redun-
dancy rules only if lexical entries go beyond the word level, no descriptive power
is added to the grammar outside the description of idioms. Therefore the proposal
is very limited in scope, despite its initially outrageous appearance.
If the base rules are used as morphological redundancy rules for idioms, we
might correspondingly expect the semantic projection rules to be used as semantic
redundancy rules. But of course this cannot be the case, since then an idiom would
have exactly its literal meaning, and cease to be an idiom. So we must assume that
the permissive version of the information measure is being used: both morphological
and semantic redundancy rules exist, but only the morphological rules apply in
reducing the independent information in the idiom. This is further evidence that the
permissive version of the information measure must be correct.
Note, by the way, that a transformational theory of nominalization contains
absolutely no generalization of the approach that accounts for idioms. Thus the
lexicalist hypothesis proves itself superior to the transformational hypothesis in a
way totally unrelated to the original arguments deciding between them.
In the present theory, the Afrikaans lexicon contains three morphological rules
for noun compounds:
[ [N x] [N y] ]  ↔  [ /x/ , +N ] , [ /y/ , +N ]
(39) a. [ /x/ , +[V +pres] ]  ↔  [ /x+d/ , +[V +past] ]
     b. [ /C0VC0/ , +[V +pres] ]  ↔  [ /C0VC0+t/ , +[V +past] ]
Here 39a is the regular rule for forming past tenses, and the other three represent
various irregular forms: 39b relates keep-kept, dream-dreamt, lose-lost, feel-felt
etc.; 39c relates tell-told, cling-clung, hold-held, break-broke etc.; the very marginal
and strange 39d relates just the six pairs buy-bought, bring-brought, catch-caught,
fight-fought, seek-sought, and think-thought. Note that 39b-c take over the
function of the 'precyclic re-adjustment rules' described by Chomsky & Halle
(209-10).15
A final preliminary point in this example: in the evaluation of a paradigm by the
information measure, I assume that the information that a word exists is counted
only once for the entire paradigm. Although one does have to learn whether a verb
has a nominalization, one knows for certain that it has a past tense, participles, and
a conjugation. Therefore the information measure should not count knowledge
that inflections exist as anything to be learned.
Now let us return to the problem of measuring the cost of referring to a redun-
dancy rule. Intuitively, the overwhelmingly productive rule 39a should cost virtually
nothing to refer to; the overwhelmingly marginal rules 39b-d should cost a great
deal to refer to, but less than the information they render predictable. The disparity
in cost reflects the fact that, in choosing a past tense form, 39a is ordinary and
unremarkable, so one must learn very little to use it; but the others are unusual or
'marked' choices, and must be learned. We might further guess that 39b-c, which
each account for a fair number of verbs, cost less to refer to than 39d, which applies
to only six forms (but which is nevertheless perceived as a minor regularity). Still,
the pair buy-bought contains less independent information than the totally irregular
pair go-went, which must be counted as two independent entries.
These considerations lead to a formulation of the cost of reference something
like
(40) The cost of referring to redundancy rule R in evaluating a lexical entry
W is I_{R,W} × P_{R,W}, where I_{R,W} is the amount of information in W
predicted by R, and P_{R,W} is a number between 0 and 1 measuring the
regularity of R in applying to the derivation of W.
For an altogether regular rule application, such as the use of 39a with polysyllabic
verbs, P_{R,W} will be zero. With monosyllabic verbs and 39a, P_{R,W} will be almost but
not quite zero; the existence of alternatives means that something must be learned.
For 39b-d, P_{R,W} will be close to 1; their being irregular means that their use does
not reduce the independent information content of entries nearly as much as 39a.
In particular, 39d will reduce the independent information content hardly at all.
In fact, it is quite possible that the total information saved by 39d in the evaluation
of the six relevant pairs of lexical entries is less than the cost of stating the rule.
Our evaluation measure thus reflects the extremely marginal status of this rule. In
other cases, perhaps the nominalizing affixes and Afrikaans compounds, the various
possible derived forms are in more equal competition, and P_{R,W} will have a value
of, say, 0.3.
15 I have not considered the question of how to extend the phonological generalization of
39c to other alternations such as mouse-mice, long-length. Perhaps the only way to do this is to
retain the rule in the phonology, and simply let the lexical redundancy rule supply a rule feature.
But a more sophisticated account of the interaction of the morphological rules might capture
this generalization without a rule feature; e.g., one could consider factoring morphological rules
into phonological and syntactic parts, as we factored out separate morphological and semantic
rules in §4. In any event, I am including all the phonology in 39 because many people have been
dissatisfied with the notion of re-adjustment rules: I hope that bringing up an alternative may
stimulate someone to clarify the notion.
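The arithmetic of (40) can be made concrete with a minimal Python sketch. The information amounts and regularity values below are hypothetical illustrations, not figures from the text; they are chosen only to mirror the contrast between 39a and 39d drawn above.

```python
def reference_cost(info_predicted: float, regularity_p: float) -> float:
    """Cost of citing redundancy rule R in evaluating entry W, per (40):
    I_{R,W} * P_{R,W}. P near 0 means fully regular; near 1, marginal."""
    if not 0.0 <= regularity_p <= 1.0:
        raise ValueError("P_{R,W} must lie between 0 and 1")
    return info_predicted * regularity_p

# Hypothetical figures: suppose a past-tense form embodies 10 units of
# information predictable by one of the rules in 39.
print(reference_cost(10, 0.0))   # 39a with a polysyllabic verb: cost 0
print(reference_cost(10, 0.3))   # a rule in more 'equal competition'
print(reference_cost(10, 0.95))  # 39d: barely cheaper than full listing
```

The multiplicative form captures the text's point directly: a fully regular rule makes its predicted information free, while a marginal rule leaves almost all of it to be learned.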
I will not suggest a precise method of calculating P_{R,W}, as I believe it would be
premature. However, the general concept of how it should be formulated is fairly
clear. Count a lexical pair related by R as an ACTUAL use of R. Count a lexical
entry which meets one term of the structural description of R, but in whose evalua-
tion R plays no role, as a NON-USE of R. For example, confuse counts as a non-use
of the rule introducing the -al nominal affix, since it meets the structural description
of the verbal term of the rule, but there is no noun confusal. The sum of the actual
uses and the non-uses is the number of POTENTIAL uses of R. P_{R,W} should be near
zero when the number of actual uses of R is close to the number of potential uses;
P_{R,W} should be near 1 when the number of actual uses is much smaller than the
number of potential uses; and it should rise monotonically from the former
extreme to the latter.
If phonological conditions can be placed on the applicability of a redundancy
rule, P_{R,W} decreases; i.e., the rule becomes more regular. For example, if the actual
uses of 39b all contain mid vowels (as I believe to be the case), then this specification
can be added to the vowel in 39b, reducing the potential uses of the rule from the
number of monosyllabic verbs to the number of such verbs with mid vowels. Since
the number of actual uses of the rule remains the same, P_{R,W} is reduced; and,
proportionately, so is the cost of referring to 39b in the derivations where it is
involved.
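The text deliberately leaves the exact function open, but one candidate that meets the stated monotonicity requirement is the fraction of potential uses that are non-uses. The sketch below uses hypothetical counts for 39b, chosen only to show how adding a phonological condition (shrinking the pool of potential uses) lowers P.

```python
def regularity(actual_uses: int, potential_uses: int) -> float:
    """One candidate estimator of P_{R,W}: the fraction of potential uses
    of R that are non-uses. Near 0 when actual uses approach potential
    uses; near 1 when actual uses are a small fraction of them."""
    if not 0 <= actual_uses <= potential_uses:
        raise ValueError("need 0 <= actual <= potential, potential > 0")
    return 1.0 - actual_uses / potential_uses

# Hypothetical counts for 39b: 20 actual keep/kept-type pairs.
p_broad = regularity(20, 1000)   # SD admits any monosyllabic verb
p_narrow = regularity(20, 120)   # SD narrowed to mid-vowel verbs
print(p_broad, p_narrow)         # the conditioned rule is more regular
```

This is only one monotone function of the actual/potential ratio; any function rising from 0 at actual = potential toward 1 as the ratio shrinks would serve the argument equally well.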
It is obvious that this concept of P_{R,W} must be refined to account for derivations
such as perdition with non-lexical sources; for compounding, where the number of
potential uses is infinite because compounds can form parts of compounds; and for
prefix-stem verbs, where the lexical redundancy rule does not relate pairs of items.
Furthermore, I have no idea how to extend the proposal to the evaluation of idioms,
where the base rules are used as lexical redundancy rules. Nevertheless, I believe
the notion of regularity of a lexical rule and its role in the evaluation measure for
the lexicon is by this point coherent enough to satisfy the degree of approximation
of the present theory.
This conclusion would fly in the face of all the evidence in §5.3 against a trans-
formational account of compounds.
The way out of the dilemma must be to follow the empirical evidence, rather
than our preconceived notions of what the grammar should be like. We must
accept the lexicalist account of compounds, and change our notion of how creativity
is embodied in the grammar.
The nature of the revision is clear. Lexical redundancy rules are learned from
generalizations observed in already known lexical items. Once learned, they make it
easier to learn new lexical items: we have designed them specifically to represent
what new independent information must be learned. However, after a redundancy
rule is learned, it can be used generatively, producing a class of partially specified
possible lexical entries. For example, the compound rule says that any two nouns
N1 and N2 can be combined to form a possible compound N1N2. The semantic
redundancy rules associated with the compound rule provide a finite range of
possible readings for N1N2. If the context is such as to disambiguate N1N2, any
speaker of English who knows N1 and N2 can understand N1N2 whether he has
heard it before or not, and whether it is an entry in his lexicon or not. Hence the
lexical rules can be used creatively, although this is not their usual role.
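The generative use of the compound rule can be sketched as follows. The inventory of semantic relations here is a hypothetical stand-in for whatever finite range of readings the semantic redundancy rules actually supply; the point is only that the rule overgenerates candidates, and context selects among them.

```python
# Hypothetical stand-in for the finite range of readings supplied by the
# semantic redundancy rules associated with the compound rule.
RELATIONS = [
    "{head} that is part of a {mod}",
    "{head} made of {mod}",
    "{head} for {mod}",
    "{head} that resembles a {mod}",
]

def possible_readings(mod: str, head: str) -> list[str]:
    """All candidate readings of the possible compound N1 N2 (mod + head).
    The rule itself does not choose; disambiguation is left to context."""
    return [template.format(mod=mod, head=head) for template in RELATIONS]

for reading in possible_readings("garbage", "man"):
    print(reading)
```

A hearer who knows both nouns can thus understand a novel compound without its being an entry in his lexicon, which is exactly the creative use described above.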
In §5.4, I proposed that the description of idioms uses the phrase-structure rules
as lexical redundancy rules. In broader terms, the rules normally used creatively
are being used for the passive description of memorized items. Perhaps this change
in function makes more sense in light of the discussion here: it is a mirror image of
the creative use of the normally passive lexical redundancy rules.
We have thus abandoned the standard view that the lexicon is memorized and
only the syntax is creative. In its place we have a somewhat more flexible theory of
linguistic creativity. Both creativity and memorization take place in both the
syntactic and the lexical component. When the rules of either component are used
creatively, no new lexical entries need be learned. When memorization of new
lexical entries is taking place, the rules of either component can serve as an aid to
learning. However, the normal mode for syntactic rules is creative, and the normal
mode for lexical rules is passive.
Is there, then, a strict formal division between phrase-structure rules and
morphological redundancy rules, or between the semantic projection rules of deep
structure and the semantic redundancy rules? I suggest that perhaps there is not,
and that they seem so different simply because of the differences in their normal
mode of operation. These differences in turn arise basically because lexical rules
operate inside words, where things are normally memorized, while phrase-structure
rules operate outside words, where things are normally created spontaneously.
One might expect the division to be less clear-cut in a highly agglutinative language,
where syntax and morphology are less separable than in English.
To show that the only difference between the two types of rules is indeed in their
normal modes of operation, one would of course need to reconcile their somewhat
disparate notations and to show that they make similar claims. Though I will
not carry out this project here, it is important to note, in the present scheme, that
the syntactic analog of a morphological redundancy rule is a phrase-structure rule,
not a transformation. This result supports the lexicalist theory's general trend
toward enriching the base component at the expense of the transformational com-
ponent.16
8. SUMMARY. This paper set out to provide a theory of the lexicon that would
accommodate Chomsky's theory of the syntax of nominalizations. This required a
formalization of the notion 'separate but related lexical entries'. The formalization
developed uses redundancy rules not for part of the derivation of lexical entries, but
for part of their evaluation. I take this use of redundancy rules to be a major
theoretical innovation of the present approach.
In turn, this use of redundancy rules entails the formulation of a new type of
evaluation measure. Previous theories have used abbreviatory notations to reduce
the evaluation measure on the grammar to a simple count of symbols. But we have
seen that the usual notational conventions cannot capture the full range of general-
izations in the lexicon. Accordingly I have formulated the evaluation measure as
a minimization of independent information, measured by the rather complex
function 16 and its refinement in 40. The abandonment of the traditional type of
evaluation measure is a second very crucial theoretical innovation required for an
adequate treatment of the lexicon.17
The concept of lexical rules that emerges from the present theory is that they are
separated into morphological and semantic redundancy rules. The M-rules must
play a role, and the S-rules may, in every lexical evaluation in which entries are
related. Typically, the redundancy rules do not completely specify the contents of
one entry in terms of another, but leave some aspects open. This partial specification
of output is a special characteristic of lexical redundancy rules not shared by other
types of rules; I have used this characteristic frequently in arguing against trans-
formational solutions.
In the discussion of nominalizations, I have taken great pains to tailor the
information measure to our intuitions about the nature of generality in the lexicon.
In particular, attention has been paid to various kinds of lexical derivatives with
non-lexical sources, since these form an important part of the lexicon which is not
accounted for satisfactorily in other theories.
While our solutions were developed specifically with nominalizations in mind,
there is little trouble in extending them to several disparate areas in the lexicon.
I have shown that parallel problems occur in these other areas, and that the solution
for nominalizations turns out to be applicable. Insofar as the success of a theory
is measured by how easily it generalizes to other problems, this theory thus seems
quite successful for English. A more stringent test would be its applicability to
languages where morphology plays a much more central role.
Another measure of a theory's success is its salutary effect on other sectors of the
16 Halle 1973 argues for a view of lexical creativity very similar to that proposed here, on
similar grounds.
17 One might well ask whether the traditional evaluation measure has inhibited progress in
other areas of the grammar as well. I conjecture that the approach to marking conventions in
SPE (Chapter 9) suffers for this very reason: Chomsky & Halle set up marking conventions so
that more 'natural' rules save symbols. If, instead, the marking conventions were used as part
of an evaluation measure on a set of fully specified rules, a great deal of their mechanical
difficulty might well be circumvented in expressing the same insights.
theory of grammar. The most important effect of the present theory is to eliminate
a major part of the evidence for Lakoff's theory of exceptions to transformations
(1971b): the lexicon has been set up to accommodate comfortably both regular
and ad-hoc facts, with no sense of absolute exceptionality; and transformations
are not involved in any event. Since (in Jackendoff 1972) I have eliminated another
great part of Lakoff's evidence, virtually all of Lakoff's so-called exceptions are now
accounted for in a much more systematic and restricted fashion. We also no longer
need hypothetical lexical entries, a powerful device used extensively by Lakoff.
With practically all of Lakoff's evidence dissolved, we see that the theory of
exceptions plays a relatively insignificant role in lexicalist grammar. A small
dent has also been made in the highly controversial area of idiosyncratic phono-
logical re-adjustment rules, though much further work is needed before we know
whether they are eliminable.
There are three favorable results in syntax as well. First and most important, the
analysis of causative verbs, which supposedly provides crucial evidence for the
generative semantics theory of lexicalization, can be disposed of quietly and without
fuss, leaving the standard theory of lexical insertion intact. Second, idioms can be
listed in the lexicon and can undergo normal lexical insertion; some of their syn-
tactic properties emerge as an automatic consequence of this position. Third, the
direction of the English particle movement transformation can finally be settled
in favor of leftward movement.
Thus a relatively straightforward class of intuitions about lexical relations has
been used to justify a theory of the lexicon which has quite a number of significant
properties for linguistic theory. Obviously, many questions remain in the area of
morphology. I would hope, however, that this study has provided a more congenial
framework in which to pose these questions.
REFERENCES
ANDERSON, S. 1971. On the role of deep structure in semantic interpretation. Foundations of Language 7.387-96.
BOTHA, RUDOLF P. 1968. The function of the lexicon in transformational generative grammar. The Hague: Mouton.
CHOMSKY, N. 1965. Aspects of the theory of syntax. Cambridge, Mass.: MIT Press.
CHOMSKY, N. 1970. Remarks on nominalization. In Jacobs & Rosenbaum, 184-221.
CHOMSKY, N. 1972. Some empirical issues in the theory of transformational grammar. Goals of linguistic theory, ed. by S. Peters, 63-130. Englewood Cliffs, N.J.: Prentice-Hall.
CHOMSKY, N., and M. HALLE. 1968. The sound pattern of English. New York: Harper & Row.
EMONDS, J. E. 1972. Evidence that indirect object movement is a structure-preserving rule. Foundations of Language 8.546-61.
FILLMORE, C. 1968. The case for case. Universals in linguistic theory, ed. by E. Bach & R. Harms, 1-88. New York: Holt, Rinehart & Winston.
FODOR, JERRY. 1970. Three reasons for not deriving 'kill' from 'cause to die'. Linguistic Inquiry 1.429-38.
FRASER, B. 1965. An examination of the verb-particle construction in English. MIT dissertation.
FRASER, B. 1970. Idioms within a transformational grammar. Foundations of Language 6.22-42.