Chapter Outline
Key Terms
Merge, constituent, head, phrase, node, hierarchy, dominate.
(1)
Another test is the following. The question How do they taste? has to
be answered by something adjectival. Delicious! would be an appropriate
answer, but Sausages! would not be. Delicious sausages! is also an
inappropriate answer to the same question, showing that the combination of
the adjective and noun behaves like a noun, not like an adjective.
So delicious sausages behaves syntactically just like a noun and not
like an adjective. But didn’t we say before that what makes a noun a noun
(or actually: what makes a word carry a feature [N]) results from its
syntactic behaviour? If that is the case, then we must conclude that the
combination of an adjective and a noun (or the combination of something
with a feature [A] and something with a feature [N]) must also carry the
feature [N]. Put differently, a combination of an element with feature [A]
and an element with feature [N] inherits the grammatical feature of the
noun. The newly created constituent also has the feature [N], indicating that
it shares this categorial property with one of its components. This is
indicated in (2) by the added [N] feature to the whole constituent:
(2) [delicious[A] sausages[N]][N]
Note that what we are doing here is a variation on the substitution test
introduced in chapter 1, based on the following generalisation:
(3) If two elements X and Y share the same syntactic features, then in
every grammatical sentence that contains X you can replace X by Y
(and vice versa) and the sentence remains grammatical.
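As a toy illustration (not part of the original argument), the generalisation in (3) can be modelled in a few lines of Python; the feature annotations below are invented for the example:

```python
# Toy model of the substitution generalisation: two expressions are
# intersubstitutable iff they carry the same syntactic features.
# The feature sets below are illustrative annotations, not real data.

features = {
    "sausages": {"N"},
    "delicious sausages": {"N"},  # the [A]+[N] combination inherits [N]
    "delicious": {"A"},
}

def can_substitute(x, y):
    """X can replace Y (and vice versa) iff they share the same features."""
    return features[x] == features[y]

print(can_substitute("delicious sausages", "sausages"))  # True
print(can_substitute("delicious", "sausages"))           # False
```

This is only a restatement of (3) in executable form: substitution preserves grammaticality exactly when the features match.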
Note, by the way, that it is not because the verb follows the adverb that
the verb determines the categorial status of the new constituent. The so-
called linear order (the order in which the words are pronounced) is
irrelevant. The combination often reads in the sentence John often reads can
be replaced by reads often, which shows that both often reads and reads
often behave like a verb.
Now we can make a generalisation over all combinations of two
constituents: whenever two constituents (both of course carrying some
feature) can be combined, then the feature of one of them becomes the
feature of the newly created constituent. Such a generalisation captures all
the concrete cases that we have seen so far, and many more.
Note that this does not mean that all constituents can simply be
combined into larger constituents (very reads is bad, for instance). Rather, it
says that whenever two constituents can be combined, in other words when
the combination is good English, one of the two constituents determines the
feature of the newly created constituent. Let’s be precise. We say that
whenever there is a grammatical combination of two constituents that each
carries some feature, let’s say [X] and [Y], one of the two determines the
feature of the whole, so either [X] or [Y]. This we can formalise as in (7)
below:
(7) Every constituent has a feature that is the same as the feature of
one of the words in it.
Again, this generalisation is true for all types of constituents, not just
nominal ones. In a similar way, we can say that a constituent consisting
of a verb and two adverbs rather than one (often reads quietly) also carries
the feature [V], and a constituent consisting of an adjective and two adverbs
(quite seriously late) also carries the feature [A].
To conclude, if you look at the simplest syntactic structure, one
consisting of two words, then one of these words determines the properties
of the new constituent. And if you look at a more complex constituent then,
no matter how complex it is, one word in there is always responsible for the
behaviour of the entire constituent. What we need now is a mechanism that
derives this generalisation. In other words, what we need is a theory.
Exercises
d. rather ridiculously
(10)
(11)
(12)
This structure states that very delicious behaves like an adjective and
that the adjective delicious is therefore the head. The entire constituent thus
has the feature [A]. In the same way, the merger of a verb and an adverb (in
whatever order) yields a VP (Verb Phrase), as shown below. This ensures
that the entire constituent has the feature [V] and behaves verbally.
(13)
(14)
It is quite standard in the syntactic literature, however, to reserve the
_P notation for the highest node of the phrase. Lower phrases of the same
categorial type, such as the node dominating delicious sausages, are
marked as intermediate, and the notation used for such levels is ′
(pronounced: bar). The structure of (14) with this notation is then the following:
(15)
(17)
Here, Mary is the highest nominal category; the node above it lacks the
feature [N]. We can tell, because the node straight above NP is VP, and VP
is not nominal. Therefore, Mary is a phrase, and therefore we call it NP,
although it consists of a single word only. Now, because we want to be
consistent, we should do the same for the representations in (11)–(13),
where the adjective in (11) and the adverbs in (12) and (13) should be
treated as phrases as well. We should therefore add the Ps to the
representations in the following way:
(18)
For now, this may look like a notational trick, but we will see later on
that an analysis of phrases that includes single-word phrases has various
advantages.
Recall that we argued in chapter 1 that nominal phrases (like delicious
sausage, or delicious sausages), proper names (like Mary) and pronouns
(like her, it and he) share the same syntactic behaviour: we can for instance
replace one with the other. For this reason, we concluded that all these
constituents carry a [D] and an [N] feature, even when you don’t see the
element with the [D] feature. Now, when you look at the representations of
nominal phrases in this section, you can clearly see their ‘[N]-ness’, but
you may wonder what happened to their ‘[D]-ness’, as there is no [D]
feature (or D node) in sight. We will forget about [D]/D for the moment,
and return to it in section 4.3, where it will be responsible for an important
development in the theory.
(19)
There is no restriction on the length of what can replace the dots
because the theory simply does not say anything about it. The theory
requires that the VP has a head, which it does, and that is all. We therefore
expect that a verb can be combined with an [N] category consisting of a
single word, as in (to) know Mary. Alternatively, it can be combined with a
bunch of words. This is indeed correct. (To) know Mary is a correct
expression in English, but so is (to) know beautiful Mary.
(20)
At the same time, all these constituents that the verb can merge with,
regardless of whether it is Mary, beautiful Mary or beautiful Mary from
Italy, have a feature [N]. So the verb can merge with anything that is a
nominal phrase, i.e. a phrase with the noun as a head.
Let us try to put this into a precise but very general statement. The
phrase that merges with some head is always the highest phrase of that
categorial type, which we indicate by using the ‘P’. For example, since the
verb in (20) heads a phrase of its own, the nominal phrase it merges with
must be a full NP. The fact that the non-head is always a maximal phrase
(YP, where Y can be any category) can be formalised as in (21):
(21) A constituent that merges with a syntactic head X is always a
maximal phrase: [X YP]XP
(22)
a. in trees
b. in the trees
e. …
One issue that may pop up, though, is that, as opposed to NPs, PPs
cannot consist of one word, namely the preposition only. In trees is a
well-formed syntactic constituent, but just in is not. Note for instance that a verb
can be combined with an NP consisting of just a noun (as in I love trees) or
with a more complex NP (for instance, I love these trees). On the other
hand, a verb can be combined with a PP (for instance, They live in trees)
but not with just a preposition (*They live in). This does not run against the
generalisation, though, as it only states that heads merge with maximal
phrases. It follows from the generalisation that if you merge a head with a
constituent headed by a P, that head merges with a full PP. The fact
that a PP cannot consist of just a preposition, unlike an NP, which can
consist of only a noun, must then have an independent reason, which
according to most linguists working on prepositions lies in their meaning
(you can’t talk about in or on without specifying in or on what).
We have seen two examples of the pattern in (21 ) now, one in which
X is a verb and one in which X is a preposition. It is quite easy to show that
in other cases, where X is, for instance, an N or A, it can combine with a
constituent consisting of more than one word too. Wherever afraid of John
can appear, afraid can appear as well, showing that afraid of John is an
adjectival phrase (an AP) in which an A is combined with a PP, and this PP
consists of more than one word (here, of John). This creates the structure in
(23a). In the same vein, a noun like destruction can occur in every position
where destruction of Rome can occur, which shows that we are dealing with
an NP in which a noun is combined with a PP, and this PP consists of more
than one word. The structure is provided in (23b).
(23)
We have now reached the conclusion that any head X obeys the
statement in (21), even when this X merges with one single word. It is
important to see that this generalisation is a consequence of the Merge
mechanism, which puts a clear restriction on what can be a head (one word)
but no restriction on the constituent that a head combines with (one or more
words).
In our illustration of (21), all examples so far have involved merging a
head with something on the right. Would that mean that (21) is only valid
with mergers in which the head is on the left? No. There is nothing in (21)
that is sensitive to left–right distinctions. The mechanism in (21) holds for
every merger involving the head, both on the left and on the right. Let us
now look at an example involving the left side of the head. We saw earlier
that a noun can be modified by an adjective to yield a nominal phrase, like
delicious sausages. We also saw back in example (12) that very delicious
is an adjectival phrase. So we predict that we can also merge a noun like
sausages with the AP very delicious. Of course, this prediction is correct as
well: very delicious sausages is a grammatical nominal phrase. Its structure
looks as follows:
(24)
It is also easy to show that two phrases within an NP, each consisting
of more than one word, can occur on different sides of the head. Suppose
we modify sausages from Italy by adding the information that they are very
delicious. This would give us the following structure:
(25)
Since it turns out that heads can combine not just with one phrase
(consisting of one or more words) but with several, the generalisation in
(21) can be broadened to the one in (26):
(26) Every maximal phrase, XP, contains minimally a head X but can
in addition contain one or more phrases: [… (ZP) X (YP) … ]XP
Whether ZP and YP occur on the right or left of the head does not
really matter (and is a topic that we will not deal with much until chapter 9).
To sum up, we have proposed Merge as the operation responsible for
the creation of larger constituents out of smaller constituents. This operation
requires that every phrase contains minimally a head. Apart from the head,
this phrase may contain one or more other phrases and these other phrases
can consist of one or several words. This generalisation is valid no matter
what type of phrase we are dealing with. It is, in other words, a
cross-categorial generalisation. A fundamental property that products of Merge
share is that they are hierarchies . It is this property that will become
important in the next section .
Exercise
a. really stupidly
e. really
2.3 Consequences: Testing the Predictions of
Merge
We now have a theory of syntactic structure predicting that every syntactic
constituent consisting of more than one word has an internal hierarchical
structure. The Merge hypothesis predicts that a phrase can consist of two
constituents, and each of these constituents can again consist of two
constituents, etc. Eventually, you will reach the word level. But how do we
know that this is the correct way of looking at syntactic structures?
The best way to test this, and to test hypotheses in general, is to see
what kinds of predictions this hypothesis makes. If these predictions are
incorrect, the hypothesis must be rejected, or at least modified. For instance,
the hypothesis that the earth is flat predicts that you cannot sail around it.
Since you can, you know this hypothesis is false.
But what if the predictions are borne out? You might cheer and think
your hypothesis is correct (and this happens way too often). But you must
be more careful. To come up with an example: suppose you entertain the
hypothesis that the sun rotates around the earth (as many people thought for
a long time). You might say that this predicts that roughly half of the day
you can see the sun and the other half of the day you can’t. This is
obviously correct, but at the same time we know now that the hypothesis is
wrong. In fact, this ‘ correct prediction’ does not tell us anything. The
reason is that another hypothesis, the one that says that the earth moves
around the sun, makes exactly the same prediction, namely that there are
days and nights.
What does this tell us? It tells us that whenever you want to test a
particular hypothesis you should not only show that the predictions that one
hypothesis makes are correct, but also that the alternative hypotheses do not
make these correct predictions. As it turns out, the hypothesis that takes the
sun to be the centre of our solar system explains the orbits of other planets
better than the hypothesis that the earth is the centre. The behaviour of other
planets, then, falsifies the earth-central hypothesis and confirms the
sun-central hypothesis.
So if we want to check whether the Merge hypothesis is correct, we
need to compare the predictions of this hypothesis with the predictions
made by other hypotheses about syntactic structures. Now, what would be
an alternative hypothesis to the idea that phrases and sentences are
hierarchical structures built up with the operation Merge, the way we saw
earlier? This would for instance be a hypothesis stating that sentences are
not hierarchical, and that every word in a complex constituent is just a bead
on a string; under this hypothesis constituents (and sentences) would lack
internal structure beyond individual words. Such a hypothesis would simply
say that every word is glued to another word in the phrase. For expensive
delicious sausages , this would mean that delicious is glued to sausages and
expensive is glued to delicious . We therefore call this hypothesis the Glue
hypothesis. This leads to the following two representations for expensive
delicious sausages :
(27)
Now we have a very clear difference between the Merge hypothesis
and the Glue hypothesis. The Merge hypothesis predicts that every syntactic
constituent has an internal hierarchical structure. The Glue hypothesis
predicts that syntactic constituents do not have an internal structure beyond
the words that are glued together. To give a concrete example, under the
Merge hypothesis delicious sausages is a constituent within the constituent
expensive delicious sausages , whereas this is not the case under the Glue
hypothesis: under the latter hypothesis there is one big constituent,
consisting of identifiable words only. If we want to argue in favour of the
Merge hypothesis and against the Glue hypothesis, then, we must provide
evidence for the existence of such intermediate constituents as delicious
sausages .
Now, there are two types of evidence that we can distinguish, not just
in this case but in scientific research in general, namely conceptual
evidence and empirical evidence . One theory can be considered superior to
an alternative theory if it is conceptually simpler, more attractive, more
elegant, more beautiful, or requires fewer assumptions. In addition, a theory
can be considered superior to an alternative if it can account for more
empirical observations: in that event, the theory is empirically superior.
Both types of evidence in favour of the Merge hypothesis can be put
forward. We will start in section 2.3.1 with empirical evidence for Merge
based on the syntactic behaviour of phrases. Then we will continue in
section 2.3.2 by showing that Merge is conceptually attractive once you
start to compare it to alternative theories.
(28)
(29)
(30)
(31)
(32)
But this simple generalisation is only possible if those [N] nodes exist
in the first place. Since under the Glue hypothesis delicious sausages is not
a constituent, no such simple generalisation is possible. Of course, we could
strengthen our point by simply merging more adjectives to the structure.
Each additional adjective will create an additional subconstituent, and this
subconstituent can in its turn be replaced by ones . We therefore have a
situation in which the data can easily be captured by the one theory but not
by the other.
We conclude on the basis of the data in (28 )– (31 ) that pronouns can
be used to replace syntactic constituents and syntactic constituents only.
Therefore, we have good reason to believe that the substitution test can be
used as a diagnostic for constituency. If so, the test reveals the existence of
multi-word constituents within multi-word constituents, just as Merge
expects. We should be careful, however, not to take this too hastily as
evidence for the Merge hypothesis. What we are in fact saying is the
following: the Queen of Denmark frequently is not a constituent because we
cannot replace it by a pronoun. And the pronoun her cannot refer back to
the Queen of Denmark frequently , because that is not a constituent. In
effect, we have a case of circular reasoning , because you can only accept
the conclusion (which is: based on the substitution test the Queen of
Denmark frequently is not a constituent) if you also believe the premise,
namely that substitution by a pronoun reveals constituency to begin with.
What we need, therefore, is some independent evidence that tells us that the
substitution test indeed reveals constituency, as we strongly suspect. In
other words, we need a second diagnostic.
As we have seen, in our hierarchical representations every node is a
structural unit, or building block, and we have reserved the word
‘constituent’ to refer to these units. Now, each individual word is a
constituent but, as we have seen, several words can together form a
subconstituent in an even bigger constituent. What we can observe, and this
will be our second diagnostic, is that in a lot of cases these subconstituents
can be moved to another place in the sentence, for instance to the front.
Take the example in (33):
(35)
We can observe that within this VP there is a subconstituent of the type
PP, namely in new leaders. In new leaders is a constituent because there is a
node in the tree, namely PP, that dominates in, new and leaders and nothing
else. Dominate here means: you can reach in, new and leaders from the PP
node by only going down; you cannot reach believe by only going down
from the PP node. Therefore, the PP does not dominate believe . In the same
vein, this PP in turn contains a subconstituent, namely new leaders . NP is a
node in the tree that dominates only new and leaders but not believe and in .
New leaders , therefore, is a constituent too. What we can observe is that
both the PP and the NP can be fronted (brought to the beginning of the
sentence) as well, showing quite clearly that they behave as syntactic units:
(36)
(37)
(38)
(39)
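The notion of dominance defined above lends itself to a small computational sketch. The Python below is an illustration only (the tree encoding and function names are invented here): it represents the VP believe in new leaders and checks which words each node dominates, i.e. which words can be reached by only going down:

```python
# Sketch of 'dominance': a node X dominates a word w iff w can be
# reached from X by only going downward in the tree. The tree shape
# is the one assumed in the text: [VP believe [PP in [NP new leaders]]].

tree = ("VP", [("V", ["believe"]),
               ("PP", [("P", ["in"]),
                       ("NP", [("A", ["new"]), ("N", ["leaders"])])])])

def leaves(node):
    """All words reachable by only going down from this node."""
    label, children = node
    out = []
    for c in children:
        out.extend(leaves(c) if isinstance(c, tuple) else [c])
    return out

def find(node, label):
    """Return the first subtree whose root carries the given label."""
    if node[0] == label:
        return node
    for c in node[1]:
        if isinstance(c, tuple):
            hit = find(c, label)
            if hit:
                return hit
    return None

pp = find(tree, "PP")
print(leaves(pp))               # ['in', 'new', 'leaders']
print("believe" in leaves(pp))  # False: PP does not dominate 'believe'
```

The PP node dominates exactly in, new and leaders and nothing else, which is what makes in new leaders (and, inside it, new leaders) a constituent.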
At the same time, the substitution test does work here, as ones can
replace delicious sausages and leaders without any problems: I especially
like expensive ones, Harry doesn’t usually believe in new ones. So one test
succeeds and one test fails to show the constituency in these examples. It is
also possible that the movement test works but not the substitution test. And
Harry doesn’t usually believe in ones doesn’t sound good at all, although
the movement test clearly shows that new leaders is a syntactic constituent.
What this tells us is that these are one-way tests: if a string of words
can be moved or substituted by a pronoun, it is a constituent. If you cannot
move a string of words, however, you cannot conclude anything. And the
same is true if the substitution test fails. Compare this to the following. If it
rains, you get soaked. But if you are soaked, it does not automatically
follow that it is raining. Other factors may have caused this, for instance a
bucket of water that someone balanced above the door for you. The same is true for
the relation between constituency and constituency tests. A test may fail but
for a reason that has nothing to do with constituency. The fact that the tests
only work one way is what provides work for syntacticians. One of their
tasks is to unravel the restrictions on these processes, and we will do some
of this in later chapters (especially chapters 6 and 7 ). Whatever the
outcome of this research will be, though, there are two immediate lessons
for now.
One is that, as a student, you should always run both tests: if one
works (as with in new leaders ) or two work (as with the Queen of Denmark
), bingo. If both fail (as with the Queen of Denmark frequently ), you have
no evidence for constituency.
And second, it is fairly clear that the facts are much better handled by a
theory that uses hierarchical representations than one that uses linear
strings. The Merge hypothesis starts out with certain predictions (one
should in principle be able to move a constituent, or replace it by a
pronoun), whereas the Glue hypothesis assumes that there are no internal
constituents beyond the individual words, and therefore cannot predict these
facts .
(40)
(41)
So here is the conceptual evidence in favour of the Merge hypothesis.
If you ask yourself the question which of these two grammars is simpler,
there can be only one answer. The Merge grammar wins, as it only requires
one rule to build expensive delicious sausages ([A]+[N]), whereas the Glue
grammar needs two. In other words, the Merge grammar is a simpler and
more elegant grammar. This makes it conceptually more attractive.
But the situation deteriorates quickly for the Glue hypothesis if we
consider its predictions. The Glue grammar must contain a rule that
combines an [A] with another [A], as we have just seen. This rule not only
leads to a more complex grammar, it is actually a rule that we do not want
in the first place. The Glue grammar predicts that [A]+[A] constituents are
grammatical expressions in English, but they are not. As an answer to the
question ‘How are your sausages?’, you could say ‘Expensive!’, or you
could say ‘Delicious!’. But you cannot say ‘Expensive delicious!’ To
exclude such examples, grammars must contain a rule that bans [A]+[A]
constituents in English. This rule is added to both grammars in (42):
(42)
Note now that this additional rule can simply be added to the Merge
grammar but not to the Glue grammar, for the simple reason that it would
create a contradiction with the already existing Rule 2: Glue ([A], [A]). A
grammar cannot contain a rule allowing a particular combination and at the
same time forbidding it. On the other hand, the Merge grammar does not
have any rules that contradict each other. This means that the combination
of the ungrammaticality of expensive delicious , and the grammaticality of
expensive delicious sausages , is evidence in favour of the Merge
hypothesis and against the Glue hypothesis, because only the first, unlike
the second, hypothesis can capture these facts.
In short, the Glue hypothesis does not give us what we need. This does
not automatically mean that the Merge hypothesis is what we need, since
we have only considered one alternative. Although it is impossible to go
through all the imaginable options, there is at least one alternative we
should look at. Let’s call it the List hypothesis. It shares with the Glue
hypothesis the idea that syntactic structures are not hierarchically structured
but are just linear orderings. At the same time, it crucially deviates from the
Glue hypothesis in not requiring an [A]+[A] rule, i.e. it is not expected that
[A]+[A] combinations are possible. How would that work? Well, the List
grammar just lists all the linear strings that are grammatical, as in (43b). It
just says that [A A N] is a grammatical string in English. There is no longer
any mechanism, such as Merge or Glue, that creates these strings. The
strings are just listed. The advantage is that we no longer need a Glue
operation that puts an [A] and an [A] together to create expensive delicious
sausages. And if we want, we can just state that the string [A A] is
ungrammatical, thereby mimicking Rule 2 of the Merge grammar:
(43)
The List hypothesis does better than the Glue hypothesis in that it does
not run into the empirical problem we noted, namely how to exclude the
ungrammatical ‘Expensive delicious!’ Conceptually, however, it is easy to
see that the List grammar still needs a larger set of statements than the
Merge grammar, namely three vs. two. This conceptual problem gets bigger
with every [A] that you add. The List grammar needs to add a statement to
the effect that [A + A + A + N] is also grammatical, and so is [A + A + A +
A + N]. There is no upper limit. And all that time, the Merge grammar can
take a nap, as it already covers all these grammatical strings.
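This contrast can be made concrete with a toy sketch. In the illustrative Python below (the function and variable names are invented for this example), the Merge-style rule accepts any number of stacked adjectives by recursion, while the List grammar must enumerate every pattern as a separate statement:

```python
# Illustrative comparison (names invented): a Merge-style rule covers
# adjective stacking by recursion, while a List grammar must spell out
# each pattern as a separate statement.

def merge_accepts(cats):
    """One rule, applied recursively: a bare [N] is a nominal phrase,
    and [A] + nominal phrase is again a nominal phrase."""
    if cats == ["N"]:
        return True
    return len(cats) >= 2 and cats[0] == "A" and merge_accepts(cats[1:])

# The List grammar has no generative rule; it just enumerates strings,
# and needs a new statement for every extra adjective -- without limit.
list_grammar = [["N"], ["A", "N"], ["A", "A", "N"]]  # ...and so on

print(merge_accepts(["A", "A", "A", "N"]))   # True: covered by one rule
print(["A", "A", "A", "N"] in list_grammar)  # False: not yet listed
print(merge_accepts(["A", "A"]))             # False: no [A]+[A] allowed
```

Note that the recursive rule also rejects [A A] on its own, so the Merge sketch needs no extra ban, while the List grammar keeps growing with every added adjective.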
Exercises
d. back in France
e. talk to me slowly
c. If you front a man with a knife, as in (ii), the sentence all of a
sudden has one meaning only. Why is this?
C8 The following constituent can have two trees. What are those
two trees? Justify your answer.