
P. Schlenker - Ling 1 - Introduction to the Study of Language, UCLA

Introduction to Language - Lecture Notes 4B

Sentence Structure II: Phrase Structure Grammars

☞ Goal: How are sentences built (or 'generated', as linguists say)? Corresponding to the two hypotheses that
were considered in the preceding Lecture Notes, we discuss two possibilities. The first hypothesis, based on a
'word chain device' (formally called a 'finite state model' or a 'Markov model'), yields sentences that have a flat
structure. We already found an argument against this hypothesis in the preceding Lecture Notes: the
sentences of English do not have a flat structure. We show that the hypothesis has other defects as well. The
second hypothesis, by contrast, generates (=produces) sentences that do not have a flat structure. It involves
Phrase Structure Rules, which yield trees with labels added to indicate the syntactic category of each
constituent (e.g. Noun Phrase, Verb Phrase, etc.). The resulting tree recapitulates the process by
which a sentence is generated (=produced) by the rules of grammar: a group of elements forms a constituent
whenever they have been introduced by the application of the same rule.

1 Review: Constituency

1.1 Summary: Trees

(i) In every sentence, certain groups of words form 'natural units' [=constituents] and may:
-stand alone
-be moved as a unit
-be replaced as a unit by a pronoun
(ii) Trees encode the information about constituents: two expressions are a natural unit (=constituent) if there
is a sub-tree that contains them and nothing else.
(iii) A sentence that can be analyzed as 2 different trees is structurally ambiguous (e.g. Lucy will hit the
student with the book)

1.2 A Puzzle Explained: Question Formation

 The Puzzle (repeated from earlier Lecture Notes)

Pinker discusses in Chapter 2 of The Language Instinct (p. 29) the example of question formation. If we wish
to form a question that corresponds to the assertion John is in the garden, we may simply move the auxiliary is
to the beginning of the sentence, yielding Is John __ in the garden? [here __ simply indicates that a word has
been displaced]. In a slightly more complex case, such as John is in the garden next to someone who is asleep,
we form the corresponding question by moving to the beginning of the sentence the first is, yielding Is John __
in the garden next to someone who is asleep? If we tried instead to move the second is, we would obtain a
sharply ungrammatical result ('ungrammatical' in the descriptive sense we will use throughout this course): *Is
John is in the garden next to someone who __ asleep?
These contrasts are recapitulated in (1):
(1) a. John is in the garden next to someone who is asleep.
b. Is John __ in the garden next to someone who is asleep? (Move the first is)
c. *Is John is in the garden next to someone who __ asleep? (Move the second is)
From these one might be tempted to infer that the rule of question formation is to systematically move to the
beginning of the sentence the first is which is uttered. Pinker shows that this hypothesis is incorrect, since it
predicts (incorrectly) that the question corresponding to (2)a is (2)b:
(2) a. A unicorn that is eating a flower is in the garden
b. *Is a unicorn that __ eating a flower is in the garden? (Move the first is)
c. Is a unicorn that is eating a flower __ in the garden? (Move the second is)
We do not discuss at this point what the correct rule is (it will turn out that it must be stated in more abstract
terms than 'moving the first is' or 'moving the second is'). But we observe that a child who only heard simple
cases of question formation (e.g. Is John __ in the garden?) would have to infer a rather complex and subtle
rule from limited data. For the same reason as in the case of integers mentioned above, the child must have
something to guide his acquisition of a rule that goes beyond the sentences that he has heard.

 The Solution: 'move the auxiliary which is immediately under the right-hand daughter of the root'
The solution of the puzzle is that the rule of question formation should be stated in terms of structure (i.e. in
terms of syntactic trees) rather than in terms of strings (=linear order). The rule of question formation in
English is to move to the beginning of the sentence (i.e. to add to the tree) the auxiliary which is immediately
under the right-hand daughter of the root (the root is the top-most node of the tree).
(3) a. b. [tree diagrams not reproduced]

If Mary is replaced with the person who will be hired (clearly a constituent: for instance, it may be replaced
with the pronoun he or she), the general structure of the sentence is not affected, and in particular the same
word will is moved as in the simple sentence. Crucially, it is not the word will contained in the
person who will be hired that is moved, just as we want. This is illustrated in (4) [note that a triangle stands
for a constituent whose internal structure is omitted for simplicity; in homeworks you should specify the
complete structure of a tree, i.e. you should not use triangles, unless the exercise tells you to do so]:


(4) a. b. [tree diagrams not reproduced]

Going back to our original puzzle with A unicorn is in the garden, we can apply exactly the same reasoning.
Constituency tests would lead one to posit the following structure, where a unicorn is a single constituent.
(5) [tree diagram not reproduced]

The rule of question formation can then be applied in the same way as in our earlier examples:
(6) [tree diagram not reproduced]

And just as we want, the rule functions in exactly the same way when a unicorn is replaced with a unicorn that
is eating flowers; and the right result is obtained:


(7) a. b. [tree diagrams not reproduced]
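To make the structural rule concrete, here is a minimal sketch in Python. The tree encoding (tuples of the form (label, daughter, ...)) and the function names are my own, not part of the lecture notes; the rule implemented is the one stated above: front the auxiliary that sits immediately under the right-hand daughter of the root.

```python
def form_question(tree):
    """Front the auxiliary immediately under the right-hand daughter
    of the root, leaving a gap '__' in its original position."""
    label, *daughters = tree
    right = daughters[-1]            # right-hand daughter of the root (I')
    aux = right[1]                   # the auxiliary node, e.g. ("I", "will")
    # Rebuild the right-hand daughter with a gap where the auxiliary was.
    new_right = (right[0], ("I", "__")) + tuple(right[2:])
    return (label, aux) + tuple(daughters[:-1]) + (new_right,)

def leaves(tree):
    """Read the terminal string off a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(leaves(child))
    return out

# 'The person who will be hired will sleep': the will inside the subject
# NP is untouched; only the one immediately under I' is fronted.
subject = ("NP", "the", "person", "who", "will", "be", "hired")
tree = ("IP", subject, ("I'", ("I", "will"), ("VP", ("Vi", "sleep"))))
print(" ".join(leaves(form_question(tree))))
# → will the person who will be hired __ sleep
```

Because the rule consults the tree rather than the string, it never targets the first will it encounters in linear order.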

2 An Incorrect Model: Finite State Grammars (=Markov Model)

A plausible (but incorrect) model is discussed by Pinker in Chapter 4 of The Language Instinct: the
Finite State Model (also called 'Markov Model'; Pinker also calls it a 'word chain device'). It is both natural
and historically important, since it was considered plausible until the 1950's. In a nutshell, it attributes to a
speaker a simple mental system that allows him or her to determine whether a given word can or cannot follow
another given word. Here is the example of a Finite State Model discussed by Pinker (I have added 'START'
and 'ACCEPT' states, which are implicit in Pinker's discussion; the idea is that you feed the sentence to the
machine, starting with the first word, one word after the other; if you end up in the ACCEPT state after the last
word has been processed, the sentence is accepted; otherwise the sentence is rejected):

(8) [diagram rendered as text: numbered intermediate states, arcs labeled with the words that may be read]

START --(the, a, one)--> 1 --(boy, girl, dog)--> 2 --(eats)--> 3 --(ice cream, hot dogs, candy)--> ACCEPT

State 1 also has a loop labeled happy, so happy may be repeated any number of times before the noun.
(9) Examples of sentences that are generated by (8):
a. the boy eats ice cream
b. the happy boy eats ice cream
c. the happy happy boy eats hot dogs
d. a happy happy girl eats candy
(10) Examples of ungrammatical sentences that are not generated by (8):
a. *boy the eats ice cream
b. *happy boy eats hot dogs
c. *hot dogs eats the dog

4
P. Schlenker - Ling 1 - Introduction to the Study of Language, UCLA

(11) Examples of grammatical sentences that are not generated by (8):


a. some boy eats ice cream
b. the dog that the dog eats eats ice cream
c. either the boy eats ice cream or the girl eats candy
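The word-chain device in (8) can be written down as a simple transition table; here is a sketch in Python (the state names are my own, since the diagram's intermediate states are unnamed). Note that the machine keeps no record of anything except its current state.

```python
# A sketch of the word-chain device in (8) as a transition table.
# The machine only checks which word may follow which state.

TRANSITIONS = {
    ("START", "the"): "DET", ("START", "a"): "DET", ("START", "one"): "DET",
    ("DET", "happy"): "DET",                 # loop: happy happy happy ...
    ("DET", "boy"): "NOUN", ("DET", "girl"): "NOUN", ("DET", "dog"): "NOUN",
    ("NOUN", "eats"): "VERB",
    ("VERB", "ice cream"): "ACCEPT",
    ("VERB", "hot dogs"): "ACCEPT",
    ("VERB", "candy"): "ACCEPT",
}

def accepts(words):
    """Feed the words to the machine one by one; accept iff we end in ACCEPT."""
    state = "START"
    for w in words:
        state = TRANSITIONS.get((state, w))
        if state is None:                    # no arc for this word: reject
            return False
    return state == "ACCEPT"

print(accepts(["the", "happy", "happy", "boy", "eats", "hot dogs"]))  # True, cf. (9c)
print(accepts(["boy", "the", "eats", "ice cream"]))                   # False, cf. (10a)
print(accepts(["some", "boy", "eats", "ice cream"]))  # False: grammatical but not generated, cf. (11a)
```

The last line illustrates the gap documented in (11): perfectly grammatical sentences fall outside the machine simply because no arc carries the relevant word.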

There are two important arguments against the Finite State Model:

-Argument 1: It does not account for the tree-like structure of sentences that we observed in Lecture Notes 3B.

-Argument 2: It cannot properly account for 'long distance dependencies', i.e. constructions in which two
elements that depend on each other are separated by an arbitrary number of words.
(12) Example of a long distance dependency: either ... or ...
a. Either John is sick or he is depressed
b. Either John thinks that he is sick or he is depressed
c. Either Mary knows that John thinks that he is sick or she is depressed
d. Either the boy eats hot dogs or the dog eats hot dogs
e. Either the happy happy boy eats hot dogs or the dog eats candy
etc.

We could try to integrate the either ... or construction into our Finite State Model, but no simple solution
would work. To see this, observe that in the following model nothing requires that a sentence that starts with
either should also contain or somewhere down the road. And for good reason: in order to 'remember' this, the
model would need some kind of memory, which it lacks completely. The problem turns out to be very severe.
In fact, Noam Chomsky became famous in the 1950's by proving that no matter how complex a finite state
machine was, it could not handle all constructions of English.

(13) [diagram rendered as text] The machine of (8), extended with two kinds of arcs:

-from START, an arc labeled either or if leads back to START, so a sentence may begin with either or if;
-from the state reached after ice cream, hot dogs or candy, an arc labeled or or then leads back to START, so a second clause may follow the first.

(14) Some grammatical sentences generated by (13)


a. Either a girl eats candy or a boy eats hot dogs
b. Either a happy girl eats candy or a boy eats hot dogs

5
P. Schlenker - Ling 1 - Introduction to the Study of Language, UCLA

(15) Some ungrammatical sentences generated by (13)


a. *Either a girl eats candy
b. *Either a happy girl eats candy
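We can check this concretely by encoding (13) in the same style as before (state names mine). The machine accepts well-formed either ... or sentences, but, having no memory, it equally accepts a clause introduced by either that never reaches or.

```python
# A sketch of the extended word-chain device in (13). No state records
# whether 'either' was ever read, so the either/or dependency cannot
# be enforced.

TRANSITIONS = {
    ("START", "either"): "START",            # 'either' changes nothing
    ("START", "the"): "DET", ("START", "a"): "DET", ("START", "one"): "DET",
    ("DET", "happy"): "DET",
    ("DET", "boy"): "NOUN", ("DET", "girl"): "NOUN", ("DET", "dog"): "NOUN",
    ("NOUN", "eats"): "VERB",
    ("VERB", "ice cream"): "END", ("VERB", "hot dogs"): "END",
    ("VERB", "candy"): "END",
    ("END", "or"): "START",                  # back to the start for a second clause
}

def accepts(words):
    state = "START"
    for w in words:
        state = TRANSITIONS.get((state, w))
        if state is None:
            return False
    return state == "END"

# Correctly accepted, cf. (14a):
print(accepts(["either", "a", "girl", "eats", "candy",
               "or", "a", "boy", "eats", "hot dogs"]))   # True
# *Incorrectly* accepted: 'either' with no 'or', cf. (15a):
print(accepts(["either", "a", "girl", "eats", "candy"]))  # True
```

The second call shows exactly the over-generation in (15): the machine happily ends in an accepting state even though the or it owes us never arrives.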

3 A Better Model: Phrase Structure Grammars


Our goal, then, is to devise a system of rules that addresses the two criticisms given in Argument 1 and
Argument 2 above. In other words, the system we are trying to design should:
Requirement 1: Account for the tree-like structure that sentences have, and
Requirement 2: Provide an analysis of long-distance dependencies, i.e. constructions in which two elements
that depend on each other are separated by an arbitrary number of words.
We start with some properties that are satisfied by all or most sentences:
(i) All sentences have a verb (e.g. sleep, eat, claim) and an inflection, which may appear as an auxiliary (will,
might, can, should, did, do, does) or as an affix on the verb (the latter case will not be discussed here, as it
involves further complexities).
(ii) All sentences include, normally before the verb, a group of words that contains a noun, be it a common
noun (e.g. man, woman, table) or a proper name (John, Mary).

This is illustrated in the following sentences:


(16) a. John will sleep
b. The director will sleep
c. Mary will hit John
d. The director will criticize John
If we performed constituency tests on these sentences, we would see that they all start in the same way:
-first, they contain a constituent that includes a noun
-second, they contain a constituent of the form [Inflection + ___], where ___ is a constituent that contains a
verb.
(17) a. [John] [will [sleep]]
b. [The director] [will sleep]
c. [Mary] [will [hit John]]
d. [The director] [will [criticize John]]
The initial group that contains a noun we will call a Noun Phrase, NP for short. The group that contains a verb,
referred to as ____ above, will be called a Verb Phrase. The group [Inflection + Verb Phrase] will be called I'
(pronounced 'I bar': I for Inflection, the bar to indicate that it contains other things in addition). With this
background, we can start writing our grammar. Because each sentence contains an inflection, a sentence is called an
'Inflection Phrase', symbolized as IP.

IP → NP I' (a sentence consists of a Noun Phrase followed by an I bar)


I' → I VP (an I bar consists of an Inflection followed by a Verb Phrase).

We can now write the rest of the grammar:


I → will, might, can, should, does, did (an Inflection is: will, or might, or can, or should, or does, or did)
NP → PN, D N (a Noun Phrase comprises either a Proper Name/pronoun alone, or a Determiner and a
Noun)
PN → John, Bill, Mary, Sam, he, she...
N → President, director, boy, girl, Dean, friend, mother...
VP → Vi, Vt NP, Vs CP (a Verb Phrase comprises either an intransitive verb Vi alone, or a transitive verb Vt
followed by a Noun Phrase, or a verb of speech or thought Vs followed by a Complementizer Phrase)

D → the, some, a, every, my, his, her...

Vi → sleep, run, snore, fall...


Vt → meet, date, hit, kill, criticize...
Vs → think, say, believe, claim...

CP → C IP (a Complementizer Phrase comprises a Complementizer followed by an Inflection Phrase)


C → that
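The rules above can be run directly. Here is a minimal sketch, with a Python encoding of my own: each category maps to a list of possible right-hand sides, and a sentence is generated by expanding IP top-down, choosing one rule at each step (the lexical rules are trimmed to a few words each). Because of the Vs → CP → IP loop, the generator can in principle produce arbitrarily long sentences.

```python
import random

# Rewrite rules: uppercase-style entries that appear as keys are
# categories to be expanded; anything else is a word.
RULES = {
    "IP": [["NP", "I'"]],
    "I'": [["I", "VP"]],
    "NP": [["PN"], ["D", "N"]],
    "VP": [["Vi"], ["Vt", "NP"], ["Vs", "CP"]],
    "CP": [["C", "IP"]],
    "I":  [["will"], ["might"], ["should"]],
    "PN": [["John"], ["Mary"], ["Sam"]],
    "N":  [["President"], ["director"], ["boy"]],
    "D":  [["the"], ["a"], ["every"]],
    "Vi": [["sleep"], ["snore"]],
    "Vt": [["meet"], ["criticize"]],
    "Vs": [["think"], ["claim"]],
    "C":  [["that"]],
}

def generate(category):
    """Expand a category top-down, choosing one rule at each step."""
    words = []
    for symbol in random.choice(RULES[category]):
        if symbol in RULES:              # non-terminal: expand further
            words.extend(generate(symbol))
        else:                            # terminal: a word of the sentence
            words.append(symbol)
    return words

print(" ".join(generate("IP")))
```

Every sentence the generator produces begins with a Noun Phrase and contains an inflection, exactly as the rules IP → NP I' and I' → I VP require.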
Let us first go through some very simple examples. The tree is constructed from the top, applying one rule at
each step. For instance the fact that IP is the mother of NP and I' indicates that we have applied the rule:
IP → NP I'. Similarly the fact that I' is the mother of I and VP indicates that we have applied the rule:
I' → I VP, etc.

(18) [tree rendered as labeled bracketing]
[IP [NP [PN Mary]] [I' [I will] [VP [Vi sleep]]]]

(19) [IP [NP [D the] [N President]] [I' [I will] [VP [Vi sleep]]]]
We can also generate some of the sentences that occupied us in Lecture Notes 3B:


(20) [IP [NP [PN Mary]] [I' [I will] [VP [Vt meet] [NP [D the] [N President]]]]]

(21) [IP [NP [D Your] [N friend]] [I' [I will] [VP [Vt meet] [NP [D the] [N President]]]]]
Crucially, we observe that our Phrase Structure Grammar generates sentences 'with the right structure', i.e.
with the tree-like structure that was discussed in Lecture Notes 3B. The only difference is that some non-
branching nodes have been added (reminder: a non-branching node is a node with just 1 daughter). When the
non-branching nodes and the labels are disregarded, we obtain exactly the trees that were argued for in Lecture
Notes 3B:
(22) [Mary [will [meet [the President]]]]

(23) [[Your friend] [will [meet [the President]]]]


We also note that our little grammar can generate more complex sentences, thanks in particular to our rule for
verbs of speech and thought (e.g. believe, think, claim, etc.), which can embed an Inflection Phrase within
another Inflection Phrase, as is shown below (the embedding of a constituent of a given category within
another constituent of the same category is called recursion; it is essential to generate an infinite language):

(24) [IP [NP [PN Mary]] [I' [I will] [VP [Vs claim] [CP [C that] [IP [NP [PN John]] [I' [I will] [VP [Vi sleep]]]]]]]]
Recursion of IP (=an IP is embedded within another IP)

Observe that nothing would prevent us from embedding the IP in (24) within a larger IP, e.g. John will think
that ____. Since this procedure can be repeated as many times as we want, our grammar can generate an
infinite number of sentences.
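The unbounded embedding just described can be mimicked in a few lines of Python (a toy illustration of my own, not from the notes): wrapping a sentence in Mary will claim that any number of times uses the rules VP → Vs CP and CP → C IP once per wrapping, and yields a new grammatical sentence each time.

```python
def embed(sentence, times):
    """Wrap a sentence in 'Mary will claim that ...' the given number
    of times, mimicking repeated application of VP -> Vs CP and CP -> C IP."""
    for _ in range(times):
        sentence = "Mary will claim that " + sentence
    return sentence

print(embed("John will sleep", 0))
# → John will sleep
print(embed("John will sleep", 2))
# → Mary will claim that Mary will claim that John will sleep
```

Since `times` can be any integer, the set of sentences obtainable this way has no upper bound on length: one recursive rule is enough to make the language infinite.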

At this point it should already be clear that we have met Requirement 1: our grammar does account for the
tree structure that was argued for in Lecture Notes 3B. What about Requirement 2, then? Do we now have an
account of long-distance dependencies? We do, as soon as we add one rule to our little grammar:

IP → either IP1 or IP2, if IP1 then IP2

This rule generates trees such as the following:

(25) [IP either IP1 or IP2]


It is then clear that by adding under IP1 or IP2 any of the trees that can be generated by our grammar, we
obtain a grammatical sentence. Requirement 2 has thus been met as well.

4 The Head Parameter


[This part of the Lecture Notes will probably not be discussed until Thursday, February 5th, 2004]
The constituents generated by our Phrase Structure Grammar have labels that indicate which element
gives them their 'crucial' properties. For instance, a Verb Phrase is so called because it always contains a verb
in a specified position. We say that the verb is the head of the Verb Phrase. A major property of natural
languages is that their constituents are headed.
We make a further observation, which is specific to English: a head always comes before its sister.
Linguists call the sister of a head its complement. Thus we can express the same fact by stating that in English
the head always comes before its complement. For instance, the inflection I comes before its complement VP;
the complementizer C comes before its complement IP; and a transitive verb Vt (e.g. hate) comes before its
complement NP (e.g. the President).
Interestingly, the position of the head relative to its complement depends on the language. This is one
additional parameter which can account for language variation (reminder: we discussed the 'Null Subject
Parameter' in previous lectures). English is uniformly head-initial, in the sense that in every construction the
head comes before its complement. By contrast, Japanese is uniformly head-final, in the sense that the head
always comes after its complement. While this does not account for all syntactic differences between English
and Japanese, it accounts for quite a few, and brings out the similarities between two apparently very different
word orders:
(26) John-ga Mary-o but-ta
John-particle Mary-particle hit-PAST
'John hit Mary'
(27) [IP [NP [PN John]] [I' [VP [NP [PN Mary]] [V hit]] [I PAST]]]

(28) Bill-wa John-ga Mary-o but-ta to omot-ta


Bill-particle John-particle Mary-particle hit-PAST that think-PAST
'Bill thought that John hit Mary'


(29) [IP [NP [PN Bill]] [I' [VP [CP [IP [NP [PN John]] [I' [VP [NP [PN Mary]] [V hit]] [I PAST]]] [C that]] [V think]] [I PAST]]]

It should be noted that English and Japanese are two extreme examples: head-initial for all constructions
(English), or head-final for all constructions (Japanese). Some languages display a mixed pattern, in which
some constructions (e.g. Verb Phrases) are head-initial, while others (e.g. Complementizer Phrases) are head-
final.
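The Head Parameter can be made concrete with a small sketch (the tree encoding and function are my own): the same head-complement structure is linearized head-first for English and head-last for Japanese-style order, while the subject NP, being a specifier rather than a complement, precedes I' in both languages.

```python
def linearize(node, head_initial=True):
    """A node is either a word (str) or a (head, complement) pair.
    The parameter decides whether the head precedes or follows
    its complement at every level of the tree."""
    if isinstance(node, str):
        return [node]
    head, comp = node
    h = linearize(head, head_initial)
    c = linearize(comp, head_initial)
    return h + c if head_initial else c + h

# I' = [I will [VP [V hit] [NP Mary]]]; the subject precedes I' in both orders.
i_bar = ("will", ("hit", "Mary"))
print("John " + " ".join(linearize(i_bar, head_initial=True)))
# → John will hit Mary  (English: head-initial)
print("John " + " ".join(linearize(i_bar, head_initial=False)))
# → John Mary hit will  (Japanese-style: head-final, cf. (26)-(27))
```

One binary parameter flips every head-complement pair at once, which is exactly why the English and Japanese trees in (27) and (29) are mirror images of each other below the subject.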


Appendix. Contents of Chapter 4 of Pinker's Language Instinct

4. How Language Works (Syntax)

(i) General properties of grammar

-Two 'tricks'
(a) Arbitrariness of the sign [Saussure] (75)
(b) Infinite use of finite means [Humboldt] (75)

-Syntax is a discrete combinatorial system (75)

-Syntax is autonomous from cognition (76)
(a) Meaningful sentences that are ungrammatical
(b) Grammatical sentences that are meaningless

(ii) The Markov Model (=the Finite State Model) (81)

(iii) Syntactic Trees (90)
(a) Basic components (90)
(b) Structural ambiguity (94)

(iv) Phrase Structure (97)
(a) Parts of speech (98)
(b) X' theory (99)
(c) The Head Parameter (103)
(d) Thematic roles (105)
(e) Case (107)
(f) IP (110)
(g) Function words (111)
(h) Deep structures (113)

