
P. Schlenker - Ling 1 - Introduction to the Study of Language, UCLA

Introduction to Language - Lecture Notes 4B

Sentence Structure II: Phrase Structure Grammars

☞ Goal: How are sentences built (or 'generated', as linguists say)? Corresponding to the two hypotheses that
were considered in the preceding Lecture Notes, we discuss two possibilities. The first hypothesis, based on a
'word chain device' (formally called a 'finite state model' or a 'Markov model'), yields sentences that have a flat
structure. We already found an argument against this hypothesis in the preceding Lecture Notes: the
sentences of English do not have a flat structure. We show that the hypothesis has other defects as well. The
second hypothesis, by contrast, generates (=produces) sentences that do not have a flat structure. It involves
Phrase Structure Rules, which yield trees with labels added to indicate the syntactic category of each
constituent (e.g. Noun Phrase, Verb Phrase, etc.). The resulting tree recapitulates the process by
which a sentence is generated (=produced) by the rules of grammar: a group of elements forms a constituent
whenever they have been introduced by the application of the same rule.

1 Review: Constituency

1.1 Summary: Trees

(i) In every sentence, certain groups of words form 'natural units' [=constituents] and may:
-stand alone
-be moved as a unit
-be replaced as a unit by a pronoun
(ii) Trees encode the information about constituents: two expressions are a natural unit (=constituent) if there
is a sub-tree that contains them and nothing else.
(iii) A sentence that can be analyzed as 2 different trees is structurally ambiguous (e.g. Lucy will hit the
student with the book)

1.2 A Puzzle Explained: Question Formation

 The Puzzle (repeated from earlier Lecture Notes)

Pinker discusses in Chapter 2 of The Language Instinct (p. 29) the example of question formation. If we wish
to form a question that corresponds to the assertion John is in the garden, we may simply move the auxiliary is
to the beginning of the sentence, yielding Is John __ in the garden? [here __ simply indicates that a word has
been displaced]. In a slightly more complex case, such as John is in the garden next to someone who is asleep,
we form the corresponding question by moving to the beginning of the sentence the first is, yielding Is John __
in the garden next to someone who is asleep? If we tried instead to move the second is, we would obtain a
sharply ungrammatical result ('ungrammatical' in the descriptive sense we will use throughout this course): *Is
John is in the garden next to someone who __ asleep?
These contrasts are recapitulated in (1):
(1) a. John is in the garden next to someone who is asleep.
b. Is John __ in the garden next to someone who is asleep? (Move the first is)
c. *Is John is in the garden next to someone who __ asleep? (Move the second is)
From these one might be tempted to infer that the rule of question formation is to systematically move to the
beginning of the sentence the first is which is uttered. Pinker shows that this hypothesis is incorrect, since it
predicts (incorrectly) that the question corresponding to (2)a is (2)b:
(2) a. A unicorn that is eating a flower is in the garden
b. *Is a unicorn that __ eating a flower is in the garden? (Move the first is)
c. Is a unicorn that is eating a flower __ in the garden? (Move the second is)
We do not discuss at this point what the correct rule is (it will turn out that it must be stated in more abstract
terms than 'moving the first is' or 'moving the second is'). But we observe that a child who only heard simple
cases of question formation (e.g. Is John __ in the garden?) would have to infer a rather complex and subtle
rule from limited data. For the same reason as in the case of integers mentioned above, the child must have
something to guide his acquisition of a rule that goes beyond the sentences that he has heard.

 The Solution: 'move the auxiliary which is immediately under the right-hand daughter of the root'
The solution of the puzzle is that the rule of question formation should be stated in terms of structure (i.e. in
terms of syntactic trees) rather than in terms of strings (=linear order). The rule of question formation in
English is to move to the beginning of the sentence (i.e. to add to the tree) the auxiliary which is immediately
under the right-hand daughter of the root (the root is the top-most node of the tree).
(3) a. b. [tree diagrams not reproduced]

If Mary is replaced with the person who will be hired (clearly a constituent: for instance, it may be replaced
with the pronoun he or she), the general structure of the sentence is not affected, and in particular the same
word will is moved as in the simple sentence. Crucially, it is not the word will contained in the
person who will be hired that is moved, just as we want. This is illustrated in (4) [note that a triangle stands
for a constituent whose internal structure is omitted for simplicity; in homeworks you should specify the
complete structure of a tree, i.e. you should not use triangles, unless the exercise tells you to do so]:


(4) a. b. [tree diagrams not reproduced]

Going back to our original puzzle with A unicorn is in the garden, we can apply exactly the same reasoning.
Constituency tests would lead one to posit the following structure, where a unicorn is a single constituent.
(5) [tree diagram not reproduced]

The rule of question formation can then be applied in the same way as in our earlier examples:
(6) [tree diagram not reproduced]

And just as we want, the rule functions in exactly the same way when a unicorn is replaced with a unicorn that
is eating flowers; and the right result is obtained:


(7) a. b. [tree diagrams not reproduced]
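To make the structural rule concrete, here is a minimal sketch in Python. The tree encoding (tuples of the form (label, daughter, ...)) and the function names are my own, not part of the lecture notes; the rule implemented is the one stated above: front the auxiliary that sits immediately under the right-hand daughter of the root.

```python
def form_question(tree):
    """Front the auxiliary immediately under the right-hand daughter
    of the root, leaving a gap '__' in its original position."""
    label, *daughters = tree
    right = daughters[-1]            # right-hand daughter of the root (I')
    aux = right[1]                   # the auxiliary node, e.g. ("I", "will")
    # Rebuild the right-hand daughter with a gap where the auxiliary was.
    new_right = (right[0], ("I", "__")) + tuple(right[2:])
    return (label, aux) + tuple(daughters[:-1]) + (new_right,)

def leaves(tree):
    """Read the terminal string off a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(leaves(child))
    return out

# 'The person who will be hired will sleep': the will inside the subject
# NP is untouched; only the one immediately under I' is fronted.
subject = ("NP", "the", "person", "who", "will", "be", "hired")
tree = ("IP", subject, ("I'", ("I", "will"), ("VP", ("Vi", "sleep"))))
print(" ".join(leaves(form_question(tree))))
# → will the person who will be hired __ sleep
```

Because the rule consults the tree rather than the string, it never targets the first will it encounters in linear order.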

2 An Incorrect Model: Finite State Grammars (=Markov Model)

A plausible (but incorrect) model is discussed by Pinker in Chapter 4 of The Language Instinct: the
Finite State Model (also called 'Markov Model'; Pinker also calls it a 'word chain device'). It is both natural
and historically important, since it was considered plausible until the 1950's. In a nutshell, it attributes to a
speaker a simple mental system that allows him or her to determine whether a given word can or cannot follow
another given word. Here is the example of a Finite State Model discussed by Pinker (I have added 'START'
and 'ACCEPT' states, which are implicit in Pinker's discussion; the idea is that you feed the sentence to the
machine, starting with the first word, one word after the other; if you end up in the ACCEPT state after the last
word has been processed, the sentence is accepted; otherwise the sentence is rejected):

(8) [diagram rendered as text: numbered intermediate states, arcs labeled with the words that may be read]

START --(the, a, one)--> 1 --(boy, girl, dog)--> 2 --(eats)--> 3 --(ice cream, hot dogs, candy)--> ACCEPT

State 1 also has a loop labeled happy, so happy may be repeated any number of times before the noun.
(9) Examples of sentences that are generated by (8):
a. the boy eats ice cream
b. the happy boy eats ice cream
c. the happy happy boy eats hot dogs
d. a happy happy girl eats candy
(10) Examples of ungrammatical sentences that are not generated by (8):
a. *boy the eats ice cream
b. *happy boy eats hot dogs
c. *hot dogs eats the dog

4
P. Schlenker - Ling 1 - Introduction to the Study of Language, UCLA

(11) Examples of grammatical sentences that are not generated by (8):


a. some boy eats ice cream
b. the dog that the dog eats eats ice cream
c. either the boy eats ice cream or the girl eats candy
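The word-chain device in (8) can be written down as a simple transition table; here is a sketch in Python (the state names are my own, since the diagram's intermediate states are unnamed). Note that the machine keeps no record of anything except its current state.

```python
# A sketch of the word-chain device in (8) as a transition table.
# The machine only checks which word may follow which state.

TRANSITIONS = {
    ("START", "the"): "DET", ("START", "a"): "DET", ("START", "one"): "DET",
    ("DET", "happy"): "DET",                 # loop: happy happy happy ...
    ("DET", "boy"): "NOUN", ("DET", "girl"): "NOUN", ("DET", "dog"): "NOUN",
    ("NOUN", "eats"): "VERB",
    ("VERB", "ice cream"): "ACCEPT",
    ("VERB", "hot dogs"): "ACCEPT",
    ("VERB", "candy"): "ACCEPT",
}

def accepts(words):
    """Feed the words to the machine one by one; accept iff we end in ACCEPT."""
    state = "START"
    for w in words:
        state = TRANSITIONS.get((state, w))
        if state is None:                    # no arc for this word: reject
            return False
    return state == "ACCEPT"

print(accepts(["the", "happy", "happy", "boy", "eats", "hot dogs"]))  # True, cf. (9c)
print(accepts(["boy", "the", "eats", "ice cream"]))                   # False, cf. (10a)
print(accepts(["some", "boy", "eats", "ice cream"]))  # False: grammatical but not generated, cf. (11a)
```

The last line illustrates the gap documented in (11): perfectly grammatical sentences fall outside the machine simply because no arc carries the relevant word.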

There are two important arguments against the Finite State Model:

-Argument 1: It does not account for the tree-like structure of sentences that we observed in Lecture Notes 3B.

-Argument 2: It cannot properly account for 'long distance dependencies', i.e. constructions in which two
elements that depend on each other are separated by an arbitrary number of words.
(12) Example of a long distance dependency: either ... or ...
a. Either John is sick or he is depressed
b. Either John thinks that he is sick or he is depressed
c. Either Mary knows that John thinks that he is sick or she is depressed
d. Either the boy eats hot dogs or the dog eats hot dogs
e. Either the happy happy boy eats hot dogs or the dog eats candy
etc.

We could try to integrate the either ... or construction into our Finite State Model, but no simple solution
would work. To see this, observe that in the following model nothing requires that a sentence that starts with
either should also contain or somewhere down the road. And for good reason: in order to 'remember' this, the
model would need some kind of memory, which it lacks completely. The problem turns out to be very severe.
In fact, Noam Chomsky became famous in the 1950's by proving that no matter how complex a finite state
machine was, it could not handle all constructions of English.

(13) [diagram rendered as text] The machine of (8), extended with two kinds of arcs:

-from START, an arc labeled either or if leads back to START, so a sentence may begin with either or if;
-from the state reached after ice cream, hot dogs or candy, an arc labeled or or then leads back to START, so a second clause may follow the first.

(14) Some grammatical sentences generated by (13)


a. Either a girl eats candy or a boy eats hot dogs
b. Either a happy girl eats candy or a boy eats hot dogs

5
P. Schlenker - Ling 1 - Introduction to the Study of Language, UCLA

(15) Some ungrammatical sentences generated by (13)


a. *Either a girl eats candy
b. *Either a happy girl eats candy
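We can check this concretely by encoding (13) in the same style as before (state names mine). The machine accepts well-formed either ... or sentences, but, having no memory, it equally accepts a clause introduced by either that never reaches or.

```python
# A sketch of the extended word-chain device in (13). No state records
# whether 'either' was ever read, so the either/or dependency cannot
# be enforced.

TRANSITIONS = {
    ("START", "either"): "START",            # 'either' changes nothing
    ("START", "the"): "DET", ("START", "a"): "DET", ("START", "one"): "DET",
    ("DET", "happy"): "DET",
    ("DET", "boy"): "NOUN", ("DET", "girl"): "NOUN", ("DET", "dog"): "NOUN",
    ("NOUN", "eats"): "VERB",
    ("VERB", "ice cream"): "END", ("VERB", "hot dogs"): "END",
    ("VERB", "candy"): "END",
    ("END", "or"): "START",                  # back to the start for a second clause
}

def accepts(words):
    state = "START"
    for w in words:
        state = TRANSITIONS.get((state, w))
        if state is None:
            return False
    return state == "END"

# Correctly accepted, cf. (14a):
print(accepts(["either", "a", "girl", "eats", "candy",
               "or", "a", "boy", "eats", "hot dogs"]))   # True
# *Incorrectly* accepted: 'either' with no 'or', cf. (15a):
print(accepts(["either", "a", "girl", "eats", "candy"]))  # True
```

The second call shows exactly the over-generation in (15): the machine happily ends in an accepting state even though the or it owes us never arrives.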

3 A Better Model: Phrase Structure Grammars


Our goal, then, is to devise a system of rules that addresses the two criticisms given in Argument 1 and
Argument 2 above. In other words, the system we are trying to design should:
Requirement 1: Account for the tree-like structure that sentences have, and
Requirement 2: Provide an analysis of long-distance dependencies, i.e. constructions in which two elements
that depend on each other are separated by an arbitrary number of words.
We start with some properties that are satisfied by all or most sentences:
(i) All sentences have a verb (e.g. sleep, eat, claim) and an inflection, which may appear as an auxiliary (will,
might, can, should, did, do, does) or as an affix on the verb (the latter case will not be discussed here, as it
involves further complexities).
(ii) All sentences include, normally before the verb, a group of words that contains a noun, be it a common
noun (e.g. man, woman, table) or a proper name (John, Mary).

This is illustrated in the following sentences:


(16) a. John will sleep
b. The director will sleep
c. Mary will hit John
d. The director will criticize John
If we performed constituency tests on these sentences, we would see that they all start in the same way:
-first, they contain a constituent that includes a noun
-second, they contain a constituent of the form [Inflection + ___], where ___ is a constituent that contains a
verb.
(17) a. [John] [will [sleep]]
b. [The director] [will sleep]
c. [Mary] [will [hit John]]
d. [The director] [will [criticize John]]
The initial group that contains a noun we will call a Noun Phrase, NP for short. The group that contains a verb,
referred to as ____ above, will be called a Verb Phrase. The group [Inflection + Verb Phrase] will be called I'
(pronounced 'I bar': I for Inflection, the bar to indicate that it contains other things in addition). With this
background, we can start writing our grammar. Because each sentence contains an inflection, a sentence is called an
'Inflection Phrase', symbolized as IP.

IP → NP I' (a sentence consists of a Noun Phrase followed by an I bar)


I' → I VP (an I bar consists of an Inflection followed by a Verb Phrase).

We can now write the rest of the grammar:


I → will, might, can, should, does, did (an Inflection is: will, or might, or can, or should, or does, or did)
NP → PN, D N (a Noun Phrase comprises either a Proper Name/pronoun alone, or a Determiner and a
Noun)
PN → John, Bill, Mary, Sam, he, she...
N → President, director, boy, girl, Dean, friend, mother...
VP → Vi, Vt NP, Vs CP (a Verb Phrase comprises either an intransitive verb Vi alone, or a transitive verb Vt
followed by a Noun Phrase, or a verb of speech or thought Vs followed by a Complementizer Phrase)

D → the, some, a, every, my, his, her...

Vi → sleep, run, snore, fall...


Vt → meet, date, hit, kill, criticize...
Vs → think, say, believe, claim...

CP → C IP (a Complementizer Phrase comprises a Complementizer followed by an Inflection Phrase)


C → that
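The rules above can be run directly. Here is a minimal sketch, with a Python encoding of my own: each category maps to a list of possible right-hand sides, and a sentence is generated by expanding IP top-down, choosing one rule at each step (the lexical rules are trimmed to a few words each). Because of the Vs → CP → IP loop, the generator can in principle produce arbitrarily long sentences.

```python
import random

# Rewrite rules: uppercase-style entries that appear as keys are
# categories to be expanded; anything else is a word.
RULES = {
    "IP": [["NP", "I'"]],
    "I'": [["I", "VP"]],
    "NP": [["PN"], ["D", "N"]],
    "VP": [["Vi"], ["Vt", "NP"], ["Vs", "CP"]],
    "CP": [["C", "IP"]],
    "I":  [["will"], ["might"], ["should"]],
    "PN": [["John"], ["Mary"], ["Sam"]],
    "N":  [["President"], ["director"], ["boy"]],
    "D":  [["the"], ["a"], ["every"]],
    "Vi": [["sleep"], ["snore"]],
    "Vt": [["meet"], ["criticize"]],
    "Vs": [["think"], ["claim"]],
    "C":  [["that"]],
}

def generate(category):
    """Expand a category top-down, choosing one rule at each step."""
    words = []
    for symbol in random.choice(RULES[category]):
        if symbol in RULES:              # non-terminal: expand further
            words.extend(generate(symbol))
        else:                            # terminal: a word of the sentence
            words.append(symbol)
    return words

print(" ".join(generate("IP")))
```

Every sentence the generator produces begins with a Noun Phrase and contains an inflection, exactly as the rules IP → NP I' and I' → I VP require.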
Let us first go through some very simple examples. The tree is constructed from the top, applying one rule at
each step. For instance the fact that IP is the mother of NP and I' indicates that we have applied the rule:
IP → NP I'. Similarly the fact that I' is the mother of I and VP indicates that we have applied the rule:
I' → I VP, etc.

(18) [tree rendered as labeled bracketing]
[IP [NP [PN Mary]] [I' [I will] [VP [Vi sleep]]]]

(19) [IP [NP [D the] [N President]] [I' [I will] [VP [Vi sleep]]]]
We can also generate some of the sentences that occupied us in Lecture Notes 3B:


(20) [IP [NP [PN Mary]] [I' [I will] [VP [Vt meet] [NP [D the] [N President]]]]]

(21) [IP [NP [D Your] [N friend]] [I' [I will] [VP [Vt meet] [NP [D the] [N President]]]]]
Crucially, we observe that our Phrase Structure Grammar generates sentences 'with the right structure', i.e.
with the tree-like structure that was discussed in Lecture Notes 3B. The only difference is that some non-
branching nodes have been added (reminder: a non-branching node is a node with just 1 daughter). When the
non-branching nodes and the labels are disregarded, we obtain exactly the trees that were argued for in Lecture
Notes 3B:
(22) [Mary [will [meet [the President]]]]

(23) [[Your friend] [will [meet [the President]]]]


We also note that our little grammar can generate more complex sentences, thanks in particular to our rule for
verbs of speech and thought (e.g. believe, think, claim, etc.), which can embed an Inflection Phrase within
another Inflection Phrase, as is shown below (the embedding of a constituent of a given category within
another constituent of the same category is called recursion; it is essential to generate an infinite language):

(24) [IP [NP [PN Mary]] [I' [I will] [VP [Vs claim] [CP [C that] [IP [NP [PN John]] [I' [I will] [VP [Vi sleep]]]]]]]]
Recursion of IP (=an IP is embedded within another IP)

Observe that nothing would prevent us from embedding the IP in (24) within a larger IP, e.g. John will think
that ____. Since this procedure can be repeated as many times as we want, our grammar can generate an
infinite number of sentences.
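The unbounded embedding just described can be mimicked in a few lines of Python (a toy illustration of my own, not from the notes): wrapping a sentence in Mary will claim that any number of times uses the rules VP → Vs CP and CP → C IP once per wrapping, and yields a new grammatical sentence each time.

```python
def embed(sentence, times):
    """Wrap a sentence in 'Mary will claim that ...' the given number
    of times, mimicking repeated application of VP -> Vs CP and CP -> C IP."""
    for _ in range(times):
        sentence = "Mary will claim that " + sentence
    return sentence

print(embed("John will sleep", 0))
# → John will sleep
print(embed("John will sleep", 2))
# → Mary will claim that Mary will claim that John will sleep
```

Since `times` can be any integer, the set of sentences obtainable this way has no upper bound on length: one recursive rule is enough to make the language infinite.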

At this point it should already be clear that we have met Requirement 1: our grammar does account for the
tree structure that was argued for in Lecture Notes 3B. What about Requirement 2, then? Do we now have an
account of long-distance dependencies? We do, as soon as we add one rule to our little grammar:

IP → either IP1 or IP2, if IP1 then IP2

This rule generates trees such as the following:

(25) [IP either IP1 or IP2]


It is then clear that by adding under IP1 or IP2 any of the trees that can be generated by our grammar, we
obtain a grammatical sentence. Requirement 2 has thus been met as well.

4 The Head Parameter


[This part of the Lecture Notes will probably not be discussed until Thursday, February 5th, 2004]
The constituents generated by our Phrase Structure Grammar have labels that indicate which element
gives them their 'crucial' properties. For instance, a Verb Phrase is so called because it always contains a verb
in a specified position. We say that the verb is the head of the Verb Phrase. A major property of natural
languages is that their constituents are headed.
We make a further observation, which is specific to English: a head always comes before its sister.
Linguists call the sister of a head its complement. Thus we can express the same fact by stating that in English
the head always comes before its complement. For instance, the inflection I comes before its complement VP;
the complementizer C comes before its complement IP; and a transitive verb Vt (e.g. hate) comes before its
complement NP (e.g. the President).
Interestingly, the position of the head relative to its complement depends on the language. This is one
additional parameter which can account for language variation (reminder: we discussed the 'Null Subject
Parameter' in previous lectures). English is uniformly head-initial, in the sense that in every construction the
head comes before its complement. By contrast, Japanese is uniformly head-final, in the sense that the head
always comes after its complement. While this does not account for all syntactic differences between English
and Japanese, it accounts for quite a few, and brings out the similarities between two apparently very different
word orders:
(26) John-ga Mary-o but-ta
John-particle Mary-particle hit-PAST
'John hit Mary'
(27) [IP [NP [PN John]] [I' [VP [NP [PN Mary]] [V hit]] [I PAST]]]

(28) Bill-wa John-ga Mary-o but-ta to omot-ta


Bill-particle John-particle Mary-particle hit-PAST that think-PAST
'Bill thought that John hit Mary'


(29) [IP [NP [PN Bill]] [I' [VP [CP [IP [NP [PN John]] [I' [VP [NP [PN Mary]] [V hit]] [I PAST]]] [C that]] [V think]] [I PAST]]]

It should be noted that English and Japanese are two extreme examples: head-initial for all constructions
(English), or head-final for all constructions (Japanese). Some languages display a mixed pattern, in which
some constructions (e.g. Verb Phrases) are head-initial, while others (e.g. Complementizer Phrases) are head-
final.
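The Head Parameter can be made concrete with a small sketch (the tree encoding and function are my own): the same head-complement structure is linearized head-first for English and head-last for Japanese-style order, while the subject NP, being a specifier rather than a complement, precedes I' in both languages.

```python
def linearize(node, head_initial=True):
    """A node is either a word (str) or a (head, complement) pair.
    The parameter decides whether the head precedes or follows
    its complement at every level of the tree."""
    if isinstance(node, str):
        return [node]
    head, comp = node
    h = linearize(head, head_initial)
    c = linearize(comp, head_initial)
    return h + c if head_initial else c + h

# I' = [I will [VP [V hit] [NP Mary]]]; the subject precedes I' in both orders.
i_bar = ("will", ("hit", "Mary"))
print("John " + " ".join(linearize(i_bar, head_initial=True)))
# → John will hit Mary  (English: head-initial)
print("John " + " ".join(linearize(i_bar, head_initial=False)))
# → John Mary hit will  (Japanese-style: head-final, cf. (26)-(27))
```

One binary parameter flips every head-complement pair at once, which is exactly why the English and Japanese trees in (27) and (29) are mirror images of each other below the subject.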


Appendix. Contents of Chapter 4 of Pinker's Language Instinct

4. How Language Works (Syntax)

(i) General properties of grammar

-Two 'tricks'
(a) Arbitrariness of the sign [Saussure] (75)
(b) Infinite use of finite means [Humboldt] (75)

-Syntax is a discrete combinatorial system (75)

-Syntax is autonomous from cognition (76)
(a) Meaningful sentences that are ungrammatical
(b) Grammatical sentences that are meaningless

(ii) The Markov Model (=the Finite State Model) (81)

(iii) Syntactic Trees (90)
(a) Basic components (90)
(b) Structural ambiguity (94)

(iv) Phrase Structure (97)
(a) Parts of speech (98)
(b) X' theory (99)
(c) The Head Parameter (103)
(d) Thematic roles (105)
(e) Case (107)
(f) IP (110)
(g) Function words (111)
(h) Deep structures (113)

