Professional Documents
Culture Documents
Optimality Theory: Phonology, Syntax, and Acquisition: January 2000
Optimality Theory: Phonology, Syntax, and Acquisition: January 2000
net/publication/208033160
CITATIONS READS
35 532
3 authors, including:
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Analogy and frequency in regular and irregular verbs in Germanic languages View project
All content following this page was uploaded by Jeroen van de Weijer on 04 December 2015.
Edited by
Joost Dekkers
Frank van der L,eeuw
Jeroen van de Weijer
OXFORD
UNIVERSITY PRESS
Introduction
' By way of acknowledgen~ent,we would like to thank Peter Ackema, Hans Broekhuis, Frank van
der Leeuw, Geraldine Legendre, Ad Neeleman, and Marc van Oostendorp for their valuable com-
ments on versions of this article as it approached finality. The usual disclaimers apply.
2 Optimality Theory
. .. - ~.
- ,: t-,.-?:~<fi:-s::.
unmarked option: codas are avoided if other tendencies do not interfere and ..- -
prevail. The assumption in OT that constraints are violable can be considered I I1STTcl3tS 215 -.~11L I . . -
- .
the formal correlate of linguistic tendencies, whereas constraint ranking expresses .-.
.-.: TdI122.-.-; 2TS : T - - . . -
. .. . .
the degree to which individual languages exhibit these tendencies-the higher the -,&'iLz-.: l: - -. Lt?. :r5: :- .-~ - :
rank of a constraint in the individual language, the more contexts there are in - .--.-
I:..: xorf :O T X ~ .
.
.
.
,.
which we can observe its effect. In other words, OT is a theory of markedness. - - th:.
1.-k * - - >
- - -- .----~-
LL..2 : . -
. ,
In the most general of terms, OT states that decisions are made on the basis ::::::.: r a x r . ; . ~ 1 1~ .- .- - -
of a parallel evaluation of available options with respect to a hierarchy of con- .:ZC-JaseS,
flicting requirements. As such it is applicable to domains other than grammar Ls: s1: illuitr::e T:-~ ~.
but also too general to provide us with a substantive theory of grammar. Only : ..-sfquect I?;T:I:: . .
after thorough linguistic inquiry can we determine the specific properties an OT 1:'s:. rhat rhs :c:; . ... .:: _
grammar should have. In the present volume a range of scholars study the theo- 1:; :hat T\-e _12~.?11 i . _ ..
retical questions that arise in this line of inquiry. 1?:s.: -1-D \\-itk y e : : ::- - -
.- .,. , . are nl~-?-- - - - - - -
This introductory article consists of a phonology, a syntax, and an acquisition -. . . .
section. In each section we briefly introduce the field in general as well as the -.
c
f r o - - ..
.. ~-
-L.. - -
~
issues addressed in the corresponding sections of the volume. First, however, we -.-Lr.- - outpu: stru::.::i E. . -
introduce some of the basic notions to be found in an OT grammar.
entire sentences. Hence it might be desirable to structure the input to a greater ::sht. For each ic;:: - - .
or lesser extent (see Section C). :2T~~est number of - . _ - ~ : -
. .
.::ghest-ranked corxrr L~- - I
(1) Input dzd-+Candidate set PI--+
Optimal
i candidate ?P~iolations than
I!: r: :- . -.
3:: an exclamatior z:-. .
A candidate set contains output structures. These structures are possible analyses x:aluation, and ars r>:r; - - - --
of the input (e.g. words in word phonology, or sentences in syntax). According jates, which will be 5- I _ . -, ~
to the principle of inclusiveness, GEN produces all those analyses of the input dates A and D rio!z+ :I -. ~
that 'are admitted by very general considerations of structural well-formedness' nleans that the seio::
(McCarthy and Prince 1993), which could include universal properties of, for rinue. When we m-o-.-fi1 - -
instance, syllable and phrase structure. rhese two candidate5 1 - . - -
The evaluator (EVAL)evaluates candidate sets with respect to particular rank- Jnce. Hence A is t;; r --
i n g ~of the constraint inventory Con. It is often assumed that Con is universal ?assing that the tota: 1..-I
(see, however, Boersma, this volume, and Ellison, this volume). Its members are :TVO violations and B :r -
simple and conflicting statements about the form of the output (well-formedness On the assumption ryi- - .- .
constraints) or the relation between the output and the input (faithfulness or six possible rankinpi : f . - -
correspondence constraints). Because of the conflict between constraints, all instance, the rankins 1
conceivable linguistic structures will violate at least some of the constraints. from L. In L', the e ~ - i _ r . -
Boersma, Dekkers and van de Weijer: Introduction 3
-
-7-. fsziits do not interfere and However, constraint violation per se does not lead to ungrammaticality, since
~ ~
--e :-iolable can be considered constraints are violable and strictly ranked; those structures that minimally vio-
- :- :i: : -.nstraint ranking expresses late rankings are optimal, and by definition grammatical. An output structure
~sndencies-the higher the
- -~1 i 5 minimally violates a ranking if all alternative structures that have an equal or
-.
. .. . ..ore contexts there are in
.--:
~
better score on the lowest-ranked constraint score worse on the ranking domi-
. . 1s a theory of markedness. nating this constraint. Parametrization is brought about by differences in con-
:c ::i.-ns are made on the basis straint ranking, and rankings can be taken to define grammars of individual
- - r:izxt to a hierarchy of con- languages.
:r n i i n s other than grammar Let us illustrate this with an abstract example (concrete examples are given in
. . .. . . ..
-. . . r :neorv of grammar. Only subsequent sections). Let us suppose that Con consist of CON,, CON,, and
- - - : 1s: ->ecific properties an OT CON?,that the language L is defined by the ranking CON,>> CON, >> CON^,
--.is : f scholars study the theo- and that we have to evaluate a candidate set which contains the output struc-
. ~
tures A-D with respect to the input I. The violations each of these structures
.. .
1 <:-;tax, and an acquisition
-- incurs are givenin tableau (2). In OT tableaus the top row gives the constraint
. . .
. : z:.; m general as well as the
-: ranking from left to right. In subsequent rows constraint violations are given for
:I: I .-;'::me. First, however, we each output structure. Each asterisk (or 'star') represents one violation.
.- 1 .:I r : OT grammar.
. .- -
-. - . For every possible
..- _.> 1.
-
~ 1).
.
- - . . .
- -_ =
Inputs are in principle
if:.
. -.
:---: . 7.x-ord phonology. How-
I
:
- . . ::.sx combinatorial tasks, the
- . -.. :
: - - .. . s-::cent. In syntax, for in-
r ::- r i :,.-ords suffice to evaluate It is easiest to determine optimal candidates by reading tableaus from left to
- - -:_:s the input to a greater right. For each column we have to determine which candidate(s) incur(s) the
lowest number of violations. In ( 2 ) , B is the only candidate that violates the
highest-ranked constraint CON,,even though candidate B has a smaller number
- 1 :::ma1 candidate of violations than the other candidates. This violation is fatal, which is indicated
-
by an exclamation mark (cells to the right of fatal violations are irrelevant for
- - - - --:s are possible analyses evaluation, and are therefore shaded). This leaves us with three remaining candi-
-Z In syntax). According
,-- dates, which will be evaluated with respect to the next constraint, CON,. Candi-
- - -.s analyses of the input dates A and D violate this constraint only once, whereas C does so twice. This
--,;rural well-formedness' means that the second violation of CON,by C is fatal, and only A and D con-
-- - er sal properties of, for tinue. When we move on to CON^, this constraint turns out to decide between
these two candidates. D incurs two violations of CON^, whereas A violates it only
I - --:ypect to particular rank- once. Hence A is the optimal candidate and therefore grammatical. Note in
-
sd that Con is universal passing that the total number of violations is irrelevant. The fact that A incurs
- - - olume). Its members are two violations and B only one has no influence on the evaluation.
- = 2utput (well formedness On the assumption that rankings define grammars of individual languages, all
_ -2;. input (faithfulness or six possible rankings of our three constraints represent distinct grammars. For
- - - >etween constraints, all instance, the ranking CON, >> CON,>> CON^ defines the language L', distinct
. _- .$me of the constraints. from L. In L', the evaluation of the candidate set {A, B, C, D} with respect to I
4 Optimality Theory
B. Optimality-Theoretic Phonology
Optimality Theory was first developed with respect to phonology. There are
good reasons why the phonological domain should have been very susceptible
to an Optimality-theoretic treatment. Phonology is a rich hunting-ground for
potential conflict. There is, for instance, the conflict between ease of articulation
and the necessity of perceptual distinctness, which plays a role in most assimila-
tion processes. Second, there is a potential conflict between phonological regu-
larity (e.g. in terms of syllable structure) and morphological needs. Finally, the
different demands of segmental and suprasegmental phonology may conflict.
In this section we examine some cases of constraint conflicts in phonol-
ogy to illustrate these different sources, and to establish why constraint-based
approaches are preferable to rule-based approaches.
-- _.optimal because it is the This constraint forbids two (and, by implication, three or more) onset conso-
nants. Candidate forms which violate this constraint will be weeded out in
- 1- s\t-ranked constraint.
favour of candidates which have, for instance, an epenthetic vowel between the
consonants, or in which one of the consonants has been deleted; other con-
straints (e.g. a constraint forbidding deletion from underlying forms, which
would prevent the latter type of scenario) will take care of further selection.
Let us examine a case in point to illustrate the interaction of this constraint
with other ones. In Sinhalese, loanwords from Dutch with obstruent + liquid are
broken up by epenthesis (Sannasgala 1976; van de Weijer 1996; cf also Jacobs and
Gussenhoven [this volume] and Lacharite and Paradis [this volume] for the
treatment of loanword phonology in Optimality Theory and other frameworks):
(5) Dutch Sinhalese
- -: :T: phonology. There are
TS plan palana 'plan'
- - -
...;k v e been very susceptible
procuratie peraka1a:si-ya 'procuration'
. . :-i r rich hunting-ground for kraan kara:ma-ya 'tap' (i.e. cock or fauset)
f r r :ti;reen ease of articulation vrouw porova 'queen' (in cards)
.. .. ::r.-s. a role in most assimila-
- ~ .- . .
. Sinhalese obviously experiences the effects of *COMPLEXONSET. In addition,
. .
- -~ r .:sT\~eenphonological regu- this language requires open syllables, and forbids resolution of the consonant
- - .
--
. ..8_l.~gical
--
~
. -.
_. _ - - - :sh why constraint-based
. -
loanword as unaltered as possible. The latter constraint could be regarded as a
. ...
..
--
general Faithfulness or Correspondence Constraint (McCarthy and Prince 1995),
forbidding deletion and epenthesis in general.
(6) MAX-10 : Preserve lexical segments2
- long-standing problem
D E P - I 0 (V) : DO not insert a vowel
--- -a the phenomenon that
i - - s bame kinds of output In Sinhalese, the *COMPLEXONSET
constraint dominates the D E P - I 0 con-
_ -. - - have a ban on conso- straint:
- - _ ? beveral different rules
- -.I-.
or resulting from mor-
. ,ap such clusters by varl-
-.e-ion). Other conspiracies
?,-ablished prlnclple, such
- - metrlfied (Kagel 1997).
- _ - _onsplracies, but does not Although the third candidate deviates from the input form, and therefore vio-
_ -eared towards an optimal lates DEP-10, this violation is not fatal because the candidate respects the two
.
- mant clusters In the pho- higher-ranked constraints, *COMPLEXONSET and I\;~Ax-10.
- --?tures such regularltles by Interestingly, the data from Sinhalese are complicated by the behaviour of
' Or, technically, 'every segment in the input appears in the output', for MAX-10, and 'every
vowel in the output has a correspondent in the input', for D E P - I 0 (V). The formulations in (6) only
serve expository purposes.
6 Optirnality Theory
loanwords with initial s plus consonant clusters. These are presented in (8):
(8) Dutch Sinhalese
spatie ispa:su-va 'space'
stoep sto:ppu - irto:ppu-va 'footpath'
strijkijzer stirikka-ya - istrikka-ya 'iron' (of the household variety)
srhop -
sko:ppa isko:ppa-ya 'spade'
These data show that initial [s] plus stop clusters are either not broken up or
receive a prothetic, instead of an epenthetic, vowel. This indicates that the syl-
labic cohesion between [s] and stop is stronger than the cohesion between
obstruents and liquids, a tendency which is observed in language after language
(Broselow 1991, van de Weijer 1996). This may be formulated as, for instance,
a constraint forbidding epenthesis between a syllable-initial [s] and a consonant,
DEP-IO(VI s-C):
(9) DEP-IO(Vi s-C) : Do not insert a vowel between [s] and a consonant
Note that this constraint is itself dominated by a constraint of the type MAX-
IO(V), which forbids deletion of lexical vowels. Thus, like DEP-IO(V) (see
above), this constraint must compare input and output: it says that every seg-
ment in the output must have a correspondent in the input. The constraint is un-
violated in Sinhalese, although it may be violated in other languages (see below).
This constraint ordering yields the attested form with prothetic iil (and with,
presumably, resyllabification of s and k into different syllables) as optimal. Note,
however, in (8) that the candidate 1sko:ppa-yal also occurs, apparently in free
variation with the prothetic form, although it violates the *COMPLEXONSET con-
straint. The problem of variability is a precarious one in Optimality Theory (see
Hayes [this volume], Boersma [this volume], and a number of contributions to
Hinskens et al. 1997, for instance). One way of formalizing it is by leaving certain
constraints unranked, or by assuming different constraint rankings for different
speech registers. Variability therefore touches on the heart of Optimality Theory,
since it bears on the nature of constraint ranking within a language or on differ-
ent constraint rankings within the same language. We will not address this here.3
Note that another solution would be to assume that s plus stop clusters are single segments, so
that they do not violate the * C ~ M P L E X ~ Nconstraint.
SET This would account for the fact that forms
like 1sko:ppa-yal also occur, but it does not account for the fact that 1isko:ppa-yal is also possible.
It is important, however, to point out that the representation of segmental properties is crucial to
a correct interpretation of constraints: see also Smith (this volume).
Boersma, Dekkers and van de Weijer: Introduction 7
- z.: are presented in (8): Given the hypothesis that constraints are universal, the tendencies they express
are also expected to play a role in other languages and other processes. Given
the hypothesis that constraint rankings are language-specific, it is expected that
in other languages other tendencies are predominant. This is shown by similar
- - : rausehold variety)
loanword phonology data from Saramaccan, a creole language spoken in Suri-
nam (Smith, p.c.):
- -
. sither not broken up or
1:s (11) Dutch Saramaccan
--
:. _:is indicates that the syl- schroef sukulufu 'screw'
.. - - - - - schout sikutu 'bailiff'
- the cohesion between
.-
- z - :t . ianguage after language
: schopp(en) sikopu 'spade(s)'
- -~ :: i: r7:ulated as, for instance, portret pctilati 'portrait'
- r -?-..-.:rial [s] and a consonant, melk meliki 'milk'
verstaan fusutan 'understand'
- -- :Z: 5 and a consonant In this language all clusters, regardless of their content, are broken up. In
Optimality Theory terms, the constraints DEP-IO(V / s-C) and *COMPLEX
-.-raint of the type MAX-
-
- ONSETmust be reversed in this language. In addition, the language obeys
-- hke DEP-IO(V) (see
-A.
strictly a constraint demanding open syllables and a constraint barring /r/ (which
_ _-:- ~t says that every seg-
is realised as [l]).The constraints demanding these latter properties of the out-
- . - 7 ~t The constraint is un-
- - rr languages (see below). put are not shown in the tableau below.
analyses by Prince and Smolensky (1993),McCarthy and Prince (1993a, b), et seq.:
A 1 1 ;r:-:-i-;Ser of contributions to
. _ _ : i r g it is by leaving certain (13) um + aral + urn-aral 'teach'
.- - . - . *
. . :..:.-.;rankings for different
-
urn + sulat + s-urn-alat 'write'
- - 5 I-. t zrr of Optimality Theory,
- .
- um + gradwet -+ gr-urn-adwet 'graduate' (French 1988)
- .- - ;:::1- a language or on differ-
~
. .
s -:.-21not address this here.3
-
In McCarthy and Prince ( ~ g g o )such
, data are analysed using 'prosodic cir-
..
cumscription' (see also McCarthy [this volume] for discussion). A prosodically
- - - . :::.~jters are single segments, so circumscribed unit is divided from the base, after which the infix is prefixed to
-.; ricount for the fact that fornls
- -
-
.
..
-.~.
:jko:ppa-yal is also possible.
.:,~..eiltal properties is crucial to
the remainder of the stem. The prosodically circumscribed unit is then attached
again to prefix+base, giving rise to surface infixation. McCarthy and Prince
, - .
-- =
. (igg3b) note that in this case the prosodically circumscribed unit must be an
. .~..
--z ..,.>..: - - -
. -
. . - ..
.
.
...... . - ~ : . . .
--.-
- -. ,-.
--
.
~ -
-
.- .- > - - , - - . .
.-- . --
-.-
-L>-- . - ...=
LC.. ....
-
The first two candidates have closed syllables due to pre-/infixation of -urn-.
The last two candidates have no violations for this constraint (induced by the
prefix), and the candidate in which the prefix is as leftmost as possible wins out.
The difference between a rule-based approach and a constraint-based approach
is that constraints such as those in (14) are well motivated and have a broad
scope of application, while the mechanism of prosodic circumscription has to
be specified for every separate affix, often cannot be independently motivated,
and makes predictions-as we saw above-that are not borne out.
r-
- _ -&ar in a mora framework a principled account of 'Non-Derived Environment Blocking' effects, descrip-
- ,nit that formally corres- tively the mirror image of the cyclic ones. Both accounts hinge on the calcula-
., not possible (according tion of morphologically derived forms via surface-to-surface comparison rather
- - -- - s lust a single consonant than from underlying representations (see also Burzio 1994, 1996, 1998). Kager
-- r ndwet). Instead, Prince (this volume) deals partly with the same kinds of phenomena, and provides
a:,b, analyse Tagalog infix- evidence in favour of a correspondence-based account over a derivationallcyclic
- -
- 7 ogical, requlrlng that all theory on the basis of stress and affixation in Dutch. Certain stress-governed
- - - -1at prefixes are located at blocking effects in affixation, Kager argues, cannot even be captured in Lexical
- -
?rn ,McCarthy and Prlnce Phonology while they follow naturally from the interaction between phonological
and morphological constraints.
As observed above, both the contribution by Jacobs and Gussenhoven and
that by Lacharite and Paradis deal with loanword phonology. Jacobs and
Gussenhoven (this volume) show the superiority of a constraint-based account
over a rule-based one because the former avoids having to state the constraints
to which adapted loanwords have to conform twice: once as a morpheme struc-
- - . mcltvet over other possi- ture constraint of the language, and secondly as a rule which is applied to loan-
- - ':Carthy and Prlnce 1993b: words (see also Section D.13). Lacharite and Paradis (this volume) focus on
. .-: are grven in the tableau: another aspect: they argue that the concept of 'rules' has not disappeared in OT,
as is sometimes claimed, but rather that rules have received another status: in
particular, rules have been relegated to the Generator G E N (see above). Thus,
they argue, the way in which candidates are generated should receive greater
attention, and the fact that rules are tucked away in G E N should not be used as
a criterion to distinguish between OT and other constraint- or rule-based ap-
proaches (cf also Broekhuis and Dekkers [this volume] for a similar stance in
Optimality syntax: see also Section C.2.1 below).
McCarthy (this volume) focuses on prosodic faithfulness constraints, and
-c? ple-Imfixation
__: of -urn-. shows how they account for a number of phenomena that were former-
- 3~straint(induced by the ly analysed under the heading of Prosodic Circumscription (see above; see
_ :- rost as posslble wlns out. McCarthy and Prince 1990). While conserving the insights of the circum-
- - - csnhtraint-based approach scriptional analysis, McCarthy shows that a correspondence account relies on
--,-ti\ ated and have a broad more general, and therefore more attractive, constraints and principles. Finally,
- --
. 21, c~rcumscr~ption has to Smith (this volume) ties together Optimality Theory and Dependency Phonol-
- - , ndependently motivated, ogy, and aims to show that in this connection the issue of segmental structure
- . --z -st borne out. is far from trivial.
. . - . - .- .
.--: phonology section of the
C. Optimality-Theoretic Syntax
The recent birth of OT syntax can be attributed to two developments in the
. . - -- . .
-h.axre had a long and con-
:--. study of grammar. On the one hand, Optimality Theory has proved to be suc-
- - ::::19~6, Kiparsky 1982, Kaisse cessful in the domain of phonology. Given this success, and given that there is
-- I - r:~.: -:~lume, Burzio argues that no reason whatsoever to assume that Optimality Theory is applicable only to
.-- . . - .r = 1r'r only useful in accounting phonology, investigating the possibilities of applying this theory to syntax can
I - - .:::::;;re, but can also provide be considered a logical next step.
10 Optimality Theory
At the same time, the emergence of OT syntax seems to fit into the general
tendency in syntax to blame the ungrammaticality of a sentence on the existence
of a better alternative. This view on grammaticality is also found in Chomsky's
Minimalist Program (Chomsky 1995), although Chomsky takes optimization to
play a much more modest role than OT syntacticians do. Whereas Chomsky's
only criterion for evaluation is derivational cost4, the inventory of violable con-
straints assumed in OT syntax is richer. As a result, the OT constraints interact
and conflict with each other. This interaction is exploited by the assumption that
constraints are ranked, and that parametrization can be reduced to differences
in ranking between languages. Chomsky's economy conditions, on the other
hand, have no such direct parametrizing effect. In the Minimalist Program, the
locus of parametrization is the lexicon.
This section consists of two parts. In the first part, we give a brief introduc-
tion to OT syntax for those readers who are not familiar with it, by illustrating
that the application of OT to syntax enables us to successfully account for a
diverse set of syntactic phenomena. The second part should be seen as an intro-
duction to the syntax section of this volume. We discuss several issues that are
addressed there (and elsewhere) which pertain to the way in which OT could or
should be applied to syntax.
Such as (i) Minimize the number of movement steps (Fewest Steps); (ii) Minimize the length
of movement paths (Minimal Link Condition); (iii) Minimize overt movement (Procrastinate).
Boersma, Dekkers and van de Weijer: Introduction 11
: ::E;USS several issues that are come too rich. C&L explicitly state that filters 'will have to bear the burden of
. . - -
?:I :$-a\, in which OT could or accounting for [. . .] all contextual dependencies that cannot be formulated in
the narrower framework of core grammar' (p. 432).
An important point Pesetsky makes is that the question of whether the
complementizer is pronounced or not depends on whether it appears in clause-
- - _L lng their theory of filters, initial position. This is illustrated by the French examples given in (17): the
- _ i t least some outcome 1s complementizer que is only pronounced in the absence of an element in SpecCP.
(p. 465). In OT syntax,
--? (17) a. l'homme que je connais
-:irx, in which at least one the man that I know
- in thls light, ~t 1s not so b. l'homme avec qui j'ai dans6
- -
sli-equlpped for analyslng the man with whom I have danced
- 7,enunciation patterns in Pesetsky explains this phenomenon by using the three violable constraints given
-111s out that OT does well
in (18). RECmakes a distinction between deletion of relative pronouns embed-
- - L ~ l c a wing
l' (3a, 4a, 5a, 6a) ded in a PP and deletion of those that are not. Only in the former case is a
meaningful element, the preposition, deleted, which leads to a violation of REC.
The second constraint, LE(CP), is an alignment constraint (see McCarthy and
Prince igg3a) which is violated whenever the first pronounced element in a CP
is not a head from the extended projection of the verb. And finally, TELprohib-
its the pronunciation of function words, such as the complementizer que.
(18) a. Recoverability (REc): A syntactic unit with semantic content must be
pronounced5
--1
: C&L (would) analyse in
j We have simplified Pesetsky's definition of KEC. The original dsfinition, given in (i), seems to
- - - C.~.I),and stylistic rules be too coinplex to qualify as a possible OT constraint, since it expresses matters that can be captured
in terms of interaction of simpler constraints (cf BroeMluis and Delikers [this volume]).
- : :.- Steps); (ii) Minimize the length (i) Recoverability (REc): A syntactic unit with semalltic content luust be pronounced unless it has
-: :r: movement (Procrastinate). a sufficiently local antecedent.
12 Optimality Theory
-.
b. Left Edge CP (LE(CP)): The first pronounced word in CP is a function .-:lled COMP Filter. 5.-cr - .
word related to the main verb of the CP. -7the tableaus abo7:t. - - .:
c. Telegraph (TEL):DO not pronounce function words. Liqnment is successf:.
This ranking gives the evaluations in the tableaus (20) and (21). In tableau
(zo), deletion of the relative pronoun does not violate Recoverability because
it is not embedded in a PP. Furthermore, LE(CP) prefers elements in SpecCP .inother advantage I f 1 . -
rsneral and in pricci;.: - -
'
to be deleted because this ensures that the complementizer occurs in CP-initial
position. As a result, candidate (c) comes out as optimal, and T E Lis not rele- --:an\- more left-peri;:er r
vant. :?setsky shows that ; 1 ;-~L '
:I :he interaction if ::1 - -
. b d finally, we p-c<. r- - - -
/ b. l'homme qui $ ~ t eje connais 1 1 Ler us illustrate this f. 1 lf
-
In tableau (21), a different pictures emerges. When avec qui is deleted, the f r i this descriptioc f . 1-I : -
preposition is not recoverable, which excludes candidates (c) and (d). The sec- -::ions are availa5:~.- : .- 1 -
ond constraint, LE(CP), does not decide between the two remaining candidates r:::straints are in I r.: r - . -- ,
(a) and (b), because it is violated by both. In this case, T E Lplays a crucial role - --
because it prefers candidate (b) over candidate (a), since only the former in- 11 a. French: Lr ,_:
volves the deletion of the function word que. Note that this is an example of the Je pense ' <:I .
-
emergence of the unmarked (see McCarthy and Prince 1994). 'I think thz: :r Y -- -
- .
if we substi:.:::
.:.:?I-~ :1 7 ~ ~
One of the advantages of an OT analysis along these lines is that it is compatible -.I.??U
. -
1201.
rurthermore. ::i- - :_:
with a fully endocentric approach to phrase structure. In the 1970s it was as- I: irI~ulatedb!- C k L Ir - -
sumed that the left periphery of the (relative) clause had the structure in (zza), constrain:~. 1..
- .--
:~:.-3nt -: . ~
whereas currently the endocentric structure in (22b) is standardly associated with -~ I .sls
- ~< and -: an.-: .-- : - -
- .
.- -.-Laus ( l o ) anc
the complementizer domain. Now, if the complementizer is taken to project, as 1: -- 7
- r ;definition of tlrz:. - . -
in (b), there is no syntactic node that dominates both the relative pronoun and r-;:~ss do surfacs 15%:. I - . ~.
the complementizer but not the rest of the clause. Consequently, C&L's Doubly ~ ~ ..>.:aint
. ~ . 7
inyento?: :
.;:
:
Boersma, Dekkers and van de Weijer: Introduction 13
- L; cord in CP 1s a function Filled COMP Filter, given in ( q a ) , cannot be invoked to exclude candidate (a)
in the tableaus above.6 As Pesetsky shows, an alternative analysis in terms of
- -- i- x\ords. alignment is successful.
--- -r? ranked as in (19): 22) a. [, [ ,,,,,
Wh-phrase complementizer] . . .]
b. ,,[ [ Wh-phrase] [, [, complementlzer] . . .]I
- _ . 10) and (21). In tableau
ire Recoverabllity because
-r&s elements in SpecCP Another advantage of Pesetsky's analysis is that it refers to a small number of
- - :- sltizer occurs in CP-initial general and in principle universal constraints which turn out to be relevant for
- --
ylal, and TELis not rele- many more left-periphery phenomena than the one presented here. For example,
Pesetsky shows that C&L's For-to Filter, formulated as in ( q b ) , can be reduced
to the interaction of the three constraints given in (18), which falsifies C&L's
remark that the two filters in (23) 'are [. . .] minimal, in the sense that there
seems to be no simpler way to state the facts' (p. 450).'
And finally, we predict that alternative rankings (co-)define other languages.
Let us illustrate this for LE(CP) and T E Lwith clausal complements of epistemic
verbs. In the French (zqa), the complementizer is obligatorily pronounced,
which is consistent with our assumption that LE(CP) outranks T E Lin this lan-
guage. A language that has the opposite ranking lacks complementizers. Chinese
- -. .
.
.-.?n avec qui is deleted, the
~
. .. fits this description (Rint Sybesma, p.c.), as is shown in (24c). In English, both
. .: :::f:Sates (c) and (d). The sec- options are available, which indicates that LE(CP) and T E L are in a tie. If two
.~ .. - - .
. . . : n ~ remaining
o candidates constraints are in a tie, both rankings are valid.
. .. . - . >
.. . T E L plays a crucial role
. -
. .-: I . iince only the former in- (24) a. French: LE(CP) >> TEL
-+ :?: this is an example of the Je pense "(que) le PrCsident de la Republique a dkguise la vCritC.
. - -
- :r:s 1994). 'I think that the French president covered up the truth.'
One could reformulate (23a) as in (i). However, (i) is far from surface-true. Cross-linguistically
as well as within individual languages, it seems to hold only for a subset of heads and their specifiers.
In other words, (i) would have to be interpreted as a violable constraint.
(i) Do not pronounce both a head and its specifier.
Further, if we substitute the hierarchical (i) for the linear LE(CP), we would lose part of Pesetsky's
empirical coverage, such as his account for the facts formerly explained by the For-to filter in (23b).
Note, finally, that neither (23a) nor (i) explains the ungrammaticaiity of candidates (b) and (d) in
_ is that it is compatible
-:nes tableau (20).
Furthermore, it is rvell known that neither of the two filters in (23) is universal, which had to
- J I ~In
the 1970s ~t was as- be stipulated by C&L. In an OT approach, this can be attributed to a difference in ranking of the
--: ?ad the structure In (zza), relevant constraints. Note that this is a principled advantage of OT which Pesetsky does not exploit
--- ..
standardly associated with in his analysis; any ranking of the three constraints in (18) bans doubly filled COIMP configurations.
In tableaus (20) and (211, the (a) candidates are harmonically bound by the (b) candidates (see below
- :>:lzer 1s taken to project, as
- - -7 the relative pronoun and
for a definition of harmonic boundedness). The fact that there are languages in which the (a) can-
didates do surface leads Broekhuis and Dekkers (this volume) to the conclusion that Pesetsky's
lonsequently, C&L's Doubly constraint inventory needs to be revised.
14 Optimality Theory
..
b. English: LE(CP) <> T E L ..-
- -:,:ever, right-branain; r -, ~
...--
'I think (that) the President covered up the truth.' . ..i of view. And el-sr .r . -
-_ . - to .the
.ltendency ;1: - . -..
C. 1.2 Information structure
-~z.-.redge of the claust. :. r -
and (26) (Italian 'Free' Inversion), word order optionality is only apparent. Al- ... .. .- 2jb) is syntactiir::-.- - - - - I -
though C&L do not discuss these cases, they would probably analyse them in -7 or the PP that mc- t - _ -
terms of stylistic rules (cf Rochemont 1978). The application of these rules is not 3 - - - athat this order re-~t---
:
optional if their interpretational effects are taken into account. 1~1.
(25) Zubizarreta (1994):
a. Max put all the boxes of home furnishings in his car.
b. Max put in his car all the boxes of home furnishings.
-5: as in (25), the diff?:;:.:
.-
(26) Grimshaw and Samek-Lodovici (1995): :.z?rences. In (26b), t h t i _ I :. -.
a. Gianni ha gridato. r:r:rpretation is asso;izrs : -
Gianni has screamed. .:l~ek-Lodovici(199 j r:; ..: - .-
b. Ha gridato Gianni. r r :zsed constituents re: 1 . r: -
Zubizarreta (1994) argues that movement applies not only for morphological but !- ::i,s. On the other h ~ r F r
- ...
. -ition, which they atlr:: _ - ::
also for interpretational reasons.' The main difference between these two types
of movement is that only the former is self-serving, in the sense that it applies r 5 :onsidered a violab:, : . 7 -
to satisfy the needs of the moved constituent. The latter, on the other hand, is 1: 1 S U B J E C T(Grimik:- _ !.
.:
part of a syntactic conspiracy that targets a specific representation. The highest A-s~?s;.f:- : -
Let us start with the examples in (2j), where (25a) displays the unmarked
: I Italian, A L I G N - f o c ~1 .i:-. -
order. In (zjb), all the boxes of home furnishings is interpreted as narrowly fo- -
fige of the sentence.:I E:;~
cused. Such pragmatic information is marked syntactically by the inverted order
r 1;us, since subjects a ~ t;- - - :
of the NP and the PP. This inversion can be analysed in two ways. Either the Ir not.
NP is moved rightward, or the PP is moved leftward. In either case, a syntax
based on the hypothesis that movement applies in order to check features -31 a. *Has screams, - ~ : -
(Chomsky 199j) will not be able to map this pragmatic information onto syntac- b. John has scr::-~: 1
--
tic structure in a straightforward manner. _.2r difference betn-esr 1:; -
Let us suppose that focused elements are endowed with a [+foc] feature, which ::nkings of these two ;: r .- - . 1
must move to the checking domain of a Focus Phrase. Now, if the NP is moved, - -.
it should end up in a right-branching specifier of this FocP to check its feature. 30)
-.
-
-=
a. Ha gridato G.;r-
Zubizarreta actually argues that movement only indirectly applies for interpretational reasons;
b. Gianni,,,, hr ;-_
she associates focus with primary stress, and stress assignment with depth of embedding. By suggest-
-
ing that information structure is directly responsible for the marked word order in (25b), and that For a more detailed dii:--
-
: n structure and syntax 1
~
linear order, rather than depth of embedding, is relevant here, we are simplifying matters consider- ..
ably. This is done for expository reasons only. ;r:mshaw and Samek-Iodc-~.: . -
Boersma, Dekkers and van de Weijer: Introduction 15
-
. -
-I: :.-.:a account.
STAY.
--
-
~
.
.-
- -
.
r:
...;.
- -.
In his car.
5rnishings.
(27) a. ALIGN-focus:Align focused constituents with the right edge of CP.
b. STAY:Traces are disallowed.
Just as in (25), the difference in word order in (26) gives rise to interpretational
differences. In (26b), the subject is focused, whereas in (26a) it is not. Again, focus
interpretation is associated with the right edge of the sentence. Grimshaw and
Samek-Lodovici (1995) argue that there are two forces at work. On the one hand,
focused constituents tend to appear in clause-final position, to satisfy ALIGN-
L- . rr .: : :1': only for morphological but focus. On the other hand, subjects have a preference for the preverbal (SpecIP)
.
.
. .
- :. ;..;e between these two types position, which they attribute to the constraint SUBJECT, given in (28), which can
.
. .. ...=. in the sense that it applies be considered a violable counterpart of the Extended Projection Principle.
:, .atter, on the other hand, is
-
- - - ~
or not.
- - : :x-,.-rrJ.In either case, a syntax
- -
r r -.:i ic order to check features
-
(29) a. *Has screamed John.
-
L
. . ..- .
-
.. - - , -
,, information onto syntac-
;~
b. John has screamed.
.. .
The difference between English and Italian is captured by assuming opposite
-
~ - -
-
~.
-.
:I :..-ith a [+foc] feature, which rankings of these two constraints, as is illustrated in tableaus (30) and (31).~
. -
-..--
- ... ;--. Tow, if the NP is moved,
- .
- .
..
-
I 1s:
- ,
FocP to check its feature. (30) ALIGN-focus 1 SUBJECT
a. Ha gridato Gianni,+iocl I *
. - .
sc::::: for interpretational reasons;
- -
1 b. Gianni,,,,, ha gridato 1 *! 1
.- - tepth of embedding. By suggest-
:I
: :-:::sJ word order in (25b), and that For a more detailed discussion of the possibilities of accounting for the interaction of informa-
- -
.- -
5 :rs simplifying matters consider- tion structure and syntax in OT, the reader is referred to Costa (1996, 1997), Dekkers (1997),
Grimshaw and Samek-Lodovici (1996), and Samek-Lodouici (1996).
16 Optimality Theory
; - ~ ? E c , nor 03-1: - - . -r
C.1.3 Head and operator movement - - ..
~- - 7 ~ jh
-.-*A-~ dec1ara;i~~~j
1 :: .
Although OT syntax enables us to explain interface phenomena such as the ones f~ d ~ - S P E CT.h: ;: :: -
presented above in a natural way, it should by no means be considered to be -
I -:.rion(s) of eltnsr - I - : r
only a theory of 'surface syntax'. Since the first syntactic OT studies (Legendre
et al. 1993; early versions of Grimshaw 1997), it has become clear that OT can :: 11ati-ix intcrrc;::: I
.-
provide us with a general theory of syntax-in fact, the broad empirical scope -
developed minimalist syntax. This section will be concluded with an OT analysis b. [e [John ~ri:: r: : - -
of some aspects of head and operator movement, traditionally localised in the c. [what e [Jar --
-
---
transformational component of syntax. .
-
-
- d. [what will [irk: - - r
A remarkable property of Germanic subject-auxiliary inversion is the fact that
e. [ will [John : - ...
it is contingent on constituent preposing. This is illustrated in (32) and (33) for
English. In the interrogative in (32), the Wh-element is parsed in its left-periph- -
-,
- ,; Matrix declarari---si
era1 scope position and the auxiliary precedes the subject. In declaratives, on the .,
~
--
-. -
- _ - - -1odules of grammar do
onological constraints, The evaluation of English interrogatives is given in tableau (36))in which candi-
date (d) is evaluated as optimal, since it is the only one that violates neither
OP-SPEC,nor OB-HD.In tableau (37) it is illustrated that the presence of CP
in English declaratives does not serve any purpose; all candidates vacuously sat-
-- :
.. ~
..: ...~.,omena such as the ones isfy OP-SPEC.The projection of CP will therefore always entail a(dditiona1)
-
. . -
-
..;dns be considered to be
~ ~
. . .~
. _ I >,-come clear that OT can
(36) Matrix interrogatives OP-SPEC OB-HD STAY
-
- . A
~.
. -
- .-
31s broad empirical scope * -
*!
I x r u r e of the also recently
- *! * *
r :1;:;ded with an OT analysis
. .r_r -::jitionally localised in the
. *! **
**%
of wh-phrases which conflicts with OP-SPEC.Legendre et al. (1998) argue for r =rl+r,ilquestions 3s; r ::: . -
the constraint *ADJOIN,prohibiting adjunction in SpecCP. If this constraint - - .
outranks OP-SPEC(and STAY),raising of a single Wh-phrase will be preferred - . -. Derivation
over multiple fronting." On the opposite ranking, the reverse will be true. At - : l i ? q h OT is gen,r:-. . - -
--
first sight, the absence of multiple interrogatives in a language like Italian is un- r grammar, mosr .-'. - -- -
expected, since a priori one would expect that each language evaluates one of the 1-95),for instanc?. r12..-: : -
possible multiple Wh-structures as optimal. We will return to this point below. 1 -S~ructure,S-Strui;ur: _ r _I:
tlrations, are evalu:;,; I- -~
, and
-,:- Legendre e: .?:. .- - -
" For expository purposes, we have adapted Legendre et al.'s analysis to Grimshaw's (cf. Legendre See Legendre et 01. 1, -.
et al. 1995, 1998). See also Ackema and Neeleman (1998). . .:;;on.
-
Boersma, Dekkers and van de Weijer: Introduction 19
_-?ID, I to C movement does s;ntax with important typological power." However, OT does not provide us
means to prevent a viola-
- - =I - :ith a substantive theory of grammar right away, nor does it automatically lead
---:~tlzer. A Canadian French ~3 a unified view on the set-up of syntax. This makes it possible for Pesetsky to
-
-
- - The question of whether xgue for a much more modest role of OT in syntax than for instance Grimshaw
- _ - --: rank of a constraint like toes. Pesetsky's main argument against a fully-fledged OT syntax is based on the
sistence of so-called absolute ungrammaticality or inefability.
Syntactic objects are not always ungrammatical because of the existence of a
jetter alternative. Sometimes ungrammaticality seems absolute, sometimes, a
sentence is ungrammatical without there being a grammatical alternative avail-
able in the language. The absence of multiple interrogatives in Italian noted
dbove is a clear example. This leads Pesetsky to argue that OT syntax deals only
~ r i t hmatters of deletion and pronunciation, whereas some other module of
grammar determines which structure-building and movement operations should
-I -:i ;o multiple interrogatives. apply. By assumption, this module consists of inviolable conditions, which
- .... Z ~ S S remain in situ, as pre- ~vouldaccount for ineffability. However, it is not clear that Pesetsky's line of
.:--..: p . ~ . )Among
. Wh-raising thought holds water. Several scholars have shown that ineffability can be cap-
- .
.. -:I Ir some languages, all Wh- tured in a fully-fledged OT syntax in terms of underparsing. It is not obvious
-
- :. Sustrated for Bulgarian in
.i that an explanation for ineffability in terms of OT is inferior to an approach that
.
... r s:. 11171-phrase is raised, as
-- -~
?-
I refers to inviolable conditions. It seems that this issue just attracts more atten-
- Znz:iik !;9c). And finally, there tion in OT, and not if that is necessarily a more fundamental problem for this
- . ..
. _ . I . :,. Rizzi (1982: 51) reports
-.~ theory than for others. In the section, we review the possibilities of underparsing
as well as the closely related issue of the role that semantic interpretation should
play in candidate evaluation.13
Because the syntactic studies written within OT are so far small in number,
it is not surprising that disagreement has arisen concerning the exact properties
OT syntax should have; in part this is due to the influence of independently
existing (non-OT) syntactic theories. We illustrate this in Section (2.2.1 below by
presenting different views on the role that derivation could play in OT syntax.
It is not unlikely that these differences will disappear once the body of work
- 1 .~rsfers multiple fronting grows, the influence of other theories is channelled, and OT-specific answers to
- -=-Crs et al. (1998) argue for general questions become widely accepted.
- - specCP. If t h ~ sconstraint
- !-phrase will be preferred C.2.1 Derivation
- - 7 s reverse will be true At Although OT is generally associated with a representationalist view on (modules
_ anguage like Italian is un- of) grammar, most OT syntacticians do leave room for derivation. Miiller
-
_rzuage evaluates one of the (1998), for instance, assumes a multi-level syntactic model which consists of
_ -turn to this point below. D-Structure, S-Structure, and LF, and argues that derivations, rather than repre-
sentations, are evaluated. Grimshaw (1997) adopts a model of syntax where the
levels of S-Structure and Logical Form co-exist (a heritage of the Principles and
Parameters framework). Contrary to Miiller, she does not explain her choice for
this multi-level model of syntax. Although Grimshaw alludes to LF, she does not
make crucial use of this level of representation. LF seems relevant only in the
determination of the candidate set, because she assumes 'that competing candi-
dates have non-distinct logical forms' (p. 376). However, Grimshaw also pro-
poses that argument-predicate information is given in the input, and leaves the
possibility open that the input 'should include a specification of LF-related
properties, such as scope' (ibid.), which might be considered a redundancy in
her system.
Another, related, heritage of the Principles and Parameters framework is the
hypothesis that syntactic representations are transformationally derived, and that
constraints-e.g., STAY given in (27)-may refer to traces of the transforma-
tions. However, Bresnan (this volume) argues that OT can more naturally be
applied to syntax if a parallel correspondence theory is assumed, rather than a
transformational syntax. More particularly, Bresnan (this volume) develops an
OT syntax based on LFG, in which the optimal correspondence mapping is
determined between two parallel structures: a feature structure (f-structure)
which constitutes the input, and a categorial structure (c-structure) which serves
as the output. F-structure is comparable to the structured input assumed by
Legendre et al. (1995,1998)-although the former is more precise than the latter:
see also Section C.3 below-while c-structure is expressed in terms of extended
X-bar theory, more or less along the lines of Grimshaw (1991). As Bresnan (this
volume) shows, it is possible to translate Grimshaw's transformational theory
into a non-transformational one based on imperfect correspondence mapping
without losing the essence of Grimshaw's analyses. Those constraints that seem
to presuppose a transformational syntax can relatively easily be recast in terms
of extended X-bar theory and correspondence.
As we have already noted above, Pesetsky (1997, 1998) takes a direction that
is entirely different from Bresnan's. He argues that OT syntax should not be
concerned with building syntactic structure but rather with the way this struc-
ture is spelled out. Broekhuis and Dekkers (this volume) follow Pesetsky to a
large extent. They argue that if one accepts (the essence of) the Minimalist Pro-
gram, it is desirable from an empirical point of view to consider Chomsky's
computational system to be the Generator of OT syntax. This enables them to
(re)consider the division of labour between the Generator and the Evaluator. For
instance, whereas Pesetsky assumes that the Evaluator determines the optimal
pronunciation of one single LF representation, Broekhuis and Dekkers (this
volume) present arguments in favour of the idea that pronunciation patterns of
several LF representations may be compared with each other, so long as they
.. . -
receive the same semantic interpretation. They reach this conclusion by arguing . i a r e n c ~ s
a;: :I :-~ - r:
- -Z .-
. ~rztations.1 : r:-.
that, among other things, it has empirical advantages to assume that economy . 1-
-
. .. -,tures play a .r:; r : . ..
of movement is not part of the computational system but rather an OT con- .
Boersma, Dekkers and van de Weijer: Introduction 21
_, - this conclusion by argulng two sentences are coinposed of the same words, they receive distinct semantic
- +sb to assume that economy interpretations. In this light, it seems plausible that the semantics of output
.-sm but rather an OT con- structures play a crucial role in the evaluation procedure in one way or another.
22 O~timalityTheory
And indeed, there exists a broad consensus that semantic interpretation has an
influence on the evaluation procedure, although the formal implementation of
this idea is subject to debate. Roughly, two viewpoints can be distinguished.
Some authors (e.g. Ackema and Neeleman [this volume], Broekhuis and
Dekkers [this volume]) argue for a semantic identity requirement on Candidate
Sets (henceforth the Semantic Identity Approach). According to this require-
ment, only structures that have an identical (or non-distinct) semantic interpre-
tation may enter in competition with each other. The two structures in (42) are
thus in distinct candidate sets and will not be in competition, simply because
they have different meanings. Others propose that the input is structured
(henceforth the Structured Input Approach). According to Legendre et al. (1995:
blo), 'Inputs consist of [. . .] skeletal structures containing predicate-argument
structure and scope information'. Along the same lines, Bresnan (this volume)
identifies the input with an LFG-based feature structure (see Section C.2.1).
Let us illustrate this with our example of multiple interrogatives. Legendre et
al. assume that the input for multiple interrogatives takes the form in (43). Some
outputs that are faithful to the input are given in (44). These structures corre-
spond to the Chinese, Bulgarian, and English examples in (39) which we have
repeated in (45).
: .
. - ..
-. :-mantic interpretation has an if the input is structured, this structure must be generated in one way or
. " - ;.-. :1?formal implementation of another. Under the Semantic Identity Approach, the task of establishing scope
. ~ . + ~ 2 i n t scan be distinguished. and predicate-argument relations is attributed to the syntax system itself. This
. . .- .
.
- :..:is. volume], Broekhuis and means we do not need an additional generative device to structure the input,
-
.
.
:_r:i?- requirement on Candidate but rather an algorithm that prevents the comparison of semantically distinct
- - r :- - :k . .iccording to this require- structures.
. . - r :_-distinct) semantic interpre- Second, within the Structured Input Approach, unfaithful analyses of the
-- input structure may compete with faithful ones. Legendre et al. argue that
r - - - :r ne two structures in (42) are
-
-
- r = .-.1 competition, simply because ~vh-elementsmay be realised as non-interrogative quantifiers, in which case to
- - .
r ~ - + rlat the input is structured whom would surface as to someone: Hence, the unfaithful analysis (47) of the
.. -
-
- . - .i:e
.-
-.
-
r r r rding to Legendre et al. (1995:
_ r : ::ntaining
predicate-argument
lines, Bresnan (this volume)
input structure (43) competes with its faithful counterparts in (44), and may
even win.
147) Unfaithful output:
- I . .-.r: i r r ~ i t u r e(see Section C.2.1).
~
-
-- --
On the assumption that underparsing has an effect on the interpretation of out-
1 - : - s:cznples in (39) which we have put structures, candidates conveying different meanings may block each other.
Because faithful parses are clearly unmarked, Legendre et al. propose that in
structures in which who surfaces as a non-interrogative quantifier, the constraint
PARSE- Wk is violated. Now, in Wh-raising languages (OP-SPEC>> STAY)in
which *ADJOINhas a high and PARSE-Miha low rank (more precisely: if
"ADJOINand OP-SPECoutrank PARSE-Whand STAY),a simple interrogative
will surface whenever a multiple interrogative is targeted. These languages there-
fore lack multiple interrogatives. We established in section C.1.3 that Italian is
such a language. The Structured Input Approach thus provides us with an expla-
nation for language-specific ineffability which exploits the possibilities of partial
underparsing ('sometimes it is best to say something else').
Within the Semantic Identity Approach, this type of ineffability cannot be
explained in terms of partial underparsing, since competition between for
instance simple and multiple interrogatives is not allowed because they are se-
mantically distinct. However, Ackema and Neeleman (this volume) present an
-
_ - fen
-:- the input of (42a) and alternative explanation in terms of total underparsing, which they illustrate with
--:- .ation and scope information a typological study of passives. In their view, all candidate sets contain a fully
- -7c underparsed structure because this so-called null parse is consistent with any
semantic interpretation. This enables them to claim that if well-formedness out-
weighs faithfulness, the null parse, instead of a party underparsed structure, may
come out as the optimal candidate ('sometimes it is best to say nothing'). In this
- - atlonal variants-Legendre et way they explain why languages lack (certain) passives.
- - ,e~trictionon candidate sets. Ackeina and Neeleman (this volume) mention two drawbacks of the partial
- -pproaches have the same net underparsing approach that do not arise in their analysis. First, they argue that
- - - Tltrary to the Semant~cIden- the partial underparsing approach does not acknowledge the autonomy of syntax
..
:-<,upposes a syntax for inputs: because it presupposes interaction of semantic and syntactic constraints. Second,
they remark that it is not obvious what underparsed structures should look
like. For instance, in a language that lacks multiple interrogatives (they take
Irish as an example), the second Wh-phrase in (48a) cannot be fully deleted, as
in (48b), because it is not the case in Irish that otherwise obligatory arguments
may remain silent in interrogatives. Nor is it true for Irish that (48a) is allowed
to surface if what is interpreted as a non-interrogative. Finally, there is the op-
tion Legendre et al. actually plumped for, according to which the second
Wh-phrase is realized as an existentially quantified expression, as in (48c).
Ackema and Neeleman argue that this option is not unproblematic because there
seems to be no phonological relation between for instance what and something
in Irish.
(48) a. Who destroyed what?
b. Who destroyed?
c. Who destroyed something?
Be this as it may, the total underparsing approach seems to have a more fun-
damental flaw, as noted by Legendre et al. (1998). Let us return to our analysis
of multiple interrogatives. If *ADJOINand O P - S P E Coutrank PARSE- Wh and
STAY,this does not necessarily lead to the election of the null parse as the opti-
mal candidate. All things being equal, whenever some other parse constraint (for
instance Ackema and Neeleman's PARSE-Passive,which requires that passive
morphology be parsed) outranks *ADJ O I N and OP-SPEC,the null parse will not
come out as optimal so long as it violates this parse constraint. As Ackema and
Neeleman (this volume) point out themselves, this leads to the presumably
false prediction that there could exist languages which lack multiple
interrogatives in active, but not in passive, sentences. To repair this, they pro-
pose a cyclic evaluation procedure which ensures that parse constraints do not
interact with each other.
The terms absolute ungrarnrnaticality and ineffability are thus used to refer not
only to the language-particular absence of specific constructions, but also to the
universal ungramn~aticalityof specific syntactic structures. There are at least two
potential sources for universal ineffability. First, it can be due to harmonic
boundedness (see Prince and Smolensky 1993). A candidate C, is harmonically
bound by some candidate C,iff C,incurs all constraint violations C2 does and
C,incurs more constraint violations than C2 does. In that case, C, will lose on
any ranking. If the constraint inventory is universal (see Boersma [this volume]
and Ellison [this volume] for claims to the contrary), C, will be ungrammatical
in all languages. In most of the tableaus given above concrete examples of har-
mollic boundedness can be found. For instance, in tableau (37), repeated here
as (49) for convenience, the candidates (b) and (c) are harmonically bound by
candidate (a), because they both incur the violation of STAY that candidate (a)
does (which is given in parentheses) as well as some additional violation. Hence,
they are predicted never to surface (cf footnote 7).
Boersma, Dekkers and van de Weijer: Introduction 25
---r--.ed structures should look (49) Matrix declaratives 1 OP-SPEC I OB-HD 1 STAY 1
J
- - - 212 lnterrogatlves (they take
a. [John will [t read books]] (*)
-\L cannot be fully deleted, as
- - - r ~ \ise obligatory arguments b. [ e [John will [t read books]]] *! (*>
_ = 3. Irlsh that (48a) is allowed c. [ will [John t [t read books]]] 1 1 (*)*! 1
- l i e Fmally, there 1s the op-
Second, syntactic structures may be universally ungrammatical because the Gen-
- -Jmg to which the second
- srator simply cannot produce them (see Broekhuis and Dekkers [this volume]).
-?a expression, as In (48c).
For instance, if X-bar theory is taken to be a defining property of the Generator
- -nproblematlc because there
(see Grimshaw 1997), any phrase marker that is not in accordance with the
- btance what and somethzng
X'-schema will be an illicit output. Given the standard assumption that the Gen-
erator is universal, this structure is predicted not to appear in any language.
_ ~-_ _. ,
.- .-
_..-.
-
. i
..;
__;
..
. -- .
-
. -
- 111K:s.
~:-:
(51) a. whether the language is ATR- or RTR-dominant; - -
- - 5 -2;: :.:oT- :,:-: - -.
b. whether the language honours HIIATR; - -..-.I
-
~ ~? I : ~ . r2: I:- - - -
. .
-. -
There is one ambiguity to fix, however. In an RTR-dominant language that . 2-23 :kc? :121; -:~ -
-~-.:
. , r-.=-.
:--- - 7 -
honours HIIATR, an underlying /€ti/ has the surface candidates [€ti], which .:
- -2<25 T\-:tfl - ~
..--.
~.
violates H A R M O N Y[ ;E ~ I ]which
, violates H I / A T R ; and [eti], which violates . :>.% ~ncl>r'*---
. -- .- -.
L L A .
..
language, there remain two candidates: [eti], which would mildly suggest ATR .-.? ?:a\-id?d b::. :.-I :
~
--
. :. This s t r a t s 9 :1 . . :
-
dominance, and [€ti],which is a surface violation of harmony and increases the
~. ...... .
:. :.:Ec?r from ?.>
vocabulary. NOW,Pulleyblank and Turkel (1995, 1996) interpret the dominance
parameter in such a way that it influences the surface vocabulary-i.e. /&ti/sur- --, I,? T , . ~ U C ~I--
.
-
- - - . -
.,.{;:~
faces as [€ti]. In other words, RTR dominance is expressed as the ranking 00
MAXRTR >> H A R M O N Y>> MAXATR (this is a slight simplification of :.:i~n and T\-&*: :-1 - . - ~
-
face vocabularies of VtV words; the labels A through G follow Pulleyblank and -:.: I L _ i to their :A:;;-; - -
Turkel's (1995, 1996) typology. For instance, language B is a Wolof-like language: -:::?t grammars.
--
it honours the HIIATR grounding condition and is RTR-dominant. The gram- >;cording to rki _
-_ -..-
. as her currsr.: :-: ; - - :
mars A1 and A2 generate the same vocabulary: if there are no alternations, they , --
describe the same language, which we will call A. Language C is the intersection : t ~ : i i i 2 it with a diz:rt:- ; -
of A and B; D is the intersection of A and E; G is a proper subset of E; and F is 1x5 algorithm is r.:-;-;
. .
a proper subset of B. Pulleyblank and Turkel cite instances of real languages that :irammar F, ana r:r r- I .- .
more or less match each of the seven languages in the typology (52). r t e ] , with a pro?:-:--- .
. .
The first three rows of the surface vocabulary in (52) enumerate the harmonic n:ut conflicts ~rir?.-:1; . - - ~
words. The surface forms in the first row are always possible; forms with [I] are r::olds [I]. The ic.r<:r-.r _ - -
7 .
Boersma, Dekkers and van de Weijer: Introduction 27
-
- - , R \ I O N Y Such
z
. a language
1-1 whether an underlying
(52) Grammar
HIIATR parameter
A i A 2 B
- - +
C
+
D
- -
E F
+
G
+
- constraint language, ATR LOIRTR parameter - - - - + + + +
- - -7 ITR value in the Input dominance parameter ATR RTR R T R ATR RTR ATR RTR ATR
:>J> MAXRTR;we also see
_ ,ahe
iti, ite, eti, ete, ete, Eta, ate, ata + + + + + + + +
_ _ ..need low vowel /a/, or the I ~ II,~ EE, ~ Irta,
, at1 + + + +
3lank (1994) ascribe these ita, at;, eta, ate, ata + + + +
;iondttzons LOIRTR 'if a it^, ti, Ite, etI, Ita, atI, ete, ete,
-
- LI; el is high, then it is ad- eta, ate, ata, ata
:-_ndlng conditions directly
_ _? RTR, it has the ranking
normally surface as [a]; eta, ate + +
- A T R >> MAXRTR,SO that ita, ati + + + +
. ..
- -. -- - -.- .>:I-ingthree binary param-
ruled out in languages with high H I / A T R ,and forms with [a] in languages with
high Lo/RTR.
The last four rows of (52) enumerate the thinkable disharmonic words. Most
of them do not occur in any of the languages. However, the disharmonic forms
[ite] and [ita] are licensed in RTR-dominant languages with high HI/ATR (B
--
- - -~ : A i-dominant language that and F), and the disharmonic forms [ate] and [ati] are licensed in ATR-dominant
I candidates
- --.-;:- [eti], which languages with high LO/RTR(E and G); in the RTR-dominant language D, by
- - ~.
. 2nd [eti], which violates
- 7 ,
contrast, underlying /ate/ would give [ate].
- - - ::::!nates MAXRTRin such a Note that P&P learning algorithins can be used for learning constraint gram-
--. ;? :could mildly suggest ATR mars provided that the constraint rankings are translated into parameter settings
..
-. : r ;'~rrnony and increases the first. This strategy is discussed in D.2 to D.7. OT-specific algorithms follow
- - : 1- : : interpret the dominance thereafter from D.8.
.-f:;s ~ocabulary-i.e. /€ti/ sur-
. . .: .i sxpressed as the ranking
-. D.2 T h e Triggering Learning Algorithm
- - . .i .I slight simplification of Gibson and Wexler (1994) proposed their Triggering Learning Algorithm (TLA)
in order to account for the acquisition of three syntactic parameters (subject
-1their seven possible sur- location, object-verb order, verb-second). Pulleyblank and Turkel (1995) applied
-
_ - - G follow Pulleyblank and the TLA to their three-parameter grammar space of tongue-root-harmony con-
- - _ 1s a Wolof-hke language:
9;
_: straint grammars.
_ - RTR-dominant. The gram- According to the TLA, the learner can, at any moment during acquisition,
- - z:? ale no alternations, they have as her current hypothesis any of the eight grammars A1 to G, and she may
_ - zuage C is the intersection replace it with a different grammar only if incoming data conflicts with it-i.e.
-.-oper subset of E; and F is the algorithm is error-driven. Suppose, for instance, that the learner is currently
-
.-inces of real languages that in grammar F, and that the language environment is A. A possible datum, now,
. - - - 2 t\ pology (52). is [ ~ t e ]with
, a probability of 1/18, if all possible words are equally likely. This
- ;: enumerate the harmonic input conflicts with her current assumption of high-vowel grounding, which
.possible; forms with [I] are forbids [I]. The conflicting input is a trigger the learner will try a different
28 Optimality Theory
grammar which she chooses randomly from the set of grammars adjacent to F sarning scheme, the .c_-- .
(i.e. the algorithm is conservative), and will only accept that new candidate gram- ould generate the S L - - ..
mar if it does allow [rte] (i.e. the algorithm is greedy). Pulleyblank and TL
According to the binary parameters in (51), grammar F must be considered .
rqually likely as the _- - _
- -
adjacent to B, D, and G because a transition to any of these would involve .lows that in the en1 -
changing a single parameter setting. We can represent the adjacency of all eight 3ility of being replac;; - -.
- -
3 3 Unlearnable Lair;
According to Gold 11;~- . .-
ruaranteed to find the :r r I: - -.
language A is thus le;:. r : -
ir? not.
Suppose that a leaizsr : . -
:Tar B. Because langu:;: I
For instance, any shortest path from E to B involves three parameter flips.
tnvironment C can ?I-?: I - ~ .~
Now suppose that the learner's environment is language A. Confronted with
:~xt of the unrestricti~vs:~-: r
the A-datum [ I ~ Ewhile
] her hypothesis is F, she will try the adjacent B, D, or
knguage B. The com~:;:: - - . -
G, all with probability 113. She will only change her grammar if the new gram-
-:.
mar does license [rte] i.e. if the grammar that she tries happens to be AI, A2, D,
or E. Combining these, we conclude that she will only make the plunge if the j j) Language C
new grammar is D. Since the data [etr], [ ~ t a ] ,at^], and [ I ~ I ]trigger the same
grainmar change, the probability that the learner's grammar after the next da-
tum (randomly taken from the vocabulary of A) is changed to D, is 5xi/i8xi/3
= 5/54. The complete transition graph in an A environment is:
- .:t of grammars adjacent to F learning scheme, the learner can never replace a grammar with a grammar that
-+.spt that new candidate gram- would generate the same language.
- %). Pulleyblank and Turkel propose that 'any of the set of possible languages is
- -mmar F must be considered equally likely as the starting hypothesis of the TLA'. The transition graph (54)
- -
any of these would involve shows that in the environment of language A, all grammars have a finite proba-
- - -.-nt the adjacency of all eight bility of being replaced with the grammar A after one or two steps, and that
grammar A will never be abandoned once the learner has reached it. All learners
will therefore eventually settle down in language A if the environment is A.
Thus, language A will always be learned correctly.
Starting from a randomly chosen grammar, 112 of the learners will end up in an
A grammar, 114 in B, and only 114in C.
The learning graph for language G shows three sinks or local maxima (B,E ,
and G):
(56) Language G
. .
-- : :he probability, for instance,
- ~.
.. .,
:
~
If the distribution of initial states is uniform, the probabilities that the learner ,ipart from the superst: ;- -
--
ends up in each of the eight grammars is: 7/24 for B, 8/24 for E, and 9/24 for G. .3 2 current hypothesis cf 11: :.
The learning of the correct grammar for language B ('Wolof') is not guaran- :st out because all adiai;:: jr - 1
teed either; the learner may end up in the absorbing cycle E-AI: r:im and greediness to~i:?:-
I f the higher mountair -1: r: . r
rlimbing down-being i T :- L
r .- .
rill disallow the latter. ic>
r:nt hypothesis is B. A;; r ::- ;
I:?? grammar if the inci.n:.- 1 I
:>at she cannot go to G i:r r-
r:ers. Greediness, hou72~-5: ::
1 because these do not r L -
This graph shows that a learner can go back and forth between two adjacent 3.5 Improving Convei.gr.::- - .
-
grammars: if she is in grammar C (so that an underlying /itat would have to reduce the superset ~ rr 7: -~
surface as [ita]), the B-datum [ita] may cause her either to reject her hypothesis Yar. In syntactic theory :
of ATR dominance (and go to B), or to hypothesize that harmony is dominated 1987) was confronted b;: :I r - 7
by LO/RTR(and go to G); conversely, if she is in G (which disallows [a]) and :on-pro-drop languag?:. T - .
is confronted with a B-datum that contains [a], she will have to cancel the :on-pro-drop hypothesis- I ~ r : .
LOIRTRgrounding condition by going to C. -.,.-henthey start to spe;,. .-. -
The learning graphs for the three remaining languages D, F, and E, can be 3 'it rains' provides po:.: - :
obtained from ( j j ) , (56), and (57), respectively, by replacing all instances of 'B' -7arameter.)
with 'E' and vice versa, and doing the same for the pairs C-D and F-G. Gibson and Wexler I; :- T -
ima, the original language would die out in a few generations. Because no such G,she will always end n; -:- - :
instabilities have been reported for the instances of the languages B, C, D, E, F, see that with a probabii::- -
and G identified by Pulleyblank and Turkel (1995), we will assume that the non- xp in the correct g r a I r r l r I
convergence is a problem in the case at hand. clom initial state, but is r I r - ~
The fact that language A is learnable in the limit is largely because it is the least sis F, she will end up ic ;-1- -.
restrictive of the seven languages: there is always positive data around to wipe out language.
any grounding constraints. In graph (56), however, we see that if the learner's We can now compar? 1x7 r .
hypothesis is E,she will never have any reason to go to the target grammar G, in the cases of the uni?::: r- -
: :robabilities that the learner Apart from the superset problem, there is a more generic source of concern.
- - - 3 8/24 for E, and 9/24 for G. The current hypothesis of the learner may be a grammar from which she cannot
_ __:-
B ('Wolof') is not guaran- get out because all adjacent grammars are worse. This is a problem of conserva-
:cycle E-A1: tism and greediness together: if you are at a hilltop and want to reach the top
of the higher mountain instead, you can choose between jumping the valley or
climbing down-being conservative will disallo~lthe former, and being greedy
will disallow the latter. M7e can see this in graph (57). Suppose the learner's cur-
rent hypothesis is B. According to table (52), she will only be urged to change
the grammar if the incoming G-datum is [eta] or [ate]. Conservatism tells us
that she cannot go to G directly because that would require flipping two param-
eters. Greediness, however, tells us not to go to the adjacent languages A2, C, or
F because these do not allow [ate].
- - %rth between two adlacent D.5 Improving Convergence: T h e Most Restrictive Initial State
-,:rlvlng /ita/ would hdve to To reduce the superset problem, we can try to start in the most restrictive gram-
-- :r i r r to reject her hypothesis mar. In syntactic theory, this subset principle (Berwick 1985, Wexler and Manzini
tAat harmony is dominated 1987) was confronted by the problem that pro-drop languages are supersets of
- 2 ~ t h l c hdisallows [a]) and non-pro-drop languages. This means that a child would have to start with a
,nz will have to cancel the non-pro-drop hypothesis, although most children do not produce any pronouns
when they start to speak. (A possible line of defence is that the empty subject
- _-guages D, F, and E, can be in 'it rains' provides positive evidence for the negative setting of the pro-drop
-23lacing all instances of 'B' parameter.)
- ;pairs C-D and F-G. Gibson and TVexler (1994) applied the idea of a maximally restrictive initial
state by requiring their learner of constituent order to start with a negative set-
ting of the V2 parameter. In the tongue-root-harmony case, we could start with
. - :.. - :s s:\-ironments.
~. This does not undominated grounding conditions. Such a restrictive grammar will be changed
I :, 1
:
; Roberts 1993). It is possi- only if positive evidence is found of the occurrence of ungrounded feature com-
::1 local maxima do actually binations. We would thus start in language F or G (both with a probability of
- -
-
_:
:: r r i :he next generation would 1/2), with positive settings of both grounding parameters HIIATR and LOIRTR.
. - - - - t , :sarnable without local max- We can easily see from graphs (56) and (57) that if the learner starts in F or
- - . :
:- izrrations. Because no such G, she will always end up in the target language. From graph (55), however, we
. .
- ::-:: 11s languages B, C, D, E, F, see that with a probability of 112 of starting in G, the chance that she will end
.- ; - .-2 !<ill assume that the non- up in the correct grammar C is also 112. This is an improvement over the ran-
. . dom initial state, but is not our ideal, because if the learner starts with hypothe-
. -
2rgely because it is the least
- - . - .i sis F, she will end up in grammar B, which again represents a superset of the C
-
:-.:;-s data around to wipe out language.
: :T. :ii' see that if the learner's We can now compare the probabilities of convergence for all seven languages
- - I r: to the target grammar G, in the cases of the uniform and restricted initial distributions:
. . . .
- :-:.ss:s E-the vocabulary of G
- - .r :5 lrarning scheme, the non- (58) / initial
grammar
. -
. . ;.:nnot be signalled by the target language A B C D E F G
T -.I ..rii?n, this will lead to a high randomly chosen from A I ~2 B c D E F G 1 8/11 114 114 8/11 318 318
:::strictive grammar. randomly chosen from FG 1 1 I / 1 2 1 1 1
32 Optimality Theory
..
Learnability is not perfect yet. There is no initial parameter setting that makes 5 :rnguage. In Y.::? : :
all languages A to G learnable.14The source of this problem lies in Pulleyblank .- rrrmeters hat-? rc:. - - - .
and Turkel's 'packaging' of the tongue-root constraints: ATR dominance stands . ,-. ;hat the tor:-s;: ;.~
.- .
for MAXATR>> "HARMONY >> MAXRTR.In Smolensky's (1996bj proposal, :-.1s reached ~ 5 - y- -
structural constraints (like HARMONY) should initially dominate all faithfulness k:;h ma>-be l?-i.ri- ::. -
-
- ...---..., ~ s space
r this ii I- -
constraints (like MAXATRand MAXRTR), SO that the languages F and G are not ~
.-.
flicting. For example, consider the underlying forms lital, I~tal,lital, and l~tal,
-.
the probab:.:r-: . : T . . - ~
Smolensky 1993), should all be possible inputs to a universal set of innate con- - .. . -':a1 tox:, a po:-.x I 111 .- ::
straints. If the grounding constraints are honoured, the output in the initial state -
.-. ..., an esponsr.:i::
- -*- t: r : . - .
~
should be [ita], perhaps violating some MAXconstraints. But [ita] violates the -..---.-oh-ed.
harmony constraint. A system of binary feature values cannot meet Smolensky's ?\? must concl::t: .:1 .~-
requirement of initially undominated structural constraints. -..-; - acquisition t i ~ : ? . f - 7 . -
- ,
1 -sides, Ber\rick . - -
D.6 Improving Convergence: Relaxing Conservatism or Greediness r>:!- \rere fortuna:? :r - - r -
To tackle local maxima, the validity of both the conservatism constraint and the :r';jtitueat-ordsriz~ ;: - .
-
greediness co~lstraintwas challenged by Berwick and Niyogi (1996): 'If the :rc7blem it doss r : :r 1
learner drops either or both of the Greediness and Single Value Constraints, the :! 57)).
resulting algorithm [. . .] converges faster (in the sense of requiring fewer exam- -_ -. 1t.rzproving CL.::-. :.
ples) than the TLA' (p. 607). They go on to show that this statement is true for #.
--
Gibson and Wexler's (1994) example of a three-parameter constituent-ordering - 2 2 local-maxima 2
:::: -
parameter spaces. We can see this if we compute the expected number of exam- ,>ace. Apart from :::r . . - - -
.-
ples (data) the learner needs to arrive at the target grammar, in the two compet- I I 'mutations'), ~\-ki;:~ - ,
~ ~ o u be
l d equal to 2"'~. In Berwick and Niyogi's algorithm, which does not use I< and Y, the learner . - :
the conservatism and greediness conditions, the average number of triggers (con- ;iithout sacrificins t:r: :: . .:
flicting data) required for reaching the target grammar (from any non-target :he least fit candiirr+- . :: -
grammar) would be 2'7-1 For a three-parameter space this is 7, but for a more Pulleyblank and T-.r - r ~
-
- - ~
For Gibson and Wexler's algorithm, we can compute the convergence under recombinations (r.cl: r; .- I
the simplifying assumption that the parameters have independent influences on measure of fitness. :T :. -
has to maintain 3: r ~
: - - --<
correct ranking of 2 5 r: - ~
Note that the problem cannot be solved by an initial unsetting of the dominance parameter, sand data. Interestir;: I :-
along the lines of Gibson and TVeexler, who start with the specifier-head and complement-head learner correctly a;l---t. r - -
parameters unset. If the learner starts with a combined FG hypothesis and encounters the C-datum
[ita], she will flip the LOIRTR parameter and go to a combined BC hypothesis, from which she will As everyone a s h 1 - :1-
never get to the target grammar C. gence onto the $ 1 : ~: -- - -: -
Boersma, Dekkers and van de Weijer: Introduction 33
-
- --?ter setting that makes :he language. In such a case, the worst initial state is one in which all N, binary
- -.sm hes m Pulleyblank larameters have the wrong value. With every trigger, there is a probability of
-1TR dominance stands 1, that the correct parameter change will be chosen. Thus, the target grammar
:- AJ's (1996b) proposal, I\-illbe reached after at most N; triggers. For a three-parameter space this is 9,
--minate all faithfulness i,\,hichmay be worse than in Berwick and Niyogi's algorithm, but for a 30-pa-
- --g~agesF and G ale not rameter space this is only 900, to which Berwick and Niyogi's alternative consti-
tutes a drastic deterioration.
: nhtramt on top. How- With respect to the quantity of data needed to reach the target grammar, the
. --:I are inherently con- difference between the two algorithms is somewhat less dramatic because in the
'1ta1, Iital, and 11ta1, TLA the probability that a datum is a trigger decreases as the hypothesis ap-
- -ns base' (Prince and proaches the target grammar. Even if the quantity of data thus becomes propor-
-- --.a1 set of innate con- tional to N,', a polynomial dependence of the learning time will always outper-
-;ut in the initial state form an exponential dependence as soon as realistic degrees of freedom are
- - But [ita] violates the involved.
---~ot meet Smolensky's We must conclude that without the conservatism and greediness constraints
-
-+ L\ the acquisition time of realistic grammar spaces would be prohibitively large.
Besides, Berwick and Niyogi's alternative does not solve the superset problem
(they were fortunate in the lack of superset languages in their three-parameter
- -7 constraint and the constituent-ordering problem), so in the case of our tongue-root-harmony
- '\ T ogi (1996): 'If the problem it does not even converge (and it crashes on the absorbing cycle
- : -1ue Constraints, the of (57)).
-:quiring fewer exam-
- -
~Tatementis true for D.7 Improving Convergence: Genetic Algorithms
- r- Lonstituent-ordering The local-maxima problem is smaller if the learner is allowed to maintain multi-
ple hypotheses at the same time (Clark and Roberts 1993). For example, suppose
to hold for larger
::-- the learner arrives at two different local maxima X and Y in a 20-parameter
- = 1-23. number of exam- space. Apart from the usual parameter swappiilgs within the hypotheses X and
- - m the two compet- Y ('mutations'), which will not help her out of the trap, she will have the option
of creating a new hypothesis Z that copies, say, 11parameter settings from X and
:, ,bible
grammars N the remaining nine from Y (a 'recombination'). If this hypothesis is better than
- - hlch does not use
- 1) X and Y,the learner will have succeeded in getting out of a local maximum
- - - - l ~ e rof triggers (con- without sacrificing either conservativeness or greediness. After each generation,
-- --om any non-target the least fit candidates are removed from the pool of hypotheses.
- . ,; but for a more Pulleyblank and Turkel (this volume) propose an algorithm with OT-specific
mutations (swapping the rankings of adjacent constraints), without any
-A . convergence under
--: recombinations (not really genetic, therefore), and with an involved OT-specific
- -Yndent
: influences on measure of fitness. The learner of the tongue-root-harmony system of Yoruba
has to maintain 32 hypotheses at a time, and will usually converge upon the
correct ranking of the ten relevant constraints (in six strata) within a few thou-
- -::dominance parameter, sand data. Interestingly, Pulleyblank and Turkel show that in their example the
- .~ndcomplement-head
---:::
r: 2niounters the C-datum
learner correctly achieves any subset grammar in the majority of trials.
.
... c
.,. from which she will
- ..- .... As everyone acknowledges, the genetic approach does not guarantee conver-
gence onto the global maximum. Moreover, it places a large burden on the
34 Optimality Theory
learner, who has to maintain several, possibly very distinct, hypotheses. By con- :hat ensures that in r:-.: ::
trast, some OT-specific algorithms (Sections D.8 and D.11) are guaranteed to +an the previous ~7in:c: .-
error-driven variant of this algorithm, which is called Error-Driven Constraint ,-sample, the under1yir.g f :-
Demotion (EDCD). Like the TLA this algorithm is error-driven, conservative,
greedy, and oblivious, but unlike the TLA, it always converges upon the target
grammar: OT grammars are learnable with a number of triggers that relates to
the square of the number of constraints (which is as fast as the grammars with
independent parameters described in Section D.6), without ever getting stuck in
a local maximum.
As an example, we will show how an OT learner can go from hypothesis B
The learner now demcre: --
to the target grammar G-something impossible according to graph (57). In
~
- 2otheses. By con- that ensures that in the new grammar, the form [ate] will become better
- -
L.r guaranteed to than the previous winner [ate] (though not necessarily better than a third can-
didate):
_ - m a 1 maxima, and
- - -I - ?JI one conclusion:
-ound ranked con-
. : - -31s volume) devel- Now that harmony is ranked low, the new hypothesis is not a tongue-root-
_:.,rlbe here only the harmony grammar! The learner will now need evidence of harmony-for
- - - - --Drlven Constraint
- example, the underlying form /et&/,which surfaces as [ete] in language G:
L - AI en, conservatlve,
- >:;- upon the target
- -- ~gers that relates to
b _.;he grammars with
21 er getting stuck in
;? from hypothesis B
..- - -- -2 to graph (57). In The learner now demotes MAXRTR(the only high violated constraint in the
adult form) below H A R M O N (the
Y highest violated constraint in her own win-
_ -11 ATR >> MAXRTR
'11 LXATRbecause of
ner, so that this constraint becomes crucially violated in her new evaluation),
- - q because [a] is al-
giving the ranking in (62).
- 2 and the learner en- When comparing this learning algorithm with TLA, we can see that something
very different is going on: we need an underlying form. Moreover, an underlying
- 7 2thesls. Assuming that
let&/suddenly becomes [ete], while it would have become [ E ~ Ein] the original
- - home relevant candi-
hypothesis B.
The new grammar is already ATR-dominant (in fact, it is language C). The
datum [ate] is relevant again, ihough the learner now chooses a different incor-
rect winner from before:
- -
*
MA^
b' [ate]
[atel *!
_ which 1s shown by
,, w [atel * *
-
21 er, is [ate], which is
- sn learner wlll have to Since HARMONY and LO/RTRare in the same stratum, two candidates are both
-he violated constraints optimal in the learner's hypothesis. One possible interpretation of this (Anttila
- 5,- 1 iolated constraint in 1995) is that the learner can choose freely from [ate] and [ate]. If she chooses
rnmediately below that [ate], this will be different from the correct adult form [ate], so that H A R M O N Y
-:-noted below MAXATR, will be demoted below Lo/RTR,into the stratum of MAXRTR, as we see in (63).
: nearly) minlmal ~ h a n g e After that, the underlying form let€/ is relevant again:
36 Optimality Theory
learner's grammar, as they are in the target grammar G. But we need the datum (this volume) use of the:::5 :
[ate] again, because for the underlying form late/, the candidates [ate] and [at&] tion of prior knowledgs r r .
tie for optimality: volume) of an iterative st::;
underlying forms (see Sf;:- : ~
mars very well: the rankings would be reshuffled at each learning step. The
EDCD is not very robust against errors either: a single error may destroy the
-
- -- -
1 2erceptual p11onolo~I:: r :- -
:?r grammar from tht :.
-
-
--
grammar in such a way that it can cost on the order of N" learning steps :rticulatory gesture, thf 7 : : . -
(though usually much fewer; N is the number of constraints) to climb out of the _om above, where th?: :: : -.
wrong grammar. For instance, if the learner has reached the target grammar B :,arner but possibly ;:r .:;
(HIIATR>> MAXRTR>> H A R M O N >> Y MAXATR>> Lo/RTR), an erratic 'ase of communicati~rr 7 - :=.
[it11 datum, if taken as correct by the learner, will demote HI/ATR past ~onstraintsare const:;:::: : -r
MAXRTR,into the stratum of H A R M O N Y and
, it will require an additional se- ~onstraintslike 'a lo^\- - - :: ">
quence of e.g. lit11 + [iti], lital + [ita], let&/ + [ ~ t e ]and
, latal + [ata], to lrocess to put these 2:::I
arrive at the correct grammar again. larm.
The gradual learning algorithm by Boersma (1997, this volume) makes use of
a continuous ranking scale. When the learner's winner conflicts with the adult 3.13 Second-Languag; .--: -
form, all constraints violated in the adult form are demoted by a small step T\'hen confronted with I 1 7:- 5
along this scale, and all constraints violated in the learner's winner are promoted con of this second lans-1-;: i
by the same small step. In tableau (59), this would mean that H A R M O NisY jtraint system of her firs: 1:; L
slightly demoted and MAXATRis slightly promoted; if the form [ate] was ad- probably equal an adulr :-::~
ministered repeatedly, the two constraints would ultimately swap places. Differences between t h -:- 7
For a continuous scale to make sense, evaluation must be stochastic, so that language could then ariis 1 - - z
constraints can maintain a safety margin in their ranking difference. If the target ences in the production. --I -
grammar is a stochastically evaluating grammar, the gradual learner will always terminacy by showing :krr r
converge upon that grammar-i.e. she will learn to exhibit the same amount of were uttered, the constr:.:~ :
variation as she hears in her language environment. They hold a universal rsr::;- -
Although the gradual algorithm is much slower than EDCD, it is quite resis- input into a set of feat^::? . - -r
tant to errors in the data: the number of learning steps needed after an error is
typically one, and this single step may come at a later date. For instance, if the D.14 Acquisition of t i z ~Z:
learner has reached the target grammar B and takes an erratic [itr] datum at face In a language that disalli- . r
value, the constraints HIIATR and MAXRTRwill approach one another by a from the input /ka/ or 3-r::- --I
small step. This causes an increase in the probability that MAXRTRwill outrank [ka] for the first time, tkr :i -.:
HIIATR at some future evaluation. When that happens once, HIIATR will be as lkatl. According to 7:: 1
promoted and MAXRTRwill be demoted so that the original rankings are re- Smolensky 1993: 192), ths :!: - i
stored. morphological considera:- :
lates a FAITHconstrain:. - r: - =
0 . 1 2 First-Language Acquisition at all, and both pairs ~ r i ~ i -z =~
-
A strong theory of Universal Grammar maintains that all the constraints the the output, which is the ::I-:-
learner will ever need are available from the beginning (Tesar and Smolensky the inputs of .cvhicl~the s;:.:--1
1993: 1). Under such a theory it is found that structural constraints must these creates the most hzr-:-. -:
be initially high-ranked (Gnanadesikan 1995, Smolensky 1996b). However, a To implement this st::::;-
non-nativist viewpoint could identify this situation with the initial state in infinite number of inputs
the acquisition of speech production, when articulatory gestures have still to has to determine the win:.:;
be learned. of candidates, effectively i ;.--
Boersma (this volume) argues that the learner starts out with an empty a tableau of tableaus:
grammar without any constraints: as soon as the learner manages to categorize
Boersma, Dekkers and van de Weijer: Introduction 39
:. - : r I- :r:l learning step. The a perceptual phonological feature, the relevant faithfulness constraints will enter
: . :r~: z r o r may destroy the her grammar from the bottom; and as soon as the learner starts to master an
- . r 1;: 3f ,V learning steps articulatory gesture, the relevant articulatory constraints will enter her grammar
.
to climb out of the
.. ;-.... from above, where they have been waiting in a virtual pool irrelevant for the
. .
. . .r.- t :m?
target grammar B learner but possibly playing a role in what Ellison (this volume) would call
-.
- - - = LoIRTR), an erratic 'ease of communication between linguists'. Thus, Boersma argues that sets of
.- -
~
.
~
. ~ ,:mote
. HI/ATR past constraints are constructed from the data. These sets will contain unusual
- . ::ire an additional se- constraints like 'a low vowel should be ATR'; it is the task of the learning
: - ii . and iatal + [ata], to process to put these at the bottom of the hierarchy, where they can do no
harm.
- - . olume) makes use of
_ _ ~ i l l c t swith the adult D. 13 Second-Language Acquisition
--- oted by a small step When confronted with a different language, a speaker could take over the lexi-
- :-T \ .
Inner are promoted
_ - zin that H A R M O N Y is
con of this second language, and run the underlying forms through the con-
straint system of her first language. In the phonology, an underlying form would
- _ * i s form [ate] was ad- probably equal an adult form of the target language, as perceived by the learner.
- _ - t h swap places. Differences between the learner's output and the native output of the target
- be stochastic, so that language could then arise either by differences in the perception or by differ-
-- _ -:i~tierence. If the target ences in the production. Jacobs and Gussenhoven (this volume) tackle this inde-
- - -:~al learner wlll always terminacy by showing that if a learner perceives the target forms exactly as they
-- 7.t the same amount of were uttered, the constraint system can still produce the correct surface form.
They hold a universal perceptual parser responsible for categorizing the foreign
- EDCD, it is qulte resls- input into a set of feature values supplied by Universal Grammar.
L - -.x e d e d after an error is D. 14 Acquisition of the Lexicon
;rte For instance, d the
_ ---atlc [itr] datum at face In a language that disallows coda consonants, the surface form [ka] could come
-:-aach one another by a from the input lkal or from the input lkatl. When confronted with the output
- _ R outrank
- ~ I A X R Twill [ka] for the first time, the learner has to put this in his or her lexicon as lkal or
-----, once, HIIATR wlll be as lkatl. According to the principle of lexicoi~ optimization (Prince and
= -rlglnal rankngs are re- Smolensky 1993: 192), the learner will choose /ka/ (in the absence of dominating
morphological considerations) because the input-output pair lkatl + [ka] vio-
lates a F A I T Hconstraint, whereas /ka/ -3[ka] violates no faithfulness constraints
at all, and both pairs violate the same structural constraints (which evaluate only
- - it all the constraints the the output, which is the same in both cases). The learner's strategy is to find all
_ -2 (Tesar and Smolensky the inputs of which the optimal output form is [ka], and to determine which of
_;tural constraints must these creates the most harmonic input-output pair with [ka].
--\hr1gg6b). However, a To implement this strategy, the abstract learner has to generate a possibly
- i ~ t hthe illitla1 state m infinite number of inputs (richness of the base), and for each of these inputs, she
_ arv gestures have st111 to has to determine the winning output after generating a possibly infinite number
of candidates, effectively squaring GEN. This procedure can be represented in
. .;,rts out with an empty a tableau of tableaus:
:_--i?r manages to categorize
40 Optimality T h e o r y
seems to save Gnanadesikan's (1995) analysis of unfaithful child utterances with -(1996). 'Surface constrr~- - :
-'nough Hale and Reiss (1998) continue the discussion by noting that tableaus
~ k e(69) would always force a German listener to recognize the utterance [ra:t]
2s 1ra:tl 'advice' and never as 1ra:dl 'bicycle', which adults pronounce as [ra:t]
i s a result of final devoicing.
In Section D.8, we saw that the constraint-sorting algorithm needed to know
an input form. In general, however, it is the correct parse that has to be known;
for Tesar (this volume) this is the correct metric interpretation (division into
feet) of an overt surface stress pattern. Tesar uses robust interpretive parsing to
-
find this correct parse: interpretive parsing may result in a metric interpretation
that is different from the adult's because of the incorrect constraint ranking, but
--bol ( i ) is put before the
this is better than no interpretation at all, since the constraint demotions that
-- ?, the pointing feather
arise if the learner's surface stress pattern (as computed with her constraint
system) does not match the adult stress pattern may lead to an improved metri-
I,
-: least one input that
cal interpretation o n the next cycle of interpretive parsing. Tesar claims that
- filst-language acquisi-
such an iteration of interpretive and production-oriented parsing often leads to
-he output forms to be
the correct acquisition of the grammar, especially when the learner starts from
a suitable initial ranking.
In their genetic learning algorithm, Pulleyblank and Turkel (this volume) use
the strategy exemplified in tableau (69) for determining the underlying forms
that are compatible with the attested adult output forms. They limit the number
of possible input forms by adhering to the assumption that a young learner will
- . -eve allows the learner only posit inputs that contain a subset of the phonological content of the en-
- -.Fol mstance, the adult countered output.
.- _c ~vouldnever produce
E. References
Ackema, P. and A. Neeleman (1998). 'Optimal questions'. Natural Language and Linguistic
Tizeory 16, 443-90.
Anttila, A. (1995). 'Deriving variation from grammar: A study of Finnish genitives'. Ms,
Stanford University. ROA-63. Revised version in Hinskens et al. (eds).
Archangeli, D. and D. Pulleyblank (1994). Grounded Phonology. Cambridge, MA, MIT
Press.
-1 r r.:ize [kat] as lkatl; other- Beckman, J., L. Walsh Dickey and S. Urbanczyk (eds) (199;). Papers in Optinzality Theory.
$. r -:tr case, she will pronounce University of Massachusetts Occasional Papers 18.
: =::ei~:ed underlying forms in Berwick, R. C. (1985). The Acquisition of Syntactic Knowledge. Cambridge, MA, MIT Press.
-and P.Niyogi (1996). 'Learning from triggers'. Linguistic Inquiry 27, 605-22.
- . .- .OT r grammars for com-
- Boersma, P. (1997). 'How we learn variation, optionality, and probability'. Proceedings of
_ s i r r and Smolensky 1996,
.. . the Institute of Phonetic Sciences of the University of Amsterdam 21, 43-58. [Earlier
- :
.
I and Reiss (1996), who version in ROA-221.
-. .- li kat/ would need high- Broselow, E. (1991).'The structure of fricative-stop onsets'. Paper presented at the Confer-
-- --
I ... this that the production
-
ence on Phonological Feature Organization, LSA Summer School.
r -.ir interpretive parsing thus Burzio, L. (1994). Principles of English Stress. Cambridge, Cambridge University Press.
- il-rh6~1 child utterances with -(1996).'Surface constraints versus Underlying Representation'. In Durand and Laks
.- . - ,- ; 1996b) idea of an initial
- (eds).
constraints, al- -(1998). 'Multiple correspondence'. Lingua 104, 79-109.
42 Optimality Theory
--
Chomsky, N. (1995). The Minimalist Program. Cambridge, MA, MIT Press. ;?<cndre, G., C. Wilson ;::
-and M. Halle (1968). The Sound Pattern of English. New York, Harper & Row. case and grammatic:. .:
---and H. Lasnik (1977). 'Filters and control'. Linguistic Inquiry 8, 425-504. of the Berkeley Liizpi . -
.-
Clark, R. and I. Roberts (1993). 'A computational model of language learnability and ---(1998). . - :- '
Costa, J. (1996). 'Word order and constraint interaction'. Ms, Leiden UniversityIHIL. ---K. Home: 1:~:
-(1997). 'Word order typology in Optimality Theory'. Ms, Leiden UniversityiHIL and
Beckman et al. (eds . : -- -
University of Lisbon. ..iiCarthy, J. J. (1986). 'OC: :-:
Dekkers, J. R. M. (1997). 'French word order: A conspiracy theory'. In J. Coerts and 17, 207-63.
H. de Hoop (eds) Linguistics in The Netherlands 1997.Amsterdam and Philadelphia, ---and A. S. Prince 15-
John Benjamins, 49-60. broken plural'. Nat::,..:. I
Durand, J. and B. Laks (eds) (1996). Current Trends in Phonology: Models and Methods. --(1993a). 'Generr::?: :
- .
-
Salford, Manchester, University of Salford Press. --(1993b). 'Prosot:: r-
French, K. M. (1988). Insights into Tagalog red~~plication, infixation and stress from non- University of Mas~ac.:~:r - -
linear phonology. Summer Institute of Linguistics and University of Texas, Arling- --(1994). 'The ems:;:-
ton. Ms, University of hias-:.:- . ~-
Gibson, E. and K. Wexler (1994). 'Triggers'. Linguistic Inquiry 25, 407-54. --(1995). 'Faithhi::. -
Gold, E. M. (1967). 'Language identification in the limit'. Information and Control lo, 249-384.
447-74. Mascarb, J. (1976). Catalrii: .'
Gnanadesikan, A. (1995). 'Markedness and faithfulness constraints in child phonology'. MIT, Cambridge, 1\Ias2z::. :-
Ms, U.Mass. [Rutgers Optimality Archive 67, http://ruccs.rutgers.edu/roa.html.] Club, Bloomington, In:-. - -
Grimshaw, J. (1991). 'Extended projection'. Ms, Brandeis University. Miiller, G. (1998). 'Order ;r: : --~
-(1997). 'Projection, heads, and Optimality'. Linguistic Inquiry 28, 373-422. unmarked'. Ms, Univer,::
-and V. Samek-Lodovici (1995). 'Optimal subjects'. In Beckman et al. (eds), 589-605. Myers, S. (1997). 'Expressin; r - I -
Hale, M. and C. Reiss (1996). 'The initial ranking of faithfulness constraints in UG'. Ms, Pesetsky, D. (1997). 'Optin-.r--r~ -- -
Concordia University. ROA-104. D. Archangeli and D. T. L.: I
--(1998). 'Formal and empirical arguments concerning phonological acquisition'. Blackwell, 134-70
Linguistic Iizquiry 29: 656-83. -(1998). 'Some O p t i m a l i ~; - r -
Hinskens, F., R. van Hout and W. L. Wetzels (eds) (1997). Variation, Change and Phono- Boulder.
logical Theory. Amsterdam and Philadelphia, John Benjamins. Pulleyblank, D. (1993). 'Volri: r- :
Huang, C.-T. J. (1982). Logical Relatioizs in Chinese and the Theory of Grammar. PhD Fonologia, University of 1 -
thesis, IMIT. --- (1996). 'Neutral vowels i: I :-
Inkelas, S., 0 . Orgun and C. Zoll (1997). 'The implication of lexical exceptions for the Canadian Journal of Lii:; .
. . .
._: ..;:.I \LIT Press. Legendre, G., C. Wilson and P. Smolensky (1993). 'An optimality-theoretic typology of
5 ~ 1-ork,
. Harper & Row. case and grammatical voice systems'. Proceedings of the Nineteenth Aiznual Meeting
. .
- ~
- ...
..:."r) 8, 425-504. of the Berkeley Linguistics Society, Berkeley, CA.
. . ---(1998). 'When is less more? Faithfulness and Minimal Links in
1:. r r language learnability and
. - WH-Chains'. In P. Barbosa et al. (eds).
- ..I. Leiden UniversitylHIL. ---K. Homer and W. Raymond (1995). 'Optimality and Wh-extraction'. In
,- - ; Lziden UniversitylHIL and Beckman et al. (eds), 607-36.
McCarthy, J. J. (1986). 'OCP effects: Gemination and antigemination'. Linguistic Inquiry
theory'.
-; In J Coerts and 17, 207-63.
- -c-,.-erdam and Philadelphia, ----and A. S. Prince (1990). 'Foot and word in prosodic morphology: The Arabic
broken plural'. Natural Language and Linguistic Theory 8, 209-82.
- ~7gy)):
Models and Methods. --(1993a). 'Generalized alignment'. Yearbook of Morphology, 79-153.
--(1993b). 'Prosodic morphology I: Constraint interaction and satisfaction'. Ms,
- ,jttan and stress fiom non- University of Massachusetts and Rutgers University.
- _ ,TIT erslty of Texas, Arling- --(1994). 'The emergence of the unmarked: Optimality in prosodic morphology'.
. -
'
: iL71iiry28, 373-422.
::::ban et al. (eds), 589-605.
unmarked'. Ms, University of Stuttgart.
Myers, S. (1997). 'Expressing phonetic naturalness in phonology'. In Roca (ed.), 125-52.
. . :-.3:5s constraiilts in UG'. Ms, Pesetsky, D. (1997). 'Optimality Theory and syntax: Movement and pronunciation'. In
D. Archangeli and D. T. Langendoen (eds) Optirnality Theory: An Overview, Oxford,
. - -5 phonological acquisition'. Blackwell, 134-70 . .
---- (1998). 'Some Optimality principles of sentence pronunciation'. In Barbosa et al. (eds).
r ~ i cphonology'. Lznguzstic Prince, A. S. and P. Smolensky (1993). 'Optimality Theory: Constraint interaction
in generative grammar'. Ms, Rutgers University and University of Colorado at
- ~ntlon,Change and Phono- Boulder.
_ - am~ns. Pulleyblank, D. (1993). 'Vowel harmony and Optimality Theory'. Actaj do Workshop Sobre
Theory of Grammar PhD Fonologia, University of Coimbra, 1-18.
---- (1996). 'Neutral vowels in Optimality Theory: A comparison of Yoruba and Wolof'.
- - -c.\;ical Pho~~ology'.Phonology
featural asymmetries'. In Durand and Laks (eds).
; .:1 1.4. Yang (ed.) Linguistics Rizzi, L. (1982). Issues in Italian Syntax. Dordrecht, Foris.
Roca, I. M. (ed.) (1997). Derivations and Constraints in Phonology. Oxford, Clarendon.
-
--. . ~ ~ < i irules'.
al Linguistic Inquiry Rochemont, M. (1978). A Theory of Stylistic Rules in English. Doctoral dissertation,
University of Massachusetts.
--
- - ~ a u t s o u d a s(ed.) The Applica- Rudin, C. (1985). Aspects of Bulgarian Syiztax: Conzplementizers and \+"-constructions.
- ..
- . -t .-..?.pe, Mouton. Columbus, OH, Slavica Publishers.
- - . ;::due et mouvement de Wh en -(1988). 'On multiple questions and multiple Lfi-fronting'. Natziral Langtlage and
Lirzguistic Theory 6, 445-501.
44 Optimality Theory