A generative grammar for natural language is an explicit formal system (Chomsky, 1965: 4; 1995:
162, fn. 1) based on an algorithm that generates structural descriptions of complex structures made
out (in orthodox accounts) of atomic elements. This system, in short, takes lexical items from an
unordered set (the so-called Numeration) and combines them step-by-step binarily, establishing
hierarchical relations between pairs of objects. Current developments within transformational
versions of generative grammar, from Chomsky (1995) on, have led to the proposal of a single,
free generative algorithm called Merge, which takes objects X, Y already constructed and forms a
new object Z (Chomsky, 2001: 40). Needless to say, stipulations aside, X and Y can be of arbitrary
complexity, either terminals (i.e., lexical items, which do not branch) or non-terminals (branching
nodes) built via Merge (i.e., sub-trees), which, in Uriagereka's (2002a) terms, corresponds to the distinction between (a) Monotonic (expressible in Markovian terms2) and (b) non-Monotonic (non-Markovian) Merge.
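To make the set-theoretic character of the operation concrete, here is a minimal sketch in Python (our own illustrative encoding, not part of any of the cited proposals): Merge is modeled as pair formation, and a monotonic derivation applies it successively to the root.

```python
# Toy model of Merge as binary combination (illustrative only).
# Objects are either terminals (strings) or pairs built by merge.

def merge(x, y):
    """Take objects X, Y already constructed and form a new object Z."""
    return (x, y)

# External, monotonic Merge drawing items from an unordered Numeration:
numeration = {"he", "saw", "the", "book"}

z = merge("the", "book")   # {the, book}
z = merge("saw", z)        # {saw, {the, book}}
z = merge("he", z)         # {he, {saw, {the, book}}}
print(z)                   # ('he', ('saw', ('the', 'book')))
```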
Merge was devised as an attempt to unify two defining properties of natural languages, hierarchical
structure and displacement. The former (External Merge) includes a theory of proper containment,
dominance, or non-connectedness relations between syntactic objects; the latter (Internal Merge, or
Move), a theory of how syntactic objects are interpreted in syntactic locations that differ from their
phonological location. There are some crucial notions in the previous paragraph: one of them is the
new object Z part. Within the Chomsky Hierarchy (CH), an inclusive classification of formal grammars (see Chomsky, 1956: Theorem 1), External Merge (EM) (where X and Y belong neither to
the same object nor to each other, but are taken from the Numeration, in turn a subset of the
Lexicon used to derive an expression) is regarded by Uriagereka (2008: 234) as a context-free
operation, as it creates Z from any two X, Y available in a derivational space, and is thus available
for Push-Down Automata, allowing true recursion, i.e., free center embedding (as opposed to Finite
State Automata, which allow only head-tail recursion). Notice, however, that this Merge operation
can be further decomposed as follows:
Most of what follows is taken from Krivochen (2015a), although the discussion has been shortened and
adapted to the present context. We refer the reader to that publication for details about phrase structure
building. Here, we will focus on alternatives rather than expose the problems of current assumptions.
2 Consider that Markov models allow head-tail recursion, insofar as loops can consist of arbitrarily long
chains of symbols, linearly ordered (i.e., excluding hierarchy). If this is so, as Uriagereka (2008: 233)
proposes, then the Markovian character of monotonic Merge (or merge to the root) follows
straightforwardly: the monotonically assembled sequence [he [saw [the [book]]]] (and any other like it) can be
expressed by means of a Markov chain, oversimplified as he saw the book, with each lexical item
configuring a state of the system. We did not include loops here, but it is equally simple to add n loops to the
Markov chain, as in [he saw the old, boring, heavy book].
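As a minimal illustration of the point made in this footnote (our own toy encoding, with hypothetical state names), the sequence can be expressed as a Markov chain with a loop on an adjective state:

```python
import random

# Oversimplified Markov chain for "he saw the (ADJ,)* book":
# each lexical item configures a state; the ADJ state loops back onto
# itself, giving head-tail recursion (linear order, no hierarchy).
TRANSITIONS = {
    "START": ["he"],
    "he": ["saw"],
    "saw": ["the"],
    "the": ["ADJ", "book"],
    "ADJ": ["ADJ", "book"],   # n loops: old, boring, heavy, ...
}
ADJECTIVES = ["old", "boring", "heavy"]

def generate():
    state, output = "START", []
    while state != "book":
        state = random.choice(TRANSITIONS[state])
        output.append(random.choice(ADJECTIVES) if state == "ADJ" else state)
    return " ".join(output)

print(generate())   # e.g. "he saw the old heavy book"
```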
Such a structure is abbreviated as DNP3Sg. Feature structures vary in their specificity, and in the
number of features they contain (here, we have only categorial and agreement features, but
different formalisms have adopted more, including specifications for grammatical functions and
subcategorization frames). Unification applies to feature structures in the following way (Shieber,
1986: 14):
3)
In formal terms, we define the unification of two feature structures D′ and D″ as the most general feature structure D, such that D′ ⊑ D and D″ ⊑ D. We notate this D = D′ ⊔ D″.
Unification adds information (e.g., feature structures are not identical, but compatible)
Unification does not add information (e.g., feature structures are identical)
Unification fails due to conflicting information (e.g., same features, different value)
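A minimal sketch of these three outcomes, assuming flat (non-nested, non-reentrant) feature structures encoded as Python dictionaries — a simplification of Shieber's formalism, for illustration only:

```python
def unify(d1, d2):
    """Return the most general structure extending d1 and d2, or None on failure."""
    result = dict(d1)
    for feature, value in d2.items():
        if feature not in result:
            result[feature] = value      # compatible: information is added
        elif result[feature] != value:
            return None                  # same feature, different value: failure
    return result

print(unify({"CAT": "D"}, {"NUM": "sg"}))   # {'CAT': 'D', 'NUM': 'sg'}  (adds info)
print(unify({"CAT": "D"}, {"CAT": "D"}))    # {'CAT': 'D'}               (no new info)
print(unify({"NUM": "sg"}, {"NUM": "pl"}))  # None                       (conflict)
```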
Thus, while Merge could be formulated in such a way that derivations must be counter-entropic
(that is, the informational load for the interfaces is always increased, at least if one interprets the
extension condition and the full interpretation principle strictly), this is only one of the possibilities
with Unification, which makes the derivational dynamics, in our opinion, more interesting (insofar
as the computational consequences of each possibility have to be taken into account at each
derivational point, if Unification is to be implemented in a derivational model). Jackendoff (2011:
276) claims that Merge can be regarded as a particular instance of Unify, in which feature structures
are reduced to two, in the following way:
5) Merge (X, Y) =
Unify (X, [x y]) = [X, y], [x y] being a feature structure containing unspecified terminal elements, as X ⊒ x.
Unify (Y, [X, y]) = [X, Y], as Y ⊒ y.
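Jackendoff's decomposition can be sketched along the following lines (a toy rendering under our own encoding, where the treelet [x y] is a pair of unspecified slots):

```python
def unify_slot(treelet, item):
    """Instantiate the first unspecified slot (None) of the treelet."""
    i = treelet.index(None)
    return treelet[:i] + [item] + treelet[i + 1:]

def merge_via_unify(x, y):
    treelet = [None, None]               # [x y]: unspecified terminals
    step1 = unify_slot(treelet, x)       # Unify(X, [x y]) = [X, y]
    return unify_slot(step1, y)          # Unify(Y, [X, y]) = [X, Y]

print(merge_via_unify("D", "N"))         # ['D', 'N']
```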
Unification-based grammars do not consider a structure-mapping algorithm, since they are by
definition non-transformational: there is no operation relating phrase markers, nor are there
different, subjacent levels of structure (or conditions applying to them).
Merge-based and Unification-based models of structure building underlie most major linguistic theories (MP, HPSG, LFG, Simpler Syntax). We have reviewed the pros and cons of both in Krivochen (2015a); here we will go further, assuming the discussion there.
It is crucial to bear in mind that, despite what some (widespread) theories imply, models of structure building are nothing but models, not the actual object of research. That is, a binary-branching tree is not a sentence, it is only a model of a sentence, just like Bohr's atomic model was not an atom. For non-linguists, this clarification might seem otiose, but within generative linguistics it is most important: MGG's theories of displacement and linearization, for instance, depend on asymmetric c-command relations over binary-branching phrase markers, as in (6):
6) [tree diagram: [XP X [YP Y [ZP Z t]]]]
X's asymmetrical c-command path is marked in blue and Y's, in green. Z does not asymmetrically c-command anything and nor does its sister (they mutually c-command, which is referred to as a point of symmetry), but since the most embedded element in the structure is a trace, it need not be linearized (as it receives no phonological exponent). The transitive c-command relations we have graphed are translated into linear order as follows:

7) X ≺ Y ≺ Z ≺ [t]
Notice that the axiom (to which we will return below, as it is an example of function-based syntax, against which we will argue) rests on two far-reaching stipulations (there are others, like the assimilation of specifiers and adjuncts, or restrictions over labeling, but they all derive, in our perspective, from the following two): that phrase structure building is uniformly binary, and that linear order is completely determined by (asymmetric c-command relations in) phrase structure.
However, there is an essential confusion here: while the linear character of the signifié is an observable, theory-independent phenomenon, the syntactic structure that is inferred from it is quite the opposite, a highly theory-dependent, purely abstract set of stipulations. The confusion between levels has led theorists like Adger (2012, p.c.) or Boeckx (2014) (not to mention Chomsky, 2013, to name just the most recent references) to claim that every road leads to Merge, and that one cannot have an explanatory theory of language without assuming something like it (going as far as saying that "(…) it would require stipulation to bar either of them (…)" [External and Internal Merge], Chomsky, 2013: 40). Particularly after the explosion of the so-called biolinguistic enterprise (see Chomsky, 2005; Di Sciullo and Boeckx, 2011), MGG proponents are under the impression that they are actually describing and explaining something external to their particular formalism. This is a confusion between the model and the object modeled.
In Krivochen (2015a) we have resorted to a modified version of the well-known tree model purely for
expository purposes, but the same considerations apply. We are quite confident the problems we tackle (e.g.,
the structure of coordination, adjunction, iteration, and their consequences for computational models of
linguistic competence) exist independently of the specific framework we utilized, and arise in other
frameworks as well (HPSG, LFG, Simpler Syntax, Construction Grammar).
Interestingly enough, Kornai's arguments are the mirror image of Chomsky's (1957): since there are portions of English that are not finite-state modelable, then English is not a finite-state language. That is, at the very least, a leap of faith: the existence of portions of a language that go beyond Type 3 complexity does not mean that the language itself (understood, as in computer science, as a collection of strings) is all beyond finite-state complexity, as we have shown in Krivochen (2015a).
It is almost otiose to say that this is a historical simplification, but not as much as the reader might think. The formal apparatus of Syntactic Structures, and also The Logical Structure of Linguistic Theory, is pretty safely describable as a watered-down, adapted version of Turing's formalisms, along with Post's mathematical insight; see, for instance, the Post Tag System (a variant of FSM). The lesser-known Systems of Syntactic Analysis (Chomsky, 1953) presents a more sophisticated formalism, although one equally over-restricted for the purposes of natural language analysis.
(ii) an enumeration of the class SD1, SD2, … of possible structural descriptions

(iii) an enumeration of the class G1, G2, … of possible generative grammars

(iv) specification of a function f such that SDf(i, j) is the structural description assigned to sentence Si by grammar Gj, for arbitrary i, j

(v) specification of a function m such that m(i) is an integer associated with the grammar Gi as its value (with, let us say, lower value indicated by higher number)

(Chomsky, 1965: 31. Our highlighting)
This quotation, which belongs to the Standard Theory stronghold Aspects of the Theory of Syntax, makes use of the concept of function in two senses: first, it maps the set of sentences to the set of structural descriptions (arguably, both infinite sets), and this mapping is the grammar itself (point (iv)); then, it specifies a monadic function m that assigns an integer to each grammar Gi as its value. Some features are shared between both definitions: the specification of the set of possible sentences and possible structural descriptions is one of the major issues here. While, if we consider PSRs of the kind [Σ, F], the set of structural descriptions is not in bijective relation with the set of possible sentences (since a single
There are also problems when one considers a linguistic theory to be a formal axiomatic system, explicit as it might be, and Gödel's Incompleteness Theorems enter the game. We will not go into this problem, however, since it only touches on our concerns tangentially, and the brief discussion of the decision problem should be enough to raise the reader's scepticism with respect to the formal foundations of some major theories of syntax.
In relation to the projection of lexical information onto phrase structure, Müller (forthcoming) claims that

The HPSG lexicon […] consists of roots that are related to stems or fully inflected words. The derivational or inflectional rules may influence part of speech (e.g. adjectival derivation) and/or valence (-able adjectives and passive) […] The stem is mapped to a word and the phonology of the input […] is mapped to the passive form by a function f. (Müller, forthcoming: 16. Our highlighting)
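As an illustration of the quoted passage only — our own hypothetical sketch, not Müller's actual HPSG formalization — such a lexical rule can be rendered as a function f from stems to words that manipulates phonology and valence:

```python
def passive_rule(stem):
    """Map a stem to a passive word: adjust phonology, suppress the subject."""
    return {
        "phon": stem["phon"] + "d",          # e.g. love -> loved (toy morphology)
        "valence": stem["valence"][1:],      # passive: remove the subject slot
        "type": "word",
    }

love = {"phon": "love", "valence": ["subj", "obj"], "type": "stem"}
print(passive_rule(love))   # {'phon': 'loved', 'valence': ['obj'], 'type': 'word'}
```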
This is problematic in several respects. If the stem is mapped by means of a function, then the whole process is impenetrable until it halts. This means that either the system only generates well-formed candidates (which in turn requires a plethora of stipulations in order to specify the feature matrix of the lexical entry accordingly), or the theory has a big problem of computational tractability, as it is unlikely that the operations required to derive, parse, and re-derive (in case a suboptimal candidate has been built), applied serially, can in fact be carried out by a mind in polynomial time. The sentence level displays the same problems as the word level, at a different scale (taken from Green, 2011: 24):
Structure building: take separate objects α and β, Merge them, and build {α, β}, always extending the phrase marker.
While the possibilities for structure building are, in principle, n-ary, the MP is constrained by the stipulations of X-bar theory (Chomsky, 1986), and allows only binary branching. Thus, Merge always applies to two objects, either at the root or at the non-root. During the '90s, Kayne (1994) attempted to reinforce the binarity requirement he had put forth in the '80s, and came up with a linearization algorithm known as the Linear Correspondence Axiom:

phrase structure (…) always completely determines linear order (…) (Kayne, 1994: 3)

Linear Correspondence Axiom [LCA]: d(A) is a linear ordering of T. [A a set of nonterminals, T a set of terminals, d a nonterminal-to-terminal (dominance) relation] (Kayne, 1994: 6)
Quite transparently, linear order is a function of phrase structure8, and phrase structure building is always a two-place predicate, mapping ⟨α, β⟩ to {α, β} via the function Merge. α and β, in turn, can be atomic (taken from the Lexical Array), or not, which gives rise to two computationally different types of Merge: Monotonic and Non-Monotonic, respectively (see Krivochen, 2015a for extensive discussion). Let us illustrate both derivational options:
10)
Merge(α, β)
Merge(γ, (α, β))
Merge((δ, ε), (γ, (α, β)))
11) [tree diagram of {{δ, ε}, {γ, {α, β}}}]
If the LCA is taken seriously, then (11) is a problem: there is no way to linearize {δ, ε} with respect to {γ, {α, β}}, as there is a point of symmetry; there is no way to satisfy the function. This is where Uriagereka's Multiple Spell-Out model comes into play. However, that model's increased theoretical coherence comes at a price: one must assume that Spell-Out can take place more than once in the course of a derivation, and that the system in charge of sending chunks of structure off for linearization is sensitive to points of symmetry (or at least to Markovian sub-units). None of this is clear under an MGG approach.
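The contrast between (10) and (11) can be made concrete with a toy linearizer (our own simplified encoding of the LCA's effect, not Kayne's actual definitions; ASCII names a, b, g, d, e stand in for α, β, γ, δ, ε): an atom asymmetrically c-commands into its complex sister and so precedes its terminals, while merging two complex objects creates a point of symmetry and linearization fails.

```python
def linearize(obj):
    """Map a syntactic object to a precedence sequence, LCA-style."""
    if isinstance(obj, str):
        return [obj]                          # terminal
    left, right = obj
    if isinstance(left, tuple) and isinstance(right, tuple):
        # neither sister asymmetrically c-commands into the other:
        # a point of symmetry, so no linear ordering of terminals exists
        raise ValueError(f"point of symmetry: {left} vs. {right}")
    if isinstance(left, str) and isinstance(right, str):
        # two sister terminals also mutually c-command; we simplify and
        # keep the given order (cf. the trace case in (7))
        return [left, right]
    atom, complex_ = (left, right) if isinstance(left, str) else (right, left)
    return [atom] + linearize(complex_)       # the atom precedes its sister

# Monotonic object, as in (10): linearizes without problems
print(linearize(("d", ("g", ("a", "b")))))            # ['d', 'g', 'a', 'b']

# Non-monotonic object, as in (11): fails at the point of symmetry
try:
    linearize((("d", "e"), ("g", ("a", "b"))))
except ValueError as err:
    print(err)
```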
8 In this sense, we could express Spell-Out as a function f(SO), for any syntactic object SO, as a mapping of SO = {α, {β, {γ}}} to SO′ = ⟨α ≺ β ≺ γ⟩ (≺ a transitive, antisymmetric relation of linear precedence, such that α ≺ β = α precedes β), and if x ∈ SO is a non-terminal, then f applies to x internally as well.
Combination problem:
The combination problem is a purely mathematical problem that, in our opinion, greatly
undermines the plausibility of syntactico-centric theories (taking the term from Culicover and
Jackendoff, 2005). Assume we have a Numeration containing n elements. All possible
combinations of n using all n are defined by the expression n!, but that is not what MGG says. The
binarity requirement forces us to use all n, yes, but in subsequent sets of two elements. This is
mathematically expressed by an extension of the factorial formula:
12)
n! / ((n − k)! · k!)
where k is the number of elements we are manipulating at any given derivational point. Changing
the notation, to make it linguistically more transparent, the formula would look like this:
13)
NUM! / ((NUM − Di)! · Di!)
NUM is a specific numeration; Di is a derivational point, such that a derivation can be defined as a set S = {Di, Dj, Dk, …, Dn} of subsequent states. All the formula says is that the possible combinations given a Numeration of n elements are defined not only by the possible combinations of n in sets of n, but by the combinations of n in sets of k elements (n > k), where order (i.e., branching side) is ignored, which are represented by derivational stages. Assume now the simplest possible Numeration for a sentence like [John loves Mary], considering only the basic functional categories9:
14) Num = {C, T, v, D, D, John, love, Mary}
By the formula in (13), there are 28 possible combinations, only two of which converge: [John loves Mary] and [Mary loves John], and only one of which actually conveys the intended proposition love(John, Mary). If branching side is relevant, we are left with 56 possibilities, of which the same two converge at the interfaces. In our opinion, the system is not very efficient in this respect, particularly given the fact that there is no way to guarantee that the syntactic engine will not generate any of the 26 non-convergent possibilities unless resorting to
9 We will assume here that proper names are DPs, since the procedural instructions conveyed by D (sortal referentiality) apply equally to proper and common names. See Krivochen (2012: 61-63) for discussion within the present framework. Two Ds are assumed since we have two proper names, and, arguably, the procedural instructions for each could vary (e.g., if we know John but not Mary, we are only able to presuppose her existence, in the logical sense). We will also assume, just for the sake of the exposition, that Person and Number morphological features reside in the T node, a common assumption in MGG that we find inexplicable at best.
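The counts just given can be checked directly (a quick verification under the binomial reading of (13), using Python's standard library):

```python
import math  # requires Python 3.8+ for math.comb / math.perm

n = 8                    # |Num| in (14): {C, T, v, D, D, John, love, Mary}
print(math.comb(n, 2))   # 28: pairwise combinations, branching side ignored
print(math.perm(n, 2))   # 56: pairwise combinations, branching side relevant
```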
Uniformity problem:
The uniformity problem, which we have already referred to above, is simply the resistance of MGG to thinking in terms other than binary-branching structure. In more technical terms, any strictly constrained generative algorithm has the problem of forcing all strings to comply with a single structural template (e.g., X-bar), which sometimes imposes too much structure and sometimes leaves relevant structural features aside. In a word, MGG's syntactic theory is quite like a Procrustean bed. Consider (15):
15) The old old old man
(15) is simply an iterative pattern, but we cannot know that from the string alone: it is the [X X X] situation we dealt with above. In these cases, we cannot say that one instance of the A takes the others as arguments, as a binary-branching, traditional X-bar-theoretical representation would entail: that would impose extra structure on the phrase marker, in the form of silent functional heads (Cinque's 1998 FP, for instance) or embedded APs, [old [old [old]]], which not only complicate the theoretical apparatus (as we need independent evidence for the silent functional material), but also fail to account for the semantics of the construction, which is not scopal but incremental (see Uriagereka, 2008: Chapter 6 for discussion). For sentences like (15), a phrase structure grammar of the kind [Σ, F] (Σ a finite set of initial strings and F a finite set of rewriting rules), like the one argued for in Chomsky (1957), imposes too much structure on the adjective stacking, assuming a uniform system of binary projection. Chomsky himself (1963: 298) recognizes the structural overgeneration of phrase structure grammars:
a constituent-structure grammar necessarily imposes too rich an analysis on sentences
because of features inherent in the way P-markers are defined for such sentences.
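The contrast between the iterative and the uniformly binary analyses of (15) can be sketched as follows (our own toy encoding): a finite-state description captures the adjective loop directly, while a binary-branching template forces spurious embedding.

```python
import re

# Iterative (Type 3) description: the adjective loop, no hierarchy imposed
print(bool(re.fullmatch(r"the (old )*man", "the old old old man")))  # True

# Uniform binary branching instead forces [old [old [old]]]-style embedding:
def embed(adjectives, noun):
    tree = noun
    for adj in reversed(adjectives):
        tree = (adj, tree)               # one layer of structure per adjective
    return tree

print(embed(["old", "old", "old"], "man"))  # ('old', ('old', ('old', 'man')))
```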
Curiously, there was no attempt within MGG to improve the phrase structure engine in order to include these simple dependencies; rather, examples like (15) and similar patterns were increasingly overlooked from the '80s on: the rise of X-bar theory as a template for syntactic structure (Chomsky, 1981; Stowell, 1981) and its binary branching requirement (Kayne, 1984) directly excluded n-ary branching dependencies (some of which correspond to very simple Markov chains) from the picture. Minimalism's Merge operation maintained the essentially binary character, deduced from apparently independent principles like antisymmetry (Kayne, 1994, 2011; Di Sciullo, 2011) and feature valuation operations (Pesetsky & Torrego, 2007; Wurmbrand, 2014). Structural uniformity is one of the foundational stones of Minimalism from Chomsky and Miller (1963: 304) (who first proposed binary branching) on, and despite its allegedly economical
Implementational problem:
This problem is simple to enunciate, but more difficult to explain, at least under traditional assumptions. In short, derivations are at odds with real-time processing. That is, the false topology assumed in MGG (either top-down or bottom-up) is in any case at odds with the processing of the linear signal, which proceeds left-to-right, following the speech flow. We have called it a false topology because there is actually no theory about the characteristics of the topological space in which derivations take place, except perhaps some general considerations we can make based on Roberts' (2014) remarks on ultrametric distances in X-bar theory. However, those considerations pertain to the structures themselves, not to the spaces in which they are derived. Our concern, then, is twofold: we have questioned the empirical adequacy and theoretical status of structurally uniform phrase markers, and we are also concerned with the properties assigned to the working spaces in which these phrase markers are derived. The name of the problem comes, quite obviously, from Marr's (1982) levels of analysis:
Computational level: what the system does (e.g., what kinds of problems it can solve), and why
Algorithmic/representational level: how the system does what it does, more specifically,
what representations it uses and what processes it employs to build and manipulate the
representations
Physical/implementational level: how the system is physically realised (e.g., by means of
which kind of neural circuitry)
Having a look at the kinds of arguments put forth not only within the MP, but also in LFG, HPSG, and related theories, we think that, while they all address the what at the computational level and propose theories for the how, the why is often just stipulated (e.g., UG), and there is little, if any, concern with the implementational level, even in so-called biolinguistics. In short, theories of language tend to neglect crucial aspects of the implementational level and stipulate, well