
University of Alexandria
Faculty of Arts
English Language Department

A Proposed Approach to Handling Unbounded Dependencies in Automatic Parsers

A THESIS SUBMITTED TO THE ENGLISH LANGUAGE DEPARTMENT, FACULTY OF ARTS, THE UNIVERSITY OF ALEXANDRIA IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS IN COMPUTATIONAL LINGUISTICS

By Ramy Muhammad Magdi Ragab Abdel Azim

Supervised by:

Dr. Sameh Al-Ansary Associate Professor of Computational Linguistics Department of Phonetics and Linguistics Faculty of Arts Alexandria University

Dr. Heba Labib Assistant Professor of Linguistics Department of English Language and Literature Faculty of Arts Alexandria University

to the memory of Professor Hassan Atiyya Taman (2010)

Contents
Abstract
Acknowledgements
Symbols and Abbreviations
List of Figures
List of Tables
1. INTRODUCTION
   1.1. Motivation
   1.2. The Problem
   1.3. Aims and Contributions
   1.4. Thesis Structure
   1.5. UDs defined
   1.6. The class of UDs
        1.6.1. Strong UDs
        1.6.2. Weak UDs
   1.7. Nomenclature
2. UDS AND SYNTACTIC FORMALISMS
   2.1. Derivational Approaches
   2.2. Generalized phrase structure grammar (GPSG)
   2.3. Head-driven phrase structure grammar (HPSG)
   2.4. Categorial grammar (CG)
   2.5. Lexical functional grammar (LFG)
   2.6. Towards an Ontology of Gaps
        2.6.1. Gaps between Objects and Subjects
        2.6.2. The Distribution of Gaps
        2.6.3. The Ontology
3. Parsing and Formal Languages
   3.1. The Concept of a Formal Language
   3.2. Defining a Generative Grammar
   3.3. Formal Grammars and their Relation to Formal Languages
   3.4. The Chomsky Hierarchy
   3.5. Automata
   3.6. Parsing Theories and Strategies
   3.7. The Universal Parsing Problem
   3.8. Major Parsing Direction
   3.9. Top-down Parsing
   3.10. Bottom-up Parsing
   3.11. The Cocke-Kasami-Younger Algorithm
   3.12. The Earley Algorithm
   3.13. Statistical or Grammarless Parsing
   3.14. Text vs. Grammar Parsing: the Nivre Model
   3.15. Text Parsing and the Problem of UDs
4. UDs Parsing Complexity
   4.1. The Rimell-Clark-Steedman (RCS) Test
   4.2. The Parsers Set
   4.3. Parser Architecture and Design
        4.3.1. The Compiler
        4.3.2. The Parser
5. Architectural Modifications
   5.1. The Small-scale Latent Parser (SLP)
   5.2. UDs SLP Pseudo-Algorithm
6. Processing Modifications
   6.1. Gap-Threading
   6.2. Gaps in Python
   6.3. Memoization
7. Conclusions
REFERENCES

"In the beginning was the word. But by the time the second word was added to it, there was trouble. For with it came syntax, the thing that tripped up so many people."

John Simon, Paradigms Lost

This is a fertile area of research, in which definitive answers have not yet been found.

Sag & Wasow, Syntactic Theory: A Formal Introduction

Abstract
Unbounded dependencies (UDs) represent a set of syntactic constructions in the English language that pose a number of challenges for syntactic and computational analysis. Unbounded dependencies cover such constructions as wh-questions, relative clauses, topicalized sentences, tough-movement clauses, it-clefts and many more. Though each of these constructions may have received considerable attention in the syntactic literature, awareness of the unity of all these constructions and of the like-minded behavior that makes them form a coherent whole has been largely missing from such treatments. This thesis explores the linguistic nature of UDs and how they have been handled within the current array of syntactic theories. The thesis provides analyses of UDs within the Principles & Parameters model (as representative of derivational approaches to syntax) and within Generalized Phrase Structure Grammar, Head-driven Phrase Structure Grammar, Lexical Functional Grammar, and Categorial Grammar (as representatives of non-derivational approaches). The thesis then offers a newly devised gaps-ontology that aims at gathering all the information and rules related to the behavior of gaps in unbounded dependencies into one integral theoretical entity that can be utilized in computational environments. The thesis claims that the problem of parsing UDs is basically a computational problem, not a syntactic one, i.e. the solution lies in the parsing strategy and techniques used, not in the theoretical underpinnings of the different syntactic analyses available. Accordingly, the thesis proposes two types of solutions to the parsing problem of UDs: the first introduces modifications to the architectural design of the universal parser, subscribing to the highly useful technique of modularity and thus devising what the thesis calls a Small-scale Latent Parser. The other proposes processing modifications, represented by the techniques of gap-threading and memoization. The thesis also claims that top-down parsing cannot be endorsed as a possible strategy for parsing UDs and thus favors bottom-up parsing strategies instead.

Acknowledgments
My interests in computer science and the study of computational linguistics were triggered 13 years ago when I began my work with Dr. Nabil Ali. Dr. Ali, an engineer by training and the father of Arabic informatics and computational linguistics, brought to my attention many important works and gave me the opportunity to see what a real computational system looks like. The late Prof. Hasan Taman, the original supervisor of this thesis, is the one who should be credited with its current organization. He insisted, against my disposition to work on theoretical issues alone, on a problem-solving method that finds a problem and proposes solutions, which explains the title of the thesis itself (his exact phrasing). Prof. Taman's belief in me and in my academic abilities was crucial in infusing me with the spirit that made me work on this thesis and recover from so many bouts of despair. May his soul rest in peace. Prof. Azza el-Khouly's and Prof. Sahar Hamouda's kindness and support made this thesis see the light of day. Dr. Sameh al-Ansary's patience, unflinching support and understanding also revitalized the hope of finishing this thesis. Without him I would not have been able to finish the thesis in the first place, not to mention his comments and suggestions that improved the outlook and organization of the thesis. My debt to him will always be remembered. Also, Prof. Olga Matar's kind approval to be one of the examiners brought me such happiness, because she was the first one I had hoped could supervise my work, even before Prof. Taman; unfortunately, at that time she was unable to slot me into her already full schedule of graduate thesis supervisions. Dr. Heba Labib's sweet kindness and understanding were simply invaluable. Dr. Medhat Issa, my dear friend and colleague, gave me an unforgettable example of vicariousness and true friendship. Finally, the help I received from my dear beloved wife, Nermin, is simply beyond any possible attempt at description. Her loving care and her unending patience always humbled me and made me realize how much I am blessed. Her comments on and reading of the manuscript of the thesis were also illuminating and insightful. My discussions with my professor and father-in-law, Prof. Mohammad Saleh al-Dali, and his undying support were formative and influential. It is such a privilege to be connected to such a scholarly giant. And customarily, of course, any mistakes, flaws or signs of incoherence are mine and mine alone.


Symbols and Abbreviations


_            Underscores represent the position(s) of gaps in a sentence.
/            Represents the SLASH feature, in which the category on the right-hand side of the slash is missing.
e            Null or empty categories.
⊕            Adding up (appending) in HPSG.
A↑B          There is a category A missing somewhere within it a B, in Moortgat's version of CG.
↓            In LFG, a variable that refers to the lexical item being categorized.
↑=↓          In LFG, an equation meaning that the features of the nodes below and above are being shared.
λ            Lambda, a symbol referring to a string consisting of zero elements.
L            Language in formal language theory.
G            Grammar in the theory of formal languages.
VN           Nonterminal variables.
VT           Terminal variables.
L(G)         The language generated by the grammar G in formal language theory.
(N, Σ, S, P) Elements of a formal grammar G.
→            The left-hand side elements are rewritten as the right-hand side elements, e.g. S → NP VP.
x ∈ S        x belongs to, or is a member of, S.
S            Refers either to the root of a sentence or, in formal language theory, to the terminals of a sentence, in contrast to N, which refers to non-terminals.
•            In the Earley parsing algorithm, the dot is used on the right-hand side of a grammar rule to indicate how far the rule has progressed, e.g. S → • VP, [0, 0].
[1], [2], ... Boxed numbers, or tags, in AVMs indicate structure sharing in HPSG.


NLP      Natural Language Processing
UDs      Unbounded Dependencies
ST       Syntactic Theory
GPSG     Generalized Phrase Structure Grammar
LFG      Lexical Functional Grammar
HPSG     Head-driven Phrase Structure Grammar
CG(s)    Categorial Grammar(s)
CCG      Combinatory Categorial Grammar
TG       Transformational Grammar
ATN      Augmented Transition Networks
PSG      Phrase Structure Grammar
GB       Government and Binding theory
P&P      Principles and Parameters theory
MP       Minimalist Program
TP       A clause consisting of an NP and a VP.
C        Complementizer within a P&P context.
CP       Complementizer phrase within a P&P context.
DP       Determiner phrase within a P&P context.
SPEC     Specifier within a P&P context.
CF-PSG   Context-free Phrase Structure Grammar.
FFP      Foot Feature Principle within a GPSG context.
ID       Immediate Dominance rules within a GPSG context.
LP       Linear Precedence within a GPSG context.
HFP      Head Feature Principle within a GPSG context.
CSLI     Stanford University's Center for the Study of Language and Information.
QUE      A feature of questions in HPSG.
REL      A feature of relative clauses in HPSG.
INHER    Inheritance feature in HPSG.
AVM      Attribute Value Matrix in HPSG and Unification Grammars.
SYNSEM   Syntax-semantics interface in HPSG.
SPR      Specifiers in HPSG.
3sg      Third person singular in HPSG.
ELL2     Encyclopedia of Language and Linguistics, second edition.
CGEL     Cambridge Grammar of the English Language.
UDtp     UDs typology.
UDcl     UDs class.
P/F      Positions and functions of gaps in a gaps-ontology.
V        Vocabulary in formal language theory.
TM       Turing Machine.
LBA      Linear bounded automata.
PDA      Push-down automata.
FSA      Finite-state automata.
NLs      Natural Languages.
TAGs     Tree-adjoining grammars.
OT-LFG   Optimality-theoretic LFG.
P&S      Pollard and Sag (1994).
J&M      Jurafsky and Martin (2009).
CKY      Cocke-Kasami-Younger parsing algorithm.
CNF      Chomsky Normal Form.
DCG      Definite clause grammar.
WFSG     Well-formed strings of the grammar.
TP       Text Parsing.
RCS      The Rimell-Clark-Steedman Test.
WSJC     Wall Street Journal Corpus.
PTB      The Penn Treebank.
SLP      Small-scale Latent Parser.
P        Polynomial class in Complexity theory.


List of Figures and Tables

Figures
(1) The Class of Unbounded Dependency Constructions.
(2) A derivational analysis of the sentence Who do you think Jim kissed?.
(3) A derivational analysis of the sentence Who do you think Jim kissed? (modified).
(4) A derivational analysis of the sentence Who do you think Jim kissed? (modified).
(5) A derivational analysis of the sentence Who do you think Jim kissed? (modified).
(6) A derivational analysis of the sentence Which city did Ian visit?.
(7) Tree geometry of the structure of a UD in GPSG.
(8) A GPSG analysis of the sentence Sandy we want to succeed.
(9) An HPSG analysis of the sentence Kim, we know Sandy claims Dana hates.
(10) An attribute value matrix (AVM) for the verb sees in HPSG.
(11) An HPSG structural description (SD) of gaps in UDs.
(12) A CG analysis of the sentences Whom do you think he loves? and Who do you think loves him?.
(13) A CG analysis of the sentence Who Jo hits?
(14) An LFG analysis of the sentence What Rachel thinks Ross put on the shelf?
(15) The c-structure of What Rachel thinks Ross put on the table?
(16) The f-structure of What Rachel thinks Ross put on the table?
(17) C-structure of What did the strange, green entity seem to try to quickly hide? (Asudeh 2009)
(18) F-structure of What did the strange, green entity seem to try to quickly hide? (Asudeh 2009)
(19) A subject-predicate analysis of the topicalized sentence The others I know are genuine (CGEL).
(20) A proposed Gap ontology.
(21) GAPS AVM.
(22) The Chomsky Hierarchy and its corresponding automata.
(23) A top-down analysis of the sentence Book that flight.
(24) A bottom-up analysis of the sentence Book that flight.
(25) A CKY parsing of the sentence Book the flight through Houston.
(26) An illustration of an attachment ambiguity in the sentence I shot the elephant in my pajamas.
(27) Components of a language processing system.
(28) The structure of a compiler within a language processing system.
(29) The Parser within the compiler.
(30) Small-scale Latent Parser.
(31) GAPS AVM.
(32) Flowchart of UDs SLP algorithm.
(33) Gap-threading in the sentence John, Sally gave a book to.
(34) A parse of the sentence Who do you claim that you like? using Python.
(35) A parser blueprint incorporating all proposed modifications.

Tables
(1) Position/function of Gaps.
(2) Multi-locus Gaps.
(3) Formal elements of a PSG.
(4) Chomsky hierarchy grammars and their corresponding automata.
(5) An Earley algorithm analysis of the sentence Book that flight.
(6) Examples of the seven types of UDs used in the RCS Test.
(7) Parser accuracy on the UDs corpus according to the RCS Test.



Chapter 1: Introduction

1.1. Motivation:
From the beginning of 2000 until the end of 2003, I worked on natural language processing (NLP) solutions for two major companies in Egypt. My first-hand experience with actual large-scale parsers made me aware of some problems facing those parsers in the processing of certain grammatical constructions. I decided, back then, to tackle one of the most difficult problems facing those parsers: unbounded dependencies. Complex syntactic phenomena stand out as a challenge to computational implementation in NLP applications. The challenge resides in the problematic nature of these phenomena: they are syntactically rich in detail and, as a consequence of this complexity, they are interleaved with many other linguistic phenomena. In addition, they exhibit a perplexing tendency to be polymorphous and diverse. Unbounded dependencies (or, alternatively, long-distance dependencies, filler-gap constructions, wh-movement constructions, A-bar dependency constructions, extraction dependencies, etc.) are classic examples of how complex and theoretically as well as computationally challenging these syntactic phenomena can be. Terry Winograd (Winograd 1983) gives us an unequivocal statement about the significance of UDs to the then-current syntactic theory. He says:


The need to account for this phenomenon [UDs] is one of the major forces shaping grammar formalisms. It was one of the motivations for the original idea of transformations, and in some recent versions of TG, the only remaining transformations are those needed to handle it. The hold register in ATN grammars, the distant binding arrows of LFG, and the derived categories of PSG are other examples of special devices that have been added on top of simpler underlying mechanisms in order to handle it. (Winograd 1983: 478)

Since the 1970s, it has been generally assumed that a number of grammatical constructions show such uniform behavior and architecture that they should be considered en masse. Chomsky (1977) notes that the rule of wh-movement has, inter alia, the following general characteristics:
1. it leaves a gap.
2. where there is a bridge, there is an apparent violation of subjacency.
3. it observes wh-islands.
(Chomsky 1977: 86)

Grammatical phenomena that fall under the rubric of UDs cover the following constructions: topicalization, wh-questions, wh-relatives, it-clefts, tough movement, etc. The most important feature marking all these constructions is the existence of gaps, as Chomsky noted above. UDs represent a unique class of grammatical constructions that require specially devised mechanisms in order to be processed successfully, both syntactically and computationally. A basic example of a UD is given in the following sentence:
(a) Sam, I think he told me he tried to understand __.

The above sentence can be represented by the following, largely theory-neutral, tree diagram:


[Tree diagram: a largely theory-neutral constituent-structure analysis of sentence (a), with the fronted NP Sam at the top of the tree and a gap in the object position of understand.]

Sentence (a) above is a topicalized sentence in which the object is fronted to add emphasis to the intended message of the construction. The fronting of Sam, i.e. its displacement from the normal object position of English (an SVO language), leaves a trace in the position of the displaced object that tells us about the history, or the original constitution, of the structure before displacement. This trace is usually marked with a hyphen or a dash representing the displaced element. This account broadly subscribes to a movement-based hypothesis that is part of the derivational approach to UDs evidenced in the TG, GB, P&P and MP theories of syntax.1

The example above and the subsequent explanation should not be taken as a sign of the researcher's subscription to the Chomskyan model and its various manifestations and developments. On the contrary, the present work openly criticizes those approaches and points out many deficiencies in them, as will be seen in Chapter 2.
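For readers who want the relation written out explicitly, the filler-gap dependency in sentence (a) can also be recorded as a small, largely theory-neutral data structure. The following Python sketch is purely illustrative: the field names and the helper function are my own assumptions and belong to no particular formalism.

# A minimal, theory-neutral record of the filler-gap relation in sentence (a):
# "Sam, I think he told me he tried to understand __."
# The token "__" marks the gap site; "index" co-indexes the filler with its gap.

sentence_a = {
    "tokens": ["Sam", ",", "I", "think", "he", "told", "me",
               "he", "tried", "to", "understand", "__", "."],
    "filler": {"form": "Sam", "position": 0, "index": "i"},
    "gap":    {"position": 11, "index": "i", "role": "object of 'understand'"},
}

def gap_distance(record):
    """Number of token positions separating the filler from its gap."""
    return record["gap"]["position"] - record["filler"]["position"]

if __name__ == "__main__":
    print(gap_distance(sentence_a))   # 11: the dependency spans the whole sentence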


1.2. The Problem


UDs represent a unique instance in the history of contemporary syntactic theory (henceforth: ST) and NLP. In fact, they became the raison d'être of a handful of extremely influential syntactic formalisms and a number of novel computational theorems and techniques. Robust syntactic formalisms such as Generalized Phrase Structure Grammar (henceforth: GPSG), Lexical Functional Grammar(s) (henceforth: LFG), Head-driven Phrase Structure Grammar(s) (henceforth: HPSG), and modern Categorial Grammar(s) all owe, in one way or another, many of their formative concepts and notational devices to studies of UDs. Ivan Sag (Sag 1982) expresses this fact succinctly:

Few linguists would take seriously a theory of grammar which did not address the fundamental problems of English that were dealt with in the framework of standard transformational grammar by such rules as There-insertion, It-extraposition, Passive, Subject-Subject raising, and Subject-Object raising. (Sag 1982, p. 427)

UDs happen to be one of those constructions. This is not the whole picture, though. UDs form an integrated component in most syntactic theories that have attained a considerable degree of maturity. Their internal complexity and the sophistication needed to handle them formally and computationally have made them a benchmark against which the validity, expressive power, and exhaustiveness of treatment of any given syntactic theory are gauged. Nonetheless, only a few works have paid attention to handling UDs in a uniform manner; works dealing with UDs as a uniform whole and surveying their treatment in different syntactic formalisms that subscribe to different linguistic frameworks are quite meager.1

As regards the computational handling of UDs, there have been various attempts at unraveling their syntactic complexity through computers. The basic idea was to test the robustness of a particular grammar formalism or computational system (Oltmans 1999). The idea of robustness is of the essence here. A computational system is deemed robust if it exhibits graceful behavior in the presence of exceptional conditions. Robustness in NLP is concerned with the system's behavior when input falls outside its initial coverage. For instance, if the system is fed with rules describing and specifying the behavior and structure of relative clauses in English, it should not be negatively affected when it encounters relative clauses that those rules do not fully cover. But the question remains: why study UDs from a computational viewpoint? The answer seems to be unanimous in the computational literature. UDs have always been identified in computational linguistics works as a problem. Charniak (1993) mentions the following concerning UDs:
Another standard problem with CFGs is long distance dependencies ... This problem can be solved within a CFG, although it gets a bit complicated. (Charniak 1993: 8-9)

In Mellish et al. (1994) the situation is even more clear-cut:


The problem is more severe when we come to consider long distance dependencies, or more correctly unbounded dependencies in which two unrelated pieces of structure may be arbitrarily far apart and not in the same level in the tree. (Mellish et al. 1994: 129-130)
1 Only recently have Robert Levine and Thomas Hukari produced a uniform treatment of UDs in their full manifestations: R. Levine & T. Hukari (2006), The Unity of Unbounded Dependency Constructions, CSLI Publications, Stanford University. Unfortunately, I was unable to secure a copy of the book, but I have read a detailed academic review of it by Robert Borsley. However, the main thrust of the book is on the syntax-theoretic aspects of UDs within the framework of HPSG, without any reference to computational issues (see Borsley 2009).


Pereira (1981) finds that one of the most important benefits of connecting parsing with deduction is the "[h]andling of gaps and unbounded dependencies on the fly without adding special mechanisms." Such an excessive interest in and engagement with UDs gives us a clear, unhampered view of the status of UDs as a computational problem. There seems to be a common realization amongst computational linguists and syntacticians of the problematic nature of UDs, a fact that precipitated many of the current theoretical frameworks both in pure syntax and in computational linguistics. Statistically speaking, there is a common belief that UDs and similar phenomena do not represent a sizable portion of any general large-scale corpus, and hence that their treatment can safely be ignored. Surprisingly, however, around three quarters of the Wall Street Journal corpus (WSJC) in the Penn Treebank (PTB) involves non-local dependencies, which include UDs much of the time. The internal sophistication of UDs, their typological diversity, the existence of gaps, and their considerable corpus frequency reveal UDs not only as an engaging problem (syntactically and computationally) but as a compelling one as well.

1.3. Aims and Contributions:


The main goal of this thesis is to provide outlines for solutions to UDs as a computational problem. The overall aims of the thesis can be summarized in the following points:

- Placing UDs in their proper position as regards simple, non-theoretic grammatical analysis.
- Uncovering the role of UDs in the formation and evolution of many syntactic theories and formalisms.
- Identifying the key element(s) that could enable us to unravel the grammatical complexity of UDs.
- Proposing a syntax-theoretic solution for UDs in terms of a proposed gaps-ontology.
- Highlighting the complexity of processing UDs computationally, i.e. UDs as a parsing problem.
- Proposing two types of solutions regarding the automatic parsing of UDs: the first has to do with the overall parser design (some tweaking and modifications of the parser architecture), while the second offers two parsing techniques that may enable the parser to process UDs in a robust and efficient way.

However, before embarking on a discussion of the general outlines of my study, it is of paramount importance to examine a question of method which confronts the researcher at the outset. A linguist who has been trained in the dynamics and sophisticated details of the many linguistic theories currently available, while having hardly any formal training in computability theory or computer science, is unlikely to offer any detailed or profoundly technical treatment of a phenomenon such as UDs from a computational viewpoint. Besides, in order to prepare a serious, computationally viable study of UDs, a linguist needs an intricate set of computational tools that can only be secured and afforded by large commercial or research entities (IBM, Microsoft, Carnegie Mellon, etc.), not to mention the academic and technical expertise that cannot be obviated.


Thus, what a linguist can do on their own is to adumbrate certain guidelines that relate to the interface between theoretical and computational linguistics. Many a research project has been marred because its author was unable to resist the temptation of going computational: a temptation that normally leads to a chaotic morass of computational nuances that, with the wisdom of hindsight, prove quite hard to disentangle. This aptitude towards things computational can be ascribed to the current hype surrounding anything that has to do with computers, without the researcher having any proper knowledge, training or experience in the matter. I have attempted to get around this dilemma by focusing on the theoretical syntactic issues that relate directly to computational parsing, offering a broad, semi-technical approach to solutions. As such, none of the arguments or proposals in the computational section of this work should be judged as technical; they are just a number of theoretical postulations, conjectures and refutations on how, in my opinion and according to my knowledge of computer science, these problems can be solved.

1.4. Thesis Structure:


The thesis is broadly divided into three sections: the first focuses on the extensive theoretical backdrop of the phenomenon, providing an eclectic approach towards a uniform view of one of the lynchpin components of both the theoretical and the computational divisions of the work: gaps. The second offers a rough treatment of the computational and parsing problems involved, using the first section as a springboard. The third represents the researcher's contribution to the problem of UDs parsing, giving two sets of proposed modifications on the architectural and processing levels of the parser.

The first section can be seen as the syntax-theoretic part that deals with the definitions, typology and grammatical analysis of UDs (Chapters 1 and 2) and with how a number of syntactic theories and formalisms have dealt with them. In addition to the analytical exposition, this section is permeated with critiques of those theories and formalisms in their treatment of UDs, along with an attempt (a perfunctory one, though) at digging up their intellectual milieus and methodological underpinnings (Chapter 2). Section 2.6 proposes a gaps-ontology in which an eclectic, but hopefully harmonious, mélange of the theoretical components of gaps and gap handling is offered. This concludes the syntax-theoretic section of the thesis. The second section of the thesis focuses on parsing theory and its roots in the study of formal languages (Chapter 3). Sections 3.6-3.14 discuss the various parsing strategies and techniques available in the literature. Chapter 4 considers the complexity of UDs parsability as evidenced in a recent computational experiment. Sections 4.3-4.5 examine the architecture and design of mainstream parsers and how they are built. The final section of the thesis represents the contributions part of the work, where the proposed modifications mentioned earlier are found. Chapter 5 proposes architectural and design modifications of the universal parser by introducing the notion of modularity and by devising a Small-scale Latent Parser. Chapter 6 proposes the next set of modifications, which relate to the processing of the parser itself. This final section concludes with a brief account of the conclusions of the thesis.


1.5. UDs Defined


The syntactic phenomenon of unbounded dependencies has been, as alluded to above, a major springboard for many theoretical proposals and syntactic formalisms. Naturally, this multiplicity of origins generated concomitantly a multiplicity of definitions and designations in the literature. First, I will look at the different definitions of UDs and how these differences can be accounted for. Then I will survey the various designations found in the relevant syntactic literature. The concept of "unbounded dependencies" was first introduced by Gerald Gazdar (1981) to refer to a set of syntactic structures handled within transformational frameworks in terms of movement or, more specifically, wh-movement. The use of the adjective "unbounded" in such contexts, however, goes back to J. Bresnan (1976) during the heyday of transformational approaches to grammatical analysis. Originally, however, the idea of "unboundedness" is a mathematical concept used in algebraic and computational studies of unbounded operators, set theory, number theory and algorithmics (Gowers 2009). The mathematical undertones of the term will be discussed later in the following section. Crystal (2008) defines an unbounded dependency as [a] term used in some theories of grammar (such as GPSG) to refer to a construction in which a syntactic relationship holds between two constituents such that there is no restriction on the structural distance between them (e.g. a restriction which would require that both be constituents of the same clause); also called a long-distance clause. In English, cleft sentences, topicalization, wh-questions and relative clauses have been proposed as examples of constructions which involve this kind of dependency; for instance, a wh-constituent may occur at the beginning


of a main clause, while the construction with which it is connected may be one, two or more clauses away, as in What has John done?/What do they think John has done?/ What do they think we have said John has done?, etc. In GB theory, unbounded dependencies are analyzed in terms of movement. In GPSG, use is made of the feature SLASH. The term is increasingly used outside the generative context . (Crystal 2008: 501)

Crystal's definition deserves a moment of analytical contemplation. First, we need to establish the fact that Crystal (2008) is a relatively basic specialized dictionary targeted at professional as well as lay readers. This means that encountering detailed argumentative analyses of linguistic phenomena would be a rare occurrence in his work. He establishes his definition of UDs upon an abstract postulate that describes UDs as involving a syntactic relationship between two constituents "such that there is no restriction on the structural distance between them." The idea of having no restriction on the structural distance between the two constituents is a mathematically or logically oriented idea rather than a natural-language-based one. In other words, natural language cannot permit such infinitely continuous clausal concatenations. It has to have a bound (i.e. a sentence must end somewhere in a linguistic text). The idea of unboundedness is thus a potentiality rather than an actuality. Mathematically oriented thinking about language, however, has a natural proclivity towards abstraction and higher-order language. A more linguistically real term would be "long-distance dependencies", which was later adopted by most non-transformational syntactic theories and syntactic formalisms handling the phenomenon of UDs.

1 The final two sentences in Crystal's definition are interesting from an error-analysis viewpoint, however. First, he describes GB as handling UDs in terms of movement, which is essentially correct. He then adds that UDs are handled in GPSG through the feature SLASH. But the feature SLASH, as we will see later, is postulated in GPSG to account primarily for the existence of gaps in UDs; describing movement as the main technique for handling UDs in GB while mentioning only SLASH for GPSG wrongly implies that GB theory has no theorem for handling gaps, which is incorrect. Second, Crystal describes GPSG, HPSG, LFG and CGs as theories "outside the generative context." In fact all these theories are "generative" in essence; they are only non-transformational.

Trask (1993) defines UDs in a more poised manner. He notes how UDs present "a major headache for syntactic analysis," and that "all sorts of special machinery have been postulated to deal with them." He takes a more development-oriented approach to the handling of the phenomenon: for example, he mentions that classical TG made liberal use of the theoretically problematic unbounded movement rules, and that GB and GPSG both reanalyzed UDs in terms of chains of local dependencies; GB used traces, and GPSG came up with the feature SLASH. LFG, on the other hand, used arcs in its f-structures. I shall deal with all these formative concepts in more detail later in this work. Matthews (1997) defines the phenomenon of UDs as a "[r]elation between syntactic elements that is not subject to a restriction on the complexity of intervening structures." His definition is a restriction-based one, bearing in mind the formative concepts of island and cross-over constraints. Another definition, based on the psycho-syntactic realization of UDs, is found in Slack (1990). According to him, UDs represent a unique linguistic phenomenon; he writes:

One linguistic phenomenon which, more than any other, focuses on the problem of addressing structural configurations is that of unbounded dependency. Typically, in sentences like The boy who John gave the book to __ last week was Bill, the phrase The boy is taken as the filler for the missing argument, or gap, of the gave predicate, as indicated by the underline. At the level of constituent structure there are no constraints on the number of lexical items that can intervene between a filler and its corresponding gap. (Slack 1990: 268)

Slack (1990) dissects the phenomenon of UDs in a more profound manner. He states that UDs belong to a class of linguistic phenomena in which the structural address of an element is determined by information which is only accessible over some arbitrary distance in the structure. According to him, it is necessary to determine the address of the gap to which a filler belongs. The arbitrariness of the distance separating gaps from their fillers in the input string makes the specification of the set of potential predicate-argument relations that the filler can be involved in (and thus the identification of a direct address for the gap) quite an impossible task (ibid.). The foregoing definitions can be classified as non-partisan, i.e. they do not subscribe to any particular syntactic theory, framework or formalism. Also, being mostly dictionary entries, they are naturally confined by the constraints of brevity, simplicity and neutrality. Beyond encyclopedic definitions, I need to establish the fact that the study of UDs was originally formulated in more arcane journal articles and research monographs. For that matter, Gazdar et al. (1985) presents the first perspicuous and formally rigorous definition of UDs. I shall not dwell further on GPSG and its treatment of UDs here, for I have included a whole section dedicated to this classic and most influential treatment of UDs (see 2.2).

1.6. The Class of UDs:


Any rigorous treatment of the phenomenon of Unbounded Dependencies should rest on a uniform, holistic comprehension of its nature. By "holistic" I refer to the necessity of treating UDs in an undivided manner; i.e. studying relative clauses, wh-questions or topicalized constructions separately will not shed enough light on the nature and dynamics of the phenomenon. The study of UDs should be applied to the complete set of constructions recognized and classified as unbounded dependency constructions. These constructions are included within the following two subsets: strong UDs and weak UDs.

1.6.1. Strong UDs:


In what sense is the first subset of UDs "strong"? "Strength" here is rather a misnomer for compatibility or isomorphism. They are strong because they require the filler and the gap to be of the same syntactic category. According to Pollard & Sag (1994: 157-158), the first subset clearly represents strong UDs because there is an overt constituent in a non-argument position (sentences 1-5, group A), normally the wh-phrase, that is strongly associated with the gap indicated by "_". Strong UDs include the following structures:

GROUP (A)
Topicalization:
(1) This sort of problem_i, my mother_j is difficult to talk to _j about _i.1
Wh-questions:
(2) Which violin_i are these sonatas_j difficult for them to play _j on _i?
Wh-relative clauses:
(3) This is the book_i that the man_j we told the story to _j bought _i.
It-clefts:
(4) It is Kim who_i Sandy loves _i.
Pseudo-clefts:
(5) This is what_i Kim loves _i.

1 Underscores and small subscripts (i, j, etc.) in this and the following sentences represent gaps or empty elements (traces of nominal or pronominal antecedents); this is a notational convention found in the majority of syntactic analyses of UDs and similar grammatical constructions.

1.6.2. Weak UDs:


Weak UDs, on the other hand, have no overt filler in a non-argument position (sentences 1-4 group B); instead they have a constituent in an argument position that is "loosely" co-referential with the gap or the trace. Weak UDs include the following structures:

GROUP (B)
Tough movement:
(1) Sandy_i is hard to love _i.
Purpose infinitives:
(2) I bought it_i for Sandy to eat _i.
Non-wh relatives:
(3) This is the politician_i Sandy loves _i.
Non-wh clefts:
(4) It's Kim_i Sandy loves _i.

Two important points have to be mentioned here. First, UDs are indeed unbounded, which means that the dependency may, theoretically speaking, extend ad infinitum. Second, there is a syntactic category-matching condition between the filler and the gap, especially in strong UDs. The following examples illustrate these two points:
(1) a) Kim_i, Sandy trusts _i.
    b) [On Kim]_i, Sandy depends _i.
(2) a) Kim_i, Chris knows Sandy trusts _i.
    b) [On Kim]_i, Chris knows Sandy depends _i.


(3) a) Kim_i, Dana believes Chris knows Sandy trusts _i.
    b) [On Kim]_i, Dana believes Chris knows Sandy depends _i.

In (1) the gap is an argument of the main clause, in (2) it is an argument of an embedded complement clause, and in (3) it is an argument of a complement clause within a complement clause. Mathematically speaking, there is no bound on the depth of embedding. The following diagram represents the above in a clearer fashion.

Figure (1) The Class of Unbounded Dependency Constructions

Evidently, the class of UDs has a rich taxonomical structure that reflects its complexity. As noted above, studying each of the branches in the above tree diagram on its own will yield unsubstantial insights into UDs. As a first approximation, the thing that gathers all these different syntactic constructions under a uniform category is the existence of a "gap" somewhere in the construction. Paradoxical as it might seem, the existence of gaps, or missing elements, in the sentence is the common denominator that holds all the above branches under one node: UDs. That is why I allocate a special section to the handling of gaps in UDs later in this work (see Chapter 4). I have also found that an eclectic theory of gaps might be a step towards a better and more profound comprehension of the phenomenon of UDs and of the more general phenomenon of gapping. For the sake of brevity and clarity, the present work will focus mainly on strong UDs throughout the proposed analyses and critiques. Weak UDs will be mentioned sporadically throughout the work, though they will not receive a treatment in their own right. The partial exclusion of weak UDs will hardly affect the treatment of the overall phenomenon. Strong UDs have all the features that we need in order to analyze UDs. Weak UDs, on the other hand, are more of a subset of strong UDs: a fact that makes setting aside the handling of weak UDs a reasonable move in the footsteps of Ockham's razor.
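Because Figure (1) is a tree diagram, the same taxonomy can also be written out as a plain data structure. The Python sketch below simply restates the classification given in 1.6.1 and 1.6.2; the structure and names are illustrative only.

# The class of unbounded dependency constructions (after Figure 1),
# split into the strong and weak subsets discussed in 1.6.1 and 1.6.2.

UD_CONSTRUCTIONS = {
    "strong": [   # overt filler in a non-argument position, category-matched with the gap
        "topicalization",
        "wh-questions",
        "wh-relative clauses",
        "it-clefts",
        "pseudo-clefts",
    ],
    "weak": [     # no overt filler; an argument-position constituent is loosely coreferential with the gap
        "tough movement",
        "purpose infinitives",
        "non-wh relatives",
        "non-wh clefts",
    ],
}

if __name__ == "__main__":
    for subset, constructions in UD_CONSTRUCTIONS.items():
        print(subset, "->", ", ".join(constructions))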

1.7. Nomenclature
UDs have been variously termed in the literature. Y. Falk (2006: 316) recognized the following designations: extraction, long-distance dependencies, wh dependencies (or wh-movement), A' dependencies (or A' movement), syntactic binding, operator movement, and constituent control. The concept owes its multifarious terminological manifestations to different realizations of its nature and functions. Each linguistic school or syntactic formalism saw UDs according to its defining characteristics and theoretical grounding. Transformational theories (such as


Chomsky's GB, P&P and MP), for instance, have essentially a dynamic, movement-based conception of most linguistic constructions, a fact which clearly explains the use of such terms as "wh-movement, A' movement, syntactic binding," etc. On the contrary, non-transformational theories (such as GPSG, HPSG, CG) proceed from a static, monostratal1 conception of linguistic constructions, hence their use of such terms as "unbounded dependency constructions" and "long-distance dependencies."

Terminologically speaking, the term "extraction" is the only common ground where transformational and non-transformational theories meet (on the use of extraction in non-transformational contexts see Sag 1994).

This term refers to the idea that syntactic structures are essentially monostratal, i.e. they consist of only one level of representation, an apparent surface level. The Chomskyan postulate of a deep structure is irrevocably repudiated within this monostratal framework. Gazdar et al. (1985) was the first unequivocal statement of this theorem, on which the whole frameworks of GPSG, HPSG and DCG are based. For more details see Horrocks (1987), Gazdar et al. (1985), Sag et al. (1994), Sag et al. (2003), Brown ed. (2006).



Chapter 2: UDs and Syntactic Formalisms

2.1. Derivational Approaches to UDs

Most of the reviews of the literature I came across in academic theses and books dealing with UDs, from a historiographical viewpoint, seem to be a disparate collation of information that hardly precipitates profound understanding or evaluation of the intellectual context that spawned and fostered the growth of syntactic theory. This, I hope, is not the case here. My belief is that syntactic theory (and its handling of UDs) can hardly be understood or profoundly appreciated without a firm belief in the utility of coming to grips with the intellectual milieu that made such scholarly feats possible. Fortunately, the historiography of UDs in both the syntactic and the computational realms is variegated enough to help build a mosaic that is informative, insightful and sufficiently panoramic. I believe, thus, along with Tomalin (2006) that

[i]t could hardly be claimed that to consider the aims and goals of contemporary generative grammar, without first attempting to comprehend something of the intellectual context out of which the theory developed, is to labour in a penumbra of ineffectual superficiality. (Tomalin 2006: 20)

Another important factor that necessitates this line of research has to do with UDs themselves. The study of UDs has been a major formative force in the field, a fact that has made it a prerequisite (and a keepsake) for anyone embarking on a serious study of syntactic theory.


The inherent complexity of unbounded dependency constructions, the challenges they have posed to syntacticians of different stripes, and the various analytical strategies and tools proposed to handle them have endowed these constructions with a level of significance unprecedented in the field. That is why I have adopted a historical-cum-theoretical approach in studying them: as far as I can see, this is the approach that is the most felicitous and the most enlightening as well.

Historically, UDs have been studied according to two different approaches: the transformational and the non-transformational.1 Transformational approaches analyze UDs from a movement-based perspective. The original position of the filler of a UD is marked with an underscore (as in [a] below); the filler then changes its location through a series of movements until it reaches the leftmost position in the tree.

(a) 1. Which car does John think you should purchase _?
    2. That book you should read _.
    3. This is the car which John told me he thinks I should purchase _.
    4. Whom do you think Jim kissed _?

Sentence (4) (see Carnie 2006: 325) can be represented within a transformational (derivational) framework as follows2:

1 Transformational approaches have also been known as derivational approaches, because they depend on processes that derive, via transformations, the final output of a sentence from certain hypothesized deep structures to its final realization as a surface structure (see Bussmann 1996; Trask 1993; Radford 2003).
2 The version used here is a recent version of the transformational enterprise known as P&P (Principles and Parameters), which is the version preceding the final emendation stated in Chomsky's The Minimalist Program (1995).


Figure (2)

According to TG analyses, this is the original deep representation of the sentence, where the wh-word is situated at the bottom of the tree. This means that in order to move whom to its proper position, a number of movements have to be made. These movements can be illustrated in the following tree (see Carnie 2006: 326):


Figure (3)

Arcs in the above tree (Figure 3) represent derivational processes (formerly: transformations) that work on deriving the S-structure (surface structure) from the previous D-structure (Figure 2). According to Carnie (2006), the proposed movements above are made in two hops, moving first to the specifier of the embedded CP, then on to the higher CP to check that C's [+WH] feature. The final phase of these rather tortuous movements is the realization of the S-structure, which can be represented as follows (Carnie 2006: 327):


Figure (4)

The two arcs in the above figure represent the two hops Carnie just referred to. Now we can have the correct S-structure, where the wh-phrase is situated at its rightful initial position in the tree, as shown in Figure (5) (Carnie 2006: 328):


Figure (5)

Fodor (1978) pointed out that the effects of wh-movement are not strictly local. The S-structure position of a wh-phrase can be arbitrarily far from its D-structure position. The sentence Which city did Ian visit? can serve as an example:


Figure (6)

The analysis proceeds by creating the appropriate CP structure and attaching the phrase which city in the [SPEC, CP] position. It then accounts for did by attaching it to the C position, and handles the verb visit by identifying it as a verb that requires an NP. At this point the analysis has identified an antecedent (the wh-phrase) but no argument position for the required NP; here the wh-trace (t) comes in, attached to a post-verbal NP node (Gorrell 1995: 132-133). The fundamental line of argument evident in transformational analyses proceeds from a psychological springboard entrenched in hypothetical reasoning that hardly accounts for the computational handling we aspire to study.
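Gorrell's step-by-step description lends itself to a schematic rendering. The Python sketch below is only an illustration of the reasoning he describes, not his model: the attachment sites are reduced to labels, the lexicon to a single entry, and the decision procedure to "if the verb needs an NP object and none follows, posit a co-indexed trace".

# A schematic rendering of the incremental analysis described above for
# "Which city did Ian visit?": attach the wh-phrase to [SPEC, CP], the
# auxiliary to C, the subject to [SPEC, IP], and, when the transitive verb
# finds no overt object, posit a wh-trace co-indexed with the wh-phrase.
# All labels are illustrative simplifications.

TRANSITIVE_VERBS = {"visit"}   # toy lexicon: verbs that require an NP object

def analyse(tokens):
    steps = []
    steps.append(f"attach '{tokens[0]} {tokens[1]}' to [SPEC, CP]")   # which city
    steps.append(f"attach '{tokens[2]}' to C")                        # did
    steps.append(f"attach '{tokens[3]}' to [SPEC, IP]")               # Ian
    verb = tokens[4]
    if verb in TRANSITIVE_VERBS and len(tokens) == 5:
        # the verb needs an NP object but none is overt: posit a trace
        steps.append(f"'{verb}' requires an NP object; none is overt -> "
                     "posit trace t_i, co-indexed with the wh-phrase in [SPEC, CP]")
    return steps

if __name__ == "__main__":
    for step in analyse("which city did Ian visit".split()):
        print(step)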

2.2. UDs in Generalized Phrase Structure Grammar (GPSG)


The domineering nature of Noam Chomsky's transformational grammar generated a sense of dissatisfaction among leading younger linguists during the early 1980s. Gerald Gazdar was one of those linguists. At that time, linguists began to call what is now GPSG "Gazdar Grammar". Gerald Gazdar, however, did not like that, nor did his collaborators: Ewan Klein, Geoffrey Pullum and Ivan Sag. Their main focus was on the study of PSGs (Phrase Structure Grammars), but they did not have a specific name for what they were doing. After attending a talk by Emmon Bach called "Generalized Categorial Grammar", they decided to pinch his use of the word "generalized" to describe their own work, which became known as Generalized Phrase Structure Grammar.1


GPSG is just CF-PSG with some novel notation and rule-collapsing techniques. But what is CF-PSG? CF-PSG stands for Context-Free Phrase Structure Grammar. It is a formal grammar in which all the rules which directly license local sub-trees are context-free rules. A context-free rule is a rewrite rule which expands exactly one category into an ordered string of zero or more categories and for the application of which no environment is specified (see Trask 1993: 59-61; Bussmann 1996: 365-66). These rewrite rules are of the form A → X1 ... Xn, e.g. S → NP VP. This is GPSG from a formal perspective. However, GPSG is the result of a successful mélange of linguistics, computer science, mathematics and symbolic logic. Gazdar (1981) was the first attempt to criticize the core of TG: the concept of transformations. The very first line of this formative work reads as follows: "Consider eliminating the transformational component of a generative grammar." However, his most pronounced statement concerning the rejection of transformations comes two paragraphs later:

My strategy in this article will be to assume, rather than argue, that there are no transformations, and then to show that purely phrase structure (PS) treatments of coordination and unbounded dependencies can offer explanations for facts which are unexplained, or inadequately explained, within the transformational paradigm. (Gazdar 1981)

1 I became aware of this background through a personal communication with Gerald Gazdar himself on the 10th of April 2002. He was kind enough to explain to me the meaning of "Generalized", which really confused me at the time. My sincerest thanks are due to him.
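To make the notion of a CF-PSG concrete before turning to what GPSG adds to it, the following Python sketch encodes a toy context-free grammar and derives a sentence from it. The grammar fragment is invented purely for illustration; it is not a grammar used elsewhere in this thesis.

import random

# A toy CF-PSG: each rule rewrites exactly one category into an ordered string
# of categories or words, with no contextual conditions, e.g. S -> NP VP.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["Sandy"], ["Kim"], ["the", "book"]],
    "VP": [["V", "NP"]],
    "V":  [["sees"], ["reads"]],
}

def derive(category):
    """Rewrite a category until only words remain (one random derivation)."""
    if category not in RULES:            # a terminal: an actual word
        return [category]
    expansion = random.choice(RULES[category])
    words = []
    for symbol in expansion:
        words.extend(derive(symbol))
    return words

if __name__ == "__main__":
    print(" ".join(derive("S")))         # e.g. "Kim reads the book"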


It has to be noted that GPSG was something of a revolution against Chomskyan TG (Gazdar et al. 1985; Horrocks 1987; Borsley 1999; Falk 2006). And since the class of UDs was one of the constructions that TG adherents used as proof of the inadequacy of the class of Phrase Structure Grammars (PSGs) for describing natural language syntax, Gazdar and his collaborators decided to show that this assumption was basically mistaken (Falk 2006).1 Thus the earliest work in GPSG dealt with UDs in great detail. Gazdar's paper opened up new avenues of research in theoretical linguistics and formal computer science, producing four years later the seminal and foundational work by Gazdar, Klein, Pullum and Sag (1985).2 In Gazdar et al. (1985) we encounter the first formally perspicuous exposition of the nature of UDs. According to Gazdar et al. (1985: 137), an unbounded dependency construction is one in which
(i) a syntactic relation of some kind holds between the substructures in the construction, and
(ii) the structural distance between these two substructures is not restricted to some finite domain (e.g. by a requirement that both be substructures of the same simple clause).

1 GPSG was a frontal attack on transformational grammar. It not only attacked the lynchpins of the concept of transformations, but it also showed how unfounded other sacrosanct concepts, such as the Deep Structure vs. Surface Structure dichotomy, are. Another attack was directed against the permeating psychologism of TG and its claim to universality. As such, and against this backdrop, GPSG was founded on a monostratal model (a model that accepts no dualisms or hypothesized deep vs. surface dichotomies) with an intricate use of set-theoretic concepts, precisely to cleanse the syntactic model of any possible trace of psychologism. In spite of the rigorous nature of GKPS, the first chapter has the air of a revolutionary manifesto, and it is by far the authors' clearest statement of what GPSG is (see Gazdar et al. 1985: 1-16).
2 Sometimes abbreviated as GKPS, after the authors' initials.


(iii) topicalization, relative clauses, constituent questions, free relatives, clefts, and various other constructions in English have been taken to involve a dependency of this kind.

According to Gazdar et al. (1985: 137), it is analytically useful to think of such constructions, conceptualized in terms of tree geometry (in the usual way, root up and leaves down), as having three parts: the top, the middle and the bottom. The top is the substructure which introduces the dependency, the middle is the domain of the structure that the dependency spans, and the bottom is the substructure in which the dependency ends, or is eliminated. Gazdar et al. (1985: 138) illustrate their proposed tree geometry as follows:


Figure (7) Tree geometry of the structure of a UD in GPSG

Gazdar et al.'s (1985: 138) theory of UDs claims that the principles which govern the bottom and the middle are completely general in character, in that all types of UDs receive the same treatment. The idea is that the proposed analysis of UDs will be focused on the middle of the construction, which involves no more than the feature SLASH along with feature instantiation principles. Of these principles, the Foot Feature Principle (FFP) is the most important.


The central claim of the GPSG analysis of unbounded dependencies is that these dependencies are simply a global consequence of a linked series of mother-daughter feature correspondences. The main formative components of GPSG are a set of metarules that generate other rules, such as Immediate Dominance (ID) and Linear Precedence (LP) rules, along with feature instantiation principles, such as the FFP, the Head Feature Principle (HFP) and SLASH. The feature SLASH, however, is our mainstay in the analysis of UDs, because it represents and accounts for the behavior of the most significant element in an unbounded dependency construction: gaps. But what is a SLASH? When we write down, in quasi-algebraic notation, that we have, for instance, a category A/B, this means that the category A lacks, or is missing, the element B. The SLASH, or [/], is originally an algebraic symbol for a missing element. The value of the SLASH feature will be a category corresponding to a gap dominated by the categories bearing a SLASH specification. A gap is created by some Immediate Dominance (ID) rule which introduces a constituent that has a SLASH feature; the feature-matching principles of GPSG push it down the head path of the category on which it first appears, and a multiplicity of metarules allow it eventually to be cashed out as a gap at the bottom of a nonlocal tree structure (see Levine 1989: 124-5). The best way to come to grips with the effects of the FFP apropos of slash categories is to inspect an example of its application. Consider the following ID rules:


According to the above rules and according to feature instantiation principles, we can predict that the resulting structures will be the following:

Though the above notation seems a little difficult to follow, it is actually very straightforward. Rule (e) above, for instance, refers to a verb phrase (VP) missing (/) a noun phrase (NP), an object in this case, which conforms with ID rule number (45), a rule that deals with transitive verbs, such as approve of, that take a prepositional object as part of their subcategorization; the prepositional phrase itself lacks this object (PP/NP).
1 Numbers in square brackets refer to a list of rules provided as an appendix in Gazdar et al. (1985: 245-9).


Now we need to see an example illustrating all the formal nuances mentioned above. A topicalized sentence like (a) will suffice:

(a) Sandy we want to succeed.

The normal ordering of this sentence would read We want Sandy to succeed. However, a topicalized structure such as (a) can be represented within the framework of GPSG according to the following tree:

Figure (8)

The basic idea in the GPSG analysis of UDs is that the constituent containing the gap has a missing-element feature (Falk 2006). This is represented by the [+NULL] e above. The constituent headed by want is a VP/NP (a verb phrase with a missing NP object). The e (empty) element is a pronominal that refers back to Sandy. The whole clausal constituent containing this VP/NP is S/NP, since it is missing the same NP as the VP it dominates. As a result of the above feature sharing, the same element occupies the filler and gap positions at the same time, without any indication or sign of movement. This movement-less approach to UDs, along with a solid formal apparatus (ID and LP rules, metarules, FCRs, FSDs and FIPs), established GPSG as a suitable alternative to the much-disputed TG framework. However, GPSG was short-lived: its sophisticated formalism and nuanced quasi-algebraic treatment of complex phenomena such as UDs made it forbidding to the majority of linguists during the 1980s. But this was not the end of GPSG: it continued its existence, as we shall see in the next section, in a different guise, this time as the much more successful framework of HPSG (Head-driven Phrase Structure Grammar).
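Before leaving GPSG, the flow of the SLASH feature in Figure (8) can also be sketched computationally. The Python fragment below is a rough illustration in the spirit of, though by no means equivalent to, the GPSG treatment: a slashed category such as VP/NP is modelled as a label plus a "missing" value, the missing NP introduced at S/NP is threaded down the tree, and it is cashed out as the empty element e at the bottom. The tree and its labels are hand-built assumptions made only for this example.

# SLASH propagation in the topicalized sentence "Sandy we want to succeed"
# (cf. Figure 8). A slashed category such as S/NP is modelled as a node whose
# "missing" field records the category of the gap it dominates.

def node(label, missing=None, children=(), word=None):
    return {"label": label, "missing": missing, "children": list(children), "word": word}

# Hand-built structure loosely mirroring the tree in Figure (8).
tree = node("S", children=[
    node("NP", word="Sandy"),                          # the topicalized filler
    node("S", missing="NP", children=[                 # S/NP
        node("NP", word="we"),
        node("VP", missing="NP", children=[            # VP/NP
            node("V", word="want"),
            node("NP", missing="NP", word="e"),        # [+NULL] e: the gap itself
            node("VP", children=[node("V", word="to succeed")]),
        ]),
    ]),
])

def gaps(n):
    """Collect the elements standing in for NP gaps, following SLASH downwards."""
    found = []
    if n["missing"] == "NP" and n["word"] is not None:
        found.append(n["word"])
    for child in n["children"]:
        found.extend(gaps(child))
    return found

if __name__ == "__main__":
    print(gaps(tree))   # ['e']: one NP gap, bound by the filler Sandy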

2.3. UDs in Head-driven Phrase Structure Grammar (HPSG)

According to Sag et al. (1999: 435), HPSG was formulated in an intellectually eclectic environment at Stanford's Center for the Study of Language and Information (CSLI). During the 1980s, CSLI was incubating a number of theories, approaches and frameworks that aimed at formulating a kaleidoscopic view of language and its mechanisms. Sag and Pollard established their theory of HPSG on a variety of theories and formalisms: situation semantics, data type theory, TG, GPSG, CG and Unification Grammars. This eclectic formation endowed HPSG with an undeniable flexibility on both the theoretical and the formal level.


There are three known hallmarks in the history of HPSG: the publication of Pollard and Sag (1987), Pollard and Sag (1994), and Sag and Wasow (1999). These are hallmarks in the sense that they marked definitive changes in the views of the authors or in the formal apparatus of HPSG in general. Unlike GPSG, HPSG shifted its attention from rules to features. This is clearly manifested in its adoption of Unification Grammars' use of typed (or sorted) feature structures. A typed feature structure consists of features representing linguistic entities (words, phrases, sentences) and values that identify the dimensions of those features. For example, the feature PERSON in a given feature structure has three possible values: 1st, 2nd, and 3rd. Accordingly, the word you has the property second person, and this is represented by the feature-value pair [PERSON 2nd]. Pollard and Sag (1994: 8) suggest that the role of their proposed linguistic theory is to give a precise specification of which feature structures are to be considered admissible. Also according to their view, the types of linguistic entities that correspond to the admissible feature structures constitute the predictions of the theory. UDs have received considerable treatment within HPSG. This could be ascribed to two reasons: the first has to do with the status of UDs as a sophisticated syntactic phenomenon that many see as a testing-ground for any proposed syntactic theory or formalism (see Winograd 1983; Falk 2006). The second reason has to do with the importance of UDs within HPSG's contributory progenitor, GPSG (it has to be noted that Ivan Sag, one of the original expositors of GPSG, later became the central figure in HPSG through his 1987 and 1994 publications with Carl Pollard). However, HPSG took the analysis of UDs some steps further. In HPSG, UDs get more than the single wh-feature they used to get in GPSG.

In HPSG they get two distinct features: QUE and REL, for interrogative and relative constructions respectively (Pollard and Sag 1994: 159). This separation can be accounted for on the grounds that the only information that needs to be kept track of in an interrogative dependency is the nominal-object corresponding to the wh-phrase, while in a relative dependency the referential index of the relative pronoun is all that is required (see Pollard and Sag 1994). Another difference relates to the realization of feature structures in GPSG and HPSG. In GPSG, foot features take the same kind of value, which is normally a syntactic category, while in HPSG, nonlocal features take sets as values (again, the mathematical, and especially algebraic, influence on syntactic theory is manifest in this instance, where the use of sets is borrowed from Cantorian set theory). According to Pollard and Sag (1994: 159) this strategy enables HPSG to deal with more sophisticated UDs, such as multiple UDs as in the following sentences:

1- [A violin this well crafted]_1, even [the most difficult sonata]_2 will be easy to play _2 on _1.
2- This is a problem which_1 John_2 is difficult to talk to _2 about _1.

It is noteworthy that in HPSG, strong UDs are analyzed in terms of a filler-gap conception. This conception underscores the centrality of the concept of the gap in any treatment of UDs, and it is why I think that HPSG is ahead of most other syntactic theories in the analysis of UDs. This competitive edge will be more clearly accounted for later in this work (see ch.?).

Take the following sentence (P&S 1994: 160) as an example of how HPSG analyzes a topicalized clause of the strong UD type:

1- Kim_1, we know Sandy claims Dana hates _1.

Figure (9)

The analysis provided above looks similar, to a great extent, to Gazdar's bottom-middle-top model (see figure 6). In HPSG, the bottom of the arboreal skeleton is where the dependency is introduced, because at the bottom there exists the terminal node that triggers the whole unbounded dependency. This terminal node is associated with a special sign that must be nonempty. In interrogative dependencies, this sign is an interrogative pronoun (what, which, where, etc.) with a nonempty value for the QUE nonlocal feature, while in relative dependencies, the sign is a relative word (e.g. who, which) with a nonempty value for the REL nonlocal feature. What really distinguishes HPSG from previous theories or formalisms is its reliance on associativity: it attempts to associate linguistic objects with each other by a number of concepts and techniques. Central to these is the concept of the inheritance hierarchy, the embodiment of which can be seen in the above tree diagram (figure 9). Instead of the crude movement transformations in all versions of Transformational Grammar, we get here a more computationally sound technique where the traits of a certain linguistic object are inherited from one object to another. The SLASH category in the above tree, for example, is inherited from one stratum of analysis to the next by means of boxed numbers and the feature INHER. So the SLASH feature at the bottom of the dependency passes from daughter to mother up the tree, and the top is where the dependency is discharged or bound off (Pollard and Sag 1994: 160-161). As with GPSG, HPSG is more inclined towards computational implementation, because it originally availed itself of many computational models and procedures, and it has to be noted here that the concept of inheritance is a genuinely computational procedure that HPSG incorporated into its theoretical architectonic (the idea of inheritance is directly borrowed from computer science, especially from work on Genetic Algorithms, which resorts to biological jargon and concepts such as inheritance, evolution and survival of the fittest; see Dopico et al. 2009). HPSG uses a number of features to construct what it considers to be a complete description of a given linguistic entity. For the description of the syntax-semantics interface, for example, it employs a feature SYNSEM that represents the syntactic as well as the semantic content of a particular lexical item. This is realized via what HPSG theorists call feature structures. A feature structure is a representational tool whereby a linguistic sign is represented in HPSG. Feature structures consist of typed features such as word, phrase, clause, head, daughters, etc. Feature structures are concepts that get materially represented via Attribute-Value Matrices or AVMs. An AVM is like an identity card: it has all the basic information that should be available for a linguistic entity to be identified. For instance, a verb in the third person like sees is represented according to the following feature structure:

Figure (10)

According to the above sign, feature structure or AVM, the lexeme sees has the phonological value (PHON) sees, the agreement value (AGR) 3sg = third person singular, and the case value (CASE) acc = accusative. This is a rather simplified version of an HPSG feature structure, but it gives an idea of its main representational scheme.
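Since feature structures are essentially nested attribute-value pairs, they translate naturally into data structures. The sketch below is a rough, assumed Python rendering of the simplified AVM in figure (10), together with a naive unification routine of my own; real HPSG implementations use typed feature logics rather than plain dictionaries.

```python
# A rough Python rendering of the simplified AVM in figure (10); the attribute
# names follow the text above, and the nesting is purely illustrative.
sees_avm = {
    "PHON": "sees",
    "SYNSEM": {
        "AGR": "3sg",    # third person singular agreement
        "CASE": "acc",   # case value mentioned in the figure
    },
}

def unify(fs1, fs2):
    """Naive feature-structure unification: merge two AVMs and fail on
    conflicting atomic values."""
    result = dict(fs1)
    for attr, val in fs2.items():
        if attr not in result:
            result[attr] = val
        elif isinstance(result[attr], dict) and isinstance(val, dict):
            result[attr] = unify(result[attr], val)
        elif result[attr] != val:
            raise ValueError("unification failure on " + attr)
    return result

print(unify({"AGR": "3sg"}, {"CASE": "acc"}))   # {'AGR': '3sg', 'CASE': 'acc'}
```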

However, the most succinct definition of HPSG I came across can be found in Robert Levine's article in the Encyclopedia of the Cognitive Sciences (2003), where he says:

Head-driven phrase structure grammar is a monostratal theory of natural language grammar, based on richly specified lexical descriptions which combine according to a small set of abstract combinatory principles stated as formulae in a constraint logic regulating, for the most part, the satisfaction of valence and other properties of syntactic heads. These constraints, applying locally, determine the flow of information, encoded as feature specifications, through arbitrarily complex syntactic representations, and capture all syntactic dependencies both local and non-local in elegant and compact form requiring no derivational apparatus.

This theoretically rich definition deserves an equally rich analysis. The first fact about HPSG in this definition is that it is monostratal, which means that it does not subscribe to derivational or transformational theories of natural language grammar (see fn. 1 on p. 28 above). This, of course, reminds us of the early beginnings of GPSG (Gazdar 1982). The second important feature that really characterizes the theory of HPSG is its lexicalism: as Levine (2003) puts it, HPSG is based on richly specified lexical descriptions. This highlights HPSG's attention to the value of lexical items as bearers of information and as the glue that binds linguistic descriptions together. In fact, HPSG is head-driven because it relies on lexical heads, such as sees above, in its descriptions of linguistic entities. Finally, the definition gives us a hint concerning HPSG's recourse to mathematical and logico-mathematical jargon in its descriptions of local and nonlocal (UD) syntactic dependencies in an elegant and compact form. (Note here also the use of elegant and compact, which is a commonplace description in mathematical and logico-mathematical literature: a mathematical proof, for instance, has to be elegant and compact in the sense that it admits of no logical fallacies, internal inconsistencies or needless tortuous sub-proofs.) Implied here is the idea that a derivational apparatus, as in the GB, P&P and MP formalisms, is essentially inelegant and incompact.

HPSG, then, looks at UDs as filler-gap constructions (Pollard & Sag 1994), or as constructions with gaps (GAPs) that can be resolved by detecting the sites or positions of those gaps and relating them to their original positions via inheritance. This is realized by stipulating what HPSG calls the GAP Principle (Sag & Wasow 1999; Carnie 2003). The GAP Principle states the following: a well-formed phrase structure licensed by a headed rule other than the Head Filler Rule must satisfy the following SD:

Figure (11)

This means that the mother's GAP feature subsumes all the GAP values in its daughters. The symbol ⊕ in the diagram above simply refers to the arithmetical notion of adding up, but this time the entities added are not single linguistic objects but lists of linguistic objects (Sag & Wasow 1999: 351). The boxed n above is likewise the arithmetical indication of the idea of any number of. Gaps in HPSG will be explored more thoroughly, and comparatively, later, along with other syntactic frameworks.
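Reading GAP values as lists, the GAP Principle can be simulated in a few lines. The following Python sketch is my own illustration (the list representation and the bound_off parameter are assumptions): the append operation written as a circled plus in Sag & Wasow corresponds here to list concatenation, and the optional discharge models the effect of the Head Filler Rule at the top of the dependency.

```python
# A minimal sketch of the GAP Principle, assuming GAP values are plain Python
# lists of category names; not an actual HPSG implementation.

def mother_gap(daughter_gaps, bound_off=None):
    """Sum the daughters' GAP lists; optionally discharge one gap at the top
    of the dependency (the effect of the Head Filler Rule)."""
    total = []
    for gaps in daughter_gaps:
        total.extend(gaps)
    if bound_off in total:
        total.remove(bound_off)
    return total

# "Kim, we know Sandy claims Dana hates __":
# the lowest VP contributes ["NP"], its sisters contribute nothing.
print(mother_gap([["NP"], []]))                   # -> ['NP']  (passed up the tree)
print(mother_gap([["NP"], []], bound_off="NP"))   # -> []      (discharged by the filler)
```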

SDs stand for Structural Descriptions, which are the amalgamation of constraints from lexical entries, grammar rules, and relevant grammatical principles. See Sag & Wasow (1999: 68).


2.4. Categorial Grammar(s)

From a historical vantage point, Categorial Grammars (or CGs) antedate all generative theories of syntax. CG was first formulated against a strictly logical backdrop: it was Kasimir Ajdukiewicz, the famous Polish logician and algebraist of the Lvov-Warsaw school of logic and mathematics, who introduced the idea of functional syntax in his Die syntaktische Konnexität (1935). But Ajdukiewicz's treatment was strictly logico-mathematical, a fact which made his work quite forbidding for linguists (besides being an excruciating read even for the initiated in mathematical logic, Ajdukiewicz's paper appeared in a Polish philosophical journal and has therefore been unknown to most linguists; Y. Bar-Hillel 1953: 1). Two decades later, Yehoshua Bar-Hillel (1953), also a logician, came along with a revived interest in Ajdukiewicz's CG, this time combining it with many insights and methods from American linguistics of the 1950s. This new combination of the ideas and methods of mathematical logic and structural linguistics spawned a novel interest in CG in the USA and on the Continent. The interesting thing about Bar-Hillel's revival of CG is his belief in the suitability of CG for machine translation purposes. That explains why computational linguists tend to prefer CG, and other likeminded formalisms, over syntactic theories bereft of such computational aptitude. Being an offshoot of advanced logical and formal studies, CG's emphasis on the semantics of natural languages is naturally expected. Unlike other formalisms and theories of syntax, CG has no separate module for semantic processing, for it sees semantics as an inherently inextricable component of syntactic description. In other words, syntax and semantics in CG are one and the same thing: every rule of syntax is, inherently, a rule of semantics (Wood 1993: 3). CG has the following properties (Wood 1993: 3-5):

(1) It sees language in terms of functions and arguments rather than of constituent structure.
(2) Syntax and semantics are integral.
(3) It is monotonic (monostratal), i.e. it avoids destructive devices such as movement or deletion rules, which characterize transformational grammars.
(4) It takes to its logical extreme the move towards lexicalism, i.e. the syntactic behavior of any linguistic item is directly encoded in its lexical category specifications.

The other peculiar aspect that has to do with CG and UDs is the somewhat troubled relationship between the two. Ironically, Bar-Hillel lost faith in CG because he found that it was unable to process discontinuous constructions (such as UDs) (Wood 1993: 23, 104). But the theory of CG during the 1960s was not developed enough to handle such sophisticated syntactic constructions as UDs. Since that early period, the intractability of UDs has been recognized as a processing fact that any syntactic theory or formalism has to account for efficiently and rigorously. Classical CG did not offer any straightforward method for dealing with UDs (Wood 1993: 104). However, Ades and Steedman (1982) used the recursive power of generalized composition to reach what they called a derivational constituent, which can be utilized to apply backwards to the fronted object, giving the correct semantic interpretation (Wood 1993: 105). A sentence like Who(m) do you think he loves? can be represented according to Ades and Steedman (1982) in the following way:


Figure (12)

Recent advances in CG produced the more elaborate type-logical categorial grammar. What interests me most in this more advanced formalism is its proposal of a novel procedure for handling gaps in UDs. Bob Carpenter (1997) adopts Moortgat's approach to UDs to account for the existence of gaps and how they should be treated within a CG-based framework. As Carpenter (1997: 203) mentions, Moortgat's analysis rests on proposing an additional binary category constructor, ↑, which can be used to construct expressions of the form A↑B. Such an expression denotes a category A that is missing a B somewhere within it. For instance, s↑np is a sentence from which a noun phrase has been extracted. The extraction constructor A↑B is a generic form covering both A/B and A\B, and it may be instantiated as s↑np = s/np or s\np, which indicate a sentence lacking a noun phrase on its right or left frontier respectively. The use of the SLASH feature in CG is similar to that in GPSG and HPSG; the difference lies in the adoption of feature structures and AVMs in HPSG and the adoption of the Lambek calculus (a semi-algebraic linear formalism) in CG. An example of how advanced CG handles a UD will be of use here. The phrase Who Jo hits is formally represented in CG according to the following schema (see Carpenter 1997: 206):

Figure (13) A representation of who Jo hits?

The postulation of (s↑np) at the beginning of the relative or interrogative clause (under who) is the notational device that unravels the unboundedness of the structure by signalling that there is a missing noun phrase somewhere in the construction.
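The extraction constructor lends itself to a very small computational rendering. The toy Python sketch below is my own illustration, not Carpenter's or Moortgat's notation: a category "missing" an element is represented as a small dictionary, and a fronted filler of the right category cancels the missing element, which is all that the (s↑np) annotation under who is doing.

```python
# A toy rendering (my own, purely illustrative) of slash categories.

def make_cat(result, missing=None, direction=None):
    """direction '/' = missing on the right frontier, '\\' = on the left,
    None = the generic extraction constructor (missing anywhere)."""
    return {"result": result, "missing": missing, "dir": direction}

s_up_np = make_cat("s", missing="np")                      # s↑np: an s missing an np anywhere
s_slash_np = make_cat("s", missing="np", direction="/")    # s/np
s_bslash_np = make_cat("s", missing="np", direction="\\")  # s\np

def apply_filler(cat, filler):
    """Cancel the missing element against a fronted filler of the same category."""
    if cat["missing"] == filler:
        return cat["result"]
    raise ValueError("category mismatch")

print(apply_filler(s_up_np, "np"))   # -> 's': the gap in "who Jo hits" is resolved
```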

2.5. Lexical Functional Grammar (LFG)


This is the fourth syntactic theory through which I try to explain and unravel the nature of UDs. LFG is one of the most prominent theories of grammar belonging to the generative tradition. It is also one of the theories that subscribe to a non-transformational agenda. Being non-transformational boosted the theory's potential for a rigorous treatment of UDs. This is due to the fact that most non-transformational theories and formalisms are couched in mathematical or semi-mathematical terms, and this is the case with LFG. But in what sense is LFG different from the other theories mentioned above? It differs from GPSG and CG in that LFG is, in fact, a complete theory of language syntax, with a separate explanatory module for the study of language acquisition, universals and cognitive aspects. This is not the case with GPSG or CG, because both of them, and especially GPSG, pose ruthless critiques of the prevalent psychologism in GB and P&P, and both are more devoted to such applications as computational linguistics and AI. LFG is similar to HPSG in that the latter also sustains certain claims to universality and psychological reality. But all of them share a staunch rejection of transformational rules and assumptions. They also share an avid interest in lexicalism: the four of them (GPSG, HPSG, CG, LFG) see the lexicon as the springboard for any viable and true grammatical analysis. As opposed to GB and P&P, the non-transformational approaches mentioned above see lexical categories as the keys with which we can unravel syntactic riddles, especially the riddle of UDs. That also accounts for the high importance of UD analyses within the frameworks of all those theories. GPSG proposed the Head Feature Principle, which restores to lexical items their due powers instead of ascribing all powers to extra-linguistic features and movements, as is the case with transformational grammars (Falk 2001). HPSG, which is a more stringent framework than GPSG (Carnie 2008), bases the entire linguistic analysis on the head sign, which is an instantiation of a certain lexical item or word. CG is even more extreme on the issue of lexicalism; that is why it derives its analytical momentum from certain atomic lexical categories. LFG is also lexical, or lexicalist, because the lexicon plays a major role in it. In LFG (Dalrymple ELL2 2006) the lexicon is richly structured, with lexical relations rather than transformations or operations on phrase structure trees as a means of capturing linguistic generalizations. Yehuda Falk (2003) adds to the major tenets of LFG what he calls the Lexical Integrity Principle, which states the following: Words are the atoms out of which syntactic structure is built. Syntactic rules cannot create words or refer to the internal structures of words, and each terminal node (or leaf of the tree) is a word. (Falk 2003: 4) The other aspect of LFG has to do with its emphasis on functionalism. The functional part of LFG means that grammatical functions (or grammatical relations) such as subject and object are primitives of the theory, not defined in terms of structural configurations or semantic roles (Dalrymple 2006). (The latter is the standard view of transformational approaches: according to this view subject and object are not part of the syntax vocabulary, i.e. they are extra-configurational, and those grammatical functions or relations derive from the phrase structure they happen to occur in. If subjects, for example, can be controlled, this control is attributed to the structural lineaments of the position where the subject occurs. For a more in-depth discussion, see Falk 2003.) LFG grants grammatical functions such as subject and object a rather universal character whereby such abstract grammatical functions are at play in the structure of all languages no matter how dissimilar they might appear. The theory assumes that just as languages obey certain universal principles as regards abstract syntactic structures, they do the same regarding the principles of functional organization (Dalrymple 2001). This, then, is LFG as pertains to its nomenclature, i.e. the lexical and functional epithets.

C-structure and F-structure:

The two divisions of the formal architecture of LFG are constituent structure (c-structure) and functional structure (f-structure). The c-structure is concerned with the description of syntactic structure while the f-structure details the semantic-cum-functional structure of the linguistic entities concerned. The formal machinery of c-structure depends on X-bar syntax with the addition of a number of techniques and concepts that characterize the LFG theory and its formalism. C-structure can be illustrated according to the following figure (Falk 2003), analyzing the following clause: What Rachel thinks Ross put on the shelf

Figure (14)

According to this description the empty category (e) is tied to, or bound with, the antecedent filler by what LFG calls metavariables, represented by the up and down arrows. The use of the double-arrow notation has been abandoned in more recent versions of LFG, which instead incorporate functional components into the tree; this can be illustrated in the following sentence:

Figure (15) the c-structure of What Rachel thinks Ross put on the table?

The corresponding f-structure looks like the following

Figure (16) the f-structure of What Rachel thinks Ross put on the table?

The previous descriptions are classic representations of UDs, due to Kaplan and Bresnan (1982) and Kaplan and Zaenen (1989) respectively. More recent advances in LFG tend to be more detailed and hence more sophisticated; the following analysis from Asudeh (2009) is a case in point. The clause What did the strange, green entity seem to try to quickly hide? gets the following constituent and functional descriptions respectively:

Figure (17) C-structure of What did the strange, green entity seem to try to quickly hide?
(Asudeh 2009)


Figure (18) F-structure of What did the strange, green entity seem to try to quickly hide? (Asudeh 2009)

The interesting thing about this example, however, is that it not only shows how LFG handles the phenomenon of UDs but also covers a host of other syntactic phenomena, such as Adjunction, Raising and Control. To sum up, early LFG (Kaplan and Bresnan 1982) analyzed UDs in terms of a c-structure that explicitly drew the relation between a displaced constituent and its corresponding gap via the double-arrow notation. However, Kaplan and Zaenen (1989) showed that this treatment was deficient in accounting for functional constraints on UDs (Dalrymple 2001). This led them to incorporate f-structure components into their analysis of UDs, thus abandoning the double-arrow notation, as can be seen in figure (15) above.

2.6. Towards an Ontology of Gaps


The previous accounts pose a serious question as to the various treatments of UDs. But despite the various moot points among the many theories and formalisms briefly described in the previous sections, the one thing that all those theories tend to agree upon is that the key to unlocking the sophistication of unbounded constructions lies in providing a rigorous account of gaps (a.k.a. empty categories, null elements, missing elements, SLASH categories, traces). A correct and rigorous account of gaps will be the liaison between the purely theoretical treatment of UDs and computational implementation. This is due to the fact that dealing with gaps represents a crystallized problem, and all computational theorizing or implementation is based on problem-solving. Thus we first need to identify what might be called an ontology of gaps (ontology here roughly means a model of description detailing the multifaceted nature of gaps; in computer science terms, the ontology I am proposing is a type of domain ontology in which I seek to describe the vocabulary, i.e. the main elements, of that specific domain; for more details see Cimiano et al. 2010; Noy and McGuinness 2001). What I mean by ontology here is not very dissimilar from the ontology used in mainstream computer science. In computer science, there is no agreed-upon definition of ontology (Cimiano et al. 2010: 579). However, the definition that I find most pertinent to what I mean is Noy and McGuinness's:

Ontology is a formal explicit description of concepts in a domain of discourse (Noy and McGuinness 2001). According to this definition, an ontology of gaps will be a formal and explicit description of the concept of gaps and its different instances in the domain of syntax or grammatical description. In order to guarantee better visibility for an explicit ontology of gaps, I will largely depend in the following section on Rodney Huddleston and Geoffrey Pullum's magisterial The Cambridge Grammar of the English Language (henceforth: CGEL), published in 2002 and, at around 1850 pages, now the foremost reference work on the grammar of the English language, having by far superseded and eclipsed Quirk et al.'s Comprehensive Grammar (1985). My justification for this choice rests on the fact that CGEL exhibits a poised disposition towards most theories and formalisms in the literature: despite the many and sometimes necessary temptations to prefer or endorse one syntactic theory or formalism over another, it remains to a great extent theory-independent throughout, and it is based on decades of specialized research that has benefited from theoretical insights and the most recent findings in the study of the grammar of English.

2.6.1. Gaps between Objects and Subjects:


In most instances gaps occur in object positions, as in (i):

i. What_i [did you buy _i]?

The gap in (i) is an object of the verb buy. But gaps can also occur in subject positions as in (ii):



ii. Who_i [do you think [_i was responsible]]?

The gap in (ii) functions as subject of the clause [who] was responsible. So, according to this, gaps in an unbounded dependency construction can function only as either:

(a) a post-head dependent; or
(b) subject in clause structure.

According to (a) above, gaps in UDs can function as objects (post-head dependents). The statement is straightforward as regards direct objects, but what about indirect objects, i.e. can gaps in UDs function as indirect objects? CGEL holds that gaps in indirect object position are normally not accepted. However, the qualification normally here leaves some space for argument; it is simply an issue of acceptability judgments. Gaps do not only occur in object positions; they occur in subject positions as well. Within this framework we must distinguish between immediate subjects and embedded subjects, because in the case of embedded subjects gaps are permitted only in bare content clauses, i.e. clauses without the that subordinator, as in the following sentence:

i. He is the man_i [they think [_i attacked her]].

In (i) above the embedded clause is a bare declarative, i.e. it has no that as a subordinator. With immediate subjects, on the other hand, we can have gaps in normal positions with straightforward referencing, as in the following sentence:

ii. This is the copy_i [that [_i is defective]].

In (ii) above, that functions as a subordinator in prenuclear position, and the subject position in the nucleus is sustained by a gap that is linked to the antecedent copy. This construction is of the type where the antecedent of the gap occurs outside the syntactic domain of the relative clause. We can see this clearly in the sentence The others I know are genuine, where the fronted element the others occurs outside the nucleus clause I know _ are genuine; if we fill the position of the gap, this corresponds to the canonical clause I know the others are genuine (figure 19 below). This is a typical case of topicalization.

Figure (19) A subject-predicate analysis of the topicalized sentence The others I know are genuine (CGEL).

2.6.2. The Distribution of Gaps:


Part of specifying an ontology of gaps in UDs is knowing exactly their expected whereabouts in a specific syntactic domain. Despite the fact that unbounded dependency constructions place no upper bound on how deeply embedded the gap may be in a given tree, there are constraints on where these gaps may occur. The following table, adapted from The Cambridge Grammar (CGEL pp. 1088-1096), represents a tally of the permissible positions/functions of such gaps. This table will be of immense importance in the formation of a gaps-ontology in the next section.

Position/Function and Example(s): Mono-locus Gaps

1. VP in predicate function
Most of the criticisms_i he [accepted _i with good grace]. I don't know [where_i he [found it _i]]. It was to her cousin_i [that she [sold the business _i]].

2. AdjPs in predicative complement function
Whether it is ethical_i I'm not [so certain _i]. That's the only crime [of which_i they could find him [guilty _i]].

3. Post-head complement
It was here_i [she said [she found the knife _i]]. He's the only one_i [that I'm [sure she told _i]]. Here is a book_i [I think [_i might help us]].

4. Closed interrogatives in complement function
There are several books_i here [that I'm not sure [if you've read _i]]. The actor had to be careful with the amount of venom poured into a character [who_i in the end we don't know [whether to hate or pity _i]].

5. Open interrogatives in complement function
These are the only dishes_i [that they taught me [how_j to cook _i _j]]. The man in the dock was a hardened criminal_i [that the judge later admitted he didn't know [why_j he had ever released _i _j in the first place]].

6. Non-finite clauses in post-head complement function
It's you_i [I want [to marry _i]]. What_i did you [tell the police _i]? They are the ones [to whom_i he had the weapons [sent _i]].

7. PP
Some of us_i he wouldn't even speak [to _i]. This is the knife_i [you should cut the tomatoes [with _i]].

8. NP
To which safe_i is this [the key _i]? He knows little about any of the companies [in which_i he owns [shares _i]]. What kinds of birds_i have you been collecting [pictures of _i]?

9. Modifiers
That's the car_i [I'm saving up [to buy _i]]. Which month_i are you taking your holidays [in _i] this year?

10. Subjects
They have eight children [of whom_i [five _i] are still living at home].

11. Coordinates
Who was the guy_i [that [Jill divorced _i] [and Sue subsequently married _i]]?

Table (1) Position/Function of Gaps

Position/Function and Example(s): Multi-locus Gaps

Nested dependencies
Which of the two instruments_i will this piece_j be easier [to play _j on _i]?

Parasitic gaps
They do an annual report_i [that I always throw _i away without reading _i].

Table (2) Multi-locus Gaps

Mono-locus gaps represent a pattern in which a gap occurs in a single position inside a particular unbounded dependency, as in the examples under row 1 of table (1) above, while multi-locus gaps are instantiated in nested dependencies and parasitic gaps. In a nested dependency, as in the first example in table (2) above, the infinitival phrase to play on is called a hollow clause, while the two indexed antecedents mark the antecedents of the two gaps: the first functioning as object of the verb play and the second as object of the preposition on. In constructions containing parasitic gaps, the gaps are called parasitic because the antecedent of the gap in the construction happens also to be a gap itself, i.e. there is a gap playing the role of a parasite on the other gap; or, put differently, there is a gap that is dependent on the existence of another gap, referred to as the real gap (Engdahl 1983; Sag 1983). (Elisabet Engdahl coined the term parasitic gaps in her 1983 paper of the same title in Linguistics and Philosophy; in the same issue, Ivan Sag offered corroboration of Engdahl's insights, but within a GPSG framework, offering solid proof of the validity and feasibility of the then nascent GPSG approach.) CGEL (p. 1095), however, provides a caveat against confusing parasitic gaps with across-the-board gaps. The latter are simply instances of gaps in coordinated constructions, as in:

i. It was a proposal_i [that [Kim supported _i] [but everyone else opposed _i]].
ii. Who_i did you say [[John liked _i] and [Mary hated _i]]?

Engdahl (1983) also calls gaps of this type coordinate gaps. The idea of coordinate gaps lies in the fact that the second gap cannot be substituted with a personal pronoun (e.g. it), so there is nothing parasitic about this gap: it functions the way it does because of the necessary rules of coordination. As is well known, in coordinate structures processes like relativization must apply across the board: if relativization applies within one coordinate it must apply within all (CGEL, p. 1096).

2.6.3. The Ontology:


Understanding the phenomenon of gaps in unbounded dependencies is like viewing a large mosaic: one needs to take some steps backwards in order to be able to see and understand the larger image and how the assemblage of small pieces of stone or ceramic ultimately proves meaningful. This is what I am going to do in this section. So far, gaps in UDs have been showcased from many angles and positions. For one thing, we have the many types of UDs, e.g. wh-relatives, wh-interrogatives, cleft sentences, topicalized sentences, etc. We also have two classes of UDs: strong and weak. There are also the many positions/functions of gaps in UDs, and finally a multiplicity of syntactic theories and formalisms (P&P, GPSG, HPSG, CG, LFG), all approaching the problem of analyzing UD gaps in one way or another. In other words, there are many givens that could shed light on the overall picture of gaps in UDs. But these givens need to be rearranged.


We will assume that gaps are the focus of the research here and that they are the hub to which everything else that has to do with UDs is connected. This can be illustrated in the following figure:

Figure (20) GAPS as the hub connected to Positions/functions (table 1), UDs types (UDtp) and UDs class (UDcl)

In the above chart, positions/functions (abbreviated as P/F) play the role of the inventory (the data in table 1) where all the possible grammatical positions and functions of UD gaps are stored. The UDs types (abbreviated as UDtp) are the wh-relatives, clefts, topicalizations, etc. The UDs class (UDcl) tells us whether a certain UD belongs to the strong or the weak class of UDs. We could have added a box for syntactic theories and formalisms in the chart above, but I shall defer the reasons for such an exclusion till the very end of the work. Now we can formulate the above chart in a more familiar form: the attribute-value matrix (AVM).


Figure (21)

This AVM, or list, will be the cornerstone on which the remainder of the work rests. Simple as it may seem, the above AVM actually rounds up all the information we have gathered and studied on UDs. The simplicity of the AVM resuscitates the question of formal rigour and brings it back to the fore. I believe that the above GAPS AVM constitutes an ontology of gaps in UDs that is formally rigorous, because it possesses two of the major characteristics expected of any formally rigorous system: it is complete (in the sense that it possesses all the grammatical information available) and consistent (in the sense that none of the elements in the list contradicts another). Completeness and consistency are the highest possible ideals in any rigorous system, even in the axiomatic systems of metamathematics. As mentioned above, the GAPS AVM, along with the data in table (1), will constitute the bridge connecting the theoretical linguistics sections of the work with the following sections on computational parsing and implementation.

Kurt Gödel (1906-1978), in his On Formally Undecidable Propositions of Principia Mathematica and Related Systems (1931), proved that sufficiently powerful formal systems can be either complete or consistent, but not both. This was termed Gödel's Incompleteness Theorem. And as Gödel's proof concerns large formal systems, small-scale systems such as the one formalizing gaps in UDs above can indeed be complete and consistent at the same time, because compared to the formalized systems of metamathematics, the GAPS AVM above is admittedly infinitesimal.
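As a first step towards that computational bridge, the GAPS AVM can be given a rough machine-readable form. The following Python sketch is purely illustrative: the three attributes follow figure (20) and the AVM in figure (21), while the dataclass layout and the two sample entries are my own assumptions, not an exhaustive inventory.

```python
from dataclasses import dataclass

@dataclass
class GapEntry:
    position_function: str   # P/F  : one of the positions/functions in table (1)
    ud_type: str             # UDtp : wh-relative, wh-interrogative, cleft, topicalization, ...
    ud_class: str            # UDcl : "strong" or "weak"

# Two illustrative entries (hypothetical examples only):
gaps_ontology = [
    GapEntry("post-head dependent (object of buy)", "wh-interrogative", "strong"),
    GapEntry("subject in clause structure", "wh-relative", "strong"),
]

for entry in gaps_ontology:
    print(entry)
```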


Chapter 3 Parsing and Formal Languages

Computational or automatic parsing is the process whereby input sentences are analyzed within a computational environment. However, before getting into the details of parsing theory, parsing strategies and specific techniques for handling UDs as a parsing problem, we must first shed light on the foundations on which parsing theory and, in fact, all of the theory of computation rest, i.e. the theory of formal languages. What makes formal languages candidates for such a foundational position is the fact that they represent the bridge that links natural language processing with computational implementation. Formal languages were first rigorously realized by the German philosopher and logician Gottlob Frege (1848-1925) in his Begriffsschrift (1879), in which he tried to formulate a formal system for the description of linguistic, logical and mathematical concepts (Grattan-Guinness 2001). This early confluence of the linguistic, the logical and the mathematical attested to the importance of formal languages as a successful descriptive and explanatory device; the works of Kurt Gödel (1906-1978), Alan Turing (1912-1954) and Alonzo Church (1903-1995) are clear examples of that. In this chapter I will try to sketch the fundamentals of the theory of formal languages in order to build up to a solid exposition of parsing theory, parsing strategies and architecture. Following are a number of definitions that will serve as a legend without which we will not be able to read the map of parsing theory.


3.1. The Concept of a Formal Language:


A formal language is like any natural language in that it consists of a vocabulary, a grammar (syntax) and rules that govern the interaction of the two. The main characteristic difference between a formal language and a natural one is that the former is non-ambiguous, i.e. it does not allow semantic ambivalence, homonymy or metaphorical insinuations. A formal language is a quasi-mathematical language that values rigor and consistency above all else. That explains the symbolic, almost algebraic, nature of all formal languages. A formal language consists of finite strings of elements of some basic vocabulary. An alphabet or vocabulary V is a finite nonempty set, the elements of which are letters. A word over an alphabet V is a finite string consisting of zero or more letters of V, whereby the same letter can occur several times (exactly as in natural languages). The string consisting of zero letters is called the empty word, written as λ. Thus λ, 0, 1, 001, 1111 are words over the alphabet V = {0, 1}. The set of all words over an alphabet V is denoted by V*. An arbitrary set of words from V* is called a language and is usually denoted by L. This notion of language is fairly general but not very practical: it includes all written languages, be they natural or artificial, but it does not tell us how to define a particular language (Révész 1985). Because of the almost infinite number of natural and artificial languages, we need to be able to describe and define a particular language in a rigorous manner. For instance, if V = {a, b} then

L1 = {a, b, λ}
L2 = {a^i b^i | i = 0, 1, ...}
L3 = {PP^-1 | P ∈ V*}
L4 = {a^(n^2) | n = 1, 2, ...}

The above languages are all well-defined languages. But we need other, more specific tools to define more realistic languages. Here the need for a generative grammar comes to the fore. But what is a generative grammar from a formal language perspective?
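These definitions can be made tangible with a few lines of code. The following Python sketch is my own illustration: it enumerates words over the alphabet V = {0, 1} up to a fixed length (a finite slice of V*), and gives a membership test for the language L2 = {a^i b^i | i = 0, 1, ...} over {a, b}.

```python
# Illustrative only: a finite slice of V* and a membership test for L2.
from itertools import product

V = ["0", "1"]

def words_up_to(n):
    """All words over V of length <= n, including the empty word ''."""
    result = [""]
    for length in range(1, n + 1):
        result += ["".join(p) for p in product(V, repeat=length)]
    return result

def in_L2(w):
    """Membership test for L2 = { a^i b^i | i >= 0 } over the alphabet {a, b}."""
    i = len(w) // 2
    return w == "a" * i + "b" * i

print(words_up_to(2))              # ['', '0', '1', '00', '01', '10', '11']
print(in_L2("aabb"), in_L2("aab")) # True False
```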

3.2. Defining a Generative Grammar:


A generative grammar (or simply a grammar) G is an ordered four-tuple (VN, VT, S, F) where VN and VT are finite alphabets with VN ∩ VT = ∅, S is a distinguished symbol of VN, and F is a finite set of ordered pairs (P, Q) such that P and Q are in (VN ∪ VT)* and P contains at least one symbol from VN. The symbols of VN are called non-terminal symbols or variables and are normally written as capital letters. The symbols of VT are called terminal symbols and will be written as small letters (Révész 1985). According to the above definition of a generative grammar, the sets of non-terminal and terminal symbols are disjoint in every generative grammar. The non-terminal symbol S is called the initial symbol, and is typically used to start the derivation of the sentence (Révész 1985). An alternative definition defines a grammar G as, again, a four-tuple (V, T, P, S). (There are a few alternative notations, such as (N, Σ, S, P) and (N, T, S, P), but all of these refer to the same components: N refers to the set of non-terminals (NP, VP, PP, etc.), Σ or T to the set of terminal symbols (i.e. words or lexical items), S to the distinguished or unique starting symbol (the sentence), and P to the set of rules that govern the behavior of all the previous sets, i.e. phrase structure rules of the type S → NP VP.) The V represents the variables, the T the terminals, the P the production rules and the S the unique start state or the sentence. The language that is described by a grammar G is denoted L(G) (S. Walter 1990).
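The four-tuple definition can likewise be transcribed directly into code. The sketch below is an assumed toy fragment of my own (the particular non-terminals, terminals and productions are illustrative, not taken from the sources cited here); it derives one sentence of L(G) by repeatedly expanding the left-most non-terminal.

```python
# A toy grammar G = (N, T, P, S), transcribed as plain Python containers.
N = {"S", "VP", "NP", "V", "Det", "Nom"}
T = {"book", "that", "flight"}
P = {
    "S":   [["VP"]],
    "VP":  [["V", "NP"]],
    "NP":  [["Det", "Nom"]],
    "V":   [["book"]],
    "Det": [["that"]],
    "Nom": [["flight"]],
}
S = "S"

def derive(symbols):
    """Expand the left-most non-terminal (first production only) until only
    terminals remain, so this shows a single derivation of one sentence."""
    for i, sym in enumerate(symbols):
        if sym in N:
            return derive(symbols[:i] + P[sym][0] + symbols[i + 1:])
    return symbols

print(derive([S]))   # -> ['book', 'that', 'flight'], a string of L(G)
```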

3.3. Formal Grammars and their Relations to Formal Languages


In the 1950s Chomsky developed a novel classification of language types in which he defined four types of grammars: type 0, type 1, type 2 and type 3. His model was an attempt at mathematizing the concept of grammar in relation to the study of natural language. Shortly afterwards, Chomsky's mathematized concept of grammar took hold within the field of computer science and the study of formal languages. Computer scientists and programmers became aware of the importance of that concept when the syntax of the programming language ALGOL was defined by a context-free grammar (Hopcroft and Ullman 1969). However, before discussing the Chomsky Hierarchy we must first illustrate the relation between a grammar and a language in this formal context.

(The difference between mathematized and mathematical should be stressed here. Chomsky's attempt during the 1950s was an attempt to mathematize the study of natural languages: he wanted to divest the linguistics of his time of its behavioral and structuralist garb by imbuing the study of language with a formidable type of rigor that can only be found in mathematics and mathematical logic(s). He did not use mathematics in the strictest sense of the word, only the feel of mathematics. That is why the use of mathematized above is more accurate than the misleading mathematical. Leonard Bloomfield was actually the first of the structuralists to draw the attention of linguists, during the 1920s, to the importance of using mathematics (Cantor's set theory and Euclid's postulational method) in the study of language (Bloomfield 1921, 1926, 1938). For a learned and detailed study of the impact of the formal sciences on linguistics during the 20th century, see Tomalin 2006.)

The language generated by a grammar is the set of all terminal strings that can be derived using the productions of that grammar, each derivation beginning with the start symbol of that grammar (Parkes 2008). A language, then, is the product of specific grammar rules that interact among themselves, producing an abstract structure of a language. Let us assume that G represents one version of a Phrase Structure Grammar (henceforth: PSG) and that G is specified as (N, T, P, S). The following table (Parkes 2008) illustrates the definition of a PSG well.

Table (3) Formal elements of a PSG.

The above table clearly states that a PSG cannot accept an empty category e on left-hand sides, i.e. empty or null strings cannot appear alone on the left-hand side of any of the productions of a PSG. However, such empty strings or terminals are allowed on right-hand sides. Given a rigorous definition of what a PSG is and what it consists of, we can now approach the Chomsky hierarchy on more solid grounds.

3.4. The Chomsky Hierarchy


Noam Chomsky, one of the founders of formal language theory, proposed (in Chomsky 1956) a formal classification of formal languages that came to be known in the literature as the Chomsky Hierarchy. The hierarchy defines classes of grammars or languages of strictly increasing power. (Power here refers to a grammar's generative power or generative capacity: the grammar's ability to define the strings of a particular language. The most powerful grammars are those at the top of the hierarchy, while the least powerful are near the bottom. In other words, if type 0 grammars are at the top of the hierarchy, this class of grammars is the most powerful in that it can generate, or better still define, a greater number of languages than grammars of type 1, 2 or 3.) Chomsky defined four grammar types: type 0, type 1, type 2 and type 3 (Mellish and Ritchie 2000; Linz 2001; Martin-Vide 2003; Révész 1985; Salomaa 1973). It has to be noted that knowledge of these types and the ability to distinguish between them are of paramount importance to the computer scientist and the computational linguist alike.

Type 0 grammars (also known as unrestricted or general rewrite grammars and recursively enumerable grammars): This class of grammars utilizes unrestricted rules. This means that both sides of the rewrite rules can have any number of terminals or non-terminals. Type 0 grammars are equivalent in their expressive power to Turing machines.

Type 1 grammars: These grammars are restricted in that the right-hand side of the rewrite rules must contain at least as many symbols as the left-hand side. They are context-sensitive because in a type 1 grammar a rule such as ASB → AXB means that an S can be rewritten as an X in the context of a preceding A and a following B.

Type 2 grammars: In this type of grammar the left-hand side consists of a single non-terminal symbol. This means that the left-hand non-terminal can be rewritten as the right-hand side in any context, hence its context-freeness.

Type 3 grammars: This class of grammars is the most restricted of all grammar classes. Each rule has a single non-terminal on the left-hand side and a terminal symbol optionally followed by a non-terminal on the right-hand side. These grammars are equivalent in their expressive power to finite-state machines.

All the above grammars share a set of prerequisites (Mellish and Ritchie 2000):

- There is a set of non-terminal symbols.
- There is a set of terminal symbols.
- There is a distinguished non-terminal symbol.
- There is a set of productions.
- There is a definition of how productions may be used to rewrite sequences of symbols as other sequences of symbols.
- A well-formed sentence is any sequence of terminal symbols which can be derived from the distinguished symbol by successive applications of rules.

Most natural languages and programming languages are members of the class of type 2 grammars, or context-free grammars. And in a range of important cases, the complexity of a certain class of grammar corresponds to the complexity of the machines (or automata) needed to recognize them. The following table illustrates this fact well (E. Stabler 1999); the data and format of the table are adapted from (Mellish and Ritchie 2000; Linz 2001; Révész 1985).

Grammar Type                      Corresponding Automata
Type 0                            Turing machines [TM]
Type 1                            Linear-bounded automata [LBA]
Type 2                            Push-down automata [PDA]
Type 3 (e.g. A → xB, A → x)       Finite-state automata [FSA]

Table (4) Chomsky hierarchy grammars and their corresponding automata

The following Venn diagram illustrates the containment relationship between the four grammar classes. As mentioned above, type 0 grammars are unrestricted while type 3 grammars are the most restricted.
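The correspondence between grammar classes and automata can be illustrated with a small contrast of my own devising: the regular language a*b* (type 3) needs only a finite-state automaton with no memory, whereas the context-free language {a^n b^n} (type 2) needs the counting ability that a push-down automaton's stack provides. Both recognizers below are illustrative sketches, not general constructions.

```python
def fsa_accepts(word):
    """A two-state FSA for the regular language a*b* (type 3): no memory."""
    transitions = {("A", "a"): "A", ("A", "b"): "B", ("B", "b"): "B"}
    state = "A"
    for ch in word:
        if (state, ch) not in transitions:
            return False
        state = transitions[(state, ch)]
    return True

def pda_accepts(word):
    """A counter standing in for a PDA stack: accepts {a^n b^n} (type 2)."""
    stack = []
    seen_b = False
    for ch in word:
        if ch == "a" and not seen_b:
            stack.append("a")     # push for every a
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()           # pop for every b
        else:
            return False
    return not stack

print(fsa_accepts("aabbb"), pda_accepts("aabbb"))   # True False
print(pda_accepts("aabb"))                          # True
```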

Building on the above set of information concerning formal grammars and languages, type 2 grammars, represented by context-free grammars, are the most relevant to both programming and natural languages. According to Révész (1985: 17), CFGs are the most interesting ones [grammars] in view of the applications. The complexity of a grammar type corresponds to the complexity of the abstract machine that is needed to recognize the corresponding grammars and languages. Those abstract machines are called automata.

3.5. Automata:
Automata theory is the study of abstract computing devices or machines (Hopcroft et al. 2001). Alan Turing in the 1930s came up with the idea of constructing an abstract machine aiming to describe the boundary between what a computing or calculating machine could do and what it could not do. (Turing published his findings on this subject in a formidable paper titled On Computable Numbers, with an Application to the Entscheidungsproblem in 1936 in the Proceedings of the London Mathematical Society; in this paper he described the first abstract machine, known later as the Turing Machine.) In fact Turing's abstract machines proved to be the forerunners of today's modern computers. The word automata is Greek for things that can function on their own without the interference of humans. In our case automata are of paramount importance because one of their types, the push-down automaton (PDA), is the model or machine that can recognize CFLs and CFGs. Besides, automata represent the most straightforward path towards the study and discussion of parsing theory. Abstract machines or automata are in essence a class of recognition procedures. This type of recognition procedure is necessary to recognize the validity or otherwise of a given text for a given language, especially in cases of text processing or translation. The sequence is as follows: a compiler analyzes a source program to check its correctness; a document processor performs a spell check on words, then verifies syntax; and a graphical user interface must check that the data are correctly entered (Reghizzi 2009).

The pith of the argument concerning formal languages and automata theory is that most NLs and PLs happen to fall under the category of type 2 grammars. This means that most of those languages are context-free languages, interpreted and realized via a type of non-deterministic automaton called the push-down automaton (PDA) (Hopcroft et al. 2001).


I will not elaborate further on the subject of formal languages and automata for fear of missing the point: all I wanted was to give the forthcoming discussion of parsing theory and techniques a solid theoretical and formal grounding.

3.6. Parsing Theory and Strategies

Parsing, put simply, is the automatic analysis of syntactic structure. Computationally speaking it is a search algorithm or a recognition algorithm. From a formal vantage point, the basic idea of parsing is to parse (search for and find) a string w using a certain grammar G by generating all the strings in L and checking whether w is among them (Parkes 2008). Though illuminating, the previous abstract definitions of parsing need to be fleshed out to better suit the requirements, nature and dynamics of NLs. Parsers that deal with computer languages are different from those dealing with NLs. The essential difference has to do with the power of the grammar formalism used. The generative capacity of the formal grammars of computer languages is normally more restricted, because computer languages are designed so as to permit encoding by unambiguous grammars and parsing in time linear in the length of the input (Indurkhya and Damerau 2010: 59). In contrast, NLs require more powerful devices that can handle the existence of such anomalies as UDs and semantic ambiguity. UDs, especially, are generally seen as very strong cases for expressive power (ibid.).

(Parsing, until the early 20th century, simply described the manual practice of analyzing sentences according to prescribed grammar rules as practiced in elementary schools. As a grammatical term it goes back to antiquity: it derives from the Latin phrase partes orationis, which is itself a translation of the Greek logou mere. The concept was first used by Aristotle in his Poetics and De interpretatione in the context of his analysis of the constituents of logos. The partes orationis was incorrectly translated into English as parts of speech, while the correct translation would be parts of statement. For further details see Irvine 1994: 32.)

3.7. The Universal Parsing Problem


Building on the previous exposition of formal grammars, there is what has come to be known as the Universal Parsing Problem (UPP), which can be defined as follows: given a grammar G and an input string x ∈ Σ*, derive some or all of the analyses assigned to x by G (Nivre 2006: 13). (A different realization of the UPP can be found from the viewpoint of Optimality-theoretic LFG, where it is defined as follows: given a phonological string s and an OT-LFG G as input, return the input-candidate pairs <i, c> generated by G such that the candidate c has phonological string s and c is the optimal output for i with respect to the ordered constraints defined in G; see Mark Johnson 2002: 65.) The most widely used type of formal grammar in computer science and computational linguistics is the context-free grammar (CFG) discovered and formulated by Chomsky (1956). However, over the years different grammatical formalisms have been proposed and formulated to fill the gaps in CFGs; of those we have tackled GPSG, LFG, HPSG and CG. In recent years there has also been special interest in what are known as mildly context-sensitive grammars, which lie in the twilight zone between type 1 and type 2 grammars. Tree-Adjoining Grammars (TAGs) (Joshi 1987) and Combinatory Categorial Grammars (CCGs) (Steedman 1987) appear to strike a good balance between linguistic adequacy and computational complexity (Nivre 2006). However, in the following sections I will adopt a line of discussion away from handling linguistic theories or syntactic formalisms. The proposed discussion will focus on parsing theory itself, i.e. parsing theory in its computer science environment. Then a new section will relate that purely computational discussion to a practical suite of proposed techniques and strategies that focus on the parsing of UDs. The first thing that comes to the mind of those interested in parsing theory and practice is a question about how a parser works and what the different strategies and theories necessary for understanding the world of parsing and parser design are. (I tend to stress my interest in parser design, and not just parsing techniques, because part of the solution proposed for the handling of UDs has to do with certain tweaks to parser design or architecture.) The following section will address the nature of automatic parsers, how they work and the predominant strategies in parsing theory and technology.

3.8. Major Parsing Directions


Parsing has been defined in terms of two approaches: the structural approach and the recognition approach. The first approach defines parsing as the decomposition of complex structures into their constituent parts (Bunt et al. (eds.) 2005: 1) or as the process of structuring a linear representation in accordance with a given grammar (Grune & Jacobs 2008: 1). The other approach considers parsing as a recognition process, where parsing is the process of recognizing a correct sentence of a language (Adelsberger 2001: 170). Parsing, however, embodies both attitudes: it is structural and recognition-based at the same time. This is due to the fact that parsing theory and techniques are the product of parallel research initiatives in computer science (Kasami 1965; Knuth 1965; Younger 1967; Aho & Ullman 1972, 1977; Hopcroft & Ullman 1979; Tomita 1985, 1987) on the one hand and in psychology and psycholinguistics on the other (Kimball 1973; Clark & Clark 1977; Fodor & Frazier 1978, 1980; Kenneth 1979; Randall et al. 1983). Other definitions strike an eclectic tone that embraces both attitudes, e.g. Parsing is usually not a goal in itself, but is intended to be instrumental to the recognition, interpretation, or transformation of complex structures (H. Bunt 1996: 2). Still other parsing specialists see parsing not as recognition but as deduction (Pereira & Warren 1983; Sikkel 1998; Kallmeyer 2010). Parsing in this last sense is a deductive process in which rules of inference are used to derive statements about the grammatical status of strings from other such statements (Shieber et al. 1995). Differences in the definition of parsing reflect different interests in parsing itself, but the definition that prevailed over the others was the structural one, because it has more affinities with the computational background in which the field of parsing thrived. This computational background received its first crystallization with the introduction of a number of parsing algorithms sketched in the following pages. A bird's-eye view of the available parsing algorithms makes us quickly realize that the difference between all these algorithms lies in how they start their work or, more technically, how they handle input strings. In my view there are two main trends: the first is data-driven, beginning with the lexicon; the other is grammar-driven or rule-based, beginning with grammar rules rather than lexical items. The other characteristic feature that differentiates one parsing algorithm from another is how the parser moves. Movement here means the direction of the search mechanism adopted. For instance, there are bottom-up, top-down, breadth-first and depth-first strategies. Work in these fields is often seen through dichotomies: bottom-up vs. top-down, breadth-first vs. depth-first, top-down depth-first vs. bottom-up breadth-first, and so on.

94

3.9. Top-down Parsing:


Dick Grune and Ceriel J.H. Jacobs (2008: vii), in the second edition of their seminal Parsing Techniques: a practical guide, say that: Basically almost all parsing is done by top-down search with left-recursion protection. They also note that pure bottom-up parsers without a top-down component are rare and not very powerful (ibid.). Top-down and bottom-up parsing strategies are among the earliest parsing algorithms, developed throughout the late 1950s and early-to-mid 1960s. Top-down parsing is goal-directed search while bottom-up parsing is data-directed search. These two descriptions represent not just two techniques in parsing theory but a more profound dichotomy between two historic intellectual traditions in western thought, namely: the rationalist tradition (top-down) and the empiricist tradition (bottom-up) (Jurafsky and Martin). A top-down parser starts with the root of the syntactic tree S and works down to the rest of the tree nodes and branches. The next step expands S, expecting either an NP followed by a VP or a VP by itself. A preliminary analysis of a simple sentence like Book that flight will look like the following, applying top-down search:


Figure (23) A top-down analysis of the sentence Book that flight. The search algorithm tests all the possible manifestations of the sentence and then matches up with the correct one. The above illustration shows us that the 5th tree in the 3rd row is the correct analysis. The search began with the root S, testing the different possibilities of the syntactic analysis until it reached the correct one.
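Purely for illustration, the top-down strategy can be sketched as a recursive-descent recognizer. The toy grammar and function names below are my own illustrative assumptions, covering only the sentence Book that flight; they are not drawn from any of the parsers cited.

```python
# A minimal top-down (recursive-descent) recognizer. The toy grammar and
# function names are illustrative assumptions covering only this example.
GRAMMAR = {
    'S':       [['VP'], ['NP', 'VP']],
    'VP':      [['V', 'NP']],
    'NP':      [['Det', 'Nominal'], ['Nominal']],
    'Nominal': [['Noun']],
    'V':       [['book']],
    'Det':     [['that']],
    'Noun':    [['flight']],
}

def expand(symbol, words, pos):
    """Try to derive `symbol` from words[pos:]; yield every position reached."""
    if symbol not in GRAMMAR:                        # a terminal word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:                      # try each rule, top-down
        positions = [pos]
        for child in rhs:
            positions = [q for p in positions for q in expand(child, words, p)]
        yield from positions

def recognize(sentence):
    words = sentence.lower().split()
    return len(words) in expand('S', words, 0)

print(recognize("Book that flight"))   # True
```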

3.10. Bottom-up Parsing:


Bottom-up parsers, in contrast to top-down parsers, start with the input words of which the tree consists and build the analysis upwards. They thus start with no prior assumptions or suppositions. The parse is successful if it manages to reach the root of the input sentence, S. The previous sentence Book that flight, for instance, will look like the following in a bottom-up parser:


Figure (24) A bottom-up analysis of the sentence Book that flight. In the above figure the parser starts with terminal nodes in an attempt to identify the correct matches. It moves upwards until it reaches the correct analysis at the end. The pivotal difference between a top-down parser and a bottom-up one is that the former starts its search packed with a large number of grammar rules that look for the most felicitous match. Bottom-up parsers are also packed with a large database of grammar rules, but these rules are invoked only in response to the input words of the sentence, without assuming anything a priori.
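The contrast with the bottom-up direction can be seen in an equally minimal shift-reduce sketch over the same toy grammar; the greedy reduction step used here is adequate only for this small example and, again, the rules are illustrative assumptions rather than a rendering of any cited parser.

```python
# A minimal bottom-up (shift-reduce) recognizer over the same toy grammar.
# Greedy reduction only works for this small example; rules are illustrative.
RULES = [
    ('V', ['book']), ('Det', ['that']), ('Noun', ['flight']),
    ('Nominal', ['Noun']),
    ('NP', ['Det', 'Nominal']),
    ('VP', ['V', 'NP']),
    ('S', ['VP']),
]

def shift_reduce(sentence):
    words = sentence.lower().split()
    stack = []
    while True:
        # Reduce: replace a matching suffix of the stack by the rule's left-hand side.
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:
            if words:                 # nothing to reduce: shift the next input word
                stack.append(words.pop(0))
            else:
                break
    return stack == ['S']

print(shift_reduce("Book that flight"))   # True
```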


Computationally speaking, both algorithms have their pros and cons. Top-down parsers are time-efficient because they identify the matches of the input by generating trees straight away, which means that the parser never explores sub-trees supplied from outside but generates them on its own. The bottom-up parser, by contrast, generates its sub-trees on the basis of the words it analyses, producing many wanton sub-trees along the way. What distinguishes the top-down strategy is its speed, because it does not wait for the input words to build trees according to the information provided by the lexicon; it does, however, exert enormous effort on the S trees themselves. As shown in Jurafsky and Martin (2009), the weakness of top-down parsers stems from the fact that they build those trees before ever examining the input (J & M). Bottom-up parsers, on the contrary, never suggest trees without a solid base in the input. The above is a simplistic account of the two most fundamental parsing strategies in the field. In the following section I will illustrate another famous parsing algorithm based on, and especially devised for, formal and programming languages: the Cocke-Kasami-Younger algorithm.

3.11. The Cocke-Kasami-Younger Algorithm:


The Cocke-Kasami-Younger (CKY) algorithm is one of the earliest and simplest parsing algorithms.1 It was first known during the 1960s (Kasami 1965; Younger 1967), and it is simple because it is only functional with grammars in Chomsky Normal

Interestingly, John Cocke, Tadao Kasami and Daniel Younger discovered the algorithm independently.


Form (CNF).1 The CKY algorithm builds an upper triangular matrix T, where each cell T[i, j] (0 ≤ i < j ≤ n) is a set of non-terminals (Indurkhya & Damerau 2010).2 The secret of the CKY is its use of parse triangles or charts and dynamic programming.3 A parse triangle is composed of squares that form in their totality half a square. The CKY is a bottom-up algorithm that starts with terminal inputs, i.e. lexical items. Then it builds up its more or less glass pyramid from the bottom upwards: starting with nouns, verbs, determiners and adjectives, then noun phrases, verb phrases and so on. In more formal terms, it first builds the lexical cells T[i-1, i] for each input word wi, and then the non-lexical cells T[i, k] (i < k − 1) are filled by applying the binary grammar rules:

T[i-1, i] = { A | A → wi }
T[i, k] = { A | A → B C, i < j < k, B ∈ T[i, j], C ∈ T[j, k] }

The sentence is recognized (parsed) by the algorithm if S ∈ T[0, n], where S is the start symbol of the grammar (ibid.).4 A graphic representation of the CKY algorithm may look like the following:

1 A context-free grammar is in Chomsky Normal Form if every rule is of the form A → BC or A → a, where a is any terminal and A, B and C are any variables, except that B and C may not be the start variable. In addition to this, automata theory puts forward the theorem that any context-free language is generated by a context-free grammar in Chomsky Normal Form (Sipser 2006: 106-107).
2 Also known as CYK.
3 Dynamic programming is simply a method for solving complex problems by analyzing and deconstructing them into smaller and simpler sub-problems.
4 It has to be noted that the CKY algorithm does not handle empty constituents. In other words, the grammar G provided should not contain any ε-productions.


Figure (25) A CKY parsing of the sentence Book the flight through Houston. The recognizer proceeds in a bottom-up fashion, gathering all the possible grammatical structures together. So, the parse forest of the sentence Book the flight through Houston would be built over the chart cells [0,1], [0,3], [1,2], [2,3], [1,3], [3,4], [4,5], [3,5], [2,5], [3,5], [1,5], [0,3], [1,5], [0,5] = S.
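The table-filling recurrence above translates directly into code. The following is a minimal CKY recognizer written purely as an illustration; the small CNF grammar is an assumption of mine, loosely modelled on the Book the flight through Houston example rather than taken from any cited work.

```python
# A minimal CKY recognizer. The CNF grammar below is an illustrative assumption.
LEXICAL = {                      # preterminal rules A -> w
    'book': {'Verb'}, 'the': {'Det'}, 'flight': {'Nominal'},
    'through': {'Prep'}, 'houston': {'NP'},
}
BINARY = [                       # binary rules A -> B C
    ('NP', 'Det', 'Nominal'), ('Nominal', 'Nominal', 'PP'),
    ('PP', 'Prep', 'NP'), ('S', 'Verb', 'NP'),
]

def cky_recognize(sentence, start='S'):
    words = sentence.lower().split()
    n = len(words)
    T = {}                                    # T[i, k]: non-terminals spanning words[i:k]
    for i, w in enumerate(words):             # lexical cells T[i-1, i]
        T[i, i + 1] = set(LEXICAL.get(w, set()))
    for span in range(2, n + 1):              # non-lexical cells, shortest spans first
        for i in range(n - span + 1):
            k = i + span
            T[i, k] = set()
            for j in range(i + 1, k):         # every split point
                for A, B, C in BINARY:
                    if B in T[i, j] and C in T[j, k]:
                        T[i, k].add(A)
    return start in T[0, n]

print(cky_recognize("Book the flight through Houston"))   # True
```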

3.12. The Earley Algorithm:


A major overhaul of the CKY algorithm was devised by J. Earley (1970) in what came to be known as the Earley algorithm. Both algorithms, though, are based on the


concept of dynamic programming (see footnote above). But while the CKY is a bottom-up search algorithm, the Earley algorithm uses top-down search to implement its analyses (J & M). In addition to this, the Earley algorithm uses a special feature to signal the progress of the rules devised: the dotted rules. A dot is used on the right-hand side of the grammar rule to tell us where the rule has reached, or to what extent it has progressed. Using the same sentence Book that flight, the dotted rules of the grammar would look like the following:

S → • VP, [0, 0]
NP → Det • Nominal, [1, 2]
VP → V NP •, [0, 3]

In reality this is done by resorting to three fundamental procedures or functions: the predictor, the scanner and the completer. The predictor creates new states representing the expectations of the top-down parser. The scanner is called to examine the input and incorporate into the chart a particular state corresponding to the prediction of a word with a particular part of speech. Finally, the completer verifies the results and assures that the parser has successfully reached the correct part of speech. A concrete example applied to the sentence Book that flight will look like the following in an Earley-based parse.


Table (5) An Earley algorithm analysis of the sentence Book that flight. The above algorithms stand out as two of the most basic, important and widespread parsing algorithms available in the literature. These algorithms and many others represent the internal mechanisms that may exist in a given parser.
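To make the predictor-scanner-completer cycle concrete, the following is a small Earley recognizer written purely for illustration; the toy grammar and lexicon are assumptions of mine covering only Book that flight, not a fragment of any cited system.

```python
# A minimal Earley recognizer illustrating the predictor / scanner / completer
# operations. The toy grammar and lexicon are illustrative assumptions only.
GRAMMAR = {
    'S':       [['VP']],
    'VP':      [['Verb', 'NP']],
    'NP':      [['Det', 'Nominal']],
    'Nominal': [['Noun']],
}
LEXICON = {'book': 'Verb', 'that': 'Det', 'flight': 'Noun'}

def earley_recognize(sentence, start='S'):
    words = sentence.lower().split()
    n = len(words)
    # A state is (lhs, rhs, dot, origin); chart[k] holds states ending at position k.
    chart = [set() for _ in range(n + 1)]
    for prod in GRAMMAR[start]:
        chart[0].add((start, tuple(prod), 0, 0))
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:                              # PREDICTOR
                    for prod in GRAMMAR[nxt]:
                        new = (nxt, tuple(prod), 0, k)
                        if new not in chart[k]:
                            chart[k].add(new); agenda.append(new)
                elif k < n and LEXICON.get(words[k]) == nxt:    # SCANNER
                    chart[k + 1].add((lhs, rhs, dot + 1, origin))
            else:                                               # COMPLETER
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porigin)
                        if new not in chart[k]:
                            chart[k].add(new); agenda.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

print(earley_recognize("Book that flight"))   # True
```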


Reaching this point, the parser itself as a structural and architectural computational entity has to be illustrated. The following section is an attempt towards that goal.

3.13. Statistical or Grammarless Parsing:


The widespread ambiguity in natural languages motivated the use of statistical methods in parsing. Ambiguity stems from the fact that a given input sentence could have several possible parse trees, because sentences on average tend to be very syntactically ambiguous, due to problems like coordination ambiguity and attachment ambiguity (J & M 2009: 493). The following sentence illustrates one type of attachment ambiguity the parser may encounter:

Figure (26) An illustration of an attachment ambiguity in the sentence I shot the elephant in my pajamas. The first tree, where the prepositional phrase in my pajamas is attached to the noun phrase an elephant, produces a humorous interpretation (but a possible one in a fictional


context). The second tree, however, where the prepositional phrase is directly dominated by the verb phrase (shot an elephant PP), renders the more acceptable meaning of a man, wearing his pajamas, shooting an elephant. It has been discovered that grammar rules alone are not sufficient for choosing the correct parse (Luger & Stubblefield 2004). Using a statistical parser in the above case means that the parser will look for the parse with the highest probability of correctness. More recently, parsing experts have tried to use statistical techniques without the use of grammar rules to do the parsing. This is what is called grammarless parsing. Astonishingly, this technique of parsing proved to be very successful, yielding results that far exceed those of its grammar-based counterpart. One study showed that statistical parsers based on mathematical reasoning and techniques, without any grammar rules or the help of an expert linguist, scored better results than grammar-based parsers: parsing a test set of 1473 sentences, the statistical parser scored 78% while the grammar-based parser scored 68% (Magerman 1994).
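The idea of looking for the parse with the highest probability can be illustrated with a toy probabilistic context-free grammar (PCFG), where the probability of a tree is the product of the probabilities of the rules used in it. The rules and numbers below are made-up illustrative values, not estimates from any treebank, chosen only so that the verb-attachment reading of the pajamas sentence comes out on top.

```python
import math

# Scoring alternative parses under a toy PCFG. The rule probabilities are
# made-up illustrative values, not estimates from a treebank.
RULE_PROBS = {
    ('S',  ('NP', 'VP')):       1.0,
    ('VP', ('V', 'NP')):        0.5,
    ('VP', ('V', 'NP', 'PP')):  0.3,   # PP attached to the verb phrase
    ('VP', ('V',)):             0.2,
    ('NP', ('Det', 'N')):       0.4,
    ('NP', ('Det', 'N', 'PP')): 0.1,   # PP attached to the noun phrase
    ('NP', ('Pro',)):           0.5,
    ('PP', ('P', 'NP')):        1.0,
}

def log_prob(tree):
    """A tree is (label, child, ...); a leaf child is a plain string."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 0.0                      # lexical probabilities ignored in this sketch
    key = (label, tuple(child[0] for child in children))
    return math.log(RULE_PROBS[key]) + sum(log_prob(c) for c in children)

noun_attach = ('S', ('NP', ('Pro', 'I')),
                    ('VP', ('V', 'shot'),
                           ('NP', ('Det', 'the'), ('N', 'elephant'),
                                  ('PP', ('P', 'in'),
                                         ('NP', ('Det', 'my'), ('N', 'pajamas'))))))
verb_attach = ('S', ('NP', ('Pro', 'I')),
                    ('VP', ('V', 'shot'),
                           ('NP', ('Det', 'the'), ('N', 'elephant')),
                           ('PP', ('P', 'in'),
                                  ('NP', ('Det', 'my'), ('N', 'pajamas')))))

best = max([noun_attach, verb_attach], key=log_prob)
print('verb attachment wins' if best is verb_attach else 'noun attachment wins')
```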

3.14. Text vs. Grammar Parsing: the Nivre Model


Joakim Nivre (2005) postulated a new parsing strategy, calling it text parsing (TP). According to Nivre, TP stands in contrast to grammar parsing. The notion of grammar parsing is formally expressed via a formal grammar G defining a formal language L(G) over a terminal alphabet Σ. In light of this formulation, the parsing problem can be defined as follows:

Given a grammar G and an input string x ∈ Σ*, derive some or all of the analyses assigned to x by G.


However, solving a parsing problem, such as the problem of UDs, requires a parsing algorithm that accounts for the problem's context and nuances. Formally, such an algorithm is required to compute analyses for a string x relative to a grammar G. Algorithms under this category make use of tabulation to store partial results (the CYK algorithm, for instance), allowing decent reductions of the search space and, thus, providing a solution to the problem of ambiguity (Nivre 2005: 106-108). From this angle, traditional parsing algorithms are seen as constructive, i.e. they analyze sentences by constructing syntactic representations in conformity with the rules of the grammar postulated. Another strategy endorses what is called eliminative parsing, where the parser keeps eliminating syntactic representations which violate constraints already set within the parser's rules module until it identifies a valid analysis. This type of strategy can be seen in a number of frameworks such as Constraint Grammar (Karlsson 1990), Parallel Constraint Grammar (Koskenniemi 1990, 1997) and Constraint Dependency Grammar (Maruyama 1990).1

3.15. Text Parsing and the Problem of UDs:


If grammar parsing is governed by a set of formal properties and rules, TP represents the empirical side of the parsing module itself. The empirical aspect of TP is manifested in the use of an analyzed corpus of texts, where each text segment is assigned a correct syntactic analysis by human syntacticians. This type of corpus is called a Treebank. TP is heavily dependent upon concrete manifestations of a language. This is in contrast to grammar parsing which can work on abstract formal rules yielding
1 For more details on constraint-based frameworks and parsing by constraints, see Shieber (1992), Minnen (2001), and Nivre (2006a, 2006b).


generic results that may, or may not, apply to a number of natural and artificial languages. The problem of TP can be formulated according to the following thesis (Nivre 2005). A text in a language L is a sequence T = (x1, . . . , xn) of sentences (strings) xi; thus the text parsing problem can be defined as follows:

Given a text T = (x1, . . . , xn) in language L, derive the correct analysis for every sentence xi ∈ T.

Applying the above general formulation of the problem of TP to the specific problem of UDs parsing within a text parsing framework needs an elaboration on the nature of UDs within this particular context. UDs represent a restricted subset of natural language, which is English in our case. This fact underscores the suitability of the method of TP to the phenomenon of UDs, simply because TP is functional when there are specific data to be worked upon. The ontology of UDs mentioned earlier represents such specificity. Being fragments of an actual natural language, English, the problem of UDs parsing can be formulated as follows:

Given a text T = (ud1, . . . , udn) in language L(E), derive the correct analysis for every sentence udi ∈ T.

The (ud) above stands for an unbounded dependency construction, and L(E) stands for the English language. In addition to the above formal statement of the problem of UDs parsing within the framework of TP, it behooves us to mention that all input strings belonging to the Treebank should be well-formed strings of the grammar (WFSG) (Nivre 2005). This is due to the fact that checking the well-formedness of a particular string in the UDs parser is not part of its mechanism (as we shall see in the following


sections) and exceeds its intended functionality. A caveat is in order, though. The contours of Nivre's proposed contrariety between grammar parsing and text parsing methods are far from being clearly marked out. TP can never do without the help of a grammar module that includes at least a basic set of rules. In addition to this, TP itself has two strategies: the grammar-driven strategy and the data-driven strategy (Nivre 2005b).


Chapter 4:
UDs Parsing Complexity:


The number of parsing algorithms available in theory and in practice is really hard to account for. It would require a hefty tome of formalisms and theorems to keep abreast of all the available algorithms. The algorithms sketched so far represent, I believe, a basic perfunctory backdrop to certain theoretical underpinnings in parsing theory. But we need to move on to deal with the problem(s) encountered in the process of UDs parsing. The basic problematic admitted by almost all major syntactic formalisms/theories resides in the existence of gaps in UDs. As mentioned earlier on, the serious trouble with gaps in UDs is that they represent elements that simply do not show up in the phonological or graphological manifestations of the sentence processed (empty categories, null elements, etc.). In a computational analysis, in contrast to the psychological, this state of affairs cannot be accounted for, because a computer algorithm has to deal with a concrete materialization of a certain linguistic stretch.1 What I propose in this work towards a solution of this problem is a two-fold proposal based on two sets of modifications: architectural modifications that have to do with parser architecture/design, and another set of processing or procedural modifications associated with the parsing mechanism itself. The modifications proposed, however, are strictly related to the problem of UDs parsability and bear on both theoretical as
1 Even in psychologically motivated theories this fact is also admitted, but in a rather different guise. Chomsky (1982), for instance, observes that the properties of gaps are intrinsically significant in that the language learner can confront little direct evidence bearing on them. The language learner in Chomsky's terminology is equivalent to both the psychological notion of a parser and the computational one (Gorrell 1995: 130).


well as implementation issues in theoretical linguistics and computer science respectively. Most of the works cited, and the vast literature concerned with the linguistic and computational analyses of UDs, perceive them just as a processing problem without considering the potential role of the architectural aspects of the parser as part of a possible solution. The complexity of UDs as a parsing problem necessitated coming up with novel techniques to handle them. The global register HOLD in Augmented Transition Networks (ATN) (Gazdar & Mellish 1989a, 1989b) and Gap Threading in Definite Clause Grammars (DCGs) (Hodas 1992, 1997) are two well-known examples. The technique of gap threading, particularly, will be dealt with in more detail in the following sections, being one of the significant solutions to UDs parsing. However, the complexity of UDs parsing is not an internal problem that should be ascribed to the nature of UDs alone. That would be an oversimplification of the problem. Most problems happen to be problems because of certain aspects internal and specific to them, and other aspects that have to do with our perception of those problems qua problems. In other words, the problem of UDs parsing has other factors external to it that affect its problematicity: these factors are on the side of the computational linguist's perception of and reaction to UDs within the parser itself. The inadequacy of available parsers in handling UDs is discussed in the following section.


4.1. The Rimell-Clark-Steedman (RCS) Test:


In 2009 Laura Rimell, Stephen Clark and Mark Steedman wrote a report on the state-of-the-art performance of a number of parsers in relation to those parsers' handling of UDs (Rimell et al. 2009). The report was based on a corpus containing 700 sentences covering seven different unbounded dependency constructions. The aim of the report was to evaluate the efficiency of those parsers in relation to recovering UDs. According to the RCS test, UDs were specifically chosen for the following two reasons:
1- UDs provide a strong test of the parser's knowledge of the grammar of the language, since many instances of UDs are difficult to recover using shallow techniques in which the grammar is only superficially represented.
2- Recovering UDs is necessary to completely represent the underlying predicate-argument structure of a sentence, useful for applications such as Question Answering and Information Extraction.
The seven UDs types used in the corpus are listed in the following table:


Table (6) Examples of the seven types of UDs used in the RCS Test

The corpus consists of almost 100 sentences for each of the seven types above. The test did not annotate all sentences, since it is interested solely in UDs. All the sentences in the corpus were taken from the Penn Treebank (PTB), the Brown Corpus and the Wall Street Journal Corpus (WSJC).


4.2. The Parser Set


The parsers chosen for the test are the following: the C&C CCG parser, the Enju HPSG parser, the RASP parser, the Stanford parser, and the DCU post-processor of PTB parsers, based on LFG and applied to the output of the Charniak and Johnson re-ranking parser. As seen from the above list, the parsers chosen represent different types of parsing technology and approaches: the grammar-based, the purely statistical and the hybridized. The C&C CCG parser is based on CCGbank, which is a version of the Penn Treebank couched in the Combinatory Categorial Grammar syntactic framework. The importance of this parser lies in the fact that CCG is primarily concerned with the identification and analysis of UDs. The same criteria of preferability apply to the Enju parser, because it is based on an HPSG framework, which is also primarily concerned with UDs. The RCS Test analyzed the performance of the five parsers but paid special attention to the CCG parser because of its high performance in comparison to the other parsers analyzed. Statistically, the parsers' accuracy in identifying UDs in the sample corpus mentioned is shown in the following table:

Table (7) Parser accuracy on the UDs corpus according to the RCS Test


Despite the fact that the five parsers chosen represent the state of the art in parsing technology, the poor performance of them all (25%–59% accuracy) attests to the fact that current parsing technology is deficient at recovering some UDs which are crucial for fully representing the underlying predicate-argument structure of a sentence. The substantial results and conclusions of the RCS Test were quite illuminating in redirecting my attention to the necessity of coming up with novel solutions and techniques related to the parsing of UDs. The solutions and modifications proposed are divided into two sections: the first addresses what I call architectural problems or design modifications that, I think, if applied will greatly improve the accuracy of the overall performance of the parser. The second section proposes modifications in the internal mechanism and processing of the parser dealing with UDs.

4.3. Parser Architecture and Design:
4.3.1. The Compiler


A parser, as a computational component of a computational system, is a module within a more comprehensive system, the compiler. A compiler is a program that takes as input a program written in a source language and produces as output an equivalent program written in another (target) language (Henderson 2009: 95). A parser is one step or module within that overall program. But the compiler itself is a part of a larger structure, the language processing system, which can be illustrated in the following figure (Aho, Lam, Sethi & Ullman 2007).


Figure (27) Components of a language processing system

The compiler itself consists of a number of modules. According to Aho and Ullman (1973) a compiler consists of a number of steps or processes:
1- Lexical analysis.
2- Book-keeping.
3- Parsing or syntax analysis.
4- Code generation or translation to intermediate code (e.g. assembly language).
5- Code optimization.
6- Object code generation (e.g. assembly).
The design of compilers is shown in figure (28) (Henderson 2009).


Figure (28) The structure of a compiler within a language processing system

4.3.2. The Parser:


But what interests us here is the parser module and how it is structured and the way it functions. The parser is an intermediate component within the compiler that


understands and transforms input strings into codes interpretable by the translator within the compiler (see Figure 29).

Figure (29) The Parser within the compiler. A natural language parser takes the input strings and identifies them, transforming them into tokens; then, with the help of a library of grammar rules, the parser builds syntactic trees corresponding to syntactic analyses (see the figure below). Those syntactic analyses could be based on a particular syntactic formalism (GPSG, LFG, HPSG, CCG, etc.) or they could be theory-neutral, i.e. not based on any theoretical syntactic formalism but on a particular set of rules specifically written for handling natural language texts, as in DCG, LISP or PATR.



Chapter 5:

Architectural Modifications:
The main problem facing traditional parsers is their lack of modularity. The structure of a parser, as we have seen above, rests on a uniform design that admits of no internal divisions or partitioning that may target a special purpose (such as the handling of a special syntactic phenomenon like UDs). In other words, a parser in theory is a continuum that contains the lexicon, the grammar rules and the syntactic analyses produced. What I propose in this section is a modification of the parser's architecture and design that aims at modularizing the parser itself.1 By modularity I mean the introduction of specifically designed modules into the parser to handle unique problems such as UDs.

5.1. The Small-scale Latent Parser (SLP):


Traditional parser architecture begins with the source string or the input which is analyzed lexically; the results are then processed as tokens to be fed to the syntactic analyzer that contains the grammar rules component (see Figure 29). The parser

During my work on this thesis, I have come across two relevant patents registered in the USA. The first describes a Modular Parser Architecture with Mini Parsers published in Aug. 2006 under USA Patents No.: US 7,089,541 B2; and the second is entitled Method and Apparatus for Determining Unbounded Dependencies during Syntactic Parsing also published in Nov. 2006 under USA Patents No.: US 2006/0253275 A1. It has to be noted, however, that both patents were published two years after the registration of my thesis in 2004. I came across them after formulating all the proposals and modifications in this work and after reaching most of the conclusions.


proposed, however, has an additional module manifested in what I call a Small-scale Latent Parser (SLP) (Figure 30).

Figure (30) Small-scale Latent Parser. From its nomenclature the parser module proposed has two characteristics: it is small-scale in the sense that it deals with a limited number of rules and a controlled or closed set of data. The other aspect of the module is manifested in its latency in the sense that it stays dormant until it comes across a special trigger or cue representing these data, e.g. a wh-word, a relativizer, a topicalized structure, etc. The SLP itself is like a mini-parser that includes all the features and characteristics of traditional parsers using unification-based attributes. These attributes are manifested in the use of a structured GAP-ontology represented below


Figure (31) GAPS AVM

The GAP attribute value matrix (AVM) proposed above consists of the following:
1- A position/function (P/F) inventory.
2- A UDs typology (UDtp) identifying the various types of UDs in English.
3- A UDs class identifier (UDcl) (the class of strong UDs and the class of weak UDs).

All three procedures call upon data in tables (1) and (2) above.
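Since the AVM in Figure (31) is given only diagrammatically, the following is a rough sketch of how such a structure might be encoded; every field name and sample entry below is an illustrative assumption of mine based on the three components just listed, not a transcription of the actual figure.

```python
from dataclasses import dataclass, field

# A rough encoding of the GAP attribute-value matrix described above.
# All field names and sample entries are illustrative assumptions only.
@dataclass
class GapAVM:
    pf_inventory: dict = field(default_factory=lambda: {   # position/function (P/F) inventory
        'subject': 'pre-verbal position',
        'object': 'post-verbal position',
        'oblique': 'object of a preposition',
    })
    ud_types: tuple = (                                     # UDs typology (UDtp)
        'wh-question', 'relative clause', 'topicalization',
        'tough movement', 'it-cleft',
    )
    ud_classes: tuple = ('strong', 'weak')                  # UDs class identifier (UDcl)

    def is_ud(self, construction: str) -> bool:
        """True if the construction is one of the recognized UD types."""
        return construction in self.ud_types

avm = GapAVM()
print(avm.is_ud('wh-question'))   # True
```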

5.2. UDs SLP Pseudo-Algorithm:


The processes in the SLP work in an algorithmic method which I call the UDs parsing algorithm. This pseudo-algorithm functions as follows:
1- Start
2- input: English text [Sn]
3- proceed to [GAP AVM filter]
4- find [UDs triggers]
5- if [Sn] ∉ [UDs]
6- return to [Mother parser]
7- if S ∈ [UDs] = YES
8- proceed to [syntactic analyzer]
9- use [P/F inventory]
10- then [UDtp identifier]
11- if S = S/NP = YES
12- then
13- identify GAP
14- then
15- proceed to [Mother parser]
16- end
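Purely for illustration, the control flow above might be rendered in Python roughly as follows. The trigger set, the type- and gap-identification routines and the mother-parser hand-off are all placeholders of my own; none of them is an implemented component.

```python
# An illustrative sketch of the SLP control flow above. The trigger set, the
# identification routines and the mother parser are placeholders only.
UD_TRIGGERS = {'who', 'whom', 'whose', 'which', 'what'}    # assumed cue words

def mother_parser(sentence, slp_analysis=None):
    """Stand-in for the main (mother) parser."""
    return {'sentence': sentence, 'slp': slp_analysis}

def identify_ud_type(tokens):
    # UDtp identifier (placeholder heuristic).
    return 'wh-question' if tokens[0] in UD_TRIGGERS else 'topicalization'

def identify_gap(tokens):
    # P/F inventory lookup (placeholder: assume a clause-final NP gap).
    return {'category': 'NP', 'position': len(tokens)}

def slp(sentence):
    tokens = sentence.lower().rstrip('?').split()
    if not any(tok in UD_TRIGGERS for tok in tokens):      # GAP AVM filter: no UD trigger
        return mother_parser(sentence)                     # hand straight back
    analysis = {'type': identify_ud_type(tokens),          # UDtp identifier
                'gap': identify_gap(tokens)}               # gap identification via P/F inventory
    return mother_parser(sentence, slp_analysis=analysis)

print(slp("Who did you give the book to?"))
```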


It is obvious that the pseudo-algorithm above only takes the shape of an algorithm, but it follows algorithmic reasoning as stipulated in works on algorithmics (see for example Skiena 1998; Harel 2004). The pseudo-algorithm may be represented in the following diagram:

Figure (32) Flowchart of the UDs SLP algorithm.

From the algorithm above, we can trace two types of parsing strategies endorsed: top-down parsing and bottom-up parsing. It has to be noted, however, that pure bottom-up parsing cannot deal with phenomena such as UDs simply because it cannot deal with gaps. This is considered the Achilles heel of bottom-up parsing (Smith 1990). Gaps


refer to non-existent elements in a sentence, and bottom-up parsing begins with the terminals of the input sentence, i.e. the words therein. With certain elements non-existent, how will the bottom-up parser deal with them when it cannot account for, or even see, them? With this in mind, the thesis proposes that top-down parsing, or even a strategy that mixes top-down with bottom-up techniques, will be far more felicitous. The notion of modularity brings considerable solutions to many problems found in the processing of the parser. The mother parser normally addresses unique problems across the board without paying enough heed to the possibility of anomalies such as UDs. A parser shunning such possibilities eventually gets challenged by bugs, unpredictable outputs and black-box phenomena. Creating an SLP will enable the software developer or the computational linguist to see through the components and workings of the parser when confronted with anomalies or complex phenomena such as UDs. Thus, he or she will be spared getting caught up in a morass of code not knowing where the bug is or how it should be removed. They will only have to check the SLP, or the mini-parser proposed, and test its functionality. One caution is in order here. When designing a real SLP, or any type of mini-parser, the designer has to be cautioned against feature creep, the introduction of too many features into the mini-parser, which would eventually result in clogging the system or infesting the universal or mother parser with bugs, or with features that may affect the parser's running time, which has to be within polynomial limits (within class P in complexity theory) (see Hedman 2006).


Chapter 6:

Processing Modifications:
In the previous section I proposed an architectural modification to the parser that may, theoretically, save the overall system time and yield more accurate results in relation to the parsing of UDs. As shown earlier, UDs are computationally problematic because of their syntactic complexity and because they contain linguistic entities (gaps) that cannot be directly or unambiguously identified by the parser. Gaps, thus, have to be treated computationally head-on. In the following sections two techniques will be introduced as possible solutions to the problem of gaps in UDs parsing: namely, gap threading and memoization.

6.1. Gap-threading
The complexity of handling UDs computationally necessitated the creation of novel techniques to overcome this complexity. Gap threading (Pereira 1981; Karttunen 1986; Pulman 1992; Kaplan and Maxwell 1995; Pereira and Shieber 2002; Dienes 2003) was one of those techniques.1 Historically, it began with Pereira (1981) and Karttunen (1986) as a mechanism within Definite Clause Grammars (DCGs) to handle the phenomenon of gaps in wh-questions and relative clauses. The idea of gap threading is that in a sentence like Who did you give the book to _?, the filler Who is connected to the gap

Pulman (1992: 70) ascribes the coinage of the term gap-threading to Karttunen (1986). Reading Karttunen's paper, however, I could not find any mention of the term threading, though of course he explains the process quite succinctly.


position marked by _. Recognizing this connection is signaled metaphorically as an act of threading. In such a mechanism we thread a marker corresponding to the moved constituent, or the filler, through the sentence, using two features gapin and gapout on each relevant sub-constituent. The marker goes into a constituent and comes out again if the constituent does not contain a gap. If it does contain a gap, the marker goes in, but does not come out, and is associated with the gap itself (Pulman 1992). The rule governing this mechanism sets the gapin and gapout features, sending the marker to the constituent within which it expects there to be a gap, and requiring that the gapout feature should not contain the marker, ensuring that the gap has to be found somewhere. A simple gap threading example is illustrated in the following diagram with the topicalized sentence John, Sally gave a book to _.


Figure (33) Gap-threading in the sentence John, Sally gave a book to _.

Before the invention of gap threading, DCGs used the technique of gap passing to pass gap information among the nonterminals in grammar rules. However, according to Pereira and Shieber (2002), gap passing was deficient in two respects:
1- Several versions of each rule, differing only in which constituent(s) the gap information is passed to, may be needed. For instance, a rule for building dative verb phrases

would need two versions of DCG rules


so as to allow a gap to occur in either the NP or the PP, as in the sentences What did Alfred give to Bertrand? and Who did Alfred give a book to?
2- Because of the multiple versions of rules, sentences with no gaps will receive

multiple parses. For instance, the sentence Alfred gave a book to Bertrand. Because of the above-mentioned drawbacks, the mechanism of gap threading was invented. However, the problem with gap threading, in my opinion, is that it uses data structures known as difference lists that are peculiar to PROLOG. The problem with PROLOG, as a programming language, is that it utilizes a top-down parsing strategy, which, I think, is unsuitable for UDs parsing (see next section). In the following examples and illustrations from Pulman (1992), PROLOG is being utilized as the platform for handling gaps and using gap-threading. It should be noted, however, that in this version of gap-threading there are four features used to refer to the behavior of gaps in and out of the sentence. These four features are: gapsSoughtIn, gapsSoughtOut, gapsFoundIn and gapsFoundOut. In practice, those four features are treated as a single feature gaps whose value is a 4-tuple: gaps = (Gi, Go, Fi, Fo). According to this formulation, gaps are found by rules like:


The gapsSoughtIn feature is regarded as a stack: if we are looking for an empty NP, we can find it at any point in the input, pop that request off the stack, and continue looking for one fewer empty NPs. What this means is that the structures below will all involve the same verb phrase rules as the ungapped versions:
The man to whom I gave the book _.
The man (who) I gave the book to _.
The man (who) I gave _ to the soldiers.
The man (who) I gave the name of _ to the police.
The gap-threading analysis is a considerable practical improvement on the metarule treatment of Gazdar et al. (1985), which would involve a separate rule not only for each different subcategorization, but for each gap position for each version. In the vast majority of cases, only one gap is ever being looked for within one constituent, so using a stack is not strictly necessary. Having the relevant category as the value of a gap feature would be sufficient (Pulman 1992: 71-74).
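The difference-list mechanics themselves are Prolog-specific, but the bookkeeping can be mimicked in any language. The following toy sketch, written for this thesis, threads a list of sought gaps through a hand-written recursive parse of the topicalized example; the grammar coverage, the lexicon and the function names are illustrative assumptions, not a port of Pulman's rules.

```python
# A toy illustration of gap threading: each constituent is parsed with a list
# of sought gaps coming in, and a gap is "found" by discharging an entry from
# that list instead of consuming an input word. Coverage and category names
# are illustrative assumptions only.
LEX = {'sally': 'NP', 'john': 'NP', 'gave': 'V', 'a': 'Det', 'book': 'N', 'to': 'P'}

def parse_np(words, i, gaps_in):
    if gaps_in and gaps_in[0] == 'NP':        # realize the NP as a gap
        yield i, gaps_in[1:]                  # position unchanged, gap discharged
    if i < len(words) and LEX.get(words[i]) == 'NP':
        yield i + 1, gaps_in
    if i + 1 < len(words) and LEX.get(words[i]) == 'Det' and LEX.get(words[i + 1]) == 'N':
        yield i + 2, gaps_in

def parse_pp(words, i, gaps_in):              # PP -> P NP
    if i < len(words) and LEX.get(words[i]) == 'P':
        yield from parse_np(words, i + 1, gaps_in)

def parse_vp(words, i, gaps_in):              # VP -> V NP PP
    if i < len(words) and LEX.get(words[i]) == 'V':
        for j, gaps in parse_np(words, i + 1, gaps_in):
            yield from parse_pp(words, j, gaps)

def parse_topicalized(sentence):
    """Topic NP, then a clause in which exactly one NP gap is sought."""
    topic, rest = sentence.rstrip('.').split(',', 1)
    words = rest.lower().split()
    for i, gaps in parse_np(words, 0, []):               # subject NP, no gap allowed
        for j, gaps_out in parse_vp(words, i, ['NP']):   # thread one sought NP gap into the VP
            if j == len(words) and gaps_out == []:       # the gap must have been found
                return True
    return False

print(parse_topicalized("John, Sally gave a book to."))   # True
```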

6.2. Gaps in Python


Another exemplification of the handling of gaps in a programming language is found in Python. Python is a general-purpose programming language that underscores the importance of readability. It is similar to PROLOG in its code readability and the


intuitiveness of its concepts. As a programming language Python supports many programming methods: object-oriented, imperative and functional programming styles (see Bird et al. 2009). Though gap-threading was considered to be an improvement on Gazdar et al. (1985), Gazdar's SLASH notation continued to be an inspiration to later inventions and improvements that target the resolution of UDs parsing problems. Bird et al. (2009), almost 25 years after the publication of Gazdar et al. (1985), fall back on Gazdar's SLASH notation and categories, incorporating them into the NLP applications of Python. They use the SLASH notation to describe gaps with the following completely readable rule:

S[-INV] -> NP S/NP, where the slash represents a missing NP

within the sentence. Actual Python code parsing an unbounded dependency with a wh-question trigger looks like the following (Bird et al. 2009: 352):

Figure (34) A parse of the sentence Who do you claim that you like? using Python
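The figure reproduces output from Bird et al.; the same parse can be obtained with NLTK's feature-grammar machinery. The short sketch below assumes NLTK is installed together with its book_grammars data package (which contains the feat1.fcfg grammar used in Bird et al.'s chapter on feature-based grammar); it is intended only as a pointer, not as part of the parser proposed here.

```python
import nltk

# Reproducing the kind of parse shown in Bird et al. (2009). Assumes the NLTK
# 'book_grammars' data package is installed: nltk.download('book_grammars').
tokens = 'who do you claim that you like'.split()
cp = nltk.load_parser('grammars/book_grammars/feat1.fcfg')
for tree in cp.parse(tokens):
    print(tree)      # the SLASH feature is threaded from the filler down to the gap
```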


The percolation of the SLASH feature in all the steps above is similar to the technique of gap-passing and is even more intuitive than gap-threading. However, I think that an approach combining both gap-threading and SLASH percolation (as manifested in Python) would yield more robust results related to the parsing of UDs.

6.3. Memoization
In 1968 Donald Michie wrote at the beginning of a Nature article the following: It would be useful if computers could learn from experience and thus automatically improve the efficiency of their own programs during execution. A simple but effective rote-learning facility can be provided within the framework of a suitable programming language. In this work (Michie 1968) Michie proposed for the first time the technique of memoization. The technique refers to the process by which a function is made to automatically remember the results of previous computations (Norvig 1991). Memoization is essentially a query optimization procedure used to speed up the parsing system by avoiding the repetition of processing results previously processed. The memotable should include all the successful results produced by the SLP. Thus, if gap-threading is to be applied, a function gap, with its 4-tuple realizations, has to be memoized.
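A minimal sketch of such a rote-learning facility is given below; the memo table is keyed on the category sought and the input position, and the parsing routine being cached is a placeholder of mine rather than a component of any cited system (Python's functools.lru_cache would serve the same purpose).

```python
import functools

# A minimal memoization sketch: the memo table stores results of previous
# calls keyed on (category, position); the cached routine is a placeholder.
def memoize(func):
    table = {}                                   # the memo table
    @functools.wraps(func)
    def wrapper(category, position):
        key = (category, position)
        if key not in table:
            table[key] = func(category, position)
        return table[key]
    return wrapper

@memoize
def find_constituent(category, position):
    print(f'computing {category} at position {position}')
    return position                              # dummy result standing in for a real parse

find_constituent('NP', 3)    # computed
find_constituent('NP', 3)    # answered from the memo table, no recomputation
```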


Memoization is a generic process that has other variants in computational linguistics. The most well-known example of such variants is the HOLD register in ATNs. Within the environment of ATNs, if the HOLD register is presented with a constituent followed by a gap, it places the constituent in temporary storage and awaits an appropriate gap in the syntax. Parsers using this strategy also make use of virtual rules or arcs (in the terminology of ATN grammars) that treat constituents on hold as if they occupied the gap. For instance, if the parser is presented with a relative clause construction like I saw what happened, the initialization instructions search for the S arc and place what on hold; the parser then retrieves what as subject of the relative clause (Smith 1991). The problem with HOLD registers is that they are irreducibly local. They do not function across the board of the parser itself: actually, every call on the S network to parse a relative clause creates a separate HOLD. This will create a forest of HOLD registers if the parser is presented with a text teeming with UDs. Memoization techniques (and the HOLD register strategy in ATNs) are all couched in top-down parsing strategies (Norvig 1991), the same as the technique of gap-threading above. Using these techniques will thus greatly benefit the SLP's processing, because it, too, adopts a top-down parsing strategy. The overall parser architecture, with all the components and modifications proposed, will look like the following:


Figure (35) A parser blueprint incorporating all proposed modifications

A parsing system is claimed to be robust if it exhibits graceful behavior in the presence of exceptional conditions. Exceptional conditions or unexpected processes may activate latent design defects in a non-robust parser. As for an NLP system, robustness is concerned with the system's behavior when input falls outside of its initial coverage. Robustness, thus, is a matter of anticipating exceptional conditions while designing a system (Oltmans 1999: 66-67). According to this view the proposed SLP,


along with the other mechanisms and processes proposed, represents a robust NLP system, because it possesses the ability to account for exceptional conditions related to UDs and to handle them in a predictable and structured manner.


Chapter 7:

Conclusion:

The research conducted in this thesis has two aspects to it: a syntax-theoretic aspect and a computational one. The objective of the thesis was to propose a novel approach to parsing unbounded dependency constructions in a more efficient manner. This was realized via explicating the diverse syntactic backdrop of UDs and how those syntactic theories and formalisms were influenced by UDs. One of the main conclusions of the thesis is that a parser can do without any syntactic formalism whatsoever. It can use certain techniques or mechanisms adapted from those theories, but it has no need of developing the whole parsing strategy according to a strict interpretation of a particular syntactic formalism such as HPSG or CCG. In fact, and as seen in the RCS Test, parsers built on a syntactic theory or formalism platform tend to be bad achievers. That is why the thesis proposed two types of modifications that have to do with parsing architecture and processing techniques. The syntactic formalisms of the first part of the thesis showcased the importance of UDs as a major motivation for syntacticians to develop, tweak or introduce completely new elements into the theories they are developing or even subscribing to. It has been shown that transformational grammar was itself developed in order to account for such doggedly complex syntactic structures. Most of the syntactic formalisms discussed


in the first part of the thesis transformed the phenomenon of UDs into a bone of contention and also into a benchmark against which to gauge their success and adeptness. Specifically, I have shown that the handling of UDs cannot rely solely on solutions based on syntactic modeling or proposals; the most successful solutions in NLP do not come from linguistic theories as much as they are the result of a deliberate devising of smart computational techniques rooted in computational practice. The main objective of this thesis was to introduce actual modifications to the parsing technique or structure. This was not possible without studying and explaining the logical, mathematical and formal backgrounds of parsing theory. That explains the introduction of formal language theory as a spring-board to parsing. The thesis did not examine the whole list of parsing algorithms or strategies because of the intractability of such a task. But focusing on a number of major issues directly related to UDs, the thesis came up with the following conclusions and proposed solutions to the problem of UDs parsing:
1- It was shown that syntactic theories and formalisms do not necessarily fare better than non-theoretic or grammarless frameworks.
2- Derivational theories of syntax (e.g. transformational grammars), with their obvious psychological disposition in the study of language, proved to be not the best choice for computational implementation.
3- Better results can be obtained if a small-scale latent parser (SLP) is designed.
4- Gap-threading is a suitable technique if applied in a bottom-up parsing environment.


5- Memoization is also an appropriate memory storage facility if applied within a bottom-up parsing environment.
6- The modifications proposed in the second part of the thesis, the computational section, contribute to creating a robust NLP system satisfying the conditions of robustness.
7- Bottom-up parsing has been proved to be more suitable than top-down parsing when parsing UDs.
Finally, what this thesis aspires to prove is that many complex problems (such as UDs) can be successfully tackled with relatively simple or straightforward systems and techniques.



References

1- Adelsberger, H. (2001) Prolog Programming Language. In Encyclopedia of Physical Science and Technology, (ed.) Meyers, R. 155-178. Academic Press.

2- Aho, A. and Ullman, J. (1972) The Theory of Parsing, Translation, and Compiling, Vol. 1, Parsing. Prentice Hall.

3- Aho, A. and Ullman, J. (1973) The Theory of Parsing, Translation, and Compiling, Vol. 2, Compiling. Prentice Hall

4- Aho, A. and Ullman, J. (1992) Foundations of Computer Science. W. H. Freeman/Computer Science Press.

5- Aho, A., Lam, M., Sethi, R., Ullman, J. (2007) Compilers: Principles, Techniques and Tools. (2nd ed.) Addison Wesley.

6- Asudeh, A. (2009) Adjacency and locality: A constraint-based analysis of complementizer-adjacent extraction. In Miriam Butt and Tracy Holloway King, (Eds.), Proceedings of the LFG09 Conference, pp. 106-126. Stanford, CA: CSLI Publications.

7- Bar-Hillel, Y. (1953) A quasi-arithmetical notation for syntactic description. Language 29. pp. 47-58.

8- Bečvář, J. (ed.) (1975) Mathematical Foundations of Computer Science, 4th Symposium, Mariánské Lázně, Czechoslovakia, September 1-5, 1975, Proceedings. Springer.


9- Berwick, R. and Weinberg, A. (1982) Parsing efficiency, computational complexity, and the evaluation of grammatical theories. Linguistic Inquiry 13:2, pp. 165 191.

10- Bird, S., Klein, E., Loper, E. (2009) Natural Language Processing with Python. O'Reilly Media.

11- Blevins, J. (1994) Derived constituent order in unbounded dependency constructions. Journal of Linguistics, 30, pp. 349-409.

12- Bloomfield, L. (1926) A set of postulates for the science of language. Language 2, pp. 153-164.

13- ____ (1939) Linguistic aspects of science. In Otto Neurath et al. (eds.), International Encyclopedia of Unified Science, Vol. 1 No. 4.

14- Bonnet, G. (2011) Syntagms in the artigraphic Latin grammars. In Stephanos Matthaios et al. (Eds.) Ancient Scholarship and Grammar. De Gruyter, pp. 361-374.

15- Borsley, R. (1996) Modern Phrase Structure Grammars, Blackwell Textbooks in Linguistics 11, Blackwell Publishers, Oxford.
16- Borsley, R. (2009) Review of Robert D. Levine and Thomas E. Hukari, The Unity of Unbounded Dependency Constructions. Journal of Linguistics, 45, pp. 232-238.

17- Bresnan, J. (1976) Evidence of a Theory of Unbounded Transformations. Linguistic Analysis 2, 353-393.


18- Bunt, H., Carroll, J., Satta, G. (eds.) (2005) New Developments in Parsing Technology. Springer.

19- Bussmann, H. (1996) Routledge Dictionary of Language and Linguistics. Translated and edited by Gregory P. Trauth and Kerstin Kazzazi. London & New York. Routledge.

20- Carnie, A. (2002) Syntax: a generative introduction. Oxford, Blackwell publishers.

21- ____ (2008) Constituent Structure. Oxford Surveys in Syntax and Morphology. Oxford University Press.

22- Carpenter, B. (1998) Type-logical Semantics. MIT Press.

23- Charniak, E. (1993) Statistical Language Learning. MIT Press.
24- Chomsky, N. (1956) Three models for the description of languages. IRE Transactions on Information Theory 2:3, pp. 113-124.
25- Chomsky, N. (1977) On wh-movement. In P. Culicover, T. Wasow and A. Akmajian (eds.), Formal Syntax, 71-132. New York, Academic Press.

26- Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, MA: MIT Press.

27- Cimiano, P., Buitelaar, P., Völker, J. (2010) Ontology Construction. In Nitin Indurkhya and Fred J. Damerau (Eds.) Handbook of Natural Language Processing, Second Edition. CRC Press, Taylor and Francis Group.


28- Clark, A., Fox, C., Lappin, S. (eds.) (2010) The Handbook of Computational Linguistics and Natural Language Processing, Wiley-Blackwell, Oxford.
29- Crystal, D. (2008) A Dictionary of Linguistics & Phonetics. 6th ed., Wiley-Blackwell Publishers.

30- Dalrymple, M. (2001) Lexical Functional Grammar. Syntax and Semantics, volume 34. Academic Press.

31- ___ (2005) Lexical functional grammar. In Keith Brown (Ed.) Encyclopedia of Language and Linguistics, Elsevier. Vol. , pp. .

32- Dienes, P. (2003) Statistical Parsing with Non-Local Dependencies. PhD dissertation, Universität des Saarlandes. Website: http://www.coli.uni-saarland.de/bib/files/dienes_phd_thesis.pdf

33- Dopico, J., Dorado, J., Pazos, A. (2009) Encyclopedia of Artificial Intelligence. 3 vols. set. Information Science Reference, IGI Global, New York.

34- Dowty, D., Karttunen, L., Zwicky, A. (1985) Natural language parsing: psychological, computational and theoretical perspectives. (Studies in Natural Language Processing Series.) Cambridge: Cambridge University Press.

35- Earley, J. (1970) An Efficient Context-free Parsing Algorithm. Communications of the ACM 13 (2): 94-102.
36- Engdahl, E. (1983) Parasitic Gaps. Linguistics and Philosophy 6: 5-34.

37- Falk, Y. (2001) Lexical-Functional Grammar: An Introduction to Parallel Constraint-Based Syntax. Stanford, Calif.: CSLI Publications.


38- ___ (2005) Long-distance dependencies. In Keith Brown (Ed.) Encyclopedia of Language and Linguistics, Elsevier. Vol. , pp. .

39- Fodor, J. (1978) Parsing Strategies and Constraints on Transformation. Linguistic Inquiry 9, 427-473.

40- Gazdar, G. (1981) Unbounded Dependencies and Coordinate Structure. Linguistic Inquiry, 12(2), 155-184.

41- Gazdar, G., Klein, E., Pullum, G., Sag, I. (1985) Generalized Phrase Structure Grammar. Blackwell.

42- Gazdar, G. and Mellish, C. (1989) Natural Language Processing in LISP. Addison Wesley.

43- Ginzburg, J. and Sag, I. (2001) Interrogative Investigations: the form, meaning and use of English interrogatives. CSLI. Stanford.

44- Gödel, K. (1992) On Formally Undecidable Propositions of Principia Mathematica and Related Systems, tr. B. Meltzer, with a comprehensive introduction by Richard Braithwaite. Dover reprint of the 1962 Basic Books edition.

45- Gorrell, P. (1995) Syntax and Parsing. CUP.

46- Grattan-Guinness, I. (2001) The Search for Mathematical Roots 1870-1940. Princeton University Press.

47- Grune, D., Jacobs, C. (2008) Parsing Techniques: A Practical Guide. Springer.


48- Harel, D. (2004) Algorithmics: the Spirit of Computing. Pearson Education Ltd.

49- Hausser, R. (2001) Foundations of Computational Linguistics: human-computer communication in natural language, 2nd ed. Springer.
50- Hedman, S. (2006) A First Course in Logic: an Introduction to Model Theory, Proof Theory, Computability, and Complexity. Oxford University Press.

51- Henderson, H. (ed.) (2009) Encyclopedia of Computer Science and Technology. Facts on File, Inc.

52- Hodas, J. (1992) Specifying Filler-gap dependency parsers in a linear-logic programming language. In K. Apt (ed.), Proceedings of the Joint International Conference and Symposium on Logic Programming, pages 622 - 636, 1992.

53- _____ (1997) A Linear Logic Treatment of Phrase Structure Grammars for Unbounded Dependencies. In Proceedings of the Conference of Logical Aspects of Computational Linguistics. Springer.

54- Hofstadter, D. (1979/1994) Gödel, Escher, Bach: an eternal golden braid. Penguin Books Ltd.

55- Hopcroft, J., Ullman, J. (1968). Formal Languages and their Relation to Automata. Addison-Wesley

56- Hopcroft, J., Motwani, R., Ullman, J. (2000) Introduction to Automata Theory, Languages, and Computation, 2nd edition. Addison Wesley.

57- Horrocks, G. (1987) Generative Grammar. Longman.


58- Huddleston, R., and Pullum, G. (2002) The Cambridge Grammar of the English Language. Cambridge University Press.

59- Indurkhya, N. and Damerau, F. (eds.) (2010) Handbook of Natural Language Processing, 2nd edition. Chapman & Hall/CRC.
60- Irvine, M. (1994) The Making of Textual Culture: Grammatica and Literary Theory 350-1100. Cambridge University Press.

61- Johnson, J. (2002) Optimality-theoretic Lexical Functional Grammar. In Paola Merlo et al. (Eds.) The Lexical Basis of Sentence Processing: Formal, computational and experimental issues. John Benjamins Publishing, pp. 59 73.

62- Joshi, A. (1987) Introduction to Tree-adjoining Grammar. In Mathematics of Language Manaster-Ramis (ed.), John Benjamins.

63- Jurafsky, D., Martin, J. (2009) Speech and Language Processing: an introduction to natural language processing, computational linguistics and speech recognition, 2nd edition. Prentice-Hall.

64- Kallmeyer, L. (2010) Parsing Beyond Context-Free Grammars. Springer.

65- Kaplan, R., Bresnan, J. (1982) Lexical-Functional Grammar: A Formal System for Grammatical Representation. In Formal Issues in Lexical-Functional Grammar, Dalrymple, M. et al. (eds.) (1995). 1-102. Stanford University.

66- Kaplan, R., Zaenen, A. (1989) Long-Distance Dependencies, Constituent Structure, and Functional Uncertainty. In Alternative Conceptions of Phrase Structure, eds. Mark R. Baltin and Anthony S. Kroch, 17-42. Chicago: University of Chicago Press.


67- Kaplan, R. and Maxwell III, J. (1995) An Algorithm for Functional Uncertainty. In M. Dalrymple, R. Kaplan, J. Maxwell III and A. Zaenen (eds.), Formal Issues in Lexical-Functional Grammar, no. 47 in CSLI Lecture Note Series, 177-197. CSLI Publications.
68- Karttunen, L. (1986) D-PATR: A Development Environment for Unification-based Grammars. In Proceedings of the 11th Conference on Computational Linguistics, 74-80.
69- Kasami, T. (1965) An efficient recognition and syntax-analysis algorithm for context-free languages. Scientific report AFCRL-65-758, Air Force Cambridge Research Lab.

70- Lepschy, G. (ed.) (1994) A History of Linguistics. Longman

71- Levine, R. (1989) Downgrading Constructions in GPSG. Natural Language and Linguistic Theory 7: 123 135.

72- ___ (2005) Head-driven Phrase Structure Grammar. In L. Nadel (ed.) Encyclopedia of Cognitive Science. John Wiley & Sons.

73- Levine, R. and Sag, I. (2003) Some Empirical Issues in the Grammar of Extraction. Proceedings of the HPSG03 Conference. Michigan State University, East Lansing, Stefan Mueller (eds.) CSLI Publications.

74- Linz, P. (2001) An Introduction to Formal Languages and Automata. Jones & Bartlett publishers.

75- Luger, G., Stubblefield, W. (2004) Artificial Intelligence: Structures and Strategies for Complex Problem Solving (5th ed.) The Benjamin/Cummings Publishing Company, Inc.


76- Magerman, D. (1994) Natural Language Parsing as Statistical Recognition. PhD Dissertation, Stanford University. Website: ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1502/CS-TR-94-1502.pdf

77- Martin-Vide, C. (2003) Formal Grammars and Languages. In Ruslan Mitkov (Ed.) The Oxford Handbook of Computational Linguistics. Oxford University Press, pp. 157-177.

78- Matthews, P. (1997) The Concise Oxford Dictionary of Linguistics. OUP.

79- Minnen, G. (2001) Efficient Processing with Constrained-Logic Grammars Using Grammar. CSLI. Stanford University.
80- Michie, D. (1968) Memo Functions and Machine Learning. Nature 218, 19-22.

81- Nivre, J. (2005) Two Notions of Parsing. In Arppe, A. et al.(Eds.) A Finnish Computer Linguist: Kimmo Koskenniemi. Festschrift on the 60th Birthday. CSLI Publications, 111-120.

82- ___ (2006a) Inductive Dependency Parsing. Springer.

83- ___ (2006b) Two Strategies for Text Parsing. In Suominen, M. et al. (Eds.) A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday. A special supplement to SKY Journal of Linguistics 19, 440-448.

84- Nivre, J., Rimell, L., McDonald, R. and Gómez-Rodríguez, C. (2010) Evaluation of dependency parsers on unbounded dependencies. Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), 833-841.


85- Norvig, P. (1991) Techniques for automatic memoization with applications to context-free parsing. Computational Linguistics 17(1): 91-98.

86- Noy, N. and McGuinness, D. (2001) Ontology development 101: A guide to creating your first ontology. Technical report, KSL-01-05, Stanford Knowledge System Laboratories. Website: http://protege.stanford.edu/publications/ontology_development/ontology101noy-mcguinness.html
87- Nugues, P. (2006) An Introduction to Language Processing with Perl and Prolog: An Outline of Theories, Implementation, and Application with Special Consideration of English, French, and German. Springer.

88- Oltmans, E. (1999) A Knowledge-based Approach to Robust Parsing. Centre for Telematics and Information Technology (CTIT), The Netherlands.

89- Parkes, A. (2008) A Concise Introduction to Languages and Machines. Springer.
90- Pereira, F. (1981) Extraposition Grammars. Computational Linguistics 7, 243-256.

91- Pereira, F., Warren, D. (1983) Parsing as Deduction. In Proceedings of the 21st Meeting of the Association for Computational Linguistics, pages 137-144.
92- Pereira, F., Shieber, S. (2002) Prolog and Natural-Language Analysis. Microtome Publishing.

93- Pollard, C. and Sag, I. (1994) Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press and Stanford: CSLI Publications


94- Pullum, G. (1986) Footloose and context-free. Natural Language and Linguistic Theory 4, 409-414. (TOPIC...COMMENT series).
95- Pulman, S. (1992) Unification-based Syntactic Analysis. In H. Alshawi (ed.), The Core Language Engine. MIT Press.

96- Radford, A. (1997/2003) Syntactic Theory and the Structure of English: a minimalist approach. Cambridge University Press.

97- Reeves, J. (1992) Zero Elements in Linguistic Theory. Unpublished PhD dissertation, University of Minnesota.

98- Reghizzi, S. (2009) Formal Languages and Compilation. Springer.

99- Révész, G. (1985) Introduction to Formal Languages. McGraw-Hill.

100- Rimell, L., Clark, S., Steedman, M. (2009) Unbounded Dependency Recovery for Parser Evaluation. EMNLP, 813-821.
101- Ritchie, C. and Mellish, C. (2000) Techniques in Natural Language Processing. Department of Artificial Intelligence, University of Edinburgh Module Workbook.
102- Sag, I. (1982) A semantic theory of NP movement dependencies. In Pauline Jacobson and Geoffrey Pullum (eds.), The Nature of Syntactic Representation, 427-465. Reidel, Dordrecht.
103- Sag, I. (1983) On parasitic gaps. Linguistics and Philosophy 6: 35-45.


104- Sag, I., Fodor, J. (1994) Extraction without traces. In Proceedings of the Thirteenth West Coast Conference on Formal Linguistics, (ed. R. Aranovich, W. Byrne, S. Preuss, and M. Senturia), Stanford University. CSLI.

105- Sag, I. and Wasow, T. (1999) Syntactic Theory: A Formal Introduction. CSLI Publications, Stanford, CA.

106- Salomaa, A. (1973) Formal Languages. Academic Press.

107- Shieber, S., Schabes, Y., Pereira, F. (1995) Principles and Implementation of Deductive Parsing. Journal of Logic Programming, 24 (1-2): 3-36.

108- Sikkel, K. (1997) Parsing Schemata - a framework for specification and analysis of parsing algorithms. Texts in Theoretical Computer Science. Springer.
109- Sipser, M. (2006) Introduction to the Theory of Computation, 2nd ed. Thomson Course Technology.
110- Skiena, S. (1998) The Algorithm Design Manual. Springer.

111- Slack, J. (1990) Unbounded dependency: tying strings to rings. In Proceedings of COLING-90. Website: http://aclweb.org/anthology-new/C/C90/C90-3047.pdf

112- Smith, G. (1991) Computers and Human Language. Oxford University Press.

113- Stabler, E. (1999) Formal Grammars. In Robert Wilson et al. (Eds.) The MIT Encyclopedia of the Cognitive Sciences, MIT. pp. 320-322.


114- Steedman, M. (1987) Combinatory grammars and parasitic gaps. Natural Language and Linguistic Theory 5, 403-439.

115- Tomalin, M., (2006), Linguistics and the Formal Sciences (Cambridge: CUP).

116- Tomita, M. (1985) An Efficient Context-free Parsing Algorithm for Natural Languages. IJCAI. International Joint Conference on Artificial Intelligence, pp. 756-764.

117- _____ (1987) An Efficient Augmented-Context-Free Parsing Algorithm. Computational Linguistics Vol. 13, No. 1-2, 31-46.

118- Trask, R. (1993) A Dictionary of Grammatical Terms in Linguistics. Routledge.

119- Winograd, T. (1983) Language as a Cognitive Process: Volume I: Syntax. Reading MA: Addison-Wesley.

120- Wood, M. (1993) Categorial Grammars. Routledge.
121- Younger, D. (1967) Recognition and Parsing of Context-free Languages in Time n³. Information and Control 10(2): 189-208.

