
Mildly Context-Sensitive Languages

Cogs 500 – Introduction to Cognitive Science


by
H. Gökalp Demirci

Gökalp Demirci
Content
• Introduction
• We need another class
• Mildly context sensitive languages
– Definition
– TAG Formalism
– Automaton for MCSL
• Conclusion

Introduction

• Formal language: a set of words over an alphabet.
• Formal language theory is mainly interested in the classification of
formal languages and in the formalisms that describe them.
• The Chomsky hierarchy (1956) laid out much of what we know today in
formal language theory.


• Natural languages are sets of strings over certain alphabets.
• So they are also formal languages.
• Where do they stand in the Chomsky hierarchy?
• Which class do they belong to?
“The main problem of immediate relevance to the theory of language is
that of determining where in the hierarchy of devices the grammars of
natural languages lie.” [Chomsky, 1959, p. 138]

We need a new class

• Natural languages are not regular.
Proof sketch: mirror-image (palindrome-like) dependencies occur in
natural languages.

“It is clear, then, that in English we can find a sequence a + S1 + b,
where there is a dependency between a and b, and we can select as
S1 another sequence c + S2 + d, where there is a dependency
between c and d ... etc. A set of sentences that is constructed in this
way ... will have all of the mirror image properties of [2] which exclude
[2] from the set of finite state languages.”
(Chomsky 1957)
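The nesting Chomsky describes can be checked with a stack, which is exactly the unbounded memory a finite-state device lacks. The following is a minimal sketch, not taken from the slides: the dependency pairs in `PAIRS` and the function name `nested_ok` are assumptions chosen for illustration, and `S` is a placeholder for an embedded sentence.

```python
# Hypothetical dependency pairs standing in for Chomsky's a...b and c...d.
PAIRS = {"if": "then", "either": "or"}


def nested_ok(tokens):
    """Return True when every opener is closed in mirror-image (last-in, first-out) order."""
    closers = set(PAIRS.values())
    stack = []
    for tok in tokens:
        if tok in PAIRS:
            # Remember which closing word this opener still owes us.
            stack.append(PAIRS[tok])
        elif tok in closers:
            # The closer must match the most recently opened dependency.
            if not stack or stack.pop() != tok:
                return False
    return not stack


print(nested_ok("if either S or S then S".split()))
print(nested_ok("if either S then S or S".split()))
```

The first call accepts the properly nested order and the second rejects the crossed one. Since the depth of embedding is not bounded in advance, no fixed number of states can replace the stack.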


• Natural languages are not context-free.
• Chomsky raised this question in 1957.
• Proven by Huybregts (1984) and Shieber (1985).
Proof sketch: cross-serial dependencies occur in natural languages
such as Swiss German and Dutch.

• dat Jan Piet Marie de kinderen zag helpen laten zwemmen
(Gloss: that Jan Piet Marie the children saw help make swim)
(Translation: that Jan saw Piet help Marie make the children swim)
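To make the crossing visible, the sketch below pairs the nouns and verbs of the Dutch example in two ways. It is an illustration only: the split of the sentence into `nouns` and `verbs` is done by hand, and the function names are assumptions. Cross-serial order pairs the i-th noun with the i-th verb (queue-like), while the mirror-image order that a single stack produces pairs the first noun with the last verb.

```python
# Hand-segmented version of the example sentence (an assumption for illustration).
nouns = ["Jan", "Piet", "Marie", "de kinderen"]
verbs = ["zag", "helpen", "laten", "zwemmen"]


def cross_serial(nouns, verbs):
    """Pair the i-th noun with the i-th verb: first in, first out."""
    return list(zip(nouns, verbs))


def nested(nouns, verbs):
    """Pair the first noun with the last verb: the mirror-image order of a stack."""
    return list(zip(nouns, reversed(verbs)))


for noun, verb in cross_serial(nouns, verbs):
    print(noun, "->", verb)

print(nested(nouns, verbs)[0])
```

The cross-serial pairing reproduces the translation (Jan saw, Piet help, Marie make, the children swim). The nested pairing would instead attach Jan to zwemmen, which is not what the sentence means.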


• The recursively enumerable languages clearly contain the natural
languages: Turing machines are able to parse natural languages.
• Natural languages are also context-sensitive (they can be parsed
by a linear bounded automaton, i.e. a nondeterministic TM that uses
linear space).
• So how much extra power do context-free grammars need in order
to parse natural languages?
• We need a new class in the hierarchy.

Mildly Context-Sensitive Languages

Definition
• First introduced by Aravind Joshi in 1985.
• Here is an informal definition of a class of mildly context-sensitive
languages (MCSL):
– It contains all context-free languages, and in addition it has the
following properties:
• It admits limited cross-serial dependencies.
• Its languages are parsable in polynomial time.
• It has the constant-growth property.
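The constant-growth property says, roughly, that when the strings of a language are ordered by length, the gaps between consecutive lengths stay bounded. The sketch below is a hedged illustration on finite samples only, so it can suggest but never prove the property. The helper `length_gaps` and the two sample languages are assumptions, not part of the slides.

```python
def length_gaps(lengths):
    """Differences between consecutive distinct string lengths, in increasing order."""
    ordered = sorted(set(lengths))
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Lengths of a^n b^n c^n for n = 1..8: every string has length 3n.
anbncn = [3 * n for n in range(1, 9)]

# Lengths of a^(2^n) for n = 1..8: a language without constant growth.
powers = [2 ** n for n in range(1, 9)]

print(length_gaps(anbncn))
print(length_gaps(powers))
```

On the first sample every gap is 3, as expected for a language with constant growth. On the second the gaps double at each step (2, 4, 8, ...), so no finite set of constants can bound them.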

• Formalisms that formally define MCSL:
– Tree adjoining grammars (TAG)
– Combinatory categorial grammars (CCG)
– Linear indexed grammars (LIG)
– Head grammars (HG)
– Linear context-free rewriting systems (LCFRS)


Tree Adjoining Grammars (TAG)


• Developed by Joshi in the 1970s and 1980s.
• It is a tree-generating system rather than a string-generating one.
• A grammar consists of initial trees and auxiliary trees.
• Trees are combined by substitution and adjunction.
• Note the difference from other rewriting systems: adjunction,
which inserts an auxiliary tree at an interior node of another tree.
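Adjunction can be sketched on trees written as nested tuples. This is a minimal illustration under stated assumptions, not the presenter's implementation: a tree is `(label, children)`, terminals are plain strings, a foot node is marked by a trailing `*` in its label, and the names `plug_foot`, `adjoin` and `tree_yield` are hypothetical. Adjunction constraints and substitution are not modeled, so the caller chooses where to adjoin. The example grammar is the textbook one whose derivations yield strings of the form a^n b^n c^n d^n.

```python
FOOT = "*"  # a label ending in "*" marks the foot node of an auxiliary tree


def plug_foot(aux, subtree):
    """Return a copy of the auxiliary tree with its foot node replaced by subtree."""
    if isinstance(aux, str):
        return aux
    label, kids = aux
    if label.endswith(FOOT):
        return subtree
    return (label, [plug_foot(k, subtree) for k in kids])


def adjoin(tree, path, aux):
    """Adjoin aux at the node reached by following the child indices in path."""
    if not path:
        if tree[0] != aux[0]:
            raise ValueError("root of auxiliary tree must match the target label")
        # The excised subtree is re-attached at the foot of the auxiliary tree.
        return plug_foot(aux, tree)
    label, kids = tree
    i = path[0]
    return (label, kids[:i] + [adjoin(kids[i], path[1:], aux)] + kids[i + 1:])


def tree_yield(tree):
    """Concatenate the terminal leaves from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(k) for k in tree[1])


alpha = ("S", [""])                                      # initial tree: S over the empty string
beta = ("S", ["a", ("S", ["b", ("S*", []), "c"]), "d"])  # auxiliary tree with foot S*

t1 = adjoin(alpha, [], beta)   # adjoin at the root of alpha
t2 = adjoin(t1, [1], beta)     # adjoin again at the inner S node
print(tree_yield(t1))
print(tree_yield(t2))
```

The two derivations yield `abcd` and `aabbccdd`. Each adjunction wraps new material around both sides of an existing subtree at once, which is what lets the counts of a, b, c and d grow together.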

• Formal definition:
A tree-adjoining grammar (TAG) G is a quintuple (Vn, Vt, Tini, Taux, S), where
Vn is a finite set of non-terminals,
Vt is a finite set of terminals,
Tini is a finite set of trees, called the initial trees,
Taux is a finite set of trees, called the auxiliary trees, and
S ∈ Vn is the start symbol.


• Let's see an example of parsing with cross-serial dependencies,
using the Dutch sentence:
“ik haar hem de nijlpaarden zag helpen voeren”



Automaton for MCSL


• Every class in the hierarchy has a corresponding automaton that
simulates the grammar in a more machine-like fashion.
• TAGs have an automaton called the embedded pushdown automaton
(EPDA), defined by Vijay-Shanker in 1988.
• It differs from a PDA mainly in its ability to create new stacks
to the left and right of the current stack.


• Let's see how an EPDA works on an example, parsing the sentence:
“Jan Piet Marie zag laten zwemmen”
• Notice the crossed dependencies in this Dutch sentence.
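The run on this sentence can be approximated with a store that is a stack of stacks. The sketch below is a deliberately simplified illustration of that idea, not Vijay-Shanker's formal EPDA: states and the transition function are omitted, the sentence is segmented by hand, and the function name `pair_crossed` is an assumption. The nouns are first pushed onto one stack. They are then unloaded one at a time, each into a new stack inserted directly below (to the left of) the current one. After that the top of the store holds the first noun, so each verb can consume its own noun in crossed order.

```python
def pair_crossed(nouns, verbs):
    """Pair nouns and verbs in cross-serial order using a stack of stacks."""
    # The store is a list of stacks; the last list is the current (top) stack
    # and the last element of a stack is its top symbol.
    store = [list(nouns)] if nouns else []

    # Phase 1: move every noun except the first into a new stack of its own,
    # inserted just below the current stack.
    while store and len(store[-1]) > 1:
        noun = store[-1].pop()
        store.insert(len(store) - 1, [noun])

    # Phase 2: each verb consumes the top symbol of the top stack;
    # a stack that becomes empty is discarded.
    pairs = []
    for verb in verbs:
        if not store:
            raise ValueError("more verbs than nouns")
        top = store[-1]
        pairs.append((top.pop(), verb))
        if not top:
            store.pop()
    if store:
        raise ValueError("more nouns than verbs")
    return pairs


print(pair_crossed(["Jan", "Piet", "Marie"], ["zag", "laten", "zwemmen"]))
```

The result pairs Jan with zag, Piet with laten and Marie with zwemmen. A single pushdown stack would have exposed Marie first, which gives the nested order rather than the crossed one.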

Conclusion

• We have seen that the hierarchy defined by Chomsky is not enough
to fit natural languages into the picture.
• So Joshi added one more level to the hierarchy, the mildly
context-sensitive languages, between the context-free languages and
the context-sensitive languages.
• Mildly context-sensitive grammars are widely used in natural
language processing.
• We could instead process natural languages with algorithms that
recognize context-sensitive languages, but then we would face
problems with algorithmic complexity and with unlimited cross-serial
dependencies.

Thank you for your time

