
THE CHOMSKYAN THEORY AND ITS IMPLICATIONS FOR LANGUAGE TEACHING AND LEARNING

FARAJ MOHAMED SAWAN

2017
DEDICATION

This work is dedicated to the memory of my parents.
Acknowledgements
I take this opportunity to express my deepest gratitude and appreciation for
the assistance, guidance and encouragement that Dr. Mohamed Grenat
always gave me without hesitation. Thanks are also due to Steven Schaufele
and Ahmed Reza Lotfi, who sent me some of the literature used in preparing
this work. I am also indebted to the members of the Graduate Office at the
English Department in both Tripoli University and the Academy of High
Studies.
Abstract

This book tries to relate the work done in Chomskyan Theory to the areas of
language teaching and learning. The book is divided into two main parts.
Using the historical approach, the first part traces back the stages, changes
and developments that occurred during the development of the Chomskyan
Theory. The second part explains in some detail the possible implications for
language teaching and learning. Although no direct connection can be made
between the Chomskyan Theory and language teaching and learning, this
humble work shows that there are some important implications that can be
used in these two areas.

Although Chomsky never intended to devise a theory of or for Applied
Linguistics, the influence of his theory on language teaching has come from
his idea of what human language is. Whereas his actual theory of linguistics
has changed considerably since the 1950s, his views of what such a theory
should achieve have remained fairly constant. The general question is
language acquisition, but not in the sense of teaching. It is an essential
feature of Chomsky's analysis of the problem that first language acquisition is
independent of teaching.
Contents
DEDICATION
Acknowledgements
Abstract
Introduction
CHAPTER (1)
THE CHOMSKYAN LINGUISTICS
Past and Present
1.1 Introduction
1.2. Early Transformational Grammar
1.3. The Standard Theory
1.4. The Extended Standard Theory
1.5. Government and Binding
1.6. The Minimalist Program
CHAPTER (2)
THE ESSENCE OF CHOMSKYAN THEORY
2.1. Overview
2.2. Claims of the Theory
2.3. Transformations vs. Constraints
2.4. Levels of Representation
2.5. The History of Derivations
CHAPTER (3)
THE MODULES OF THE GRAMMAR
3.1. Overview
3.2. X̄ Theory
3.2.1. Introduction
3.2.2. Universal Base and Functional Heads
3.2.3. The Inadequacy of PS Rules
3.3. θ Theory
3.3.1. Introduction
3.3.2. The θ-Criterion
3.4. Government Theory
3.5. Case Theory
3.5.1. Introducing Case Theory and the Case Filter
3.5.2. How Many Cases Are There?
3.5.3. What Can Assign Case?
3.5.4. Under What Circumstances Can Case Be Assigned?
3.5.5. Case and Minimalism
3.6. Binding Theory
3.6.1. Introduction
3.6.2. Binding Theory Features and Principles
3.6.3. The Typology of Empty Categories
CHAPTER (4)
IMPLICATIONS FOR LANGUAGE TEACHING AND LEARNING
4.1. Introduction
4.2. Impact of Chomskyan Theory on Language Teaching and Learning
4.3. Implications for Language Teaching and Learning
CHAPTER (5)
CONCLUSION
References
Introduction

The topic of this book is divided into two parts. The first deals with the
Chomskyan Theory, and the second shows how we can benefit from it in
language pedagogy. This piece of work is mainly concerned with the most
important changes that took place during the course of developing the
transformational-generative framework. This particularly applies to those that
took place after the publication of Noam Chomsky's 1957 book Syntactic
Structures. The historical approach is used to trace each change and explain
it in a plain, consistent way. Developments and changes are analysed and
traced back to their precursors. The book also shows how these developments
complement each other. Providing a way to benefit from the implications of
the theory in teaching and learning languages is also a main concern in this
book.

The work is divided into five (5) main chapters. Each chapter treats some
aspect of the framework and is subdivided into several sections. The first
chapter, which is divided into six (6) sections, briefly summarizes the history
of the Chomskyan Linguistics and its present state. It states the most
important issues, assumptions, and advancements that occurred during the
course of developing the Chomskyan Theory. Each section deals with a
distinct version of the Chomskyan Theory. Chapter (2) is divided into five (5)
sections. Section (1) identifies the labels used to refer to the Chomskyan
Theory. It discusses the legitimacy of these different labels and provides the
basis to accept or reject them as labels or names for this theory. Section (2)
presents the fundamental assumptions and claims of the theory and the
changes and developments that occurred in the course of their modification.
Section (3) clarifies the shift from movement transformations to constraints on
them in the post-Aspects Model and the replacement of movement
transformations by a single transformation: Affect Alpha. A derivation consists
of distinct representational levels. Section (4) traces the history of and the
changes that happened to these levels. Section (5) shows how trace theory
and the Structure Preservation Constraint (SPC) together ensure that the
history of a derivation can be recovered at any step in the course of a
derivation.

As will be seen in section (3) of chapter (2), the shift in attention in the
Revised Extended Standard Theory (REST henceforth) was from
transformations to constraints on them. These constraints are grouped into
what are often called "modules". Aptly titled 'Modules of the Grammar',
Chapter (3) defines these modules and explains each one and its subsequent
modifications in a separate section.

Chapter (4) introduces the most important implications of the transformational-generative
grammar (TGG henceforth) for language teaching and learning. It
is an attempt to provide a newer way of looking at syntactic theory and how it
can be used in language pedagogy.

Chapter (5) concludes the book with a summary of the main and most
important points of the Chomskyan Theory.
CHAPTER (1)
THE CHOMSKYAN LINGUISTICS
Past and Present

1.1 Introduction
It is widely known that Chomsky has brought about a revolution in the field of
linguistics. He postulated a syntactic base called deep structure that consists
of phrase structure rewrite rules and a set of transformations. The phrase
structure rewrite rules generate base or kernel sentences; transformational
rules then transform these kernel sentences into derived ones. The sentences
of a language can thus be generated by applying an obligatory and optional
set of transformational rules to the kernel sentences. A derivation therefore
involves a sequence of phrase markers, the first of which is a base structure
and the last a surface structure that corresponds to an actual sentence.
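The mechanics of rewrite rules can be sketched in miniature. The following toy Python fragment is our own illustration with a made-up five-word lexicon, not a grammar from the book; it expands the symbol S by successive application of phrase structure rules to yield a kernel sentence:

```python
# Toy phrase structure grammar: rewrite rules expand S down to words.
import random

rules = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["police"], ["criminal"]],
    "V":   [["chased"]],
}

def expand(symbol):
    """Rewrite a symbol until only terminal words remain."""
    if symbol not in rules:  # terminal: a word of the lexicon
        return [symbol]
    expansion = random.choice(rules[symbol])
    return [word for part in expansion for word in expand(part)]

print(" ".join(expand("S")))  # e.g. "the police chased the criminal"
```

Transformational rules would then operate on the output of such expansions, deriving, for instance, a passive or interrogative counterpart of the kernel sentence.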

Despite the constant development in this framework, the notion of
'transformation' remained central in every version of TGG. Similarly, other
characteristics of TGG, such as its analysis of inflectional affixes as being
independent elements, continued to exist in subsequent formulations of the
theory.

Nowadays different linguists practice various frameworks of syntactic
theory. One of these frameworks is the Transformational Generative Approach
practised by Chomsky and many of his students and followers. The
fundamental basis of this framework is that there is a language faculty in
the brain responsible for language acquisition. It consists of a system that
stores data and other systems that access the data (Chomsky, 1995).

Noam Chomsky has shifted the focus of investigation from the "performance"
(the speaker's actual use of language) to the "competence" (the
subconscious control of a linguistic system). He criticized the empiricist
approaches of the previous decades and showed that they were inadequate to
explain the complexities of linguistic structure and that a generative model is
more adequate. He argued that semantic considerations were an integral part
of grammatical analysis and posited a deep structure in his grammatical
analysis.

Furthermore, Chomsky presupposed that language acquisition is a matter of
mastering a rule system that permits us to distinguish between grammatical
and ungrammatical sentences. He argued that this system of rules lies in the
brains of the speakers of language and cannot be discovered by studying a
limited corpus. Chomsky considered the study of language a clue to our
understanding of the human mind and argued that linguistics could
legitimately be a branch of cognitive psychology (Chomsky, 1972). He also
pointed out that the aim of linguistics was to find out how language works and
to establish the universal characteristics that define human languages. He has
done his best to account for the creative aspect of language, which enables
speakers to produce and understand new sentences uttered for the first time
(i.e. the creativity of language). He, moreover, tried to explain the fact that
children manage to learn their languages in a short period of time on the basis
of limited data (i.e. the poverty of the stimulus).

1.2. Early Transformational Grammar


The earliest version of TGG was formulated in Chomsky's 1957 book,
Syntactic Structures. This book contained formal rules licensing all and only
the grammatical sentences of the language in question. The key assumption
of TGG is that an adequate grammar must generate sentences in a
sequence, each related to the next by a transformation. For example, passive
constructions were generated from the same underlying structure as their
active counterparts. This was done, as shown in (1b), by a passive
transformation that shifted the order of the two NPs and inserted the auxiliary
be and the preposition by in their appropriate positions.

(1) a. The police chased the criminal.
    b. The criminal was chased by the police.
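The passive transformation just described can be caricatured in a few lines of Python. This is a deliberately flat sketch of our own devising (real transformations operate on tree-shaped phrase markers, and the affix placement is simplified):

```python
# Flat caricature of the 1957-style passive:
# NP1 - Aux - V - NP2  =>  NP2 - Aux+be - V+en - by+NP1

def passive_transformation(pm):
    """Apply a simplified passive transformation to a flat phrase marker.

    `pm` is a hypothetical dict with NP1, Aux, V, NP2 slots; real
    transformations operate on structured phrase markers.
    """
    return {
        "NP1": pm["NP2"],           # the two NPs exchange positions
        "Aux": pm["Aux"] + " be",   # insert the auxiliary 'be'
        "V": pm["V"] + "+en",       # add past-participle morphology
        "PP": "by " + pm["NP1"],    # insert the 'by'-phrase
    }

active = {"NP1": "the police", "Aux": "PAST", "V": "chase", "NP2": "the criminal"}
passive = passive_transformation(active)
print(" ".join(passive.values()))  # -> the criminal PAST be chase+en by the police
```

The point of the sketch is only the shape of the rule: a structural description (NP1-Aux-V-NP2) paired with a structural change that reorders and inserts material.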

In his book, Chomsky (1957) also proposed that tense was a separate
element apart from the verb in the underlying structure. A movement
transformation too was designed to derive the construction of 'inversion
questions'. To account for the negation of sentences, moreover, he proposed
an insertion transformation that positions 'not' in its appropriate place. Both
of these transformations intervene between the verb and the tense inflection,
preventing their combination1. For this reason, Chomsky devised a
transformation capable of inserting the dummy do in order to carry tense.
Several other functions of the auxiliary do (e.g. in ellipsis constructions) were
analyzed as instances of tense stranding. This syntactic dissection of the
functions of the auxiliary do, as well as the clear demonstration Chomsky
used in his book, has convinced many linguists.

1.3. The Standard Theory


After the modifications made to TGG in 1965, Chomsky labelled the resulting
framework the 'Standard Theory'. It differed technically from its antecedent
TGG in certain aspects. Among the defining characteristics of this theory were
the innovation of recursive phrase structure rules and the introduction of
syntactic features to account for subcategorisation. It surpassed the early
TGG by proposing a level of representation called 'deep structure' (DS) that
supplies the information necessary for interpreting sentences. At this deep
structure, a simple mapping between semantic roles and grammatical
relations was claimed. The words and phrases in the surface structure were
arranged identically, as in an actual sentence (i.e. after the application of
certain operations to DS, the resulting structure will be similar to the spoken or
written sentence). Thus, transformations in this theory played a very crucial
role in linking sound to meaning.

1.4. The Extended Standard Theory

1. This process is known as Tense Stranding in linguistic studies.
Unlike the generative semanticists, who claimed that all sentences could be
derived from the same kind of underlying semantic structure, Chomsky and
others (Stern, 1996) rejected the idea that sentences with identical deep
structures must be synonymous. They insisted that transformations involved
in the reordering of quantified expressions are capable of changing the scope
of quantifiers.

Furthermore, they argued for the existence of another kind of structure
responsible for semantic interpretation. Some empty categories were also
introduced such as the subject of infinitives and traces that resulted from
movement. Chomsky, who named this theory the 'Extended Standard Theory'
(EST henceforth), schematised the phrase structure rules and proposed a rich
conception of the lexicon. This approach has modularised the theory of
grammar with distinct mechanisms to process different phenomena. A main
concern of EST has been to constrain the power of the theory in making
available certain classes of grammar. This was accomplished by formulating
principles and proposing parameters upon which languages vary. The number
of possible grammars (i.e. languages) depends on the setting of those
parameters. The rationale behind such constraints was explaining language
acquisition, which Chomsky regards as the ultimate goal of linguistic research.

1.5. Government and Binding


Later on, through the cooperation of various proponents of his revolutionary
framework, many changes and developments took place. For example, in the
early 1980s the REST was replaced by Government & Binding (henceforth
GB) and its mechanisms. GB theory is a syntactic theory made famous by the
publication of Chomsky's 1981 book, Lectures on Government and Binding. It
involves relations of certain categories and elements "governing" and
"binding" others in relation to their placement in the sentence. These
relationships were supposed to explain restrictions on sentence output (why a
sentence can or cannot be said in certain ways).

The main topic of research in GB was the development of Universal
Grammar. Within this context, GB proponents claim that many principles of
the grammar are parameterised. Therefore, learning a language only requires
fixing a certain set of parameters that are exceptions to universal linguistic
principles, plus learning the vocabulary of the language in question. According
to GB, all languages are essentially similar and vary only in fixing a limited set
of parameters. Working out the details of these parameters has been the
most active area of research in syntax since the early 1980s.

GB assumes that Universal Grammar consists of certain principles that are
shared by all languages. GB theory views Universal Grammar as a
computational system made of two components: levels of representation and
a system of constraints. It assumes a derivational model comprising four
levels of representation (Black, 1999). The lexicon includes all the lexical
items together with their idiosyncratic properties. For example, these
properties include what sort of 'subject' or 'object' the verb must have. Lexical
items are brought together at D-structure (underlying structure).

D-structure is converted into S-structure, which represents the surface order
of the sentence. S-structure is not directly interpreted itself, but is transformed
into Phonological Form (henceforth PF) and Logical Form (henceforth LF). PF
is responsible for representing the phonological aspects of language. The
phrase structure at LF explicitly represents the semantic relationships.
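The flow between these levels can be pictured as a pipeline. The sketch below is our own schematic stand-in: the function bodies are placeholders, not real grammatical operations, and serve only to show the order in which the levels feed one another.

```python
# Schematic GB "T-model": lexicon -> D-structure -> S-structure -> (PF, LF).
# All function bodies are placeholders, not real grammatical operations.

def build_d_structure(lexical_items):
    """Bring lexical items together into an underlying structure."""
    return {"level": "DS", "items": list(lexical_items)}

def move_alpha(ds):
    """Stand-in for Move α; here it merely relabels the level."""
    return {"level": "SS", "items": ds["items"]}

def phonological_form(ss):
    """PF: spell the structure out as a pronounceable string."""
    return " ".join(ss["items"])

def logical_form(ss):
    """LF: make semantic relations explicit (a trivial predicate view)."""
    subj, verb, obj = ss["items"]
    return {"predicate": verb, "arguments": [subj, obj]}

ss = move_alpha(build_d_structure(["the police", "chased", "the criminal"]))
pf, lf = phonological_form(ss), logical_form(ss)
print(pf)  # -> the police chased the criminal
```

The essential point the pipeline captures is that S-structure branches: PF and LF are computed from it separately, and neither interprets the other.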

GB theory represents a great shift in the generative tradition. This shift was
from transformations to constraints on them. These constraints are grouped
together in "modules". These modules are semiautonomous systems that
contain principles and constraints on those principles. Each module applies at
particular points in a derivation. Each one of these modules has its own
universal principles. An output of a derivation is the result of the interaction
between these modules. Transformations, moreover, were reduced to a single
operation, Move α, capable of moving anything anywhere. Other general
principles prevent Move α from overgenerating by filtering out ill-formed
derivations. GB is also enriched by a number of new empty categories (see
chapter 3). Binding theory, on which GB research concentrated, relates
constraints on movement to anaphor/pronoun-antecedent relations. As a
result of the link between movement and the binding principles2, a richly
interconnected system emerged. For example, a constituent can only move to
a position where it can bind its trace, as shown in (2); otherwise the derivation
will be ill-formed.

(2) Who did the police chase t? (where t is the trace of who)

If the word who has been positioned in a place where it does not bind its
trace, then the question in (2) will be ungrammatical. An important connection
among movement, c-command, and binding theory is the fact that
constituents cannot move rightward because they (i.e. constituents) will not be
able to c-command the traces left behind. Thus, they cannot bind these traces
either.
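The requirement that a moved constituent c-command its trace can be made concrete with a small tree-walking check. The nested-tuple trees below are our own simplified encodings, not the book's notation, and the "first branching node" definition of c-command is deliberately stripped down:

```python
# Trees are nested tuples of leaf labels; a c-commands b iff b lies inside
# a sister of a (a simplified "first branching node" definition).

def contains(tree, label):
    """True if the (sub)tree contains a leaf with the given label."""
    if isinstance(tree, str):
        return tree == label
    return any(contains(child, label) for child in tree)

def c_commands(tree, a, b):
    """True if leaf a c-commands leaf b somewhere in the tree."""
    if isinstance(tree, str):
        return False
    children = list(tree)
    for i, child in enumerate(children):
        if child == a:
            # a's sisters are the other children of this branching node
            return any(contains(sib, b) for j, sib in enumerate(children) if j != i)
        if c_commands(child, a, b):
            return True
    return False

# 'who' moved leftward to a position from which it c-commands its trace t:
leftward = ("who", ("did", ("the-police", ("chase", "t"))))
print(c_commands(leftward, "who", "t"))  # -> True: who can bind its trace

# 'who' buried inside a right-hand constituent fails to c-command t:
embedded = (("did", ("the-police", ("chase", "t"))), ("X", "who"))
print(c_commands(embedded, "who", "t"))  # -> False: the trace is unbound
```

The second tree illustrates the argument in the text: a constituent displaced into a position that does not c-command the trace cannot bind it, so the derivation is ill-formed.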

To conclude, we summarise the most noteworthy features of GB in the
following list:

1. A highly articulated phrase structure encoding the important distinctions
and relations.

2. The use of a single movement transformation Move α.

3. An extensive use of empty categories.

4. The use of parameterised universal principles.

5. The elimination of language-specific rules.

1.6. The Minimalist Program


Within ten years, GB too had become overloaded by many modifications,
resulting in its replacement with the minimalist approach to language. There
are two distinguishing characteristics of the recent Minimalist Program
(henceforth MP) developed by Chomsky and other linguists of generative
syntax. First,
derivations and representations conform to an economy condition requiring
that they be minimal: no extra steps in derivations and no extra symbols in
representations are allowed. Second, the theory itself has progressed in the
direction of Minimality. Thus, the collection of different earlier transformations
is substituted by Affect Alpha. The constraints on transformations and
representations also avoid redundancy by not overlapping in a process that
yields the same output.

2. The principles of binding will be discussed in chapter (3).

MP is carried still further with specific proposals to reduce the levels of
representation to the two minimally basic "interface levels" of Phonetic Form
and Logical Form, which establish the instructions needed for the articulatory-
perceptual and conceptual-intentional performance systems. It has also
reduced X-bar theoretic relations to the primitives of "specifier", "head", and
"complement". Syntactic movement, moreover, has also been reduced to the
elementary processes of copy and delete. Chomsky, in his 1995 book, The
Minimalist Program, has advocated a view of language that is derivational and
involves transderivational economic conditions on derivations. There, he
viewed the syntactic component as a "near perfect" system associating
selections of lexical items to pairs of phonological and logical forms. Variation
in the syntactic component is essentially considered morphological in nature:
strong features on heads force movement of phrases to local domains for
checking and elimination (Johnson, 1996). In Minimalist accounts the
language faculty is considered as consisting of two parts:

1. A cognitive system to store the data (a computational system and a
lexicon).

2. Performance systems to use and access the data (the "external" systems,
Articulatory-Perceptual and Conceptual-Intentional, interacting with the
cognitive system at the two interface levels of PF and LF respectively).

According to Chomsky (1995), there is only one computational system for
human language and a lexicon. This computational system consists of two
operations: Merge and Attract/Move. The Minimalist Program is a theory of
Universal Grammar (UG) that considers a linguistic expression to be the most
economic product of the interface conditions. The economy conditions of such
a character 'select among convergent derivations' (Chomsky, 1995: 378). The
Minimalist Program does not presuppose the existence of any conditions
(such as the Projection Principle) which relate lexical properties and interface
levels (ibid: 220). Viewed this way, the economy of UG (for Chomsky) is
mainly about derivational operations of the computation (derivation) such as
Merge and Move. UG is not concerned with other operations of the cognitive
system, such as LI (Lexical Item) or FF (Formal Feature) selection for
numeration N, or other components of the system like the "Lexicon", but this
is neither obvious nor empirically verified yet (Lotfi, 2000).
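The operation Merge, at least in bare outline, can be caricatured in code: it takes two syntactic objects and builds a larger one whose label is projected from the head. The sketch below is our own minimal illustration, not Chomsky's formal definition, and the "ready-made DP" is a simplifying assumption:

```python
# Bare-bones Merge: combine two syntactic objects into a larger one whose
# label is projected from the head (a simplifying assumption of this sketch).

def merge(head, complement):
    """Merge two syntactic objects; the head projects its label."""
    return {"label": head["label"], "constituents": (head, complement)}

v = {"label": "V", "word": "chase"}
dp = {"label": "D", "word": "the criminal"}  # simplification: a ready-made DP
vp = merge(v, dp)
print(vp["label"])  # -> V: the verb projects, so the result is a verbal phrase
```

Repeated applications of such a binary operation, together with Move, are what the computational system uses to assemble a numeration of lexical items into a full expression.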

As its name denotes, MP is a research program rather than a syntactic theory.
The key assumption in MP is that grammars do not generate sentences but
rather they choose the most economic sentence from competitor expressions.
MP has emerged from GB, but it represents a radical departure from it.
Applying economic conditions to grammars and their operations in explaining
language structure is the main goal of Minimalism. The value of analyses
depends on how much they minimise the amount of structure and the length
of derivations. Chomsky, in his 1995 book, The Minimalist Program, has
proposed some principles to minimise the structure and length of derivations.
These include 'Procrastinate', which says that a constituent does not move
any earlier than necessary, and 'Greed', which says that a constituent does
not move to satisfy a constraint that properly applies to another constituent.
This version of grammar, in which transformational derivations compete in
forming grammatical sentences, represents a major shift in methodology. It is
worth mentioning that MP has stimulated research in the field even though
working out its details is still in its earliest stages. MP is still a very
auspicious framework for clarifying the realities of the language faculty in a
simple, natural, and economical way. The feature-based analysis in the study
of syntactic derivations has been a very positive development in the history of
the study of syntax, too.

In fact, Chomskyan Linguistics is in a very rapid period of theory evolution
right now. It has undergone four major revisions since Chomsky's seminal
work in the fifties. The original theory was Generative-Transformational
theory, then came Government & Binding, then Principles & Parameters, and
now Minimalism. Universal Grammar is still central as are the claims
regarding the "poverty of the stimulus", and the unlearnability of UG from the
data the child obtains. To do him justice, Chomsky has been the hand that
guided and shaped his framework regularly. He and others have published
many books and papers. Studies are still carried out to solve some
controversial and ambiguous issues. Most of these studies deal with language
from the same point of view (as a component of the brain). These works are
considered a great contribution in the development of the Chomskyan Theory
and each represents a change in the focus of syntactic theory as a whole.
CHAPTER (2)
THE ESSENCE OF CHOMSKYAN THEORY

2.1. Overview
The theory under discussion suffers from an undesirable confusion in its
terminology, because it lacks a consistent label that is acceptable to
everybody. It has been given many different labels. Some call it
Government & Binding (GB), others call it the Principles & Parameters
Approach (P&P or PPA henceforth), and there are also those who prefer to
identify it with rather different labels such as Minimality or the Minimalist
Program (MP). It must be noted that these labels are not synonymous,
because each one refers to a quite distinct version of the theory, and there is
no clear-cut boundary between the stages defined by them. Thus, Alec
Marantz (1995) was right in referring to it as "this latest version of Chomsky's
Principles and Parameters Approach". He implies by this that, at least in his
mind, "Minimality" is just a newer version of "PPA". The list in (1) below shows
the different labels used to identify this theory; all these labels (or names) are
objectionable for some reason.

(1)

a. 'The framework that is associated with Noam Chomsky and his students at
the Massachusetts Institute of Technology.'

b. Standard Theory (ST henceforth).

c. Revised, Extended Standard Theory (REST).

d. Government & Binding (GB).

e. The Principles & Parameters Approach (P&P or PPA).

f. Minimality/ Minimalist Program (MP).

We might assign the name 'Chomskyan Theory' to (a) above, but this would
be unacceptable to many because this theory is a result of cooperation
between many researchers. In fact, this theory is not tied to a single individual
or small group of individuals. While Chomsky was the guide and evaluator of
the new developments, research in this program is freewheeling and its
proponents frequently disagree among themselves, Chomsky included. So
what is the most common label used to identify this theory?

One label is 'Standard Theory', which offends many people because it entails
that the "standard" is set by Chomsky and his followers, and whatever
deviates from it is non-standard. Many "Standard Theoreticians" who talk as if
the "Standard Theory" were the only theory available reinforce this attitude.
Furthermore, the label "Standard Theory" refers to the entire history of
syntactic theory that is built by Chomsky and his students over several
decades. In fact, it also includes several fundamental sections, which have
been developed occasionally and differently. This framework began in the
mid-sixties with the application of Chomsky's 1965 book "Aspects of the
Theory of Syntax". The label "Standard Theory" refers specifically to the
theory presented there. It is also called the "Aspects Model".

Over the fifteen years that followed, the framework was revised to the extent
that its character changed fundamentally. By the early eighties, a rather
different framework had developed from the "Aspects Model". The publication
of Chomsky's 1979 Pisa lectures under the title "Lectures on Government
and Binding" presented this framework in an organised, coherent form for the
first time.

Unfortunately, the title of the book was given to the framework (Government &
Binding or GB). The Pisa lectures and the book were appropriately titled
because in them Chomsky concentrated on two particular sub-theories,
namely "government and binding"3, but the framework as a whole consists of
many such sub-theories, and "government" and "binding" are not the most
important ones in it. They are just those that Chomsky had more to say about
in 1979. Chomsky himself has expressed his regret for labelling the entire
theory with it, and his preference for the label "Revised, Extended Standard
Theory" (REST). As Steven Schäufele (1999) pointed out in his Synthinar
lectures, this label can be used in referring to the Chomskyan Theory.

3. Government is the relation between a syntactic head and its dependents,
whereas binding refers to the relationship between a pronoun or anaphor and
its antecedent.

During the second half of the eighties, the label "Principles & Parameters
Approach" (P&P or PPA) came into use among the proponents of the framework.
More recently, a new label emerged from the work published in the early 1990s,
namely the "Minimalist Program" (MP).

Finally, one has to know all these labels, because some proponents of the
framework are sensitive about the use of one label rather than another. When
writing research papers, it is also useful to give all the labels, state
explicitly what they denote, and then choose one and use it throughout.

2.2. Claims of the Theory


The proponents of any theory must state the basic assumptions on which it
rests. They must say, in a way that leaves no room for doubt, how the theory
deals with the phenomena to be observed. In other words, the tenets of their
theory must be stated clearly.

The original Chomskyan assumptions, which were first established in the
Aspects Model (i.e. Standard Theory), are summarised in (1) below:

(1) Claims of REST:

a. All syntactic relations can be described in terms of "Constituent Structure".

b. Constituents move from one part of a structure to another; consequently,
one structure is transformable into another. Movement transformations
determine the sequence of constituent structures involved in the generation of
a particular syntactic string. Each such sequence of constituent structures is
called a "derivation".

c. There is a set of well-formedness conditions resident in the syntactic
component of the grammar to which constituent structures, transformations,
and derivations must conform (Steven Schäufele, 1999).

The term "Constituent Structure" denotes a complex concept: it refers to the
way the words and other constituents of a string are organised, involving a
combination of two logical relations, namely "Dominance" and "Precedence".
If a constituent precedes another in linear order, the two are said to be in a
precedence relation. "Dominance" involves the notion of one constituent being
contained within another. "Tree diagrams" are used to represent dominance
relations. By way of illustration, consider the tree diagram in (2),
equivalent to the labelled bracketing [S [NP₁ [Det The] [N boy]] [VP [V killed] [NP₂ him]]].

The labels S, NP₁, VP, and NP₂ are called nodes. These nodes represent the
constituents of the string described by the diagram. A node that is linked to a
lower node by a line dominates that node. The S node in (2), for example,
dominates all the other nodes in the tree. NP₁ in turn immediately
dominates Det and N, and VP dominates the inflected verb killed. The nodes
occupied by the words him, the, boy and killed do not dominate anything;
they are called terminal nodes. We describe nodes that dominate a single
node as "non-branching", and nodes that dominate more than one node, like S
and NP₁, as "branching" (Schäufele, 1999; Ouhalla, 1999).

For clarification purposes, kinship terminology is used in talking about nodes.
If a node A immediately dominates another node B, then A is the mother of B,
and B is the daughter of A. If a set of nodes share the same mother, they
are sisters. It is assumed that any node may have at most one mother. The
Dominance and Precedence relations are transitive: if A dominates or precedes
B, and B dominates or precedes C, then A dominates or precedes C. Immediate
dominance, of course, is not transitive. Another observation about Dominance
and Precedence is that they are mutually exclusive: two nodes can be in either
a dominance or a precedence relation, but not in both. At the same time, the
two relations are exhaustive over their domain: any two nodes must be in
either a precedence or a dominance relation. As a result, in (2) for example,
all the daughters of NP₁ (i.e. Det and N) precede the verb killed, and
therefore the nodes NP₁ and the verb killed are in a precedence relation. This
definition implies that the branching lines which link mother nodes to
daughter nodes may not cross. This is a typical assumption of REST.
Thus Constituent Structure is describable in terms of Dominance and
Precedence relations: if we can draw a tree diagram of a string, that tree
diagram represents its constituent structure (ibid.).

There is also another relation called adjacency. Two nodes are adjacent if
there is no third node intervening between them. As we shall see in chapter
(3), the Adjacency relation is important for specific details of the theory.
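Although the theory itself is not computational, the Dominance, Precedence, and Adjacency relations just described can be modelled as a short program. The following Python sketch is my own illustration, not part of the framework; the node names follow the tree in (2), and the flat dictionary encoding is an assumption made for the example.

```python
# Illustrative sketch only: the tree in (2), "The boy killed him",
# with Dominance, Precedence, and Adjacency defined over its nodes.
# The dictionary maps each mother to its ordered daughters.

TREE = {
    "S":   ["NP1", "VP"],
    "NP1": ["Det", "N"],       # "the boy"
    "VP":  ["V", "NP2"],       # "killed him"
    "Det": [], "N": [], "V": [], "NP2": [],
}

def dominates(a, b):
    """A dominates B: B is a daughter of A, or of a node that A dominates."""
    return any(d == b or dominates(d, b) for d in TREE[a])

def terminals(node):
    """Left-to-right terminal nodes under `node` (terminals dominate nothing)."""
    return [node] if not TREE[node] else [t for d in TREE[node] for t in terminals(d)]

def precedes(a, b):
    """A precedes B: every terminal under A precedes every terminal under B."""
    order = terminals("S")
    return all(order.index(x) < order.index(y)
               for x in terminals(a) for y in terminals(b))

def adjacent(a, b):
    """A and B are adjacent: no terminal intervenes between them."""
    order = terminals("S")
    ia = [order.index(t) for t in terminals(a)]
    ib = [order.index(t) for t in terminals(b)]
    return max(ia) + 1 == min(ib) or max(ib) + 1 == min(ia)
```

On this encoding, S dominates every other node, both daughters of NP1 precede the verb (so NP1 precedes V), and no pair of nodes stands in both a dominance and a precedence relation, mirroring the mutual exclusivity noted above.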

In fact, the claim in (1a) is a fundamental assumption of Standard Theory that
is not shared by some other frameworks. What it means in Standard Theory is
that terms like "subject" and "object" are merely shorthand for longer but
more precise structural descriptions. To give an example, a "direct object" in
Standard Theory is an NP immediately dominated by a VP node, while a
"subject" is an NP immediately dominated by an S node.

What is critical now is the claim that the proper analysis of a syntactic
string may involve several constituent structures which share a common
skeleton and the same lexical items. The set of all the constituent structures
involved is called the derivation of that string. Note that a given item may
occupy different positions in different constituent structures of the
derivation.

In the Aspects Model, the derivation was a sequence of constituent structures
with a linear order imposed on these "levels". The Grammar in the Aspects
Model was divided into three components: the syntactic component, the
semantic component, and the phonological component. The syntactic component
includes two further sub-components: a base component and a transformational
component. The function of the semantic component was to relate the deep
structures generated by the base sub-component to the meanings of sentences.
The phonological component, on the other hand, relates the surface structures
generated by the transformational sub-component to the phonetic forms of
sentences. Each level is immediately preceded by at most one level. The
Grammar according to the Aspects Model, then, is as shown in (3).
The situation is even more complicated in the more recent "Minimalist" work
because the well-formedness conditions are somewhat different: they relate to
the derivation as a whole rather than to individual levels and transformations
within it. For instance, the economy conditions 'select among convergent
derivations' (Chomsky, 1995: 378). To explain the basic notion of economy,
consider the following sentences adapted from Neil Smith's (1999) book,
Chomsky: Ideas and Ideals (page 89).

(4) a. I think John saw a buffalo.

b. What do you think John saw?

c. Who do you think saw a buffalo?

d. Who do you think saw what?

(5) *What do you think who saw?

In questions that contain only one wh-word, that word can move to the front
of the sentence, as shown in (4b,c). When the sentence contains two wh-words,
as in (4d), only one can move to the front of the sentence while the other
remains in place. The question is why what in (5) cannot move to the front of
the sentence: why is (5) ungrammatical? The answer is that (4d) is more
economical than (5), because the wh-word who is nearer to Spec CP than what.
Although either constituent could in principle move, the "Shortest Movement"
condition permits only the constituent nearest to the specifier position of CP
to move. This generalises to other constructions, showing that some principle
of economy holds. Such constructions include inversion questions, in which the
first auxiliary has to move to the front of the sentence, as in the following
example.

(6) a. Mariam might have won.

b. Might Mariam have won?

c. *Have Mariam might won?

Here too the same economy condition applies, permitting only the nearest
auxiliary to move to the front of the clause.
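For illustration only, the effect of the "Shortest Movement" condition can be sketched as a simple selection over candidate movers. The distances below are stipulated for the example, and the function name is my own; this is not a formal device of the theory.

```python
# Illustrative sketch of the "Shortest Movement" economy condition:
# of several candidate movers, only the one closest to the landing site
# (here, Spec-CP) is licensed to move.

def eligible_mover(candidates):
    """candidates: list of (constituent, distance_to_landing_site) pairs.
    Economy licenses only the candidate with the shortest movement path."""
    return min(candidates, key=lambda c: c[1])[0]

# (4d) "Who do you think saw what?": the embedded subject 'who' is closer
# to Spec-CP than the embedded object 'what', so only 'who' may front,
# which is why (5) "*What do you think who saw?" is ruled out.
fronted_wh = eligible_mover([("who", 1), ("what", 2)])

# (6b) "Might Mariam have won?": only the nearest auxiliary may invert,
# ruling out (6c) "*Have Mariam might won?".
fronted_aux = eligible_mover([("might", 1), ("have", 2)])
```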

2.3. Transformations vs. Constraints


What are the differences between REST and ST? During the post-Aspects
development of the Standard Theory, the basic assumptions of the framework
were fundamentally revised, particularly in the transformational component.
Attention shifted from movement transformations to constraints on them. In
the Aspects Model, attention was given to identifying and defining
transformations4. The syntactician's job was to define what a specific
transformation does and under what conditions it applies. Transformations
were regarded as language-specific, which means that a child learning a
language has to learn a number of transformations.

During the seventies, while the formal language of the Aspects Model
continued to be used, the cutting edge of syntactic research, at least in the
Chomskyan school, was not to define specific transformations, but to identify
constraints on transformations and on the implicit power of the
4
There was a lot of talk in the sixties (and later) about specific transformations such as the Passive
Transformation, There-Insertion, and Dative Shift.
transformational component. It came to be realised that the formalism for
identifying transformations in the Aspects Model was not on the right track:
there was a huge range of imaginable transformations that could be formally
described but that did not seem to be attested in any known human language.
If the goal of grammatical theory is to explain how human language works,
then the Aspects formalism was missing something.

By the end of the seventies, it became clear that the theory could operate with
a single transformation known as "Move α"5. This transformation was
understood to mean "move any constituent anywhere", provided that no
constraints are violated in the operation. At the early stages of REST, it
proved impractical to reduce every motivated transformation to an instance of
"Move α". As a result, a broader alternative transformation known as "Affect
α"6 was proposed. In addition to moving constituents, "Affect α" can
rearrange the constituent structure without moving any constituent. For
example, Affect α can insert the dummy do to carry the inflectional features
of the verb in questions and negatives. Many proponents of REST, however,
admit only transformations that are describable as instantiations of "Move
α". These theorists regard the broader transformation "Affect α" as evidence
that the best analysis has not yet been discovered. The notion of "Affect α"
is indeed attractive, but any theory that can dispense with it is a stronger
one.

Up to the seventies at least, the main focus of grammatical theory was on
"rules", but attention in REST shifted to general principles of the grammar.
What happens is "Move α", and grammatical theory is concerned with the
general principles that delimit its scope of operation. If the
transformational component can be reduced to a single transformation, "Move
α", why can we not dispense with it altogether? Indeed, certain frameworks of
syntactic theory dispense with movement transformations altogether, such as
Generalised Phrase-Structure Grammar (henceforth GPSG) and Lexical-
Functional Grammar (hereafter LFG). But in Standard Theory the essential
point is to represent certain generalisations that any syntactic theory

5
α is understood as a variable that can stand for any syntactic constituent.
6
Affect α is interpreted as "do anything to anything".
of human languages must represent somehow. Non-transformational
frameworks use completely different means to capture such generalisations.
For instance, in an agentless passive clause like "The door was opened", the
constituent "the door" behaves in some respects like a "subject" and in others
like an "object". In Standard Theory, this fact is explained by claiming that it
is both the "subject" and the "direct object", but at different levels connected
by a movement transformation. Non-transformational frameworks like GPSG and
LFG, on the other hand, represent grammatical relations such as the
subjecthood of "the door" in constituent-structure tree diagrams, while
semantic relations are indicated in the verb's "argument structure"; the link
between the two is shown in the verb's lexical entry.

"Movement" in REST is regarded as the major factor in generation: within this
framework, movement drives all the processes of the derivation. In principle,
movement occurs as long as it does not violate any constraints. At first,
movement was assumed to operate freely, but during the late eighties Chomsky
and others advocated a principle according to which movement occurs only
when it is necessary. This principle is sometimes referred to as the "Least
Effort" or "Laziness" Principle. It has been likened to a political principle
called the "Orwellian Principle" (see Steven Schaufele, 1999), which says: "If
not forbidden, then obligatory; if not obligatory, then forbidden"7. The
"Laziness Principle" now serves as a fundamental principle of the Minimalist
Program. In syntactic theory, it typically means "move if you have to in order
to avoid ungrammaticality; otherwise stay put" (ibid.).

It is worth mentioning that in the eighties movement of phrases was motivated
by the constraints of the grammar, whereas now so-called strong features on
heads force movement of phrases to local domains for checking.

2.4. Levels of Representation

7
There is a good discussion of this principle in Steven Schäufele's Synthinar Lecturettes.
In the Aspects Model, a derivation could consist of any number of levels, each
level differing from the preceding one by a movement transformation. It was
believed that certain transformations might have to precede others in order to
achieve the desired result. Thus, in the Aspects Model a sentence like (1)
would be derived from the "deep structure" in (2) by a dozen transformations.
For instance, for "Equi-Deletion" to erase the NP "Sam" in the lowest clause,
the lowest clause would first have had to be passivized so as to get "Sam"
into the subject position from which it could be deleted.

Note that instead of the sign e, "∆" was used to identify the empty positions in
earlier versions of the theory.

With the introduction of "Move α", the motivation behind these assumptions
fell by the wayside: it became clear that there was no need to impose an order
on the application of transformations. As a consequence of the developments
during the seventies, the "deep structure" in (3) replaced the one in (2) above
as the structure underlying the sentence in (1).

In this case, independent constraints force the Passive transformation to
operate in both the lowest and the highest clauses. It no longer mattered
whether Equi-Deletion occurred before or after the passivization of the lowest
clause: if the lowest clause is not passivized, the derivation crashes, and if
it is passivized, then Equi-Deletion occurs automatically. There is no need to
impose an order on transformations. Constituents move to satisfy general
constraints, and only insofar as such constraints are not violated, so the
order in which transformations operate is unimportant. In REST, we do not
have to stipulate that certain transformations precede others; rather, the
ordering follows automatically. The point is to simplify the grammar of a
given language by reducing the number of specifications it has to make. To
borrow an analogy from computer science, the hardware can be complicated,
but what we need is very simple software (ibid.).

These advancements defined a different derivational organisation for the
framework. While in the Aspects Model a derivation was a linear sequence
comprising any number of levels, in REST it consisted of exactly four levels.

The lexicon lists the lexical items, together with their properties, that make
up the atomic units of the syntax. These properties include, for example, what
sort of object a given verb requires. "DS" means "deep structure"; "PF" and
"LF" stand for "phonological form" and "logical form" respectively; "SS" is
understood to stand for "Syntactic Structure". SS is central and connected
directly to all the other levels. PF and LF are called the interface levels.
PF is the interface with the phonology, where phonological rules apply to give
the string its phonological manifestation; it is similar to the surface
structure of the derivation in terms of the Aspects Model. LF is the interface
with the semantics, where meaning relationships of various kinds are
explicitly represented. DS is the interface level with the lexicon, where
lexical items are combined. In REST the DS represents the base-generated form
of a string. Transformational operations process the DS-representation to
satisfy certain constraints of the grammar, and the result is the
SS-representation. SS is not directly interpreted itself, but is converted
into PF and LF. "Move α" operates between any two levels, and nothing
important can be said about the order of its operations. The difference in its
character is due to the fact that different constraints operate at different
levels. For example, the "Theta Criterion" is relevant at DS, the "Case
Filter" at SS and PF, and the "Empty Category Principle" primarily at LF8.
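As an illustrative summary (my own, not the book's formalism), the REST organisation of levels and the constraint-to-level mapping just described can be written out as a small table in code. The dictionary encoding is an assumption made for the example; the labels are the standard ones used in the text.

```python
# Illustrative sketch of the REST "T-model": DS is the interface with the
# lexicon, SS is central, and SS feeds the two interface levels PF and LF.

FEEDS = {"Lexicon": ["DS"], "DS": ["SS"], "SS": ["PF", "LF"], "PF": [], "LF": []}

# Which constraints are checked at which levels, following the text:
APPLIES_AT = {
    "Theta Criterion": ["DS"],
    "Case Filter": ["SS", "PF"],
    "Empty Category Principle": ["LF"],
}

def constraints_at(level):
    """Constraints relevant at a given level of representation."""
    return sorted(name for name, levels in APPLIES_AT.items() if level in levels)
```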

The proponents of the Aspects Model claimed that the semantics of language
had to be encoded in the Deep Structure, the first level in the derivation
of any sentence. They hypothesised that a speaker or writer generates a deep
structure that represents the intended meaning, then performs certain
operations on that deep structure to produce the final surface sentence that
he pronounces or writes. The listener or reader, on the other hand, receives
the surface sentence and applies the reverse operations to decode the abstract
deep structure and interpret it. During the sixties and seventies, linguists
recognised that it was not plausible to relate all the semantics of language
to only one derivational level. The evidence came from the ambiguity of
sentences whose surface structure could have two possible but distinct
meanings. Some ambiguities were handled properly within the Aspects Model.
For example, the sentence in (4) has two possible meanings, each of which can
be derived from a different deep structure, as shown in (5).

(4) The dog saw a cat running in the farm.

(5) a. [ The dog [ saw [ a cat running in the farm ] ] ]

b. [ [ The dog [ running in the farm ] ] saw a cat ]

Therefore, the Aspects Model explained the ambiguity in (4) by assigning it
two different deep structures. It did so because the ambiguity arises from
uncertainty about the grammatical relations between the constituents "dog",
"cat", and "running in the farm" (particularly, which one is the subject of
the non-finite VP). Since the Standard Theory encodes grammatical relations in
the Deep Structure, the ambiguity in meaning must be caused by a DS conflict
(ibid.). A more complex ambiguity results

8
These constraints will be explained in the next chapter.
from the relative scope of quantifiers. By way of illustration, consider the
sentence in (6).

(6) Faraj loves everybody.

In orthodox REST, the meaning of sentences that contain quantifiers such as
"everybody" is processed at the LF level, so the example in (6) above has to
be transformed into (7) at LF. This clarifies the distinction between LF and
DS, because the deep structure of (6) is supposed to be as in (8).

Now what about a sentence like (9)? It has either of the meanings in (10).

(9) Everybody loves somebody.

(10) a. There exists some person y such that for every person x, x loves y.

b. For every person x, there exists some person y such that x loves y.

The interpretation in (10b) allows a different beloved for each person: Faraj
loves Basma, Mohamed loves his wife, Qais loves Lyla, Ahmed loves Majda,
Tariq loves Fatima, and Nabeel loves himself. The statement in (10b) is true
if we can find pairs like this for every single human being. By contrast, the
meaning in (10a) says that there is some special human being (call it Ala)
who is loved by everybody: Faraj loves Ala, Mohamed loves Ala, Basma loves
Ala, etc.
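The two readings in (10) can also be stated in first-order notation (my formalisation, not in the original; the predicate name loves is chosen for the example):

```latex
% (10a): wide-scope existential -- one special person loved by everyone
\exists y\, \forall x\, \mathit{loves}(x, y)

% (10b): wide-scope universal -- a possibly different beloved for each lover
\forall x\, \exists y\, \mathit{loves}(x, y)
```

The quantifier written first takes scope over the one written second, which is exactly the height difference in the LF trees in (11).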

In LF theory, the two meanings in (10a-b) are represented by the structures
shown in (11a-b) respectively. For the meaning in (10b), "everybody" must
occur higher in the tree than "somebody": we say that "everybody" has scope
over "somebody". The reverse is true for the interpretation in (10a).
Note that the interpretations shown in (10) and diagrammed in (11) do not
involve grammatical relations or subcategorisation and therefore cannot be
represented at DS. It is clear from the representations in (7) and (11) that
LF is generated by the application of Move α to SS, whereas DS is the original
structure of the sentence before any operation applies. Thus, there must be a
distinction between LF and DS.

The point of all this is that in REST two levels of representation are
involved in the interpretation of sentences. Thematic relations must be
represented at DS; for instance, checking what a verb or any other constituent
subcategorises for is done at DS, which is concerned only with lexical
semantics. LF, however, represents scope relations.

2.5. The History of Derivations


Trace Theory was an important development of the early seventies. It is the
hypothesis that, whenever anything moves in the course of a derivation, it leaves a
"trace"9 in its original place. The trace left behind is an abstract copy of the moved
constituent but without any phonetic realisation. Traces are identified through co-
indexation: a trace is marked with the same index-letter10 as the moved constituent
whose movement created it. Trace Theory is mentioned here to lead into the
discussion of a complex constraint on Move α, called the "Structure-Preservation
Constraint" or SPC. Joseph Emonds (1976) had argued for a typology of
transformations based on their field of operation and on a constraint

9
A 'trace' is typically represented by a lower case 't' for 'trace' or 'e' for 'empty'.
10
The letter 'i' is used for the index; if more than one index is needed, the letters after 'i' in the
alphabet are used.
based on this typology. As shown in the list below, his typology classifies the various
kinds of transformations into three types.

(1) Root Transformations.

These transformations apply only to "root" (independent) clauses.

(2) Structure-Preserving Transformations.

This type of transformation leaves the hierarchical organisation of constituent
structure exactly as it finds it (i.e. it does not create, destroy, or rearrange
constituent structure).

(3) Local Transformations.

This kind of transformation applies to precisely two adjacent constituents and is
subject only to conditions within those two constituents. The constituents to which it
applies need not be sisters, but merely adjacent in linear order. There must be a
c-command relationship between the affected constituents, and at least one of them
must not be a maximal projection.

The Structure-Preservation Constraint (SPC) requires that any instantiation of Move
α fall into one of these three types. This means that unless a movement
transformation is a Root or a Local Transformation, it may not change the basic
constituent structure in any way. To give an example, let us review the history of the
Passive Transformation. In early Transformational Grammar, a passive sentence like
(1) was considered to derive from a deep structure similar to its active counterpart in
(2). An optional transformation derives the passive (1) by interchanging the subject
and object NPs: it makes the subject the complement of a PP, and provides the
passive morphology and the auxiliary for the verb.

(1) The ball was kicked by Ali.

(2) Ali kicked the ball.

The difference in the Aspects Model was that the PP already exists at Deep
Structure, but without a complement; the Deep Structure of (1) would then be as in
(3). On the Aspects analysis, the PP in the passive sentence (1) is already
there, not created by the Passive Transformation. In this way, the Aspects
Passive Transformation is said to be "structure-preserving". Since any
embedded clause can be passivized, the Passive Transformation cannot be
considered a Root Transformation. It also involves two NPs that are not
adjacent, so it cannot be a Local Transformation either. Thus, according to
the SPC, it must be a Structure-Preserving one. Assuming that the subject NP
"Ali" moved into the PP in (3), the result would be a structure like that in
(4). The moved NP "Ali" leaves a trace in the subject position, coindexed by
the letter "i" with the moved NP "Ali". But since traces cannot be erased, it
is impossible for the direct object NP "the ball" to move into this position:
to be licensed as a new subject, it would have to change the constituent
structure, which violates the SPC.

The point of all this is that in REST the DS representation of a passive
sentence like (1) is necessarily the one in (5)11. The agent "Ali" is
base-generated as the complement of a "by"-phrase, and the subject position
is empty; the direct object "the ball" can therefore move into it without
creating any new structure.
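The derivation just described can be sketched schematically. The following Python fragment is only an illustration of Trace Theory and the empty-subject analysis, not the book's own formalism; the flat dictionary representation of the clause is a deliberate simplification.

```python
# Illustrative sketch of the REST passive derivation in (5): the object
# moves into the base-generated empty subject position, leaving a coindexed
# trace behind, so no new structure is created.

def move_to_subject(structure, source, index="i"):
    """Move the constituent in `source` into the empty subject slot,
    leaving behind a trace coindexed with the moved constituent."""
    assert structure["subject"] == "e", "the landing site must be empty"
    moved = structure[source]
    structure[source] = "t_" + index        # the trace: no phonetic content
    structure["subject"] = (moved, index)   # coindexed with its trace
    return structure

# DS of "The ball was kicked by Ali": empty subject, agent in the by-phrase.
ds = {"subject": "e", "verb": "kicked", "object": "the ball", "by": "Ali"}
ss = move_to_subject(dict(ds), "object")
```

Because the subject slot was empty at DS, the movement rearranges nothing: the trace occupies the old object position and the "by"-phrase is untouched.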

In REST, as will become clear in the next chapter, this analysis is supported
by considerations of Theta and Case Theory. In fact, there is an exception to
the definition of a "Structure-Preserving" Transformation, namely what is
called "Adjunction". Adjunction is a recursive process that integrates adjuncts
(optional constituents) into the syntactic structure. It targets the X1
projection, as illustrated by the box in (6a).

11
The question of the passive auxiliary is left out because it is irrelevant to the discussion.
Then it makes a copy of the target node right above the original one, as in
(6b). Finally, it attaches the adjunct phrase as a daughter of the newly
created node, as in (6c). Adjunction seems to create new structure, since it
creates a new node, and therefore might appear not to be "structure-
preserving". But because adjunction is a recursive process, and because the
new V' node immediately dominates the original V' node, it is considered a
"structure-preserving" transformation. This understanding is made explicit in
some works published in the mid-eighties (see, for example, Chomsky, 1986b).
But what about "Local Transformations"? Steven Schäufele (1999) argued that
they occur only between SS and PF (e.g. Subject-AUX Inversion), as a
consequence of the theoretical desirability of restraining the power of Move α.
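The adjunction operation described above (copy the target node, then attach the adjunct under the copy) can be sketched as a recursive function over tuple-encoded trees. This is my own illustration; the tree encoding and node labels are assumptions made for the example.

```python
# Illustrative sketch of Chomsky-adjunction to V': copy the target node
# above the original and attach the adjunct as a daughter of the copy,
# so that the new V' immediately dominates the old one.

def adjoin(tree, target, adjunct):
    """tree: (label, daughters), where leaves are plain strings. Adjoins
    `adjunct` to each node labelled `target` (here there is exactly one)."""
    label, daughters = tree
    if label == target:
        # the copied node dominates both the original node and the adjunct
        return (label, [tree, adjunct])
    return (label, [adjoin(d, target, adjunct) if isinstance(d, tuple) else d
                    for d in daughters])

vp = ("VP", [("V'", ["kicked", "the ball"])])
adjoined = adjoin(vp, "V'", ("PP", ["with", "force"]))
```

Because the copy immediately dominates the original V', the hierarchical relations inside the original structure are left intact, which is why adjunction still counts as structure-preserving.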

Finally, it is worth mentioning that Trace Theory and the SPC together ensure
that the history of a derivation can always be recovered. As will become clear
in the next chapter, the framework has some intricate complications that might
obscure the derivational history, but they are comparatively few. The result
is that even at LF we can reconstruct the base DS representation of a string.
This is in fact crucial to the operation of the framework, and it is
attractive to many theorists.
CHAPTER (3)
THE MODULES OF THE GRAMMAR

3.1. Overview
As we have seen in chapter (2), attention in the Chomskyan Theory shifted
from transformations to constraints on them. These constraints are grouped
into what are often called "modules": semi-autonomous systems consisting of
basic principles and constraints on them. Each module is relevant at certain
levels of a derivation, and the derivation of a grammatical string involves
the interaction of these different modules. To be considered grammatical, a
string has to be approved by all the modules. Note that Chomsky here takes the
grammar to be both autonomous and "internally highly modularised, with
separate subsystems of principles governing sound, meaning, structure and
interpretation of linguistic expression . . ." (Chomsky, 1991). He maintains
that linguistic theory is concerned with a specific mental faculty operating
in the brain, not with external phenomena such as linguistic behaviour. This
is reflected in his 'competence-performance' dichotomy.

Grammar has been argued to be autonomous and modular. This is supported by
the fact that people may damage one of their brain faculties in an accident
yet still be able to use the others efficiently. For example, a person whose
brain was injured in an accident might lose the ability to speak but
nevertheless perform well in solving very complicated mathematical operations
or in using aerodynamics to design a high-tech aircraft. Such facts led to the
conclusion that the human brain consists of independent (autonomous) faculties
and therefore has a modular structure. Although the mind includes distinct
modules responsible for different abilities, using language requires the
interaction of these independent faculties.

The language faculty inside the brain is also modular in the sense that it is
highly structured (see McGilvray, 1999: 3-4, and Smith, 1999: 7-21). It
consists of separate sub-faculties responsible for language acquisition,
production, and comprehension. In this way, the structure of the language
faculty is similar to that of the brain as a whole, which consists of various
faculties responsible for different senses such as vision, smell, hearing and
touch. Although these faculties are separate components of the brain, there is
no reason to assume that they do not interact. Likewise, the separate
sub-faculties responsible for language interact to yield the expected effect
(ibid.).

In each of the subsequent sections, I will explain one of these sub-faculties
(modules) as viewed in the PPA framework. In this section, I will restrict
myself to defining them.

(1) Ẋ (X-Bar) Theory is the theory of phrase structure. It identifies the shared
characteristics in the internal structure of the different kinds of phrases and
the relations between them. It applies primarily at DS.

(2) Ө (Theta) Theory is concerned with the assignment of semantic roles to
NPs and with the fulfilment of subcategorisation requirements. It applies at
DS, but a formalisation of the Structure Preservation Constraint (SPC) known
as the 'Projection Principle' ensures that it also applies at all levels. The
Projection Principle (see V. J. Cook and Mark Newson, 1988: 49) "requires the
syntax to take into account the specifications for each lexical item given in
its entry in the lexicon". For example, if a verb requires an object, then
this object must be present at all levels. When movement happens, as in (1),
the trace left behind satisfies the Projection Principle.

(3) Government Theory defines formal relations between constituents in a
constituent structure represented in a tree diagram. Government is "a general
principle by which elements are assigned 'cases' by other elements that c-
command them" (Matthews, 1997: 149). Other definitions of government vary,
but in the sentence "Mohamed wrote the letter", the NP 'Mohamed' is both
c-commanded and governed by 'INFL', which assigns it 'nominative case'.
Similarly, the second NP 'the letter' is governed by the verb, which assigns
it 'accusative case'. In fact, Government Theory is a resource for the other
modules rather than a module in its own right: it provides them with the tools
needed to carry out their job.

(4) Case Theory is "concerned with the assignment of abstract case on the
basis of relations of government" (ibid.: 47-48). It is relevant at SS and PF. It
decides whether a given NP is in a legitimate slot in constituent structure or
not.

(5) Binding Theory is concerned with the coreferentiality of distinct syntactic
constituents like reflexives, pronominals, and empty categories. It applies at
SS, but mainly at LF. Binding Theory describes how a constituent determines
the reference of another in the same context. For example, in the sentence
"Abd-Alkareem thinks Basma likes him" the pronoun "him" cannot refer to
"Basma". The function of this module can be briefly illustrated by the following
sentences:

(2) He likes Ahmed.

(3) He remembers that Ali likes Ahmed.


The noun phrase represented by the pronoun "he" is fully referential; that is, it
cannot be bound by any local or non-local phrase such as "Ahmed" in either
sentence.

3.2. Ẋ Theory

3.2.1. Introduction
Ẋ Theory was first introduced with the publication of Noam Chomsky's paper
"Remarks on Nominalisation" in 1970. In this paper Chomsky identified nodes
like NP, VP, etc. with sets of feature specifications common to various nodes.
Thus, a lexical noun like "book" and a complex NP like "the last book that
Faraj wrote several months ago" share similar "nominality" features. Both are
referential and they can be inserted in similar positions. Similarly, a lexical
verb such as "speak" has certain qualities in common with VPs like "speak the
speech trippingly upon the tongue".

On the contrary, lexical nouns and verbs like "book" and "speak" have some
features that other NPs and VPs do not share with them and vice versa. The
node labels NP and VP are therefore classified into two types : a "Category
Type" (N or V) and a "Projection Level" (NP or VP). The projection level is
usually referred to as bar level and it has a numerical value. Lexical items,
however, have a zero bar level. Higher bar levels are identified by one or
more horizontal lines above the category label (as in Ẋ), by primes after it (N',
V''), or by a numeral after the category label: N1, V2, etc.12

In the seventies there was a considerable debate on the number of bar levels.
Although Chomsky thought that two levels above the lexical are enough,
Peggy Speas (1990) asserted that there is no good reason to have more than
one bar level. Whatever the number of bar levels is, a node that is given the
maximal value is called a "maximal projection". Therefore, nodes like NP and
VP are called "maximal projections".

12
I will use this technique to refer to specific bar levels.
3.2.2. Universal Base and Functional Heads
The basic assumption of X-bar Theory as introduced in the seventies and
eighties was the generic pattern given to the internal structure of any maximal
projection "XP". A maximal projection "XP" has a lexical head X0 and other
dependent constituents as its daughters. These dependents are classified into
three kinds: specifiers, complements, and adjuncts. All these dependents bear
different relations to the head. The difference between specifiers and
complements lies in the distinction between bar-1 and bar-2 projections. A
maximal projection XP (i.e. bar-2) has two daughters: a head X1 (i.e. bar-1)
and a specifier. The projection X1 dominates the lexical head X0 and its
complements. Both complements and adjuncts differ only in that complements
are sisters of the head whereas adjuncts are sisters of the X1 projection.
Adjuncts are similar to specifiers because the two are sisters of the X1
projection. But specifiers differ from adjuncts because they are daughters of
the maximal projection whereas adjuncts are daughters of the X1 projection.
The table below summarizes the different syntactic relations of the three
dependent constituents and their relations with the head.

Dependent     Daughter of        Sister of
Specifier     XP                 X1
Complement    X1                 the head
Adjunct       X1 (the copy)      X1

Specifiers are usually articles, determiners and degree adverbs such as "very" and
"too" in adjective phrases (APs) like "very difficult" or "too hard". Complements,
however, are the constituents for which a verb or a preposition subcategorizes13. Before 1980
complements were considered to be maximal projections but specifiers were either
maximal projections (e.g. genitive NPs modifying larger NPs, as in "[ [Lockerbie's]

13
We will discuss this issue in the section on Theta Theory.
case]") or X0 lexical heads (e.g. "the" in "the case") or morphological elements such
as the prefix representing definiteness in Arabic. In the late 1980s some linguists tried to
re-evaluate this controversial issue. Due to the work done by Steven Abney and
others (see S. Abney, 1987), specifiers are now regarded as being maximal
projections. Abney's work also redefined "noun phrases" as "determiner phrases"
(DPs). The head in a phrase like "the government" is the article "the" and
"government" is just its complement. The reason behind mentioning Abney's
argument here is due to the fact that some linguists talk about "DPs" instead of
"NPs".

All category types are supposed to have the same projection structure discussed
above. The constraints of Theta and Case theories permit different categories to take
distinct kinds of complements. But what about specifiers? If Abney's argument is not
right, then the specifiers of NPs are clearly those italicised in (1).

(1) a. a book

b. the book

c. some teachers

d. the leader's speech

PPs and APs also have adverbial modifiers as specifiers like those in (2) and (3).
VPs too have adverbial modifiers as in (4), but unlike the case with APs and PPs
they are not called specifiers; the adverbial modifiers of VPs are called adjuncts.
Adjuncts are optional arguments integrated into the syntactic structure by a recursive
process called Adjunction. Adjunction targets the X1 projection as indicated by the
box in (4a). Then it makes a copy of the target node right above the original one as in
(4b). Finally it attaches the adjunct phrase as a daughter of the newly created node as
in (4c).

(2) far under the table.

(3) a. very hot

b. not as early as expected
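The three-step Adjunction process just described (target the X1 projection, make a copy of the target node right above the original, and attach the adjunct as a daughter of the copy) can be sketched as a simple tree operation. The sketch below is only an illustration: the Node class, the labels, and the "read the book quickly" example are my own, not part of the formal theory.

```python
# Minimal sketch of Adjunction as a tree operation.
# The Node class and the example labels are illustrative assumptions.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def __repr__(self):
        if not self.children:
            return self.label
        return f"[{self.label} {' '.join(map(repr, self.children))}]"

def adjoin(target, adjunct):
    """Chomsky-adjoin `adjunct` to `target` (an X1 projection):
    copy the target node right above the original, then attach the
    adjunct phrase as a daughter of the newly created node."""
    return Node(target.label, [target, adjunct])

# V1 for "read the book", adjoining the adverb phrase "quickly"
v1 = Node("V1", [Node("V0", [Node("read")]), Node("NP", [Node("the book")])])
adv = Node("AdvP", [Node("quickly")])
result = adjoin(v1, adv)
print(result)  # → [V1 [V1 [V0 read] [NP the book]] [AdvP quickly]]
```

Note that the process is recursive: because the output again contains an X1 node, a further adjunct could be attached in the same way.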


During the past sixteen years, linguists have hypothesised that the subject is actually
the specifier of VP. This hypothesis is labelled with three different names 14.
One label is "Internal Subject Hypothesis" (Schaufele, 1999) which is self-
evident. The assumption is that the subject NP is base-generated under VP
and then moves to the specifier position of "S" due to Case Theory
constraints. Another label is the "Lexical Clause Hypothesis" (ibid.). The claim
here is that a syntactic clause has two parts that can be identified easily at
DS: a "lexical clause" and a "functional clause". The lexical clause includes all
the lexical items and the VP node dominates it. The main verb is the head and
the subject NP is in the specifier position, but the other arguments of the verb
are in complement positions. Higher in the tree is the "functional clause"
which is made up of a series of "functional heads". The last name is the
"Universal Base Hypothesis" (ibid.). Linguists here deduced that all lexical
heads have the same X-bar structure. Every lexical head has at least a
specifier and complement(s) related to it. It should be noted that all the above
mentioned hypotheses are similar because they aspire to prove the same
thing.
14
At least that is as many as I have encountered so far.
After the developments in the eighties this structure was also used in the
description of the so-called "functional categories". Functional categories were
introduced from the beginning in the Standard Theory. A node "AUX" was
assumed in the Aspects Model, which was considered to dominate auxiliaries.
It was assumed very early in the Aspects Model that something was
necessary for constituent structure to account for the base-generation of the
verbal inflections in languages that do not use auxiliaries 15. As the eighties
progressed, the node "AUX" was replaced by "INFL". This node was
considered to exist in all languages. It dominates all the abstract features that
license the verbal inflectional morphemes. "COMP" is also a functional head
that was introduced quite early in the history of the framework. The version of
theory which was developed during the 70's posited a "COMP" that might be a
base position for overt complementizers ("that", "if", "for") as in (5) or covert
complementizers as in (6) or a landing-site for WH-elements as in (7).

(5) a. We know that democracy means popular rule not popular expression.

b. I would be very grateful if you help me.

c. For Faraj to be working every day would be boring.

(6) a. We know 0 representation is a falsification of democracy.

b. I thought 0 Asma would be here this afternoon.

(7) a. Whom did you find at the party?

b. He introduced me to his boss whom I had not met before.

c. Whose house did you say was made of glass?

d. I saw something in the paper which would interest you.

In 1986 Chomsky applied the X-bar structure of lexical heads to "INFL" and
"COMP" (Chomsky, 1986b). Therefore, both "INFL" and "COMP" were supposed
15
See for example the Aspects Model's "affix-hopping", but note that it is not used nowadays. In the
minimalist framework a word is supposed to carry its features when it enters derivations. The
function of a head such as INFL is not to supply the inflectional morphemes but rather to license them on
the lexical items via checking. This implies that a verb is already inflected with tense/agreement
markers when it merges with another syntactic object, say, an object DP. Then it moves to INFL either
at SS or at LF so as to check its inflections against the abstract features found there.
to license full projections. "INFL" subcategorizes for VP as its complement and
the surface position of the subject as its specifier. "IP", the maximal projection
of "INFL" is what previously was called "S". "COMP", however, has an "IP"
complement. "COMP" or "C" is the place for complementizers as in (5-6), and
its specifier position is the landing-site of WH-movement as in (7). "CP" is the
maximal projection for "COMP". In the 70's "CP" was referred to as "S'". There
have been important disputes over whether every "IP" must be a complement
of a "CP" or not. In fact the presence of a "CP" node is assumed whenever it
is required.

By the end of the 1980s "INFL" was considered to be overly simplified. At the
present day there is a functional node for every verbal inflection. So we have
categories for Tense (TNS), and Aspect (ASP) as well as "AGR" for
agreement features. It is very common to differentiate between "AGRs" and
"AGRo" for subject agreement and object agreement respectively. Steven
Schäufele (1999) argued that Mood could have its own node too. All these
functional heads occur in the position that was first occupied by "INFL". Thus,
instead of working within the structure Chomsky postulated in 1986,
represented here in (8a), we have to work within the structure in (8b). These
functional heads need not always be ordered as in (8b). Their order is
dependent on the language in question: for instance, whether TNSP is the
complement of AGRsP or vice versa. The strength-weakness distinction (see
Chapters 16, 17 and 19 in Ouhala, 1999) assumed in dealing with the
structure of languages plays a central role in ordering the functional
categories, particularly TnsP and AgrP. (8b) therefore is just a hypothetical
structure that can be altered (rearranged) according to the strength or
weakness of the functional categories peculiar to the language in question. It
is clear from this structure that everything above "VP" at "DS" is functional
without any phonological manifestation16. "VP" dominates the lexical clause.
Hence the "Lexical Clause Hypothesis" which differentiates between the
"lexical clause" (i.e. VP) and the "functional clause" (i.e. the projection above
VP).
16
Here we ignored the assumption that there is a NEGP between VP and AGRS, which contains clausal
negators such as "not" in English. COMP is also ignored because it usually dominates lexical items as
in (5).
The hypothesis that increased the number of functional categories is
sometimes called the "Split Infl Hypothesis", and Steven Schäufele (1999) has
at least once referred to it as the "Exploded Infl Hypothesis". In this
hypothesis the node TNS has to be linked to Tense features, the ASP with
aspectual features, and everything related to agreement has to be correlated
with the appropriate AGR nodes. These assumptions are by no means arbitrary
stipulations. Firstly, it has long been assumed that tense and agreement
features are associated with "AUX/INFL", a node separate from the verb in
pre-GB theory. This is contrary to a lexicalist account of tense/agreement
features (e.g. Chomsky, 1995) in which these features have been affixed to a
word when it enters syntactic derivations.

3.2.3 The Inadequacy of PS Rules


The fact that generalisations could be made across category-types has
spurred progress in the development of X-bar theory. One of these
generalisations is related to the order of constituents. A grammar of English,
for example, does not have to specify that NP complements follow head verbs
or prepositions whereas PP complements follow head nouns. To reach
adequacy, a grammar of English has only to ensure that complements follow
heads (Ouhala, 1999). The specifications for the kind of categories that can
be used as specifiers or complements should be stated in the lexical entries of
heads and the generalisations of Theta Theory (among others see Schäufele,
1999 and Ouhala, 1999).

The objective of all this was to get rid of all the phrase-structure rules of the
pre-1980 generative theories. It was assumed that in the description of any
language, its grammar should only specify the order of heads, specifiers, and
complements. Other details are left to either UG or independent constraints.
From the discussion above we see why this framework is sometimes referred
to as the "Principles and Parameters Approach". Because X-bar Theory is a
crucial aspect of UG, it must be innate. But whether heads precede
complements or follow them is dependent on the language or the category
type in question. It is variable from language to language. Therefore, it is a
parameter of UG that permits variation in languages and so the language
learner has to acquire it from the data available to him. To borrow an analogy
from computer science, imagine the internal grammar to be something like
the "Windows Operating System" with several language choices when you
start its setup program. If you choose Arabic, the Windows' interface will be in
Arabic language, but if you choose English, you will have an English
Windows' interface. The internal code of the Windows program represents
UG, and the language choices represent the parameters. Just as the goal of
Microsoft is to enable its Windows program to work with interfaces
in all languages, the main goal of research in PPA has been to recognize and
minimize the number of parameters, while preserving descriptive adequacy over
the many attested human languages.
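Continuing the analogy, the head parameter can be pictured as a single switch: the hierarchical X-bar skeleton is fixed by UG, while the parameter setting fixes the linear order of head and complement. A minimal sketch (the function and the toy English-like and Japanese-like examples are my own illustration, not data from the text):

```python
def linearize(head, complement, head_initial=True):
    """Order a head and its complement according to the
    head-direction parameter; the hierarchical relation
    (head plus complement forming one projection) is constant."""
    return [head, complement] if head_initial else [complement, head]

# English-like setting: heads precede their complements
print(linearize("read", "the book", head_initial=True))   # ['read', 'the book']
# Japanese-like setting: heads follow their complements
print(linearize("yonda", "hon-o", head_initial=False))    # ['hon-o', 'yonda']
```

The learner's task, on this picture, is only to set the switch from the data; the rest of the structure comes for free from UG.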

In conclusion, the objective of work in REST on X-bar Theory was to eliminate
the phrase-structure rules and to prove that all the facts they explain can be
derived from the general principles of the grammar. X-bar Theory not only
exists but also remains the backbone of the computational aspect of syntax in
the Chomskyan model. Phrase structure rules were eliminated, but this was
made possible by means of a separation of the computational aspect of
syntax (X-bar theory) from the lexical component (Chomsky, 1987: 25). This
was considered a major development.

3.3. Ө Theory

3.3.1. Introduction
In our discussion of X-bar theory we said that DS represents thematic
relations. In REST, thematic roles are assigned to nominal constituents by
verbs and other constituents which have the ability to license them (ibid.).

The continuous use of the Greek letter Ө (theta) in REST work as an
abbreviation for the word 'thematic' made it possible to talk about 'Ө-roles' and
'Ө-relations' instead of talking about 'thematic roles' and 'thematic relations'.
Ө-roles are semantic roles like Agent, Theme, Goal, Experiencer, etc.
Although REST does not concentrate on the kind of Ө-role assigned to a
given nominal, it focuses on the fact that nominals bear Ө-roles. Ө-Theory
presupposes an essential distinction among syntactic categories. The crucial
asymmetry has to do with the categorial feature ±N. Nominals are +N, so they
have to be assigned Ө-roles, whereas verbs and prepositions, being –N, have
the ability to assign Ө-roles (Ouhala, 1994).

3.3.2. The Ө-Criterion


The fundamental basis of Ө-Theory is the Ө-Criterion. The Ө-Criterion
establishes a one to one correspondence between the set of NPs and Ө-roles
found in any clause, such that each NP receives only one Ө-role and each Ө-
role is assigned to only one NP (ibid.). The variation in the number of Ө-roles
from clause to clause is dependent on the number of nominals in any
given clause. It is not obligatory for clauses to manifest all their possible Ө-
roles because some of them may be optional. For example, in a sentence like
(1a), the verb "walk" has only an Agent Ө-role to assign. Similarly, the verb
"like" assigns two Ө-roles ('Experiencer' and 'Theme') as shown in (1b), and a
verb like "give" licenses three Ө-roles ('Agent', 'Beneficiary', and 'Theme') as in
(1c).

(1) a. She walked.

b. He likes Pepsi.

c. I gave Basma a book.
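The one-to-one pairing that the Ө-Criterion demands can be checked mechanically: the number of argument NPs must match the number of Ө-roles the verb assigns, and each role goes to exactly one NP. A hedged sketch using toy role inventories for the three verbs in (1) (the dictionary representation and the function name are my own):

```python
# Toy Ө-role inventories for the verbs in (1); illustrative assumptions.
THETA_ROLES = {
    "walk": ["Agent"],
    "like": ["Experiencer", "Theme"],
    "give": ["Agent", "Beneficiary", "Theme"],
}

def satisfies_theta_criterion(verb, argument_nps):
    """Each NP receives exactly one Ө-role and each Ө-role is
    assigned to exactly one NP, so the pairing is well-formed
    iff the two counts match."""
    return len(argument_nps) == len(THETA_ROLES[verb])

print(satisfies_theta_criterion("walk", ["she"]))          # True, as in (1a)
print(satisfies_theta_criterion("like", ["he", "Pepsi"]))  # True, as in (1b)
print(satisfies_theta_criterion("give", ["I", "Basma"]))   # False: one role unassigned
```

The sketch deliberately ignores adjuncts and optional arguments, which are taken up below.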

'Ө-grid' is a term used to talk about the list of Ө-roles which a given verb can
assign17. The nominal assigned one of those Ө-roles is then considered to fill
the slot in the verb's Ө-grid (Schäufele, 1999). The Ө-grid of a given verb is
saturated whenever the roles are filled. The assignment process of Ө-roles to
the suitable constituents is called "saturation". Saturation is a semantic
condition imposed on linguistic expressions. Prepositions assign a single Ө-

17
The term "Ө-grid" is a synonym for the terms "valency" and "subcategorisation frame".
role within the PP headed by that preposition. NP specifiers and
complements receive their Ө-roles inside the NP that contains them. Every
VP includes all the arguments to which its head verb assigns Ө-roles. An
exceptional case in VPs is their specifier, but if we take into account the
Internal Subject Hypothesis, then even VP specifiers must receive their Ө-
roles from the head verb, at least at DS which is the working domain of Ө-
Theory. Another way of assigning the external Ө-role to the specifier of VP is
through the V-bar projection. The verb together with its complements is said
to assign an external Ө-role to the specifier of VP (ibid.).

I mentioned above that the number of Ө-roles from clause to clause is
dependent on the number of nominals in any given clause and I also stated
that "walk" assigns only one Ө-role, but what about the sentence in (2),
which has two NPs?

(2) I walked to the shop.

Here adjuncts enter the scene. The verb "walked" is the head and because it
assigns only a single Ө-role, it must have only a single argument 18. An adjunct
is a constituent that is neither a head nor an argument of the head. The string
"to the shop" in (2) is an adjunct. It is in the form of a PP. The head of that PP is
"to" which also has a Ө-grid and can assign a Ө-role to its complement ('Goal'
or 'Path'). Thus, the NP "shop" does not receive its Ө-role from the verb
"walk", but it is the preposition that assigns it. Therefore, sentence (2)
contains two NPs and two Ө-roles, but each is assigned by a different lexical
head (Schäufele, 1999).

As demonstrated in (1c) above, the verb "give" assigns three Ө-roles to three
different NPs. But as shown in (3), one of these NPs is dominated by a PP. So
does "Basma", like "shop" in (2), get its Ө-role from the preposition "to" or from
the verb "gave" as in (1c)?

(3) I gave a book to Basma.

In short, "Basma" still receives its Ө-roles from the verb "gave", and the
preposition "to" in (3) is merely a dummy case-marker inserted to satisfy the
18
"Argument" here means an NP filling a slot in the head's Ө-grid.
conditions of Case Theory19. It is also assumed that "to" mediates the
assignment of the Goal Ө-role by the verb to its indirect object. From (1c) and
(3) we see that there are two ways in English for assigning three Ө-roles by
the same head. In fact, it is logical to say that a verb like "sell" assigns four Ө-
roles: Agent (the seller), Recipient (the buyer), Theme (the thing sold), and
Path (the price) (ibid.). As shown in (4), there is no option for the last
argument but to be realised as a PP, though its Ө-role is assigned by the
verb.

(4) a. He sold a book to me for $25.

b. He sold me a book for $25.

The sentences above show that "sell" assigns four Ө-roles. Now consider
the example in (5) where "sell" assigns only three Ө-roles. Does this violate
the Ө-Criterion?

(5) He sold me a book.

There are two ways to solve this problem. The first says that the verb's Ө-
roles exist in its Ө-grid, but one of them is assigned to a null NP 20. The second
says that some verbs have optional arguments that can be left out. This
approach clarifies whether a verb is obligatorily transitive like "devour" in (6) or
optionally transitive like "eat" in (7). It is worth mentioning that optionally
transitive verbs have some generic argument when the argument is
not stated explicitly (i.e. lexicalized). For example, the verb eat in (7) has the
generic argument "food".

(6) a. He devoured the sandwich.

b. *He devoured.

(7) a. Have you eaten anything?

b. Have you eaten?

19
Case Theory will be discussed in a separate section.
20
As it will be discussed in the section on Case Theory, a condition called the "Case Filter" says that if
the NP has no phonological manifestation, it does not need case and hence does not need a dummy
preposition to case mark it.
Essentially, the Ө-criterion holds at DS where Ө-relations are determined, but
by virtue of the Projection Principle it is expected to hold also at the level of S-
structure and LF. It does not apply at PF because its derivation from SS
essentially involves the elimination of traces and other non-phonological
elements. The Projection Principle (PP) requires that subcategorized
categories be present at all syntactic levels, but says nothing about non-
subcategorized categories such as subjects. The Extended Projection
Principle (EPP) extends this requirement to subjects (Ouhala, 1994). Verbs
like "rain" in (8) do not assign a Ө-role at all and they cannot even license a
subject position. This would seem to violate the Ө-Criterion. In such cases a
dummy NP "it" is inserted in the subject position and is licensed by the
Extended Projection Principle.

(8) It is raining.

Finally, the representation of argument/thematic structure demanded the
classification of syntactic positions into four types: argument positions (A-
positions hereafter), non-argument positions (A'-positions hereafter), Ө-
positions, and Ө'-positions. A-positions are nodes where arguments can be
found at DS or LF. The internal arguments of lexical heads or specifiers of
clauses (i.e. Spec, IP) can occupy these positions. The internal arguments
(complements) can either be a referential noun phrase or a variable such as
traces of moved quantifiers or wh-phrases. A'-positions on the other hand, are
those in which a non-argument can be found at LF. They include adjoined
positions and Spec, CP. Adjoined positions are usually filled by either moved
categories such as topicalised phrases or raised quantifiers, or by base
generated modifiers such as adverbs and adjectives. Ө-positions are those to
which a Ө-role can be assigned. Thus, the positions that are filled by the
internal arguments of lexical heads are Ө-positions. Subject positions are also
Ө-positions if the lexical head assigns an external Ө-role; otherwise it is a Ө'-
position. The raising verb "seem" for example, does not assign an external Ө-
role and that is why the subject position of clauses which contain a raising
verb can be filled by an argument moved from the embedded clause as in (8)
or by the expletive "it" as in (9).
Spec, CP and adjoined positions, which are usually filled by non-arguments,
are also Ө'-positions.

It should be noted that although all A'-positions are Ө'-positions, not all A-
positions are Ө-positions: as we saw above, the subject position (Spec, IP)
is an A-position, but the subject positions of clauses with raising verbs are
Ө'-positions even though they are A-positions (Ouhala,
1999: 161-162).

3.4. Government Theory


Government Theory is an essential module in REST. Defining the various
structural relations has complicated the government module. In chapter (2) we
have seen that the structural relations had to be determined only in terms of
constituent structure. Government relations ought to be defined similarly (i.e.
by means of constituent structure). Our concern in this chapter is to define the
essential government relations of REST. Apart from the basic dominance and
precedence relations there is the crucial c-command relation. The "c" is an
abbreviation for "constituent". C-command is a relation between the nodes
involved in a constituent structure tree diagram. It holds between two nodes:
A and B. A c-commands B if and only if every branching node dominating A
dominates B and neither A nor B dominate each other. Thus, because sisters
share the same dominating node, they always c-command each other. This c-
commanding relation of sisters is considered symmetrical and is not of much
importance. The important relation is asymmetrical c-command. Before
pushing towards the complex government relations, this variant of c-command
that is known as m-command has to be introduced. "M" is an abbreviation for
"maximal". M-command does not take into account every branching node. It
only considers maximal projections. By way of illustration take a look at (1)
below.

None of the words in the genitive NP "the boy's" c-commands the N1 "blue
bike", and none of the words in the N1 "blue bike" c-commands the genitive
NP. The words "blue" and "bike" however, do m-command the genitive NP
asymmetrically, because the first maximal projection at the top dominates
both "blue bike" and the genitive NP.

Definition of M-command:

A m-commands B if and only if:

(i) Both A and B do not dominate each other.

(ii) The first maximal projection that dominates A also dominates B.
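Both definitions can be checked mechanically by walking up the tree from A: c-command inspects every branching node dominating A, whereas m-command only consults the first maximal projection. The sketch below rebuilds the "the boy's blue bike" example from the text as a toy tree; the Python representation is my own, only the two definitions come from the discussion above.

```python
# Sketch of c-command and m-command over a toy constituent tree.
# The Node class and the labels are illustrative assumptions.

class Node:
    def __init__(self, label, children=None, maximal=False):
        self.label, self.maximal = label, maximal
        self.children = children or []
        self.parent = None
        for c in self.children:
            c.parent = self

def dominates(a, b):
    """a dominates b if b sits somewhere below a in the tree."""
    return any(c is b or dominates(c, b) for c in a.children)

def c_commands(a, b):
    """Every branching node dominating a also dominates b,
    and neither node dominates the other."""
    if dominates(a, b) or dominates(b, a):
        return False
    n = a.parent
    while n:
        if len(n.children) > 1 and not dominates(n, b):
            return False
        n = n.parent
    return True

def m_commands(a, b):
    """The first maximal projection dominating a also dominates b,
    and neither node dominates the other."""
    if dominates(a, b) or dominates(b, a):
        return False
    n = a.parent
    while n and not n.maximal:
        n = n.parent
    return n is not None and dominates(n, b)

# [NP [NP the boy's] [N1 [A blue] [N bike]]]
blue, bike = Node("A:blue"), Node("N:bike")
genitive = Node("NP:the boy's", maximal=True)
n1 = Node("N1", [blue, bike])
np = Node("NP", [genitive, n1], maximal=True)

print(c_commands(blue, genitive))  # False: N1 dominates "blue" but not the genitive
print(m_commands(blue, genitive))  # True: the top NP dominates both
```

As the last two lines show, "blue" fails to c-command the genitive NP but does m-command it, which is exactly the asymmetry described for (1).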

Before defining the proper government relation, we have to decide what a


potential governor is. All lexical categories and INFL or AGR have the ability to
be potential governors. A potential governor is a constituent co-indexed with
the constituent in question. To govern a constituent B, a potential governor A
must fulfil the following conditions:

a. A must c-command B.
b. There must be no barrier between A and B.

c. A must be the minimal governor of B.

The last condition is referred to as the Minimality Condition. It ensures that the
governor of B is the closest potential governor to B.

In GB terms, government accounted for Case assignment. It also accounted
for the distribution of PRO. In MP accounts however, there is no Case
assignment. Checking the so-called Case features in the specifier of a
functional phrase (such as AGRP) is the alternative to Case assignment. This
alternative also accounts for the distribution of PRO. In GB literature, we read
that PRO must not be governed, but now PRO only has to check its null Case
features in the specifier of nonfinite TNSP. Finally, Government theory has no
role in the new minimalist approach, but a condition called the Empty
Category Principle, which requires all traces to be properly governed, is not
properly accounted for in the minimalist mechanisms yet (Schäufele, 1999).

3.5. Case Theory

3.5.1. Introducing Case Theory and the Case Filter


In REST, Case has to do with the licensing of nominals. It provides an answer
to the question why NPs occur in certain positions but not others. For
example, (1a) is grammatical, but why does replacing the verb 'bought' with a
synonymous noun as in (1b) necessitate the addition of the suffix "-s" and
the preposition 'of' as in (1c)? Furthermore, why is the sentence in (2a)
grammatical when the subordinate verb is finite, whereas more substance is
needed when it is nonfinite as in (2c)?

(1) a. Sam bought a new bookcase.

b. *Sam purchase a new bookcase.

c. Sam's purchase of a new bookcase.


(2) a. That Sam bought a new bookcase surprised Terry.

b. *That Sam to buy a new bookcase surprised Terry.

c. For Sam to buy a new bookcase surprised Terry.

Answers to these questions were provided in the work of REST's linguists
during the seventies and early eighties. The solution was that all human
languages show Case symptoms (i.e. Case features), and that UG has a
'Case Filter' that bans overt NPs from appearing at PF without having Case
assigned to them. Case here is a relationship between NPs and their
governors that assign them distinct Cases. Verbs and prepositions usually
have the ability to assign Case, but not always (because some verbs such as
passives do not assign Case)21.
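The Case Filter itself can be pictured as a final check on the output: every overt NP must have been assigned some Case by a governor. A toy sketch of the check over the NPs of (1a) and (1b) (the dict representation of NPs is my own invention, not part of the theory):

```python
def passes_case_filter(nps):
    """Case Filter: ban overt NPs that reach PF without Case.
    Each NP record states whether it is phonologically overt
    and which Case (if any) a governor assigned to it."""
    return all(np["case"] is not None for np in nps if np["overt"])

# "Sam bought a new bookcase": finite INFL assigns nominative to the
# subject, the transitive verb assigns accusative to the object.
good = [{"np": "Sam", "overt": True, "case": "nominative"},
        {"np": "a new bookcase", "overt": True, "case": "accusative"}]

# "*Sam purchase a new bookcase": the noun 'purchase' cannot assign
# Case, so the object NP reaches PF Caseless.
bad = [{"np": "Sam", "overt": True, "case": "nominative"},
       {"np": "a new bookcase", "overt": True, "case": None}]

print(passes_case_filter(good))  # True
print(passes_case_filter(bad))   # False
```

Inserting the dummy preposition 'of', on this picture, simply supplies the missing Case and flips the second record back to a passing one.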

Case is assigned under the conditions of government (c-command or m-
command) in the absence of a barrier. Case Theory deals with the sentences
in (1) above by giving a transitive verb like 'bought' the ability to assign the
accusative Case to its object NP which in turn enables it to pass the Case
Filter. Being finite, the head of the clause INFL/AGR assigns nominative Case
to the subject NP and therefore it gets past the Case Filter. A noun however,
cannot do either and that is why replacing the verb 'bought' with the synonymous
noun 'purchase' in (1b) produces ungrammaticality. This ungrammaticality is
avoided by inserting the dummy preposition so as to enable the object NP to
pass the Case Filter. The addition of the genitive suffix "-s" to the NP 'Sam' in
(1c) empowers it to get past the Case Filter.

Although 'buy' in (2b) is still empowered to assign objective Case when it is
nonfinite, its subject violates the Case Filter. It requires the addition of the
complementizer 'for' as in (2c). The complementizer 'for' is etymologically a
preposition, and thus capable of assigning Case and rescuing the subject.
Such clauses have no Case assigners for their subjects, but fortunately, S/IP
is not a barrier to government for its subjects, and therefore, the subject can
be governed from outside (e.g. by a complementizer as in (2c)) and hence be
Case-marked.
The Case Filter in REST motivated the movement of NPs from object

21
We will discuss this issue in the next subsection.
positions to subject positions in passive constructions. In an active transitive
clause as in (1a), the verb assigns two Ө-roles to two distinct NPs at DS. It
also assigns accusative Case to its object NP. The subject NP gets its Case
from the head of S/IP. But as shown in the passive counterpart in (3), the verb
assigns only a single Ө-role to its internal argument at DS and the subject
position is empty as in (3a).

(3) a. [e was bought a new bookcase]

Although the passive verb is empowered to assign a Ө-role, it is not capable
of assigning Case to the object position. The incapability of the passive verb
to assign accusative Case to its internal argument is due to the affixation of
the passive morpheme to the base verb. "The passive morpheme is often said
to 'absorb' the (accusative) Case of the passive verb" (see Ouhala, 1999:
212-213). Thus, the object NP in (3a) is without Case and violates the Case
Filter. Move α has a role in such clauses, provided it does not violate any
constraints. In (3a) we have an empty position to which the NP 'a new
bookcase' can move as in (3b) where it can pass the Case Filter by means of
finite S/IP whose head can assign nominative Case.

(3) b. A new bookcase was bought.

Another type of verbs that do not assign accusative Case to their internal
argument is called Unaccusatives. They resemble passives in that they assign
neither an external Ө-role nor Case to their internal argument. But unlike
passives, unaccusatives do not manifest a morpheme that can be said to
absorb the (accusative) Case. Among others, the following verbs are all
unaccusatives: "break", "die", and "open". The inability of both passives and
unaccusatives to assign external Ө-roles and accusative Case led to the
generalisation that if a verb does not assign an external Ө-role to the subject
position, it will probably be unable to assign accusative Case to its internal
argument too and vice versa (see Ouhala, 1999: 173-175 and 212-213).

Concluding this subsection, it is time to introduce the notion of 'Chains'. A chain refers to the different positions a given constituent occupies during the derivation of a structure. If a constituent remains in situ throughout the derivation, then we have a one-membered chain. But if the constituent in question moves to satisfy any constraint, principle or condition, then we will have a chain with more positions, indicated by traces, as shown in (4).

In fact, not only Cases but also Ө-roles are assigned to chains. Every chain can have a single Ө-role and a single Case. Thus, each chain must contain only one Ө-position and one Case-position. The position that is c-commanded by all the other positions is typically the Ө-position, and the one that c-commands all the other positions is typically the Case-position.
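The chain conditions just described — exactly one Ө-position and exactly one Case-position per chain — can be sketched as a toy well-formedness check. This is only an illustration: the position labels and the data layout are ours, not part of the theory's formal apparatus.

```python
# Toy model: a chain is a list of positions, ordered from the
# c-commanding head of the chain down to its foot. Each position
# records whether it is a theta-position and/or a Case-position.

def chain_is_well_formed(chain):
    """A chain must contain exactly one theta-position and exactly
    one Case-position. Typically the foot (lowest position) is the
    theta-position and the head (highest) is the Case-position."""
    theta = [p for p in chain if p.get("theta")]
    case = [p for p in chain if p.get("case")]
    return len(theta) == 1 and len(case) == 1

# One-membered chain: an object NP that stays in situ and receives
# both its theta-role and its Case there.
in_situ = [{"pos": "object", "theta": True, "case": True}]

# Passive movement as in (3b): the object moves to subject position
# for Case, leaving a trace in its theta-position.
passive = [
    {"pos": "Spec-IP", "theta": False, "case": True},   # moved NP
    {"pos": "object",  "theta": True,  "case": False},  # trace
]

# Ill-formed: movement into a second theta-position.
bad = [
    {"pos": "Spec-IP", "theta": True, "case": True},
    {"pos": "object",  "theta": True, "case": False},
]
```

On this sketch, the in-situ and passive chains pass the check, while the chain with two Ө-positions fails it.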

3.5.2 How Many Cases Are There?


If it is true, as it is assumed, then every human language has Case-features which are isomorphic with its overt morphological cases (Schäufele, 1999). English, for example, exhibits three different Cases on the personal pronouns they/them/their. In PPA accounts one finds three distinct Cases at the syntactic level. The nominative Case is for the subject of finite clauses, the genitive Case is found on the first NPs in each of the phrases in (1), and there is also the objective Case assigned to the complements of verbs and prepositions. We do not have to differentiate between the Cases assigned by verbs and prepositions, as shown in (2), in which the pronoun 'you' bears the same Case even when it has a distinct Ө-role in each phrase.

(1) a. Your cat

b. Sam's cat Belhob

c. Belhob's feeding by Sam

d. Sam's feeding of Belhob

(2) a. Love you

b. Remember you

c. To you

d. With you
e. Think of you

It is assumed that English has a three-way distinction of Case both in its morphology and in its syntax. It is also assumed that the number of abstract syntactic Cases in all languages equals the number of their morphological ones. Contrary to this, however, is the fact that there are languages with no overt morphological distinction among Cases. The significance lies in the mechanism by which Cases are assigned to NPs, in the kind of Cases they get, and in the number of Cases in the grammar.

In Chomskyan Theory, the subject-object distinction is accounted for by two


longer but more precise structural relations. A subject is an NP immediately
dominated by S/IP and an object is an NP immediately dominated by a VP.
But Case assignment is also an admissible way for representing grammatical
relations. Moreover, certain verbs in some languages assign different Cases.
However, only "the nominative/accusative system tends to be regarded as the
'canonical' case system in generative grammar" (Spencer, 1991: 257). In spite
of that Spencer (1991) has differentiated between two case systems among
languages. These are: nominative/accusative languages and ergative
languages. His typology is summarized in (3) below (ibid.: 256-258).

(3) a. Nominative-Accusative Marking:

Intransitive Subject: Nominative Case

Transitive Subject: Nominative Case

Direct Object: Accusative Case

b. Ergative Marking:

Transitive Subject: Ergative Case

Intransitive Subject: Absolutive Case

Direct Object: Absolutive Case
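Spencer's two marking systems in (3) can be rendered as a small lookup table. The role and Case names are taken from the typology above; the table and function names are ours.

```python
# Case assigned to each grammatical role under Spencer's (1991)
# two alignment systems.
MARKING = {
    "nominative-accusative": {
        "intransitive_subject": "nominative",
        "transitive_subject": "nominative",
        "direct_object": "accusative",
    },
    "ergative": {
        "transitive_subject": "ergative",
        "intransitive_subject": "absolutive",
        "direct_object": "absolutive",
    },
}

def case_for(system, role):
    """Look up the Case a role receives in a given alignment system."""
    return MARKING[system][role]
```

The table makes the crucial contrast visible at a glance: the two systems differ only in how they group the intransitive subject with the other two roles.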


In languages where nominative case assignment is found, other special cases for complements and adjuncts are also present. Generally, such a language will exhibit a 'Dative' case for indirect objects and an 'Instrumental' case for means, manner or agent.

3.5.3 What can Assign Case?


As mentioned earlier, verbs and prepositions (i.e. -N) can assign Case. Some verbs, nevertheless, are not capable of assigning Case, as observed in passive verbs, whose complements have to move in order to get past the Case Filter. Other verbs, like the copulas, do not have the Case-assigning power either. The functional head of a clause, INFL/AGR, is also empowered to assign nominative Case to the subject of finite clauses.

There are two kinds of Case assignment, Structural and Inherent. Structural Case is assigned to an NP by its governor. For a governor to assign Inherent Case, it must also assign a Ө-role to the NP in question. The difference between the two is that structural Case is determined at S-structure and does not necessarily involve a thematic relation between the assigner and the assignee (see Ouhalla, 1999: 218-220, 395-397). Nominative and Accusative Cases are both structural Cases. Genitive Case is also a structural Case, since it is assigned via Spec-Head Agreement with D, which does not bear any thematic relation to it. A Case which is determined at D-structure and involves a thematic relation between the assigner and the assignee is called inherent Case. The Oblique Case assigned by prepositions to their complements qualifies as inherent Case.

3.5.4 Under What Circumstances Can Case Be Assigned?


There was a consensus that Case is assigned under government, provided that Minimality conditions and Barriers are respected. Adjacency is another requirement for Case assignment: complements must be adjacent to their Case-assigning governors. Traces play a very important role in satisfying the Adjacency Condition, since adjacency can also be obtained through the traces left behind by movement. Topicalisation, as in an example like "This condition, we can easily explain t", is a typical construction for illustrating the Adjacency Condition.

The verb "explain" in such an example requires an NP to which it assigns Case, but the NP in question has moved to the front of the sentence (i.e. topicalisation). The moved NP "this condition" establishes a chain with the trace left behind by movement. This chain works as a medium through which the moved NP transmits its properties to the trace. The trace receives Case from the verb and transmits it back to the moved NP. Because the trace is adjacent to the verb and has the very same properties as the moved NP, the Adjacency Condition is satisfied.

3.5.5 Case and Minimalism


The difference between Case Theory in the eighties and now is exhibited in the latest Minimalist Theory, in which nothing has the power to assign Case, at least in syntax. In Minimalism, NPs enter derivations along with their Cases. They (i.e. NPs) bring their Cases from the lexicon, along with the other idiosyncrasies peculiar to them. The syntactic component only checks whether the NP in question has the proper Case and decides whether that Case can be licensed on that NP or not. But given the three principles of Minimalism, how can the Case Filter motivate movement if NPs already have their Cases when they leave the lexicon?

Laziness

A constituent does not move unless it has to.

Procrastination

A constituent does not move any earlier than necessary.

Greed

A constituent does not move to satisfy a constraint that properly applies to another constituent.

The principle of Greed prevents the Case Filter from motivating movement, because if the subject NP, for example, is base-generated under Spec-VP with its Nominative Case already brought from the lexicon, why should it move to Spec-S/IP? Here licensing plays a very crucial role. The NP's base position licenses its Ө-role but not necessarily its Case. If its Case is not licensed in its base position, then it has to move to the Spec position of the appropriate functional head, where its Case can be licensed. In short, the abstract Case features that an NP bears must be eliminated before the derivation is complete. This is due to a principle known as Full Interpretation (FI), which requires the elimination of unnecessary symbols that play no role in the interpretation of expressions (Chomsky, 1997).

In the eighties, the Case Filter was held to apply at PF. Now it is assumed that the abstract Case features must be eliminated before deriving PF by means of movement to the Spec position of the proper functional head. If these features were visible at LF, then the NP in question must move to the proper Spec position at LF. If the abstract Case features were visible at both PF and LF, then their elimination must take place at SS22. Thus, we have three options upon which languages differ (i.e. we have a parameter with three options):

1) Abstract Case features visible at PF
LF does not care about them
Movement at LF

2) Abstract Case features invisible at PF
LF sensitive to Case-checking
Covert movement at LF

3) Abstract Case features visible at PF
LF sensitive to Case-checking
Movement at SS

22
It should be noted here that "visibility" does not mean phonological manifestation; rather, it has to do with the interpretation of constructions.

In the discussion above we were talking about movement of NPs to the Spec positions of AGRs and AGRo for Case-checking, but the old Case Filter requires all phonologically realised NPs to have Case. Therefore, if Minimalism is on the right track, then the NPs 'Terry' and 'fountain pen' in (1) also have abstract Case features.

(1) Leslie wrote the letter to Terry with a fountain pen.

These two NPs, and the like, get their Cases from their governing prepositions in the older version of Chomskyan theory. In minimalist accounts, these NPs must check and eliminate their Case features in the Spec position of some functional head. In fact, these Case features can be eliminated through head-complement checking, or by positing some functional head in which these Case features can be erased.

3.6 Binding Theory

3.6.1 Introduction
In this section we are going to look at the relations of the binding module in REST. The notion of 'Binding' is a syntactic one. It is similar to the notion of 'coreference', but not identical, because 'coreference' and 'reference' are semantic notions. If a linguistic expression refers to something, an entity, an element or a condition in the physical world, then that linguistic expression is said to be referential. And if two or more linguistic expressions refer to the same thing in the real world, then they are considered coreferent. To illustrate this, take a look at the passage in (1):

At that time, Mo'amar Alqaddafi was a very busy man. The Revolutionary
beloved leader had been working hard pointing out ideological and procedural
differences between the traditional ways of governing and the authority of the
people. The Engineer of the Industrial River from Sirte had also been touring
the whole country, making appearances most notably in Sebha, the site of the
first sign of the great Elfateh Revolution.

In (1) above, the underlined expressions all refer to the same thing and are therefore coreferential. Each of these expressions occurs in a separate sentence, and there is no syntactic relationship between them. The province of Binding Theory is the kind of coreferentiality represented in (2), where two coreferential constituents inhabit the same clause.

Normally, for A and B to bind each other (i.e. to be coreferential), A must c-command B. In this respect, B is bound by A. It should be noted that in any tree diagram there are many c-commanding relations, but a binding relation can only be obtained if the two constituents are both in a c-commanding relation and semantically coreferent.

3.6.2 Binding Theory Features and Principles


Binding Theory divides referring constituents into four types on the basis of two binary features, [±pronominal] and [±anaphoric]:

(i) +ANAPHORIC, -PRONOMINAL

This kind is referred to as anaphors. It includes reciprocal pronouns like each other, reflexives such as himself, traces, etc.

(ii) +PRONOMINAL, -ANAPHORIC

This type is called pronominals. It includes the personal pronouns.

(iii) -PRONOMINAL, -ANAPHORIC

These are called referring expressions or R-expressions. They consist of ordinary noun phrases.

(iv) +PRONOMINAL, +ANAPHORIC

This is called Big PRO. This interesting category will be discussed at some length in the next subsection.

Before presenting Binding Theory's Principles, we have to clarify the notion of 'governing category', within which constituents can be 'free' or 'bound'. A 'governing category' includes the constituent in question, its proper governor, and a subject (a specifier of some kind). For example, in the sentence "the Greens said that IP[the terrorists hurt themselves]", the governing category of "themselves" is the subordinate clause (labelled IP). By Principle A of Binding (see below), anaphors like "themselves" must have an antecedent inside their governing category; thus, in this case, the antecedent is "the terrorists" (Matthews, 1997).

After GB developed in the mid-eighties, a governing category came to be defined as one preceded by a barrier. Because 'binding' is a syntactic relation between a constituent and a c-commanding antecedent, the Binding Principles state how close to each other the constituents in question must be23. These Principles are:

A. An anaphor is bound within its governing category.

B. A pronominal is free within its governing category.

C. A referring expression is free.

Principle A means that an anaphor must be bound in its governing category. This means that it must be c-commanded by an antecedent that is not separated from it by a barrier. Principle B says that a pronominal may have an antecedent, but unlike anaphors, the antecedent must be separated from the pronominal by a barrier. The dichotomy here can be illustrated by the sentences in (3):

(3) a. Basma painted her face blue.

b. Basma painted herself blue.

c. Basma painted her blue.

23
The Principles of Binding are also referred to as Principles A, B, and C.

Looking at sentences (3a) and (3b), we see that they are semantically synonymous, at least in some colloquial styles. But syntactically, they enter into two distinct binding relations. In (3a) 'her' is positioned in the NP 'her face', which is a barrier. Its antecedent is outside this governing category, and thus Principle B is fulfilled. 'Her' in (3c), on the other hand, is a pronominal and is acceptable only if it refers to someone else (i.e. it does not co-refer with 'Basma'). 'Her' and 'Basma' do not enter into any binding relation, and 'her' is free, on such an interpretation, to refer outside the clause. By Principle C, "the grammar won't bother searching for antecedents for R-expressions" (Schäufele, 1999: 79) because they are free to have or not to have antecedents within their governing category.
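The division of labour among the three Principles can be sketched as a toy checker. This is a deliberate simplification: real governing categories and barriers are computed from tree structure, which is abstracted away here into a single yes/no question about the constituent's configuration.

```python
def binding_ok(category, bound_in_governing_category):
    """Apply Binding Principles A, B and C to a constituent.

    category: 'anaphor', 'pronominal' or 'r-expression'.
    bound_in_governing_category: True if the constituent has a
    c-commanding, coreferent antecedent inside its governing category.
    """
    if category == "anaphor":        # Principle A: must be bound
        return bound_in_governing_category
    if category == "pronominal":     # Principle B: must be free
        return not bound_in_governing_category
    if category == "r-expression":   # Principle C: must be free
        return not bound_in_governing_category
    raise ValueError("unknown category: " + category)

# 'Basma painted herself blue': the anaphor is bound -> licit.
# 'Basma painted her blue' with her = Basma: bound pronominal -> illicit.
```

The contrast in (3b) versus (3c) falls out directly: the same configuration that satisfies Principle A for an anaphor violates Principle B for a pronominal.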

3.6.3 The Typology of Empty Categories


As noted above, Binding Theory has a pair of binary features through which it
defines four category types. Each of these categories can be empty though
they have syntactic realities. Below, we are going to introduce these empty
categories and their relevant general terms.

a. [+Anaphor, -Pronominal]

This empty anaphor is called a 'trace' and is usually represented by the small letter 't'24. As explained earlier, movement of constituents in REST must leave a trace behind. This trace is an empty anaphor even if the moved constituent was a verb, and that is why the label NP-trace is avoided nowadays. Accordingly, traces are subject to Principle A, which restricts the distance to which an antecedent can move. Thus, Principle A constitutes a constraint on Move α. But successive movement, in which a constituent climbs the tree gradually so as to meet the binding requirement at each step, can avoid the violation of this constraint. This movement is illustrated in the tree diagram in (1). The constituent in question is x, and each trace is positioned inside its governing category. It is worth noting here that it is possible for an anaphor to be the antecedent of another.

24
This is the same as the seventies' NP-trace, a label that was used to distinguish it from the wh-trace, which will be discussed below.

b. [-Anaphor, +Pronominal]

The empty pronominal is called 'small pro' or 'little pro' and is represented in text and tree diagrams by 'pro' written in small letters. It functions as a personal pronoun and it replaces a fully referential NP, but it lacks any phonological manifestation. It occupies the subject position of imperatives as in (2), and the subject position in languages in which overt subject pronouns are optional in all kinds of clauses, as in (3).

(2) Close the door.

Put that down.

(3) Cerco / Cerca / Cerchiamo / Cercate / Cercano un libro.

(I) am / (S)he is / (We) are / (You) are / (They) are looking for a book.

(The examples in (3) are from Schäufele, 1999.)

c. [-Anaphor, -Pronominal]

This empty category is sometimes called a 'variable' and sometimes a 'wh-trace'. Like the empty anaphor, it arises through movement. Variables (or wh-traces) are different from '(NP-)traces' because of the specifications of the movement involved, not because of the nature of the moved constituent. As we know, '(NP-)traces' can result from the movement of constituents other than NPs. Similarly, 'wh-traces' (or 'variables') can also result from the movement of constituents that have no wh-features (e.g. Quantifier Raising (QR)). What matters is the position occupied by the moved constituent (i.e. the antecedent) and the position of the 'trace'.

A-movement to A-positions involves anaphoric traces, as in the movement of NPs to get their Case (or to check it, in the minimalist version) and thereby get past the Case Filter. Verbs also move to AGR, leaving an anaphoric trace behind. The antecedent of a 'wh-trace', by contrast, occupies an Ā-position. It may be the Spec position of a head that is incapable of assigning a Ө-role or checking Case (i.e. Comp).

There is agreement that each chain has only one Ө-position and only one Case-position. Neither is an Ā-position, and therefore the antecedent of a variable cannot inhabit either; the variable itself occupies one of them. Binding Theory says that if a 'trace' is in a Case-marked position, then it is a variable (Schäufele, 1999). Being neither anaphors nor pronominals, variables are subject to Principle C and are therefore completely free to refer. This does not mean that they are identical to R-expressions. Variables must not be syntactically bound, but they must nevertheless have an antecedent somewhere: they ought to be semantically bound by an operator. To see this, remove the word 'who' from (4) and the clause becomes unacceptable. This is not because the variable has missed its antecedent, but because it has no antecedent at all. It is then not a variable but 'pro', and English does not permit pro in subject position in normal declarative clauses.

d. [+Anaphor, +Pronominal]

This empty category is called 'PRO'. It is both an anaphor and a pronominal. It is capitalized to differentiate it from 'little pro', and it is sometimes called 'big PRO' (among others, see Chomsky, 1981 and Schäufele, 1999). Its being both an anaphor and a pronominal makes it subject to both Binding Principles A and B: it should be both free and bound within its governing category. To avoid this contradiction, it is assumed to be inherently ungoverned and ungovernable. This assumption is referred to as the 'PRO Theorem'. Thus, PRO has no governing category. Since Case in REST is assigned under government and PRO is ungoverned, PRO never has Case. But in minimalist accounts PRO has to check its null Case features in the specifier of nonfinite TNSP. As a result of its ungoverned nature, PRO always occupies ungoverned positions, as in the clauses in (5). Its typical position is the subject of infinitive clauses, because these never govern their subjects in REST accounts.

(5) a. It is important [PRO to eat a good breakfast].

b. I want [PRO to eat a good breakfast].

Finally, we do not have to identify an antecedent for PRO, because it is assumed to be able to refer to any member of a suitable set of entities. For instance, in (5a) it is generic and refers to the entire human race, but in (5b) it is co-referential with the first person subject pronoun of the matrix clause.
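The typology of empty categories developed in this section can be collected into one lookup over the two binary features. This summary table is our own device, not part of the theory itself.

```python
# (anaphoric, pronominal) -> name of the empty category with
# those feature values, as discussed in 3.6.3.
EMPTY_CATEGORY = {
    (True,  False): "trace",       # [+Anaphor, -Pronominal]: 't'
    (False, True):  "little pro",  # [-Anaphor, +Pronominal]: null subject
    (False, False): "variable",    # [-Anaphor, -Pronominal]: wh-trace
    (True,  True):  "big PRO",     # [+Anaphor, +Pronominal]: PRO Theorem
}

def empty_category(anaphoric, pronominal):
    """Return the empty category defined by the two binary features."""
    return EMPTY_CATEGORY[(anaphoric, pronominal)]
```

The table makes visible why PRO is special: it is the only cell whose feature values pull Principles A and B in opposite directions, which is what forces it into ungoverned positions.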
CHAPTER (4)
IMPLICATIONS FOR LANGUAGE TEACHING AND LEARNING

4.1 Introduction
If one wants to benefit from something, she/he should know what it is first. So if we would like to benefit from the Chomskyan Theory in the field of language teaching and learning, we ought to know exactly what it is, what it does, and why it was developed in the first place. Part of the answer to the first question has been provided in the preceding chapters. The rest of the answer can be summarized in two or three sentences. The Chomskyan theory is a theory of language. In addition to describing language in general, it describes its structure in a consistent, coherent manner. It does so to explain what language is and what a person knows when she/he knows a language. It is also an attempt to define the essential characteristics of human languages so as to be able to differentiate them from artificial languages.

The answer to the last question involves the answer to the more basic question "why do we study language at all?". Of course we learn a language to be able to communicate with its speakers, but this is not the main focus of theoretical linguistics. In fact, linguists study language to reveal, at least, some of the secrets of the human mind. As Chomsky puts it in his book Language and Mind:

"there are any number of questions that might
lead one to undertake a study of language.
Personally, I am primarily intrigued by the
possibility of learning something, from the study
of language, that will bring to light inherent
properties of the human mind" (p. 103)
Therefore, we study language to learn something about the nature of our minds/brains. Moreover, Chomsky points out that there are three distinct aspects of language in need of study and investigation, namely: language structure, language acquisition, and language use. It is a truth generally acknowledged in linguistics that we cannot find out how languages are acquired unless we develop a theory of what language is and how it is structured (Radford, 1996).

Building on the antecedent work of his teacher Zellig Harris (see Stern, 1996: 141), Chomsky and his colleagues and students studied language and its structure and discovered that language is both creative and rule-governed. After proposing a theory of its structure, they tried to develop a theory of language acquisition. They proposed that children have a natural endowment that enables them to acquire language in a limited time with no instruction at all. This assumption is called the 'Innateness Hypothesis'. It says that children are genetically endowed with a device in the brain responsible for language production, acquisition and maybe comprehension too. This device is called the 'language faculty'. Nowadays, linguists call it 'Universal Grammar', or UG for short. UG consists of all the principles that cannot be acquired through experience. UG theorists argue that all humans are genetically endowed with a set of principles and parameters, which tell children what sorts of sounds and grammars are or are not possible in human language. Thus UG facilitates the children's task of language learning by restricting the possibilities available to them. Principles show children what is possible in a language and what is not. Parameters are possible options from which one can choose in learning one language or another. For example, languages vary in what can be relativized in relative clauses (see Flynn and others, 1988): a) subject, b) object, c) indirect object, d) object of preposition, e) genitive, f) object of comparison. All human languages allow the first option and use subject relative clauses, but languages differ in which of the other options they allow. However, if one type of relative clause is possible in a given language, then all the types to its left on this list must be possible too. There is no language in which object-of-preposition relative clauses are found but not object relative clauses. Accordingly, children only have to set their parameter on the rightmost type of relative clause that is possible in their language; they do not have to learn each kind separately. Therefore, a UG-analysis of second language acquisition (SLA) could help language teachers. It has been suggested that researchers could tell teachers when the parameters are set similarly or differently for the L1 and the L2. If the setting, for example for relative clauses, were the same, then the teacher would not need to concentrate on this aspect of language (Bartels, 1999).
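The implicational character of the relative clause hierarchy can be sketched in a few lines: setting the parameter to the rightmost allowed position entails that every position to its left is also allowed. The list order follows the a)-f) options above; the function name is ours.

```python
# The implicational hierarchy of relativizable positions, ordered
# from most to least accessible (after Flynn and others, 1988).
HIERARCHY = [
    "subject",
    "object",
    "indirect object",
    "object of preposition",
    "genitive",
    "object of comparison",
]

def relativizable_positions(rightmost_allowed):
    """Given the rightmost position a language allows, return all
    positions it allows: everything up to and including that point."""
    cut = HIERARCHY.index(rightmost_allowed)
    return HIERARCHY[: cut + 1]
```

On this sketch, a single parameter value determines the whole set of allowed relative clause types, which is exactly why a child (or an L2 learner) does not have to learn each type separately.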

So, children acquiring their mother tongue have some universal principles that they do not have to learn; these are already there in their brains. Other language-specific rules have to be worked out by processing the speech of the adults they hear. By forming hypotheses about the speech of adults and then testing them, children set the parameters which represent the specific properties upon which languages vary. But when we compare first language acquisition and second language acquisition within the Chomskyan framework, we come up with three distinct views. First language acquisition has only one form within the Chomskyan theory (see Cook and Newson, 1998): we have the speech of adults, UG, and finally a competence in a given language. This view of first language acquisition can be illustrated as in figure (1) below:

From figure (1) above we see that children have direct access to UG. Children hear the speech of adults, process it in the UG device, and then have their L1 competence. In L2 acquisition, however, learners can have either direct or indirect access to UG, or else no access at all. If they have direct access, then learning a second language can be viewed in the same way as acquiring a mother tongue. But it may happen that learners have only indirect access to UG. They have their L1 knowledge, and thus can utilize it in learning a second language; they can transfer their L1 principles positively to the second language being learnt. This view of L2 acquisition is illustrated in figure (2) below:

Although studies (Flynn and others, 1988) show that UG has at least some effect on SLA, there is no consensus regarding why learners still have difficulty acquiring L2 grammar if they have access to UG. Since UG theory in SLA is well developed, future studies should be directed at uncovering the difficulties learners would have even with access to UG, rather than at further justifying the existence of UG theoretically. However, many authors (see Flynn and others, 1988) also point out that while justifications of UG in interlanguage research prevail, what we still lack is an explicit learning theory.

The third view of L2 learning is that learners have no access to UG at all. The proponents of this view claim that a given L2 can be learnt from a grammar book or from drills. Linguists in support of this assumption claim that L2 learners have the ability to reinterpret the principles of the L1 to suit the target L2, but that parameters are not resettable: they are transferred to the newly learnt language. To account for L2 learning without UG, the advocates of this view have turned to alternatives that look at L2 learning "as general problem solving combined with the knowledge of the L1" (Cook and Newson, 1998).

After this brief introduction to the way the Chomskyan theory deals with language acquisition, we are going to take a look at the impact of Chomskyan thought on language teaching and learning. After doing so, we will interpret some of its implications for language teaching and present them in an explicit way.

4.2 Impact of Chomskyan Theory on Language Teaching and Learning


It is a fact that Chomsky's TGG triggered a revolution in the field of linguistics, and that it has influenced language pedagogy since the mid-sixties. Its major contribution was the abandonment of the dominant principles of structuralism in linguistics and of audiolingualism in language teaching. At first, Chomskyan Theory was neglected, but around 1960 it was applied to mother tongue teaching by some (for example, Roberts, 1964) who considered it a promising alternative to the prevailing treatment of syntax.

Viewing Chomsky's thought in linguistics as a modification of structuralism, language-teaching theorists assumed that it was relevant to language teaching. But those who considered TGG too limited to serve as an alternative linguistic theory for language teaching accepted it only as one option among other frameworks for the description of syntactic structures.

With the new developments of 1964 in Chomskyan Theory, linguists recognized that this grammatical theory had disproved many of the dominant ideas of contemporary linguistics and of the concepts of language teaching. For example, Chomskyan Theory released language teaching from the tenets of behaviorism in psychology. It did this by showing that language is not just a stimulus-response pattern; rather, language is creative. For instance, the Chomskyan notion that language is creative implied that teaching techniques which enable learners to use language creatively are more appropriate than those which lead to automatic responses or mechanical repetitions.
The Chomskyan distinction between deep and surface structures presented a very useful way of reinterpreting contrastive linguistics. As Di Pietro (1968 and 1971) argued, if we want to make comparisons between languages and discover similarities and contrasts, "languages must have something in common, otherwise comparisons could not be made" (Stern, 1996: 168). Furthermore, theorists tried to benefit from the implications of Chomskyan notions for second language learning. These included the notions of competence and performance, the rule-governed creativity of language, and the deep and surface structure distinction, as well as the implications of the new view of first language acquisition.

Recently, Richard Hudson (2000) has sought evidence in the published research for the effects of training learners in grammatical analysis on improving their writing skill. The evidence he found showed that training in grammatical analysis made learners write better. Moreover, the results of many of the studies he reviewed showed that using Transformational Grammar was better than using traditional grammar or other modern grammars.

4.3 Implications for Language Teaching and Learning


The claim that Chomskyan Theory can somehow be used in the field of language teaching and learning needs support, because even Chomsky himself remarked that it has no direct connection with language pedagogy. But looking at how this theory analyses language and its structure, we see that there are some indirect implications for the areas of language teaching and learning. This section is an attempt to explain some of the points that we can utilize in the field of language pedagogy. It should be noted that most of these suggestions are best suited to teaching at university level. The possible implications and effects that are applicable in language teaching and learning can be summarized in the following points:

1. Describing language more accurately than other models of grammar do, the theory provides promising linguistic foundations for language pedagogy. Trace Theory, for example, can be used in explaining certain language phenomena to the learner. Of course, it is primarily used to ensure that the meaning of any moved constituent is interpreted correctly, but it can also be used in language teaching. For example, the so-called "wanna contraction" in English is not possible in all structural configurations. It is permitted only in configurations where there is no intervening constituent between 'want' and 'to'. To illustrate, consider the following:

1.a I want to eat this apple.

1.b I wanna eat this apple.

2.a I want this apple to be eaten for fun.

2.b *I wanna this apple to be eaten for fun.

The examples in (1a & b) show that want and to can contract to wanna when they are adjacent. In (2a & b), by contrast, the NP this apple intervenes between want and to, which prevents their contraction. To clarify the role of Trace Theory here, consider now the examples in (3) and (4), which involve movement of the NP this apple to the leftmost position of the sentence:

3.a This apple, I want to eat t.

3.b This apple, I wanna eat.

4.a This apple, I want t to be eaten for fun.

4.b *This apple, I wanna be eaten for fun.

As shown in (3), want and to can contract because the trace (t) of the moved NP this apple occupies the object position of eat and does not intervene between want and to. But in (4) the trace in question does intervene between want and to. For this reason, the two categories cannot contract to wanna, as shown in (4b).
What all this shows is that the grammatical rule which contracts want and to is
dependent on whether there is an intervening constituent between the two
categories or not. This intervening constituent could be a trace and therefore
Trace Theory plays an important role in explaining such phenomena to the
learners of English language.
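The adjacency condition described above can be modelled as a toy rule. The sketch below is purely illustrative (the token-list representation and the function name are my own, not part of any linguistic formalism): a trace left by movement is written as the token "t", and contraction is licensed only when want immediately precedes to.

```python
# Toy model of the adjacency condition on wanna-contraction.
# A sentence is a list of tokens; a trace left by movement is the token "t".
# "want" and "to" may contract only if nothing -- not even a trace --
# stands between them.

def can_contract(tokens):
    """Return True if 'want' is immediately followed by 'to'."""
    return any(a == "want" and b == "to" for a, b in zip(tokens, tokens[1:]))

# (3a) "This apple, I want to eat t."  -- the trace follows 'eat'
ex3 = ["this", "apple", "I", "want", "to", "eat", "t"]

# (4a) "This apple, I want t to be eaten for fun."  -- the trace intervenes
ex4 = ["this", "apple", "I", "want", "t", "to", "be", "eaten", "for", "fun"]

print(can_contract(ex3))  # True  -> "wanna" is possible
print(can_contract(ex4))  # False -> "*wanna" is ruled out
```

Crude as it is, the sketch makes the pedagogical point concrete: the learner checks a single structural condition rather than memorizing a list of sentences where wanna happens to be impossible.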

2. Although the constraints, conditions and principles of the theory are abstract, they are definite and formally explicit. Students will consider them as
rules of a game. By applying them students will perceive how the language
system functions.

3. Some operations of the new Minimalist Program serve as a foundation in teaching the writing skill. For example, the two operations Select and Merge
can be used to explain how sentences are formed. The operation Select
takes lexical items from the lexicon and Merge combines them. By way of
illustration consider the example in (5):

5. He was going to school.

Merging (i.e. combining) the noun school with the preposition to derives the PP in (6):

6. [PP to school]

Merging the PP in (6) with the verb going derives the VP in (7):

7. [VP going [PP to school]]

Merging the VP in (7) with the auxiliary was derives the incomplete phrase I' in (8) (incomplete because it cannot stand alone in conversation as a reply to a question or in any exchange of speech):

8. [I' was [VP going [PP to school]]]

Finally, merging the I' in (8) with the subject pronoun he derives the IP in (9):

9. [IP he [I' was [VP going [PP to school]]]]

The student can use the operations Select and Merge plus the idiosyncratic properties of lexical items to avoid ungrammaticality. For example, the properties of the pronoun he in (5) specify that it has a nominative Case feature and therefore it should always be a subject. Its properties also specify that it selects a singular verb, a requirement fulfilled by the auxiliary verb was in (5). Among the properties of was is that it must have a singular nominative specifier (subject). The verb was also requires a verb in the '-ing' form as its complement (in its use as an auxiliary forming the past continuous tense), and this property was fulfilled by the verb going in (5). The idiosyncrasies of the verb going in turn indicate that it needs a prepositional phrase headed by the preposition to, a need fulfilled by the PP to school in (5). The preposition to in turn needs a noun as its complement, and this requirement was met by the noun school in (5).

From the discussion above, we can see that these two operations (Select and
Merge) together with the idiosyncrasies of lexical items are very useful in
explaining how sentences are formed. Thus, they can be used in teaching
writing.
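As a rough illustration of this bottom-up procedure, the derivation in (6)-(9) can be simulated in a few lines. The sketch below is an informal model only; the tuple representation, the labels, and the toy lexicon are my own assumptions, not Chomsky's formalism:

```python
# A toy model of the Minimalist operations Select and Merge.
# Words are plain strings; phrases are labelled tuples (label, head, complement).

LEXICON = {"he", "was", "going", "to", "school"}

def select(word):
    """Select: take a lexical item from the lexicon."""
    if word not in LEXICON:
        raise ValueError(f"{word!r} is not in the lexicon")
    return word

def merge(label, left, right):
    """Merge: combine two syntactic objects into a labelled phrase."""
    return (label, left, right)

# Building "He was going to school" bottom-up, as in (6)-(9):
pp = merge("PP", select("to"), select("school"))   # [PP to school]
vp = merge("VP", select("going"), pp)              # [VP going [PP to school]]
i_bar = merge("I'", select("was"), vp)             # [I' was [VP ...]]
ip = merge("IP", select("he"), i_bar)              # [IP he [I' ...]]

print(ip)
```

The nesting of the printed tuple mirrors the bracketed structures in (6)-(9), which is exactly what the student is asked to internalize when building sentences step by step.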

4. The schema of X-bar theory is the same for all phrases and sentences. The
university teacher can explain this to his/her students. After that the teacher
can show them how to insert lexical items in their appropriate positions. The
idiosyncrasies of lexical items plus other principles, relations (for instance, C-
command and the Binding Principles), and operations (for example, the
operation Merge discussed above) of Chomskyan Theory can be used
effectively in defining the appropriate positions in which lexical items can be
positioned. At the end students will recognize that they can generate an
infinite number of phrases and sentences through the use of X-bar schema
and other useful operations, constraints, and principles. Thus, the result will
be the development of the creativity of language among students.
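The uniformity of the X-bar schema can likewise be sketched informally. In the toy model below (my own illustration; the labels and example fillers are assumptions, not drawn from the text), a single template, [XP Spec [X' X Complement]], serves for phrases of every category:

```python
# Toy sketch of the X-bar schema: every phrase, whatever its category X,
# instantiates the same template [XP Spec [X' X Complement]].

def xbar(category, head, spec=None, comp=None):
    """Build an XP of any category from the single X-bar template."""
    return (category + "P", spec, (category + "'", head, comp))

# The one schema yields a noun phrase, a verb phrase, and a prepositional phrase:
np = xbar("N", "students", spec="the", comp="of syntax")
vp = xbar("V", "read", comp=np)
pp = xbar("P", "in", comp="class")

print(np)  # ('NP', 'the', ("N'", 'students', 'of syntax'))
```

Because the function never changes, only its arguments do, the sketch makes vivid why learners who grasp one template can in principle generate an unbounded number of phrases.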

5. Chomskyan Theory determines universal principles to reveal the underlying similarities among languages. This fact encourages transfer between
languages, particularly in the areas of translation and contrastive analysis,
which are important to language teaching.

6. The different levels in a derivation are very useful in explaining certain syntactic and semantic ambiguities. For example, the sentences "John is easy
to please" and "John is eager to please" have the same surface form but they
are different. They have distinct deep structures. The first means that "it is
easy to please John" and the second means "John is eager to please
someone". Thus, different levels used in Chomskyan Theory can be used in
some way to explain this difference in meaning. A more complex ambiguity
results from the ranking of quantifiers, which was demonstrated in the section
on Levels of Representation in chapter (2).
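The easy/eager contrast can be made concrete with a toy representation of the two deep structures. The role labels below are my own illustrative assumptions, not a formal analysis:

```python
# "John is easy to please" vs. "John is eager to please": one surface
# pattern, two deep structures, distinguished by the roles of the hidden
# verb "please".

deep_structures = {
    "John is easy to please":  {"verb": "please", "agent": "someone", "patient": "John"},
    "John is eager to please": {"verb": "please", "agent": "John", "patient": "someone"},
}

for surface, roles in deep_structures.items():
    print(f"{surface}: {roles['agent']} pleases {roles['patient']}")
```

Displaying the two role assignments side by side is one simple way a teacher could show students that identical surface strings need not share an underlying analysis.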
7. The ability of this theory to distinguish grammatical sentences from ungrammatical ones can be used in the areas of evaluation and error analysis (i.e. the identification of errors), which are central in foreign language teaching.

8. It helps the teacher in defining the objectives, content and presentation of his/her language course. If the teacher believes in Chomskyan Theory then one of
his/her language course goals will be the following:

a. The formation of knowledge of the foreign language (FL)

According to the assumptions of Chomskyan Theory (as sketched in chapters 2 and 3), the teacher's aim in (a) should be subdivided into three more specific
objectives:

(i) The formation of syntactic knowledge of FL

(ii) The formation of semantic knowledge of FL

(iii) The formation of phonological knowledge of FL

The Chomskyan Theory specifies that the syntactic knowledge of a sentence is prerequisite to the formation of the semantic and phonological knowledge.
Thus, language courses will be based first on syntactic knowledge. This
means that the presentation of syntactic information should be prior to
semantic and phonological information.

An important basis of Chomskyan Theory is UG. The teacher can benefit from
UG in defining the content of his/her language course. For example, instead
of telling students that there are statements, questions, nouns, etc. in the FL,
the teacher should construct his/her language course so as to delve directly
into how to form statements, questions, etc. In other words, he/she should
make use of the innate linguistic knowledge that his/her students possess.

9. Its extensive terminology gives the university teacher a useful metalanguage on which he/she can depend. For example, when the teacher
wants to analyze the errors that his/her students make, he/she can use this
metalanguage to describe these errors as violations of certain constraints or
principles. Then, he/she can use his/her knowledge of how this theory deals
with the aspect in question and devise a remedial work based on his/her
analysis of the errors. The teacher can also use his/her transformational
metalanguage to compare two languages and see how they are similar or
different (i.e. contrastive analysis).

10. Its explanation of the language acquisition process has influenced language methodology, especially in the presentation of language courses.

11. It enriches the teacher's knowledge of the language structure to be taught. Thus, it can be assumed that it will improve the content of his/her language
course.

12. Finally, the competence/performance distinction is a very useful aid in assessing the students' competence via their performance. By analyzing the
students' performance the teacher should be able to know the level of his/her
students' competence. In other words, he/she will know what his/her
students have learnt and what they have missed. After this assessment the
teacher can modify his course content so as to make up for the structures that
the learners have not mastered yet.
CHAPTER (5)
CONCLUSION
Although Chomsky never intended to devise a theory of or for Applied
Linguistics, the influence of his theory on language teaching has come from
his idea of what human language is. Whereas his actual theory of linguistics
has changed considerably since the 1950s, his view of what such a theory should
achieve has remained fairly constant. The general question is language
acquisition, but not in the sense of teaching. It is an essential feature of
Chomsky's analysis of the problem that first language acquisition is
independent of teaching.

The main contribution of Chomskyan linguistic thought to language teaching lies in providing a theoretical basis for the cognitive method, which is based on the Chomskyan notions of competence and performance and on that of UG. But Chomsky's other contributions include at least the following:

(1) He has regarded syntax as the central feature of linguistic structure, and
proposed that its study should be central in linguistic research (Trask, 1997).

(2) He devised a generative grammar that is capable of generating all and only the grammatical sentences of a language and assigning them suitable syntactic structures.

(3) He introduced a transformational grammar (TG), which gives both abstract underlying representations (deep structures) and different surface forms (surface structures) of sentences.

(4) Chomsky turned linguistics toward mentalism (rationalism) instead of empiricism. He believed that minds, purposes, mental representations, and mental processes are real, can be utilized in linguistic descriptions, and can be studied in themselves.

25 It should be noted that most of these ideas have been understood and collected from different discussions and answers to several queries submitted by different linguists to the Linguist List Website.

(5) Whereas his predecessors concentrated on the distribution of linguistic elements and on representing linguistic forms at various levels of analysis, Chomsky focused rather on rules, and particularly on the rules required for transforming abstract underlying structures into surface forms (ibid.). Later, however, Chomsky shifted his focus from the rules themselves to the constraints to which those rules were subject.

(6) While his predecessors had held that languages vary without limit, Chomsky argued for the universal nature of language. He paid more attention to the similarities among languages than to their variable nature.
Recently, the emphasis upon universality has led to the search for universal
grammar, the supposedly universal structural properties of human languages.

(7) Accordingly, Chomsky has argued that these universal properties are
genetically built into our brains at birth: this is his innateness hypothesis. In
Chomsky's view, children know in advance what human languages are like,
and have only to acquire the particular idiosyncrasies of the specific language
they are learning.

From its earliest descriptions of languages, Chomskyan Theory has shown great generalizing power and an ability to explain underlying regularities
among languages. By offering to both the teacher and the learner more
general rules in place of the traditional lists of exceptions and special cases,
Chomskyan Theory is a very useful aid to teaching and learning. It is this point
which Rutherford (1968) takes up in the preface to his textbook:

"Although transformational theory as such does not tell us exactly how languages are learnt, it has nevertheless revealed the extent to which they have underlying regularities, deep and surface structures and universal similarities – discoveries which do have great relevance for language teaching".

Among others, one example of the discovery of underlying regularities and novel generalization is the treatment of do in Chomsky's Syntactic Structures.

Although Chomsky has changed his syntactic ideas frequently, his beliefs have remained broadly the same. He has revised his theory repeatedly, sometimes dramatically, and this continuous change can be observed in each version of the theory.

It is a fact that Chomskyan theory is not a theory of language teaching and that no direct connection can be made between Chomskyan theory and language teaching, but some of its indirect implications and effects for language teaching can be found. As has been shown, such implications are very important for language teaching and learning.
References
1. Abney, S. (1987). The English Noun Phrase in its Sentential Aspect.
Doctoral Dissertation, MIT.

2. Alatis, J. E. (ed.) (1968). Contrastive Linguistics and its Pedagogical Implications. Report of the 19th Round Table on Linguistics and Language Studies. Washington, D.C.: Georgetown University Press.

3. Bartels, Nat (1999). A review of Suzanne Flynn, Gita Martohadjono & Wayne O'Neil (Eds.), The Generative Study of Language Acquisition, Mahwah, NJ: Lawrence Erlbaum (1998). TESL Electronic Journal, Vol. 3, No. 4, 1999.

4. Black, Cheryl A. (1999). A Step-by-Step Introduction to the Government and Binding Theory of Syntax, Summer Institute of Linguistics.

5. Ritchie, William C. (1967). "Some Implications of Generative Grammar for the Construction of Courses in English as a Foreign Language", Language Learning, 17, 1967, 45 – 69.

6. Chomsky, Noam (1957). Syntactic Structures. Mouton, The Hague.

7. --------------------- (1965). 20th edition, Aspects of the Theory of Syntax. MIT Press, Massachusetts.

8. --------------------- (1966). Cartesian Linguistics, Harper & Row, New York.

9. --------------------- (1972). Language and Mind, 2nd Enlarged Edition, New York.

10. --------------------- (1975). The Logical Structure of Linguistic Theory, Kluwer Academic / Plenum Publishers, New York.
11. --------------------- (1980). "The Language Faculty", published in Stuart Hirschberg and Terry Hirschberg (eds.), Reflections on Language, Oxford University Press, New York, 1999.

12. --------------------- (1981). Lectures on Government and Binding: The Pisa Lectures, Foris, Dordrecht.

13. --------------------- (1982). 7th edition, Some Concepts and Consequences of the Theory of Government and Binding. MIT Press, Massachusetts.

14. --------------------- (1986a). Knowledge of Language: Its Nature, Origin, and Use, USA.

15. --------------------- (1986b). 6th edition, Barriers, MIT Press, Massachusetts.

16. --------------------- (1991). "Linguistics and Cognitive Science: Problems and Mysteries", in Asa Kasher (ed.), The Chomskyan Turn, Basil Blackwell, Oxford, 26 – 53.

17. --------------------- (1995). Minimalist Explorations: A lecture delivered at London University.

18. --------------------- (1996). "A Minimalist Program for Linguistic Theory", in The View from Building 20, 3rd edition, MIT Press, Massachusetts.

19. --------------------- (1997). The Minimalist Program. MIT Press, Massachusetts.

20. Cook, V. J. and Mark Newson (1998). 2nd edition, Chomsky's Universal
Grammar: An Introduction. Blackwell Publishers.
21. Di Pietro, R. J. (1968). "Contrastive Analysis and the Notions of Deep and Surface Grammar", in Alatis (1968), 65 – 80.

22. ------------------ (1971). Language Structure in Contrast. Rowley, Mass.: Newbury House.

23. Muskat-Tabakowska, E. (1969). "The Notions of Performance and Competence in Language Teaching", in Language Learning, 19, 1969, 41 – 54.

24. Rutherford, William E. (1968). Modern English: A Textbook for Foreign Students, New York, Harcourt, Brace and World.

25. Emonds, Joseph (1976). A Transformational Approach to English Syntax: Root, Structure-Preserving, and Local Transformations. Academic Press, Mass.

26. Flynn, S. and others (Eds.) (1998). The Generative Study of Language Acquisition. Mahwah, NJ: Lawrence Erlbaum.

27. Hudson, Richard (2000). Grammar Teaching and Writing Skills: the Research Evidence, London.

28. Johnson, David E. (1996). To Move or Not to Move: a Critique of Pure Minimalism, OSU Linguistics Speakers Series.

29. Lotfi, Ahmed Reza (2000). "An Outline of the Pooled Feature Hypothesis: How Imperfect are 'Language Imperfections'", Unpublished Paper, Azad University, Iran.

30. Lester, Mark (ed.) (1970). Readings in Applied Transformational Grammar, New York, Holt, Rinehart and Winston.

31. McGilvray, James (1999). Chomsky: Language, Mind and Politics, Polity Press, Oxford.

32. Ouhalla, Jamal (1994). Introducing Transformational Grammar: From Rules to Principles and Parameters, Edward Arnold, London.

33. ------------------ (1999). Introducing Transformational Grammar: From Principles and Parameters to Minimalism, Edward Arnold, London.

34. Questions submitted to the Linguist List Website found at: http://linguistlist.org/~ask-ling/

35. Radford, Andrew (1996). Transformational Grammar, Cambridge University Press, Cambridge.

36. Jacobson, R. (1969). "The Role of Deep Structure in Language Learning", in Language Learning, 19, 1969, 117-140.

37. Roberts, P. (1964). English Syntax. New York, Harcourt Brace.

38. Roulet, Eddy (1975). Linguistic Theory, Linguistic Description and Language Teaching. Translated by Christopher N. Candlin. Longman Group Limited.

39. Sag, Ivan A. and Thomas Wasow (1999). Syntactic Theory: A Formal Introduction. CSLI Publications, USA.

40. Schwartz, Bonnie D. and Rex A. Sprouse (2000). Back to Basics in Generative Second Language Acquisition Research.
41. Schäufele, Steven (1999). The Synthinar Lecturettes. WWW document
found at: http://www.jtauber.com/linguistics/synthinar/.

42. Smith, Neil (1999). Chomsky: Ideas and Ideals, Cambridge University
Press.

43. Speas, Peggy (1990). Phrase Structure in Natural Languages. Dordrecht, Reidel.

44. Spencer, Andrew (1991). Morphological Theory. Blackwell Publishers, Oxford, Great Britain.

45. Stern, H. H. (1996). Fundamental Concepts of Language Teaching, 9th impression, Oxford University Press, Oxford, Great Britain.

46. Trask, Larry (1997). A response to a query submitted to the Linguist List Website: http://linguistlist.org/~ask-ling/archive-1997.7/msg00891.html.

47. Webelhuth, G. (ed.) (1995). Government and Binding Theory and the Minimalist Program: Principles and Parameters in Syntactic Theory. A Reader's Guide to "A Minimalist Program for Linguistic Theory". Blackwell, Cambridge.
