1language and Mathematics PDF
Unauthenticated
Download Date | 6/20/16 3:49 PM
Language Intersections
Volume 1
Marcel Danesi
Language and Mathematics
An Interdisciplinary Guide
ISBN 978-1-61451-554-8
e-ISBN (PDF) 978-1-61451-318-6
e-ISBN (EPUB) 978-1-5015-0036-7
ISSN 2195-559X
Contents
List of figures | viii
Preface | x
1 Common Ground | 1
1.1 Logic | 6
1.1.1 Formalism in linguistics and mathematics | 8
1.1.2 Syntax | 18
1.1.3 Formal analysis | 24
1.1.4 The structure of logic | 32
1.2 Computation | 36
1.2.1 Modeling formal theories | 40
1.2.2 Cognitive science | 46
1.2.3 Creativity | 50
1.3 Quantification | 52
1.3.1 Compression | 53
1.3.2 Probability | 55
1.4 Neuroscience | 56
1.4.1 Neural structure | 57
1.4.2 Blending | 62
1.5 Common ground | 64
2 Logic | 66
2.1 Formal mathematics | 69
2.1.1 Lógos and mythos | 70
2.1.2 Proof | 72
2.1.3 Consistency, completeness, and decidability | 81
2.1.4 Non-Euclidean logic | 85
2.1.5 Cantorian logic | 88
2.1.6 Logic and imagination | 91
2.2 Set theory | 96
2.2.1 Diagrams | 98
2.2.2 Mathematical knowledge | 101
2.3 Formal linguistics | 103
2.3.1 Transformational-generative grammar | 104
2.3.2 Grammar rules | 108
2.3.3 Types of grammar | 110
2.3.4
2.4
2.4.1
2.4.2
2.5
2.5.1
2.5.2
2.5.3
3 Computation | 132
3.1 Algorithms and models | 134
3.1.1 Artificial intelligence | 138
3.1.2 Knowledge representation | 139
3.1.3 Programs | 144
3.2 Computability theory | 147
3.2.1 The Traveling Salesman Problem | 147
3.2.2 Computability | 153
3.3 Computational linguistics | 159
3.3.1 Machine Translation | 160
3.3.2 Knowledge networks | 163
3.3.3 Theoretical paradigms | 167
3.3.4 Text theory | 172
3.4 Natural Language Processing | 174
3.4.1 Aspects of NLP | 175
3.4.2 Modeling language | 178
3.5 Computation and psychological realism | 179
3.5.1 Learning and consciousness | 180
3.5.2 Overview | 184
4 Quantification | 193
4.1 Statistics and probability | 195
4.1.1 Basic notions | 197
4.1.2 Statistical tests | 200
4.2 Studying properties quantitatively | 202
4.2.1 Benford's Law | 203
4.2.2 The birthday and coin-tossing problems | 206
4.2.3 The Principle of Least Effort | 209
4.2.4 Efficiency and economy | 216
4.3 Corpus linguistics | 219
4.3.1 Stylometric analysis | 219
4.3.2
4.3.3
4.4
4.4.1
4.4.2
4.4.3
4.4.4
4.5
4.5.1
4.5.2
4.6
5 Neuroscience | 255
5.1 Neuroscientific orientations | 256
5.1.1 Computational neuroscience | 257
5.1.2 Connectionism | 262
5.1.3 Modularity | 264
5.1.4 Research on metaphor | 266
5.2 Math cognition | 268
5.2.1 Defining math cognition | 270
5.2.2 Charles Peirce | 272
5.2.3 Graphs and math cognition | 274
5.2.4 Neuroscientific findings | 276
5.3 Mathematics and language | 284
5.3.1 Mathematics and figurative cognition | 285
5.3.2 Blending theory | 287
5.4 Concluding remarks | 294
Bibliography | 297
Index | 327
List of figures
Figure 1.1
Figure 1.2
Figure 1.3
Figure 1.4
Figure 1.5
Figure 1.6
Figure 1.7
Figure 1.8
Figure 1.9
Figure 1.10
Figure 1.11
Figure 1.12
Figure 2.1: Part 1 of the proof that the sum of the angles in a triangle is 180° | 73
Figure 2.2: Part 2 of the proof that the sum of the angles in a triangle is 180° | 74
Figure 2.3: Part 3 of the proof that the sum of the angles in a triangle is 180° | 74
Figure 2.4: Dissection proof of the Pythagorean theorem | 79
Figure 2.5: Initial correspondence of the set of integers with the set of square numbers | 89
Figure 2.6: Second correspondence of the set of integers with the set of square numbers | 90
Figure 2.7: Correspondence of the set of integers with the set of positive integer exponents | 90
Figure 2.8: The Cantor set | 92
Figure 2.9: The Sierpinski Carpet | 93
Figure 2.10: The M-Set | 95
Figure 2.11: Overlapping sets | 97
Figure 2.12: Euler's diagrams | 98
Figure 2.13: Euler's diagram solution | 99
Figure 2.14: Venn's basic diagram | 99
Figure 2.15: Venn diagrams | 100
Figure 2.16: Tree diagram for The boy loves the girl | 105
Figure 2.17: Early model of a transformational-generative grammar | 106
Figure 2.18: Lexical tree diagram | 115
Figure 2.19: Figures of speech | 119
Figure 2.20: Image schemas, mapping and metaphor | 121
Figure 3.1
Figure 3.2
Figure 3.3
Figure 3.4
Figure 3.5
Figure 3.6
Figure 3.7
Figure 3.8
Figure 3.9
Figure 3.10
Figure 4.1
Figure 4.2
Figure 4.3
Figure 4.4
Figure 4.5
Figure 4.6
Figure 4.7
Figure 5.1: Blending | 270
Figure 5.2: Flow model of math cognition | 275
Figure 5.3: Model of numeracy and math cognition | 278
Figure 5.4: Butterworth's model | 279
Figure 5.5: The numerosity adaptation effect | 283
Figure 5.6: Diagram for Rutherford's model of the atom | 290
Figure 5.7: Diagram for Bohr's model of the atom | 290
Figure 5.8: Diagram for Schrödinger's model of the atom | 290
Preface
Our work is to present things that are as they are.
Frederick II (1194–1250)
language and mathematics ever since. There is, of course, a branch of linguistics
known as mathematical linguistics (to be discussed in chapter 3), which has the
specific aim of using mathematical constructs to develop grammatical theories;
but there really is no one general rubric in either linguistics or mathematics that
aims to study the relationship between the two disciplines, despite some truly
intriguing attempts (which will also be discussed in this book). Needless to say, there
exist various interdisciplinary approaches that come under different rubrics, such
as the philosophy of mathematics, the psychology of mathematics, the anthropology
of mathematics, and so on. Each of these is a branch within its own field. But there
has never really been an overarching approach that connects mathematics and
language, until very recently with the advent of so-called mathematical cognition
research (also called numeracy research), an area that will be examined closely in
the final chapter.
The study of the mathematics-language interface constitutes a hermeneutic
enterprise. Most fields have one: literature has literary criticism, music has
musicology, art has art criticism, and so on. These strive to understand the
relevance of the field to human knowledge and aesthetics through an analysis of
texts and expressive activities within each. The same kind of approach can be
applied to the math-language nexus. Arguably, the first hermeneutical work in
this field, although the authors did not name it as such, was by George Lakoff and
Rafael Núñez, Where mathematics comes from (2000), in which they argue that
the same neural processes are involved in producing language and mathematics.
This line of inquiry has grown considerably since the publication of their book.
One of the offshoots of this new interest has been an increased sense of the
common ground that mathematicians and linguists share. Institutes such as the
Cognitive Science Network of the Fields Institute for Research in Mathematical
Sciences, co-founded by the present author, are now springing up everywhere to
lay the groundwork for formulating specific hermeneutical questions about the
interrelationship of mathematics and language.
The groundwork was laid, arguably, by Stanislas Dehaene (1997). He studied
brain-damaged patients who had lost control of number concepts. He was able to
trace the sense of number to the inferior parietal cortex, an area where various
subsystems are also involved in language processing (auditory, visual, tactile).
This type of finding is strongly suggestive of an inherent link between math and
language, even though Dehaene himself has kept away from making this connection
directly. George Johnson (2013: 5) puts it as follows:
Scientists are intrigued by clues that this region is also involved in language processing and
in distinguishing right from left. Mathematics is, after all, a kind of language intimately involved with using numbers to order space.
The skill of adding numbers is not unlike the skill of putting words together into
phrases and sentences. Lakoff and Núñez see mathematics as originating in the
same neural substratum where metaphor and other figurative forms of language
originate. This is why, they claim, we intuitively prefer number systems based
on ten, the reason being that we have ten fingers, which we use instinctively to
count. Number systems are thus collections of linking metaphors, or mental
forms that transform bodily experiences (such as counting with the fingers) into
abstractions. Lakoff and Núñez also make the seemingly preposterous claim that
even mathematical proofs stem from the same type of metaphorical cognition.
Incredibly, experimental psychological research is validating this hypothesis, as
will be discussed in this book. If we are ever to come to an understanding of what
language and mathematics are, such hermeneutical-empirical approaches cannot
be ignored or dismissed as irrelevant to either discipline.
My discussion in this book is nontechnical; that is, I do not take prior
mathematical or technical linguistic knowledge for granted. This may mean some
reductions and oversimplifications, but my objective is not to enter into the technical
minutiae of each discipline, which are of course interesting in themselves; rather,
it is to evaluate what a common ground of research entails for the two disciplines.
The first chapter will look at this ground in a general way; the second will discuss
the role of logic and formalism in both disciplines; the third will examine
how linguists, mathematicians, and computer scientists have been collaborating
to model natural language and mathematics in order to glean common patterns
between the two; the fourth will look at quantitative approaches in both linguistics
and mathematics, and especially the findings that relate to how the two
disciplines themselves obey the laws of probability; and the final chapter will look
at the ever-expanding idea that neuroscience can provide the link for studying
mathematics and language in a truly interdisciplinary way.
We are living in an age where mathematics has become a critical tool in virtually
all fields of scientific inquiry (biology, sociology, economics, education, and
so on). As journalist Thomas Friedman (2007: 300) has aptly put it, the world is
moving into a new age of numbers in which partnerships between mathematicians
and computer scientists are bulling into whole new domains of business
and imposing efficiencies in math. I would add that the same world also needs
to forge partnerships between mathematicians and linguists. Some of my assessments
are, inevitably, going to be subjective. This is due in part to my own knowledge
of both fields and my own theoretical preferences. Nevertheless, I hope to
provide a broad coverage of the common ground and thus to emphasize the
importance of mathematics to the study of language and of linguistics to the study
of mathematics.
1 Common Ground
The knowledge of mathematical things is almost innate in us. This is the easiest of sciences,
a fact which is obvious in that no one's brain rejects it; for laymen and people who are utterly
illiterate know how to count and reckon.
Roger Bacon (c. 1214–c. 1294)
Introductory remarks
In the 1960s, a number of linguists became intrigued by what they saw as the
mathematical properties of language and, vice versa, the linguistic properties
of mathematics (Marcus and Vasiliu 1960, Jakobson 1961, Hockett 1967, Harris
1968). Their pioneering writings were essentially exploratory investigations of
structural analogies between mathematics and language. They argued, for example,
that both possessed the feature of double articulation (the use of a limited
set of units to make complex forms ad infinitum), ordered rules for interrelating
internal structures, and basic units that could be combined into complex ones, among
other things. Many interesting comparisons emerged from these studies, which
contained an important subtext: by exploring the structures of mathematics and
language in correlative ways, we might hit upon deeper points of contact and thus
arrive at a common ground for studying and understanding both.
At around the same time, generative grammar came to the forefront in theoretical
linguistics (Chomsky 1957, 1965). From the outset, it espoused a basic mathematical
mindset; that is, it saw the study of language as a search for the formal
axioms and rules that undergirded the formation of all grammars. As his early
writings reveal, Chomsky was inspired initially by Markov's (1906) idea that a
mathematical system that has n possible states at any given time will be in one
and only one of its states. The generativist premise was (and continues to be) that
the study of these states in separate languages will lead to the discovery of a universal
set of rule-making principles that produce them (or reflect them). These are
said to be part of a Universal Grammar (UG), an innate faculty of the human brain
that allows language to develop effortlessly in human infants through exposure,
in the same way that flight develops in birds no matter where they are in the world
and to what species they belong. The concept of rule in generative grammar was
thus drafted to be analogous to that in propositional logic, proof theory, set theory,
and computer algorithms. The connection between rules, mathematical logic, and
computation was actually studied insightfully by Alan Turing (1936), who claimed
that a machine could be built to process equations and other mathematical forms
without human direction. The machine he described resembled an automatic
typewriter that used symbols instead of letters and could be programmed to duplicate
the function of any other existing machine. His Turing machine could
in theory carry out any recursive function, that is, the repeated application of a rule or
procedure to successive results or executions. Recursion became, and still is, a
guiding assumption underlying the search for the base rules of the UG. Needless
to say, recursion is also the primary concept in various domains of mathematics
(as will be discussed in the next chapter).
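The notion of recursion described above, a rule applied again and again to its own successive results, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (apply_repeatedly, embed, successor, add) and the sentence-embedding rule are hypothetical choices made for this example, not constructs taken from Turing or Chomsky.

```python
def apply_repeatedly(rule, start, steps):
    """Apply `rule` to its own successive results `steps` times."""
    if steps == 0:
        return start
    # The recursive step: the rule operates on the result of the previous step.
    return rule(apply_repeatedly(rule, start, steps - 1))


def embed(sentence):
    # Hypothetical linguistic rule: embed a sentence inside a larger one.
    return "Mary thinks that " + sentence


def successor(n):
    # Arithmetic counterpart: the rule "add one".
    return n + 1


def add(m, n):
    """Addition defined recursively from the successor rule alone."""
    return m if n == 0 else successor(add(m, n - 1))


if __name__ == "__main__":
    print(apply_repeatedly(embed, "the boy loves the girl", 2))
    print(apply_repeatedly(successor, 0, 5))
    print(add(3, 4))
```

The same helper drives both the linguistic rule and the arithmetic one, which is the structural parallel the passage points to: in each case a finite rule yields an unbounded range of results.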
The quest to understand the universal structures of mind that produce language
and mathematics, considered to be analogous systems, actually goes back
to ancient philosophers and, in the seventeenth century, to rationalist philosophers
such as René Descartes (1641) and Thomas Hobbes (1656), both of whom saw
arithmetical operations and geometrical proofs as revealing essentially how the
mind worked. By extension, the implication was that the same operations (for
example, commutation and combination) were operative in the production of
language. As the late science commentator Jacob Bronowski (1977: 42) observed,
Hobbes believed in a world that could be as rational as Euclidean geometry; so,
he explored in its progression some analogue to logical entailment. Hobbes
found his analogue in the idea that causes entailed effects as rigorously as
Euclid's propositions entailed one another. Descartes, Hobbes, and other rationalist
philosophers and mathematicians saw logic as the central faculty of the
mind, assigning all other faculties, such as those involved in poetry and art, to
subsidiary or even pleonastic status. They have left somewhat of a legacy, since
some mathematicians see mathematics and logic as one and the same; and of
course so too do generative linguists.
Since the early 1960s, mathematical notions such as recursion have influenced
the evolution of various research paradigms in theoretical linguistics, both
intrinsically and contrastively (since the paradigm has also brought about significant
opposing responses by linguists such as George Lakoff). Mathematicians,
too, have started in recent years to look at questions explored within linguistics,
such as the nature of syntactic rules and, more recently, the nature of metaphorical
thinking in the production of mathematical concepts and constructs. Research
in neuroscience has, in fact, been shedding direct light on the relation between
the two systems (math and language), showing that how we understand numbers
and learn them might be isomorphic to how we comprehend and learn words.
As rigid disciplinary territories started breaking down in the 1980s and 1990s,
and with interdisciplinarity emerging as a powerful investigative mindset, the
boundaries between research paradigms in linguistics and mathematics have
been steadily crumbling ever since. Today, many linguists and mathematicians
see a common research ground in cognitive science, a fledgling discipline in the
mid-1980s, which sought to bring together psychologists, linguists, philosophers,
level than simply formalizing logical structures used to carry out mathematical
activities (such as proof), it is necessary to understand the neural source of
mathematics, which he claimed was the same source that produced figurative
language. Lakoff discussed his fascinating, albeit controversial, view of how
mathematicians formed their proofs and generally carried out their theoretical
activities through metaphorical thinking, which means essentially mapping ideas
from one domain into another because the two domains are felt to be connected.
The details of his argument are beyond the present purposes, although some of
these will be discussed subsequently. Suffice it to say here that Lakoff looked at
how Gödel proved his famous incompleteness theorem (Gödel 1931), suggesting
that it stemmed from a form of conceptualization that finds its counterpart in
metaphorical cognition, a hypothesis that he had put forward previously in
Where mathematics comes from (preface).
As argued in that book, while this hypothesis might seem to be an extravagant
one, it really is not, especially if one assumes that language and mathematics are
implanted in a form of cognition that involves associative connections between
experience and abstraction. In fact, as Lakoff pointed out, ongoing neuroscientific
research has been suggesting that mathematics and language result from the
process of blending, which will be discussed in due course. It is sufficient to say at
this point that Lakoff's argument is highly plausible and thus needs to be investigated
by mathematicians and linguists working collaboratively. The gist of his
argument is that mathematics makes sense when it encodes meanings that fit our
experiences of the world: experiences of quantity, space, motion, force, change,
mass, shape, probability, self-regulating processes, and so on. The inspiration for
new mathematics comes from these experiences as it does for new language.
The basic model put forth by Lakoff is actually a simple one, to which we shall
return in more detail subsequently. Essentially, it shows that new understanding
comes not from such processes as logical deduction, but rather from metaphor,
which projects what is familiar through an interconnection of the vehicle and the
topic onto an intended new domain of understanding. In this model, metaphor is
not just a figure of speech, but also a cognitive mechanism that blends domains
together and then maps them onto new domains in order to understand them.
The two domains are the familiar vehicle and topic terms which, when blended
together, produce through metaphor new understanding, which is the intended
meaning of the blend (see Figure 1.1).
Lakoff presents a very plausible argument for his hypothesis. But in the process he tends to be exclusive, throwing out other approaches, such as the generative one, as mere games played by linguists. While I tend to agree with the substance of Lakoff's argument, as will become evident in this book, I also strongly
believe that the other approaches cannot be so easily dismissed and, when looked
Figure 1.1: Metaphor as the basis for new understanding (diagram: the Vehicle Term and the Topic Term, each a familiar concept/object, combine through Metaphor into New Understanding, the Intended Meaning)
4. the use of statistical techniques and probability theory to understand the internal structural mechanisms of both systems;
5. the investigation of hidden properties, such as the fact that both language
and mathematics tend to evolve towards maximum efficiency and economy
of form;
6. the comparative study of neuro-cognitive processes involved in both language
and mathematics;
7. examining the hypothesis that metaphor is at the source of both systems and
what this entails for both disciplines;
8. providing an overall synopsis of the properties that unite language and mathematics into a single faculty with different functions or, on the other hand,
explaining why the two might form separate faculties, as some contrary research evidence suggests.
The study of (1) makes up the theme of chapter 2; the various concepts implicit
in (2) and (3) will be examined in chapter 3; chapter 4 will then look at the issues
connected with (4) and (5); and chapter 5 will discuss the research connected with
(6), (7), and (8) that links (or differentiates) language and mathematics. Some of
the themes will also be found in an overlapping manner in various chapters. This
is inevitable, given the interrelationships among them. The remainder of this
chapter gives a preliminary overview of how these themes and topics form,
historically and actually, a common research ground for the two disciplines.
There are of course many other aspects of research that
linguists and mathematicians share in common, but the selection made here is
meant, first and foremost, to be illustrative of how interdisciplinary collaborations
work in these two fields and, second, to examine domains where collaboration
between linguists and mathematicians has been both explicit and implicit, since
at least the 1960s. As mentioned, the basic critical thrust is hermeneutic, that is,
interpretive of the structures and concepts that make up the common ground.
1.1 Logic
An obvious area of connectivity between mathematics and linguistics is in the
domain of the philosophy of both language and mathematics and its traditional
focus on logic as the basis of mathematical activities, such as proof, and as the
basis of language grammars. The approach based on equating logic, mathematics,
and grammar is, as is well known, called formalism. Simply defined, formalism
is an analytical (hermeneutic) method that attempts to describe the formal
(structural) aspects of language and mathematics by using ideas and methods
derived from logical analysis. The basic intent is to provide a set of principles and
rules that are considered to constitute the underlying competencies that allow
people to comprehend and produce linguistic and mathematical artifacts (words,
sentences, numbers, equations, and so on). But formalist analysis is not solely
descriptive; it is also theoretical, seeking to explain how the artifacts come into
being in the first place and what they reveal about the mind and, by extension,
human nature. In some cases, this is an explicitly stated goal; in others it is an
unstated implicit one.
Formalism is grounded in models of logic, a fact that goes back to antiquity.
The notion of grammar itself is a de facto logic-based one, understood as a set
of ordered rules that allow speakers of a language to produce its phrases, sentences,
and texts ad infinitum, much like we are able to construct numbers with
a few rules for digit combination. Even a perfunctory consideration of how sentences
are constructed suggests that the rules of grammar have many affinities
with the rules of arithmetic; but they also show differences. For example, addition
in arithmetic is both commutative and associative, that is, neither the order in which
terms are added nor the way they are grouped matters: n + m = m + n and
(n + m) + k = n + (m + k). Some languages are
commutative; others are not. Latin is largely commutative, because its grammar
is inflectional, marking grammatical roles by case endings. A sentence such as Puer amat puellam ("The boy loves the girl")
can be put together with its constituent words in any permutation, since the meaning
of the sentence is determined on the basis of the case structure of the words,
not their placement: puer is in the nominative case and is thus the subject of the
sentence no matter where it occurs in the sentence; puellam is in the accusative
case and is thus the object of the sentence no matter where it occurs in it. The
word order in Latin was more reflective of social emphases than of syntax and
was, therefore, mainly a feature of style or emphasis. If, for example, the object
was to be emphasized, then the sentence was constructed as: Puellam puer amat.
English, on the other hand, is largely non-commutative: "The boy loves the girl"
has a different meaning than "The girl loves the boy" and, of course, jumbling the
words in the sentence produces a nonsense string. This is why a language such
as English is sometimes called a digital language, because, like the binary and
decimal systems in numeration, symbol placement has valeur, as Saussure (1916)
called it; that is, it assumes a value in a specific structural slot or in a particular
structural set of relations among symbols. Grammar and arithmetic, therefore,
evidently constitute a common ground for the study of the general formal properties
(or rules) that underlie the organization of their constituent symbols and forms.
The reason is that both are (purportedly) formal logical systems. There are five
main principles that sustain formalism:
1. Reason is the mental process that undergirds the formation of a system such
as language or mathematics.
2. Every system is grounded on rules of formation that can be specified formally.
3. The systematic use of the rules and their constituent symbols determines if
logical validity is inherent in a system or not.
4. The concatenation of symbols and rules (called the syntax) is the essence of
the system's grammar.
5. By examining logical systems for completeness and decidability it can be determined if the systems are consistent or not.
Sets of principles like these are classified under the rubric of the logical calculus.
The term is defined broadly as a set of symbols, axioms, and rules of formation
guided by logical sequence, entailment, and inference which are, in turn, the basis
for activities such as mathematical proofs, syllogisms, and language syntax, among
others. The logical calculus is the cornerstone of any formal system, for example,
Euclidean geometry, argumentation, the organization of knowledge in dictionaries
and encyclopedias, and so on.
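The definition just given (symbols, axioms, and rules of formation linked by entailment and inference) can be made concrete with a very small sketch in Python. Everything in it is an assumption made for illustration: the function name forward_chain, the representation of rules as (premises, conclusion) pairs, and the toy syllogism are hypothetical and are not drawn from any particular formal system discussed in this book.

```python
def forward_chain(axioms, rules):
    """Derive every statement that follows from the axioms.

    `rules` is a list of (premises, conclusion) pairs: once all the
    premises have been derived, the conclusion is entailed as well.
    """
    derived = set(axioms)
    changed = True
    while changed:  # keep applying rules until nothing new is entailed
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical toy system: one axiom and two rules of inference.
AXIOMS = {"Socrates is a man"}
RULES = [
    (("Socrates is a man",), "Socrates is mortal"),
    (("Socrates is mortal", "Socrates is a philosopher"),
     "Socrates is a mortal philosopher"),
]

if __name__ == "__main__":
    for statement in sorted(forward_chain(AXIOMS, RULES)):
        print(statement)
```

In this toy system the second rule never fires, because one of its premises is neither an axiom nor derivable. This illustrates, in miniature, the sense in which rules state what is possible or allowable within a system rather than prescribing outcomes.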
Leibniz espoused the idea that all languages were based on universal properties of
logic. This is why they had the same basic kind of rules for making sentences,
revealing that all humans possessed the same innate faculty of logic. After
Leibniz, in the nineteenth century, the formal study of grammars emerged alongside
the study of linguistic change. In the early twentieth century, anthropological
linguists such as Franz Boas (1940) challenged this universal logic approach to
the study of grammar, especially since his research showed that there was much
more to language than a set of rules and rule-making principles for the construction
of sentences. Boas saw the study of different grammars as a means to understand
how every language served the specific needs of its inventors and users.
Grammars are inventions of particular peoples adapting to their particular
environments. Danish linguist Otto Jespersen (1922), on the other hand, revived the
notion of universal properties in the world's languages, leading eventually to the
rise of the generative movement in the late 1950s.
Arguably, the raison d'être for the formal study of rule systems and their properties
in mathematics and language is the belief that knowledge systems can be
decomposed ultimately into irreducible units that, when combined, show constituency
and coherence of structure. Knowledge cannot be random; it must be
organized in order for it to be useful and useable. Rules are really attempts to
characterize the organization of systems. The premise is, therefore, that within
a system, separate and seemingly disparate forms such as words and numbers
will take on coherence and validity only if they are organized by rules that are,
themselves, derivatives of a general class of rules of logic that make up human
reason. This paradigm has allowed linguists and mathematicians to provide relevant
organizational frameworks and to postulate increasingly abstract properties
about them. In linguistics that postulation has led to theories of grammar, such
as the generative one; in mathematics it has led to theories of proof, numbers,
and the amalgamation of subsystems such as geometry and arithmetic (analytic
geometry). Rules are not prescriptions; they are formal statements about what is
possible or allowable within each system.
A perfect example of what formal rules of grammar actually are is found
in Pāṇini's grammar of Sanskrit, for which he identified about 4,000 sutras (rules) in
his treatise titled Ashtadhyayi. His sutras are the earliest extant example of formal
grammatical analysis on record. It is no coincidence that Pāṇini was also
considered to be a great mathematician in India. The sutras are very much like
mathematical rules, showing how Sanskrit words, phrases, and sentences are
interlinked sequentially and through entailment (Kadvany 2007), two basic features
of the logical calculus. He also introduced the notion of mapping, prefiguring
current theoretical models such as those involving metaphor, whereby one set
of rules is mapped onto other domains (including other sets) to produce a complete
and coherent grammar (Prince and Smolensky 2004).
An example of Pāṇini's method can be seen in the first two sutras:
1. vṛddhir ādaiC
2. adeṄ guṇaḥ
The capital letters are symbols for phonemic units or other phonological structures;
the other parts of the sutras describe morphological structure and how
it relates to both the phonological constituents and syntactic forms in general.
These are truly remarkable, showing how the main components of a grammar
(the phonological, morphological, and syntactic) are interrelated, prefiguring
modern-day grammars. The goal of a formal grammar, as will be argued more
extensively in the next chapter, has always been to show how these components
interact through a sequence of rules of different types, via entailment and mapping.
This was Chomsky's explicitly stated goal in 1957. But this formalist mindset
has found resonance in other models of language. For example, in tagmemics
(Pike 1954, Cook 1969), the basic unit of analysis, called the tagmeme, is akin to
a sutra in that it shows how grammatical classes (such as subject and object) are
connected to paradigmatic, or slot-based, fillers (nouns, verbs, adjectives, and
so on). The hierarchical organization of levels (from phonology to discourse) is
composed of tagmemes that are combined into more complex units, called
syntagmemes. And like UG theory, stratificational grammar (Lamb 1999) sees rule
types as mirroring neural processes. The separate strata of language are assumed
to reflect the organization of neural wiring in the brain that consists of strands
connected to each other as in electric circuitry.
Pāṇini's pioneering work on grammar influenced mathematical theories in
ancient India, constituting perhaps the first ever awareness of a connection between language and mathematics. Indian mathematicians started representing
numbers with words, ultimately developing numerical axioms linked to each
other in the same way that sutras in language are interrelated at various levels.
At about the same time in Greece, Aristotle took a comparable interest in formalizing grammar, identifying the main parts of a sentence as the subject and the
predicate, a structural dichotomy that is still a fundamental part of grammatical
analysis to this day (Bäck 2000). Aristotle inspired others to study grammar with
the tools of formal logic, rather than impressionistically. He was, of course, aware
of the difference between the literal and rhetorical uses of the units of language,
writing two masterful treatises on this topic (Aristotle 1952a, 1952b). But, for Aristotle, rhetorical language, such as that manifesting itself in poetry, fell outside the
perimeter of grammar proper, and was thus to be considered an extension of, or
exception to, literal language. One can study rhetorical language on its own, as a
self-contained system. Its overall function was aesthetic and thus fell outside of
strict formal grammatical analysis.
Ironically, it was Aristotle who coined the term metaphor, as is well known.
For Aristotle it was a very useful trope that allows us to refer to something that we
grasp intuitively, but which seems to defy a straightforward literal explanation
or concrete demonstration. Unlike visible things, such as animals, objects, and
plants, something like an idea cannot be shown for someone to see with the
eyes. However, by comparing it to something familiar in an imaginary way,
we can grasp it much more easily. Aristotle saw metaphor as a heuristic tool for
understanding things that cannot be demonstrated concretely. The tool itself was
based on what he called proportional reasoning. For example, in the metaphor
"Old age is the evening of life," a proportion can be set up as follows:
A = old age, B = life, C = evening, D = day
Therefore: A is to B as C is to D
The reasoning thus conceals a logic: the old age period is to life as the
evening is to the day. Now, as knowledge-productive as it was, the most common
function of metaphor in human life was, according to Aristotle, to spruce up more
basic literal ways of speaking and thinking using the logic of proportionality
(Aristotle 1952a: 34). Aristotle's view of rhetorical language remained a dominant
one for many centuries until, virtually, the present era, when work on metaphor
within cognitive linguistics is telling a completely different story. One source for
the exclusion of metaphor from serious consideration in western philosophy and
science was the views of rationalist philosophers such as Descartes, Leibniz, and
Locke. Locke (1690: 34) even went so far as to characterize metaphor as a "fault":
If we would speak of things as they are, we must allow that all the art of rhetoric, besides
order and clearness, all the artificial and figurative application of words eloquence hath
invented, are for nothing else but to insinuate wrong ideas, move the passions, and thereby
mislead the judgment; and so indeed are perfect cheats: and therefore, however laudable or
allowable oratory may render them in harangues and popular addresses, they are certainly,
in all discourses that pretend to inform or instruct, wholly to be avoided; and where truth
and knowledge are concerned, cannot but be thought a great fault, either of language or
person that makes use of them.
ment known as humanism, which also stressed human reason and imagination
above all else. Within this paradigm shift there were some, called nominalists,
who argued that it is foolish to think that reason guides understanding because
it is based on language. John Duns Scotus and William of Ockham, for instance,
stressed that words ended up referring to other words, rather than to actual things;
and thus that they hardly were conducive to logical thought. Thomas Aquinas had
argued, however, that words did indeed refer to real things in the concrete and to
categories of things in the abstract, even if they constituted variable human models of them (Osborne 2014). At about the same time, Roger Bacon developed one
of the first comprehensive typologies of linguistic signs, claiming that, without a
firm understanding of the role of logic in the constitution and use of sign systems,
discussing if truth is or is not encoded in them would end up being a trivial matter
of subjective opinion (Bacon 2009).
The foregoing historical foray into the origins and rise of formalism is, of
course, a highly reductive one. The point intended has been simply to suggest
that the emergence of the concept of grammar as a set of rules connected logically
to each other is an ancient one, paralleling the Euclidean view that mathematics
is founded on axioms, postulates, theorems, and rules of combination that lead
to proofs. The ancient grammarians and mathematicians thus laid down the
foundations for formalism to arise as a major paradigm in the philosophy of both
mathematics and language. But it was not an arbitrary introspective mode of
inquiry; it was based on observing and classifying the facts, before devising the
relevant rules. This epistemology can be portrayed in the form of the diagram
below, presented here simply as a schematic model summarizing the formalist
hypothesis as it was established in the ancient world. Note that this is not found
in any of the ancient or medieval writings; it is simply a diagrammatic summary
of the foregoing discussion (see Figure 1.2).
The first explicit study of grammatical rules in their own right, apart from
their use in the generation of sentences, can be traced to the seventeenth century
and the Port-Royal Circle. In their 1660 Port-Royal Grammar, Antoine Arnauld
and Claude Lancelot put forth the notion that complex sentences were made
up of smaller constituent sentences that had been combined by a general rule;
this was a truly radical idea for the time (Rieux and Rollin 1975), although the
concept of mapping found in Pāṇini certainly prefigured this very notion. Clearly,
the Port-Royal grammarians were unaware of Pāṇini's work. A sentence such
as "Almighty God created the visible world" not only could be decomposed into
smaller constituents ("God is almighty," "God created the world," "The world is visible") but could be described as the end result of a rule that combined the smaller
constituents into the complex sentence; it is a sort of meta-rule that combines
the sentences produced by lower-level rules. Arnauld and Lancelot then argued
[Figure 1.2 (diagram). Linguistic and mathematical facts: words, phrases, sentences; counting, adding, taking away. Next: putting the words, phrases, and sentences into basic classes; putting the numbers into operations. Finally: formalization of the classes into ordered rules of grammar and of the arithmetical operations.]
lute sense. But, as Berlinski (2013: 13) suggests, the Platonic view is not so easily
dismissible even today:
If the Platonic forms are difficult to accept, they are impossible to avoid. There is no escaping them. Mathematicians often draw a distinction between concrete and abstract models
of Euclidean geometry. In the abstract models of Euclidean geometry, shapes enjoy a pure
Platonic existence. The concrete models are in the physical world.
Moreover, there might be a neurological basis to the Platonic view. As neuroscientist Pierre Changeux (2013: 13) muses, Plato's trinity of the Good (the aspects of
reality that serve human needs), the True (what reality is), and the Beautiful (the
aspects of reality that we see as pleasing) is actually consistent with notions being
explored in modern-day neuroscience:
So, we shall take a neurobiological approach to our discussion of the three universal questions of the natural world, as defined by Plato and by Socrates through him in his Dialogues.
He saw the Good, the True, and the Beautiful as independent, celestial essences of Ideas,
but so intertwined as to be inseparable within the characteristic features of the human
brain's neuronal organization.
However, there is a conundrum that surfaces with Plato's view. Essentially, it implies that we never should find faults within our formal systems of knowledge,
such as exceptions to rules of grammar and arithmetic, for then it would mean that
the logical brain is faulty. As it turns out, this is what Gödel's (1931) theorem (or,
more correctly, theorems) revealed. However, if mathematics is faulty because we
are faulty, why does it lead to demonstrable discoveries, both within and outside
of itself? René Thom (1975, 2010) referred to discoveries in mathematics as "catastrophes" in the sense of events that subvert or overturn existing knowledge (rule
systems). Thom named the process of discovery "semiogenesis," which he defined
as the emergence of "pregnant" forms within symbol (rule) systems themselves.
These develop in the human imagination through contemplation and manipulation of the forms. As this goes on, every once in a while, a catastrophe occurs that
leads to new insights, disrupting the previous system. Now, while this provides a
plausible description of what happens (discovery is indeed catastrophic), it does
not tell us why the brain produces catastrophes in the first place. Perhaps the connection between the brain, the body, and the world will always remain a mystery,
since the brain cannot really know itself.
Actually, the dichotomy between logic and constructivism, or in more contemporary terms, formalism and blending, is an artificial one, with those on either
side staking their territories in an unnecessarily adversarial way. Both viewpoints
have some validity and both need to be compared and contrasted in order to get
a more comprehensive understanding of the mental forces at work in producing
both mathematics and language. This is a theme that will be interspersed throughout this book. In my view, there is no one way to explain mathematics or language;
there are likely to be many ways to do so, no matter how faulty or partial these
are. There will never be a general theory of anything, just pieces of the theory
that can be combined and recombined in various ways according to situation and
needs.
The first counter-argument to the Port-Royal paradigm was put forward by
Wilhelm von Humboldt (1836), who maintained that languages may have similar
rule types in the construction of their grammars, but the rules only touched the
surface of what the faculty of language was all about. He basically described it as
a powerful tool for carving up the world, fulfilling the specific needs of the people
who used it. Below the surface, the rules of a specific language thus tell a different
story than just the logical selection and combination of forms independently of
how they relate to reality (Plato's Truth). They reflected what Humboldt called an
innere Sprachform (internal speech form), which encodes the particular perspectives of the people who speak the language. He put it as follows (Humboldt 1836
[1988]: 43):
The central fact of language is that speakers can make infinite use of the finite resources
provided by their language. Though the capacity for language is universal, the individuality
of each language is a property of the people who speak it. Every language has its innere
Sprachform, or internal structure, which determines its outer form and which is a reflection
of its speakers' minds. The language and the thought of a people are thus inseparable.
Despite the ideas of Humboldt and Boas (mentioned above), the study of the universal properties of grammars continues to constitute a major trend in current
linguistics. The formalist hypothesis will be discussed in more detail in chapter 2.
The premise behind this hypothesis, as implied by the above model (Figure 1.2),
derives from the common-sense observation that when we put words together to
express some thought or to convey some piece of information, the combination is
not random, but rule-based, and this is also why the meaning of a combinatory
structure cannot be computed as the sum of the meaning of its parts. Each word
taken in isolation can, of course, be studied on its own from several perspectives:
in terms of the pronunciation patterns it manifests, in terms of the specific meanings it encompasses, and so on. In fact, a large portion of linguistic analysis has
been, and continues to be, devoted to the study of units and forms in isolation.
But the power of language does not lie just in the units taken separately, but in
the ways in which they are combined, that is, in their grammar. Sentences are,
in this view, holistic structures that are governed by rule-making principles that
are used to make up the sentences, much like an architect puts together specific
architectural forms to design a building. This premise is still the one that drives
idea forgot, according to Vico, was that imagination (eikon) is essential to thought.
These philosophers pay lip service to it, but ultimately end up privileging rational
logic as the main form of mentality deployed in mathematics and grammar.
1.1.2 Syntax
The syntax hypothesis (as it is called here) was articulated explicitly for the first
time in 1957, when Chomsky argued that an understanding of language as a universal faculty of mind could never be developed from a piecemeal analysis of the
disparate structures of widely divergent languages taken in isolation, which, he
suggested, was the approach taken by American structuralists such as Bloomfield (1930). The units of different languages (the phonemes and morphemes) are
certainly interesting in themselves, but they tell us nothing about how they are organized to produce larger structures, such as sentences. He claimed, moreover,
as did the Port-Royal grammarians, that a true theory of language would have to
explain why all languages seem to reveal a similar structural plan for constructing
their sentences. He proposed to do exactly that by shifting the focus in structural
linguistics away from the making of inventories of isolated piecemeal facts to a
study of the rule-making principles that went into the construction of sentences.
He started by differentiating between the deep structure of language, as a
level of organization which could be characterized with a small set of rules that
were likely to be found in all languages, no matter how seemingly different they
appeared, and the surface structure where sentences are well-formed and interpreted in rule-based ways. The relation of the surface to the deep structure was
established by a set of rules, called transformational, that mapped deep structure
strings onto surface ones. So, in this rather simple, yet elegant, model, all languages share the same set of deep structure rules but differ in the type and/or
application of transformational rules. Although this version of generative grammar has changed radically (at least according to the generativists themselves), it is
still the basic outline of how rules in generative grammar function: they generate
basic strings of units and then transform them in more complex ways.
The essence of Chomsky's initial approach can be seen in the analysis he himself put forward of the following two sentences:
1. John is eager to please.
2. John is easy to please.
Both these sentences, Chomsky observed, would seem to be built from the same
structural plan on the surface, each consisting of a proper noun followed by a
copula verb and a predicate complement:
Structural Plan
Proper Noun     Copula Verb     Predicate Complement
John            is              eager to please
John            is              easy to please
Despite the same surface structure, the sentences mean very different things:
(1) can be paraphrased as "John is eager to please someone" and (2) as "It is easy
for someone to please John." Chomsky thus concluded that the two sentences had
different deep structures, specified by phrase structure rules; these merge into
one surface structure as the result of the operation of transformational rules.
This is brought about by rules that delete "someone" in (1), and that delete "It" and "for someone" and move "John" to the front in (2). Although this is a simplified explanation
of Chomsky's example, it still captures the essence of his method and overall
blueprint for grammar.
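The deletion and movement rules just described can be mimicked in a few lines of Python. This is only a minimal sketch under strong simplifying assumptions: deep structures are treated as flat lists of words rather than phrase structure trees, and the helper names `delete` and `move_to_front` are hypothetical labels chosen for illustration, not terms from Chomsky's model.

```python
# Illustrative sketch only: deep structures are modeled as flat word lists,
# not as the phrase structure trees a real generative grammar would use.

def delete(words, phrase):
    """Return a copy of `words` with the first occurrence of `phrase` removed."""
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            return words[:i] + words[i + n:]
    return list(words)  # phrase not found: leave the string unchanged


def move_to_front(words, word):
    """Return a copy of `words` with the first occurrence of `word` fronted."""
    if word not in words:
        return list(words)
    rest = list(words)
    rest.remove(word)
    return [word] + rest


# Assumed deep structures, taken from the paraphrases given in the text.
deep_1 = "John is eager to please someone".split()
deep_2 = "It is easy for someone to please John".split()

# Sentence (1): delete "someone".
surface_1 = delete(deep_1, ["someone"])

# Sentence (2): delete "It" and "for someone", then move "John" to the front.
step = delete(deep_2, ["It"])
step = delete(step, ["for", "someone"])
surface_2 = move_to_front(step, "John")

print(" ".join(surface_1))  # John is eager to please
print(" ".join(surface_2))  # John is easy to please
```

Two different starting strings end up with the same surface pattern, which is the point of the example: the surface similarity hides different derivations.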
Chomsky's approach was radical for the times, providing arguably the first
formal theory of how sentences are related to each other and what kinds of rules
inform the grammar of any language. The two main types, as we saw, are phrase
structure and transformational, and the latter operate schematically as follows:
[Figure 1.4: Transformational rules. For (1), the string "John is eager to please someone" reaches its surface structure through one transformational rule: delete "someone". For (2), the surface structure results from the transformational rules: delete "It" and "for someone," and move "John" to the front.]
Chomsky then suggested that, as linguists studied the deep structures of different
languages, and how transformational rules mapped these onto surface structures
differentially, they would eventually be able to conflate the rules of different languages into one universal set of rule-making principles: the syntax hypothesis.
Chomsky's proposal became immediately attractive to many linguists, changing
the orientation and methodology of linguistics for a while. Above all else, the syntax hypothesis seemed to open the research doors to investigating the age-old
belief that the rules of grammar corresponded to universal innate logical ideas
(Plato, Descartes). Moreover, it was a very clear and simple proposal for linguists
to pursue.
But problems with the syntax hypothesis were obvious from the outset. It was
pointed out, for instance, that abstract rule-making principles did not explain the
semantic richness of even the most simple sentences. This critique put the very
notion of a deep structure embedded in phrase structure rules seriously in doubt.
Moreover, it was suggested that the universal rules inferred by linguists by comparing the deep structures of different languages rested solely on the assumption
that certain rules were more basic than others. As it has turned out, it was the
structure of the positive, declarative sentence of the English language that was
seen as the default sentence type that best mirrored the deep structure of the
UG. Although this assumption has changed over the years, it is correct to say that
the basic plan of attack in generative grammar has not. The search for universal rules and language-specific adaptations of these rules (known as parameters)
continues to guide the overall research agenda of generative linguistics to this day
and, by extension, of any formal approach based on the syntax hypothesis.
Chomsky proclaimed that the primary task of the linguist was to describe
the native speaker's ideal knowledge of a language, which he called an unconscious linguistic competence, basically substituting this term for Saussure's term
langue. From birth, we have a sense of how language works and how its bits
and pieces are combined to form complex structures (such as sentences). And
this, he suggested, was evidence that we are born with a unique faculty for language, which he later called an organ, that allows us to acquire the language
to which we are exposed in context effortlessly. Language is an innate capacity.
No one needs to teach it to us; we acquire it by simply listening to samples of
it in childhood, letting the brain put them together into the specific grammar on
which the samples are based. It is as much an imprint as is our reflex system. Given
the status that the syntax hypothesis had attained in the 1960s and most of the
1970s, many linguists started researching the syntax hypothesis across languages
and investigating the details of grammatical design. By the 1980s, however, the
utility of this line of inquiry started to be seen with less enthusiasm, and a surge
of interest in investigating how languages varied in structure according to social
variables and different cultural contexts became increasingly a mainstream paradigm within linguistics. Ironically, this counter-response to generative grammar
(in its most rigid versions) may have been brought about in large part by the fact
that generativism had produced an overload of theories, making it somewhat unmanageable and unwieldy as a formal approach to language, which requires a
unified theoretical framework.
But generative grammar did bring about one very important change in the
mindset of many linguists: it associated mathematics with language. Generative
grammar was, in fact, called mathematical by many for the reason that it used
notions from mathematics, such as Markov chains, commutation, tree structure,
transformation, and the like. The main premise of the syntax hypothesis is that
when units are combined into larger complex structures they produce new and
emergent forms of meaning. In various domains of science and mathematics,
emergence of form is seen as arising through interactions and relations among
smaller and simpler units that themselves may not exhibit the properties of the
larger entities. The syntax hypothesis is a version of this view (Hopper 1998),
driving a large portion of research in formalist and computational linguistics, as
we shall see in the next two chapters.
The counter-movement to generativism has come to be called functionalism. Its basic tenet is that grammar is not hard-wired in the brain, but rather
that it varies according to the functions that a language allows speakers to carry
out. From this paradigm, several research trends have emerged, such as systemic
grammar and cognitive linguistics. The main claim of functionalists, who parallel
in outlook the constructivists in mathematics, is that grammar is connected to
the innere Sprachform (to recall von Humboldt's term). As discussed briefly, Franz
Boas had espoused a very similar perspective before the generative movement.
Collecting data on the Kwakiutl, a native society on the northwestern coast of
North America, he explored how the grammar and vocabulary of that language
served specific social needs. They were the result, in other words, of the particular experiences of the Kwakiutl. In response to functionalism, the generativists
claimed that they were not against the study of socially diverse forms of language,
but that, like Saussure's (1916) distinction between langue and parole, these were
best approached via branches such as sociolinguistics and linguistic anthropology. Moreover, these were really matters of detail. A language such as Kwakiutl
was still based on the same grammatical blueprint as any other language in the
world. Linguistic competence is an autonomous faculty that should be studied as
such, much like the axiomatic structure of arithmetic, which can be studied apart
from its practical manifestations.
For the sake of historical accuracy, it should be mentioned that the concept of
phrase structure came out of early structuralism. Leonard Bloomfield (1933), for
example, emphasized the need to study the formal properties of sentences and
phrases, which he called immediate constituent (IC) analysis. In IC analysis sentences are divided into successive constituents until each one consists of only a
word or morpheme. In the sentence "The mischievous boy left home," the first
subdivision of immediate constituents would be between "The mischievous boy" and
"left home." Then the internal immediate constituents of the first are segmented as
"the" and "mischievous boy," and then "mischievous boy" is further divided into "mischievous" and "boy." The constituent "left home," finally, is analyzed as the combination of "left" and "home." Chomsky took his cue from IC analysis, adding the mathematical notion of transformation to it, as he himself acknowledged (Chomsky
1957).
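The successive divisions described above can be written out as a nested structure and walked recursively. The following Python sketch is illustrative only: the bracketing is entered by hand to match the segmentation in the text, the analysis stops at the word level rather than at morphemes, and the function names `words` and `ic_analysis` are hypothetical.

```python
# Illustrative sketch: the bracketing is entered by hand, following the text;
# nothing here discovers constituents automatically.

# Each constituent is either a word (str) or a pair of immediate constituents.
sentence = (("The", ("mischievous", "boy")), ("left", "home"))


def words(node):
    """Flatten a constituent back into its list of words."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return words(left) + words(right)


def ic_analysis(node, depth=0, out=None):
    """Collect each division into immediate constituents, top-down."""
    if out is None:
        out = []
    if isinstance(node, str):
        return out  # a single word is not divided further in this sketch
    left, right = node
    out.append((depth, " ".join(words(left)), " ".join(words(right))))
    ic_analysis(left, depth + 1, out)
    ic_analysis(right, depth + 1, out)
    return out


for depth, left, right in ic_analysis(sentence):
    print("  " * depth + f"[{left}] + [{right}]")
```

Each printed line is one cut, indented by its depth in the hierarchy, so the first line separates "The mischievous boy" from "left home" and the later lines show the cuts inside each part.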
Various extensions, modifications, and elaborations of generative grammar
have been put forward since 1957. There is no need to discuss them in detail here.
Suffice it to say that the three main ones are the following:
1. Transformational-generative grammar (TG grammar), which is based on
Chomsky's original model of 1957 that he modified in 1965, becoming at
the time the so-called standard theory. It still has many adherents who
see it as a straightforward approach to the syntax hypothesis. As will be
discussed in the next chapter, TG grammar includes phrase structure rules,
transformational rules, and lexical insertion rules. The latter are rules that
insert lexemes into the slots in the strings generated by the syntactic rules.
In 1965, Chomsky put forward a detailed account of how these rules worked,
including projection rules and subcategorization rules. In my own view, TG
grammar is still the most elegant and viable formalist theory of language,
even though many would claim that this is a naïve view. Admittedly, only
experts in formalist linguistics can truly discuss the significant departures
from the early TG theory, but to linguists who do not follow the formalist
hypothesis, it is my sense that TG theory is still the most attractive one.
2. Government and Binding (GB) theory, which is an elaboration of TG, developed by Chomsky himself in the late 1970s and 1980s, in which he introduced
the concept of modularity, whereby modules (basic and complex) are related
to each other through rules, rather than being considered part of a dichotomy of deep and surface structure forms. In some versions of GB theory,
the surface structure is actually seen as unnecessary. GB theorists have also
added stylistic rules and meaning-changing rules to the basic generativist
framework, in order to address various critiques that emerged with regard
to the artificial separation of syntax from semantics in the TG model. The
concepts of deep and surface structure are thus greatly modified (now called
d-structure and s-structure) and considered to be linked by movement rules.
3.
There are various other kinds of theoretical frameworks that subscribe to the
syntax hypothesis. These need not be discussed here in any detail, since they
have only a handful of adherents. They simply merit mention: Arc pair grammar, Dependency grammar, Lexical functional grammar, Optimality theory, Stochastic
grammar, and Categorial grammar. The central feature of all is the belief that
there are two sets of rules: one for making up basic structures and one for mapping these onto more complex ones.
Jakobson then went on to note that the mathematician Émile Borel, just before the
Fourth International Congress of Mathematicians in 1909, attributed the paradoxical nature of denumerable infinities in math theory to the influences of language
used to explain it. From this clever remark, a widely held suspicion that language
and mathematics were intrinsically intertwined dawned upon many. As Bloomfield (1933: 512) succinctly put it a few years later: "mathematics is merely the best
that language can do." Therefore, Jakobson (1961: 21) concluded, the connectivity
between the two systems must be of primary interest for mathematicians and
linguists alike.
Formalist approaches are very useful in describing structure, and especially
how rules interact to produce complexity of structure. But in order for the rules
to work unhampered, meaning must be discarded from their formal architecture,
or else meaning must be treated as either a separate phenomenon or as an appendage to the rules of syntax.
Meaning has always been a thorn in the side of formalism, since it is almost
impossible to divorce it from formal structures, even if pure symbolic systems are
used for constructing rules. Cognizant of the role of linguistic meaning in mathematics, in 1980 the Association of Teachers of Mathematics published a handbook
showing how deeply interconnected mathematics is with the linguistic meanings
we ascribe to it. Since then, math teachers and their professional associations
have become increasingly interested in this interconnection, aiming to use any
relevant insight in order to improve pedagogy. The study of how mathematics is
learned indicates that there is, in fact, more to it than just acquiring formalisms
and learning to think logically (Danesi 2008). One of the learning problems involved is, as Borel aptly noted, that language is used to teach mathematics and
to formulate problems. To quote Kasner and Newman (1940: 158): "It is common
experience that often the most formidable algebraic equations are easier to solve
than problems formulated in words. Such problems must first be translated into
symbols, and the symbols placed into proper equations before the problems can
be solved." As a trivial, yet useful, example of how language and mathematics
can easily become enmeshed ambiguously, note that the operation of addition is
described by variant English words such as and, sum, total, add together; conversely, subtraction is normally suggested by expressions such as less, from, take
away, difference, is greater than, and so on. A similar variety of expressions is
found in many other languages. These lexical variants can be a source of difficulty
for students learning mathematics who struggle to translate them into the simple
symbol +. So, those who do not have access to the semantic differences among
these expressions may manifest specific kinds of learning difficulties, or else may
be confused by the inconsistency (or decorative flair, so to speak) of the language
used (Danesi 1987).
One of the central objectives of formal analysis is to eliminate ambiguities, inconsistencies, and superficial ornamentations of this kind. To do so, the logical
calculus provides a series of definitions, axioms, symbols, and postulates that
do not vary or that resist ambiguous interpretation. This means, for instance, developing symbols for numbers and arithmetical operations that do not vary according to whim or situation. The history of arithmetic bears this out. The first
number systems were derived from the use of material objects to represent numerical concepts (Schmandt-Besserat 1978, 1992); the words referring to the objects
themselves came, over time, to stand for the numerical concepts as well. Around
3000 BCE the Egyptians started using a set of number symbols based on counting
groups of ten (without place value) to represent numerical concepts; and a little
later the Babylonians developed a sexagesimal system based on counting groups
of 60, a system we still use to this day to mark the passage of time. These early
societies developed number systems primarily to solve practical problems: to survey fields, to carry out intricate calculations for constructing buildings, and so on.
For this, they needed a standard system of numerical representation. They were
also interested in numbers as abstractions, but, by and large, they were mainly interested in what they could do with numbers in terms of engineering and business
affairs. Remarkably, their numerical symbol systems were closed systems (unambiguous and consistent), unlike the language used to describe them, which varied
according to context and usage.
It was the Greeks who took a step further in removing ambiguity and inconsistency in formal number systems, by examining the numbers in themselves,
apart from their uses in everyday life, developing mathēmatikē (a term coined by
Pythagoras). Around 300 BCE, Euclid founded the first school of mathēmatikē in
Alexandria to study numbers, geometrical figures, and the method of proof in formal ways, independent of their uses in practical tasks. These could, of course, be
applied to construction and engineering activities, but their abstract study was
an autonomous one. From there the distinction between pure (or theoretical) and
applied mathematics surfaced, a distinction that some, like Archimedes, did not
see as useful. Even today, some would claim that pure mathematics must be kept
separate from applied mathematics; but this ignores a whole set of discoveries
that have worked the other way around, whereby applications of mathematical
ideas have, themselves, led to further theorization. The dichotomy probably started with Euclid, who wrote the first treatise of formal mathematics, titled the
Elements, a book that has permanently shaped how we conceptualize mathematical methodology.
A key aspect of Greek mathematical formalism was the use of writing symbols
to represent numerical concepts in a consistent way, a practice that was, in itself,
an engagement with mathematical abstraction. When alphabet symbols appeared
on the scene around 1000 BCE, they were used to represent not only sounds, but
numbers. The order {A, B, C, …} of the alphabet is based on that early practice,
where A stood for the number 1, B for the number 2, and so on. The Greeks were
the first to use alphabet letters for numbers. Their notation, however, was derived
from previous notation, such as the Egyptian one. Bellos (2014: 64) describes this
remarkable milestone in the history of mathematics as follows:
By the time of Euclid, the Greeks were using a number system derived from Egyptian hieratic
script: 27 distinct numbers were represented by 27 distinct symbols, the letters of the Greek
alphabet. The number 444 was written υμδ, because υ was 400, μ was 40 and δ was 4. Fractions were described rhetorically, for example, as eleven parts in eighty-three, or written as
common fractions with a numerator and a denominator, much like the modern form, 11/83,
although the Greeks maintained the historic obsession with unit fractions.
As Greek mathematicians started studying the properties of numbers in themselves, they introduced separate symbols for the latter. This was one of the first
events that made an abstract conceptualization of numbers possible. The first
formal mathematical system was, as mentioned, the one devised by Euclid in his
Elements, consisting of a set of axioms from which theorems, propositions, and
postulates could be investigated and/or proved. In it, we find the first definitions
of number and of various types of number, definitions designed to skirt around
ambiguity and inconsistency. There are 22 definitions in total in Book VII of the
Elements, which are worth reproducing here, since they show how early formalism was, and still is, a system of analysis based on clear and unambiguous
definitions of basic constituent units (Euclid 1956).
1. A unit is that by virtue of which each of the things that exist is called one.
2. A number is a multitude composed of units.
3. A number is a part of a number, the less of the greater, when it measures the
greater.
4. But parts when it does not measure it.
5. The greater number is a multiple of the less when it is measured by the less.
6. An even number is that which is divisible into two equal parts.
7. An odd number is that which is not divisible into two equal parts, or that
which differs by a unit from an even number.
8. An even-times-even number is that which is measured by an even number
according to an even number.
9. An even-times-odd number is that which is measured by an even number according to an odd number.
10. An odd-times-odd number is that which is measured by an odd number according to an odd number.
11. A prime number is that which is measured by a unit alone.
12. Numbers relatively prime are those which are measured by a unit alone as a
common measure.
13. A composite number is that which is measured by some number.
14. Numbers relatively composite are those which are measured by some number
as a common measure.
15. A number is said to multiply a number when the latter is added as many times
as there are units in the former.
16. And, when two numbers having multiplied one another make some number,
the number so produced be called plane, and its sides are the numbers which
have multiplied one another.
17. And, when three numbers having multiplied one another make some number,
the number so produced be called solid, and its sides are the numbers which
have multiplied one another.
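Several of these definitions translate almost directly into short procedures, which illustrates how little room they leave for ambiguity. The following is only a sketch under a modern reading: it assumes that "measures" means "divides exactly" and that numbers are positive integers, and the function names are illustrative rather than Euclid's.

```python
def measures(a: int, b: int) -> bool:
    """True when a 'measures' b, read here as: a divides b exactly (Definitions 3 and 5)."""
    return a > 0 and b % a == 0


def is_even(n: int) -> bool:
    """Definition 6: divisible into two equal parts."""
    return measures(2, n)


def is_prime(n: int) -> bool:
    """Definition 11: measured by a unit alone (no measure strictly between 1 and n)."""
    return n > 1 and not any(measures(d, n) for d in range(2, n))


def relatively_prime(a: int, b: int) -> bool:
    """Definition 12: the unit is the only common measure of a and b."""
    return not any(measures(d, a) and measures(d, b)
                   for d in range(2, min(a, b) + 1))


def multiply(a: int, b: int) -> int:
    """Definition 15: b is added as many times as there are units in a."""
    total = 0
    for _ in range(a):
        total += b
    return total


print(is_prime(7), relatively_prime(8, 15), multiply(3, 4))
```

Each function depends only on the notion of measuring, which mirrors the way the definitions build on one another.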
4. metric structure: the interval or distance between the numbers can be measured precisely;
5. unidimensionality: number forms (digits, fractions, and so on) are unidimensional (flat) structures constituting points on the line;
6. topological structure: the differential order and metric structure of the numbers determine their particular occurrence in space, implying that the set of
real numbers is an ordered field.
As mentioned, one of the main premises of formal analysis is that it must be complete (leaving out other contrasting possibilities) and consistent (avoiding circularities, ambiguities, and statements that cannot be proved or disproved). Euclid's
geometry is perhaps the one that most approaches completeness and consistency,
even though it may have few applications outside of the plane (two-dimensional
space), as demonstrated by non-Euclidean geometries. However, the fifth postulate is problematic and may be a Gödelian flaw, so to speak, in the Euclidean
system:
If a straight line crossing two straight lines makes the interior angles on the same side less
than two right angles, the two straight lines, if extended indefinitely, meet on that side on
which are the angles less than the two right angles.
The postulate refers to a diagram such as the one below. If the angles at A and B
formed by a line l and another two lines l1 and l2 sum up to less than two right
angles, then lines l1 and l2 meet on the side of the angles formed at A and B if
continued indefinitely:
Figure 1.5: Euclid's fifth postulate (a line l crossing two lines l1 and l2 at points A and B)
In the 1800s, mathematicians finally showed that the parallel postulate is independent of Euclid's other axioms, so that it need not be taken as an axiom at all. This discovery led to the creation of geometric
systems in which the axiom was replaced by other axioms. From this, non-Euclidean geometries emerged. In one of these, called hyperbolic or Lobachevskian
geometry, the parallel axiom is replaced by the following one: Through a point
not on a given line, more than one line may be drawn parallel to the given line. In
one model of hyperbolic geometry, the plane is defined as a set of points that lie
in the interior of a circle. Parallel lines are defined, of course, as lines that never
intersect. In the diagram below, therefore, the lines going through point X are all
parallel to line QP, even though they all pass through the same point. The lines
cross within the circle and there exist an infinite number of parallels that can also
be drawn within it. The reason for this is, of course, that the lines, being inside
the circle, cannot be extended beyond its circumference:
Figure 1.6: Lobachevskian geometry (lines through point X that do not meet line QP inside the circle)
Of course, if the lines were to be extended outside the circle, then all but one
of them would intersect with QP. Around 1860, Riemann had another whimsical
hunch: Is there a world where no lines are parallel? The answer is the surface of a
sphere on which all straight lines are great circles. It is, in fact, impossible to draw
any pair of parallel lines on the surface of a sphere, since they would meet at the
two poles:
[Figure: great circles on a sphere; labels: Great circles, > 180°]
Because one important use of geometry is to describe the physical world, we might
ask which type of geometry, Euclidean or non-Euclidean, provides the best model
of reality. Some situations are better described in non-Euclidean terms, such as aspects of the theory of relativity. Other situations, such as those related to everyday
building, engineering, and surveying, seem better described by Euclidean geometry. In other words, Euclidean geometry is still around because it is a system that
has applications in specific domains. And this is a central lesson to be learned
from a discussion of the logical calculus: it is system-specific, that is, it applies to
certain domains. Each domain thus has its own logical calculus. Lobachevskian
and Riemannian geometries, and by extension n-dimensional geometries, have
developed their own axioms, postulates, symbols, and rules for proving propositions within the system as either true or false. So, there are various types of logical
calculi, but all are based on the use of symbols and rules of combination that are
complete and consistent within a system.
Given that both Euclidean and non-Euclidean logic make sense and have
applications to the real world, one can see the reason why logical systems are so
appealing: they turn practical and intuitive knowledge into theoretical knowledge so that it can be applied over and over (Kaplan and Kaplan 2011). The
Pythagorean theorem was not just a recipe of how to construct right triangles;
it revealed the abstract nature of triangular structure and how it was connected
to the world. The Pythagorean triples that are derived from c² = a² + b² could thus
be seen to refer not only to specific properties of right triangles, but to properties
of numbers themselves, leading eventually to Fermat's Last Theorem and all the
intellectual activities that it has generated (Singh 1997).
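To make the point about triples concrete, the whole-number solutions of c² = a² + b² can be listed mechanically. The following is a minimal sketch, assuming a simple brute-force search up to a chosen bound; the function name and the bound of 20 are illustrative choices, not part of the original discussion.

```python
import math


def pythagorean_triples(limit: int) -> list[tuple[int, int, int]]:
    """Return all (a, b, c) with a <= b < c <= limit and a*a + b*b == c*c."""
    triples = []
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            c_squared = a * a + b * b
            c = math.isqrt(c_squared)          # integer square root, rounded down
            if c <= limit and c * c == c_squared:
                triples.append((a, b, c))
    return triples


print(pythagorean_triples(20))
# [(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
```

The list mixes primitive triples such as (3, 4, 5) with their multiples such as (6, 8, 10), a pattern that belongs to the numbers themselves rather than to any particular triangle.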
From Euclid's time onwards, it is therefore not surprising to find that mathematics and logic were thought to be intrinsically intertwined, with one mirroring
the other. But Charles Peirce (a logician and mathematician) argued eloquently
that the two are ontologically different. This is what he wrote circa 1906 (in
Kiryushchenko 2012: 69):
The distinction between the two conflicting aims [of logic and mathematics] results from
this, that the mathematical demonstrator seeks nothing but the solution of his problem;
and, of course, desires to reach that goal in the smallest possible number of steps; while
what the logician wishes to ascertain is what are the distinctly different elementary steps
into which every necessary reasoning can be broken up. In short, the mathematician wants
a pair of seven-league boots, so as to get over the ground as expeditiously as possible. The
logician has no purpose of getting over the ground: he regards an offered demonstration
as a bridge over a canyon, and himself as the inspector who must narrowly examine every
element of the truss because the whole is in danger unless every tie and every strut is not
only correct in theory, but also flawless in execution. But hold! Where am I going? Metaphors
are treacherous, far more so than bridges.
logisms and distinguish them from invalid ones. For example, one rule states that
no valid syllogism has two negative premises. There are two negative premises in
the above syllogism.
It was George Boole (1854) who used the idea of sets to unite logic, argumentation, and mathematics into a general formal system. To test an argument, Boole
converted statements into symbols, in order to focus on their logical relations,
independently of their real-world meanings. Then through rules of derivation or
inference he showed that it is possible to determine what new formulas may be
derived from the original ones. Boolean algebra, as it is called, came forward to
help mathematicians solve problems in logic, probability, and engineering. It also
removed meaning from logical argumentation once and for all, a fact that has
come back to haunt logicians in the era of computer modeling (as we shall see in
chapter 3).
Boole's primary objective was to break down logic into its bare structure by replacing words and sentences (which bear contextual or categorical meaning) with
symbols (which presumably do not). He reduced symbolism to the bare minimum
of two symbols: the 1 of the binary system for "true" and the 0 for "false." Instead
of addition, multiplication and the other operations of arithmetic (which bear
historical meanings) he used conjunction (∧), disjunction (∨), and complement
or negation (¬), in order to divest operations of any kind of external information
that may interfere with the logic used. These operations can be expressed either
with truth tables or Venn diagrams, which show how they relate to sets, such as
x and y below, where the symbolic representations and Venn diagrams of these
operations are displayed visually:
[Figure: Venn diagrams for x ∧ y, x ∨ y, and ¬x]
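The two readings of these operations, as truth tables over 1 and 0 and as operations on sets, can be placed side by side in a few lines. The following is only a sketch: the universe and the sets x and y are hypothetical values chosen for illustration.

```python
from itertools import product


def truth_table():
    """Rows of (x, y, x AND y, x OR y, NOT x) over the two values 0 and 1."""
    return [(x, y, x & y, x | y, 1 - x) for x, y in product((0, 1), repeat=2)]


# The same three operations read as set operations (the Venn-diagram view).
universe = {1, 2, 3, 4, 5, 6}   # hypothetical universe of discourse
x_set = {1, 2, 3}               # hypothetical set x
y_set = {3, 4}                  # hypothetical set y

conjunction = x_set & y_set     # intersection: members of both x and y
disjunction = x_set | y_set     # union: members of x or y (or both)
complement = universe - x_set   # everything in the universe outside x

for row in truth_table():
    print(row)
print(conjunction, disjunction, complement)
```

In both views the operations refer only to membership or truth value, not to what the symbols stand for, which is the sense in which meaning has been removed.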
The irrational numbers and the imaginary ones did not exist until they cropped
up in the solution of two specic equations made possible by the Pythagorean
theorem and the concept of quadratic equation respectively. So, where were they
before? Waiting to be discovered? This question is clearly at the core of the nature
of mathematics. This story can be told over and over within the field: transfinite
numbers, graph theory, and so on. These did not exist until they crystallized
in the conduct of mathematics, through ingenious notational modifications, diagrammatic insights, ludic explorations with mathematical signs, and so on.
Aware of the problem of meaning in the formalization of logic, Gottlob Frege
(1879) introduced the distinction between sense and referent. The latter is the object named, whereas the former involves a mode of presentation. So, in an expression such as "Venus is the Morning Star," Frege claimed that there are two terms
with different senses but with the same referent. Thus, for Frege this expression is
a version of "Venus is Venus," involving a reference to an astronomical discovery.
In symbolic terms, A = A is rendered as A = B, only because in language A has
different senses. Frege's distinction introduced the notion that two terms, whose
senses were already fixed so that they might refer to different objects, refer to the
same object. His work influenced Bertrand Russell in a negative way, since he became dissatisfied with Frege's approach. So, Russell advanced his own theory of
descriptions. In his system, the expression "Venus is the Morning Star" is analyzed as "there is an object which is both the Morning Star and Venus." The term
"Morning Star" is not a name as such; it is a description. Russell viewed such a
sentence as attributing the property "Morning Star" to the object named "Venus."
The sentence therefore is not an identity, "Venus is Venus," as Frege claimed.
The theory of reference was taken up by Ludwig Wittgenstein in 1921. Wittgenstein saw sentences as propositions about simple world facts, that is, they
represented features of the world in the same way that pictures or symbols did.
But Wittgenstein had serious misgivings about his own theory of language from
the outset. In his posthumously published Philosophical Investigations (1953), he
was perplexed by the fact that language could do much more than just construct
propositions about the world. So, he introduced the idea of "language games,"
by which he claimed that there existed a variety of linguistic games (describing,
reporting, guessing riddles, making jokes, and so on) that went beyond simple
Fregean semantics. Wittgenstein was convinced that ordinary language was too
problematic to describe with logical systems because of its social uses. Unlike
Russell, he wanted to ensure the careful, accurate, and prudent use of language
in communication.
Perhaps the most complete study attempting to outline the meta-structure of
logic was Russell and Whitehead's 1913 treatise, the Principia Mathematica. The
features connected with their treatise will be discussed in the next chapter. It is
sufficient to say here that, like Euclid's fifth axiom, it immediately invited reservations from mathematicians. And after Gödel's (1931) proof, it became obvious that
it could hardly be considered both complete and consistent. By the mid-1950s, formal
analysis went into a crisis, one that was somewhat resolved by the rise of computer science and artificial intelligence, which used the logical calculus as a basis
to carry out mathematical and linguistic tasks. Logic was the grammar of computers, and thus could be studied in computer software, rather than speculatively.
A little later, research in neuroscience started showing that certain computer algorithms mirrored neural processes. The rescue of formalism was achieved not by
speculations on the meta-structure of logic, but by computer science and brain
research working in tandem (as will be discussed subsequently).
1.2 Computation
Formal grammars and formal mathematics have typically sought to encode the
purported "laws of thought," as Boole called them, that generate well-formed
statements, such as proofs and sentences. So, they are not necessarily about the
practical value of the proofs or sentences themselves (that is, their meanings)
but about how they are formed. As discussed, they are concerned with the form
of any argument and thus its validity. This entails ignoring those features that are
deemed to be irrelevant to this goal, such as specific language grammars or certain
proofs in mathematics. As we saw, the first to concern himself with the meta-structure of logic was Aristotle, and the fundamental difference between modern
formal logic (as in Boolean, set-theoretic, or Markovian logic systems) and traditional, or Aristotelian logic, lies in their differing analyses of the logical structure of the statements they treat. The syllogism was Aristotle's model of logical
form; modern analyses are based on notions such as recursion, logical connectives and quantifiers, and rules that conjoin the various forms.
But all logical approaches have been fraught, from the outset, with the problem of undecidability. Euclid's fifth postulate is an example of an undecidable
statement: it is obvious, but it cannot be decided whether it is an axiom or a theorem to be proved. At about the same time that formalist approaches surfaced in
linguistics, based on mathematical formalism, computer science and artificial intelligence came onto the scene, providing new mechanisms and theoretical frameworks for testing and modeling formal theories and rule systems for decidability
and thus computability. Computational structure will be discussed in more detail in the third chapter. Here a few general ideas will be considered, especially
the one that the computer is a powerful modeling device. Moreover, according to
many contemporary formalists, the action has shifted over to computer science
ism. When the computer would come to a halt in certain applications or models, it
indicated that the phenomenon that the computer could not handle would need
to be studied further by linguists or mathematicians (Martín-Vide and Mitrana
2001). In other words, if a theory was inconsistent, the computer would be able
to detect the inconsistency, because the program would go into an infinite loop.
A loop is a sequence of instructions that is continually repeated until a certain
state is reached. Typically, when an end-state is reached the instructions have
achieved their goal and the algorithm stops. If it is not reached, the next step
in the sequence is an instruction to return to the first instruction and repeat the
process over. An infinite loop is one that lacks an exit routine. The result is that
the program repeats itself continually until it is interrupted from outside, for instance by the user or by
a resource limit enforced by the operating system.
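The difference between a loop that reaches its end-state and one that never does can be shown with a small sketch. It assumes a step budget (max_steps) as a stand-in for the outside interruption described above; the function and parameter names are illustrative.

```python
def run_until(state, step, reached_end, max_steps=1000):
    """Repeat `step` until `reached_end(state)` holds or the step budget runs out.

    Returns (final_state, halted); halted is False when the budget was exhausted
    before the end-state was reached.
    """
    for _ in range(max_steps):
        if reached_end(state):
            return state, True
        state = step(state)
    return state, reached_end(state)


# A loop with an exit condition: count down from 5 until 0 is reached.
print(run_until(5, lambda n: n - 1, lambda n: n == 0))                 # (0, True)

# A loop whose end-state is never reached: only the budget stops it.
print(run_until(1, lambda n: n + 1, lambda n: n == 0, max_steps=50))   # (51, False)
```

A budget of this kind cannot prove that a loop is infinite; it can only report that the end-state was not reached within the steps allowed.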
This approach to theory-testing is called retroactive data analysis in computer
science. This is a method whereby efficient modifications are made to an algorithm and its correlative theory when they fail to generate some output or when the output does
not correspond to the input data. The modifications can take the form of insertions in the theoretical model, deletions, or updates with new information and
techniques. When nothing works, then we have fleshed out of the algorithm something that may be faulty in the theory or, on the other hand, that may be unique
to the phenomenon (linguistic or mathematical) and thus non-computable, that
is, beyond the possibilities of algorithmic modeling.
Computer modeling is a very useful practice for linguists and mathematicians, allowing them to test their hand-made theories and models. In mathematics, it has even been used to devise proofs, the most famous one being the Four
Color Theorem (to be discussed subsequently). Known as proof by exhaustion, it is
established by dividing a problem into a finite number of cases and then devising
an algorithm for proving each one separately. If no exception emerges after an exhaustive search of cases, then the theorem is established as valid. The number of
cases sometimes can become very large. The first proof of the Four Color Theorem
was based on 1,936 cases, all of which were checked by the algorithm. The proof
was published in 1977 by Haken and Appel and it astonished the world of mathematics, since it went against the basic Euclidean paradigm of proof, with its use of
axioms, postulates, and logic (deductive or inductive) to show that something was
valid. The central idea in traditional proofs is to show that something is always
true by the use of entailment and inference reasoning, rather than to enumerate
all potential cases and test them, as does proof by exhaustion, where there is no
upper limit to the number of cases allowed. Some mathematicians prefer to avoid
such proofs, since they tend to leave the impression that a theorem is only true by
coincidence, and not because of some underlying principle or pattern. However,
there are many conjectures and theorems that cannot be proved (if proof is the
correct notion) in any other way. These include: the proof that there is no finite
projective plane of order 10, the classification of finite simple groups, and the
so-called Kepler conjecture.
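The logic of proof by exhaustion can be illustrated on a far smaller scale than the Four Color Theorem. The following sketch takes a toy claim chosen for illustration (the square of any integer leaves a remainder of 0 or 1 when divided by 4) and reduces it to four cases, since the remainder of n² depends only on the remainder of n. The helper name is hypothetical.

```python
def prove_by_exhaustion(cases, holds):
    """Check every case; return the counterexamples (an empty list means proved)."""
    return [case for case in cases if not holds(case)]


# Claim: the square of any integer leaves remainder 0 or 1 on division by 4.
# n*n mod 4 depends only on n mod 4, so the four residues cover all integers.
counterexamples = prove_by_exhaustion(range(4), lambda r: (r * r) % 4 in (0, 1))
print("proved" if not counterexamples else f"fails for {counterexamples}")
```

The algorithm only checks the cases; the argument that the cases cover every possibility still has to be supplied separately, and it is this division of labor that makes some mathematicians uneasy about such proofs.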
The earliest use of computers in linguistics and mathematics goes back to the
late 1940s and the Machine Translation (MT) movement (Hutchins 1997), which
itself emerged within the context of the cybernetics movement, cybernetics being the science concerned with regulation and control in humans, animals, organizations, and machines. MT was of interest to both linguists and mathematicians because it showed
how algorithms translate one system into another. Cybernetics was conceived by
mathematician Norbert Wiener, who used the term in 1948 in his book Cybernetics,
or Control and Communication in the Animal and the Machine. The same term was used
in 1834 by the physicist André-Marie Ampère to denote the study of government
in his classification system of human knowledge, recalling Plato, who used it to
signify the governance of people. Cybernetics views communication in all self-contained complex systems as analogous, since they all operate on the basis of
feedback and error-correction signals. The signals (or signal systems) are called
servomechanisms. The cybernetic movement no doubt enthused many linguists,
mathematicians, and computer scientists, leading to the MT movement. When the
early work failed to yield meaningful results, however, the automated processing of human languages was recognized as far more complex than had originally
been assumed. Thus, MT became the impetus for expanding the methods of computational linguistics and for revising formalist theories such as those of syntax.
Today, the computer as a modeling device has become intrinsic to linguistic and
mathematical research. Traditional concepts in the two sciences are being revised
and refashioned as the constant improvement in computer technologies makes it
possible to carry out efficient analyses of specific theories and models.
The Internet has also led to different ways of conducting research. One example of this is the Polymath Project. Mathematical discoveries have been largely
associated with individuals working with mathematical ideas in isolation. And
these are typically named after them: Pascal's Triangle, Hamiltonian circuits,
Bayesian inference, and so on. The Pythagoreans, on the contrary, collaborated
among themselves to discuss and debate discoveries, such as their own theorem
and the unexpected appearance of irrationals. Probably aware of the intellectual power of this kind of collaboration, renowned mathematician Tim Gowers
initiated the online Polymath Project (Nielsen 2012), reviving the Pythagorean
ideal of cooperation in mathematical research. The Project is a worldwide one
involving mathematicians from all over the globe in discussing and proposing
solutions to difficult problems. The Project started in 2009 when Gowers posted
a problem on his blog, asking readers to help him solve it. The problem was to
find a new proof for the density version of the Hales-Jewett theorem (1963). Seven
weeks later Gowers wrote that the problem was now probably solved, thanks to
the many suggestions he had received.
Computer modeling, data compression algorithms, and the like have led to
a new focus on the relation between quantitative notions such as frequency and
structure. For one thing, algorithms allow for an efficient and rapid collection and
analysis of large corpora of data. And this makes it possible to quantify the data statistically. While some may claim that, outside of the use of statistics to analyze the
data, this paradigm has had little or no influence on the development of theories
of pragmatics, ethnosemantics, and other such branches of linguistics, as I will
argue in chapter 3, the opposite may be true, since interest in discourse may have
been initiated in part by the inability of computers to produce human dialogue in a natural way, thus inducing a retroactive focus on conversational structure that would
likely have been inconceivable beforehand.
Moreover, by modeling discourse in the form of algorithms it has become
clear that within linguistic texts there is a hidden structure, based on events and
their probabilities of occurrence within certain contexts. Work in computational
quantification has also led to a new and fertile area of interdisciplinary research
between mathematicians and linguists in the domain of probability theory. The
computer modeling of discourse is essentially a Bayesian-guided one, as will be
discussed in chapter 3. For now, suffice it to say that probability theory has become
a new theme within linguistic research.
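As a rough illustration of what "events and their probabilities of occurrence within certain contexts" can mean in practice, the following sketch estimates how likely one word is to follow another in a text. It is an assumption-laden toy: the corpus is invented, the estimate is a plain relative frequency rather than a full Bayesian model, and the function name is illustrative.

```python
from collections import Counter


def bigram_probabilities(tokens):
    """Estimate P(next word | current word) from relative bigram frequencies."""
    pair_counts = Counter(zip(tokens, tokens[1:]))   # counts of adjacent word pairs
    first_counts = Counter(tokens[:-1])              # counts of words that have a successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical toy corpus, not data from the text.
corpus = "the boy eats the pizza and the boy sleeps".split()
probs = bigram_probabilities(corpus)

print(probs[("the", "boy")])    # "boy" follows "the" in 2 of its 3 occurrences
print(probs[("boy", "eats")])   # "eats" follows "boy" in 1 of its 2 occurrences
```

Even on so small a sample the counts expose a structure that is not visible in any single sentence, which is the kind of regularity the discussion above refers to.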
level is transformed into the surface one via transformational rules. In its bare
outline form, Chomsky's theory of language design was (and still is) an elegant
one, as discussed.
Chomsky subsequently claimed, as also discussed, that as linguists studied
the specifics of phrase structure and transformational rules in different languages
they would eventually discover within them, and extract from them, a universal
set of rule-making principles, defined as the UG. With this claim, Chomsky turned
linguistics into a branch of both psychology and computer science (Thibault 1997).
But there are problems with his proposal, as we saw. First, rule-making principles
do not explain the semantic interactions among the words in sentences that often
guide the syntax of sentences themselves (Lakoff 1987). Second, sentences might
not be the basic units from which to develop a theory of language (Halliday 1975).
For instance, pronouns may not be simple slot-fillers in syntactic descriptions, but
rather trace devices in conversations. The following stretch of conversation does
not have pronouns in it:
Speaker A:
Speaker B:
Speaker A:
Speaker B:
This stretch would be evaluated by native speakers of English as stilted or, perhaps, as ironic-humorous in some contexts, not because it lacks sentence structure, but because it lacks text structure. The appropriate version of the conversation is one in which pronouns are used systematically as trace devices (anaphoric
and cataphoric) so that parts of individual sentences are not repeated in the conversational chain:
Speaker A:
Speaker B:
Speaker A:
Speaker B:
The use of the pronouns "she" and "herself" is text-governed; that is, the pronouns
connect the various parts of the conversation, linking them like trace devices. This
is called coreference or indexicality, a text-making process which suggests that
pronouns cannot be examined in isolation as part of a syntactic rule system, but
rather as part of texts where they function as indexes or deictic particles to keep
conversations fluid and non-repetitive. Chomsky has answered this critique by
claiming that transformational rules can handle deixis and deletion easily by ex-
Figure 1.9: Tree diagram for "The boy eats the pizza" (Sentence branching into Subject and Predicate)
This type of diagram represents Markov's idea that sentences are not constructed
by a direct concatenation of single words, but rather hierarchically in terms of
phrases and relations among them. So, their positioning to the right or left is not
a simple diagrammatic convenience; it shows how the parts of a sentence relate
to each other hierarchically. This means that the linear string, the + boy + eats + the +
pizza, is not generated in a linear fashion with the words combined one after the
other, but rather in terms of rules that will show its hierarchical structure. The
formal study of syntax is, more precisely, an examination of this kind of structure
that can be divided into different states (the branches of the tree) which overlie the
structure of linear strings. The concatenation of items in a sentence is thus governed by states of different kinds, leading to the concept that the rules describe the
states as they are generated one after the other. Thus, people purportedly sense
that something is out of place in a sentence, not because it is necessarily in the
wrong linear place but because it has no syntactic value there. This is akin to place
value in digits. The "2" in "23" has a different value than it does in "12." The values
are determined not by linear order, but by compositional (hierarchical) structure.
The structure of a numeral is read, like a string, with each digit having the value of
ascending powers of 10, counted from the right.
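The analogy with place value can be spelled out in a short sketch, which pairs each digit of a numeral with the value it contributes. The function name is illustrative, and the handling of separators such as commas is an added convenience.

```python
def place_values(numeral: str) -> list[tuple[int, int]]:
    """Pair each digit with the value it contributes, reading left to right."""
    digits = [int(ch) for ch in numeral if ch.isdigit()]   # skips separators such as ","
    n = len(digits)
    # The digit at index i (from the left) is multiplied by 10 ** (n - 1 - i).
    return [(d, d * 10 ** (n - 1 - i)) for i, d in enumerate(digits)]


print(place_values("23"))   # [(2, 20), (3, 3)]: here the 2 is worth 20
print(place_values("12"))   # [(1, 10), (2, 2)]: here the 2 is worth 2
```

The same symbol receives a different value depending on its position in the structure, just as a word receives a different syntactic value depending on its place in the tree.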
The generative rules, therefore, must show how the parts of speech are connected to each other relationally and compositionally. In the above sentence, for
instance, the subject consists of a noun phrase and the predicate of a verb phrase,
which itself is made up of a verb and another noun phrase:
Sentence
    Subject
        Noun Phrase: The boy
    Predicate
        Verb Phrase
            Verb: eats
            Noun Phrase: the pizza
Rules such as the following ones will generate the above sentence (S = Sentence,
Sub = Subject, Pr = Predicate, NP = Noun Phrase, VP = Verb Phrase, V = Verb).
Rules (1), (2), (4), and (5) are called rewrite rules because each one rewrites the
previous one by expanding one of its symbols; (3), (6), and (7) are called insertion rules because they show where a lexical item or phrase is inserted in the
generation process:
1. S → Sub + Pr
2. Sub → NP1
3. NP1 → The boy
4. Pr → VP
5. VP → V + NP2
6. V → eats
7. NP2 → the pizza
These rules show that a sentence is composed of parts that are expanded in sequential order going through a series of states (indicated by the different rules),
producing a terminal string that has the following linear structure:
NP1 + V + NP2 = the boy + eats + the pizza
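The derivation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary name RULES and the function derive are hypothetical, and the choice to always expand the leftmost expandable symbol is one convention among several that produce the same terminal string.

```python
# Rewrite and insertion rules from the text, stored as a mapping from a
# symbol to the sequence of symbols (or words) that replaces it.
RULES = {
    "S": ["Sub", "Pr"],      # rule 1 (rewrite)
    "Sub": ["NP1"],          # rule 2 (rewrite)
    "NP1": ["the boy"],      # rule 3 (insertion)
    "Pr": ["VP"],            # rule 4 (rewrite)
    "VP": ["V", "NP2"],      # rule 5 (rewrite)
    "V": ["eats"],           # rule 6 (insertion)
    "NP2": ["the pizza"],    # rule 7 (insertion)
}


def derive(symbols):
    """Yield each state of the derivation, expanding the leftmost symbol."""
    yield list(symbols)
    while any(s in RULES for s in symbols):
        # Position of the first symbol that still has a rule.
        i = next(k for k, s in enumerate(symbols) if s in RULES)
        symbols = symbols[:i] + RULES[symbols[i]] + symbols[i + 1:]
        yield list(symbols)


states = list(derive(["S"]))
terminal_string = " + ".join(states[-1])
print(terminal_string)
```

Each element of states corresponds to one state of the generation process, and the last one is the terminal string.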
Needless to say, that is how a computer algorithm, in bare outline, works. So, it
is a relatively easy task to write algorithms to model Markovian finite-state grammars for generating sentences such as the one above. Now, to this system of rules,
Chomsky added the notion of transformational rule. So, the passive version of the
above string, The pizza is eaten by the boy, is generated by means of a transformational rule (T-rule) that operates on the string as input to produce the output
as required:
NP1 + V + NP2 → NP2 + be + V [past participle] + by + NP1
The boy + eats + the pizza → The pizza + is + eaten + by + the boy
This T-rule converts one string into another. Since it is a general rule, it applies to
any terminal string that has the representation on the left of the arrow as its structural description. This model of grammar was the standard one in 1965 (above).
Since then, many debates in the field have dealt with which grammars or which
systems of rules are more powerful and more psychologically real and which modifications must be made. From the outset, computer scientists were attracted to the
generative paradigm because it was algorithmically friendly and thus could be
used as a basis for developing programs to generate language and to translate
from one language to another (chapter 3).
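As a rough sketch of why the T-rule is easy to program, the passive transformation can be written as a function that reorders the three constituents of the terminal string. The names PAST_PARTICIPLE and passive are hypothetical, and the one-entry lexicon stands in for the morphological component that a fuller grammar would need.

```python
# Hypothetical mini-lexicon: maps a verb form to its past participle.
PAST_PARTICIPLE = {"eats": "eaten"}


def passive(np1, v, np2, be="is"):
    """Apply NP1 + V + NP2 -> NP2 + be + V[past participle] + by + NP1."""
    return [np2, be, PAST_PARTICIPLE[v], "by", np1]


active = ["the boy", "eats", "the pizza"]
passive_string = " + ".join(passive(*active))
print(passive_string)
```

Because the function refers only to the structural description (NP1, V, NP2) and not to particular words, it applies to any terminal string of that shape, provided the lexicon lists the verb.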
Similar tree diagrams can be devised to show the hierarchical structure overlying digit formation. A digit such as 2,234 has the following Markovian tree structure (Note: this is a highly simplified modeling of the relevant tree; V = value):
Now, a similar type of rule to the phrase structure ones above can be written so
that a computer can model and test the representation of digital numbers for consistency and coherence:
$D \rightarrow N_n \times 10^{n-1} + N_{n-1} \times 10^{n-2} + N_{n-2} \times 10^{n-3} + \cdots + N_1 \times 10^{0}$
This says that a digit (D) is composed of numerals (N) that have values in ascending powers of 10 when read in a line. This is now a formal statement devised by
hand that can easily be written as an algorithm that will run a program to generate strings of numbers ad infinitum.
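A minimal sketch of such an algorithm follows. It expands a written number into its numerals and their powers of 10 and then recomputes the value from that expansion. The function names expand and value are hypothetical; the numbering follows the rule above, with the leftmost numeral taking the power n − 1.

```python
def expand(number_string):
    """Return (numeral, power of 10) pairs, leftmost numeral first."""
    # Keep only the numerals, so that the comma in "2,234" is skipped.
    numerals = [int(ch) for ch in number_string if ch.isdigit()]
    n = len(numerals)
    return [(d, n - 1 - i) for i, d in enumerate(numerals)]


def value(number_string):
    """Sum each numeral times its power of 10."""
    return sum(d * 10 ** p for d, p in expand(number_string))


terms = expand("2,234")
total = value("2,234")
print(terms, total)
```

The same numeral contributes a different amount depending on its position: the 2 in 23 contributes 20, while the 2 in 12 contributes 2.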
Computational linguistics and mathematics have now gone beyond the modeling of formalisms such as digit formation, as we shall see. They now attempt
to reproduce human behavior in robots, with the development of very powerful
learning algorithms. But it must not be forgotten that many of the advances in
digital communications technologies, such as voice activation, speech recognition, and other truly remarkable capacities of computers today, were made possible by the early partnership among linguists, mathematicians, and computer
scientists. In effect, running formal programs is akin to following an engineering
manual for assembling some object. The rules of assemblage allow for the object
to work but they tell us nothing about the object itself, nor about why the rules
work or do not. This is why formal theorists have always sought confirmation or
corroboration in psychology wherein the models are tested out not on machines
but on human beings. The collaboration among psychologists, linguists, mathematicians, and computer scientists coalesced into a full-fledged discipline called
cognitive science in the 1980s.
assumption was that the same laws of learning applied to all organisms and,
therefore, that the discovery of basic principles of learning and problem-solving
could be gleaned from experiments with animals. Cognitive science sought the
laws not in any comparison with animals, but instead from studying how machines learned to do things from a set of instructions. The term "cognition,"
rather than "mind" or "behavior," was employed from the outset in order to eliminate the artificial distinction maintained by behaviorist psychologists between
inner (mental) and observable (behavioral) processes. Indeed, this term has now
come to designate all mental processes, from perception to language. Adopting
insights from artificial intelligence, cognitive scientists aimed from the outset to
investigate the mind by seeking parallels between the functions of the human
brain and the functions of computers.
Cognitive science thus adopted the notions and methods of artificial intelligence researchers. If the output of an algorithm was a linguistic sentence and
it was shown to be well-formed, then the input (the rules used to create the algorithm) was evaluated as correct; if the output was not a well-formed sentence,
then the fault was detected in the input and changed accordingly. The process
was a purely mechanical and abstract one, since it was thought that a faculty like
language could be analyzed in isolation from its functions in context and from its
biological interactions with the human body. As Gardner (1985: 6) put it, for early
cognitive scientists, it was practical to have a level of analysis wholly separate
from the biological or neurological, on the one hand, and the sociological or cultural, on the other; therefore, central to any understanding of the human mind
is the electronic computer.
The current focus of cognitive science has, of course, gone beyond this computational agenda. It now even seeks to design artificial programs that will display all the characteristics of human cognition, not just model aspects of it. To
do so, it must not only be able to decompose the constituent parts that faculties such as perception, language, memory, reasoning, emotion, and so on might
have, and then reassemble them in terms of representations that can be then programmed into software, but also devise ways for the algorithm to generate new
rules on its own given variable inputs. For contemporary cognitive science the
guiding premise is the belief that representational structures in the mind and computational representations of these structures are isomorphic. Aware that lived or
embodied experience might interfere with this whole process, recent cognitive
science has gone beyond developing representations and algorithms to be implemented in computers, to studying how lived experience shapes cognition, in
contrast to artificial intelligence. So, there are now two streams within cognitive
science: the formalist one that aims to translate formal theories into algorithms
that are believed to be transferable to non-living robots, and the one that seeks to
see how mental operations are unique because they are shaped by bodily experiences.
At the core of the cognitive science agenda, no matter which of the two streams
is involved, is learning: How do we learn language? How do we learn mathematics? As discussed above, the role of metaphor in the process was for many years
ignored, but today it is a central topic within both streams of cognitive science.
Metaphor indicates how we go from sensory knowledge or imaginative inference
to conceptual knowledge. Like other animals, human infants come to understand
things in the world at first with their senses. When they grasp objects, for instance, they are discovering the tactile properties of things; when they put objects
in their mouths, they are probing their gustatory properties; and so on. However,
in a remarkably short period of time, they start replacing this type of sensory
knowing with conceptual knowing, that is, with words, pictures, and other forms
that stand for things. This event is extraordinary: all children require to set their
conceptual mode of knowing in motion is simple exposure to concepts in social
context through language, pictures, and other kinds of symbol-based forms of
representation and communication. From that point on, they require their sensory apparatus less and less to gain knowledge, becoming more and more dependent on their conceptual mode. Cognitive science research, such as that by
Lakoff and Núñez (2000), has started to show that the transition from one stage
to the other is mediated by metaphor. Without discussing the relevant research
here, since it will be discussed subsequently, it is sufficient to say that the role of
metaphor in childhood can no longer be ignored.
The shift from sensory to conceptual knowing was first examined empirically
by two psychologists, Jean Piaget and Lev S. Vygotsky. Piaget's work documented
the presence of a timetable in human development that characterizes the shift
(Piaget 1923, 1936, 1945, 1955, 1969, Inhelder and Piaget 1969). During the initial
stage infants explore the world around them with their senses, but are capable of
distinguishing meaningful (sign-based) stimuli (such as verbal ones) from random noises. In a short time, they show the ability to carry out simple problem-solving tasks (such as matching colors). Piaget called this the pre-operational
stage, since it is during this phase that children start to understand concept-based
tasks operationally. By the age of 7, which Piaget called the concrete operations
stage, children become sophisticated thinkers, possessing full language and other
conceptual modes of knowing for carrying out complicated tasks. The mental development of children culminates in a formal operations stage at puberty, when
the ability to reason and actualize complex cognitive tasks emerges.
As insightful as Piaget's work is, it makes no significant reference to the use
of metaphor in childhood as a creative strategy for knowing the world. Vygotsky
(1962), on the other hand, saw metaphor as a vital clue to understanding how the
conceptual mode of knowing emerges. When children do not know how to label
something, such as the moon, they resort to metaphor, calling it a "ball" or a
"circle." Such "metaphorical fables," as Vygotsky called them, allow children to
interconnect their observations and reflections in a holistic and meaningful fashion. Gradually, these are replaced by the words they acquire in context, which
mediate and regulate their thoughts, actions, and behaviors from then on. By the
time of puberty children have, in fact, become creatures of their culture. Vygotsky
thus saw culture as an organizing system of the concepts that originate and develop with a group of people tied together by force of history.
This line of work raises the question of association as a major force in development and cognition. Given the controversy surrounding the term in psychology
and linguistics, it is necessary to clarify, albeit schematically, what it now means
within the cognitive science paradigm. In psychology, associationism is the theory
that the mind comes to form concepts by combining simple, irreducible elements
through mental connection. One of the first to utilize the notion of association
was Aristotle, who identified four strategies by which associations are forged: by
similarity (for example, an orange and a lemon), difference (for example, hot and
cold), contiguity in time (for example, sunrise and a rooster's crow), and contiguity in space (for example, a cup and saucer). John Locke (1690) and David Hume
(1749) saw sensory perception as the underlying factor in guiding the associative
process; that is, things that are perceived to be similar or contiguous in time or
space are associated with each other; those that are not are kept distinct by the
mind. In the nineteenth century, the early psychologists, guided by the principles
enunciated by James Mill (see 2001), studied experimentally how subjects made
associations. In addition to Aristotle's original four strategies, they found that factors such as intensity, inseparability, and repetition played a role in stimulating
associative thinking: for example, arms are associated with bodies because they
are inseparable from them; rainbows are associated with rain because of repeated
observations of the two as co-occurring phenomena; etc.
Associationism took a different route when Ivan Pavlov (1902) published his
famous experiments with dogs, which, as is well known, established the theory
of conditioning as an early learning theory. When Pavlov presented a meat stimulus to a hungry dog, the animal would salivate spontaneously, as expected. He
termed this the dog's unconditioned response, an instinctual response programmed into each species by Nature. After Pavlov rang a bell while presenting
the meat stimulus a number of times, he found that the dog would eventually salivate only to the ringing bell, without the meat stimulus. Clearly, Pavlov suggested,
the ringing by itself, which would not have triggered the salivation initially, had
brought about a conditioned response in the dog. It was thus by repeated association of the bell with the meat stimulus that the dog had learned something
1.2.3 Creativity
In the two streams of cognitive science, the formalist and the embodied one (so to
speak), creativity has different definitions. In the former it consists in the ability
to create well-formed strings ad infinitum; in the latter it is a result of what is now
called blending. In Syntactic Structures, Chomsky (1957) compared the goal of linguistics to that of chemistry. A good linguistic theory should be able to generate
all grammatically possible utterances, in the same way that a good chemical
theory might be said to generate all physically possible compounds (Chomsky
1957: 48). A decade later (Chomsky 1966a: 10), he went on to define verbal creativity as the speaker's ability to produce new sentences that are immediately understood by other speakers. For generativists, linguistic creativity unfolds within a
system of rules and rule-making principles that allow for the generation of an infinite class of symbol combinations and permutations with their formal properties.
It should come as no surprise, therefore, to find that anyone who holds this perspective has an affinity for artificial intelligence models and computer algorithms.
Ulrich Neisser (1967: 6) put it as follows just before the advent of cognitive science
as an autonomous area of inquiry:
The task of the psychologist in trying to understand human cognition is analogous to that
of a man trying to discover how a computer has been programmed. In particular, if the program seems to store and reuse information, he would like to know by what routines or
procedures this is done. Given this purpose, he will not care much whether his particular
computer stores information in magnetic cores or in thin films; he wants to understand the
program, not the hardware. By the same token, it would not help the psychologist to know
that memory is carried by RNA as opposed to some other medium. He wants to understand
its utilization, not its incarnation.
However, Neisser was well aware that the computer metaphor, if brought to an
extreme, would actually lead psychology astray. So, only a few pages later he issued the following warning (Neisser 1967: 9): "Unlike men, artificially intelligent
programs tend to be single-minded, undistractable, and unemotional ... in my
opinion, none does even remote justice to the complexity of mental processes."
Although attempts have been made to model such creative linguistic acts as
metaphor, the results have never been successful. This is because metaphor is
an exception to the strict rules of syntax, as Lakoff found in his thesis (described
above). When the mind cannot find a conceptual domain for understanding a
new phenomenon, it resorts instinctively to metaphor to help it scan its internal
space in order to make new associations. There is no innovation in science or art
without this capacity. Logic and syntax simply stabilize the rational architecture
of cognition, not create new features for it to utilize in some novel way. It should
be mentioned, however, that there are algorithms that can identify metaphorical
language very effectively, such as the one devised by Neuman et al. (2013). And,
various programs have been written for generating legitimate metaphors. The
problem of representation is therefore a fairly straightforward one. The difficulties
come at the level of interpretation. When asked what a novel metaphor generated
through a random algorithmic process means, the computer breaks down.
The embodied cognition stream of cognitive science actually complements
the more formalist one, aiming to study the shift from sensory to conceptual
knowledge discussed above. The two streams should not be considered to be bifurcating, but rather converging. A thematic subtext of this book is that all kinds
of approaches to cognition, from the formalist to the highly creative, are relevant
for understanding it. This is the basic meaning of interdisciplinarity: a form of
scientific inquiry that is not based on partisan partnerships, but rather on an
open-minded view of the methods and goals of each scientific epistemology.
1.3 Quantification
One area where linguistics and mathematics certainly converge practically is in
the use of quantification methods and theories. In the case of mathematics, fields
such as statistics and probability theory are branches that have theoretical implications for studying mathematics itself as well as many practical applications
(in science, business, and other fields). In the case of linguistics, quantification is
a tool used to examine specific phenomena, such as statistical and probabilistic
patterns in the evolution of languages, or to flesh out hidden structure in language
artifacts (such as texts) through basic statistical techniques.
A fundamental premise in the quantification research paradigm is that statistical and probabilistic methods allow us to discover and model structure effectively. Modeling is a basic aspect of both the theoretical and computational
approaches to language and mathematics, as discussed above. Architects make
scale models of buildings and other structures, in order to visualize the structural
and aesthetic components of building design, while using quantification techniques as part of the engineering of such structures; scientists utilize computer
models of atomic and sub-atomic phenomena to explore the structure of invisible matter and thus to glean underlying principles of structure (as in quantum
analysis); and so on and so forth.
Another premise is that mathematics is itself fundamentally the science of
quantity. The most basic signs in mathematics are the numbers that stand for
quantitative concepts. The integers, for example, stand for holistic entities, and
these can be enumerated with the different numbers. The study of integers leads
to the discovery of hidden patterns. For example, the sum or product of whole numbers always produces another whole number: 2 + 3 = 5. On the other hand, dividing
whole numbers does not always produce another whole number, because division
is akin to the process of partitioning something. So, 2 divided by 3 will not produce
a whole number. Rather, it produces a partitive number known of course as a fraction: 2/3. Various types of number sign systems have been used throughout history
to represent all kinds of quantitative concepts. The connection between the number and its referent, once established, is bidirectional; that is, one implies the
other. The decimal system has prevailed for common use throughout most of the
world because it is an efficient system for everyday number concepts. The binary
system, on the other hand, is better adapted to computer systems, since computers
store data using a simple on-off switch with 1 representing on and 0 off.
The study of quantitative structure is now a branch of mathematics and linguistics. The three main relevant topics that will interest us in this book are compression, economical structure, and probability structure. These will be discussed
in more detail in the fourth chapter.
1.3.1 Compression
One of the more interesting findings of contemporary cognitive science is that of
compression, or the idea that emergence of form and meaning comes from the
compression of previous form. Compression can be both modeled and quantified
using basic statistical techniques. As Ball and Bass (2002: 11) point out in the area
of mathematics teaching, understanding compression involves unpacking symbols and concepts:
Looking at teaching as mathematical work highlights some essential features of knowing
mathematics for teaching. One such feature is that mathematical knowledge needs to be
unpacked. This may be a distinctive feature of knowledge for teaching. Consider, in contrast, that a powerful characteristic of mathematics is its capacity to compress information
into abstract and highly usable forms. When ideas are represented in compressed symbolic
form, their structure becomes evident, and new ideas and actions are possible because of
the simplification afforded by the compression and abstraction. Mathematicians rely on this
compression in their work. However, teachers work with mathematics as it is being learned,
which requires a kind of decompression, or unpacking, of ideas.
isters. As simple as this may seem, it does have implications for describing style,
dialectal variation, and the like in a precise way. It is interesting to note that research has shown that the MLU changes over the life cycle and can also be used to
chart various milestones in the acquisition of language in childhood. Miller (1981)
found that MLUs corresponded to specific ages as follows:
Table 1.1: Mean length of utterance and language development

MLU     Age (months)
1.31    18
1.62    21
1.92    24
2.54    30
2.85    33
3.16    36
3.47    39
3.78    42
4.09    45
4.40    48
4.71    51
5.02    54
5.32    57
5.63    60
In a subsequent study, Garton and Pratt (1998) indicate, however, that while there
is a correlation between MLU and age equivalence, it is a weak one. So, at best it
should be used as a generic guide, not as a law of verbal development. Nevertheless, the MLU shows how a simple quantitative notion might be able to shed light
on something intrinsic, such as language acquisition.
One application of the MLU concept is to determine how many morphemes
are used to construct words and sentences, so as to provide a rationale for classifying languages as either agglutinative or isolating, that is as either morphological
or syntactic, with the latter being much more compressive. As is well known, the
former are languages, such as Turkish, Basque, and a number of indigenous American languages, that use bound morphemes such as suffixes abundantly in the
construction of their words; the latter are languages that tend to form their words
with one morpheme per word. Chinese is an example of an isolating language,
although it too uses affixes, but less frequently than other languages do. The American linguist Joseph Greenberg (1966) introduced the concept of morphological
index to assess degree of morphological relation of languages to each other in
terms of mean length of words. The index is derived by taking a representative and
large sample of text, counting the words and morphemes in it, and then dividing
the number of morphemes (M) by the number of words (W):
$I = M / W$
In a perfectly isolating language, the index will be equal to 1, because there is
a perfect match between number of words (W) and number of morphemes (M),
or M = W. In agglutinating languages, the M will be greater than W. The greater
it is, the higher the index, and thus the higher the degree of agglutination. The
highest index discovered with this method is 3.72 for the Inuit languages. Interestingly, this method of classifying languages has produced consistent results with
the traditional phylogenetic methods using cognate analysis and sound shifts to
determine language families.
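The calculation of the index can be illustrated with a short sketch. It assumes, purely for illustration, that the sample has already been segmented by hand, with hyphens marking morpheme boundaries inside words; the function name morphological_index and both toy samples are hypothetical and far too small to stand in for the large, representative sample the method requires.

```python
def morphological_index(segmented_text):
    """Return I = M / W for text whose morphemes are separated by hyphens."""
    words = segmented_text.split()
    # Each word contributes one morpheme per hyphen-separated piece.
    morphemes = sum(len(word.split("-")) for word in words)
    return morphemes / len(words)


# Toy sample with one morpheme per word (the isolating pattern).
isolating = morphological_index("the boy eat pizza")

# Turkish "evlerimde" ("in my houses"), hand-segmented as ev-ler-im-de.
agglutinating = morphological_index("ev-ler-im-de")

print(isolating, agglutinating)
```

The first sample yields an index of 1 because M = W; the second yields a higher value because a single word carries several bound morphemes.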
1.3.2 Probability
In mathematics, the formal study of quantitative structure came to the forefront
with the calculus and probability theory, both of which showed that quantities
cannot be studied in absolutist terms, but relative to the situation in which they
exist. From this, theories of probability became ever more present in the philosophy of mathematics itself. Probability attempts, in fact, to express in quantifiable terms statements of the form: An event A is more (or less) probable than an
event B. Mathematicians have struggled for centuries to create a theory of probability that would allow them to penetrate what can be called a quantification
principle. This can be defined simply as the extraction of some probability metric
in a set of seemingly random data. In fact, they have developed several related
theories and methods to carry this out. The subjective theory takes probability
as an expression of an individual's own degree of belief in the occurrence of an
event regardless of its nature. The frequency theory is applied to events that
can be repeated over and over again, independently and under the same exact
conditions.
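The frequency view can be illustrated with a small simulation, offered here only as a sketch. It repeats the same trial (rolling a die) many times under identical conditions and estimates the probability of an event as the proportion of trials in which it occurs. The names relative_frequency and roll_die, the number of repetitions, and the fixed seed are all assumptions made for the example.

```python
import random


def relative_frequency(event, trial, repetitions, seed=0):
    """Estimate a probability as hits / repetitions over independent trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(repetitions) if event(trial(rng)))
    return hits / repetitions


def roll_die(rng):
    """One trial: a roll of a six-sided die."""
    return rng.randint(1, 6)


# Event A: the roll is a six. Event B: the roll is even.
p_six = relative_frequency(lambda face: face == 6, roll_die, 60_000)
p_even = relative_frequency(lambda face: face % 2 == 0, roll_die, 60_000)
print(p_six, p_even)
```

Comparing the two estimates gives a quantified version of the statement that event A is less probable than event B.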
The study of such phenomena as compression and probabilistic structure
constitutes yet another area of the common ground that connects linguistics and
mathematics. Together with computational modeling, quantification methods
have been showing more and more that there are inherent tendencies in the brain
that manifest themselves in specic ways in representational systems. Unraveling
these tendencies is part of the hermeneutical perspective that interdisciplinarity
entails.
1.4 Neuroscience
We started off this chapter discussing Lakoff's 2011 lecture at the Fields Institute
showing how mathematics and language shared a common property: blending.
Gödel's famous proof, Lakoff argued, was inspired by Cantor's diagonal method.
It was, in his words, a blend of Cantor's method with a new domain. Gödel had
shown essentially that within any formal logical system there are results that can
be neither proved nor disproved. Lakoff pointed out that Gödel found a statement
in a set of statements that could be extracted by going through them in a diagonal fashion, now called Gödel's diagonal lemma. That produced a statement, S,
like Cantor's C, that does not exist in the set of statements. The inspiration came,
according to Lakoff, through the linguistic process of metaphorization, whereby
one domain is associated with another and in the association one finds new ideas.
Cantor's diagonalization and one-to-one matching proofs are metaphors: blends
between different domains linked in a specific way. This metaphorical insight led
Gödel, Lakoff suggested, to imagine three metaphors of his own. The first one,
called the Gödel Number of a Symbol, is evident in the argument that a symbol
in a system is the corresponding number in the Cantorian one-to-one matching
system (whereby any two sets of symbols can be put into a one-to-one relation).
The second one, called the Gödel Number of a Symbol in a Sequence, consists in
Gödel's demonstration that the nth symbol in a sequence is the nth prime raised to
the power of the Gödel Number of the Symbol. And the third one, called Gödel's
Central Metaphor, was his proof that a symbol sequence is the product of the
Gödel numbers of the symbols in the sequence.
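The three steps can be sketched in code. The symbol codes in SYMBOL_CODE are hypothetical (any one-to-one assignment of positive integers to symbols would serve), and the function names are illustrative only; the sketch is not meant to reproduce Gödel's own coding scheme in detail.

```python
# Step 1 (hypothetical codes): each symbol is matched one-to-one with a number.
SYMBOL_CODE = {"0": 1, "s": 2, "=": 3}


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short sequences)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def godel_number(sequence):
    """Steps 2 and 3: multiply the nth prime raised to the nth symbol's code."""
    number = 1
    for prime, symbol in zip(primes(), sequence):
        number *= prime ** SYMBOL_CODE[symbol]
    return number


def decode(number):
    """Recover the symbol sequence by reading off the prime exponents."""
    inverse = {code: symbol for symbol, code in SYMBOL_CODE.items()}
    symbols = []
    for prime in primes():
        if number == 1:
            break
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        symbols.append(inverse[exponent])
    return "".join(symbols)


encoded = godel_number("s0=0")
print(encoded, decode(encoded))
```

Because prime factorization is unique, the single number can always be decoded back into the sequence it stands for, which is what allows a number and a symbol sequence to be treated as one entity.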
Lakoff concluded by claiming that Gödel's proof exemplifies the process of
blending perfectly. A blend is formed when the brain identifies two distinct inputs
(or mental spaces) in different neural regions as the same entity in a third neural
region. But the blend contains more information than the sum of information bits
contained in the two inputs, making it a powerful form of new knowledge (see
Figure 1.12).
The three together constitute the blend, paralleling the process of metaphor
precisely: input 1 might correspond to the topic, input 2 to the vehicle, and the
blend to the so-called ground. In the metaphor, "That mathematician is a rock,"
the two distinct inputs are mathematician (topic) and rock (vehicle). The blending
process is guided by the inference (or what Lakoff calls a conceptual metaphor)
that people are substances, constituting the final touch to the blend, a touch that
keeps the two entities distinct in different neural regions, while identifying them
simultaneously as a single entity in the third. Using conceptual metaphor theory, which will be discussed subsequently, Lakoff suggested that the metaphorical
blend occurs when the entities in the two regions are the source (substances) and
target (people). Gödel's metaphors, analogously, came from neural circuits linking
a number source to a symbol target. In each case, there is a blend, with a single
entity composed of both a number and a symbol sequence. When the symbol sequence is a formal proof, a new mathematical entity appears: a proof number.
The underlying premise in this whole line of theorization is that metaphorical
blends in the brain produce knowledge and insights.
[Figure 1.12: diagram labeled Input 1, Input 2, and Blend]
In the end, Lakoff argued that mathematicians and linguists had a common
goal: to study the blending processes that unite mathematics and language.
Chomsky before had also argued for a similar collaboration, but his take on the
kind of approach was (and still is) radically different. Whatever the case, it became obvious by the early 2000s that the area where mathematics and language
can be studied interactively lies within neuroscience. It is therein that formal
theories and blending theories can be assessed and corroborated or eliminated.
We will discuss the different research findings in neuroscience that are making the
investigation of linguistic and mathematical competence truly intriguing in the
final chapter. Here it is sufficient to go through some of the goals of neuroscience
in a prima facie way.
mental evidence to suggest that the human brain and that of some chimps come
with a wired-in aptitude for math. The difference in the case of chimps is, apparently, an inability to formalize this knowledge and then use it for invention and
discovery. So, humans and chimps possess a kind of shared number instinct,
according to Dehaene and others, but not number sense. Of course, the study of
language in primates has also revolved around a similar dichotomy: Do primates
possess a language instinct but not a language sense?
Within neuroscience a subfield, called math cognition, has emerged to seek
answers to the innate (Platonic)-versus-constructivist debate in the learning of
mathematics. Brain-scanning experiments have shown that certain areas of the
brain are hard-wired to process numerical patterns, while others are not. So, math
cognition is specific to particular neural structures; it is not distributed modularly
throughout the brain. Moreover, these structures come equipped with number
sense. Dehaene claims that the number line, for instance, is not a construct; it
is an image that is innate and can be seen to manifest itself (differentially, of
course) throughout the world. But anthropological evidence scattered here and
there (Bockarova, Danesi, and Núñez 2012) would argue to the contrary, since in
cultures where the number line does not exist as a tradition, the kinds of calculations and concepts related to it do not appear. Whatever the truth, it is clear that
the neuroscientific study of math cognition is an area of relevance to understanding what mathematics is, how it is learned, and how it varies anthropologically.
The study of the latter is a field known as ethnomathematics. It has been found, for
example, that proof and mathematical discoveries in general seem to be located
in the same neural circuitry that sustains ordinary language and other cognitive
and expressive systems. It is this circuitry that allows us to interpret meaningless
formal logical expressions as talking about themselves.
One of the more significant findings to emerge from neuroscience in general is
the likelihood that the right hemisphere (RH) is a crucial point of departure for
processing novel stimuli: that is, for handling input for which there are no pre-existent cognitive codes or programs available. In their often-quoted review of a large
body of experimental literature a number of decades ago, Goldberg and Costa
(1981) suggested that the main reason for this is the anatomical structure of the RH. Its greater connectivity with other centers in the complex
neuronal pathways of the brain makes it a better distributor of new information.
The left hemisphere (LH), on the other hand, has a more sequentially organized
neuronal-synaptic structure and, thus, finds it more difficult to assimilate information for which no previous categories exist. If this is indeed the case, then it
suggests that the brain is prepared to interpret new information primarily in terms
of its physical and contextual characteristics. Further work in this area has confirmed this synopsis. This is a relevant finding because the first thoughts about
number (number sense) are likely to be located in the RH of the brain; these are
then given formal status by the LH. This suggests that both hemispheres are involved in a connective form of thinking.
The RH is where the sense impressions that the brain converts into images are
subsequently transformed into concrete percepts. Percepts register our physiological and affective responses to the signals and stimuli present in the environment.
They filter incoming information and assay it for its relevance, discarding from it
all that is deemed to be irrelevant to the task at hand. In this way, bodily sense
is present in all thinking in such a way that it is even more ordered than language and logic. Number sense emerges as a kind of blend from the percepts in
the RH, which are then transferred as ordered sense to the LH. Work in neuroscience today seemingly confirms this very simple hypothesis. For example,
Semenza et al. (2006) found that mathematical abilities are located and develop
in the brain with respect to language, whose acquisition also shows an RH-to-LH
flow. The researchers assessed math ability in six right-handed patients affected
by aphasia following a lesion to their non-dominant hemisphere (crossed aphasia) and in two left-handed aphasics with a right-sided lesion. Acalculia (loss of
the ability to execute simple arithmetical operations) was found in all cases, following patterns that had been previously observed in the most common aphasias
resulting from LH lesions. No sign of RH acalculia (acalculia in left lateralized
right-handed subjects) was detected by their study. Overall, the study suggested
that language and calculation share the same hemispheric substratum.
PET and fMRI studies are now confirming that language processing is extremely complex, rather than involving a series of subsystems (phonology, grammar, and so on) located in specific parts of the brain (Broca's area, Wernicke's
area, and Penfield's area), and that it parallels how we understand numbers and
space. The neuronal structures involved in language are spread widely throughout
the brain, primarily by neurotransmitters, and it now appears certain that different types of linguistic and computational (arithmetical) tasks activate different
areas of the brain in many sequences and patterns. It has also become apparent
from fMRI research that language and problem-solving are regulated, additionally, by the emotional areas of the brain. The limbic system, which includes
portions of the temporal lobes, parts of the hypothalamus and thalamus, and
other structures, may have a larger role than previously thought in the processing
of certain kinds of speech and in the emergence of number sense.
Overall, the current research in neuroscience suggests that the brain is a connective organ, with each of its modules (agglomerations of neuronal subsystems
located in specific regions) organized around a particular task. The processing
of visual information, for instance, is not confined to a single region of the RH,
although specific areas in the RH are highly active in processing incoming visual
information. Rather, different neural modules are involved in helping the brain
process visual inputs as to their contents; in practice this means retaining from the
information what is relevant, and discarding from it (or ignoring) what is not. Consequently, visual stimuli that carry linguistic information or geometric information (such as diagrams) would be converted by the brain into neuronal activities
that are conducive to strictly logical, not visual, processing. This is what happens
in the case of American Sign Language. The brain first processes the meanings
of visual signs, extracting the grammatical relations in them, in a connected or
distributed fashion throughout the brain (Hickok, Bellugi, and Klima 2001). But
visual stimuli that carry a different kind of information, such as the features of a
drawing, are converted instead into neuronal activities that are involved in motor
commands for reproducing the drawing. This finding would explain why tonemes
(tones with phonemic value) are not processed by the RH, as is the case for musical
tones. Tone systems serve verbal functions, thus calling into action the LH. Musical tones instead serve emotional (aesthetic) functions, thus calling into action
the RH.
The connectivity that characterizes neural structure has been examined not
only experimentally with human subjects, but also theoretically with computer
software. Computer models of the brain have been designed to test out various
theories, from formalist to blending theories. One of the most cited theories in
computational neuroscience is the so-called Parallel Distributed Processing (PDP)
model. It is designed to show how, potentially, brain networks interconnect with
each other in the processing of information. The PDP model appears to perform
the same kinds of tasks and operations that language and problem-solving do
(MacWhinney 2000). As Obler and Gjerlow (1999: 11) put it, in the strong form
of PDP theory, there are no language centers per se but rather network nodes
that are stimulated; eventually one of these is stimulated enough that it passes a
certain threshold and that node is realized, perhaps as a spoken word.
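The threshold idea in the Obler and Gjerlow passage can be made concrete with a toy sketch. It is only an illustration of nodes accumulating stimulation until one crosses a threshold, not an implementation of the PDP model itself; the function name, the node labels, the stimulation values, and the threshold of 1.0 are all assumptions introduced here.

```python
def first_node_over_threshold(stimulation, threshold=1.0, max_steps=100):
    """Return (node, step) for the first node whose activation exceeds the threshold.

    `stimulation` maps each hypothetical node to the amount of activation
    it receives per time step. Returns (None, max_steps) if nothing crosses.
    """
    activation = {node: 0.0 for node in stimulation}
    for step in range(1, max_steps + 1):
        # Every node is stimulated in parallel at each step.
        for node, amount in stimulation.items():
            activation[node] += amount
        # Collect the nodes that have passed the threshold on this step.
        over = [node for node, level in activation.items() if level > threshold]
        if over:
            # The most strongly activated node is the one that is "realized".
            return max(over, key=lambda node: activation[node]), step
    return None, max_steps


# Hypothetical word nodes competing for realization as a spoken word.
stimulation = {"cat": 0.30, "cap": 0.20, "cut": 0.10}
winner, steps = first_node_over_threshold(stimulation)
print(winner, steps)
```

In this sketch no node is designated in advance as a "language center"; which node is realized depends only on how much stimulation it has accumulated.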
The integration of RH and LH functions to produce language and mathematics
is now a virtual law of neuroscience. Investigating such phenomena as blending has, in fact, become a primary research target, since it provides a theoretical
framework for how we form and understand complex ideas via the interconnectivity of modules in separate neural pathways that are activated in tandem. The
specific branch of neuroscience that studies these phenomena is known as cognitive neuroscience. Methods employed in this branch include experimental studies
with brain-damaged subjects, neuroimaging studies, and computer modeling research on neural processes.
The relevant issues pertaining to the common ground of language and mathematics that cognitive neuroscience is now investigating are the following:
1.
2.
1.4.2 Blending
As Whiteley (2012) has cogently argued, of all the models investigated by cognitive neuroscientists, the most promising one for getting at the core of the neural
continuity between mathematics and language is blending theory. The first elaborate discussion of this theory is by Fauconnier and Turner (2002). The best way
to make the case for blending as a promising line of inquiry for neuroscience to pursue is to take a step back and review conceptual metaphor theory
(CMT) schematically here.
CMT subdivides figurative language into linguistic and conceptual. The former is a single metaphorical utterance; the latter a mental schema from which the
single metaphor derives. In other words, a specific linguistic metaphor is a token
of a type (a conceptual metaphor). For instance, "He's a real snake" is a token of
"people are animals." Using this distinction, in 1980 George Lakoff and Mark Johnson meticulously illustrated the presence of conceptual metaphors in everyday
speech forms, thus challenging the mainstream view at the time that metaphorical
utterances were alternatives to literal ways of speaking or even exceptional categories of language, a topic that, as we saw above, Lakoff had himself addressed
in his doctoral thesis. According to the traditional account of discourse, an individual would purportedly try out a literal interpretation first when he or she hears
a sentence, choosing a metaphorical one only when a literal interpretation is not
possible from the context. But as Lakoff and Johnson convincingly argued, if this
is indeed the case, then it is so because people no longer realize that most of their
sentences are based on metaphorical inferences and nuances. Moreover, many
sentences are interpreted primarily in a metaphorical way, no matter what their
true meaning. When a sentence such as "The murderer was an animal" is uttered,
almost everyone will interpret it as a metaphorical statement. Only if told that the
"animal" was a real animal (a tiger, a bear, and so on) is the sentence given a
literal interpretation.
A critical finding of early CMT research concerned so-called nonsense or
anomalous strings. It was Chomsky (1957) who first used such strings (for example, "Colorless green ideas sleep furiously") to argue that the syntactic rules of a
language were independent from the semantic rules. Such strings have the structure of real sentences because they consist of real English words put together
in a syntactically appropriate fashion. They meet the logical criterion of well-formedness. This forces us to interpret the string as a legitimate, but meaningless,
sentence, a fact which suggests that we process meaning separately from syntax.
Of course, what Chomsky ignored is that although we do not extract literal meaning from such strings, we are certainly inclined to extract metaphorical meaning
from them. When subjects were asked to interpret them in follow-up research,
they invariably came up with metaphorical meanings for them (Pollio and Burns
1977, Pollio and Smith 1979, Connor and Kogan 1980). This finding suggests, therefore, that we are inclined, by default, to glean metaphorical meaning from any
well-formed string of words, and that literal meaning is probably the exception.
As Winner (1982: 253) has aptly put it, if people were limited to strictly literal
language, communication would be severely curtailed, if not terminated.
Another early finding of CMT is that metaphor implies a specific type of
mental imagery. In 1975, for instance, Billow found that a metaphor such as "The
branch of the tree was her pony" invariably was pictured by his child subjects in
terms of a girl riding a tree branch. Since the use of picture prompts did not significantly improve the imaging process or the time required to interpret metaphors,
Billow concluded that metaphors were already high in imagery-content and,
consequently, needed no prompts to enhance their interpretation. Incidentally,
visually-impaired people possess the same kind of imagery-content as do visually normal people. The fascinating work of Kennedy (1984, 1993; Kennedy and
Domander 1986) has shown that even congenitally blind people are capable of
making appropriate line drawings of metaphorical concepts if they are given
suitable contexts and prompts.
A conceptual metaphor results from a neural blend. In the linguistic metaphor
"The professor is a bear," the professor and the bear are amalgamated by the conceptual metaphor "people are animals." Each of the two parts is called a domain:
"people" is the target domain because it is the general topic itself (the "target" of
the conceptual metaphor); and "animals" is the source domain because it represents the class of vehicles, called the lexical field, that delivers the metaphor (the
"source" of the metaphorical concept). Using the Lakoff-Johnson model, it is now
easy to identify the presence of conceptual metaphors not only in language, but
also in mathematics. The number line is a good example of what this entails. In
this case, the target domain is number and the source domain is linearity. The
latter comes presumably from the fact that we read numerals from left to right or,
in some languages, from right to left. So, the line is a blend of two input domains leading to a new way of understanding number and of representing it (see Figure 1.12
above). Thus the notion of number sense is relevant and interpretable only on the
basis of specific cultural experience and knowledge. That is, only in cultures that
use Euclidean geometry is it possible to make a general inference between geometrical objects such as lines and numerical ideas. Thus, conceptual metaphors
are not just extrapolations; they derive from historical, cultural, social emphases,
experiences, and discourse practices.
What does talking about number as a figment of linearity imply? It means
that we actually count and organize counting in this way. In a phrase, the conceptual metaphor both mirrors and then subsequently structures the actions we
perform when we count. First, it reveals how the blend occurred; and second,
it then guides future activity in this domain of sense-making. For this reason,
the number line has become a source of further mathematics, leading to more
complex blends and thus producing emergent structure regularly. The number
line results from blending experiences (inputs) to further conceptual abstractions,
permitting us not only to recognize patterns within them, but also to anticipate
their consequences and to make new inferences and deductions. Thus, blending
theory suggests that the source domains (inputs) enlisted in delivering an abstract
target domain were not chosen originally in an arbitrary fashion, but derived from
the experience of events and, of course, from the subjective creativity of individuals who use domains creatively and associatively.
CMT has led to many findings about the connectivity among language and
mathematics, culture, and knowledge (Lakoff and Núñez 2000). Above all else,
it has shown that figurative cognition shows up not only in language but in other
systems as well. Lakoff himself has always been aware of this level of connectivity,
writing as follows: "metaphors can be made real in less obvious ways as well, in
physical symptoms, social institutions, social practices, laws, and even foreign
policy and forms of discourse and of history" (Lakoff 2012: 163-164).
overview, not an in-depth description and assessment of all the many applications and connections between the two disciplines. My goal is to show how this
collaborative paradigm (often an unwitting one) has largely informed linguistic
theory historically and, in a less substantive way, how it is starting to show the nature of mathematical cognition as interconnected with linguistic cognition. The
comparative study of mathematics-as-language and language-as-mathematics
gained momentum with Lakoff and Núñez's (2000) key book and with work
in the neurosciences showing similar processing mechanisms in language and
mathematics.
As mentioned, the interface lays the groundwork for formulating specific
hermeneutical questions and conceptualizations about the nature of mathematics vis-à-vis language. Neuroscience enters the hermeneutical terrain by shedding
light on what happens in the brain as these conceptualizations are manipulated
in some way.
The primary task of any scientific or critical hermeneutics is to explain how
and why phenomena are the way they are by means of theories, commentaries,
and annotations, and, as new facts emerge or are collected about the relevant phenomena, to subsequently adjust, modify, or even discard them on the basis of
the new information. The ultimate goal of science is to explain what Aristotle
called the final causes of reality. To flesh these out in mathematics and language, specific interdisciplinary rubrics present themselves as highly suggestive.
Linguistics studies the final causes that constitute the phenomenon of language
and mathematics the final causes that constitute math cognition. Whether one
adopts a formalist or functionalist analytical framework, the role of both sciences
is to uncover laws of structure and meaning that undergird the systems under
study. Linguists use mathematics also in specific ways, from computer to quantitative modeling. Vice versa, the mathematician can look to linguistic theories to
determine the degree of relationship between mathematical and linguistic structure. The balance tilts much more to the linguistics-using-math side than the
math-using-linguistics side. But the work in CMT and blending theory is changing all this and starting to instill a veritable equilibrium of research objectives and
theoretical modeling that finds its fulcrum in the neurosciences.
2 Logic
Logic will get you from A to B. Imagination will take you everywhere.
Albert Einstein (1879-1955)
Introductory remarks
Formal linguistics and mathematics focus on the rules, rule types, and rule-making principles that undergird the formation of forms (words, digits, sentences,
equations). Both have developed very precise methods to describe the relevant
apparatus of rules and their operations. An obvious question is what similarities
or differences exist between the two. As we saw in the previous chapter, formalist
approaches have actually revealed many similarities traceable to a common foundation in logic. As a matter of fact, formal linguistics implies formal mathematics,
thus uniting the two disciplines, ipso facto, at least at the level of the study of
rules. If the focus is on the latter, then indeed formalism is of some value; if it
is deemed to be an overt or indirect theory of mind, as for example UG theory,
then its value is diminished, unless the theory can be validated empirically. This
chapter will look more closely at the main techniques and premises that underlie
both formal mathematics and formal linguistics, as well as at the main critiques
that can be (and have been) leveled at them.
Language and mathematics were thought in antiquity to share a common
ground in lógos, which meant both "word" and "thought." The main manifestation of this mental feature was in logic and this, in turn, was the basis of linguistic
grammars and mathematical proofs. As we saw, Aristotle and Dionysius Thrax
envisioned language as a logically structured system of grammatical rules of sentence formation (Bäck 2000, Kempe 1986), in much the same way that Pythagoras
and Euclid envisioned mathematical proofs as a set of statements that followed
from each other logically. The term lógos emerged in the sixth century BCE with
the philosopher Heraclitus, who defined it as a divine power that produced
order in the flux of Nature. Through the faculty of logic, all human beings, he
suggested, shared this power. The Greeks thus came to see logic as a unique intellectual endowment allowing humans to transform intuitive and practical observations about the world into general principles. They separated lógos from mythos
(discussed below). So, the starting point for a comparative study of formal mathematics and linguistics is a discussion of logic. For this purpose, it can be defined
simply (and restrictively) as a faculty of the mind that leads to understanding
through reflection and ordered organization of information.
It is interesting to note that Saussure (1916) used the analogy of the game of
chess in basically the same way to distinguish between formal linguistic structure (langue) and its uses (parole). Studying the actual uses in themselves is
impracticable, since they are unpredictable (parole); but the system that permits
them is not (langue). Getting at that system is the goal of linguistics, according
to Saussure and the later formalists. Moreover, since rules of grammar or rules
of proof are developed to organize relevant information by showing relations
among the parts within it, it is a small step to the belief that they mirror the laws
of thought. In showing how the "moves" literally move about, the ultimate goal
is to understand how the mind plays the game of language or mathematics, so
to speak. But, as Colyvan puts it, because there are different versions of how the
game can be played, what the rule-makers end up doing is arguing over the nature
of the rules, losing sight of the original goal: unraveling the raison d'être of forms
and their connection to logic in its fundamental sense of reasoning from facts and
thus of systematic organization of relevant information.
Formal linguistic and mathematical theories have always had a basis in logic.
Pāṇini, as we saw, described the Sanskrit language with a set of about 4,000 rules,
showing that many words were made up of smaller bits and pieces, which recur
in the formation of other words and thus are intrinsic parts of the grammar of
a language. Modern-day formalism also sees its objectives essentially in this
way, as a study of how to identify the rules that describe grammar. The primary
goal is thus to come up with a set of consistent and complete rules that hold together logically. The assumption is that, by studying these rules, we are holding up
a mirror to the brain.
As a preliminary observation, it should be mentioned that the debates around
formalist approaches and the more central one of what they suggest in real (brain-based) terms have subsided somewhat today. The reason for this is that formalism
has become bogged down with strangling complexity (some of which we will see
in this chapter). After a very productive period (from the turn of the twentieth century to about the late 1990s), very little progress has been made since the start
of the new millennium in defining the types of rules and their logical properties
required to adequately describe language or mathematics. For this reason, many
linguists and mathematicians have apparently become tired of this line of inquiry.
Moreover, the cognitive linguistic movement, spearheaded by George Lakoff and
starting in the early 1980s (Lakoff and Johnson 1980), came forward to show that
we cannot separate meaning from the game, because if we do the game literally
has no meaning at all. Linguistics and mathematics have thus moved on somewhat, having become more and more interested in studying language and mathematics directly through the lens of meaning, seeing formalist games, by and large,
as adjuncts to this central interest. Nevertheless, the formalist episode in both
disciplines has been a very productive and insightful one, and continues to be so
with the advent of computer science and artificial intelligence (chapter 3).
person to person, and there existed a rational logic in all humans that eventually had supremacy in governing human actions. Marx (1953) developed Hegel's
philosophy into the theory of dialectical materialism by which he claimed that
human history (destiny) unfolded according to unconscious physical laws that
led to inevitable outcomes. On the other side of the debate, Nietzsche (1979) saw
intuition, self-assertion, and passion as the only meaningful human attributes,
with logic and reason being mere illusory constructs. Peirce (1931) developed a
comprehensive system of thought that emphasized the biological and social basis
of knowledge, as well as the instrumental character of ideas, thus uniting intuition and reason. Husserl (1970) stressed the experiential-sensory basis of human
thinking. For Husserl, only that which was present to consciousness was real. His
theoretical framework came to be known as phenomenology which has, since his
times, come to be a strong movement in psychology and philosophy dedicated to
describing the structures of experience as they present themselves to consciousness, without recourse to any theoretical or explanatory framework.
It is not surprising that many of the philosophers of the above eras were
also mathematicians, reflecting the common origins of both forms of inquiry in
Ancient Greece. The Greeks, like many of these scholars, saw logic as the link between philosophy and mathematics. To understand mathematics, therefore, one
had to study the nature of logic. They divided logic into two main categories:
induction and deduction. The former involves reaching a general conclusion from
observing a recurring pattern; the latter involves reasoning about the consistency
or concurrence of a pattern. Induction is generalization-by-extrapolation; deduction is, instead, generalization-by-demonstration. They were, of course,
aware that there were other types of logic (as we shall see), but they argued that
induction and deduction were particularly apt in explaining mathematical truths.
2.1.2 Proof
The starting point for the development of any system of proof is a set of axioms
and postulates that are assumed to be self-evident. If A = B and A = C, then we
can confidently conclude that B = C by the axiom of equality: things equal to
the same thing are equal to each other. The axiom states something that we know
intuitively about the world. It has an inherent logical sense that needs no further
elaboration or explanation. Axioms, like those of Euclid and Peano in the previous
chapter, are commonsensical in this way. Now, these can be used to carry out a
proof in arithmetic or geometry, which is essentially a set of statements (some of
which are axioms, some of which are previously proved theorems, and so on) that
are connected to each other by entailment. The sequential order of the parts in
the set leads to a conclusion that is inescapable, much like Aristotle's syllogisms.
Needless to say, self-evident notions may not always be self-evident, as we saw
with Euclid's fifth axiom. And, as some research in anthropology has shown, the
concept of axiom itself may not be universal, as the Greeks assumed (see relevant
articles in Kronenfeld, Bennardo, and de Munck 2011). Work in the pedagogy of
mathematics across the world has shown, moreover, that the methods of proof
and their foundation on axioms are not found everywhere.
As Colyvan (2012: 5) remarks, the basic idea is that mathematical truths can,
in some sense, be reduced to truths about logic. This entails several related assumptions or corollaries. One of these is that human thinking is not random, but
structured logically; that is, the components of thought are connected to each
other in a systematic way, as mirrored in a syllogism. It is in the domain of proof-making that we can observe how logic works. Another common belief is that the
rules of logic written by mathematicians are real in the sense that they accurately
represent the mental logic involved and thus, as Boole (1854) put it, the laws of
thought. So, studying proofs is studying the laws of logic in actu and, by extension, the laws of thought.
Let us take a classic proposition that the number of degrees in a triangle is
180° as a case-in-point of how proof unfolds. If one measures the sum of degrees
in hundreds or thousands of triangles, one will find that they add up to 180° (giving some leeway for measurement errors). But we cannot be certain that this is
always the case. So, we put it forth as a proposition to be proved. If the proof is
successful it would turn the proposition into a theorem that allows us to use the
fact that 180° is the sum for all triangles in subsequent proofs. First, a triangle is
constructed with the base extended and a line parallel to the base going through
its top vertex (A). The angles at the other vertices are labeled with B and C, as
shown below:
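[Figure: triangle with top vertex A and base vertices B and C, the base extended, and a line through A parallel to the base]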
Now we can use a previously proved theorem of plane geometry, namely that the
alternate angles on opposite sides of a transversal crossing two parallel lines are equal. In the diagram above, both
AB and AC are transversals (in addition to being sides of the triangle). We use the
previous theorem to label the equal angles with the same letters x and y:
[Figure: the same triangle, with the equal angles labeled x and y]
Now, we can use another established fact to show that the angles inside the triangle add up to 180°, namely, that a straight line is an angle of 180°. To do this, we
label the remaining angle at the vertex A as z:
[Figure: the same triangle, with the remaining angle at vertex A labeled z]
We can now see that the sum of the angles at A is x + y + z. Since these make up
a straight line, we assert that x + y + z = 180° by the axiom of equality. Next, we
look at the angles within the triangle and notice that the sum of these, too, adds
up to x + y + z. Since we know that this sum is equal to 180°, we have, again by
virtue of the axiom of equality, proved that the sum of the angles in the triangle
is 180°. Since the triangle chosen was a general one, because x, y, and z can take
on any value we desire (less than 180° of course), we have proved the proposition
true for all triangles. This generalization-by-demonstration process is the sum and
substance of deductive thinking.
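The chain of entailments in this proof can be set out compactly. The restatement below is only a sketch in LaTeX notation; it assumes, since the diagrams are not reproduced here, that x is the angle at B, y the angle at C, and z the interior angle at A.

```latex
\begin{align*}
\angle B &= x, \quad \angle C = y
  && \text{(alternate angles, parallel lines cut by $AB$ and $AC$)} \\
x + y + z &= 180^\circ
  && \text{(the angles at $A$ lie on a straight line)} \\
\angle A + \angle B + \angle C &= z + x + y = 180^\circ
  && \text{(substituting equals for equals)}
\end{align*}
```

Each line depends only on the lines before it and on previously established facts, which is the sequential structure the proof relies on.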
It is relevant to note here that the proof applies to two-dimensional triangles.
As discussed in the previous chapter, the mathematics changes for triangles in
higher dimensions, a fact that was actually established by the so-called Gauss-Bonnet proof applied to n-dimensional Riemannian manifolds.
This proof is deductive. The relevant feature about deduction is the way in
which the various parts are put together sequentially, much like the sentences in
a coherent verbal text, and how each move from one part to the next has sequitur or entailment structure. That is, the choice of the moves is not random; it
is based on how each move derives from the previous one logically in sequence. It
is the coherence that leads us to accepting the conclusion (theorem) as being necessarily so. In the development of the proof, previously proved theorems, axioms,
or established facts were used. This is analogous to the semiotic notion of intertextuality, whereby one text (in this case the proof at hand) alludes to, or entails,
other texts (already-proved theorems and established facts). This, in turn, implies
associative thinking, not strict deductive thinking, whereby the solver brings in
information from outside the text that has bearing on the text.
As the Greeks found out early on, not all propositions can be proved by deduction. Some require induction. Consider the following well-known proposition:
to develop a formula for the number of degrees in any polygon. Let's consider a
triangle first, the polygon with the least number of sides. The sum of the angles
in a triangle is 180°. Next, let's consider any quadrilateral, which can be divided
into two triangles. By doing this, we discover that the sum of the angles in the
quadrilateral is equivalent to the sum of the angles in the two triangles, namely
180° + 180° = 360°. The pentagon can be divided into three triangles and thus the
sum of its angles is equal to the sum of the angles in the three triangles: 180° +
180° + 180° = 540°.
Continuing on in this way, we will find that the sum of the angles in a hexagon
is equal to the sum of the angles in four triangles, in a heptagon to the sum of
the angles in five triangles, and so on. Since any polygon can be segmented into
constituent triangles, we have uncovered a pattern: the number of triangles that
can be drawn in any polygon is two less than the number of sides that make
up the polygon. For example, in a quadrilateral we can draw two triangles, which
is two less than the number of its sides (4), or (4 − 2); in a pentagon, we can
draw three triangles, which is, again, two less than the number of its sides (5),
or (5 − 2); and so on. In the case of a triangle, this rule also applies, since we
can draw in it one and only one triangle (itself). This is also two less than the
number of its sides (3), or (3 − 2). We can continue the same reasoning process as
far as our energy will permit us and we will not find any exception to this pattern.
So, we can conclude that in an n-gon we can draw (n − 2) triangles. Since we
know that there are 180° in a triangle, then there will be (n − 2) × 180° in an n-gon.
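The pattern just described can be written out as a short computation. The following is only an illustrative sketch in Python (the function name interior_angle_sum is our own, not part of the original discussion), and it assumes the polygon is dissected into (n − 2) triangles exactly as in the text:

```python
def interior_angle_sum(n_sides: int) -> int:
    """Sum of the interior angles, in degrees, of a polygon with n_sides sides."""
    if n_sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    triangles = n_sides - 2   # number of triangles the polygon is divided into
    return triangles * 180    # each triangle contributes 180 degrees


# Triangle, quadrilateral, pentagon, hexagon, heptagon
for n in range(3, 8):
    print(n, interior_angle_sum(n))
```

For the triangle, quadrilateral, and pentagon the function returns 180, 360, and 540, the same values obtained above by adding up triangles one at a time.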
What if we do come across an exception? The answer goes somewhat as follows. Each experiment (segmenting a polygon into internal triangles) builds into
the next, moving from simple to increasingly more complex figures, but all connected by a structural principle (polygons can be dissected into triangles). Induction allows us, therefore, to discover a hidden principle or pattern by performing
various experiments on mathematical objects in order to flesh the principle out.
Does the experiment come to an end? It does not, because the proof is based on the
logical principle that if the pattern applies to the first case, and if applying to the nth case guarantees that it applies to the (n + 1)th case, the
one right after it, then the pattern is established without exception.
This is the underlying meta-principle of induction. To see how it works formally, consider the formula for summing the first n positive integers:
Sum(n) = n(n + 1)/2
We start by showing that the formula works for the first case, that is, for n = 1:
Sum(n) = n(n + 1)/2
Sum(1) = 1(1 + 1)/2 = 1
Sum(1) = 1(2)/2 = 1
Sum(1) = 2/2 = 1
The next step is to show that if the formula works for n terms, it also works for the sum of (n + 1) terms:
Sum(n) = n(n + 1)/2
Sum(n+1) = Sum(n) +(n + 1)
Sum(n+1) = n(n + 1)/2 + (n + 1)
Sum(n+1) = n(n + 1)/2 + 2(n + 1)/2
Sum(n+1) = (n + 1)(n + 2)/2
Sum(n+1) = (n + 1)[(n + 1) + 1]/2
The form of the last formula is identical to the form of the one for Sum(n). This can
be seen more readily by letting (n + 1) = m:
Sum(n+1) = (n + 1)[(n + 1) + 1]/2
Sum(m) = m(m + 1)/2
In this way, we have just shown that whenever the formula is true for n, it is also true for (n + 1). Since it is
true for n = 1, and since n can be as large as we want, we have proved that the formula can be
applied to the sum of the first n integers for any n.
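The two parts of the argument, the first case and the step from n to (n + 1), can be spot-checked numerically. The Python sketch below is an illustration under our own naming (closed_form, check_base_case, and check_inductive_step are hypothetical helpers); checking finitely many cases is not a proof, it only mirrors the structure of the induction:

```python
def closed_form(n: int) -> int:
    """Sum(n) = n(n + 1)/2, the formula under discussion."""
    return n * (n + 1) // 2


def check_base_case() -> bool:
    """The formula gives 1 for n = 1."""
    return closed_form(1) == 1


def check_inductive_step(n: int) -> bool:
    """Sum(n+1) = Sum(n) + (n + 1) agrees with the closed form for n + 1."""
    return closed_form(n) + (n + 1) == closed_form(n + 1)


print(check_base_case())
print(all(check_inductive_step(n) for n in range(1, 1000)))
print(closed_form(100) == sum(range(1, 101)))
```

Each line prints True. The algebra in the text is what guarantees that the second check would succeed for every n, not just for the values tried here.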
Proof by induction can be compared to the domino effect, whereby a row of
dominoes will fall in succession if the first one is knocked over. If the (n + 1)th
domino falls, then we can be sure that the (n + 2)th will as well, and so on ad
infinitum. Again, the demonstration convinces us because the assumption is that
logical structure is like a game with the moves of the pieces in this case seen to
go on forever. Note that the way in which an inductive proof progresses is also
sequential, albeit in a different way from deductive proof.
Within the sequence of any proof (deductive or inductive), the choice of the
parts does not come from some pre-established set of statements concatenated
mechanically, but as a result of insight thinking. In the first proof above, the key
insight was that parts of intersecting lines can be combined to show that they
are equal. This was not a predictable aspect of the proof; it came from an insight
based on previous knowledge (the number of degrees in a straight line). In the
polygon proof the insight was that a polygon can be divided into constituent triangles. Insight thinking of this kind is neither deductive nor inductive; it was called
abductive by Charles Peirce (1938-1951), defined as using hunches based on previous knowledge and experience that are mapped onto the problem at hand. So,
deduction and induction may indeed reveal how formal logic works, but they
also show that logic itself is, paradoxically, guided by an inferential and more
creative form of thought.
This is why a proof, like any text, will have many forms, subject to the inventiveness of the proof-maker. Moreover, the proof-maker might also have to devise
variants of a specific proof or else come up with a new type of logic to carry out
some new demonstration. Already Euclid was faced with several propositions that
he could not prove deductively or inductively. So, he resorted to an ingenious
kind of logic, known as reductio ad absurdum. He used it to prove several important theorems, including the one that the prime numbers are infinite in number. Another
important proof was that irrationals were different from rationals. Euclid started
by noting that the general form of a rational number is p/q (q ≠ 0). So, if √2 could
not be written in the form p/q, then we would have shown that it was not a rational. He did this by assuming the opposite, namely that the number √2 could
be written in the form p/q and then he went on to show that this would lead to a
contradiction.
Using a contemporary form of the proof, it proceeds like this. We start by
squaring both sides of the equation:
√2 = p/q (assumption)
(√2)² = (p/q)²
Therefore:
2 = p²/q²
We multiply both sides by q²:
2q² = p²
Now, p² is an even number because it equals 2q², which has the form of an even
number. Since the square of an odd number is always odd, p itself must be even. So, p = 2n. Let's add this to the sequence of moves:
2q² = p²
Since p = 2n:
2q² = (2n)² = 4n²
Therefore:
2q² = 4n²
This equation can be simplified by dividing both sides by 2:
q² = 2n²
This shows that q² is an even number, and thus that q itself is an even number.
It can be written as 2m (to distinguish it from 2n): q = 2m. Now, Euclid went right
back to his original assumption, namely that √2 was a rational number:
√2 = p/q
In this equation he substituted what he had just proved, namely, that p = 2n and
q = 2m:
√2 = 2n/2m
√2 = n/m
Now, the problem is that we find ourselves back to where we started. We have
simply ended up replacing p/q with n/m. We could, clearly, continue on indefinitely in this way, always coming up with a ratio with different numerators and
denominators: √2 = {n/m, x/y, ...}. But each new numerator and denominator is half the previous one, and positive whole numbers cannot be halved forever. We have thus reached an impasse, caused
by the assumption that √2 had the rational form p/q, and it obviously does not,
because it produces the impasse. Thus, Euclid proved that √2 is not a rational
number by contradiction. The relevant feature here is that the proof is also sequential but it doubles back on itself, so to speak. The way the proof text is laid
out uses deductive logic, but the key insight comes from assuming the opposite of what is to be proved and showing that it produces
an absurdity. Much like an ironic text in language, this method of proof convinces
us through a kind of logical irony. As a matter of fact, this method of proof was
devised originally by one of the greatest ironists of ancient philosophy, Zeno of
Elea with his paradoxes. As Berlinski (2013: 83) observes, it "assigns to one half
[of the mind] the position he wishes to rebut, and to the other half, the ensuing
right of ridicule."
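The conclusion of the reductio can be illustrated, though of course not proved, by a brute-force search for whole numbers p and q satisfying p² = 2q², the equation that the assumption √2 = p/q leads to. The Python sketch below uses a function name of our own choosing (rational_square_roots_of_two) and an arbitrary search bound:

```python
from math import isqrt


def rational_square_roots_of_two(max_q: int) -> list:
    """Return every pair (p, q) with 1 <= q <= max_q and p*p == 2*q*q."""
    hits = []
    for q in range(1, max_q + 1):
        p = isqrt(2 * q * q)       # largest integer whose square is <= 2q^2
        if p * p == 2 * q * q:     # would mean sqrt(2) = p/q exactly
            hits.append((p, q))
    return hits


print(rational_square_roots_of_two(10_000))
```

The search prints an empty list. A finite search can only fail to find a counterexample; it is the contradiction argument above that rules one out for every possible q.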
Are the methods of proof truly reflective of the laws of thought, or are they
a matter of historical traditions and much creative thinking? For one thing, not
all cultures in antiquity had a similar view of proof. The Greek approach has remained the central one in mathematics for several reasons: it seems to be effective
in translating practical knowledge into theoretical knowledge fluidly; and, more
importantly, it was Greek mathematics that made its way to medieval and Renaissance Europe, where it was institutionalized into the discipline of mathematics
itself. Of course, there can be mathematics without proof, and there can be mathematics with different kinds of proof. But, somehow, the Greek approach has remained entrenched in the mindset of mathematicians. It is undeniably powerful.
Take the Pythagorean theorem. It is not just a recipe of how to construct any right
triangle; it is a model of space, since it tells us that certain spatial relations are the
way they are because of a hidden logical structure inherent in them.
Proofs of the same theorem have been found in many parts of the ancient
world (from China to Africa and the Middle East) long before the Pythagoreans put
forward their own (Bellos 2010: 53). The archeological discovery of a Babylonian
method for finding the diagonal of a square suggests that the theorem was actually known one thousand years before Pythagoras (Musser, Burger, and Peterson
2006: 763). Actually, Pythagoras left no written version of the proof (it is described
through secondary sources). Many historians of mathematics believe that it was
a dissection proof, similar to the one below. First, we construct a right triangle with
shorter sides a and b and hypotenuse c. Then, we construct a square with side length a + b (the sum of the lengths
of the two shorter sides of the triangle). This is equivalent to joining four copies of the
triangle together in the way shown by the diagram:
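[Diagram: a large square of side a + b, made up of four copies of the right triangle (sides a and b, hypotenuse c) arranged around an internal square of side c]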
The area of the internal square is c². The area of the large square is (a + b)², which is
equal to a² + 2ab + b². The area of any one triangle in the square is ½ab. There are
four of them; so the overall area covered by the four triangles in the large square
is: 4(½ab) = 2ab. If we subtract this from the area of the large square, [(a² + 2ab +
b²) − 2ab], we get a² + b². This corresponds to the area of the internal square, c².
Using the axiom of equality, the proof is now complete: c² = a² + b².
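The area bookkeeping in the dissection can be followed step by step with concrete numbers. The Python sketch below is illustrative only; the function name dissection_check and the sample triples are our own choices, and the check works on whole-number sides so that the arithmetic is exact:

```python
def dissection_check(a: int, b: int, c: int) -> bool:
    """Follow the dissection: large square minus four triangles = inner square."""
    large_square = (a + b) ** 2            # area of the square with side a + b
    four_triangles = 2 * a * b             # 4 * (ab/2), the four corner triangles
    inner_square = large_square - four_triangles
    # The remainder equals a^2 + b^2, and it is the square on the hypotenuse.
    return inner_square == a * a + b * b == c * c


for sides in [(3, 4, 5), (5, 12, 13), (8, 15, 17)]:
    print(sides, dissection_check(*sides))
```

Each of the three right triangles passes the check, while a triple such as (3, 4, 6), which is not a right triangle, fails at the final comparison with c².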
Many different kinds of proof of this theorem have been devised over the centuries. As Raju (2007) has argued, this shows that proof is not a closed system of
logic, but varies considerably. A broader view of proof, as Selin (2000) suggests,
will show that the acceptance of the Euclidean methods was due to the influence
of the Graeco-Roman way of doing science on the Renaissance's revival of knowledge and on the subsequent Enlightenment. The work in ethnomathematics is
showing, in fact, that cultures play major roles in determining how proof is understood and used (Ascher 1991, Goetzfried 2007). As Stewart (2008: 34) puts it,
proof is really a text, a mathematical story whose parts form a coherent unity:
What is a proof? It is a kind of mathematical story, in which each step is a logical consequence of the previous steps. Every statement has to be justified by referring it back to
previous statements and showing that it is a logical consequence of them.
6 = 2 + 2 + 2
7 = 2 + 2 + 3
8 = 2 + 3 + 3
9 = 3 + 3 + 3
10 = 2 + 3 + 5
11 = 3 + 3 + 5
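Decompositions like the ones listed can be generated mechanically. The Python sketch below assumes, on the basis of the examples alone, that the conjecture in question is that every whole number greater than 5 can be written as the sum of three primes; the function names is_prime and three_prime_sum are hypothetical helpers of our own:

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def three_prime_sum(n: int):
    """Return primes (p, q, r) with p <= q <= r and p + q + r == n, or None."""
    for p in range(2, n):
        if not is_prime(p):
            continue
        for q in range(p, n):
            if not is_prime(q):
                continue
            r = n - p - q
            if r < q:          # keep p <= q <= r, so stop this inner search
                break
            if is_prime(r):
                return (p, q, r)
    return None


for n in range(6, 12):
    print(n, three_prime_sum(n))
```

The search returns the first decomposition it meets, which need not be the one shown in the list, since a number can often be split into three primes in more than one way. Finding a decomposition for every number tried is evidence for the pattern, not a proof of it.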
Again, there is no known proof for this conjecture. From a practical perspective,
a proof for the conjectures may be unnecessary anyhow, for it would probably
not change anything in mathematics in any significant way. But mathematicians
continue to search for a proof, perhaps because it is part of the Euclidean game
that they continue to play. Proofs are convincing because, like any closed text, they
provide closure. However, it seems that not all truths can be proved with the Euclidean rules of the game. As the Greek geometers, including Euclid, knew, some
constructions turn out to be impossible (squaring the circle, for instance). So, as
Peirce (1931-1958) often wrote, logic is useful to us because we can use it to explain
our practical mathematical know-how, but it may not apply to all mathematics
(Sebeok and Umiker-Sebeok 1980: 40-41).
The major premise states that a category has (or does not have) a certain characteristic and the minor premise that some object is (or is not) a member of that
category. The conclusion then affirms (or negates) that the object in question has
the characteristic. By simply replacing the specific referents with letter symbols,
we get a generalized picture of the logic involved (∀ = all, H = set of humans, M =
set of mortals, K = set of Kings, ∈ = is a member of):
Major Premise: ∀H ∈ M
Minor Premise: K ∈ H
Conclusion: K ∈ M
Any member of the set with the H trait also has the M trait. If K is a member of H,
then we conclude that K is also a member of M. It is the process of substituting
symbols that shows why this is so in an abstract way. The syllogism shows that
the conclusion is decidable. It also shows consistency and completeness.
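The set-based reading of the syllogism can be made concrete with ordinary sets. In the Python sketch below, the members of the sets are invented purely for illustration, the function name syllogism_holds is our own, and K is treated as a single individual rather than as a set:

```python
# Hypothetical example sets; the names are placeholders, not data from the text.
humans = {"the King", "Socrates", "Hypatia"}
mortals = humans | {"a sparrow"}


def syllogism_holds(category: set, trait: set, individual: str) -> bool:
    """If both premises are true, the conclusion must be true as well."""
    major = category <= trait            # every member of H is a member of M
    minor = individual in category       # K is a member of H
    conclusion = individual in trait     # K is a member of M
    return (not (major and minor)) or conclusion


print(syllogism_holds(humans, mortals, "the King"))
```

The call prints True. Because set membership carries over to any larger set, the function returns True for whatever sets and individual it is given, which is the sense in which the form of the syllogism, not its content, secures the conclusion.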
The syllogism remained the basis for formal mathematical analysis well
into the nineteenth century. Bertrand Russell wanted to ensure that its structure
would always allow mathematicians to determine which conclusions are valid (or
provable) and which are not (Russell 1903, Russell and Whitehead 1913). Using
a notion developed two millennia earlier by Chrysippus of Soli, Frege (1879)
had suggested that circularity (the nemesis of consistency) could be avoided by
considering the form of propositions separately from their content. In this way,
one could examine the consistency of the propositions without having them refer
to anything in the real world. As we saw (chapter 1), Frege's approach influenced
Wittgenstein (1921), who used symbols rather than words to ensure that the form
of a proposition could be examined for logical consistency separate from any
content to which it could be applied. If the statement "it is raining" is represented
by the symbol p and the statement "it is sunny" by q, then the proposition
"it is either raining or it is sunny" can be assigned the general symbolic form
p ∨ q (with ∨ = or). A proposition in which the quantifier "all" occurs would be
shown, as indicated above, with an inverted A (∀). If the form held up to logical
scrutiny, then that was the end of the matter. Undecidability and circularity stem,
Wittgenstein affirmed, from our expectation that logic must interpret reality for
us. But that is expecting way too much from it. Wittgenstein's system came to be
known as symbolic logic, prefigured actually by Lewis Carroll in his ingenious
book The Game of Logic (1887).
As discussed briey in the previous chapter, Russell joined forces with Alfred
North Whitehead to produce his masterful treatise, the Principia mathematica, in
1913. His objective was, as mentioned, to solve the problem of circularity, such
as the classic Liar Paradox, a dilemma that goes back to the fifth century BCE,
when a host of intriguing debates broke out throughout Greece over the nature
and function of logic in philosophy and mathematics. Prominent in them were
the philosopher Parmenides and his disciple Zeno of Elea. The latter became famous (or infamous) for his clever arguments, called paradoxes, that seemed to
defy common sense. The story goes that one of the most vexing of all the paradoxes concocted during the debates, known as the Liar Paradox, was uttered by
Protagoras. Its most famous articulation has been attributed, however, to the celebrated Cretan poet Epimenides in the sixth century BCE:
The Cretan philosopher Epimenides once said: "All Cretans are liars." Did Epimenides speak the truth?
The paradox lies in the fact that the statement leads to circular reasoning, not to
a conclusion as in a syllogism. It is a menacing form of logic, because it suggests
that circularity might be unavoidable and that some statements are undecidable.
It thus exposes syllogistic logic as being occasionally useless. The source of the
circularity in the paradox is, of course, the fact that it was Epimenides, a Cretan,
who made the statement that all Cretans are liars. It arises, in other words, from
self-referentiality. Russell found the paradox to be especially troubling, feeling
that it threatened the very foundations of logic and mathematics. To examine the
nature of self-referentiality more precisely, he formulated his own version, called
the Barber Paradox:
The village barber shaves all and only those villagers who do not shave themselves. So, shall he shave himself?
Let us assume that the barber decides to shave himself. He would end up being
shaved, of course, but the person he would have shaved is himself. And that contravenes the requirement that the village barber should shave all and only those
villagers who do not shave themselves. The barber has, in effect, just shaved
someone who shaves himself. So, let us assume that the barber decides not to
shave himself. But, then, he would end up being an unshaved villager. Again
this goes contrary to the stipulation that he, the barber, must shave all and only
those villagers who do not shave themselves, including himself. It is not possible, therefore, for the barber to decide whether or not to shave himself. Russell
argued that such undecidability arises because the barber is a member of the village. If the barber were from a different village, the paradox would not arise.
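The impasse can be confirmed by trying every possibility. The Python sketch below is our own illustration (the function name consistent_assignments and the villager names are hypothetical): it runs through every way of deciding who shaves himself and keeps only those compatible with the rule that the barber shaves all and only those villagers who do not shave themselves:

```python
from itertools import product


def consistent_assignments(villagers: list, barber: str) -> list:
    """Return every 'who shaves himself' assignment compatible with the rule."""
    solutions = []
    for values in product([True, False], repeat=len(villagers)):
        shaves_self = dict(zip(villagers, values))
        # The rule: the barber shaves v exactly when v does not shave himself.
        barber_shaves = {v: not shaves_self[v] for v in villagers}
        # If the barber is a villager, "the barber shaves the barber" and
        # "the barber shaves himself" are one and the same fact.
        if barber not in villagers or barber_shaves[barber] == shaves_self[barber]:
            solutions.append(shaves_self)
    return solutions


print(len(consistent_assignments(["Anton", "Boris", "barber"], "barber")))
print(len(consistent_assignments(["Anton", "Boris"], "barber")))
```

With the barber counted among the villagers, no assignment survives, so the first count is 0. With a barber from outside the village, all four assignments for the two remaining villagers are consistent, in line with Russell's observation.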
Russell and Whitehead (1913) tackled circularity (and by implication, the
undecidability issue) in the Principia. But the propositions they developed led to
unexpected problems. To solve these, Russell introduced the notion of types,
whereby certain types of propositions would be classified into different levels
(more and more abstract) and thus considered separately from other types. This
seemed to avoid the problems, for a while anyhow. The Polish mathematician
Alfred Tarski (1933) developed Russell's theory further by naming each level of in-
admitted (2002: 193): "One can never rule out the chance that a short proof of the
Four-Color Theorem might some day be found, perhaps by the proverbial bright
high-school student."
Proof by computer raises fundamental epistemological questions for formal
mathematics. Above all else, it raises issues about the larger question of decidability, as Fortnow (2013) has cogently argued. The gist of Fortnow's argument
can be paraphrased as follows. If one is asked to solve a 9-by-9 Sudoku puzzle,
the task is considered to be a fairly simple one. The complexity arises when asked
to solve, say, a 25-by-25 version of the puzzle. And by augmenting the grid to 1000-by-1000 the solution to the puzzle becomes gargantuan in terms of effort and time.
Computer algorithms can easily solve complex Sudoku puzzles, but start having
difficulty as the degrees of complexity increase. The idea is, therefore, to devise
algorithms to find the shortest route to solving complex problems. So, the issue of
complexity raises the related issue of decidability, since there would be no point
in tackling a complex problem that may turn out not to have a solution. If we let P
stand for any problem with an easy solution, and NP for any problem whose proposed solutions are easy to check even if finding one may be difficult, then the whole question of decidability can be represented
in a simple way. If P were equal to NP, P = NP, then problems that are complex (involving large amounts of data) could be tackled easily as the algorithms become
more efficient (which is what happened in the Four-Color solution). The P = NP
problem is the most important open problem in computer science and formal
mathematics, as will also be discussed in the next chapter. It seeks to determine
whether every problem whose solution can be quickly checked by computer can
also be quickly solved by computer. Work on this problem has made it evident that
a computer would take hundreds of years to solve some NP questions and sometimes go into a loop (the halting problem). Indeed, to prove P = NP one would
have to use, ironically, one or more of the classic methods of proof. We seem to be
caught in a circle where algorithms are used to determine some proofs and vice
versa, some proofs are used to determine some algorithms.
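The difference between checking a solution and finding one can be seen in the Sudoku case itself. The Python sketch below is an illustration of the checking side only; the function names is_valid_sudoku and example_grid are our own, and the example grid is built from a simple shifting pattern rather than taken from any published puzzle:

```python
from math import isqrt


def is_valid_sudoku(grid: list) -> bool:
    """Check a completed n-by-n grid (n = 9, 25, ...) in one pass over its cells."""
    n = len(grid)
    box = isqrt(n)
    if box * box != n or any(len(row) != n for row in grid):
        return False
    target = set(range(1, n + 1))
    rows_ok = all(set(row) == target for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == target for c in range(n))
    boxes_ok = all(
        {grid[r][c] for r in range(br, br + box) for c in range(bc, bc + box)} == target
        for br in range(0, n, box)
        for bc in range(0, n, box)
    )
    return rows_ok and cols_ok and boxes_ok


def example_grid(box: int = 3) -> list:
    """Build a completed grid of side box*box from a shifting pattern."""
    n = box * box
    return [[(box * (r % box) + r // box + c) % n + 1 for c in range(n)]
            for r in range(n)]


print(is_valid_sudoku(example_grid(3)))   # 9-by-9
print(is_valid_sudoku(example_grid(5)))   # 25-by-25
```

Both calls print True, and the amount of checking grows only in step with the number of cells in the grid. Nothing in the sketch says how to fill in a partially completed grid; whether that harder task can always be done comparably quickly is what the P = NP question asks.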
So, is a computer algorithm a proof? And what does it tell us about mathematical statements? It is certainly logical, because the algorithm is a text consisting of
sequential instructions, revealing the same kind of sequential structure that traditional proofs have but with a different language. In other words, computer logic is
really a type of language that involves finite-state (closed) systems of instruction,
like a Turing machine or a Markov chain. The algorithm is a finished product; the
process of arriving at it is still inferential-abductive in the same way that traditional proofs are. Is a simple deductive proof of the Four-Color Theorem hidden
in the computer instructions somewhere? Can it be extracted and reformulated in
more traditional ways through abduction?
a ball, such as a deformed melon, a baseball bat with bulges, and the like. The
surface of the ball, but not of the doughnut, is simply connected. Any simply
connected two-dimensional closed surface, however distorted, is topologically
equivalent to the surface of a ball. Poincaré wondered if simple connectivity characterized three-dimensional spheres as well. His conjecture was finally proved by
Russian mathematician Grigory Perelman in 2002, who posted his solution on the Internet (O'Shea 2007, Gessen 2009). It is much too complex to discuss here (being
over 400 pages). Suffice it to say that a logical diagnosis of the proof shows that it
involves many kinds of logic and inferential processes: analogies, connections,
hunches. As Chaitin (2006: 24) observes, "mathematical facts are not isolated,
they are woven into a spider's web of interconnections." And as Wells (2012: 140)
aptly states:
Proofs do far more than logically certify that what you suspect, or conjecture, is actually
the case. Proofs need ideas, ideas depend on imagination and imagination needs intuition,
so proofs beyond the trivial and routine force you to explore the mathematical world more
deeply, and it is what you discover on your exploration that gives proof a far greater value
than merely confirming a fact.
What seems certain in all this is, as iterated throughout this chapter, that our
brain might indeed possess the faculty that the Greeks called lógos. Analyzing
mathematics as a practical activity is not sufficient, in the same way that it is not
sufficient to study language just as a communicative activity. In both cases we seek
to understand the faculty (mathematics or language) as some faculty of the brain.
This means converting practical into theoretical knowledge. It is the conversion
process that is relevant here, since it is part of how the brain makes discoveries.
The practical knowledge of knotting patterns that produced a right triangle was
not enough for the Greeks; they wanted to understand why these were true in
an abstract way. So, they took the first step in establishing mathematics as an
explanatory, rather than just utilitarian, discipline for use in everyday life. Proof
solidifies a utilitarian practice by demonstrating that it fits in with the logic and
logistics of established ideas.
kind of proofs that Cantor introduced into mathematics, given that they laid out
the rudimentary principles of an emerging set theory in his era, and given
that, as Lakoff argued about the Gödelian proofs (previous chapter), they can be
used to pinpoint the areas of connectivity between mathematical and linguistic
(metaphorical) thought.
The type of proof that Cantor used was a one-to-one correspondence proof.
Following Lakoff, it can be said, in hindsight, that Cantor utilized a metaphorical
blend, that is, a form of proof that amalgamates two seemingly separate domains,
putting them together to produce insight. Actually, for the sake of historical
accuracy, the kind of thinking that Cantor's proof displays can be found in an
observation made by Galileo, who suspected that mathematical infinity posed
a serious challenge to common sense. In his 1632 Dialogue Concerning the Two
Chief World Systems he noted that the set of square integers can be compared,
one-by-one, with all the whole numbers (positive integers), leading to the incredible possibility that there may be as many square integers as there are numbers
(even though the squares are themselves only a part of the set of integers). How
can this be, in view of the fact that there are numbers that are not squares, as the
following comparison of the two sets seems to show? The bottom row of the comparison simply contains the integers that on the top row are also squares. So, for
instance, 2 is not also a square, but 4 is, since it can be broken down into 2².
The comparison thus shows the relevant gaps between the top row (the complete
set of integers) and the bottom row (the subset of square integers):
Integers = 1 2 3 4 5 6 7 8 9 10 11 12
Squares = 1 - - 4 - - - - 9 - - -
Figure 2.5: Initial correspondence of the set of integers with the set of square numbers
As one would expect, this method of comparison shows that there are many more
blanks in the bottom set (the set of square integers), given that it is a subset of the
top set (the set of whole numbers). So, as anticipated, this proves that the set of
whole numbers has more members in it than the set of square numbers. But does
it, asks Galileo? All one has to do is eliminate the blanks and put the top numbers
in a direct one-to-one correspondence and we get an incredible result.
This shows that no matter how far we go down along the line there will never
be a gap. All we have to do to prove this is use induction. If we stop at, say, point n
on the top row and find that below it the point is n², all we have to do is go to
the next point (n + 1) and check if the bottom point is (n + 1)² and thus induce
the fact that this will indeed go on forever. But this is hardly all there is going on
Integers = 1 2 3 4 5 6 7 8 9 10 11 12
Squares = 1 4 9 16 25 36 49 64 81 100 121 144
(1² 2² 3² 4² 5² 6² 7² 8² 9² 10² 11² 12²)
Figure 2.6: Second correspondence of the set of integers with the set of square numbers
cognitively here. It can, in fact, be argued that the initial insight comes from an
unconscious conceptual metaphor: A line has no gaps vis-à-vis another parallel
line. Lines are made up of distinct points and the number of these is the same as
it is for any other line of equal length. So, the proof shows that there are as many
squares as integers, a totally unexpected result.
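The gapless pairing can be spelled out as a rule that runs in both directions: each integer n goes to n², and each square goes back to exactly one integer. The Python sketch below is an illustration with function names of our own (pair_with_squares and integer_for_square); a program can only list finitely many pairs, so it shows the rule rather than the infinite result:

```python
from math import isqrt


def pair_with_squares(limit: int) -> list:
    """Pair each integer 1..limit with its square, as in the second comparison."""
    return [(n, n * n) for n in range(1, limit + 1)]


def integer_for_square(square: int) -> int:
    """Go back from a square to the one integer it is paired with."""
    root = isqrt(square)
    if root * root != square:
        raise ValueError(f"{square} is not a perfect square")
    return root


pairs = pair_with_squares(12)
print(pairs)
print(all(integer_for_square(s) == n for n, s in pairs))
```

The last line prints True: no integer is left without a square, no square is used twice, and the same rule would apply unchanged to any larger limit.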
As a product of an unconscious conceptual metaphor it tells us a lot more.
Indeed, it allowed Cantor to proceed to prove many more other theorems with the
same kind of logic. The method is by analogy (that is, by ana-logic, the logic of
correspondence) which, as Hofstadter has argued persuasively (Hofstadter 1979,
Hofstadter and Sander 2013), is a powerful force in mathematical and scientific
discoveries.
In 1872, Cantor showed that the same one-to-one correspondence logic can
be used to prove that the same pattern holds between the whole numbers and
numbers raised to any power:
Integers = 1 2 3 4 5 6 7 8 9 10 11 12
Powers = 1ⁿ 2ⁿ 3ⁿ 4ⁿ 5ⁿ 6ⁿ 7ⁿ 8ⁿ 9ⁿ 10ⁿ 11ⁿ 12ⁿ
Figure 2.7: Correspondence of the set of integers with the set of positive integer exponents
ing numbers. The method is elegant and simple because, again, it comes from a
metaphorical blend. So, instead of putting numbers in a linear one-to-one pattern, he put them into a zigzag diagonal layout. It is not necessary to go through
the proof here, since it is well known. Suffice it to say that we cannot help but be
impressed by the result. When Cantor's overall logic is understood, it ceases to
look like the product of the overactive imagination of a mathematical eccentric. It
is indeed logical, in a metaphorical way.
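Although the proof itself is not reproduced here, the zigzag layout can be sketched in a few lines. The Python sketch below assumes that the layout referred to is Cantor's well-known arrangement of the positive fractions p/q in a grid that is then walked along its diagonals; the function name zigzag_rationals is our own, and skipping repeated values such as 2/2 is one common convention:

```python
from fractions import Fraction
from itertools import islice


def zigzag_rationals():
    """Yield each positive rational once, walking the diagonals p + q = 2, 3, 4, ..."""
    seen = set()
    total = 2                          # p + q is constant along a diagonal
    while True:
        numerators = range(1, total)
        if total % 2 == 0:
            numerators = reversed(numerators)   # alternate direction: the zigzag
        for p in numerators:
            frac = Fraction(p, total - p)
            if frac not in seen:       # skip values already met, e.g. 2/2 = 1/1
                seen.add(frac)
                yield frac
        total += 1


print([str(f) for f in islice(zigzag_rationals(), 10)])
```

Each fraction receives a definite position in the list (first, second, third, and so on), which is exactly a one-to-one pairing with the counting numbers.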
Cantor classified those numbers with the same cardinality as belonging to
the set aleph null, or ℵ₀ (ℵ being the first letter of the Hebrew alphabet). He called ℵ₀
a transfinite number. Remarkably, he found that there are other transfinite numbers. These constitute sets of numbers with a greater cardinality than the integers.
He labeled each successively larger transfinite number with increasing subscripts
{ℵ₀, ℵ₁, ℵ₂, ...}.
So, what does proof prove? Simply put, it shows, or more accurately, convinces
us that something is the way it is. So, it may well be that the many proofs de-
Repeating the process starting with 1 gives the sequence 1, 101, 101000101,
101000101000000000101000101, ... Cantor's set is a Boolean set, which prefigures fractal theory. The set can, in fact, be extended to encompass flat surfaces.
The result is called the Sierpinski carpet, named after Waclaw Sierpinski, who
used the Cantor set to generate it:
Produced in 1916, it was one of the first examples of a fractal. Connectivity among
ideas, including forms and rules, is the essence of mathematical thinking, and
thus goes well beyond syllogistic logic. So, what can we conclude, if anything?
One thing is that there is no logic without imagination. It is the latter that likely
spurs mathematicians on to find things that cannot be proved. There are many
open questions, or conjectures, in mathematics that tantalize the intellect, yet
shut out its logical side. In the 1930s, mathematician Lothar Collatz noticed a pattern. For any number n, if it is even, make it half, or n/2; if it is odd, triple it and
add one, or (3n + 1). If we keep repeating this rule, we seem always to end up with the
number one. Here is a concrete example:
Example = 12
12/2 = 6
6/2 = 3
(3)(3) + 1 = 10
10/2 = 5
(3)(5) + 1 = 16
16/2 = 8
8/2 = 4
4/2 = 2
2/2 = 1
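The rule is simple enough to be carried out mechanically. The Python sketch below (the function name collatz_sequence is our own) repeats the rule until the number one is reached; note that the loop is guaranteed to stop only if the conjecture is true for the starting number chosen:

```python
def collatz_sequence(n: int) -> list:
    """Apply the rule (halve if even, triple and add one if odd) until reaching 1."""
    if n < 1:
        raise ValueError("start from a positive whole number")
    sequence = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence


print(collatz_sequence(12))
```

Starting from 12, the function reproduces the steps of the example: 12, 6, 3, 10, 5, 16, 8, 4, 2, 1.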
Is this always the case? Is there a number where oneness is not achieved? There
seems to be some principle in this conjecture that, if unraveled, might lead to deep
discoveries. How do we prove it? There is no known answer. The pattern is there,
but the proof is undecidable. Proof by contradiction, or reductio ad absurdum,
might be useful in this sense or even proof by exhaustion. Something can be either
yes or no, but not both. Aware of this verity, Aristotle clarified the connection between contradiction and falsity in his principle of non-contradiction, which states,
simply, that an assertion cannot be both true and false. Therefore if the contradiction of an assertion (not-P) can be derived logically from the assertion (P) it can
be concluded that a false assumption has been used. The discovery of contradictions at the foundations of mathematics at the beginning of the twentieth century,
however, led some mathematicians and logicians to question the principle of non-contradiction, giving
rise to new theories of logic, which accept that some statements can be both true
and false.
To unpack the cognitive nature of contradiction, consider a well-known proof
in geometry, namely that for any non-degenerate right triangle, the length of the
hypotenuse is less than the sum of the lengths of the two remaining sides. The
proof relies, of course, on the Pythagorean theorem, c² = a² + b². The claim is that
a + b > c. As in any proof by contradiction, we start by assuming the opposite,
namely that a + b ≤ c. If we square both sides, we get the following:
(a + b)² ≤ c²
or
a² + 2ab + b² ≤ c²
A triangle is non-degenerate if each side has positive length, so it may be assumed
that a and b are greater than 0, which means that 2ab is greater than 0 as well. Therefore:
(a + b)² ≤ c²
or
a² + b² < a² + 2ab + b² ≤ c²
The transitive relation can now be reduced to:
a² + b² < c²
Since the Pythagorean theorem is a² + b² = c² we have reached a contradiction,
since strict inequality and equality are mutually exclusive. This means that it is
impossible for both to be true and we know that the Pythagorean theorem holds.
Thus, the assumption that a + b ≤ c must be false and hence a + b > c, proving the
claim. In abstract terms such a proof can be represented as follows (P = the proposition under consideration and S is the set of statements or premises that have been
previously established). We consider P, or the negation of P (¬P), in addition to S;
if this leads to a logical contradiction F, then we can conclude that the statements
in S lead to the negation of P (¬P), or P itself, respectively.
If
$$S \cup \{P\} \vdash F$$
then
$$S \vdash \neg P.$$
Or if
$$S \cup \{\neg P\} \vdash F$$
then
$$S \vdash P.$$
Proof in this sense is certainly much broader and more flexible than it was in the classical Euclidean method. Proof by computer, too, is another form of proof that falls
outside the method. By accepting proof by computer, mathematicians have, actually, taken the induction principle one step further: let the computer decide if
something is computable, decidable, or not. The computer is a powerful iteration
machine that allows us to look at what happens when some pattern is iterated ad
infinitum. Take fractal geometry again. A self-similar shape in this field is a shape
that, no matter what scale is used to observe it, resembles the whole thing. The
Mandelbrot set, or M-set, is the most widely reproduced image in mathematics.
The set was generated in the 1980s, when the computer power needed to make it possible became available. The mathematics behind the M-set is relatively simple, since it
involves only adding and multiplying numbers: $z = z^2 + c$. The key is iteration: rules repeated without end. The image of the M-set is a result of iteration. Mandelbrot
found that for certain values of c the outputs would continue to grow forever,
while for others they stayed bounded. The M-set therefore emerges as a model: it
defines the boundary limit between two classes of number. Outside the lines are
free z-values bound for infinity; inside are prisoners destined for extinction. Incredibly, every object has a fractal dimension, defined as a statistical roughness
measure. Formulas for human lungs, trees, clouds, and so on can be generated
entirely artificially based on a measure of their iterative complexity. Fractal geometry thus has emerged as a secret language of nature, telling us that iteration is an
inherent principle in the structure of the universe, at least in some of its parts. It is
amazing to contemplate that a simple logic game played by Mandelbrot has had
so many scientific reifications.
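The iteration behind the M-set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the starting value z = 0, the escape radius of 2, and the cap of 100 iterations are the conventional choices and are not given in the text, and the function name stays_bounded is hypothetical. A finite number of iterations can suggest, but never prove, that a value remains bounded.

```python
def stays_bounded(c, max_iter=100, escape_radius=2.0):
    """Iterate z = z*z + c from z = 0 and report whether z stays within the radius.

    Assumptions (not from the text): start at z = 0, escape radius 2,
    and a finite iteration cap standing in for iteration without end.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > escape_radius:
            return False  # the value has escaped and will grow without limit
    return True  # no escape observed within max_iter steps


for c in (0, -1, 1j, 0.5, 1):
    print(c, stays_bounded(c))
```

Coloring each point c of the plane according to this test is, in essence, how pictures of the set are drawn.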
1. Universal sets consist of all members being considered at any one time. For
example, the set of all the non-negative integers is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, …}.
2. An infinite set contains an endless number of members. The positive integers, for instance, form an infinite set: {1, 2, 3, 4, …}.
3. A finite set, on the other hand, has a specific number of members. One such
set is the set of natural single digits including zero: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
4. An empty set, also called a null set, has no members. The symbol $\emptyset$ is used to
show this set, which can also be written { }. An example of an empty set is the set of all odd numbers that end
in 2; there are no such numbers, of course.
5. A single element set contains only one member. For example, the set of all
primes less than or equal to 2 contains only 2.
6. Equivalent sets have the same number of members. For instance, the
set of even numbers under ten, {0, 2, 4, 6, 8}, is equivalent to the set of odd numbers
under ten, {1, 3, 5, 7, 9}: each has five members, although the two sets are not equal, since they share no members.
7. Overlapping sets have some members in common. If the set of last year's class
math stars is M1 = {Alex, Sarah, Betty} and the set of this year's stars is M2 =
{Alex, Sarah, Tom}, sets M1 and M2 overlap because Alex and Sarah belong to
both sets. This relation between sets is usually shown with intersecting circles
in which the common members are included in the area of overlap (a = Alex,
s = Sarah, b = Betty, t = Tom):
[Figure 2.11: Overlapping sets. Two intersecting circles, M1 and M2, with a and s in the area of overlap, b in M1 only, and t in M2 only.]
8. Disjoint sets have no members in common. The set of even numbers and the
set of odd numbers are disjoint because they do not have any elements in
common.
9. Subsets are sets contained within other sets. For example, the set of even numbers, E = {0, 2, 4, 6, 8, …}, is a subset of the set of all non-negative integers, I = {0, 1, 2, 3, 4,
5, 6, …}. This is shown with $E \subset I$.
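Several of the relations in the list above can be checked directly with Python's built-in set type. The sketch below is illustrative only: the variable names are invented here, and because a program cannot hold an infinite set, the single digits stand in for the integers in the subset example.

```python
# The two sets of "math stars" from the overlapping-sets example.
m1 = {"Alex", "Sarah", "Betty"}
m2 = {"Alex", "Sarah", "Tom"}
overlap = m1 & m2  # members common to both sets

# Finite stand-ins for the infinite sets discussed in the list (an assumption).
digits = set(range(10))
evens = {0, 2, 4, 6, 8}
odds = {1, 3, 5, 7, 9}
empty = set()

print(sorted(overlap))           # the overlap of M1 and M2
print(evens.isdisjoint(odds))    # disjoint: no members in common
print(len(evens) == len(odds))   # equivalent: same number of members
print(evens == odds)             # but not equal: different members
print(evens <= digits)           # subset test
print(len(empty))                # the empty set has no members
```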
Such notions clarify many aspects of the logical calculus, showing how different
sets with different members may or may not interact. In some ways set
theory is a precursor to logic. In fact, it was developed from Boole's symbolic logic
and the theory of sets as developed by De Morgan as a way of using mathematical
symbols and operations to solve problems in logic. Above all else it has shown that
thought might be visual, since set theory is essentially a theory of logic diagrams
that show, rather than tell (so to speak), where and what the logical connections
and patterns are among numbers.
2.2.1 Diagrams
Set theory makes it possible to envision commonality among what would otherwise be seen as disparate elements and to show how these can relate to each other.
Diagrams such as the overlapping circles above are called Venn diagrams, after
British logician John Venn (1880, 1881), who was the first to use them. These provide visual snapshots of the constitution and operation of sets, bringing out the
logical patterns inherent in them. The translation of sentential (syllogistic) logic
to diagram logic started with Leonhard Euler. Before the advent of Venn diagrams,
Euler represented categorical or sentential statements in terms of diagrams such
as the following, which clearly prefigure the Venn diagrams (Hammer and Shin
1996, 1998):
[Figure: Euler diagrams, drawn with circles labeled A and B, for the four statements: All A are B. No A is B. Some A is B. Some A is not B.]
The usefulness of the diagrams over the sentential forms lies in the fact that no additional conventions, paraphrases, or elaborations are needed: the relationships
holding among sets are shown by means of the same relationships holding among
the circles representing them. In other words, we do not have to worry about the
various problems that plague syllogistic logic (as discussed); all we have to do is
observe the logical relations through the configuration of the diagrams.
Euler was, however, aware of both the strengths and weaknesses of diagrammatic representation. For instance, consider the following problematic syllogism:
1. No A is B.
2. Some C is A.
3. Therefore, some C is not B.
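The syllogism itself can be tested in set terms by brute force, treating A, B, and C as subsets of a small universe. The Python sketch below is an illustration, not Euler's or Venn's method; the function names and the choice of universe are assumptions made here. It searches every combination of subsets for a counterexample, that is, a case in which both premises hold and the conclusion fails. Finding none over a small universe supports the inference without amounting to a general proof.

```python
from itertools import combinations, product


def subsets(universe):
    """Return every subset of the given universe as a Python set."""
    items = list(universe)
    return [set(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]


def syllogism_holds(universe):
    """Check 'No A is B; some C is A; therefore some C is not B' exhaustively."""
    for a, b, c in product(subsets(universe), repeat=3):
        no_a_is_b = not (a & b)          # A and B share no members
        some_c_is_a = bool(c & a)        # C and A share at least one member
        some_c_is_not_b = bool(c - b)    # C has a member outside B
        if no_a_is_b and some_c_is_a and not some_c_is_not_b:
            return False                 # counterexample found
    return True


print(syllogism_holds(range(4)))
```

The check only confirms that the conclusion follows; it does not resolve the difficulty of drawing both premises in a single diagram.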
Euler realized that no single diagram could be devised to represent the two
premises, because the relationship between sets B and C cannot be fully specified
in one single diagram. Instead, he suggested three possible cases:
[Figure: three Euler diagrams, labeled Case 1, Case 2, and Case 3]
Euler claimed that the proposition "Some C is not B" can be read from all these diagrams. But it is far from clear which one is best. It was Venn (1881: 510) who tackled
Euler's dilemma by pointing out that the weakness lay in the fact that Euler's
method was too strict. Venn aimed to overcome Euler's dilemma by showing
how partial information could be visualized. So, a diagram like the following one
(which he called "primary") does not convey specific information about the relationship between sets A and B:
This is not just a clever rewriting of Eulerian logic diagrams; it is different because it does not represent any specific information about the relation between
two sets. Now, for the representation of premises, Venn's solution was to shade
them (Venn 1881: 122). With this simple modification, we can draw diagrams for
various premises and relations as follows (see Figure 2.15).
[Figure 2.15: Venn diagrams]
But even this system poses dilemmas. It was Charles Peirce (1931) who pointed
out that it had no way of representing existential statements, disjunctive information, probabilities, and relations. "All A are B or some A is B" cannot be shown by
either the Euler or Venn systems in a single diagram. But this does not invalidate
diagrammatic representation. It is not possible here to deal with Peirce's solution to such logical dilemmas, known as Existential Graph theory (see Roberts
2009). Basically, he showed that the use of diagrams enhanced the power of logical reasoning and especially predicate logic. Like Euler, Peirce saw a diagram as
anything showing how the parts correlated to each other. This was evident especially in the outline of the diagram, which is a trace of how the thought process
unfolded. In other words, it is a pictorial manifestation of what goes on in the mind
as it grapples with structural-logical information. Graphs thus display the very
process of thinking in actu (Peirce 1931–1958, vol. 4: 6), showing how a given argument, proof, or problem unfolds in a schematic way (Parker 1998, Stjernfelt 2007,
Roberts 2009). Graphs allow us to grasp something as a set of transitional states.
Therefore, every graph conveys information and simultaneously explains how we
understand it. It is a picture of cognitive processes in action. And it doubles back
on the brain to suggest further information or ideas. The following citation encapsulates Peirce's notion of graph. In it, we see him discussing with a general why
a map is used to conduct a campaign (Peirce 1931–1958, vol. 4: 530):
But why do that [use maps] when the thought itself is present to us? Such, substantially,
has been the interrogative objection raised by an eminent and glorious General. Recluse
that I am, I was not ready with the counter-question, which should have run, "General, you
make use of maps during a campaign, I believe. But why should you do so, when the country
they represent is right there?" Thereupon, had he replied that he found details in the maps
that were so far from being "right there," that they were within the enemy's lines, I ought to
have pressed the question, "Am I right, then, in understanding that, if you were thoroughly
and perfectly familiar with the country, no map of it would then be of the smallest use to
you in laying out your detailed plans?" "No, I do not say that, since I might probably desire
the maps to stick pins into, so as to mark each anticipated day's change in the situations of
the two armies." "Well, General, that precisely corresponds to the advantages of a diagram
of the course of a discussion. Namely, if I may try to state the matter after you, one can make
exact experiments upon uniform diagrams; and when one does so, one must keep a bright
lookout for unintended and unexpected changes thereby brought about in the relations of
different significant parts of the diagram to one another. Such operations upon diagrams,
whether external or imaginary, take the place of the experiments upon real things that one
performs in chemical and physical research."
Interestingly, topological theory has become a model of many natural phenomena. It has proven useful, for instance, in the study of DNA. Stewart (2012:
105) elaborates as follows:
One of the most fascinating applications of topology is its growing use in biology, helping
us understand the workings of the molecule of life, DNA. Topology turns up because DNA is
a double helix, like two spiral staircases winding around each other. The two strands are intricately intertwined, and important biological processes, in particular the way a cell copies
its DNA when it divides, have to take account of this complex topology.
propositions, rule systems, and the like. The interplay of the Innenwelt with the
Umwelt is what produces knowledge. This interplay is much more complex and
flexible than theories of logic have generally allowed, since it includes, as argued
throughout this chapter, inventive and creative processes.
This suggests that mathematics is both invented and discovered. The word
invention derives from Inventio, which in western rhetorical tradition refers to one
of the five canons used for the elaboration of arguments. More broadly, the word
meant both invention and discovery, indicating that the two are intrinsically intertwined. Discovery comes about through largely creative-serendipitous processes,
whereas invention entails intentionality. For example, fire is a discovery, but rubbing sticks to start a fire is an invention. The general principles of arithmetic derive
from the experience of counting. Naming the counting signs (numerals) allows us
to turn these principles into ideas that can be manipulated intellectually and systematically. This whole line of thought suggests an anthropic principle, which
states that we are part of the world in which we live and are thus privileged to
understand it best. Al-Khalili (2012: 218) puts it as follows:
The anthropic principle seems to be saying that our very existence determines certain properties of the Universe, because if they were any different we would not be here to question
them.
The question becomes why all this is so. It is one of the greatest conundrums of human philosophy. We could conceivably live without the Pythagorean theorem. It
tells us what we know intuitively: that a diagonal distance is shorter than taking
an L-shaped path to a given point. And perhaps this is why it emerged: it suggests that we seek efficiency and a minimization of effort in how to do things and
how to classify the world. But in so doing we squeeze out of our economical
symbolizations other ideas and hidden truths. To put it another way, the practical
activity of measuring triangles contained too much information, a lot of which
was superfluous. The theorem refines the information, throwing out from it that
which is irrelevant. The ability to abstract theories and models from the world of
concrete observations involves the optimal ability to throw away irrelevant information about the world in favor of new information that emerges at a higher level
of analysis (Neuman 2007; Nave, Neuman, Howard, and Perlovsky 2014).
In the end, all theories and speculations about the nature of mathematics
are just that: speculations. It is useful to reiterate them here, using René Thom's
(2010: 494) typology:
1. The Formalist Position. Formalists claim that mathematical objects are derivations of rules that cohere logically. This was the stance taken by Russell and
Whitehead.
2. The Platonic Position. Platonists claim that mathematical objects have an autonomous existence; the mathematician does not create them; he or she discovers them like an explorer might discover an unknown territory.
3. The Constructivist Position. Constructivists claim that the mathematician
builds complex mathematical forms from simpler ones and then applies
them within and outside mathematics. The use of mathematics to do things
is a practical outcome of this.
[Figure 2.16: Tree diagram for "The boy loves the girl", with nodes labeled S, NP, VP, Det, Art, Def, and N above the words The, boy, loves, the, girl]
The diagram shows the hierarchical relation among the symbols in the string.
Each level in the tree is called a Markov state. The input state is S and the output, or end-state, is the string at the bottom of the tree. This version of generative
grammar was also called a state-grammar.
The rules show how a linear string is governed by hierarchical phrase structure and states of generation. Thus, the string The boy loves the girl may appear
linear to the ear or the eye, but it is actually the output of a series of states, specified by rules connected sequentially (one state leads to another) to each other. This
type of diagram was actually introduced by a modern-day founder of linguistics,
Wilhelm Wundt (1880, 1901). Like Chomsky, Wundt saw the sentence as the basic
unit of language. Rules, therefore, are not merely a convenient way of describing
sentence structure, but a formal means of showing how the parts in a sentence
relate to each other in specific ways.
The above rules tell only a part, albeit a central one, of the generation of
sentences. They produce simple declarative, or deep-structure, sentences. A true
theory of grammar would include transformational rules which change deep-structure strings into more complex outputs. So, the passive version of the above
sentence, The girl is loved by the boy, would result from the application of a
transformational rule, such as the one described in the previous chapter. It is the
transformational component of linguistic competence that is language-specific,
and thus produces linguistic diversity in grammars, not the base or deep-structure
component.
There are a number of theoretical issues raised by this early standard form
of transformational-generative (TG) grammar, such as how to determine the sequence of application of transformational rules to an input (originally called a
cycle) and the subsequent assignment of morphological and phonological features to the transformed string by a different set of rules. Suffice it to say that the
distinction between deep structure inputs and surface structure outputs by means
of ordered sets of rules describes the system used by Chomsky sufficiently for our
purposes. The key aspect of the TG model is that of movement from one state
or sets of states to another, as in formal mathematical proofs. Indeed, in early
versions of the theory, the rules were called part of a finite-state system of logic,
meaning that the movement from one state to another came to an end.
In the early model, there are thus two syntactic components: the base component (consisting of phrase structure rules) and the transformational component
(consisting of transformational rules), which generate deep and surface structures respectively. Deep structures are seen to be the input to the semantic component, which assigns meaning to the string (via further rules), basically through
lexical insertion and constraints on the insertions from syntactic conditions. The
surface structures that result from the application of the transformations constitute the input to the phonological component, which assigns a phonemic description to the string (also via further rules).
The early theory of TG grammar looked like this:
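[Figure: The early model of TG grammar. Within syntax, the base component generates deep structures, which the transformational component converts into surface structures. Deep structures feed the semantic component, which yields the semantic representation of sentences; surface structures feed the phonological component, which yields the phonological representation of sentences.]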
The task of the linguist is to specify the rules that are in each of the boxes. These
represent the native speaker's linguistic competence because, in knowing how to
produce and understand sentences, the speaker, Chomsky claimed, has an internal representation of these rules. All the linguist is doing is giving form to this
representation. The simple elegance of this early model has been marred since at
least the mid-1970s, in part by Chomsky himself, who has conceded that there may
be no boundary between syntax and semantics and hence no deep structures, at
least as he originally envisioned them. I actually disagree since the early model is
still useful for describing structural relations among sentences, such as the active-passive one. The problems that emerged subsequently are, to my mind, basically
squabbles that crop up within any theoretical school.
One thing has remained constant, though: syntactic rules are the essence of
linguistic competence (the syntax hypothesis). Chomsky claimed, further, that as
linguists studied the nature of rules in different languages they would eventually
discover a universal set of rule-making principles. From this basic plan, revised
at various points after the 1965 exposé (for example, Chomsky 1966a, 1966b, 1975,
1982, 1986, 1990, 1995, 2000, 2002), formal TG theory took its cue. Basically, a
TG grammar is an approach for devising a set of rules for writing base strings and
transforming them into complex (and language-specific) ones. It is fundamentally
similar to the propositional logic used by mathematicians to indicate how strings
of symbols follow from each other through statements of various kinds. Grammar is thus seen as a generator and the rules as the elements that activate the
generator. Of course, there is little room for phenomena such as grammaticalization, where words themselves, if they acquire new functions, trigger grammatical
change; or the fact that communicative competence (parole) may change grammar
in and of itself.
One of the key notions in TG grammar is that of parsing, which is used to
specify how the phrases are composed and what rules are needed to specify their
composition. Parsing is defined as the process of representing a string in terms
of its phrase-structure relations. The meaning of the symbols in a string (input
and output) is considered to result from how the strings are structured, a notion
called compositional semantics in later versions of generative grammar. By breaking down a string (parsing it) part by part in its deep structure form, we can determine its meaning. In other words, meaning is dependent on syntax. Although
various factions in the TG grammar movement broke away from this premise, by
and large, meaning has always constituted a difficult problem for this movement.
In my view, compositional semantics with its basis in lexical insertion is the best
fit for any version of the theory. In the rules above, called production rules, the
parts that are not lexical are called symbols, including the start symbol (S), until
slots in a string occur whereby insertions from the lexicon occur. For example, an
insertion rule would specify that the verb love cannot be inserted if the preceding
noun phrase is, say, the rock. If the same string can be generated in more than one way by the same set of
rules, production and lexical, then the grammar is said to be ambiguous. Avoiding ambiguity of this type took up a large swath of research activity on the part of
TG grammarians throughout the 1970s and 1980s. Other models have emerged to
connect syntax to semantics but, as it has turned out, these have hardly migrated
to mainstream linguistic practices, indicating that they are relevant only within
the game of generative grammar, to use Colyvan's metaphor once again.
Meaning in the sense of language connecting with referents outside of language (social and environmental) is seen to fall literally outside of linguistic theory
proper. It is seen to be part of psychology and pragmatic knowledge, not linguistic
competence per se, and should thus be relegated for study in applied areas, such
as sociolinguistics and psycholinguistics. Linguistic theory is seen as a pristine
theory about linguistic competence, not about the uses and variability of speech.
a finite set, P, of production rules, with each rule having the Kleene star form:
$(\Sigma \cup N)^{*} N (\Sigma \cup N)^{*} \rightarrow (\Sigma \cup N)^{*}$; the rules are a set of instructions for mapping symbols from one string to another; these include phrase
structure and transformational systems
If $\Sigma$ is an alphabet (a set of symbols), then the Kleene star of $\Sigma$, denoted $\Sigma^{*}$, is
the set of all strings of finite length consisting of symbols in $\Sigma$, including the
empty string.
The concept of the Kleene star operator is basic to this type of rule system; it too is a
direct adoption from formal mathematics. If S is a particular set of strings, then the
Kleene star of S, or $S^{*}$, is the smallest superset of S that contains the empty string and is
closed under the string concatenation operation; that is, $S^{*}$ is the set of all strings
that can be generated by concatenating zero or more strings in S. Below are some examples
($\varepsilon$ = the empty string):
1. $\emptyset^{*} = \{\varepsilon\}$, since there are no non-empty strings of finite length consisting of symbols in $\emptyset$,
so $\varepsilon$ is the only element in $\emptyset^{*}$
2. If, say, $E = \{\varepsilon\}$, then $E^{*} = E$, since $\varepsilon a = a \varepsilon = a$ by definition, so $\varepsilon\varepsilon = \varepsilon$
3. If, say, $A = \{a\}$, then $A^{*} = \{\varepsilon, a, aa, aaa, \ldots\}$.
4. If $\Sigma = \{a, b\}$, then $\Sigma^{*} = \{\varepsilon, a, b, aa, ab, ba, bb, aaa, \ldots\}$
5. If $S = \{ab, cd\}$, then $S^{*} = \{\varepsilon, ab, cd, abab, abcd, cdab, cdcd, ababab, \ldots\}$
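Because the Kleene star of any set containing a non-empty string is infinite, a program can only list its members up to some chosen length. The Python sketch below does that; the function name kleene_star and the max_length cut-off are assumptions introduced for illustration, and the empty string is written as "".

```python
def kleene_star(strings, max_length):
    """List the members of the Kleene star of `strings` up to max_length characters."""
    result = {""}        # the empty string always belongs to the star
    frontier = {""}      # strings built in the previous round
    while frontier:
        next_frontier = set()
        for prefix in frontier:
            for s in strings:
                candidate = prefix + s   # concatenate one more string from the set
                if len(candidate) <= max_length and candidate not in result:
                    result.add(candidate)
                    next_frontier.add(candidate)
        frontier = next_frontier
    # Shorter strings first, then alphabetical order within each length.
    return sorted(result, key=lambda w: (len(w), w))


print(kleene_star({"ab", "cd"}, 4))
print(kleene_star({"a", "b"}, 2))
print(kleene_star(set(), 3))
```

With a cut-off of four characters, the first call lists the opening members given in example 5 above.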
With this set of meta-rule-making principles, which are really the rules of combinatory algebra, it is possible to write the phrase structure grammar of any
language. Differences among languages occur at the transformational level; that
is, languages are differentiated by the kinds of transformation rules applied and
used, not by phrase structure. The grammar now can be defined in terms of
how strings relate to each other. The system in its entirety is rather complex and
need not be detailed here. The upshot is that grammars are built from a small
set of meta-rule-making principles, becoming complex through derivational and
transformational processes.
For instance, consider the grammar of a hypothetical language, L, made up of
N = {S, B} and $\Sigma$ = {a, b, c}, S the start symbol, and the following phrase structure
or simply production (P) rules:
1. $S \rightarrow aBSc$
2. $S \rightarrow abc$
3. $Ba \rightarrow aB$
4. $Bb \rightarrow bb$
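To see what the four production rules above generate, one can apply them mechanically, starting from S and rewriting until only terminal symbols remain. The following Python sketch is an illustration under stated assumptions: the breadth-first search, the function name derive, and the length cut-off are choices made here, not part of the formal definition. The cut-off is safe because no rule ever shortens a string.

```python
from collections import deque

# The four production rules, written as (left-hand side, right-hand side).
RULES = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]


def derive(max_length):
    """Return every terminal string derivable from S within max_length symbols."""
    seen = {"S"}
    queue = deque(["S"])
    terminal_strings = set()
    while queue:
        form = queue.popleft()
        if form.islower():               # only terminals a, b, c remain
            terminal_strings.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:           # try the rule at every matching position
                new_form = form[:start] + rhs + form[start + len(lhs):]
                if len(new_form) <= max_length and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                start = form.find(lhs, start + 1)
    return sorted(terminal_strings, key=len)


print(derive(9))  # ['abc', 'aabbcc', 'aaabbbccc']
```

One derivation of the second string runs S, aBSc, aBabcc, aaBbcc, aabbcc. Every string found has equal numbers of a's, b's, and c's, in that order.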
Now, L can be defined as $L = \{a^n b^n c^n \mid n \geq 1\}$, where $a^n$ denotes a string of n consecutive a's, $b^n$ a string of n consecutive b's, and $c^n$ a string of n consecutive c's. L is the
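1. $A \rightarrow B + C$
2. $B \rightarrow F + G$
3. $F \rightarrow a$ (terminal)
4. $G \rightarrow b$ (terminal)
5. $C \rightarrow D + H$
6. $D \rightarrow c$ (terminal)
7. $H \rightarrow d$ (terminal)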
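Read as rewriting instructions, the seven rules above expand the symbol A step by step into a string of terminals. The Python sketch below illustrates this, assuming that "+" in the rules means concatenation and that A is the starting symbol; the dictionary layout and the function name expand are illustrative choices.

```python
# Each nonterminal maps to the sequence of symbols that replaces it.
RULES = {
    "A": ["B", "C"],
    "B": ["F", "G"],
    "F": ["a"],
    "G": ["b"],
    "C": ["D", "H"],
    "D": ["c"],
    "H": ["d"],
}


def expand(symbol):
    """Rewrite a symbol recursively until only terminal symbols remain."""
    if symbol not in RULES:      # a terminal: nothing left to rewrite
        return symbol
    return "".join(expand(part) for part in RULES[symbol])


print(expand("A"))  # abcd
```

Each rule has a single nonterminal on its left-hand side, so a symbol can be rewritten without looking at its neighbors.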
Research on context-free grammars has shown that these do not generate all kinds
of strings required by both natural and artificial languages. The artificial language
$L = \{a^n b^n c^n \mid n \geq 1\}$ above is not a context-free language, since the number of a's must be matched by the same number of b's and then again by the same number of c's,
a three-way dependency that context-free rules cannot enforce.
In the set of production rules for regular grammars the same constraint of a
single nonterminal symbol on the left-hand side holds but, in addition, the right-hand
side is also restricted. It may contain only an empty string, a single terminal symbol, or a single terminal symbol followed by a nonterminal symbol. Rules in a
regular grammar might look like this:
1. $S \rightarrow aA$
2. $A \rightarrow aA$
3. $A \rightarrow bB$
4. $B \rightarrow bB$
5. $B \rightarrow \varepsilon$ (terminal)
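A regular grammar of this kind behaves like a finite-state machine: each nonterminal is a state, and each rule consumes one terminal and moves to the next state. The Python sketch below makes that reading explicit. It assumes that the final rule lets B end the derivation with the empty string; the table layout and the function name accepts are illustrative. It also compares the result with the regular expression a+b+, which describes the same strings (one or more a's followed by one or more b's).

```python
import re
from itertools import product

# (current state, symbol read) -> next state, one entry per rule 1 to 4.
TRANSITIONS = {
    ("S", "a"): "A",
    ("A", "a"): "A",
    ("A", "b"): "B",
    ("B", "b"): "B",
}
ACCEPTING = {"B"}  # rule 5: B may end the derivation (assumed empty-string rule)


def accepts(string):
    """Return True if the grammar above can generate `string`."""
    state = "S"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:        # no rule applies: the string is rejected
            return False
    return state in ACCEPTING


for word in ("ab", "aaabb", "a", "ba", "abab"):
    print(word, accepts(word))

# Cross-check against the regular expression on all strings up to length 6.
agree = all(
    accepts("".join(chars)) == bool(re.fullmatch(r"a+b+", "".join(chars)))
    for length in range(7)
    for chars in product("ab", repeat=length)
)
print(agree)
```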
Many variations and extensions of these rule-making principles now exist in the
relevant literature. They have been developed not only by linguists but also by
computer scientists to generate actual language samples. Indeed, the latter field
is the one that has most benefited from the work in formal grammars, applying the
rules of natural language grammars to the construction of artificial languages.
One of the claims of formal grammarians generally is that language in its deep
structure is based on the principle of recursion. In mathematics, a classic example of recursion is the Fibonacci sequence, $F_n = F_{n-1} + F_{n-2}$, which generates the
following well-known sequence:
{1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, . . . } .
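The recursive definition translates almost word for word into a function that calls itself. In the Python sketch below, the starting values F1 = F2 = 1 are read off the sequence as printed; the function name fib and the use of a cache to avoid repeated work are choices made for this illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number, counting from fib(1) = fib(2) = 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n <= 2:
        return 1                        # the two starting values
    return fib(n - 1) + fib(n - 2)      # the procedure invokes itself


print([fib(n) for n in range(1, 14)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
```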
The recursion formula provides a snapshot of the internal structure of the sequence. An analogous claim is made by formal grammarians who indicate that a
recursive grammar is actually the key to unlocking the UG in the brain, explaining why we feel that some sentences are genuine, while others are not. However,
although such snapshots may be useful for relating the words in strings to each
other via grammatical categories, they hardly tell us what generates or triggers
the recursive rules themselves in the first place. Aware of this, Chomsky has
suggested that the rules explicate only the ways in which sentences are formed
mentally and then realized physically in real grammars. One can infer the former
from the latter. He introduced the distinction between language in general, which
he calls I-language, and languages in particular, which he calls E-languages, in
order to make this point. Chomsky put forward the notion of a UG to explain
the I-language, explicating why children learn to speak so effortlessly without
training: when the child learns one fact about a language, the child can easily
infer other facts without having to learn them one by one. Differences in language
grammars are thus explainable as choices of rule types, or parameters. From
recursive patterns observed in these we come to understand the role of recursion
in the UG.
But, then, this solution begs the fundamental question of deciding which
sentences are basic (in the I-language) and which ones are contextualized adaptations (in E-languages). It is beyond the scope of the present discussion to deal
with the relevant arguments for and against UG theory. The theory is founded
on the view that recursion in the I-language reflects the nature of recursion in the
UG. Although this may be somewhat reductive, overall it captures the gist of this
line of formal grammar research. It also implies that meaning has no effect on the
I-language, since it is an innate logical form. Meaning is a product of external factors in the formation of E-languages. And this need not concern grammarians: it
is something for psychologists and philosophers to figure out.
Let us look a little more closely at the concept of recursion. Essentially, it is
defined as the process whereby one of the steps of a procedure
invokes the procedure itself. The procedure is a set of steps based on
a set of rules. Chomsky applied this mathematical notion to natural language in
1965, in reference to the embedding of clauses within sentences. Thus, two distinct
sentences, (1) You see that boy and (2) That boy is my grandson, can be embedded
into each other to produce The boy who you see is my grandson by a recursive rule.
Chomsky calls this particular model of language X-Bar Theory. If we let x and y
stand for two grammatical categories, and x-bar and y-bar for the corresponding
grammatical phrases, Chomsky claims that the rule x-bar $\rightarrow$ x + y-bar is the underlying
recursive principle of language. Take, as an example, the sentence The clock is in
the corner. X-Bar Theory would analyze this sentence (schematically at least) as
follows:
Deep structure recursion principle:
x-bar $\rightarrow$ x + y-bar
Surface rule:
x-bar = n-bar = noun phrase (the clock, the corner)
y-bar = p-bar = prepositional phrase (in the corner)
where:
n = noun (clock, corner)
p = preposition (in)
Structure of The clock is in the corner:
n-bar $\rightarrow$ n + p-bar $\rightarrow$ p + n-bar $\rightarrow$ n
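The mutual embedding of noun phrases and prepositional phrases can be mimicked with two functions that call each other. The Python sketch below is a toy illustration only, not Chomsky's formalism: the function names, the fixed article "the", and the extra words "of" and "room" are assumptions added here to show a deeper level of embedding.

```python
def n_bar(nouns, prepositions):
    """Build a noun phrase: a noun, optionally followed by a prepositional phrase."""
    head, rest = nouns[0], nouns[1:]
    if not rest:                          # no nouns left to embed: recursion stops
        return f"the {head}"
    return f"the {head} {p_bar(rest, prepositions)}"


def p_bar(nouns, prepositions):
    """Build a prepositional phrase: a preposition followed by a noun phrase."""
    remaining = prepositions[1:] or prepositions   # reuse the last preposition if needed
    return f"{prepositions[0]} {n_bar(nouns, remaining)}"


print(n_bar(["clock", "corner"], ["in"]))
# the clock in the corner
print(n_bar(["clock", "corner", "room"], ["in", "of"]))
# the clock in the corner of the room
```

Each call to n_bar may trigger p_bar, which in turn triggers n_bar again, so phrases of any depth come from the same two rules.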
Supplemented with an appropriate system of transformational rules that assign
word order and sentence relations, Chomsky maintains that X-Bar Theory is
sufficient to explain the basic blueprint of language. If Chomsky is right, then,
the uniqueness of language comes down to a single rule-making principle that
specifies how word order develops. But then how would X-Bar Theory explain
languages in which word order is virtually irrelevant? Many critics have, in fact,
argued that languages such as Classical Latin do not display any evidence of recursion, because they encode grammatical relations by means of inflection, that
is, by variations or changes that their words undergo to indicate their relations
with other words and changes in meaning. Chomsky has countered that one of
the word combinations in a language such as Latin or Russian is a basic one and
the others are its transformations. But deciding which one is basic is problematic,
given that all sentence permutations are perceived as basic by Latin or Russian
speakers according to the context in which each one is uttered: that is, the choice
of one or the other word order depends on stylistic, communicative, and other
types of factors, not on syntactic ones.
Recursion is certainly an operative principle in the structure of grammar, but
does it really explain language? As Daniel Everett (2005) has argued, albeit controversially, recursion may not be a universal feature after all since it is absent from
the Pirahã language, spoken by the Pirahã people of Amazonas in Brazil. The reason,
according to Everett, is that cultural factors have made recursion unnecessary.
This does not minimize the importance of recursion in rule systems, including
grammatical ones, but it may well be a human invention, not an innate faculty
of the mind. That is, it is our way of formalizing repeating forms that come under
our observation. Information is highly recursive: ideas built within other ideas
ad infinitum. But this raises the question of what information is and what our
theories of information are all about. There is no proof that recursion is an inbuilt
property of information systems, but rather that it is a useful construct to describe
certain patterns within certain kinds of information. Moreover, the main feature
of information-processing is the discarding of information, as discussed. One of
the main tasks of the brain is to eliminate information that is either irrelevant or
else unrelated to what we need to extract from it. So, rules are really just responses
to how we select from information what we need or what we believe is relevant.
Rules are interpretations, not absolute statements of fact.
Moreover, the connection between linguistic competence and performance
is rarely, if ever, taken into consideration by formal grammarians, even though,
as most other approaches to language would now sustain, the use of language
is governed by features of communication that may themselves initiate change in
language grammars. Grammar is just one of the ways that allows people to express
their concepts of the world, not a hard-wired innate faculty organized into modules in the brain (Fodor 1983). Language draws upon general cognitive resources
to make sense of the world. The assumption of formal grammarians, on the other
hand, is that the essence of linguistic competence is an abstract sense of grammar,
not a sense of meaning.
we represent this feature with the symbol [+aspirated], we can now specify the
difference between the two allophones more precisely: [pʰ] is marked as [+aspirated] and [p], which does not have this feature, as [-aspirated]. The [±aspirated]
symbol is a distinctive feature.
In effect, all linguistic units can be described in terms of distinctive features.
This includes the lexicon, whose units can be specified in terms of features that are
mapped against the structural profile of strings or slots in rules. It is a particular
kind of dictionary that contains not only the distinctive-feature specification of
items, but also their syntactic specification, known as subcategorization. Thus,
for example, the verb put would be subcategorized with the syntactic specification that it must be followed by a noun phrase and a prepositional phrase (I put
the book on the table). It cannot replace loving for this specific syntactic reason.
On the other hand, love would also fit nicely into the same slot (I love the book on
the table). It is thus irrelevant what the verbs mean, as long as they are mapped
correctly onto strings via the rules of insertion. The different meanings of the two
sentences, due to different lexical insertions, are seen as being determined by extralinguistic socio-historical conventions of meaning, not by internal processes of
language.
Lexical insertion involves its own hierarchical structure and set of rules. For
example, a verb such as drink can only be preceded by a subject that is marked
as [+animate] (the boy, the girl, and so on). If it is so marked, then it entails further feature-specification in terms of gender ([+male], [+female]), age ([+adult],
[-adult]), and other similar notions. An example of how the lexicon would classify
the four lexemes man, boy, woman, girl is the following tree diagram:

                    person
                  [+animate]
               /              \
          [+male]            [+female]
          /     \            /      \
    [+adult] [-adult]   [+adult] [-adult]
       |        |          |        |
      man      boy       woman     girl
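The feature tree can also be read as a small table of feature bundles. The Python sketch below is only an illustration: it assumes that gender can be encoded as a single binary feature (so that [+female] is written as male = False), and it adds a hypothetical inanimate entry, clock, to show the selectional restriction on drink. The names LEXICON, select, and can_precede_drink are invented for this example.

```python
# True stands for "+" and False for "-"; "clock" is a hypothetical extra entry.
LEXICON = {
    "man":   {"animate": True, "male": True,  "adult": True},
    "boy":   {"animate": True, "male": True,  "adult": False},
    "woman": {"animate": True, "male": False, "adult": True},
    "girl":  {"animate": True, "male": False, "adult": False},
    "clock": {"animate": False},
}


def select(**features):
    """Return, in alphabetical order, the lexemes matching every given feature."""
    return sorted(
        word
        for word, bundle in LEXICON.items()
        if all(bundle.get(name) == value for name, value in features.items())
    )


def can_precede_drink(word):
    """Selectional restriction from the text: the subject of drink is [+animate]."""
    return LEXICON.get(word, {}).get("animate", False)


print(select(male=True, adult=False))
print(can_precede_drink("clock"))
```

Each path from the root of the tree to a lexeme corresponds to one feature bundle, so selecting by features amounts to walking down the tree.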
gluing together of the parts comes under the name of Glue Theory, a rather appropriate term (Dalrymple, Lamping, and Saraswat 1993, Dalrymple 1999, 2001).
The claim is that meaning composition in any context (from the sentence to the
discourse text) is constrained by a set of instructions, called meaning constructors,
stated within a formal logic, which states how the meanings of the parts of a sentence can be combined to provide the meaning of the sentence or set of sentences.
The idea of compositionality was discussed in a detailed fashion even before
TG grammar by Bar-Hillel (1953), who used the term categorial grammar to characterize the process. A categorial grammar assigns a set of types (called categories)
to each basic nonterminal symbol, along with inference rules, which determine
how a string of symbols follows from constituent symbols. It has the advantage
that the inference rules can be fixed, so that the specification of a particular language grammar is entirely determined by the lexicon. Whereas a so-called lambda
calculus (which is essentially the name of the types of rules used by formal grammarians) has only one type of rule, A → B, a categorial grammar has two types:
(1) B/A, which describes a phrase that results in a phrase of type B when followed on the right by
a phrase of type A; (2) A\B, which describes a phrase that results in a phrase of type B when preceded on
the left by a phrase of type A. The formalization of types of categorial grammars is
known as type-logical semantics or Lambek calculus (Lambek 1958, Morrill 2010).
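The two rule types can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not a full Lambek calculus: it implements only forward and backward application, represents B/A as the tuple ("/", B, A) and A\B as ("\\", A, B), and uses a hypothetical four-word lexicon. The names combine, derives, and LEXICON are invented for the example.

```python
NP, S = "NP", "S"
IV = ("\\", NP, S)   # NP\S: yields S when preceded on the left by an NP
TV = ("/", IV, NP)   # (NP\S)/NP: yields NP\S when followed on the right by an NP

# Hypothetical lexicon: in a categorial grammar the lexicon carries the grammar.
LEXICON = {"Mary": NP, "John": NP, "sleeps": IV, "loves": TV}


def combine(left, right):
    """Apply one of the two rules to adjacent categories, or return None."""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]    # B/A followed by A gives B
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        return right[2]   # A followed by A\B gives B
    return None


def derives(categories, goal):
    """True if some sequence of adjacent reductions leaves exactly [goal]."""
    if len(categories) == 1:
        return categories[0] == goal
    for i in range(len(categories) - 1):
        result = combine(categories[i], categories[i + 1])
        if result is not None:
            reduced = categories[:i] + [result] + categories[i + 2:]
            if derives(reduced, goal):
                return True
    return False


def is_sentence(text):
    """Look up each word's category and test whether the string reduces to S."""
    return derives([LEXICON[word] for word in text.split()], S)


print(is_sentence("Mary loves John"))
print(is_sentence("loves Mary John"))
```

Because the two application rules are fixed, changing the language here would mean changing only the entries in LEXICON, which is the advantage the text attributes to categorial grammars.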
Although some valid arguments have been put forward in defense of compositionality concerning its psychological basis, many formal semanticists have by
and large kept their distance from it. The principle is seen as simply explaining
how a person purportedly can understand sentences he or she has never heard
before. However, Schiffer (1987) showed how this is a spurious argument. He illustrates his case with the following sentence: Tanya believes that Gustav is a dog.
Compositionality can never account for the content of Tanya's belief (given that
dog has various references). Partee (1988) counters that Schiffer did not distinguish between semantic and psychological facts. Formal semantics, she claimed,
provides a theory of entailment and this, in itself, cannot be excluded from any
viable theory of language understanding.
Despite Partee's counter-argument, there is very little going on in this area of
formal semantic study today, perhaps because when one has come to the specification of the rules of production, compositionality, or lexical insertion, there is
very little left to do. On the other hand, some linguists now claim that the whole
approach was misguided from the outset. But this would constitute a baby-and-the-bathwater counter-argument. One of the achievements of formal grammar
and formal semantics is that linguists have become more aware of the logical
structure of grammar and, perhaps, of discourse. It remains to be seen how far
this insight can go with the ongoing research in cognitive linguistics and discourse
theory generally.
[Figure: types of figurative language, including apostrophe, hyperbole, metaphor, metonymy, personification, oxymoron, synecdoche, and others]
Conceptual metaphor    Linguistic metaphor
happiness is up        My grandson is finally feeling up after a long bout with stress.
sadness is down        But I am feeling down, since I have way too much to do.
more is up             Our family income went up considerably last year.
less is down           But her salary went down.
The image schema is a blending mechanism, which amalgamates concrete experience with abstraction. In an early version of CMT, the formation of conceptual
metaphors was seen as a mapping process, whereby the elements in a source domain were mapped onto the target domain via image-schematic mechanisms. The
mapping was not seen as exclusive to language, but also as guiding representational practices in general. Consider the concept of time in English. Common
conceptual metaphors of time include source domains such as a journey (There's
a long way to go before it's over), a substance (There's not enough time left to finish
the task), a person (Time comes and goes), and a device (Time keeps ticking on),
among others. These source domains manifest themselves as well in representations such as mythical figures (Father Time), narratives (The Time Machine, 1895,
by H. G. Wells), and others. So, CMT became a broad movement because of the
fact that it provided a means of linking the internal system of language to external systems of representation. To the best of my knowledge, this had never been
accomplished before in a systematic descriptive way.
More technically, the process constitutes a blend (as already discussed) which
involves several components. There is a generic space, as it is called, which guides
the mapping between the target and source domains, called a diagrammatic
mapping. The image schema undergirds the diagrammatic mapping through its
content which comes from the imagic mapping of sensory perception. This
produces the blend and thus metaphor, which is a conceptual blend that results
from the integration of the various components (see Figure 2.20).
In this revised model, mapping is part of blending. Other conceptual structures also result from the latter process (for example, metonymy and irony), but
each in a different way. Mapping best describes metaphor, whereas a part-for-the-whole blend best describes metonymy. The notion of conceptual metaphor has
had far-reaching implications. Substantive research has come forward to show
how conceptual metaphors coalesce into a system of cultural meanings that inform representations, symbols, rituals, activities and behaviors. Lakoff and Johnson (1980) called this coalescence idealized cognitive modeling (ICM). This is defined as the unconscious formation of over-arching models that result from the repeated blending of certain target domains with specific kinds of source domains.
[Figure 2.20: diagram of the blend. Labeled components: generic space; diagrammatic mappings among source, target, and image schema; image content, derived by imagic mapping from sensory perception; blend, produced by conceptual mapping and integration; metaphor]
To see what this means, consider the target domain of ideas. The following conceptual metaphors, among others, are used in English to deliver the meaning of
this concept (from Danesi 2007):
ideas are food
1. My prof's ideas left a sour taste in my mouth.
2. I always find it hard to digest her ideas at once.
3. Although she is a voracious reader, she can't chew all the complex ideas in
that book.
4. She is always spoon-feeding her students.
ideas are persons
5. Freud is the father of modern psychology, isn't he?
6. Some medieval ideas continue to live on even today.
7. Quantum mechanics is still in its infancy.
8. Maybe we should resurrect Euclidean geometry.
9. She breathed new life into logical methods.
These are linguistic metaphors based on the conceptual metaphor people are animals. In (1), the latter concept can show up as a verb, if it is the snake's movements
that are implicated; in (2) it manifests itself as an adjective, if it is a quality of the
snake that is implicated instead. The two different grammatical categories can
be seen to reflect different nuances of metaphorical meaning. Work has shown
that such lexicalizations are common in grammars throughout the world (Cienki,
Luka, and Smith 2001). Differences in word order, too, can often be traced to conceptual distinctions. In Italian, for instance, the difference between the literal and
metaphorical meaning of an adjectival concept is often reflected by the different
position of the adjective in a noun phrase:
1.
2.
In the first example it is the literal meaning of povero that is reflected in the noun
phrase by the post-positioning of the adjective with respect to the noun. In the
second one the metaphorical meaning of povero is brought out by means of its pre-positioning with respect to the noun, alerting the interlocutor in an anticipatory
fashion to this meaning.
Ronald Langacker (1987, 1990, 1999) has argued that the parts of speech themselves are the result of specific image schemas working unconsciously. Nouns, for
instance, encode the image schema of a region. Thus, a count noun such as leaf is
envisioned as referring to something that encircles a bounded region, and a mass
noun such as rice a non-bounded region. Now, this difference in image schematic
structure induces grammatical distinctions. Thus, because bounded referents can
be counted, the form leaf has a corresponding plural form leaves, but rice does
not. Moreover, leaf can be preceded by an indefinite article (a leaf), rice cannot.
In research on the world's languages, these examples come up constantly. The
research also shows that not all languages use the same classification system of
nouns. The reason for this has a basis in historical context. In Italian, grapes is a
mass noun, uva, perhaps because the fruit plays a key role in Italian culture (not
only as a fruit but as part of wine-making and other activities).
It is worth noting that, even before the advent of cognitive linguistics, the
Gestalt psychologists were seriously entertaining the possibility that many concepts were indeed metaphorical in origin. Rudolf Arnheim (1969: 242), for example, explained the raison d'être of function words such as prepositions and conjunctions as the result of image schemas (before the use of that term):
I referred in an earlier chapter to the barrier character of but, quite different from although, which does not stop the flow of action but merely burdens it with a complication.
Causal relations are directly perceivable actions; therefore because introduces an effectuating agent, which pushes things along. How different is the victorious overcoming of a
hurdle conjured up by in spite of from the displacement in either-or or instead; and
how different is the stable attachment of with or of from the belligerent against.
The gist of the research in cognitive linguistics, therefore, suggests that grammar
and meaning cannot be separated. Montague tried to get around this critique before the advent of the cognitive linguistic movement in several ways, as we saw,
and Sperber and Wilson added the idea of relevance as being implicit in the application of the rules. For Chomsky (2000, 2002) the crux to understanding language
continues to be the syntax hypothesis, with meaning embedded in syntax. Cognitive linguists view the whole situation in reverse: syntax is embedded in meaning
processes.
In sum, in treating linguistic knowledge as a form of everyday knowledge encoded into words and larger structures, the cognitive linguistic movement is a
radically different one from formalism, and poses a strong challenge to the latter. In response, formal grammarians have developed sophisticated counterarguments, claiming that words themselves are without meaning: they have, at best,
internal representations of meaning, which are really just ways of using words in
previously-derived strings of symbols. Along these lines, they argue that compositionality can be extended to discourse texts. Today, neuroscientific research is
being used more and more to resolve the debate. When a metaphor is produced,
different regions of the brain are activated in tandem, as fMRI studies have shown.
For instance, Prat (2012: 282) investigated the neural correlates of analogical mapping processes during metaphor comprehension by subjects using the fMRI technique. Prat explains the experiment and findings as follows:
Participants with varying vocabulary sizes and working memory capacities were asked to
read 3-sentence passages ending in nominal critical utterances of the form X is a Y. Processing demands were manipulated by varying the preceding contexts. Three figurative conditions manipulated difficulty by varying the extent to which preceding contexts mentioned
relevant semantic features for relating the domains of the critical utterance to one another.
In the easy condition, supporting information was mentioned. In the neutral condition, no
relevant information was mentioned. In the most difficult condition, opposite features were
mentioned, resulting in an ironic interpretation of the critical utterance. A fourth, literal
condition included context that supported a literal interpretation of the critical utterance.
Activation in lateral and medial frontal regions increased with increasing contextual difficulty. Lower vocabulary readers also had greater activation across conditions in the right
inferior frontal gyrus. In addition, volumetric analyses showed increased right temporoparietal junction and superior medial frontal activation for all figurative conditions over
the literal condition. The results from this experiment imply that the cortical regions are
dynamically recruited in language comprehension as a function of the processing demands
of a task. Individual differences in cognitive capacities were also associated with differences
in recruitment and modulation of working memory and executive function regions, highlighting the overlapping computations in metaphor comprehension and general thinking
and reasoning.
In reviewing the fMRI studies on metaphor, Wang and Daili (2013) concluded,
however, that the results are not always this clear; they tend to be ambiguous,
albeit promising. In the context of the present discussion, their review nevertheless points out that metaphor can no longer be relegated to subsidiary status in a
theory of language.
discussed in the next chapter, formalism has had important applications to artificial intelligence research and robotics. Language development in children, for
example, has been modeled in robots in order to test the validity of rule systems
and how these operate algorithmically. Interestingly, robots have been found to
develop word-to-meaning mappings without grammatical rules, a very enigmatic
finding to say the least. Algorithms can also be devised to model trends in data and
create reliable measures of similarity among natural textual utterances in order to
construct more reliable rule systems. Without formal approaches, the vastly complex information present in discourse data would have remained inaccessible to
linguists. With the proliferation of the Internet and the abundance of easily accessible written human language on the web, the ability to create a program capable
of reproducing human language on the basis of a statistical analysis of the data would have
many broad and exciting possibilities.
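One simple way to measure similarity among textual utterances is sketched below in Python. It is only an illustrative assumption about what such a measure might look like: it uses bag-of-words cosine similarity over whitespace-separated tokens, which is a standard technique but not one the text itself names. The function name cosine_similarity is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two utterances."""
    va = Counter(a.lower().split())
    vb = Counter(b.lower().split())
    # Only words shared by both utterances contribute to the dot product.
    dot = sum(va[word] * vb[word] for word in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty utterance is treated as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity("the cat is out of the bag", "the cat is in the bag"))
```

A measure of this kind ignores word order and meaning entirely, which is consistent with the point made in this chapter that purely formal treatments of strings leave interpretation out of the picture.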
In the early 1970s the American linguist Dell Hymes (1971) proposed that
knowledge of language entailed more than linguistic competence, or language-specific knowledge; it also entailed the ability to use language forms appropriately in specific social and interactive settings. He called this kind of knowledge
communicative competence, a term that has since become central in the study of
language. Hymes also maintained that such competence was not autonomous
from linguistic competence, but, rather, that it was interrelated with it. Moreover,
the words used in conversations are cues of social meanings, not just carriers
of lexical and grammatical information. To carry out a simple speech act such
as saying hello requires a detailed knowledge of the verbal and nonverbal cues
that can bring about social contact successfully. An infringement or misuse of
any of the cues will generally lead to a breakdown in communication. Every
conversation unfolds with its own kind of speech logic, that is, with its own
set of assumptions and implicit rules of reasoning that undergird its sequence,
form, and overall organization (Danesi and Rocci 2000). So, if we have learned
anything from the history of formal mathematics and linguistics it is that a pure
abstract theory of language or mathematics is an ideal, not a reality. Saussure's
and Chomsky's artificial dichotomy between langue and parole is ill-founded,
as it turns out. Reconnecting the two through a study of meaning structures, as
in CMT, is the way in which progress towards answering the basic question of
what language is can be achieved. This has become evident even in computational models of Natural Language Processing, as will be discussed in the next
chapter.
Turing's 1936 paper, published shortly after Gödel's, also proved that in logical
systems some objects cannot be computed, which is another way of saying that
they are undecidable. An undecidable problem in computer science is one for
which it is impossible to construct a single algorithm that always leads to a correct
yes-or-no answer. This notion became an important early insight for determining
what could be programmed in a computer.
By extension, one can claim that any formal grammar will have a Gödelian
flaw in it. Finding the undecidable proposition or rule in a formal grammar has
never been undertaken, as far as I know. But my guess is that it can be found with
some effort.
The Gödelian critique of formal grammar does not mean that formal approaches should be discarded. On the contrary, the efforts of formal linguists,
like those of mathematical logicians, have not been without consequences. As
mentioned, they have had applications in computer programming. But when it
comes to natural language, formal grammar theories break down because they
have never been able to account for meaning in any successful way. Simply put, in
human language strings of symbols involve interpretations of what they mean, not
just a processing of their sequential structure as in computer software. And those
interpretations come from experience that emanates from outside the strings.
This dynamic between form and meaning was studied deeply by Vygotsky (1961:
223) who understood that they are really inextricable, and that when we speak we
are really involved with meaning and thought in tandem:
A word without meaning is an empty sound: meaning, therefore, is a criterion of word, its
indispensable component. But from the point of view of psychology, the meaning of every
word is a generalization or a concept. And since generalizations and concepts are undeniably acts of thought, we may regard meaning as a phenomenon of thinking. It does not
follow, however, that meaning formally belongs in two different spheres of psychic life. Word
meaning is a phenomenon of thought only in so far as speech is connected with thought and
illuminated by it. It is a phenomenon of verbal thought, or meaningful speech: a union of
word and thought.
It should be mentioned initially here that neuroscientists are coming closer and
closer to accepting the cognitive linguistic work as being real in a psychological
sense; although contrasting work on the neuroscience of logic is also highly interesting and suggestive (for example Houdé and Tzourio-Mazoyer 2003, Krawczyk
2012, Monti and Osherson 2012, Smith et al. 2015). A notion that has come forth
to attempt a compromise between formalism and cognitivism in both language
and mathematics is that of network. In previous work (Danesi 2000), this notion
was used to exemplify how various forms of language had a branching structure
to produce integrated layers of meanings. So, the meaning of cat is something
that can only be extrapolated from the network of associations that it evokes, including mammal, animal, organism, life, whiskers and tail. This has a denotative
branching structure within the network. By adding metaphorical branches (as in
Hes a cool cat and The cat is out of the bag), the network is extended to enclose
figurative and other kinds of meanings.
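The network notion can be pictured as a simple data structure. The Python sketch below is a hypothetical illustration rather than the model used in the work cited: it stores the denotative associations of cat listed above, then extends the same node with metaphorical branches. The names NETWORK, add_branch, and associations are invented for the example.

```python
# Denotative branches for "cat", taken from the associations listed in the text.
NETWORK = {
    "cat": {
        "denotative": ["mammal", "animal", "organism", "life", "whiskers", "tail"],
    }
}


def add_branch(network, node, kind, association):
    """Attach one association to `node` under the branch type `kind`."""
    network.setdefault(node, {}).setdefault(kind, []).append(association)


def associations(network, node):
    """Collect every association of `node`, across all branch types."""
    return [item for branch in network.get(node, {}).values() for item in branch]


# Extending the network with metaphorical branches, as described in the text.
add_branch(NETWORK, "cat", "metaphorical", "He's a cool cat")
add_branch(NETWORK, "cat", "metaphorical", "The cat is out of the bag")

print(associations(NETWORK, "cat"))
```

On this picture the meaning of cat is not a single entry but whatever can be reached from the node, so adding figurative branches enlarges the meaning without altering the denotative ones.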
2.5.3 Overview
As argued in this chapter, formalist approaches are important in many ways. But
they are always fraught with challenging paradoxes. A classic one is the Unexpected Hanging paradox (a paradox to which we will return in subsequent chapters). It goes somewhat like this:
A condemned logician is to be hanged at noon, between Monday and Friday. But he is not
told which day it would be. As he waits, the logician reasons as follows: Friday is the final
day available for my hanging. So, if I am alive on Thursday evening, then I can be certain
that the hanging will be Friday. But since the day is unexpected, I can rule that out, because
it is impossible. So, Friday is out. Thus, the last possible day for the hanging to take place
is Thursday. But, if I am here on Wednesday evening, then the hanging must perforce take
place on Thursday. Again, this conflicts with the unexpectedness criterion of the hanging.
So, Thursday is also out. Repeating the same argument, the logician is able to rule out the
remaining days. The logician feels satisfied, logically speaking. But on Tuesday morning he
is hanged, unexpectedly as had been promised.
This is a truly clever demonstration of how one can reason about anything, and
yet how the reasoning might have nothing to do with reality. Are formalist theories
subject to the Unexpected Hanging paradox? Aware of the profoundly disturbing
aspect of this line of reasoning, David Hilbert (1931) put forth a set of requirements
that a logical theory of mathematics should obey. Known as Hilbert's program, it
was written just before Gödel's theorem as a framework for rescuing mathematics
from what can be called the Unexpected Hanging conundrum. Hilbert's program
included the following criteria which, as we have seen throughout this chapter,
make up the underlying paradigm of formalism:
1. Formalization. A complete formalization of mathematics, with all statements
articulated in a precise formal language that obeys well-defined rules.
2. Completeness. A proof that all true mathematical statements can be proved
within the formal system.
3. Consistency. A proof that no contradiction can be obtained in the formal set
of rules.
4. Conservation. A proof that any result about real things obtained by reasoning about ideal objects can be proved without the latter.
5. Decidability. An algorithm must be determined for deciding the truth or falsity
of any mathematical statement.
Hilbert's program was put into some question by Gödel's demonstration, but it
continues to have validity as a heuristic system for conducting mathematical activities. The current versions of mathematical logic, proof theory, and so-called
reverse mathematics, are based on realizing Hilbert's program. Reverse mathematics is a system that seeks to establish which axioms are required to prove
mathematical theorems, thus turning the Euclidean system of proof upside down,
going in reverse from the theorems to the axioms.
Hilbert's program was based on the hope that mathematics could be formalized into one system of the predicate calculus, whether or not it linked mathematics to reality. Similarly, Chomsky has always claimed that his theory is about
grammar, not language as it is spoken and used. But the implicit assumption in
both Hilbert and Chomsky is that logical formalism and reality are an implicit
match. This is known as logicism, the attempt to make logic the core of mathematics and language and then to connect it to reality. Aware of the issues connected
with this stance, Hilbert made the following insightful statement (cited in Tall
2013: 245):
Surely the first and oldest problems in every branch of mathematics spring from experience
and are suggested by the world of external phenomena. Even the rules of calculation with
integers must have been discovered in this fashion in a lower stage of human civilization,
just as the child of today learns the application of these laws by empirical methods. But,
in the further development of a branch of mathematics, the human mind, encouraged by
the success of its solutions, becomes conscious of its independence. It evolves from itself
alone, often without appreciable influence from without, by means of logical combination,
generalization, specialization, by separating and collecting ideas in fortunate ways, in new
and fruitful problems, and appears then itself as the real questioner.
Without going here into the many responses to Hilbert's program, including the
P = NP problem, it is sufficient to point out that both formal mathematics and formal linguistics have opened up significant debates about the nature of language
and mathematics. The Unexpected Hanging conundrum, however, continues to
hang over [pun intended] both. As Tall (2013: 246) comments, mathematicians
and linguists must simply lower their sights, continuing to use formalism only
when and where it is applicable:
Instead of trying to prove all theorems in an axiomatic system (which Gödel showed is not
possible), professional mathematicians continue to use a formal presentation of mathematics to specify and prove many theorems that are amenable to the formalist paradigm.
The words of language, as they are written or spoken, do not seem to play any role in the
mechanism of thought. The psychical entities which seem to serve as elements in thought
are certain signs and more or less clear images which can be voluntarily reproduced and
combined. There is, of course, a certain connection between those elements and relevant
logical concepts. It is also clear that the desire to arrive finally at logically connected concepts is the emotional basis of this rather vague play with the above mentioned elements.
But taken from a psychological viewpoint this combinatory play seems to be the essential
feature in productive thought, before there is any connection with logical construction in
words or other kinds of signs which can be communicated to others.
3 Computation
Computing is not about computers any more. It is about living.
Nicholas Negroponte (b. 1943)
Introductory remarks
The P = NP problem discussed in the previous chapter is a profound one for mathematics. A starting point for understanding its import is a famous computing challenge issued by the security company, RSA Laboratories, in 1991. The company
published a list of fifty-four numbers, between 100 and 617 digits long, offering
prizes of up to two hundred thousand dollars to whoever could factor them. The
numbers were semiprimes, or almost-prime numbers, defined as the product of
two (not necessarily different) prime numbers. In 2007 the company retracted
the challenge and declared the prizes inactive, since the problem turned out to
be intractable. But the challenge did not recede from the radar screen of mathematicians, as many tried to factor the numbers using computers. The largest
factorization of an RSA semiprime, known as RSA-200, which consists of 200 digits, was carried out in 2005. Its factors are two 100-digit primes, and it took nearly
55 years of computer time, employing the number field sieve algorithm, to carry
out. This algorithm is the most efficient one for factoring numbers larger than
100 digits.
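To get a feel for why factoring a semiprime is costly, consider the naive approach sketched below in Python. This is trial division, not the number field sieve mentioned above, and it is offered only as an illustration; the function name factor_semiprime is hypothetical. The loop tries every candidate divisor up to the square root of n, so for a 200-digit number it would need on the order of 10^100 steps.

```python
import math


def factor_semiprime(n):
    """Return (p, q) with p <= q and p * q == n, found by trial division.

    Assumes n is a semiprime. The search runs up to the integer square root
    of n, which is feasible only for small numbers.
    """
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            return p, n // p
    raise ValueError("no nontrivial factor found, so n is not a semiprime")


print(factor_semiprime(10403))
```

Checking a proposed answer, by contrast, takes a single multiplication, and it is this gap between finding and checking that the RSA challenge exploits.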
The enormity of the RSA challenge brings us directly into the core of the P = NP
problem. Can a problem, such as the RSA one, be checked beforehand to determine if it has a quick solution? The problem is still an outstanding one, and it
too carries a price tag of one million dollars, offered this time around by the Clay
Mathematics Institute. To reiterate here, the P = NP problem entails asking whether a problem
whose solution can be checked quickly by computer can also be solved
quickly by the computer. Not surprisingly, the problem was mentioned by Gödel in
a letter he sent to John von Neumann in 1956, asking him whether an NP-complete
problem could be solved in quadratic or linear time. The formal articulation of the
problem came in a 1971 paper by Stephen Cook. Of course, it could well turn out
that a specific problem itself will fall outside all our mathematical assumptions
and techniques. Quadratic time means that the running time of an algorithm grows with the square of the size of the input: if the size of the input is doubled, the running time is quadrupled. That is, as we
scale the size of the input by a certain amount, we scale the running time by
the square of that amount. If we were to plot the running time against the size of
the input, we would get a quadratic function.
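Quadratic growth can be made concrete with a small Python sketch. It is only an illustration: the function name count_pair_comparisons is hypothetical, and the nested loop simply stands in for any procedure that compares every item of a list with every other item.

```python
def count_pair_comparisons(n):
    """Count the steps of a procedure that pairs every item with every item."""
    steps = 0
    for i in range(n):
        for j in range(n):
            steps += 1  # one unit of work per (i, j) pair
    return steps


# Doubling the input size multiplies the number of steps by four.
for size in (10, 20, 40):
    print(size, count_pair_comparisons(size))
```

Plotting the second column against the first would trace the quadratic function described above.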
The foray in the last chapter into formalism led to the P = NP dilemma, which
constitutes a basis for investigating mathematics and language in terms of algorithms and computer models. One of the more important byproducts of the
formal grammar movement has been a growing interest in the modeling of natural
and artificial languages. Known as computational linguistics (CL), it is a branch
that aims to devise algorithms in order to see what these yield both in terms of
machine-based processing systems and in terms of what they reveal about human
language. CL has had many interesting implications and applications, from machine translation to the study of language development. The interplay between
theoretical linguistics and CL has become a valuable one, since computational
models of language can be used not only to test linguistic theories but also to devise algorithms for generating useful artificial languages, such as those used on
the Internet.
Because computers have an enormous capacity for data-processing, they are
heuristic devices that allow the linguist to examine large corpora of data and glean
from the data relevant insights into language and discourse. Without the computational approach, the vastly complex information present in discourse data
would have remained largely inaccessible to linguists and the current emphasis on discourse within linguistics, sociolinguistics, and applied linguistics might
never have come about. Indeed, the use of computer technology in discourse analysis has made it a relatively simple task to extract from the data the relevant patterns and categories that are hidden within it and thus to describe the rules of
discourse in as straightforward a manner as the rules of grammar.
A similar approach is found in mathematics, known generally as computability theory (CT), which asks questions such as the following one: How many sets
of the natural numbers are there, such as the primes, the perfect numbers, and
so on? There are more random numbers than ordered ones in sets. So, is there
any way, or more precisely is there an algorithm, that can tell us which is which?
Consider a set, A, which consists of certain numbers. Are, say, 23 and 79 in the set
or not? Can an algorithm be developed that can answer this question, which can
be rephrased as the question of whether membership of 23 and 79 in A is computable? Clearly this
kind of approach penetrates the nature of sets and of membership in sets and,
thus, leads to a more comprehensive understanding of what logic is.
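For sets such as the primes or the perfect numbers, a membership algorithm of this kind is easy to write down. The following Python sketch is a minimal illustration under the assumption that A is one of these two sets; the function names is_prime and is_perfect are hypothetical, and the methods are deliberately naive.

```python
def is_prime(n):
    """Decide membership in the set of primes by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False  # a divisor was found, so n is not in the set
        d += 1
    return True


def is_perfect(n):
    """Decide membership in the set of perfect numbers (sum of proper divisors equals n)."""
    if n < 2:
        return False
    return sum(d for d in range(1, n) if n % d == 0) == n


for candidate in (23, 79):
    print(candidate, is_prime(candidate), is_perfect(candidate))
```

Both procedures always come to an end with a yes or a no, which is what makes membership in these particular sets computable.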
CL and CT are fascinating in themselves, especially in areas such as the P = NP
problem and in so-called Natural Language Processing (NLP), which constitutes
an attempt to make computers produce language in a more naturalistic manner.
Using linguistic input from humans, algorithms have been constructed that are
able to modify a computer system's style of production based on such input, thus
simulating the adaptability of verbal communication. The focus is on how humans comprehend linguistic inputs and then use this knowledge to produce relevant outputs. An offshoot of this line of inquiry has been a focus on precision in
the development of theories from given data. With the proliferation of the Internet
and the abundance of easily accessible written human language on it, the ability
to create a program capable of processing human language by computer based
on an enormous quantity of natural language data has many broad and exciting
possibilities, including improved search engines and, as a consequence, a deeper
understanding of how language works. In a phrase, the computer is both a powerful modeling device for testing theories and a new means for reproducing human
language artificially.
This chapter starts with a discussion of the connection of CL and CT to algorithms and computer modeling. Then it looks at how CL may have triggered the
interest in discourse and at how theories of discourse, in turn, inform NLP. It then
discusses computability in mathematics and what it tells us about mathematics in
general. It ends with an overall assessment of the computation movement in both
mathematics and linguistics. The thematic thread that I wish to weave throughout is that because language and mathematics can be modeled computationally
in similar ways, this can provide insights into their structure and, perhaps, even
their common nature. The computational streams in both linguistics and mathematics are extensions of formalism, since programming a computer requires a
fairly precise knowledge of how to write rules and connect them logically.
1. 24 = 12 × 2
2. Notice that 12 = 6 × 2
3. Plug this in (1) above: 24 = (6 × 2) × 2 = 6 × 2 × 2
4. Notice that 6 = 3 × 2
5. Plug this in (3) above: 24 = 6 × 2 × 2 = (3 × 2) × 2 × 2 = 3 × 2 × 2 × 2
The prime factors of 24 are 2 and 3, or 24 = 3 × 2³. We also note that each of the prime
factors that produces a composite number also divides evenly into it: 3 divides
into 24 as does 2. This is then the basis for constructing the algorithm:
1. Start by checking if the smallest prime number, 2, divides into the number
evenly.
2. Continue dividing by 2 until it is no longer possible to do so evenly.
3. Go to the next smallest prime, 3.
4. Continue in this way.
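The four steps can be sketched in Python as follows. This is an illustrative sketch rather than a definitive implementation: the name prime_factors is hypothetical, and for simplicity the loop tries every integer divisor in turn instead of only primes, which gives the same result because a composite divisor can no longer divide evenly once its prime factors have been removed.

```python
def prime_factors(n):
    """Return the prime factors of n (n >= 2) in ascending order."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    divisor = 2  # step 1: start with the smallest prime
    while divisor * divisor <= n:
        while n % divisor == 0:  # step 2: divide while it goes in evenly
            factors.append(divisor)
            n //= divisor
        divisor += 1  # steps 3 and 4: move on to the next candidate
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


print(prime_factors(24))  # [2, 2, 2, 3]
```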
This method will work every time. The above instructions constitute the algorithm; that is, they constitute a logical step-by-step set of procedures. Euclid actually conceptualized his algorithm geometrically, as did Nicomachus centuries after
Euclid. Their geometric algorithms are described and illustrated by Heath (1949:
300). These are shown below:
[Figure: Euclid's example (left) and Nicomachus' example (right)]
Euclid's algorithm shows how to find the greatest common divisor (gcd) of two
starting lengths BA and DC, which are multiples of a common unit length. DC,
being shorter, is used to measure BA, but only once because remainder EA is less
than DC. EA in turn measures DC twice, leaving remainder FC, which is shorter than EA.
FC then divides three times into EA. Because there is no remainder, the process
ends with FC being the gcd. Nicomachus' example shows how the same procedure,
applied to the numbers 49 and 21, results in the gcd of 7.
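The "measuring" procedure can be traced numerically. The Python sketch below is an illustration only; the name euclid_steps is hypothetical. It records, for each round, how many times the shorter length fits into the longer one and what remainder is left, using the numbers 49 and 21 from the example.

```python
def euclid_steps(a, b):
    """Return (gcd, steps); each step is (longer, shorter, times_it_fits, remainder)."""
    steps = []
    while b != 0:
        quotient, remainder = divmod(a, b)
        steps.append((a, b, quotient, remainder))
        a, b = b, remainder  # the remainder becomes the new measuring length
    return a, steps


gcd_value, trace = euclid_steps(49, 21)
for longer, shorter, times, rest in trace:
    print(f"{longer} = {times} x {shorter} + {rest}")
print("gcd:", gcd_value)
```

Here 21 fits twice into 49 with remainder 7, and 7 fits three times into 21 with no remainder, so the process ends with 7.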
The algorithm is not only a set of instructions for the factorization of composite numbers but also a model of factorization itself, since it breaks the operation
down into its essential steps. Generally speaking, by modeling mathematical (and
linguistic) phenomena in the form of algorithms, we are in effect gaining insight
into the phenomena themselves.
Euclid's algorithm above can be easily transformed into a computer program
via a flowchart. Scott (2009: 13) provides the following flowchart of the algorithm:
[Flowchart: Euclid's algorithm for the greatest common divisor (gcd) of two numbers]
ENTRY
1. INPUT A, B
2. B = 0? If yes: PRINT A, END. If no: go on to 3.
3. A > B? If yes: A ← A − B, GOTO 2. If no (A < or = B): B ← B − A, GOTO 2.
This breaks down the steps in calculating the gcd of numbers a and b in locations named A and B. The algorithm proceeds by subtractions in two loops: If the
test B ≥ A yields yes (or true), or more accurately if the number b in location B is
greater than or equal to the number a in location A, then the algorithm specifies
B ← B − A (meaning the number b − a replaces the old b). Similarly, if A > B, then
A ← A − B. The process terminates when (the contents of) B is 0, yielding the gcd
in A. Algorithms are thus tests for decidability. If an algorithm can be written for
something and comes to an end, it is computable (that is, it can be carried out)
and thus decidable. The general procedures above for factorization of composite numbers are, as the flowchart shows, easily turned into computer language,
which is then run on an actual computer. The computer is thus a modeling device
that allows us to test the model.
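The subtractive procedure just described translates almost line by line into a program. The following Python sketch is one possible rendering, not Scott's own code; the name gcd_by_subtraction is hypothetical, and the guard for a = 0 is an addition of this sketch, since the procedure as described would subtract 0 from B forever in that case.

```python
def gcd_by_subtraction(a, b):
    """Greatest common divisor of two non-negative integers by repeated subtraction."""
    if a < 0 or b < 0:
        raise ValueError("inputs must be non-negative")
    if a == 0:
        return b  # added guard (assumption): avoids an endless loop when A is 0
    while b != 0:  # the test "B = 0?"
        if a > b:  # the test "A > B?"
            a = a - b  # A <- A - B
        else:
            b = b - a  # B <- B - A
    return a  # PRINT A


print(gcd_by_subtraction(49, 21))  # 7
```

Because each pass through the loop makes one of the two numbers strictly smaller, the loop must come to an end, which is the sense in which the algorithm serves as a test for decidability.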
It is thus useful to look here at distinctions, definitions, and basic concepts
in computer modeling, although well known among computer scientists, since
these are implicit in all computation activities and theories. Computer modeling
is the representation of objects or ideas by means of a computer. Like physical models, computer models
show what something might look like when the real thing would be too difficult or
impossible to create physically. Architects use computer modeling to see what a
new house design might look like. The architect can change the design in order to
see what the changes entail. The computer model of the house is more flexible to work with than
a physical model. Similarly, a model of factorization (above) allows us to see what
factoring might look like. The mathematician can change the model in order to
see what the changes would entail and what they would yield in terms of a theory
of factorization.
A computer model lets the linguist or mathematician test the validity or computability of a theory in some domain. And this forces the mathematician or linguist to specify the algorithm precisely beforehand. The realism of a computer
algorithm reflects the level of understanding of its maker. Algorithms are also useful as database-makers, so to speak, since they enable users to store large corpora
of information in databases which then allow for a guided search of the databases
in various ways. The efficiency with which computers store and retrieve information makes database management a major function in CL and CT. Neuroscientists
can also store the results of experiments and compare their results with those of
other scientists.
Computer modeling is also a means for mimicking various activities. Artificial
intelligence (AI) software enables a computer to imitate the way a person solves
complex problems, speaks, or carries out some other expressive task. One particular type of AI software, called an expert system, enables a computer to ask
questions and respond to the information the answers provide. The computer does
so by drawing upon rules and vast amounts of data that human experts have supplied to the writers of the software. The computer can narrow the field of inquiry
until a potential solution or viable theory is reached. However, if the rules and
data available to the system are incomplete, the computer will not yield the best
possible solution.
Actually, proper AI began at a workshop at Dartmouth College in 1956 organized by John McCarthy, who is credited with coining the name of the new
discipline. At the workshop, computer scientists presented and discussed the first
programs capable of modeling logical reasoning, learning, and board games, such
as checkers. One presentation described the first program that learned to play
checkers by competing against a copy of itself.
AI is a major branch of computer science today, aiming to design systems
(models and simulations) that process information in a manner similar to the way
humans do. This makes it as well a branch of cognitive science and neuroscience.
A computer with AI is a very useful tool in these areas because, as mentioned
several times, it can test the consistency of theories, methods, and even such detailed artifacts as proofs and grammar rules. It can also be programmed to perform
the same tasks, making it possible to assess the algorithm itself as a theoretical construct. AI is typically divided into several branches, including knowledge
representation and reasoning, planning and problem solving, Natural Language
Processing, Machine Learning, computer vision, and robotics.
The key idea in AI is representation. The programmer asks a simple question:
How can we best represent phenomenon X? As a trivial, yet useful, example, consider how factoring in algebra could be represented, such as the factorization of
the expression 2x + 4y + 16z. The instructions to the computer would include sequential steps such as the following:
1. Check for factors in all symbols
2. Extract the factors
3. Move them to the front
4. Add parentheses
The operation of the instructions would then produce the required output: 2(x +
2y + 8z). This is said to be a manifestation of knowledge representation in a specific
domain. It is at the core of AI.
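One way to represent this small piece of algebraic knowledge is sketched below in Python. The representation (a list of coefficient-symbol pairs) and the name factor_out are assumptions made for illustration, and the sketch handles only positive integer coefficients.

```python
from functools import reduce
from math import gcd


def factor_out(terms):
    """Factor the common numeric factor out of a sum such as 2x + 4y + 16z.

    terms is a list of (coefficient, symbol) pairs with positive integer coefficients.
    """
    # Steps 1 and 2: check all coefficients and extract their common factor.
    common = reduce(gcd, (coefficient for coefficient, _ in terms))
    # Steps 3 and 4: move the factor to the front and add parentheses.
    reduced = []
    for coefficient, symbol in terms:
        quotient = coefficient // common
        reduced.append(symbol if quotient == 1 else f"{quotient}{symbol}")
    return f"{common}({' + '.join(reduced)})"


print(factor_out([(2, "x"), (4, "y"), (16, "z")]))  # 2(x + 2y + 8z)
```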
as to affirm that AI itself is a theory of mind and thus a way to predict human
behavior, a fact that has not escaped Google, which uses algorithms to mine the
Internet for information on people and groups (MacCormick 2009). The fundamental assumption here is that the mind's functions can be thought of as attendant to neurological states (for example, synaptic configurations) and that these,
in turn, can be thought of as operations akin to those that a computer is capable of carrying out. That this was a viable approach to analyzing intelligence was
demonstrated by Turing (1936), mentioned in the previous chapter. He showed
that four simple operations on a tape (move to the right, move to the left, erase
the slash, print the slash) allowed a machine to execute any kind of program that
could be expressed in a binary code (as for example a code of blanks and slashes).
As long as one could specify the steps involved in carrying out a task and translate them into the binary code, the Turing machine would be able to scan the tape
containing the code and carry out the instructions.
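A machine of this kind can be simulated in a few lines of Python. The sketch below is a simplified illustration, not Turing's own formulation: the instruction-table format, the state names, and the function name run are all assumptions. Each instruction performs exactly one of the four operations, and the sample table adds one slash to a row of slashes.

```python
BLANK, SLASH = " ", "/"


def run(table, tape, state, head=0, max_steps=1000):
    """Run an instruction table on a tape of blanks and slashes; return the final tape."""
    cells = dict(enumerate(tape))
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        symbol = cells.get(head, BLANK)
        action, state = table[(state, symbol)]
        if action == "R":    # move to the right
            head += 1
        elif action == "L":  # move to the left
            head -= 1
        elif action == "E":  # erase the slash
            cells[head] = BLANK
        elif action == "P":  # print the slash
            cells[head] = SLASH
        steps += 1
    if not cells:
        return ""
    low, high = min(cells), max(cells)
    return "".join(cells.get(i, BLANK) for i in range(low, high + 1)).strip()


# Hypothetical sample program: walk right over the slashes, then print one more.
successor = {
    ("scan", SLASH): ("R", "scan"),
    ("scan", BLANK): ("P", "halt"),
}

print(run(successor, "///", state="scan"))  # ////
```

The function run itself never changes; only the table passed to it does, which is why a single general mechanism can carry out any task whose steps have been specified.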
As Gardner (1985: 17-18) correctly noted, Turing machines and similar computational constructs of knowledge catapulted cognitive science to the forefront
in the study of the human mind in the 1980s:
The implications of these ideas were quickly seized upon by scientists interested in human
thought, who realized that if they could describe with precision the behavior of thought
processes of an organism, they might be able to design a computing machine that operated
in identical fashion. It thus might be possible to test on the computer the plausibility of notions about how a human being actually functions, and perhaps even to construct machines
about which one could confidently assert that they think just like human beings.
There are now two versions of AI. The employment of computers to test models of
knowledge is the weak version of AI, and, as such, it has helped to shed some
light on how logical processes might unfold in the human mind. The strong
version, on the other hand, claims that all human activities, including emotions
and social behavior, are not only representable in the form of algorithms, but that
machines themselves can be built to think, feel, and socialize. This view depicts
human beings as special types of computation machines. The following citation
from Konner (1991: 120), an early supporter of the strong version, makes this emphatically clear:
What religious people think of as the soul or spirit can perhaps be fairly said to consist of
just this: the intelligence of an advanced machine in the mortal brain and body of an animal.
And what we call culture is a collective way of using that intelligence to express and modify
the emotions of that brain, the impulse and pain and exhilaration of that body.
Not all cognitive scientists have adopted the strong version of AI. Neuroscientists, in particular, are working more and more on the development of computa-
As Black (1962) pointed out at the start of AI, the idea of trying to discover how a
computer has been programmed in order to extrapolate how the mind works was
bound to become a guiding principle in AI research on mathematics and language
for the simple reason that algorithms are so understandable and so powerful in
producing outputs. But there is a caveat here, expressed best by physicist Roger
Penrose (1989), who has argued that computers can never truly be intelligent because the laws of nature will not allow it. Aware that this is indeed an effective
argument, Allen Newell (1991) responded by pointing out that the use of mechanical metaphors for mind has indeed allowed us to think conveniently about the
mind, but that true AI theory is not based on metaphor. He summarized his case
as follows (Newell 1991: 194):
The computer as metaphor enriches a little our total view of ourselves, allowing us to see
facets that we might not otherwise have glimpsed. But we have been enriched by metaphors
before, and on the whole, they provide just a few more threads in the fabric of life, nothing
more. The computer as generator of a theory of mind is another thing entirely. It is an event.
Not because of the computer but because finally we have obtained a theory of mind. For a
theory of mind, in the same sense as a theory of genetics or plate tectonics, will entrain an
indefinite sequence of shocks through all our dealings with mind, which is to say, through
all our dealings with ourselves.
It is relevant to note that the advent of AI dovetails with the rise of Machine
Translation, the use of computers to translate texts from one natural language
to another. Machine Translation was, and still is, a testing ground for weak and
strong versions of AI. It made an early crucial distinction in knowledge representation between the virtual symbols in abstract systems or algorithms and the
actualized symbols in texts. The idea was to design algorithms capable of mimicking the actualized symbols in linguistic behavior. From this basic platform,
computational linguists developed representations of linguistic knowledge that
do indeed mimic linguistic behavior, as we shall see. Although the computer
cannot interpret its outputs (actual symbols) in human terms, it can model them
in virtual terms. The interpretation of the difference is the task of the analyst. All
this suggests that only the weak version of AI is a viable one in the modeling of
mathematical and linguistic knowledge.
The founding notion in knowledge representation within AI is Turing's machine, discussed briefly above. It is not a physical device. It is a logical abstraction.
Garnham (1991: 20) illustrates it appropriately as follows:
If something can be worked out by mathematical calculation, in the broadest sense of that
term, then there is a Turing machine that can do each specific calculation, and there is a General Turing machine that can do all of them. The way it works is that you pick the calculation
you want done and tell the General Turing machine about the ordinary Turing machine that
does that calculation. The General Turing machine then simulates the operation of the more
specific one.
To paraphrase, by picking an operation and loading a program (a specific Turing machine) for carrying it out into the computer's memory, the computer (a
General Turing machine) can then model what would happen if one actually had
that specific machine. The fundamental assumption in early CL was that rules of
syntax are akin to those that a Turing machine is capable of carrying out. The
modern computer works essentially in this way, using binary digits to realize the
operations. The simplicity of the machine is important to note. The main insight
from this line of investigation is that complexity is a derivative of simple operations working recursively at the level of operationality. This inherent principle of
computation may even be the implicit premise that led Chomsky to assume that
recursion was the underlying principle in the operation of the UG. Whatever the
case, it is obvious that algorithmic knowledge representation and human theories of that knowledge can be compared, analyzed, and modified accordingly. The
synergy that exists between the two is the essence of CL and CT. By trying to figure
out how to design a computer program that simulates the cognitive and neurofunctional processes underlying mental activities we can get an indirect glimpse
into those activities.
In computer science, recursion refers to the process of repeating items in a
self-similar way and, more precisely, to a method of defining functions in which
the function being defined is applied within its own definition, but in such a way
that no loop or infinite chain can occur. The so-called recursion theorem says that
machines can be programmed to guarantee that recursively defined functions exist. Basically it asserts that machines can encode enough information to be able
to reproduce their own programs or descriptions.
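A standard textbook illustration of a recursively defined function is the factorial, sketched here in Python. The example is chosen for illustration and is not taken from the text. The function is applied within its own definition, and the base case for 0 is what prevents an infinite chain.

```python
def factorial(n):
    """Return n! for a non-negative integer n, defined in terms of itself."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 1  # base case: the chain of self-applications stops here
    return n * factorial(n - 1)  # the function applied within its own definition


print(factorial(5))  # 120
```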
3.1.3 Programs
It is useful here to discuss what is involved in programming a computer to model
or simulate some activity, behavior, or theory. Preparing a program begins with
a complete description of the operation that the computer is intended to model.
This tells us what information must be inputted, what system of instructions
and types of computing processes (logical, probabilistic, neural) are involved,
and what form the required output should take. The initial step is to prepare a
flowchart that represents the steps needed to complete the task. This is itself a
model of the relevant knowledge task, showing all the steps involved in putting
the instructions together into a coherent program. The format of the flowchart,
actually, imitates the formatting of a traditional proof in geometry. Each step in
the chart gives options and thus allows for decisions to be made. The flowchart is
converted into a program that is then typed into a text editor, a program used to
create and edit text files.
Flowcharts use simple geometric symbols and arrows to specify relationships.
The beginning or end of a program is represented by an oval; a process is represented by a rectangle; a decision is represented by a diamond; and an I/O (input-output) process is represented by a parallelogram. The flowchart below shows
how to build a computer program to find the largest of three numbers A, B, and C:
[Flowchart: finding the largest of three numbers A, B, and C]
Start
Read A, B, C
Is A > B?
YES: Is A > C? If YES, Print A; if NO, Print C
NO: Is B > C? If YES, Print B; if NO, Print C
End
This breaks down the steps in the comparison of the magnitudes of numbers in
a precise and machine readable way. Basically it mimics what we do in the real
world, comparing two numbers at a time and deciding when to determine the
largest magnitude along the way.
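The three decisions in the flowchart can be written out as a short Python function. This is a sketch for illustration; the name largest_of_three is hypothetical, and the function returns the value rather than printing it so that the result can be checked.

```python
def largest_of_three(a, b, c):
    """Compare two numbers at a time, as in the flowchart, and return the largest."""
    if a > b:                       # Is A > B?
        return a if a > c else c    # Is A > C?
    return b if b > c else c        # Is B > C?


print(largest_of_three(3, 7, 5))  # 7
```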
Programs are written in high-level languages, which include symbols,
linguistic expressions, and/or mathematical formulas. Some programming languages support the use of objects, such as a block of data and the functions that
act upon the given data. These relieve programmers of the need to rewrite sections
of instructions in long programs. Before a program can be run, special programs
must translate the programming language text into a machine language, or low-level language, composed of numbers. Sophisticated systems today combine
a whole series of states and representational devices to produce highly expert
systems for processing input.
Now, for the present purposes it is sufficient to note that programming is a
translation system, converting one system (composed of virtual symbols) into another so that the initial system can be restructured into the second system to produce an output (composed of actual symbols) that allows the first system to operate. These can be represented diagrammatically as follows (S1 = initial system,
S2 = computer system):
S1 → S2 → Output
In this diagram the S2 is the set of instructions that constitute the modeling system
required to translate S1 into the S2 (the computer system consisting of a knowledge
representation language with relevant symbols, objects and so on). The S2 thus
constitutes a model, albeit a specific kind of model, based in AI. So, a program is
a model that will allow us to represent mathematical and linguistic knowledge,
or at least aspects of such knowledge, in algorithmic ways. The mechanical system (S2), more technically known as the source code, is an operating system and
requires interpretation on the part of the programmer to construct. As in traditional proofs, this means blending modes of logic, from abduction to deduction.
Abduction enters the picture when devising the steps and connecting them to
the programmer's previous knowledge. So, programming languages contain the
materials to organize a format into a coherent representation of the S1 that the
machine can process.
The above description, although reductive, is essentially what a program does
in converting human ideas into machine-testable ones, thus allowing us to test
for their consistency, completeness, and decidability. For this reason, computers
have been called logic machines, since they allow for the testing of the three
criteria for knowledge representation that were discussed in the previous chapter.
It is relevant to note that a programming language is usually split into two
components: syntax (form) and semantics. These are understood in the same way
that formal grammars define them (previous chapter). Without going into details
here, suffice it to say that these are modeled to mimic the same type of sequential logical structure found in formal grammars. Let's look at a simple program
in BASIC that translates the source (S1) into its language (S2). The program is a
first-generation BASIC one with simple data types, loop cycles and arrays. The following example is written for GW-BASIC, but will work in most versions of BASIC
with minimal changes. It is intended to produce a simple dialogue:
[GW-BASIC program listing, lines 10-140; statements not recoverable from the source]
This is of course a very simple program. But it shows how syntax and semantics
are envisioned in a formal (compositional) way. Third-generation BASIC languages such as Visual Basic, Xojo, StarOffice Basic and BlitzMax have introduced
features to support object-oriented and event-driven programming paradigms.
Most built-in procedures and functions are now represented as methods of standard objects rather than operators. The point is that whether or not this type
of knowledge representation is psychologically real, for the purpose of theory-testing it can be assumed to be so.
A salesman wishes to make a round-trip that visits a certain number of cities. He knows the
distance between all pairs of cities. If he is to visit each city exactly once, then what is the
minimum total distance of such a round trip?
(Benjamin, Chartrand, and Zhang 2015: 122)
The TSP involves the use of Hamiltonian cycles, which need not concern us here.
Simply put, a Hamiltonian cycle visits every vertex of a graph exactly once before returning to its starting point. A graph
with a Hamiltonian path is thus traceable and connectible. The solution of the TSP
is elaborated by Benjamin, Chartrand, and Zhang (2015: 122) as follows (where c =
a city, n = number of vertices in a graph):
The Traveling Salesman Problem can be modeled by a weighted graph G whose vertices are
the cities and where two vertices u and v are joined by an edge having weight r if the distance
between u and v is known and this distance is r. The weight of a cycle C in G is the sum of the
weights of the edges of C. To solve this Traveling Salesman Problem, we need to determine
the minimum weight of a Hamiltonian cycle in G. Certainly G must contain a Hamiltonian
cycle for this problem to have a solution. However, if G is complete (that is, if we know the
distance between every pair of cities), then there are many Hamiltonian cycles in G if its
order n is large. Since every city must lie on every Hamiltonian cycle of G, we can think of
a Hamiltonian cycle starting (and ending) at a city c. It turns out that the remaining (n − 1)
cities can follow c on the cycle in any of its (n − 1)! orders. Indeed, if we have one of the
(n − 1)! orderings of these (n − 1) cities, then we need to add distances between consecutive
cities in the sequence, as well as the distance between c and the last city in the sequence. We
then need to compute the minimum of these (n − 1)! sums. Actually, we need only find the
minimum of (n − 1)!/2 sums since we would get the same sum if a sequence was traversed in
reverse order. Unfortunately, (n − 1)!/2 grows very, very fast. For example, when n = 10, then
(n − 1)!/2 = 181,440.
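The exhaustive method described in this passage can be sketched in Python for a handful of cities. The sketch is only an illustration: the function name shortest_tour and the four-city distance table are invented for the example, and the approach is usable only for very small n precisely because (n − 1)!/2 grows so fast.

```python
from itertools import permutations
from math import factorial


def shortest_tour(dist, start=0):
    """Try every ordering of the other cities; return (minimum length, tour)."""
    others = [city for city in range(len(dist)) if city != start]
    best_length, best_tour = None, None
    for order in permutations(others):
        if order and order[0] > order[-1]:
            continue  # the reverse ordering gives the same sum, so skip one of each pair
        tour = (start, *order, start)
        length = sum(dist[u][v] for u, v in zip(tour, tour[1:]))
        if best_length is None or length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


# Hypothetical symmetric distances between four cities (assumed for illustration).
distances = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print(shortest_tour(distances))
print(factorial(10 - 1) // 2)  # number of sums to compare when n = 10
```

With four cities only three distinct round trips need to be compared; with ten cities the count is already 181,440.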
Euler went on to prove that it is impossible to trace a path over the bridges without
crossing at least one of them twice. This can be shown by reducing the map of the
area to graph form, restating the problem as follows:
Is it possible to draw the following graph without lifting pencil from paper, and
without tracing any edge twice?
The graph version provides a more concise and thus elemental model of the situation because it disregards the distracting shapes of the land masses and bridges,
reducing the land masses to points or vertices, and portraying the bridges as paths or edges.
This is called a network in contemporary graph theory. More to the point of the
present discussion, it shows that solving the problem is impossible without doubling back at some point. Creating more complex networks, with more and more
paths and vertices in them, will show that it is not possible to traverse a network
that has more than two odd vertices in it without having to double back over some
of its paths (an odd vertex is one where an odd number of paths converge). Euler
proved this fact in a remarkably simple way. It can be paraphrased as follows.
A network can have any number of even vertices in it, because all the paths that
converge at an even vertex are used up without having to double back on any
one of them. For example, at a vertex with just two paths, one path is used to get to
the vertex and another one to leave it. Both paths are thus used up without having
to go over either one of them again. Take, as another example, a vertex with four
paths. One of the four paths gets us to the vertex and a second one gets us out.
Then, a third path brings us back to the vertex, and a fourth one gets us out. All
paths are once again used up.
The same reasoning applies by induction to any network with even vertices.
At an odd vertex, on the other hand, there will always be one path that is not used
up. For example, at a vertex with three paths, one path is used to get to the vertex
and another one to leave it. But the third path can only be used to go back to the
vertex. To get out, we must double back over one of the three paths. The same
reasoning applies to any odd vertex. Therefore, a network that is to be traversed without doubling back can have, at most, two
odd vertices in it. And these must be the starting and ending vertices. If there is
any other odd vertex in the network, there will be a path or paths over
which we will have to double back.
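The counting rule just paraphrased can be sketched in a few lines of Python. This is only an illustrative sketch: the labels A to D for the four land masses (A standing for the island) and the function names are assumptions introduced here, and the check presupposes that the network is connected.

```python
from collections import Counter

# The seven bridges as edges between four land masses.
# Labels are hypothetical: A = island, B and C = the two banks, D = eastern land mass.
KOENIGSBERG_BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def odd_vertices(edges):
    """Return the vertices at which an odd number of paths converge."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def traceable_without_doubling_back(edges):
    """Euler's condition for a connected network: at most two odd vertices."""
    return len(odd_vertices(edges)) <= 2

print(odd_vertices(KOENIGSBERG_BRIDGES))
print(traceable_without_doubling_back(KOENIGSBERG_BRIDGES))
```

By contrast, a simple square of four vertices joined in a ring has only even vertices, so the same check reports it as traceable.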
The network in the Königsberg graph has four vertices in it. Each one is odd.
This means that the network cannot be traced by one continuous stroke of a pencil
without having to double back over paths that have already been traced. The relevant insight here is that Euler's graph makes it possible to look at the relationships
among elemental geometric forms to determine solvability (Richeson 2008: 107):
The solution to the Königsberg bridge problem illustrates a general mathematical phenomenon. When examining a problem, we may be overwhelmed by extraneous information.
A good problem-solving technique strips away irrelevant information and focuses on the
essence of the situation. In this case details such as the exact positions of the bridges and
land masses, the width of the river, and the shape of the island were extraneous. Euler
turned the problem into one that is simple to state in graph theory terms. Such is the sign of
genius.
The implications of Euler's problem for modern graph theory, topology, and the
computational-mathematical study of the P = NP problem are unending. Graph
theory has had a great impact on mathematical method, bringing together areas
that were previously thought to be separate. A path that traverses every edge of
a graph exactly once is called Eulerian. One that does not is called non-Eulerian.
Euler then looked at graphs in the abstract. In the case of a three-dimensional
figure (more precisely, a convex polyhedron), for instance, he found that if we subtract the number of edges (e) from the
number of vertices (v) and then add the number of faces (f) we will always get 2
as a result:
$$v - e + f = 2$$
Take, for example, a cube:
[Figure: a cube with its vertices labeled V1 to V8 and its edges labeled E1 to E12]
As can be easily seen, the cube has 8 vertices, 12 edges and 6 faces. Now, inserting
these values in the formula gives 8 - 12 + 6 = 2, so the relation it stipulates holds. The
KBP not only provided the basic insights that led to the establishment of two new
branches of mathematics (graph theory and topology) but it also held significant
implications for the study of mathematical impossibility. Euler's demonstration
that the Königsberg network was impossible to trace without having to double
back on at least one of the paths showed how the question of impossibility can
be approached systematically. It was a prototype for the study of combinatorial
optimization (Papadimitriou and Steiglitz 1998), which consists essentially in developing algorithms for network flow, and testing NP-complete problems.
The KBP is a predecessor of the TSP, which was presented in the 1930s and
now constitutes one of the most challenging problems in algorithmic optimization, having led to a large number of programming ideas and methods. As Bruno,
Genovese, and Improta (2013: 201) note:
The first formulation of the TSP was delivered by the Austrian mathematician Karl Menger
who around 1930 worked at Vienna and Harvard. Menger originally named the problem the
"messenger problem" and set out the difficulties as follows. At this time, computational complexity theory had not yet been developed: We designate the Messenger Problem (since this
problem is encountered by every postal messenger, as well as by many travelers) the task of
finding, for a finite number of points whose pairwise distances are known, the shortest path
connecting the points. This problem is naturally always solvable by making a finite number
of trials. Rules are not known which would reduce the number of trials below the number of
permutations of the given points. The rule, that one should first go from the starting point
to the point nearest this, etc., does not in general result in the shortest path.
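Menger's two observations (exhaustive trial always works, while the nearest-point rule can fail) can be illustrated with a small sketch. The four points on a number line, the distance function, and all function names below are assumptions chosen only for illustration; the brute-force search tries every permutation, so it is usable only for a handful of points.

```python
from itertools import permutations

def path_length(order, dist):
    """Total length of the open path that visits the points in the given order."""
    return sum(dist(a, b) for a, b in zip(order, order[1:]))

def brute_force_shortest_path(points, dist):
    """Try every permutation of the points (Menger's 'finite number of trials')."""
    best = min(permutations(points), key=lambda order: path_length(order, dist))
    return list(best), path_length(best, dist)

def nearest_neighbour_path(points, start, dist):
    """Always move to the closest unvisited point (the rule Menger warns about)."""
    unvisited = [p for p in points if p != start]
    order = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda p: dist(order[-1], p))
        unvisited.remove(nxt)
        order.append(nxt)
    return order, path_length(order, dist)

# Hypothetical instance: four points on a line, distance = absolute difference.
points = [0, 2, -4, 7]
distance = lambda a, b: abs(a - b)

print(brute_force_shortest_path(points, distance))
print(nearest_neighbour_path(points, 0, distance))
```

On this instance the greedy rule, starting from 0, is drawn first to 2 and then to 7 before having to return all the way to -4, so its path is longer than the one found by exhaustive search.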
a metric of solvability and provability. Turing machines work in this way, because
they can only be in one state at a time. But the advent of quantum physics and
quantum computing has started to provide a powerful alternative to the finite state
model. Quantum physics claims that the fundamental particles of Nature are not
in one fixed state at any moment, but can occupy several states simultaneously,
a condition known as superposition. It is only when disturbed that they assemble into one
state. This has obvious implications for the computability hypothesis, because it
could lead to faster machines. In 2009 a quantum program was devised that was
able to run Grover's reverse phone book algorithm (Elwes 2014: 289). A phone book is
essentially a list of items organized in alphabetical order. So, looking up a name
in it is a straightforward (finite-state) task. However, if we have a phone number
and want to locate the person to whom it belongs we are faced with a much more
difficult problem to solve. This is the essence of the reverse phone book problem.
Its solution is a perfect example of how seemingly intractable problems can be
modeled in various computable ways to provide solutions. Elwes (2014: 289) puts
it as follows:
In 1996, Lov Grover designed a quantum algorithm, which exploits a quantum computer's
ability to adopt different states, and thus check different numbers, simultaneously. If
the phone book contains 10,000 entries, the classical algorithm will take approximately
10,000 steps to find the answer. Grover's algorithm reduces this to around 100. In general,
it will take around $\sqrt{N}$ steps, instead of $N$. The algorithm was successfully run on a 2-qubit
quantum processor in 2009.
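The contrast between the two step counts can be made concrete with a sketch. The first function is the ordinary one-by-one scan of a reverse phone book. The second is a classical simulation of the amplitudes that Grover's algorithm manipulates; it is not a quantum program, and it takes as many classical operations as there are entries. The iteration count of about (π/4)·√N is the standard textbook choice; the function names and the sample phone book are assumptions made for illustration.

```python
import math

def classical_reverse_lookup(phone_book, number):
    """Scan (name, number) entries one by one; return the name and the steps taken."""
    for steps, (name, entry_number) in enumerate(phone_book, start=1):
        if entry_number == number:
            return name, steps
    return None, len(phone_book)

def grover_success_probability(n_items, target_index):
    """Classically simulate Grover's amplitudes for one marked item among n_items."""
    amplitudes = [1 / math.sqrt(n_items)] * n_items      # uniform superposition
    iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    for _ in range(iterations):
        amplitudes[target_index] = -amplitudes[target_index]  # oracle marks the target
        mean = sum(amplitudes) / n_items
        amplitudes = [2 * mean - a for a in amplitudes]        # reflection about the mean
    return iterations, amplitudes[target_index] ** 2

# Hypothetical phone book with 10,000 entries; the wanted number is the last one.
book = [(f"person{i}", 5550000 + i) for i in range(10_000)]
print(classical_reverse_lookup(book, 5550000 + 9_999))
print(grover_success_probability(10_000, 9_999))
print(grover_success_probability(4, 2))   # the 2-qubit case: four items
```

In the four-item case a single iteration already concentrates all of the probability on the marked item, which is why a 2-qubit processor suffices for a demonstration.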
An added aspect of quantum computing is that quantum computations are probabilistic. By running the algorithms over and over one can, thus, increase the level
of certainty to higher and higher degrees, but this would then slow down the
process. Grover's algorithm, actually, has been proved to be optimal: no quantum algorithm can solve this kind of unstructured search problem in substantially fewer steps. It is not
known, moreover, whether every problem in NP, such as the TSP one, can be
solved efficiently with quantum algorithms.
3.2.2 Computability
CT constitutes a partnership between mathematics and computer science aiming
to decide what mathematical problems can be solved by any computer. A function
or problem is computable if an algorithm can be devised that will give the correct
output for any valid input. Since computer programs are countable but the real numbers
are not, there must exist real numbers that cannot be calculated by any program.
There is, as already discussed, no easy way of describing any of them.
There are many tasks that computers cannot perform. The most well-known
is the halting problem, mentioned in the previous chapter. Given a computer program and an input, the problem is to determine whether the program will finish
running or will go into a loop and run forever. Turing proved in 1936 that no algorithm for solving this problem can exist. His result also yields a general method for showing that other problems are undecidable: it is sufficient to
show that if a solution to a new problem were to be found, then it could be used to
decide an undecidable problem by changing instances of the undecidable problem into instances of the new problem. Since we know that no method can decide
the old problem, no method can decide the new problem either.
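Why no halting decider can exist is usually shown today by a self-referential construction, sketched below. This is the modern textbook form of the argument, not Turing's original formulation, and the names `halts`, `make_paradox`, and `says_never_halts` are hypothetical. Given any candidate decider, one builds a program that does the opposite of whatever the decider predicts for it.

```python
def make_paradox(halts):
    """Build a program that does the opposite of what `halts` predicts for it."""
    def paradox():
        if halts(paradox):      # predicted to halt ...
            while True:         # ... so loop forever instead
                pass
        return "halted"         # predicted to loop forever, so halt at once
    return paradox

def says_never_halts(program):
    """A candidate decider (necessarily wrong somewhere) that always answers 'no'."""
    return False

paradox = make_paradox(says_never_halts)
result = paradox()   # the program halts, contradicting the decider's verdict
print(result)
```

A decider that always answered "yes" would fail in the opposite way, since its paradox program would loop forever; the same construction defeats every candidate, however sophisticated.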
One could ask: Is this not just a moot point, since mathematics goes on despite
computability conundrums? The issue of computability is a crucial one, since it
allows us to reformulate classic questions in algorithmic ways. One of the most basic questions of mathematics is: What does a real number look like? This question
was actually contemplated before the advent of CT by Émile Borel in 1909 (chapter 1). If we write out the decimal expansion, then each of the digits, from 0 to 9,
should appear equally often. The decimal expansion of a number is its representation in the decimal system where each place consists of a digit from 0 to 9 arranged
in such a way that it is multiplied by a power of 10 ($10^n$), decreasing from left to
right, with $10^0$ indicating the "ones" place. In other words, it shows the values
of each digit according to its place in the decimal layout or expansion. So, for
instance, the number 1,236 has the following value structure:
$$1 \times 10^3 + 2 \times 10^2 + 3 \times 10^1 + 6 \times 10^0$$
Now, Borel argued that the equal occurrence of the digits does not happen over a
short stretch of the expansion, but if it is stretched out to infinity the digits should
eventually average out. He defined this as a normal number. There are 100 possible different 2-digit combinations, 00 to 99, which should also appear equally
over longer stretches of the expansion; the same applies to 3-digit combinations;
and so on, to n-digit combinations. Generally, every finite string of digits in an
expansion should appear with the same frequency as any other string of the same
length. This is Borel's main criterion for normality. As a corollary, the same criterion should hold for numbers in any base, such as the binary one.
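Borel's criterion can be explored empirically with a short sketch that counts how often each block of digits occurs in a finite stretch of an expansion. The function names are assumptions. The sample digits are those of Champernowne's constant (0.123456789101112...), a number known to be normal in base 10; any finite prefix will only approximate the equal frequencies that hold in the limit.

```python
from collections import Counter

def block_frequencies(digits, block_length=1):
    """Relative frequency of every block of `block_length` consecutive digits."""
    blocks = [digits[i:i + block_length]
              for i in range(len(digits) - block_length + 1)]
    counts = Counter(blocks)
    return {block: count / len(blocks) for block, count in counts.items()}

def champernowne_digits(upto):
    """Digits of 0.123456789101112... obtained by concatenating 1, 2, ..., upto."""
    return "".join(str(n) for n in range(1, upto + 1))

sample = champernowne_digits(100_000)
single = block_frequencies(sample, 1)
for digit in sorted(single):
    print(digit, round(single[digit], 4))
```

Raising `block_length` to 2 or 3 gives the corresponding check for the 100 two-digit and 1,000 three-digit combinations.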
Borel actually proved that almost every real number (or more accurately every place-value representation of every number) is normal, with few exceptions.
This raised a few truly intriguing questions: Are the numbers e and π normal? It is
conjectured that they are, but no one has been able to prove it. A non-computable
number is called a random real number because it seems to have no discernible
pattern. More specifically, for a computable number one can easily run an algorithm that produces the next digit in its expansion; but for a random real number no algorithm can predict
with any degree of certainty what the next digit would be. This is a crucial aspect
of numbers because randomness is stronger than normality. In effect, computability in this case leads, paradoxically, to a consideration of randomness and other
probability factors in the makeup of normality.
Computability, as examples such as this show, is an epistemological notion
that extends more traditional ways of doing mathematics. Indeed, before the advent of CT, computability (solvability) was examined in more direct mathematical
terms, as we have seen in previous chapters. Group theory is a case-in-point. It
came from the fact that two mathematicians, Niels Henrik Abel and Évariste Galois
in the nineteenth century, were contemplating the solutions of polynomial equations (Mackenzie 2012: 118–119). Specifically they were looking at quintic polynomials, which have no general solution by radicals. Their proof involved an exploration of the mathematical concept of symmetry. The general form of the quintic polynomial looks
like this:
$$x^5 + ax^4 + bx^3 + cx^2 + dx + f$$
The equation has five roots, $\{r_1, r_2, r_3, r_4, r_5\}$. Each coefficient in the equation is a
symmetric function of the roots:
$$a = -(r_1 + r_2 + r_3 + r_4 + r_5)$$
$$b = r_1 r_2 + r_1 r_3 + r_1 r_4 + r_1 r_5 + r_2 r_3 + r_2 r_4 + r_2 r_5 + r_3 r_4 + r_3 r_5 + r_4 r_5$$
and so on
Each of the roots participates equally in the formulas; if the roots are permuted
(say, by replacing r1 with r2 and r2 with r1) the formulas do not change. The terms
will have a different order in the written sequence but the sums will be the same.
To put it differently, the linear structure changes, but not the conceptual one it
represents. There are 120 ways to permute the five roots (5! = 120). So a quintic
polynomial has 120 symmetries (conceptually speaking). Some polynomials have
fewer symmetries because some of the permutations may be excluded due to extra algebraic relations between some of the roots (for instance, a root may be the
square of another). If a polynomial is solvable by radicals, it generates a hierarchy
of intermediate polynomials and number fields, which correspond to the roots.
The symmetries of the original polynomial have to respect this hierarchical structure.
The full group (as Galois called it) of 120 permutations of the roots does not allow a
hierarchy of subgroups of the requisite kind. As it turns out, the largest group of permutations
of the five roots that does allow such a hierarchy has only 20 elements.
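The invariance of the coefficients under permutation of the roots can be checked directly. The sketch below uses five arbitrary integer roots (an assumption made for illustration) and computes the first two coefficients for every one of the 120 orderings; the function name is likewise hypothetical.

```python
from itertools import combinations, permutations

def coefficients_a_b(roots):
    """First two coefficients of the monic quintic with the given roots."""
    a = -sum(roots)                                      # minus the sum of the roots
    b = sum(r * s for r, s in combinations(roots, 2))    # sum of pairwise products
    return a, b

roots = (1, 2, 3, 4, 5)                 # hypothetical roots, chosen for illustration
orderings = list(permutations(roots))
values = {coefficients_a_b(p) for p in orderings}

print(len(orderings))   # number of ways to permute the five roots
print(values)           # a single (a, b) pair if the formulas are symmetric
```

Every ordering yields the same pair of coefficients, which is the sense in which the written sequence changes while the sums do not.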
All this may prove to be very interesting in itself, but seems to constitute nothing but an internal ludic exercise. Does group theory have any other value or
meaning? As it has turned out, it provides an accurate language for many natural phenomena, as Mackenzie (2012: 121) indicates:
Chemists now use group theory to describe the symmetries of a crystal. Physicists use it to
describe the symmetries of subatomic particles. In 1961, when Murray Gell-Mann proposed
his Nobel Prize-winning theory of quarks, the most important mathematical ingredient was
an eight-dimensional group called SU(3), which determines how many subatomic particles
have spin (like the neutron and proton). He whimsically called his theory "The Eightfold
Way." But it is no joke to say that when theoretical physicists want to write down a new field
theory, they start by writing down its group of symmetries.
erates random numbers, we are faced with a much more complex situation, but
still a highly do-able one in computational terms. We have modeled the real numbers in terms of composition and expansion possibilities. Now, we can ask: What
other mathematical structures can be modeled computationally in this way? As it
turns out this type of question leads to a plethora of other phenomena that can
be modeled in the same way. These are known as non-standard models. They
were discovered by Abraham Robinson in 1960 (see Robinson 1974). Robinson
discovered what he called hyperreal numbers, which included the infinitesimals
(numbers relating to, or involving, a small change in the value of a variable that
approaches zero as a limit), which truly surprised everyone as to the reality of their
existence. He found these by looking at models of the calculus and discovering
analogies in number systems. The hyperreal numbers now raise further questions,
because the real line and the hyperreal line seem to model things differently,
and the philosophical problem is that we have no way of knowing what a line in
physical space is really like.
Given the importance of infinitesimals to mathematical modeling it is worth
revisiting the whole episode schematically here. The early calculus was often critiqued because it was thought to be an inconsistent mathematical theory, given
its use of bizarre notions such as the infinitesimals. These were defined as changing numbers as they approached zero. The problem was that in some cases they
behaved like real numbers close to zero but in others they behaved paradoxically
like zero. Take, as an example, the differentiation of the polynomial $f(x) = ax^2 + bx + c$
(Colyvan 2012: 121):
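\begin{align*}
1.\quad f'(x) &= \frac{f(x + \delta) - f(x)}{\delta} \\
2.\quad f'(x) &= \frac{a(x + \delta)^2 + b(x + \delta) + c - (ax^2 + bx + c)}{\delta} \\
3.\quad f'(x) &= \frac{2ax\delta + a\delta^2 + b\delta}{\delta} \\
4.\quad f'(x) &= 2ax + b + a\delta \\
5.\quad f'(x) &= 2ax + b
\end{align*}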
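The tension in this derivation can be made visible numerically. In the sketch below (the function name and sample coefficients are assumptions), the increment, here called `delta`, must be non-zero for the division to make sense, yet the quotient only reaches 2ax + b exactly when the leftover term a·delta is treated as zero. Exact fractions are used so that no rounding obscures the pattern.

```python
from fractions import Fraction

def difference_quotient(a, b, c, x, delta):
    """(f(x + delta) - f(x)) / delta for f(t) = a*t**2 + b*t + c, with delta != 0."""
    f = lambda t: a * t * t + b * t + c
    return (f(x + delta) - f(x)) / delta

# Hypothetical sample values: f(t) = 3t^2 + 2t + 1 at x = 5, so 2ax + b = 32.
a, b, c, x = 3, 2, 1, 5
for k in range(1, 6):
    delta = Fraction(1, 10 ** k)
    q = difference_quotient(a, b, c, x, delta)
    print(delta, q, q - (2 * a * x + b))   # the leftover is always a * delta
```

However small `delta` becomes, the leftover never vanishes while `delta` remains a non-zero real number, which is the inconsistency the early critics pointed to.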
Robinson's discovery laid to rest the problem of infinitesimals. He did this by using
set theory. Statements in set theory that quantify over the members of a specific set are
said to be of the first-order, while those that quantify over sets themselves are said to be
second-order. Higher-order systems involve quantifying over sets of sets ad infinitum.
Robinson's approach was a theory that generalized first-order logical statements
but not higher-order ones, thus avoiding problems of incompleteness. He posited
that a proper extension of the reals ($\mathbb{R}$), ${}^{*}\mathbb{R}$, would allow for every subset, say $D$ of
$\mathbb{R}$, to be extended to a larger set ${}^{*}D$ in ${}^{*}\mathbb{R}$ so that every function $f : D \to \mathbb{R}$ could be
extended from ${}^{*}D$ to ${}^{*}\mathbb{R}$, that is: ${}^{*}f : {}^{*}D \to {}^{*}\mathbb{R}$. He called this the transfer principle:
Every statement expressed in first-order logic that is true of the real numbers is also true in
the extended system ${}^{*}\mathbb{R}$.
A hyperreal number is a number that belongs to ${}^{*}\mathbb{R}$. It is relevant to note that
when Robinson presented his ideas there was a strong reaction against them. The
situation is described by Tall (2013: 378) as follows:
Non-standard analysis was Robinson's vision of a brave new world that encompassed the
ancient idea of infinitesimal. But it was presented to a world immersed in the epsilon-delta
processes of mathematical analysis. Its first weak spot was that the theory did not seem to
add any new results in standard mathematical analysis
Modeling and computability are really parts of a general approach in the search
for ways to represent knowledge. The use of the computer to facilitate this search
is essentially what CT is about. Since Euclid, mathematicians have been searching
for a meta-algorithm, so to speak, that would allow them to solve all intractable
problems. But, as it turns out, this might be a dream, although it is one being pursued with different techniques and with a lot of know-how in collaboration with
other disciplines (Davis and Hersh 1986). Interdisciplinarity is now a basic mindset within what has been called here hermeneutic mathematics.
Old men and women (who are not necessarily old) love that program.
Old men and old women (both the men and women are old) love that program.
have to have some contextual rule subsystem in the algorithm that would indicate:
1. that pens as writing instruments are (typically) smaller than boxes
2. that boxes understood as containers are larger than pens (typically again)
3. that it is impossible for a bigger object to be contained by a smaller one
The general form of such rules would be somewhat as follows (p = writing instrument known as a pen, b = box, c = container):
1. p < b
2. bc
3. b > p
But the algorithm would still have to decompose the polysemy of the word pen.
An appropriate rule would indicate that the word pen means:
1. a writing instrument
2. a play pen
3. a pig pen
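A contextual rule subsystem of this kind can be sketched very simply. In the following illustration the typical sizes (in metres), the names, and the two relation labels are all hypothetical; the single rule encoded is that a bigger object cannot be contained by a smaller one, and it is used to filter the three senses of pen.

```python
# Hypothetical typical sizes in metres, chosen only for illustration.
PEN_SENSES = {
    "writing instrument": 0.15,
    "play pen": 1.2,
    "pig pen": 3.0,
}
BOX_SIZE = 0.5

def senses_of_pen(relation):
    """Return the senses of 'pen' compatible with the stated containment.

    relation is "pen in box" (the pen is the contained object)
    or "box in pen" (the pen is the container).
    """
    if relation == "pen in box":
        return [s for s, size in PEN_SENSES.items() if size < BOX_SIZE]
    if relation == "box in pen":
        return [s for s, size in PEN_SENSES.items() if size > BOX_SIZE]
    raise ValueError(f"unknown relation: {relation}")

print(senses_of_pen("pen in box"))
print(senses_of_pen("box in pen"))
```

The rule narrows the choice but does not always settle it: when the pen is the container, two senses survive, and further contextual knowledge would be needed to choose between them.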
strategy. First, the SL (source language) text is parsed into an internal representation, much like the ones used in formal grammars. Second, a transfer is made
from the SL text to the TL (target language) text. The transfer mechanisms between the SL and TL consist of an analyzer that literally transforms the SL text
into an abstract form and a generator which then converts this into a representation in the TL. Of course, as in many versions of formal grammar, this assumes
a universal set of rules or rule types in the structure of languages. Experience
with programming rules, however, has shown this to be impracticable. Nevertheless, the Interlingua approach has taken schemas of real-world knowledge into
account, thus expanding the purview and sophistication of MT. In other words, it
has started to integrate ILA with ELA in a sophisticated way.
A variant of the Interlingua system is called Knowledge-Based Machine Translation (KBMT) which also converts the SL text into a representation that is claimed
to be independent of any specific language, but differs in that its inclusion of semantic and contextual information is based on frequency analyses. By adding
these, the system deals with polysemy and other ambiguities in statistical terms
(Nirenburg 1987). This allows the algorithm to make inferences about the appropriate meaning to be selected in terms of frequency distribution measures of a
lexical item. This is intended to simulate the human use of real-world information
about polysemy, allowing the analyzer to integrate inference of meaning based on
probability metrics into the mechanical translation process. The generator simply searches for analogous or isomorphic forms in the TL and converts them into
options for the system. The key notion, though, is that of knowledge modeling.
The details of how this is done are rather complex; and they need not interest us
here as such. Suffice it to say that the computer modeling of knowledge through
Interlingua involves mining data from millions of texts on the Internet, analyzing
them statistically in terms of knowledge categories, and then classifying them for
the algorithmic modeling of polysemy.
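The statistical treatment of polysemy described here can be pictured with a toy sketch. The co-occurrence counts below are invented for illustration and do not come from any real corpus or from an actual KBMT system; the function simply scores each sense of pen by how often it has been seen near the words of the current context and converts the scores into relative probabilities.

```python
from collections import Counter

# Hypothetical co-occurrence counts: how often each sense of "pen" appears
# near a given context word in an imaginary sense-tagged corpus.
SENSE_COUNTS = {
    "writing instrument": Counter({"ink": 120, "paper": 95, "box": 40}),
    "play pen": Counter({"baby": 80, "toys": 60, "box": 25}),
    "pig pen": Counter({"farm": 70, "mud": 50}),
}

def most_probable_sense(context_words):
    """Pick the sense with the highest share of co-occurrences with the context."""
    scores = {sense: sum(counts[w] for w in context_words)
              for sense, counts in SENSE_COUNTS.items()}
    total = sum(scores.values())
    if total == 0:
        return None, {}                      # the context gives no evidence
    probabilities = {sense: score / total for sense, score in scores.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities

print(most_probable_sense(["ink", "box"]))
print(most_probable_sense(["baby", "toys"]))
```

A realistic system would estimate such counts from very large corpora and smooth them, but the selection step has this general shape.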
ticated since early times as a catcher of rats and mice and as a pet and existing in
several distinctive breeds and varieties. The problem with this definition is that it
uses mammal to define cat. What is a mammal? The dictionary defines mammal as
any of various warm-blooded vertebrate animals of the class Mammalia. What is
an animal? The dictionary goes on to define an animal as a living organism other
than a plant or a bacterium. What is an organism? An organism, the dictionary
stipulates, is an individual animal or plant having diverse organs and parts that
function together as a whole to maintain life and its activities. But, then, what
is life? Life, it specifies, is the property that distinguishes living organisms. At
that point it is apparent that the dictionary has gone into a conceptual loop: it
has employed an already-used concept, organism, to define life.
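The loop can be traced mechanically. The sketch below reduces each dictionary definition to one key defining term, which is a simplification assumed here for illustration, and follows the chain from cat until a concept recurs; the names are hypothetical.

```python
# Each word is mapped to one key term used in its definition (a simplification).
DEFINED_BY = {
    "cat": "mammal",
    "mammal": "animal",
    "animal": "organism",
    "organism": "life",
    "life": "organism",
}

def definition_chain(word, defined_by):
    """Follow defining terms from `word` until a concept is used a second time."""
    chain = [word]
    while chain[-1] in defined_by:
        nxt = defined_by[chain[-1]]
        chain.append(nxt)
        if nxt in chain[:-1]:      # an already-used concept: the loop closes
            break
    return chain

print(" -> ".join(definition_chain("cat", DEFINED_BY)))
```

The chain ends where it re-enters organism, the point at which the dictionary starts to circle.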
Looping is caused by the fact that dictionaries employ words to dene other
words. As it turns out, the dictionary approach just described is the only possible
one, for the reason that all human systems of knowledge seem to have a looping
structure. This suggests that the meaning of something can only be inferred by
relating it to the meaning of something else to which it is, or can be, linked in
some way. So, the meaning of cat is something that can only be inferred from the
circuitry of the conceptual associations that it evokes. This circuitry is part of a
network of meanings that the word cat entails.
Each associated meaning or concept is a node in the network. There is no
limit (maximum or minimum) to the number and types of nodes and circuits that
characterize a concept. It depends on a host of factors. In the network for cat,
secondary circuits generated by mammal, for example, could be extended to contain carnivorous, rodent-eater, and other nodes; the life node could be extended to
generate a secondary circuit of its own containing nodes such as animate, breath,
existence, and so on; other nodes such as feline, carnivorous, Siamese, and tabby
could be inserted to give a more detailed picture of the conceptual structure
of cat. In a circuit there is always a focal node: the one chosen for a discourse
situation. In the above network cat is the focal node, because that is the concept
under consideration. However, if animal were to be needed as the focal concept,
then cat would be represented differently as a nonfocal node connected to it in a
circuit that would also include dog and horse, among other associated nodes. In
effect, there is no way to predict the configuration of a network in advance. It all
depends on the purpose of the analysis, on the type of concept, and on other such
factors that are variable and/or unpredictable.
In psychology, the primary nodes (mammal, animal, life, and organism) are
called superordinate ones; cat is instead a basic concept; and whiskers and tail
are subordinate concepts. Superordinate concepts are those that have a highly
general referential function. Basic concepts have a typological function. They allow for reference to types of things. Finally, subordinate concepts have a detailing function. Clearly, the configuration of a network will vary according to the
function of its focal node; that is, a network that has a superordinate focal node
(mammal) will display a different pattern of circuitry than will one that has a basic
concept at its focal center.
The above description of cat constitutes a denotative network. Denotation is
the initial meaning captured by a concept, as is well known. Denotative networks
allow speakers of a language to talk and think about concrete things in specific
ways. But such networks are rather limited when it comes to serving the need
of describing abstractions, emotions, morals, and so on. For this reason they are
extended considerably through further circuitry. Consider the use of cat and blue
in sentences such as:
1.
2.
3.
4.
These encode connotative and metaphorical meanings. The use of cat in (1) to
mean "attractive" or "engaging" comes out of the network domain associated
with jazz music and related pop culture circuits (Danesi 2000); and the use of
blues in (2) to mean "sad" or "gloomy" comes out of the network domain associated
with blues music. In effect, these have been linked to the networks of cat and blue
through the channel of specific cultural knowledge. They are nodes that interconnect cat and blue to the network domains of jazz and blues music. The meaning
of "something secret" associated with cat in example (3) above and the meaning
of "unexpectedness" associated with blue in (4) result from linking cat with the
secrecy network domain and blue with the sky domain. Sentence (3) is, in effect,
a specific instantiation of the conceptual metaphor animals reflect human life and
activities, which underlies common expressions such as: It's a dog's life; Your life
is a cat's cradle; I heard it from the horse's mouth. Sentence (4) is an instantiation
of the conceptual metaphor Nature is a portent of destiny, which literary critics
classify as a stylistic technique under the rubric of pathetic fallacy. This concept
underlies such common expressions as: I heard it from an angry wind; Cruel clouds
are gathering over your life.
A comprehensive network analysis of cat and blue for the purposes of MT
would have to show how all meanings (denotative, connotative, metaphorical)
are interconnected to each other through complex circuitry that involves both
ILA and ELA. It would also have to add a statistical measure of the frequency of
the probable presence of a specific circuitry in a discourse text, as will be discussed below. It is the ability to navigate through the intertwining circuitry of such
[Figure: semantic network with the nodes sidney, slither, grass_snake, crocodile, snake, reptiles, vegetarian, meat, green, small, and no_legs, connected by links labeled "is a", "has", "eats", "size", and "color"]
differs from its mean occurrence in large corpora. Research on the use of BLEU has
shown that there is a strong positive correlation between human assessments of the
fidelity of a translation and the scores produced by n-gram algorithms (Doddington 2002, Coughlin
2003, Denoual and Lepage 2005).
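The n-gram comparison at the heart of BLEU can be sketched as follows. The code computes only the clipped (modified) n-gram precision of a candidate translation against reference translations; the complete BLEU score also combines several values of n and applies a brevity penalty, both omitted here. The function names and the sample sentences are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

def ngrams(tokens, n):
    """All runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, with counts clipped."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return Fraction(0)
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Each candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return Fraction(clipped, sum(cand_counts.values()))

reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(modified_precision(candidate, [reference], 1))
print(modified_precision(candidate, [reference], 2))
```

Clipping is what prevents a degenerate candidate that merely repeats a frequent word from scoring well.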
N-gram theory has brought about great interest in machine-learning as a theoretical paradigm. Machine-learning (ML) is now a branch of AI and CL. It studies
how computers can learn from huge amounts of data by using statistical techniques such as n-grams. An everyday example of an ML system is the one that distinguishes between spam and non-spam emails on many servers, allocating the
spam ones to a specific folder. An early example of ML goes back to the 1956 Dartmouth workshop which introduced the first program that learned to play checkers
by competing against a copy of itself. Other programs have since been devised for
computers to play chess, backgammon, as well as to recognize human speech and
handwriting.
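How a program can "learn" from text with n-grams can be shown in miniature. The sketch below counts bigrams (pairs of adjacent words) in a tiny invented corpus and then predicts the most frequent follower of a given word; the corpus and function names are assumptions, and real systems train on vastly larger data with smoothing.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    tokens = text.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ate"   # hypothetical training text
model = train_bigrams(corpus)
print(predict_next(model, "the"))
print(predict_next(model, "dog"))
```

Nothing about grammar is programmed in; the prediction comes entirely from counted regularities in the data.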
Simply put, ML algorithms are based on data mining information which is
converted into knowledge network systems to produce knowledge representation.
In some instances the algorithm attempts to generalize from certain inputs in order to generate, speculatively, an output for previously unseen inputs. In other
cases, the algorithm operates on inputs where the desired output is unknown,
with the objective being to discover hidden structure in the data. Essentially, such
ML algorithms are designed to predict new outputs from specific test cases. The
algorithms thus mimic inductive learning by humans, that is, the extraction of
a general pattern on the basis of specific cases. This whole line of investigation
has, remarkably, led to the construction of robots which acquire human-like skills
through the autonomous exploration of specific cases and through interaction
with human teachers.
Perhaps the first scientist to devise an ML algorithm for MT was Makoto Nagao
in 1984, who called his technique example-based MT. Using case theory in linguistics (Fillmore 1968), Nagao based his algorithm on analogy-making in language. From a corpus of texts that had already been translated, he selected specific model sentences to get the algorithm to translate other components of the
original sentence, combining them in a structural way to complete the translation.
Nagao's system, as far as I can tell, has been rather successful, but still falls short
of translating texts with full communicative and conceptual fidelity and certainly
does not approach the power of human analogy. To this day, the main obstacle is
figurative sense, which an algorithm would need to untangle from the structure
of a text, before bits and pieces can be put together according to strict rules of
syntax. As Bar-Hillel argued, without a universal encyclopedia, a computer will
probably never be able to select the appropriate meaning of the word on its own.
[Figure: two parse trees rooted in !S!, in which the words "is", "he", "bleeding", "anywhere else", "besides", and "his abdomen" are grouped under the concept labels QUERY, SUBJECT, WELLNESS, PLACE, PREPPH, and BODY-PART, first in the original order and then in a recombined order]
Figure 3.9: An example of how English is translated into concepts, then recombined from concepts into Chinese. IBM, 2007
In this case a probabilistic approach is required. The following diagrams show the
mathematical formulas that apply.
[Equations: arg max expressions over k, built from logarithmic terms in feature functions f(k, s, c_m, c_{m+1}, s_n, ...) and f(k, s, c_m, c_{m+1}, w_m, w_{m+1}, s_n, ...)]
Figure 3.10: Using statistics to translate spoken language into concepts. IBM, 2007
It is obvious that the task is a complex one and the mathematical system used a
highly sophisticated one. The interesting thing about the algorithm above is that
it breaks down the process into concepts rather than words and then assigns a
statistical modeling framework to it. It is beyond the scope here to delve into the
mathematical relation to the knowledge representation system in question. It is
sufficient for the present purposes to simply present it, since MT today is venturing into territories that even linguists have rarely entered in the past. And these
territories are drawing mathematicians and linguists closer and closer together in the
search for determining the computability of relevant phenomena.
Without going into specifics here, it is instructive to note that the program models
question-and-answer sequences that characterize human conversation in terms of
distinct algorithmic states. Even this early simple program shows how real-world
information can be transformed into computer-usable instructions. Until the late
1980s, most systems were based on BASIC. Shortly thereafter, ML programs using
statistical and n-gram models, rather than strict if-then rules, greatly enhanced
the ability of algorithms to simulate human conversation by incorporating the relative certainty of possible questions and answers in common stretches of dialogue
into the instructions. These algorithms work effectively if the conversation tends
to be script-based. Take, for instance, what is involved in successfully ordering
a meal at a restaurant. The components of this script include: a strategy for getting the waiter's attention; an appropriate response by the waiter; a strategy for
ordering food to fit one's particular tastes and financial capabilities; an optional
strategy for commenting favorably or unfavorably on the quality of the food. Any
radical departure from this script would seem anomalous and even result in a
breakdown in communication.
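The restaurant script just described can be sketched as an ordered sequence of states. The following is only an illustrative sketch: the state names, the `OPTIONAL` set, and the function `follows_script` are hypothetical labels chosen here, not part of any system discussed in the text.

```python
# Hypothetical script states, taken from the components listed in the text.
SCRIPT = ["get_attention", "waiter_responds", "order_food", "comment_on_food"]
OPTIONAL = {"comment_on_food"}  # the text marks this step as optional


def follows_script(events):
    """Return True if `events` walks through SCRIPT in order.

    Optional steps may be skipped; any unknown or out-of-order event
    counts as a departure from the script.
    """
    position = 0
    for event in events:
        # Skip over optional steps that were left out.
        while position < len(SCRIPT) and SCRIPT[position] != event:
            if SCRIPT[position] not in OPTIONAL:
                return False
            position += 1
        if position == len(SCRIPT):
            return False  # event occurred after the script had ended
        position += 1
    # Any steps that remain must all be optional.
    return all(step in OPTIONAL for step in SCRIPT[position:])
```

A conversation that orders food before getting the waiter's attention is rejected by this check, which mirrors the idea that a radical departure from the script reads as anomalous.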
to NLP use learning-based AI that examines patterns in data to improve a program's own understanding. Typical tasks today include the following:
1. developing subprograms for segmenting sentences, as well as tagging and
parsing the parts of speech
2. applying sophisticated data processing methods capable of yielding outputs
from large and multi-source data sets that consist of both unstructured and
semi-structured information (known as deep analytics)
3. developing methods of information extraction that locate and classify items
in a text into pre-established categories, such as people's names, organizations, expressions of times, and so on (known as named-entity extraction)
4. determining which expressions refer to the same entity in a text (known as
co-reference resolution)
As even this minimal list shows, NLP has allowed linguists to understand the components of language and their relation to external knowledge representation in a
very precise way. One of the best known NLP approaches to this internal-versus-external modeling is script theory, especially as developed initially by computer
scientist Roger Schank (1980, 1984, 1991), which has had significant implications
for pragmatics and the study of discourse. It assumes that some (perhaps many)
human interactions are governed by internal scripts, which refer essentially to
the real-world knowledge structures that manifest themselves in typical social
situations. They allow people to carry out conversations effectively. The computational task at hand is described by Schank (1984: 125) as follows:
When we read a story, we try to evaluate the reasoning processes of the main character. We
try to determine why he does what he does and what he will do next. We examine what
we would do in a similar situation, and we try to make the same connections that the main
character seems to be making. We ask ourselves, What is he trying to do? What's his plan?
Why did he do what he just did? Any understanding system has to be able to decipher the
reasoning processes that actors in stories go through. Computer understanding means computers understanding people, which requires that they understand how people formulate
goals and plans to achieve those goals. Sometimes people achieve their goals by resorting
to a script. When a script is unavailable, that is, when the situation is in some way novel,
people are able to make up new plans.
Making contact with a stranger, for instance, requires access to the appropriate cultural script, its contextualization, and the verbal structures that encode
it. If the contact occurs in an elevator, the script might call for talking about the
weather. By extension, all social actions and interactions can be conceived in
terms of this script-language-context complementarity. The enactment of agreements, disagreements, anger, flirtations, and so on can be seen to unfold in a
script-like fashion.
Work in contemporary NLP has been using script theory effectively, alongside other theoretical paradigms (discussed above). By decomposing even a simple script-like conversation into its pragmatic, linguistic, and conceptual components, NLP has developed a truly sophisticated array of tasks and research questions that overlap considerably with research agendas in pragmatics and conversational analysis. Some of these are listed below (note that these summarize much
of the foregoing discussion about CL):
1. finding ways to produce conceptually-appropriate machine-readable summaries of chunks of text (automatic summarization)
2. determining which words in a text refer to the same objects, for instance,
matching pronouns and adverbs with preceding (anaphora) or following (cataphora) nouns or names (coreference resolution)
3. classifying discourse texts in terms of their social function (yes-no question,
content question, assertion, directive, and so on), since many can be decoded
in terms of script theory
4. segmenting words into their constituent morphemes (morphological segmentation) and then relating these to their use in a text
5. determining which items in a text are proper nouns (names of people,
places, organizations, and so on) (named entity recognition)
6. converting computer language into understandable human language (natural
language generation)
7. understanding which semantic-conceptual rules apply in a certain text, while
others are excluded (natural language understanding)
8. determining the text corresponding to a printed text image (optical character
recognition)
9. tagging the part of speech for each word so that its role in sentences and its
connection to the lexicon can be determined; this is part of disambiguation,
since many words are polymorphic, that is, they pertain to different morphological
classes; for example, the word set can be a noun (I bought a
new set of chess pieces), a verb (I always set the table), or an adjective (He has too
many set ways of thinking)
10. parsing a sentence effectively, since in addition to being polysemous and polymorphic, natural languages are also polyanalytical, that is, sentences in a
language will have multiple syntactic analyses (Roark and Sproat 2007); different types of parsing systems, such as dependency grammar, optimality theory, and stochastic grammar are, essentially, attempts to resolve the parsing-representation problem
11. identifying relationships among named entities in a text (who is the son of
whom, what is the connection of one thing to another, and so on)
There are a host of other problems that NLP research faces in devising algorithms
to produce natural language-like outputs. The usefulness of this approach to
linguists is that it allows them to zero in on the various components that make
up something as simple as a sentence or a conversational text. NLP has made
great strides in many areas and, like work on algorithms in various fields of
human endeavor (from flight simulation to medical modeling), it has produced
some truly remarkable accomplishments. For example, in the area of speech
recognition technology, voice-activated devices that skip manual inputting are
now routine. The work in this area has shed light on how oral speech relies not
so much on pauses between items, but on other segmental cues. For example,
in speech /naitrait/ is not articulated with a break between /nait/ and /rait/, but
the word could be either a single morpheme, nitrate, or two morphemes, night
rate. So the segmentation process involves determining not only which phonic
cues are phonemic, but also which contextual cues help to
determine word boundaries.
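The /naitrait/ ambiguity can be made concrete with a small sketch that lists every way of splitting an unbroken phoneme string into known words. The three-entry `LEXICON` and the function name `segmentations` are assumptions made for illustration; a real recognizer would rank the candidates using context, which this sketch does not attempt.

```python
# Toy lexicon keyed by phonemic transcription; the entries are assumptions
# made for illustration, not a real pronunciation dictionary.
LEXICON = {
    "nait": "night",
    "rait": "rate",
    "naitrait": "nitrate",
}


def segmentations(phonemes, lexicon=LEXICON):
    """Return every way of splitting `phonemes` into lexicon words."""
    if not phonemes:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(phonemes) + 1):
        prefix = phonemes[:end]
        if prefix in lexicon:
            # Keep this word and segment whatever is left over.
            for rest in segmentations(phonemes[end:], lexicon):
                results.append([lexicon[prefix]] + rest)
    return results
```

Calling `segmentations("naitrait")` yields both the two-word reading and the one-word reading, so the choice between them has to come from cues outside the phoneme string itself.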
Headings are defined with <h> tags (<h1>, <h2>, ...) and specified as follows:
<h1>This is a heading</h1>
<h2>This is a heading</h2>
<h3>This is a heading</h3>
and so on
NLP has made great strides in producing ML systems. The fundamental goal is understanding the relation between the system (language), its representation (modeling), and how these connect to the outside world, both bringing it into the system and using the system to understand the outside world. The complexity of this
task has been made obvious by the fact that the rule systems employed by computer languages in NLP are intricate and difficult to develop. In my view, the main
goal of NLP is to find simpler languages that have the same kind of ergonomic
power as human language. NLP holds great promise for making computer interfaces easier for people, so that they can talk to the computer in natural language
rather than learn a specialized language of computer commands. This can be
called a meta-ELIZA project, in reference to one of the first programs attempting
to simulate speech.
Both CT and CL constitute interdisciplinary hermeneutic modes of investigation, involving linguists, computer scientists, artificial intelligence experts, mathematicians, and logicians in the common goal of unraveling the nature of mathematical and linguistic phenomena by modeling them in the form of algorithms.
To summarize, the algorithms devised by computer scientists are insightful on at
least three counts:
1. They force analysts to unravel the relation between structure and meaning in
the formation of even the simplest sentences and the simplest mathematical
formulas and thus to focus on how the constituents of a mathematical or linguistic form can lead to the production of meaningful wholes by means of the
relation among them (internal information) and with the real world (external
information). The computer cannot do this; the analyst can and must do it in
representing the knowledge system or subsystem involved.
2. They produce machine-testable models that can then be discussed vis-à-vis
the theoretical models of mathematicians and linguists.
3. They emphasize the relation among representation, internal knowledge, and
contextualization and how these might be modeled.
in computational terms. In MT, the initial task was to translate from one system
(S1) to another (S2), seeking equivalences in the structure and the lexicon of the
two, but this turned out to be insufficient to produce translations that approximated the abilities of a human translator. So, work in MT led eventually to a focus
on semantic systems, real-world knowledge (network theory), pragmatic forms
(scripts), and so on. I argued that the rise of pragmatics and conceptual metaphor theory in linguistics came about, at least indirectly, through the failures
of MT and the rise of CL as an attempt to solve unexpected problems such as the high density of
metaphorical speech in language. In this case the computer was a catalyst in expanding the purview of linguistics. That is to say, what began as an effort to make
MT more imitative of human translation morphed into a discipline dedicated to
unraveling the nature of language using computer modeling and simulation. Similarly, in CT, it can be argued that the rise in the heuristic modeling of problems,
rather than a focus on concrete solutions, including the question of which problems do or do not have a
solution, has expanded mathematics epistemologically. So, the research in both
CL and CT has led to expanding the research paradigms of both mathematics and
linguistics as well as the span of the common ground on which they rest. But
we are still left with the question of whether the algorithms are truly reflective
of human mental processes and thus truly describe the nature of language and
mathematics.
Aware of the implications of this question, computer scientists have been developing algorithms to test theories of language development and even to predict
certain aspects of how languages are acquired in infancy. If there is a match then
the conclusion is, surely, that the algorithms are psychologically real. The computer modeling of language learning has the advantage of making it possible to
manipulate the algorithms and the data as the data is assembled. This is an example of black box testing, that is, checking that the output of a program yields what is
expected and then, on that basis, inferring the validity of the algorithm and modifying
it appropriately.
thus consciousness itself. The result is always undecidable. In his classic book,
Mental models (1983), Johnson-Laird gives us a good overall taxonomy for the
kinds of machines or theoretical algorithm systems that have been used (unconsciously) to model consciousness:
1. Cartesian machines which do not use symbols and lack awareness of themselves
2. Craikian machines (after Craik 1943) that construct models of reality, but
lack self-awareness
3. self-reflective machines that construct models of reality and are aware of their
ability to construct such models.
Programs designed to simulate human intelligence are Cartesian machines in
Johnson-Laird's sense, whereas animals and human infants are probably Craikian
machines. But only human infants have the capacity to develop self-reflective
consciousness, which Maturana and Varela (1973) aptly called autopoietic, that
is, a machine that is capable of self-generation and self-maintenance. To quote
McNeill (1987: 262):
Self-aware machines are able to act and communicate intentionally rather than merely as
if they were acting intentionally (of which Craikian machines are capable). This is because
they can create a model of a future reality and a model of themselves deciding that this
reality should come into being.
As McNeill (1987: 262-264) goes on cogently to argue, self-awareness is tied to linguistic actions. The inner speech that Vygotsky discussed is a manifestation
of self-awareness. Unlike a Cartesian machine, a human being can employ self-awareness, at will, to construct models of reality. But this also means social conditioning, a dimension that is completely lacking in AI. As McNeill (1987: 263) states,
"We become linguistically conscious by mentally simulating social experience."
Perhaps the view of consciousness that is the most relevant to the topic at
hand is the one put forward by Popper (1935, 1963). Popper classified the world of
the mind into three domains. World 1 is the domain in which the mind perceives
physical objects and states instinctively, whereby human brains take in information by means of neuronal synapses transmitting messages along nerve paths
that cause muscles to contract or limbs to move. It is also the world of things.
World 1 may describe human-built Cartesian machines and Craikian machines/organisms. World 2 is the domain of subjective experiences. This is the level at
which the concept of Self emerges, as the mind allows humans to differentiate
themselves from the beings, objects, and events of the outside world. Craikian
machines might participate in this world, but likely to a limited degree. It is at
this level that we perceive, think, plan, remember, dream, and imagine; so actual
machines might simulate these faculties but not really possess them in any human sense. World 3 is the domain of knowledge in the human sense, containing
the externalized artifacts of the human mind. It is, in other words, the human-made world of culture, including language and mathematics. This corresponds to
Johnson-Laird's self-reflective level; in order to create mathematics one has to
possess this level of consciousness, otherwise mathematics would be reduced to
counting for survival. The World 1 states become World 2 and World 3 ones through
imaginative thought (such as metaphorical thought), not through algorithmic processes. As Hayward (1984: 49) has stated: "we could say that our extended version
of Popper's World 3, which includes a very large part of World 1 and of World 2, is
formed by interacting webs of metaphor gestalts." There is no evidence that any
Cartesian or Craikian machine has access to this form of consciousness.
No Cartesian or Craikian machine can ever reach World 3 because it has no
historical knowledge that leads to it. Terry Winograd (1991: 220), a leading researcher himself in artificial intelligence, has spotted the main weakness in the
belief that computational artifacts are psychologically real, putting it as follows:
Are we machines of the kind that researchers are building as thinking machines? In asking this kind of question, we engage in a kind of projection: understanding humanity by
projecting an image of ourselves onto the machine and the image of the machine back onto
ourselves. In the tradition of artificial intelligence, we project an image of our language activity onto the symbolic manipulations of the machine, then project that back onto the full
human mind.
As Nadeau (1991: 194) has also put it, such exercises in theoretical reasoning in
both formalist and computationist models of mind are essentially artifacts: "If
consciousness is to evolve on this planet in the service of the ultimate value, we
must, I think, quickly come to the realization that reality for human beings is a
human product with a human history, and thereby dispel the tendency to view
any product of our world-constructing minds as anything more, or other, than a
human artifact."
The computer is one of our greatest intellectual achievements. It is an extension of our logical intellect. We have finally come up with a machine that will
eventually take over most of the arduous work of the logical calculus. Arnheim's
(1969: 73) caveat is still valid today: "There is no need to stress the immense practical usefulness of computers. But to credit the machine with intelligence is to
defeat it in a competition it need not pretend to enter." In Sumerian and Babylonian myths there were accounts of the creation of life through the animation of
clay (Watson 1990: 221). The ancient Romans were fascinated by automata. By
the time of Mary Shelley's Frankenstein in 1818, the idea that robots could be
brought to life both fascinated and horrified the modern imagination. Since the
first decades of the twentieth century the quest to animate machines has led to
many fascinating achievements, from AI to Google. As William Barrett (1986: 160)
has warned, if a self-reflective machine will ever be built it would have a curiously disembodied kind of consciousness, for it would be without the sensitivity,
intuitions, and pathos of our human flesh and blood. And without those qualities
we are less than wise, certainly less than human.
3.5.2 Overview
The ELIZA program was an early attempt to model human speech, by simply
matching questions and answers on the basis of simple discourse patterns. The
humans who were exposed to ELIZA interpreted the answers as being delivered
by a conscious entity. ELIZA had passed the Turing Test, or Turing's (1936) idea
that if a human cannot distinguish between the answers of a computer and those of a
human, he or she must conclude that the machine is indeed intelligent. This
raises some deep questions about intelligence and consciousness. So, although
it is well known, the Turing Test is worth revisiting here by way of conclusion to
the theme of computation in language and mathematics.
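The kind of pattern matching attributed to ELIZA above can be illustrated with a minimal sketch. The three rules, the default reply, and the function name `respond` are hypothetical and chosen only for illustration; they are not the rules of the original program, which was considerably richer.

```python
import re

# Hypothetical pattern/response pairs; these are assumptions for the
# sketch and do not reproduce the original ELIZA script.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT = "Please go on."


def respond(utterance):
    """Return a canned reply built from the first rule that matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo part of the user's own words back inside the template.
            return template.format(match.group(1))
    return DEFAULT
```

Nothing in this sketch represents what the user means; the reply is assembled purely from surface patterns, which is the point at issue in the discussion of the Turing Test that follows.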
In 1950, shortly before his untimely death in his early forties, Turing suggested
that one could program a computer in such a way that it would be virtually impossible to discriminate between its answers and those contrived by a human being.
This notion quickly became immortalized as the Turing Test. Consider an observer
in a room which hides on one side a programmed computer and, on the other, a
human being. The computer and the human being can only respond to the observer's questions in writing, say, on pieces of paper which both pass on to the
observer through slits in the wall. If the observer cannot identify, on the basis of
the written responses, who is the computer and who the human being, then he
or she must conclude that the machine is intelligent and conscious. It has passed
the Turing Test.
The counter-argument to the Turing Test came from John Searle (1984) and
his Chinese Room illustration. Searle argued that a machine does not know
what it is doing when it processes symbols, because it lacks intentionality. Just
as a human being who translates Chinese symbols in the form of little pieces
of paper by using a set of rules for matching them with other symbols, or little
pieces of paper, knows nothing about the story contained in the Chinese pieces
of paper, so too a computer does not have access to the story inherent in human
symbols. As this argument made obvious, human intentions cannot be modeled
algorithmically. Intentionality is connected intrinsically with the interpretation
of incoming information and the meaning codes that humans have acquired from
cultural inputs.
The modeling of mathematical and linguistic knowledge cannot be extricated
from the question of intentionality. It cannot be reduced to a Turing machine.
This does not preclude the importance of modeling information in itself, as argued throughout this chapter. Shannon's (1948) demonstration that information
of any kind could be described in terms of binary choices between equally probable alternatives is still an important one. Information in this computable model
is defined as data that can be received by humans or machines, and as something that is mathematically probabilistic: a ringing alarm signal carries more
information than one that is silent, because the latter is the expected state of
the alarm system and the former its alerting state. When an alarm is tripped in
some way, the feedback process is started and the information load of the system
increases (indeed reaches its maximum). Shannon showed, essentially, that the
information contained in a signal is inversely related to its probability of
occurrence: the more likely a signal, the less information load it carries; the less
likely, the more. But this does not solve what can be called the central computational dilemma, namely how to get a machine to interpret information not in simple
probabilistic terms but in ways that relate the information to its historical meanings and to the intentions of the purveyor or conveyor of the information (the
Chinese Room dilemma).
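The relation between probability and information load can be made concrete with the standard measure of self-information, in which an event of probability p carries -log2(p) bits. The sketch below assumes that measure; the function names and the alarm probabilities used in the usage note are illustrative assumptions, not figures from the text.

```python
import math


def self_information(probability):
    """Information content of an event, in bits: -log2(p)."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def entropy(distribution):
    """Average information, in bits, of a discrete distribution."""
    return sum(p * self_information(p) for p in distribution if p > 0)
```

If, say, an alarm is assumed to be silent 99 percent of the time, then `self_information(0.01)` for the ringing state is far larger than `self_information(0.99)` for the silent one, while a choice between two equally probable alternatives, `entropy([0.5, 0.5])`, comes to exactly one bit.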
This problem seems to be intractable, even though the modeling methods
in CT and CL have become increasingly refined, sophisticated, and intelligent.
Work that allows computers to produce linguistic outputs that are very close to
human speech is improving dramatically, with algorithms that allow a computer
to modify its style of production to take into account abstract pragmatic factors
such as politeness, anger, deference, and other social features of register. But the
question becomes: Can the computer truly understand this (the Chinese Room
dilemma)? Comprehension is just as intractable as anything else in CT or CL. It is
relevant to note that Turing himself was aware of the limitations of computability
theories in general. He proved, in fact, that a machine, unlike humans, would not
stop for a given input and consider it differently from its program.
The premise guiding all computational modeling is that any theory of the
mind can be reduced to the search for the algorithmic procedures that relate mind
states to brain states; that is, the mind's functions can be thought of as attendant
to neurological states and that these, in turn, can be thought of as operations akin
to those that a computer is capable of carrying out. It is a form of black-box theorizing, as mentioned, but it avoids the Chinese Room dilemma and other aspects
of intentionality.
Clearly, SHRDLU passes the Turing Test, but it could not possibly pass the Chinese
Room Test. In fairness, the goal of NLP theories is not to bring the computer to
consciousness, but to get it to reproduce natural language in such a way that it
approximates what humans do when they talk, and thus glean insights from the
modeling process itself. The computer may not know what it is doing, but it does
it well nonetheless.
CL has opened up a truly fascinating debate about the nature of linguistic inquiry and how to conduct linguistic research. It is relevant to note that Chomsky
has often been skeptical of NLP, firmly believing that language is specific to the
human species and cannot be reproduced in computer software. As we have
discussed throughout, for Chomsky, the speech faculty is constituted by a set of
universal principles present in the brain at birth that are subjected to culturally-specific parameters during infancy. The parameter-setting feature of Chomsky's
theory assigns some role to experiential factors. But he has always maintained
that the role of the linguist is to search out the universal rule-making principles
that make up the speech faculty. In reviewing Chomsky's Syntactic Structures,
Robert Lees (1959) predicted that it would revolutionize linguistics, rescuing it
from its prescientific and piecemeal descriptive practices. On this view, data collection and
classificatory assemblages of linguistic facts are interesting in themselves, but
useless for the development of a theory of language. Chomsky (1990: 3) himself
articulated the main goal of linguistics as the search for an answer to the question: "What is the initial state of the mind/brain that specifies a certain class of
generative procedures?"
One of the more zealous advocates and defenders of this perspective is Jerry
Fodor (1975, 1983, 1987). Fodor sees the mind as a repository of formal symbols.
Because symbols take on the structure of propositions in discourse, and so serve
thought during speech, he refers to them as mental representations that are decomposable into finite-state rules that are converted to higher structures by conversion rules. Cumulatively, they constitute the brain's language of thought.
Like Chomsky, Fodor sees language as a mental organ present in the brain at
birth, equipping humans with the ability to develop the specific languages that
cultures require of them. The psycholinguist Steven Pinker (1990: 230-231), another staunch formalist, agrees:
A striking discovery of modern generative grammar is that natural languages all seem to be
built on the same basic plan. Many differences in basic structure but different settings of
a few parameters that allow languages to vary, or different choices of rule types from a
fairly small inventory of possibilities On this view, the child only has to set these parameters on the basis of parental input, and the full richness of grammar will ensue when those
parametrized rules interact with one another and with universal principles. The parameter-setting view can help explain the universality and rapidity of language acquisition: when
the child learns one fact about her language, she can deduce that other facts are also true of
it without having to learn them one by one.
The problem with such views is that, as Rommetveit (1991: 12) has perceptively
remarked, they ignore a whole range of lived phenomena such as background
conditions, joint concerns, and intersubjectively endorsed perspectives. As Rommetveit goes on to observe, we really can never escape the vagueness and indeterminacy of the social situation or of the intentions of the interlocutors when
we engage in discourse, no matter how precise the analyst's assessment may
appear to be. Pinker's analysis of language ontogenesis is an acceptable interpretation, among many others, if it is constrained to describing the development of
syntax in the child. But it is not a viable psychological theory, because it ignores
a much more fundamental creative force in the child: the use of metaphorical
constructs to fill in knowledge gaps, which the child development literature has documented rather abundantly.
From the failure of MT to incorporate performance factors into its algorithms,
work in CL has led indirectly to a refocusing of linguistic inquiry in general. It can
be argued that it brought about a significant number of defections from the Chomskyan camp. A focus on how figurative meaning interconnects with other aspects
of language, including grammar, is the most promising direction for CL to take.
If nothing else, the plethoric research conducted on the world's languages during the last century has amply documented that syntactic systems are remarkably
alike and rather unrevealing about the nature of how a message is programmed
differentially among people living in different cultures. It has shown, in my opinion, that syntax constitutes a kind of organizing grid for the much more fundamental conceptual-semantic plane.
The question becomes: If metaphor is truly a unique human feature of
language and mathematics, is it still programmable? Among the first to model
metaphorical cognition computationally were Earl MacCormac (1985) and James
H. Martin (1990), who were easily able to model what rhetoricians call frozen
metaphors, those that have lost their metaphorical semantics due to frequency
of usage, while judiciously leaving out the computational study of creative or novel
metaphors, which, as they admitted, are virtually impossible to model. But, despite the difficult computational problems involved, metaphor processing is a
rapidly expanding area in NLP. Because of its data-processing and data-mining
capacities, the corpus that the computer can examine for metaphoricity in real
speech has become a crucial part of NLP, with deep implications for the automatic
identification and interpretation of language indispensable for any true NLP.
The turn of the millennium witnessed a technological leap in natural language computation, as manually crafted rules have gradually given way to more
robust corpus-based statistical methods. This is also the case for metaphor research. Recently, the problem of metaphor modeling has become a central one,
given the increase in truly sophisticated statistical techniques. However, even the
statistically-based work has been producing fairly limited results in getting the
computer to understand metaphorical meaning. The computer can of course produce new metaphorical language ad infinitum, but it takes a human brain to interpret it. At the same time, work on computational lexical semantics, applying
machine learning to open semantic tasks, has opened up many new paths for
computer scientists to pursue in programming metaphorical competence. It still
remains to be seen how far this line of inquiry can proceed. All that can really be
done is to examine the trends in computational metaphor research and compare
in blending theory that there is a shared mechanism for both metaphorical and
literal language comprehension. So, the field is still open, needing much more
extensive research.
The main issues in the computational modeling of metaphor comprehension
include the following:
1. distinguishing algorithmically between conceptual and linguistic metaphor
2. distinguishing between frozen and novel metaphors
3. defining multiword metaphorical expressions
4. programming extended metaphor and metaphor in discourse
Metaphor processing systems that incorporate state-of-the-art NLP methods include the following themes and issues:
1. statistical metaphor processing modalities
2. the incorporation of various lexical resources for metaphor processing
3. the use of large corpora
4. programs for the identification of conceptual and linguistic metaphor
5. metaphorical paraphrasing
6. metaphor annotation in corpora
7. datasets for evaluation of metaphor processing tools
8. computational approaches to metaphor based on cognitive evidence
9. computational models of metaphor processing based on the human brain
Despite the many caveats mentioned in this chapter, in the end, all human knowledge inheres in model-making. Models of nature, of the mind, and so on are how
we ultimately understand things. The worst that could occur in science is, as
Barrett (1986: 47) has phrased it, that the pseudo-precise language of theorists leaves us more confused about the matters of ordinary life than we would
otherwise be. In computational approaches to mathematics and language, at
least as I see it, the goal has been to come up with a simple modeling language
that can penetrate the core of the brain's processing capacities. In its search for
what it means to be human in everyday situations and to express it in language
or mathematics, computer science may not have found the answer, but it has
spurred on mathematicians and linguists to search for it in new ways.
It was probably Descartes who originated the idea of a universal or artificial
common language in the 1600s, although the quest for a perfect language goes
back to the Tower of Babel story. More than 200 artificial languages have been invented since Descartes made his proposal. The seventeenth-century clergyman
John Wilkins wrote an essay in which he proposed a language in which words
would be built in a nonarbitrary fashion. Volapük, invented by Johann Martin
Schleyer, a German priest, in 1879, was the earliest of these languages to gain
moderate currency. The name of the language comes from two of its words meaning "world" and "speak." Today, only Esperanto remains in some use and is studied as
an indirect theory of perfect language design. It was invented by Polish physician Ludwik Lejzer Zamenhof. The name is derived from the pen name Zamenhof
used, Dr. Esperanto (1887). The word Esperanto means, as Zamenhof explained
it, "one who hopes." Esperanto has a simple and unambiguous morphological
structure: adjectives end in /-a/, adverbs end in /-e/, nouns end in /-o/, /-n/ is
added at the end of a noun used as an object, and plural forms end in /-j/. The core
vocabulary of Esperanto consists mainly of root morphemes common to the Indo-European languages. The following sentence is written in Esperanto: La astronauto, per speciala instrumento, fotografas la lunon = "The astronaut, with a special instrument, photographs the moon." Much like computer languages, there
can be no ambiguity in sentences such as this one.
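The ending rules just listed are regular enough to be written as a short program. What follows is only an illustrative sketch, not a full Esperanto parser: the function name `tag_word`, the tiny `FUNCTION_WORDS` list, and the catch-all label "other" (used here for verb forms, which the list of endings does not cover) are assumptions made for the example.

```python
# Hypothetical closed-class list: short words that do not carry the
# part-of-speech endings described in the text.
FUNCTION_WORDS = {"la", "per"}

def tag_word(word):
    """Label an Esperanto word using only the ending rules listed above."""
    w = word.lower().strip(",.")
    if w in FUNCTION_WORDS:
        return "function word"
    features = []
    if w.endswith("n"):          # /-n/ marks a word used as an object
        features.append("object")
        w = w[:-1]
    if w.endswith("j"):          # /-j/ marks the plural
        features.append("plural")
        w = w[:-1]
    if w.endswith("o"):
        category = "noun"
    elif w.endswith("a"):
        category = "adjective"
    elif w.endswith("e"):
        category = "adverb"
    else:
        category = "other"       # e.g. verb forms, not covered by the rules
    return " ".join([category] + features)

sentence = "La astronauto, per speciala instrumento, fotografas la lunon"
for word in sentence.split():
    print(word, "->", tag_word(word))
```

Each word receives exactly one label from its ending alone, which is the sense in which the morphology is unambiguous; a natural language would need a dictionary and context to do the same.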
Esperanto espouses the goal of standardizing language so that ideas can be
communicated in the same way across cultures. Some estimates peg the number
of speakers of Esperanto from 100,000 to over a million. It is difficult to accurately
quantify the speakers, because there is no specific territory or nation that uses
the language. Zamenhof actually did not want Esperanto to replace native or indigenous languages; he intended it as a universal second language, providing a
common linguistic vehicle for communication among people from different linguistic backgrounds. The Universala Esperanto-Asocio (Universal Esperanto Association), founded in 1908, has chapters in over a hundred countries. Cuba has
radio broadcasts in Esperanto. There are a number of periodicals published in
Esperanto, including Monato, a news magazine published in Belgium. Some novelists, such as Hungarian Julio Baghy and the Frenchman Raymond Schwartz,
have written works in Esperanto.
It is ironic to note, however, that research on Esperanto indicates that it has
a tendency to develop dialects, and that it is undergoing various predictable
changes (diachronically speaking), thus impugning its raison d'être. Benjamin
Bergen (2001) discovered that even in the first generation of speakers, Esperanto
had undergone considerable changes in its morphology and had borrowed words
from other languages. So, perfect languages may not be possible after all, either
as devised by computers or humans. The structure of grammar and vocabulary in
artificial languages is reduced to a bare outline of natural language grammar and
vocabulary, and meaning is generally restricted to a denotative range: one word, one meaning. In a phrase, the idea is to eliminate culture-specific knowledge
networks from human language. This is an ideal, but an impossible one to attain,
since even artificial languages such as Esperanto apparently develop digressions
from the ideal.
So, what have we learned about mathematics and language in general from
computationism, from AI, and from artificial languages? As mentioned several
times, the most important insight that these approaches have produced inheres
in fleshing out patterns that can be modeled and thus compared. As a corollary, it
has become obvious that many aspects of mathematics and language have computational structure. Connecting this structure to meaning continues to be a major
problem. In computationism, three things stand out, which can be reiterated here
by way of conclusion:
1. In the task of writing an algorithm, we may have identified a specific way a
mental process operates and, as a consequence, we can better understand or
evaluate theories about that process.
2. It may be possible to simulate that process on the computer.
3. It might be possible to design computers that can do things that humans do.
This is an open question that requires much more research and theoretical
debate.
4 Quantification
It is the mark of a truly intelligent person to be moved by statistics.
George Bernard Shaw (1856–1950)
Introductory remarks
If one were to do a very quick calculation of the number of words consisting of a
specific number of letters (2 letters, 3 letters, 4 letters, and so on) as they occur on
several pages of common texts (newspapers, blogs, novels, and so on), a pattern
would soon become conspicuous. Words consisting of two to four letters (to, in,
by, the, with, more) are more frequent overall than words consisting of more letters. If the size of the text is increased, this pattern becomes even more apparent.
This in itself is an interesting discovery, reinforcing perhaps an intuitive sense
that shorter words are more frequent in all kinds of common communications
because they make them more rapid. But there is much more to the story. Grammatical constructions and discourse patterns too seem to be governed by the same
kind of statistical economy, a fact that is easily discerned today in text messages
and other forms of digital communication. Textspeak, as it is called (Crystal 2006,
2008), reveals a tendency to abbreviate words, phrases, and grammatical forms in
the same way that once characterized telegrams. The reason in the latter case was
to save on the price of sending messages, since each letter would cost a significant
amount of money. In textspeak it seems to be a stylistic feature that cuts down on
the time required to construct and send messages. The high frequency of shorter
words in all kinds of texts and the propensity to abbreviate language forms in
rapid communication systems suggest a principle that can be paraphrased simply as the tendency to "do more with less." This principle, as it turns out, has been
investigated and researched seriously by linguists and mathematicians.
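The "very quick calculation" described above can be sketched in a few lines. The function name `word_length_counts` and the one-sentence sample (the chapter's epigraph) are assumptions chosen for illustration; a real test of the pattern would need several pages of text.

```python
import re
from collections import Counter

def word_length_counts(text):
    """Count how many word tokens of each length occur in `text`."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return Counter(len(w) for w in words)

# Tiny sample for illustration only; the pattern is clearer in longer texts.
sample = (
    "It is the mark of a truly intelligent person "
    "to be moved by statistics"
)
counts = word_length_counts(sample)
total = sum(counts.values())
short = sum(n for length, n in counts.items() if 2 <= length <= 4)

for length in sorted(counts):
    print(f"{length:2d} letters: {counts[length]}")
print(f"{short} of {total} tokens have two to four letters")
```

Even in this single sentence, more than half of the tokens fall in the two-to-four-letter range.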
Wherever one looks in both mathematics and language, one will note what
can be called an economizing tendency. In other words, there are aspects of both
systems (if not many) that can be measured as compression phenomena and this
can lead to various theoretical conclusions about the nature of the two systems.
The approach to the study of mathematics and language as governed by laws
of statistics, probability, and quantification of various kinds can be allocated to
the general rubric of quantification. Statistical-quantitative techniques have been
applied to the investigation of the structure of natural languages, to patterns
inherent in language learning, to rates of change in language, and so on. The
aim has generally been to unravel hidden patterns in language. At first,
$$(n^a)(n^b) = n^{a+b}$$
$$(n^a)(m^a) = (nm)^a$$
$$(n^a) \div (n^b) = n^{a-b} \quad (n \neq 0)$$
$$(n^a)^b = n^{ab}$$
$$n^{-a} = 1/n^a$$
Exponential numbers also became the catalyst of the theory of logarithms, which
similarly started out as a means of making computations much more efficient and
automatic. Logarithms have since been used in many areas of mathematics, science, and statistics, allowing for all kinds of discoveries to occur in these domains
as well. The relevant point is that a simple notational device invented to make a
certain type of multiplication easier to read was the source of many discoveries,
directly or indirectly. The history of mathematics is characterized by the invention of notational strategies (such as exponents) that have led serendipitously to
unexpected discoveries.
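The laws of exponents, and the way logarithms exploit them, can be checked numerically. The sketch below uses arbitrary sample values (`n`, `m`, `a`, `b`, `x`, `y` are assumptions, not values from the text) and exact rational arithmetic so that no rounding is involved in the checks.

```python
import math
from fractions import Fraction

n, m, a, b = 2, 3, 5, 3  # arbitrary sample values

checks = {
    "product of powers":  (n**a) * (n**b) == n**(a + b),
    "power of a product": (n**a) * (m**a) == (n * m)**a,
    "quotient of powers": Fraction(n**a, n**b) == n**(a - b),
    "power of a power":   (n**a)**b == n**(a * b),
    "negative exponent":  Fraction(n)**(-a) == Fraction(1, n**a),
}
for law, holds in checks.items():
    print(f"{law}: {holds}")

# Logarithms apply the first law in reverse: adding the exponents
# (the logarithms) of two numbers gives the exponent of their product.
x, y = 128, 512
product_via_logs = 2 ** (math.log2(x) + math.log2(y))
print(product_via_logs, x * y)
```

The last two lines show why logarithms made computation more efficient: a multiplication is replaced by an addition followed by one lookup.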
By probabilistic structure, two things are intended in this chapter. First, it
refers to aspects in both language and mathematics which can be studied with
the tools of probability theory or can be quantified in order to assess them theoretically; and second, it refers to the ways in which information is compressed in
both systems. Whatever the case, it is obvious that quantification maps out another area of the common ground shared by linguistics and mathematics, and
so we will start this chapter off with a brief historical digression into statistical
techniques and their general implications for the study of both.
enormous amounts of data in a few tables in order to show visually what the data
imply. His goal was to communicate information economically and effectively.
Here is a sample of one of his tables:
Table 4.1: Example of one of Graunt's tables

Buried within the walls of London    3,386
   Whereof the plague                    1
Buried outside the walls             5,924
   Whereof the plague                    5
Buried in total                      9,310
   Whereof the plague                    6
Graunt then went on to derive percentages to show the relative quantities for comparison purposes. From this simple technique of gathering data and displaying it
in an organized fashion, the science of statistics crystallized shortly thereafter.
It has now become a tool of both mathematicians and linguists to study quantifiable phenomena or else to assay the probabilistic structure of various phenomena
within mathematics and language. In effect, it truly defines a large stretch of common ground.
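The step from raw counts to percentages can be sketched with the figures of Table 4.1. Which percentages Graunt himself derived is not stated here, so the particular ratios computed below (each area's share of all burials) are an assumption made for illustration.

```python
# Figures from Table 4.1: (burials, of which plague)
table = {
    "within the walls": (3386, 1),
    "outside the walls": (5924, 5),
}

total_buried = sum(buried for buried, _ in table.values())
total_plague = sum(plague for _, plague in table.values())

# Each area's share of all burials, as a percentage.
shares = {
    area: 100 * buried / total_buried
    for area, (buried, _) in table.items()
}

print("total buried:", total_buried, "of which plague:", total_plague)
for area, share in shares.items():
    print(f"{area}: {share:.2f}% of all burials")
```

The totals reproduce the last two rows of the table, which confirms that the flattened figures have been read in the right order.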
The Monty Hall Problem and the Prosecutor's Fallacy will be discussed subsequently. For now, it is important to note that the key idea is that of the normal
distribution, the curve that has the shape of a bell. The curve is a continuous
probability distribution, indicating the likelihood that any real observation will
fall between any two limits as the curve approaches zero on either side. A normal
distribution is characterized mathematically as follows:
$$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
In this formula, $\mu$ is the mean (or expectation of the distribution) and $\sigma$ the standard deviation. Without going into the mathematical details here, suffice it to say
that when $\mu = 0$ and $\sigma = 1$ the distribution becomes the standard normal curve. This is a
remarkable discovery of statisticians, having changed our whole view of random
phenomena. Statistical applications have shown consistently that phenomena of
various kinds hide within them a pattern: a specific statistic in random data will
tend to occur within three standard deviations of the mean of the curve, as determined by a specific set of the relevant data.
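[Figure: the normal curve, with the percentage of area in each half-standard-deviation band on either side of the mean: 19.1% (0 to 0.5), 15.0% (0.5 to 1), 9.2% (1 to 1.5), 4.4% (1.5 to 2), 1.7% (2 to 2.5), 0.5% (2.5 to 3), 0.1% (beyond 3)]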
The tail ends of the curve are the exceptional ones: given a large enough sample, most measured phenomena will fall between −2 and +2 standard deviations
(average deviations) from the mean at 0. So, when a statistical test is applied to
the curve and reveals that a variable in the data verges beyond these deviations
then one can infer relevance at different levels of confidence. This implies that in
random data it is possible to estimate the probability of occurrence of the value
of any variable within it. The total area under the curve is defined to be 1. We can
multiply the area by 100 and thus know that there is a 100 percent chance that any
value will be somewhere in the distribution. Because half the area of the curve is
below the mean and half above it, we know that there is a 50 percent chance that
a randomly chosen value will be above the mean and the same chance that it will
be below it.
This implies that linguistic and mathematical data, when collected, will likely
have a pattern in them that shows a normal distribution. Some examples of this will
be discussed below. For now, it is remarkable to note that a simple classificatory
and probabilistic method can reveal hidden structure and thus allow us to flesh
out some implicit principle in it.
The area under the normal curve is equivalent to the probability of randomly
drawing a value in that range. The area is greatest in the middle, where the hump
is, and thins out toward the tails. There are, clearly, more values close to the mean
in a normal distribution than away from it. When the area of the distribution is
divided into segments by standard deviations above and below the mean, the area
in each section is a known quantity. For example, 0.3413 of the curve falls between
the mean and one standard deviation above the mean, which means that about
34 percent of all the values of a random sample are between the mean and one
standard deviation above it. It also means that there is a 0.3413 chance that a value
drawn at random from the distribution will lie between these two points.
The amount of curve area between one standard deviation above the mean
and one standard deviation below is 0.3413 + 0.3413 = 0.6826, which means that
approximately 68.26 percent of the values lie in that range. Similarly, about 95 percent of the values lie within two standard deviations of the mean, and 99.7 percent
of the values lie within three standard deviations.
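The areas quoted above can be reproduced from the density formula of the normal distribution. The sketch below is a minimal one using only the standard library; the function names `normal_pdf`, `normal_cdf`, and `area_within` are assumptions, and the cumulative area is obtained from the error function `math.erf` rather than from a printed table.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def area_within(k):
    """Area within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

print(f"mean to +1 sd: {normal_cdf(1) - normal_cdf(0):.4f}")
for k in (1, 2, 3):
    print(f"within {k} sd: {area_within(k):.4f}")
```

Because the curve is symmetric, the area between the mean and one standard deviation above it is exactly half of the area within one standard deviation on both sides.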
[Figure: areas under the normal curve: 68% of the area lies within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3]
In order to use the area of the normal curve to determine the probability of occurrence of a given value, the value must first be standardized, or converted to a
z-score. To convert a value to a z-score means to express it in terms of how many
standard deviations it is above or below the mean. After the z-score is obtained,
one can look up its corresponding probability in a table. The formula to compute
a z-score is as follows:
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is the value, $\mu$ = Mean, and $\sigma$ = Standard Deviation.
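A z-score expresses a value as its distance from the mean, measured in standard deviations. The sketch below standardizes one value and then finds the corresponding probability; the names `z_score` and `prob_below` and the sample figures (a score of 130 on a scale with mean 100 and standard deviation 15) are hypothetical, and `math.erf` stands in for the printed table.

```python
import math

def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def prob_below(z):
    """Probability that a standard normal value falls below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical example values, not taken from the text.
z = z_score(130, mean=100, sd=15)
print(f"z = {z:.2f}")
print(f"P(value below) = {prob_below(z):.4f}")
print(f"P(value above) = {1 - prob_below(z):.4f}")
```

A value two standard deviations above the mean is exceeded by only a little over 2 percent of the distribution, which is why such values are treated as exceptional.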
can be detected as significant. This means that we can be quite sure that the difference is real. Significance tells us, in fact, how sure we can be that a difference
or relationship exists. One important concept in significance testing is whether
we use a "one-tailed" or "two-tailed" test of significance. This depends on our
hypothesis. If it involves the direction of the difference or relationship, then we
should use a one-tailed probability. For example, a one-tailed test can be used to
test the null hypothesis: "Females will not score significantly higher than males
on an IQ test." This hypothesis (indirectly) predicts the direction of the difference.
A two-tailed test can be used instead to test the nondirectional null hypothesis: "There
will be no significant difference in IQ scores between males and females." Whenever one performs a significance test, it involves comparing a test value that we
have calculated to some critical value for the statistic. It doesn't matter what type
of statistic we are calculating (a t-test, a chi-square test, and so on), the procedure
to test for significance is the same.
Adam Kilgarriff (2005) used basic significance techniques to check
for randomness in language, with the null hypothesis that randomness was a
feature of language, finding the opposite. Significance testing between corpora
of linguistic data has now become a key tool in investigating language, as the
Kilgarriff study showed. Basically, corpus linguists test their data with statistics.
Psycholinguistic experiments, grammatical elicitation tests and survey-based investigations also commonly involve statistical tests of some sort. A special type
of statistical technique is called the type-token ratio. A token is any instance of a
particular morpheme or phrase in a text. Comparing the number of tokens in the
text to the number of types of tokens can reveal how large a range of vocabulary
is used in the text.
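The type-token ratio is simple to compute. In the sketch below, tokens are taken to be whole words split on non-letters, which is a simplifying assumption (the definition above also allows morphemes and phrases); the function name `type_token_ratio` and the sample sentence are likewise made up for illustration.

```python
import re

def type_token_ratio(text):
    """Return (ratio, number of types, number of tokens) for word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, 0, 0
    types = set(tokens)  # each distinct word form counts once
    return len(types) / len(tokens), len(types), len(tokens)

ratio, n_types, n_tokens = type_token_ratio(
    "the cat saw the dog and the dog saw the cat"
)
print(f"{n_types} types / {n_tokens} tokens = {ratio:.3f}")
```

A ratio near 1 means almost every word is new; a low ratio means a small vocabulary is being reused. Because the ratio tends to fall as a text gets longer, comparisons are usually made between samples of similar length.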
The two most common uses of significance tests in corpus linguistics are calculating keywords and collocations. To extract keywords, the statistical significance of every word that occurs in a corpus must be determined, by comparing
its frequency with that of the same word in a reference corpus. When looking for
a word's collocations, the co-occurrence frequency of that word and everything
that appears near it once or more in the corpus is determined statistically. Both
procedures typically involve many thousands of significance tests.
Regression analysis involves identifying the relationship between a dependent and an independent variable. A relationship is hypothesized, and estimates
of the parameter values are used to develop a regression equation. Various tests are then employed to determine if the model is satisfactory, and if the equation can be used
to predict the value of the dependent variable. Correlation analysis also deals
with relationships among variables. The correlation coefficient is a measure of
association between two variables. Values of the correlation coefficient are always between −1 and +1. A correlation coefficient of +1 indicates that two variables are perfectly related in a positive linear sense, while a correlation coefficient
of −1 indicates that two variables are perfectly related in a negative linear sense,
and a correlation coefficient of 0 indicates that there is no linear relationship between the two variables. Correlation makes no a priori assumption as to whether
one variable is dependent on the other(s) and is not concerned with the direction of dependence between variables; instead it gives an estimate of the degree of association
between the variables, testing for interdependence of the variables. Regression
analysis describes the dependence of a variable on one (or more) explanatory
variables; it implicitly assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless of whether the path of
effect is direct or indirect.
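The difference between the two techniques shows up clearly in code. The sketch below computes a Pearson correlation coefficient, which treats the two variables symmetrically, and fits a least-squares regression line, which predicts one variable from the other. The function names and the small data set are assumptions chosen so that the results are easy to check by hand.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Correlation coefficient: degree of linear association, from -1 to +1."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares regression of y on x; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Made-up data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print("r =", pearson_r(xs, ys))
print("r (reversed) =", pearson_r(xs, ys[::-1]))
print("slope, intercept =", fit_line(xs, ys))
```

Swapping `xs` and `ys` leaves the correlation coefficient unchanged but gives a different regression line, which is the asymmetry the paragraph above describes.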
Speelman (2014) gives a comprehensive overview of how these basic statistical techniques inform and guide the conduct of research in corpus linguistics.
Focusing on regression analysis, he explains why it is exceptionally well suited to
compare near-synonyms in corpus data, allowing us to identify the different factors that have an impact on the choice between near synonyms, and to determine
their respective effects.
$$\log_{10}\left(1 + \frac{1}{d}\right) = \frac{\ln\left(1 + \frac{1}{d}\right)}{\ln(10)}$$
The underlying assumption of Benford's Law is that the sample quantities, expressed in the base 10 and more or less arbitrary units, will be fairly evenly distributed on a logarithmic scale. So, this is why the probability of the leading digit
being d clearly approaches:
$$\frac{\log_{10}(d + 1) - \log_{10}(d)}{\log_{10}(10) - \log_{10}(1)} = \log_{10}\left(1 + \frac{1}{d}\right) = \frac{\ln\left(1 + \frac{1}{d}\right)}{\ln(10)}$$
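The leading-digit probabilities given by this formula are easy to tabulate. The sketch below also compares them with the leading digits of the first thousand powers of 2, a sequence chosen here purely as an illustration because it spans many orders of magnitude; the function name `benford_probability` is an assumption.

```python
import math
from collections import Counter

def benford_probability(d):
    """Probability that the leading digit is d (1 to 9) under Benford's Law."""
    return math.log10(1 + 1 / d)

probs = {d: benford_probability(d) for d in range(1, 10)}

# Illustrative data set: leading digits of 2**0, 2**1, ..., 2**999.
leading = Counter(int(str(2 ** k)[0]) for k in range(1000))
observed = {d: leading[d] / 1000 for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: predicted {probs[d]:.3f}  observed {observed[d]:.3f}")
```

The nine probabilities sum to 1, since the intervals from log 1 to log 10 together cover one full order of magnitude.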
Benford's Law applies mainly to data that are distributed uniformly across many
orders of magnitude. On the other hand, a distribution that lies within one order
$$\log_{10}\left(1 + \frac{1}{d}\right)$$
Research in QM has shown that distributions that confirm Benford's Law include
statistical data where the mean is greater than the median and the skew is positive; numbers produced through various combinations, such as quantity × unit
price; and various calculations such as multiplicative ones whose answers fall
into a logarithmic distribution. As Havil (2008: 192) suggests, there are at least
two main observations to be made vis-à-vis Benford's Law:
One, that if Benford's Law does hold, it must do so as an intrinsic property of the number
systems we use. It must, for example, apply to the base 5 system of counting of the Arawaks
of North America, the base 20 system of the Tamanas of the Orinoco and the Babylonians
with their base 60, as well as to the exotic Basque system, which uses base 10 up to 19,
base 20 from 20 to 99 and then reverts to base 10. The law must surely be base independent.
The second is that changing the units of measurement must not change the frequency of
first significant digits.
What Havil is pointing out here is that Benford's Law must be a law of numbers,
not numeration. As such it is a veritable mathematical discovery. As we shall see
below, a version of the law applies to language as well, thus uniting mathematical
and linguistic probability phenomena rather unexpectedly.
Benford's Law seems to crop up everywhere. Bartolo Luque and Lucas Lacasa
(2009) used it to examine prime numbers. It is known that prime numbers, in
very large datasets, are not distributed according to the law. Rather, the first digit
distribution of primes seems to be uniform. However, as Luque and Lacasa discovered, smaller datasets (intervals) of primes exhibit a significant bias in first digit
distribution. They also noticed another remarkable pattern: the larger the dataset
of primes, the more closely the first digit distribution approached uniformity. The
researchers wondered, therefore, if there existed any pattern underlying the trend
toward uniformity as the prime interval increased to infinity.
The set of all primes is infinite, a fact proved by Euclid, as is well known. From
a statistical point of view, one difficulty in this kind of analysis is deciding how
to choose numbers randomly in an infinite dataset. So, only a finite interval can
be chosen, even if it is not possible to do so completely randomly in a way that
satisfies the laws of statistics and probability. To overcome this obstacle, Luque
and Lacasa chose several intervals of the shape $[1, 10^d]$; for example, 1–100,000
for d = 5, and so on. In these sets, all first digits are equally probable a priori.
So if a pattern emerges in the first digit of primes in a set, it would reveal something about the first digit distribution of primes within that set. By looking at sets
as d increases, Luque and Lacasa thus investigated how the first digit distribution of primes changes as the dataset increases. They found that primes follow a
size-dependent Generalized Benford's Law (GBL), which describes the first digit
distribution of numbers in series that are generated by power law distributions,
such as $[1, 10^d]$. As d increases, the first digit distribution of primes becomes more
uniform.
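The trend toward uniformity can be observed directly on small intervals. The sketch below is not Luque and Lacasa's method, only a plain count: it sieves the primes in $[1, 10^d]$ and measures how far the first-digit shares are spread. The function names and the choice of "largest share minus smallest share" as a measure of non-uniformity are assumptions made for the example.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out every multiple of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(sieve) if flag]

def first_digit_shares(limit):
    """Share of primes <= limit whose first digit is 1, 2, ..., 9."""
    primes = primes_up_to(limit)
    counts = [0] * 10
    for p in primes:
        counts[int(str(p)[0])] += 1
    return [c / len(primes) for c in counts[1:]]

def spread(limit):
    """Largest minus smallest first-digit share (0 would be uniform)."""
    shares = first_digit_shares(limit)
    return max(shares) - min(shares)

for d in (2, 3, 5):
    shares = first_digit_shares(10 ** d)
    print(d, [round(s, 3) for s in shares], round(spread(10 ** d), 3))
```

A perfectly uniform distribution would give each digit a share of 1/9, or about 0.111; the printed shares move toward that value as d grows.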
Significantly, Luque and Lacasa showed that the GBL can be explained by
the prime number theorem; specifically, the shape of the mean local density of
the sequences is responsible for the pattern. The researchers also developed a
framework that provides conditions for any distribution to conform to a GBL. The
conditions build on previous research. Luque and Lacasa also investigated the sequence of nontrivial Riemann zeta zeros, which are related to the distribution of
primes, and whose own distribution is considered to be one of the most important unsolved mathematical problems. Although the distribution of the zeros
does not follow Benford's Law, here the researchers found that it does follow a size-dependent
GBL, as in the case of the primes.
This is a crucial, if unexpected, finding about primes that may lead to solving some of the most intractable problems in prime number theory, such as the
Riemann Hypothesis (Derbyshire 2004, Du Sautoy 2004, Sabbagh 2004, Wells
2005, Rockmore 2005). In 1859, Bernhard Riemann presented a paper to the Berlin
Academy titled "On the Number of Prime Numbers Less Than a Given Quantity"
in which he put forth a hypothesis that remains unproved to this day. Riemann
never provided a proof for his hypothesis and his housekeeper burnt all his personal papers on his death. It is a proof that is waiting to be made, so to speak,
even though it has already led to several significant discoveries in primality. On a
number line, the primes become scarcer and scarcer as the numbers on the line
grow larger: 25 percent of the numbers between 1 and 100, 17 percent
of the numbers between 1 and 1,000, and about 8 percent of the numbers between 1
and 1,000,000 are primes. Paul Erdős (1934) proved that there is at least one prime
number between any number greater than 1 and its double. For example, between
2 and its double 4 there is one prime, 3; between 11 and its double 22 there are three
primes, 13, 17, and 19. Riemann argued that the thinning out of primes involves an
infinite number of dips, called zeroes, on the line, and it is these zeroes that
encode all the information needed for testing primality. So far no vagrant zero
has been found, but at the same time no proof of the hypothesis has ever come
forward.
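The claim that a prime always lies between a number greater than 1 and its double can be spot-checked by brute force. The sketch below verifies it only for a finite range (here up to 2,000, an arbitrary limit), so it illustrates the statement rather than proving it; the function names are assumptions.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def primes_between(n):
    """Primes strictly between n and its double 2n."""
    return [k for k in range(n + 1, 2 * n) if is_prime(k)]

print(primes_between(2))    # the example with 2 and 4
print(primes_between(11))   # the example with 11 and 22

# Finite spot check: every n from 2 to 2000 has at least one such prime.
holds = all(primes_between(n) for n in range(2, 2001))
print("holds for 2..2000:", holds)
```

The two printed lists reproduce the examples given in the text.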
From previous work, Riemann knew that the density of primes around a
given number on the line, n, is approximately the reciprocal of the natural logarithm of that
number (the number of times we have to multiply e by itself to get a given number).
Riemann showed that at around one million, whose natural logarithm is about 14,
every 14th number or so is prime. At one billion, whose natural logarithm is about 21,
about every 21st number is prime. A pattern seems to jut out from such discoveries.
So, Riemann asked why primes were related to natural logarithms in this way. He
suspected that he might find a clue to his question in the series $1 + 1/2^s + 1/3^s +
1/4^s + \cdots + 1/n^s + \cdots$, now called the Riemann zeta function. For certain complex values of s
the zeta function equals zero. Proving the hypothesis means proving that every
nontrivial zero of this kind has a real part equal to 1/2. If the hypothesis
is right, then we will know how the primes thin out along the number line. So far
computers have been able to verify the hypothesis for the first 50 billion zeros. What
kind of proof would be involved in showing that it applies to all? Incredibly, the
zeta function is related to the energies of particles in atomic nuclei, to aspects of
the theory of relativity, and other natural phenomena.
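The thinning out of the primes, and its link to the natural logarithm, can be seen by counting. The sketch below counts primes with a sieve and prints the observed share next to the estimate 1/ln(n); it also sums the first terms of the zeta series for the real exponent s = 2, where the series is known to converge to π²/6. The function names are assumptions, and nothing here touches the complex zeros that the hypothesis is actually about.

```python
import math

def count_primes(limit):
    """Count the primes <= limit with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sum(sieve)

def zeta_partial(s, terms):
    """Sum of the first `terms` terms of 1 + 1/2**s + 1/3**s + ..."""
    return sum(1 / k ** s for k in range(1, terms + 1))

for limit in (100, 1_000, 1_000_000):
    share = count_primes(limit) / limit
    print(f"up to {limit}: {share:.2%} prime, 1/ln(n) = {1 / math.log(limit):.2%}")

print("zeta(2) partial sum:", zeta_partial(2, 100_000))
print("pi**2 / 6          :", math.pi ** 2 / 6)
```

The share of primes up to n is a little larger than 1/ln(n) because it averages over the denser stretch of small numbers; the estimate describes the density near n itself.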
What is remarkable is that a simple quantification phenomenon that crops
up in one domain morphs to another to provide insights into it. Clearly, statistical
techniques are indeed revelatory vis-à-vis the hidden properties of various phenomena.
ones is the coin toss problem which is worthwhile revisiting here for the sake of
argument and illustration. If a coin is to be tossed eight times in a row, there is
only one possible outcome of throwing all heads (H = head, T = tails):
H H H H H H H H Only possible outcome of eight heads thrown in a row
Another way to describe this outcome is to say that it consists of no tails. There
are, however, eight possible outcomes composed of seven heads and one tail.
These can be shown as follows:
H H H H H H H T One possible outcome of seven heads and one tail
H H H H H H T H A second possible outcome of seven heads and one tail
H H H H H T H H A third possible outcome of seven heads and one tail
H H H H T H H H A fourth possible outcome of seven heads and one tail
H H H T H H H H A fifth possible outcome of seven heads and one tail
H H T H H H H H A sixth possible outcome of seven heads and one tail
H T H H H H H H A seventh possible outcome of seven heads and one tail
T H H H H H H H An eighth possible outcome of seven heads and one tail
For six heads and two tails, there are 28 outcomes; for five heads and three tails,
there are 56 outcomes; and so on. Altogether, the total number of possible outcomes is:
1 + 8 + 28 + 56 + 70 + 56 + 28 + 8 + 1 = 256
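The counts in this sum are binomial coefficients: the number of ways to choose which of the eight tosses come up tails. The sketch below regenerates them with `math.comb` and converts each count to an exact probability; the variable names are assumptions.

```python
from fractions import Fraction
from math import comb

tosses = 8

# ways[k] = number of distinct sequences with exactly k tails
ways = [comb(tosses, tails) for tails in range(tosses + 1)]
total = sum(ways)

# Every sequence is equally likely, so probability = count / total.
probabilities = [Fraction(count, total) for count in ways]

print(ways, "sum =", total)
for tails, p in enumerate(probabilities):
    print(f"{tosses - tails} heads, {tails} tails: {p}")
```

The total equals 2 to the power 8, since each of the eight tosses has two possible results.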
So, the probability of getting all heads and no tails in eight tosses is 1/256; the
probability of seven heads and one tail is 8/256 = 1/32; the probability of six heads
and two tails is 28/256 = 7/64; and so on. In sum, calculating probabilities in various seemingly random phenomena allows us to detect pattern. It allows us to
sift the wheat from the chaff of randomness. It also shows that mathematics itself may have an intrinsic probability structure which, when applied to external
phenomena, seems to provide fascinating insights into them. Further excursions
into this world of probability will be taken below. In a sense, this type of analysis
brings out what can be called the "efficiency of events," by which is meant that
probability theory looks at how things become streamlined through a trial-and-error process that hides within it a denumerable probability system. This system
also brings out that there is a minimal, versus a maximal, way of doing things
and that events occur through one or the other, if no artificial interferences are
involved. This efficiency of events criterion shows up in two main ways:
1. It shows up in probability distributions which indicate that the path of least
resistance in a coin toss or in determining the likelihood of two birthdays being on the same day has a definite numerical structure that shows how one
can achieve something minimally.
2. It shows up in the way we do mathematics through compression (such as
exponential notation) which makes it more efficient yet, at the same time,
becomes the source of further mathematics.
phology because changes in phonology brought about the need for grammatical
reorganization (Clivio, Danesi, and Maida-Nicol 2011). Different devices emerged
in the Romance languages to maintain case distinctions; prepositions, for example, became necessary to distinguish case functions. This transfer of the burden
of meaning from morphological structure to syntactic order suggests that syntax
is a later development in language.
Not all meaning is preserved, however, in reorganization. Sometimes it leads
to expansion and, thus, to the discovery of new meanings. This happens not only
in language change, but also in other systems, including mathematics. For example, the use of superscripts in the representation of exponential numbers, which
was introduced in the Renaissance period, led serendipitously to the investigation
of new laws governing numbers, as already discussed.
The Principle of Economy is not, in itself, an explanatory theory of why change
occurs in the first place. Nor are its corollaries. To unravel the causes of change, ultimately one must resort to a theorization of the internal forces at work in change.
The explanatory framework under which such inquiry has been conducted is that
of the Principle of Least Effort (PLE), mentioned above. The PLE in language was
likely discovered by the French scholar Guillaume Ferrero in 1894, who articulated
it in an article that laid out previously undetected facts about natural phenomena. Zipf (1929, 1932, 1935, 1949) claimed that its operation was independent of
language and culture. As Van de Walle and Willems (2007: 756) write, Zipf saw language as a self-regulating structure evolving independently from other social
and cultural factors. The PLE is the likely reason why speakers minimize articulatory effort by shortening the length of words and utterances. Through reorganization this leads to change in grammar and vocabulary. The changes, however,
do not disrupt the overall system of language, since they continue to allow people
to interpret the meaning of words and utterances unambiguously and with least
effort or, in some cases, to find new meanings for them.
Initially, Zipf noticed that the length of a specific word (in number of phonemes) and its rank order in the language (its position in order of its frequency of
occurrence) were in a statistically inverse correlation: the higher the rank order of
a word, the shorter it tended to be (made up of fewer phonemes). Articles
(the), conjunctions (and, or), and other function words (to, it), which have a high
rank order in English (and in any other language for that matter), are typically
monosyllabic, consisting of 1–3 phonemes. What emerged as even more intriguing
was that abbreviation and acronymy were used regularly with longer words and
phrases that had gained general and diffuse currency. Modern examples include:
FYI, ad, photo, 24/7, aka, DNA, IQ, VIP, and so on. In some cases, the abbreviated
form eclipsed the full form: photo is now more frequent than photograph in common conversation, as is ad rather than advertisement. These tendencies are now
Unauthenticated
Download Date | 6/6/16 9:43 PM
212 | 4 Quantication
Figure 4.4: Zipfian curve of Joyce's Ulysses (log-log plot of Frequency against Rank Order, each axis running from 1 to 10,000)
Note that the slope of the curve is downward from left to right, approaching the
value of -1 (the straight line in the middle). This result emerges no matter what
type of text is used. Indeed, given a large enough corpus, the exact same type
of curve describes the rank order-frequency pattern in newspapers, textbooks,
recipe collections, and the like. The larger the corpus, the more the curve tends
towards the slope -1. The specific language also does not influence this result.
Indeed, Zipf used data from widely divergent languages and found this to be true
across the linguistic spectrum. Not only words, but also web page requests, document sizes on the web, and the babbling of babies have been found to fit the
Zipfian paradigm. If the different Zipfian curves are compared, they tend to show
the following shape in terms of a logarithmic (rather than linear) function:
The relation of word frequency ($p_n$) to rank order ($n$) was formalized by Zipf as
follows:
$$\log p_n = A - B \log n$$
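As an illustration of how the constants A and B in Zipf's formula might be estimated from a text, here is a minimal sketch in Python. The function names (`rank_frequency`, `fit_zipf`) and the simple word-splitting rule are assumptions made for this example, not part of Zipf's own procedure; the fit is an ordinary least-squares line through the log-log points.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return relative word frequencies, ordered from rank 1 downward."""
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


def fit_zipf(freqs):
    """Least-squares fit of log p_n = A - B log n; returns (A, B).

    Needs at least two ranks, otherwise the slope is undefined.
    """
    xs = [math.log10(n) for n in range(1, len(freqs) + 1)]
    ys = [math.log10(p) for p in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope  # B is the negated slope
```

On a sufficiently large corpus, a value of B close to 1 would correspond to the slope of -1 described in the text; short samples will usually deviate from it.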
Shortly after the publication of Zipf's research, the mathematician Benoit Mandelbrot (1954, 1983), who developed fractal geometry, became fascinated by its
implications. He detected in it a version of what is called a scaling law in biology.
As a brilliant mathematician, Mandelbrot also made appropriate modifications to
Zipf's original formula and, generally speaking, it is Mandelbrot's formula that is
used today to study frequency distribution phenomena:
$$f(k; N, q, s) = \frac{1/(k + q)^{s}}{H_{N,q,s}}$$
In this formula, $k$ is the rank of the data, and $q$ and $s$ are parameters of the distribution. $N$ is finite and $q = 0$. Finally, $H_{N,q,s}$ is as follows:
$$H_{N,q,s} = \sum_{i=1}^{N} \frac{1}{(i + q)^{s}}$$
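The formula can be checked numerically with a few lines of Python. This is only a sketch; the function names are chosen here for illustration. It shows that the values f(k; N, q, s) for k = 1, ..., N sum to 1, as a frequency distribution must.

```python
def generalized_harmonic(N, q, s):
    """H_{N,q,s}: the sum of 1 / (i + q)^s for i = 1..N."""
    return sum(1.0 / (i + q) ** s for i in range(1, N + 1))


def zipf_mandelbrot(k, N, q, s):
    """Relative frequency of the item of rank k under Mandelbrot's formula."""
    return (1.0 / (k + q) ** s) / generalized_harmonic(N, q, s)


# Example with q = 0 and s = 1 over three ranks:
# the frequencies are proportional to 1, 1/2, 1/3.
probabilities = [zipf_mandelbrot(k, N=3, q=0, s=1) for k in (1, 2, 3)]
```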
Since the mid-1950s, research in various disciplines has largely validated the Zipfian paradigm (Miller and Newman 1958, Wyllys 1975, Rousseau and Zhan 1992,
Li 1992, Ridley and Gonzalez 1994, Perline 1996, Nowak 2000). The most frequent
words are economical in form and they account for most of the actual constitution
of sizeable texts, with the first 15 ranked words accounting for 25 %, the first 100
for 60 %, the first 1,000 for 85 %, and the first 4,000 for 97.5 %. Remarkably, the operation of Zipfian patterns has been found to surface in various types of activities
and behaviors, from numeration patterns (Raimi 1969, Burke 1991, Hill 1998) to the
distribution of city populations. Perhaps the most relevant finding comes from the
Nielsen Norman Group, which examined the popularity of web sites using Zipfian
methodology. It found that the first-ranked page is the most popular one (the home page),
the second-ranked page is the one that receives the second-most requests, and so on. Other
studies have found that Zipfian curves characterize outgoing page requests:
there are a few pages that everybody looks at and a large number of pages that are
seen only once. The distribution of hypertext references on the web also appears
to manifest a Zipfian distribution.
In early research, Zipf did not bring meaning and cultural diversity into his
statistical analyses. However, when he did, he also found some fascinating patterns. For example, he discovered that, by and large, the number of words ($n$) in
a verbal lexicon or text was inversely proportional to the square of their meanings ($m$): $n \cdot m^2 = C$. Psycholinguist Roger Brown (1958) claimed that
Zipfian analysis could even be extended to explain the Whorfian concept of codability (Whorf 1956). This notion implies that speech communities encode the
concepts that they need, and that this determines the size and composition of their
vocabularies. If speakers of a language need many colors for social reasons (such
as clothing fashion), then they will develop more words for color concepts than do
the speakers of other languages. Codability extends to the grammar (verb tenses,
noun pluralization, and many others), which is a guide to a speech community's
organization of time and space. For instance, if planning ahead of time for future
events is not part of a community's need, then the verb system will either not have
a future tense-marking system, or else will use it minimally. Thus, vocabulary and
grammar reveal codability. Brown (1958: 235) put it as follows:
Zipf's Law bears on Whorf's thesis. Suppose we generalize the finding beyond Zipf's formulation and propose that the length of a verbal expression (codability) provides an index of
its frequency in speech, and that this, in turn, is an index of the frequency with which the
relevant judgments of difference and equivalence are made. If this is true, it would follow
that the Eskimo distinguishes his three kinds of snow more often than Americans do. Such
conclusions are, of course, supported by extralinguistic cultural analysis, which reveals the
importance of snow in the Eskimo life, of palm trees and parrots to Brazilian Indians, cattle
to the Wintu, and automobiles to the American.
This interpretation of Zipfian theory was critiqued by George Miller (1981: 107) as
follows: "Zipf's Law was once thought to reflect some deep psychobiological principle peculiar to the human mind. It has since been proved, however, that completely random processes can also show this statistical regularity." But a resurgence of interest in Zipfian analysis today suggests that it may have tapped into
something deep indeed, although some refinement or modification is needed
to guide the tapping. Recent work by Ferrer i Cancho (Ferrer i Cancho and Solé
2001, Ferrer i Cancho 2005, Ferrer i Cancho, Riordan, and Bollobás 2005), for instance, has shown that there are social reasons behind the operation of Zipf's
law. In other words, Zipf's law does not operate blindly but rather in response
to communicative and other pragmatic factors. When there are small shifts in the
effort expended by speaker or hearer, changes occur cumulatively because they
alter the entropy of the whole system. Interestingly, Zipf's law has been found in
other species. For example, McCowan, Hanser, and Doyle (1999) discovered that it
applies to dolphin communication which, like human language, had a slope of -1;
however, in squirrel monkeys it is -0.6, suggesting a simpler form of vocalization.
As Colin Cherry (1957: 103) pointed out a while back, Zipf understood the
relation between effort and language rather insightfully, unlike what his critics
believed:
When we set about a task, organizing our thoughts and actions, directing our efforts toward
some goal, we cannot always tell in advance what amount of work will actually accrue; we
are unable therefore to minimize it, either unconsciously or by careful planning. At best we
can predict the total likely work involved, as judged by our past experience. Our estimate of
the probable average rate of work required is what Zipf means by effort, and it is this, he
says, which we minimize.
In human affairs there are always two forces at work, Zipf asserted: a social force
(the need to be understood), which he called the Force of Unification, and the
personal force or the desire to be brief, which he called the Force of Diversification.
Clearly, therefore, the implications of Zipfian analysis go far beyond the simple
statistical study of how form (length of words) and frequency of usage correlate.
In a fundamental way, the overall consequence afforded by the work in Zipfian analysis is a specific realization of Gregory Bateson's aim, contained in his
Steps to an ecology of mind (1972), to understand the relation between form and
content, mind and nature, using scientific rather than speculative philosophical
theories. By showing a statistical correlation between the form of communication
and its usage, one will be on a more scientific footing in developing theories of
linguistic change.
level (such as morphemes, actual numbers) bear meaning or function. The lower-level
units have differential function, that is, they provide the minimal cues for
making distinctions at the higher level. The higher-level units have combinatory
function, since they are combinations from the set of units at the first level and
thus possess meaning in themselves. Double articulation does not seem to occur
in the signal systems of animals, making it a unique property of human systems.
Nöth (1990: 155) puts it as follows:
Among these features, double articulation most certainly does not occur in natural animal
communication systems. Most probably, not even the ape language Yerkish is decoded as
a system with double articulation. Some authors who ascribe the feature of double articulation to bird calls and other animal languages seem to take the mere segmentability of
acoustic signals for a level of second articulation. However, a prerequisite of a truly phonemic patterning is that the same minimal but meaningless elements are combined to form
new messages. When they are substituted for each other, the substitution results in a semantic difference. This type of patterning seems to be absent from animal communication
systems.
life phenomena. For example, we could find the largest rectangle that has a given
perimeter or the least dimensions of a carton that is to contain a given volume,
both of which are deemed to have efficiency features.
In a more general framework, efficiency is connected to economy and thus
compression. Perhaps the most salient manifestation of the relation between the
two is in the use of symbols. As Godino, Font, Wilhelmi, and Lurduy (2011: 254)
observe, compression via symbolization is a central aspect of mathematics:
If we consider, for example, the knowledge required to find the number of objects in a set,
it is necessary to use some verbal or symbolic tools, procedures, counting principles, etc.
Consequently, when an agent carries out and evaluates a mathematical practice, it activates
a configuration of objects formed by problems, languages, concepts, propositions, procedures, and arguments. The six types of primary entities postulated extend the traditional
distinction between conceptual and procedural knowledge when considering them insufficient to describe the intervening and emergent object in mathematical activity. The problems
are the origin or reason of being of the activity; the language represents the remaining entities and serves as an instrument for the action; the arguments justify the procedures and
propositions that relate the concepts to each other. The primary objects are related to each
other forming configurations, defined as the networks of intervening and emergent objects
from the systems of practices. These configurations can be socio-epistemic (networks of institutional objects) or cognitive (networks of personal objects).
Without notation, there would be no abstractions, theories, propositions, theorems, and so on in mathematics. There would be only counting and measuring
practices. As Steenrod, Halmos, and Dieudonné (1973) point out, notational systems are compressions of linguistic notions, and a mathematical system without
language would be indecipherable to the brain. This is why we have to explain
each and every math symbol in language, and when the notation leads to new
ideas, then those ideas have to be not only symbolized, but also explained with
language.
Much suspicion about the validity of stylometry existed until Donald Foster
brought the field into the spotlight with his 1996 study that correctly identified the
author of the pseudonymously authored book, Primary Colors, as Joe Klein (Foster
2001). This led to an upsurge in interest in corpus linguistics generally, and more
specifically in stylometry, among linguists, literary scholars, and others. Statistical studies of idiolect started to appear in the 2000s. A fascinating study was
carried out by James Pennebaker (2011). Studying the speeches of American presidents, Pennebaker found an inordinate use of the pronoun I in them, relative
to other speech styles and idiolects. The reason is, Pennebaker suggested, that
a president may unconsciously wish to personalize his commitment to specific
causes or issues through I-word use. He discovered, surprisingly, that President
Obama turned out to be the lowest I-word user of any of the modern presidents,
including Truman, who came in second in this regard. He did not interpret this,
however, as humility or insecurity on the part of Obama, but rather as its diametrical opposite (confidence and self-assurance). Pennebaker based this analysis
on his statistical finding that self-assured speakers used I less than others, although most people would assume the opposite. Low I-word use shows emotional distance from
a cause, not an emotional entanglement in it. In effect, Pennebaker suggests, function words (pronouns, articles, and the like) reveal more about idiolect than do
content words (nouns, adjectives, verbs). These words have an under-the-radar
furtiveness to them, constituting traces of personal identity in the everyday use of
language.
The finding also showed that social and emotional factors change style. The
profession of president is conducive to the use of a specific pronoun. The question becomes: Is it characteristic of other professions? Is it found in certain types
of individuals? These are the questions that a corpus linguistic approach would
attempt to answer. They have obvious implications for the study of style and for
the connection of discourse patterns to external influences.
Pennebaker's work falls under the rubric of stylometry (although this is not
mentioned explicitly as such in it). He started researching the connection between
language forms and personality by looking at thousands of diary entries written
by subjects suffering through traumas and depressions of various kinds. Today,
with social media sites such as Facebook and Twitter the potential sample size
of diaries has become enormous and can be used to carry out relevant stylometric analyses very effectively. Pennebaker discovered, for instance, that pronouns
were actually indicators of improvements in mental health in many subjects. A recovery from a trauma or a depression requires a form of perspective switching
that pronouns facilitate. They are linguistic symptoms revealing the inner life of
the psyche. The use of function words also correlates with age, gender, and class
differences. Younger people, women, and those from lower classes seem more
frequently to use pronouns and auxiliary verbs than do their counterparts. Lacking power, Pennebaker suggests, requires a more profound engagement with the
thoughts of others.
Perhaps the earliest example of the analysis of a text to determine its authenticity based on a stylistic analysis is that of Lorenzo Valla's 1439 proof that the
fourth-century document, Donation of Constantine, was a forgery. Valla based his
argument in part on the fact that the Latin used in the text was not consistent
with the language as it was written in fourth-century documents. Valla thus used
simple logical reasoning. This kind of reasoning can now be more accurate given
the statistical techniques that corpus linguistics makes available. The basic ones
were laid out for the first time by Polish philosopher Wincenty Lutosławski in
1890. Today, computer databases and algorithms are used to carry out the required
measurements.
With the growing corpus of texts on the Internet, stylometry is being used
more and more to study Internet texts and thus to refine its methods. The main
concept is that of writer invariant, a property of a text that is invariant in the author's idiolect. To identify this feature, the 50 most common words are identified
and the text is then broken into word chunks of 5,000 items. Each chunk is analyzed to
determine the frequency of the 50 words. This generates a unique 50-word identifier for each chunk.
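The procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `chunk_profiles`, the tokenization rule, and the use of relative frequencies are choices made for this example, not a description of any particular stylometric package.

```python
import re
from collections import Counter


def chunk_profiles(text, top_n=50, chunk_size=5000):
    """Return the top_n most common words and one frequency profile per chunk.

    Each profile lists, for every one of the top_n words, its relative
    frequency within that chunk of chunk_size words.
    """
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    common = [word for word, _ in Counter(words).most_common(top_n)]
    profiles = []
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        counts = Counter(chunk)
        profiles.append([counts[word] / len(chunk) for word in common])
    return common, profiles
```

Profiles from texts of known and disputed authorship could then be compared with any distance measure; which measure works best is an empirical question that this sketch does not settle.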
Perhaps the best known use of stylometric techniques is in the areas of forensic science and archeological-philological investigations of various kinds. Within
these fields the cognate technique of lexicometry is used, which consists simply of measuring the frequency of words within a text and then plotting the frequency
distribution of a given word in the speech of an individual, a specific genre of text,
and so on. This allows the analyst to determine how a lexical item is used and who
the probable user might be. Thus, lexicometry, like stylometry in general, is used
both as proof of identity and as a heuristic tool (Findler and Viil 1964).
A primary objective of corpus linguistics is to derive a set of general rules of
vocabulary use, sentence formation, and text construction on the basis of the automated analyses of language samples collected in natural speech environments.
Quirk's 1960 survey of English usage and Kučera and Francis's 1967 computational analysis of a carefully chosen corpus of American English, consisting of
nearly 1 million words, are early examples of this kind of analysis. One of the
first offshoots has been the preparation of dictionaries combining prescriptive information (how language should be used) and descriptive information (how it is
actually used).
Corpus linguistics has also produced several other research methods allowing for theoretical generalizations to be made on the basis of actual corpora of
data. Wallis and Nelson (2001) summarize the principles in terms of what they
call the 3A perspective: Annotation, Abstraction and Analysis. Annotation is the
application of a scheme to texts, such as a structural mark-up, parsing, and other
such rule-based frames; abstraction involves generating a mapping of the data
against the model or scheme used; and analysis is the statistical generalization
of the data in order to determine what models work best. In effect, corpus linguistics has become an important branch of linguistics for validating whether certain
features or patterns in speech samples are relevant to explicating structural and
semantic aspects of a language, in addition to idiolectal characteristics. This adds
a significant empirical component to linguistic theories and models.
hardly a mere stylistic option to literal language. They found, overall, that people
used 1.80 novel and 4.08 frozen metaphors per minute of discourse, for a total
of 5.88 metaphors per minute of speech. These findings came from
transcripts of psychotherapeutic interviews, various essays, and even the 1960
Kennedy-Nixon presidential debates.
Graesser, Mio, and Millis (1989) analyzed the use of metaphor in six TV debates and news programs on the PBS MacNeil/Lehrer NewsHour. They counted a total
of 504 unique metaphors in the six debates (repetitions were not counted), which
totaled 12,580 words; 12,580 divided by 504 is 24.96, hence an approximate rate
of one unique metaphor every 25 words. Steen et al. (2010) examined patterns
of metaphor usage in various kinds of discourse using techniques of corpus linguistics, finding that on average one in every seven and a half words is related to
metaphor (Steen et al. 2010: 780).
From these studies has come an impetus for developing algorithms to detect
metaphor in speech and to generate metaphorical discourse, not interpret it as discussed in the previous chapter (for example, Steen 2006, Reining and Lönneker-Rodman 2007, Shutova 2010; see relevant studies in Diamantaras, Duch, and
Iliadis 2010). This has led some to put forth a neural theory of metaphor based on
several psycholinguistic and computational studies (for example, Feldman 2006).
Essentially, the extraction of metaphor from texts, as well as its computational
modeling, involves three steps: establishing a probabilistic relationship between concepts and
words via a statistical analysis of language data, constructing the relevant algorithm, and, finally, obtaining a third-party rating of the metaphors the model generated. This type of research was discussed in the previous chapter. The point here
is that it is still ongoing and can fall under several branches, including, and especially, corpus linguistics.
With the advent of social media, the research focus has started to shift towards the use of figurative language in these media. Nguyen, Nguyen, and Hwang
(2015) used a statistical method for the analysis of figurative language in tweets,
determining if they were sarcastic, ironic, or metaphorical tweets by extracting
two main features (actual term features and emotion patterns). Their study used
two datasets, the Trial set (1,000 tweets) and the Test set (4,000 tweets). Performance was evaluated by cosine similarity to gold standard annotations. These
are trustworthy corpora that are critical for evaluating algorithms that use annotations. Their proposed method achieved 0.74 on the Trial set. On the Test set, they
achieved 0.90 on sarcastic tweets and 0.89 on ironic tweets. This is a remarkable
finding, showing that in social media, metaphor, especially in its ironic forms, is
very dense.
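For readers unfamiliar with the evaluation measure mentioned above, cosine similarity compares two lists of scores by the angle between them when they are treated as vectors. The following Python sketch is only illustrative; the function name is an assumption of this example, and it does not reproduce the data or the exact scoring setup of the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)
```

A value of 1 means the system's scores point in exactly the same direction as the gold standard annotations, while a value of 0 means that the two are unrelated in this geometric sense.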
Overall, the statistics on metaphor corroborate that metaphor is not an exception to literal language, but a common feature (indeed, a major feature) of
discourse. The point here, again, is that corpus linguistics in collaboration with
computational linguistics is useful in corroborating or refuting the theories of linguists.
$$\frac{N!}{x!\,(N - x)!}\,\theta^{x}\,(1 - \theta)^{N - x}$$
This distribution is based on the existence of equal outcomes. On the other hand,
the Bernoulli distribution describes the tossing of a biased coin (and similar experiments with unequal probabilities). The two possible outcomes are $n = 0$ (failure)
and $n = 1$ (success), in which the latter occurs with probability $P$ and the former
with probability $Q = 1 - P$, with $0 < P < 1$. It has the following probability mass function:
$$P(n) = P^{n} (1 - P)^{1 - n}$$
The Bernoulli distribution is the simplest discrete distribution, and it is the building block for other more complicated discrete distributions. The distributions of
a number of such types based on sequences of independent Bernoulli trials are
summarized in the following table (Evans, Hastings, and Peacock 2000: 32).
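Table 4.2: Probability distributions
Distribution                      Definition
binomial distribution             number of successes in n trials
geometric distribution            number of failures before the first success
negative binomial distribution    number of failures before the xth success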
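The Bernoulli and binomial distributions described above can be evaluated directly in Python. The sketch below is illustrative only: the function names are chosen for this example, and `p` stands for the probability of success on a single trial.

```python
from math import comb


def bernoulli_pmf(n, p):
    """Probability of outcome n (0 = failure, 1 = success) for one trial."""
    if n not in (0, 1):
        raise ValueError("a Bernoulli outcome must be 0 or 1")
    return p ** n * (1 - p) ** (1 - n)


def binomial_pmf(x, N, p):
    """Probability of exactly x successes in N independent Bernoulli trials."""
    return comb(N, x) * p ** x * (1 - p) ** (N - x)


# A fair coin tossed twice: exactly one head has probability 1/2.
one_head_in_two = binomial_pmf(1, 2, 0.5)
```

Summing `binomial_pmf(x, N, p)` over all x from 0 to N gives 1, which is a quick check that the formula defines a genuine distribution.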
There are many other types of distributions that need not concern us here. The
point to be made is that probability distributions both describe and analyze
random events with equal and unequal elements involved. In other words, they
unravel hidden quantitative structure in randomness. Probability considerations
have also been applied to three areas that are relevant to the discussion here since
they, too, reveal different angles from which to view mathematical probabilities
and thus provide insights into mathematics and its description of the world.
The three are: the Monty Hall Problem, the Prosecutor's Fallacy, and Bayesian
Inference.
For the sake of historical accuracy, it should be mentioned that the MHP was
similar to the three prisoners problem devised by Martin Gardner in 1959 (see
Gardner 1961). Of course, playing by the rules of probability may mean nothing
if one loses, that is, finds himself or herself at a wrong point in the probability
curve. However, knowing about the existence of the curve leads to many more insights into the nature of real events than so-called common sense. The MHP has
various implications that reach right into the power of probability theory to unravel hidden structure. Our assumption that two choices mean 50-50 chances
is true when we know nothing about either choice. If we picked any coin, then the
chances of getting a head or tail are, of course, 50-50. But information is what
matters here and changes the game.
The MHP brings out the principle that the more we know, the better our decision will be. If the number of doors in the MHP were 100, this becomes even
clearer, as we saw. As Monty starts eliminating the bad candidates (in the 99 that
were not chosen), he shifts the focus away from the bad doors to the good ones
more and more. After Monty's filtering, we are left with the original door and the
other door. In effect, the information provided by Monty does not improve the
chances of our original door; it improves those of the other door. Here is where Bayesian Inference (BI) comes into play, which will be discussed below. BI allows us to generalize the MHP as follows, since it allows us to
re-evaluate probabilities as new information is added. The probability of choosing
the desired door improves as we get more information. Without any evidence, two
choices are equally likely. As we gather additional evidence (and run more trials),
we can increase our confidence that A or B is correct. In sum:
1. Two choices are 50-50 when we know nothing about them.
2. Monty helps by filtering the bad choices on the other side.
3. In general, the more information the more the possibility of re-evaluating our
choices.
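The three points above can be checked by simulation. The following Python sketch plays the game many times with and without switching; the function names, the number of trials, and the fixed random seed are assumptions made for the sake of a reproducible illustration.

```python
import random


def play(switch, n_doors=3, rng=random):
    """Play one round; return True if the contestant wins the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    # Monty opens every door except the pick and one other door,
    # and he never opens the door hiding the car.
    if pick == car:
        other = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        other = car
    final = other if switch else pick
    return final == car


def win_rate(switch, n_doors=3, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy by repeated play."""
    rng = random.Random(seed)
    wins = sum(play(switch, n_doors, rng) for _ in range(trials))
    return wins / trials
```

With three doors, `win_rate(False)` should come out near 1/3 and `win_rate(True)` near 2/3; with `n_doors=100`, switching should win in roughly 99 rounds out of 100, which is the 100-door intuition mentioned in the text.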
The MHP makes us realize how subsequent information can challenge previous
decisions. The whole scenario can be summarized with the main theorem in BI,
which is as follows:
The conditional probability of each of a set of possible causes for a given observed
outcome can be computed from knowledge of the probability of each cause and
the conditional probability of the outcome of each cause.
This is indeed fallacious reasoning. Consider a larger sample. In a city of, say,
2 million people, the number with matching hair samples will be 1/2,000 ×
2,000,000 = 1,000. Now, the probability of the suspect being guilty is a mere
1/1,000. The PF was first formulated by William Thompson and Edward Schumann
in 1987. They showed how real people in court situations made this mistake, including at least one prosecuting attorney. Thompson and Schumann also examined
the counterpart to the PF, which they called the Defense Attorney's Fallacy. The
defense attorney might argue that the hair evidence is worthless because it
increases the probability of the defendant's guilt by only a small amount, 1/1,000,
especially when compared to the overall pool of potential suspects (2,000,000).
However, the hair sample is normally not the only evidence, and thus together
with the other evidence it might indeed point towards the suspect.
The key here is, again, that the reasoning involves BI (discussed in the next
section). The fallacy lies in confusing P(E|I) with P(I|E), whereby E = evidence,
I = innocence. If the former is very high, people commonly assume that P(I|E) must
also be high. P(E|I) is the probability that the incriminating evidence would be
observed even when the accused is innocent, known as a false positive; and P(I|E)
is the probability that the accused is innocent, despite the evidence E. The fallacy
thus warns us that in the real world probability considerations are to be taken at
their face value and that they can provide true insights into situations.
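The arithmetic behind the fallacy can be laid out explicitly. The Python sketch below uses the figures given above (a match rate of 1/2,000 in a city of 2,000,000) and makes two simplifying assumptions that should be kept in mind: exactly one person in the city is guilty, and the hair sample is the only evidence considered. The function names are chosen for this illustration.

```python
from fractions import Fraction


def expected_matches(population, match_rate):
    """Number of people expected to match the hair sample."""
    return population * match_rate


def prob_guilty_given_match(population, match_rate):
    """P(guilt | match), assuming one guilty person among all who match."""
    return 1 / expected_matches(population, match_rate)


def prob_innocent_given_match(population, match_rate):
    """P(I|E): probability of innocence despite the matching evidence."""
    return 1 - prob_guilty_given_match(population, match_rate)


match_rate = Fraction(1, 2000)        # P(E|I): an innocent person matches
p_innocent = prob_innocent_given_match(2_000_000, match_rate)
```

Here P(E|I) is 1/2,000, while P(I|E) works out to 999/1,000 under these assumptions. Treating the first number as if it were the complement of the second is precisely the confusion that the fallacy names.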
$$P(X \mid Y) = \frac{P(X \,\&\, Y)}{P(Y)}, \qquad P(Y) \neq 0$$
So, X = house that has cable and Y = house that has broadband. Given the percentages expressed in the problem, the answer is: P(X|Y) = 0.06/0.48 = 0.125 or
12.5 %. This analysis allows probabilities to be updated as events change. It is
called Bayesian Inference, after the Reverend Thomas Bayes, whose formulation (published in 1763) is as follows:
$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$$
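The cable and broadband calculation, and its relation to Bayes's formula, can be verified with exact fractions. In the Python sketch below, the joint probability 0.06 and P(Y) = 0.48 are the figures used in the text; the value P(X) = 0.25 is purely hypothetical and is introduced only to show that the two routes to P(X|Y) agree.

```python
from fractions import Fraction


def conditional(p_joint, p_condition):
    """P(X|Y) = P(X & Y) / P(Y), defined only when P(Y) is not zero."""
    if p_condition == 0:
        raise ValueError("the conditioning event must have nonzero probability")
    return p_joint / p_condition


def bayes(p_y_given_x, p_x, p_y):
    """P(X|Y) = P(Y|X) P(X) / P(Y)."""
    return p_y_given_x * p_x / p_y


p_joint = Fraction(6, 100)   # P(X & Y), from the text
p_y = Fraction(48, 100)      # P(Y), from the text
p_x = Fraction(25, 100)      # P(X): hypothetical value for illustration

direct = conditional(p_joint, p_y)
via_bayes = bayes(conditional(p_joint, p_x), p_x, p_y)
```

Both `direct` and `via_bayes` equal 1/8, that is, 12.5 %, whatever nonzero value is assumed for P(X).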
BI has become part of QM and has been used, for example, to help solve the MHP
and Prosecutor's Fallacy problems above, among many other very complex problems. Rather than use the closed reasoning system of formal logic, mathematics
has developed a more comprehensive approach to problems with Bayesian probabilistic reasoning. There are several ways to write the Bayesian formula, as follows, which can be used to shed light, for example, on the MHP:
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
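Suppose the contestant chooses door A. Monty then has the choice of B and C to open.
Writing $M_B$ for the event that Monty opens door B, and A, B, and C for the events that the car is behind the corresponding door, this can now be represented as follows:
$$P(M_B \mid A) = \frac{1}{2}, \qquad P(M_B \mid B) = 0, \qquad P(M_B \mid C) = 1$$
Plugging these into the Bayesian formula, we get:
$$P(M_B) = P(M_B \mid A)\,P(A) + P(M_B \mid B)\,P(B) + P(M_B \mid C)\,P(C) = \frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} = \frac{1}{2}$$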
The contestant can stick with his or her choice or switch to another door. If he or
she keeps door A, the probability of winning the car is as follows:
$$P(A \mid M_B) = \frac{P(M_B \mid A)\,P(A)}{P(M_B)} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{1}{3}$$
If the contestant switches to door C, then the probability of finding the car becomes:
$$P(C \mid M_B) = \frac{P(M_B \mid C)\,P(C)}{P(M_B)} = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}$$
So, Bayesian Inference makes it clear why the answer is what it is. Now, what
does this all imply? Basically, that some events have a Bayesian structure, and this
means that they involve both chance (uncertainty) and external intervention.
Mathematics has thus formalized a situation that typifies a whole stretch of real
living that we grasp intuitively.
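The Bayesian calculation for the MHP can be reproduced exactly with fractions. In the Python sketch below, the priors and likelihoods are those of the scenario discussed above (the contestant has chosen door A and Monty opens door B); the dictionary layout and the function name are choices made for this illustration.

```python
from fractions import Fraction

# Prior probability that the car is behind each door.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Probability that Monty opens door B, given where the car is,
# when the contestant has picked door A.
likelihood = {"A": Fraction(1, 2), "B": Fraction(0), "C": Fraction(1)}


def posterior(prior, likelihood):
    """Apply Bayes's formula to every hypothesis at once."""
    evidence = sum(likelihood[door] * prior[door] for door in prior)
    return {door: likelihood[door] * prior[door] / evidence for door in prior}


updated = posterior(prior, likelihood)
```

The result assigns 1/3 to door A and 2/3 to door C, so the contestant who switches doubles the chance of winning.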
the event occurs. This, as we saw, is especially well suited to those dilemmas, illustrated by the MHP and the PF, which suggest that BI models are the most suitable
ones.
To elaborate on this point, let's return to Benford's Law. The law has, as discussed, logarithmic structure. In effect, Newcomb and Benford found that in a
large sample, the first digit, $d$, obeys the following frequency law (Barrett 2014:
188):
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, 3, \ldots, 9$$
The relevant probabilities are as follows:
P(1) = 0.30
P(2) = 0.18
P(3) = 0.12
P(4) = 0.10
P(5) = 0.08
P(6) = 0.07
P(7) = 0.06
P(8) = 0.05
P(9) = 0.05
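These values follow directly from the formula, as the short Python sketch below shows. The function name is chosen for this illustration.

```python
import math


def benford(d):
    """Probability that the first digit is d (1 to 9) under Benford's Law."""
    return math.log10(1 + 1 / d)


# Probabilities for the digits 1 to 9, rounded to two decimals as in the text.
table = {d: round(benford(d), 2) for d in range(1, 10)}
```

Before rounding, the nine probabilities sum to exactly 1, since the product of the terms (1 + 1/d) for d = 1 to 9 is 10.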
This shows that the digit 1 is the most likely to occur. Does this pattern apply correspondingly to language, that is, to the frequency of first letters? I applied the
formula to a series of texts in Italian, using a simple concordance algorithm, and
found striking similarity, whereby the letter p has a 35 % chance of being the first
letter in a word within a large-sized sample. I know of no work investigating this
possibility formally in Italian. But even anecdotal assessments, such as counting the letters that start words in a dictionary, seem to conform to the law. This
may hint at something deeper both within mathematics and language and their
connection to the real world.
Actually, it was Andrey Markov who ventured into this territory in 1913. He
wanted to determine whether he could characterize a writer's style by the statistics
of the sequences of letters that he or she used. Barrett (2014: 237-238) describes
Markov's intriguing experiment as follows:
Markov looked at an extract from Pushkin of 20,000 (Russian) letters which contained the
entire first chapter and part of the second chapter of a prose poem, with its characteristic rhyming patterns ... Markov simplified Pushkin's text by ignoring all punctuation marks
and word breaks and looked at the correlations of successive letters according to whether
they were vowels (V) or consonants (C). He did this rather laboriously by hand (no computers then!) and totaled 8,638 vowels and 11,362 consonants. Next, he was interested in
the transitions between successive letters: investigating the frequencies with which vowels
and consonants are adjacent in the patterns VV, VC, CV or CC. He finds 1,104 examples of
VV, 7,534 of VC and CV and 3,827 of CC. These numbers are interesting because if consonants
and vowels had appeared randomly according to their total numbers we ought to have found
3,033 of VV, 4,755 of VC and CV and 7,457 of CC. Not surprisingly, Pushkin does not write at
random. The probability VV or CC is very different from VC and this reflects the fact that language is primarily spoken rather than written and adjacent vowels and consonants make for
clear vocalization. But Markov could quantify the degree to which Pushkin's writing is nonrandom and compare its use of vowels and consonants with that of other writers. If Pushkin's
text were random then the probability that any letter is a vowel is 8,638/20,000 = 0.43 and
that it is a consonant is 11,362/20,000 = 0.57. If successive letters are randomly placed then
the probability sequence VV being found would be 0.43 × 0.43 = 0.185 and so 19,999 pairs of
letters would contain 19,999 × 0.185 = 3,720 pairs. Pushkin's text contained only 1,104. The
probability of CC is 0.57 × 0.57 = 0.325. And the probability of a sequence consisting of one
vowel and one consonant, CV or VC, is 2 × (0.43 × 0.57) = 0.490.
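The independence calculation in the closing part of the passage can be retraced in a few lines. The sketch below uses only the letter totals reported in the quotation; the function name `expected_pair_counts` is hypothetical. Because it works with unrounded probabilities, its figures differ slightly from those obtained in the passage with the rounded values 0.43 and 0.57.

```python
# Letter totals reported in the quoted passage (Barrett 2014).
VOWELS, CONSONANTS = 8_638, 11_362
TOTAL = VOWELS + CONSONANTS      # 20,000 letters
PAIRS = TOTAL - 1                # 19,999 adjacent letter pairs

p_v = VOWELS / TOTAL             # probability that a letter is a vowel
p_c = CONSONANTS / TOTAL         # probability that a letter is a consonant

def expected_pair_counts(p_v: float, p_c: float, pairs: int) -> dict:
    """Expected pair counts if successive letters were placed independently."""
    return {
        "VV": pairs * p_v * p_v,
        "CC": pairs * p_c * p_c,
        "VC or CV": pairs * 2 * p_v * p_c,
    }

expected = expected_pair_counts(p_v, p_c, PAIRS)
observed_vv = 1_104              # VV pairs Markov counted by hand

for pattern, count in expected.items():
    print(f"{pattern}: about {count:.0f} pairs expected under independence")
print(f"VV observed: {observed_vv}")
```

The gap between the expected and the observed number of VV pairs is the kind of quantity that lets the non-randomness of a text be measured.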
Leaving aside the fact that the results could pertain only to the Russian language,
the finding is still remarkable. The implication is that factors such as personal
style, genre, and meaning have an effect on form and structure, and
that this effect can be determined probabilistically. This raises a fundamental question: Why
are numbers and letters not evenly distributed in texts and lists? Moreover, why is
the distribution scale-invariant, that is, measurable with different units?
Markov's idea has been taken up within QM and the results have been very
interesting. It is now known as a statistical language model, which assigns a probability to a sequence of n words, w, using a probability distribution:
$P(w_1, w_2, w_3, \ldots, w_n)$
The idea is then to estimate the probability of certain words, letters, expressions,
and so on in different kinds of texts. This has had, as we saw in the previous chapter, various applications to NLP study. For example, in speech recognition, the
algorithm attempts to match sounds with word sequences, given instructions for
distinguishing homophones and synonymous forms. In information retrieval, moreover, texts are ranked
by the probability of the query Q given each text.
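One common way to estimate such a probability is to factor it, Markov-style, into a chain of conditional probabilities in which each word depends only on the word before it (a bigram model). The sketch below is an illustration under that assumption, not the specific model used in any of the studies cited; the toy corpus and the function names are invented for the example.

```python
from collections import Counter

def train_bigram_model(tokens):
    """Return unigram and bigram counts for a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sequence_probability(sequence, unigrams, bigrams):
    """Approximate P(w1, ..., wn) as P(w1) times the product of P(wi | wi-1)."""
    if not sequence:
        return 1.0
    total = sum(unigrams.values())
    prob = unigrams[sequence[0]] / total
    for prev, word in zip(sequence, sequence[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat the cat ate".split()
unigrams, bigrams = train_bigram_model(corpus)
print(sequence_probability(["the", "cat", "sat"], unigrams, bigrams))
```

A sequence containing a word pair never seen in the corpus receives probability zero here; practical models smooth the counts to avoid this.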
This line of research has recently been employed in cryptography and is
called, generally, frequency analysis (FA). The basis of FA is the observation that
the letters of the alphabet are not equally common (as discussed for Italian above).
The following frequency patterns have been noted across large samples of English
texts (Elwes 2014: 345) (see Table 4.3).
Applying FA to texts thus allows us to identify, within a range of probability,
language affinities at the level of phonemic-morphemic structure, in a way
analogous to how Benford's Law allows us to identify number structure.
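Table 4.3 (letter frequencies in large samples of English texts, in percent)

Letter   Frequency (%)     Letter   Frequency (%)
e        12.7              m        2.4
t        9.1               w        2.4
a        8.2               f        2.2
o        7.5               g        2.0
i        7.0               y        2.0
n        6.7               p        1.9
s        6.3               b        1.5
h        6.1               v        1.0
r        6.0               k        0.8
d        4.3               j        0.2
l        4.0               x        0.2
u        2.8               q        0.1
c        2.8               z        0.1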
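A frequency table of this kind is straightforward to compute for any text. The following sketch counts only the 26 unaccented letters and ignores case; both choices, like the function name `letter_frequencies` and the sample sentence, are simplifying assumptions made for illustration.

```python
from collections import Counter
import string

def letter_frequencies(text: str) -> dict:
    """Percentage frequency of each ASCII letter in text, ignoring case."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the letters from most to least frequent.
    return {letter: 100 * n / total for letter, n in counts.most_common()}

sample = "The quick brown fox jumps over the lazy dog and the sleepy cat"
for letter, pct in list(letter_frequencies(sample).items())[:5]:
    print(f"{letter}: {pct:.1f}%")
```

A short sample such as this one will not reproduce the published percentages; the regularities emerge only in large samples.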
The fact that logarithmic laws can be extracted from seemingly random data is
a truly remarkable finding. Probability theory has categorized events into three
classes:
1. Independent: each event is not affected by other events
2. Dependent or Conditional: an event is affected by other events
3. Mutually Exclusive: events cannot occur at the same time
Independent events, such as coin tosses, indicate that the elements of the events
do not "know" the outcome (so to speak). Each coin toss is an isolated event. If we
toss a coin three times and it comes up tails each time, what is the chance of the
next one being a head or a tail? Well, it is 1/2 or 0.50, just like any other toss event.
There is no link between the current coin toss and the previous ones. Independent
events occur throughout Nature and human systems. Connecting them (that is,
giving them meaning) is a human activity, not a probabilistic one. The kind of
probability law that applies to this kind of situation can be called, simply, probability I (for independent), or PI.
PI explains why the so-called gambler's fallacy is indeed fallacious. Basically, the fallacy asserts that since we have had three tails, a head as the next outcome is
"due" and therefore likely to occur with the next coin toss. But, as the PI suggests,
this is not true. As Elwes (2014: 341) elaborates: "The error is that this law makes
probabilistic predictions about average behaviour, over the long term. It makes no
predictions about the results of individual experiments."
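The independence claim can be checked by brute force. The sketch below, which assumes a fair coin, lists all sixteen equally likely sequences of four tosses and asks what fraction of those beginning with three tails continue with a head; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def prob_heads_after_three_tails() -> Fraction:
    """P(4th toss is H | first three tosses are T) for a fair coin."""
    outcomes = list(product("HT", repeat=4))   # 16 equally likely sequences
    after_ttt = [o for o in outcomes if o[:3] == ("T", "T", "T")]
    heads_next = [o for o in after_ttt if o[3] == "H"]
    return Fraction(len(heads_next), len(after_ttt))

print(prob_heads_after_three_tails())  # prints 1/2
```

Only two sequences start with three tails (TTTH and TTTT), and exactly one of them continues with a head, so the run of tails leaves the next toss untouched.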
$$\ln x = \int_1^x \frac{dt}{t}, \quad \text{for } x > 0.$$
This means that e is the unique number with the property that the area of the
region bounded by the hyperbola y = 1/x and the x-axis, and the vertical lines
x = 1 and x = e is 1:
$$\int_1^e \frac{dx}{x} = \ln e = 1.$$
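The area property can be verified numerically. The sketch below approximates the area under y = 1/x with the midpoint rule; the number of steps and the function name are arbitrary choices made for illustration.

```python
import math

def integral_of_reciprocal(upper: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of 1/t from 1 to upper."""
    if upper <= 0:
        raise ValueError("upper limit must be positive")
    width = (upper - 1) / steps
    # Sum the areas of thin rectangles whose heights are 1/t at each midpoint.
    return sum(width / (1 + (i + 0.5) * width) for i in range(steps))

print(integral_of_reciprocal(math.e))                 # close to 1
print(integral_of_reciprocal(10.0), math.log(10.0))   # the two values nearly agree
```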
Choose a point on a graph at the beginning. What is the probability that a random
walker will reach it eventually? Or: What is the probability that the walker will
return to his starting point?
Pólya proved that the answer is 1, making it a virtual certainty. He called it a
1-dimensional outcome. But in higher dimensions this is not the case. A random
walker on a 3-dimensional lattice, for instance, has a much lower chance of returning to the starting point (P = 0.34). This brings us back to the Markov chain
as a relevant model. Say that at any stage of a random walk you flip a coin to decide in which direction to go next. In this case the type of analysis involved is of
the PI variety. The defining characteristic of a Markov chain is that the probability distribution at each stage depends only on the present, not the past. Markov
chains are thus perfect models for random walks and random events. The following figure (from Wikipedia) shows a walk whereby a marker is placed at zero on the
number line and a coin is flipped: if it lands on heads (H) the marker is moved one
unit to the right (+1); if it lands on tails (T), it is moved one unit to the left (-1). After five flips, there
are 10 ways of landing on +1 (by 3H and 2T), 10 ways of landing on -1 (2H and 3T),
5 ways of landing on +3 (4H and 1T), 5 ways of landing on -3 (1H and 4T), 1 way of
landing on +5 (5H), and 1 way of landing on -5 (5T) (see Figure 4.7).
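The counts for the five-flip walk can be reproduced by listing all 32 sequences of heads and tails and recording where each one ends. This is a minimal sketch; the function name `landing_counts` is an illustrative choice.

```python
from collections import Counter
from itertools import product

def landing_counts(flips: int) -> Counter:
    """Count how many H/T sequences of the given length end at each position."""
    counts = Counter()
    for sequence in product("HT", repeat=flips):
        # Heads moves the marker one unit right, tails one unit left.
        position = sum(1 if flip == "H" else -1 for flip in sequence)
        counts[position] += 1
    return counts

counts = landing_counts(5)
for position in sorted(counts, reverse=True):
    print(f"{position:+d}: {counts[position]} ways")
```

The resulting counts (1, 5, 10, 10, 5, 1) are the binomial coefficients for five trials, which is why the walk is most likely to end near its starting point.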
In sum, probability constructs are much more than devices for determining
gambling outcomes. They appear to penetrate the structure of many events. These
are interconnected with Markov models that have formed the basis of formalism
in both mathematics and linguistics (previous chapter), thus bringing out the usefulness of the constructs, even if in constrained ways. As Elwes (2014: 342) points
out:
Markov chains are an excellent framework for modeling many phenomena, including population dynamics and stock-market fluctuations. To determine the eventual behaviour of a
Markov process is a deep problem, as Pólya's 3-dimensional random walk illustrates.
Figure 4.7: Markov chain analysis of the random walk problem (from Wikipedia)
[Tree diagram: the 32 possible outcomes of five successive coin flips, from HHHHH to TTTTT, each shown with the position on which the marker lands.]
developed the concept of time depth, which became the founding technique in
glottochronology or lexicostatistics. Although there is a difference between the two
today (with lexicostatistics used more generally for the measurement of inherent
tendencies in vocabularies and glottochronology for measuring the diversification
of related languages over time), for the present purposes it is sufficient to say that
they originated with the same purpose, namely to search for statistical regularity
in vocabulary systems and rates of replacement. Swadesh divided the origin and
evolution of language into four primary periods, in synchrony with the major ages:
1.
2.
3.
4.
2. Culturally-biased words, such as the names of specific kinds of plants or animals, are
to be included in the core vocabulary only if relevant to the analysis at hand.
3. The number of cognates in the core vocabulary can be used to measure time depth,
allowing for sound shifts and variation. The lower the number of cognates, the longer
the languages are deemed to have been separated. Two languages that can be shown to
have 60 % of the cognates in common are said to have diverged before two which have,
instead, 80 % in common.
In 1953, Robert Lees modified Swadesh's formula for estimating time depth. Lees
assumed that the rate of loss in basic core vocabularies was constant. Allowing
for extraneous factors and interferences such as borrowing and social interventions (the maintenance of certain words for ritualistic reasons), Lees claimed that
the time depth, t, could be estimated within a normal probability distribution and
that it was equal to the logarithm of the percentage of cognates, c, divided by twice
the logarithm of the percentage of cognates retained after a millennium of separation, r:
$$t = \frac{\log c}{2 \log r}$$
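To see how the formula behaves, it can be applied to the 60 % and 80 % cases mentioned earlier. In the sketch below, c and r are entered as proportions rather than percentages, t is read in millennia, and the retention rate of 0.8 is an illustrative assumption rather than a value proposed by Lees; the function name is likewise hypothetical.

```python
import math

def lees_time_depth(c: float, r: float) -> float:
    """Time depth t (in millennia) from t = log c / (2 log r).

    c: proportion of cognates shared by the two languages (0 < c <= 1)
    r: proportion of cognates retained after a millennium (0 < r < 1)
    """
    if not 0 < c <= 1 or not 0 < r < 1:
        raise ValueError("c must lie in (0, 1] and r in (0, 1)")
    return math.log(c) / (2 * math.log(r))

RETENTION = 0.8  # illustrative assumption, not a value taken from Lees

print(lees_time_depth(0.60, RETENTION))  # fewer shared cognates: longer separation
print(lees_time_depth(0.80, RETENTION))  # more shared cognates: shorter separation
```

The factor of 2 in the denominator reflects the fact that both languages have been changing since the split: if each retains a proportion r per millennium, the shared proportion after one millennium is r squared, which gives t = 1.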
As in virtually all the cases discussed so far in this chapter, the key notion is,
again, that of logarithm. Although well known (and brought up frequently in this
chapter), it is worthwhile revisiting this key concept, since it is one of the main
ones in the development of probability theories, appearing, as we have seen, constantly in quantitative analyses of all kinds.
In mathematics a logarithm is the power to which a base, usually 10, must be
raised to produce a given number. If $n^x = a$, the logarithm of a, with n as the base,
is x; symbolically, $\log_n a = x$. For example, $10^3 = 1{,}000$; therefore, $\log_{10} 1{,}000 = 3$. To
get a sense of how Lees developed his formula, an analogy might be useful. Suppose we wanted to calculate the number of ancestors in any previous generation.
We have 2 parents, so we have 2 ancestors in the first generation. This calculation
can be expressed as $2^1 = 2$. Each of our parents has 2 parents, and so we have $2 \times 2 = 2^2 = 4$
ancestors in the second generation. Each of the four grandparents has 2 parents, and so we have $4 \times 2 = 2 \times 2 \times 2 = 2^3 = 8$ ancestors in the third generation. The
calculation continues according to this pattern. In which generation do we have
1,024 ancestors? That is, for which exponent x is it true that $2^x = 1{,}024$? We find the
answer by multiplying 2 a number of times until we reach 1,024. But if we know
that $\log_2 1{,}024 = 10$, we can estimate the answer much more quickly.
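The two routes to the answer, repeated doubling and the logarithm, can be set side by side. This is a small illustrative sketch; the function name is hypothetical.

```python
import math

def generations_by_doubling(target: int) -> int:
    """Count how many doublings are needed to reach the target number of ancestors."""
    ancestors, generation = 1, 0
    while ancestors < target:
        ancestors *= 2
        generation += 1
    return generation

print(generations_by_doubling(1024))  # 10, found by multiplying step by step
print(math.log2(1024))                # 10.0, found in a single step
```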
So, like many other mathematical constructs, the logarithm is a shortcut and,
like all forms of economical compression, has led to many discoveries. It recurs in
various mathematical functions, such as the constant e, defined as the limit of the
expression $(1 + 1/n)^n$ as n becomes large without bound. Its limiting value is approximately 2.7182818285. As it turns out, e forms the base of natural logarithms;
remained the most useful one for establishing core vocabularies more scientifically, for the simple reason that knowledge about this protolanguage is detailed
and extensive (Renfrew 1987, Mallory 1989). Already in the nineteenth century,
linguists had a pretty good idea both of what PIE sounded like and what its core
vocabulary may have been. Speakers of PIE lived around ten to five thousand years
ago in southeastern Europe, north of the Black Sea. Their culture was named Kurgan, meaning "barrow," from the practice of placing mounds of dirt over individual graves. PIE had words for animals, plants, parts of the body, tools, weapons,
and various abstract notions.
The core vocabulary notion has been used to reconstruct other language families, to compare variants within them, and to determine time depth. The main
problem is that vocabulary substitution is not constant. For this reason, a number of linguists today reject glottochronology. But if the database is large enough
and the time depth long enough, glottochronology has proven to be highly accurate, suggesting that languages do indeed undergo change regularly (see, for
example, Currie, Meade, Guillon, and Mace 2013). The premise that languages,
like natural substances, are governed by an inbuilt radioactive decay is both
true and false. It is true because languages do indeed change naturally; it is false
because language is also a variable social tool that is subject to factors other than
internal evolutionary tendencies. And this is why probabilistic measures are more
useful than linear metrics not involving logarithmic functions. The assumptions
of glottochronology can be outlined as follows:
1. Vocabulary is replaced at a constant rate in all languages and this rate can be
measured and used to estimate how long ago the language existed and when
it broke off from its family tree branch. But this may not always be the case,
as some ambiguous results using glottochronology have shown.
2. A core vocabulary should encompass common or universal concepts: personal pronouns, kinship terms, anatomical parts, and so on; these may show
some variation (as for example Russian ruka, which covers the same referential domain as two English words, arm + hand); so, the refining of terms is
required according to language family.
3. In lexicostatistical analysis, it is only the cognates (words with a clear common etymological origin) that are used in the time depth measurement. The
larger the percentage of cognates, the more recently the two languages are
said to have separated. But often words that are borrowed from languages for
various reasons may affect the overall computation and this should always be
taken into account.
Lees actually obtained another value (from the one above) for the glottochronological constant using a 200 word vocabulary, obtaining a value of 0.805 with
in organisms and natural substances is (all subject to decay); there is a sociohistorical component that affects change which falls outside of the Swadesh-Lees
paradigm.
Some mathematical linguists have, actually, confronted the main issues in
glottochronology. Van der Merwe (1966) split up the word list into classes that
showed an isomorphic rate of change. Dyen, James, and Cole (1967) allowed the
meaning of each word (realized by different lexemes) to have its own rate. Gleason (1959) and Brainerd (1970) modified the formulas so as to take into account
change in cognation, and Sankoff (1973) did the same for borrowing factors and
synonyms. Embleton (1986) used various simulation models to further refine the
mathematics. Gray and Atkinson (2003) developed a lexicostatistical model that
does not assume constant rates of change, showing that the dating of languages
is still a viable method that can be used to adjust previous estimates using Swadesh-Lees formulas. Similarly, Starostin (1999) made adjustments that allow for the
elimination of borrowing and other accidental interferences in the rate of change.
Starostin's proposals are very intriguing and seemingly viable ones. These include
the following:
1. Since loanwords, words borrowed from one language into another, are a disruptive factor in the calculations, it is relevant to consider the native replacement of items by items from the same language. The failure to do this was a
major reason why Swadesh's original estimation of the replacement rate was
under 14 words from the 100-wordlist per millennium, when the real rate is,
actually, much slower (around 5 or 6). Introducing this correction into the formula effectively cancels out counter-arguments based on the loanword principle. A basic wordlist includes generally a low number of loanwords, but it
does bring down the time depth calculations as indicated.
2. The rate of change is not really constant, but actually depends on the time period during which the word has existed in the language (in direct proportion
to the time elapsed: the so-called "aging of words," understood as gradual
erosion of the word's primary meaning under the weight of acquired secondary ones).
3. The individual items on the 100 wordlist have different stability rates (for instance, the lexemes for the pronoun "I" generally have a much lower chance
of being replaced than a word for, say, "yellow").
Starostin's formula takes the above variables into account, including rate of
change and individual stability quotients:
$$t = \sqrt{\frac{\ln(c)}{-Lc}}$$
In this formula, -Lc denotes the gradual slowing down of the replacement process due to different individual rates (the less stable lexemes are the first and the
quickest to be replaced), whereas the square root represents the reverse trend,
namely the acceleration of replacement as items in the original wordlist age and
become more apt to shift their meaning. This yields more credible results than the
Swadesh-Lees one. More importantly, it shows that glottochronology can really
only be used as a serious mathematical tool on language families whose phonology is known.
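A short sketch may help to show how the two components interact. It reads the formula as t = sqrt(ln(c) / (-Lc)), with c the proportion of cognates and L the replacement rate; the default L = 0.05 (about 5 words per 100 per millennium, following the corrected rate mentioned above) and the function name are assumptions made for illustration, and t is read in millennia.

```python
import math

def starostin_time_depth(c: float, rate: float = 0.05) -> float:
    """Time depth t (in millennia) from t = sqrt(ln(c) / (-L * c)).

    c:    proportion of shared cognates (0 < c <= 1)
    rate: L, the assumed replacement rate per millennium (illustrative default)
    """
    if not 0 < c <= 1:
        raise ValueError("c must lie in (0, 1]")
    if rate <= 0:
        raise ValueError("rate must be positive")
    # ln(c) is negative or zero for c <= 1, so the quotient is non-negative.
    return math.sqrt(math.log(c) / (-rate * c))

for c in (0.9, 0.8, 0.6):
    print(f"c = {c:.1f}: t is about {starostin_time_depth(c):.2f} millennia")
```

As with the Swadesh-Lees formula, a lower proportion of cognates yields a greater time depth, but the estimate no longer grows in simple proportion to the logarithm of c.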
Dyen, Kruskal and Black (1992) used an Indo-European database with 95 languages, finding that glottochronological approaches are rather successful in
predicting time depth. Ringe, Warnow and Taylor (2000) used a quantitative
analysis on 24 Indo-European languages, involving 22 phonological units, 15 morphological structures and 333 lexical ones, again obtaining fairly accurate results
when mapped against known historical factors (such as when the societies
emerged as autonomous entities). Gray and Atkinson (2003) examined a database
of 87 languages with 2,449 lexical items, incorporating cognation research. Other
databases have been drawn up for African, Australian and Andean language
families, among others. As linguists acquire more and more information on the
nature of core vocabularies and as research in quantification methods becomes
ever more accurate, good glottochronological analyses are becoming more and
more a reality, thus validating Swadesh's pioneering work.
But, then, how do we reliably recognize distant relatives whose spellings have
drifted far apart? Why should we even presume that the "tree" of language is a
tree, as opposed to a sort of network, given that lexical borrowings and language
admixtures are common occurrences? Over the years, historical linguists have
separately tackled such questions with steadily increasing mathematical sophistication. One approach has been to supplant Swadesh's time depth method with cladistic
techniques that account for each word to model the actual process of evolution.
Cladistics is a method of classifying animals and plants according to the proportion of measurable characteristics that they have in common. It is assumed
that the higher the proportion of characteristics that two organisms share, the more
recently they have diverged from a common ancestor. In other words, cladistics is
the counterpart to lexicostatistics, but provides more sophisticated mathematical
models that seem to apply as well to language diversification. Gray and Atkinson (2003) have applied sophisticated computational tools (maximum-likelihood
models and Bayesian Inference techniques) for dealing with variable rates. By
breathing new life into glottochronology, research paradigms such as these are
stimulating the cross-fertilization of ideas.
Gray and Atkinson's paper dates the initial divergence of the Indo-European
language family to around 8,700 years ago, with Hittite the first language to split
off from the family tree. They interpret this date as support for the theory
that Indo-European originated in Anatolia and that Indo-European languages
were transported to Europe with the spread of agriculture. They argue against
the alternative Kurgan hypothesis, which claims that the Kurgan culture of the
Steppes was Indo-European speaking. They used an existing database of core
words compiled by Dyen (discussed above) with software developed in genetics
to construct a family tree and assign dates to it. Their approach is similar to
glottochronology but also different in that it uses new computational-algorithmic
methods to construct the tree and compute the dates. The study thus avoids many
of the problems that frequently arise in work of this type. However, like most
studies in glottochronology, the method does not take into account cultural
influences, which interfere with the regularity of change in language.
The gist of the work in quantitative linguistics generally shows that economical forces were at work in language evolution, a principle elaborated later by Martinet (1955) as the PE, as discussed throughout this chapter. Various other theories
have, of course, been put forward to explain why languages change so predictably.
Ignoring the alternatives for the sake of argument, what stands out is the fact
that the PE covers so many phenomena, such as time depth and cognation factors. The general implication of this virtual law of change is intertwined with the
PLE, namely that reducing the physical effort involved in speaking has an effect
on language structure. Economy is thus tied to effort and efficiency. Compressed
forms (abbreviations, for instance) and systems (syntax-versus-morphology, for
example) lead to efficiency in use. The same applies to mathematics. There are
many episodes in the history of mathematics whereby someone comes up with
an economical method to represent a cumbersome task, as we saw with exponential notation, which, later, leads to discoveries forming the foundations of a new
branch.
Actually, for the sake of historical accuracy, it should be mentioned that the
PE was both implicit and somewhat explicit in linguistics before Martinet. As mentioned above, the first mention of the PE was in Whitney 1877, where it was called
the Principle of Economy. In 1939, Joseph Vendryes (1939: 49) discussed the presence of economic forces in language as did Hjelmslev (1941: 111-116) shortly thereafter. Interestingly, Vendryes saw economy as operative not only in phonology, but
in other areas of language, without however seeing a reorganizational system involved among the various levels (discussed above). He also articulated a version of
the PLE by pointing out that the formation of sentences also seems to be regulated
by economy. Basically, the PE posits that a language does several things at once:
(1) it increases its communicative rapidity rate through compression of form, (2) it
gets rid of superfluous material, an idea that was already known in the nineteenth
century and articulated eloquently by Paul Passy in 1890, who also claimed that a
language gives prominence to every necessary element in the system, discarding
or marginalizing the other elements.
Passy was probably influenced by the ideas of Whitney (1877) and Henry
Sweet (1888), both of whom noted two patterns in language change: (1) the
dropping of superfluous sounds and (2) easing the transition from one sound
to another via assimilatory processes. The Romance language family was used
as a litmus test to evaluate the accuracy of Sweet's and others' observations.
Consider the following cognates in three Romance languages: Italian, French,
and Spanish. The Latin words from which they derived are provided as well:
Latin                 Italian   French   Spanish
nocte(m) ("night")    notte     nuit     noche
octo ("eight")        otto      huit     ocho
tectu(m) ("roof")     tetto     toit     techo
language. In 1958, Valter Tauli followed up on this dichotomy, suggesting that the
forces driving language change include the following (Tauli 1958: 50):
1. a tendency towards an economy of effort (the PE)
2. a tendency towards clarity (so as to avoid ambiguity)
3. emotional impulses
4. aesthetic tendencies
5. social impulses
Various other studies have been published since Martinet. Virtually all take into
account external factors in the operation of the PE. Interestingly, generative grammars have also given their particular take on the PE. In the Minimalist Program
economy seems to be inherent in how the rules show economy of form through
various processes called generally optimality theory. This is a general model of
how grammars are structured. For example, if a vowel appears only when it is
needed for markedness reasons, in words that would otherwise be without vowels
and in clusters that would otherwise violate certain phonological rules, the process is called economical because it follows from intrinsic properties of optimality
rather than stipulated economy principles. This is, of course, a different approach
to economy, but even in formalist grammars, the concept itself is seen as cropping
up in various places and is thus used to explain tendencies in language.
4.6 Overview
In general, QM has allowed linguists and mathematicians to discover principles
of structure that would have otherwise remained unknown. And it has suggested
that intrinsic forces are at work in the evolution of both mathematics and language. These have been subsumed under two principles here: the PE and the PLE.
The ideas discussed in this chapter, however, bring us back to the most fundamental question of all: Are they truly real or are they simple constructs that
match our views? One could say from the research in QM that the brain compacts
the information that it uses frequently, making it an efficiency-seeking organ. But
there are many dangers involved in correlating the brain with its products and
inferring from the latter what goes on in the former. Nonetheless, had the brain
had a different structure, the PLE might not have manifested itself in language
and mathematics.
Of course, a way around the brain-as-mind-as-brain vexatious circularity is to
eliminate the distinction between inner (mental) and observable (behavioral) processes and to create artificial models of the processes in computer software. The
most radical AI researchers, like Ray Kurzweil (2012), view this as not only plausible but inevitable. While this seems to be a modern premise, it really is no more
than a contemporary version of an age-old belief that the human mind is a machine programmed to receive and produce information in biologically-determined
ways. The new impetus and momentum that this belief has gained has rekindled
the mind-body problem in a modern form: Is cognition a derivative of individual
experience? Or is it inherent in innate mental structures independently of bodily
processes and individual feelings?
When a mathematician solves or proves an intractable problem by essentially
reducing it to a formula, an equation, or a proof, the way in which it is done puts
the brain's capacities on display. But this cannot explain the process. The concept
of ergonomics (a term coming out of psychology and sociology in relation to the
design of workplaces so that they may provide optimum safety and comfort and
thus enhance productivity rates) may be of relevance here. This notion has been
extended to the study of biological systems and to the study of language.
The term was introduced in 1857 by Wojciech Jastrzębowski and then again
in 1949 by British psychologist Hywel Murrell. The basic premise of ergonomics is
that the design of things tends towards maximum efficiency. A simple, yet still profound demonstration of this is the Pythagorean theorem. The Egyptians had discovered that knotting and stretching a rope into sides of 3, 4, and 5 units in length
produced a right triangle, with 5 the longest side (the hypotenuse). The Pythagoreans were aware of this discovery. It was an ergonomic one. Their goal was
to show that it revealed a general structural pattern, an inherent PE in the world.
Knotting any three stretches of rope according to this pattern (for example, 6, 8,
and 10 units) will produce a right triangle because $6^2 + 8^2 = 10^2$ (36 + 64 = 100). As
the historian of science Jacob Bronowski (1973: 168) has insightfully written, we
hardly recognize today how important this demonstration was. It could no longer
be attributed simply to invention. It was a discovery that reached out into
the world:
The theorem of Pythagoras remains the most important single theorem in the whole of mathematics. That seems a bold and extraordinary thing to say, yet it is not extravagant; because
what Pythagoras established is a fundamental characterization of the space in which we
move, and it is the first time that it is translated into numbers. And the exact fit of the numbers describes the exact laws that bind the universe. If space had a different symmetry the
theorem would not be true.
the biological realm, research has shown that the human body is designed to seek
maximum efficiency in locomotion and rate of motion. The body is an ergonomic
structure. From this, we are apparently impelled to design our products and artifacts ergonomically, from handles to the design of chairs for maximum comfort.
Language and mathematics too would fall under the rubric of ergonomic structure.
The overall premise that derives from the work in QM is that mathematics and
language are subject to many of the same laws of biological and physical systems.
When mathematics and language go contrary to these laws, it is for social, creative, and inventive reasons. And this happens often, since social and imaginative
forces are as powerful as, if not more powerful than, inbuilt psycho-biological ones. These
allow us to step outside the laws of evolutionary thrust and of the normal distribution. Any model, including a logarithmic one, is an interpretation. But then
why does Benford's Law apply no matter who devised it (namely Newcomb and
Benford)? This takes us back to whether mathematics is discovered or invented, to which there is no clear-cut answer. It is both, and the interplay between
invention and discovery is what gives principles such as the PE some validity.
More accurately, discovery occurs through abductive processes but it needs to be
refined to make it stable and viable. Discovery involves a lot of information; theorization steps in to eliminate the superfluous information and refine the discovery
to fit specific needs and ideas.
Clearly, there is a connection between mathematics, language, the mind, and
reality. But is this connection of our own making or is it a reflex of our need to
understand the world? In order to grasp the hermeneutical nature of discovery in
mathematics, this is perhaps the most crucial question of all. It is relevant to note
that "What is mathematics?" was the title of a significant book written for the general
public by Courant and Robbins in 1941. Their answer to this question is indirect;
that is, they illustrate what mathematics looks like and what it does, allowing us
to come to our own conclusions as to what mathematics is. And perhaps this is the
only possible way to answer the conundrum of mathematics. The same can be said
about music. The only way to answer "What is music?" is to play it, sing it, or listen
to it. And of course the answer to "What is language?" is to speak it. A year before,
in 1940, Kasner and Newman published another significant popular book titled
"Mathematics and the imagination." Again, by illustration the authors show how
mathematics is tied to imaginative thought. We come away grasping intuitively
that mathematics is both a system of logic and an art, allowing us to investigate
reality. Lakoff and Núñez also approached the topic of what is mathematics in
2000, as mentioned. But rather than illustrate what mathematics does, they made
the claim that it arose from the same conceptual system that led to the origin of
language, being located in the same areas of the brain as language. So, maybe one
can do more than just illustrate what mathematics or language is; one can truly
understand it by comparing the two. Lakoff and Núñez are on the right track, as
will be discussed in the next chapter. As we saw, the two scholars claimed that
mathematical notions and techniques such as proofs are interconnected through
a process of blending. This entails taking concepts in one domain and fusing them
with those in another to produce new ones or to simply understand existing ones.
Changing the blends leads to changes in mathematical structure and to its development. As with language, no one aspect of mathematics can be taken in isolation.
So, what is reality and what is the connection between mathematics, language and reality? Is the calculus just a means of coding reality and then using it,
like a map, to explore reality further? There is no doubt that the calculus is a symbolic artifact and that it allows us to engage with reality. The connection between
symbols and the reality they represent is a dynamic one, with one guiding the
other. By way of conclusion, consider the use of quantification in science. Science
is not based on certainty, but on guesses, theories, paradigms, and probable outcomes. It obeys the same laws of probability that mathematics and language do. To
make their hunches usable or practicable, scientists express them in mathematical language, which gives them a shape that can be seen, modified, and tested.
In some ways, science is the referential domain of mathematics.
In the early 1900s, scientists started looking beyond classical
Newtonian physics, discovering gaps within it and looking for new interpretations of observed events. The reason was that the observations and the mathematical equations were out of kilter. Max Planck published a new theory of energy
transfer in 1900 to explain the spectrum of light emitted by certain heated objects,
claiming that energy is not given off continuously, but in the form of individual units that he called quanta. Planck came to this theory after discovering an
equation that explained the results of these tests. The equation is E = Nhf, with
E = energy, N = integer, h = constant, f = frequency. In determining this equation,
Planck came up with the constant (h), which is now known as Planck's constant.
The truly remarkable part of Planck's discovery was that energy, which appears to
be emitted continuously in waves, is actually discharged in small packets (quanta). This
new theory of energy revolutionized physics and opened the way for quantum
theory.
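As a rough illustration of how Planck's relation quantizes energy, the following Python sketch computes E = Nhf for the first few integer values of N. The function name, the sample frequency, and the use of the modern SI value of Planck's constant are assumptions made here for illustration; they are not taken from Planck's own presentation.

```python
# Planck's constant in joule-seconds (the exact SI value).
PLANCK_H = 6.62607015e-34

def quantized_energy(n: int, frequency_hz: float) -> float:
    """Return E = N * h * f, the energy of N quanta at the given frequency."""
    if n < 0:
        raise ValueError("N must be a non-negative integer")
    return n * PLANCK_H * frequency_hz

# Illustrative frequency (an assumption): visible light at about 5 x 10^14 Hz.
frequency = 5.0e14
levels = [quantized_energy(n, frequency) for n in range(4)]

for n, energy in enumerate(levels):
    print(f"N = {n}: E = {energy:.3e} J")
```

The allowed energies are evenly spaced multiples of hf; no value between two consecutive levels can occur, which is what is meant by energy being discharged in packets.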
In 1905, Einstein proposed a new particle, later called the photon, as the
carrier of electromagnetic energy, suggesting that light, in spite of its wave nature, must be composed of these energy particles. The photon is the quantum
of electromagnetic radiation. Although he accepted Maxwell's theory, Einstein
suggested that many anomalous experiments could be explained if the energy
of a Maxwellian light wave were localized into point-like quanta that move independently of one another, even if the wave itself is spread continuously over
space. In 1909 and 1916, he then showed that, if Planck's law of black-body radiation is accepted, the energy quanta must also carry momentum, making them
full-fledged particles. Then, in 1924, Louis de Broglie proposed that electrons could also exhibit wave properties. Shortly thereafter, Erwin Schrödinger
and Werner Heisenberg devised separate, but equivalent, systems for organizing the emerging theories of quanta into a framework, establishing the field of
quantum mechanics. The relevant point to be made is that these systems were
all expressed in mathematical language and it was because of this language that
further ideas crystallized to make quantum physics a reality.
Quantum mechanics provides a different view of the atom than classical
physics. The discovery that atoms have an internal structure prompted physicists
to probe further into these tiny units of matter. In 1911, Ernest Rutherford developed a model of the atom consisting of a spherical core called the nucleus, made
up of a dense positive charge, with electrons rotating around this nucleus. Bohr's
proposal was a modification of this model. In 1932, James Chadwick suggested that
the atomic nucleus was composed of two kinds of particles: positively charged
protons and neutral neutrons, and a few years later in 1935, Hideki Yukawa proposed the existence of other particles, later called mesons, within the atomic nucleus.
After that, the picture of the atom grew more complicated as physicists discovered
the presence of more and more subatomic particles. In 1955, Owen Chamberlain
and Emilio Segrè discovered the antiproton (a negatively charged proton), and in
1964, Murray Gell-Mann and George Zweig proposed the existence of so-called
quarks as fundamental particles, claiming that protons and neutrons were composed of different combinations of quarks. In 1979, gluons (a type of boson) were
discovered as carriers of the strong force. This force, also called the strong
interaction, binds the atomic nucleus together. In 1983, Carlo Rubbia discovered
two more subatomic particles, the W particle and the Z particle, suggesting that
they are a source of the weak force, also called the weak interaction.
Today, physicists believe that six kinds of quarks exist and that there are
three types of neutrinos, particles that interact with other particles by means of
the weak nuclear interaction. The last kind of neutrino to be directly detected
is known as the tau-neutrino. There may be an underlying unity among three
of the basic forces of the universe: the strong force, the weak force, and the
electromagnetic force that holds electrons to the nucleus.
Now, the point of the above historical excursion into quantum physics is that
the discoveries related to it dovetail perfectly with the rise of group theory, matrix
theory, and other modern-day mathematical theories, forming the basis of quantum physics. The question of which came first, the physics or the mathematics, is a
moot one. In 1927, Heisenberg discovered a general characteristic of quantum mechanics, called (as is well known) the uncertainty principle. It is to physics what
Gödel's undecidability theory is to logic and mathematics. According to this principle, it is impossible to precisely describe both the location and the momentum
of a particle at the same instant. For example, if we describe a particle's location
with great precision, we must give its momentum in terms of a broad range of numbers. To locate an electron, in effect, we must force it to absorb and then re-emit a photon so
that a light detector can see the electron. We know the precise location of both
the photon source and the light detector. But even so, the momentum spoils our
attempt: the absorption of a photon by the electron changes the momentum. The
electron is therefore moving in a new direction when it re-emits the photon. Thus, detection of the re-emitted photon does not allow us to determine where the electron
was when it absorbed the initial photon.
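The trade-off described above can be put into numbers. The standard quantitative form of the principle, which is not spelled out in the text and is supplied here as background, states that the product of the uncertainty in position and the uncertainty in momentum is at least half the reduced Planck constant. The Python sketch below, with a hypothetical function name and illustrative position values, shows how the minimum momentum spread grows as the position is pinned down more tightly.

```python
# Reduced Planck constant in joule-seconds (h divided by 2*pi).
HBAR = 1.054571817e-34

def min_momentum_uncertainty(position_uncertainty_m: float) -> float:
    """Smallest momentum spread allowed by dx * dp >= hbar / 2."""
    if position_uncertainty_m <= 0:
        raise ValueError("position uncertainty must be positive")
    return HBAR / (2.0 * position_uncertainty_m)

# Illustrative position uncertainties in metres (assumed values).
for dx in (1e-9, 1e-10, 1e-11):
    dp = min_momentum_uncertainty(dx)
    print(f"dx = {dx:.0e} m  ->  dp >= {dp:.3e} kg*m/s")
```

Each tenfold gain in positional precision costs a tenfold increase in the smallest possible momentum spread, which is the "broad range of numbers" referred to in the example.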
Such phenomena find their codification in the language of functional analysis,
a research area within mathematics that was influenced in large part by the needs
of quantum mechanics. It can model the values of physical observations such
as energy and momentum, considered to be eigenvalues, and involves the mathematics of continua, linear operators in Hilbert space, and the like. Essentially,
functional analysis deals with functionals, or functions of functions. It is the result
of conceptual blending whereby diverse mathematical processes, from arithmetic
to calculus, are united because they exhibit very similar properties. A functional,
like a function, is a relationship between objects, but the objects may be numbers,
vectors, or functions. Groupings of such objects are called spaces. Differentiation
is an example of a functional because it defines a relationship between a function
and another function (its derivative). Integration is also a functional. Functional
analysis and its osmosis with quantum mechanics show how discoveries have
always been made, by the analogical blending of previous ideas with new ones.
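The idea of a "function of functions" can be sketched in a few lines of Python, where functions are themselves values that can be passed to other functions. The sketch below is only a numerical approximation under stated assumptions: the names derivative and integral, the step size, and the test function are illustrative choices, not part of any formal treatment of functional analysis.

```python
from typing import Callable

Func = Callable[[float], float]

def derivative(f: Func, h: float = 1e-6) -> Func:
    """Map a function to (an approximation of) another function, its derivative."""
    def df(x: float) -> float:
        # Central difference approximation of f'(x).
        return (f(x + h) - f(x - h)) / (2 * h)
    return df

def integral(f: Func, a: float, b: float, n: int = 10_000) -> float:
    """Map a function to a number: its definite integral on [a, b] (midpoint rule)."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def square(x: float) -> float:
    return x * x

slope_of_square = derivative(square)   # a new function, roughly x -> 2x
area_under_square = integral(square, 0.0, 3.0)  # a number, roughly 9

print(slope_of_square(3.0))
print(area_under_square)
```

Here differentiation takes a function and returns a function, while integration takes a function and returns a number, matching the two cases mentioned in the paragraph above.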
Classical mechanics, special relativity, general relativity, and quantum mechanics all utilize the concept of symmetry in their mathematical forms, such as
the symmetry related to rotations in space. A guiding assumption is that fundamental physical laws should look the same no matter which direction one looks.
A physicist can describe this property by saying that the laws are invariant under rotation. But invariance under rotation presupposes a role for the observer.
The variable direction to be used as a result of a rotation is the direction that the
observer chooses. A translation in space is defined as a shift in the measurement
system produced by placing the origin for measurement at a different location. It is
anticipated that the fundamental laws will look the same after a translation. This
property is called invariance under translation. It is of course a theoretical constraint in the minds of physicists. But the concept of invariance has been found to
occur in actual spaces. This kind of symmetry also occurs at the subatomic level.
The mathematical properties of the rotation group, together with the group of
5 Neuroscience
Tears come from the heart, not from the brain.
Leonardo da Vinci (1452-1519)
Introductory remarks
In chapter 2, abduction was discussed as guiding the development of deductive
proofs, that is, in allowing the proof-maker to infer what is needed along the
sequence of statements that make up the proof. The argument was made that,
although proofs show logical structure, especially in the way they are laid out
through a concatenation of statements, the selection of some of the statements
does not come automatically from the concatenation structure itself, but rather
from insights that are akin to metaphorical hunches in language. The source of
the abductive insights has been called a neural blending process, which involves
amalgamating something in one region of the brain with a task at hand so that
it can be better understood and carried out. The concept of blending thus sheds
light on how the two parts of cognition, abduction (imagination) and logic,
constitute a single system that has been called interhemispheric.
In many of the theories and models discussed in previous chapters, the assumption is that they reveal what mathematics and language are all about. Today,
cognitive scientists look to validate these by turning to experimental methods
made available by neuroscience. Whether or not there is a continuity (or ontological osmosis) between brain and mind, the fact is that it is assumed to be there
by theorists. Neuroscience has thus been used by formalists, computationists,
and cognitivists alike to justify their theories, having become an intrinsic litmus-testing tool, so to speak, of both linguistics and mathematics. It may well be the
central disciplinary link between the two.
The question that this train of thought raises is a rather deep epistemological
one: As interesting as it is, does a theory really explain mathematics, language, or
anything else, for that matter? Or is it nothing more than a figment of the fertile
imaginations of linguists and mathematicians, working nowadays with computer
scientists and statisticians? It was Roman Jakobson (1942) who was among the first
to deal with this question, claiming that neurological research is the only kind that
can be used in any empirical sense to evaluate the validity of theories and constructs (see also Jakobson and Waugh 1979). Modern-day neuroscience has taken
its cue from Jakobson's suggestion, expanding the research paradigm considerably with sophisticated brain-imaging techniques. Since at least the mid-1990s,
and another says that it serves mainly as a definition for computation. Support
for the validity of the thesis comes from the fact that every realistic model of computation yet discovered has been shown to be equivalent. The thesis has been
extended to the principle of computational equivalence (Wolfram 2002), which
claims that there are only a small number of intermediate levels of computing
power before a system is universal and that most natural systems are universal.
The relevant point here is that the thesis was believed to mirror what happens in
the brain.
It is the work of McCulloch and Pitts in 1943 that can be called neuroscientific
in the modern sense. The researchers aimed to show that a logical model of nervous activity was consistent with the logical calculus. Using artificial models of
neurons connected together as if in a network, the researchers claimed to show
that the brain produced highly complex patterns in the same way as their models.
Their contribution led to the development of artificial neural networks (ANNs),
which, as we saw, are networks designed to model biological neurons. McCulloch
and Pitts also argued that the features of the network could be expanded to allow
it to learn from new inputs. Then in 1957, Frank Rosenblatt added the notion of
the perceptron to ANNs, whereby inputs are processed by association units programmed to detect the presence of specific features in the inputs.
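A minimal Python sketch may help to show what such artificial neurons do. The first function is a threshold unit in the spirit of McCulloch and Pitts, which fires when the weighted sum of its inputs reaches a threshold and can thereby realize logical operations such as AND and OR. The second is a much simplified perceptron-style learning rule that adjusts weights from examples. The function names, the learning rate, and the training data are assumptions chosen for illustration; the sketch does not reproduce the original formulations of either McCulloch and Pitts or Rosenblatt.

```python
def threshold_unit(inputs, weights, threshold):
    """Fire (1) when the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def logical_and(a, b):
    return threshold_unit((a, b), (1, 1), threshold=2)

def logical_or(a, b):
    return threshold_unit((a, b), (1, 1), threshold=1)

def predict(inputs, weights, bias):
    """Perceptron output: 1 when the weighted sum plus bias is non-negative."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) + bias >= 0 else 0

def train_perceptron(samples, epochs=20, rate=1.0):
    """Adjust weights and bias whenever the output disagrees with the target."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Hypothetical training set: the truth table of logical OR.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(or_samples)
print([predict(inputs, weights, bias) for inputs, _ in or_samples])
```

The threshold units are wired by hand, whereas the perceptron arrives at a working set of weights from the examples alone, which is the sense in which the network can be said to learn from new inputs.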
This type of research was an early version of computational neuroscience,
an orientation within neuroscience attempting to implement formalist and computational models of language and mathematics in computer software designed
to mimic biological software. It did not take hold until the late 1950s when AI
emerged as a branch of computer science and psychology. By the early 1960s,
Hilary Putnam (1961) laid out a research paradigm that would incorporate the
notion of Turing machines into the study of mind. From this a debate emerged between cognitivists and neural network theorists, laying the foundation of another
orientation, sometimes called cognitive neuroscience, as a branch of cognitive
science and a key tool in the investigation of the relation between figurative
language and mathematics.
the behavior of cells and circuits in the brain. In our case, this involves mainly
exploring the computational principles governing the processing of language and
mathematics, including the representation of information by spiking neurons, the
processing of information in neural networks, and the development of algorithms
simulating linguistic and mathematical learning.
Computational neuroscience (CN) focuses on the use of formalist concepts
and techniques in the design of experiments and algorithms for simulating the behavior of neurons and neural networks during processing states. Techniques such
as nonlinear differential equations and applied dynamical systems are applied to
neuronal modeling. The idea here is to understand a natural phenomenon via its
computational counterpart. As discussed in the third chapter, this approach has
led to many interesting insights.
In an excellent overview of the field, Silva (2011) looks at the validity of the
basic CN approach which, as he correctly asserts, genuinely does aim to understand how the brain and related structures represent and process information via
computational modeling, which attempts to replicate observed data in order to
penetrate the dynamics of brain functioning that is inherent in the data. So, unlike
straightforward computationism (chapter 3), CN starts with experimental observations or measurements, rather than a pure theoretical framework, from which
it constructs a computer (mathematical) model aiming to furnish a set of rules
that are capable of explaining (simulating) properties of the experimental observations, using typical statistical-inferential techniques such as those described in
the previous chapter, and thus setting up a relation paradigm between the data
and the underlying molecular, cellular, and neural systems that produced it.
This whole approach, Silva points out, begins with an inference about how
the data fit together and what the likely rules are that govern the patterns within
it. This is, of course, an abductive process on the part of the neuroscientist (which
seems not to be acknowledged as such in CN) and thus essentially tells us more
about his or her theoretical stance than about the data in any objective sense.
Indeed, in this approach there are many uncontrollable variables, including the
amount and quality of the data and how it was collected, which may limit the applicability of the model. The inference (abduction) is then translated into a quantitative algorithm framework which involves expressing the patterns observed in
the data in terms of differential equations or state variables that evolve in space
or time. The translation depends on the abilities of the translator and his or her
particular preferences. The model, Silva admits, is thus nothing more than an
informed guess, and this is where testing it out by carrying out numerical simulations becomes a critical aspect of the whole approach. CN thus seeks answers
to neurological questions in terms of its models compared against the actual experimental data.
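The workflow just described (guess a rule, express it as a differential equation, simulate it numerically, and compare the result with the data) can be sketched in Python. Everything in the sketch is an assumption made for illustration: the decay equation dv/dt = -v/tau stands in for a neuronal model, the "observations" are generated from the exact solution rather than taken from any experiment, and the candidate values of tau are arbitrary.

```python
import math

def simulate_decay(v0: float, tau: float, dt: float, steps: int) -> list:
    """Euler integration of the model dv/dt = -v / tau, starting from v0."""
    v = v0
    trace = [v]
    for _ in range(steps):
        v += dt * (-v / tau)
        trace.append(v)
    return trace

def mean_squared_error(model: list, observed: list) -> float:
    """Average squared difference between simulated and observed values."""
    return sum((m - o) ** 2 for m, o in zip(model, observed)) / len(observed)

# Stand-in "observations" (not real data): the exact solution for tau = 10.
dt, steps = 0.1, 100
observed = [math.exp(-(i * dt) / 10.0) for i in range(steps + 1)]

# Each candidate tau is one informed guess about the underlying rule.
candidates = [5.0, 10.0, 20.0]
errors = {tau: mean_squared_error(simulate_decay(1.0, tau, dt, steps), observed)
          for tau in candidates}
best_tau = min(errors, key=errors.get)
print(best_tau, errors[best_tau])
```

The simulation can only rank the guesses it is given. The choice of equation and of candidate values remains with the modeler, which is the abductive step discussed above.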
The next step is to set up a model that says something about the set of axioms.
While admitting that this is itself a guess, Silva emphasizes that it is the result
of much trial and error, making it a plausible conjecture that can be tested empirically. This allows CN to break free of the inbuilt limitations of mathematical
models, such as differential equations, and allows it the latitude to write down a
set of axioms and to prove a conjecture from those axioms using whatever mathematics is required. Returning to his example, Silva goes on to make the following
relevant observation:
Again, consider the example from above regarding the significant resources and time being
put into deciphering the structural connectome of the brain. This massive amount of accumulating data is qualitative, and although everyone agrees it is important and necessary to
have it in order to ultimately understand the dynamics of the brain that emerges from the
structural substrate represented by the connectome, it is not at all clear at present how to
achieve this. Although there have been some initial attempts at using this data in quantitative analyses they are essentially mostly descriptive and offer little insights into how the
brain actually works. A reductionist's approach to studying the brain, no matter how much
we learn and how much we know about the parts that make it up at any scale, will by itself never provide an understanding of the dynamics of brain function, which necessarily
requires a quantitative, i.e., mathematical and physical, context. The famous theoretical
physicist Richard Feynman once wrote that people who wish to analyze nature without
using mathematics must settle for a reduced understanding. No where is this more true
than in attempting to understand the brain given its amazing complexity.
Above all else, it is in understanding how we create new language and new mathematics that CN has never really produced satisfying hypotheses. But scholars
such as Sandri (2004) make the argument that creativity can also be modeled
algorithmically; it all depends on the complexity of the model. It was Turing who
discussed a system whose computational power was beyond that of his finite state
machine (Turing Machine). Turing's challenge was an early impetus for developing a so-called hybrid computational system in CN, based on neural networks and
brain automata, which can go beyond the Turing Machine (Sandri 2004: 9). The
model, Sandri asserts, would need to simulate highly integrating activities, like
feedback and novelty-making processes, which are understood as processes that
involve infinitary procedures, ending up in a complex information network, and
computational maps, in which both digital, Turing-like computation and continuous, analog forms of calculus are expected to occur (Sandri 2004: 9).
While this seems to be a significant new trend in CN, it still involves a degree of
circularity; that is, creativity needs to be defined precisely beforehand in order to
develop hybrid algorithms, and this takes us back to the set of problems described
above by Silva. CN is thus in a Catch-22 situation. In a follow-up co-authored
paper (Toni, Spaletta, Casa, Ravera, and Sandri 2007), Sandri reiterates his view
that it is the hybrid development of neural networks and brain automata that will
expand the computational power of models. The authors support their view as
follows (Toni et al. 2007: 67):
The cerebral cortex and brain stem appears primary candidate for this processing. However, also neuroendocrine structures like the hypothalamus are believed to exhibit hybrid
computational processes, and might give rise to computational maps. Current theories on
neural activity, including wiring and volume transmission, neuronal group selection and
dynamic evolving models of brain automata, bring fuel to the existence of natural hybrid
computation, stressing a cooperation between discrete and continuous forms of communication in the CNS. In addition, the recent advent of neuromorphic chips, like those to
restore activity in damaged retina and visual cortex, suggests that assumption of a discrete-continuum polarity in designing biocompatible neural circuitries is crucial for their ensuing
performance. In these bionic structures, in fact, a correspondence exists between the original anatomical architecture and synthetic wiring of the chip, resulting in a correspondence
between natural and cybernetic neural activity. Thus, chip "form" provides a continuum essential to chip function. We conclude that it is reasonable to predict the existence of hybrid
computational processes in the course of many human, brain integrating activities, urging
development of cybernetic approaches based on this modelling for adequate reproduction
of a variety of cerebral performances.
The main point made by Sandri et al. is a valid one, of course. But this was the path
followed by the connectionists, as will be described below. Zylberberg, Dehaene,
Roelfsema, and Sigman (2011) also argue for hybridity, but with a slightly different slant. Their objective is to model the neuronal mechanisms by which multiple
mental operations are sequentially assembled into mental algorithms. They outline a
theory of how individual neural processing steps might be combined into serial
programs. Their solution is a hybrid neuronal device, whereby each step involves parallel computation that feeds a slow and serial production system. Thus,
production selection is mediated by a system of competing accumulator neurons
that extends the role of these neurons beyond the selection of a motor action.
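The notion of competing accumulators can be illustrated with a toy race model in Python. This is not the authors' model: the inhibition term, the threshold, the time step, and the three hypothetical productions with their drive values are all assumptions introduced here to show the general idea that many units accumulate evidence in parallel while only one production is selected at each serial step.

```python
def select_production(drive, threshold=1.0, inhibition=0.2, dt=0.01, max_steps=10_000):
    """Race between accumulators; return (winner, steps taken) or (None, max_steps)."""
    levels = {name: 0.0 for name in drive}
    for step in range(1, max_steps + 1):
        total = sum(levels.values())
        updated = {}
        for name, rate in drive.items():
            # Each accumulator grows with its own drive and is slowed by its rivals.
            rivals = total - levels[name]
            updated[name] = max(0.0, levels[name] + dt * (rate - inhibition * rivals))
        levels = updated
        crossed = [name for name, value in levels.items() if value >= threshold]
        if crossed:
            return max(crossed, key=levels.get), step
    return None, max_steps

# Hypothetical productions and drive strengths (illustrative values only).
drive = {"add": 1.0, "compare": 0.6, "stop": 0.3}
winner, steps = select_production(drive)
print(winner, steps)
```

All three accumulators are updated in parallel at every time step, but the output of the race is a single selected production, which corresponds to the slow, serial stage of the hybrid device.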
An experiment by Weisberg, Keil, Goodstein, Rawson, and Gray (2008), however, seems to show that humans do not process information in the same way
as the algorithms of neuroscientists do. The researchers tested people's abilities
to critically consider the underlying logic of a computational explanation, giving
naive adults (those with no knowledge of neuroscience), students in a neuroscience course, and neuroscience experts brief descriptions of psychological phenomena followed by one of four types of explanation. The actual information was
irrelevant to the logic of the explanation, as confirmed by the expert subjects. The
subjects evaluated good explanations as more satisfying than bad ones. But those
in the two non-expert groups additionally judged explanations with logically
irrelevant information to be more satisfying than those without. The neuroscience
5.1.2 Connectionism
Connectionism was an early counter-trend to computational neuroscience that
continues to provide to this day a serious theoretical alternative to CN constructs
such as neural network theory. The connectionist movement started with Russian
neuroscientist Alexander Luria, who in 1947 suggested that the neural processing of information involved interconnectivity in functional task distribution that
spanned the entire brain. Adopting Jakobson's (1942) idea that the selection of
linguistic units and their combination were neurologically complementary processes, Luria showed that the latter was impaired by lesions in the anterior areas
of the language centers, whereas the former was disrupted when damage occurred
to the posterior areas of the same centers. Luria argued that although a single linguistic function (articulation, comprehension, etc.) could be safely located in a
specific area of the left hemisphere (LH), the overall phenomenon of language as
an expressive and representational code resulted from the interaction of several
cooperative cerebral structures that were connected by a network of synaptic processes. Subsequent aphasiology studies confirmed Luria's basic idea: for example, LH-damaged patients use intonation patterns correctly (Danly and Shapiro
1982), suggesting a right-hemisphere (RH) location for this function; RH-damaged
patients, on the other hand, show little or no control of intonation (for example,
Heilman, Scholes, and Watson 1975, Ross and Mesulam 1979). This kind of work
led to the concept of parallel distributed processes (Rumelhart and McClelland
1986), which has been shedding some light on how Luria's idea of interconnectivity may in fact be the source of the higher mental functions.
Connectionism garnered broad interest in the 1960s and 1970s after widely publicized split-brain studies conducted by the American psychologist Roger
Sperry and his associates showed that there was much more to the brain than locationism, or the idea that functions can be located in specific brain areas (for example, Sperry 1968, 1973). Split-brain patients, known more technically as commissurotomy patients, are epilepsy subjects who have had their two hemispheres
separated by surgical section of the corpus callosum in order to attenuate the
seizures they tend to suffer. Each of their hemispheres can thus be investigated,
so to speak, in isolation by simply presenting stimuli to them in an asymmetrical fashion. So, any visual or audio stimulus presented to the left eye or left
ear of a split-brain subject could be assessed in terms of its RH effects, and vice
versa any visual or audio stimulus presented to the right eye or right ear could
be assessed in terms of its LH effects. The commissurotomy studies were pivotal
in providing a detailed breakdown of the main psychological functions according
to hemisphere and in how these worked in tandem. Overall, they suggested that
in the intact brain both hemispheres, not just a dominant one, were needed
in a neurologically cooperative way to produce complex thinking. The split-brain
experiments established, once and for all, that the two hemispheres complement
each other in normal cognitive processing. So, in order to carry out a complex
cognitive task (for example, problem-solving in mathematics, reading, etc.) the integrated cooperation of both hemispheres is required. Cognition, in other words,
is interhemispheric, not just the product of dominant sites or centers in one or the
other of the two hemispheres of the brain.
The use of clinical methods such as aphasiology data and of commissurotomy
experiments as the primary ones in establishing facts about brain functioning started to give way, by the mid-1970s, to the employment of non-clinical
techniques to investigate the brains of normal subjects. They included dichotic
listening (sending signals to the brain via headphones), electroencephalograph
analysis (graphing brain waves with electrodes), and lateral eye movement
(videotaping the movement of the eyes during the performance of some cognitive task). The findings generated by such techniques started casting further
doubt on the idea that neural networks based on computation worked as models
of the mind. By the early 1980s, new experiments confirmed, for instance, that
metaphor was the result of interhemispheric programming and that it could not
be explained in terms of a simple logical calculus.
Many of these techniques have been largely abandoned today for a simple
reason: they have been made obsolete by new technologies such as positron
emission tomography (PET) scanning and functional magnetic resonance imaging (fMRI). These have enabled neuroscientists to observe the brain directly while
people speak, listen, read, solve problems, conduct proofs, and think in general.
These are particularly effective because they do not require any physical contact
with the brain. They produce images similar to X-rays that show which parts of
the brain are active while a person carries out a particular mental or physical task.
PET scanning shows the parts of the brain that are using the most glucose (a form
of sugar), and fMRI shows the parts where high oxygen levels indicate increased
activity.
The PET and fMRI studies are gradually confirming that mathematical and
linguistic processing are extremely complex, rather than involving a series of subsystems located in specific parts of the brain (Broca's area, Wernicke's area, and
Penfield's area). The neuronal structures involved are spread widely throughout
the brain, primarily by neurotransmitters, and it now appears certain that different types of tasks activate different areas of the brain in many sequences and
patterns. It has also become apparent from fMRI research that language is regulated, additionally, by the emotional areas of the brain. The limbic system, which
includes portions of the temporal lobes, parts of the hypothalamus and thalamus,
and other structures, may have a larger role than previously thought in the processing of all kinds of information (Damasio 1994).
5.1.3 Modularity
Connectionist neuroscience has led to the notion that the brain is a modular organ, with each module (agglomeration of neuronal subsystems located in a specific region) organized around a particular task. It is worthwhile repeating here
previously made annotations about how interhemisphericity works. The processing of visual information, for instance, is not confined to a single region of the
RH, although specific areas in the RH are highly active in processing incoming
visual forms. Rather, different neural modules are involved in helping the brain
process visual inputs as to their contents. Consequently, visual stimuli that carry
PDP models
described what happened to the main character of each one. One of the stories
contained a metaphorical idiom. The groups tested were aphasics, RH patients,
and normals. Like Winner and Gardner, the researchers found that, of the three
groups, the RH patients were the ones who showed the greatest inability to detect
the metaphorical idioms.
In the 1980s, the evidence in favor of an RH involvement in metaphor mounted. Hier and Kaplan (1980) found that RH patients exhibited deficits in explaining
the meaning of proverbs. Wapner, Hamby, and Gardner (1981) discovered that
RH patients tended to exhibit significant difficulties in deriving the metaphorical
point of a story. Brownell, Potter, and Michelow (1984) and Brownell (1988) detected RH involvement in metaphor comprehension, but could not specify what
neural regions of the RH were implicated. Using PET-scanning equipment Bottini
et al. (1994) showed the right temporal lobe to be the most active one in metaphor.
They also found that the right parietal lobe was active in some metaphorical tasks,
whereas the corresponding lobe in the LH was not.
This whole line of research suggests that metaphor results from an interhemispheric connectivity, originating in the RH and moving over to the LH for its organization into language or some other system, including mathematics. After the
publication of Lakoff and Núñez's study (2000), which claimed that metaphor had
the same neural structure in mathematics, a plethora of neuroscientific studies
surfaced showing that metaphor and math were indeed connected and that a unitary neuroscientific model could be drafted. Pesci (2003) argued persuasively, on
the basis of a literature review connecting mathematics and metaphor, that the
latter seemed to play a critical role in math because it was an efficient transformation mediator of cognition. Lakoff and Núñez's main claim was that we
understand mathematics through conceptual metaphors and thus through linkages between source domains (for example spatial relationships between objects)
and target domains (abstract mathematical notions). These are based on certain
basic schemas of thought, or cross-modal organizational structures, as discussed
in chapter 3. In 2009, Aubry showed how the Lakoffian model works in explaining abstract mathematical conceptions of space. Mowat and Davis (2010), Ernest
(2010), and Zwicky (2010) have argued along the same lines. The gist of this line of
inquiry is that the role of metaphor in mathematics can no longer be ignored. Computational models in neuroscience cannot handle connective phenomena such as
metaphorical blending. And if the relevant research is at all correct, then it is in
studying blending that the greatest insights into the relation between math cognition and language can be gleaned.
Recent work on metaphor processing has largely substantiated the connectionist findings. Some questions have also been raised that require further investigation. For instance, Schmidt-Snoek, Drew, Barile, and Aguas (2015) show
that there are links between sensory-motor words used literally (kick the ball) and
sensory-motor regions of the brain, but find no conclusive evidence to suggest
whether metaphorically used words (kick the habit) also show signs of such embodiment. Nonetheless, their study indicated greater amplitudes for metaphorical than literal sentences, supporting the possibility of different neural substrates
for motion and auditory sentences. The findings are consistent with a sensory-motor (RH) neural categorization of metaphor.
Parallel findings have been documented in a vast array of studies that confirm
RH involvement in metaphor processing (Schmidt and Seger 2009, Diaz, Barrett,
and Hogstrom 2011). A review of the literature, and the controversies it has generated, is the one by Lai, van Dam, Conant, Binder, and Rutvik (2015). By and large,
the studies substantiate the difference between literal and metaphorical cognition
neurologically.
about the nature of mathematics, allowing us to revisit, for example, the Platonist-versus-constructivist one with new empirical information.
Overall, ongoing research in neuroscience suggests that the understanding of
number and space is a result of the same kind of brain circuitry that processes
the two phenomena, even though the debate continues as to what areas are involved in number sense versus linguistic awareness. And this leads to a new way of
examining how the brain models the world. Our external experience of quantity
and space, and our symbolic representations of that experience, activate the same
neural networks, as Edward Hubbard and his associates have argued (see, for example, Hubbard et al. 2005). Abstract mathematical concepts such as Cartesian
coordinates or the complex plane might appear to be cultural inventions, but
they may have emerged as concepts because they fit in with the architecture of
the brain and thus its cerebral symbolism. So, they are part of the biology
of cognition, but are also shaped by cultural influence, which initiates the abstraction process. This may or may not be verifiable, but it does bring out that the
two dimensions of human knowledge-making, the Umwelt and the Innenwelt,
interact constantly in the production of knowledge and this interaction is guided
by image schemas such as more than, less than, nearer, farther, and bigger, smaller
that apply to language as well as to mathematics.
Since the circuitry encoding different magnitudes produces blends, one
would expect that the perception of phenomena such as duration, size, and
quantity would affect each other. And this has been shown with so-called interference studies. For example, if subjects are given information indicating that two
trains of different size are travelling at the same speed, they will tend to perceive
the larger train as moving faster.
Guhe et al. (2011) have developed a computational model of how blending
might be simulated. They devised a system by which different conceptualizations
of number can be blended together to form new ones via recognition of common
features, and a judicious combination of their features. The model of number is
based on Lakoff and Núñez's grounding metaphors for arithmetic. The ideas are
worked out using a so-called Heuristic-Driven Theory Projection (HDTP, a method
based on higher-order anti-unification). HDTP provides generalizations between
domains, thus offering a mechanism for finding commonalities and allowing for
the transfer of concepts from one domain to another, producing new conceptual
blends.
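The generalization step at the heart of such a system can be illustrated with a minimal sketch. It implements plain first-order anti-unification, which is a simplification of the restricted higher-order variant that HDTP is based on, and it is not the code of Guhe et al. The term encoding (tuples for functors, strings for constants) and the two sample domains are assumptions introduced purely for illustration.

```python
from itertools import count


def anti_unify(s, t, table=None, fresh=None):
    """Least general generalization of two first-order terms.

    A term is a string (a constant) or a tuple (functor, arg1, arg2, ...).
    Variables in the result are strings of the form '?X1', '?X2', ...
    """
    if table is None:
        table = {}          # remembers which variable stands for which mismatched pair
    if fresh is None:
        fresh = count(1)    # supplies fresh variable indices
    if s == t:
        return s            # identical structure is kept as it is
    same_shape = (
        isinstance(s, tuple) and isinstance(t, tuple)
        and s[0] == t[0] and len(s) == len(t)
    )
    if same_shape:
        # Same functor and arity: keep the functor, generalize the arguments.
        args = tuple(anti_unify(a, b, table, fresh) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Mismatch: replace the pair by a variable, reusing it if the pair recurs.
    if (s, t) not in table:
        table[(s, t)] = f"?X{next(fresh)}"
    return table[(s, t)]


# Hypothetical encodings of "putting two things together gives a third"
# in an object-collection domain and in a motion-along-a-path domain.
collections = ("equals", ("combine", "pile_a", "pile_b"), "pile_c")
paths = ("equals", ("combine", "stretch_a", "stretch_b"), "stretch_c")

print(anti_unify(collections, paths))
# ('equals', ('combine', '?X1', '?X2'), '?X3')
```

The result keeps what the two domains share (combining two things yields a third) and replaces what differs with variables. In the terms of blending theory it plays roughly the role of the common structure from which concepts can be transferred from one domain to the other.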
Of course, the work on metonymy is also critical for understanding the connection between mathematics and language, but need not be discussed in any
detail here. The difference between metaphor and metonymy can be reduced to a
simple paraphrase: metaphor amalgamates information, metonymy condenses it.
So, metonymy is operative in giving rise to symbols; metaphor is operative in how
[Figure 5.1: Blending. Diagram labels: Generic space; Input 1; Input 2; Blend or solution.]
The difference is that in metonymy one of the inputs is actually part of the other.
Again without going into details here, suffice it to say that the concept of blend
covers a broad range of cognitive activities, including metonymy, metaphor, and
irony. Note that by "generic space" the model simply renames "concept-to-be-constructed."
straightforward, even natural. The same is true of physics. If mathematics and physics
stayed within these familiar story worlds, they might as disciplines have the cultural status
of something like carpentry: very complicated and clever, and useful, too, but fitting human
understanding. The problem comes when mathematical work runs up against structures
that do not fit our basic stories. In that case, the way we think begins to fail to grasp the
mathematical structures. The mathematician is someone who is trained to use conceptual
blending to achieve new blends that bring what is not at human scale, not natural for
human stories, back into human scale, so it can be grasped.
Hyde (2011) looked at the relevant literature on math cognition in order to provide a more comprehensive definition of the phenomenon. After going through
a set of studies of adults, infants, and animals he concluded that non-symbolic
number sense is supported by at least two distinct cognitive systems: a parallel individuation system that encodes the numerical identity of individual items
and an approximate number system that encodes the approximate numerical
magnitude, or numerosity, of a set. Of course, some argue that the non-symbolic
representation of small numbers is carried out solely by the parallel individuation
system, while the non-symbolic representation of large numbers is carried out
solely by the approximate number system. Others argue that all numbers are represented by a single system. This debate has been fueled by experiments showing
dissociations between small and large number processing and contrasting ones
showing similar processing of small and large numbers. Hyde argues for diversity
in results due to subjectivity (Hyde 2011: 150):
When items are presented under conditions that allow selection of individuals, they will be
represented as distinct mental items through parallel individuation and not as a numerical
magnitude. In contrast, when items are presented outside attentional limits (e.g., too many,
too close together, under high attentional load), they will be represented as a single mental
numerical magnitude and not as distinct mental items. These predictions provide a basis
on which researchers can further investigate the role of each system in the development of
uniquely human numerical thought.
guage as models of reality because they display how the parts resemble relations
among the parts of some different set of entities in other domains. Thus, it can
be said that math cognition is especially visible (literally) in the use of diagrams
to represent math concepts. Diagrams do not simply portray information, but
also the process of thinking about the information as it unfolds in the brain
(Peirce, vol. 4: 6). Peirce called diagrams "moving pictures of thought" (Peirce,
vol. 4: 8-11) because in their structure we can literally see a given argument. As
Kiryuschenko (2012: 122) has aptly put it, for Peirce graphic language allows us to
experience a meaning visually as a set of transitional states, where the meaning
is accessible in its entirety at any given here and now during its transformation.
If Kant and Peirce are correct, then it is obvious that the role of diagrams and
visual signs generally in the neuroscientific study of mathematical cognition is an
important one because they mirror brain structure. The work on math cognition
and visualization is actually quite extensive (Shin 1994, Chandrasekaran, Glasgow, and Narayanan 1995, Hammer 1995, Hammer and Shin 1996, 1998, Allwein
and Barwise 1996, Barker-Plummer and Bailin 1997, 2001, Kulpa 2004, Stjernfelt
2007, Roberts 2009). So too is the interest in phenomenology among mathematicians, a trend that was prefigured by Peirce's notion of phaneroscopy, which
he described as the formal analysis of appearances apart from how they appear
to interpreters and of their actual material content (see Hartimo 2010). In effect,
mathematical diagrams express our intuitions about quantity, space, and relations in a way that seems to parallel mental imagery in general as a means of
grasping and retaining reality. The intuitions are probably universal (first type
of definition); the visual representations, which include numerals originally, are
products of historical processes (second type of definition).
The Kantian notion of visual sign extends to numerals, equations and other
mathematical artifacts. Algebraic notation is, in effect, a diagrammatic strategy
for compressing information, much like pictography does in referring to specic
referents (Danesi and Bockarova 2013). An equation is a graph consisting of signs
(letters, numbers, symbols) organized to reflect the order and structure of events
that it aims to represent iconically. It may show that some parts are tied to a strict
order, whereas others may be more flexibly interconnected. As Kauffman (2001:
80) observes, Peirce's graphs contain arithmetical information in an economical
form:
Peirce's Existential Graphs are an economical way to write first order logic in diagrams on
a plane, by using a combination of alphabetical symbols and circles and ovals. Existential
graphs grow from these beginnings and become a well-formed two dimensional algebra. It
is a calculus about the properties of the distinction made by any circle or oval in the plane,
and by abduction it is about the properties of any distinction.
An equation such as the Pythagorean one ($c^2 = a^2 + b^2$) is a type of Existential
Graph, since it is a visual portrait of the relations among the variables (originally
standing for the sides of the right triangle). But, being a graph, it also tells us that
the parts relate to each other in many ways other than in terms of the initial triangle referent. It reveals hidden structure, such as the fact that there are infinitely
many Pythagorean triples, or sets of integers that satisfy the equation. Expressed
in language (the square on the hypotenuse is equal to the sum of the squares
on the other two sides), we would literally not be able to see this hidden implication. To return to Susan Langer's (1948) distinction between discursive and
presentational forms (chapter 2), the equation tells us much more than the statement (a discursive act) because it presents inherent structure holistically, as an
abstract form. We do not read a diagram, a melody, or an equation as made up
of individual bits and pieces (notes, shapes, symbols), but presentationally, as a
totality which encloses and reveals much more meaning.
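The hidden structure mentioned above, the endless supply of Pythagorean triples, can be made concrete with a short sketch. It uses Euclid's classical formula, which the text itself does not discuss; it is brought in here only as one well-known way of generating triples, and the function names are illustrative.

```python
def euclid_triple(m, n):
    """Return the triple (a, b, c) given by Euclid's formula for integers m > n > 0."""
    if not (m > n > 0):
        raise ValueError("Euclid's formula needs integers with m > n > 0")
    return (m * m - n * n, 2 * m * n, m * m + n * n)


def triples(how_many):
    """Return the triples for n = 1 and m = 2, 3, 4, ...; each has a larger c."""
    return [euclid_triple(m, 1) for m in range(2, 2 + how_many)]


for a, b, c in triples(4):
    # Every generated triple satisfies the equation exactly.
    assert a * a + b * b == c * c
    print(a, b, c)
```

The first triple produced is the familiar 3, 4, 5. Since each larger value of m yields a larger value of c, the list never repeats itself, which is one way of seeing that the triples cannot run out.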
[Figure: diagram relating hunch, inference, abduction, and deduction to guessing, informed guessing, insight, and logical form.]
Hunches are the brain's attempts to understand what something unknown means
initially. These eventually lead to inferences through a matrix of associative devices to previous knowledge such as induction, analogy, and metaphor. So, the
Pythagorean triangle, which came initially from the hunches of builders, led to
an inference that all similar triangles may contain the same pattern, and this led
to the insight that we call the Pythagorean Theorem, which was given a logical
form through proof. Once the form exists, however, it becomes the source for more
inferences and abductions, such as the previously-hidden concept of Pythagorean
number triples. Eventually, it gave rise to a hypothesis, namely that the general formula $c^n = a^n + b^n$
has no solutions in positive integers for any whole-number exponent $n$ greater than 2, which is known as Fermat's Last Theorem
(Taylor and Wiles 1995). This, in turn, led to many other discoveries (Danesi 2013).
As another example of how unpacking leads to insight, consider imaginary numbers. The motivation for their invention/discovery came from solving
quadratic equations that produced the square root of negative numbers. It was not
clear, at rst, how to resolve this apparent anomaly. So, a hunch that they could
be treated like any number surfaced at some point, which led to an inference,
namely that the square root of a negative number must exist in some domain,
which, in turn, led to an abduction: the ingenious invention of a diagram, called
the Argand diagram, that showed the relation of imaginary numbers to real ones.
As is well known, the diagram locates imaginary numbers on one axis and real
ones on another. The point z = x + iy is then used to represent a complex number
in the Argand plane, displaying its vectorial features in terms of the angle that
it forms. The Argand diagram turned out, moreover, to be much more than a
simple heuristic device, showing how to carry out arithmetical operations with
complex numbers; it soon became a source of investigation of the structure of
these numbers and numbers in general.
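How the diagram turns arithmetic into geometry can be shown with a brief sketch using Python's built-in complex numbers and the standard cmath module. The particular point chosen, 3 + 4i, is an arbitrary example, not one taken from the text.

```python
import cmath
import math


def polar(z):
    """Return (modulus, angle in radians) of z = x + iy in the Argand plane."""
    return abs(z), cmath.phase(z)


i = cmath.sqrt(-1)          # the square root of a negative number, a point on the imaginary axis
z = complex(3, 4)           # the point x = 3, y = 4
rotated = i * z             # multiplying by i

print(polar(z))             # the modulus 5.0 and the angle z forms with the real axis
print(rotated)              # (-4+3j)

# Multiplication adds angles: the product sits a quarter turn further round,
# at the same distance from the origin.
assert math.isclose(polar(rotated)[1], polar(z)[1] + math.pi / 2)
assert math.isclose(polar(rotated)[0], polar(z)[0])
```

Read off the diagram, multiplication by i is simply a rotation through a right angle. This is the kind of structural insight that the purely symbolic form of the operation does not make visible.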
Needless to say, mathematicians have always used diagrams to unpack hidden structure. For this reason, the relation between mental imagery and math
cognition has become a main topic in both neuroscience and psychology. Among
the rst to investigate this relation empirically was Piaget, who sought to understand the development of number sense in relation to symbolism (summarized
in Piaget 1952). In one experiment, he showed a five-year-old child two matching
sets of six eggs placed in six separate egg-cups. He then asked the child whether
there were as many eggs as egg-cups (or not); the child replied in the affirmative.
Piaget then took the eggs out of the cups, bunching them together, leaving the
egg-cups in place. He then asked the child whether or not all the eggs could be put
into the cups, one in each cup and none left over. The child answered negatively.
Asked to count both eggs and cups, the child would correctly say that there was
the same amount. But when asked if there were as many eggs as cups to fill, the
child would again answer no. Piaget concluded that the child had not grasped
the relational properties of numeration, which are not affected by changes in the
positions of objects. Piaget showed, in effect, that five-year-old children have not
yet established in their minds the symbolic connection between numerals and
number sense (Skemp 1971: 154).
evidence scattered here and there (Bockarova, Danesi, and Núñez 2012) would
seem to dispute this, since there are cultures where the number line does not exist
and thus where the kinds of calculations and concepts related to it do not appear.
Whatever the truth, it is clear that neuroscience, as Dehaene suggests, can provide
answers to many of these conundrums.
Dehaene brings forth evidence that animals such as rats, pigeons, raccoons,
and chimpanzees can perform simple calculations, describing ingenious experiments that show that human infants also show a parallel manifestation of number
sense. Further, Dehaene suggests that this rudimentary number sense is as basic
to the way the brain understands the world as our perception of color or of objects
in space, and, like these other abilities, our number sense is wired into the brain.
But how then did the brain leap from this basic number sense to trigonometry, calculus, and beyond? Dehaene argues that it was the invention of symbolic systems
of numerals that started us on the climb towards higher abstract mathematics.
He makes his case by tracing the history of numbers, from early times when people indicated a number by pointing to a part of their body (even today, in many
societies in New Guinea, the word for "six" is "wrist"), to early abstract symbols
such as Roman numerals (chosen for the ease with which they could be carved
into wooden sticks), and to modern numerals and number systems. Dehaene also
explores the unique abilities of idiot savants and mathematical geniuses, asking what might explain their special mathematical talent. Using modern imaging
techniques (PET scans and fMRI), Dehaene illustrates exactly where numerical
calculation takes place in the brain. But perhaps most importantly, Dehaene argues that the human brain does not work like a computer, and that the physical
world is not based on mathematics; rather, mathematics evolved to explain the
physical world in much the same way that the eye evolved to provide sight. His model
of math cognition is charted in figure 5.3. It shows that there are verbal and attention components, but overall numeracy and numerical magnitude processes are
independent modules of cognition.
Dehaene's arguments are far-reaching. But do they really explain math cognition? Is it a shared instinctual sense with other species, or are we finding simple
analogies in those species? This type of speculation has always been evident in the
primate language studies, which sought to establish, or else reject, a language instinct in primates. There really has emerged no impartial evidence to suggest that
chimpanzees and gorillas are capable of math or language in the same way that
humans are, nor of having the ability or desire to pass on to their offspring what
they have learned from their human mentors, despite claims to the contrary. Conditioning effects cannot be ruled out when assessing the reported findings of the
primate experiments. Also, there is no way of ascertaining if the kinds of counting procedures witnessed in other animals are really no more than instinctive
[Figure 5.3 (model of math cognition). Diagram labels: Linguistic; Symbolic number system; Geometry, measurement; Numeration, number line, calculation; Spatial attention; Quantitative; Numerical magnitude process; Magnitude comparison; Cognitive skills; Early numeracy knowledge; Mathematical outcomes.]
even a few weeks old, contrary to Piaget's findings, that number sense requires
cognitive growth, and that people afflicted with Alzheimer's have unexpected numerical abilities. The diagram below summarizes many of the ideas elaborated by
Butterworth. It is taken from his literature review of very low attainment in arithmetic (dyscalculia) which is a core deficit in an inherited foundational capacity
for numbers (Butterworth 2010). It shows how it might come about:
[Figure: diagram after Butterworth (2010), panels (a), (b), and (c). Labels: Hidden layer; Symbolic representation; Numerals; Number words; Semantic representation (numerosity); Patterns of dots; Five; Three; Mediated semantic pathway; Direct semantic pathway; Mediated symbolic pathway; Direct symbolic pathway.]
Stare at the fixation + sign for 30 sec, then see the figure below.
After staring at the figure above for 30 seconds, the left side
of the display should be experienced as more numerous than the right,
although they are actually identical (after Burr & Ross, 2008).
Figure 5.5: The numerosity adaptation effect
The effect shows that non-symbolic numerical intuition can imprint itself upon
the human brain directly. In the diagram a subject should have a strong impression that the display on the lower left is more numerous than the one on the
right, after 30 seconds of viewing the upper figure, although both have the same
number of dots. The subject might also underestimate the number of dots in the
display. The effects are resistant to the manipulation of non-numerical features of
the display (size, density, contrast). Since these effects happen automatically, the
operation of a largely automatic processing system in the brain appears to be the
most likely explanation. As Burr and Ross (2008: 428) observe: "Just as we have
a direct visual sense of the reddishness of half a dozen ripe cherries, so we do of
their sixishness."
Some critics suggest that the effects are dependent on density and less so
on numerosity. Others suggest that numerosity may be related to kurtosis (the
perception of sharpness) and, thus, that the effect may be better explained in
terms of texture such that only the dots falling within the most effectively-displayed
region are the ones involved in the effects. However, since the display in the experiment was of spots that were uniformly either white or black, the kurtosis effect
is inapplicable. It is not the number of dots in the entire display that causes the
adaptation but only those within a particular area. At present, there is no real explanation of why adaptation has such a profound effect on numerosity. What the
experiment shows, however, is that perception and number sense are intrinsically
intertwined, and this brings out the force of contextual factors. The repetition of
the same experiment in various cultural contexts would go a long way to answering this question.
This kind of connective thinking occurs because of gaps that are felt to inhere in
the system. As Godino, Font, Wilhelmi, and Lurduy (2011: 250) cogently argue, notational systems are practical (experiential) solutions to the problem of counting:
As we have freedom to invent symbols and objects as a means to express the cardinality of
sets, that is to say, to respond to the question, how many are there?, the collection of possible
numeral systems is unlimited. In principle, any limitless collection of objects, whatever its
nature may be, could be used as a numeral system: diverse cultures have used sets of little
stones, or parts of the human body, etc., as numeral systems to solve this problem.
All this implies that mathematics is both invented and discovered, not through
abstract contemplation, but through the recruitment of everyday cognitive mechanisms that make human imagination and abstraction possible. Fauconnier and
Turner (2002) have proposed arguments along the same lines, giving substance
to the notion that ideas in mathematics are based on inferences deriving from
experiences and associations within these experiences.
The idea that metaphor plays a role in mathematics seems to have never been
held seriously until after Lakoff and Núñez's watershed work, even though, as
Marcus (2012: 124) observes, mathematical terms are mainly metaphors:
For a long time, metaphor was considered incompatible with the requirements of rigor and
preciseness of mathematics. This happened because it was seen only as a rhetorical device
such as this girl is a flower. However, the largest part of mathematical terminology is the
result of some metaphorical processes, using transfers from ordinary language. Mathematical terms such as function, union, inclusion, border, frontier, distance, bounded, open, closed,
imaginary number, rational/irrational number are only a few examples in this respect. Similar metaphorical processes take place in the artificial component of the mathematical sign
system.
Like language, no one aspect of mathematics can be taken in isolation. Matrix algebra is a more general way of doing arithmetic; Boolean algebra is a more general
way of doing algebra; and so on. The connecting links are, typically, conceptual
metaphors such as: arithmetic is motion along a path (a notion represented in the
number line), sets are containers, geometric figures are objects in space, recurrence
is circular, and so on. Many resist the approach taken by Lakoff and Núñez, pointing out that there are strategies other than conceptual metaphor involved in doing
math. The main critics, though, come out of the computational camp (discussed
briefly above).
As discussed in the opening chapter, already in the 1960s, a number of
linguists became intrigued by the relation between mathematics and language
(Hockett 1967, Harris 1968). Their work contained an important subtext: by exploring the structures of mathematics and language in correlative ways, we might
hit upon deeper points of contact and thus arrive at a common cognitive origin for both.
Mathematics makes sense when it encodes concepts that fit our experiences of
the world: experiences of quantity, space, motion, force, change, mass, shape,
probability, self-regulating processes, and so on. The inspiration for new mathematics comes from these experiences as it does for new language. What was
lacking at the time was the concept of blend, which started appearing only in the
early 2000s.
A case in point is Gödel's famous proof, which, as Lakoff has argued (see Bockarova and Danesi 2012: 45), was inspired by Cantor's diagonal method, as was
mentioned briefly in the opening chapter. It is worth revisiting here. Gödel proved
that within any consistent formal logical system rich enough to express arithmetic there are statements that can be neither proved
nor disproved. He found a statement in a set of statements that could be extracted by going through them in a diagonal fashion, now called Gödel's diagonal
lemma. That produced a statement, S, like Cantor's C, that does not exist in the
set of statements. Cantor's diagonal and one-to-one matching proofs are mathematical metaphors: associations linking different domains in a specific way
(one-to-one correspondences). This insight led Gödel to envision three metaphors
of his own (as we saw): (1) the Gödel number of a symbol, which is evident in the
argument that a symbol in a system is the corresponding number in the Cantorian
one-to-one matching system (whereby any two sets of symbols can be put into a
one-to-one relation); (2) the Gödel number of a symbol in a sequence, which is
manifest in the demonstration that the nth symbol in a sequence is the nth prime
raised to the power of the Gödel number of the symbol; and (3) Gödel's central
metaphor, which was Gödel's proof that a symbol sequence is the product of the
Gödel numbers of the symbols in the sequence.
The proof, as Lakoff argues, exemplifies perfectly how blending works. When
the brain identifies two distinct entities in different neural regions as the same
entity in a third neural region, they are blended together. Gödel's metaphors come
from neural circuits linking a number source to a symbol target. In each case, there
is a blend, with a single entity composed of both a number and a symbol sequence.
When the symbol sequence is a formal proof, a new mathematical entity appears:
a proof number.
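The three steps of the numbering described above can be traced in a small sketch. The four-symbol alphabet and the codes assigned to it are assumptions made for illustration only; Gödel's own system assigns codes to a much larger stock of logical symbols.

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for short sequences)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


# Step 1 (hypothetical codes): each symbol is matched one-to-one with a number.
SYMBOL_CODES = {"0": 1, "s": 2, "+": 3, "=": 4}
CODE_SYMBOLS = {code: symbol for symbol, code in SYMBOL_CODES.items()}


def godel_number(sequence):
    """Steps 2 and 3: the nth symbol becomes the nth prime raised to its code,
    and the sequence becomes the product of these prime powers."""
    number = 1
    for prime, symbol in zip(primes(), sequence):
        number *= prime ** SYMBOL_CODES[symbol]
    return number


def decode(number):
    """Recover the symbol sequence by reading off the exponent of each prime."""
    if number < 1:
        raise ValueError("a Gödel number is a positive integer")
    symbols = []
    for prime in primes():
        if number == 1:
            break
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        if exponent not in CODE_SYMBOLS:
            raise ValueError("not the number of a sequence over this alphabet")
        symbols.append(CODE_SYMBOLS[exponent])
    return "".join(symbols)


print(godel_number("0=0"))   # 2**1 * 3**4 * 5**1 = 810
print(decode(810))           # 0=0
```

Because every whole number factors into primes in only one way, the number 810 and the sequence 0=0 determine each other completely. It is this two-way correspondence that allows a single entity to be treated at once as a number and as a symbol sequence.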
Not all ICMs manifest a clustering structure. A second major type can be called
radiation, which inheres in different target domains being delivered by identical
source domains. It can be envisioned as a single source domain radiating outwards to deliver different target domains. For example, the plant source domain
above not only allows us to conceptualize ideas (That idea has deep ramifications), but also such other abstract concepts as love (Our love has deep roots),
influence (His influence is sprouting all over), success (His career has borne great
fruit), knowledge (That discipline has many branches), wisdom (His wisdom has
deep roots), and friendship (Their friendship is starting to bud just now), among
many others. Radiation can be defined more neuroscientifically as the blending of
abstract concepts that implicate each other through a specific experiential model
or frame of reference (source domain). Radiation, by the way, explains why we
talk of seemingly different things, such as wisdom and friendship, with the same
metaphorical vehicles. Clustering, on the other hand, explains why we use different metaphorical vehicles. It thus allows people to connect source domains as
they talk.
Now, clustering can be seen in how algorithms and proofs are constructed. In
the proof of the triangle as containing 180° (chapter 2), several domains clustered
around the proof. First, the domain of angle sizes was involved in determining
that the straight line was an angle; second, there was the idea that angles can be
dissected into parts and then recombined. In other words, grounding and linking were involved in the proof, clustering around the main task of connecting the
statements in the proof.
Radiation can be seen in connective branches such as Cartesian geometry,
which blends arithmetic, algebra and geometry through the image schema of intersecting number lines. The radiation occurs in how these three domains radiate
outwards into linkages among each other, showing how arithmetic, algebra, and
geometry are highly interrelated: one assumes knowledge of the other. Descartes
called this radiative blend, of course, analytic geometry. A number line is itself
a rudimentary geometric representation that shows the continuity between positive and negative numbers and a one-to-one correspondence between a specific
number and a specific point on the line. Descartes simply drew two number lines
intersecting at right angles. The horizontal line is called the x-axis, the vertical
one the y-axis, and their point of intersection the origin. This system of two perpendicular intersecting number lines is called eponymously the Cartesian plane.
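The blend described here can be made concrete with a small sketch. It assumes points are written as (x, y) pairs and lines in the slope-intercept form y = mx + b; both are notational choices of this example rather than anything the text prescribes, and the function names are hypothetical. A geometric question (where do two lines cross?) is answered by algebra on their equations, which in turn reduces to arithmetic on numbers.

```python
from fractions import Fraction

ORIGIN = (0, 0)  # the point where the x-axis and the y-axis intersect


def on_line(point, slope, intercept):
    """Geometry as algebra: a point lies on the line if it satisfies y = m*x + b."""
    x, y = point
    return y == slope * x + intercept


def intersection(m1, b1, m2, b2):
    """Return the crossing point of y = m1*x + b1 and y = m2*x + b2.

    Integer slopes and intercepts are assumed so that exact fractions can be used.
    """
    if m1 == m2:
        raise ValueError("parallel lines have no single point of intersection")
    x = Fraction(b2 - b1, m1 - m2)  # solve m1*x + b1 = m2*x + b2 for x
    y = m1 * x + b1
    return (x, y)


crossing = intersection(2, 1, -1, 4)  # the lines y = 2x + 1 and y = -x + 4
print(crossing, on_line(crossing, 2, 1), on_line(ORIGIN, 2, 1))
```

Each pair of numbers names exactly one point, which is the one-to-one correspondence of the number line extended to two axes.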
Blending is unconscious and that is why we hardly ever are aware of what
we are doing when we do math. Consider a simple statement such as 7 is larger
than 4. This is a metaphor, produced by blending a source domain that involves
concepts of size with the target domain of numbers (Presmeg 1997, 2005). The
conceptual metaphor that underlies the statement 7 is larger than 4 is numbers
are collections of objects of differing sizes. Similarly, the concept of quantity involves at least two metaphorical blends. The first is the "more is up, less is down"
image schema, which appears in common expressions such as the height of those
functions went up as the numerical value increased and the other functions sloped
downwards as the numerical values decreased. The other is "linear scales are paths,"
which manifests itself in expressions such as rational numbers are far more numerous than integers, and infinity is way beyond any collection of finite sets. As Lakoff
(2012b: 164) puts it:
The metaphor maps the starting point of the path onto the bottom of the scale and maps
distance traveled onto quantity in general. What is particularly interesting is that the logic
of paths maps onto the logic of linear scales. Path inference: If you are going from A to C, and
you are now at an intermediate point B, then you have been at all points between A and B and
not at any points between B and C. Example: If you are going from San Francisco to N.Y. along
route 80, and you are now at Chicago, then you have been to Denver but not to Pittsburgh.
Linear scale inference: If you have exactly $50 in your bank account, then you have $40,
$30, and so on, but not $60, $70, or any larger amount. The form of these inferences is the
same. The path inference is a consequence of the cognitive topology of paths. It will be true
of any path image-schema.
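The claim that the two inferences share one form can be illustrated with a minimal sketch in which a single function serves both the path and the scale. The function name and the list of dollar amounts are assumptions made for illustration; the cities are those of the quoted example.

```python
def split_at(ordered_points, current):
    """Split an ordered path or scale into the part already reached and the part not reached."""
    i = ordered_points.index(current)
    return ordered_points[: i + 1], ordered_points[i + 1 :]


# Path: the stops named in the quoted example, in travel order.
route = ["San Francisco", "Denver", "Chicago", "Pittsburgh", "New York"]
# Linear scale: illustrative amounts in dollars, in increasing order.
amounts = [10, 20, 30, 40, 50, 60, 70]

been_to, not_been_to = split_at(route, "Chicago")
have, do_not_have = split_at(amounts, 50)

print(been_to, not_been_to)
print(have, do_not_have)
```

Nothing in the function refers to travel or to money; only the ordering matters, which is one way of reading the remark that the inference holds for any path image-schema.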
The same kind of argument can be made for scientific thinking in general (Black
1962). Science often involves theorizing about things that we cannot see, hear, or
touch: atoms, gravitational forces, magnetic fields, and so on. So, scientists use
their imagination to "take a look." The result is a metaphorical theory. A classic
example of this is the early history of atomic theory (Sebeok and Danesi 2000),
which can be sequenced into three main phases: (1) the Rutherford Model, which
portrays the atom as a tiny solar system; (2) the Bohr Model, which adds
quantized orbits to the Rutherford Model; and (3) the Schrödinger Model, which
posits the idea that electrons occupy regions of space. The three models are rendered in diagram form below (Danesi 2013). These show how radiation works: the
Rutherford model radiating outwards (metaphorically) to suggest the Bohr model,
which in turn radiates outward towards the Schrödinger model:
[Figure: diagrams of the Rutherford, Bohr, and Schrödinger models of the atom. Labels include: the nucleus; electrons; orbits; 1st shell = 2 electrons; 6 protons and 6 neutrons in the nucleus; electron clouds.]
The way in which each model is composed is hardly haphazard, as Black pointed
out: each one attempts to model atomic structure according to specic types of
experimental data, and each one is generated from a radiative ICM: one target
domain linked to separate source domains. The target domain in all three cases
is atomic structure; but each diagram provides, literally, a different metaphorical view of the same domain, a domain that is not directly accessible to vision.
Rutherford speculated that atomic structure mirrors the solar system, a theory
that may have been influenced by the ancient Pythagorean concept of the cosmos
as having the same structure at all its levels, from the microcosmic (the atom) to
the macrocosmic (the universe). The Bohr Model is, in effect, an extension of the
Rutherford one, and the Schrödinger Model an extension of the previous two. The
model envisioned by Rutherford is a first-order blend of the structure of the solar
and atomic systems. Bohr began with Rutherford's model as his source domain,
but then postulated further that electrons can only move in certain quantized orbits, blending emerging ideas in quantum physics with the Rutherford model. Bohr
was thus able to explain certain qualities of emission for hydrogen, but failed for
other elements. His was a second-order blend: a blend of a previous blend.
Schrödinger's model, in which electrons are described not by the paths they
take but by the regions where they are most likely to be found, can explain certain qualities of emission spectra for all elements. The basic source domain has
not changed, but it is now elaborated significantly to account for phenomena that
are not covered by the original model. It was in 1926 that Schrödinger used mathematical equations to describe the likelihood of finding an electron in a certain
position. Unlike the Bohr model, Schrödinger's model does not define the exact
path of an electron, but rather predicts the odds of the location of the electron.
This model is thus portrayed as a nucleus surrounded by an electron cloud. Where
the cloud is most dense, the probability of finding the electron is greatest; conversely,
the electron is less likely to be in a less dense area of the cloud. This
model is a third-order blend: a blend of previous blends. Note, however, that at
each stage of the development of atomic theory, there is an inherent connectivity. Blending occurs in different orders to produce complex ideas. The trace to the
brain's inner blending processes is metaphor, either conceptual or visual (as
in the diagrams above). This is also why physicists use metaphor descriptively, referring to sound waves as undulating through empty space, atoms as leaping from
one quantum state to another, electrons as orbiting an atomic nucleus, and
so on. The physicist K. C. Cole (1984: 156) puts it as follows:
The words we use are metaphors; they are models fashioned from familiar ingredients and
nurtured with the help of fertile imaginations. "When a physicist says an electron is like a
particle," writes physics professor Douglas Giancoli, "he is making a metaphorical comparison like the poet who says love is like a rose." In both images a concrete object, a rose or a
particle, is used to illuminate an abstract idea, love or electron.
As Robert Jones (1982: 4) has also pointed out, for the scientist metaphor serves
as "an evocation of the inner connection among things." It is interesting and relevant to note that the philosopher of science Fernand Hallyn (1990) identified
the goal of science as that of giving the world a "poetic structure." Scientific models, in this view, are visual-metaphorical interpretations of given information that
lead to further connections and insights. Marcus (2012: 184) writes on this theme
insightfully as follows:
When mathematics is involved in a cognitive modeling process, both analogical and indexical operations are used. But the conflict is unavoidable, because the model M of a situation A
should be concomitantly as near as possible to A (to increase the chance of the statements
about M to be relevant for A too), but, on the other hand, M should be as far as possible
from A (to increase the chance of M to can be investigated by some method which is not
compatible with the nature of A). A similar situation occurs with cognitive mathematical
metaphors. Starting as cognitive model or metaphor for a definite, specific situation, M acquires an autonomous status and it is open to become a model or a metaphor for another,
sometimes completely different situation. M may acquire some interpretation, but it can also
abandon it, to acquire another one. No mathematical construction can be constrained to
have a unique interpretation, its semantic freedom is infinite, because it belongs to a fictional universe: mathematics. Mathematics has a strong impact on real life and the real
world has a strong impact on mathematics, but all these need a mediation process: the replacement of the real universe by a fictional one.
When ideas are represented in this way, their structural possibilities become evident in the blend itself, which is a kind of snapshot of hidden or suggestive structure, and new ideas are possible because of this. It is this hidden structure packed
into a blend that is often the source of discovery in mathematics. Unpacking it accounts for much of how mathematical cognition unfolds. Progress is thus
guided by blending on blending and so on, ad infinitum.
One can, actually, describe entire systems in terms of n-order blends. For
example, algebra is a second-order blend from arithmetic. The ancient Egyptians
and Babylonians used a proto-form of algebra, and hundreds of years later, so too
did the Greeks, Chinese, and people of India. Diophantus used what we now call
quadratic equations and symbols for unknown quantities. But between 813 and
833, al-Khwarizmi, a teacher in the mathematical school in Baghdad, wrote an
inuential book on algebra that came to be used as a textbook. As al-Khwarizmi
argued, restoration and completion were symbol-manipulating techniques. As
such, they enshrined algebra as a separate and powerful branch of arithmetic. It
was then that algebra developed into the equation modeling system that it has
become. This happened between the fifteenth and seventeenth centuries when,
as Bellos (2010: 123) puts it, "mathematical sentences moved from rhetorical to
symbolic expression." As Bellos (2010: 124) goes on to write:
Replacing words with letters and symbols was more than convenient shorthand. The symbol x may have started as an abbreviation for unknown quantity, but once invented, it
became a powerful tool for thought. A word or an abbreviation cannot be subjected to mathematical operation in the way that a symbol like x can be. Numbers made counting possible;
but letter symbols took mathematics into a domain far beyond language.
Algebra made formulas in science a reality, greatly enhancing the power of science to explore reality. As Crilly (2011: 104) observes, "the desire to find a formula
is a driving force in science and mathematics." Perhaps the world's most famous
example of this is Einstein's E = mc², which compresses so much information
that it defies common sense even to start explaining why it should be so. The
formula, devised in 1905, tells us that the energy (E) into which a given amount of
matter can change equals the mass (m) of that matter multiplied by the speed of
light squared (c²). Using this equation, scientists determined that the fissioning
of 0.45 kilograms of uranium would release as much energy as 7,300 metric tons
of TNT. Constructing a formula is, in effect, devising a notation for something.
But in so doing, it also becomes a predictive tool, as is evident in the applications
of Einstein's formula. Science is prophecy of a mathematical kind. Mathematical
formulas predict events that have not occurred; and when some new formula predicts them better or shows that the previous formulas are faulty, then replacement
occurs.
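As a rough check on the figures just quoted, the arithmetic can be sketched as follows. The conversion of one metric ton of TNT to 4.184 × 10^9 joules is a standard convention assumed here, not a figure taken from the text, and the function names are illustrative. Under that assumption, the quoted fission energy corresponds to only a fraction of a gram of mass being converted, far less than the 0.45 kilograms of uranium involved.

```python
SPEED_OF_LIGHT = 299_792_458.0   # metres per second (exact SI value)
JOULES_PER_TONNE_TNT = 4.184e9   # assumed convention for one metric ton of TNT


def rest_energy(mass_kg):
    """E = m * c**2: energy in joules equivalent to a mass in kilograms."""
    return mass_kg * SPEED_OF_LIGHT ** 2


def mass_equivalent(energy_joules):
    """m = E / c**2: mass in kilograms equivalent to an energy in joules."""
    return energy_joules / SPEED_OF_LIGHT ** 2


# Energy of 7,300 metric tons of TNT, the figure quoted in the text.
fission_energy = 7_300 * JOULES_PER_TONNE_TNT

# Mass that must be converted to energy to release that much.
converted_mass = mass_equivalent(fission_energy)

# Share of the 0.45 kg of uranium that this converted mass represents.
fraction_converted = converted_mass / 0.45

print(f"energy released: {fission_energy:.3e} J")
print(f"mass converted:  {converted_mass * 1000:.2f} g")
print(f"fraction of the uranium's mass: {fraction_converted:.2%}")
```

The two functions are inverses of each other, so the same formula serves both as a notation and as a predictive tool: given a mass it predicts an energy, and given an observed energy it predicts the mass that disappeared.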
is starting to yield significant insights into the question of what math and language are. Linguistics is, in fact, highly flexible as a science, both theoretically and
methodologically. Together with traditional forms of fieldwork and ethnographic
analysis, the use of mathematics can help the linguist gain insights into language
and discourse that would be otherwise unavailable (as we have seen throughout
this book). That, in my view, is the most important lesson to be learned from considering the math-language nexus. The more we probe similarities (or differences)
in mathematics and language with all kinds of tools, the more we will know about
the mind that creates both. That, as Arika Okrent (2009) aptly puts it, should always be the fundamental goal of linguistics (and math cognition for that matter):
"The job of the linguist, like that of the biologist or the botanist, is not to tell us
how nature should behave, or what its creations should look like, but to describe
those creations in all their messy glory and try to figure out what they can teach
us about life, the world, and, especially in the case of linguistics, the workings of
the human mind."
Bibliography
Adam, John A. 2004. Mathematics in nature: Modeling patterns in the natural world. Princeton,
NJ: Princeton University Press.
Al-Khalili, Jim. 2012. Paradox: The nine greatest enigmas in physics. New York, NY: Broadway.
Alexander, James. 2012. On the cognitive and semiotic structure of mathematics. In Mariana
Bockarova, Marcel Danesi, and Rafael Núñez (eds.), Semiotic and cognitive science essays
on the nature of mathematics, 1–34. Munich: Lincom Europa.
Allan, Keith. 1986. Linguistic meaning. New York, NY: Routledge.
Allwein, Gerard and Barwise, Jon (eds.) 1996. Logical reasoning with diagrams. Oxford: Oxford
University Press.
Alpher, Barry. 1987. Feminine as the unmarked grammatical gender: Buffalo girls are no fools.
Australian Journal of Linguistics 7. 169–187.
Ambrose, Rebecca C. 2002. Are we overemphasizing manipulatives in the primary grades to the
detriment of girls? Teaching Children Mathematics 9: 16–21.
Andersen, Henning. 1989. Markedness theory: The first 150 years. In Olga M. Tomić (ed.),
Markedness in synchrony and diachrony, 11–16. Berlin: Mouton de Gruyter.
Andersen, Henning. 2001. Markedness and the theory of linguistic change. In Henning Andersen (ed.), Actualization, 19–57. Amsterdam: John Benjamins.
Andersen, Henning. 2008. Naturalness and markedness. In K. Willems and L. De Cuypere
(eds.), Naturalness and iconicity in language, 101–119. Amsterdam: John Benjamins.
Andersen, Peter B. 1991. A theory of computer semiotics. Cambridge: Cambridge University
Press.
Anderson, Myrdene, Sáenz-Ludlow, Adalira, and Cifarelli, Victor (eds.). 2003. Educational perspectives on mathematics as semiosis: From thinking to interpreting to knowing. Ottawa:
Legas Press.
Anderson, Myrdene, Sáenz-Ludlow, Adalira, and Cifarelli, Victor. 2000. Musement in mathematical manipulation. In Adrian Gimate-Welsh (ed.), Ensayos semióticos, 663–676.
Mexico: Porrúa.
Andrews, Edna and Tobin, Yishai (eds.). 1996. Toward a calculus of meaning: Studies in
markedness, distinctive features and deixis. Amsterdam: John Benjamins.
Andrews, Edna. 1990. Markedness theory. Durham, NC: Duke University Press.
Andrews, Edna. 2003. Conversations with Lotman: Cultural semiotics in language, literature,
and cognition. Toronto: University of Toronto Press.
Anfindsen, Jens. 2006. Aristotle on contrariety as a principle of first philosophy. Uppsala: Uppsala University Thesis.
Appel, Kenneth and Haken, Wolfgang. 1986. The Four Color Proof suffices. The Mathematical
Intelligencer 8: 10–20.
Appel, Kenneth and Haken, Wolfgang. 2002. The Four-Color Problem. In: D. Jacquette (ed.),
Philosophy of mathematics, 193–208. Oxford: Blackwell.
Ardila, A. and Rosselli, M. 2002. Acalculia and dyscalculia. Neuropsychology Review 12: 179–231.
Aristotle. 1952a. Rhetoric. In W. D. Ross (ed.), The works of Aristotle, Vol. 11. Oxford: Clarendon
Press.
Aristotle. 1952b. Poetics. In W. D. Ross (ed.), The works of Aristotle, Vol. 11. Oxford: Clarendon
Press.
Aristotle. 2012. The Organon, trans. R. B. Jones, E. M. Edghill, and A. J. Jenkinson. CreateSpace
Independent Publishing Platform.
Arndt, Walter W. 1959. The performance of glottochronology in Germanic. Language 35. 180–192.
Arnheim, Rudolf. 1969. Visual thinking. Berkeley, CA: University of California Press.
Arranz, José I. P. 2005. Towards a global view of the transfer phenomenon. The Reading Matrix
5. 116–128.
Ascher, Marcia. 1991. Ethnomathematics: A multicultural view of mathematical ideas. Pacific
Grove, CA: Brooks/Cole.
Association of Teachers of Mathematics. 1980. Language and mathematics. Washington: Association of Teachers of Mathematics.
Aubry, Mathieu. 2009. Metaphors in mathematics: Introduction and the case of algebraic geometry. Social Science Research Network. Available at SSRN: http://ssrn.com/abstract=1478871
or http://dx.doi.org/10.2139/ssrn.1478871
Babin, Arthur E. 1940. The theory of opposition in Aristotle. Notre Dame, IN: Notre Dame Doctoral Thesis.
Bach, Emmon W. 1989. Informal lectures on formal semantics. Albany, NY: SUNY Press.
Bäck, Allan. 2000. Aristotle's theory of predication. Leiden: Brill.
Bacon, Roger. 2009. The art and science of logic, trans. Thomas S. Maloney. Toronto: PIMS.
Ball, Deborah and Bass, Hyman. 2002. Toward a practice-based theory of mathematical
knowledge for teaching. In Elaine Simmt and Brent Davis (eds.), Proceedings of the 2002
Annual Canadian Mathematics Education Study Group/Groupe Canadien d'étude en Didactique des Mathématiques, 23–27. Sherbrooke, Canada.
Ball, Keith M. 2003. Strange curves, counting rabbits, and other mathematical explorations.
Princeton, NJ: Princeton University Press.
Bar-Hillel, Yehoshua. 1953. A quasi-arithmetical notation for syntactic description. Language
29. 47–58.
Bar-Hillel, Yehoshua. 1960. The present status of automatic translation of languages. Advances in Computers 1. 91–163.
Barbaresi, Lavinia M. 1988. Markedness in English discourse: A semiotic approach. Parma:
Edizioni Zara.
Barker-Plummer, Dave and Bailin, Sydney C. 1997. The role of diagrams in mathematical proofs.
Machine Graphics and Vision 8: 25–58.
Barker-Plummer, Dave and Bailin, Sydney C. 2001. On the practical semantics of mathematical
diagrams. In: M. Anderson (ed.), Reasoning with diagrammatic representations. New York,
NY: Springer.
Barrett, William. 1986. The death of the soul: From Descartes to the computer. New York, NY:
Anchor.
Barrow, John D. 2014. 100 essential things you didn't know about maths & the arts. London:
Bodley Head.
Barthes, Roland. 1964. Elements of semiology. London: Cape.
Barthes, Roland. 1967. Système de la mode. Paris: Seuil.
Barwise, Jon and Etchemendy, John. 1994. Hyperproof. Stanford, CA: CSLI Publications.
Barwise, Jon and Etchemendy, John. 1986. The liar. Oxford: Oxford University Press.
Bateson, Gregory. 1972. Steps to an ecology of mind. New York, NY: Ballantine.
Battistella, Edwin L. 1990. Markedness: The evaluative superstructure of language. Albany, NY:
State University of New York Press.
Battistella, Edwin L. 1996. The logic of markedness. Oxford: Oxford University Press.
Baudouin de Courtenay, Jan. 1894 [1972]. A Baudouin de Courtenay anthology: The beginnings
of structural linguistics, ed. and trans. Edward Stankiewicz. Bloomington, IN: Indiana
University Press.
Beckmann, Petr. 1981. A history of π. New York, NY: St. Martin's.
Belardi, Walter. 1970. L'opposizione privativa. Napoli: Istituto Universitario Orientale di Napoli.
Bellos, Alex. 2010. Here's looking at Euclid: A surprising excursion through the astonishing
world of math. Princeton, NJ: Princeton University Press.
Bellos, Alex. 2014. The grapes of math: How life reflects numbers and numbers reflect life. New
York, NY: Doubleday.
Belsey, Catherine. 2002. Poststructuralism: A very short introduction. Oxford: Oxford University
Press.
Benford, Frank. 1938. The law of anomalous numbers. Proceedings of the American Philosophical Society 78: 551–572.
Benjamin, Arthur, Chartrand, Gary, and Zhang, Ping. 2015. The fascinating world of graph theory. Princeton, NJ: Princeton University Press.
Benthem, Johann van and Ter Meulen, Alice (eds.). 2010. Handbook of logic and language,
2nd ed. Oxford: Elsevier.
Benveniste, Emile. 1946. Structure des relations de personne dans le verbe. Bulletin de la Société de Linguistique de Paris 43. 225–236.
Bergen, Benjamin K. 2001. Nativization processes in L1 Esperanto. Journal of Child Language
28. 575–595.
Bergin, Thomas G. and Max H. Fisch. 1984. The New Science of Giambattista Vico, 2nd ed.
Ithaca, NY: Cornell University Press.
Bergsland, Knut and Vogt, Hans. 1962. On the validity of glottochronology. Current Anthropology 3. 115–153.
Berlinski, David. 2013. The king of infinite space: Euclid and his elements. New York, NY: Basic
Books.
Bernacer, Javier and Murillo, José Ignacio. 2014. The Aristotelian conception of habit and its
contribution to human neuroscience. Frontiers in Human Neuroscience 8: 883.
Bernstein, Basil. 1971. Class, codes and control: Theoretical studies towards a sociology of
language. London: Routledge.
Bickerton, Derek. 2014. More than nature needs: Language, mind, and evolution. Cambridge,
MA: Harvard University Press.
Billeter, Jean François. 1990. The Chinese art of writing. New York, NY: Rizzoli.
Billow, R. M. 1975. A cognitive developmental study of metaphor comprehension. Developmental Psychology 11: 415–423.
Black, Max. 1962. Models and metaphors. Ithaca, NY: Cornell University Press.
Blanché, Robert. 1966. Structures intellectuelles. Paris: Vrin.
Blatner, David. 1997. The joy of pi. Harmondsworth: Penguin.
Bloomfield, Leonard. 1933. Language. New York, NY: Holt.
Boas, Franz. 1940. Race, language, and culture. New York, NY: Free Press.
Bocheński, Innocentius M. J. 1961. A history of formal logic. Notre Dame, IN: University of Notre
Dame Press.
Bockarova, Mariana, Marcel Danesi and Rafael Núñez (eds.). 2012. Semiotic and cognitive science essays on the nature of mathematics. Munich: Lincom Europa.
Bod, Rens, Hay, Jennifer and Jannedy, Stefanie. 2003. Probabilistic linguistics. Cambridge:
MIT Press.
Bogoslovsky, Boris B. 1928. The technique of controversy. London: Paul, Trench and Teubner.
Bolinger, Dwight. 1968. Aspects of language. New York, NY: Harcourt, Brace, Jovanovich.
Boole, George. 1854. An investigation of the laws of thought. New York, NY: Dover.
Booth, Andrew D. 1955. Use of a computing machine as a mechanical dictionary. Nature 176.
565.
Booth, Andrew D. and Locke, William N. 1955. Historical introduction. In W. N. Locke and A. D.
Booth (eds.), Machine translation of languages, 1–14. New York, NY: John Wiley.
Borel, Émile. 1909. Le continu mathématique et le continu physique. Rivista di Scienza 6: 21–35.
Bottini, Gabriella, Corcoran, Rhiannon, Sterzi, Roberto, Paulesu, Eraldo, Schenone, Pietro,
Scarpa, Pina, Frackowiak, Richard S. J., and Frith, Christopher D. 1994. The role of the right
hemisphere in the interpretation of figurative aspects of language: A positron emission
tomography activation study. Brain 117: 1241–1253.
Brainerd, Barron. 1970. A stochastic process related to language change. Journal of Applied
Probability 7. 69–78.
Bronowski, Jacob. 1973. The ascent of man. Boston, MA: Little, Brown, and Co.
Bronowski, Jacob. 1977. A sense of the future. Cambridge, MA: MIT Press.
Brown, Roger. 1958. Words and things: An introduction to language. New York, NY: The Free
Press.
Brown, Roger. 1986. Social psychology. New York, NY: Free Press.
Brownell, Hiram H. 1988. Appreciation of metaphoric and connotative word meaning by brain-damaged patients. In: Christine Chiarello (ed.), Right hemisphere contributions to lexical
semantics, 19–32. New York, NY: Springer.
Brownell, Hiram H., Heather H. Potter and Diane Michelow. 1984. Sensitivity to lexical denotation and connotation in brain-damaged patients: A double dissociation? Brain and
Language 22. 253–265.
Bruno, Giuseppe, Genovese, Andrea, and Improta, Gennaro. 2013. Routing problems: A historical perspective. In: Mircea Pitici (ed.), The best writing in mathematics 2012. Princeton, NJ:
Princeton University Press.
Bryant, Edwin. 2001. The quest for the origins of Vedic culture. Oxford: Oxford University Press.
Buckland, William. 2007. Forensic semiotics. Semiotic Review of Books 10. 9–16.
Bühler, Karl. 1908 [1951]. On thought connection. In D. Rapaport (ed.), Organization and
pathology of thought, 81–92. New York, NY: Columbia University Press.
Bühler, Karl. 1934. Sprachtheorie: Die Darstellungsfunktion der Sprache. Jena: Fischer.
Burke, John and Kincannon, Eric. 1991. Benford's law and physical constants: The distribution
of initial digits. American Journal of Physics 14. 59–63.
Burr, David and Ross, John. 2008. A visual sense of number. Current Biology 18: 425–428.
Butterworth, Brian. 1999. What counts: How every brain is hardwired for math. Michigan: Free
Press.
Butterworth, Brian. 2010. Foundational numerical capacities and the origins of dyscalculia.
Trends in Cognitive Science 14: 534–541.
Butterworth, Brian, Varma, Sashank, and Laurillard, Diana. 2011. Dyscalculia: From brain to education. Science 332: 1049–1053.
Bybee, Joan. 2006. Frequency of use and organization of language. Oxford: Oxford University
Press.
Callaghan, Catherine A. 1991. Utian and the Swadesh list. In J. E. Redden (ed.), Papers for the
American Indian language conference, held at the University of California, Santa Cruz, July
and August, 1991, 218–237. Carbondale, IL: Department of Linguistics, Southern Illinois
University.
Calude, Cristian and Paun, Gheorghe. 1981. The absence of contextual ambiguities in programming languages. Revue Roumaine de Linguistique: Cahiers de Linguistique Théorique et
Appliquée 18. 91–110.
Calude, Cristian. 1976. Quelques arguments pour le caractère non-formel des langages de
programmation. Revue Roumaine de Linguistique: Cahiers de Linguistique Théorique et
Appliquée 13. 257–264.
Cameron, Angus. 2011. Ground zero: The semiotics of the boundary line. Social Semiotics 21.
417–434.
Cann, Ronnie. 1993. Formal semantics: An introduction. Cambridge: Cambridge University
Press.
Cantor, Georg. 1874. Über eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen.
Journal für die Reine und Angewandte Mathematik 77. 258–262.
Cappelletti, Marinella, Butterworth, Brian, and Kopelman, Michael. 2006. The understanding
of quantifiers in semantic dementia: A single-case study. Neurocase: The Neural Basis of
Cognition 12. 136–145.
Cardano, Girolamo. 1663 [1961]. The book on games of chance (Liber de ludo aleae). New York:
Holt, Rinehart, and Winston.
Carroll, Lewis. 1879 [2004]. Euclid and his modern rivals. New York, NY: Dover.
Carroll, Lewis. 1887. The game of logic. New York, NY: Dover.
Cartmill, Matt, Pilbeam, David, and Isaac, Glynn. 1986. One hundred years of paleoanthropology. American Scientist 74: 410–420.
Cassirer, Ernst. 1944. An essay on man. New Haven, CT: Yale University Press.
Chaitin, Gregory J. 2006. Meta math. New York, NY: Vintage.
Chandrasekaran, B., Glasgow, Janice, and Narayanan, N. Hari (eds.). 1995. Diagrammatic reasoning: Cognitive and computational perspectives. Cambridge, MA: MIT Press.
Changeux, Jean-Pierre. 2013. The good, the true, and the beautiful: A neuronal approach. New
Haven, CT: Yale University Press.
Chartier, Tim. 2014. Math bytes. Princeton, NJ: Princeton University Press.
Cherry, Colin. 1957. On human communication. Cambridge, MA: MIT Press.
Cho, Yang S. and Proctor, Robert W. 2007. When is an odd number not odd? Influence of task
rule on the MARC effect for numeric classification. Journal of Experimental Psychology:
Learning, Memory, and Cognition 33. 832–842.
Chomsky, Noam and Halle, Morris. 1968. The sound pattern of English. New York, NY: Harper
and Row.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1966a. Cartesian linguistics: A chapter in the history of rationalist thought.
New York, NY: Harper and Row.
Chomsky, Noam. 1966b. Topics in the theory of generative grammar. The Hague: Mouton.
Chomsky, Noam. 1975. Reflections on language. New York, NY: Pantheon.
Chomsky, Noam. 1982. Some concepts and consequences of the theory of government and
binding. Cambridge, MA: MIT Press.
Chomsky, Noam. 1986. Knowledge of language: Its nature, origin, and use. New York, NY:
Praeger.
Chomsky, Noam. 1990. Language and mind. In D. H. Mellor (ed.), Ways of communicating, 56–80.
Cambridge: Cambridge University Press.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Chomsky, Noam. 2000. New horizons in the study of language and mind. Cambridge: Cambridge University Press.
Chomsky, Noam. 2002. On nature and language. Cambridge: Cambridge University Press.
Chretien, Douglas. 1962. The mathematical models of glottochronology. Language 38. 11–37.
Church, Alonzo. 1935. Abstract No. 204. Bulletin of the American Mathematical Society 41: 332–333.
Church, Alonzo. 1936. An unsolvable problem of elementary number theory. American Journal of
Mathematics 58: 345–363.
Cienki, Alan, Luka, Barbara J., and Smith, Michael B. (eds.). 2001. Conceptual and discourse
factors in linguistic structure. Stanford, CA: Center for the Study of Language and Information.
Clark, Michael. 2007. Paradoxes from A to Z. London: Routledge.
Clawson, Calvin C. 1999. Mathematical sorcery: Revealing the secrets of numbers. Cambridge,
MA: Perseus.
Clivio, Gianrenzo P., Danesi, Marcel and Maida-Nicol, Sara. 2011. Introduction to Italian dialectology. Munich: Lincom Europa.
Cobham, Alan. 1965. The intrinsic computational difficulty of functions. Proceedings of Logic,
Methodology, and Philosophy of Science II, North Holland.
Cole, K. C. 1984. Sympathetic vibrations. New York, NY: Bantam.
Collins, Joan M. 1969. An exploration of the role of opposition in cognitive processes of kindergarten children. Ontario Institute for Studies in Education Theory.
Colyvan, Mark. 2012. An introduction to the philosophy of mathematics. Cambridge: Cambridge
University Press.
Connor, K. and Kogan, N. 1980. Topic-vehicle relations in metaphor: The issue of asymmetry.
In: R. P. Honeck and R. R. Hoffman (eds.), Cognition and figurative language, 238–308.
Hillsdale, NJ: Lawrence Erlbaum Associates.
Cook, Stephen. 1971. The complexity of theorem proving procedures. Proceedings of the Third
Annual ACM Symposium on Theory of Computing, 151–158.
Cook, Walter A. 1969. Introduction to tagmemic analysis. New York, NY: Holt, Rinehart and
Winston.
Cook, William J. 2014. In pursuit of the traveling salesman problem. Princeton, NJ: Princeton
University Press.
Coseriu, Eugenio. 1973. Probleme der strukturellen Semantik. Tübingen: Tübinger Beiträge zur
Linguistik 40.
Coughlin, Deborah A. 2003. Correlating automated and human assessments of machine translation quality. In MT Summit IX, New Orleans, USA, 23–27.
Courant, Richard and Robbins, Herbert. 1941. What is mathematics? An elementary approach
to ideas and methods. Oxford: Oxford University Press.
Craik, Kenneth. 1943. The nature of explanation. Cambridge: Cambridge University Press.
Crilly, Tony. 2011. Mathematics. London: Quercus.
Cruse, D. Alan. 1986. Lexical semantics. Cambridge: Cambridge University Press.
Crystal, David. 2006. Language and the Internet. 2nd ed. Cambridge: Cambridge University
Press.
Crystal, David. 2008. txtng: the gr8 db8. Oxford: Oxford University Press.
Cummins, Robert. 1996. Representations, targets, and attitudes. Cambridge, MA: MIT Press.
Currie, Thomas E., Meade, Andrew, Guillon, Myrtille, and Mace, Ruth. 2013. Cultural phylogeography of the Bantu languages of Sub-Saharan Africa. Royal Society Publishing.
http://royalsocietypublishing.org/content/280/1762/20130695.
Dalrymple, Mary (ed.). 1999. Semantics and syntax in lexical functional grammar: The resource
logic approach. Cambridge, MA: MIT Press.
Dalrymple, Mary, Lamping, John, and Saraswat, Vijay. 1993. LFG semantics via constraints.
In Proceedings of the Sixth Meeting of the European ACL (97–105). Utrecht: University of
Utrecht.
Dalrymple, Mary. 2001. Lexical functional grammar, No. 42 in Syntax and Semantics Series.
New York, NY: Academic Press.
Damasio, Antonio R. 1994. Descartes' error: Emotion, reason, and the human brain. New York:
G. P. Putnam's Sons.
Danesi, Marcel and Bockarova, Mariana. 2013. Mathematics as a modeling system. Tartu: University of Tartu Press.
Danesi, Marcel and Rocci, Andrea. 2009. Global linguistics: An introduction. Berlin: Mouton
de Gruyter.
Danesi, Marcel. 1987. Formal mother-tongue training and the learning of mathematics in
elementary school: An observational note on the Brussels Foyer Project. Scientia Paedagogica Experimentalis 24: 313–320.
Danesi, Marcel. 1998. Gender assignment, markedness, and indexicality: Results of a pilot
project. Semiotica 121: 213–240.
Danesi, Marcel. 2000. Semiotics in language education. Berlin: Mouton de Gruyter.
Danesi, Marcel. 2001. Layering processes in metaphorization. International Journal of Computing Anticipatory Systems 8: 157–173.
Danesi, Marcel. 2002. The puzzle instinct: The meaning of puzzles in everyday life. Bloomington, IN: Indiana University Press.
Danesi, Marcel. 2003. Second language teaching: A view from the right side of the brain. Dordrecht: Kluwer Academic Publishers.
Danesi, Marcel. 2004a. Poetic logic: The role of metaphor in thought, language, and culture.
Madison, WI: Atwood Publishing.
Danesi, Marcel. 2004b. The liar paradox and the Towers of Hanoi: The ten greatest math puzzles
of all time. Hoboken, NJ: John Wiley.
Danesi, Marcel. 2006. Alphabets and the principle of least effort. Studies in Communication
Sciences 6. 47–62.
Danesi, Marcel. 2007. The quest for meaning: A guide to semiotic theory and practice. Toronto:
University of Toronto Press.
Danesi, Marcel. 2008. Problem-solving in mathematics: A semiotic perspective for educators
and teachers. New York, NY: Peter Lang.
Danesi, Marcel. 2011. George Lakoff on the cognitive and neural foundation of mathematics.
Fields Notes 11 (3). 14–20.
Danesi, Marcel. 2013. Discovery in mathematics: An interdisciplinary perspective. Munich:
Lincom Europa.
Danly, M. and Shapiro, B. 1982. Speech prosody in Broca's aphasia. Brain and Language 16:
171–190.
Davies, W. Vivien. 1988. Egyptian hieroglyphs. Berkeley: University of California Press.
Davis, Philip J. and Hersh, Reuben. 1986. Descartes' dream: The world according to mathematics.
Boston, MA: Houghton Mifflin.
Dawkins, Richard. 1976. The selfish gene. Oxford: Oxford University Press.
Dawkins, Richard. 1985. River out of Eden: A Darwinian view of life. New York, NY: Basic.
Dawkins, Richard. 1987. The blind watchmaker. Harlow: Longmans.
Dawkins, Richard. 1998. Unweaving the rainbow: Science, delusion and the appetite for wonder. Boston, MA: Houghton Mifflin.
De Morgan, Augustus. 1847. Formal logic or the calculus of inference. London: Taylor and Walton.
De Souza, Clarisse S. 2005. The semiotic engineering of human-computer interaction. Cambridge, MA: MIT Press.
Dehaene, Stanislas. 1997. The number sense: How the mind creates mathematics. Oxford: Oxford University Press.
Dehaene, Stanislas. 2004. Arithmetic and the brain. Current Opinion in Neurobiology 14: 218–224.
Dehaene, Stanislas, Piazza, Manuela, Pinel, Philippe, and Cohen, Laurent. 2003. Three parietal circuits for number processing. Cognitive Neuropsychology 20: 487–506.
Denoual, Etienne and Lepage, Yves. 2005. BLEU in characters: Towards automatic MT evaluation in languages without word delimiters. Companion Volume to the Proceedings of the
Second International Joint Conference on Natural Language Processing, 81–86.
Derbyshire, J. 2004. Prime obsession: Bernhard Riemann and his greatest unsolved problem in
mathematics. Washington, DC: Joseph Henry Press.
Derrida, Jacques. 1967. De la grammatologie. Paris: Minuit.
Descartes, René. 1637 [1996]. La géométrie. Paris: Presses Universitaires de France.
Descartes, René. 1641 [1986]. Meditations on first philosophy with selections from the objections and replies. Cambridge: Cambridge University Press.
Devlin, Keith J. 2000. The math gene: How mathematical thinking evolved and why numbers are
like gossip. New York, NY: Basic.
Devlin, Keith. 2005. The math instinct. New York, NY: Thunder's Mouth Press.
Devlin, Keith. 2011. The man of numbers: Fibonacci's arithmetic revolution. New York, NY:
Walker and Company.
Dewdney, Andrew K. 1999. A mathematical mystery tour: Discovering the truth and beauty of
the cosmos. New York, NY: John Wiley and Sons.
Diamantaras, Konstantinos, Duch, Wlodek and Iliadis, Lazaros S. (eds.). 2010. Artificial Neural
Networks – ICANN 2010: 20th International Conference. New York, NY: Springer.
Diaz, Michele T., Barrett, Kyle M., and Hogstrom, Larson J. 2011. The influence of sentence novelty and figurativeness on brain activity. Neuropsychologia 49: 320–330.
Dirven, René and Verspoor, Marjolijn. 2004. Cognitive exploration of language and linguistics.
Amsterdam: John Benjamins.
Dobson, Annette and Black, Paul. 1979. Multidimensional scaling of some lexicostatistical
data. Mathematical Scientist 1979/4, 55–61.
Dobson, Annette. 1969. Lexicostatistical grouping. Anthropological Linguistics 7, 216–221.
Doddington, George. 2002. Automatic evaluation of machine translation quality using n-gram
co-occurrence statistics. Proceedings of the Human Language Technology Conference (HLT),
San Diego, CA, 128–132.
Dormehl, Luke. 2014. The formula. New York, NY: Perigee.
Driver, Godfrey R. 1976. Semitic writing: From pictograph to alphabet. Oxford: Oxford University
Press.
Du Sautoy, M. 2004. The music of the primes: Bernhard Riemann and the greatest unsolved
problem in mathematics. New York, NY: HarperCollins.
Dyen, Isidore. 1975. Linguistic subgrouping and lexicostatistics. The Hague: Mouton.
Dyen, Isidore (ed.). 1973. Lexicostatistics in genetic linguistics: Proceedings of the Yale conference, April 3–4, 1971. The Hague: Mouton.
Dyen, Isidore, James, A. T. and Cole, J. 1967. Language divergence and estimated word retention
rate. Language 43. 150–171.
Dyen, Isidore, Kruskal, Joseph, and Black, Paul. 1992. An Indoeuropean classification, a lexicostatistical experiment. Transactions of the American Philosophical Society 82/5.
Dyen, Isidore. 1963. Lexicostatistically determined borrowing and taboo. Language 39, 60–66.
Dyen, Isidore. 1965. A lexicostatistical classification of the Austronesian languages. International Journal of American Linguistics, Memoir 19.
Eckman, Fred R., Moravcsik, Edith A., and Wirth, Jessica R. (eds.). 1983. Markedness. New York,
NY: Plenum.
Eco, Umberto. 1984. Semiotics and the philosophy of language. Bloomington, IN: Indiana University Press.
Eco, Umberto. 1992. Interpretation and overinterpretation. Cambridge: Cambridge University
Press.
Eco, Umberto. 1998. Serendipities: Language and lunacy, translated by William Weaver. New
York, NY: Columbia University Press.
Elšík, Viktor and Matras, Yaron. 2006. Markedness and language change: The Romani sample.
Berlin: Mouton de Gruyter.
Elwes, Richard. 2014. Mathematics 1001. Buffalo, NY: Firefly.
Embleton, Sheila M. 1986. Statistics in historical linguistics. Bochum: Brockmeyer.
English, Lyn D. (ed.). 1997. Mathematical reasoning: Analogies, metaphors, and images. Mahwah, NJ: Lawrence Erlbaum Associates.
Erdős, Paul. 1934. A theorem of Sylvester and Schur. Journal of the London Mathematical Society 9: 282–288.
Ernest, Paul. 2010. Mathematics and metaphor: A response to Elizabeth Mowat & Brent Davis.
Complicity: An International Journal of Complexity and Education 7: 98–104.
Euclid. 1956. The thirteen books of Euclid's elements, 3 volumes. New York, NY: Dover.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian. 2000. Statistical distributions. New
York, NY: John Wiley.
Everett, Daniel. 2005. Cultural constraints on grammar and cognition in Pirahã. Current Anthropology 46. 621–624.
Eymard, Pierre, Lafon, Jean-Pierre, and Wilson, Stephen S. 2004. The number pi. New York, NY:
American Mathematical Society.
Fan-Pei, Gloria Yanga et al. 2013. Contextual effects on conceptual blending in metaphors: An
event-related potential study. Journal of Neurolinguistics 26: 312–326.
Fauconnier, Gilles and Turner, Mark. 2002. The way we think: Conceptual blending and the
mind's hidden complexities. New York, NY: Basic.
Feldman, Jerome. 2006. From molecule to metaphor: A neural theory of language. Cambridge,
MA: MIT Press.
Ferrer i Cancho, Ramon and Solé, Ricard V. 2001. Two regimes in the frequency of words and the
origins of complex lexicons: Zipf's law revisited. Journal of Quantitative Linguistics 2001,
8, 165–231.
Ferrer i Cancho, Ramon, Riordan, Oliver, and Bollobás, Béla. 2005. The consequences of Zipf's
law for syntax and symbolic reference. Proceedings of the Royal Society of London, Series B, Biological Sciences, 2005, 15. Royal Society of London.
Ferrer i Cancho, Ramon. 2005. The variation of Zipf's law in human language. European Physical Journal 2005, 44, 249–57.
Ferrero, Guillaume. 1894. L'inertie mentale et la loi du moindre effort. Revue Philosophique de
la France et de l'étranger 37. 169–182.
Fillmore, Charles J. 1968. The case for case. In E. Bach and R. T. Harms (eds.), Universals in
linguistic theory. London: Holt, Rinehart and Winston.
Findler, Nicholas V. and Viil, Heino. 1964. A few steps toward computer lexicometry. American
Journal of Computational Linguistics. 179.
Fischer, John L. 1958. Social influences in the choice of a linguistic variant. Word 14. 47–57.
Fleming, Harold C. 1973. Subclassification in Hamito-Semitic. In Isidore Dyen (ed.), Lexicostatistics in genetic linguistics, 85–88. The Hague: Mouton.
Flood, R. and Wilson, R. 2011. The great mathematicians: Unravelling the mysteries of the universe. London: Arcturus.
Fodor, Jerry A. 1975. The language of thought. New York, NY: Crowell.
Fodor, Jerry A. 1983. The modularity of mind. Cambridge, MA: MIT Press.
Fodor, Jerry A. 1987. Psychosemantics: The problem of meaning in the philosophy of mind.
Cambridge, MA: MIT Press.
Fortnow, Lance. 2013. The golden ticket: P, NP, and the search for the impossible. Princeton, NJ:
Princeton University Press.
Foster, Donald. 2001. Author unknown: Tales of a literary detective. New York, NY: Holt.
Foucault, Michel. 1972. The archaeology of knowledge, trans. by A. M. Sheridan Smith. New York,
NY: Pantheon.
Fox, Anthony. 1995. Linguistic reconstruction: An introduction to theory and method. Oxford:
Oxford University Press.
Fox, James J. 1974. Our ancestors spoke in pairs: Rotinese views of language, dialect and code.
In R. Bauman and J. Sherzer (eds.), Explorations in the ethnography of speaking, 65–88.
Cambridge: Cambridge University Press.
Fox, James J. 1975. On binary categories and primary symbols. In R. Willis (ed.), The interpretation of symbolism, 99–132. London: Malaby.
Frege, Gottlob. 1879. Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des
reinen Denkens. Halle: Nebert.
Freiberger, Marianne and Thomas, Rachel. 2015. Numericon: A journey through the hidden lives
of numbers. New York, NY: Quercus.
Friedman, Thomas L. 2007. The world is flat: A brief history of the twenty-first century. New York:
Picador.
Gabelentz, Georg von der. 1901. Die Sprachwissenschaft; ihre Aufgaben, Methoden und bisherigen Ergebnisse. Leipzig: C. H. Tauchnitz.
Galilei, Galileo. 1638 [2001]. Dialogue concerning the two chief world systems, trans. by Stillman Drake. New York, NY: Modern Library.
Gamkrelidze, Thomas V. and Ivanov, Vjačeslav V. 1990. The early history of Indo-European
languages. Scientific American 262. 110–116.
Ganesalingam, Mohan and Herbelot, Aurelie. 2006. Composing distributions: mathematical
structures and their linguistic interpretation. Computational Linguistics 1. 1–31.
Gardner, Howard. 1985. The mind's new science: A history of the cognitive revolution. New York,
NY: Basic Books.
Gardner, Martin. 1961. The 2nd Scientific American book of mathematical puzzles. New York,
NY: Simon and Schuster.
Garnham, Alan. 1991. The mind in action: A personal view of cognitive science. London: Routledge.
Geeraerts, Dirk (ed.). 2006. Cognitive linguistics. Berlin: Mouton de Gruyter.
Gessen, Masha. 2009. Perfect rigor: A genius and the mathematical breakthrough of the century. Boston, MA: Houghton Mifflin Harcourt.
Ghyka, Matila. 1977. The geometry of art and life. New York, NY: Dover.
Gibbs, Raymond W. 1994. The poetics of mind: Figurative thought, language, and understanding. Cambridge: Cambridge University Press.
Gillings, Richard J. 1972. Mathematics in the time of the pharaohs. Cambridge, MA: MIT Press.
Gleason, Henry L., Jr. 1959. Counting and calculating for historical reconstruction. Anthropological Linguistics 2. 22–32.
Gödel, Kurt. 1931. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, Teil I. Monatshefte für Mathematik und Physik 38: 173–198.
Godel, Robert. 1957. Les sources manuscrites du Cours de linguistique générale de F. de
Saussure. Paris: Minard.
Godino, Juan D., Font, Vicenç, Wilhelmi, Miguel R., and Lurduy, Orlando. 2011. Why is the learning of elementary arithmetic concepts difficult? Semiotic tools for understanding the
nature of mathematical objects. Educational Studies in Mathematics 77: 247–265.
Goetzfridt, Nicholas J. 2008. Pacific ethnomathematics: A bibliographic study. Honolulu, HI:
University of Hawaii Press.
Goldberg, Elkhonon and Costa, Louis D. 1981. Hemispheric differences in the acquisition of
descriptive systems. Brain and Language 14: 144–173.
Gordon, Alison F. and Chris Pratt. 1998. Learning to be literate, 2nd ed. Oxford: Blackwell.
Graesser, A., Mio, J. and Millis, K. 1989. Metaphors in persuasive communication. In: D. Meutsch
and R. Viehoff (eds.), Comprehension and literary discourse: Results and problems of interdisciplinary approaches, 131–154. Berlin: Mouton de Gruyter.
Gray, Russell D. and Quentin D. Atkinson. 2003. Language-tree divergence times support the
Anatolian theory of Indo-European origin. Nature 425. 435–439.
Greenberg, Joseph H. 1966. Language universals. The Hague: Mouton.
Greimas, Algirdas J. 1966. Sémantique structurale. Paris: Larousse.
Greimas, Algirdas J. 1970. Du sens. Paris: Seuil.
Greimas, Algirdas J. 1987. On meaning: Selected essays in semiotic theory, trans. by P. Perron
and F. Collins. Minneapolis, MN: University of Minnesota Press.
Grice, Paul. 1975. Logic and conversation. In P. Cole and J. Morgan (eds.), Syntax and semantics,
Vol. 3, 41–58. New York, NY: Academic.
Gudschinsky, Sarah. 1956. The ABCs of lexicostatistics (glottochronology). Word, 12, 175–210.
Guhe, Markus et al. 2011. A computational account of conceptual blending in basic mathematics. Cognitive Systems Research 12: 249–265.
Haarmann, Harald. 1990. Basic vocabulary and language contacts; the disillusion of glottochronology. Indogermanische Forschungen 95. 7–49.
Hadamard, Jacques. 1945. The psychology of invention in the mathematical field. Princeton, NJ:
Princeton University Press.
Haken, Wolfgang and Appel, Kenneth. 1977. The solution of the Four-Color-Map Problem. Scientific American 237: 108–121.
Hales, Alfred W. and Jewett, Robert. 1963. Regularity and positional games. Transactions of the
American Mathematical Society 106: 222–229.
Halliday, Michael A. K. 1966. Lexis as a linguistic level. Journal of Linguistics 2(1). 57–67.
Halliday, Michael A. K. 1975. Learning how to mean: Explorations in the development of language. London: Arnold.
Halliday, Michael A. K. 1985. Introduction to functional grammar. London: Arnold.
Hallyn, Ferdinand. 1990. The poetic structure of the world: Copernicus and Kepler. New York, NY:
Zone Books.
Hammer, Eric and Shin, Sun-Joo. 1996. Euler and the role of visualization in logic. In Seligman, J. and Westerståhl, D. (eds.), Logic, language and computation: Volume 1, 271–286.
Stanford, CA: CSLI Publications.
Hammer, Eric and Shin, Sun-Joo. 1998. Euler's visual logic. History and Philosophy of Logic 19:
1–29.
Hammer, Eric. 1995. Reasoning with sentences and diagrams. Notre Dame Journal of Formal
Logic 35: 73–87.
Harel, Guershon and Sowder, Larry. 2007. Toward comprehensive perspectives on the learning
and teaching of proof. In: F. K. Lester (ed.), Second handbook of research on mathematics
teaching and learning, 805–842. Charlotte, NC: Information Age Publishing.
Harris, Roy. 1993. The Linguistics Wars. Oxford: Oxford University Press.
Harris, Zellig. 1951. Methods in structural linguistics. Chicago, IL: University of Chicago Press.
Harris, Zellig. 1968. Mathematical structures of language. New York, NY: John Wiley.
Hartimo, Mirja (ed.). 2010. Phenomenology and mathematics. New York, NY: Springer.
Haspelmath, Martin. 2006. Against markedness (and what to replace it with). Journal of Linguistics 42. 25–70.
Hatten, Robert S. 2004. Musical meaning in Beethoven: Markedness, correlation and interpretation. Bloomington, IN: Indiana University Press.
Havil, Julian. 2008. Impossible? Princeton, NJ: Princeton University Press.
Hayward, J. W. 1984. Perceiving ordinary magic. Boston, MA: Shambhala.
Heath, Thomas L. 1949. Mathematics in Aristotle. Oxford: Oxford University Press.
Hegel, G. W. F. 1807. Phaenomenologie des Geistes. Leipzig: Teubner.
Heilman, Kenneth M., Scholes, R., and Watson, R. T. 1975. Auditory affective agnosia: Disturbed
comprehension of affective speech. Journal of Neurology, Neurosurgery and Psychiatry
38: 69–72.
Hersh, Reuben. 1997. What is mathematics really? Oxford: Oxford University Press.
Hertz, Robert. 1973. The preeminence of the right hand: A study in religious polarity. In
R. Needham (ed.), Right and left, 23–36. Chicago, IL: University of Chicago Press.
Hickok, Gregory, Bellugi, Ursula, and Klima, Edward S. 2001. Sign language in the brain. Scientific American 284 (6): 58–65.
Hier, Daniel B. and Joni Kaplan. 1980. Verbal comprehension deficits after right hemisphere
damage. Applied Psycholinguistics 1. 270–294.
Hilbert, David. 1931. Die Grundlagen der elementaren Zahlentheorie. Mathematische Annalen
104: 485–494.
Hill, Theodore P. 1998. The first digit phenomenon. American Scientist 86. 358–63.
Hirst, Graeme. 1988. Resolving lexical ambiguity computationally with spreading activation
and Polaroid Words. In: S. L. Small, G. W. Cottrell, and M. K. Tanenhaus (eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial
intelligence, 73–107. San Mateo, CA: Morgan Kaufmann Publishers.
Hjelmslev, Louis. 1939. Note sur les oppositions supprimables. Travaux de Cercle Linguistique
de Prague 8. 51–57.
Hjelmslev, Louis. 1959. Essais linguistiques. Copenhagen: Munksgaard.
Hjelmslev, Louis. 1963. Prolegomena to a theory of language. Madison, WI: University of Wisconsin Press.
Hobbes, Thomas. 1656 [1839]. Elements of philosophy. London: Molesworth.
Hockett, Charles F. 1960. The origin of speech. Scientific American 203. 88–96.
Hockett, Charles F. 1967. Language, mathematics and linguistics. The Hague: Mouton.
Hoenigswald, Henry M. 1960. Language change and linguistic reconstruction. Chicago, IL:
University of Chicago Press.
Hofstadter, Douglas and Sander, Emanuel. 2013. Surfaces and essences: Analogy as the fuel
and fire of thinking. New York, NY: Basic.
Hofstadter, Douglas. 1979. Gödel, Escher, Bach: An eternal golden braid. New York, NY: Basic.
Hoijer, Harry. 1956. Lexicostatistics: A critique. Language, 32, 49–60.
Holm, Hans J. 2003. The proportionality trap. Or: What is wrong with lexicostatistical subgrouping. Indogermanische Forschungen 108. 38–46.
Holm, Hans J. 2005. Genealogische Verwandtschaft. In R. Köhler, G. Altmann, R. Piotrowski
(eds.), Quantitative Linguistik; ein internationales Handbuch. Berlin: Walter de Gruyter.
Holm, Hans J. 2007. The new arboretum of Indo-European Trees: Can new algorithms reveal the
phylogeny and even prehistory of IE? Journal of Quantitative Linguistics 14. 167–214.
Hopper, Paul. 1998. Emergent grammar. In: M. Tomasello (ed.), The new psychology of
language: Cognitive and functional approaches to language structure, 155–176. Mahwah, NJ: Erlbaum.
Houdé, Olivier and Tzourio-Mazoyer, Nathalie. 2003. Neural foundations of logical and mathematical cognition. Nature Reviews Neuroscience 4: 507–514.
Hubbard, Edward M., Arman, A. C., Ramachandran, V. S., and Boynton, G. M. 2005. Individual
differences among grapheme-color synesthetes: Brain-behavior correlations. Neuron 45:
975–985.
Hubbard, Edward M., Diester, Ilka, Cantlon, Jessica, Ansari, Daniel, Opstal, Filip van, and
Troiani, Vanessa. 2008. The evolution of numerical cognition: From number neurons to
linguistic quantifiers. Journal of Neuroscience 12. 11819–11824.
Humboldt, Wilhelm von. 1836 [1988]. On language: The diversity of human language-structure
and its influence on the mental development of mankind, P. Heath (trans.). Cambridge:
Cambridge University Press.
Hume, David. 1749 [1902]. An enquiry concerning human understanding. Oxford: Clarendon.
Husserl, Edmund. 1970 [1891]. Philosophie der Arithmetik. The Hague: Nijhoff.
Hutchins, John. 1997. From first conception to first demonstration: The nascent years of machine translation, 1947–1954. A chronology. Machine Translation 12. 195–252.
Hyde, Daniel C. 2011. Two systems of non-symbolic numerical cognition. Frontiers in Human
Neuroscience. 10.3389/fnhum.2011.00150
Kamp, Hans. 1981. A theory of truth and semantic representation. In J. Groenendijk, T. Janssen,
and M. Stokhof (eds.), Formal methods in the study of language. Centre for Mathematics
and Computer Science, Amsterdam, 114.
Kant, Immanuel. 2011 [1790]. Critique of pure reason, trans. J. M. D. Meiklejohn. CreateSpace
Platform.
Kaplan, Robert and Kaplan, Ellen. 2007. Out of the labyrinth: Setting mathematics free. London:
Bloomsbury Press.
Kaplan, Robert and Kaplan, Ellen. 2011. Hidden harmonies: The lives and times of the
Pythagorean theorem. London: Bloomsbury Press.
Kasner, Edward and Newman, James. 1940. Mathematics and the imagination. New York, NY:
Simon and Schuster.
Kauffman, Louis H. 2001. The mathematics of Charles Sanders Peirce. Cybernetics and Human
Knowing 8: 79–110.
Kemp, J. Alan (trans.). 1986. The Tekhne Grammatike of Dionysius Thrax. Amsterdam: John Benjamins.
Kendon, Simon and Creen, Malcolm. 2007. An introduction to knowledge engineering. New
York, NY: Springer.
Kennedy, J. M. 1984. Vision and metaphors. Toronto: Toronto Semiotic Circle.
Kennedy, J. M. 1993. Drawing and the blind: Pictures to touch. New Haven, CT: Yale University
Press.
Kennedy, J. M. and Domander, R. 1986. Blind people depicting states and events in metaphoric
line drawings. Metaphor and Symbolic Activity 1: 109–126.
Kennedy, John M. 1999. Metaphor in pictures: Metonymy evokes classification. International
Journal of Applied Semiotics 1. 83–98.
Kilgarriff, Adam. 2005. Language is never, ever, ever, random. Corpus Linguistics and Linguistic
Theory 12: 263–275.
King, Margaret. 1992. Epilogue: On the relation between computational linguistics and formal
semantics. In Michael Rosner and Roderick Johnson (eds.), Computational linguistics and formal
semantics. Cambridge: Cambridge University Press.
King, Ruth. 1991. Talking gender: A nonsexist guide to communication. Toronto: Copp Clark
Pitman Ltd.
Kiryushchenko, Vitaly. 2012. The visual and the virtual in theory, life and scientific practice: The
case of Peirce's Quincuncial map projection. In Mariana Bockarova, Marcel Danesi, and
Rafael Núñez (eds.), Semiotic and cognitive science essays on the nature of mathematics,
61–70. Munich: Lincom Europa.
Kochenderfer, Mykel J. 2015. Decision making under uncertainty. Cambridge, MA: MIT Press.
Koehn, Philipp. 2010. Statistical machine translation. Cambridge: Cambridge University Press.
Köhler, Reinhard, Altmann, Gabriel, and Grzybek, Peter (eds.). 2015. Quantitative linguistics.
Berlin: Mouton de Gruyter.
Kolmogorov, Andrei N. 1933. Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebnisse der
Mathematik. Berlin: Springer.
Konner, Melvin. 1991. Human nature and culture: Biology and the residue of uniqueness. In:
J. J. Sheehan and M. Sosna (eds.), The boundaries of humanity, pp. 103–124. Berkeley, CA:
University of California Press.
Kornai, András. 2008. Mathematical linguistics. New York, NY: Springer.
Kosslyn, Stephen M. 1983. Ghosts in the mind's machine: Creating and using images in the
brain. New York, NY: W. W. Norton.
Kramsch, Claire. 1998. Language and culture. Oxford: Oxford University Press.
Krawczyk, Daniel C. 2012. The cognition and neuroscience of relational reasoning. Brain Research 1428: 13–23.
Kroeber, Alfred L. and Chretien, Charles D. 1937. Quantitative classification of Indo-European
languages. Language 13. 83–103.
Kronenfeld, David B., Bennardo, Giovanni, and de Munck, Victor C. (eds.). 2011. A companion to
cognitive anthropology. Chichester: Wiley-Blackwell.
Kruszewski, Mikolai. 1883 [1955]. Writings in general linguistics. Amsterdam: John Benjamins.
Kučera, Henry and Francis, W. Nelson. 1967. Computational analysis of present-day American
English. Providence, RI: Brown University Press.
Kuhn, Thomas S. 1970. The structure of scientific revolutions. Chicago, IL: University of Chicago
Press.
Kulacka, Agnieszka. On the nature of statistical language laws. In: Piotr Stalmaszczyk (ed.),
Philosophy of language and linguistics: Volume I, pp. 151–168. Piscataway, NJ: Transaction
Publishers.
Kulpa, Zenon. 2004. On diagrammatic representation of mathematical knowledge. In: A. Sperti,
G. Bancerek, and A. Trybulec (eds.), Mathematical knowledge management. New York, NY:
Springer.
Kuryłowicz, Jerzy. 1927. Schwa indoeuropéen et Hittite. Symbolae grammaticae in honorem
Ioannis Rozwadowski, Vol. 1, 95–104. Cracow: Gebethner and Wolff.
Kurzweil, Ray. 2012. How to create a mind: The secret of human thought revealed. New York, NY:
Viking.
Labov, William. 1963. The social motivation of a sound change. Word 19. 273–309.
Labov, William. 1967. The effect of social mobility on a linguistic variable. In S. Lieberson (ed.),
Explorations in sociolinguistics, 23–45. Bloomington, IN: Indiana University Research
Center in Anthropology, Linguistics and Folklore.
Labov, William. 1972. Language in the inner city. Philadelphia, PA: University of Pennsylvania
Press.
Lachaud, Christian Michel. 2013. Conceptual metaphors and embodied cognition: EEG coherence reveals brain activity differences between primary and complex conceptual
metaphors during comprehension. Cognitive Systems Research 22–23: 12–26.
Lai, Vicky T., van Dam, Wessel, Conant, Lisa L., Binder, Jeffrey R. and Desai, Rutvik H. 2015.
Familiarity differentially affects right hemisphere contributions to processing metaphors
and literals. Frontiers in Human Neuroscience, Volume 10.
Lakoff, George and Johnson, Mark. 1980. Metaphors we live by. Chicago, IL: University of Chicago
Press.
Lakoff, George and Johnson, Mark. 1999. Philosophy in the flesh: The embodied mind and its challenge to western thought. New York, NY: Basic.
Lakoff, George and Núñez, Rafael. 2000. Where mathematics comes from: How the embodied
mind brings mathematics into being. New York, NY: Basic Books.
Lakoff, George. 1970. Irregularity in syntax. New York, NY: Holt, Rinehart, & Winston.
Lakoff, George. 1987. Women, fire and dangerous things: What categories reveal about the
mind. Chicago, IL: University of Chicago Press.
Lakoff, George. 2012a. Explaining embodied cognition results. Topics in Cognitive Science 4.
773–785.
Lakoff, George. 2012b. The contemporary theory of metaphor. In Marcel Danesi and Sara
Maida-Nicol (eds.), Foundational texts in linguistic anthropology, 128–71. Toronto: Canadian Scholars' Press.
Lamb, Sydney. 1999. Pathways of the brain: The neurocognitive basis of language. Amsterdam:
John Benjamins.
Lambek, Joachim. 1958. The mathematics of sentence structure. American Mathematical
Monthly 65. 154–170.
Langacker, Ronald W. 1987. Foundations of cognitive grammar. Stanford, CA: Stanford University Press.
Langacker, Ronald W. 1990. Concept, image, and symbol: The cognitive basis of grammar.
Berlin: Mouton de Gruyter.
Langacker, Ronald W. 1999. Grammar and conceptualization. Berlin: Mouton de Gruyter.
Langer, Susanne K. 1948. Philosophy in a new key. New York, NY: Mentor Books.
Laroche, Paula. 2007. On words: Insight into how our words work and don't. Oak Park, IL:
Marion Street Press.
Laurence, William L. 2013. Four-Color proof. In: G. Kolata and P. Hoffman (eds.), Book of mathematics, 135–137. New York, NY: Sterling.
Leepik, Peet. 2008. Universals in the context of Juri Lotman's semiotics. Tartu: Tartu University
Press.
Lees, Robert. 1953. The basis of glottochronology. Language 29. 113–127.
Lees, Robert. 1957. Review of Syntactic Structures. Language 33. 375–407.
Lesh, Robert and Harel, Guershon. 2003. Problem solving, modeling, and local conceptual
development. Mathematical Thinking and Learning 5: 157.
Lévi-Strauss, Claude. 1958. Anthropologie structurale. Paris: Plon.
Lévi-Strauss, Claude. 1971. L'Homme nu. Paris: Plon.
Levine, Robert. 1997. A geography of time: The temporal misadventures of a social psychologist
or how every culture keeps time just a little bit differently. New York, NY: Basic.
Li, Wentian. 1992. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE
Transactions on Information Theory 38. 1842–1845.
Libertus, M. E., Pruitt, L. B., Woldorff, M. G. and Brannon, E. M. 2009. Induced alpha-band oscillations reflect ratio-dependent number discrimination in the infant brain. Journal of
Cognitive Neuroscience 21: 2398–2406.
Locke, John. 1690 [1975]. An essay concerning human understanding, ed. by P. H. Nidditch.
Oxford: Clarendon Press.
Lorrain, François. 1975. Réseaux sociaux et classifications sociales. Paris: Hermann.
Lotman, Juri. 1991. Universe of the mind: A semiotic theory of culture. Bloomington, IN: Indiana
University Press.
Luhtala, Anneli. 2005. Grammar and philosophy in late antiquity. Amsterdam: John Benjamins.
Luque, Bartolo and Lacasa, Lucas. 2009. The first digit frequencies of primes and Riemann zeta
zeros. Proceedings of the Royal Society A. 10: 1098.
Luria, Alexander R. 1947. Traumatic aphasia. The Hague: Mouton.
Lutosławski, Wincenty. 1890. Principes de stylométrie. Revue des études grecques 41. 61–81.
Macaulay, Ronald. 2009. Quantitative methods in sociolinguistics. New York, NY: Palgrave
Macmillan.
MacCormac, Eric. 1985. A cognitive theory of metaphor. Cambridge, MA: MIT Press.
MacCormick, John. 2012. Nine algorithms that changed the future. Princeton, NJ: Princeton
University Press.
Mackenzie, Dana. 2012. The universe in zero words. London: Elwin Street Publications.
MacNamara, Olwyn. 1996. Mathematics and the sign. Proceedings of PME 20. 369–378.
MacWhinney, Brian. 2000. Connectionism and language learning. In: M. Barlow and S. Kemmer
(eds.), Usage models of language, 121–150. Stanford: Center for the Study of Language
and Information.
Mallory, James P. 1989. In search of the Indo-Europeans: Language, archaeology and myth.
London: Thames and Hudson.
Malmberg, Bertil. 1974. Langue–forme–valeur: Réflexion sur trois concepts saussuriens.
Semiotica 18. 3–12.
Mandelbrot, Benoit. 1954. Structure formelle des textes et communication. Word 10. 1–27.
Mandelbrot, Benoit. 1977. The fractal geometry of nature. New York, NY: Freeman and Co.
Mansouri, Fethi. 2000. Grammatical markedness and information processing in the acquisition
of Arabic [as] a second language. Munich: Lincom.
Maor, Eli. 1994. e: The story of a number. Princeton, NJ: Princeton University Press.
Maor, Eli. 2007. The Pythagorean theorem: A 4,000-year history. Princeton, NJ: Princeton University Press.
Marcus, Solomon and Vasiliu, Em. 1960. Mathématique et phonologie: Théorie des graphes
et consonantisme de la langue roumaine. Revue de mathématiques pures et appliquées 5.
319–340.
Marcus, Solomon. 1975. The metaphors and the metonymies of scientific (especially mathematical) language. Revue Roumaine de Linguistique 20, 535–537.
Marcus, Solomon. 1980. The paradoxical structure of mathematical language. Revue Roumaine
de Linguistique 25, 359–366.
Marcus, Solomon. 2003. Mathematics through the glasses of Hjelmslev's semiotics. Semiotica
145, 235–246.
Marcus, Solomon. 2010. Mathematics as semiotics. In: Thomas A. Sebeok and Marcel Danesi
(eds.), Encyclopedic dictionary of semiotics, 3rd ed. Berlin: Mouton de Gruyter.
Marcus, Solomon. 2013. Mathematics between semiosis and cognition. In: Mariana Bockarova,
Marcel Danesi, and Rafael Núñez (eds.), 99–129. Semiotic and cognitive science essays on
the nature of mathematics. Munich: Lincom Europa.
Markov, Andrey A. 1906 [1971]. Extension of the limit theorems of probability theory to a sum
of variables connected in a chain. In R. Howard. Dynamic probabilistic systems, Volume 1:
Markov chains. New York, NY: John Wiley and Sons.
Marr, David. 1982. Vision: A computational investigation into the human representation and
processing of visual information. New York, NY: W. H. Freeman.
Martín-Vide, Carlos and Mitrana, Victor (eds.). 2001. Where mathematics, computer science,
linguistics and biology meet. Dordrecht: Kluwer.
Martin, James M. 1990. A computational model of metaphor interpretation. Boston, MA: Academic.
Martinet, André. 1955. Économie des changements phonétiques. Paris: Maisonneuve and
Larose.
Marx, Karl. 1953 [1858]. Grundrisse der Kritik der Politischen Ökonomie. Berlin: Dietz.
Maturana, Humberto R. and Varela, Francisco. 1973. Autopoiesis and cognition: The realization
of the living. Dordrecht: Reidel.
McCarthy, John. 2001. A thematic guide to optimality theory. Cambridge: Cambridge University
Press.
McComb, Karen, Packer, Craig, and Pusey, Anne. 1994. Roaring and numerical assessment in
contests between groups of female lions, Panthera leo. Animal Behavior 47: 379–387.
McCowan, Brenda, Hanser, Sean F., and Doyle, Laurance R. 1999. Quantitative tools for comparing animal communication systems: Information theory applied to Bottlenose dolphin
whistle repertoires. Animal Behaviour 62. 1151–1162.
McCulloch, Warren S. and Pitts, Walter. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5: 115–133.
McNeill, David. 1987. Psycholinguistics: A New Approach. New York, NY: Harper & Row.
Mel'čuk, Igor. 2001. Linguistic theory: Communicative organization in natural language. Amsterdam: John Benjamins.
Menninger, Karl. 1969. Number words and number symbols: A cultural history of number. Cambridge, MA: MIT Press.
Merton, Robert K. and Barber, Elinor. 2003. The travels and adventures of serendipity: A study
in sociological semantics and the sociology of science. Princeton, NJ: Princeton University
Press.
Mettinger, Arthur. 1994. Aspects of semantic opposition in English. Oxford: Oxford University
Press.
Mill, James. 2001. Analysis Phenomena Of Human Mind. Thoemmes Facsimile Edition.
Miller, George A. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63, 81–97.
Miller, George A. 1981. Language and speech. New York, NY: W. H. Freeman.
Miller, George A. and Newman, E. B. 1958. Tests of a statistical explanation of the rank
frequency relation for words in written English. American Journal of Psychology 1958, 71,
209–18.
Miller, Jon F. 1981. Eliciting procedures for language. In J. F. Miller (ed.), Assessing language
production in children. London: Arnold.
Mitchell, W. J. T. and Davidson, Arnold I. (eds.). 2007. The late Derrida. Chicago, IL: University of
Chicago Press.
Montague, Richard. 1974. Formal philosophy: selected papers of Richard Montague / ed. and
with an introd. by Richmond H. Thomason. New Haven, CT: Yale University Press.
Monti, Martin M. and Osherson, Daniel N. 2012. Logic, language and the brain. Brain Research
1428: 33–42.
Morrill, Glyn. 2010. Categorial grammar: Logical syntax, semantics, and processing. Oxford
University Press.
Morris, Charles. 1938. Foundations of the theory of signs. Chicago, IL: University of Chicago
Press.
Morrow, Glenn R. 1970. A commentary on the First Book of Euclid's Elements. Princeton, NJ:
Princeton University Press.
Moseley, R. L. and Pulvermüller, F. 2014. Nouns, verbs, objects, actions, and abstractions: Local
fMRI activity indexes semantics, not lexical categories. Brain and Language 132: 28–42.
Mowat, Elizabeth and Davis, Brent. 2010. Interpreting embodied mathematics using network
theory: Implications for mathematics education. Complicity: An International Journal of
Complexity and Education 7: 1–31.
Müller, Cornelia. 2008. Metaphors dead and alive, sleeping and waking: A dynamic view.
Chicago, IL: University of Chicago Press.
Musser, Gary L., Burger, William F., and Peterson, Blake E. 2006. Mathematics for elementary
teachers: A contemporary approach. Hoboken, NJ: John Wiley.
Nadeau, R. L. 1991. Mind, machines, and human consciousness. Chicago, IL: Contemporary
Books.
Nagao, Makoto. 1984. A framework of a mechanical translation between Japanese and English
by analogy principle. In A. Elithorn and R. Banerji (eds.), Artificial and human intelligence.
Oxford: Elsevier.
Nave, Ophir, Neuman, Yair, Howard, D., and Perlovsky, L. 2014. How much information should
we drop to become intelligent? Applied Mathematics and Computation 245: 261–264.
Needham, Rodney. 1973. Right and left. Chicago, IL: University of Chicago Press.
Neisser, Ulrich. 1967. Cognitive psychology. Englewood Cliffs, NJ: Prentice-Hall.
Neuman, Yair, Assaf, Dan, Cohen, Yohai, Last, Mark, Argamon, Shlomo, Newton, Howard, and
Frieder, Ophir. 2013. Metaphor identification in large texts corpora. PLoS ONE 8: e62343.
Neuman, Yair. 2007. Immune memory, immune oblivion: A lesson from Funes the memorious.
Progress in Biophysics and Molecular Biology 92: 258–267.
Neuman, Yair. 2014. Introduction to computational cultural psychology. Cambridge: Cambridge
University Press.
Neumann, John von. 1958. The computer and the brain. New Haven, CT: Yale University Press.
Newcomb, Simon. 1881. Note on the frequency of use of the different digits in natural numbers.
American Journal of Mathematics 4: 39–40.
Newell, Allen. 1991. Metaphors for mind, theories of mind: Should the humanities mind? In:
J. J. Sheehan and M. Sosna (eds.), The boundaries of humanity, pp. 158–197. Berkeley, CA:
University of California Press.
Nguyen, Hoang Long, Nguyen, Trung Duc and Hwang, Dosam. 2015. KELabTeam: A statistical approach on figurative language sentiment analysis in Twitter. Proceedings of the
9th International Workshop on Semantic Evaluation (SemEval 2015), pages 679–683.
Denver, CO, June 4–5.
Nielsen, Michael. 2012. Reinventing discovery: The new era of networked science. Princeton,
NJ: Princeton University Press.
Nietzsche, Friedrich. 1873 [1979]. Philosophy and truth: Selections from Nietzsche's notebooks
of the early 1870s. Atlantic Highlands, NJ: Humanities Press.
Nirenburg, Sergei. 1987. Machine translation: theoretical and methodological issues. Cambridge: Cambridge University Press.
Nöth, Winfried. 1990. Handbook of semiotics. Bloomington, IN: Indiana University Press.
Nowak, Martin A. 2000. The basic reproductive ratio of a word, the maximum size of a lexicon.
Journal of Theoretical Biology 204. 179–189.
Núñez, Rafael, Edwards, L. D., and Matos, Filipe J. 1999. Embodied cognition as grounding for
situatedness and context in mathematics education. Educational Studies in Mathematics
39, 45–65.
O'Shea, Donal. 2007. The Poincaré Conjecture. New York, NY: Walker.
Obler, Loraine K. and Gjerlow, Kris. 1999. Language and the brain. Cambridge: Cambridge
University Press.
Ogden, Charles K. 1932. Opposition: A linguistic and psychological analysis. London: Paul,
Trench, and Trubner.
Ogden, Charles K. and Richards, Ivor A. 1923. The meaning of meaning. London: Routledge and
Kegan Paul.
Okrent, Arika. 2009. In the land of invented languages: Esperanto rock stars, Klingon poets,
Loglan lovers, and the mad dreamers who tried to build a perfect language. New York:
Spiegel and Grau.
Osborne, Thomas M. 2014. Human action in Thomas Aquinas, John Duns Scotus, and William of
Ockham. Washington, DC: The Catholic University of America Press.
Osgood, Charles E., Suci, George J., and Tannenbaum, Percy H. 1957. The measurement of
meaning. Urbana, IL: University of Illinois Press.
Otte, Michael. 1997. Mathematics, semiotics, and the growth of social knowledge. For the
Learning of Mathematics 17. 47–54.
Papadimitriou, Christos H. and Steiglitz, Kenneth. 1998. Combinatorial optimization: Algorithms and complexity. New York, NY: Dover.
Papineni, Kishore, Roukos, Salim, Ward, Todd, and Zhu, Wei-Jing. 2002. BLEU: A method for
automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of
the Association for Computational Linguistics (ACL), Philadelphia, July 2002, 311–318.
Park, Hye Sook. 2000. Markedness and learning principles in SLA: Centering on acquisition of
relative clauses. Journal of Pan-Pacific Association of Applied Linguistics 4. 87–114.
Parker, Kelly A. 1998. The continuity of Peirce's thought. Nashville, TN: Vanderbilt University
Press.
Parsons, Talcott and Bales, Robert. 1955. Family, socialization, and interaction process. Glencoe, IL: Free Press.
Partee, Barbara, Meulen, Alice Ter, and Wall, Robert. 1990. Mathematical methods in linguistics. New York, NY: Springer.
Partee, Barbara. 1988. Semantic facts and psychological facts. Mind and Language 3. 43–52.
Passy, P. 1890. Étude sur les changements phonétiques et leurs caractères généraux. Paris:
Firmin-Didot.
Pavlov, Ivan. 1902. The work of digestive glands. London: Griffin.
Peano, Giuseppe. 1973. Selected works of Giuseppe Peano, H. Kennedy, ed. and trans. London:
Allen and Unwin.
Peirce, Charles S. 1923. Chance, love, and logic. New York, NY: Harcourt, Brace.
Peirce, Charles S. 1931–1958. Collected papers of Charles Sanders Peirce, Vols. 1–8, C. Hartshorne and P. Weiss (eds.). Cambridge, MA: Harvard University Press.
Pennebaker, James W. 2011. The secret life of pronouns. London: Bloomsbury Press.
Penrose, Roger. 1989. The emperor's new mind. Cambridge: Cambridge University Press.
Perline, Richard. 1996. Zipf's law, the central limit theorem, and the random division of the unit
interval. Physical Review 54. 220–223.
Pesci, Angela. 2003. Could metaphorical discourse be useful for analysing and transforming individuals' relationship with mathematics? The Mathematics Education into the 21st Century
Project: Proceedings of the International Conference, 224–230. Brno, Czech Republic,
September 2003.
Petty, William. 2010. Natural and political observations, mentioned in a following index, and
made upon the bills of mortality by John Graunt, citizen of London; with reference to the
government (1662). EEBO Editions, ProQuest (December 13, 2010)
Piaget, Jean. 1923. Le langage et la pensée chez l'enfant. Neuchâtel: Delachaux et Niestlé.
Piaget, Jean. 1936. L'intelligence avant le langage. Paris: Flammarion.
Piaget, Jean. 1945. La formation du symbole chez l'enfant. Neuchâtel: Delachaux et Niestlé.
Piaget, Jean. 1952. The child's conception of number. London: Routledge and Kegan Paul.
Piaget, Jean. 1955. The language and thought of the child. Cleveland: Meridian.
Piaget, Jean. 1969. The child's conception of the world. Totowa: Littlefield, Adams and Company.
Pike, Kenneth. 1954. Language in relation to a unified theory of the structure of human behavior. The Hague: Mouton.
Pinker, Steven. 1990. Language acquisition. In: D. N. Osherson and H. Lasnik (eds.), Language: An invitation to cognitive science, 191–241. Cambridge, MA: MIT Press.
Pinker, Steven. 1994. The language instinct: How the mind creates language. New York, NY:
William Morrow.
Pollio, H. and Burns, B. 1977. The anomaly of anomaly. Journal of Psycholinguistic Research 6:
247–260.
Pollio, H. and Smith, M. 1979. Sense and nonsense in thinking about anomaly and metaphor.
Bulletin of the Psychonomic Society 13: 323–326.
Pollio, H., Barlow, J., Fine, H., and Pollio, M. 1977. Psychology and the poetics of growth: Figurative language in psychology, psychotherapy, and education. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Pólya, George. 1921. Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt
im Strassennetz. Mathematische Annalen 84: 149–160.
Popper, Karl. 1935 [2002]. The logic of scientific discovery. London: Routledge.
Popper, Karl. 1963. Conjectures and refutations. London: Routledge and Kegan Paul.
Pos, Hendrik. 1938. La notion d'opposition en linguistique. XIe Congrès International de Psychologie, 246–47.
Pos, Hendrik. 1964. Perspectives du structuralisme. In Études phonologiques dédiées à la
mémoire de M. le Prince N. S. Trubetzkoy, 71–78. Prague: Jednota Ceskych Mathematiku
Fysiku.
Posamentier, Alfred S. 2004. Pi: A biography of the world's most mysterious number. New York,
NY: Prometheus.
Posamentier, Alfred S. and Lehmann, Ingmar. 2007. The (fabulous) Fibonacci numbers.
Amherst, NY: Prometheus.
Pottier, Bernard. 1974. Linguistique générale. Paris: Klincksieck.
Prat, Chantel S. 2012. An fMRI investigation of analogical mapping in metaphor comprehension: The influence of context and individual cognitive capacities on processing demands.
Journal of Experimental Psychology: Learning, Memory, and Cognition 38. 282–294.
Presmeg, Norma C. 1997. Reasoning with metaphors and metonymies in mathematics learning.
In L. D. English (ed.), Mathematical reasoning: Analogies, metaphors, and images, 267–280.
Mahwah, NJ: Lawrence Erlbaum.
Presmeg, Norma C. 2005. Metaphor and metonymy in processes of semiosis in mathematics
education. In J. Lenhard and F. Seeger (eds.), Activity and sign, 105–116. New York, NY:
Springer.
Prince, Alan and Smolensky, Paul. 2004. Optimality theory: Constraint interaction in generative
grammar. Oxford: Blackwell.
Putnam, Hilary. 1961. Brains and Behavior, paper presented at the American Association for
the Advancement of Science, Section L (History and Philosophy of Science), meeting,
December 27, 1961.
Quirk, Randolph. 1960. Towards a description of English usage. Transactions of the Philological
Society. 1960. 40–61.
Radford, Luis. 2010. Algebraic thinking from a cultural semiotic perspective. Research in
Mathematics Education 12: 1–19.
Radford, Luis and Grenier, Monique. 1996. On dialectical relationships between signs and
ideas. Proceedings of PME 20, 179–186.
Raimi, Ralph A. 1969. The peculiar distribution of first digits. Scientific American 221. 109–119.
Raju, C. K. 2007. Cultural foundations of mathematics. Delhi: Pearson Longman.
Ramachandran, Vilayanur S. 2011. The tell-tale brain: A neuroscientist's quest for what makes
us human. New York, NY: Viking.
Reed, David. 1994. Figures of thought: Mathematics and mathematical texts. London: Routledge.
Reining, Astrid and Lönneker-Rodman, Birte. 2007. Corpus-driven metaphor harvesting. In:
Proceedings of the HLT/NAACL-07 Workshop on Computational Approaches to Figurative
Language, 5–12, Rochester, NY.
Renfrew, Colin, McMahon, April, and Trask, Larry (eds.). 2000. Time depth in historical linguistics. Cambridge, England: The McDonald Institute for Archaeological Research.
Renfrew, Colin. 1988. Archaeology and language: The puzzle of Indo-European origins. Cambridge: Cambridge University Press.
Richards, Ivor A. 1936. The philosophy of rhetoric. Oxford: Oxford University Press.
Richeson, David S. 2008. Euler's gem: The polyhedron formula and the birth of topology.
Princeton, NJ: Princeton University Press.
Ridley, Dennis R. and Gonzales, Emilia A. 1994. Zipf's law extended to small samples of adult
speech. Perceptual and Motor Skills 1994, 79, 153–4.
Rieux, Jacques and Rollin, Bernard E. 1975. General and rational grammar: The Port-Royal
grammar. The Hague: Mouton.
Ringe, Donald, Warnow, Tandy, and Taylor, Ann. 2002. Indo-European and computational
cladistics. Transactions of the Philological Society 100. 59–129.
Roark, Brian and Sproat, Richard W. 2007. Computational approaches to morphology and syntax. Oxford University Press.
Roberts, Don D. 2009. The existential graphs of Charles S. Peirce. The Hague: Mouton.
Roberts, Royston M. 1989. Serendipity: Accidental discoveries in science. New York, NY: John
Wiley.
Robins, Robert H. 1990. Leibniz, Humboldt and comparative linguistics. In: Tullio De Mauro and
Lia Formigari (eds.), Leibniz, Humboldt, and the origins of comparativism, pp. 85–102.
Amsterdam: John Benjamins.
Robinson, Abraham. 1974. Non-standard analysis. Princeton, NJ: Princeton University Press.
Robinson, Andrew. 1995. The story of writing. London: Thames and Hudson.
Rochefoucauld, François, Duc de la. 1665 [2006]. Maxims. New York, NY: Dover.
Rockmore, D. 2005. Stalking the Riemann Hypothesis: The quest to find the hidden law of prime
numbers. New York, NY: Vintage.
Rommetveit, Ragnar. 1991. Psycholinguistics, hermeneutics, and cognitive science. In G. Appel and H. W. Dechert (eds.), A case for psycholinguistic cases, 1–15. Amsterdam: John
Benjamins.
Rosenblatt, Frank. 1957. The perceptron, a perceiving and recognizing automaton Project Para.
Ithaca, NY: Cornell Aeronautical Laboratory.
Ross, Alan S. C. 1950. Philological probability problems. Journal of the Royal Statistical Society,
Series B 12. 19–59.
Ross, Elliott D. and Mesulam, Marek-Marsel. 1979. Dominant language functions of the right
hemisphere: Prosody and emotional gesturing. Archives of Neurology 36: 144–148.
Rotman, Brian. 1988. Towards a semiotics of mathematics. Semiotica 72. 1–35.
Rotman, Brian. 1993. Signifying nothing: The semiotics of zero. Stanford, CA: Stanford University Press.
Rousseau, Ronald and Zhang, Qiaoqiao. 1992. Zipf's data on the frequency of Chinese words
revisited. Scientometrics 24. 201–220.
Rumelhart, David E. and McClelland, James L. (eds.). 1986. Parallel distributed processing.
Cambridge, MA: MIT Press.
Russell, Bertrand and Alfred N. Whitehead. 1913. Principia mathematica. Cambridge: Cambridge University Press.
Russell, Bertrand. 1903. The principles of mathematics. London: Allen and Unwin.
Sabbagh, K. 2004. The Riemann Hypothesis: The greatest unsolved problem in mathematics.
New York, NY: Farrar, Straus & Giroux.
Sadock, Jerrold M. 2012. The modular architecture of grammar. Cambridge: Cambridge University Press.
Samoyault, Tiphaine. 1988. Alphabetical order: How the alphabet began. New York, NY: Viking.
Sandri, G. 2004. Does computation provide a model for creativity? An epistemological perspective in neuroscience. Journal of Endocrinological Investigation 27: 9–22.
Sankoff, David. 1970. On the rate of replacement of word-meaning relationships. Language 46.
564–569.
Sapir, Edward. 1921. Language. New York, NY: Harcourt, Brace, and World.
Saussure, Ferdinand de. 1879. Mémoire sur le système primitif des voyelles dans les langues
indo-européennes. Leipzig: Vieweg.
Saussure, Ferdinand de. 1916. Cours de linguistique générale. Ed. Charles Bally and Albert
Sechehaye. Paris: Payot.
Schank, Roger C. 1980. An artificial intelligence perspective of Chomsky's view of language.
The Behavioral and Brain Sciences 3. 35–42.
Schank, Roger C. 1984. The cognitive computer. Reading, MA: Addison-Wesley.
Schank, Roger C. 1991. The connoisseur's guide to the mind. New York, NY: Summit.
Schiffer, Stephen. 1987. Remnants of meaning. Cambridge, MA: MIT Press.
Schlegel, Friedrich von. 1808 [1977]. Über die Sprache und Weisheit der Indier: Ein Beitrag zur
Begründung der Altertumskunde. Amsterdam: John Benjamins.
Schmandt-Besserat, Denise. 1978. The earliest precursor of writing. Scientific American 238.
50–59.
Schmandt-Besserat, Denise. 1992. Before writing, 2 vols. Austin, TX: University of Texas Press.
Schmidt-Snoek, Gwenda L., Drew, Ashley R., Barile, Elizabeth C., and Aguas, Stephen J. 2015.
Auditory and motion metaphors have different scalp distributions: An ERP study. Frontiers
in Human Neuroscience, Volume 9.
Schmidt, Gwenda L. and Seger, Carol A. 2009. Neural correlates of metaphor processing: the
roles of figurativeness, familiarity and difficulty. Brain and Cognition 71: 375–386.
Schneider, Michael S. 1994. Constructing the universe: The mathematical archetypes of nature,
art, and science. New York, NY: Harper Collins.
Schooneveld, Cornelius H. van. 1978. Semantic transmutations. Bloomington, IN: Physsardt.
Schuster, Peter. 2001. Relevance theory meets markedness: Considerations on cognitive effort
as a criterion for markedness in pragmatics. New York, NY: Peter Lang.
Scott, Michael L. 2009. Programming language pragmatics. Oxford: Elsevier.
Searle, John R. 1984. Minds, brain, and science. Cambridge, MA: Harvard University Press.
Sebeok, Thomas A. and Danesi, Marcel. 2000. The forms of meaning: Modeling systems theory
and semiotics. Berlin: Mouton de Gruyter.
Sebeok, Thomas A. and Umiker-Sebeok, Jean. 1980. You know my method: A juxtaposition of
Charles S. Peirce and Sherlock Holmes. Bloomington, IN: Gaslight Publications.
Segerstråle, Ullica. 2000. Defenders of the truth: The battle for science in the sociobiology
debate and beyond. Oxford: Oxford University Press.
Selin, Helaine. 2000. Mathematics across cultures. Dordrecht: Kluwer.
Selvin, Steven. 1975. A problem in probability (letter to the editor). American Statistician 29: 67.
Semenza C, Delazer M, Bertella L, Granà A, Mori I, Conti FM, Pignatti R, Bartha L, Domahs F,
Benke T, Mauro A. 2006. Is math lateralised on the same side as language? Right hemisphere aphasia and mathematical abilities. Neurosci Lett. 2006 Oct 9;406(3):285–8.
Senechal, Marjorie. 1993. Mathematical structures. Science 260. 1170–1173.
Shannon, Claude E. 1948. A mathematical theory of communication. Bell System Technical
Journal 27 (1948): 379–423.
Shannon, Claude E. 1951. Prediction and entropy of printed English. Bell System Technical
Journal 1951, 30, 50–64.
Sheehan, J. J. 1991. Coda. In: J. J. Sheehan and M. Sosna (eds.), The boundaries of humanity,
259–265. Berkeley, CA: University of California Press.
Shin, Soon-Joo. 1994. The logical status of diagrams. Cambridge: Cambridge University Press.
Shorser, Lindsey. 2012. Manifestations of mathematical meaning. In: Mariana Bockarova, Marcel Danesi, and Rafael Núñez (eds.), 295–315. Semiotic and cognitive science essays on
the nature of mathematics. Munich: Lincom Europa.
Shutova, Ekaterina. 2010. Automatic metaphor interpretation as a paraphrasing task. In: Proceedings of NAACL 2010, 1029–1037, Los Angeles, CA.
Silva, Gabriel A. 2011. The need for the emergence of mathematical neuroscience: Beyond computation and simulation. Computational Neuroscience 5: 51.
Šimic, Jelena and Vuk, Damir. 2010. Machine translation in practice. Proceedings of the
21st central European conference on information and intelligent systems, 415–419.
Varaždin, Croatia.
Singh, Simon. 1997. Fermat's enigma: The quest to solve the world's greatest mathematical
problem. New York, NY: Walker and Co.
Sjoberg, Andree and Sjoberg, Gideon. 1956. Problems in glottochronology. American Anthropologist 58. 296–308.
Skemp, Richard R. 1971. The psychology of learning mathematics. Harmondsworth: Penguin.
Smith, Kathleen W., Balkwill, Laura-Lee, Vartanian, Oshin, and Goel, Vinod. 2015. Syllogisms delivered in an angry voice lead to improved performance and engagement of a
different neural system compared to neutral voice. Frontiers in Human Neuroscience 10
(10.3389/fnhum.2015.00222).
Smolin, Lee. 2013. Time reborn: From the crisis in physics to the future of the universe. Boston,
MA: Houghton Mifflin Harcourt.
Smullyan, Raymond. 1997. The riddle of Scheherazade and other amazing puzzles, ancient and
modern. New York, NY: Knopf.
Speelman, Dirk. 2014. Logistic regression: A confirmatory technique for comparisons in corpus
linguistics. Amsterdam: John Benjamins.
Sperber, Dan and Wilson, Deirdre. 1986. Relevance, communication, and cognition. Cambridge,
MA: Harvard University Press.
Sperry, Roger W. 1968. Hemisphere disconnection and unity in conscious awareness. American
Psychologist 23: 723–733.
Sperry, Roger W. 1973. Lateral specialization of cerebral function in the surgically separated
hemisphere. In: P. J. Vinken and G. W. Bruyn (eds.), The psychophysiology of thinking, 273–289.
Amsterdam: North Holland.
Stachowiak, F., Huber, W., Poeck, K., and Kerschensteiner, M. 1977. Text comprehension in
aphasia. Brain and Language 4: 177–195.
Starostin, Sergei. 1999. Methodology of long-range comparison. In Vitaly Shevoroshkin and
Paul J. Sidwell (eds.), Historical linguistics and lexicostatistics, 61–66. Melbourne.
Steen, G. J., Dorst, A. G., Herrmann, J. B., Kaal, A. A. and Krennmayr, T. 2010. Metaphor in usage.
Cognitive Linguistics 21: 765–796.
Steen, Gerard J. 2006. Finding metaphor in grammar and usage. Amsterdam: John Benjamins.
Steenrod, Norman, Halmos, Paul, Schiffer, Menahem N., and Dieudonné, Jean A. 1973. How to
write mathematics. New York, NY: Springer.
Stewart, Ian. 1995. Nature's numbers. New York, NY: Basic Books.
Stewart, Ian. 2008. Taming the infinite. London: Quercus.
Stewart, Ian. 2013. Visions of infinity. New York, NY: Basic Books.
Stjernfelt, Frederik. 2007. Diagrammatology: An investigation on the borderlines of phenomenology, ontology, and semiotics. New York, NY: Springer.
Swadesh, Morris. 1951. Diffusional cumulation and archaic residue as historical explanations.
Southwestern Journal of Anthropology 7, 1–21.
Swadesh, Morris. 1955. Towards greater accuracy in lexicostatistic dating. International Journal
of American Linguistics 21. 121–137.
Swadesh, Morris. 1959. Linguistics as an instrument of prehistory. Southwestern Journal of
Anthropology 15. 20–35.
Swadesh, Morris. 1971. The origins and diversification of language. Chicago, IL: Aldine-Atherton.
Sweet, Henry. 1888. A history of English sounds from the earliest period. Oxford: Clarendon.
Tagliamonte, Sali. 2006. Analysing sociolinguistic variation. Cambridge: Cambridge University
Press.
Tall, David. 2013. How humans learn to think mathematically. Cambridge: Cambridge University
Press.
Tanaka-Ishii, Kumiko and Ishii, Yuichiro. 2008. Sign and the lambda term. Semiotica 169. 123–148.
Tanaka-Ishii, Kumiko and Ishii, Yuichiro. 2007. Icon, index, symbol and denotation, connotation, metasign. Semiotica 166. 124–135.
Tarski, Alfred. 1933 [1983]. Logic, semantics, metamathematics, Papers from 1923 to 1938, ed.
John Corcoran. Indianapolis, IN: Hackett Publishing Company.
Tauli, Valter. 1958. The structural tendencies of languages. Helsinki:
Taylor, Richard and Andrew Wiles. 1995. Ring-theoretic properties of certain Hecke algebras.
Annals of Mathematics 141. 553–572.
Terai, A. and Nakagawa, M. 2012. A corpus-based computational model of metaphor understanding consisting of two processes. Cognitive Systems Research 19–20: 30–38.
Thibault, Paul J. 1997. Re-reading Saussure: The dynamics of signs in social life. London: Routledge.
Thom, René. 1975. Structural stability and morphogenesis: An outline of a general theory of
models. Reading: Benjamin.
Thom, René. 2010. Mathematics. In: Thomas A. Sebeok and Marcel Danesi (eds.), Encyclopedic
dictionary of semiotics, 3rd ed. Berlin: Mouton de Gruyter.
Thomason, Sarah Grey and Kaufman, Terrence. 1988. Language contact, creolization, and genetic linguistics. Berkeley, CA: University of California Press.
Thomson, William and Schumann, Edward. 1987. Interpretation of statistical evidence in criminal trials. Law and Human Behavior 11: 167–187.
Tiersma, Peter M. 1982. Local and general markedness. Language 58. 832–849.
Titchener, Edward B. 1910. A textbook of psychology. Delmar: Scholars Facsimile Reprints.
Tomic, Olga M. (ed.). 1989. Markedness in synchrony and diachrony. Berlin: Mouton de Gruyter.
Toni, R., Spaletta, G., Casa, C. D., Ravera, S., and Sandri, G. 2007. Computation and brain processes, with special reference to neuroendocrine systems. Acta Biomedica 78: 67–83.
Trubetzkoy, Nikolai S. 1936. Essai d'une théorie des oppositions phonologiques. Journal de
Psychologie 33. 5–18.
Trubetzkoy, Nikolai S. 1939. Grundzüge der Phonologie. Travaux du Cercle Linguistique de
Prague 7 (entire issue).
Trubetzkoy, Nikolai S. 1968. Introduction to the principles of phonological description. The
Hague: Martinus Nijhoff.
Trubetzkoy, Nikolai S. 1975. Letters and notes, ed. R. Jakobson. The Hague: Mouton.
Turing, Alan. 1936. On computable numbers with an application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society 42: 230–265.
Turing, Alan. 1950 [1963]. Computing machinery and intelligence. In: E. A. Feigenbaum and
J. Feldman (eds.), Computers and thought, 123–134. New York, NY: McGraw-Hill.
Turner, Mark. 2005. Mathematics and narrative. thalesandfriends.org/en/papers/pdf/
turnerpaper.pdf.
Turner, Mark. 2012. Mental packing and unpacking in mathematics. In: Mariana Bockarova,
Marcel Danesi, and Rafael Núñez (eds.), Semiotic and cognitive science essays on the
nature of mathematics, 123–134. Munich: Lincom Europa.
Tweedie, Fiona J., Singh, S., and Holmes, David I. 1996. Neural network applications in stylometry: The Federalist Papers. Computers and the Humanities 30: 1–10.
Tymoczko, Thomas. 1978. The Four-Color Problem and its philosophical significance. Journal of
Philosophy 24: 57–83.
Uexküll, Jakob von. 1909. Umwelt und Innenwelt der Tiere. Berlin: Springer.
Van de Walle, Jürgen and Willems, Klaas. 2007. Zipf, George Kingsley (1902–1950). In Encyclopedia of languages and linguistics, 2nd ed., K. Brown, ed.; Vol. 13, 756–57. Oxford: Elsevier
Science.
Van der Merwe, Nikolaas J. 1966. New mathematics for glottochronology. Current Anthropology
7. 485–500.
Van der Schoot, Menno, Bakker Arkema, A. H., Horsley, T. M., and van Lieshout, E. C. D. M. 2009.
The consistency effect depends on markedness in less successful but not successful
problem solvers: An eye movement study in primary school children. Contemporary Educational Psychology 34: 58–66.
Van Eyck, Jan and Kamp, Hans. 1997. Representing discourse in context. In: J. van Benthem and
A. ter Meulen (eds.) Handbook of logic and language, volume 3, 179–237. Amsterdam:
Elsevier.
Varelas, Maria. 1989. Semiotic aspects of cognitive development: Illustrations from early mathematical cognition. Psychological Review 100. 420–431.
Vendryes, J. 1939. Parler par économie. In: C. Bally and G. Genève (eds.), Mélanges de linguistique offerts à Charles Bally, 49–62. Geneva: Georg & Co.
Venn, John. 1880. On the employment of geometrical diagrams for the sensible representation
of logical propositions. Proceedings of the Cambridge Philosophical Society 4: 47–59.
Venn, John. 1881. Symbolic logic. London: Macmillan.
Verene, Donald P. 1981. Vico's science of imagination. Ithaca, NY: Cornell University Press.
Vijayakrishnan, K. J. 2007. The grammar of Carnatic music. Berlin: Mouton de Gruyter.
Vygotsky, Lev S. 1961. Thought and language. Cambridge, MA: MIT Press.
Walker, C. B. F. 1987. Cuneiform. Berkeley, CA: University of California Press.
Wallis, Sean and Nelson, Gerald. 2001. Knowledge discovery in grammatically analysed corpora. Data Mining and Knowledge Discovery 5: 307–340.
Wallon, Henri. 1945. Les origines de la pensée chez l'enfant. Vol. 1. Paris: Presses Universitaires
de France.
Wang, Xiaolu and He, Daili. 2013. A review of fMRI investigations into the neural mechanisms of
metaphor comprehension. Chinese Journal of Applied Linguistics 38: 234–239.
Wapner, Wendy, Hamby, Suzanne, and Gardner, Howard. 1981. The role of the right hemisphere
in the apprehension of complex linguistic materials. Brain and Language 14: 15–33.
Watson, L. 1990. The nature of things. London: Hodder and Stoughton.
Waugh, Linda. 1979. Markedness and phonological systems. LACUS (Linguistic Association of
Canada and the United States) Proceedings 5: 155–165.
Waugh, Linda. 1982. Marked and unmarked: A choice between unequals in semiotic structure.
Semiotica 39: 211–216.
Weaver, Warren. 1955. Translation. In: W. N. Locke and A. D. Booth (eds.), Machine Translation
of languages, 15–23. New York, NY: John Wiley.
Weinreich, Uriel. 1953. Languages in contact: Findings and problems. The Hague: Mouton.
Weinreich, Uriel. 1954. Is a structural dialectology possible? Word 10: 388–400.
Weinstein, Edward A. 1964. Affections of speech with lesions of the nondominant hemisphere. Research Publications of the Association for Research on Nervous and Mental
Disorders 42: 220–225.
Weisberg, Donna Skolnick, Keil, Frank C., Goodstein, Joshua, Rawson, Elizabeth, and Gray,
Jeremy R. 2008. The seductive allure of neuroscience explanations. Journal of Cognitive
Neuroscience 20: 470–477.
Weizenbaum, Joseph. 1966. ELIZA: A computer program for the study of natural language communication between man and machine. Communications of the ACM 9: 36–45.
Weizenbaum, Joseph. 1976. Computer power and human reason: From judgment to calculation.
New York, NY: W. H. Freeman.
Wells, David. 2005. Prime numbers: The most mysterious figures in math. Hoboken: John Wiley.
Wells, David. 2012. Games and mathematics: Subtle connections. Cambridge: Cambridge University Press.
Werner, Alice. 1919. Introductory sketch of the Bantu languages. New York, NY: Dutton.
Werner, Heinz and Kaplan, Bernard. 1963. Symbol formation: An organismic-developmental
approach to the psychology of language and the expression of thought. New York, NY:
John Wiley.
Wheeler, Marilyn M. 1987. Research into practice: Children's understanding of zero and infinity.
Arithmetic Teacher 35: 42–44.
Whiteley, Walter. 2012. Mathematical modeling as conceptual blending: Exploring an example
within mathematics education. In: Mariana Bockarova, Marcel Danesi, and Rafael Núñez
(eds.), Semiotic and cognitive science essays on the nature of mathematics, 256–279.
Munich: Lincom Europa.
Whitney, W. D. 1877. The Principle of Economy as a phonetic force. Transactions of the American
Philological Association 8: 123–134.
Whorf, Benjamin Lee. 1956. Language, thought, and reality, J. B. Carroll (ed.). Cambridge, MA:
MIT Press.
Wiener, Norbert. 1948. Cybernetics, or control and communication in the animal and the machine. Cambridge, MA: MIT Press.
Wierzbicka, Anna. 1996. Semantics: Primes and universals. Oxford: Oxford University Press.
Wierzbicka, Anna. 1997. Understanding cultures through their key words. Oxford: Oxford University Press.
Wierzbicka, Anna. 1999. Emotions across languages and cultures: Diversity and universals.
Cambridge: Cambridge University Press.
Wierzbicka, Anna. 2003. Crosscultural pragmatics: The semantics of human interaction. New
York, NY: Mouton de Gruyter.
Wiles, Andrew. 1995. Modular elliptic curves and Fermat's last theorem. Annals of Mathematics. Second Series 141: 443–551.
Wilson, E. O. and Harris, M. 1981. Heredity versus culture: A debate. In: J. Guillemin (ed.), Anthropological realities: Reading in the science of culture, 450–465. New Brunswick, NJ:
Transaction Books.
Winner, Ellen and Gardner, Howard. 1977. The comprehension of metaphor in brain-damaged
patients. Brain 100: 717–729.
Winner, Ellen. 1982. Invented worlds: The psychology of the arts. Cambridge, MA: Harvard University Press.
Winograd, Terry. 1991. Thinking machines: Can there be? Are we? In: J. J. Sheehan and M. Sosna
(eds.), The boundaries of humanity, 198–223. Berkeley, CA: University of California Press.
Wittgenstein, Ludwig. 1921. Tractatus logico-philosophicus. London: Routledge and Kegan
Paul.
Wittgenstein, Ludwig. 1953. Philosophical investigations. New York, NY: Macmillan.
Wittmann, Henri. 1969. A lexico-statistic inquiry into the diachrony of Hittite. Indogermanische
Forschungen 74: 1–10.
Wittmann, Henri. 1973. The lexicostatistical classification of the French-based Creole languages. In: Lexicostatistics in genetic linguistics: Proceedings of the Yale conference, 89–99.
The Hague: Mouton.
Wolfram, Stephen. 2002. A new kind of science. Champaign, IL: Wolfram Media.
Wundt, Wilhelm. 1880. Grundzüge der physiologischen Psychologie. Leipzig: Engelmann.
Wundt, Wilhelm. 1901. Sprachgeschichte und Sprachpsychologie. Leipzig: Engelmann.
Wyllys, Ronald E. 1975. Measuring scientific prose with rank-frequency (Zipf) curves: A new
use for an old phenomenon. Proceedings of the American Society for Information Science
12: 30–31.
Yancey, A., Thompson, C., and Yancey, J. 1989. Children must learn to draw diagrams. Arithmetic Teacher 36: 15–19.
Zipf, George K. 1929. Relative frequency as a determinant of phonetic change. Harvard Studies
in Classical Philology 40: 1–95.
Zipf, George K. 1932. Selected studies of the principle of relative frequency in language. Cambridge, MA: Harvard University Press.
Zipf, George K. 1935. The psycho-biology of language: An introduction to dynamic philology.
Boston, MA: Houghton-Mifflin.
Zipf, George K. 1949. Human behavior and the principle of least effort. Boston, MA: Addison-Wesley.
Zwicky, Arnold and Sadock, Jerrold. 1975. Ambiguity tests and how to fail them. In: J. Kimball
(ed.) Syntax and semantics 4, New York, NY: Academic Press.
Zwicky, Jan. 2010. Mathematical analogy and metaphorical insight. For the Learning of Mathematics 30: 9–14.
Zylberberg, A., Dehaene, S., Roelfsema, P. R., and Sigman, M. 2011. The human Turing machine:
A neural framework for mental programs. Trends in Cognitive Sciences 15: 293–300.
Index
abduction 86, 91, 92, 145, 255, 258, 273, 274,
275
Abel, Niels Henrik 155
acalculia 59, 282
agglutinative 7, 23, 54, 55
Aiken, Howard 138
algorithm 1, 5, 34, 37–40, 45–47, 50, 51,
84–87, 126, 127, 129, 132–143, 145, 147,
148, 152–154, 156, 162, 167, 169, 171,
172, 178, 188, 189, 202, 221, 223, 231,
232, 258, 265, 294
allophone 114, 115
alphabet 26, 91, 109, 153, 232, 273
ambiguity 11, 25–29, 107, 160–163, 170, 172,
191, 247, 248
analogy 17, 67, 90, 116, 169, 239, 275
anomalous 42, 63, 104, 170, 174, 175, 251
anthropic principle 102
aphasia 59, 282
Appel, Kenneth 38, 85, 91
argumentation 8, 32, 33
Aristotle 10–12, 17, 36, 49, 65, 66, 70, 71, 73,
94, 262
arithmetic x, 2, 7–9, 11, 14, 15, 21, 25, 33, 34,
59, 69, 70–72, 102, 134, 142, 168, 194,
240, 253, 268, 269, 273, 275, 279, 282,
285, 288, 293
Arnauld, Antoine 31
artificial intelligence (AI) 36, 47, 50, 68, 96,
111, 134, 137, 138, 179, 183, 192
artificial language 133, 190
artificial neural network (ANN) 221, 222
associationism 49, 50
axiom 1, 8, 10, 13, 21, 25, 27, 28, 30, 31, 36,
38, 72–74, 79, 84, 85, 87, 108, 130, 230,
259, 260
Babbage, Charles 138
Bacon, Francis 71
Bacon, Roger 12
Bar-Hillel Paradox 160–162, 170
Bar-Hillel, Yehoshua 116, 117, 160, 169, 173
Barber Paradox 83
BASIC 146, 175
connotation 166
consistency 24, 25, 27, 29, 38, 72, 81, 82, 92,
129, 139, 140, 146
contradiction 77, 78, 84, 87, 93, 94, 129
constructivism 14, 15
context 103, 110–113, 160, 161
conversation 40–42, 53, 126, 140, 166, 173,
175–178, 186, 211
core vocabulary 191, 238–242, 244, 245
coreference 41, 177
corpus linguistics 53, 194, 201, 202,
219–224, 294
correlation 195, 200, 209, 211
correlation coefficient 201, 202
creativity 50, 64, 260
cybernetics 39
De Morgan, Augustus 96
decidability 8, 36, 81–84, 86, 129, 137, 146,
153
decimal number 7, 52, 71, 154
deduction 4, 64, 72, 74, 77, 87, 91, 145, 274,
275
deep structure 18–20, 22, 40, 42, 105–108,
111, 112, 161
Dehaene, Stanislas xi, 57, 58, 261, 268, 276,
277, 280, 282, 284
deixis 41
Democritus 70
denotation 165
Descartes, René 2, 11, 17, 20, 69, 71, 190, 288
Devlin, Keith 268, 284
diagram 13, 29, 33, 43, 98, 99, 101, 273–275
dialogue 40, 70, 146, 174
disambiguation 170, 177
discourse 40, 62, 116, 133, 134, 172, 223
distinctive feature 114–116
double articulation 1, 216
Eckert, J. Presper 138
economy 6, 110, 193, 202, 216, 218, 245, 247,
248
efficiency 6, 216, 246
ELIZA 174, 179, 184
embodied cognition 51, 280
emergence 15, 21, 53
Enlightenment x, 50, 67, 71, 79
Epimenides 83, 84
ergonomics 249
Esperanto 191
ethnomathematics 58, 79
ethnosemantics 40
Euclid 26–28, 66, 69, 77, 78, 80, 85, 91,
134–136, 205
Euclidean geometry 2, 8, 29, 31, 64, 280
Euler, Leonhard 80, 98, 99, 101, 148–151
Euler diagram 98, 99
Existential Graph 99, 272–274
exponent 37, 90, 194, 195, 204, 211, 239, 240
Fermat, Pierre de 196
Fermat's Last Theorem 31, 275, 284
Ferrero, Guillaume 211
Fibonacci, Leonardo 71
Fibonacci sequence 111
Ficino, Marsilio 71
fifth postulate (axiom) 29, 36, 73
figurative xi, 4, 11, 42, 43, 62, 64, 116, 118,
125, 128, 160, 169, 188, 223, 257, 285
Fodor, Jerry 187
formal grammar 10–12, 42, 43, 104, 108, 111,
133, 146, 163
formal linguistics 22, 23, 66, 67, 103, 104, 110
formal mathematics 26, 37, 66, 67, 69, 81,
86, 96, 108
formal semantics 114, 116–118
formalism xii, 5–9, 13, 15, 23–26, 36, 51, 67,
68, 124, 125, 127–130, 133
formalist hypothesis 16, 17
Foster, Donald 220
Four Color Theorem 38, 85, 86
fractal 92, 93, 95, 96
Frege, Gottlob 35, 82
Fundamental Theorem of Arithmetic 134
Galileo 71, 89, 254
Galois, Évariste 155, 156
Gao, Yuqing 170
generativism 21, 161
genetic algorithm 221
geometry x, 9, 30, 31, 67, 69, 70, 72, 73, 80,
94, 288
glottochronology 194, 237, 238, 241–243,
245, 294
M-Set 95
Machine Translation (MT) 5, 39, 142, 159,
160, 163, 167, 169, 174
Machine-learning (ML) 139, 188
Mandelbrot, Benoit 85, 96, 213
mapping 4, 9, 13, 118, 120–122, 189
markedness 248, 256
Markov, Andrey A. 1, 43, 231, 232
Markov chain 21, 86, 168, 181, 235, 236
Markov state 105
Martinet, André 210, 216, 246, 248
Marx, Karl 72
math cognition xi, xii, 58, 65, 268–283, 293
mathematical knowledge 101–103
Mauchly, John 138
McCarthy, John 139
Mean Length of Utterance (MLU) 53, 54
metalanguage 84
metaphor 119–124
Mill, James 49
Minimalist Program 23, 37, 248
mirror neuron 92
model 40–45, 144–146
modularity 264, 265
Montague, Richard 116, 124
Montague grammar 116
Monty Hall Problem 198, 225–227, 229
morpheme 18, 22, 53–55, 114, 177, 178, 191,
201, 217
morphological index 54
morphology 10, 54, 177, 191, 210, 211, 219,
237, 246
mythos 66, 70, 71
Thales 80
Thom, René 15, 102, 130
Thrax, Dionysius 12, 66
time depth 238–244, 246, 294
topology 101, 151, 259, 289
transfinite number 35, 91
transformational rule 104, 105
transformational-generative grammar 104
Traveling Salesman Problem (TSP) 147, 148,
152, 153
tree diagram 43, 44, 46, 104, 105, 115
Trivium 69
Turing, Alan 1, 84, 127, 141, 154, 184, 185
Turing machine 2, 86, 141, 143, 153, 185, 257,
260
Turing Test 184
Uexküll, Jakob von 101, 276
Umwelt 101, 102, 269, 276
undecidability 36, 82–84, 253
Unexpected Hanging paradox 129, 130, 147,
159
Universal Grammar (UG) 1
Valla, Lorenzo 221
Vendryes, Joseph 246
Venn diagram 98
Venn, John 98–100
Vico, Giambattista 17
Vygotsky, Lev 48, 49, 128, 181, 182
Weaver, Warren 160
Weizenbaum, Joseph 174
well-formedness 63
Whitehead, Alfred North 35, 82, 83, 102
Wiener, Norbert 39
William of Ockham 13
Winograd, Terry 183, 186
Wittgenstein, Ludwig 35, 82, 254
Wundt, Wilhelm 105
Zamenhof, Ludwik Lejzer 191
Zeno of Elea 78, 82
Zipf, George Kingsley 209, 211, 213
Zipf's law 214, 215
Zipfian analysis 194, 209, 214
Zipfian curve 212