
Marcel Danesi

Language and Mathematics


Language Intersections

Volume 1


Marcel Danesi

Language and Mathematics

An Interdisciplinary Guide


ISBN 978-1-61451-554-8
e-ISBN (PDF) 978-1-61451-318-6
e-ISBN (EPUB) 978-1-5015-0036-7
ISSN 2195-559X

Library of Congress Cataloging-in-Publication Data


A CIP catalog record for this book has been applied for at the Library of Congress.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available on the Internet at http://dnb.dnb.de.
© 2016 Walter de Gruyter Inc., Boston/Berlin
Cover image: Lonely/iStock/thinkstock
Typesetting: PTP-Berlin, Protago-TEX-Production GmbH, Berlin
Printed on acid-free paper
Printed in Germany
www.degruyter.com


Contents
List of figures | viii
Preface | x
1 Common Ground | 1
1.1 Logic | 6
1.1.1 Formalism in linguistics and mathematics | 8
1.1.2 Syntax | 18
1.1.3 Formal analysis | 24
1.1.4 The structure of logic | 32
1.2 Computation | 36
1.2.1 Modeling formal theories | 40
1.2.2 Cognitive science | 46
1.2.3 Creativity | 50
1.3 Quantification | 52
1.3.1 Compression | 53
1.3.2 Probability | 55
1.4 Neuroscience | 56
1.4.1 Neural structure | 57
1.4.2 Blending | 62
1.5 Common ground | 64

2 Logic | 66
2.1 Formal mathematics | 69
2.1.1 Lógos and mythos | 70
2.1.2 Proof | 72
2.1.3 Consistency, completeness, and decidability | 81
2.1.4 Non-Euclidean logic | 85
2.1.5 Cantorian logic | 88
2.1.6 Logic and imagination | 91
2.2 Set theory | 96
2.2.1 Diagrams | 98
2.2.2 Mathematical knowledge | 101
2.3 Formal linguistics | 103
2.3.1 Transformational-generative grammar | 104
2.3.2 Grammar rules | 108
2.3.3 Types of grammar | 110


2.3.4 Formal semantics | 114
2.4 Cognitive linguistics | 118
2.4.1 Conceptual metaphors | 119
2.4.2 Challenge to formalism | 123
2.5 Formalism, logic, and meaning | 125
2.5.1 A Gödelian critique | 127
2.5.2 Connecting formalism and cognitivism | 128
2.5.3 Overview | 129

3 Computation | 132
3.1 Algorithms and models | 134
3.1.1 Artificial intelligence | 138
3.1.2 Knowledge representation | 139
3.1.3 Programs | 144
3.2 Computability theory | 147
3.2.1 The Traveling Salesman Problem | 147
3.2.2 Computability | 153
3.3 Computational linguistics | 159
3.3.1 Machine Translation | 160
3.3.2 Knowledge networks | 163
3.3.3 Theoretical paradigms | 167
3.3.4 Text theory | 172
3.4 Natural Language Processing | 174
3.4.1 Aspects of NLP | 175
3.4.2 Modeling language | 178
3.5 Computation and psychological realism | 179
3.5.1 Learning and consciousness | 180
3.5.2 Overview | 184

4 Quantification | 193
4.1 Statistics and probability | 195
4.1.1 Basic notions | 197
4.1.2 Statistical tests | 200
4.2 Studying properties quantitatively | 202
4.2.1 Benford's Law | 203
4.2.2 The birthday and coin-tossing problems | 206
4.2.3 The Principle of Least Effort | 209
4.2.4 Efficiency and economy | 216
4.3 Corpus linguistics | 219
4.3.1 Stylometric analysis | 219


4.3.2 Other techniques | 221
4.3.3 The statistics on metaphor | 222
4.4 Probabilistic analysis | 224
4.4.1 The Monty Hall Problem | 226
4.4.2 The Prosecutor's Fallacy | 227
4.4.3 Bayesian Inference | 228
4.4.4 General implications | 230
4.5 Quantifying change in language | 237
4.5.1 Lexicostatistics and glottochronology | 237
4.5.2 Economy of change | 245
4.6 Overview | 248

5 Neuroscience | 255
5.1 Neuroscientific orientations | 256
5.1.1 Computational neuroscience | 257
5.1.2 Connectionism | 262
5.1.3 Modularity | 264
5.1.4 Research on metaphor | 266
5.2 Math cognition | 268
5.2.1 Defining math cognition | 270
5.2.2 Charles Peirce | 272
5.2.3 Graphs and math cognition | 274
5.2.4 Neuroscientific findings | 276
5.3 Mathematics and language | 284
5.3.1 Mathematics and figurative cognition | 285
5.3.2 Blending theory | 287
5.4 Concluding remarks | 294

Bibliography | 297
Index | 327


List of figures
Figure 1.1 Metaphor as the basis for new understanding | 5
Figure 1.2 The formalist mode of inquiry | 14
Figure 1.3 Chomskyan analysis of surface structure | 19
Figure 1.4 Transformational rules | 19
Figure 1.5 Euclid's fifth postulate | 29
Figure 1.6 Lobachevskian geometry | 30
Figure 1.7 Riemannian geometry | 30
Figure 1.8 Set theory diagrams | 33
Figure 1.9 Tree diagram for The boy eats the pizza | 44
Figure 1.10 Phrase structure diagram for The boy eats the pizza | 44
Figure 1.11 Markovian diagram for 2,234 | 46
Figure 1.12 Blending | 57

Figure 2.1 Part 1 of the proof that the sum of the angles in a triangle is 180° | 73
Figure 2.2 Part 2 of the proof that the sum of the angles in a triangle is 180° | 74
Figure 2.3 Part 3 of the proof that the sum of the angles in a triangle is 180° | 74
Figure 2.4 Dissection proof of the Pythagorean theorem | 79
Figure 2.5 Initial correspondence of the set of integers with the set of square numbers | 89
Figure 2.6 Second correspondence of the set of integers with the set of square numbers | 90
Figure 2.7 Correspondence of the set of integers with the set of positive integer exponents | 90
Figure 2.8 The Cantor set | 92
Figure 2.9 The Sierpinski Carpet | 93
Figure 2.10 The M-Set | 95
Figure 2.11 Overlapping sets | 97
Figure 2.12 Euler's diagrams | 98
Figure 2.13 Euler's diagram solution | 99
Figure 2.14 Venn's basic diagram | 99
Figure 2.15 Venn diagrams | 100
Figure 2.16 Tree diagram for The boy loves the girl | 105
Figure 2.17 Early model of a transformational-generative grammar | 106
Figure 2.18 Lexical tree diagram | 115
Figure 2.19 Figures of speech | 119
Figure 2.20 Image schemas, mapping and metaphor | 121

Figure 3.1 Euclid's and Nichomachus' algorithms | 135
Figure 3.2 A flowchart of Euclid's algorithm | 136
Figure 3.3 Flowchart for determining the largest number | 144
Figure 3.4 Programming schema | 145
Figure 3.5 Königsberg Bridges Problem | 149
Figure 3.6 Königsberg Bridges Problem in outline graph form | 150


Figure 3.7 Number of vertices, edges, and faces of a cube | 151
Figure 3.8 Knowledge network for snake | 166
Figure 3.9 An example of how English is translated into concepts, then recombined from concepts into Chinese. IBM, 2007 | 171
Figure 3.10 Using statistics to translate spoken language into concepts. IBM, 2007 | 171

Figure 4.1 The normal curve | 198
Figure 4.2 Standard deviations | 199
Figure 4.3 Birthday problem | 208
Figure 4.4 Zipfian curve of Joyce's Ulysses | 212
Figure 4.5 Zipfian curves (logarithmic function) | 213
Figure 4.6 Maxima and minima | 218
Figure 4.7 Markov chain analysis of the random walk problem (from Wikipedia) | 236

Figure 5.1 Blending | 270
Figure 5.2 Flow model of math cognition | 275
Figure 5.3 Model of numeracy and math cognition | 278
Figure 5.4 Butterworth's model | 279
Figure 5.5 The numerosity adaptation effect | 283
Figure 5.6 Diagram for Rutherford's model of the atom | 290
Figure 5.7 Diagram for Bohr's model of the atom | 290
Figure 5.8 Diagram for Schrödinger's model of the atom | 290


Preface
Our work is to present things that are as they are.
Frederick II (1194–1250)

Mathematics is often designated a "language," complete with its own symbols and rules of grammar. The use of this term to characterize mathematics is, however, not to be considered just descriptive or figurative. The two (language and mathematics) are very much alike. They may have different intellectual and practical functions, but they share many structural properties and, as research is starting to show, they appear to share many structures in the brain as well.
Throughout the modern history of their science, linguists have actually used mathematics frequently as a tool for investigating aspects of language, revealing many important things about language and how it is used. Yet some linguists (perhaps many) skirt around the use of mathematics in their discipline for various reasons: mathematics is something totally different from language, which has a semantic basis to it, whereas mathematics does not; it is simply a quantitative ancillary tool, adding nothing substantive to the existing repertory of techniques within linguistics; and so on and so forth. Other linguists, on the other hand, actually see language itself as a mathematical system, based on the same rules of logic. For their part, mathematicians have rarely looked to linguistics for insights into their own discipline; but this situation has changed drastically in recent years, as they begin to realize that language is of central importance to understanding how mathematics is conceptualized. Mathematicians are also becoming more and more intrigued by research showing that mathematics and language might be connected in the brain, forming a unitary cognitive system. If so, this has many implications for both disciplines, including a common ground for developing an agenda of collaborative future research.
The primary objective of this book is to provide a general assessment of the significance that a common ground of research has for both disciplines. By reviewing the main applications of mathematics to the study of language (and vice versa), and by discussing the research on the connection of language to mathematics, I hope to show that meaningful research can be conducted in an interdisciplinary fashion. A secondary objective is to show that the research methods and theoretical pursuits of both disciplines have been remarkably similar throughout the ages, and to identify the reasons why this is so. In Ancient Greece, arithmetic, geometry, and grammar were considered to be intertwined branches of knowledge. Grammar was subsequently separated from the mathematical arts as specialization became the norm, certainly by the Renaissance and then the Enlightenment. This artificial separation has impeded the fruitful study of the relationship between language and mathematics ever since. There is, of course, a branch of linguistics known as mathematical linguistics (to be discussed in chapter 3), which has the specific aim of using mathematical constructs to develop grammatical theories; but there really is no one general rubric in either linguistics or mathematics that aims to study the relationship between the two disciplines, despite some truly intriguing attempts (which will also be discussed in this book). Needless to say, there exist various interdisciplinary approaches that come under different rubrics, such as the philosophy of mathematics, the psychology of mathematics, the anthropology of mathematics, and so on and so forth. Each of these is a branch within its own field. But there has never really been an overarching approach that connects mathematics and language, until very recently with the advent of so-called mathematical cognition research (also called numeracy research), an area that will be examined closely in the final chapter.
The study of the mathematics-language interface constitutes a hermeneutic enterprise. Most fields have one: literature has literary criticism, music has musicology, art has art criticism, and so on. These strive to understand the relevance of the field to human knowledge and aesthetics through an analysis of texts and expressive activities within each. The same kind of approach can be applied to the math-language nexus. Arguably, the first hermeneutical work in this field, although the authors did not name it as such, was by George Lakoff and Rafael Núñez, Where mathematics comes from (2000), in which they argue that the same neural processes are involved in producing language and mathematics. This line of inquiry has soared considerably since the publication of their book. One of the offshoots of this new interest has been an increased sense of the common ground that mathematicians and linguists share. Institutes such as the Cognitive Science Network of the Fields Institute for Research in Mathematical Sciences, co-founded by the present author, are now springing up everywhere to lay the groundwork for formulating specific hermeneutical questions about the interrelationship of mathematics and language.
The groundwork was laid, arguably, by Stanislas Dehaene (1997). He studied
brain-damaged patients who had lost control of number concepts. He was able to
trace the sense of number to the inferior parietal cortex, an area where various
subsystems are also involved in language processing (auditory, visual, tactile).
This type of finding is strongly suggestive of an inherent link between math and
language, even though Dehaene himself has kept away from making this connection directly. George Johnson (2013: 5) puts it as follows:
Scientists are intrigued by clues that this region is also involved in language processing and
in distinguishing right from left. Mathematics is, after all, a kind of language intimately involved with using numbers to order space.


The skill of adding numbers is not unlike the skill of putting words together into phrases and sentences. Lakoff and Núñez see mathematics as originating in the same neural substratum where metaphor and other figurative forms of language originate. This is why, they claim, we intuitively prefer number systems based on ten: the reason being that we have ten fingers, which we use instinctively to count. Number systems are thus collections of linking metaphors, or mental forms that transform bodily experiences (such as counting with the fingers) into abstractions. Lakoff and Núñez also make the seemingly preposterous claim that even mathematical proofs stem from the same type of metaphorical cognition. Incredibly, experimental psychological research is validating this hypothesis, as will be discussed in this book. If we are ever to come to an understanding of what language and mathematics are, such hermeneutical-empirical approaches cannot be ignored or dismissed as irrelevant to either discipline.
My discussion in this book is nontechnical; that is, I do not take prior mathematical or technical linguistic knowledge for granted. This may mean some reductions and oversimplifications, but my objective is not to enter into the technical minutiae of each discipline, which are of course interesting in themselves; rather, it is to evaluate what a common ground of research entails for the two disciplines. The first chapter will look at this ground in a general way; the second one will discuss the role of logic and formalism in both disciplines; the third one will examine how linguists, mathematicians, and computer scientists have been collaborating to model natural language and mathematics in order to glean common patterns between the two; the fourth chapter looks at quantitative approaches in both linguistics and mathematics, and especially the findings that relate to how the two disciplines, themselves, obey the laws of probability; and the final chapter looks at the ever-expanding idea that neuroscience can provide the link for studying mathematics and language in a truly interdisciplinary way.
We are living in an age where mathematics has become a critical tool in virtually all fields of scientific inquiry (biology, sociology, economics, education, and so on). As journalist Thomas Friedman (2007: 300) has aptly put it, the world is moving into "a new age of numbers in which partnerships between mathematicians and computer scientists are bulling into whole new domains of business and imposing efficiencies in math." I would add that the same world also needs to forge partnerships between mathematicians and linguists. Some of my assessments are, inevitably, going to be subjective. This is due in part to my own knowledge of both fields and my own theoretical preferences. Nevertheless, I hope to provide a broad coverage of the common ground and thus to emphasize the importance of mathematics to the study of language and of linguistics to the study of mathematics.


1 Common Ground
The knowledge of mathematical things is almost innate in us. This is the easiest of sciences,
a fact which is obvious in that no one's brain rejects it; for laymen and people who are utterly
illiterate know how to count and reckon.
Roger Bacon (c. 1214–c. 1294)

Introductory remarks
In the 1960s, a number of linguists became intrigued by what they saw as the mathematical properties of language and, vice versa, the linguistic properties of mathematics (Marcus and Vasiliu 1960, Jakobson 1961, Hockett 1967, Harris 1968). Their pioneering writings were essentially exploratory investigations of structural analogies between mathematics and language. They argued, for example, that both possessed the feature of double articulation (the use of a limited set of units to make complex forms ad infinitum), ordered rules for interrelating internal structures, and basic units that could be combined into complex ones, among other things. Many interesting comparisons emerged from these studies, which contained an important subtext: by exploring the structures of mathematics and language in correlative ways, we might hit upon deeper points of contact and thus arrive at a common ground for studying, and hence understanding, both.
At around the same time, generative grammar came to the forefront in theoretical linguistics (Chomsky 1957, 1965). From the outset, it espoused a basic mathematical mindset; that is, it saw the study of language as a search for the formal axioms and rules that undergirded the formation of all grammars. As his early writings reveal, Chomsky was inspired initially by Markov's (1906) idea that a mathematical system that has n possible states will, at any given time, be in one and only one of those states. The generativist premise was (and continues to be) that the study of these states in separate languages will lead to the discovery of a universal set of rule-making principles that produce them (or reflect them). These are said to be part of a Universal Grammar (UG), an innate faculty of the human brain that allows language to develop effortlessly in human infants through exposure, in the same way that flight develops in birds no matter where they are in the world and to what species they belong. The concept of rule in generative grammar was thus drafted to be analogous to that in propositional logic, proof theory, set theory, and computer algorithms. The connection between rules, mathematical logic, and computation was actually studied insightfully by Alan Turing (1936), who claimed that a machine could be built to process equations and other mathematical forms without human direction. The machine he described resembled an automatic typewriter that used symbols instead of letters and could be programmed to duplicate the function of any other existing machine. His Turing machine could in theory carry out any recursive function: the repeated application of a rule or procedure to successive results or executions. Recursion became, and still is, a guiding assumption underlying the search for the base rules of the UG. Needless to say, recursion is also the primary concept in various domains of mathematics (as will be discussed in the next chapter).
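To make the notion concrete, here is a minimal sketch of recursion in both of the senses just mentioned, the arithmetical and the linguistic. The example is my own illustration in Python (the function names, the toy clause, and the choice of language are assumptions for exposition, not drawn from Turing or from the generative literature):

# Recursion: a rule or procedure applied repeatedly to its own output.

def factorial(n: int) -> int:
    # Arithmetical recursion: n! is defined in terms of (n - 1)!.
    return 1 if n == 0 else n * factorial(n - 1)

def embed(clause: str, depth: int) -> str:
    # Linguistic recursion: a clause may contain another clause of the same kind.
    if depth == 0:
        return clause
    return "the girl knows that " + embed(clause, depth - 1)

print(factorial(5))                        # 120
print(embed("the boy eats the pizza", 2))  # the girl knows that the girl knows that the boy eats the pizza

In both functions the same rule re-applies to its own result until a stopping condition is reached, which is the sense in which a finite set of rules can generate an unbounded set of forms.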
The quest to understand the universal structures of mind that produce language and mathematics, considered to be analogous systems, goes back to ancient philosophers and, during the Renaissance, to rationalist philosophers such as René Descartes (1641) and Thomas Hobbes (1656), both of whom saw arithmetical operations and geometrical proofs as revealing essentially how the mind worked. By extension, the implication was that the same operations (for example, commutation and combination) were operative in the production of language. As the late science commentator Jacob Bronowski (1977: 42) observed, Hobbes believed in a world that could be as rational as Euclidean geometry; so, he explored in its progression some analogue to logical entailment. Hobbes found his analogue in the idea that causes entailed effects as rigorously as Euclid's propositions entailed one another. Descartes, Hobbes, and other rationalist philosophers and mathematicians saw logic as the central faculty of the mind, assigning all other faculties, such as those involved in poetry and art, to subsidiary or even pleonastic status. They have left somewhat of a legacy, since some mathematicians see mathematics and logic as one and the same; and of course so too do generative linguists.
Since the early 1960s, mathematical notions such as recursion have influenced the evolution of various research paradigms in theoretical linguistics, both intrinsically and contrastively (since the paradigm has also brought about significant opposing responses by linguists such as George Lakoff). Mathematicians, too, have started in recent years to look at questions explored within linguistics, such as the nature of syntactic rules and, more recently, the nature of metaphorical thinking in the production of mathematical concepts and constructs. Research in neuroscience has, in fact, been shedding direct light on the relation between the two systems (math and language), showing that how we understand numbers and learn them might be isomorphic to how we comprehend and learn words. As rigid disciplinary territories started breaking down in the 1980s and 1990s, and with interdisciplinarity emerging as a powerful investigative mindset, the boundaries between research paradigms in linguistics and mathematics have been steadily crumbling ever since. Today, many linguists and mathematicians see a common research ground in cognitive science, a fledgling discipline in the mid-1980s, which sought to bring together psychologists, linguists, philosophers, neuroscientists, and computer scientists to study cognition, learning, and mental organization. So, in a sense this book is about the cognitive science of language
and mathematics, but it does not necessarily imply that cognitive science has
found the light at the end of the tunnel, so to speak. As mentioned in the preface,
the basically empirical and theory-based focus of cognitive science will shed
light on the math-language interface only from a certain angle. The hermeneutic
approach espoused here is intended to insert other perspectives of a more critical
nature into the disciplinary mix that might provide a clearer picture of how the
interface unfolds intellectually and practically.
The purpose of this chapter is to provide an overview of the main areas that
fall onto a common ground of interest and research in linguistics and mathematics. Then, in subsequent chapters, the objective will be to zero in on each of these
areas in order to glean from them general principles that might apply to both systems. This is, in fact, a common goal today behind institutional initiatives such
as the Cognitive Science Network at the Fields Institute for Research in the Mathematical Sciences at the University of Toronto (mentioned in the preface).
Perhaps the first detailed comparison of mathematics and language was Charles Hockett's 1967 book, Language, mathematics and linguistics. Although a part of the book was devoted to a critique of Chomskyan grammar, a larger part dealt with describing properties that language and mathematics seemed to share and with what this implied for the study of both. Hockett was a structuralist, and his interest in mathematics was really an outgrowth of early musings on the links between language and mathematics within structuralism, such as those by Roman Jakobson, who claimed that notions such as the Saussurean ones of value and opposition could be profitably applied to the study of mathematical structure (see Andrews 1990). Hockett's book was an offshoot of Jakobson's implicit entreaty to study mathematics from the structuralist perspective. Since then, much has been written about the relation between mathematics and language (for example, Harris 1968, Marcus 1975, 1980, 2003, 2010, Thom 1975, 2010, Rotman 1988, Varelas 1989, Reed 1994, MacNamara 1996, Radford and Grenier 1996, English 1997, Otte 1997, Anderson, Sáenz-Ludlow, and Cifarelli 2000, 2003, Bockarova, Danesi, and Núñez 2012). There now exists intriguing evidence from the fields of education, neuroscience, and psychology that linguistic notions might actually explain various aspects of how mathematics is learned (for example, Cho and Procter 2007, Van der Schoot, Bakker Arkema, Horsley, and van Lieshout 2009).
In a lecture given by Lakoff at the founding workshop of the Network mentioned above in 2011, titled "The cognitive and neural foundation of mathematics: The case of Gödel's metaphors," it was saliently obvious to those present (mainly mathematicians) that in order to study mathematical cognition at a deeper level than simply formalizing logical structures used to carry out mathematical activities (such as proof), it is necessary to understand the neural source of mathematics, which he claimed was the same source that produced figurative language. Lakoff discussed his fascinating, albeit controversial, view of how mathematicians formed their proofs and generally carried out their theoretical activities through metaphorical thinking, which means essentially mapping ideas from one domain into another because the two domains are felt to be connected. The details of his argument are beyond the present purposes, although some of these will be discussed subsequently. Suffice it to say here that Lakoff looked at how Gödel proved his famous incompleteness theorem (Gödel 1931), suggesting that it stemmed from a form of conceptualization that finds its counterpart in metaphorical cognition, a hypothesis that he had put forward previously in Where mathematics comes from (preface).
As argued in that book, while this hypothesis might seem to be an extravagant one, it really is not, especially if one assumes that language and mathematics are implanted in a form of cognition that involves associative connections between experience and abstraction. In fact, as Lakoff pointed out, ongoing neuroscientific research has been suggesting that mathematics and language result from the process of blending, which will be discussed in due course. It is sufficient to say at this point that Lakoff's argument is highly plausible and thus needs to be investigated by mathematicians and linguists working collaboratively. The gist of his argument is that mathematics makes sense when it encodes meanings that fit our experiences of the world: experiences of quantity, space, motion, force, change, mass, shape, probability, self-regulating processes, and so on. The inspiration for new mathematics comes from these experiences, as it does for new language.
The basic model put forth by Lakoff is actually a simple one, to which we shall return in more detail subsequently. Essentially, it shows that new understanding comes not from such processes as logical deduction, but rather from metaphor, which projects what is familiar through an interconnection of the vehicle and the topic onto an intended new domain of understanding. In this model, metaphor is not just a figure of speech, but also a cognitive mechanism that blends domains together and then maps them onto new domains in order to understand them. The two domains are the familiar vehicle and topic terms which, when blended together, produce through metaphor new understanding, which is the intended meaning of the blend (see Figure 1.1).

Figure 1.1: Metaphor as the basis for new understanding (diagram: a familiar vehicle term and a familiar topic term, linked through metaphor, yield new understanding, the intended meaning).
Lakoff presents a very plausible argument for his hypothesis. But in the process he tends to be exclusive, throwing out other approaches, such as the generative one, as mere games played by linguists. While I tend to agree with the substance of Lakoff's argument, as will become evident in this book, I also strongly believe that the other approaches cannot be so easily dismissed and, when looked at in a non-partisan way, do give insights into language and its mathematical basis, from a specific angle. Moreover, formalist models have had very fertile applications in areas such as Natural Language Processing in computer science and in Machine Translation, which have both become critical tools of the Internet (Danesi 2013).
While mathematicians are starting to look towards linguistics, and especially cognitive linguistics (which is what Lakoff's approach is generally called), as a source of potential insights into questions such as what number sense is, one can also argue that linguistics, as a science, has always had an implicit interest both in mathematics as a system of understanding and in using mathematical techniques (such as statistics) to carry out specific kinds of research. For example, already in the nineteenth century, the neogrammarians developed their theory of sound change on the basis of lists of frequently used cognates. From their databases they extracted principles, or laws as they called them, of phonological change. Although they did not explicitly use statistical analysis (which was in its infancy anyhow in their era), it was implied in their modus operandi; that is, they developed their theories not through speculation, but by examining data in order to conduct analyses and develop theories from them.
The common ground for interdisciplinary research in linguistics and mathematics can be subdivided into several main areas, implied by work that has been conducted (and continues to be conducted) in both disciplines:
1. the study of language and mathematics as formal systems based on logical analysis and logical symbolism;
2. the computer modeling of language and mathematics;
3. the use of computer algorithms for testing theories of language and of mathematics;
4. the use of statistical techniques and probability theory to understand the internal structural mechanisms of both systems;
5. the investigation of hidden properties, such as the fact that both language and mathematics tend to evolve towards maximum efficiency and economy of form;
6. the comparative study of neuro-cognitive processes involved in both language and mathematics;
7. examining the hypothesis that metaphor is at the source of both systems and what this entails for both disciplines;
8. providing an overall synopsis of the properties that unite language and mathematics into a single faculty with different functions or, on the other hand, explaining why the two might form separate faculties, as some contrary research evidence suggests.
The study of (1) makes up the theme of chapter 2; the various concepts implicit in (2) and (3) will be examined in chapter 3; chapter 4 will then look at the issues connected with (4) and (5); and chapter 5 will discuss the research connected with (6), (7), and (8) that links (or differentiates) language and mathematics. Some of the themes will also be found in an overlapping manner in various chapters. This is inevitable, given the interrelationships among them. In the remainder of this chapter, an overview of how these themes and topics form, historically and actually, a common research ground for the two disciplines will be given by way of preliminary discussion. There are of course many other aspects of research that linguists and mathematicians share, but the selection made here is meant, first and foremost, to be illustrative of how interdisciplinary collaborations work in these two fields and, second, to examine domains where collaboration between linguists and mathematicians has been both explicit and implicit, since at least the 1960s. As mentioned, the basic critical thrust is hermeneutic, that is, interpretive of the structures and concepts that make up the common ground.

1.1 Logic
An obvious area of connectivity between mathematics and linguistics is in the domain of the philosophy of both language and mathematics and its traditional focus on logic as the basis of mathematical activities, such as proof, and as the basis of language grammars. The approach based on equating logic, mathematics, and grammar is, as is well known, called formalism. Simply defined, formalism is an analytical (hermeneutic) method that attempts to describe the formal (structural) aspects of language and mathematics by using ideas and methods derived from logical analysis. The basic intent is to provide a set of principles and rules that are considered to constitute the underlying competencies that allow people to comprehend and produce linguistic and mathematical artifacts (words, sentences, numbers, equations, and so on). But formalist analysis is not solely descriptive; it is also theoretical, seeking to explain how the artifacts come into being in the first place and what they reveal about the mind and, by extension, human nature. In some cases, this is an explicitly stated goal; in others it is an unstated, implicit one.
Formalism is grounded in models of logic, a fact that goes back to antiquity. The notion of grammar itself is a de facto logic-based one, understood as a set of ordered rules that allow speakers of a language to produce its phrases, sentences, and texts ad infinitum, much like we are able to construct numbers with a few rules for digit combination. Even a perfunctory consideration of how sentences are constructed suggests that the rules of grammar have many affinities with the rules of arithmetic; but they also show differences. For example, addition in arithmetic is both commutative and associative; that is, the order in which terms are added does not matter (n + m = m + n), nor does the way they are grouped ((n + m) + p = n + (m + p)). Some languages are commutative; others are not. Latin is largely commutative, because its grammar is inflectional: grammatical relations are marked by case endings. A sentence such as Puer amat puellam (The boy loves the girl) can be put together with its constituent words in any permutation, since the meaning of the sentence is determined on the basis of the case structure of the words, not their placement: puer is in the nominative case and is thus the subject of the sentence no matter where it occurs in the sentence; puellam is in the accusative case and is thus the object of the sentence no matter where it occurs in it. Word order in Latin was more reflective of social emphases than of syntax and was, therefore, mainly a feature of style or emphasis. If, for example, the object was to be emphasized, then the sentence was constructed as: Puellam puer amat. English, on the other hand, is largely non-commutative: The boy loves the girl has a different meaning from The girl loves the boy and, of course, jumbling the words in the sentence produces a nonsense string. This is why a language such as English is sometimes called a digital language, because, like the binary and decimal systems in numeration, symbol placement has valeur, as Saussure (1916) called it; that is, it assumes a value in a specific structural slot or in a particular structural set of relations among symbols. Grammar and arithmetic, therefore, evidently constitute a common ground for the study of the general formal properties (or rules) that underlie the organization of their constituent symbols and forms. The reason is that both are (purportedly) formal logical systems. There are five main principles that sustain formalism:


1. Reason is the mental process that undergirds the formation of a system such as language or mathematics.
2. Every system is grounded on rules of formation that can be specified formally.
3. The systematic use of the rules and their constituent symbols determines whether logical validity is inherent in a system or not.
4. The concatenation of symbols and rules (called the syntax) is the essence of the system's grammar.
5. By examining logical systems for completeness and decidability, it can be determined whether the systems are consistent or not.
Sets of principles like these are classified under the rubric of the logical calculus. The term is defined broadly as a set of symbols, axioms, and rules of formation guided by logical sequence, entailment, and inference, which are, in turn, the basis for activities such as mathematical proofs, syllogisms, and language syntax, among others. The logical calculus is the cornerstone of any formal system: Euclidean geometry, argumentation, the organization of knowledge in dictionaries and encyclopedias, and so on.
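By way of illustration only (the axioms, the arrow notation, and the Python rendering are my own assumptions, not the book's), a logical calculus in miniature reduces to a set of formulas and a single rule of inference applied until nothing new follows:

# A toy logical calculus: symbols, axioms, and one rule of inference
# (modus ponens: from A and "A -> B", conclude B), applied exhaustively.

axioms = {"P", "P -> Q", "Q -> R"}

def derive(formulas):
    theorems = set(formulas)
    changed = True
    while changed:
        changed = False
        for f in list(theorems):
            if " -> " in f:
                antecedent, consequent = f.split(" -> ", 1)
                if antecedent in theorems and consequent not in theorems:
                    theorems.add(consequent)
                    changed = True
    return theorems

print(sorted(derive(axioms)))  # ['P', 'P -> Q', 'Q', 'Q -> R', 'R']

The derived set plays the role of the theorems of the system: everything that follows from the axioms by the rule, and nothing else.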

1.1.1 Formalism in linguistics and mathematics


In the set of principles undergirding the logical calculus, the one that specifies
the use of rules (principle 4 above) is of primary importance to the study of formalism in mathematics and grammar. A rule is a statement about a pattern that
operates within a particular system, describing or prescribing what is possible or
allowable within the system. Grammar is, essentially, a collection of rules that are
used in the construction of linguistic forms such as sentences; arithmetic, too, is a
collection of rules used to carry out the arithmetical operations. The rules for composing sentences are formal statements of how words and phrases are organized
to produce the sentences; the rules for composing arithmetical structures are formal statements of how numbers can be added, subtracted, and so on. Overall,
rules are the basis of any logical calculus, and they are thus seen by formalists as
revealing how any logical system operates.
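Purely as a hypothetical sketch (the particular grammar, symbols, and Python code are mine, not a formalism proposed in the book), the idea of grammar as a collection of formation rules can be rendered as a handful of rewrite rules that expand a start symbol into a sentence:

# Grammar as rules of formation: each rule rewrites a symbol until only words remain.
import random

rules = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N":  [["boy"], ["girl"], ["pizza"]],
    "V":  [["eats"], ["loves"]],
}

def expand(symbol):
    if symbol not in rules:                   # a terminal word
        return [symbol]
    expansion = random.choice(rules[symbol])  # choose one expansion licensed by the rule
    words = []
    for s in expansion:
        words.extend(expand(s))
    return words

print(" ".join(expand("S")))                  # e.g. "the boy eats the pizza"

The point of the sketch is only that a small, explicit rule set suffices to generate well-formed strings, in the same spirit in which a few rules for digit combination suffice to generate the numerals.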
The study of grammar began with both Pāṇini in India, who is thought to have lived around the fourth century BCE, and at about the same time with the ancient Greek philosophers. By the Middle Ages, European scholars began to speculate about how languages might be compared in terms of their grammatical properties. Subsequently, in the eighteenth century, the German philosopher and mathematician Gottfried Wilhelm Leibniz proposed that the languages of Europe, Asia, and Egypt sprang from the same original Indo-European language (Robins 1990). Leibniz espoused the idea that all languages were based on universal properties of logic. This is why they had the same basic kind of rules for making sentences, revealing that all humans possessed the same innate faculty of logic. Shortly after Leibniz, in the nineteenth century, the formal study of grammars emerged alongside the study of linguistic change. In the early twentieth century, anthropological linguists such as Franz Boas (1940) challenged this universal logic approach to the study of grammar, especially since his research showed that there was much more to language than a set of rules and rule-making principles for the construction of sentences. Boas saw the study of different grammars as a means to understand how every language served the specific needs of its inventors and users. Grammars are inventions of particular peoples adapting to their particular environments. Danish linguist Otto Jespersen (1922), on the other hand, revived the notion of universal properties in the world's languages, leading eventually to the rise of the generative movement in the late 1950s.
Arguably, the raison d'être for the formal study of rule systems and their properties in mathematics and language is the belief that knowledge systems can be decomposed ultimately into irreducible units that, when combined, show constituency and coherence of structure. Knowledge cannot be random; it must be organized in order for it to be useful and usable. Rules are really attempts to
characterize the organization of systems. The premise is, therefore, that within
a system, separate and seemingly disparate forms such as words and numbers
will take on coherence and validity only if they are organized by rules that are,
themselves, derivatives of a general class of rules of logic that make up human
reason. This paradigm has allowed linguists and mathematicians to provide relevant organizational frameworks and to postulate increasingly abstract properties
about them. In linguistics that postulation has led to theories of grammar, such
as the generative one; in mathematics it has led to theories of proof, numbers,
and the amalgamation of subsystems such as geometry and arithmetic (analytic
geometry). Rules are not prescriptions; they are formal statements about what is
possible or allowable within each system.
A perfect example of what formal rules of grammar actually are is found in Pāṇini's grammar of Sanskrit, for which he identified 4,000 sutras (rules) in his treatise titled Ashtadhyayi. His sutras are the earliest extant example of formal grammatical analysis on record. It is no coincidence that Pāṇini was also considered to be a great mathematician in India. The sutras are very much like mathematical rules, showing how Sanskrit words, phrases, and sentences are interlinked sequentially and through entailment (Kadvany 2007), two basic features of the logical calculus. He also introduced the notion of mapping, prefiguring current theoretical models such as those involving metaphor, whereby one set of rules is mapped onto other domains (including other sets) to produce a complete and coherent grammar (Prince and Smolensky 2004).
An example of Pāṇini's method can be seen in the first two sutras:
1. vṛddhir ādaiC
2. adeṄ guṇaḥ
The capital letters are symbols for phonemic units or other phonological structures; the other parts of the sutras describe morphological structure and how it relates to both the phonological constituents and syntactic forms in general. These are truly remarkable, showing how the main components of a grammar (the phonological, morphological, and syntactic) are interrelated, prefiguring modern-day grammars. The goal of a formal grammar, as will be argued more extensively in the next chapter, has always been to show how these components interact through a sequence of rules of different types, via entailment and mapping. This was Chomsky's explicitly stated goal in 1957. But this formalist mindset has found resonance in other models of language. For example, in tagmemics (Pike 1954, Cook 1969), the basic unit of analysis, called the tagmeme, is akin to a sutra in that it shows how grammatical classes (such as subject and object) are connected to paradigmatic, or slot-based, fillers (nouns, verbs, adjectives, and so on). The hierarchical organization of levels (from phonology to discourse) is composed of tagmemes that are combined into more complex units, called syntagmemes. And like UG theory, stratificational grammar (Lamb 1999) sees rule types as mirroring neural processes. The separate strata of language are assumed to reflect the organization of neural wiring in the brain, which consists of strands connected to each other as in electric circuitry.
Pāṇini's pioneering work on grammar influenced mathematical theories in ancient India, constituting perhaps the first ever awareness of a connection between language and mathematics. Indian mathematicians started representing numbers with words, ultimately developing numerical axioms linked to each other in the same way that sutras in language are interrelated at various levels. At about the same time in Greece, Aristotle took a comparable interest in formalizing grammar, identifying the main parts of a sentence as the subject and the predicate, a structural dichotomy that is still a fundamental part of grammatical analysis to this day (Bäck 2000). Aristotle inspired others to study grammar with the tools of formal logic, rather than impressionistically. He was, of course, aware of the difference between the literal and rhetorical uses of the units of language, writing two masterful treatises on this topic (Aristotle 1952a, 1952b). But, for Aristotle, rhetorical language, such as that manifesting itself in poetry, fell outside the perimeter of grammar proper, and was thus to be considered an extension of, or exception to, literal language. One can study rhetorical language on its own, as a self-contained system. Its overall function was aesthetic and thus fell outside of strict formal grammatical analysis.
Ironically, it was Aristotle who coined the term metaphor, as is well known. For Aristotle it was a very useful trope that allows us to refer to something that we grasp intuitively, but which seems to defy a straightforward literal explanation or concrete demonstration. Unlike visible things, such as animals, objects, and plants, something like an idea cannot be shown for someone to see with the eyes. However, by comparing it to something familiar in an imaginary way, we can grasp it much more easily. Aristotle saw metaphor as a heuristic tool for understanding things that cannot be demonstrated concretely. The tool itself was based on what he called proportional reasoning. For example, in the metaphor Old age is the evening of life, a proportion can be set up as follows:
A = old age, B = life, C = evening, D = day
Therefore: A is to B as C is to D
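Written as a ratio (my own rendering in standard notation, not Aristotle's or the book's), the proportion reads:

\[
\frac{\text{old age}}{\text{life}} \;=\; \frac{\text{evening}}{\text{day}},
\qquad\text{that is,}\quad A : B :: C : D.
\]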

The reasoning thus contains a hidden logic: the period of old age is to life as the evening is to the day. Now, as knowledge-productive as it was, the most common function of metaphor in human life was, according to Aristotle, to spruce up more basic literal ways of speaking and thinking using the logic of proportionality (Aristotle 1952a: 34). Aristotle's view of rhetorical language remained a dominant one for many centuries until, virtually, the present era, when the work on metaphor within cognitive linguistics is telling a completely different story. One source for the exclusion of metaphor from serious consideration in western philosophy and science was the views of rationalist philosophers such as Descartes, Leibniz, and Locke. Locke (1690: 34) even went so far as to characterize metaphor as a fault:
If we would speak of things as they are, we must allow that all the art of rhetoric, besides
order and clearness, all the artificial and figurative application of words eloquence hath
invented, are for nothing else but to insinuate wrong ideas, move the passions, and thereby
mislead the judgment; and so indeed are perfect cheats: and therefore, however laudable or
allowable oratory may render them in harangues and popular addresses, they are certainly,
in all discourses that pretend to inform or instruct, wholly to be avoided; and where truth
and knowledge are concerned, cannot but be thought a great fault, either of language or
person that makes use of them.

Hobbes (1656) also inveighed fiercely against metaphor, characterizing it as an obstacle to communication and thought, a source of ambiguity and obscurity, and thus a feature of language to be eliminated from true philosophical and scientific inquiry. He came to this view because he believed, as briefly mentioned, that the laws of arithmetic mirrored the laws of human thought, and thus that the only meaningful form of philosophical inquiry was of the same literal-logical kind as the one used to explicate mathematical notions.
An indirect reason for the neglect of metaphor may be Aristotle's own explanation of metaphor as a proportion (above). This made it a subtype of logical reasoning and thus required no special attention as a unique phenomenon. This in no way implies that it is an irrelevant theory. On the contrary, as Umberto Eco (1984: 88) has aptly pointed out, despite the thousands and thousands of pages written about metaphor since Aristotle formulated his theory, no single explanation has ever really eclipsed it. But the Aristotelian view does not explain the impulse to construct and use metaphors in the first place. The influential Roman rhetorician Quintilian subsequently claimed that metaphor revealed nothing more than a substitutive strategy for literal language. Thus, in an expression such as Julius Caesar is a lion, Quintilian maintained that we simply substitute the term lion for its literal counterpart, a courageous man, so as to make it more memorable or effective. But, like Aristotle's proportion theory, Quintilian's substitution theory tells us nothing about the psychological motivation for the substitution in the first place. If metaphor were merely an embellishment of literal speech, then it would appear only on special occasions. But it does not, and is found throughout conversations of all kinds. Such views are clearly based on the belief that literal meaning is the default form of language. However, as will be discussed subsequently, metaphor became the proverbial fly in the ointment for literalist theories of language in the 1970s, when empirical research showed that it was a pervasive form of speech (Pollio, Barlow, Fine, and Pollio 1977).
In Aristotle, one can see the beginnings of the formalist hypothesis (as it has been called here). The first to write a formal grammar of the Greek language, based on the Aristotelian perspective, was the scholar Dionysius Thrax, who lived between 170 and 90 BCE. Thrax developed a taxonomy of the parts of speech and a set of rules for relating them to each other in the formation of sentences. He identified nouns, verbs, articles, pronouns, prepositions, conjunctions, adverbs, and participles as the main parts. Thrax's subsequent work, called the Tekhne grammatike (Kemp 1986), begins with a definition of grammar and a description of accents, punctuation marks, sounds, and syllables.
A similar approach was adopted by the Roman grammarian Priscian, who lived in the sixth century CE (Luhtala 2005). Priscian's grammar served as a general model for medieval and Renaissance scholars and educators to teach Latin and Greek in school. The translation of the works of the Greek philosophers in the late medieval period led gradually to a new awareness of ideas that fell outside of theology. The result was, first, the movement known as Scholasticism, whose representatives claimed that the study of grammar and logic gave us a better understanding of the importance of human reason, and, a little later, the movement known as humanism, which also stressed human reason and imagination above all else. Within this paradigm shift there were some, called nominalists, who argued that it is foolish to think that reason guides understanding because it is based on language. John Duns Scotus and William of Ockham, for instance, stressed that words ended up referring to other words, rather than to actual things, and thus that they were hardly conducive to logical thought. Thomas Aquinas had argued, however, that words did indeed refer to real things in the concrete and to categories of things in the abstract, even if they constituted variable human models of them (Osborne 2014). At about the same time, Roger Bacon developed one of the first comprehensive typologies of linguistic signs, claiming that, without a firm understanding of the role of logic in the constitution and use of sign systems, discussing whether truth is or is not encoded in them would end up being a trivial matter of subjective opinion (Bacon 2009).
The foregoing historical foray into the origins and rise of formalism is, of
course, a highly reductive one. The point intended has been simply to suggest
that the emergence of the concept of grammar as a set of rules connected logically
to each other is an ancient one, paralleling the Euclidean view that mathematics
is founded on axioms, postulates, theorems, and rules of combination that lead
to proofs. The ancient grammarians and mathematicians thus laid down the
foundations for formalism to arise as a major paradigm in the philosophy of both
mathematics and language. But it was not an arbitrary introspective mode of
inquiry; it was based on observing and classifying the facts, before devising the
relevant rules. This epistemology can be portrayed in the form of the diagram
below, presented here simply as a schematic model summarizing the formalist
hypothesis as it was established in the ancient world. Note that this is not found
in any of the ancient or medieval writings; it is simply a diagrammatic summary
of the foregoing discussion (see Figure 1.2).
Figure 1.2: The formalist mode of inquiry (diagram: linguistic and mathematical facts, such as words, phrases, and sentences, or counting, adding, and taking away, are first sorted into basic classes and operations, which are then formalized into ordered rules of grammar and of the arithmetical operations).
The first explicit study of grammatical rules in their own right, apart from their use in the generation of sentences, can be traced to the seventeenth century and the Port-Royal Circle. In their 1660 Port-Royal Grammar, Antoine Arnauld and Claude Lancelot put forth the notion that complex sentences were made up of smaller constituent sentences that had been combined by a general rule; this was a truly radical idea for the time (Rieux and Rollin 1975), although the concept of mapping found in Pāṇini certainly prefigured this very notion. Clearly, the Port-Royal grammarians were unaware of Pāṇini's work. A sentence such as Almighty God created the visible world not only could be decomposed into smaller constituents (God is almighty, God created the world, The world is visible) but could be described as the end result of a rule that combined the smaller constituents into the complex sentence; it is a sort of meta-rule that combines the sentences produced by lower-level rules. Arnauld and Lancelot then argued that rules of this kind manifested themselves in different languages in specific ways. The details varied, but the rule types did not. As is well known, this general approach was adopted and expanded by Chomsky in modern times, who acknowledged his debt to the Port-Royal grammarians in an explicit way (Chomsky 1966a).
The main premise of the Port-Royal Circle is actually a plausible one; namely, that the rules of language, when assembled, might reveal universal properties of rules, and this is why language as a faculty is not invented over and over, across generations of speakers, because everyone possesses those properties as part of being human. This view paralleled the debate in mathematics that surfaced at around the same time, known as the Platonic-versus-constructivist one, which can be encapsulated by questions such as the following: Do we discover mathematics or do we invent it and then discover that it works? Was √2 out there in some absolute sense, ready to be discovered when the Pythagoreans did so by examining various sizes of right triangles, or did they produce it inadvertently through a manipulation of the Pythagorean theorem? Plato believed that mathematical ideas pre-existed in the world and that we come across them, or perhaps extract them, from the world through logical reasoning. Just as the sculptor takes a clump of marble and gives it the form of a human body, so too mathematicians take a clump of reality and give it logical form. In both representations we discover things about the mind's faculties. The truth is already in the clump; but it takes the skills of the trained mathematician to discover it. Many now find this perspective difficult to accept, leaning towards constructivism, or the idea that mathematical objects are constructed, telling us what we want to know about the world, or what we need to know from it, rather than what is there in any absolute sense. But, as Berlinski (2013: 13) suggests, the Platonic view is not so easily dismissible even today:
If the Platonic forms are difficult to accept, they are impossible to avoid. There is no escaping them. Mathematicians often draw a distinction between concrete and abstract models
of Euclidean geometry. In the abstract models of Euclidean geometry, shapes enjoy a pure
Platonic existence. The concrete models are in the physical world.

Moreover, there might be a neurological basis to the Platonic view. As neuroscientist Pierre Changeux (2013: 13) muses, Plato's trinity of the Good (the aspects of
reality that serve human needs), the True (what reality is), and the Beautiful (the
aspects of reality that we see as pleasing) is actually consistent with notions being
explored in modern-day neuroscience:
So, we shall take a neurobiological approach to our discussion of the three universal questions of the natural world, as defined by Plato and by Socrates through him in his Dialogues.
He saw the Good, the True, and the Beautiful as independent, celestial essences of Ideas,
but so intertwined as to be inseparable within the characteristic features of the human
brain's neuronal organization.

However, there is a conundrum that surfaces with Plato's view. Essentially, it implies that we never should find faults within our formal systems of knowledge,
such as exceptions to rules of grammar and arithmetic, for then it would mean that
the logical brain is faulty. As it turns out, this is what Gödel's (1931) theorem (or,
more correctly, theorems) revealed. However, if mathematics is faulty because we
are faulty, why does it lead to demonstrable discoveries, both within and outside
of itself? René Thom (1975, 2010) referred to discoveries in mathematics as catastrophes in the sense of events that subvert or overturn existing knowledge (rule
systems). Thom named the process of discovery semiogenesis, which he defined
as the emergence of pregnant forms within symbol (rule) systems themselves.
These develop in the human imagination through contemplation and manipulation of the forms. As this goes on, every once in a while, a catastrophe occurs that
leads to new insights, disrupting the previous system. Now, while this provides a
plausible description of what happens (discovery is indeed catastrophic), it does
not tell us why the brain produces catastrophes in the first place. Perhaps the connection between the brain, the body, and the world will always remain a mystery,
since the brain cannot really know itself.
Actually, the dichotomy between logic and constructivism, or in more contemporary terms, formalism and blending, is an artificial one, with those on either
side staking their territories in an unnecessarily adversarial way. Both viewpoints
have some validity and both need to be compared and contrasted in order to get
a more comprehensive understanding of the mental forces at work in producing

both mathematics and language. This is a theme that will be interspersed throughout this book. In my view, there is no one way to explain mathematics or language;
there are likely to be many ways to do so, no matter how faulty or partial these
are. There will never be a general theory of anything, just pieces of the theory
that can be combined and recombined in various ways according to situation and
needs.
The first counter-argument to the Port-Royal paradigm was put forward by
Wilhelm von Humboldt (1836), who maintained that languages may have similar
rule types in the construction of their grammars, but the rules only touched the
surface of what the faculty of language was all about. He basically described it as
a powerful tool for carving up the world, fulfilling the specific needs of the people
who used it. Below the surface, the rules of a specific language thus tell a different
story than just the logical selection and combination of forms independently of
how they relate to reality (Plato's Truth). They reflected what Humboldt called an
innere Sprachform (internal speech form), which encodes the particular perspectives of the people who speak the language. He put it as follows (Humboldt 1836
[1988]: 43):
The central fact of language is that speakers can make infinite use of the finite resources
provided by their language. Though the capacity for language is universal, the individuality
of each language is a property of the people who speak it. Every language has its innere
Sprachform, or internal structure, which determines its outer form and which is a reflection
of its speakers' minds. The language and the thought of a people are thus inseparable.

Despite the ideas of Humboldt and Boas (mentioned above), the study of the universal properties of grammars continues to constitute a major trend in current
linguistics. The formalist hypothesis will be discussed in more detail in chapter 2.
The premise behind this hypothesis, as implied by the above model (figure 1.2),
derives from the common-sense observation that when we put words together to
express some thought or to convey some piece of information, the combination is
not random, but rule-based, and this is also why the meaning of a combinatory
structure cannot be computed as the sum of the meaning of its parts. Each word
taken in isolation can, of course, be studied on its own from several perspectives
in terms of the pronunciation patterns it manifests, in terms of the specific meanings it encompasses, and so on. In fact, a large portion of linguistic analysis has
been, and continues to be, devoted to the study of units and forms in isolation.
But the power of language does not lie just in the units taken separately, but in
the ways in which they are combined, that is, in their grammar. Sentences are,
in this view, holistic structures that are governed by rule-making principles that
are used to make up the sentences, much like an architect puts together specific
architectural forms to design a building. This premise is still the one that drives

current formalist research. It can be summarized as a corollary to the formalist
hypothesis, namely the syntax hypothesis, or the view that syntax (the rules of
combination) is the core of language.
The same hypothesis applies to mathematics. All we have to do is change the
relevant terms and we will have the same basic logical template. Again, the claim
is that the power of mathematics does not lie in the units by themselves, but in the
ways in which they are combined to produce equations and proofs. Equations are
equivalent to sentences and proofs to texts (the concatenation of sentences). It is
obvious that the study of formal rules in language and mathematics is an important interdisciplinary area that would benefit both linguists and mathematicians.
It implies that there is a superstructure to both systems that unites them, at the
very least, at the level of rule-making principles.
The problem remains the role of logic in this broad scenario and how it is defined. In the construction of rules and in their applications to formal systems, one
cannot underestimate the role of inference by analogy, which involves figuring out
why something is the way it is on the basis of experience and by detecting a resemblance among things. The power of analogy in mathematics has been discussed
extensively (Hofstadter 1979, Hofstadter and Sander 2013). Einstein himself understood this to be a law of human thought when he resorted to analogies both to
present his theory of relativity and to explore its profound implications. The importance of analogies was known to Plato, Aristotle, Descartes, and many other
philosophers who espoused the formalist hypothesis. But they saw it only as an
adjunct to formalist analysis. Prominent among the philosophers who, instead,
saw analogy as a basic force in how logic itself is constructed was the Italian
philosopher Giambattista Vico (Bergin and Fisch 1984), who located the source
of analogies in the imagination. Vico warned that an emphasis on rational logic,
apart from imaginative thought, was ultimately counterproductive. Natural discovery and new understanding, he emphasized, followed a train of thought that
started from imaginative (analogical) modes of thinking, progressing only gradually, and with significant effort, to rational modes. The debate on the role of imagination in human thought goes back to Plato, who separated the image (eikon)
from the idea (eidos). This set in motion the tendency to view rational thought
(eidos) as radically divergent from mental imagery (eikon), not as intrinsically intertwined with it. Descartes reinforced this separation by claiming that mental
images proceed without logic, and so cannot be associated in any way with the
latter. The Cartesian view ignores, of course, the Renaissance tradition of ingenium and the fact that even Plato used myths and insightful linguistic imagery to
describe his views of ideas and forms. Paradoxically, as Verene (1981) has pointed
out, Descartes' own style of presentation unfolds in the form of highly suggestive
and creative imagery. What Plato, Descartes, and all philosophers fixated on the

idea forgot, according to Vico, was that imagination (eikon) is essential to thought.
These philosophers pay lip service to it, but ultimately end up privileging rational
logic as the main form of mentality deployed in mathematics and grammar.

1.1.2 Syntax
The syntax hypothesis (as it is called here) was articulated explicitly for the first
time in 1957, when Chomsky argued that an understanding of language as a universal faculty of mind could never be developed from a piecemeal analysis of the
disparate structures of widely-divergent languages taken in isolation, which, he
suggested, was the approach taken by American structuralists such as Bloomfield (1930). The units of different languages (the phonemes and morphemes) are
certainly interesting in themselves, but they tell us nothing about how they are organized to produce larger structures, such as sentences. He claimed, moreover,
as did the Port-Royal grammarians, that a true theory of language would have to
explain why all languages seem to reveal a similar structural plan for constructing
their sentences. He proposed to do exactly that by shifting the focus in structural
linguistics away from the making of inventories of isolated piecemeal facts to a
study of the rule-making principles that went into the construction of sentences.
He started by differentiating between the deep structure of language, as a
level of organization which could be characterized with a small set of rules that
were likely to be found in all languages, no matter how seemingly different they
appeared, and the surface structure where sentences are well-formed and interpreted in rule-based ways. The relation of the surface to the deep structure was
established by a set of rules, called transformational, that mapped deep structure
strings onto surface ones. So, in this rather simple, yet elegant, model, all languages share the same set of deep structure rules but differ in the type and/or
application of transformational rules. Although this version of generative grammar has changed radically (at least according to the generativists themselves), it is
still the basic outline of how rules in generative grammar function: they generate
basic strings of units and then transform them in more complex ways.
The essence of Chomsky's initial approach can be seen in the analysis he himself put forward of the following two sentences:
1. John is eager to please.
2. John is easy to please.

Both these sentences, Chomsky observed, would seem to be built from the same
structural plan on the surface, each consisting of a proper noun followed by a
copula verb and a predicate complement:
Structural Plan

Proper Noun        Copula Verb        Predicate Complement
John               is                 eager to please
John               is                 easy to please

Figure 1.3: Chomskyan analysis of surface structure

Despite the same surface structure, the sentences mean very different things:
(1) can be paraphrased as "John is eager to please someone" and (2) as "It is easy
for someone to please John." Chomsky thus concluded that the two sentences had
different deep structures, specified by phrase structure rules; these merge into
one surface structure as the result of the operation of transformational rules.
This is brought about by rules that: delete "someone" in (1); delete "It" and "for someone" and move "John" to the front in (2). Although this is a simplified explanation
of Chomsky's example, it still captures the essence of his method and overall
blueprint for grammar.
Chomsky's approach was radical for the times, providing arguably the first
formal theory of how sentences are related to each other and what kinds of rules
inform the grammar of any language. The two main types, as we saw, are phrase
structure and transformational, and the latter operate schematically as follows:
John is eager to please someone  →  (delete someone)  →  John is eager to please
It is easy for someone to please John  →  (delete It and for someone; move John to the front)  →  John is easy to please

Figure 1.4: Transformational rules

Chomsky then suggested that, as linguists studied the deep structures of different
languages, and how transformational rules mapped these onto surface structures
differentially, they would eventually be able to conflate the rules of different languages into one universal set of rule-making principles: the syntax hypothesis.
Chomsky's proposal became immediately attractive to many linguists, changing
the orientation and methodology of linguistics for a while. Above all else, the syntax hypothesis seemed to open the research doors to investigating the age-old
belief that the rules of grammar corresponded to universal innate logical ideas
(Plato, Descartes). Moreover, it was a very clear and simple proposal for linguists
to pursue.
But problems with the syntax hypothesis were obvious from the outset. It was
pointed out, for instance, that abstract rule-making principles did not explain the
semantic richness of even the most simple sentences. This critique put the very
notion of a deep structure embedded in phrase structure rules seriously in doubt.
Moreover, it was suggested that the universal rules inferred by linguists by comparing the deep structures of different languages rested solely on the assumption
that certain rules were more basic than others. As it has turned out, it was the
structure of the positive, declarative sentence of the English language that was
seen as the default sentence type that best mirrored the deep structure of the
UG. Although this assumption has changed over the years, it is correct to say that
the basic plan of attack in generative grammar has not. The search for universal rules and language-specific adaptations of these rules (known as parameters)
continues to guide the overall research agenda of generative linguistics to this day
and, by extension, of any formal approach based on the syntax hypothesis.
Chomsky proclaimed that the primary task of the linguist was to describe
the native speaker's ideal knowledge of a language, which he called an unconscious linguistic competence, basically substituting this term for Saussure's term
of langue. From birth, we have a sense of how language works and how its bits
and pieces are combined to form complex structures (such as sentences). And
this, he suggested, was evidence that we are born with a unique faculty for language, which he later called an organ, that allows us to acquire the language
to which we are exposed in context effortlessly. Language is an innate capacity.
No one needs to teach it to us; we acquire it by simply listening to samples of
it in childhood, letting the brain put them together into the specic grammar on
which the samples are based. It is as much an imprint as is our reflex system. Given
the status that the syntax hypothesis had attained in the 1960s and most of the
1970s, many linguists started researching the syntax hypothesis across languages
and investigating the details of grammatical design. By the 1980s, however, the
utility of this line of inquiry started to be seen with less enthusiasm, and a surge
of interest in investigating how languages varied in structure according to social

variables and different cultural contexts became increasingly a mainstream paradigm within linguistics. Ironically, this counter-response to generative grammar
(in its most rigid versions) may have been brought about in large part by the fact
that generativism had produced an overload of theories, making it somewhat unmanageable and unwieldy as a formal approach to language, which requires a
unified theoretical framework.
But generative grammar did bring about one very important change in the
mindset of many linguists: it associated mathematics with language. Generative
grammar was, in fact, called "mathematical" by many for the reason that it used
notions from mathematics, such as Markov chains, commutation, tree structure,
transformation, and the like. The main premise of the syntax hypothesis is that
when units are combined into larger complex structures they produce new and
emergent forms of meaning. In various domains of science and mathematics,
emergence of form is seen as arising through interactions and relations among
smaller and simpler units that themselves may not exhibit the properties of the
larger entities. The syntax hypothesis is a version of this view (Hopper 1998),
driving a large portion of research in formalist and computational linguistics, as
we shall see in the next two chapters.
The counter-movement to generativism has come to be called functionalism. Its basic tenet is that grammar is not hard-wired in the brain, but rather
that it varies according to the functions that a language allows speakers to carry
out. From this paradigm, several research trends have emerged, such as systemic
grammar and cognitive linguistics. The main claim of functionalists, who parallel
in outlook the constructivists in mathematics, is that grammar is connected to
the innere Sprachform (to recall von Humboldt's term). As discussed briefly, Franz
Boas had espoused a very similar perspective before the generative movement.
Collecting data on the Kwakiutl, a native society on the northwestern coast of
North America, he explored how the grammar and vocabulary of that language
served specific social needs. They were the result, in other words, of the particular experiences of the Kwakiutl. In response to functionalism, the generativists
claimed that they were not against the study of socially-diverse forms of language,
but that, like Saussure's (1916) distinction between langue and parole, these were
best approached via branches such as sociolinguistics and linguistic anthropology. Moreover, these were really matters of detail. A language such as Kwakiutl
was still based on the same grammatical blueprint of any other language in the
world. Linguistic competence is an autonomous faculty that should be studied as
such, much like the axiomatic structure of arithmetic, which can be studied apart
from its practical manifestations.
For the sake of historical accuracy, it should be mentioned that the concept of
phrase structure came out of early structuralism. Leonard Bloomfield (1933), for

example, emphasized the need to study the formal properties of sentences and
phrases, which he called immediate constituent (IC) analysis. In IC analysis sentences are divided into successive constituents until each one consists of only a
word or morpheme. In the sentence "The mischievous boy left home," the first
subdivision of immediate constituents would be between "The mischievous boy" and
"left home." Then the internal immediate constituents of the first are segmented as
"the" and "mischievous boy," and then "mischievous boy" is further divided into "mischievous" and "boy." The constituent "left home," finally, is analyzed as the combination of "left" and "home." Chomsky took his cue from IC analysis, adding the mathematical notion of transformation to it, as he himself acknowledged (Chomsky
1957).
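The successive divisions just described can also be pictured as a nested bracketing. The following is a minimal Python sketch; the tuple encoding and the small printing routine are illustrative, not Bloomfield's own notation.

    # Nested constituents of "The mischievous boy left home", following the
    # successive IC divisions described above.
    ic_tree = (
        ("The", ("mischievous", "boy")),  # "The" + "mischievous boy"
        ("left", "home"),                 # "left" + "home"
    )

    def show(constituent, depth=0):
        # Print each constituent, indented by its depth of division.
        if isinstance(constituent, str):
            print("  " * depth + constituent)
        else:
            print("  " * depth + "[constituent]")
            for part in constituent:
                show(part, depth + 1)

    show(ic_tree)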
Various extensions, modifications, and elaborations of generative grammar
have been put forward since 1957. There is no need to discuss them in detail here.
Suffice it to say that the three main ones are the following:
1. Transformational-generative grammar (TG grammar), which is based on
Chomsky's original model of 1957 that he modified in 1965, becoming at
the time the so-called standard theory. It still has many adherents who
see it as a straightforward approach to the syntax hypothesis. As will be
discussed in the next chapter, TG grammar includes phrase structure rules,
transformational rules, and lexical insertion rules. The latter are rules that
insert lexemes into the slots in the strings generated by the syntactic rules.
In 1965, Chomsky put forward a detailed account of how these rules worked,
including projection rules and subcategorization rules. In my own view, TG
grammar is still the most elegant and viable formalist theory of language,
even though many would claim that this is a naïve view. Admittedly, only
experts in formalist linguistics can truly discuss the significant departures
from the early TG theory, but to linguists who do not follow the formalist
hypothesis, it is my sense that TG theory is still the most attractive one.
2. Government and Binding (GB) theory, which is an elaboration of TG, developed by Chomsky himself in the late 1970s and 1980s where he introduced
the concept of modularity, whereby modules (basic and complex) are related
to each other through rules, rather than as being considered part of a dichotomy of deep and surface structure forms. In some versions of GB theory,
the surface-structure is actually seen as unnecessary. GB theorists have also
added stylistic rules and meaning-changing rules to the basic generativist
framework, in order to address various critiques that emerged with regard
to the articial separation of syntax from semantics in the TG model. The
concepts of deep and surface structure are thus greatly modified (now called
d-structure and s-structure) and considered to be linked by movement rules.

3. Minimalist Program (MP) is an extension and modification of GB starting
in the 1990s. The formal theory of UG finds its most extensive articulation
within the MP paradigm. In the rule system of MP, there are specific kinds
of independently-operating morphological rules, in response to the fact that
agglutinative languages provide counter-evidence to the original syntax hypothesis (as discussed above). Within MP theory sentences are generated by
optimally efficient derivations that must satisfy the conditions that hold on
various levels of linguistic representation.

For formalists, the key to understanding complexity of structure is to be found
in the rule-based relations that the smaller units have vis-à-vis the more complex
ones. While formalism is certainly pivotal in Natural Language Processing technologies and theory, as we shall see, it may in the end tell us very little about
the relation between meaning and complexity, as Lakoff and others have pointed
out, and to which we shall return subsequently. Within formalist research, moreover, various splinter factions have arisen over the years, as can be expected in
any paradigm. The so-called modular system, an offshoot of GB, posits more than
one transformational component, arguing, in addition, that various generative
modules (rule packages), which independently characterize syntax, semantics,
and other subsystems, are needed to describe language, including an interface
system that connects them all (Saddock 2012). But the modular approach has not
really had a lasting impact on formalist linguistics, having simply put forward a
more complex (albeit interesting) rule apparatus for describing syntactic systems
in relation to other systems. As such, it has produced few new insights other than
how rule systems can be manipulated creatively.
Another opposing framework is known, generally, as the biolinguistic one,
originating mainly in the work of Derek Bickerton (for Bickerton's recent views
see his 2014 book, More than nature needs). His main claim is that language is
a displacement of animal communication; that is, it is an advanced exaptation
of animal communication that is not bound by the stimulus-response constraints
to which the latter is tied. Language, therefore, starts with the equivalent of animal signals (words), which it displaces and then puts into relation to each other
through syntactic rules. Although this perspective attempts to distance itself from
the traditional syntax hypothesis, it really does not do so. Bickerton claims that
there is an "engine" in the brain that puts words together into sentences. This
sounds suspiciously like the language organ that Chomsky talks about. The goal
of generative grammar is to come up with a complete model of the rules that make
up this organ, which can then be used to describe all human languages, allowing
us to determine if they are human (versus animal or artificial). Bickerton's proposal seems to be no more than an extension of this mindset.

There are various other kinds of theoretical frameworks that subscribe to the
syntax hypothesis. These need not be discussed here in any detail, since they
have only a handful of adherents. They simply merit mention: Arc pair grammar, Dependency grammar, Lexical functional grammar, Optimality theory, Stochastic
grammar, and Categorial grammar. The central feature of all is the belief that
there are two sets of rules: one for making up basic structures and one for mapping these onto more complex ones.

1.1.3 Formal analysis


The underlying objective of all formalism is, to summarize, unraveling the rules
that make a system, such as mathematics or language, operate with completeness
and consistency (Ganesalingam and Herbelot 2006). For the sake of historical accuracy, it should be mentioned that there were precursors to this movement in
linguistics already in the nineteenth century, as Jakobson (1961: 2) pointed out at
a ground-breaking symposium hosted by the American Mathematical Society:
Baudouin de Courtenay attempted to utilize in the study of language some of the basic
notions of contemporaneous mathematics, and in his historical survey of linguistics, published in 1909, he expressed his conviction that this scholarship would become ever closer to
the exact sciences. Upon the model of mathematics it would on the one hand, deploy ever
more quantitative thought and on the other, develop new methods of deductive thought.
In particular, just as mathematics converts all the infinities to denumerable sets amenable
to analytic thought, Baudouin expected somewhat similar results for linguistics from improved qualitative analysis.

Jakobson then went on to note that the mathematician Émile Borel, just before the
Fourth International Congress of Mathematicians in 1909, attributed the paradoxical nature of denumerable infinities in math theory to the influences of language
used to explain it. From this clever remark, a widely-held suspicion that language
and mathematics were intrinsically intertwined dawned upon many. As Bloomfield (1933: 512) succinctly put it a few years later: "mathematics is merely the best
that language can do." Therefore, Jakobson (1961: 21) concluded, the connectivity
between the two systems must be of primary interest for mathematicians and
linguists alike.
Formalist approaches are very useful in describing structure, and especially
how rules interact to produce complexity of structure. But in order for the rules
to work unhampered, meaning must be discarded from their formal architecture,
or else meaning must be treated as either a separate phenomenon or as an appendage to the rules of syntax.

Meaning has always been a thorn in the side of formalism, since it is almost
impossible to divorce it from formal structures, even if pure symbolic systems are
used for constructing rules. Cognizant of the role of linguistic meaning in mathematics, in 1980 the Association of Teachers of Mathematics published a handbook
showing how deeply interconnected mathematics is with the linguistic meanings
we ascribe to it. Since then, math teachers and their professional associations
have become increasingly interested in this interconnection, aiming to use any
relevant insight in order to improve pedagogy. The study of how mathematics is
learned indicates that there is, in fact, more to it than just acquiring formalisms
and learning to think logically (Danesi 2008). One of the learning problems involved is, as Borel aptly noted, that language is used to teach mathematics and
to formulate problems. To quote Kasner and Newman (1940: 158): "It is common
experience that often the most formidable algebraic equations are easier to solve
than problems formulated in words. Such problems must first be translated into
symbols, and the symbols placed into proper equations before the problems can
be solved." As a trivial, yet useful, example of how language and mathematics
can easily become enmeshed ambiguously, note that the operation of addition is
described by variant English words such as and, sum, total, add together; conversely, subtraction is normally suggested by expressions such as less, from, take
away, difference, is greater than, and so on. A similar variety of expressions is
found in many other languages. These lexical variants can be a source of difficulty
for students learning mathematics who struggle to translate them into the simple
symbol +. So, those who do not have access to the semantic differences among
these expressions may manifest specific kinds of learning difficulties, or else may
be confused by the inconsistency (or decorative air, so to speak) of the language
used (Danesi 1987).
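The translation problem can be made concrete with a toy lookup table of the kind a tutoring program might use to map verbal expressions onto operation symbols. The word lists and the function below are illustrative only, not a proposal drawn from the studies cited above.

    # Illustrative mapping of variant English expressions onto arithmetic symbols.
    OPERATION_WORDS = {
        "+": ["and", "sum", "total", "add together", "plus"],
        "-": ["less", "from", "take away", "difference", "minus"],
    }

    def symbol_for(expression):
        # Return the operation symbol suggested by a verbal expression, if any.
        for symbol, words in OPERATION_WORDS.items():
            if expression.lower() in words:
                return symbol
        return None

    print(symbol_for("take away"))  # -
    print(symbol_for("sum"))        # +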
One of the central objectives of formal analysis is to eliminate ambiguities, inconsistencies, and superficial ornamentations of this kind. To do so, the logical
calculus provides a series of definitions, axioms, symbols, and postulates that
do not vary or that resist ambiguous interpretation. This means, for instance, developing symbols for numbers and arithmetical operations that do not vary according to whim or situation. The history of arithmetic bears this out. The first
number systems were derived from the use of material objects to represent numerical concepts (Schmandt-Besserat 1978, 1992); the words referring to the objects
themselves came, over time, to stand for the numerical concepts as well. Around
3000 BCE the Egyptians started using a set of number symbols based on counting
groups of ten (without place value) to represent numerical concepts; and a little
later the Babylonians developed a sexagesimal system based on counting groups
of 60, a system we still use to this day to mark the passage of time. These early
societies developed number systems primarily to solve practical problems: to survey
fields, to carry out intricate calculations for constructing buildings, and so on.
For this, they needed a standard system of numerical representation. They were
also interested in numbers as abstractions, but, by and large, they were mainly interested in what they could do with numbers in terms of engineering and business
affairs. Remarkably, their numerical symbol systems were closed systems (unambiguous and consistent), unlike the language used to describe them, which varied
according to context and usage.
It was the Greeks who took a step further in removing ambiguity and inconsistency in formal number systems, by examining the numbers in themselves,
apart from their uses in everyday life, developing mathēmatikē (a term coined by
Pythagoras). Around 300 BCE, Euclid founded the first school of mathēmatikē in
Alexandria to study numbers, geometrical figures, and the method of proof in formal ways, independent of their uses in practical tasks. These could, of course, be
applied to construction and engineering activities, but their abstract study was
an autonomous one. From there the distinction between pure (or theoretical) and
applied mathematics surfaced, a distinction that some, like Archimedes, did not
see as useful. Even today, some would claim that pure mathematics must be kept
separate from applied mathematics; but this ignores a whole set of discoveries
that have worked the other way around, whereby applications of mathematical
ideas have, themselves, led to further theorization. The dichotomy started probably with Euclid, who wrote the first treatise of formal mathematics titled the
Elements, a book that has permanently shaped how we conceptualize mathematical methodology.
A key aspect of Greek mathematical formalism was the use of writing symbols
to represent numerical concepts in a consistent way, a practice that was, in itself,
an engagement with mathematical abstraction. When alphabet symbols appeared
on the scene around 1000 BCE, they were used to represent not only sounds, but
numbers. The order {A, B, C, ...} of the alphabet is based on that early practice,
where A stood for the number 1, B for the number 2, and so on. The Greeks were
the first to use alphabet letters for numbers. Their notation, however, was derived
from previous notation, such as the Egyptian one. Bellos (2014: 64) describes this
remarkable milestone in the history of mathematics as follows:
By the time of Euclid, the Greeks were using a number system derived from Egyptian hieratic
script: 27 distinct numbers were represented by 27 distinct symbols, the letters of the Greek
alphabet. The number 444 was written υμδ, because υ was 400, μ was 40 and δ was 4. Fractions were described rhetorically, for example, as "eleven parts in eighty-three," or written as
common fractions with a numerator and a denominator, much like the modern form, 11/83,
although the Greeks maintained the historic obsession with unit fractions.
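The additive letter-numeral system that Bellos describes can be sketched in a few lines of Python. The letter-value table below is the standard Ionic assignment; the conversion routine itself is purely illustrative.

    # Illustrative converter for Greek alphabetic (Ionic) numerals: separate
    # letters stand for 1-9, 10-90, and 100-900, and their values are added.
    ONES = dict(zip("αβγδεϛζηθ", range(1, 10)))
    TENS = dict(zip("ικλμνξοπϙ", range(10, 100, 10)))
    HUNDREDS = dict(zip("ρστυφχψωϡ", range(100, 1000, 100)))

    def to_greek(n):
        # Write a number from 1 to 999 in Greek alphabetic numerals.
        letters = ""
        for table in (HUNDREDS, TENS, ONES):
            for letter, value in sorted(table.items(), key=lambda kv: -kv[1]):
                if n >= value:
                    letters += letter
                    n -= value
                    break
        return letters

    print(to_greek(444))  # υμδ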

As Greek mathematicians started studying the properties of numbers in themselves, they introduced separate symbols for the latter. This was one of the first
events that made an abstract conceptualization of numbers possible. The first
formal mathematical system was, as mentioned, the one devised by Euclid in his
Elements, consisting of a set of axioms from which theorems, propositions, and
postulates could be investigated and/or proved. In it, we find the first definitions
of number and of various types of number, definitions designed to skirt around
ambiguity and inconsistency. There are 22 definitions in total in Book VII of the
Elements, which are worth reproducing here, since they show how early formalism was, and still is, a system of analysis based on clear and unambiguous
definitions of basic constituent units (Euclid 1956).
1. A unit is that by virtue of which each of the things that exist is called one.
2. A number is a multitude composed of units.
3. A number is a part of a number, the less of the greater, when it measures the
greater.
4. But parts when it does not measure it.
5. The greater number is a multiple of the less when it is measured by the less.
6. An even number is that which is divisible into two equal parts.
7. An odd number is that which is not divisible into two equal parts, or that
which differs by a unit from an even number.
8. An even-times-even number is that which is measured by an even number
according to an even number.
9. An even-times-odd number is that which is measured by an even number according to an odd number.
10. An odd-times-odd number is that which is measured by an odd number according to an odd number.
11. A prime number is that which is measured by a unit alone.
12. Numbers relatively prime are those which are measured by a unit alone as a
common measure.
13. A composite number is that which is measured by some number.
14. Numbers relatively composite are those which are measured by some number
as a common measure.
15. A number is said to multiply a number when the latter is added as many times
as there are units in the former.
16. And, when two numbers having multiplied one another make some number,
the number so produced is called plane, and its sides are the numbers which
have multiplied one another.
17. And, when three numbers having multiplied one another make some number,
the number so produced is called solid, and its sides are the numbers which
have multiplied one another.

18. A square number is equal multiplied by equal, or a number which is contained
by two equal numbers.
19. And a cube is equal multiplied by equal and again by equal, or a number
which is contained by three equal numbers.
20. Numbers are proportional when the first is the same multiple, or the same
part, or the same parts, of the second that the third is of the fourth.
21. Similar plane and solid numbers are those which have their sides proportional.
22. A perfect number is that which is equal to the sum of its own parts.
As can be seen, these axioms state clearly what numbers are and how they form a
closed system. There is no room for variation or interpretation here. Euclid's approach remained the basic blueprint for the development of formal mathematical
systems, until other geometries and the calculus came onto the scene much later.
These expanded the reach of formal Euclidean mathematics to include different
numerical and spatial concepts. With the advent of set theory and Boolean algebra, formal analysis developed into an autonomous branch of mathematics by the
end of the nineteenth century. The goal has always been to eliminate ambiguity
and variance through a set of basic definitions and axioms.
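Several of the Book VII definitions translate directly into computable tests, which is one way of seeing how unambiguous they are. A minimal Python sketch follows; the predicate names are mine, not Euclid's.

    # Illustrative tests for a few of Euclid's Book VII definitions.
    def is_even(n):          # Definition 6: divisible into two equal parts
        return n % 2 == 0

    def is_prime(n):         # Definition 11: measured by a unit alone
        return n > 1 and all(n % d != 0 for d in range(2, n))

    def proper_parts(n):     # the numbers that "measure" n, other than n itself
        return [d for d in range(1, n) if n % d == 0]

    def is_perfect(n):       # Definition 22: equal to the sum of its own parts
        return n == sum(proper_parts(n))

    print([n for n in range(1, 11) if is_even(n)])       # [2, 4, 6, 8, 10]
    print([p for p in range(2, 30) if is_prime(p)])      # [2, 3, 5, 7, 11, 13, ...]
    print([n for n in range(2, 500) if is_perfect(n)])   # [6, 28, 496]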
Set theory and Boolean algebra dovetailed with the arrival of structuralism in
psychology and linguistics in the nineteenth century (Wundt 1880, 1901, Titchener
1910, Saussure 1916). The underlying premise in structuralism is that all human
systems of representation and communication are grounded on abstract structures that operate in terms of relations to each other (Sebeok and Danesi 2000).
The distinction between structure and form is a crucial one. The physical form of
a triangle can be obtuse, acute, equilateral, isosceles, and so on. But the structure
is the same in all cases. It consists of three lines meeting to form three angles.
Similarly, in language, structures are patterns that can take on various forms. By
studying the forms, therefore, the idea is to get at the nature of the structures.
Mathematicians use the terms structure and form in parallel ways. Structure
is something that emerges from commonalities in forms and their relations. An
example is the real number set. The numbers themselves are the forms that make
up the set; but the relations that these show among themselves is what gives the
set coherence and unity. These include (Senechal 1993):
1. differential order: every number is either greater or smaller than every other
number;
2. field structure: numbers can be combined according to operational rules
(they can be added, multiplied, and so on), and thus form a field;
3. interval structure: if put on a line, the numbers form constant intervals among
themselves;

4. metric structure: the interval or distance between the numbers can be measured precisely;
5. unidimensionality: number forms (digits, fractions, and so on) are unidimensional (flat) structures constituting points on the line;
6. topological structure: the differential order and metric structure of the numbers determines their particular occurrence in space, implying that the set of
real numbers is an ordered field.
As mentioned, one of the main premises of formal analysis is that it must be complete (leaving out other contrasting possibilities) and consistent (avoiding circularities, ambiguities, and statements that cannot be proved or disproved). Euclid's
geometry is perhaps the one that most approaches completeness and consistency,
even though it may have few applications outside of the plane (two-dimensional
space), as demonstrated by non-Euclidean geometries. However, the fifth postulate is problematic and may be a Gödelian flaw, so to speak, in the Euclidean
system:
If a straight line crossing two straight lines makes the interior angles on the same side less
than two right angles, the two straight lines, if extended indefinitely, meet on that side on
which are the angles less than the two right angles.

The postulate refers to a diagram such as the one below. If the angles at A and B
formed by a line l and another two lines l1 and l2 sum up to less than two right
angles, then lines l1 and l2 meet on the side of the angles formed at A and B if
continued indefinitely:

Figure 1.5: Euclid's fifth postulate (a line l crossing two lines l1 and l2 at A and B)

Also known as the Parallel Postulate, it attracted immediate criticism, since it seemed to be more of a proposition or theorem than a postulate. Proclus (in Morrow 1970) wrote as follows: "This postulate ought even to be struck out of Postulates altogether; for it is a theorem." It is impossible to derive the Parallel Postulate
from the first four.

In the 1800s, mathematicians finally proved that the parallel postulate or axiom is essentially not an axiom. This discovery led to the creation of geometric
systems in which the axiom was replaced by other axioms. From this, non-Euclidean geometries emerged. In one of these, called hyperbolic or Lobachevskian
geometry, the parallel axiom is replaced by the following one: "Through a point
not on a given line, more than one line may be drawn parallel to the given line." In
one model of hyperbolic geometry, the plane is defined as a set of points that lie
in the interior of a circle. Parallel lines are dened, of course, as lines that never
intersect. In the diagram below, therefore, the lines going through point X are all
parallel to line QP, even though they all pass through the same point. The lines
cross within the circle and there exist an infinite number of parallels that can also
be drawn within it. The reason for this is, of course, that the lines, being inside
the circle, cannot be extended beyond its circumference:

Figure 1.6: Lobachevskian Geometry (lines through a point X inside the circle, all parallel to the chord QP)

Of course, if the lines were to be extended outside the circle, then all but one
of them would intersect with QP. Around 1860, Riemann had another whimsical
hunch: Is there a world where no lines are parallel? The answer is the surface of a
sphere on which all straight lines are great circles. It is, in fact, impossible to draw
any pair of parallel lines on the surface of a sphere, since they would meet at the
two poles:
Figure 1.7: Riemannian geometry (great circles on a sphere; the angles of a spherical triangle sum to more than 180°)

Because one important use of geometry is to describe the physical world, we might
ask which type of geometry, Euclidean or non-Euclidean, provides the best model
of reality. Some situations are better described in non-Euclidean terms, such as aspects of the theory of relativity. Other situations, such as those related to everyday
building, engineering, and surveying, seem better described by Euclidean geometry. In other words, Euclidean geometry is still around because it is a system that
has applications in specific domains. And this is a central lesson to be learned
by a discussion of the logical calculus: it is system-specific, that is, it applies to
certain domains. Each domain thus has its own logical calculus. Lobachevskian
and Riemannian geometries, and by extension n-dimensional geometries, have
developed their own axioms, postulates, symbols, and rules for proving propositions within the system as either true or false. So, there are various types of logical
calculi, but all are based on the use of symbols and rules of combination that are
complete and consistent within a system.
Given that both Euclidean and non-Euclidean logic make sense and have
applications to the real world, one can see the reason why logical systems are so
appealing: they turn practical and intuitive knowledge into theoretical knowledge so that it can be applied over and over (Kaplan and Kaplan 2011). The
Pythagorean theorem was not just a recipe of how to construct right triangles;
it revealed the abstract nature of triangular structure and how it was connected
to the world. The Pythagorean triples that are derived from c² = a² + b² could thus
be seen to refer not only to specific properties of right triangles, but to properties
of numbers themselves, leading eventually to Fermat's Last Theorem and all the
intellectual activities that it has generated (Singh 1997).
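The shift from triangles to numbers can be illustrated directly: Pythagorean triples can be generated from the relation c² = a² + b² with no reference to geometry at all. A brute-force Python sketch, purely for illustration:

    # Illustrative brute-force search for Pythagorean triples with a² + b² = c².
    def pythagorean_triples(limit):
        triples = []
        for a in range(1, limit):
            for b in range(a, limit):
                for c in range(b, limit):
                    if a * a + b * b == c * c:
                        triples.append((a, b, c))
        return triples

    print(pythagorean_triples(21))
    # [(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]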
From Euclid's time onwards, it is therefore not surprising to find that mathematics and logic were thought to be intrinsically intertwined, with one mirroring
the other. But Charles Peirce (a logician and mathematician) argued eloquently
that the two are ontologically different. This is what he wrote circa 1906 (in
Kiryushchenko 2012: 69):
The distinction between the two conflicting aims [of logic and mathematics] results from
this, that the mathematical demonstrator seeks nothing but the solution of his problem;
and, of course, desires to reach that goal in the smallest possible number of steps; while
what the logician wishes to ascertain is what are the distinctly different elementary steps
into which every necessary reasoning can be broken up. In short, the mathematician wants
a pair of seven-league boots, so as to get over the ground as expeditiously as possible. The
logician has no purpose of getting over the ground: he regards an offered demonstration
as a bridge over a canyon, and himself as the inspector who must narrowly examine every
element of the truss because the whole is in danger unless every tie and every strut is not
only correct in theory, but also flawless in execution. But hold! Where am I going? Metaphors
are treacherousfar more so than bridges.

1.1.4 The structure of logic


Formal analysis has an abstract structure itself, a kind of meta-structure that
can thus be described in abstract ways. Basically, it has the form of an argument
designed to lead to a valid (inescapable) conclusion. Even an argument based on
false premises could be valid, and, on the other hand, one based on true premises
could be invalid. All that is needed is that the logical form (the meta-structure) be
valid. One of the first to be aware that logic itself had argument structure was Aristotle. He called the description of this structure the categorical syllogism, which
he believed would show how all logical systems operated: connecting premises,
such as the following, to each other, leading inescapably to a conclusion:
1. All mammals are warm-blooded (Major premise)
2. Cats are mammals (Minor premise)
3. Therefore, all cats are warm-blooded (Conclusion)
This syllogism is valid because the premises are connected logically: (1) is the major premise and (2) is the minor premise. Each is composed of categorical terms
(terms that denote categories such as mammals, cats, and so on). Each of the
premises has one term in common with the conclusion. The categorical term in
common in the premises is called the middle term. The skeletal structure of the
categorical logic of the above type of syllogism can be shown as follows:
1. All A are B.
2. All C are A.
3. Therefore, all C are B.
One does not need to use syllogistic argumentation to accept this as true, though.
Common sense tells us that this is so. However, common sense does not show us
the validity of the logic behind the reasoning involved. Moreover, there may be
arguments that are tricky. The following shows one of these, since the conclusion
is logically invalid.
1. No cats are planets.
2. Some satellites are not planets.
3. Therefore, some satellites are not cats.
The argument here fails on several counts. For the present discussion, it is sufficient to point out that the argument sequence is not based on entailment, since
(2) and (3) are essentially the same logically. Therefore, the syllogism must be invalid. The rules of syllogisms enable us to test the validity of an argument without
considering specific examples (actual categories) or examining the argument's
structure in detail. These rules are based on certain features that recur in valid syllogisms
and distinguish them from invalid ones. For example, one rule states that
no valid syllogism has two negative premises. There are two negative premises in
the above syllogism.
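The set-theoretic reading of categorical terms that Boole would later exploit (see the next paragraph) can be previewed with a toy Python model of the valid syllogism given earlier; the miniature categories are invented purely for illustration.

    # Toy set-theoretic reading of "All A are B; all C are A; therefore all C are B".
    warm_blooded = {"cat", "dog", "whale", "owl"}   # B
    mammals      = {"cat", "dog", "whale"}          # A
    cats         = {"cat"}                          # C

    premise1 = mammals <= warm_blooded    # All A are B (A is a subset of B)
    premise2 = cats <= mammals            # All C are A
    conclusion = cats <= warm_blooded     # All C are B

    # On the subset reading, the conclusion cannot fail when the premises hold.
    print(premise1, premise2, conclusion)  # True True True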
It was George Boole (1854) who used the idea of sets to unite logic, argumentation, and mathematics into a general formal system. To test an argument, Boole
converted statements into symbols, in order to focus on their logical relations,
independently of their real-world meanings. Then through rules of derivation or
inference he showed that it is possible to determine what new formulas may be
derived from the original ones. Boolean algebra, as it is called, came forward to
help mathematicians solve problems in logic, probability, and engineering. It also
removed meaning from logical argumentation once and for all, a fact that has
come back to haunt logicians in the era of computer modeling (as we shall see in
chapter 3).
Boole's primary objective was to break down logic into its bare structure by replacing words and sentences (which bear contextual or categorical meaning) with
symbols (which presumably do not). He reduced symbolism to the bare minimum
of two symbols: the 1 of the binary system for true and the 0 for false. Instead
of addition, multiplication and the other operations of arithmetic (which bear
historical meanings) he used conjunction (∧), disjunction (∨), and complement
or negation (¬), in order to divest operations of any kind of external information
that may interfere with the logic used. These operations can be expressed either
with truth tables or Venn diagrams, which show how they relate to sets, such as
x and y below, where the symbolic representations and Venn diagrams of these
operations are displayed visually:

Figure 1.8: Set theory diagrams (Venn diagrams for x ∧ y, x ∨ y, and ¬x)

Boolean algebra has had many applications, especially in computer programming
and in the development of electric circuits. American engineer Claude Shannon
(1948) was developing switching circuits in the 1930s when he decided to apply
Boolean algebra to control the circuits. In so doing, he achieved control on the
basis of a simple binary off-versus-on symmetry, thus laying the foundation for
modern-day digital computing. Shannon's logic gates, as he called them, represented the action of switches within a computer's circuits, now consisting of
millions of transistors on a single microchip.
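The three operations can be displayed as a truth table over Boole's two symbols. A minimal Python sketch, with the tabular format chosen only for readability:

    # Illustrative truth table for conjunction, disjunction, and negation,
    # using 1 for "true" and 0 for "false".
    def conj(x, y): return x & y    # x AND y
    def disj(x, y): return x | y    # x OR y
    def neg(x): return 1 - x        # NOT x

    print("x y | AND OR NOT x")
    for x in (0, 1):
        for y in (0, 1):
            print(f"{x} {y} |  {conj(x, y)}   {disj(x, y)}   {neg(x)}")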

By attempting to enucleate the structure of logic and mathematics as a unitary
phenomenon, Boolean algebra also gave a concrete slant to the question of what
mathematics is and of its relation to logic in strict formal terms. Moreover, it forced
mathematicians to reconsider their definitions and axioms from the perspective of
logical entailment, taking nothing for granted. This was Giuseppe Peano's aim in
1889 (Peano 1973): he revisited Euclid's number definitions (see above), refining
them in order to give them a more abstract formulation. His nine axioms start
by establishing the first natural number (no matter what numeral system is used
to represent it), which is zero. The other axioms are successor ones showing that
they apply to every successive natural number. The axioms are reproduced here
for convenience:
1. 0 is a natural number.
2. For every natural number x, x = x.
3. For all natural numbers x and y, if x = y then y = x.
4. For all natural numbers x, y and z, if x = y and y = z, then x = z.
5. For all a and b, if a is a natural number and a = b, then b is also a natural number. That is, the set of natural numbers is closed under the previous axioms.
6. For every natural number n, S(n) is a natural number: S(n) is the successor
to n.
7. For every natural number n, S(n) = 0 is false. That is, there is no natural number whose successor is 0.
8. For all natural numbers m and n, if S(m) = S(n), then m = n.
9. If K is a set such that 0 is in K, and for every natural number n, if n is in K,
then S(n) is in K, then K contains every natural number.
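The successor axioms lend themselves to a direct computational rendering, which is essentially what building them "into the appropriate algorithm" (a point the text makes just below) amounts to. A minimal Python sketch; the encoding of numbers as nested applications of S and the recursive definition of addition are illustrative, not part of Peano's own text.

    # Illustrative Peano-style arithmetic: numbers are built from 0 and a
    # successor operation S, and addition is defined by recursion.
    ZERO = 0

    def S(n):                 # the successor operation of axiom 6
        return ("S", n)

    def add(m, n):
        # m + 0 = m;  m + S(k) = S(m + k)
        if n == ZERO:
            return m
        return S(add(m, n[1]))

    def to_int(n):            # translate back to ordinary numerals for display
        return 0 if n == ZERO else 1 + to_int(n[1])

    two = S(S(ZERO))
    three = S(S(S(ZERO)))
    print(to_int(add(two, three)))  # 5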
These are self-evident axioms that need no proof. If one were to program a machine to carry out arithmetical operations, it would need to have these axioms
built into the appropriate algorithm. Like all axiomatic sets they are useful for formal analysis. Following on Peano's coattails, at the First International Congress
of Mathematicians of the 20th Century in Paris, David Hilbert asked if all science
could not be broken down into similar groups of fundamental axioms. The question is still an open one. Again, the question of meaning is the main problem in
this kind of approach. As Stewart (2013: 313) observes, the use of "exist" in any
logical treatment of mathematics is hardly unambiguous, raising several deep
questions, the most obvious one being the definition of "exist" itself:
The deep question here is the meaning of exist in mathematics. In the real world, something
exists if you can observe it, or, failing that, infer its necessary presence from things that can
be observed. We know that gravity exists because we can observe its effects, even though
no one can see gravity. However, the number two is not like that. It is not a thing, but a
conceptual construct.

The irrational numbers and the imaginary ones did not exist until they cropped
up in the solution of two specific equations made possible by the Pythagorean
theorem and the concept of quadratic equation respectively. So, where were they
before? Waiting to be discovered? This question is clearly at the core of the nature
of mathematics. This story can be told over and over within the field: transfinite
numbers, graph theory, and so on. These did not exist until they crystallized
in the conduct of mathematics, through ingenious notational modifications, diagrammatic insights, ludic explorations with mathematical signs, and so on.
Aware of the problem of meaning in the formalization of logic, Gottlob Frege
(1879) introduced the distinction between sense and referent. The latter is the object named, whereas the former involves a mode of presentation. So, in an expression such as "Venus is the Morning Star," Frege claimed that there are two terms
with different senses but with the same referent. Thus, for Frege this expression is
a version of "Venus is Venus," involving a reference to an astronomical discovery.
In symbolic terms, A = A is rendered as A = B, only because in language A has
different senses. Frege's distinction introduced the notion that two terms, whose
senses were already fixed so that they might refer to different objects, refer to the
same object. His work influenced Bertrand Russell in a negative way, since he became dissatisfied with Frege's approach. So, Russell advanced his own theory of
descriptions. In his system, the expression "Venus is the Morning Star" is analyzed as "there is an object which is both the Morning Star and Venus." The term
"Morning Star" is not a name as such; it is a description. Russell viewed such a
sentence as attributing the property "Morning Star" to the object named Venus.
The sentence therefore is not an identity, "Venus is Venus," as Frege claimed.
The theory of reference was taken up by Ludwig Wittgenstein in 1921. Wittgenstein saw sentences as propositions about simple world facts; that is, they
represented features of the world in the same way that pictures or symbols did.
But Wittgenstein had serious misgivings about his own theory of language from
the outset. In his posthumously published Philosophical Investigations (1953), he
was perplexed by the fact that language could do much more than just construct
propositions about the world. So, he introduced the idea of language games,
by which he claimed that there existed a variety of linguistic games (describing,
reporting, guessing riddles, making jokes, and so on) that went beyond simple
Fregean semantics. Wittgenstein was convinced that ordinary language was too
problematic to describe with logical systems because of its social uses. Unlike
Russell, he wanted to ensure the careful, accurate, and prudent use of language
in communication.
Perhaps the most complete study attempting to outline the meta-structure of
logic was Russell and Whitehead's 1913 treatise, the Principia mathematica. The
features connected with their treatise will be discussed in the next chapter. It is
sufficient to say here that, like Euclid's fifth axiom, it immediately invited reservations from mathematicians. And after Gödel's (1931) proof, it became obvious that
it could hardly be considered complete or consistent. By the mid-1950s, formal
analysis went into a crisis, a crisis that was somewhat resolved by the rise of computer science and artificial intelligence, which used the logical calculus as a basis
to carry out mathematical and linguistic tasks. Logic was the grammar of computers and thus could be studied in computer software, rather than speculatively.
A little later, research in neuroscience started showing that certain computer algorithms mirrored neural processes. The rescue of formalism was achieved not by
speculations on the meta-structure of logic, but by computer science and brain
research working in tandem (as will be discussed subsequently).

1.2 Computation
Formal grammars and formal mathematics have typically sought to encode the
purported laws of thought, as Boole called them, that generate well-formed
statements, such as proofs and sentences. So, they are not necessarily about the
practical value of the proofs or sentences themselves (that is, their meanings),
but about how they are formed. As discussed, they are concerned with the form
of any argument and thus its validity. This entails ignoring those features that are
deemed to be irrelevant to this goal, such as specific language grammars or certain
proofs in mathematics. As we saw, the first to concern himself with the meta-structure of logic was Aristotle, and the fundamental difference between modern
formal logic (as in Boolean, set-theoretic, or Markovian logic systems) and traditional, or Aristotelian, logic lies in their differing analyses of the logical structure of the statements they treat. The syllogism was Aristotle's model of logical
form; modern analyses are based on notions such as recursion, logical connectives, quantifiers, and rules that conjoin the various forms.
But all logical approaches have been fraught, from the outset, with the problem of undecidability. Euclid's fifth postulate is an example of an undecidable
statement: it is obvious, but it cannot be decided whether it is an axiom or a theorem to be proved. At about the same time that formalist approaches surfaced in
linguistics, based on mathematical formalism, computer science and artificial intelligence came onto the scene, providing new mechanisms and theoretical frameworks for testing and modeling formal theories and rule systems for decidability
and thus computability. Computational structure will be discussed in more detail in the third chapter. Here a few general ideas will be considered, especially
the idea that the computer is a powerful modeling device. Moreover, according to
many contemporary formalists, the action has shifted over to computer science
(so to speak), where computer algorithms written to model rule-making principles
are ipso facto theories of language and mathematics.
Needless to say, the computer did not originate as a modeling device, but
rather as a device for carrying out mechanical computations quickly and automatically. In a way, its origins parallel the origins of formal mathematics: finding
ways to facilitate computational tasks such as addition, multiplication, and the
like. To do so, notational systems were invented by mathematicians that allow
for computation to be carried out more efficiently, such as exponential notation.
Computer modeling is based on deriving functional notational systems (computer
programs) to do virtually the same kind of thing. Notational systems are symbolic
ones and, thus, algorithms that test the computational power of these systems
have led to many insights into the very nature of representation in language and
mathematics. In a general way, it can be said that algorithms are designed to compress data into notational symbols and combinatory rules that can be used to
produce new data ad infinitum. So, generation and compression are intertwined,
as Chomsky clearly acknowledged himself in the Minimalist Program. The study
of compression has become a major theme in contemporary cognitive science, as
we shall see, dovetailing with the rise of computers as compacting devices.
Chomsky (1965, 1966b) was among the first to see theories of language as modeling devices (akin to computer algorithms), ranking them in terms of what he
called their order of power to explain linguistic data. Some are too weak, he
stated, being incapable of explaining certain phenomena; others are too powerful, capable of taking into account phenomena that may never occur in human
languages. The best theory is the one that provides the best fit to the data at hand.
Ironically, Chomsky's own theory turns out to be too powerful, explaining anything at all that can be described, because of its fundamentally mathematical, recursive nature. Many linguists now wonder whether Chomsky's initial agenda
for language study can ever be carried out at all. As in physics, it is useful to have
special theories of a phenomenon, but it is perhaps impossible to develop a
general, all-encompassing one. The best we can hope for is to develop models or
theories to describe how actual languages function, using insights from a host of
sources: research on language learning, speech therapy, computerized machine translation, automatic speech recognition research, and so on.
As computers became more and more sophisticated and powerful in the
1970s, a movement within formal linguistics emerged, known as computational
linguistics, which has become a useful branch that allows linguists to use the
computer to model various aspects of natural language, both to test the
validity of formal theories and to penetrate hidden dimensions of language indirectly. In mathematics, a parallel movement called computability theory emerged
to allow mathematicians to carry out similar research on mathematical formalism. When the computer came to a halt in certain applications or models, it
indicated that the phenomenon that the computer could not handle would need
to be studied further by linguists or mathematicians (Martín-Vide and Mitrana
2001). In other words, if a theory was inconsistent, the computer would be able
to detect the inconsistency, because the program would go into an infinite loop.
A loop is a sequence of instructions that is continually repeated until a certain
state is reached. Typically, when an end-state is reached the instructions have
achieved their goal and the algorithm stops. If it is not reached, the next step
in the sequence is an instruction to return to the first instruction and repeat the
process over. An infinite loop is one that lacks an exit routine. The result is that
the program repeats itself continually until the operating system senses it and
terminates the program with an error.
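To make the contrast concrete, here is a minimal illustrative sketch (my own, not drawn from any of the systems mentioned above) of a loop with an explicit exit condition and of one that lacks it; the counting task is purely hypothetical:

    # A loop with an exit routine: it stops once the end-state is reached.
    def count_to(limit):
        n = 0
        while n < limit:   # exit condition: the end-state is "n == limit"
            n += 1
        return n

    print(count_to(5))     # prints 5 and halts

    # An infinite loop: nothing ever changes the condition, so the program
    # repeats until the operating system or the user terminates it.
    # (Commented out so that this sketch itself halts.)
    # while True:
    #     pass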
This approach to theory-testing is called retroactive data analysis in computer
science. It is a method whereby efficient modifications are made to an algorithm, and to its correlative theory, when the algorithm fails to generate some output or when its output does not correspond to the input data. The modifications can take the form of insertions in the theoretical model, deletions, or updates with new information and
techniques. When nothing works, then we have fleshed out of the algorithm something that may be faulty in the theory or, on the other hand, that may be unique
to the phenomenon, linguistic or mathematical, and thus non-computable, that
is, beyond the possibilities of algorithmic modeling.
Computer modeling is a very useful practice for linguists and mathematicians, allowing them to test their hand-made theories and models. In mathematics, it has even been used to devise proofs, the most famous one being the proof of the Four
Color Theorem (to be discussed subsequently). Known as proof by exhaustion, this method
establishes a result by dividing a problem into a finite number of cases and then devising
an algorithm for proving each one separately. If no exception emerges after an exhaustive search of cases, then the theorem is established as valid. The number of
cases sometimes can become very large. The first proof of the Four Color Theorem
was based on 1,936 cases, all of which were checked by the algorithm. The proof
was published in 1977 by Haken and Appel and it astonished the world of mathematics, since it went against the basic Euclidean paradigm of proof, with its use of
axioms, postulates, and logic (deductive or inductive) to show that something was
valid. The central idea in traditional proofs is to show that something is always
true by the use of entailment and inference reasoning, rather than to enumerate
all potential cases and test them, as does proof by exhaustion, where there is no
upper limit to the number of cases allowed. Some mathematicians prefer to avoid
such proofs, since they tend to leave the impression that a theorem is only true by
coincidence, and not because of some underlying principle or pattern. However,
there are many conjectures and theorems that cannot be proved (if proof is the
correct notion) in any other way. These include: the proof that there is no finite
projective plane of order 10, the classification of finite simple groups, and the
so-called Kepler conjecture.
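As a toy illustration of the method (my own, and much humbler than the cases just mentioned), the claim that the square of any integer leaves a remainder of 0 or 1 when divided by 4 can be proved by exhaustion: every integer leaves one of only four possible remainders on division by 4, and each case can be checked mechanically:

    # Proof by exhaustion (toy example): n**2 mod 4 is always 0 or 1.
    # Every integer is congruent to 0, 1, 2, or 3 (mod 4),
    # so checking these four cases covers all integers.
    cases = [0, 1, 2, 3]
    results = {r: (r * r) % 4 for r in cases}
    print(results)   # {0: 0, 1: 1, 2: 0, 3: 1}
    assert all(v in (0, 1) for v in results.values())
    print("No exception found: the statement holds in every case.")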
The earliest use of computers in linguistics and mathematics goes back to the
late 1940s and the Machine Translation (MT) movement (Hutchins 1997), which
itself emerged within the context of the cybernetics movement, the science concerned with regulation and control in humans, animals, organizations, and machines. MT was of interest to both linguists and mathematicians because it showed
how algorithms translate one system into another. Cybernetics was conceived by
the mathematician Norbert Wiener, who used the term in 1948 in his book Cybernetics,
or Control and Communication in the Animal and the Machine. The same term was used
in 1834 by the physicist André-Marie Ampère to denote the study of government
in his classification system of human knowledge, recalling Plato, who used it to
signify the governance of people. Cybernetics views communication in all self-contained complex systems as analogous, since they all operate on the basis of
feedback and error-correction signals. The signals (or signal systems) are called
servomechanisms. The cybernetic movement no doubt enthused many linguists,
mathematicians, and computer scientists, leading to the MT movement. When the
early work failed to yield meaningful results, however, the automated processing of human languages was recognized as far more complex than had originally
been assumed. Thus, MT became the impetus for expanding the methods of computational linguistics and for revising formalist theories such as the syntax hypothesis.
Today, the computer as a modeling device has become intrinsic to linguistic and
mathematical research. Traditional concepts in the two sciences are being revised
and refashioned as the constant improvement in computer technologies makes it
possible to carry out efficient analyses of specific theories and models.
The Internet has also led to different ways of conducting research. One example of this is the Polymath Project. Mathematical discoveries have been largely
associated with individuals working with mathematical ideas in isolation. And
these are typically named after them: Pascal's Triangle, Hamiltonian circuits,
Bayesian inference, and so on. The Pythagoreans, on the contrary, collaborated
among themselves to discuss and debate discoveries, such as their own theorem
and the unexpected appearance of irrationals. Probably aware of the intellectual power of this kind of collaboration, the renowned mathematician Tim Gowers
initiated the online Polymath Project (Nielsen 2012), reviving the Pythagorean
ideal of cooperation in mathematical research. The Project is a worldwide one,
involving mathematicians from all over the globe in discussing and proposing
solutions to difficult problems. The Project started in 2009 when Gowers posted
a problem on his blog, asking readers to help him solve it. The problem was to
find a new proof for the density version of the Hales-Jewett theorem (1963). Seven
weeks later Gowers wrote that the problem was now probably solved, thanks to
the many suggestions he had received.
Computer modeling, data compression algorithms, and the like have led to
a new focus on the relation between quantitative notions such as frequency and
structure. For one thing, algorithms allow for an efficient and rapid collection and
analysis of large corpora of data, which makes it possible to quantify the data statistically. While some may claim that, outside of the use of statistics to analyze the
data, this paradigm has had little or no influence on the development of theories
of pragmatics, ethnosemantics, and other such branches of language study, as I will
argue in chapter 3, the opposite may be true, since interest in discourse may have
been initiated in part by the inability of computers to produce human dialogue in a natural way, thus inducing a retroactive focus on conversational structure that would
likely have been inconceivable beforehand.
Moreover, by modeling discourse in the form of algorithms it has become
clear that within linguistic texts there is a hidden structure, based on events and
their probabilities of occurrence within certain contexts. Work in computational
quantification has also led to a new and fertile area of interdisciplinary research
between mathematicians and linguists in the domain of probability theory. The
computer modeling of discourse is essentially a Bayesian-guided one, as will be
discussed in chapter 3. For now, suffice it to say that probability theory has become
a new theme within linguistic research.

1.2.1 Modeling formal theories


The basic idea in computer modeling is to look first at a formal theory and to
extract from it the principles that can be incorporated into the design of an appropriate algorithm. This forces the linguist and mathematician to be as precise
as possible in the process of theory formulation and, also, to check for consistency ahead of time. As discussed above, in linguistics the first attempt to articulate a formal theory of language using ideas from mathematics was the one by
Chomsky in 1957. Chomsky was influenced initially by his teacher, the American
structuralist Zellig Harris (1951) who, like the Port-Royal grammarians, suggested
that linguists should focus on sentences as the basic units of language, not on
phonemes and words in isolation. As we saw, Chomsky developed this idea into
the syntax hypothesis, going on to argue that a true theory of language would have
to explain, for instance, why all languages seemed to reveal a similar pattern of
constructing complex sentences from simpler ones. As we saw, his solution
was a simple one, similar to the one posited by the Port-Royal scholars: assume
two levels, a deep structure and a surface structure, whereby the deep structure
level is transformed into the surface one via transformational rules. In its bare
outline form, Chomsky's theory of language design was (and still is) an elegant
one, as discussed.
Chomsky subsequently claimed, as also discussed, that as linguists studied
the specifics of phrase structure and transformational rules in different languages
they would eventually discover within them, and extract from them, a universal
set of rule-making principles, defined as the UG. With this claim, Chomsky turned
linguistics into a branch of both psychology and computer science (Thibault 1997).
But there are problems with his proposal, as we saw. First, rule-making principles
do not explain the semantic interactions among the words in sentences that often
guide the syntax of sentences themselves (Lakoff 1987). Second, sentences might
not be the basic units from which to develop a theory of language (Halliday 1975).
For instance, pronouns may not be simple slot-fillers in syntactic descriptions, but
rather trace devices in conversations. The following stretch of conversation does
not have pronouns in it:
Speaker A: Andrea is a wonderful young lady.
Speaker B: Yes, Andrea is a wonderful young lady.
Speaker A: Yes, but Andrea always likes to talk about Andrea.
Speaker B: Yes, Andrea does indeed always talk about Andrea.

This stretch would be evaluated by native speakers of English as stilted or, perhaps, as ironic-humorous in some contexts, not because it lacks sentence structure, but because it lacks text structure. The appropriate version of the conversation is one in which pronouns are used systematically as trace devices (anaphoric
and cataphoric) so that parts of individual sentences are not repeated in the conversational chain:
Speaker A: Andrea is a wonderful young lady.
Speaker B: Yes, she is.
Speaker A: Yes, but she always likes to talk about herself.
Speaker B: Yes, she does indeed.

The use of the pronouns she and herself is text-governed; that is, the pronouns
connect the various parts of the conversation, linking them like trace devices. This
is called coreference or indexicality, a text-making process which suggests that
pronouns cannot be examined in isolation as part of a syntactic rule system, but
rather as part of texts, where they function as indexes or deictic particles to keep
conversations fluid and non-repetitive. Chomsky has answered this critique by
claiming that transformational rules can handle deixis and deletion easily by extending the application of rules across sentences in a conversation. Texts, in this
view, are really concatenations of fully-formed sentences in their deep structure
that have undergone reshaping processes through the application of transformational rules to produce surface texts. For Chomsky, therefore, a text is a combination of well-formed sentences that have undergone text-based transformations.
But this response does not answer the question of why native speakers feel that
these sentences are anomalous. Does this mean that the rules are tied to social
functions rather than to some syntactic mechanism? If so, this would be a disaster for the syntax hypothesis. Whatever the case, the work in systemic linguistics
(Halliday 1985) suggests rather convincingly that the choice of certain grammatical items, such as pronouns, is hardly dependent on rules of grammar, but rather
that it is motivated by rules of communication. As Halliday has claimed, these
leave their imprint on the internal structures of the grammar.
But this is not the main critique. As discussed several times already, the most
important critique of generative grammar is that it has never been able to really
account for how sentences and texts encode meaning. This led in the late 1960s
to various movements (Allan 1986 and Harris 1993). By the late 1980s the one that
came to the forefront as a veritable challenge to generative grammar came to be
known as cognitive linguistics. As discussed, the most prominent figure in the
movement is George Lakoff, whose doctoral thesis, in 1965, dealt with the idea of
exception or irregularity in the Chomskyan approach (see Lakoff 1970). This
depends on a prior notion of rule government, whereby in each string (phrase
marker) on which a transformational rule may operate, there exists one lexical
item that governs the rule. That item is, in fact, the only one that may be an exception to the rule. There are thus two types of transformational rules: (1) major rules,
which apply in normal cases, but not to exceptions, and (2) minor rules, which apply
only to exceptions. Moreover, there are absolute exceptions, which are lexical
items that must meet (or not) the structural description of some transformational
rule. Each lexical item is subcategorized with respect to each transformational
rule and each structural description via rule features and structural description features. Some lexical items must be represented with Boolean functions and,
therefore, each grammar may be said to define the set of possible exceptions to its
rules.
Lakoff's thesis raised the problem of the role of meaning in a formal grammar
in a formal way, since lexical insertion restricts the operation of the grammar in
unpredictable ways, given that exceptions come not from considerations within
the grammar, but from outside it. From this, Lakoff came over time to the conclusion
that the foundations of grammar were not syntactic but figurative, given that most
exceptions seemed to be of a figurative nature; and so metaphor was hardly
to be considered an exception, but instead an intrinsic part of the grammar. In
a key 1987 work, he discussed a property of the indigenous Australian language
Dyirbal to bring out the inadequacies of formal grammar theories and the
need for linguistics to focus on the figurative properties of words and correlative structures. Like many languages, Dyirbal marks its nouns for grammatical
gender. In most European languages, the gender is often arbitrary, that is, unpredictable from a word's literal meaning: table is masculine in German (der Tisch),
feminine in French (la table), and neuter in Greek (to trapézi). In Dyirbal the rules
for gender assignment are based on conceptualizations, not on arbitrary assignment rules. One of its four gender classes includes nouns referring to women,
fire, and dangerous things (snakes, stinging nettles, and the like). The link among
these classes is metaphorical, not arbitrary, and the gender category has been
constructed on the basis of this fact. Looking more deeply into European languages, metaphorical gender seems to be a latent property of grammar in general
(Jackobson 1956, Danesi 1998). As Lakoff argues, when one digs deeper into the
substratum of linguistic grammars, one tends to discover that metaphorical conceptualizations form the backbone of linguistic structure.
Despite Lakoff's well-founded critique, formal grammar still has a role to play
in linguistics, just like set theory and propositional logic do in mathematics. Moreover, the rise of computer systems led already in the 1960s to the first programs
testing theories of syntax, uniting linguistics and mathematics in a direct way.
From the outset, modeling was seen as part of mathematics, and was called, in
fact, mathematical linguistics. The idea was that a rule, like a formula or equation, must be consistent and complete and thus connected to the other rules of a
grammar. The first true formal apparatus to describe language is due to the Russian mathematician Andrey A. Markov, who put forward his theory initially in 1913,
showing the degree of syntactic dependence among units in linear concatenation
in the form of transition probabilities. It was initially considered irrelevant to the
study of grammar, but Chomsky revived interest in it in the late 1950s. Markovian analysis became a central component of generative grammar at first, leading
to the syntax hypothesis. It also led to the use of probabilistic notions as part of
theory-making.
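A minimal sketch of the Markovian idea (my own illustration, using a tiny made-up corpus): the transition probability from one word to the next is estimated from observed bigram frequencies, so that each word functions as a state in a chain:

    from collections import Counter, defaultdict

    # Hypothetical toy corpus: each sentence is a list of word tokens.
    corpus = [
        "the boy eats the pizza".split(),
        "the girl eats the apple".split(),
    ]

    bigrams = Counter()
    starts = Counter()
    for sentence in corpus:
        for w1, w2 in zip(sentence, sentence[1:]):
            bigrams[(w1, w2)] += 1
            starts[w1] += 1

    # P(w2 | w1) = count(w1 w2) / count(w1 as the first element of a bigram):
    # the transition probability from one state (word) to the next.
    transitions = defaultdict(dict)
    for (w1, w2), c in bigrams.items():
        transitions[w1][w2] = c / starts[w1]

    print(transitions["the"])    # boy, pizza, girl, apple: 0.25 each
    print(transitions["eats"])   # {'the': 1.0}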
Formal approaches involve, essentially, describing the structure of the rules
inherent in the generation of forms. This is problematic, of course, but it is still
useful, in the same way that a logical proof in mathematics is. Chomsky's original
goal was to provide generative grammar with a Markovian foundation: a set of
finite-state events described by rules that ensue from one another in a branching
tree structure configuration. Consider a simple sentence such as The boy eats the
pizza. The sentence is organized into two main parts, a subject and a predicate.
The subject is to the left and the predicate to the right. Using a simple tree diagram,
this can be shown as follows:

[Figure 1.9: Tree diagram for The boy eats the pizza. The node Sentence branches into Subject (the boy) and Predicate (eats the pizza).]

This type of diagram represents Markov's idea that sentences are not constructed
by a direct concatenation of single words, but rather hierarchically, in terms of
phrases and relations among them. So, their positioning to the right or left is not
a simple diagrammatic convenience; it shows how the parts of a sentence relate
to each other hierarchically. This means that the linear string the + boy + eats +
the + pizza is not generated in a linear fashion, with the words combined one after the
other, but rather in terms of rules that will show its hierarchical structure. The
formal study of syntax is, more precisely, an examination of this kind of structure,
which can be divided into different states (the branches of the tree) that overlie the
structure of linear strings. The concatenation of items in a sentence is thus governed by states of different kinds, leading to the concept that the rules describe the
states as they are generated one after the other. Thus, people purportedly sense
that something is out of place in a sentence, not because it is necessarily in the
wrong linear place but because it has no syntactic value there. This is akin to place
value in digits. The 2 in 23 has a different value than it does in 12. The values
are determined not by linear order, but by compositional (hierarchical) structure.
The structure of a numeral string is read, like a sentence string, with each digit having the value of
ascending powers of 10.
The generative rules, therefore, must show how the parts of speech are connected to each other relationally and compositionally. In the above sentence, for
instance, the subject consists of a noun phrase and the predicate of a verb phrase,
which is itself made up of a verb and another noun phrase:
[Figure 1.10: Phrase structure diagram for The boy eats the pizza. The node Sentence branches into Subject and Predicate; the Subject is a Noun Phrase (the boy), and the Predicate is a Verb Phrase made up of a Verb (eats) and a Noun Phrase (the pizza).]

Rules such as the following ones will generate the above sentence (S = Sentence,
Sub = Subject, Pr = Predicate, NP = Noun Phrase, VP = Verb Phrase, V = Verb).
Rules (1), (2), (4), and (5) are called rewrite rules because each one rewrites the
previous one by expanding one of its symbols; (3), (6), and (7) are called insertion rules because they show where a lexical item or phrase is inserted in the
generation process:

1. S → Sub + Pr
2. Sub → NP1
3. NP1 → The boy
4. Pr → VP
5. VP → V + NP2
6. V → eats
7. NP2 → the pizza

These rules show that a sentence is composed of parts that are expanded in sequential order going through a series of states (indicated by the different rules),
producing a terminal string that has the following linear structure:

NP1 + V + NP2 = the boy + eats + the pizza
Needless to say, that is how a computer algorithm, in bare outline, works. So, it
is a relatively easy task to write algorithms to model Markovian finite-state grammars for generating sentences such as the one above. Now, to this system of rules,
Chomsky added the notion of the transformational rule. So, the passive version of the
above string, The pizza is eaten by the boy, is generated by means of a transformational rule (T-rule) that operates on the string as input to produce the output
as required:

NP1 + V + NP2 → NP2 + be + V [past participle] + by + NP1
The boy + eats + the pizza → The pizza + is + eaten + by + the boy

This T-rule converts one string into another. Since it is a general rule, it applies to
any terminal string that has the representation on the left of the arrow as its structural description. This model of grammar was the standard one in 1965 (above).
Since then, many debates in the field have dealt with which grammars or which
systems of rules are more powerful and more psychologically real, and which modifications must be made. From the outset, computer scientists were attracted to the
generative paradigm because it was algorithmically friendly and thus could be
used as a basis for developing programs to generate language and to translate
from one language to another (chapter 3).
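A minimal sketch of how such a rule system can be run as a program (my own illustration, not a fragment of any actual generative-grammar software): the rewrite and insertion rules above are applied to the start symbol S, and the passive T-rule is then applied to the resulting terminal string:

    # Phrase structure rules from the text, written as a rewrite table.
    rules = {
        "S":   ["Sub", "Pr"],
        "Sub": ["NP1"],
        "NP1": ["the", "boy"],
        "Pr":  ["VP"],
        "VP":  ["V", "NP2"],
        "V":   ["eats"],
        "NP2": ["the", "pizza"],
    }

    def generate(symbol):
        """Expand a symbol until only terminal words remain."""
        if symbol not in rules:
            return [symbol]               # terminal word: inserted as is
        words = []
        for part in rules[symbol]:
            words.extend(generate(part))
        return words

    print(" ".join(generate("S")))        # the boy eats the pizza

    # T-rule: NP1 + V + NP2 -> NP2 + be + V[past participle] + by + NP1
    def passivize(np1, np2, participle, be="is"):
        return np2 + [be, participle, "by"] + np1

    passive = passivize(["the", "boy"], ["the", "pizza"], "eaten")
    print(" ".join(passive))              # the pizza is eaten by the boy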

Similar tree diagrams can be devised to show the hierarchical structure overlying digit formation. A digit string such as 2,234 has the following Markovian tree structure (note: this is a highly simplified modeling of the relevant tree; V = value):

[Figure 1.11: Markovian diagram for 2,234]

Now, a similar type of rule to the phrase structure ones above can be written so
that a computer can model and test the representation of digital numbers for consistency and coherence:

D → N_n × 10^(n-1) + N_(n-1) × 10^(n-2) + N_(n-2) × 10^(n-3) + … + N_1 × 10^0

This says that a digit string (D) is composed of numerals (N) that have values in ascending powers of 10 when read in a line. This is now a formal statement devised by
hand that can easily be written as an algorithm that will run a program to generate strings of numbers ad infinitum.
Computational linguistics and mathematics have now gone beyond the modeling of formalisms such as digit formation, as we shall see. They now attempt
to reproduce human behavior in robots, with the development of very powerful
learning algorithms. But it must not be forgotten that many of the advances in
digital communications technologies, such as voice activation, speech recognition, and other truly remarkable capacities of computers today, were made possible by the early partnership among linguists, mathematicians, and computer
scientists. In effect, running formal programs is akin to following an engineering
manual for assembling some object. The rules of assemblage allow for the object
to work, but they tell us nothing about the object itself, nor about why the rules
work or do not. This is why formal theorists have always sought confirmation or
corroboration in psychology, wherein the models are tested out not on machines
but on human beings. The collaboration among psychologists, linguists, mathematicians, and computer scientists coalesced into a full-fledged discipline called
cognitive science in the 1980s.

1.2.2 Cognitive science


The main objective of early psychology was to figure out the laws of human
thought, and especially of learning, by comparing them with animal behavior. The
assumption was that the same laws of learning applied to all organisms and,
therefore, that the discovery of basic principles of learning and problem-solving
could be gleaned from experiments with animals. Cognitive science sought the
laws not in any comparison with animals, but instead from studying how machines learned to do things from a set of instructions. The term cognition,
rather than mind or behavior, was employed from the outset in order to eliminate the artificial distinction maintained by behaviorist psychologists between
inner (mental) and observable (behavioral) processes. Indeed, this term has now
come to designate all mental processes, from perception to language. Adopting
insights from artificial intelligence, cognitive scientists aimed from the outset to
investigate the mind by seeking parallels between the functions of the human
brain and the functions of computers.
Cognitive science thus adopted the notions and methods of artificial intelligence researchers. If the output of an algorithm was a linguistic sentence and
it was shown to be well-formed, then the input (the rules used to create the algorithm) was evaluated as correct; if the output was not a well-formed sentence,
then the fault was detected in the input and changed accordingly. The process
was a purely mechanical and abstract one, since it was thought that a faculty like
language could be analyzed in isolation from its functions in context and from its
biological interactions with the human body. As Gardner (1985: 6) put it, for early
cognitive scientists it was practical to have "a level of analysis wholly separate
from the biological or neurological, on the one hand, and the sociological or cultural, on the other"; therefore, "central to any understanding of the human mind
is the electronic computer."
The current focus of cognitive science has, of course, gone beyond this computational agenda. It now even seeks to design artificial programs that will display all the characteristics of human cognition, not just model aspects of it. To
do so, it must not only be able to decompose the constituent parts that faculties such as perception, language, memory, reasoning, emotion, and so on might
have, and then reassemble them in terms of representations that can then be programmed into software, but also devise ways for the algorithm to generate new
rules on its own given variable inputs. For contemporary cognitive science the
guiding premise is the belief that representational structures in the mind and computational representations of these structures are isomorphic. Aware that lived or
embodied experience might interfere with this whole process, recent cognitive
science has gone beyond developing representations and algorithms to be implemented in computers, to studying how lived experience shapes cognition, in
contrast to artificial intelligence. So, there are now two streams within cognitive
science: the formalist one, which aims to translate formal theories into algorithms
that are believed to be transferable to non-living robots, and the one that seeks to
see how mental operations are unique because they are shaped by bodily experiences.
At the core of the cognitive science agenda, no matter which of the two streams
is involved, is learning: How do we learn language? How do we learn mathematics? As discussed above, the role of metaphor in the process was for many years
ignored, but today it is a central topic within both streams of cognitive science.
Metaphor indicates how we go from sensory knowledge or imaginative inference
to conceptual knowledge. Like other animals, human infants come to understand
things in the world at first with their senses. When they grasp objects, for instance, they are discovering the tactile properties of things; when they put objects
in their mouths, they are probing their gustatory properties; and so on. However,
in a remarkably short period of time, they start replacing this type of sensory
knowing with conceptual knowing, that is, with words, pictures, and other forms
that stand for things. This event is extraordinary: all that children require to set their
conceptual mode of knowing in motion is simple exposure to concepts in social
context through language, pictures, and other kinds of symbol-based forms of
representation and communication. From that point on, they require their sensory apparatus less and less to gain knowledge, becoming more and more dependent on their conceptual mode. Cognitive science research, such as that by
Lakoff and Núñez (2000), has started to show that the transition from one stage
to the other is mediated by metaphor. Without discussing the relevant research
here, since it will be discussed subsequently, it is sufficient to say that the role of
metaphor in childhood can no longer be ignored.
The shift from sensory to conceptual knowing was first examined empirically
by two psychologists, Jean Piaget and Lev S. Vygotsky. Piaget's work documented
the presence of a timetable in human development that characterizes the shift
(Piaget 1923, 1936, 1945, 1955, 1969, Inhelder and Piaget 1969). During the initial
stage infants explore the world around them with their senses, but are capable of
distinguishing meaningful (sign-based) stimuli (such as verbal ones) from random noises. In a short time, they show the ability to carry out simple problem-solving tasks (such as matching colors). Piaget called this the pre-operational
stage, since it is during this phase that children start to understand concept-based
tasks operationally. By the age of 7, which Piaget called the concrete operations
stage, children become sophisticated thinkers, possessing full language and other
conceptual modes of knowing for carrying out complicated tasks. The mental development of children culminates in a formal operations stage at puberty, when
the ability to reason and actualize complex cognitive tasks emerges.
As insightful as Piaget's work is, it makes no significant reference to the use
of metaphor in childhood as a creative strategy for knowing the world. Vygotsky
(1962), on the other hand, saw metaphor as a vital clue to understanding how the
conceptual mode of knowing emerges. When children do not know how to label
something, such as the moon, they resort to metaphor, calling it a ball or a
circle. Such metaphorical fables, as Vygotsky called them, allow children to
interconnect their observations and reflections in a holistic and meaningful fashion. Gradually, these are replaced by the words they acquire in context, which
mediate and regulate their thoughts, actions, and behaviors from then on. By the
time of puberty children have, in fact, become creatures of their culture. Vygotsky
thus saw culture as an organizing system of the concepts that originate and develop with a group of people tied together by force of history.
This line of work raises the question of association as a major force in development and cognition. Given the controversy surrounding the term in psychology
and linguistics, it is necessary to clarify, albeit schematically, what it now means
within the cognitive science paradigm. In psychology, associationism is the theory
that the mind comes to form concepts by combining simple, irreducible elements
through mental connection. One of the first to utilize the notion of association
was Aristotle, who identified four strategies by which associations are forged: by
similarity (for example, an orange and a lemon), difference (for example, hot and
cold), contiguity in time (for example, sunrise and a rooster's crow), and contiguity in space (for example, a cup and saucer). John Locke (1690) and David Hume
(1749) saw sensory perception as the underlying factor in guiding the associative
process; that is, things that are perceived to be similar or contiguous in time or
space are associated to each other; those that are not are kept distinct by the
mind. In the nineteenth century, the early psychologists, guided by the principles
enunciated by James Mill (see 2001), studied experimentally how subjects made
associations. In addition to Aristotle's original four strategies, they found that factors such as intensity, inseparability, and repetition played a role in stimulating
associative thinking: for example, arms are associated with bodies because they
are inseparable from them; rainbows are associated with rain because of repeated
observations of the two as co-occurring phenomena; etc.
Associationism took a different route when Ivan Pavlov (1902) published his
famous experiments with dogs, which, as is well known, established the theory
of conditioning as an early learning theory. When Pavlov presented a meat stimulus to a hungry dog, the animal would salivate spontaneously, as expected. He
termed this the dog's unconditioned response, an instinctual response programmed into each species by Nature. After Pavlov rang a bell while presenting
the meat stimulus a number of times, he found that the dog would eventually salivate only to the ringing bell, without the meat stimulus. Clearly, Pavlov suggested,
the ringing by itself, which would not have triggered the salivation initially, had
brought about a conditioned response in the dog. It was thus by repeated association of the bell with the meat stimulus that the dog had learned something
new, something not based on instinctual understanding. Every major behavioral
school of psychology has utilized the Pavlovian notion of conditioning in one way
or another. To this day, behaviorists believe that the learning of new material can,
by and large, be accounted for as the result of conditioned associations between
stimuli and responses. Psychologists of other schools, however, reject this type of
associationism as useless when it comes to explaining different kinds of learning,
such as problem-solving. However, the Pavlovian notion of conditioning is still
a useful one on many counts, despite the many questions it raises. More importantly, it may have sidetracked the study of associationism until recently.
The associative structure of concepts to produce more complex ones can be
called layering (Danesi 2001). A first-order layer is one that is constructed via concrete associations that produce a first-order type of conceptual metaphor, such as
thinking is seeing, which associates thought with the perception of vision. A second-order layer is one that is derived from first-order concepts. Expressions such
as "When did you think that up?" and "Have you thought that through?" are second-order
concepts, since they result from the linkage of two concepts: ideas are viewable objects + ideas are objects that can be extracted, ideas are viewable objects + ideas are
objects that can be scanned, etc. The third-order layer crystallizes from constant
amalgams of previously-formed layers. It is a productive source of cultural symbolism. For example, in order to understand the meaning of the term Enlightenment,
we must first know that it is ultimately traceable to the first-order association of
mind and sight.

1.2.3 Creativity
In the two streams of cognitive science, the formalist and the embodied one (so to
speak), creativity has different definitions. In the former it consists in the ability
to create well-formed strings ad infinitum; in the latter it is a result of what is now
called blending. In Syntactic Structures, Chomsky (1957) compared the goal of linguistics to that of chemistry. A good linguistic theory should be able to generate
all grammatically possible utterances, in the same way that a good chemical
theory might be said to generate all physically possible compounds (Chomsky
1957: 48). A decade later (Chomsky 1966a: 10), he went on to define verbal creativity as the speaker's ability to produce new sentences that are immediately understood by other speakers. For generativists, linguistic creativity unfolds within a
system of rules and rule-making principles that allow for the generation of an infinite class of symbol combinations and permutations with their formal properties.
It should come as no surprise, therefore, to find that anyone who holds this perspective has an affinity for artificial intelligence models and computer algorithms.
Ulrich Neisser (1967: 6) put it as follows just before the advent of cognitive science
as an autonomous area of inquiry:

The task of the psychologist in trying to understand human cognition is analogous to that
of a man trying to discover how a computer has been programmed. In particular, if the program seems to store and reuse information, he would like to know by what routines or
procedures this is done. Given this purpose, he will not care much whether his particular
computer stores information in magnetic cores or in thin films; he wants to understand the
program, not the hardware. By the same token, it would not help the psychologist to know
that memory is carried by RNA as opposed to some other medium. He wants to understand
its utilization, not its incarnation.

However, Neisser was well aware that the computer metaphor, if brought to an
extreme, would actually lead psychology astray. So, only a few pages later he issued the following warning (Neisser 1967: 9): "Unlike men, artificially intelligent
programs tend to be single-minded, undistractable, and unemotional … in my
opinion, none does even remote justice to the complexity of mental processes."
Although attempts have been made to model such creative linguistic acts as
metaphor, the results have never been successful. This is because metaphor is
an exception to the strict rules of syntax, as Lakoff found in his thesis (described
above). When the mind cannot find a conceptual domain for understanding a
new phenomenon, it resorts instinctively to metaphor to help it scan its internal
space in order to make new associations. There is no innovation in science or art
without this capacity. Logic and syntax simply stabilize the rational architecture
of cognition; they do not create new features for it to utilize in some novel way. It should
be mentioned, however, that there are algorithms that can identify metaphorical
language very effectively, such as the one devised by Neuman et al. (2013). And
various programs have been written for generating legitimate metaphors. The
problem of representation is therefore a fairly straightforward one. The difficulties
come at the level of interpretation. When asked what a novel metaphor generated
through a random algorithmic process means, the computer breaks down.
The embodied cognition stream of cognitive science actually complements
the more formalist one, aiming to study the shift from sensory to conceptual
knowledge discussed above. The two streams should not be considered to be bifurcating, but rather converging; a thematic subtext of this book is that all kinds
of approaches to cognition, from the formalist to the highly creative, are relevant
for understanding it. This is the basic meaning of interdisciplinarity: a form of
scientific inquiry that is not based on partisan partnerships, but rather on an
open-minded view of the methods and goals of each scientific epistemology.

1.3 Quantification
One area where linguistics and mathematics certainly converge practically is in
the use of quantification methods and theories. In the case of mathematics, fields
such as statistics and probability theory are branches that have theoretical implications for studying mathematics itself as well as many practical applications
(in science, business, and other fields). In the case of linguistics, quantification is
a tool used to examine specific phenomena, such as statistical and probabilistic
patterns in the evolution of languages, or to flesh out hidden structure in language
artifacts (such as texts) through basic statistical techniques.
A fundamental premise in the quantification research paradigm is that statistical and probabilistic methods allow us to discover and model structure effectively. Modeling is a basic aspect of both the theoretical and computational
approaches to language and mathematics, as discussed above. Architects make
scale models of buildings and other structures, in order to visualize the structural
and aesthetic components of building design, while using quantification techniques as part of the engineering of such structures; scientists utilize computer
models of atomic and sub-atomic phenomena to explore the structure of invisible matter and thus to glean underlying principles of structure (as in quantum
analysis); and so on and so forth.
Another premise is that mathematics is itself fundamentally the science of
quantity. The most basic signs in mathematics are the numbers that stand for
quantitative concepts. The integers, for example, stand for holistic entities, and
these can be enumerated with the different numbers. The study of integers leads
to the discovery of hidden patterns. For example, the sum or product of whole numbers always produces another whole number: 2 + 3 = 5. On the other hand, dividing
whole numbers does not always produce another whole number, because division
is akin to the process of partitioning something. So, 2 divided by 3 will not produce
a whole number. Rather, it produces a partitive number known, of course, as a fraction: 2/3. Various types of number sign systems have been used throughout history
to represent all kinds of quantitative concepts. The connection between the number and its referent, once established, is bidirectional; that is, one implies the
other. The decimal system has prevailed for common use throughout most of the
world because it is an efficient system for everyday number concepts. The binary
system, on the other hand, is better adapted to computer systems, since computers
store data using a simple on-off switch, with 1 representing "on" and 0 "off."
The study of quantitative structure is now a branch of mathematics and linguistics. The three main relevant topics that will interest us in this book are compression, economical structure, and probability structure. These will be discussed
in more detail in the fourth chapter.

1.3.1 Compression
One of the more interesting findings of contemporary cognitive science is that of
compression, or the idea that the emergence of form and meaning comes from the
compression of previous form. Compression can be both modeled and quantified
using basic statistical techniques. As Ball and Bass (2002: 11) point out in the area
of mathematics teaching, understanding compression involves unpacking symbols and concepts:

Looking at teaching as mathematical work highlights some essential features of knowing
mathematics for teaching. One such feature is that mathematical knowledge needs to be
unpacked. This may be a distinctive feature of knowledge for teaching. Consider, in contrast, that a powerful characteristic of mathematics is its capacity to compress information
into abstract and highly usable forms. When ideas are represented in compressed symbolic
form, their structure becomes evident, and new ideas and actions are possible because of
the simplification afforded by the compression and abstraction. Mathematicians rely on this
compression in their work. However, teachers work with mathematics as it is being learned,
which requires a kind of decompression, or unpacking, of ideas.

The contemporary study of linguistic compression has produced many important
results. Foremost among these is the presence of economic tendencies both in
linguistic change and in conversational structure. Branches such as stylometry
and corpus linguistics have emerged to shed light on a whole series of important
findings that relate language to its use, such as, for example, the notion of MLU, or
Mean Length of Utterance. This is the technique of determining the average number of morphemes in sentences, utilizing the following counting procedures (a minimal computational sketch of such a count is given after the list):
1. Repeated words are counted only once.
2. Fillers (um, oh) are not included.
3. Hedges and other kinds of discourse gambits (no, yeah, hi, like, well) are included.
4. Compound words (pocket book) are counted as single elements, as are altered words (doggie, stylish).
5. Verbs are counted as single lexemes and their tense morphology ignored (learning and learned, for instance, are counted once as tokens of learn).
6. Function words (to, a) are ignored.
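Here is that sketch (my own, under simplifying assumptions: the filler and function-word lists contain only the examples given above, and each remaining word is counted as a single morpheme, which rules 4 and 5 are only approximated by):

    FILLERS = {"um", "oh"}              # rule 2: excluded
    FUNCTION_WORDS = {"to", "a"}        # rule 6: excluded

    def morphemes_in_utterance(words):
        """Very rough morpheme count for one utterance (rules 1, 2, and 6 only)."""
        seen = set()
        count = 0
        for w in words:
            w = w.lower()
            if w in FILLERS or w in FUNCTION_WORDS:
                continue
            if w in seen:               # rule 1: repeated words counted once
                continue
            seen.add(w)
            count += 1                  # shortcut: one morpheme per remaining word
        return count

    def mlu(utterances):
        counts = [morphemes_in_utterance(u.split()) for u in utterances]
        return sum(counts) / len(counts)

    sample = ["oh doggie eat", "doggie eat a cookie", "um doggie doggie run"]
    print(round(mlu(sample), 2))        # 2.33: average morphemes per utterance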
One of the obvious uses of this technique is that languages can be compared in
terms of MLU and various hypotheses put forward to account for significant differences or similarities. Are languages with isomorphic MLUs related phylogenetically? Why is MLU variable? Research presenting various sentence lengths to
informants and mapping these against spoken and written texts has also found
that there are optimal sentence lengths in terms of stylistic preferences and registers. As simple as this may seem, it does have implications for describing style,
dialectal variation, and the like in a precise way. It is interesting to note that research has shown that the MLU changes over the life cycle and can also be used to
chart various milestones in the acquisition of language in childhood. Miller (1981)
found that MLU values corresponded to specific ages as follows:
Table 1.1: Mean length of utterance and language development

MLU     Age Equivalent (months)
1.31    18
1.62    21
1.92    24
2.54    30
2.85    33
3.16    36
3.47    39
3.78    42
4.09    45
4.40    45
4.71    51
5.02    54
5.32    57
5.63    60

In a subsequent study, Garton and Pratt (1998) indicate, however, that while there
is a correlation between MLU and age equivalence, it is a weak one. So, at best it
should be used as a generic guide, not as a law of verbal development. Nevertheless, the MLU shows how a simple quantitative notion might be able to shed light
on something intrinsic, such as language acquisition.
One application of the MLU concept is to determine how many morphemes
are used to construct words and sentences, so as to provide a rationale for classifying languages as either agglutinative or isolating, that is, as either morphological
or syntactic, with the latter being much more compressive. As is well known, the
former are languages, such as Turkish, Basque, and a number of indigenous American languages, that use bound morphemes such as suffixes abundantly in the
construction of their words; the latter are languages that tend to form their words
with one morpheme per word. Chinese is an example of an isolating language,
although it too uses affixes, but less frequently than other languages do. The American linguist Joseph Greenberg (1966) introduced the concept of the morphological
index to assess the degree of morphological relation of languages to each other in
terms of mean length of words. The index is derived by taking a representative and
large sample of text, counting the words and morphemes in it, and then dividing
the number of morphemes (M) by the number of words (W):

I = M / W

In a perfectly isolating language, the index will be equal to 1, because there is
a perfect match between the number of words (W) and the number of morphemes (M),
or M = W. In agglutinating languages, M will be greater than W. The greater
it is, the higher the index, and thus the higher the degree of agglutination. The
highest index discovered with this method is 3.72, for the Inuit languages. Interestingly, this method of classifying languages has produced results consistent with
the traditional phylogenetic methods using cognate analysis and sound shifts to
determine language families.
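A minimal sketch of the computation (my own illustration, assuming a pre-segmented sample in which morpheme boundaries are already marked with hyphens; the sample words and their segmentations are only illustrative):

    def morphological_index(segmented_words):
        """Greenberg's index I = M / W on a morpheme-segmented word list."""
        W = len(segmented_words)
        M = sum(len(word.split("-")) for word in segmented_words)
        return M / W

    # An isolating-looking sample: one morpheme per word, so I = 1.0.
    print(morphological_index(["he", "eat", "rice"]))

    # An agglutinating-looking sample: 7 morphemes over 2 words, so I = 3.5.
    print(morphological_index(["ev-ler-im-de", "otur-uyor-um"]))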

1.3.2 Probability
In mathematics, the formal study of quantitative structure came to the forefront
with the calculus and probability theory, both of which showed that quantities
cannot be studied in absolutist terms, but relative to the situation in which they
exist. From this, theories of probability became ever more present in the philosophy of mathematics itself. Probability attempts, in fact, to express in quantifiable terms statements of the form: an event A is more (or less) probable than an
event B. Mathematicians have struggled for centuries to create a theory of probability that would allow them to penetrate what can be called a quantification
principle. This can be defined simply as the extraction of some probability metric
in a set of seemingly random data. In fact, they have developed several related
theories and methods to carry this out. The subjective theory takes probability
as an expression of an individual's own degree of belief in the occurrence of an
event, regardless of its nature. The frequency theory is applied to events that
can be repeated over and over again, independently and under the same exact
conditions.
The study of such phenomena as compression and probabilistic structure
constitutes yet another area of the common ground that connects linguistics and
mathematics. Together with computational modeling, quantification methods
have been showing more and more that there are inherent tendencies in the brain
that manifest themselves in specific ways in representational systems. Unraveling
these tendencies is part of the hermeneutical perspective that interdisciplinarity
entails.

1.4 Neuroscience
We started off this chapter discussing Lakoff's 2011 lecture at the Fields Institute showing how mathematics and language shared a common property: blending. Gödel's famous proof, Lakoff argued, was inspired by Cantor's diagonal method. It was, in his words, a blend of Cantor's method with a new domain. Gödel had shown essentially that within any formal logical system there are results that can be neither proved nor disproved. Lakoff pointed out that Gödel found a statement in a set of statements that could be extracted by going through them in a diagonal fashion, now called Gödel's diagonal lemma. That produced a statement, S, like Cantor's C, that does not exist in the set of statements. The inspiration came, according to Lakoff, through the linguistic process of metaphorization, whereby one domain is associated with another and in the association one finds new ideas. Cantor's diagonalization and one-to-one matching proofs are metaphors: blends between different domains linked in a specific way. This metaphorical insight led Gödel, Lakoff suggested, to imagine three metaphors of his own. The first one, called the Gödel Number of a Symbol, is evident in the argument that a symbol in a system is the corresponding number in the Cantorian one-to-one matching system (whereby any two sets of symbols can be put into a one-to-one relation). The second one, called the Gödel Number of a Symbol in a Sequence, consists in Gödel's demonstration that the nth symbol in a sequence is the nth prime raised to the power of the Gödel Number of the Symbol. And the third one, called Gödel's Central Metaphor, was his proof that a symbol sequence is the product of the Gödel numbers of the symbols in the sequence.
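The arithmetic behind these three steps can be sketched in a few lines of Python. The symbol-to-number coding below is hypothetical, chosen only for illustration; what matters is the mechanism: the nth symbol in a sequence contributes the nth prime raised to the power of that symbol's number, and the Gödel number of the sequence is the product of these factors.

    # Sketch of Gödel numbering as summarized above. The coding of symbols
    # is hypothetical; only the prime-power construction is the point.
    symbol_code = {"0": 1, "=": 2, "S": 3, "(": 4, ")": 5}

    def primes(n):
        # First n primes by trial division (enough for short sequences).
        found = []
        candidate = 2
        while len(found) < n:
            if all(candidate % p for p in found):
                found.append(candidate)
            candidate += 1
        return found

    def goedel_number(sequence):
        g = 1
        for p, symbol in zip(primes(len(sequence)), sequence):
            g *= p ** symbol_code[symbol]
        return g

    print(goedel_number(["0", "=", "0"]))  # 2**1 * 3**2 * 5**1 = 90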
Lakoff concluded by claiming that Gödel's proof exemplifies the process of blending perfectly. A blend is formed when the brain identifies two distinct inputs
(or mental spaces) in different neural regions as the same entity in a third neural
region. But the blend contains more information than the sum of information bits
contained in the two inputs, making it a powerful form of new knowledge (see
Figure 1.12).
The three together constitute the blend, paralleling the process of metaphor precisely: input 1 might correspond to the topic, input 2 to the vehicle, and the
blend to the so-called ground. In the metaphor, That mathematician is a rock,
the two distinct inputs are mathematician (topic) and rock (vehicle). The blending
process is guided by the inference (or what Lakoff calls a conceptual metaphor)
that people are substances, constituting the final touch to the blend, a touch that
keeps the two entities distinct in different neural regions, while identifying them
simultaneously as a single entity in the third. Using conceptual metaphor theory, which will be discussed subsequently, Lakoff suggested that the metaphorical
blend occurs when the entities in the two regions are the source (substances) and target (people).

Figure 1.12: Blending (Input 1 and Input 2 merged into a third space, the Blend)

Gödel's metaphors, analogously, came from neural circuits linking
a number source to a symbol target. In each case, there is a blend, with a single
entity composed of both a number and a symbol sequence. When the symbol sequence is a formal proof, a new mathematical entity appears: a proof number.
The underlying premise in this whole line of theorization is that metaphorical
blends in the brain produce knowledge and insights.
In the end, Lakoff argued that mathematicians and linguists had a common
goal: to study the blending processes that unite mathematics and language.
Chomsky before had also argued for a similar collaboration, but his take on the
kind of approach was (and still is) radically different. Whatever the case, it became obvious by the early 2000s that the area where mathematics and language
can be studied interactively lies within neuroscience. It is therein that formal
theories and blending theories can be assessed and corroborated or eliminated.
We will discuss the different research findings in neuroscience that are making the
investigation of linguistic and mathematical competence truly intriguing in the
final chapter. Here it is sufficient to go through some of the goals of neuroscience
in a prima facie way.

1.4.1 Neural structure


Two basic questions that neuroscience attempts to answer are: (1) whether or not language or mathematics is a species-specific faculty and (2) whether or not this
faculty is innate. Research is showing that while counting may occur in other
species, abstract mathematical knowledge is undoubtedly a special human ability, requiring the use of language, art (for drawing diagrams), and other unique
creative faculties. Stanislas Dehaene (1997) has brought forth persuasive
experimental evidence to suggest that the human brain and that of some chimps come
with a wired-in aptitude for math. The difference in the case of chimps is, apparently, an inability to formalize this knowledge and then use it for invention and
discovery. So, humans and chimps possess a kind of shared number instinct,
according to Dehaene and others, but not number sense. Of course, the study of
language in primates has also revolved around a similar dichotomy: Do primates
possess a language instinct but not a language sense?
Within neuroscience a subfield, called math cognition, has emerged to seek
answers to the innate (Platonic)-versus-constructivist debate in the learning of
mathematics. Brain-scanning experiments have shown that certain areas of the
brain are hard-wired to process numerical patterns, while others are not. So, math
cognition is specific to particular neural structures; it is not distributed modularly
throughout the brain. Moreover, these structures come equipped with number
sense. Dehaene claims that the number line, for instance, is not a construct; it
is an image that is innate and can be seen to manifest itself (differentially, of
course) throughout the world. But anthropological evidence scattered here and
there (Bockarova, Danesi, and Núñez 2012) would argue to the contrary, since in
cultures where the number line does not exist as a tradition, the kinds of calculations and concepts related to it do not appear. Whatever the truth, it is clear that
the neuroscientific study of math cognition is an area of relevance to understanding what mathematics is, how it is learned, and how it varies anthropologically.
The study of the latter is a field known as ethnomathematics. It has been found, for
example, that proof and mathematical discoveries in general seem to be located
in the same neural circuitry that sustains ordinary language and other cognitive
and expressive systems. It is this circuitry that allows us to interpret meaningless
formal logical expressions as talking about themselves.
One of the more significant findings to emerge from neuroscience in general is
the likelihood that the right hemisphere (RH) is a crucial point-of-departure for
processing novel stimuli: that is, for handling input for which there are no preexistent cognitive codes or programs available. In their often-quoted review of a large
body of experimental literature a number of decades ago, Goldberg and Costa
(1981) suggested that the main reason why this is so is the anatomical structure of the RH. Its greater connectivity with other centers in the complex
neuronal pathways of the brain makes it a better distributor of new information.
The left hemisphere (LH), on the other hand, has a more sequentially-organized
neuronal-synaptic structure and, thus, finds it more difficult to assimilate information for which no previous categories exist. If this is indeed the case, then it
suggests that the brain is prepared to interpret new information primarily in terms
of its physical and contextual characteristics. Further work in this area has confirmed this synopsis. This is a relevant finding because the first thoughts about
number (number sense) are likely to be located in the RH of the brain; these are
then given formal status by the LH. This suggests that both hemispheres are involved in a connective form of thinking.
The RH is where the sense impressions that the brain converts into images are
subsequently transformed into concrete percepts. Percepts register our physiological and affective responses to the signals and stimuli present in the environment.
They filter incoming information and assay it for its relevance, discarding from it
all that is deemed to be irrelevant to the task at hand. In this way, bodily sense
is present in all thinking in such a way that it is even more ordered than language and logic. Number sense emerges as a kind of blend from the percepts in
the RH, which are then transferred, as ordered sense, to the LH. Work in neuroscience today seemingly confirms this very simple hypothesis. For example, Semenza et al. (2006) found that mathematical abilities are located and develop in the brain in relation to language, whose acquisition also shows an RH-to-LH flow. The researchers assessed math ability in six right-handed patients affected
by aphasia following a lesion to their non-dominant hemisphere (crossed aphasia) and in two left-handed aphasics with a right-sided lesion. Acalculia (loss of
the ability to execute simple arithmetical operations) was found in all cases, following patterns that had been previously observed in the most common aphasias
resulting from LH lesions. No sign of RH acalculia (acalculia in left lateralized
right-handed subjects) was detected by their study. Overall, the study suggested
that language and calculation share the same hemispheric substratum.
PET and fMRI studies are now confirming that language processing is extremely complex, rather than involving a series of subsystems (phonology, grammar, and so on) located in specific parts of the brain (Broca's area, Wernicke's area, and Penfield's area), and that it parallels how we understand numbers and
space. The neuronal structures involved in language are spread widely throughout
the brain, primarily by neurotransmitters, and it now appears certain that different types of linguistic and computational (arithmetical) tasks activate different
areas of the brain in many sequences and patterns. It has also become apparent
from fMRI research that language and problem-solving are regulated, additionally, by the emotional areas of the brain. The limbic system, which includes portions of the temporal lobes, parts of the hypothalamus and thalamus, and other structures, may have a larger role than previously thought in the processing
of certain kinds of speech and in the emergence of number sense.
Overall, the current research in neuroscience suggests that the brain is a connective organ, with each of its modules (agglomerations of neuronal subsystems
located in specic regions) organized around a particular task. The processing
of visual information, for instance, is not confined to a single region of the RH, although specific areas in the RH are highly active in processing incoming visual
information. Rather, different neural modules are involved in helping the brain
process visual inputs as to their contents; in practice this means retaining from the
information what is relevant, and discarding from it (or ignoring) what is not. Consequently, visual stimuli that carry linguistic information or geometric information (such as diagrams) would be converted by the brain into neuronal activities
that are conducive to strictly logical, not visual, processing. This is what happens
in the case of American Sign Language. The brain first processes the meanings
of visual signs, extracting the grammatical relations in them, in a connected or
distributed fashion throughout the brain (Hickok, Bellugi, and Klima 2001). But
visual stimuli that carry a different kind of information, such as the features of a drawing, are converted instead into neuronal activities that are involved in motor commands for reproducing the drawing. This finding would explain why tonemes
(tones with phonemic value) are not processed by the RH, as is the case for musical
tones. Tone systems serve verbal functions, thus calling into action the LH. Musical tones instead serve emotional (aesthetic) functions, thus calling into action
the RH.
The connectivity that characterizes neural structure has been examined not
only experimentally with human subjects, but also theoretically with computer
software. Computer models of the brain have been designed to test out various
theories, from formalist to blending theories. One of the most cited theories in
computational neuroscience is the so-called Parallel Distributed Processing (PDP)
model. It is designed to show how, potentially, brain networks interconnect with
each other in the processing of information. The PDP model appears to perform
the same kinds of tasks and operations that language and problem-solving do
(MacWhinney 2000). As Obler and Gjerlow (1999: 11) put it, in the strong form
of PDP theory, there are no language centers per se but rather network nodes
that are stimulated; eventually one of these is stimulated enough that it passes a
certain threshold and that node is realized, perhaps as a spoken word.
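A toy illustration of the threshold idea in the quotation can be written in a few lines. It is not a faithful PDP implementation (real models use distributed representations and learned weights), only a sketch of how accumulated activation from several inputs can push one node past a threshold so that it is "realized". The links and weights below are invented for the example.

    # Toy sketch of threshold-based node realization (not a faithful PDP model).
    # Activation spreads along hypothetical weighted links; any node whose
    # accumulated activation passes the threshold counts as "realized".
    weights = {
        ("hear:/kat/", "word:cat"): 0.6,
        ("see:whiskers", "word:cat"): 0.5,
        ("see:whiskers", "word:dog"): 0.1,
    }
    THRESHOLD = 1.0

    def realized_nodes(active_inputs):
        activation = {}
        for source in active_inputs:
            for (src, dst), w in weights.items():
                if src == source:
                    activation[dst] = activation.get(dst, 0.0) + w
        return [node for node, a in activation.items() if a >= THRESHOLD]

    print(realized_nodes(["hear:/kat/", "see:whiskers"]))  # ['word:cat']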
The integration of RH and LH functions to produce language and mathematics
is now a virtual law of neuroscience. Investigating such phenomena as blending has, in fact, become a primary research target, since it provides a theoretical
framework for how we form and understand complex ideas via the interconnectivity of modules in separate neural pathways that are activated in tandem. The
specific branch of neuroscience that studies these phenomena is known as cognitive neuroscience. Methods employed in this branch include experimental studies
with brain-damaged subjects, neuroimaging studies, and computer modeling research on neural processes.
The relevant issues pertaining to the common ground of language and mathematics that cognitive neuroscience is now investigating are the following:

1. Are all numbers and words blends?
2. Are the same hemispheric structures that produce word sense involved in producing number sense?
3. What differentiates number and word blends?
4. What role does metaphor play in the construction of linguistic and mathematical concepts?
5. Is mathematics an independent faculty or is it a complementary faculty to language?
6. Are rules of grammar embedded in modules or are they the result of some integrated neural circuitry?
As mentioned, some of these questions will be discussed in the last chapter. The
point here is that the growth of neuroscience as a major branch of cognitive science is bringing a more empirical stance to the study of linguistic and mathematical theories. As a point of departure, it is worthwhile to annotate here the type of research being conducted in cognitive neuroscience. Kammerer (2014) found a
direct link between words (semantically and lexically) and grammar, thus providing contrary evidence to the view of some generativists that meaning and grammar are independent modules in the brain. It has been found that basic nouns
referring to a category (cat rather than feline) rely primarily on the ventral temporal lobes, which represent the shape features of entities; in contrast, basic verbs
(which involve the predication of actions) rely primarily on posterior middle temporal and fronto-parietal regions, both of which involve the visual motion features
and somatomotor features of events. Many word classes involve remarkably close
correspondences between grammar and meaning and hence are highly relevant
to the neuroscientific study of conceptual knowledge.
Moseley and Pulvermüller (2014) present findings that are also critical of some
generativist claims, such as the one that lexical category and semantic meaning are separate phenomena. Abstract words are a critical test case: dissociations
along lexical-grammatical lines would support models that posit lexical category as a basic principle governing brain organization, whereas semantic models predict dissociations among concrete words but not among abstract items. During fMRI
scanning, subjects read orthogonalized word categories of nouns and verbs, with
or without concrete, sensorimotor meaning. Analysis of inferior frontal, precentral and central areas revealed an interaction between lexical class and semantic
factors with category differences between concrete nouns and verbs but not abstract ones. Though the brain stores the combinatorial and lexical-grammatical
properties of words, the findings showed that locational differences in brain activation, especially in the motor system and inferior frontal cortex, are driven by
semantics and not by lexical class.


Libertus, Pruitt, Woldorff, and Brannon (2009) presented 7-month-old infants
with familiar and novel number concepts while electroencephalogram measures
of their brain activity were recorded. The resulting data provided convergent evidence that the brains of infants can detect numerical novelty. Alpha-band and
theta-band oscillations both differed for novel and familiar numerical values. The
findings thus provide hard evidence that numerical discrimination in infancy is
ratio dependent, indicating the continuity of cognitive processes over development. These results are also consistent with the idea that networks in the frontal
and parietal areas support ratio-dependent number discrimination in the first
year of human life, consistent with what has been reported in neuroimaging studies in adults and older children.

1.4.2 Blending
As Whiteley (2012) has cogently argued, of all the models investigated by cognitive neuroscientists, the most promising one for getting at the core of the neural
continuity between mathematics and language is blending theory. The first elaborate discussion of this theory is by Fauconnier and Turner (2002). The best way to make the case for why blending may be a promising line of inquiry for neuroscience to pursue is to take a step back and review conceptual metaphor theory
(CMT) schematically here.
CMT subdivides figurative language into linguistic and conceptual. The former is a single metaphorical utterance; the latter a mental schema from which the single metaphor derives. In other words, a specific linguistic metaphor is a token of a type (a conceptual metaphor). For instance, He's a real snake is a token of
people are animals. Using this distinction, in 1980 George Lakoff and Mark Johnson meticulously illustrated the presence of conceptual metaphors in everyday
speech forms, thus disavowing the mainstream view at the time that metaphorical
utterances were alternatives to literal ways of speaking or even exceptional categories of language, a topic that, as we saw above, Lakoff had himself addressed
in his doctoral thesis. According to the traditional account of discourse, an individual would purportedly try out a literal interpretation first when he or she hears
a sentence, choosing a metaphorical one only when a literal interpretation is not
possible from the context. But as Lakoff and Johnson convincingly argued, if this
is indeed the case, then it is so because people no longer realize that most of their
sentences are based on metaphorical inferences and nuances. Moreover, many
sentences are interpreted primarily in a metaphorical way, no matter what their
true meaning. When a sentence such as The murderer was an animal is uttered,
almost everyone will interpret it as a metaphorical statement. Only if told that the
animal was a real animal (a tiger, a bear, and so on), is the sentence given a
literal interpretation.
A critical finding of early CMT research concerned so-called nonsense or anomalous strings. It was Chomsky (1957) who first used such strings, for example, Colorless green ideas sleep furiously, to argue that the syntactic rules of a language were independent from the semantic rules. Such strings have the structure of real sentences because they consist of real English words put together in a syntactically-appropriate fashion. They meet the logical criterion of well-formedness. This forces us to interpret the string as a legitimate, but meaningless, sentence, a fact which suggests that we process meaning separately from syntax.
Of course, what Chomsky ignored is that although we do not extract literal meaning from such strings, we are certainly inclined to extract metaphorical meaning
from them. When subjects were asked to interpret them in follow-up research,
they invariably came up with metaphorical meanings for them (Pollio and Burns
1977, Pollio and Smith 1979, Connor and Kogan 1980). This finding suggests, therefore, that we are inclined, by default, to glean metaphorical meaning from any
well-formed string of words, and that literal meaning is probably the exception.
As Winner (1982: 253) has aptly put it, if people were limited to strictly literal
language, communication would be severely curtailed, if not terminated.
Another early finding of CMT is that metaphor implies a specific type of
mental imagery. In 1975, for instance, Billow found that a metaphor such as The
branch of the tree was her pony invariably was pictured by his child subjects in
terms of a girl riding a tree branch. Since the use of picture prompts did not significantly improve the imaging process or the time required to interpret metaphors,
Billow concluded that metaphors were already high in imagery-content and,
consequently, needed no prompts to enhance their interpretation. Incidentally,
visually-impaired people possess the same kind of imagery-content as do visually normal people. The fascinating work of Kennedy (1984, 1993; Kennedy and
Domander 1986) has shown that even congenitally blind people are capable of
making appropriate line drawings of metaphorical concepts if they are given
suitable contexts and prompts.
A conceptual metaphor results from a neural blend. In the linguistic metaphor
The professor is a bear, the professor and the bear are amalgamated by the conceptual metaphor people are animals. Each of the two parts is called a domain:
people is the target domain because it is the general topic itself (the target of
the conceptual metaphor); and animals is the source domain because it represents the class of vehicles, called the lexical field, that delivers the metaphor (the
source of the metaphorical concept). Using the Lakoff-Johnson model, it is now
easy to identify the presence of conceptual metaphors not only in language, but
also in mathematics. The number line is a good example of what this entails. In
this case, the target domain is number and the source domain is linearity. The
latter comes presumably from the fact that we read numerals from left to right or
in some languages, vice versa. So, the line is a blend of two input domains leading to a new way of understanding number and of representing it (see Figure 1.12
above). Thus the notion of number sense is relevant and interpretable only on the
basis of specific cultural experience and knowledge. That is, only in cultures that
use Euclidean geometry is it possible to make a general inference between geometrical objects such as lines and numerical ideas. Thus, conceptual metaphors
are not just extrapolations; they derive from historical, cultural, social emphases,
experiences, and discourse practices.
What does talking about number as a figment of linearity imply? It means
that we actually count and organize counting in this way. In a phrase, the conceptual metaphor both mirrors and then subsequently structures the actions we
perform when we count. First, it reveals how the blend occurred; and second,
it then guides future activity in this domain of sense-making. For this reason,
the number line has become a source of further mathematics, leading to more
complex blends and thus producing emergent structure regularly. The number
line results from blending experiences (inputs) to further conceptual abstractions,
permitting us not only to recognize patterns within them, but also to anticipate
their consequences and to make new inferences and deductions. Thus, blending
theory suggests that the source domains (inputs) enlisted in delivering an abstract
target domain were not chosen originally in an arbitrary fashion, but derived from
the experience of events and, of course, from the subjective creativity of individuals who use domains creatively and associatively.
CMT has led to many findings about the connectivity among language and mathematics, culture, and knowledge (Lakoff and Núñez 2000). Above all else, it has shown that figurative cognition shows up not only in language but in other
systems as well. Lakoff himself has always been aware of this level of connectivity,
writing as follows: metaphors can be made real in less obvious ways as well, in
physical symptoms, social institutions, social practices, laws, and even foreign
policy and forms of discourse and of history (Lakoff 2012: 163–164).

1.5 Common ground


The language-mathematics interface has been subdivided into various areas of
common ground research and theory-making in this book (formalist, computationist, quantitative-probabilistic, and neuroscientific), each of which will be surveyed in the remaining chapters. Needless to say, there are more areas than this simple categorization allows. But the objective here is to give a generic
overview, not an in-depth description and assessment of all the many applications and connections between the two disciplines. My goal is to show how this
collaborative paradigm (often an unwitting one) has largely informed linguistic
theory historically and, in a less substantive way, how it is starting to show the nature of mathematical cognition as interconnected with linguistic cognition. The
comparative study of mathematics-as-language and language-as-mathematics
gained momentum with Lakoff and Núñez's (2000) key book and with work
in the neurosciences showing similar processing mechanisms in language and
mathematics.
As mentioned, the interface lays the groundwork for formulating specific hermeneutical questions and conceptualizations about the nature of mathematics vis-à-vis language. Neuroscience enters the hermeneutical terrain by shedding
light on what happens in the brain as these conceptualizations are manipulated
in some way.
The primary task of any scientific or critical hermeneutics is to explain how
and why phenomena are the way they are by means of theories, commentaries,
annotations, and, as new facts emerge or are collected about the relevant phenomena, to subsequently adjust, modify, or even discard them on the basis of
the new information. The ultimate goal of science is to explain what Aristotle
called the final causes of reality. To flesh these out in mathematics and language, specific interdisciplinary rubrics present themselves as highly suggestive. Linguistics studies the final causes that constitute the phenomenon of language and mathematics the final causes that constitute math cognition. Whether one
adopts a formalist or functionalist analytical framework, the role of both sciences
is to uncover laws of structure and meaning that undergird the systems under
study. Linguists use mathematics also in specific ways, from computer to quantitative modeling. Vice versa, the mathematician can look to linguistic theories to
determine the degree of relationship between mathematical and linguistic structure. The balance tilts much more to the linguistics-using-math side than the
math-using-linguistics side. But the work in CMT and blending theory is changing all this and starting to instill a veritable equilibrium of research objectives and
theoretical modeling that finds its fulcrum in the neurosciences.


2 Logic
Logic will get you from A to B. Imagination will take you everywhere.
Albert Einstein (1879–1955)

Introductory remarks
Formal linguistics and mathematics focus on the rules, rule types, and rule-making principles that undergird the formation of forms (words, digits, sentences,
equations). Both have developed very precise methods to describe the relevant
apparatus of rules and their operations. An obvious question is what similarities
or differences exist between the two. As we saw in the previous chapter, formalist
approaches have actually revealed many similarities traceable to a common foundation in logic. As a matter of fact, formal linguistics implies formal mathematics, thus uniting the two disciplines, ipso facto, at least at the level of the study of
rules. If the focus is on the latter, then indeed formalism is of some value; if it
is deemed to be an overt or indirect theory of mind, as for example UG theory,
then its value is diminished, unless the theory can be validated empirically. This
chapter will look more closely at the main techniques and premises that underlie
both formal mathematics and formal linguistics, as well as at the main critiques
that can be (and have been) leveled at them.
Language and mathematics were thought in antiquity to share a common
ground in lógos, which meant both word and thought. The main manifestation of this mental feature was in logic and this, in turn, was the basis of linguistic
grammars and mathematical proofs. As we saw, Aristotle and Dionysius Thrax,
envisioned language as a logically structured system of grammatical rules of sentence formation (Bäck 2000, Kempe 1986), in a way analogous to how Pythagoras
and Euclid envisioned mathematical proofs as a set of statements that followed
from each other logically. The term lógos emerged in the sixth century BCE with the philosopher Heraclitus, who defined it as a divine power that produced order in the flux of Nature. Through the faculty of logic, all human beings, he suggested, shared this power. The Greeks thus came to see logic as a unique intellectual endowment allowing humans to transform intuitive and practical observations about the world into general principles. They separated lógos from mythos (discussed below). So, the starting point for a comparative study of formal mathematics and linguistics is a discussion of logic. For this purpose, it can be defined simply (and restrictively) as a faculty of the mind that leads to understanding through reflection and ordered organization of information.


This emphasis on logical method in the study of geometry and grammar
persisted throughout the medieval, Renaissance, and Enlightenment eras, culminating in the development of formal or propositional logic as a mathematical-philosophical system in the nineteenth century. The generative movement, as
discussed, adopted the basic principles and methods of this system to describe
the structure of language grammars and the language faculty (the UG). This
approach was called, and continues to be called generally and appropriately,
mathematical linguistics (Partee, Meulen, and Wall 1999). Its primary aim is to
unravel the logical structure of natural language using the tools of formal mathematics, such as propositional logic and set theory (Kornai 2008). Logical structure
implies rules and, thus, from the outset, generative grammar sought to identify
the rule-making principles that constituted linguistic competence. Of course,
this is a reductive characterization of mathematical linguistics (ML), which has
a broader purview than this, including the study of the statistical structure of
texts. Nevertheless, to this day, its main thrust is to study rule systems which, it
is claimed, reveal how the language faculty works.
The focus in both formal mathematics and formal linguistics is, to reiterate,
on logical (propositional) form, rather than on meaning in the broader social and
psychological sense of that word. Meaning is relegated to the margins or else to its
study in other disciplinary domains (from philosophy to semiotics). Formalism is
essentially symbol game playing, as Colyvan (2012: 4) aptly characterizes it:
In its purest form, formalism is the view that mathematics is nothing more than the manipulation of meaningless symbols. So-called game formalism is the view that mathematics is
much like chess. The pieces of a chess set do not represent anything; they are just meaningless pieces of wood, metal, or whatever; defined by the rules that govern the legal moves that
they can participate in. According to game formalism, mathematics is like this. The mathematical symbols are nothing more than pieces in a game and can be manipulated according
to the rules. So, for example, elementary calculus may tell us that d(ax² + bx + c)/dx = 2ax + b.
This is taken by formalism to mean that the right-hand side of the equation can be reached
by a series of legal mathematical moves from the left-hand side. As a result of this, in future mathematical games one is licensed to replace the symbols d(ax² + bx + c)/dx with
the symbols 2ax + b. That too becomes a legal move in the game of mathematics. There are
more sophisticated versions of formalism, but that's the basic idea. There is a question about
whether the pieces of the game are the actual mathematical symbol tokens, or whether it
is the symbol types. That is, is one written instance of a symbol different from, or the same as, another instance of it?
They are two different tokens of the same type. Formalists need to decide where they stand
on this and other such issues. Different answers give rise to different versions of formalism.

It is interesting to note that Saussure (1916) used the analogy of the game of
chess in basically the same way to distinguish between formal linguistic structure (langue) and its uses (parole). Studying the actual uses in themselves is
impracticable, since they are unpredictable (parole); but the system that permits
them is not (langue). Getting at that system is the goal of linguistics, according
to Saussure and the later formalists. Moreover, since rules of grammar or rules
of proof are developed to organize relevant information by showing relations
among the parts within it, it is a small step to the belief that they mirror the laws
of thought. In showing how the moves literally move about, the ultimate goal
is to understand how the mind plays the game of language or mathematics, so
to speak. But, as Colyvan puts it, because there are different versions of how the
game can be played, what the rule-makers end up doing is arguing over the nature
of the rules, losing sight of the original goal: unraveling the raison d'être of forms
and their connection to logic in its fundamental sense of reasoning from facts and
thus of systematic organization of relevant information.
Formal linguistic and mathematical theories have always had a basis in logic.
Pāṇini, as we saw, described the Sanskrit language with a set of about 4,000 rules,
showing that many words were made up of smaller bits and pieces, which recur
in the formation of other words and thus are intrinsic parts of the grammar of
a language. Modern-day formalism also sees its objectives essentially in this way, as a study of how to identify the rules that describe grammar. The primary
goal is thus to come up with a set of consistent and complete rules that hold together logically. By studying these rules the assumption is that we are holding up
a mirror to the brain.
As a preliminary observation, it should be mentioned that the debates around
formalist approaches and the more central one of what they suggest in real (brain-based) terms have subsided somewhat today. The reason for this is that formalism
has become bogged down with strangling complexity (some of which we will see
in this chapter). After a very productive period (from the turn of the twentieth century to about the late 1990s), very little progress has been made since the start
of the new millennium in defining the types of rules and their logical properties
required to adequately describe language or mathematics. For this reason, many
linguists and mathematicians have apparently become tired of this line of inquiry.
Moreover, the cognitive linguistic movement, spearheaded by George Lakoff and
starting in the early 1980s (Lakoff and Johnson 1980), came forward to show that
we cannot separate meaning from the game, because if we do the game literally
has no meaning at all. Linguistics and mathematics have thus moved on somewhat, having become more and more interested in studying language and mathematics directly through the lens of meaning, seeing formalist games, by and large,
as adjuncts to this central interest. Nevertheless, the formalist episode in both
disciplines has been a very productive and insightful one, and continues to be so
with the advent of computer science and artificial intelligence (chapter 3).


2.1 Formal mathematics


The starting point for discussing the role of logic in mathematics is Pythagoras,
who is credited with devising the first logical analyses of numbers and geometric shapes. He and his followers collected a disjointed set of practical facts that were known to builders and engineers, inserting them into a coherent and powerful theoretical system of knowledge, and proving them to be consistent facts with the
methods of proof. The Pythagoreans thought that by unraveling the hidden laws
of number through proof, they would be simultaneously discovering the hidden
laws of Nature. In sum, the Pythagoreans took practical mathematical know-how
(epistemic knowledge)measuring and countingand turned it into a theoretical
knowledge (theorems and propositions) through logical demonstration (gnosis).
Pythagoras did not, however, associate gnosis of mathematics with gnosis of language, and thus did not ground both systems in a common source of logical thought. It was Plato who did so. This is perhaps why he laid out an educational system divided into two components: the Trivium, which included the study of grammar, logic, and
rhetoric, and the Quadrivium, which included arithmetic, geometry, music, and
astronomy. Significantly, mathematic (in the singular) was not included in Plato's educational scheme for a simple reason: it did not yet exist as an autonomous
discipline, until Euclid, who took a major step in that direction by developing a
broad apparatus of proof to establish arithmetical and geometrical truths. Euclid
also sequenced key ideas, for example, placing planar geometry prior to solid geometry, thus setting the stage for a unified, well-structured discipline. This was the first attempt at connecting logical method with mathematics in a systematic,
formal way.
Actually, Euclid's approach was still not called mathematics (in the plural). The final unification of the various truths and proof methods of ancient arithmetic
and geometry into a comprehensive discipline had to await Descartes (1637) who
brought together arithmetic, analysis, and geometry through the ingenious idea of
linking number and shape by means of a coordinate system. That extraordinary
event marked the beginning of the modern-day system of knowledge that we
know as mathematics, affording a means to collect and analyze not just knowledge about numbers, shapes, and logical arguments, but also a host of other
phenomena that we now understand as properly mathematical. The shift was
signaled linguistically with a move from the singular mathematic to the plural
mathematics.


2.1.1 Lógos and mythos


For the Greeks, lógos was the force behind logic and this, in turn, was the guiding force in the invention of the methods of proof to show how propositions are related to each other within a system and whether or not they are valid. Intuition and guesswork were relegated to another type of thought, which they called mythos. The word was first used by Aristotle to describe the plot sequence structure of tragedies. Like lógos, it also meant word, but it referred to a different kind of thought,
involving phenomena that are not real in the same sense that the facts of, say,
arithmetic or geometry, are. Mythos was the language of narratives such as myths
that were used to explain phenomena in imaginary and creative ways. What the
Greeks ignored was that mythos is also a form of knowledge-making, based on
experiences that fall outside known explanations or scientific paradigms.
Lógos thus came to assume a privileged place in Greek philosophy and mathematics early on. It was the Ionian School of Greece that took the first radical step away from mythos toward lógos. But mythos did not disappear from either philosophy or mathematics, playing a silent role in the two fields, since both made reference to and even adopted mythical themes and ideas. Mythos in its non-narrative
sense was thus seen as the use of intuition and sensory knowledge to explain
unknown phenomena and lógos as the use of reflection and reason to establish
truths. Discoveries in mathematics were actually seen as the result of mythos and
lógos interacting in various ways. The Pythagoreans portrayed numbers as both
signs of quantity and signs of destiny or some other mythic notion. Pythagoras
believed that numeration (lógos) and numerology (mythos) were intrinsically interrelated. But many others discarded mythos completely, as pure speculation and part of rituals and theater, not mathematics and philosophy. Democritus, who formulated the first atomic theory of matter, reduced the sensory qualities of things,
such as warmth, cold, taste, and odor, to quantitative differences among their
atomic properties. For Democritus, all aspects of existence can be explained with
the logic of physical laws.
Socrates believed that lógos was innate in all human beings, teaching that
individuals had full knowledge of truth within them, and that this could be
accessed through conscious logical reection or elicited through dialogue. He
demonstrated that even an untutored slave could be led to grasp the Pythagorean
theorem through a form of dialogue that induced the slave to reflect upon the
truths hidden within him. Socrates also stressed the need to analyze beliefs
critically and rationally (so that myths would not be construed as truths), to
formulate clear definitions of basic concepts, and to approach ethical problems
sensibly and analytically. From this tradition, Aristotle established the syllogism
as a technique in logical analysis (as discussed briefly in the previous chapter).


Aristotle especially criticized the use of mythos in mathematics as meaningless.
He also disapproved of Plato's separation of form from matter, maintaining that
forms are contained within the concrete objects that exemplify them. The aim of
philosophy and mathematics is to define the observable forms of reality and to
classify them logically.
The Greek belief that logic could explain reality remained a cornerstone of
Roman philosophy. But mythos resurfaced as a mode of reasoning in the early
medieval Christian world, when very little progress occurred in mathematics,
not only because of the dominance of mythos, but perhaps because scholars
in that era had no access to the ancient texts and thus to their mathematical
demonstrations and notions. So, mathematics remained relatively dormant until
Fibonacci revived it in his Liber Abaci, which showed the power of the decimal
system for both theoretical and practical mathematical purposes. Combined with
the translation of Al-Khwarizmi's treatise on algebra, mathematics started to gain
momentum again as a theoretical discipline. By the Renaissance, it became a
major school discipline and a critical tool in new scientific investigations, such as
those of Galileo. Galileo solved the problems of physics with simple mathematical descriptions, opening the way for the emergence of modern mathematical
physics.
At first it was Aristotelian and Platonic philosophy that came to the forefront, primarily because of the efforts of the Florentine intellectual, Marsilio Ficino, who translated Plato's writings into Latin. The Renaissance spawned and encouraged
a new, freer mood of debate. Ironically, from this new fertile intellectual terrain
came the first major break with Platonic-Aristotelian philosophy. It was the English philosopher and statesman, Francis Bacon, who persuasively criticized it on the grounds that it was futile for the discovery of physical laws. He called for a scientific method based on observation and experimentation. Paradoxically, both Bacon's and Galileo's emphasis on empirical observations generalized as mathematical truths led, by the late Renaissance, to the entrenchment of Aristotle's idea
that a meaningful understanding of reality could be gained only by exact logical
thinking. By the seventeenth and eighteenth centuries this very same idea was
extended to the study of the mind. Philosophers like Hobbes, Descartes, Spinoza,
and Leibniz assumed that the mind could, and should, be studied by comparing
it to the laws of logic inherent in mathematical structures. For Hobbes (1656), in
fact, everything could be explained with the laws of arithmetic.
By the Enlightenment, the methods of logic and mathematics became even
more intertwined, remaining so to this day. Intuition was relegated to the margins,
although Kant (1790) suggested that intuition was essentially a priori reflection.
He did not, however, explain the link between the two. Hegel (1807) argued that
intuition and experience were not cast in stone, but that they varied widely from
person to person, and there existed a rational logic in all humans that eventually had supremacy in governing human actions. Marx (1953) developed Hegel's
philosophy into the theory of dialectical materialism by which he claimed that
human history (destiny) unfolded according to unconscious physical laws that
led to inevitable outcomes. On the other side of the debate, Nietzsche (1979) saw
intuition, self-assertion, and passion as the only meaningful human attributes,
with logic and reason being mere illusory constructs. Peirce (1931) developed a
comprehensive system of thought that emphasized the biological and social basis
of knowledge, as well as the instrumental character of ideas, thus uniting intuition and reason. Husserl (1970) stressed the experiential-sensory basis of human
thinking. For Husserl, only that which was present to consciousness was real. His
theoretical framework came to be known as phenomenology which has, since his
times, come to be a strong movement in psychology and philosophy dedicated to
describing the structures of experience as they present themselves to consciousness, without recourse to any theoretical or explanatory framework.
It is not surprising that many of the philosophers of the above eras were
also mathematicians, reflecting the common origins of both forms of inquiry in Ancient Greece. The Greeks, like many of these scholars, saw logic as the link between philosophy and mathematics. To understand mathematics therefore, one had to study the nature of logic. They divided logic into two main categories:
induction and deduction. The former involves reaching a general conclusion from
observing a recurring pattern; the latter involves reasoning about the consistency
or concurrence of a pattern. Induction is generalization-by-extrapolation; deduction is, instead, generalization-by-demonstration. They were, of course,
aware that there were other types of logic (as we shall see), but they argued that
induction and deduction were particularly apt in explaining mathematical truths.

2.1.2 Proof
The starting point for the development of any system of proof is a set of axioms
and postulates that are assumed to be self-evident. If A = B and A = C, then we
can confidently conclude that B = C by the axiom of equality: things equal to
the same thing are equal to each other. The axiom states something that we know
intuitively about the world. It has an inherent logical sense that needs no further
elaboration or explanation. Axioms, like those of Euclid and Peano in the previous
chapter, are common sensical in this way. Now, these can be used to carry out a
proof in arithmetic or geometry, which is essentially a set of statements (some of
which are axioms, some of which are previously proved theorems, and so on) that
are connected to each other by entailment. The sequential order of the parts in
the set leads to a conclusion that is inescapable, much like Aristotle's syllogisms. Needless to say, self-evident notions may not always be self-evident, as we saw with Euclid's fifth axiom. And, as some research in anthropology has shown, the
concept of axiom itself may not be universal, as the Greeks assumed (see relevant
articles in Kronenfeld, Bennardo, and de Munck 2011). Work in the pedagogy of
mathematics across the world has shown, moreover, that the methods of proof
and their foundation on axioms are not found everywhere.
As Colyvan (2012: 5) remarks, the basic idea is that mathematical truths can,
in some sense, be reduced to truths about logic. This entails several related assumptions or corollaries. One of these is that human thinking is not random, but
structured logically, that is, the components of thought are connected to each other in a systematic way, as mirrored in a syllogism. It is in the domain of proof-making that we can observe how logic works. Another common belief is that the
rules of logic written by mathematicians are real in the sense that they accurately
represent the mental logic involved and thus, as Boole (1854) put it, the laws of
thought. So, studying proofs is studying the laws of logic in actu and, by extension, the laws of thought.
Let us take a classic proposition that the number of degrees in a triangle is
180° as a case in point of how proof unfolds. If one measures the sum of degrees in hundreds or thousands of triangles, one will find that they add up to 180° (giving some leeway for measurement errors). But we cannot be certain that this is
always the case. So, we put it forth as a proposition to be proved. If the proof is
successful it would turn the proposition into a theorem that allows us to use the
fact that 180° is the sum for all triangles in subsequent proofs. First, a triangle is
constructed with the base extended and a line parallel to the base going through
its top vertex (A). The angles at the other vertices are labeled with B and C, as
shown below:

Figure 2.1: Part 1 of the proof that the sum of the angles in a triangle is 180°

Now we can use a previously proved theorem of plane geometry, namely that the
angles on opposite sides of a transversal are equal. In the diagram above, both
AB and AC are transversals (in addition to being sides of the triangle). We use the
previous theorem to label the equal angles with the same letters x and y:

Figure 2.2: Part 2 of the proof that the sum of the angles in a triangle is 180°

Now, we can use another established fact to show that the angles inside the triangle add up to 180°, namely, that a straight line is an angle of 180°. To do this, we
label the remaining angle at the vertex A as z:

Figure 2.3: Part 3 of the proof that the sum of the angles in a triangle is 180°

We can now see that the sum of the angles at A is x + y + z. Since these make up
a straight line, we assert that x + y + z = 180° by the axiom of equality. Next, we look at the angles within the triangle and notice that the sum of these, too, add up to x + y + z. Since we know that this sum is equal to 180°, we have, again by virtue of the axiom of equality, proved that the sum of the angles in the triangle is 180°. Since the triangle chosen was a general one, because x, y, and z can take on any value we desire (less than 180° of course), we have proved the proposition
true for all triangles. This generalization-by-demonstration process is the sum and
substance of deductive thinking.
It is relevant to note here that the proof applies to two-dimensional triangles.
As discussed in the previous chapter, the mathematics changes for triangles in
higher dimensions, a fact that was actually established by the so-called Gauss-Bonnet theorem applied to n-dimensional Riemannian manifolds.
This proof is deductive. The relevant feature about deduction is the way in
which the various parts are put together sequentially, much like the sentences in
a coherent verbal text, and how each move from one part to the next has sequitur or entailment structure, that is, the choice of the moves is not random; it
is based on how each move derives from the previous one logically in sequence. It
is the coherence that leads us to accepting the conclusion (theorem) as being necessarily so. In the development of the proof, previously-proved theorems, axioms,
or established facts were used. This is analogous to the semiotic notion of intertextuality, whereby one text (in this case the proof at hand) alludes to, or entails,
other texts (already-proved theorems and established facts). This, in turn, implies
associative thinking, not strict deductive thinking, whereby the solver brings in
information from outside the text that has bearing on the text.
As the Greeks found out early on, not all propositions can be proved by deduction. Some require induction. Consider the following well-known proposition:
to develop a formula for the number of degrees in any polygon. Let's consider a triangle first, the polygon with the least number of sides. The sum of the angles in a triangle is 180°. Next, let's consider any quadrilateral, which can be divided
into two triangles. By doing this, we discover that the sum of the angles in the
quadrilateral is equivalent to the sum of the angles in the two triangles, namely
180° + 180° = 360°. The pentagon can be divided into three triangles and thus the sum of its angles is equal to the sum of the angles in the three triangles: 180° + 180° + 180° = 540°.
Continuing on in this way, we will find that the sum of the angles in a hexagon is equal to the sum of the angles in four triangles, in a heptagon to the sum of
the angles in five triangles, and so on. Since any polygon can be segmented into constituent triangles, we have uncovered a pattern: the number of triangles that
can be drawn in any polygon is two less than the number of sides that make
up the polygon. For example, in a quadrilateral we can draw two triangles, which
is two less than the number of its sides (4), or (4 − 2); in a pentagon, we can draw three triangles, which is, again, two less than the number of its sides (5), or (5 − 2); and so on. In the case of a triangle, this rule also applies, since we can draw in it one and only one triangle (itself). This is also two less than the number of its sides (3), or (3 − 2). We can continue the same reasoning process as far as our energy will permit us and we will not find any exception to this pattern. So, we can conclude that in an n-gon we can draw (n − 2) triangles. Since we know that there are 180° in a triangle, then there will be (n − 2) × 180° in an n-gon.
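The pattern can be stated as a one-line function and checked against the triangle-counting argument for the first few polygons; the sketch below is only a numerical restatement of the result just derived.

    # Interior-angle sum of an n-gon via the triangle-counting argument:
    # an n-gon dissects into (n - 2) triangles of 180 degrees each.
    def interior_angle_sum(n):
        if n < 3:
            raise ValueError("a polygon needs at least 3 sides")
        return (n - 2) * 180

    for n, name in [(3, "triangle"), (4, "quadrilateral"), (5, "pentagon"), (6, "hexagon")]:
        print(name, interior_angle_sum(n))  # 180, 360, 540, 720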
What if we do come across an exception? The answer goes somewhat as follows. Each experiment (segmenting a polygon into internal triangles) builds into
the next, moving from simple to increasingly more complex figures, but all connected by a structural principle (polygons can be dissected into triangles). Induction allows us, therefore, to discover a hidden principle or pattern by performing various experiments on mathematical objects in order to flesh the principle out.
Does the experiment come to an end? It need not, because the proof is based on the logical principle that if it applies to the nth case and then to the (n + 1)th case, the one right after it, it will establish the pattern without exception.
This is the underlying meta-principle of induction. To see how it works formally, consider the formula for summing the first n positive integers:
Sum(n) = n(n + 1)/2


We start by showing that the formula works for the first case, that is, for n = 1:
Sum(n) = n(n + 1)/2
Sum(1) = 1(1 + 1)/2 = 1
Sum(1) = 1(2)/2 = 1
Sum(1) = 2/2 = 1
The next step is to show that the formula works for the sum of (n + 1) terms:
Sum(n) = n(n + 1)/2
Sum(n+1) = Sum(n) + (n + 1)
Sum(n+1) = n(n + 1)/2 + (n + 1)
Sum(n+1) = n(n + 1)/2 + 2(n + 1)/2
Sum(n+1) = (n + 1)(n + 2)/2
Sum(n+1) = (n + 1)[(n + 1) + 1]/2
The form of the last formula is identical to the form of the one for Sum(n) . This can
be seen more readily by letting (n + 1) = m:
Sum(n+1) = (n + 1)[(n + 1) + 1]/2
Sum(m) = m(m + 1)/2
In this way, we have just shown that if the formula is true for n, it is also true for (n + 1). Since it
holds for the first case, and each case carries over to the next, we have proved that the formula
applies to the sum of any number of terms.
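The same claim can also be probed numerically for as many cases as one's patience allows; a small sketch (again in Python, for illustration only) compares the closed formula with a running total:

running_total = 0
for n in range(1, 1001):
    running_total += n                         # 1 + 2 + ... + n computed directly
    assert running_total == n * (n + 1) // 2   # the closed formula Sum(n)
print("Formula agrees with direct summation for n = 1 to 1000")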
Proof by induction can be compared to the domino effect, whereby a row of
dominoes will fall in succession if the first one is knocked over. If the (n + 1)th
domino falls, then we can be sure that the (n + 2)th will as well, and so on ad
infinitum. Again, the demonstration convinces us because the assumption is that
logical structure is like a game whose moves, in this case, are seen to
go on forever. Note that the way in which an inductive proof progresses is also
sequential, albeit in a different way from deductive proof.
Within the sequence of any proof (deductive or inductive), the choice of the
parts does not come from some pre-established set of statements concatenated
mechanically, but as a result of insight thinking. In the first proof above, the key
insight was that parts of intersecting lines can be combined to show that they
are equal. This was not a predictable aspect of the proof; it came from an insight
based on previous knowledge (the number of degrees in a straight line). In the
polygon proof, the insight was that a polygon can be divided into constituent triangles. Insight thinking of this kind is neither deductive nor inductive; it was called
abductive by Charles Peirce (1931–1958), defined as using hunches based on previous knowledge and experience that are mapped onto the problem at hand. So,
deduction and induction may indeed reveal how formal logic works, but they
also show that logic itself is, paradoxically, guided by an inferential and more
creative form of thought.
This is why a proof, like any text, will have many forms, subject to the inventiveness of the proof-maker. Moreover, the proof-maker might also have to devise
variants of a specific proof or else come up with a new type of logic to carry out
some new demonstration. Already Euclid was faced with several propositions that
he could not prove deductively or inductively. So, he resorted to an ingenious
kind of logic, known as reductio ad absurdum. He used it to prove several important theorems, including the one that the prime numbers are infinite in number. Another
important proof was that irrationals were different from rationals. Euclid started
by noting that the general form of a rational number is p/q (q ≠ 0). So, if √2 could
not be written in the form p/q, then we would have shown that it was not a rational. He did this by assuming the opposite, namely that the number √2 could
be written in the form p/q and then he went on to show that this would lead to a
contradiction.
In a contemporary form, the proof proceeds like this. We start by
squaring both sides of the equation:
√2 = p/q  (assumption)
(√2)² = (p/q)²
Therefore:
2 = p²/q²
We multiply both sides by q²:
2q² = p²
Now, p² is an even number because it equals 2q², which has the form of an even
number. And since the square of an odd number is always odd, p itself must be even: p = 2n. Let's add this to the sequence of moves:
2q² = p²
Since p = 2n:
2q² = (2n)² = 4n²
Therefore:
2q² = 4n²
This equation can be simplified by dividing both sides by 2:
q² = 2n²
This shows that q² is an even number, and thus that q itself is an even number.
It can be written as 2m (to distinguish it from 2n): q = 2m. Now, Euclid went right
back to his original assumption—namely that √2 was a rational number:
√2 = p/q
In this equation he substituted what he had just proved, namely, that p = 2n and
q = 2m:
√2 = 2n/2m
√2 = n/m
Now, the problem is that we find ourselves back to where we started. We have
simply ended up replacing p/q with n/m. We could, clearly, continue on indefinitely in this way, always coming up with a ratio with different numerators and
denominators: √2 = {n/m, x/y, …}. We have thus reached an impasse, caused
by the assumption that √2 had the rational form p/q, and it obviously does not,
because it produces the impasse. Thus, Euclid proved that √2 is not a rational
number by contradiction. The relevant feature here is that the proof is also sequential but it doubles back on itself, so to speak. The way the proof text is laid
out uses deductive logic, but the key insight comes from assuming the opposite and showing that it leads to
an absurdity. Much like an ironic text in language, this method of proof convinces
us through a kind of logical irony. As a matter of fact, this method of proof was
devised originally by one of the greatest ironists of ancient philosophy, Zeno of
Elea, with his paradoxes. As Berlinski (2013: 83) observes, it "assigns to one half
[of the mind] the position he wishes to rebut, and to the other half, the ensuing
right of ridicule."
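The conclusion can be probed, though of course not proved, by brute force: no fraction p/q squares exactly to 2, however far one searches. A short sketch along these lines (in Python; a finite check, so it merely illustrates what the reductio establishes once and for all):

from math import isqrt

hits = []
for q in range(1, 100_000):
    p = isqrt(2 * q * q)        # the only integer candidate for p with (p/q)^2 = 2
    if p * p == 2 * q * q:
        hits.append((p, q))
print("fractions whose square is exactly 2:", hits)   # prints an empty list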
Are the methods of proof truly reflective of the laws of thought, or are they
a matter of historical traditions and much creative thinking? For one thing, not
all cultures in antiquity had a similar view of proof. The Greek approach has remained the central one in mathematics for several reasons: it seems to be effective
in translating practical knowledge into theoretical knowledge fluidly; and, more
importantly, it was Greek mathematics that made its way to medieval and Renaissance Europe, where it was institutionalized into the discipline of mathematics
itself. Of course, there can be mathematics without proof, and there can be mathematics with different kinds of proof. But, somehow, the Greek approach has remained entrenched in the mindset of mathematicians. It is undeniably powerful.
Take the Pythagorean theorem. It is not just a recipe for how to construct any right
triangle; it is a model of space, since it tells us that certain spatial relations are the
way they are because of a hidden logical structure inherent in them.
Proofs of the same theorem have been found in many parts of the ancient
world (from China to Africa and the Middle East) long before the Pythagoreans put
forward their own (Bellos 2010: 53). The archeological discovery of a Babylonian
method for finding the diagonal of a square suggests that the theorem was actually known one thousand years before Pythagoras (Musser, Burger, and Peterson
2006: 763). Actually, Pythagoras left no written version of the proof (it is described
through secondary sources). Many historians of mathematics believe that it was
a dissection proof, similar to the one below. First, we construct a right triangle
with legs a and b and hypotenuse c. Then, we construct a square with side length a + b (the sum of the lengths
of the two legs of the triangle). This is equivalent to joining four copies of the
triangle together in the way shown by the diagram:
Figure 2.4: Dissection proof of the Pythagorean theorem (a square of side a + b assembled from four copies of the right triangle around an inner square of side c)

The area of the internal square is c². The area of the large square is (a + b)², which is
equal to a² + 2ab + b². The area of any one triangle in the square is ½ab. There are
four of them; so the overall area covered by the four triangles in the large square
is: 4 × (½ab) = 2ab. If we subtract this from the area of the large square, [(a² + 2ab +
b²) − 2ab], we get a² + b². This corresponds to the area of the internal square, c².
Using the axiom of equality, the proof is now complete: c² = a² + b².
Many different kinds of proof of this theorem have been devised over the centuries. As Raju (2007) has argued, this shows that proof is not a closed system of
logic, but varies considerably. A broader view of proof, as Selin (2000) suggests,
will show that the acceptance of the Euclidean methods was due to the influence
of the Graeco-Roman way of doing science on the Renaissance's revival of knowledge and on the subsequent Enlightenment. The work in ethnomathematics is
showing, in fact, that cultures play major roles in determining how proof is understood and used (Ascher 1991, Goetzfried 2007). As Stewart (2008: 34) puts it,
proof is really a text, a mathematical story whose parts form a coherent unity:
What is a proof? It is a kind of mathematical story, in which each step is a logical consequence of the previous steps. Every statement has to be justied by referring it back to
previous statements and showing that it is a logical consequence of them.
The invention of proof is generally attributed to the philosopher Thales around
600 BCE (Maor 2007). Euclid demonstrated 467 propositions of plane and solid
geometry in his Elements with various kinds of proofs, as we saw. He finished
his proofs with QED, as his closing phrase was later rendered in Latin. The letters stood, as is
well known, for Quod erat demonstrandum ("which was to be demonstrated"),
remaining the symbolic hallmark of what mathematical proof is all about to
this day. The American philosopher Susanne Langer (1948) referred to the linear-sequential construction of texts as a discursive process and to the overall meaning
we get from them as a presentational process. The former has the salient feature of detachment, which means that the constituent parts can be considered
separately—for example, one can focus on a specific statement in a proof, detaching it from its location in the proof text, without impairing the overall
understanding of the text. In contrast, the conclusion of a proof is presentational,
since it cannot be detached from the entirety—it emanates from the connectivity
of the parts.
Umberto Eco (1992) has identified two main types of text, which he calls
"closed" and "open." Proofs, such as the ones above, are closed texts—that is,
they lead to one and only one conclusion. Open texts do not and may, in fact, not
have any conclusion. These are sometimes called conjectures and they are truly
problematic for mathematicians, since they imply that maybe proof is not the
only way to get at mathematical truths. One of these is Goldbach's Conjecture. In
a letter to Euler in 1742, Christian Goldbach conjectured that every even integer
greater than 2 could be written as a sum of two primes:
4 = 2 + 2
6 = 3 + 3
8 = 5 + 3
10 = 7 + 3
12 = 7 + 5
14 = 11 + 3
16 = 11 + 5
18 = 11 + 7
No exception is known to the conjecture, but there still is no valid proof of it. Goldbach also hypothesized that any number greater than 5 could be written as the
sum of three primes:
6 = 2 + 2 + 2
7 = 2 + 2 + 3
8 = 2 + 3 + 3
9 = 3 + 3 + 3
10 = 2 + 3 + 5
11 = 3 + 3 + 5
Again, there is no known proof for this conjecture. From a practical perspective,
a proof for the conjectures may be unnecessary anyhow, for it would probably
not change anything in mathematics in any significant way. But, mathematicians
continue to search for a proof, perhaps because it is part of the Euclidean game
that they continue to play. Proofs are convincing because, like any closed text, they
provide closure. However, it seems that not all truths can be proved with the Euclidean rules of the game. As the Greek geometers, including Euclid, came to suspect, some
constructions turn out to be impossible (squaring the circle, for instance). So, as
Peirce (1931–1958) often wrote, logic is useful to us because we can use it to explain
our practical mathematical know-how, but it may not apply to all mathematics
(Sebeok and Umiker-Sebeok 1980: 40–41).
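Conjectures such as Goldbach's can at least be tested over a finite range, which is how confidence in them (though not proof) accumulates. A rough sketch of such a check for the two-prime conjecture, in Python and for illustration only:

def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def goldbach_pair(n):
    # Return one pair of primes summing to n, or None if there is none.
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None

for n in range(4, 10_001, 2):
    assert goldbach_pair(n) is not None    # no exception found in this range
print("Example decomposition:", 10_000, "=", goldbach_pair(10_000))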

2.1.3 Consistency, completeness, and decidability


A valid proof has what mathematicians call consistency, completeness, and decidability. The latter term simply refers to the fact that a proof is possible in the first
place. Consistency implies that the statements of a proof hold together to produce
an inescapable conclusion. Completeness refers to the fact that there is nothing
more to add to the proof. Formal mathematics is basically an investigation of these
criteria. The starting point for the investigation is the syllogism. It is a model of
the presence of completeness, decidability, and consistency in a logic text. It is
worthwhile going through another illustrative syllogism here (see also chapter 1):
Major premise: All humans are mortal.
Minor premise: Kings are human.
Conclusion: Kings are mortal.

The major premise states that a category has (or does not have) a certain characteristic and the minor premise that some object is (or is not) a member of that
category. The conclusion then affirms (or negates) that the object in question has
the characteristic. By simply replacing the specific referents with letter symbols,
we get a generalized picture of the logic involved (∀ = all, H = set of humans, M =
set of mortals, K = set of Kings, ∈ = is a member of):
Major Premise: ∀H ∈ M
Minor Premise: K ∈ H
Conclusion: K ∈ M

Any member of the set with the H trait also has the M trait. If K is a member of H,
then we conclude that K is also a member of M. It is the process of substituting
symbols that shows why this is so in an abstract way. The syllogism shows that
the conclusion is decidable. It also shows consistency and completeness.
The syllogism remained the basis for formal mathematical analysis well
into the nineteenth century. Bertrand Russell wanted to ensure that its structure
would always allow mathematicians to determine which conclusions are valid (or
provable) and which are not (Russell 1903, Russell and Whitehead 1913). Using
a notion developed two millennia earlier by Chrysippus of Soli, Frege (1879)
had suggested that circularity (the nemesis of consistency) could be avoided by
considering the form of propositions separately from their content. In this way,
one could examine the consistency of the propositions without having them refer
to anything in the real world. As we saw (chapter 1), Frege's approach influenced
Wittgenstein (1921), who used symbols rather than words to ensure that the form
of a proposition could be examined for logical consistency separate from any
content to which it could be applied. If the statement "it is raining" is represented
by the symbol p and the statement "it is sunny" by q, then the proposition
"it is either raining or it is sunny" can be assigned the general symbolic form
p ∨ q (with ∨ = "or"). A proposition in which the quantifier "all" occurs would be
shown, as indicated above, with the symbol ∀ (an inverted A). If the form held up to logical
scrutiny, then that was the end of the matter. Undecidability and circularity stem,
Wittgenstein affirmed, from our expectation that logic must interpret reality for
us. But that is expecting way too much from it. Wittgenstein's system came to be
known as symbolic logic—prefigured, actually, by Lewis Carroll in his ingenious
book The Game of Logic (1887).
As discussed briefly in the previous chapter, Russell joined forces with Alfred
North Whitehead to produce their masterful treatise, the Principia mathematica, in
1913. His objective was, as mentioned, to solve the problem of circularity, such
as the classic Liar Paradox—a dilemma that goes back to the fifth century BCE,
when a host of intriguing debates broke out throughout Greece over the nature
and function of logic in philosophy and mathematics. Prominent in them were
the philosopher Parmenides and his disciple Zeno of Elea. The latter became famous (or infamous) for his clever arguments, called paradoxes, that seemed to
defy common sense. The story goes that one of the most vexing of all the paradoxes concocted during the debates, known as the Liar Paradox, was uttered by
Protagoras. Its most famous articulation has been attributed, however, to the celebrated Cretan poet Epimenides in the sixth century BCE:
The Cretan philosopher Epimenides once said: "All Cretans are liars." Did Epimenides speak the truth?
The paradox lies in the fact that the statement leads to circular reasoning, not to
a conclusion as in a syllogism. It is a menacing form of logic, because it suggests
that circularity might be unavoidable and that some statements are undecidable.
It thus exposes syllogistic logic as being occasionally useless. The source of the
circularity in the paradox is, of course, the fact that it was Epimenides, a Cretan,
who made the statement that all Cretans are liars. It arises, in other words, from
self-referentiality. Russell found the paradox to be especially troubling, feeling
that it threatened the very foundations of logic and mathematics. To examine the
nature of self-referentiality more precisely, he formulated his own version, called
the Barber Paradox:
The village barber shaves all and only those villagers who do not shave themselves. So, shall he shave himself?
Let us assume that the barber decides to shave himself. He would end up being
shaved, of course, but the person he would have shaved is himself. And that contravenes the requirement that the village barber should shave all and only those
villagers who do not shave themselves. The barber has, in effect, just shaved
someone who shaves himself. So, let us assume that the barber decides not to
shave himself. But, then, he would end up being an unshaved villager. Again
this goes contrary to the stipulation that he, the barber, must shave all and only
those villagers who do not shave themselves—including himself. It is not possible, therefore, for the barber to decide whether or not to shave himself. Russell
argued that such undecidability arises because the barber is a member of the village. If the barber were from a different village, the paradox would not arise.
Russell and Whitehead (1913) tackled circularity (and by implication, the
undecidability issue) in the Principia. But the propositions they developed led to
unexpected problems. To solve these, Russell introduced the notion of types,
whereby certain types of propositions would be classied into different levels
(more and more abstract) and thus considered separately from other types. This
seemed to avoid the problems—for a while anyhow. The Polish mathematician
Alfred Tarski (1933) developed Russell's theory further by naming each level of increasingly abstract statements a metalanguage—essentially, a level of statements
about a lower-level statement. At the bottom of the hierarchy of levels are straightforward statements such as "Earth has one moon." If we say "The statement that
Earth has one moon is true," then we are using a metalanguage. The problem with
this whole approach is, evidently, that more and more abstract metalanguages
are needed to evaluate lower-level statements. And this can go on ad infinitum.
In effect, the concept of metalanguage only postpones the decidability issue.
The Principia also addressed the problem of proof as a means of establishing
completeness and decidability for any logical system. How, for example, can we
prove that 1 + 1 = 2, even if we articulate this to be an axiomatic derivation from
previous axioms (as did Peano)? Russell did indeed prove that 1 + 1 = 2, in a way
that at first seemed to be non-tautological. But, this whole line of reasoning was
brought to an abrupt end in 1931 by Gödel, who showed why undecidability is
a fact of logical systems (previous chapter). In any such system there is always
some statement that is true, but not provable in it. In other words, when mathematicians attempt to lay a logical basis to their craft, or try to show that logic and
mathematics are one and the same, they are playing a mind game that is bound to
come to a halt, as Alan Turing (1936) also argued a few years after Gödel's proof.
Turing asked if there is a general procedure to decide whether a self-contained computer
program, given some input, will eventually halt. He concluded that it cannot be decided, in general, whether the program will halt when it runs with that input. Turing used reductio ad absurdum
reasoning, starting with the assumption that the halting problem was decidable
and constructing a computation algorithm that halts if and only if it does not halt,
which is a contradiction.
In their 1986 book, The Liar, mathematician Jon Barwise and philosopher
John Etchemendy adopted a practical view of the Liar Paradox, claiming that the
mistake of pure logicians is to believe that systems must behave according to their
theories and constructs. As they assert, the Liar Paradox arises only because we
allow it to arise. When Epimenides says, "All Cretans are liars," he may be doing
so simply to confound his interlocutors. His statement may also be the result of
a slip of the tongue. Whatever the case, the intent of Epimenides's statement can
only be determined by assessing the context in which it was uttered along with
Epimenides's reasons for saying it. Once such social or psychological factors are
determined, no paradox arises. In other words, it is a pipe dream to believe in
an abstract game of pure logic that plays by its own rules independently of experience and lived reality. Of course, this ingenious solution does not solve the
deeper problem of self-referentiality and, of course, of decidability. How do we
decide if a problem is solvable? How can we construct systems that are complete?
These questions became the sum and substance of post-Gödelian research in formal mathematics, as will be discussed below.
2.1.4 Non-Euclidean logic


Despite Gödel's proof, Euclidean methods of logic are still central to mathematical proof because they work for a large set of problems. As Lewis Carroll argued in
1879, Euclid's methods are remarkable ones because, for the first time in human
history, they put at our disposal powerful logical tools for decoding or unraveling the laws of space and quantity. To this day, we feel the persuasive sway of
these methods, as if they held up a mirror to the brain. As Kaufman (2001) has
aptly observed, there is no separation of the human brain and the mathematics of
that brain. Euclid's mirror, however, captured only a part of the brain. Over time,
mathematicians came to realize that his methods could not be applied to many
areas of mathematical knowledge. Apart from non-Euclidean geometries, such as
the Riemannian and Lobachevskian ones, mathematicians started devising new
methods of proof to cover the gaps left by classical Euclidean methods.
A modern example of the breakaway from these methods, as mentioned
briefly in the previous chapter, is the Four-Color Problem, proved by Wolfgang
Haken and Kenneth Appel (1977, 1986, 2002), after the problem had defied solution
for nearly a century. In its simplest form, it reads as follows:
What is the minimum number of tints needed to color the regions of any map
distinctively? If two regions touch at a single point, the point is not considered a
common border.
Working at the University of Illinois, Haken and Appel published a demonstration
in 1976 that did not use any method within the traditional Euclidean proof system,
but rather a computer program. The program, when run on any map, has never
found a map that requires more than four tints to color it distinctively (Laurence
2013). It has been called proof by exhaustion—that is, the computer algorithm devised for it has never produced, and it seems highly improbable that it will, an
exception to the conjecture. But is it really proof (Tymoczko 1978)? It certainly
is not a logical proof in the Euclidean sense. It consists of a set of computer instructions, not a set of statements (axioms, propositions, and previously-proved
theorems) laid out in such a way that they lead inescapably to the conclusion. As
Peirce (1931–1958), who came under the spell of the Four Color Problem, so aptly
put it in a lecture he delivered at Harvard in the 1860s, the problem is so infuriating precisely because it appears to be so simple to prove, and yet a proof for it
with the traditional methods of logic seems to be elusive. If it is truly a proof, then
the Haken-Appel algorithm constitutes a veritable paradigm shift in mathematical
method. Because the algorithm cannot be examined in the same way that proofs
can, many mathematicians feel uneasy about it. As Appel and Haken themselves
admitted (2002: 193): "One can never rule out the chance that a short proof of the
Four-Color Theorem might some day be found, perhaps by the proverbial bright
high-school student."
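The flavor of the Haken-Appel approach, which delegates a huge case analysis to a machine, can be conveyed in miniature. The toy sketch below (not their program, just a naive backtracking search over a single, hypothetical map given as an adjacency list) checks whether that map can be colored with four tints:

neighbors = {            # hypothetical map: which regions border which
    "A": ["B", "C", "D"],
    "B": ["A", "C", "E"],
    "C": ["A", "B", "D", "E"],
    "D": ["A", "C", "E"],
    "E": ["B", "C", "D"],
}
colors = ["red", "green", "blue", "yellow"]

def color_map(regions, assignment):
    # Assign colors region by region, backtracking whenever a clash appears.
    if not regions:
        return assignment
    region, rest = regions[0], regions[1:]
    for c in colors:
        if all(assignment.get(nb) != c for nb in neighbors[region]):
            result = color_map(rest, {**assignment, region: c})
            if result is not None:
                return result
    return None          # this map would need more than four tints

print(color_map(list(neighbors), {}))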
Proof by computer raises fundamental epistemological questions for formal
mathematics. Above all else, it raises issues about the larger question of decidability, as Fortnow (2013) has cogently argued. The gist of Fortnow's argument
can be paraphrased as follows. If one is asked to solve a 9-by-9 Sudoku puzzle,
the task is considered to be a fairly simple one. The complexity arises when asked
to solve, say, a 25-by-25 version of the puzzle. And by augmenting the grid to 1000-by-1000 the solution to the puzzle becomes gargantuan in terms of effort and time.
Computer algorithms can easily solve complex Sudoku puzzles, but start having
difficulty as the degrees of complexity increase. The idea is, therefore, to devise
algorithms to find the shortest route to solving complex problems. So, the issue of
complexity raises the related issue of decidability, since there would be no point
in tackling a complex problem that may turn out not to have a solution. If we let P
stand for any problem with an easy solution, and NP for any problem whose solution is easy to check but may be hard to find, then the whole question of decidability can be represented
in a simple way. If P were equal to NP, P = NP, then problems that are complex (involving large amounts of data) could be tackled easily as the algorithms become
more efficient (which is what happened in the Four-Color solution). The P = NP
problem is the most important open problem in computer science and formal
mathematics, as will also be discussed in the next chapter. It seeks to determine
whether every problem whose solution can be quickly checked by computer can
also be quickly solved by computer. Work on this problem has made it evident that
a computer would take hundreds of years to solve some NP questions and sometimes go into a loop (the halting problem). Indeed, to prove P = NP one would
have to use, ironically, one or more of the classic methods of proof. We seem to be
caught in a circle where algorithms are used to determine some proofs and vice
versa, some proofs are used to determine some algorithms.
So, is a computer algorithm a proof? And what does it tell us about mathematical statements? It is certainly logical, because the algorithm is a text consisting of
sequential instructions, revealing the same kind of sequential structure that traditional proofs have but with a different language. In other words, computer logic is
really a type of language that involves finite-state (closed) systems of instruction,
like a Turing machine or a Markov chain. The algorithm is a finished product; the
process of arriving at it is still inferential-abductive in the same way that traditional proofs are. Is a simple deductive proof of the Four-Color Theorem hidden
in the computer instructions somewhere? Can it be extracted and reformulated in
more traditional ways through abduction?
Today, proof by computer is part of a complex repertoire of formal proofs
accepted by mathematicians. The quest to find answers to conundrums is what
drives mathematicians, rather than attempting merely to develop a formal architecture for conducting mathematics. A typology of proofs that are considered to
be standard in mathematics is the following one:
Deduction: applying axioms, postulates, established theorems, and other
mathematical facts to prove something as being true; its main structure is
syllogistic and is based on a simple axiom of logic: if X is Y, and Y is Z,
then X is Z
Induction: showing that something must be true in all cases by considering
specific cases: if it is true for n and then for (n + 1), it is true for all cases
Transposition: showing that something is a valid replacement for some statement leading to an implication: if P implies Q, then not-Q implies not-P, and
vice versa
Contradiction (reductio ad absurdum): showing that a statement is contradictory and thus false so that its opposite is considered to be the only possible
alternative
Construction: constructing a diagram to exemplify that some pattern exhibits
a property
Exhaustion: performing a large number of calculations without ever finding a
contradictory result
Probabilistic (Analogical): proving something by comparing it to something
known and thus assuming equivalence through a probability metric
Nonconstructive: explaining that a mathematical property must be true even
though it is not possible to isolate it or explain it
Statistical (Experimental): using statistics to show that a theorem is likely to
be true within a high degree of probability
Computer: using computer algorithms to perform an exhaustive proof
The history of proofs is the history of mathematics. The more elusive a proof is,
the more it is hunted down, even if it may seem to have no implications above
and beyond the proof itself. Consider the conjecture identified by Henri Poincaré
in 1904—if any loop in a certain kind of three-dimensional space can be shrunk to
a point without ripping or tearing either the loop or the space, then the space is
equivalent to a sphere. Poincaré suggested, in effect, that anything without holes
has to be a sphere. Imagine stretching a rubber band around a ball (a sphere).
The band can be contracted slowly, so that it neither breaks nor loses contact
with the ball. In this way the band can be shrunk to become a point. The band
cannot be shrunk to a point if it is stretched around a doughnut, whether around
the hole or the body. It can be done, however, with any topological equivalent of
a ball, such as a deformed melon, a baseball bat with bulges, and the like. The
surface of the ball, but not of the doughnut, is simply connected. Any simply
connected two-dimensional closed surface, however distorted, is topologically
equivalent to the surface of a ball. Poincaré wondered if simple connectivity characterized three-dimensional spheres as well. His conjecture was finally proved by
the Russian mathematician Grigory Perelman in 2002, who posted his solution on the Internet (O'Shea 2007, Gessen 2009). It is much too complex to discuss here (being
over 400 pages). Suffice it to say that a logical diagnosis of the proof shows that it
involves many kinds of logic and inferential processes—analogies, connections,
hunches. As Chaitin (2006: 24) observes, "mathematical facts are not isolated,
they are woven into a spider's web of interconnections." And as Wells (2012: 140)
aptly states:
Proofs do far more than logically certify that what you suspect, or conjecture, is actually
the case. Proofs need ideas, ideas depend on imagination and imagination needs intuition,
so proofs beyond the trivial and routine force you to explore the mathematical world more
deeply—and it is what you discover on your exploration that gives proof a far greater value
than merely confirming a fact.

What seems certain in all this is, as iterated throughout this chapter, that our
brain might indeed possess the faculty that the Greeks called lógos. Analyzing
mathematics as a practical activity is not sufficient, in the same way that it is not
sufficient to study language just as a communicative activity. In both cases we seek
to understand the faculty (mathematics or language) as some faculty of the brain.
This means converting practical into theoretical knowledge. It is the conversion
process that is relevant here, since it is part of how the brain makes discoveries.
The practical knowledge of knotting patterns that produced a right triangle was
not enough for the Greeks; they wanted to understand why such patterns held true in
an abstract way. So, they took the first step in establishing mathematics as an
explanatory, rather than just utilitarian, discipline for use in everyday life. Proof
solidies a utilitarian practice by demonstrating that it ts in with the logic and
logistics of established ideas.

2.1.5 Cantorian logic


Perhaps the most salient manifestation of non-Euclidean proof-making can be
found in the demonstrations related to infinity by Georg Cantor (1874) in the nineteenth century—proofs which were totally mystifying in that era because they
apparently defied traditional logic and the common sense that was associated
with it. Although well known, it is worth going through the background and the
kind of proofs that Cantor introduced into mathematics, given that they laid out
the rudimentary principles of an emerging set theory in his era, and given
that, as Lakoff argued about the Gödelian proofs (previous chapter), they can be
used to pinpoint the areas of connectivity between mathematical and linguistic
(metaphorical) thought.
The type of proof that Cantor used was a one-to-one correspondence proof.
Following Lakoff, it can be said, in hindsight, that Cantor utilized a metaphorical
blend—that is, a form of proof that amalgamates two seemingly separate domains,
putting them together to produce insight. Actually, for the sake of historical
accuracy, the kind of thinking that Cantor's proof displays can be found in an
observation made by Galileo, who suspected that mathematical infinity posed
a serious challenge to common sense. In his 1632 Dialogue Concerning the Two
Chief World Systems he noted that the set of square integers can be compared,
one-by-one, with all the whole numbers (positive integers), leading to the incredible possibility that there may be as many square integers as there are numbers
(even though the squares are themselves only a part of the set of integers). How
can this be, in view of the fact that there are numbers that are not squares, as the
following comparison of the two sets seems to show? The bottom row of the comparison simply contains the integers that on the top row are also squares. So, for
instance, 2 is not also a square, but 4 is, since it can be broken down into 2².
The comparison thus shows the relevant gaps between the top row (the complete
set of integers) and the bottom row (the subset of square integers):
Figure 2.5: Initial correspondence of the set of integers with the set of square numbers (the squares 1, 4, 9, … appear beneath their own positions in the list of integers, leaving blanks under all the other integers)

As one would expect, this method of comparison shows that there are many more
blanks in the bottom set (the set of square integers), given that it is a subset of the
top set (the set of whole numbers). So, as anticipated, this seems to prove that the set of
whole numbers has more members in it than the set of square numbers. But does
it, asks Galileo? All one has to do is eliminate the blanks and put the top numbers
in a direct one-to-one correspondence with the squares, and we get an incredible result.
This shows that no matter how far we go down along the line there will never
be a gap. All we have to do to prove this is use induction. If we stop at, say, point n
on the top row and find that below it the point is n², all we have to do is go to
the next point (n + 1) and check if the bottom point is (n + 1)², and thus induce
the fact that this will indeed go on forever. But this is hardly all there is going on
Integers = 1   2   3   4    5    6    7    8    9    10    11    12   …
Squares = 1²  2²  3²  4²   5²   6²   7²   8²   9²   10²   11²   12²  …
        = 1   4   9   16   25   36   49   64   81   100   121   144  …

Figure 2.6: Second correspondence of the set of integers with the set of square numbers

cognitively here. It can, in fact, be argued that the initial insight comes from an
unconscious conceptual metaphor—A line has no gaps vis-à-vis another parallel
line. Lines are made up of distinct points and the number of these is the same as
it is for any other line of equal length. So, the proof shows that there are as many
squares as integers—a totally unexpected result.
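The correspondence itself can be listed mechanically for as long as one likes; a trivial sketch (offered only as an illustration) pairs each integer with its square and never runs into a gap:

for n in range(1, 13):
    print(n, "<->", n ** 2)   # every integer n is matched with the square n^2, with no gaps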
As a product of an unconscious conceptual metaphor it tells us a lot more.
Indeed, it allowed Cantor to proceed to prove many more theorems with the
same kind of logic. The method is by analogy (that is, by ana-logic, the logic of
correspondence) which, as Hofstadter has argued persuasively (Hofstadter 1979,
Hofstadter and Sander 2013), is a powerful force in mathematical and scientific
discoveries.
In 1872, Cantor showed that the same one-to-one correspondence logic can
be used to prove that the same pattern holds between the whole numbers and
numbers raised to any power:
Integers = 1   2   3   4   5   6   7   8   9   10   11   12   …
Powers = 1ⁿ  2ⁿ  3ⁿ  4ⁿ  5ⁿ  6ⁿ  7ⁿ  8ⁿ  9ⁿ  10ⁿ  11ⁿ  12ⁿ  …

Figure 2.7: Correspondence of the set of integers with the set of positive integer exponents

This simple proof by correspondence put a fly in the ointment of classical proof,
so to speak. Cantor's argument was, in fact, earth-shattering in mathematical circles when he first made it public. But it is convincing because it falls within the
parameters of proof as defined in this chapter—it is a text that lays out the information cohesively, a layout motivated by a brilliant metaphorical blend.
Because the integers are called cardinal numbers, any set of numbers that
can be put in a one-to-one correspondence with them is said to have the same
cardinality. Cantor used this notion to investigate all kinds of sets and, indeed,
established a basic epistemology for set theory, allowing it to become a major approach in formal mathematics. Amazingly, he also demonstrated, with a variant
of his proof method, that the rationals also have the same cardinality as the counting numbers. The method is elegant and simple because, again, it comes from a
metaphorical blend. So, instead of putting numbers in a linear one-to-one pattern, he put them into a zigzag diagonal layout. It is not necessary to go through
the proof here, since it is well known. Suffice it to say that we cannot help but be
impressed by the result. When Cantor's overall logic is understood, it ceases to
look like the product of the overactive imagination of a mathematical eccentric. It
is indeed logical, in a metaphorical way.
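The zigzag idea can nonetheless be mimicked computationally: walking the array of fractions p/q diagonal by diagonal and skipping repeats pairs every positive rational with a counting number. A rough sketch of this enumeration (illustrative only, not Cantor's own presentation):

from fractions import Fraction

def zigzag_rationals(limit):
    # Walk the diagonals p + q = 2, 3, 4, ... of the p/q array, skipping duplicates.
    seen, count, diagonal = set(), 0, 2
    while count < limit:
        for p in range(1, diagonal):
            value = Fraction(p, diagonal - p)
            if value not in seen:
                seen.add(value)
                count += 1
                print(count, "<->", value)
                if count == limit:
                    return
        diagonal += 1

zigzag_rationals(10)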
Cantor classified those numbers with the same cardinality as belonging to
the set aleph null, or ℵ₀ (aleph being the first letter of the Hebrew alphabet). He called ℵ₀
a transfinite number. Remarkably, he found that there are other transfinite numbers. These constitute sets of numbers with a greater cardinality than the integers.
He labeled each successively larger transfinite number with increasing subscripts:
{ℵ₀, ℵ₁, ℵ₂, …}.

2.1.6 Logic and imagination


The emphasis on deductive proof as the primary one in ancient mathematics prevailed for centuries, until the advent of non-Euclidean methods of proof, such as
the Cantorian and the Haken-Appel ones. However, Euclid himself, as we saw, was
a master at using different forms of logic, not just deductive. But in his Elements
he put deduction at the center of the mathematical edifice he was building—an
edifice that has in many ways withstood the test of time. In the words of Aristotle (2012: 23): "A deduction is speech (lógos) in which certain things having been
supposed, something different from those supposed results of necessity because
of their being so." Deduction, however, is really more of an organizational form
of logic or a logic based on using previous information that can be connected to
a problem at hand. Deductive logic can, however, lead to inconsistencies, as we
have seen several times already.
Moreover, as suggested, abduction plays the key role in the creation of mathematical proofs, not pure deductive or inductive logic. And this means that the
imagination is at the base of how logic is used and elaborated. Peirce (1931–1958,
volume 5: 180) defined abduction as follows:
The abductive suggestion comes to us like a flash. It is an act of insight, although of extremely
fallible insight. It is true that the different elements of the hypothesis were in our minds
before; but it is the idea of putting together what we had never before dreamed of putting
together which flashes the new suggestion before our contemplation.

So, what does proof prove? Simply put, it shows us, or, more accurately, convinces
us that something is the way it is. So, it may well be that the many proofs devised by mathematicians do indeed mirror how the brain works. Abduction in
proof-making allows us to connect domains of information in an integrated way,
channeling it to the proof at hand. Logic then enters the picture to organize the
new information in a sequential way that shows consistency and completeness.
When a mathematician solves or proves an intractable problem by essentially
reducing it to a text, the way in which it is done puts the brain's many creative
capacities on display—from the analogical and metaphorical to the purely deductive. The brain may, in fact, be a mirror organ, as some neuroscientists suggest
(Ramachandran 2011). A mirror neuron is one that fires both when an animal acts
and when it observes the same action performed by another. The neuron is thus
said to mirror the behavior of the other, as though the observer were acting. In
humans, brain activity consistent with that of mirror neurons has been found in
the pre-motor cortex, the supplementary motor area, the primary somatosensory
cortex and the inferior parietal cortex. In an extended sense of the term, the capacity of neurons to mirror the world through various complex connective processes
leads to the creation of texts such as proof (in all its variations). This might explain
why we may not at first understand all the implications that a proof conceals. It is
by unpacking them that they become comprehensible.
As Boole showed in his key work of 1854, logical structure can be reduced to
a simple binary form. Indeed, binary logic (using two symbols, 0 and 1) allows
computers and other electric circuits to work smoothly. Boole's approach gave a
new slant to the question of what mathematics is, dovetailing perfectly, both in
time and in mindset, with set theory. Take, for example, the elements of the so-called Cantor set, which Cantor discussed in 1883, consisting of points along the
segment of a line. The Cantor set (T∞) is formed by taking the interval [0, 1] in
set T₀, removing the open middle third (T₁), removing the middle third of each of
the two remaining pieces (T₂), and continuing this procedure ad infinitum. It is
therefore the set of points in the interval [0, 1] whose ternary expansions do not
contain 1, illustrated in the comb-like diagram below:

Figure 2.8: The Cantor set
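The construction behind the diagram can also be carried out symbolically for a few stages; the following sketch (illustrative only, tracking the interval endpoints as exact fractions) removes the open middle thirds step by step:

from fractions import Fraction

def remove_middle_thirds(intervals):
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))        # keep the left third
        out.append((b - third, b))        # keep the right third
    return out

stage = [(Fraction(0), Fraction(1))]      # T0 = [0, 1]
for k in range(4):                        # print T0, T1, T2, T3
    print("T%d:" % k, [(str(a), str(b)) for a, b in stage])
    stage = remove_middle_thirds(stage)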

Repeating the process starting with 1 gives the sequence 1, 101, 101000101,
101000101000000000101000101, … Cantor's set is a Boolean set, which prefigures fractal theory. The set can, in fact, be extended to encompass flat surfaces.
The result is called the Sierpinski carpet, named after Waclaw Sierpinski, who
used the Cantor set to generate it:
Figure 2.9: The Sierpinski Carpet

Produced in 1916, it was one of the first examples of a fractal. Connectivity among
ideas, including forms and rules, is the essence of mathematical thinking, and
thus goes well beyond syllogistic logic. So, what can we conclude, if anything?
One thing is that there is no logic without imagination. It is the latter that likely
spurs mathematicians on to find things that cannot be proved. There are many
open questions, or conjectures, in mathematics that tantalize the intellect, yet
shut out its logical side. In the 1930s, mathematician Lothar Collatz noticed a pattern. For any number n, if it is even, make it half, or n/2; if it is odd, triple it and
add one, or (3n + 1). If one keeps repeating this rule, we always end up with the
number one. Here is a concrete example:
Example = 12
12/2 = 6
6/2 = 3
(3)(3) + 1 = 10
10/2 = 5
(3)(5) + 1 = 16
16/2 = 8
8/2 = 4
4/2 = 2
2/2 = 1
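The rule is trivially easy to mechanize, which is why so much numerical evidence has accumulated; a minimal sketch (for illustration only) reproduces the example above and checks a range of starting values:

def collatz_steps(n):
    # Apply the rule (halve if even, 3n + 1 if odd) until 1 is reached.
    steps = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps.append(n)
    return steps

print(collatz_steps(12))                  # [12, 6, 3, 10, 5, 16, 8, 4, 2, 1]
for start in range(1, 10_001):
    collatz_steps(start)                  # every value tried here reaches 1
print("All starting values up to 10,000 reach 1")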
Is this always the case? Is there a number where oneness is not achieved? There
seems to be some principle in this conjecture that, if unraveled, might lead to deep
discoveries. How do we prove it? There is no known answer. The pattern is there,
but the proof is undecidable. Proof by contradiction, or reductio ad absurdum,
might be useful in this sense or even proof by exhaustion. Something can be either
yes or no, but not both. Aware of this verity, Aristotle clarified the connection between contradiction and falsity in his principle of non-contradiction, which states,
simply, that an assertion cannot be both true and false. Therefore, if the contradiction of an assertion (not-P) can be derived logically from the assertion (P), it can
be concluded that a false assumption has been used. The discovery of contradictions at the foundations of mathematics at the beginning of the twentieth century,
however, led some mathematicians to reject the principle of non-contradiction, giving
rise to new theories of logic, which accept that some statements can be both true
and false.
To unpack the cognitive nature of contradiction, consider a well-known proof
in geometry, namely that for any non-degenerate right triangle, the length of the
hypotenuse is less than the sum of the lengths of the two remaining sides. The
proof relies, of course, on the Pythagorean theorem, c² = a² + b². The claim is that
a + b > c. As in any proof by contradiction, we start by assuming the opposite,
namely that a + b ≤ c. If we square both sides, we get the following:
(a + b)² ≤ c²
or
a² + 2ab + b² ≤ c²
A triangle is non-degenerate if each side has positive length, so it may be assumed
that a and b are greater than 0. Therefore:
(a + b)² ≤ c²
or
a² + b² < a² + 2ab + b² ≤ c²
The transitive relation can now be reduced to:
a² + b² < c²
Since the Pythagorean theorem is a² + b² = c², we have reached a contradiction,
since strict inequality and equality are mutually exclusive. This means that it is
impossible for both to be true and we know that the Pythagorean theorem holds.
Thus, the assumption that a + b ≤ c must be false and hence a + b > c, proving the
claim. In abstract terms such a proof can be represented as follows (P = the proposition in question and S = the set of statements or premises that have been
previously established). We consider P, or the negation of P (¬P), in addition to S;
if this leads to a logical contradiction F, then we can conclude that the statements
in S lead to the negation of P (¬P), or P itself.
If
S ∪ {P} ⊢ F
then
S ⊢ ¬P.
Or if
S ∪ {¬P} ⊢ F
then
S ⊢ P.
Proof in this sense is certainly much broader and more flexible than it was in the classical Euclidean method. Proof by computer, too, is another form of proof that falls
outside the method. By accepting proof by computer, mathematicians have, actually, taken the induction principle one step further—let the computer decide if
something is computable, decidable, or not. The computer is a powerful iteration
machine that allows us to look at what happens when some pattern is iterated ad
infinitum. Take fractal geometry again. A self-similar shape in this field is a shape
that, no matter what scale is used to observe it, resembles the whole thing. The
Mandelbrot set, or M-set, is the most widely known and reproduced image in mathematics:

Figure 2.10: The M-Set

The set was generated in the 1980s when computer power to make it possible became available. The mathematics behind the M-set is relatively simple, since it
involves adding and multiplying numbers: z = z² + c. The key is iteration—rules repeated without end. The image of the M-set is a result of iteration. Mandelbrot had
found that for certain values of c the outputs would continue and grow forever,
while for others they shrunk to zero. The M-set therefore emerges as a model—it
defines the boundary limit between two classes of number. Outside the lines are
free c-values bound for infinity; inside are prisoners destined for extinction. Incredibly, every object has a fractal dimension, defined as a statistical roughness
measure. Formulas for human lungs, trees, clouds, and so on can be generated
entirely artificially based on a measure of their iterative complexity. Fractal geometry thus has emerged as a secret language of nature, telling us that iteration is an
inherent principle in the structure of the universe, at least in some of its parts. It is
amazing to contemplate that a simple logic game played by Mandelbrot has had
so many scientific reifications.
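The boundary the M-set traces can be probed with a few lines of code: iterate z → z² + c from z = 0 and watch whether the orbit escapes. The coarse escape-time sketch below (illustrative only, with a handful of arbitrarily chosen sample values of c) classifies each value as a prisoner or as bound for infinity:

def in_m_set(c, max_iter=200):
    # Once |z| exceeds 2 the orbit must fly off to infinity.
    z = 0
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True

for c in [0, -1, 0.25, 0.26, 1, -2, 1j]:
    print(c, "->", "prisoner" if in_m_set(c) else "bound for infinity")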

2.2 Set theory


Cantor's ideas bolstered the trend of using sets to do mathematics that was emerging in his era. Devising efficient ways to represent the intrinsic properties of numbers and other mathematical objects and relations is the sum and substance of
set theory. More importantly for the present purposes, set theory provided a new
and powerful formal apparatus for uniting logic and mathematics. Sets are collections of elements (integers, fractions, and so on) that have been grouped together on the basis of some shared feature or on some principle of classification
(or correspondence, as Cantor showed). By doing so, it is easier to understand the
properties themselves independently of the individual elements (the possessors
of the properties)—that is, as constituents of groups and fields. Indeed, group and
field theories in mathematics can be seen to emanate from considering the behavior of numbers in terms of their groupings and distribution over collections.
Initially, the concept of set emerged in the nineteenth century with the ideas of
Augustus De Morgan (1847) who gave a very lucid account of what sets can do for
mathematicians. For example, the letter S might stand for the set of all girls with
straight hair. The letters m, s, and r might then be used to represent three members
of this set—Martha, Sarah, and Ruth. So, the set of all girls with straight hair is:
S = {m, s, r, …}. The symbol ∈ indicates that a member belongs to a certain set. So,
m ∈ S, s ∈ S, and r ∈ S. Some properties overlap among different sets. For example,
set B might consist of all boys with straight hair. Thus, there is an overlap between
B and S (above). The complete set of girls, G, and the complete set of boys, B, of
course will have areas that do not intersect.
Now, all of this seems obvious, but it is in stating the obvious formally that
set theory plays its most important role in formal mathematics. To cover relations
within and between sets, several main types of sets have been identified. Although
these are well known, it is useful to list them here for the sake of convenience (a small computational illustration follows the list):
1. Universal sets consist of all members being considered at any one time. For
example, the set of all the whole numbers is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, …}.
2. An infinite set contains an endless number of members. The integers, for instance, form an infinite set: {1, 2, 3, 4, …}.
3. A finite set, on the other hand, has a specific number of members. One such
set is the set of natural single digits including zero: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
4. An empty set, also called a null set, has no members. The symbol ∅ is used to
show this set: { } or ∅. An example of an empty set is all odd numbers that end
in 2—there are no such numbers, of course.
5. A single element set contains only one member. For example, the set of all
primes less than or equal to 2 contains only 2.
6. Equivalent or equal sets have the same number of members. For instance, the
set of even numbers under ten, {0, 2, 4, 6, 8}, is equivalent to the set of odd numbers
under ten, {1, 3, 5, 7, 9}.
7. Overlapping sets have some members in common. If the set of last year's class
math stars is M1 = {Alex, Sarah, Betty} and the set of this year's stars is M2 =
{Alex, Sarah, Tom}, sets M1 and M2 overlap because Alex and Sarah belong to
both sets. This relation between sets is usually shown with intersecting circles
in which the common members are included in the area of overlap (a = Alex,
s = Sarah, b = Betty, t = Tom):

Figure 2.11: Overlapping sets (M1 and M2 drawn as intersecting circles, with a and s in the area of overlap, b only in M1, and t only in M2)

8. Disjoint sets have no members in common. The set of even numbers and the
set of odd numbers are disjoint because they do not have any elements in
common.
9. Subsets are sets contained within other sets. For example, the set of even numbers, E = {0, 2, 4, 6, 8, …}, is a subset of the set of all integers, I = {0, 1, 2, 3, 4,
5, 6, …}. This is shown with E ⊂ I.
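These relations map directly onto the set operations built into most programming languages; a small sketch (in Python, reusing the members mentioned above, with finite slices standing in for the infinite sets) illustrates overlap, disjointness, and subsethood:

E = set(range(0, 20, 2))            # a finite slice of the even numbers
O = set(range(1, 20, 2))            # the corresponding odd numbers
I = set(range(0, 20))               # a finite slice of the integers
M1 = {"Alex", "Sarah", "Betty"}     # last year's math stars
M2 = {"Alex", "Sarah", "Tom"}       # this year's math stars

print("overlap of M1 and M2:", M1 & M2)               # {'Alex', 'Sarah'}
print("evens and odds disjoint:", E.isdisjoint(O))    # True
print("E is a subset of I:", E <= I)                  # True
print("the empty set:", set())                        # set() has no members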
Such notions clarify many aspects of the logical calculus, showing how different
sets with different members can sometimes interact and sometimes not. In some ways set
theory is a precursor to logic. In fact, it was developed from Boole's symbolic logic
and the theory of sets as developed by De Morgan as a way of using mathematical
symbols and operations to solve problems in logic. Above all else it has shown that
thought might be visual, since set theory is essentially a theory of logic diagrams
that show, rather than tell (so to speak), where and what the logical connections
and patterns are among numbers.

2.2.1 Diagrams
Set theory makes it possible to envision commonality among what would otherwise be seen as disparate elements and to show how these can relate to each other.
Diagrams such as the overlapping circles above are called Venn diagrams, after
British logician John Venn (1880, 1881), who was the first to use them. These provide visual snapshots of the constitution and operation of sets, bringing out the
logical patterns inherent in them. The translation of sentential (syllogistic) logic
to diagram logic started with Leonhard Euler. Before the advent of Venn diagrams,
Euler represented categorical or sentential statements in terms of diagrams such
as the following, which clearly prefigure the Venn diagrams (Hammer and Shin
1996, 1998):
Figure 2.12: Euler's diagrams for the four categorical statements "All A are B," "No A is B," "Some A is B," and "Some A is not B"

The usefulness of the diagrams over the sentential forms lies in the fact that no additional conventions, paraphrases, or elaborations are needed—the relationships
holding among sets are shown by means of the same relationships holding among
the circles representing them. In other words, we do not have to worry about the
various problems that plague syllogistic logic (as discussed); all we have to do is
observe the logical relations through the configuration of the diagrams.
Euler was, however, aware of both the strengths and weaknesses of diagrammatic representation. For instance, consider the following problematic syllogism:
1. No A is B.
2. Some C is A.
3. Therefore, some C is not B.
Euler realized that no single diagram could be devised to represent the two
premises, because the relationship between sets B and C cannot be fully specified
in one single diagram. Instead, he suggested three possible cases:
Figure 2.13: Euler's diagram solution, showing three possible cases (Case 1, Case 2, Case 3) for how the circles for A and C may be related

Euler claimed that the proposition "Some C is not B" can be read from all these diagrams. But it is far from clear which one is best. It was Venn (1881: 510) who tackled
Euler's dilemma by pointing out that the weakness lay in the fact that Euler's
method was too strict. Venn aimed to overcome Euler's dilemma by showing
how partial information could be visualized. So, a diagram like the following one
(which he called "primary") does not convey specific information about the relationship between sets A and B:

Figure 2.14: Venn's basic diagram

This is not just a clever rewriting of Eulerian logic diagrams; it is different because it does not represent any specific information about the relation between
two sets. Now, for the representation of premises, Venn's solution was to shade
them (Venn 1881: 122). With this simple modification, we can draw diagrams for
various premises and relations as follows (see Figure 2.15).
But even this system poses dilemmas. It was Charles Peirce (1931) who pointed
out that it had no way of representing existential statements, disjunctive information, probabilities, and relations. "All A are B or some A is B" cannot be shown by
either the Euler or Venn systems in a single diagram. But this does not invalidate
diagrammatic representation. It is not possible here to deal with Peirce's solution to such logical dilemmas, known as Existential Graph theory (see Roberts
2009). Basically, he showed that the use of diagrams enhanced the power of logical reasoning and especially predicate logic. Like Euler, Peirce saw a diagram as

Unauthenticated
Download Date | 6/6/16 9:41 PM

100 | 2 Logic
A

A
A

A
A

A
B

B
B
Figure 2.15: Venn diagrams

anything showing how the parts correlated to each other. This was evident especially in the outline of the diagram, which is a trace of how the thought process unfolded. In other words, it is a pictorial manifestation of what goes on in the mind as it grapples with structural-logical information. Graphs thus display the very process of thinking in actu (Peirce 1931-1958, vol. 4: 6), showing how a given argument, proof, or problem unfolds in a schematic way (Parker 1998, Stjernfelt 2007, Roberts 2009). Graphs allow us to grasp something as a set of transitional states. Therefore, every graph conveys information and simultaneously explains how we understand it. It is a picture of cognitive processes in action. And it doubles back on the brain to suggest further information or ideas. The following citation encapsulates Peirce's notion of graph. In it, we see him discussing with a general why a map is used to conduct a campaign (Peirce 1931-1958, vol. 4: 530):
"But why do that [use maps] when the thought itself is present to us?" Such, substantially, has been the interrogative objection raised by an eminent and glorious General. Recluse that I am, I was not ready with the counter-question, which should have run, "General, you make use of maps during a campaign, I believe. But why should you do so, when the country they represent is right there?" Thereupon, had he replied that he found details in the maps that were so far from being right there, that they were within the enemy's lines, I ought to have pressed the question, "Am I right, then, in understanding that, if you were thoroughly and perfectly familiar with the country, no map of it would then be of the smallest use to you in laying out your detailed plans?" "No, I do not say that, since I might probably desire the maps to stick pins into, so as to mark each anticipated day's change in the situations of the two armies." "Well, General, that precisely corresponds to the advantages of a diagram of the course of a discussion. Namely, if I may try to state the matter after you, one can make exact experiments upon uniform diagrams; and when one does so, one must keep a bright lookout for unintended and unexpected changes thereby brought about in the relations of different significant parts of the diagram to one another. Such operations upon diagrams, whether external or imaginary, take the place of the experiments upon real things that one performs in chemical and physical research."


Although mathematicians have always used diagrams in their proofs or various demonstrations, the use was ancillary and illustrative, unless the proof was a construction one. In set theory, the diagram is fundamental. Cantor's demonstrations were diagrammatic, since they were based on layouts of various kinds.
Euler's set diagrams were essentially an offshoot of his notion of the graph as a mathematical object, which led to the establishment of graph theory as a branch of mathematics and as a generating notion in the emergence of topology. Topology concerns itself with determining such things as the "insideness" or "outsideness" of shapes. A circle, for instance, divides a flat plane into two regions, an inside and an outside. A point outside the circle cannot be connected to a point inside it by a continuous path in the plane without crossing the circle's circumference. If the plane is deformed, it may no longer be flat or smooth, and the circle may become a crinkly curve, but it will continue to divide the surface into an inside and an outside. That is its defining structural feature. Topologists study all kinds of figures in this way. They investigate, for example, knots that can be twisted, stretched, or otherwise deformed, but not torn. Topology was a derivative of Eulerian graph theory. Richeson (2008: 155) puts it as follows:
The fruitful dialogue about Eulerian and non-Eulerian polyhedra in the first half of the nineteenth century set the stage for the field that would become topology. These ideas were explored further by others, culminating in Poincaré's marvelous generalization of Euler's formula at the end of the nineteenth century.

Interestingly, topological theory has become a model of many natural phenomena. It has proven useful, for instance, in the study of DNA. Stewart (2012:
105) elaborates as follows:
One of the most fascinating applications of topology is its growing use in biology, helping
us understand the workings of the molecule of life, DNA. Topology turns up because DNA is
a double helix, like two spiral staircases winding around each other. The two strands are intricately intertwined, and important biological processes, in particular the way a cell copies
its DNA when it divides, have to take account of this complex topology.

2.2.2 Mathematical knowledge


To summarize the foregoing discussion, proofs, set theory, and formal propositional systems have all been an intrinsic part of how mathematical truths are
established. Mathematical knowledge is part of what the Estonian biologist Jakob
von Uexküll (1909) called the internal modeling system of humans (the Innenwelt), which is well adapted to understanding their particular world (the Umwelt),
producing unconscious models of that world which take physical form in theories,
propositions, rule systems, and the like. The interplay of the Innenwelt with the
Umwelt is what produces knowledge. This interplay is much more complex and
flexible than theories of logic have generally allowed, since it includes, as argued
throughout this chapter, inventive and creative processes.
This suggests that mathematics is both invented and discovered. The word
invention derives from Inventio, which in western rhetorical tradition refers to one
of the five canons used for the elaboration of arguments. More broadly, the word meant both invention and discovery, indicating that the two are intrinsically intertwined. Discovery comes about through largely creative-serendipitous processes, whereas invention entails intentionality. For example, fire is a discovery, but rubbing sticks to start a fire is an invention. The general principles of arithmetic derive from the experience of counting. Naming the counting signs (numerals) allows us to turn these principles into ideas that can be manipulated intellectually and systematically. This whole line of thought suggests an "anthropic principle," which
states that we are part of the world in which we live and are thus privileged to
understand it best. Al-Khalili (2012: 218) puts it as follows:
The anthropic principle seems to be saying that our very existence determines certain properties of the Universe, because if they were any different we would not be here to question
them.

The question becomes why all this is so. It is one of the greatest conundrums of human philosophy. We could conceivably live without the Pythagorean theorem. It tells us what we know intuitively: that a diagonal distance is shorter than taking an L-shaped path to a given point. And perhaps this is why it emerged: it suggests that we seek efficiency and a minimization of effort in how to do things and how to classify the world. But in so doing we "squeeze out" of our economical symbolizations other ideas and hidden truths. To put it another way, the practical activity of measuring triangles contained too much information, a lot of which was superfluous. The theorem refines the information, throwing out from it that which is irrelevant. The ability to abstract theories and models from the world of concrete observations involves the optimal ability to throw away irrelevant information about the world in favor of new information that emerges at a higher level of analysis (Neuman 2007, Nave Neuman, Howard, and Perslovsky 2014).
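To take one concrete worked instance: for legs of 3 and 4 units, $\sqrt{3^2 + 4^2} = \sqrt{25} = 5$, whereas the L-shaped path measures $3 + 4 = 7$, so the diagonal is indeed shorter.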
In the end, all theories and speculations about the nature of mathematics are just that: speculations. It is useful to reiterate them here, using René Thom's (2010: 494) typology:
1. The Formalist Position. Formalists claim that mathematical objects are derivations of rules that cohere logically. This was the stance taken by Russell and Whitehead.
2. The Platonic Position. Platonists claim that mathematical objects have an autonomous existence; the mathematician does not create them; he or she discovers them like an explorer might discover an unknown territory.
3. The Constructivist Position. Constructivists claim that the mathematician builds complex mathematical forms from simpler ones and then applies them within and outside mathematics. The use of mathematics to do things is a practical outcome of this.

The discussion of what mathematical knowledge is constitutes a self-referential argument itself. One question (to which we will return in the final chapter) is
what differentiates mathematics from other faculties (if it is indeed distinct) and
whether or not mathematical knowledge is possessed in some form in other
species. Brain-scanning experiments have shown that number sense is scattered
in various parts of the brain, suggesting that it may overlap with other faculties,
such as language. And, it has become clear that as in other domains of human
representation, mathematical forms cannot be tied down to a specific meaning,
even if they emerge in a particular context. They can be applied time and again
to all kinds of referential domains, known and unknown. We do not know the
meaning of a form until it is contextualized. And as contexts change so too do
the meanings of the forms. Equations, constants, and variables are used over
and over again, acquiring new meanings, new applications especially in the
domain of science. The latter is to mathematics what speech is to language. To
use Saussurean (1916) terms, science is parole and mathematics its langue. It is in
playing with the langue, applying it, that science doubles back on mathematics,
contributing to our understanding of what mathematics is.

2.3 Formal linguistics


As discussed, in antiquity logic was assumed to be the organizing principle behind both mathematics and grammar. So, it comes as no surprise to find that many
of the ideas put forward in formal mathematics gradually found their way into
theories of grammar and formal linguistics more generally, which has borrowed
substantively and substantially from formal mathematical methods and theories.
From this, the debate as to whether language and mathematics constitute a single
system or separate ones has emerged, given the many points of contact between
the two systems at a formal level. As Hockett (1967: 6) aptly observed: "ultimately the language-like nature of mathematics is of basic importance, since it is the most critical clue to an understanding of the place of mathematics in the physical universe of which mathematics, mathematicians, language, linguists and all of us are a part."
As mentioned, the goal of formal linguistics is to devise a formal grammar
capable of accounting for how sentences and texts are constructed and subsequently what these reveal about the language faculty in the brain, in the same way
that mathematicians devise propositions and theorems about numbers and geometrical objects and then use these to probe the nature of mathematics. A formal
grammar is defined as a set of rules generating strings (sentences, for example)
in a language that are well-formed, which make it possible to assign meaning
to them. Sentences that are well-formed but have no ascertainable meaning are
called anomalous, as discussed already. The premise is that the meaning of strings
is not relevant to the task of formalizing a grammar. Thus, meaning (semantics)
is either an add-on or a derivative of syntax.

2.3.1 Transformational-generative grammar


The simplest kind of formal grammar is the generative grammar that was elaborated by Chomsky in 1957 and 1965, as already discussed in some detail in the
previous chapter. Given its vital importance to the emergence and evolution of
formal linguistics, it is useful here to revisit the bare elements of the theory. As we
saw, a string such as The boy loves the girl is generated, at one stage of the process,
by a set of phrase structure rules such as the following:
1. S → NP + VP (S = sentence, NP = noun phrase, VP = verb phrase)
2. NP → Det + N (Det = determiner, N = noun)
3. Det → Art (Art = article)
4. Art → Def (Def = definite article)
5. Def → the
6. N → boy
7. VP → V + NP (V = verb)
8. V → loves
9. NP → Det + N
10. Det → Art
11. Art → Def
12. Def → the
13. N → girl
The rules that end with an actual lexical item (rules 5, 6, 8, 12, 13) are called lexical
insertion rules. The structural relations of the various parts are shown typically
with a tree diagram:


[Figure 2.16: Tree diagram for The boy loves the girl. S branches into NP and VP; the NP branches into Det (Art, Def) and N, and the VP into V and a second NP; the lexical items at the bottom of the tree are the, boy, loves, the, girl]

The diagram shows the hierarchical relation among the symbols in the string.
Each level in the tree is called a Markov state. The input state is S and the output, or end-state, is the string at the bottom of the tree. This version of generative
grammar was also called a state-grammar.
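The state-by-state character of such a derivation can be sketched very simply in code. The following minimal Python illustration (the rule table, the leftmost-rewriting strategy, and the way the two noun choices are supplied are simplifying assumptions, not part of the theory) prints each Markov state from S down to the terminal string:

    # A minimal sketch of the derivation as a sequence of states; the rule table and
    # the leftmost-rewriting strategy are simplifying assumptions for illustration.
    rules = {
        "S":   [["NP", "VP"]],
        "NP":  [["Det", "N"]],
        "Det": [["Art"]],
        "Art": [["Def"]],
        "Def": [["the"]],
        "VP":  [["V", "NP"]],
        "V":   [["loves"]],
        "N":   [["boy"], ["girl"]],   # two possible lexical insertions for N
    }

    def derive(symbols, noun_choices):
        """Rewrite the leftmost nonterminal at each step, printing every state."""
        while True:
            print(" ".join(symbols))
            for i, sym in enumerate(symbols):
                if sym in rules:
                    options = rules[sym]
                    choice = noun_choices.pop(0) if len(options) > 1 else 0
                    symbols = symbols[:i] + options[choice] + symbols[i + 1:]
                    break
            else:
                return symbols        # only terminal symbols remain

    derive(["S"], [0, 1])             # final state: the boy loves the girl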
The rules show how a linear string is governed by hierarchical phrase structure and states of generation. Thus, the string The boy loves the girl may appear
linear to the ear or the eye, but it is actually the output of a series of states, specified by rules connected sequentially (one state leads to another) to each other. This
type of diagram was actually introduced by a modern-day founder of linguistics,
Wilhelm Wundt (1880, 1901). Like Chomsky, Wundt saw the sentence as the basic
unit of language. Rules, therefore, are not merely a convenient way of describing
sentence structure, but a formal means of showing how the parts in a sentence
relate to each other in specific ways.
The above rules tell only a part, albeit a central one, of the generation of
sentences. They produce simple declarative, or deep-structure, sentences. A true
theory of grammar would include transformational rules which change deep-structure strings into more complex outputs. So, the passive version of the above
sentence, The girl is loved by the boy, would result from the application of a
transformational rule, such as the one described in the previous chapter. It is the
transformational component of linguistic competence that is language-specific,
and thus produces linguistic diversity in grammars, not the base or deep-structure
component.
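Schematically, a transformational rule of this kind can be thought of as a mapping over the parsed deep-structure string. The toy Python function below (the tiny participle table and the flat triple representation are assumptions made for the example, not Chomsky's formalism) sketches the active-to-passive transformation:

    # A schematic toy transformation (the tiny participle table is an assumption):
    # NP1 - V - NP2  =>  NP2 - is + past participle - by - NP1.
    PARTICIPLES = {"loves": "loved", "sees": "seen"}

    def passivize(np1, verb, np2):
        """Apply the passive transformational rule to a parsed deep-structure triple."""
        return f"{np2} is {PARTICIPLES[verb]} by {np1}"

    print(passivize("the boy", "loves", "the girl"))   # the girl is loved by the boy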
There are a number of theoretical issues raised by this early standard form
of transformational-generative (TG) grammar, such as how to determine the sequence of application of transformational rules to an input (originally called a
cycle) and the subsequent assignment of morphological and phonological features to the transformed string by a different set of rules. Suffice it to say that the
distinction between deep structure inputs and surface structure outputs by means
of ordered sets of rules describes the system used by Chomsky sufficiently for our
purposes. The key aspect of the TG model is that of movement from one state
or sets of states to another, as in formal mathematical proofs. Indeed, in early
versions of the theory, the rules were called part of a finite-state system of logic,
meaning that the movement from one state to another came to an end.
In the early model, there are thus two syntactic components: the base component (consisting of phrase structure rules) and the transformational component
(consisting of transformational rules), which generate deep and surface structures respectively. Deep structures are seen to be the input to the semantic component, which assigns meaning to the string (via further rules), basically through
lexical insertion and constraints on the insertions from syntactic conditions. The
surface structures that result from the application of the transformations constitute the input to the phonological component, which assigns a phonemic description to the string (also via further rules).
The early theory of TG grammar looked like this:

[Figure 2.17: Early model of a transformational-generative grammar. Within the syntax, the base component generates deep structures, which the transformational component converts into surface structures; deep structures feed the semantic component (yielding the semantic representation of sentences) and surface structures feed the phonological component (yielding the phonological representation of sentences)]

The task of the linguist is to specify the rules that are in each of the boxes. These
represent the native speaker's linguistic competence because, in knowing how to
produce and understand sentences, the speaker, Chomsky claimed, has an internal representation of these rules. All the linguist is doing is giving form to this
representation. The simple elegance of this early model has been marred since at
least the mid-1970s, in part by Chomsky himself, who has conceded that there may
be no boundary between syntax and semantics and hence no deep structures, at
least as he originally envisioned them. I actually disagree, since the early model is still useful for describing structural relations among sentences, such as the active-passive one. The problems that emerged subsequently are, to my mind, basically
squabbles that crop up within any theoretical school.
One thing has remained constant, though: syntactic rules are the essence of linguistic competence (the syntax hypothesis). Chomsky claimed, further, that as linguists studied the nature of rules in different languages they would eventually discover a universal set of rule-making principles. From this basic plan, revised at various points after the 1965 exposé (for example, Chomsky 1966a, 1966b, 1975, 1982, 1986, 1990, 1995, 2000, 2002), formal TG theory took its cue. Basically, a TG grammar is an approach for devising a set of rules for writing base strings and transforming them into complex (and language-specific) ones. It is fundamentally similar to the propositional logic used by mathematicians to indicate how strings of symbols follow from each other through statements of various kinds. Grammar is thus seen as a generator and the rules as the elements that activate the generator. Of course, there is little room for phenomena such as grammaticalization, where words themselves, if they acquire new functions, trigger grammatical change; or the fact that communicative competence (parole) may change grammar
in and of itself.
One of the key notions in TG grammar is that of parsing, which is used to
specify how the phrases are composed and what rules are needed to specify their
composition. Parsing is defined as the process of representing a string in terms of its phrase-structure relations. The meaning of the symbols in a string (input and output) is considered to result from how the strings are structured, a notion called compositional semantics in later versions of generative grammar. By breaking down a string (parsing it) part by part in its deep structure form, we can determine its meaning. In other words, meaning is dependent on syntax. Although various factions in the TG grammar movement broke away from this premise, by and large, meaning has always constituted a difficult problem for this movement. In my view, compositional semantics with its basis in lexical insertion is the best fit for any version of the theory. In the rules above, called production rules, the parts that are not lexical are called symbols, including the start symbol (S), until slots in a string occur whereby insertions from the lexicon occur. For example, an insertion rule would specify that the verb love cannot be inserted if the preceding noun phrase is, say, the rock. If the same string can be generated by more than one sequence of rule applications, production and lexical, then the grammar is said to be ambiguous. Avoiding ambiguity of this type took up a large swath of research activity on the part of
TG grammarians throughout the 1970s and 1980s. Other models have emerged to
connect syntax to semantics but, as it has turned out, these have hardly migrated
to mainstream linguistic practices, indicating that they are relevant only within
the game of generative grammar, to use Colyvan's metaphor once again.
Meaning in the sense of language connecting with outside-of-language referents (social and environmental) is seen to fall literally outside of linguistic theory proper. It is seen to be part of psychology and pragmatic knowledge, not linguistic competence per se, and should thus be relegated to applied areas of study, such
as sociolinguistics and psycholinguistics. Linguistic theory is seen as a pristine
theory about linguistic competence, not about the uses and variability of speech.

2.3.2 Grammar rules


The whole edifice of TG grammar relies on a specific definition of rule. It is defined as a formal statement that represents a state in the generation of a string. Rules, as we have seen, fall into three categories: phrase structure, transformational, and
phonological. Schematically, these can be described as follows:
1. Phrase structure rule. This is a rewrite rule, designed to parse syntactic categories and their relevant states in the generation sequence: S → NP + VP is thus read as "rewrite S as NP followed by VP." Rewrite rules are not commutative.
2. Transformational rule. This rule converts base (deep) structure strings into surface structure ones via reordering, insertion, deletion, or some other process. It is no coincidence that this is also a basic rule of formal mathematics, a fact that Chomsky clearly acknowledged early on. In logic, it is a rule that specifies in syntactic terms a method by which theorems are derived from the axioms of a formal system.
3. Phonological rule. This is a rule that specifies phonemic and phonetic operations that are involved in realizing the surface structure string in physical form.
The overall rule architecture of a formal grammar, G, can be characterized (somewhat reductively) as possessing the following rule-making principles, symbols,
and operations:
– a start symbol S, also called the sentence symbol
– a finite set, N, of nonterminal symbols that is disjoint with the strings generated by G: nonterminal symbols are, for example, NP, VP, Det, and so on
– a finite set, Σ, of terminal symbols, as for example the lexemes inserted above in the phrase structure rules of the sentence


– a finite set, P, of production rules, with each rule having the Kleene Star form: (Σ ∪ N)* N (Σ ∪ N)* → (Σ ∪ N)*; the Kleene star operator is a set of instructions for mapping symbols from one string to another; these include phrase structure and transformational systems
– If Σ is an alphabet (a set of symbols), then the Kleene star of Σ, denoted Σ*, is the set of all strings of finite length consisting of symbols in Σ, including the empty string.

The concept of the Kleene Star operator is basic to this type of rule system; it too is a direct adoption from formal mathematics. If S is a particular set of strings, then the Kleene star of S, or S*, is the smallest superset of S that contains the empty string and is closed under the string concatenation operation, that is, S* is the set of all strings that can be generated by concatenating strings in S. Below are some examples (ε = the empty string, ∅ = the empty set):
1. ∅* = {ε}, since there are no strings of finite length consisting of symbols in ∅, so ε is the only element in ∅*
2. If, say, E = {ε}, then E* = E, since εa = aε = a by definition, so εε = ε
3. If, say, A = {a}, then A* = {ε, a, aa, aaa, ...}
4. If Σ = {a, b}, then Σ* = {ε, a, b, aa, ab, ba, bb, aaa, ...}
5. If S = {ab, cd}, then S* = {ε, ab, cd, abab, abcd, cdab, cdcd, ababab, ...}
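A bounded version of the operation is easy to sketch computationally. The following Python fragment (the cap on the number of concatenated pieces is an assumption made for illustration, since the full Kleene star is infinite) reproduces examples (3) to (5) up to that bound:

    # A bounded enumeration of the Kleene star (the full star is infinite; the cap on
    # the number of concatenated pieces is an assumption for illustration).
    from itertools import product

    def kleene_star(strings, max_pieces=3):
        """All concatenations of 0..max_pieces strings drawn from `strings`."""
        result = {""}                        # the empty string is always included
        for n in range(1, max_pieces + 1):
            for combo in product(strings, repeat=n):
                result.add("".join(combo))
        return sorted(result, key=lambda s: (len(s), s))

    print(kleene_star({"a"}))                # ['', 'a', 'aa', 'aaa']
    print(kleene_star({"a", "b"}, 2))        # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
    print(kleene_star({"ab", "cd"}, 2))      # ['', 'ab', 'cd', 'abab', 'abcd', 'cdab', 'cdcd']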
With this set of meta-rule-making principles, which are really the rules of combinatory algebra, it is possible to write the phrase structure grammar of any
language. Differences among languages occur at the transformational level; that
is, languages are differentiated by the kinds of transformation rules applied and
used, not by phrase structure. The grammar now can be defined in terms of
how strings relate to each other. The system in its entirety is rather complex and
need not be detailed here. The upshot is that grammars are built from a small
set of meta-rule-making principles, becoming complex through derivational and
transformational processes.
For instance, consider the grammar of a hypothetical language, L, made up of
N = {S, B} and Σ = {a, b, c}, S the start symbol, and the following phrase structure
or simply production (P) rules:
1. S → aBSc
2. S → abc
3. Ba → aB
4. Bb → bb
Now, L can be defined as L = {a^n b^n c^n | n ≥ 1}, where a^n denotes a string of n consecutive a's, b^n a string of n consecutive b's, and c^n a string of n consecutive c's. L is the set of strings that consist of one or more a's followed by an equal number of b's and then c's. Examples of strings generated by this grammar are as follows:
1. S ⇒ abc
2. S ⇒ aBSc
3. S ⇒ aBabcc
4. S ⇒ aaBBabccc
5. S ⇒ aaaBbbccc
6. S ⇒ aaabbbccc
Now, we can assign to rule (6) a phonological description (in the case of a natural
language) or a logical propositional structure in the case of some mathematical
function such as the composition of a digit in some numeral system. Phonemes
are thus the isomorphic equivalents of digits in this model. Overall these meta-rules can be applied to language or mathematics equally as part of the generation
of forms. Again, the problem lies in what they mean, as we shall see.
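The generative behavior of this little grammar can also be checked mechanically. The sketch below (a bounded search written for illustration; the length cut-off and search strategy are assumptions) applies the four production rules exhaustively and collects the terminal strings, confirming that only strings of the form a^n b^n c^n are produced:

    # A bounded, illustrative search over sentential forms (the length cut-off and the
    # search strategy are assumptions; the four rules are those listed above).
    RULES = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]

    def generate(max_len=9):
        """Return all fully terminal strings of length <= max_len derivable from S."""
        seen, frontier, terminal = {"S"}, ["S"], set()
        while frontier:
            form = frontier.pop()
            for lhs, rhs in RULES:
                start = 0
                while (i := form.find(lhs, start)) != -1:
                    new = form[:i] + rhs + form[i + len(lhs):]
                    start = i + 1
                    if len(new) > max_len or new in seen:
                        continue
                    seen.add(new)
                    if new.islower():          # no nonterminals (S, B) left
                        terminal.add(new)
                    else:
                        frontier.append(new)
        return sorted(terminal)

    print(generate())   # ['aaabbbccc', 'aabbcc', 'abc'], i.e. a^n b^n c^n for n = 1, 2, 3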

2.3.3 Types of grammar


Formal linguistics gained momentum when Chomsky classified grammars into types in the mid-1950s, known as the Chomsky hierarchy. The difference between types lies in the fact that as production grammars develop they become more efficient in the use of rules, somewhat exemplifying an inherent principle of economy: the more efficient (less symbolic material used) a rule system is, the more "elegant" or "interesting" it is considered to be (as Chomsky often phrased it). This parallels the belief by mathematicians that proofs can also be more elegant or interesting when compared to each other in terms of how they economically (compactly) present their arguments. Elegant proofs are also thought
to reveal much more about the propositions to be proved. So, rules are not just
rules in formal mathematics or formal linguistics; they are subject themselves to
a metric of economy or optimality.
Two types of grammars that have been studied extensively within formal linguistics are context-free and regular grammars. Parsers for these grammars are
expected to be economical and can be easily developed to generate strings ad infinitum. A context-free parser is a set of rules in which the left-hand side of each rule consists of a single nonterminal symbol; some rules end with terminal
(insertion) symbols. The production rules above are typical examples. Another
example is the following:

1. A → B + C
2. B → F + G
3. F → a (terminal)
4. G → b (terminal)
5. C → D + H
6. D → c (terminal)
7. H → d (terminal)

Research on context-free grammars has shown that these do not generate all kinds of strings required by both natural and artificial languages. The artificial language L = {a^n b^n c^n | n ≥ 1} above is not a context-free language, since it requires that one symbol (for example, a) be followed by the same number of two other symbols (b and then c), a three-way matching that context-free rules cannot enforce.
In the set of production rules for regular grammars the same constraint of a
single nonterminal symbol on the left-side holds but, in addition, the right-hand
side is also restricted. It may also contain an empty string, a single terminal symbol, or a single terminal symbol followed by a nonterminal symbol. Rules in a
regular grammar might look like this:
1. S → aA
2. A → aA
3. A → bB
4. B → bB
5. B → ε (terminal)
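The language generated by this regular grammar is simply one or more a's followed by one or more b's, which can be checked with a small sketch (the random derivation procedure and the regular expression a+b+ are illustrative choices, not part of the grammar itself):

    # An illustrative random derivation from the regular grammar above, checked
    # against the regular expression a+b+ (one or more a's, then one or more b's).
    import random
    import re

    def sample():
        out, state = "", "S"
        while state:
            if state == "S":                              # rule 1: S -> aA
                out, state = out + "a", "A"
            elif state == "A":                            # rules 2-3: A -> aA | bB
                out, state = (out + "a", "A") if random.random() < 0.5 else (out + "b", "B")
            else:                                         # rules 4-5: B -> bB | ε
                out, state = (out + "b", "B") if random.random() < 0.5 else (out, "")
        return out

    for _ in range(5):
        s = sample()
        assert re.fullmatch(r"a+b+", s)
        print(s)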
Many variations and extensions of these rule-making principles now exist in the
relevant literature. They have been developed not only by linguists but also by
computer scientists to generate actual language samples. Indeed, the latter field is the one that has most benefitted from the work in formal grammars, applying the rules of natural language grammars to the construction of artificial languages.
One of the claims of formal grammarians generally is that language in its deep
structure is based on the principle of recursion. In mathematics, a classic example of recursion is the Fibonacci sequence, Fn = Fn−1 + Fn−2, which generates the following well-known sequence:
{1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, ...}.
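As a minimal computational illustration of a procedure that evokes itself, the sequence can be generated by a function that calls itself (the base cases F(1) = F(2) = 1 are the usual assumption):

    # A minimal illustration of a procedure that evokes itself (recursion), computing
    # the Fibonacci sequence with the assumed base cases F(1) = F(2) = 1.
    def fib(n):
        if n <= 2:
            return 1
        return fib(n - 1) + fib(n - 2)    # the procedure calls itself

    print([fib(n) for n in range(1, 14)])
    # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]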
The recursion formula provides a snapshot of the internal structure of the sequence. An analogous claim is made by formal grammarians who indicate that a
recursive grammar is actually the key to unlocking the UG in the brain, explaining why we feel that some sentences are genuine, while others are not. However,
although such snapshots may be useful for relating the words in strings to each
other via grammatical categories, they hardly tell us what generates or triggers
the recursive rules themselves in the first place. Aware of this, Chomsky has
suggested that the rules explicate only the ways in which sentences are formed
mentally and then realized physically in real grammars. One can infer the former
from the latter. He introduced the distinction between language in general, which
he calls I-language, and languages in particular, which he calls E-languages, in
order to make this point. Chomsky put forward the notion of a UG to explain
the I-language, explicating why children learn to speak so effortlessly without
training: when the child learns one fact about a language, the child can easily
infer other facts without having to learn them one by one. Differences in language
grammars are thus explainable as choices of rule types, or parameters. From
recursive patterns observed in these we come to understand the role of recursion
in the UG.
But, then, this solution begs the fundamental question of deciding which
sentences are basic (in the I-language) and which ones are contextualized adaptations (in E-languages). It is beyond the scope of the present discussion to deal
with the relevant arguments for and against UG theory. The theory rests on the view that recursion in the I-language reflects the nature of recursion in the UG. Although this may be somewhat reductive, overall it captures the gist of this line of formal grammar research. It also implies that meaning has no effect on the I-language, since it is an innate logical form. Meaning is a product of external factors in the formation of E-languages. And this need not concern grammarians; it is something for psychologists and philosophers to figure out.
Let us look a little more closely at the concept of recursion. Essentially, it is defined as the process whereby a procedure goes through one of the steps in the
procedure, evoking the procedure itself. The procedure is a set of steps based on
a set of rules. Chomsky applied this mathematical notion to natural language in
1965, in reference to the embedding of clauses within sentences. Thus, two distinct
sentences, (1) You see that boy and (2) That boy is my grandson, can be embedded
into each other to produce The boy who you see is my grandson by a recursive rule.
Chomsky calls this particular model of language X-Bar Theory. If we let x and y stand for two grammatical categories, and x-bar and y-bar for the corresponding grammatical phrases, Chomsky claims that the rule x-bar → x + y-bar is the underlying
recursive principle of language. Take, as an example, the sentence The clock is in
the corner. X-Bar Theory would analyze this sentence (schematically at least) as
follows:
Deep structure recursion principle:
x-bar → x + y-bar


Surface rule:
x-bar = n-bar = noun phrase (the clock, the corner)
y-bar = p-bar = prepositional phrase (in the corner)
where:
n = noun (clock, corner)
p = preposition (in)
Structure of The clock is in the corner:
n-bar → n + p-bar
p-bar → p + n-bar
n-bar → n
Supplemented with an appropriate system of transformational rules that assign
word order and sentence relations, Chomsky maintains that X-Bar Theory is
sufficient to explain the basic blueprint of language. If Chomsky is right, then,
the uniqueness of language comes down to a single rule-making principle that
specifies how word order develops. But then how would X-Bar Theory explain
languages in which word order is virtually irrelevant? Many critics have, in fact,
argued that languages such as Classical Latin do not display any evidence of recursion, because they encode grammatical relations by means of inflection, that
is, by variations or changes that their words undergo to indicate their relations
with other words and changes in meaning. Chomsky has countered that one of
the word combinations in a language such as Latin or Russian is a basic one and
the others are its transformations. But deciding which one is basic is problematic,
given that all sentence permutations are perceived as basic by Latin or Russian
speakers according to the context in which each one is uttered: that is, the choice
of one or the other word order depends on stylistic, communicative, and other
types of factors, not on syntactic ones.
Recursion is certainly an operative principle in the structure of grammar, but
does it really explain language? As Daniel Everett (2005) has shown, albeit controversially, recursion may not be a universal feature after all since it is absent from
the Pirahã language, spoken by an indigenous people of Amazonas, in Brazil. The reason,
according to Everett, is that cultural factors have made recursion unnecessary.
This does not minimize the importance of recursion in rule systems, including
grammatical ones, but it may well be a human invention, not an innate faculty
of the mind. That is, it is our way of formalizing repeating forms that come under
our observation. Information is highly recursive: ideas built within other ideas ad infinitum. But this raises the question of what information is and what our
theories of information are all about. There is no proof that recursion is an inbuilt
property of information systems, but rather that it is a useful construct to describe
certain patterns within certain kinds of information. Moreover, the main feature
of information-processing is the discarding of information, as discussed. One of
the main tasks of the brain is to eliminate information that is either irrelevant or
else unrelated to what we need to extract from it. So, rules are really just responses
to how we select from information what we need or what we believe is relevant.
Rules are interpretations, not absolute statements of fact.
Moreover, the connection between linguistic competence and performance
is rarely, if ever, taken into consideration by formal grammarians, even though,
as most other approaches to language would now maintain, the use of language
is governed by features of communication that may themselves initiate change in
language grammars. Grammar is just one of the ways that allows people to express
their concepts of the world, not a hard-wired innate faculty organized into modules in the brain (Fodor 1983). Language draws upon general cognitive resources
to make sense of the world. The assumption of formal grammarians, on the other
hand, is that the essence of linguistic competence is an abstract sense of grammar,
not a sense of meaning.

2.3.4 Formal semantics


Formal grammar has purportedly provided a blueprint for describing what Pinker
(1994) calls the mechanisms that characterize the "language instinct" that leads to language sense. This is defined narrowly within formal linguistics as a capacity for grammar in itself that is innate and emerges in childhood to guide the child's construction of the specific grammar that he or she needs to communicate in context. The role of the lexicon in formal grammars, as discussed, is to provide a list
of morphemes and lexemes (terminal symbols) that are inserted into the slots of
the terminal string. Presumably, in the formalist paradigm, lexical insertion can
be used to describe how children try out lexemes in slots until they get the right
one to match the situation.
Lexical insertion is thus guided by semantic principles that allow for the
lexicon to be organized and made ready for insertion. In early lexical semantics
this was accomplished via distinctive features. In order to understand the reason
for this it is instructive to step back momentarily and look at the whole concept of
distinctive features in linguistics, since it provides the rationale for its use in generative grammar. As is well known, it was the Prague School, and especially Roman Jakobson, who developed distinctive feature analysis (see Jakobson, Karcevskij, and Trubetzkoy 1928, Jakobson 1932, Trubetzkoy 1936, 1939, 1968, 1975, Jakobson, Fant, and Halle 1952, Jakobson and Halle 1956). These linguists wanted to determine which features of sound are critical in both setting up phonemic status and
predictable allophonic variation. For instance, the difference between the two allophones of /p/ ([p] and [pʰ]) is to be located in the fact that one is aspirated. If
we represent this feature with the symbol [+aspirated], we can now specify the difference between the two allophones more precisely: [pʰ] is marked as [+aspirated] and [p], which does not have this feature, as [−aspirated]. The [±aspirated] symbol is a distinctive feature.
In effect, all linguistic units can be described in terms of distinctive features.
This includes the lexicon, whose units can be specified in terms of features that are mapped against the structural profile of strings or slots in rules. It is a particular kind of dictionary that contains not only the distinctive-feature specification of items, but also their syntactic specification, known as subcategorization. Thus, for example, the verb put would be subcategorized with the syntactic specification that it must be followed by a noun phrase and a prepositional phrase (I put the book on the table). It cannot replace loving for this specific syntactic reason. On the other hand, love would also fit nicely into the same slot (I love the book on
the table). It is thus irrelevant what the verbs mean, as long as they are mapped
correctly onto strings via the rules of insertion. The different meanings of the two
sentences, due to different lexical insertions, are seen as being determined by extralinguistic socio-historical conventions of meaning, not by internal processes of
language.
Lexical insertion involves its own hierarchical structure and set of rules. For example, a verb such as drink can only be preceded by a subject that is marked as [+animate] (the boy, the girl, and so on). If it is so marked, then it entails further feature-specification in terms of gender ([+male], [+female]), age ([+adult], [−adult]), and other similar notions. An example of how the lexicon would classify the four lexemes man, boy, woman, girl is the following tree diagram:
[Figure 2.18: Lexical tree diagram. The node person [+animate] branches into [+male] and [+female]; each of these branches into [+adult] and [−adult], yielding man, boy, woman, and girl respectively]

Any violation of lexical subcategorization (using a subject NP or other subject nonterminal symbol marked as [−animate] with the verb drink) would lead to an anomaly: The house drinks wine. This implies that the lexicon is more than just dictionary knowledge of words and their meanings; it includes syntactic and morphological knowledge as well.
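Subcategorization of this sort is easy to sketch as a feature check. The following toy Python fragment (the miniature lexicon and the single selectional restriction are assumptions made for the example) flags The house drinks wine as anomalous:

    # A toy feature check (the miniature lexicon and the single selectional
    # restriction are assumptions made for the example).
    LEXICON = {
        "boy":   {"animate": True},
        "girl":  {"animate": True},
        "house": {"animate": False},
    }
    VERB_FRAMES = {
        "drinks": {"subject_animate": True},   # drink requires a [+animate] subject
    }

    def check(subject_noun, verb):
        """Return 'well-formed' or 'anomalous' for a simple subject-verb pairing."""
        needs_animate = VERB_FRAMES[verb]["subject_animate"]
        is_animate = LEXICON[subject_noun]["animate"]
        return "well-formed" if is_animate or not needs_animate else "anomalous"

    print(check("boy", "drinks"))     # well-formed: The boy drinks wine
    print(check("house", "drinks"))   # anomalous:   The house drinks wine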


The formalization of the role of lexical knowledge in formal linguistics comes generally under the rubric of formal semantics (initially called generative semantics). Its goal is not only to describe the structure of subcategorization and lexical
insertion, including the set of distinctive features that can be deemed as universal,
but, as claimed, to understand linguistic meaning in the abstract by constructing
formal models of the lexical rule-making principles that underlie the construction
of well-formed sentences (Bach 1989, Cann 1993, Benthem and Meulen 2010). The
starting point for formal semantics is Richard Montague's (1974) demonstration in the late 1960s that English could be treated like a formal language in logic. Known as Montague Grammar, his framework, along with Bar-Hillel's (1953) categorial
grammar, became the basis for the development of formal semantics. The most
important feature of Montague and categorial theory is the principle of compositionality, which asserts that the meaning of the whole is a function of the meanings of its parts via rules of combination. From this, idiomatic expressions such as
wild-goose chase or tip of the iceberg are seen as unitary lexical items, that is, as
singular lexemes that are composed of separate parts that cannot be isolated as in
literal speech. The idea is to formally connect syntax and semantics, by arguing
that a formal mathematical language is needed to describe natural language in the
same way that predicate logic is needed to describe the structure of mathematics.
Using basic set theory, Montague argued that natural language expressions
are sets of compositional features, not models of reality. If there is a rule that
combines the verbs walk and sing, there must be a corresponding rule that determines the meaning as the intersection of two sets. Consequently, the meaning
of walk and sing is a subset of the meaning of walk. Thus, Mary walks and sings
implies that Mary is an element of the set denoted by the verb phrase. So Mary
walks and sings entails logically Mary walks. The derivational history of a phrase
plays a role in determining its meaning, constrained by the production rules and
the tree structure of a sentence.
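The set-theoretic idea can be illustrated with a small sketch (the sample sets of individuals are assumptions made for the example, not Montague's own formalism):

    # A minimal sketch of the set-theoretic reading (the sample sets of individuals
    # are assumptions made for the example).
    WALK = {"Mary", "John", "Sue"}   # the set denoted by "walk"
    SING = {"Mary", "Paul"}          # the set denoted by "sing"

    WALK_AND_SING = WALK & SING      # compositional rule: intersection of the two sets

    print(WALK_AND_SING)                     # {'Mary'}
    print(WALK_AND_SING <= WALK)             # True: "walk and sing" is a subset of "walk"
    print("Mary" in WALK_AND_SING,
          "Mary" in WALK)                    # True True: Mary walks and sings entails Mary walks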
It is not essential to go into the complex details of a Montague Grammar, since
most of Montague theory is now considered to be passé. The main point to be made here is that meaning is tied to the derivational (generative) history (sequence of states) of strings, an idea that remains constant across formal semantic theories. A basic critique of this model is that it is limited to describing lexical insertion in declarative or literal sentences. It cannot treat metaphor, analogy, or intentionality in discourse. Formal semanticists counter that seemingly variable sentences (such as questions, idioms, and figurative constructions) are really declarative
sentences in disguise that have become this way by the application of compositional rules. Formal semanticists thus see discourse as extended sentence structure, whereby the sentences in a discourse text are interpreted one by one and put
together into compositional wholes (Kamp 1981, van Eijck and Kamp 1997). This
gluing together of the parts comes under the name of Glue Theory, a rather appropriate term (Dalrymple, Lamping, and Saraswat 1993, Dalrymple 1999, 2001).
The claim is that meaning composition in any context (from the sentence to the
discourse text) is constrained by a set of instructions, called meaning constructors,
stated within a formal logic, which states how the meanings of the parts of a sentence can be combined to provide the meaning of the sentence or set of sentences.
The idea of compositionality was discussed in a detailed fashion even before
TG grammar by Bar-Hillel (1953), who used the term categorial grammar to characterize the process. A categorial grammar assigns a set of types (called categories)
to each basic nonterminal symbol, along with inference rules, which determine
how a string of symbols follows from constituent symbols. It has the advantage
that the inference rules can be fixed, so that the specification of a particular language grammar is entirely determined by the lexicon. Whereas a so-called lambda calculus (which is essentially the name of the types of rules used by formal grammarians) has only one type of rule, A → B, a categorial grammar has two types: (1) B/A, which describes a phrase that results in a phrase of type B when followed on the right by a phrase of type A; (2) A\B, which describes a phrase of type B when preceded on the left by a phrase of type A. The formalization of types of categorial grammars is
known as type-logical semantics or Lambek calculus (Lambek 1958, Morrill 2010).
Although some valid arguments have been put forward in defense of compositionality concerning its psychological basis, many formal semanticists have by
and large kept their distance from it. The principle is seen as simply explaining
how a person purportedly can understand sentences he or she has never heard
before. However, Schiffer (1987) showed how this is a spurious argument. He illustrates his case with the following sentence: Tanya believes that Gustav is a dog.
Compositionality can never account for the content of Tanya's belief (given that
dog has various references). Partee (1988) counters that Schiffer did not distinguish between semantic and psychological facts. Formal semantics, she claimed,
provides a theory of entailment and this, in itself, cannot be excluded from any
viable theory of language understanding.
Despite Partee's counter-argument, there is very little going on in this area of formal semantic study today, perhaps because when one has come to the specification of the rules of production, compositionality, or lexical insertion, there is very little left to do. On the other hand, some linguists now claim that the whole approach was misguided from the outset. But this would constitute a baby-and-the-bathwater counter-argument. One of the achievements of formal grammar
and formal semantics is that linguists have become more aware of the logical
structure of grammar and, perhaps, of discourse. It remains to be seen how far
this insight can go with the ongoing research in cognitive linguistics and discourse
theory generally.


2.4 Cognitive linguistics


TG grammar constituted a mainstream approach in theoretical linguistics from
the early 1960s through to the 1980s. Nevertheless, it was challenged almost from
the outset. One of the reasons for this was, as Margaret King (1992) has cogently
argued, that at no time did it provide an empirical validation of its ideas. A major
critique of formalism also came from the so-called cognitive linguistic movement that surfaced in the 1970s. The basic idea in the movement is that meaning cannot be relegated to subcategorization or compositional rules, but rather that it is intrinsic in the very make-up of sentences and texts. Formal semantics, as discussed, never considered the possibility that the meaning of strings is greater than the sum of the parts, much like formal mathematics started doing, following on the coattails of Cantor and Gödel. The early cognitive linguists argued, moreover, that the conceptualization of grammar as a set of rules that generate strings may be irrelevant in the description of actual grammars themselves, since it ignores the relation between changes in grammar and changes in use and meaning.
An early counter-response to the challenge of cognitive linguistics and its
threat to overturn the whole generative system came from Sperber and Wilson's 1986 book, Relevance: Communication and Cognition, a book widely endorsed by generative linguists given that it essentially saw culture and its meaning structures as products of the same kinds of rule principles that governed formal grammars and their extension into discourse, a composition of a composition. Sperber and Wilson assumed, like Grice (1975), that communication (both verbal and nonverbal) required the ability to attribute mental states to others and thus to the intrinsic rules in people's minds that described these states; that is why people understand each other. They did not completely reject the idea that communication was conditioned by contextual and historical factors. But although they argued that context played a role in understanding communicative texts, in the end they connected speakers to meaning structures not through context but through implicatures (as in formal semantics). Their main claim was that this is how people found relevance in linguistic statements; hence the term Relevance Theory to
describe their theory.
The fly in the ointment here is the dominance of figurative language in discourse. Formalists have typically excluded such language as being idiomatic (thus
based on external factors, rather than internal structural ones) or else have treated
it with the mechanics of compositionality theory. The latter would explain an expression such as connecting the dots as a simple lexemic unit. Cognitive linguists,
on the other hand, would see it as resulting from mapping one domain into another (connecting dots on a visual diagram onto mental processes). The key to

Unauthenticated
Download Date | 6/6/16 9:41 PM

2.4 Cognitive linguistics

119

understanding how this occurs is the notion of conceptual metaphor, discussed schematically in the previous chapter.

2.4.1 Conceptual metaphors


The cognitive linguistic movement has established beyond any doubt that figurative language is hardly idiomatic or exceptional, but rather that it is systematic.
It is thus essential to clarify the notion of metaphor in cognitive linguistics. In
rhetoric there are many kinds of tropes or figures of speech and metaphor is considered to be simply one of them. But in Conceptual Metaphor Theory (CMT) various tropes are seen as manifestations of metaphorical reasoning, rather than as separate figures of speech. Thus, for example, personification (My cat speaks Italian, Mystery resides here, etc.) is viewed as a specific kind of conceptual metaphor, in which people is the source domain: for example animals are people, ideas are people, and so on. However, some of the traditional rhetorical categories (onomatopoeia, metonymy, synecdoche, and irony) continue to be viewed as separate tropes and thus treated separately. This distinction need not concern us here. The diagram below shows how the various tropes are now treated in cognitive linguistics. In all cases, there is a general structure: (1) "A is B" defines metaphor; (2) "A (a part of B) stands for B" is metonymy; and so on.
[Figure 2.19: Figures of speech. The category Figures of Speech branches into Apostrophe, Hyperbole, Metaphor, Metonymy, Oxymoron, and others; Metaphor subsumes Personification (and others), while Metonymy subsumes Synecdoche (and others)]

Conceptualization is guided by image schemas (Lakoff 1987). For example, the sense of up and down is a mental schema derived from experiencing this sense
in the real world; it then guides the conceptualization of a host of ideas and beliefs
that are felt to implicate it in some imaginary (metaphorical) way. Here are a few
examples.


Conceptual metaphor      Linguistic metaphor
happiness is up          My grandson is finally feeling up after a long bout with stress.
sadness is down          But I am feeling down, since I have way too much to do.
more is up               Our family income went up considerably last year.
less is down             But her salary went down.

The image schema is a blending mechanism, which amalgamates concrete experience with abstraction. In an early version of CMT, the formation of conceptual
metaphors was seen as a mapping process, whereby the elements in a source domain were mapped onto the target domain via image-schematic mechanisms. The
mapping was not seen as exclusive to language, but also as guiding representational practices in general. Consider the concept of time in English. Common
conceptual metaphors of time include source domains such as a journey (There's a long way to go before it's over), a substance (There's not enough time left to finish the task), a person (Time comes and goes), and a device (Time keeps ticking on), among others. These source domains manifest themselves as well in representations such as mythical figures (Father Time), narratives (The Time Machine, 1895, by H. G. Wells), and others. So, CMT became a broad movement because it provided a means of linking the internal system of language to external systems of representation. To the best of my knowledge, this had never been
accomplished before in a systematic descriptive way.
More technically, the process constitutes a blend (as already discussed) which
involves several components. There is a generic space, as it is called, which guides
the mapping between the target and source domains, called a diagrammatic
mapping. The image schema undergirds the diagrammatic mapping through its
content which comes from the imagic mapping of sensory perception. This
produces the blend and thus metaphor, which is a conceptual blend that results
from the integration of the various components (see Figure 2.20).
In this revised model, mapping is part of blending. Other conceptual structures also result from the latter process (for example, metonymy and irony), but
each in a different way. Mapping best describes metaphor, whereas a part-for-the-whole blend best describes metonymy. The notion of conceptual metaphor has
had far-reaching implications. Substantive research has come forward to show
how conceptual metaphors coalesce into a system of cultural meanings that inform representations, symbols, rituals, activities and behaviors. Lakoff and Johnson (1980) called this coalescence idealized cognitive modeling (ICM). This is defined as the unconscious formation of over-arching models that result from the repeated blending of certain target domains with specific kinds of source domains.


[Figure 2.20: Image schemas, mapping and metaphor. A generic space guides the diagrammatic mapping between the source and target domains; the image schema, whose image content derives from the imagic mapping of sensory perception, undergirds this mapping; conceptual mapping and integration then produce the blend, that is, the metaphor]

To see what this means, consider the target domain of ideas. The following conceptual metaphors, among others, are used in English to deliver the meaning of
this concept (from Danesi 2007):
ideas are food
1. My prof's ideas left a sour taste in my mouth.
2. I always find it hard to digest her ideas at once.
3. Although she is a voracious reader, she can't chew all the complex ideas in that book.
4. She is always spoon-feeding her students.
ideas are persons
5. Freud is the father of modern psychology, isn't he?
6. Some medieval ideas continue to live on even today.
7. Quantum mechanics is still in its infancy.
8. Maybe we should resurrect Euclidean geometry.
9. She breathed new life into logical methods.


ideas are fashion
10. Formalism went out of style several years ago.
11. Quantum physics is at the avant-garde of science.
12. Those ideas are no longer in vogue.
13. The field of cognitive science has become truly chic, academically.
14. That idea is an old hat.
ideas are buildings
15. That idea is planted on solid ground.
16. That is a cornerstone idea of modern-day biology.
17. That is only a framework for a new theory.
18. That theory is starting to crumble under the weight of criticism.
ideas are plants
19. That idea has many ramifications.
20. How many branches of knowledge are there?
21. That theory has deep historical roots.
22. That idea has produced many offshoots.
ideas are commodities
23. That idea is worthless.
24. You must package your ideas more attractively.
25. You'll be able to sell your ideas easily.
ideas are geometrical figures
26. That idea is rather square.
27. His ideas are parallel to mine.
28. His ideas are diametrically opposite to mine.
29. What's the point of your idea?
ideas can be seen
30. I don't see what that idea is about.
31. I can't quite visualize what you mean by that idea.
32. Let me take a look at that theory.
Now, the constant mapping of such source domains onto common discourse produces, cumulatively, an ICM of ideas, that is, an array of source domains that
can be accessed separately, in tandem, or alternatively to discuss ideas of various
kinds, and to represent them in different but interconnected ways. So, for example, a sentence such as I see that your idea has many ramifications, given that it is
on solid ground can be described as having been constructed by enlisting three of
the above source domains that make up the ICM of ideas (seeing, plants, buildings):
ICM (ideas) = {seeing, plants, buildings, …}
The importance of blending in language and mathematics will be discussed in the
final chapter, since it seems to connect the two systems cognitively and culturally.
For now, it is sufficient to note that research is starting to establish blending as a
kind of over-arching process in the brain that connects various faculties in imaginative ways.

2.4.2 Challenge to formalism


The response of formalists to CMT has been that, while it is an interesting way
to describe some aspects of language, it is essentially trivial. Their main counterargument is that it does not penetrate the power of syntactic rules to create language forms ad infinitum (the syntax hypothesis). But cognitive linguists have
responded by showing how blending processes actually shape the structure of
grammar. Consider the use of snake in the sentences below (from Sebeok and
Danesi 2000):
1. He snaked his way around the issue.
2. In fact, he has a snaky way of doing things.

These are linguistic metaphors based on the conceptual metaphor people are animals. In (1), the latter concept can show up as a verb, if it is the snake's movements
that are implicated; in (2) it manifests itself as an adjective, if it is a quality of the
snake that is implicated instead. The two different grammatical categories can
be seen to reflect different nuances of metaphorical meaning. Work has shown
that such lexicalizations are common in grammars throughout the world (Cienki,
Luka, and Smith 2001). Differences in word order, too, can often be traced to conceptual distinctions. In Italian, for instance, the difference between the literal and
metaphorical meaning of an adjectival concept is often reflected by the different
position of the adjective in a noun phrase:
1. Lui è un uomo povero (He's an indigent man).
2. Lui è un povero uomo (He's a forlorn man).

In the first example it is the literal meaning of povero that is reflected in the noun
phrase by the post-positioning of the adjective with respect to the noun. In the
second one the metaphorical meaning of povero is brought out by means of its pre-positioning with respect to the noun, alerting the interlocutor in an anticipatory
fashion to this meaning.
Ronald Langacker (1987, 1990, 1999) has argued that the parts of speech themselves are the result of specific image schemas working unconsciously. Nouns, for
instance, encode the image schema of a region. Thus, a count noun such as leaf is
envisioned as referring to something that encircles a bounded region, and a mass
noun such as rice a non-bounded region. Now, this difference in image schematic
structure induces grammatical distinctions. Thus, because bounded referents can
be counted, the form leaf has a corresponding plural form leaves, but rice does
not. Moreover, leaf can be preceded by an indefinite article (a leaf), rice cannot.
In research on the world's languages, these examples come up constantly. The
research also shows that not all languages use the same classification system of
nouns. The reason for this has a basis in historical context. In Italian, grapes is a
mass noun, uva, perhaps because the fruit plays a key role in Italian culture (not
only as a fruit but as part of wine-making and other activities).
It is worth noting that, even before the advent of cognitive linguistics, the
Gestalt psychologists were seriously entertaining the possibility that many concepts were indeed metaphorical in origin. Rudolf Arnheim (1969: 242), for example, explained the raison d'être of function words such as prepositions and conjunctions as the result of image schemas (before the use of that term):
I referred in an earlier chapter to the barrier character of "but," quite different from "although," which does not stop the flow of action but merely burdens it with a complication.
Causal relations are directly perceivable actions; therefore "because" introduces an effectuating agent, which pushes things along. How different is the victorious overcoming of a
hurdle conjured up by "in spite of" from the displacement in "either-or" or "instead"; and
how different is the stable attachment of "with" or "of" from the belligerent "against."

The gist of the research in cognitive linguistics, therefore, suggests that grammar
and meaning cannot be separated. Montague tried to get around this critique before the advent of the cognitive linguistic movement in several ways, as we saw,
and Sperber and Wilson added the idea of relevance as being implicit in the application of the rules. For Chomsky (2000, 2002) the crux to understanding language
continues to be the syntax hypothesis, with meaning embedded in syntax. Cognitive linguists view the whole situation in reverse: syntax is embedded in meaning
processes.
In sum, in treating linguistic knowledge as a form of everyday knowledge encoded into words and larger structures, the cognitive linguistic movement is a
radically different one from formalism, and poses a strong challenge to the latter. In response, formal grammarians have developed sophisticated counterarguments,
claiming that words themselves are without meaning: they have, at best,
internal representations of meaning, which are really just ways of using words in
previously-derived strings of symbols. Along these lines, they argue that compositionality can be extended to discourse texts. Today, neuroscientific research is
being used more and more to resolve the debate. When a metaphor is produced,
different regions of the brain are activated in tandem, as fMRI studies have shown.
For instance, Prat (2012: 282) investigated the neural correlates of analogical mapping processes during metaphor comprehension by subjects using the fMRI technique. Prat explains his experiment and findings as follows:
Participants with varying vocabulary sizes and working memory capacities were asked to
read 3-sentence passages ending in nominal critical utterances of the form X is a Y. Processing demands were manipulated by varying the preceding contexts. Three figurative conditions manipulated difficulty by varying the extent to which preceding contexts mentioned
relevant semantic features for relating the domains of the critical utterance to one another.
In the easy condition, supporting information was mentioned. In the neutral condition, no
relevant information was mentioned. In the most difficult condition, opposite features were
mentioned, resulting in an ironic interpretation of the critical utterance. A fourth, literal
condition included context that supported a literal interpretation of the critical utterance.
Activation in lateral and medial frontal regions increased with increasing contextual difficulty. Lower vocabulary readers also had greater activation across conditions in the right
inferior frontal gyrus. In addition, volumetric analyses showed increased right temporoparietal junction and superior medial frontal activation for all figurative conditions over
the literal condition. The results from this experiment imply that the cortical regions are
dynamically recruited in language comprehension as a function of the processing demands
of a task. Individual differences in cognitive capacities were also associated with differences
in recruitment and modulation of working memory and executive function regions, highlighting the overlapping computations in metaphor comprehension and general thinking
and reasoning.

In reviewing the fMRI studies on metaphor, Wang and Daili (2013) concluded,
however, that the results are not always this clear; they tend to be ambiguous,
albeit promising. In the context of the present discussion, their review nevertheless points out that metaphor can no longer be relegated to subsidiary status in a
theory of language.

2.5 Formalism, logic, and meaning


Despite the serious challenge from cognitive linguistics, there is little doubt that
formalism, in linguistics and mathematics, still has relevance. The question of
why a proof works or why a system of grammatical rules produces well-formed
sentences that can be modeled on computers is still an important one. As will be

discussed in the next chapter, formalism has had important applications to artificial intelligence research and robotics. Language development in children, for
example, has been modeled in robots in order to test the validity of rule systems
and how these operate algorithmically. Interestingly, robots have been found to
develop word-to-meaning mappings without grammatical rules, a very enigmatic
finding to say the least. Algorithms can also be devised to model trends in data and
create reliable measures of similarity among natural textual utterances in order to
construct more reliable rule systems. Without formal approaches, the vastly complex information present in discourse data would have remained inaccessible to
linguists. With the proliferation of the Internet and the abundance of easily accessible written human language on the web, the ability to create a program capable
of reproducing human language based on a statistical analysis of the data would have
many broad and exciting possibilities.
In the early 1970s the American linguist Dell Hymes (1971) proposed that
knowledge of language entailed more than linguistic competence, or language-specific knowledge: it also entailed the ability to use language forms appropriately in specific social and interactive settings. He called this kind of knowledge
communicative competence, a term that has since become central in the study of
language. Hymes also maintained that such competence was not autonomous
from linguistic competence, but, rather, that it was interrelated with it. Moreover,
the words used in conversations are cues of social meanings, not just carriers
of lexical and grammatical information. To carry out a simple speech act such
as saying hello requires a detailed knowledge of the verbal and nonverbal cues
that can bring about social contact successfully. An infringement or misuse of
any of the cues will generally lead to a breakdown in communication. Every
conversation unfolds with its own kind of speech logic, that is, with its own
set of assumptions and implicit rules of reasoning that undergird its sequence,
form, and overall organization (Danesi and Rocci 2000). So, if we have learned
anything from the history of formal mathematics and linguistics it is that a pure
abstract theory of language or mathematics is an ideal, not a reality. Saussure's
and Chomsky's artificial dichotomy between langue and parole is ill-founded,
as it turns out. Reconnecting the two through a study of meaning structures, as
in CMT, is the way in which progress towards answering the basic question of
what language is can be achieved. This has become evident even in computational models of Natural Language Processing, as will be discussed in the next
chapter.


2.5.1 A Gödelian critique


Perhaps the greatest challenge to strict formalism in linguistics is the same one
that was faced by strict formalism in mathematics, although it has never been
explicated in this way, to the best of my knowledge. It can be called a Gödelian
challenge, after Gödel showed that every sufficiently powerful propositional system in mathematics
is undecidable. It is worthwhile revisiting Gödel's challenge here for the sake of
argument. Before Gödel, it was taken for granted that every proposition within a
logical system could be either proved or disproved within it. But Gödel showed
that this was not the case. Invariably, a logical system of propositions (rules) contains a proposition within it that is true but unprovable. Gödel's argument is far
too complex to be taken up in an in-depth manner here. For the present purposes,
it can be condensed as follows (from Danesi 2002: 146; see also Smullyan 1997):
Consider a mathematical system that is both correct, in the sense that no false statement is
provable in it, and contains a statement S that asserts its own unprovability in the system.
S can be formulated simply as: "I am not provable in system T." What is the truth status of S?
If it is false, then its opposite is true, which means that S is provable in system T. But this
goes contrary to our assumption that no false statement is provable in the system. Therefore,
we conclude that S must be true, from which it follows that S is unprovable in T, as S asserts.
Thus, either way, S is true, but not provable in the system.

Turing's 1936 paper, published shortly after Gödel's, also proved that in logical
systems some objects cannot be computed, which is another way of saying that
they are undecidable. An undecidable problem in computer science is one for
which it is impossible to construct a single algorithm that always leads to a correct
yes-or-no answer. This notion became an important early insight for determining
what could be programmed in a computer.
By extension, one can claim that any formal grammar will have a Gödelian
flaw in it. Finding the undecidable proposition or rule in a formal grammar has
never been undertaken, as far as I know. But my guess is that it can be found with
some effort.
The Gödelian critique of formal grammar does not mean that formal approaches should be discarded. On the contrary, the efforts of formal linguists,
like those of mathematical logicians, have not been without consequences. As
mentioned, they have had applications in computer programming. But when it
comes to natural language, formal grammar theories break down because they
have never been able to account for meaning in any successful way. Simply put, in
human language strings of symbols involve interpretations of what they mean, not
just a processing of their sequential structure as in computer software. And those
interpretations come from experience that emanates from outside the strings.


2.5.2 Connecting formalism and cognitivism


As Yair Neuman (2014: 26–27) has argued, formal approaches are products of the
reflective mind (lógos) that aims to understand pattern on its own terms. Since
antiquity, it has allowed us to go from concrete (practical) modes of knowing to
abstract ones:
To identify a general pattern (a Gestalt, which is an abstraction of concrete operations)
we need some kind of powerful tool that may help us to conduct the quantum leap from one
level of operating in this world to another level of operating in this world. Bees, for instance,
create a wonderful geometrical pattern when building their beehive. A spider weaving its
web was a source of amazement for the old geometricians. Neither the bee nor the spider
have ever developed the mathematical field known as Group Theory, which is the abstract
formulation of group transformations and that can point at the deep level of similarity
between different geometrical patterns.

This dynamic between form and meaning was studied deeply by Vygotsky (1961:
223) who understood that they are really inextricable, and that when we speak we
are really involved with meaning and thought in tandem:
A word without meaning is an empty sound: meaning, therefore, is a criterion of word, its
indispensable component. But from the point of view of psychology, the meaning of every
word is a generalization or a concept. And since generalizations and concepts are undeniably acts of thought, we may regard meaning as a phenomenon of thinking. It does not
follow, however, that meaning formally belongs in two different spheres of psychic life. Word
meaning is a phenomenon of thought only in so far as speech is connected with thought and
illuminated by it. It is a phenomenon of verbal thought, or meaningful speech, a union of
word and thought.

It should be mentioned initially here that neuroscientists are coming closer and
closer to accepting the cognitive linguistic work as being real in a psychological
sense; although contrasting work on the neuroscience of logic is also highly interesting and suggestive (for example Houdé and Tzourio-Mazoyer 2003, Krawczyk
2012, Monti and Osherson 2012, Smith et al. 2015). A notion that has come forth
to attempt a compromise between formalism and cognitivism in both language
and mathematics is that of network. In previous work (Danesi 2000), this notion
was used to exemplify how various forms of language had a branching structure
to produce integrated layers of meanings. So, the meaning of cat is something
that can only be extrapolated from the network of associations that it evokes, including mammal, animal, organism, life, whiskers and tail. This has a denotative
branching structure within the network. By adding metaphorical branches (as in
He's a cool cat and The cat is out of the bag), the network is extended to enclose
figurative and other kinds of meanings.


2.5.3 Overview
As argued in this chapter, formalist approaches are important in many ways. But
they are always fraught with challenging paradoxes. A classic one is the Unexpected Hanging paradox (a paradox to which we will return in subsequent chapters). It goes somewhat like this:
A condemned logician is to be hanged at noon, between Monday and Friday. But he is not
told which day it would be. As he waits, the logician reasons as follows: Friday is the final
day available for my hanging. So, if I am alive on Thursday evening, then I can be certain
that the hanging will be Friday. But since the day is unexpected, I can rule that out, because
it is impossible. So, Friday is out. Thus, the last possible day for the hanging to take place
is Thursday. But, if I am here on Wednesday evening, then the hanging must perforce take
place on Thursday. Again, this conflicts with the unexpectedness criterion of the hanging.
So, Thursday is also out. Repeating the same argument, the logician is able to rule out the
remaining days. The logician feels satisfied, logically speaking. But on Tuesday morning he
is hanged, unexpectedly as had been promised.

This is a truly clever demonstration of how one can reason about anything, and
yet how the reasoning might have nothing to do with reality. Are formalist theories
subject to the Unexpected Hanging paradox? Aware of the profoundly disturbing
aspect of this line of reasoning, David Hilbert (1931) put forth a set of requirements
that a logical theory of mathematics should obey. Known as Hilbert's program, it
was written just before Gödel's theorem as a framework for rescuing mathematics
from what can be called the Unexpected Hanging conundrum. Hilbert's program
included the following criteria which, as we have seen throughout this chapter,
make up the underlying paradigm of formalism:
1. Formalization. A complete formalization of mathematics, with all statements
articulated in a precise formal language that obeyed well-defined rules.
2. Completeness. A proof that all true mathematical statements can be proved
within the formalism.
3. Consistency. A proof that no contradiction can be obtained in the formal set
of rules.
4. Conservation. A proof that any result about real things obtained by reasoning about ideal objects can also be obtained without the latter.
5. Decidability. An algorithm must be determined for deciding the truth or falsity
of any mathematical statement.
Hilbert's program was put into some question by Gödel's demonstration, but it
continues to have validity as a heuristic system for conducting mathematical activities. The current versions of mathematical logic, proof theory, and so-called
reverse mathematics, are based on realizing Hilbert's program; reverse
mathematics is a system that seeks to establish which axioms are required to prove
mathematical theorems, thus turning the Euclidean system of proof upside down,
going in reverse from the theorems to the axioms.
Hilbert's program was based on the hope that mathematics could be formalized into one system of the predicate calculus, whether or not it linked mathematics to reality. Similarly, Chomsky has always claimed that his theory is about
grammar, not language as it is spoken and used. But the implicit assumption in
both Hilbert and Chomsky is that logical formalism and reality are an implicit
match. This is known as logicism: the attempt to make logic the core of mathematics and language and then to connect it to reality. Aware of the issues connected
with this stance, Hilbert made the following insightful statement (cited in Tall
2013: 245):
Surely the first and oldest problems in every branch of mathematics spring from experience
and are suggested by the world of external phenomena. Even the rules of calculation with
integers must have been discovered in this fashion in a lower stage of human civilization,
just as the child of today learns the application of these laws by empirical methods. But,
in the further development of a branch of mathematics, the human mind, encouraged by
the success of its solutions, becomes conscious of its independence. It evolves from itself
alone, often without appreciable influence from without, by means of logical combination,
generalization, specialization, by separating and collecting ideas in fortunate ways, in new
and fruitful problems, and appears then itself as the real questioner.

Without going here into the many responses to Hilbert's program, including the
P = NP problem, it is sufficient to point out that both formal mathematics and formal linguistics have opened up significant debates about the nature of language
and mathematics. The Unexpected Hanging conundrum, however, continues to
hang over [pun intended] both. As Tall (2013: 246) comments, mathematicians
and linguists must simply lower their sights, continuing to use formalism only
when and where it is applicable:
Instead of trying to prove all theorems in an axiomatic system (which Gödel showed is not
possible), professional mathematicians continue to use a formal presentation of mathematics to specify and prove many theorems that are amenable to the formalist paradigm.

If formalism works, it is because it is a product of the creative brain trying to come
up with solutions to problems. As discussed, this is an abductive process; it is
only after this stage that the brain requires logic to give discoveries stability, as
René Thom (1975) so cogently argued (above). This dual process was explained
by Einstein, whose commentary provides an overall summary of the connection
between intuition and formalism and, incidentally, between language and mathematics (cited in Hadamard 1945: 142–143):

The words of language, as they are written or spoken, do not seem to play any role in the
mechanism of thought. The psychical entities which seem to serve as elements in thought
are certain signs and more or less clear images which can be voluntarily reproduced and
combined. There is, of course, a certain connection between those elements and relevant
logical concepts. It is also clear that the desire to arrive finally at logically connected concepts is the emotional basis of this rather vague play with the above mentioned elements.
But taken from a psychological viewpoint this combinatory play seems to be the essential
feature in productive thought, before there is any connection with logical construction in
words or other kinds of signs which can be communicated to others.


3 Computation
Computing is not about computers any more. It is about living.
Nicholas Negroponte (b. 1943)

Introductory remarks
The P = NP problem discussed in the previous chapter is a profound one for mathematics. A starting point for understanding its import is a famous computing challenge issued by the security company, RSA Laboratories, in 1991. The company
published a list of fifty-four numbers, between 100 and 617 digits long, offering
prizes of up to two hundred thousand dollars to whoever could factor them. The
numbers were semiprimes, or almost-prime numbers, defined as the product of
two (not necessarily different) prime numbers. In 2007 the company retracted
the challenge and declared the prizes inactive, since the problem turned out to
be intractable. But the challenge did not recede from the radar screen of mathematicians, as many tried to factor the numbers using computers. The largest
factorization of an RSA semiprime, known as RSA-200, which consists of 200 digits, was carried out in 2005. Its factors are two 100-digit primes, and it took nearly
55 years of computer time, employing the number field sieve algorithm, to carry
out. This algorithm is the most efficient one for factoring numbers larger than
100 digits.
The enormity of the RSA challenge brings us directly into the core of the P = NP
problem. Can a problem, such as the RSA one, be checked beforehand to determine if it has a quick solution? The problem is still an outstanding one, and it
too carries a price tag of one million dollars, offered this time around by the Clay
Institute. To reiterate here, the P = NP problem entails asking whether a problem
whose proposed solution can be quickly verified by a computer can also be solved
quickly by the computer. Not surprisingly, the problem was mentioned by Gödel in
a letter he sent to John von Neumann in 1956, asking him whether an NP-complete
problem could be solved in quadratic or linear time. The formal articulation of the
problem came in a 1971 paper by Stephen Cook. Of course, it could well turn out
that a specific problem itself will fall outside all our mathematical assumptions
and techniques. Quadratic time refers to the fact that the running time of an algorithm grows as the square of the size of the input. That is, as we
scale the size of the input by a certain amount, we also scale the running time by
the square of that amount. If we were to plot the running time against the size of
the list, we would get a quadratic function.
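To make this concrete, the following short Python sketch (my own illustration, not part of the original discussion) counts the steps performed by a doubly nested loop over an input of size n; doubling n roughly quadruples the count, which is what quadratic time means in practice:

def quadratic_steps(n):
    count = 0
    for i in range(n):          # outer loop runs n times
        for j in range(n):      # inner loop runs n times for each outer pass
            count += 1          # one unit of work per pass
    return count

print(quadratic_steps(100), quadratic_steps(200))   # 10000 40000: doubling the input quadruples the work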


The foray in the last chapter into formalism led to the P = NP dilemma, which
constitutes a basis for investigating mathematics and language in terms of algorithms and computer models. One of the more important byproducts of the
formal grammar movement has been a growing interest in the modeling of natural
and artificial languages. Known as computational linguistics (CL), it is a branch
that aims to devise algorithms in order to see what these yield both in terms of
machine-based processing systems and in terms of what they reveal about human
language. CL has had many interesting implications and applications, from machine translation to the study of language development. The interplay between
theoretical linguistics and CL has become a valuable one, since computational
models of language can be used not only to test linguistic theories but also to devise algorithms for generating useful artificial languages, such as those used on
the Internet.
Because computers have an enormous capacity for data-processing, they are
heuristic devices that allow the linguist to examine large corpora of data and glean
from the data relevant insights into language and discourse. Without the computational approach, the vastly complex information present in discourse data
would have remained largely inaccessible to linguists and the current emphasis on discourse within linguistics, sociolinguistics, and applied linguistics might
never have come about. Indeed, the use of computer technology in discourse analysis has made it a relatively simple task to extract from the data the relevant patterns and categories that are hidden within it and thus to describe the rules of
discourse in as straightforward a manner as the rules of grammar.
A similar approach is found in mathematics, known generally as computability theory (CT), which asks questions such as the following one: How many sets
of the natural numbers are there, such as the primes, the perfect numbers, and
so on? There are more random numbers than ordered ones in sets. So, is there
any way, or more precisely is there an algorithm, that can tell us which is which?
Consider a set, A, which consists of certain numbers. Are, say, 23 and 79 in the set
or not? Can an algorithm be developed that can answer this question, which can
be rephrased as the question of whether 23 and 79 are computable? Clearly this
kind of approach penetrates the nature of sets and of membership in sets and,
thus, leads to a more comprehensive understanding of what logic is.
CL and CT are fascinating in themselves, especially in areas such as the P = NP
problem and in so-called Natural Language Processing (NLP), which constitutes
an attempt to make computers produce language in a more naturalistic manner.
Using linguistic input from humans, algorithms have been constructed that are
able to modify a computer system's style of production based on such input, thus
simulating the adaptability of verbal communication. The focus is on how humans comprehend linguistic inputs and then use this knowledge to produce relevant
outputs. An offshoot of this line of inquiry has been a focus on precision in
the development of theories from given data. With the proliferation of the Internet
and the abundance of easily accessible written human language on it, the ability
to create a program capable of processing human language by computer based
on an enormous quantity of natural language data has many broad and exciting
possibilities, including improved search engines and, as a consequence, a deeper
understanding of how language works. In a phrase, the computer is both a powerful modeling device for testing theories and a new means for reproducing human
language artificially.
This chapter starts with a discussion of the connection of CL and CT to algorithms and computer modeling. Then it looks at how CL may have triggered the
interest in discourse and at how theories of discourse, in turn, inform NLP. It then
discusses computability in mathematics and what it tells us about mathematics in
general. It ends with an overall assessment of the computation movement in both
mathematics and linguistics. The thematic thread that I wish to weave throughout is that because language and mathematics can be modeled computationally
in similar ways, this can provide insights into their structure and, perhaps, even
their common nature. The computational streams in both linguistics and mathematics are extensions of formalism, since programming a computer requires a
fairly precise knowledge of how to write rules and connect them logically.

3.1 Algorithms and models


The concept of algorithm is a crucial one in computer science and artificial intelligence (AI) research. It is thus worthwhile discussing it briefly here before looking
at the relation of algorithms to the computer modeling of mathematical and linguistic phenomena.
As is well known, the concept (although not named in this way) goes back
to Euclid. His algorithm, called the Fundamental Theorem of Arithmetic, is worth
revisiting here because it brings out the essence of what algorithms are all about.
Given any composite number, such as 14 or 50, the theorem states that it is decomposable into a unique set of prime factors:
14 = 2 × 7
50 = 2 × 5 × 5
Let's look more closely at how the unique set of prime factors of a composite number, such as 24, can be identified using a version of Euclid's algorithm:


1. 24 = 12 × 2
2. Notice that 12 = 6 × 2
3. Plug this in (1) above: 24 = (6 × 2) × 2 = 6 × 2 × 2
4. Notice that 6 = 3 × 2
5. Plug this in (3) above: 24 = 6 × 2 × 2 = (3 × 2) × 2 × 2 = 3 × 2 × 2 × 2

The prime factors of 24 are 2 and 3, or 24 = 3 × 2³. We also note that each of the prime
factors that produces a composite number also divides evenly into it: 3 divides
into 24 as does 2. This is then the basis for constructing the algorithm:
1. Start by checking if the smallest prime number, 2, divides into the number
evenly.
2. Continue dividing by 2 until it is no longer possible to do so evenly.
3. Go to the next smallest prime, 3.
4. Continue in this way.
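These steps translate directly into a short program. The following Python sketch (my own illustration, not part of the original text) carries out the trial division just described, dividing out 2 as long as possible, then 3, and so on:

def prime_factors(n):
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:      # keep dividing evenly by the current candidate
            factors.append(divisor)
            n //= divisor
        divisor += 1                 # move to the next candidate (composites never divide evenly at this point)
    return factors

print(prime_factors(24))   # [2, 2, 2, 3], that is, 24 = 3 x 2^3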
This method will work every time. The above instructions constitute the algorithm; that is, they constitute a logical step-by-step set of procedures. Euclid actually conceptualized his algorithm geometrically, as did Nichomachus even before
Euclid. Their geometric algorithms are described and illustrated by Heath (1949:
300). These are shown below:

Figure 3.1: Euclid's and Nichomachus' algorithms (Euclid's example; Nichomachus' example)


Euclid's algorithm shows how to find the greatest common divisor (gcd) of two
starting lengths BA and DC, which are multiples of a common unit length. DC,
being shorter, is used to measure BA, but only once because remainder EA is less
than DC. EA is divisible into DC, with remainder FC, which is shorter than EA,
and divides three times into its length. Because there is no remainder, the process
ends with FC being the gcd. Nichomachus' algorithm shows how the factorization
of the numbers 49 and 21 results in the gcd of 7.
The algorithm is not only a set of instructions for the factorization of composite numbers but also a model of factorization itself, since it breaks the operation
down into its essential steps. Generally speaking, by modeling mathematical (and
linguistic) phenomena in the form of algorithms, we are in effect gaining insight
into the phenomena themselves.
Euclid's algorithm above can be easily transformed into a computer program
via a flowchart. Scott (2009: 13) provides the following flowchart of the algorithm:
Figure 3.2: A flowchart of Euclid's algorithm for the greatest common divisor (gcd) of two numbers (INPUT A, B; if B = 0, PRINT A and END; otherwise, if A > B then A ← A − B, else B ← B − A, and repeat)


This breaks down the steps in calculating the gcd of numbers a and b in locations named A and B. The algorithm proceeds by subtractions in two loops: If the
test B ≥ A yields yes (or true), or more accurately the number b in location B is
greater than or equal to the number a in location A, then, the algorithm specifies
B ← B − A (meaning the number b − a replaces the old b). Similarly, if A > B, then
A ← A − B. The process terminates when (the contents of) B is 0, yielding the gcd
in A. Algorithms are thus tests for decidability. If an algorithm can be written for
something and comes to an end, it is computable (that is, it can be carried out
and thus decidable). The general procedures above for factorization of composite numbers are, as the flowchart shows, easily turned into computer language,
which is then run on an actual computer. The computer is thus a modeling device
that allows us to test the model.
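The flowchart can be rendered, for instance, as the following Python sketch (mine, offered only as an illustration of the subtraction loops described above):

def gcd_by_subtraction(a, b):
    while b != 0:           # terminate when B is 0; the gcd is then in A
        if a > b:
            a = a - b       # A <- A - B
        else:
            b = b - a       # B <- B - A
    return a

print(gcd_by_subtraction(49, 21))   # prints 7, the gcd in Nichomachus' example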
It is thus useful to look here at distinctions, definitions, and basic concepts
in computer modeling, although well known among computer scientists, since
these are implicit in all computation activities and theories. Computer modeling
is the representation of objects or ideas. Like physical models, computer models
show what something might look like when the real thing would be too difficult or
impossible to create physically. Architects use computer modeling to see what a
new house design might look like. The architect can change the design in order to
see what the changes entail. The model of the house is more flexible to build than
a physical model. Similarly, a model of factorization (above) allows us to see what
factoring might look like. The mathematician can change the model in order to
see what the changes would entail and what they would yield in terms of a theory
of factorization.
A computer model lets the linguist or mathematician test the validity or computability of a theory in some domain. And this forces the mathematician or linguist to specify the algorithm precisely beforehand. The realism of a computer
algorithm reflects the level of understanding of its maker. Algorithms are also useful as database-makers, so to speak, since they enable users to store large corpora
of information in databases which then allow for a guided search of the databases
in various ways. The efficiency with which computers store and retrieve information makes database management a major function in CL and CT. Neuroscientists
can also store the results of experiments and compare their results with those of
other scientists.
Computer modeling is also a means for mimicking various activities. Artificial
intelligence (AI) software enables a computer to imitate the way a person solves
complex problems, speaks, or carries out some other expressive task. One particular type of AI software, called an expert system, enables a computer to ask
questions and respond to information the answers provide. The computer does
so by drawing upon rules and vast amounts of data that human experts have
supplied to the writers of the software. The computer can narrow the field of inquiry
until a potential solution or viable theory is reached. However, if the rules and
data available to the system are incomplete, the computer will not yield the best
possible solution.

3.1.1 Artificial intelligence


CL and CT emerged at the same time that AI did as a theoretical branch of computer science and psychology. AI is, fundamentally, a study of algorithms and
of how they can be used to create computer models and simulations of various
phenomena. Historically, this idea started in 1623 when German scientist Wilhelm
Schikard invented a machine that could add, multiply and divide. He got it to carry
out the operations by breaking down the operations into step-by-step procedures
which he fed into the machine with an early program. Blaise Pascal followed with
a machine in 1642 that added and subtracted, automatically carrying and borrowing digits from column to column. Shortly after, Leibniz designed a special gearing
system to enable multiplication on Pascal's machine. In the early 1800s,
French inventor Joseph-Marie Jacquard invented a loom that used punched cards
to program patterns of woven fabrics. Inspired by Jacquard's invention, British
mathematician Charles Babbage constructed what he called a Difference Engine
in the early 1820s to solve mathematical problems mechanically. Babbage also
made plans for an Analytical Engine, which prefigured many of the features of
the modern computer. A little later, Herman Hollerith, an American inventor, combined Jacquard's punched cards with devices that created and electronically read
the cards. Hollerith's tabulator was used for the 1890 United States census. He
founded his Tabulating Machine Company, which eventually merged with other
companies in 1924 to become International Business Machines Corporation (IBM).
In the 1930s American mathematician Howard Aiken developed the Mark I
electronic calculating machine, which was built by IBM. From this basis, Hungarian-American mathematician John von Neumann developed the first electronic
computer to use a program stored entirely within its memory. John Mauchley, an
American physicist, and J. Presper Eckert, an American engineer, built the first
successful, general digital computer in 1945. In 1948 American physicists Walter
Houser Brattain, John Bardeen, and William Bradford Shockley developed the
transistor. By the late 1960s integrated circuits, electrical components arranged
on a single chip of silicon, replaced transistors. In the 1970s came the microprocessor, which led eventually to the personal computer and to the incredibly-powerful
computer systems available today. It was in that decade, in fact, that AI emerged
as a viable discipline.


Actually, proper AI began at a workshop at Dartmouth College in 1956 organized by John McCarthy, who is credited with coining the name of the new
discipline. At the workshop, computer scientists presented and discussed the first
programs capable of modeling logical reasoning, learning, and board games, such
as checkers. One presentation described the first program that learned to play
checkers by competing against a copy of itself.
AI is a major branch of computer science today, aiming to design systems
(models and simulations) that process information in a manner similar to the way
humans do. This makes it as well a branch of cognitive science and neuroscience.
A computer with AI is a very useful tool in these areas because, as mentioned
several times, it can test the consistency of theories, methods, and even such detailed artifacts as proofs and grammar rules. It can also be programmed to perform
the same tasks, making it possible to assess the algorithm itself as a theoretical construct. AI is typically divided into several branches, including knowledge
representation and reasoning, planning and problem solving, Natural Language
Processing, Machine Learning, computer vision, and robotics.
The key idea in AI is representation. The programmer asks a simple question:
How can we best represent phenomenon X? As a trivial, yet useful, example, consider how factoring in algebra could be represented, such as the factorization of
the expression 2x + 4y + 16z. The instructions to the computer would include sequential steps such as the following:
1. Check for factors in all symbols
2. Extract the factors
3. Move them to the front
4. Add parentheses
The operation of the instructions would then produce the required output: 2(x +
2y + 8z). This is said to be a manifestation of knowledge representation in a specific
domain. It is at the core of AI.
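A minimal sketch of such a representation might look as follows in Python (the function and data format are my own illustration, not an actual AI system): the program checks the coefficients for a common factor, extracts it, moves it to the front, and adds parentheses.

from math import gcd
from functools import reduce

def factor_expression(terms):
    # terms are (coefficient, symbol) pairs, e.g. [(2, "x"), (4, "y"), (16, "z")]
    common = reduce(gcd, (c for c, _ in terms))            # step 1: check for a factor shared by all terms
    inner = " + ".join(s if c // common == 1 else f"{c // common}{s}"
                       for c, s in terms)                  # step 2: extract the factor from each term
    return f"{common}({inner})"                            # steps 3-4: move it to the front, add parentheses

print(factor_expression([(2, "x"), (4, "y"), (16, "z")]))  # prints 2(x + 2y + 8z)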

3.1.2 Knowledge representation


There are three main approaches to knowledge representation: (1) the logical approach, (2) the probabilistic approach, and (3) the neural network approach. In
the first one, programs are designed to produce required outputs based on a series of if-then rules or instructions, as for example: "If A is true and B is true,
then C is true." This approach has been effective in the development of expert
systems that are designed to solve specific problems. However, each program requires many detailed instructions and cannot carry out computations effectively
outside a narrow range of expertise. In the probabilistic approach, knowledge is
represented as a set of statistical probabilities. The program is thus designed in
terms of the probability of alternative outputs given specific data or information.
In the neural network approach, knowledge is represented with instructions that
are organized as a network, within which the interconnected units perform certain tasks by exchanging information. This approach is intended to imitate the
behavior of neurons, hence its name neural network programming.
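To make the first (logical) approach concrete, the following toy Python sketch, which is my own illustration rather than an actual expert system, applies if-then rules to a set of known facts until nothing new can be derived:

rules = [({"A", "B"}, "C"),     # If A is true and B is true, then C is true
         ({"C"}, "D")]          # If C is true, then D is true
facts = {"A", "B"}

derived_something = True
while derived_something:        # forward chaining: apply rules until no new fact appears
    derived_something = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            derived_something = True

print(facts)                    # {'A', 'B', 'C', 'D'} (set order may vary)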
To represent some truly complex human knowledge system, such as language, elements of all three might be used. Natural Language Processing (NLP),
for example, involves writing computer programs that communicate with users
in a human language, instead of a specialized programming language. Computer
scientists have thus developed sophisticated logical, probabilistic, and neural
network systems for NLP, as will be discussed below. These can effectively carry
out verisimilar conversations about a narrow topic, such as making restaurant
reservations. They can, in other words, process and produce only a limited range
of natural language.
The probabilistic aspect of contemporary computer models is a very important one (Kochenderfer 2015). Many aspects of problem-solving in mathematics and linguistic performance are potentially undecidable, involving decision-making that is uncertain, that is, involving actions based on imperfect observations with unknown outcomes. Using Bayesian analysis, which models the
variability powerfully, algorithms can be designed to be much more flexible and
thus more similar to human activities, which are frequently uncertain. Bayesian
analysis will be discussed in the next chapter. For now, it is sufficient to introduce
Bayes's Theorem, which states that the conditional probability of each of a set of
possible causes for a given observed outcome can be computed from knowledge
of the probability of each cause and the conditional probability of the outcome of
each cause. Bayesian analysis has been described as a degree-of-belief interpretation of probability, as opposed to frequency or proportion interpretations.
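In computational terms, the theorem amounts to a small calculation. The sketch below (mine, with invented numbers purely for illustration) computes the probability of each possible cause given an observed outcome from the prior probability of each cause and the conditional probability of the outcome under each cause:

def bayes_posterior(prior, likelihood):
    # prior[c]: probability of cause c; likelihood[c]: probability of the observed outcome given c
    evidence = sum(prior[c] * likelihood[c] for c in prior)           # total probability of the outcome
    return {c: prior[c] * likelihood[c] / evidence for c in prior}    # P(cause | outcome) for each cause

print(bayes_posterior({"cause1": 0.7, "cause2": 0.3},
                      {"cause1": 0.1, "cause2": 0.6}))
# approximately {'cause1': 0.28, 'cause2': 0.72}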
The Bayesian approach turns strict propositional logic into a more flexible
one, enabling reasoning with hypotheses, that is, with propositions whose truth
or falsity is uncertain. In this framework, a prior hypothesis is updated in the
light of new relevant observations or evidence, and this is done via a standard set
of algorithmic procedures that can perform the relevant new calculation. There
are actually two views on Bayesian probability: the objectivist view, whereby the
rules of Bayesian statistics can be justified by means of consistency criteria, and
the subjectivist view, whereby the statistics quantify a personal belief. Either one,
however, has had concrete and important applications in computer modeling.
AI has taken great strides in advancing how we may indeed construct some
systems of representation. As a consequence, some AI theorists have gone so far

as to affirm that AI itself is a theory of mind and thus a way to predict human
behavior, a fact that has not escaped Google, which uses algorithms to mine the
Internet for information on people and groups (MacCormick 2009). The fundamental assumption here is that the mind's functions can be thought of as attendant to neurological states (for example, synaptic configurations) and that these,
in turn, can be thought of as operations akin to those that a computer is capable of carrying out. That this was a viable approach to analyzing intelligence was
demonstrated by Turing (1936), mentioned in the previous chapter. He showed
that four simple operations on a tape (move to the right, move to the left, erase
the slash, print the slash) allowed a machine to execute any kind of program that
could be expressed in a binary code (as for example a code of blanks and slashes).
As long as one could specify the steps involved in carrying out a task and translating them into the binary code, the Turing machine would be able to scan the tape
containing the code and carry out the instructions.
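The idea can be sketched in a few lines of Python (my own toy simulation, built around a made-up transition table, not Turing's original formulation): the table maps the current state and scanned symbol to a new state, a symbol to print or erase, and a move to the left or right.

def run(tape, transitions, state="start", head=0, max_steps=100):
    cells = dict(enumerate(tape))                 # a sparse tape of blanks (" ") and slashes ("/")
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, " ")
        state, write, move = transitions[(state, symbol)]
        cells[head] = write                       # print the slash or erase it (write a blank)
        head += 1 if move == "R" else -1          # move to the right or to the left
    return "".join(cells[i] for i in sorted(cells))

# a made-up machine that prints a slash over every blank it scans, then halts
table = {("start", " "): ("start", "/", "R"),
         ("start", "/"): ("halt", "/", "R")}
print(run("  /", table))   # prints ///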
As Gardner (1985: 17–18) correctly noted, Turing machines, and similar computational constructs of knowledge catapulted cognitive science to the forefront
in the study of the human mind in the 1980s:
The implications of these ideas were quickly seized upon by scientists interested in human
thought, who realized that if they could describe with precision the behavior of thought
processes of an organism, they might be able to design a computing machine that operated
in identical fashion. It thus might be possible to test on the computer the plausibility of notions about how a human being actually functions, and perhaps even to construct machines
about which one could confidently assert that they think just like human beings.

There are now two versions of AI. The employment of computers to test models of
knowledge is the weak version of AI, and, as such, it has helped to shed some
light on how logical processes might unfold in the human mind. The strong
version, on the other hand, claims that all human activities, including emotions
and social behavior, are not only representable in the form of algorithms, but that
machines themselves can be built to think, feel, and socialize. This view depicts
human beings as special types of computation machines. The following citation
from Konner (1991: 120), an early supporter of the strong version, makes this emphatically clear:
What religious people think of as the soul or spirit can perhaps be fairly said to consist of
just this: the intelligence of an advanced machine in the mortal brain and body of an animal.
And what we call culture is a collective way of using that intelligence to express and modify
the emotions of that brain, the impulse and pain and exhilaration of that body.

Not all cognitive scientists have adopted the strong version of AI. Neuroscientists, in particular, are working more and more on the development of computational
models of neurological processes as a means for gaining knowledge of how
thought is processed by the brain, not of reproducing thought in its human form
in some machine (the weak version of AI). By trying to figure out how to design
a computer program that simulates the relevant neurofunctional processes underlying mental activities, neuroscientists thus often discover certain unexpected
patterns. Kosslyn (1983: 116) put it aptly a while back as follows:
The computer model serves the function of a note pad when one is doing arithmetic: It helps
keep track of everything so that you don't get a headache trying to mentally juggle everything
at once. Sometimes the predictions obtained in this way are surprising, which often points
out an error in your thinking or an unexpected prediction.

As Black (1962) pointed out at the start of AI, the idea of trying to discover how a
computer has been programmed in order to extrapolate how the mind works was
bound to become a guiding principle in AI research on mathematics and language
for the simple reason that algorithms are so understandable and so powerful in
producing outputs. But there is a caveat here, expressed best by physicist Roger
Penrose (1989), who has argued that computers can never truly be intelligent because the laws of nature will not allow it. Aware that this is indeed an effective
argument, Allen Newell (1991) responded by pointing out that the use of mechanical metaphors for mind has indeed allowed us to think conveniently about the
mind, but that true AI theory is not based on metaphor. He summarized his case
as follows (Newell 1991: 194):
The computer as metaphor enriches a little our total view of ourselves, allowing us to see
facets that we might not otherwise have glimpsed. But we have been enriched by metaphors
before, and on the whole, they provide just a few more threads in the fabric of life, nothing
more. The computer as generator of a theory of mind is another thing entirely. It is an event.
Not because of the computer but because finally we have obtained a theory of mind. For a
theory of mind, in the same sense as a theory of genetics or plate tectonics, will entrain an
indefinite sequence of shocks through all our dealings with mind, which is to say, through
all our dealings with ourselves.

It is relevant to note that the advent of AI dovetails with the rise of Machine
Translation, the use of computers to translate texts from one natural language
to another. Machine Translation was, and still is, a testing ground for weak and
strong versions of AI. It made an early crucial distinction in knowledge representation between the virtual symbols in abstract systems or algorithms and the
actualized symbols in texts. The idea was to design algorithms capable of mimicking the actualized symbols in linguistic behavior. From this basic platform,
computational linguists developed representations of linguistic knowledge that
do indeed mimic linguistic behavior, as we shall see. Although the computer

cannot interpret its outputs (actual symbols) in human terms, it can model them
in virtual terms. The interpretation of the difference is the task of the analyst. All
this suggests that only the weak version of AI is a viable one in the modeling of
mathematical and linguistic knowledge.
The founding notion in knowledge representation within AI is Turing's machine, discussed briefly above. It is not a physical device. It is a logical abstraction.
Garnham (1991: 20) illustrates it appropriately as follows:
If something can be worked out by mathematical calculation, in the broadest sense of that
term, then there is a Turing machine that can do each specific calculation, and there is a General Turing machine that can do all of them. The way it works is that you pick the calculation
you want done and tell the General Turing machine about the ordinary Turing machine that
does that calculation. The General Turing machine then simulates the operation of the more
specic one.

To paraphrase, by picking an operation and loading a program (a specific Turing machine) for carrying it out into the computer's memory, the computer (a
General Turing machine) can then model what would happen if one actually had
that specific machine. The fundamental assumption in early CL was that rules of
syntax are akin to those that a Turing machine is capable of carrying out. The
modern computer works essentially in this way, using binary digits to realize the
operations. The simplicity of the machine is important to note. The main insight
from this line of investigation is that complexity is a derivative of simple operations working recursively at the level of operationality. This inherent principle of
computation may even be the implicit premise that led Chomsky to assume that
recursion was the underlying principle in the operation of the UG. Whatever the
case, it is obvious that algorithmic knowledge representation and human theories of that knowledge can be compared, analyzed, and modified accordingly. The
synergy that exists between the two is the essence of CL and CT. By trying to figure
out how to design a computer program that simulates the cognitive and neurofunctional processes underlying mental activities we can get an indirect glimpse
into those activities.
In computer science, recursion refers to the process of repeating items in a
self-similar way and, more precisely, to a method of defining functions in which
the function being defined is applied within its own definition, but in such a way
that no loop or infinite chain can occur. The so-called recursion theorem says that
machines can be programmed to guarantee that recursively defined functions exist. Basically it asserts that machines can encode enough information to be able
to reproduce their own programs or descriptions.
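A simple Python sketch (my illustration) shows what such a recursively defined function looks like: the definition of gcd below calls gcd within its own body, yet every chain of calls terminates, so no infinite loop occurs.

def gcd(a, b):
    return a if b == 0 else gcd(b, a % b)   # the function is applied within its own definition

print(gcd(49, 21))   # prints 7, matching Nichomachus' example earlier in the chapter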


3.1.3 Programs
It is useful here to discuss what is involved in programming a computer to model
or simulate some activity, behavior, or theory. Preparing a program begins with
a complete description of the operation that the computer is intended to model.
This tells us what information must be inputted, what system of instructions
and types of computing processes (logical, probabilistic, neural) are involved,
and what form the required output should take. The initial step is to prepare a
flowchart that represents the steps needed to complete the task. This is itself a
model of the relevant knowledge task, showing all the steps involved in putting
the instructions together into a coherent program. The format of the flowchart,
actually, imitates the formatting of a traditional proof in geometry. Each step in
the chart gives options and thus allows for decisions to be made. The flowchart is
converted into a program that is then typed into a text editor, a program used to
create and edit text files.
Flowcharts use simple geometric symbols and arrows to specify relationships.
The beginning or end of a program is represented by an oval; a process is represented by a rectangle; a decision is represented by a diamond; and an I/O (input-output) process is represented by a parallelogram. The flowchart below shows
how to build a computer program to find the largest of three numbers A, B, and C:

Figure 3.3: Flowchart for determining the largest number (Start; Read A, B, C; if A > B, test A > C and print A or C; otherwise test B > C and print B or C; End)


This breaks down the steps in the comparison of the magnitudes of the numbers in a precise and machine-readable way. Basically, it mimics what we do in the real world: comparing two numbers at a time and keeping track of the largest magnitude along the way.
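The same decision structure can be rendered directly in a modern high-level language. The following sketch (an added illustration in Python, not part of the original flowchart discussion) mirrors the chart's pairwise comparisons:

def largest(a, b, c):
    # Compare two numbers at a time, following the flowchart's decisions.
    if a > b:
        if a > c:
            return a   # corresponds to "Print A"
        return c       # corresponds to "Print C"
    if b > c:
        return b       # corresponds to "Print B"
    return c           # corresponds to "Print C"

print(largest(3, 9, 5))  # 9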
Programs are written in high-level languages, which include symbols, linguistic expressions, and/or mathematical formulas. Some programming languages support the use of objects, such as a block of data and the functions that act upon the given data. These relieve programmers of the need to rewrite sections of instructions in long programs. Before a program can be run, special programs must translate the programming language text into a machine language, or low-level language, composed of numbers. Sophisticated systems today combine a whole series of states and representational devices to produce highly expert systems for processing input.
Now, for the present purposes it is sufficient to note that programming is a translation system, converting one system (composed of virtual symbols) into another so that the initial system can be restructured into the second system to produce an output (composed of actual symbols) that allows the first system to operate. These can be represented diagrammatically as follows (S1 = initial system, S2 = computer system):
[Figure 3.4: Programming schema: S1 → S2 → Output]

In this diagram the S2 is the set of instructions that constitute the modeling system
required to translate S1 into the S2 (the computer system consisting of a knowledge
representation language with relevant symbols, objects and so on). The S2 thus
constitutes a model, albeit a specic kind of model, based in AI. So, a program is
a model that will allow us to represent mathematical and linguistic knowledge,
or at least aspects of such knowledge, in algorithmic ways. The mechanical system (S2 )more technically known as the source codeis an operating system and
requires interpretation on the part of the programmer to construct. As in traditional proofs, this means blending modes of logic, from abduction to deduction.
Abduction enters the picture when devising the steps and connecting them to
the programmers previous knowledge. So, programming languages contain the
materials to organize a format into a coherent representation of the S1 that the
machine can process.
The above description, although reductive, is essentially what a program does in converting human ideas into machine-testable ones, thus allowing us to test
for their consistency, completeness, and decidability. For this reason, computers
have been called logic machines, since they allow for the testing of the three
criteria for knowledge representation that were discussed in the previous chapter.
It is relevant to note that a programming language is usually split into two components: syntax (form) and semantics. These are understood in the same way that formal grammars define them (previous chapter). Without going into details here, suffice it to say that these are modeled to mimic the same type of sequential logical structure found in formal grammars. Let's look at a simple program in BASIC that translates the source (S1) into its language (S2). The program is a first-generation BASIC one with simple data types, loop cycles, and arrays. The following example is written for GW-BASIC, but will work in most versions of BASIC with minimal changes. It is intended to produce a simple dialogue:
10 INPUT "What is your name: ", U$
20 PRINT "Hello "; U$
30 INPUT "How many cookies do you want: ", N
40 S$ = ""
50 FOR I = 1 TO N
60 S$ = S$ + "*"
70 NEXT I
80 PRINT S$
90 INPUT "Do you want more cookies? ", A$
100 IF LEN(A$) = 0 THEN GOTO 90
110 A$ = LEFT$(A$, 1)
120 IF A$ = "Y" OR A$ = "y" THEN GOTO 30
130 PRINT "Goodbye "; U$
140 END

The resulting dialogue resembles a real dialogue:

What is your name: (Marcel)
Hello (Marcel)
How many cookies do you want: (7)
*******
Do you want more cookies? (Yes)
How many cookies do you want: (3)
***
Do you want more cookies? (No)
Goodbye (Marcel)

This is of course a very simple program. But it shows how syntax and semantics are envisioned in a formal (compositional) way. Third-generation BASIC
languages such as Visual Basic, Xojo, StarOffice Basic, and BlitzMax have introduced features to support object-oriented and event-driven programming paradigms. Most built-in procedures and functions are now represented as methods of standard objects rather than operators. The point is that whether or not this type of knowledge representation is psychologically real, for the purpose of theory-testing it can be assumed to be so.

3.2 Computability theory


The term mathematical modeling has many meanings in contemporary mathematics; here it is limited to describing how to design algorithmic systems (and computer programs) for describing mathematical knowledge and for solving mathematical problems such as the P = NP one. Proving that a problem falls within the class P or NP is the starting point in all mathematics: can it or can it not be solved, and in relatively quick time? This is the so-called Cobham-Edmonds thesis, first articulated in 1965 by Alan Cobham. Basically, it says that if a problem can be computed in polynomial time (that is, in a number of steps bounded by a polynomial function of the size of the input), then it lies in P. This implies that there exists an algorithm that can produce a solution within a given time.
CT can thus answer some very basic questions in mathematics that would otherwise simply be debated philosophically, leaving us with logical paradoxes such as the Unexpected Hanging one. For example: if P ≠ NP, then what happens to computability? The case of the Traveling Salesman Problem, which is NP-complete, is often cited to show what this question entails, since it is among the most difficult to solve by algorithm. As Elwes (2014: 289) puts it: "If P ≠ NP, then there is some problem in NP which cannot be computed in polynomial time. Being NP-complete, the Travelling Salesman Problem must be at least as difficult as this problem, and so cannot lie in P" (see Cook 2014).

3.2.1 The Traveling Salesman Problem


The Traveling Salesman Problem (TSP) allows us to model various aspects of mathematical knowledge because it has a distinct computational structure; that is, it has a structure that can be modeled on a computer easily by means of a program, given that graphs are computer objects of a certain kind (that is, they represent a problem in diagrammatic-essential terms). Here's a standard version of the problem.


A salesman wishes to make a round-trip that visits a certain number of cities. He knows the
distance between all pairs of cities. If he is to visit each city exactly once, then what is the
minimum total distance of such a round trip?
(Benjamin, Chartrand, and Zhang 2015: 122)

The TSP involves the use of Hamiltonian cycles, which need not concern us here. Simply put, a Hamiltonian cycle is one that passes through every vertex of a graph exactly once. A graph with a Hamiltonian path is thus traceable and connectible. The solution of the TSP is elaborated by Benjamin, Chartrand, and Zhang (2015: 122) as follows (where c = a city, n = number of vertices in a graph):
The Traveling Salesman Problem can be modeled by a weighted graph G whose vertices are the cities and where two vertices u and v are joined by an edge having weight r if the distance between u and v is known and this distance is r. The weight of a cycle C in G is the sum of the weights of the edges of C. To solve this Traveling Salesman Problem, we need to determine the minimum weight of a Hamiltonian cycle in G. Certainly G must contain a Hamiltonian cycle for this problem to have a solution. However, if G is complete (that is, if we know the distance between every pair of cities), then there are many Hamiltonian cycles in G if its order n is large. Since every city must lie on every Hamiltonian cycle of G, we can think of a Hamiltonian cycle starting (and ending) at a city c. It turns out that the remaining (n − 1) cities can follow c on the cycle in any of its (n − 1)! orders. Indeed, if we have one of the (n − 1)! orderings of these (n − 1) cities, then we need to add distances between consecutive cities in the sequence, as well as the distance between c and the last city in the sequence. We then need to compute the minimum of these (n − 1)! sums. Actually, we need only find the minimum of (n − 1)!/2 sums since we would get the same sum if a sequence was traversed in reverse order. Unfortunately, (n − 1)!/2 grows very, very fast. For example, when n = 10, then (n − 1)!/2 = 181,440.

The solution has algorithmic form, elaborating instructions for connecting the parts of a graph systematically with no detours or exceptions. By translating the physical aspects of the problem (distances, cities, and so on) into symbolic notions, such as paths, weights, and so on that apply to graph systems, we have thus devised a mathematical model of the TSP: a model (S1) that decomposes all aspects of the problem into its essential parts. Because of this, it can be translated into a computer program (S2), as has been done by computer scientists and mathematicians throughout the history of the TSP. The interesting thing here is that it involves knowledge of graph theory and of Hamiltonian cycles, something a computer would not know in advance. But, once programmed (S2), the outputs of S1 (the TSP) show many alternatives to the solution, all connected by the main elements in the algorithm. This example shows why CT is a useful approach to problems (both of the P and NP variety) in mathematics.
The TSP is part of graph theory, which ultimately derives from Euler's Königsberg Bridges Problem as a means of deciding whether a tour is possible or not.


Euler's problem constitutes an important episode in mathematics. So, it is worth revisiting briefly here. The Königsberg Bridges Problem (KBP), which Euler formulated in a famous 1736 speech, illustrates what impossibility is essentially about and how to approach the P = NP problem (although it was not named, of course, in this way during Euler's times). Euler presented the problem first to the Academy in St. Petersburg, Russia, publishing it later in 1741. He no doubt suspected that it bore deep implications for mathematics.
The situation leading to the problem goes somewhat as follows. In the German town of Königsberg runs the Pregel River. In the river are two islands, which in Euler's times were connected with the mainland and with each other by seven bridges. The residents of the town would often debate whether or not it was possible to take a walk from any point in the town, cross each bridge once and only once, and return to the starting point. No one had found a way to do it but, on the other hand, no one could explain why it seemed to be impossible. Euler became intrigued by the debate, turning it into a mathematical conundrum:
In the town of Königsberg, is it possible to cross each of its seven bridges over the Pregel River, which connect two islands and the mainland, without crossing over any bridge twice?
In the schematic map of the area below, the land regions are represented with
capital letters (A, B, C, D) and the bridges with lower-case letters (a, b, c, d, e, f, g):

Figure 3.5: Königsberg Bridges Problem

Euler went on to prove that it is impossible to trace a path over the bridges without
crossing at least one of them twice. This can be shown by reducing the map of the
area to graph form, restating the problem as follows:
Is it possible to draw the following graph without lifting pencil from paper, and
without tracing any edge twice?


Figure 3.6: Königsberg Bridges Problem in outline graph form

The graph version provides a more concise and thus elemental model of the situation because it disregards the distracting shapes of the land masses and bridges, reducing them to points or vertices, and portraying the bridges as paths or edges. This is called a network in contemporary graph theory. More to the point of the present discussion, it shows that solving the problem is impossible without doubling back at some point. Creating more complex networks, with more and more paths and vertices in them, will show that it is not possible to traverse a network that has more than two odd vertices in it without having to double back over some of its paths (an odd vertex is one where an odd number of paths converge). Euler proved this fact in a remarkably simple way. It can be paraphrased as follows.
A network can have any number of even vertices in it, because all the paths that converge at an even vertex are used up without having to double back on any one of them. For example, at a vertex with just two paths, one path is used to get to the vertex and another one to leave it. Both paths are thus used up without having to go over either one of them again. Take, as another example, a vertex with four paths. One of the four paths gets us to the vertex and a second one gets us out. Then, a third path brings us back to the vertex, and a fourth one gets us out. All paths are once again used up.
The same reasoning applies by induction to any network with even vertices.
At an odd vertex, on the other hand, there will always be one path that is not used
up. For example, at a vertex with three paths, one path is used to get to the vertex
and another one to leave it. But the third path can only be used to go back to the
vertex. To get out, we must double back over one of the three paths. The same
reasoning applies to any odd vertex. Therefore, a network can have, at most, two
odd vertices in it. And these must be the starting and ending vertices. If there is
any other odd vertex in the network, however, there will be a path or paths over
which we will have to double back.
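Euler's criterion is itself easily computable. The following sketch (an added illustration in Python) simply counts the odd vertices of a network to decide whether it can be traced in one stroke; the edge list encodes the seven bridges in one standard labeling of the land masses (which letter names which land mass is an assumption here).

def traceable_in_one_stroke(edges):
    # edges: list of (u, v) pairs; repeated pairs stand for multiple bridges.
    # A connected network can be traced without doubling back
    # only if it has zero or two odd vertices.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    odd = [vertex for vertex, d in degree.items() if d % 2 == 1]
    return len(odd) in (0, 2)

konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(traceable_in_one_stroke(konigsberg))  # False: all four vertices are odd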
The network in the Königsberg graph has four vertices in it. Each one is odd. This means that the network cannot be traced by one continuous stroke of a pencil
without having to double back over paths that have already been traced. The relevant insight here is that Euler's graph makes it possible to look at the relationships among elemental geometric forms to determine solvability (Richeson 2008: 107):
The solution to the Königsberg bridge problem illustrates a general mathematical phenomenon. When examining a problem, we may be overwhelmed by extraneous information. A good problem-solving technique strips away irrelevant information and focuses on the essence of the situation. In this case details such as the exact positions of the bridges and land masses, the width of the river, and the shape of the island were extraneous. Euler turned the problem into one that is simple to state in graph theory terms. Such is the sign of genius.

The implications of Euler's problem for modern graph theory, topology, and the computational-mathematical study of the P = NP problem are unending. Graph theory has had a great impact on mathematical method, bringing together areas that were previously thought to be separate. A path that traverses every edge of a graph exactly once is called Eulerian. One that does not is called non-Eulerian. Euler then looked at graphs in the abstract. In the case of a three-dimensional figure, for instance, he found that if we subtract the number of edges (e) from the number of vertices (v) and then add the number of faces (f) we will always get 2 as a result:
v − e + f = 2
Take, for example, a cube:
[Figure 3.7: Number of vertices, edges, and faces of a cube, with the vertices labeled V1–V8 and the edges E1–E12.]

As can be easily seen, the cube has 8 vertices, 12 edges, and 6 faces. Now, inserting these values in the formula, it can be seen that the relation it stipulates holds: 8 − 12 + 6 = 2. The KBP not only provided the basic insights that led to the establishment of two new branches of mathematics (graph theory and topology) but it also held significant implications for the study of mathematical impossibility. Euler's demonstration
that the Königsberg network was impossible to trace without having to double back on at least one of the paths showed how the question of impossibility can be approached systematically. It was a prototype for the study of combinatorial optimization (Papadimitriou and Steiglitz 1998), which consists essentially in developing algorithms for network flow and in testing NP-complete problems.
The KBP is a predecessor of the TSP, which was presented in the 1930s and
now constitutes one of the most challenging problems in algorithmic optimization, having led to a large number of programming ideas and methods. As Bruno,
Genovese, and Improta (2013: 201) note:
The first formulation of the TSP was delivered by the Austrian mathematician Karl Menger who around 1930 worked at Vienna and Harvard. Menger originally named the problem the "messenger problem" and set out the difficulties as follows. At this time, computational complexity theory had not yet been developed: "We designate the Messenger Problem (since this problem is encountered by every postal messenger, as well as by many travelers) the task of finding, for a finite number of points whose pairwise distances are known, the shortest path connecting the points. This problem is naturally always solvable by making a finite number of trials. Rules are not known which would reduce the number of trials below the number of permutations of the given points. The rule, that one should first go from the starting point to the point nearest this, etc., does not in general result in the shortest path."

Of course, Menger's challenge has been tackled rather successfully by computer science and mathematics working in tandem with the development of the field of combinatorial optimization, which was developed to solve problems such as the TSP one. In 1954 an integer programming formulation was developed to solve the problem alongside the so-called cutting-plane method, which enables the finding of an optimal solution (namely, the shortest Hamiltonian tour) for a TSP involving 49 U.S. state capitals (Bruno, Genovese, and Improta 2013: 202). The problem has been generalized in various ways and studied algorithmically, leading to the growth of optimization theory.
Interestingly, it has had several applications, such as in the area of DNA sequencing. For the present purposes it is sufficient to say that it shows how to attack NP-hard problems in general (for relevant applications, see Bruno, Genovese, and Improta 2013: 205–207). The tactics in the attack include the following:
1. creating algorithms for finding solutions
2. devising heuristic algorithms that may not provide a solution but will shed light on the problem and generate interesting subproblems in the process (a minimal sketch of one such heuristic follows below).
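As a sketch of the second tactic, the nearest-neighbour heuristic below (an added Python illustration, not drawn from the sources cited) applies Menger's rule of always going to the nearest unvisited point; it is fast, but, as he observed, it does not in general produce the shortest tour.

def nearest_neighbour_tour(dist, start=0):
    # Greedy heuristic: from the current city, always move to the nearest
    # unvisited city, then return to the start. Not guaranteed to be optimal.
    n = len(dist)
    unvisited = set(range(n)) - {start}
    tour, current = [start], start
    while unvisited:
        current = min(unvisited, key=lambda city: dist[current][city])
        tour.append(current)
        unvisited.remove(current)
    tour.append(start)
    weight = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
    return tour, weight

# The same made-up four-city distance matrix as in the brute-force sketch above.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]
print(nearest_neighbour_tour(dist))  # ([0, 1, 3, 2, 0], 23); optimal here, but not in general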
As we have seen, algorithms proceed one step at a time, from a starting point to an end-point. They are based on the formalist notion of finite states. The algorithm mirrors the sequential organization of a traditional proof or a generative grammar, in that the moves from one step to the other are computable. Computability is thus
a metric of solvability and provability. Turing machines work in this way, because they can only be in one state at a time. But the advent of quantum physics and quantum computing has started to provide a powerful alternative to the finite-state model. Quantum physics claims that the fundamental particles of Nature are not in one fixed state at any moment, but can occupy several states simultaneously, a condition known as superposition. It is only when disturbed that they assemble into one state. This has obvious implications for the computability hypothesis, because it could lead to faster machines. In 2009 a quantum program was devised that was able to run Grover's reverse phone book algorithm (Elwes 2014: 289). A phone book is essentially a list of items organized in alphabetical order. So, looking up a name in it is a straightforward (finite-state) task. However, if we have a phone number and want to locate the person to whom it belongs we are faced with a much more difficult problem to solve. This is the essence of the reverse phone book problem.
Its solution is a perfect example of how seemingly intractable problems can be
modeled in various computable ways to provide solutions. Elwes (2014: 289) puts
it as follows:
In 1996, Lov Grover designed a quantum algorithm, which exploits a quantum computer's ability to adopt different states, and thus check different numbers, simultaneously. If the phone book contains 10,000 entries, the classical algorithm will take approximately 10,000 steps to find the answer. Grover's algorithm reduces this to around 100. In general, it will take around √N steps, instead of N. The algorithm was successfully run on a 2-qubit quantum processor in 2009.

An added aspect of quantum computing is that quantum computations are probabilistic. By running the algorithms over and over one can, thus, increase the level of decidability to higher and higher degrees, but this would then slow down the process. Grover's algorithm, actually, was found to be optimal, since no other algorithm has been discovered since that could solve the problem faster. It is not known, moreover, whether every problem in NP, such as the TSP one, can be solved with quantum algorithms.

3.2.2 Computability
CT constitutes a partnership between mathematics and computer science aiming to decide what mathematical problems can be solved by any computer. A function or problem is computable if an algorithm can be devised that will give the correct output for any valid input. Since computer programs are countable but the real numbers are not, there must exist numbers that cannot be calculated by any program. There is, as already discussed, no easy way of describing any of them.


There are many tasks that computers cannot perform. The most well-known is the halting problem, mentioned in the previous chapter. Given a computer program and an input, the problem is to determine whether the program will finish running or will go into a loop and run forever. Turing proved in 1936 that no algorithm for solving this problem can exist. He reasoned as follows: it is sufficient to show that if a solution to a new problem were to be found, then it could be used to decide an undecidable problem by changing instances of the undecidable problem into instances of the new problem. Since we know that no method can decide the old problem, no method can decide the new problem either.
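The flavour of the impossibility argument can be conveyed in a short, deliberately self-referential sketch (an added illustration in Python; the function halts below is hypothetical, since Turing's theorem is precisely that it cannot be written):

def halts(program, argument):
    # Hypothetical oracle: would return True if program(argument) eventually stops.
    # No correct implementation can exist; this stub only marks the assumption.
    raise NotImplementedError("no such general test exists")

def paradox(program):
    # If the oracle says program(program) halts, loop forever; otherwise stop.
    if halts(program, program):
        while True:
            pass
    return "done"

# Asking about paradox(paradox) traps any would-be oracle: if it answers True,
# paradox(paradox) loops forever; if it answers False, paradox(paradox) halts.
# Either way the oracle is wrong, so no algorithm for the halting problem exists.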
One could ask: Is this not just a moot point, since mathematics goes on despite computability conundrums? The issue of computability is a crucial one, since it allows us to reformulate classic questions in algorithmic ways. One of the most basic questions of mathematics is: what does a real number look like? This question was actually contemplated before the advent of CT by Émile Borel in 1909 (chapter 1). If we write out the decimal expansion, then each of the digits, from 0 to 9, should appear equally often. The decimal expansion of a number is its representation in the decimal system where each place consists of a digit from 0 to 9 arranged in such a way that it is multiplied by a power of 10 (10ⁿ), decreasing from left to right, with 10⁰ indicating the ones place. In other words, it shows the values of each digit according to its place in the decimal layout or expansion. So, for instance, the number 1,236 has the following value structure:
1 × 10³ + 2 × 10² + 3 × 10¹ + 6 × 10⁰.
Now, Borel argued that the equal occurrence of the digits does not happen over a short stretch of the expansion, but if it is stretched out to infinity the digits should eventually average out. He defined such a number as a normal number. There are 100 possible different 2-digit combinations, 00 to 99, which should also appear equally often over longer stretches of the expansion; the same applies to 3-digit combinations; and so on, to n-digit combinations. Generally, every finite string of digits in an expansion should appear with the same frequency as any other string of the same length. This is Borel's main criterion for normality. As a corollary, the same criterion should hold for numbers in any base, such as the binary one.
Borel actually proved that virtually every real number (or more accurately every place-value representation of every number) is normal, with few exceptions. This raised a few truly intriguing questions: are the numbers e and π normal? It is conjectured that they are, but no one has been able to prove it. A non-computable number is called a random real number because it seems to have no discernible pattern. More specifically, for many numbers one can easily run an algorithm to predict the next digit in an expansion with a high degree of certainty; but for a random real no algorithm can predict with any degree of certainty what the next digit will be. This is a crucial aspect
of numbers because randomness is stronger than normality. In effect, computability in this case leads, paradoxically, to a consideration of randomness and other
probability factors in the makeup of normality.
Computability, as examples such as this show, is an epistemological notion that extends more traditional ways of doing mathematics. Indeed, before the advent of CT, computability (solvability) was examined in more direct mathematical terms, as we have seen in previous chapters. Group theory is a case in point. It came from the fact that two mathematicians, Niels Henrik Abel and Évariste Galois in the nineteenth century, were contemplating the solutions of polynomial equations (Mackenzie 2012: 118–119). Specifically, they were looking at quintic polynomials, which have no general solution in radicals. Their proof involved an exploration of the mathematical concept of symmetry. The general form of the quintic polynomial looks like this:
x⁵ + ax⁴ + bx³ + cx² + dx + f
The equation has five roots, {r1, r2, r3, r4, r5}. Each coefficient in the equation is a symmetric function of the roots:
a = −(r1 + r2 + r3 + r4 + r5)
b = (r1r2 + r1r3 + r1r4 + r1r5 + r2r3 + r2r4 + r2r5 + r3r4 + r3r5 + r4r5)
and so on
Each of the roots participates equally in the formulas; if the roots are permuted (say, by replacing r1 with r2 and r2 with r1) the formulas do not change. The terms will have a different order in the written sequence but the sums will be the same. To put it differently, the linear structure changes, but not the conceptual one it represents. There are 120 ways to permute the five roots (5! = 120). So a quintic polynomial has 120 symmetries (conceptually speaking). Some polynomials have fewer symmetries because some of the permutations may be excluded due to extra algebraic relations between some of the roots (for instance, a root may be the square of another). If a polynomial is solvable by radicals, it generates a hierarchy of intermediate polynomials and number fields, which correspond to the roots. The symmetries of the original polynomial have to respect this hierarchical structure. The full group (as Galois called it) of 120 permutations of the roots does not allow a hierarchy of subgroups of the requisite kind. As it turns out, the maximum height (number of permutations for the quintic polynomial) is 20.
All this may prove to be very interesting in itself, but seems to constitute nothing but an internal ludic exercise. Does group theory have any other value or meaning? As it has turned out, it provides an accurate language for many natural phenomena, as Mackenzie (2012: 121) indicates:


Chemists now use group theory to describe the symmetries of a crystal. Physicists use it to describe the symmetries of subatomic particles. In 1961, when Murray Gell-Mann proposed his Nobel Prize-winning theory of quarks, the most important mathematical ingredient was an eight-dimensional group called SU(3), which determines how many subatomic particles have spin ½ (like the neutron and proton). He whimsically called his theory "The Eightfold Way." But it is no joke to say that when theoretical physicists want to write down a new field theory, they start by writing down its group of symmetries.

More to the theme of the present discussion, group theory is a computable theory; that is, it can be modeled computationally in order to break it down into its component parts. A group is any system of numbers and mathematical operations that obeys specific rules. The numbers and operations can vary from group to group, but the rules are always the same and thus computable. The rules are actually rather simple, and the operations include elementary notions and methods, such as addition and multiplication. However, the overall mathematics of group theory is complex and difficult.
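What "obeying the rules" means computationally can be shown with a brute-force check of the group axioms. The sketch below (an added illustration in Python; addition modulo 5 is chosen only because it is a familiar finite example) tests closure, associativity, identity, and inverses:

def is_group(elements, op):
    # Brute-force check of the four group axioms on a finite set.
    elements = list(elements)
    # Closure: the operation must not leave the set.
    if any(op(a, b) not in elements for a in elements for b in elements):
        return False
    # Associativity: (a op b) op c must equal a op (b op c).
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a in elements for b in elements for c in elements):
        return False
    # Identity: some e with e op a = a op e = a for every a.
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a must have some b with a op b = e.
    return all(any(op(a, b) == e for b in elements) for a in elements)

print(is_group(range(5), lambda a, b: (a + b) % 5))  # True: integers mod 5 under addition
print(is_group(range(5), lambda a, b: (a * b) % 5))  # False: 0 has no multiplicative inverse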
In an equation such as x + 6 = 10, the quantity x is known, of course, as a variable. A root of an equation is the quantity which, when substituted for the variable, satisfies the equation. So in this case, 4 is a root, because 4 + 6 = 10. Equations may be constructed with the square, cube, or higher powers of the variable and will thus have a number of roots. Prior to Galois, mathematicians had found general solutions to equations containing powers up to the fourth. But they had not been able to establish a theory on the solvability of equations containing powers of the fifth degree and higher. This was, before it was articulated as such, a P = NP problem. Galois simply analyzed the collection of roots to an equation and then the set of permutations of the roots. He showed that the permutations form the structure known as a group. His fundamental result shows that the solvability of the original equation is related to the structure of the associated group.
To reiterate, computability theory deals with solvability or non-solvability. This means, in turn, that an algorithm can be devised to test the computability of, say, real numbers. Here's one that can be made up for purely illustrative purposes.
1. Every number, except zero, is greater than some other number.
2. Every number, except zero, is also smaller than some other number.
3. Zero is neither greater nor smaller than any number.
4. Numbers that are not multiples of other numbers are prime.
5. Numbers that are multiples are composed of primes.
6. And so on.
Now, translating this set of statements into a program for both generating and testing whether some symbol is a real number or not is a straightforward process. But, if we conjoin this algorithm to one that generates normal numbers or to one that generates
random numbers, we are faced with a much more complex situation, but still a highly do-able one in computational terms. We have modeled the real numbers in terms of composition and expansion possibilities. Now, we can ask: what other mathematical structures can be modeled computationally in this way? As it turns out, this type of question leads to a plethora of other phenomena that can be modeled in the same way. These are known as non-standard models. They were discovered by Abraham Robinson in 1960 (see Robinson 1974). Robinson discovered what he called hyperreal numbers, which included the infinitesimals (numbers relating to, or involving, a small change in the value of a variable that approaches zero as a limit), which truly surprised everyone as to the reality of their existence; he found these by looking at models of the calculus and discovering analogies in number systems. The hyperreal numbers now raise further questions, because the real line and the hyperreal line seem to model things differently, and the philosophical problem is that we have no way of knowing what a line in physical space is really like.
Given the importance of infinitesimals to mathematical modeling, it is worth revisiting the whole episode schematically here. The early calculus was often critiqued because it was thought to be an inconsistent mathematical theory, given its use of bizarre notions such as the infinitesimals. These were defined as changing numbers as they approached zero. The problem was that in some cases they behaved like real numbers close to zero but in others they behaved paradoxically like zero. Take, as an example, the differentiation of the polynomial f(x) = ax² + bx + c (Colyvan 2012: 121):

4.

f(x + ) f(x)

a(x + )2 + b(x + ) + c (ax2 + bx + c)

f (x) =

2 + b
2ax
+

f (x) =

f (x) = 2ax + b +

5.

f (x) = 2ax + b

1.
2.
3.

f (x) =

Colyvan (2012: 122) comments insightfully on the solution as follows:


Here we see that at lines one to three the infinitesimal δ is treated as non-zero, for otherwise we could not divide by it. But just one line later we find that 2ax + b + aδ = 2ax + b, which implies that aδ = 0. The dual nature of such infinitesimals can lead to trouble, at least if care is not exercised.

Robinson's discovery laid to rest the problem of infinitesimals. He did this by using set theory. Statements in set theory that quantify over the members of a specific set are
said to be of the first order, while those that quantify over sets themselves are said to be second-order. Higher-order systems involve quantifying over sets of sets ad infinitum. Robinson's approach was a theory that generalized first-order logical statements but not higher-order ones, thus avoiding problems of incompleteness. He posited that a proper extension of the reals (ℝ), written *ℝ, would allow every subset, say D of ℝ, to be extended to a larger set *D in *ℝ, so that every function f : D → ℝ could be extended from *D to *ℝ, that is: f : *D → *ℝ. He called this the transfer principle: every statement about the real numbers expressed in first-order logic is true in the extended system *ℝ.
A hyperreal number is a number that belongs to *ℝ. It is relevant to note that when Robinson presented his ideas there was a strong reaction against them. The situation is described by Tall (2013: 378) as follows:
Non-standard analysis was Robinson's vision of a brave new world that encompassed the ancient idea of the infinitesimal. But it was presented to a world immersed in the epsilon-delta processes of mathematical analysis. Its first weak spot was that the theory did not seem to add any new results in standard mathematical analysis.

Today, non-standard analysis is viewed as simply another mathematical tool. Standard analysis is still the mainstream. Of course, infinitesimals are the core of the calculus and, as is well known, the calculus is the language of science and engineering. But then mathematical models may have no real scientific implications, because by their very nature they are selective of the information that they generalize. Moreover, certain aspects of reality may not be mathematically computable. Smolin (2013: 46) puts it as follows:
Logic and mathematics capture aspects of nature, but never the whole of nature. There are
aspects of the real universe that will never be representable in mathematics. One of them is
that in the real world it is always some particular moment.

Modeling and computability are really parts of a general approach in the search for ways to represent knowledge. The use of the computer to facilitate this search is essentially what CT is about. Since Euclid, mathematicians have been searching for a meta-algorithm, so to speak, that would allow them to solve all intractable problems. But, as it turns out, this might be a dream, although it is one being pursued with different techniques and with a lot of know-how in collaboration with other disciplines (Davis and Hersh 1986). Interdisciplinarity is now a basic mindset within what has been called here hermeneutic mathematics.


3.3 Computational linguistics


Computational linguistics is the counterpart of CT, constituting a research paradigm seeking to devise algorithms for describing aspects of language. Of course,
computers are also used to examine texts statistically. Techniques developed
within CL are applied to areas such as Machine Translation, speech recognition,
information retrieval, Web searching, and spell checking, among others.
In CL approaches, the modeling of linguistic theories on computers is often carried out in order to test the content or validity of the theories. Computational analysis identifies how specific theories define and handle the individual components that make up, say, a phonological system. In order to write an appropriate algorithm the concept of regularity is critical. So, a pattern is defined as regular if and only if elements in any category obey the pattern. A central aspect of the modeling procedure is to determine if there are constraints shared by the systems being analyzed, no matter how diverse they appear to be. In effect, CL aims to identify which theoretical models best describe the universal properties of systems and the sufficient conditions for something to be considered a system in the first place.
Theories of phonology, for example, aim to describe the phonological system of a language via generalizations which are connected by rules in particular ways. A computational approach would ask the following questions about them: are the components in the system language-specific, universal, or both? What constrains the systems so that they emerge with differences? When comparing theories, the notions of restrictiveness and expressivity are thus taken directly into account. Which theory is more powerful, perhaps too powerful, than some other, and which theory is inadequately expressive? The restrictive theory which is minimally expressive is assumed to be the most desirable. This parallels the mathematician's view of some model of proof as better than some other when it is economical but revelatory. A theory that claims that anything is possible is considered to be a trivial theory. The computational test is thus one of coverage, not of reality, as many computationists would claim. In fact, they refer to such models and their computerizations as learning theories, which, until proven differently, would apply to human learning as well. But this then brings us back to the Unexpected Hanging conundrum, which continues to beset theoretical aspects of any logical theory. More will be said about this below.


3.3.1 Machine Translation


The starting point for CL was the early work in Machine Translation (MT), defined simply as translation from one language to another using computers. MT is different from automatic speech recognition, but it is part of the generic study of natural language with computer models. There are three types of MT:
1. Machine-aided translation: translation carried out by a human translator who
uses the computer as an ancillary or heuristic tool to help in the translation
process.
2. Human-aided machine translation: translation of a source language text by
computer with a human translator editing the translation.
3. Fully-automated machine translation: translation of the source language text
solely by the computer without any human intervention.
For the present purposes, the term MT refers exclusively to the third type above. MT goes as far back as the 1940s (Hutchins 1997). The work of mathematician Warren Weaver and scientist Andrew D. Booth in the 1950s (Booth 1955, Booth and Locke 1955, Weaver 1955) was especially critical in founding MT. The two researchers wrote the first scientific papers in the field and generated interest in it among scientists in various fields. MT seems to have started with Weaver's efforts to adapt and modify the techniques of cryptanalysis used during World War II into general principles of machine translation and the automated making of dictionaries. Given the low power of computers of the era, various problems emerged that almost shot down MT before it even got started. At Georgetown University in 1954 a widely-publicized experiment in MT used the translation of Russian sentences into English to exemplify how MT worked. But it became clear that the algorithms in the experiment lacked the kind of conceptual sophistication that humans have when it comes to tapping into the meanings of texts. A classic example from the Georgetown experiment is the translation of the Russian version of "The spirit is willing, but the flesh is weak" as "The vodka is strong, but the meat is rotten." It was obvious that the problem of figurative language was a serious one for MT.
Bar-Hillel (1960) then used an example of linguistic ambiguity that came to
be known as the Bar-Hillel Paradox to argue against MT. The main problem with
MT was, Bar-Hillel argued, that humans use extra-linguistic information to make
sense of messages and that computers could never access this in the same way that
humans do. In other words, context is a determinant in how humans understand
verbal signs and interpret their meanings, and context is not part of computation.
His example is as follows:


The pen is in the box (= the writing instrument is in the container)


VERSUS
The box is in the pen (= the container is inside another container [playpen])
Humans can distinguish between the two messages because they have access to outside information about the nature of pens. To put it another way, polysemy is a feature of human language, which produces ambiguity that is resolved by real-world knowledge when it occurs in messages. Ambiguities were also discussed by Chomsky (1957, 1965), who attempted to resolve the problem not via real-world pragmatics but in terms of transformational rules. For example, a sentence such as Old men and women love that program has potentially two deep-structure meanings:
1. Old men and women (who are not necessarily old) love that program.
2. Old men and old women (both the men and women are old) love that program.

The source of the ambiguity is a transformation (factorization). The string in (2), old + men + and + old + women, has the general form XY + XZ, where X = old, Y = men, and Z = women. Through a transformational rule this is reduced to X(Y + Z) = old + men + and + women. But, as the algebraic form shows, we still interpret the X as applying to both Y and Z (as we do in mathematics). String (1), on the other hand, has a different form, XY + Z, which leads to a different interpretation of its meaning: old + men + and (not necessarily old) + women. Now, appropriate knowledge of the deep structure provides us with the know-how for resolving the ambiguity in real situations. For example, uttering old men followed by a brief pause will render the meaning of XY + Z; on the other hand, a brief pause after old will render the meaning of X(Y + Z). While this is true perhaps for sentences of this type, which produce structural ambiguity, it holds less so for sentences that have lexical ambiguity, such as the one in Bar-Hillel's paradox. In this case, extra-linguistic knowledge invariably comes into play, even though Chomsky devised transformational rules to account for it (controversially so, as debates about lexical ambiguity within generativism early on showed; for example, Zwicky and Sadock 1975; see Cruse 1986: 49–68 for an overall account of lexical ambiguity).
Bar-Hillel's paradox and various studies on polysemy led shortly thereafter to the serious study of extra-linguistic inferences in discourse. Indeed, it can be argued that it was the starting point for the growth of pragmatics and discourse analysis as major branches of linguistics. Overall, Bar-Hillel's paradox brought out the importance of real-world context in determining meaning. In order for a fully-automatic MT system to process Bar-Hillel's sentences correctly it would
have to have some contextual rule subsystem in the algorithm that would indicate:
1. that pens as writing instruments are (typically) smaller than boxes
2. that boxes understood as containers are larger than pens (typically again)
3. that it is impossible for a bigger object to be contained by a smaller one
The general form of such rules would be somewhat as follows (p = writing instrument known as a pen, b = box, c = container):
1. p < b
2. b ⊂ c
3. b > p

But the algorithm would still have to decompose the polysemy of the word pen.
An appropriate rule would indicate that the word pen means:
1. a writing instrument
2. a play pen
3. a pig pen

These separate meanings would then be part of a system of subcategorization rules needed to avoid ambiguity. In effect, the programmer would need to conduct both an internal linguistic analysis (ILA) of the grammatical and lexical aspects of sentences and then an external linguistic analysis (ELA) of the real-world contexts that constrain the selection and concatenation of the rules within the ILA system. In other words, work on the relation between ILA and ELA led, in my view, to a new awareness of the interconnection between intrasystemic processes (such as grammar rules) and extrasystemic ones (such as contextual factors) that crystallized in the pragmatic linguistic movement that gained a foothold in linguistics generally by the end of the 1960s.
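A toy version of such an ILA/ELA rule subsystem can be sketched as follows (an added illustration in Python; the senses listed and the "typical sizes" standing in for real-world knowledge are hypothetical):

# Hypothetical typical sizes (ELA: real-world knowledge), in centimetres.
TYPICAL_SIZE = {"pen (writing instrument)": 15, "pen (playpen)": 120, "box": 40}

def disambiguate_pen(sentence):
    # Apply the constraint that a container must be larger than what it contains.
    if sentence == "The pen is in the box":
        return [s for s in TYPICAL_SIZE
                if s.startswith("pen") and TYPICAL_SIZE[s] < TYPICAL_SIZE["box"]]
    if sentence == "The box is in the pen":
        return [s for s in TYPICAL_SIZE
                if s.startswith("pen") and TYPICAL_SIZE[s] > TYPICAL_SIZE["box"]]
    return []

print(disambiguate_pen("The pen is in the box"))  # ['pen (writing instrument)']
print(disambiguate_pen("The box is in the pen"))  # ['pen (playpen)']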
Advances in computer technology have now made the resolution of lexical ambiguities rather straightforward, solving Bar-Hillel's conundrum. A classic study of how computers can do this, and especially of what the programmer needs to know in advance, is the one by Graeme Hirst (1988).
However, CL has not yet been able to deal completely with the ways in which
the human brain infers meaning from various semantic modalities, such as those
inherent in metaphorical speech acts, as discussed in the previous chapter. Nevertheless, it has made many advances in these domains as well since the 1980s
that have led to such effective MT systems today as Google Translation. The new
forms of MT are based on a procedure called interlanguage (or Interlingua) transfer
strategy. First, the SL (source language) text is parsed into an internal representation, much like the ones used in formal grammars. Second, a transfer is made
from the SL text to the TL (target language) text. The transfer mechanisms between the SL and TL consist of an analyzer that literally transforms the SL text
into an abstract form and a generator which then converts this into a representation in the TL. Of course, as in many versions of formal grammar, this assumes
a universal set of rules or rule types in the structure of languages. Experience
with programming rules, however, has shown this to be impracticable. Nevertheless, the Interlingua approach has taken schemas of real-world knowledge into
account, thus expanding the purview and sophistication of MT. In other words, it
has started to integrate ILA with ELA in a sophisticated way.
A variant of the Interlingua system is called Knowledge-Based Machine Translation (KBMT), which also converts the SL text into a representation that is claimed to be independent of any specific language, but differs in that its inclusion of semantic and contextual information is based on frequency analyses. By adding these, the system deals with polysemy and other ambiguities in statistical terms (Nirenburg 1987). This allows the algorithm to make inferences about the appropriate meaning to be selected in terms of frequency distribution measures of a lexical item. This is intended to simulate the human use of real-world information about polysemy, allowing the analyzer to integrate inference of meaning based on probability metrics into the mechanical translation process. The generator simply searches for analogous or isomorphic forms in the TL and converts them into options for the system. The key notion, though, is that of knowledge modeling. The details of how this is done are rather complex, and they need not interest us here as such. Suffice it to say that the computer modeling of knowledge through Interlingua involves mining data from millions of texts on the Internet, analyzing them statistically in terms of knowledge categories, and then classifying them for the algorithmic modeling of polysemy.
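A schematic version of such frequency-based sense selection might look like the following sketch (added here in Python; the co-occurrence counts are invented purely for illustration):

# Invented corpus counts: how often each sense of "pen" co-occurs with a context word.
SENSE_FREQUENCIES = {
    "write": {"pen (writing instrument)": 950, "pen (enclosure)": 50},
    "sheep": {"pen (writing instrument)": 30, "pen (enclosure)": 970},
}

def most_probable_sense(context_word):
    # Pick the sense with the highest relative frequency for the given context.
    counts = SENSE_FREQUENCIES.get(context_word, {})
    if not counts:
        return None
    sense = max(counts, key=counts.get)
    return sense, counts[sense] / sum(counts.values())

print(most_probable_sense("write"))  # ('pen (writing instrument)', 0.95)
print(most_probable_sense("sheep"))  # ('pen (enclosure)', 0.97)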

3.3.2 Knowledge networks


The key notion in knowledge representation is that of a knowledge network, also called a conceptual network. There exist three main types of conceptual networks in language: denotative, connotative, and metaphorical (see, for example, Danesi 2000). In its simplest definition, a concept designates the conventional meaning we get from a word. As it turns out, however, it is not a straightforward matter to explicate what a concept is by using other words to do so. Consider, for example, what happens when we look up the definition of a word such as cat in a dictionary. Typically, the latter defines a cat as "a carnivorous mammal (Felis catus)
domesticated since early times as a catcher of rats and mice and as a pet and existing in several distinctive breeds and varieties." The problem with this definition is that it uses mammal to define cat. What is a mammal? The dictionary defines mammal as "any of various warm-blooded vertebrate animals of the class Mammalia." What is an animal? The dictionary goes on to define an animal as "a living organism other than a plant or a bacterium." What is an organism? An organism, the dictionary stipulates, is "an individual animal or plant having diverse organs and parts that function together as a whole to maintain life and its activities." But, then, what is life? Life, it specifies, is "the property that distinguishes living organisms." At that point it is apparent that the dictionary has gone into a conceptual loop: it has employed an already-used concept, organism, to define life.
Looping is caused by the fact that dictionaries employ words to define other words. As it turns out, the dictionary approach just described is the only possible one, for the reason that all human systems of knowledge seem to have a looping structure. This suggests that the meaning of something can only be inferred by relating it to the meaning of something else to which it is, or can be, linked in some way. So, the meaning of cat is something that can only be inferred from the circuitry of the conceptual associations that it evokes. This circuitry is part of a network of meanings that the word cat entails.
Each associated meaning or concept is a node in the network. There is no limit (maximum or minimum) to the number and types of nodes and circuits that characterize a concept. It depends on a host of factors. In the network for cat, secondary circuits generated by mammal, for example, could be extended to contain carnivorous, rodent-eater, and other nodes; the life node could be extended to generate a secondary circuit of its own containing nodes such as animate, breath, existence, and so on; other nodes such as feline, carnivorous, Siamese, and tabby could be inserted to give a more detailed picture of the conceptual structure of cat. In a circuit there is always a focal node, the one chosen for a discourse situation. In the above network cat is the focal node, because that is the concept under consideration. However, if animal were to be needed as the focal concept, then cat would be represented differently, as a nonfocal node connected to it in a circuit that would also include dog and horse, among other associated nodes. In effect, there is no way to predict the configuration of a network in advance. It all depends on the purpose of the analysis, on the type of concept, and on other such factors that are variable and/or unpredictable.
In psychology, the primary nodes (mammal, animal, life, and organism) are called superordinate ones; cat is instead a basic concept; and whiskers and tail are subordinate concepts. Superordinate concepts are those that have a highly general referential function. Basic concepts have a typological function. They allow for reference to types of things. Finally, subordinate concepts have a detailing
function. Clearly, the configuration of a network will vary according to the function of its focal node; that is, a network that has a superordinate focal node (mammal) will display a different pattern of circuitry than will one that has a basic concept at its focal center.
The above description of cat constitutes a denotative network. Denotation is the initial meaning captured by a concept, as is well known. Denotative networks allow speakers of a language to talk and think about concrete things in specific ways. But such networks are rather limited when it comes to serving the need of describing abstractions, emotions, morals, and so on. For this reason they are extended considerably through further circuitry. Consider the use of cat and blue in sentences such as:
1. He's a real cool cat.
2. Today I've got the blues.
3. She let the cat out of the bag.
4. That hit me right out of the blue.

These encode connotative and metaphorical meanings. The use of cat in (1) to mean "attractive" or "engaging" comes out of the network domain associated with jazz music and related pop culture circuits (Danesi 2000); and the use of blues in (2) to mean "sad, gloomy" comes out of the network domain associated with blues music. In effect, these have been linked to the networks of cat and blue through the channel of specific cultural knowledge. They are nodes that interconnect cat and blue to the network domains of jazz and blues music. The meaning of something secret associated with cat in example (3) above and the meaning of unexpectedness associated with blue in (4) result from linking cat with the secrecy network domain and blue with the sky domain. Sentence (3) is, in effect, a specific instantiation of the conceptual metaphor animals reflect human life and activities, which underlies common expressions such as: It's a dog's life; Your life is a cat's cradle; I heard it from the horse's mouth. Sentence (4) is an instantiation of the conceptual metaphor Nature is a portent of destiny, which literary critics classify as a stylistic technique under the rubric of pathetic fallacy. This concept underlies such common expressions as: I heard it from an angry wind; Cruel clouds are gathering over your life.
A comprehensive network analysis of cat and blue for the purposes of MT would have to show how all meanings (denotative, connotative, metaphorical) are interconnected to each other through complex circuitry that involves both ILA and ELA. It would also have to add a statistical measure of the frequency of the probable presence of a specific circuitry in a discourse text, as will be discussed below. It is the ability to navigate through the intertwining circuitry of such
networks, choosing appropriate denotative, connotative, or metaphorical nodes according to communicative need, and integrating them cohesively into appropriate individually-fashioned circuitry to match the need, that constitutes human discourse competence.
Network analyses of conversations within MT, and specifically within Interlingua, have shown, above all else, that discourse is structured largely by inter-network linkages. There are various kinds of such linkages that characterize discourse flow. Some of these contain nodes based on narrative traditions; these are concepts referring to themes, plot-lines, characters, and settings that surface in narratives. Calling someone a Casanova or a Don Juan, rather than lady-killer, evokes an array of socially-significant connotations that these characters embody. Referring to a place as Eden or Hell elicits connotations that have a basis in mythic and religious narrative. Work in knowledge networks is starting to show how analyses of this type might be programmed into sophisticated algorithms via the notions of nodes and circuits.
As a simple example of what a knowledge (conceptual) network might look
like, consider the following one, which shows how various circuits connected
with snake can be linked into an interconnected representation (from Kendon and
Creen 2007):

[Figure 3.8: Knowledge network for snake (from Kendon and Creen 2007), linking nodes such as sidney, slither, grass_snake, crocodile, snake, reptiles, vegetarian, meat, green, small, and no_legs through labeled relations (is a, eats, has, color, size).]

As can be seen, this is a denotative network. Connotative linkages added to it would include the use of snake as a metaphor for human personality and as a symbol of biblical temptation (among others).
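To make the notion of nodes and labeled links concrete, here is a minimal sketch (in Python) of how such a network could be encoded as subject-relation-object triples and queried. The specific edges listed (for example, that a grass_snake is a snake and that a snake has no_legs) are illustrative assumptions read off the node labels in Figure 3.8, not a reproduction of the published diagram.

from collections import deque

# Illustrative triples based on the node labels in Figure 3.8.
triples = [
    ("sidney", "is_a", "grass_snake"),
    ("grass_snake", "is_a", "snake"),
    ("crocodile", "is_a", "reptile"),
    ("snake", "is_a", "reptile"),
    ("snake", "has", "no_legs"),
    ("grass_snake", "color", "green"),
    ("grass_snake", "size", "small"),
    ("reptile", "eats", "meat"),
]

def facts_about(node):
    # Collect every relation the node takes part in, inheriting along is_a links.
    found, frontier, visited = [], deque([node]), set()
    while frontier:
        current = frontier.popleft()
        if current in visited:
            continue
        visited.add(current)
        for subj, rel, obj in triples:
            if subj == current:
                found.append((subj, rel, obj))
                if rel == "is_a":
                    frontier.append(obj)
    return found

print(facts_about("sidney"))

Following the is_a links upward is what lets the network answer questions about sidney (its color, size, legs, and diet) that are stored only at more general nodes, which is exactly the kind of inheritance behavior such knowledge networks are meant to capture.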
The above discussion had a twofold intent: first, it aimed to trace the origin of CL to Machine Translation and, second, it attempted to show that a sophisticated form of MT would have to involve ILA and ELA, as it is beginning to do. MT has thus been a critical paradigm in the evolution of CL. Neuman (2014: 61) sheds the following light on this whole area of inquiry:
The reason for using MT is twofold. First, there is no better way to understand the loss
accompanying translation than by examining the most structured and formal attempt of
translation known today. Second, instead of pointing at the problems and errors of MT,
I suggest using it in order to better understand cultural peculiarities and discrepancies. The
second suggestion is somewhat counterintuitive as we positively think of eliminating errors
and solving problems. Sometimes, however, errors can be used for the better.

3.3.3 Theoretical paradigms


The greatest advancements in MT have occurred since the early 2000s via Google. As imic and Vuk (2010: 416) have aptly put it: "The impact of the Internet on MT is manifold." MT thus continues to be a major focus within CL because of Google Translate and its apparent efficacy in knowledge representation. With the raw data available on the web, Google has taken MT to another level because of the possibility of data mining and the almost instantaneous analysis of the data with statistical software and with the use of learning and optimization algorithms, many of which integrate the information into knowledge circuits.
MT on the web started in 1996, with Systran offering translation of small, mainly formulaic, texts. This was followed by AltaVista Babelfish (1997) and Moses (2007). From these, sophisticated translation systems were developed across the globe. In 2012, Google announced that its Google Translate had the capacity to translate enough text to fill one million paper books per day. The key technique in the Google system is the use of data-mining procedures involving statistical analysis of bilingual corpora integrated with knowledge network models of various kinds. Because of its ability to access such huge amounts of textual data, Google Translate has become very effective in assessing differences between denotative, connotative, and metaphorical networks and how these can be mapped onto syntactic structures. In so doing, it has indirectly shown that meaning is not embedded in syntactic representation, but rather that the reverse may be true. Google Translate works by detecting patterns in hundreds of millions of documents that have been translated by humans and making inferences (or more accurately, extracting patterns) based on statistical analyses.
More recently, algorithms have been developed to analyze smaller corpora, focusing instead on knowledge networks through recognition programs. The two most commonly used models of translation evaluation are BLEU (Bilingual Evaluation Understudy) and NIST (National Institute of Standards and Technology). Both use an n-gram mean measure. NIST is based on BLEU, with the difference lying in the ways in which the two systems calculate the mean: BLEU calculates the geometric mean, NIST the arithmetic mean. Another well-known system is the F-measure, which determines the maximum matching between the SL text and the TL text (Papineni, Roukos, Ward, and Zhu 2002).
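As a rough illustration of the difference, the following sketch (in Python) computes n-gram precision scores and averages the unigram-to-4-gram precisions geometrically, in the spirit of BLEU; a real implementation adds a brevity penalty and smoothing, which are omitted here, and replacing the geometric mean with an arithmetic one gives the NIST-style averaging mentioned above. The two sentences are invented for illustration.

import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams in a list of tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n):
    # Fraction of candidate n-grams that also occur in the reference (with clipping).
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(1, sum(cand.values()))

def geometric_mean_score(candidate, reference, max_n=4):
    # BLEU-like geometric mean of 1- to 4-gram precisions (no brevity penalty).
    precisions = [ngram_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if any(p == 0.0 for p in precisions):
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

candidate = "the cat is on the mat".split()
reference = "the cat is on the red mat".split()
print(round(geometric_mean_score(candidate, reference), 3))   # about 0.795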
N-gram theory has become a major theoretical paradigm within CL generally. An n-gram model predicts the next item in a sequence in the form of an (n−1)-order Markov chain. The idea goes back to the founder of information theory, Claude Shannon (1948), who asked the question: given a sequence of letters (for example, the sequence "for ex"), what is the likelihood of the next letter? A probability distribution to answer this question can be easily derived given a frequency history of size n: a = 0.4, b = 0.00001, ..., where the probabilities of all the next letters sum to 1.0. In strict mathematical terms, an n-gram model predicts x_i:

P(x_i | x_{i−(n−1)}, ..., x_{i−1})

In the model, the probability of a word is computed by determining the presence of a certain number of previous words. Despite some critiques of n-gram models, in practice they have proven to be very effective in modeling language data. Although MT does not rely exclusively on n-gram theory, it uses it in tandem with Bayesian inference, a statistical method whereby all forms of uncertainty can be expressed in terms of probability metrics (as discussed). Basically, this involves using prior distributions to predict unknown parameters. It is a kind of posterior (post hoc) analysis whereby future observations are based on previous findings.
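A minimal sketch of the Shannon-style question, estimating next-letter probabilities from letter trigram counts over a toy string; the corpus string is an invented stand-in for the large frequency histories a real model would use.

from collections import Counter, defaultdict

corpus = "for example the formal form of forecasting is performed for experts"
n = 3  # a trigram model: predict the next letter from the previous two

counts = defaultdict(Counter)
for i in range(len(corpus) - n + 1):
    context, nxt = corpus[i:i + n - 1], corpus[i + n - 1]
    counts[context][nxt] += 1

def next_letter_distribution(context):
    # P(next letter | last two letters), estimated by relative frequency; sums to 1.0.
    seen = counts[context[-(n - 1):]]
    total = sum(seen.values())
    return {ch: round(c / total, 2) for ch, c in seen.items()}

print(next_letter_distribution("fo"))   # e.g. {'r': 1.0} in this toy corpus
print(next_letter_distribution("or"))   # a distribution over the letters seen after "or"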
When a language input is involved, a Bayesian analysis is used to gauge the fidelity of a possible translation. The variables used in such analysis include:
1. the position of a word in a text
2. the linguistic features typically associated with the topic or theme of the text (which involves specific kinds of grammatical and lexical choices, given the networks and circuitry that words entail, as we saw above)
3. syntactic considerations involving the likelihood that a certain structure will follow or precede others
By converting items in a text into a set of n-grams, the sequence in an SL text can be mapped against, and compared to, a sequence in the TL text. Then z-scores can be used to compare texts in terms of how many standard deviations each n-gram differs from its mean occurrence in large corpora. Research on the use of BLEU has shown that there is a strong positive correlation between human assessments and fidelity of translation by using n-gram algorithms (Doddington 2002, Coughlin 2003, Denoual and Lepage 2005).
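A minimal sketch of that z-score comparison; the per-million frequencies and corpus statistics below are invented for illustration.

def z_score(observed, corpus_mean, corpus_std):
    # How many standard deviations an n-gram's frequency lies from its corpus mean.
    return (observed - corpus_mean) / corpus_std if corpus_std else 0.0

# Invented reference statistics (mean, standard deviation), per million words.
reference_stats = {("out", "of", "the"): (110.0, 12.0), ("of", "the", "out"): (0.3, 0.2)}
observed_freqs = {("out", "of", "the"): 150.0, ("of", "the", "out"): 0.3}

for gram, freq in observed_freqs.items():
    mean, std = reference_stats[gram]
    print(gram, round(z_score(freq, mean, std), 2))
# ('out', 'of', 'the') 3.33  -> unusually frequent in the text being scored
# ('of', 'the', 'out') 0.0   -> about as (in)frequent as expected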
N-gram theory has brought about great interest in machine learning as a theoretical paradigm. Machine learning (ML) is now a branch of AI and CL. It studies how computers can learn from huge amounts of data by using statistical techniques such as n-grams. An everyday example of an ML system is the one that distinguishes between spam and non-spam emails on many servers, allocating the spam ones to a specific folder. An early example of ML goes back to the 1956 Dartmouth workshop, which introduced the first program that learned to play checkers by competing against a copy of itself. Other programs have since been devised for computers to play chess and backgammon, as well as to recognize human speech and handwriting.
Simply put, ML algorithms are based on mining information from data, which is converted into knowledge network systems to produce knowledge representation. In some instances, the algorithm attempts to generalize from certain inputs in order to generate, speculatively, an output for previously unseen inputs. In other cases, the algorithm operates on inputs for which the desired output is unknown, the objective being to discover hidden structure in the data. Essentially, such ML algorithms are designed to predict new outputs from specific test cases. The algorithms thus mimic inductive learning by humans, that is, the extraction of a general pattern on the basis of specific cases. This whole line of investigation has, remarkably, led to the construction of robots which acquire human-like skills through the autonomous exploration of specific cases and through interaction with human teachers.
Perhaps the first scientist to devise an ML algorithm for MT was Makoto Nagao in 1984, who called his technique example-based MT. Using case theory in linguistics (Fillmore 1968), Nagao based his algorithm on analogy-making in language. From a corpus of texts that had already been translated, he selected specific model sentences to get the algorithm to translate other components of the original sentence, combining them in a structural way to complete the translation. Nagao's system, as far as I can tell, has been rather successful, but still falls short of translating texts with full communicative and conceptual fidelity and certainly does not approach the power of human analogy. To this day, the main obstacle is figurative sense, which an algorithm would need to untangle from the structure of a text before bits and pieces can be put together according to strict rules of syntax. As Bar-Hillel argued, without a universal encyclopedia, a computer will probably never be able to select the appropriate meaning of the word on its own.

Work today in ML involves strategies designed to further overcome Bar-Hillel's paradox. Two main ones have emerged, called the shallow and the deep approach. The former uses plain statistical n-gram analysis to determine the sense of an ambiguous word on the basis of the words surrounding it in a text. Collocation theory is thus used abundantly in shallow disambiguation. A collocation is a sequence of words that typically co-occur in speech more often than would be anticipated by random chance. Collocations are not idioms, which have fixed phraseology. Michael Halliday (1966) used the example of strong tea to define a collocation. He pointed out that the same idea could be conveyed with powerful tea, but that this is not used at all or is considered to be anomalous by native speakers. On the other hand, powerful would work with computer (powerful computer) whereas strong does not (strong computer). Phrases such as crystal clear, cosmetic surgery, and clean bill of health are all collocations. Whether the collocation is derived from some syntactic criterion (make choices) or a lexical one (clear cut), the principle underlying collocations (frequency of usage of words in tandem) always applies. And it is this principle that undergirds shallow disambiguation. First, the algorithm identifies a key word in context and then determines the frequency of combination of other words with the key word in order to disambiguate the meaning of the phrase. Thus, z-scores and other such measures are built into the algorithm.
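A minimal sketch of this kind of frequency-based measure, counting how often two words co-occur within a two-word window in a toy corpus and scoring each pair against what chance would predict; pointwise mutual information is used here as one convenient association score, and the corpus is invented for illustration.

import math
from collections import Counter

corpus = ("she drank strong tea he drank strong tea they bought a "
          "powerful computer strong coffee powerful computer").split()

window = 2  # count a co-occurrence when the second word follows within two positions
word_counts = Counter(corpus)
pair_counts = Counter()
for i, word in enumerate(corpus):
    for j in range(i + 1, min(i + 1 + window, len(corpus))):
        pair_counts[(word, corpus[j])] += 1

def association(w1, w2):
    # Pointwise mutual information: log of observed vs. chance co-occurrence.
    total = len(corpus)
    p_pair = pair_counts[(w1, w2)] / total
    p1, p2 = word_counts[w1] / total, word_counts[w2] / total
    return math.log2(p_pair / (p1 * p2)) if p_pair else float("-inf")

print(round(association("strong", "tea"), 2))         # high: a collocation
print(round(association("powerful", "computer"), 2))  # high: a collocation
print(association("powerful", "tea"))                 # -inf: never observed together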
In general, shallow approaches use statistical analysis to determine the likely meaning of a word or phrase. Deep approaches, on the other hand, assume much more, combining statistical methods with tags for ambiguities in textual structures. The latter seek out potential ambiguities and then assign the appropriate interpretation to them based on the statistical probabilities of those meanings occurring in specific kinds of texts. The algorithmic model can, however, become rather complex. Known as ontological modeling, this kind of knowledge extraction and disambiguation involves parsing and tree-structuring the options, amalgamating the parsing results with appropriate knowledge networks. In some cases, more than 50,000 nodes may be needed to disambiguate even simple stretches of text. Using so-called similarity matrices, a deep-approach algorithm can then match meanings of words in syntactic phrases and assign a confidence factor using statistical inference. Evidence that such approaches are productive comes from the fact that Google and military departments of the government are extremely interested in developing ML to make it as effective as possible.
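In a very reduced form, the similarity matrices mentioned above can be pictured as tables of scores between the context in which an ambiguous word appears and signature vectors for each of its candidate senses; the sketch below scores two invented sense signatures for bank against a context by cosine similarity, and the highest score plays the role of the confidence factor. The sense names and cue words are assumptions for illustration, not drawn from any particular system.

import math
from collections import Counter

# Invented sense signatures for the ambiguous word "bank": bags of cue words.
senses = {
    "bank (finance)": Counter({"money": 3, "loan": 2, "account": 2, "interest": 1}),
    "bank (river)":   Counter({"river": 3, "water": 2, "shore": 2, "fish": 1}),
}

def cosine(v1, v2):
    # Cosine similarity between two sparse count vectors.
    dot = sum(v1[key] * v2[key] for key in v1)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(v1) * norm(v2)) if dot else 0.0

context = Counter("she opened an account at the bank to deposit money".split())
scores = {sense: round(cosine(context, sig), 2) for sense, sig in senses.items()}
print(max(scores, key=scores.get), scores)   # the finance sense wins with the higher score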
It is useful to look at how computer scientists might go about solving translation problems today. One example is the work of Dr. Yuqing Gao, reported on the website: http://blogs.oregonstate.edu/mandyallen/cultural-site-dr-yuqing-gao/4-technology-tools-methods-used/. In translating phylogenetically disparate languages (English and Chinese, for example), the task of MT is enormous, because the algorithm consists of a mathematical formula that attempts to constrain error to an absolute minimum. An example of how an algorithm breaks down the process of translating from English into Chinese is the following flowchart:

Figure 3.9: An example of how English is translated into concepts, then recombined from concepts into Chinese (the flowchart parses the English query "is he bleeding anywhere else besides his abdomen" into concept labels such as QUERY, SUBJECT, WELLNESS, PREPPH, BODY-PART, and PLACE, and then reorders those concepts for the Chinese output). © IBM, 2007

In this case a probabilistic approach is required. The following diagrams show the
mathematical formulas that apply.

Figure 3.10: Using statistics to translate spoken language into concepts (the figure gives argmax formulas over sums of logarithms of feature functions f_k). © IBM, 2007

It is obvious that the task is a complex one and that the mathematical system used is a highly sophisticated one. The interesting thing about the algorithm above is that it breaks down the process into concepts rather than words and then assigns a statistical modeling framework to it. It is beyond the scope here to delve into the relation of the mathematics to the knowledge representation system in question. It is sufficient for the present purposes simply to present it, since MT today is venturing into territories that even linguists have rarely entered in the past. And these territories are drawing mathematicians and linguists closer and closer together in the search for determining the computability of relevant phenomena.

3.3.4 Text theory


Statistically-based algorithms for MT using such mathematical notions as n-grams and Bayesian analysis have redirected the thrust of linguistic theory towards understanding discourse and the nature of texts in general. For example, the machine programmer must know how each type of text might encode knowledge networks based on specific cultural traditions that will influence grammatical and lexical choices. In other words, MT has led to the specific study of discourse texts (ELA) in terms of how they are stitched together grammatically and lexically (ILA). A relevant typology of texts is the following one (Danesi and Rocci 2009); it is a minimal one, used here for illustrative purposes:
1. Foundational and reference texts, such as sacred texts, ritualistic texts, foundation myths, charters, sayings, monument inscriptions, certain kinds of literary works (fables, tales), festivities, culinary traditions, and so on, have a
high degree of n-gram predictability and thus lend themselves more easily to
knowledge representation and MT.
2. Historical texts designed to preserve the extant traditions of a community
such as legends and history books also have a high degree of n-gram predictability but are more likely to create ambiguities than the foundational
texts.
3. Canonical texts recording and ensconcing civic obligations and ethical principles, including philosophical texts, folk sayings, and juridical texts also have
a relatively high degree of machine translatability.
4. Formulaic texts that record and encode constitutions, laws, and other standardized systems for everyday interactions are also highly translatable, but changes occur when political or economic systems change, at which point translatability becomes more problematic.
5. Implied texts, which are the written and unwritten rules of interaction and discourse, are the ones that give MT the highest degree of difficulty, since they are replete with all kinds of circuits (denotative, connotative, and metaphorical) and thus with polysemous features.
Being able to distinguish among the features and functions of such texts constitutes what Bar-Hillel called an encyclopedic reference system. Equally relevant
are those texts that constitute points of reference for the community, either because they represent an implied value system, or because they are considered
to be sources for inspiration as authoritative cultural benchmarks. In the former
case, referring to the texts in question is considered to provide inherent proof to
support a certain claim (this role is played, for instance, by the Bible in Christian
communities); in the latter case, allusion to the texts in question is considered
to have intrinsic validity in itself because they have become widely recognized as
being particularly meaningful to the culture (this role is played, for instance, by
Shakespeare's plays, which are perceived as critical for understanding the world
by members of Anglo-American culture). Such texts constitute a repository of
wisdom, which is often echoed in all kinds of conversations. These texts undergird the use of expressions such as the following:
1. He's a real Romeo, without his Juliet.
2. He certainly is no Solomon.
3. Your statements are apocryphal.
4. The way things are going, Armageddon is just around the corner.
Allusive elements of this kind abound in actual conversations, requiring encyclopedic knowledge to understand or translate. In He pulled a Machiavellian trick on me, the use of the word Machiavellian refers to the Italian philosopher Niccolò Machiavelli and, especially, his treatise titled The Prince, in which Machiavelli claimed that expediency in achieving a desired goal is to be given prominence over ethical behavior and morals. How would this be translated into a TL that has no access to this reference system? Paraphrase and elaboration are used systematically in such cases, but the translation loses its connection to cultural intertextuality. In a way, such allusions are conceptual metaphors whereby the source domain is a lexical field that includes personages derived from canonical texts such as
the Bible, Shakespeare, Machiavelli, and so on. There have been some interesting
attempts to incorporate conceptual metaphors and cultural information of this
kind into computer algorithms (starting with MacCormac 1985), but these have
not as yet produced the required sophistication that humans have when accessing
historical intertextual meanings. The computer would have to tap not only into
all lexical polysemous constructions but also the interrelationship of lexemes,
phrases, concepts, and cultural reference points in tandem.


3.4 Natural Language Processing


The histories of MT and NLP (Natural Language Processing) are intertwined. As mentioned above, the 1954 experiment in MT carried out at Georgetown University, which consisted of more than 60 Russian sentences that were translated by machine into English, laid the foundations for NLP. There was great expectation after the experiment that effective machine translation would be a possibility, despite anomalous translations such as the one discussed above. However, a government report in 1966 pointed out that MT was essentially a failure and, with virtually no governmental funding, work on MT subsided dramatically until the 1980s, when Bayesian and n-gram models of MT emerged to rescue it and expand it considerably, as discussed above. The experiment also made it obvious that the semantic features of NLP would need to be tackled alongside grammatical ones before effective automatic translation systems could be developed. This meant, at first, writing programs that would make the computer simulate human speech. A famous early one was developed by Joseph Weizenbaum (1966), which he called ELIZA. It was a program designed to mimic a dialogue with a psychotherapist. ELIZA's questions, such as "Why do you say your head hurts?" in response to "My head hurts," were perceived by subjects as being so realistic that many believed that the machine was actually alive. But, as Weizenbaum (1976) wrote a decade later, ELIZA was a parodic imitation of psychoanalytic speech; it had no consciousness of what it was saying.
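A minimal sketch of the kind of pattern-and-response matching behind such a program is given below; the rules are invented for illustration and only gesture at how ELIZA reflected a user's own words back as questions (Weizenbaum's actual script was considerably richer).

import re

# A few illustrative reflection rules in the spirit of ELIZA, not Weizenbaum's actual script.
rules = [
    (re.compile(r"my (.+) hurts", re.I), "Why do you say your {0} hurts?"),
    (re.compile(r"i feel (.+)", re.I),   "How long have you felt {0}?"),
    (re.compile(r"i am (.+)", re.I),     "Why do you think you are {0}?"),
]

def respond(utterance):
    # Return the first matching canned response, or a neutral prompt.
    for pattern, template in rules:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please tell me more."

print(respond("My head hurts"))      # Why do you say your head hurts?
print(respond("I feel anxious"))     # How long have you felt anxious?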
ELIZA was the start of NLP. Shortly afterwards, many NLP algorithms emerged, with each one coming closer and closer to simulating human speech with verisimilitude. The early NLP languages were constructed with versions of BASIC, as in the dialogue discussed above. Here's another example for the sake of illustration:

What is your name?
Jennifer.
Hello Jennifer. How many children do you have?
Two.
Do you want more children?
Yes.
How many children do you want?
One.
Do you want more children (after that)?
No.
Goodbye, Jennifer.


The program would be written, more or less as we saw above, as follows:

10  INPUT "What is your name: ", U$
20  PRINT "Hello "; U$
30  INPUT "How many children do you have: ", N
40  S$ = ""
45  REM build a string with one "**" for each child after the first
50  FOR I = 2 TO N
60  S$ = S$ + "**"
70  NEXT I
80  PRINT S$
90  INPUT "Do you want more children? ", A$
95  REM an empty answer repeats the question; an answer starting with Y loops back to line 30
100 IF LEN(A$) = 0 THEN GOTO 90
110 A$ = LEFT$(A$, 1)
120 IF A$ = "Y" OR A$ = "y" THEN GOTO 30
130 PRINT "Goodbye "; U$
140 END

Without going into specifics here, it is instructive to note that the program models question-and-answer sequences that characterize human conversation in terms of distinct algorithmic states. Even this early simple program shows how real-world information can be transformed into computer-usable instructions. Until the late 1980s, most systems were based on BASIC. Shortly thereafter, ML programs using statistical and n-gram models, rather than strict if-then rules, greatly enhanced the ability of algorithms to simulate human conversation by incorporating the relative certainty of possible questions and answers in common stretches of dialogue into the instructions. These algorithms work effectively if the conversation tends to be script-based. Take, for instance, what is involved in successfully ordering a meal at a restaurant. The components of this script include: a strategy for getting the waiter's attention; an appropriate response by the waiter; a strategy for ordering food to fit one's particular tastes and financial capabilities; an optional strategy for commenting favorably or unfavorably on the quality of the food. Any radical departure from this script would seem anomalous and even result in a breakdown in communication.
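A minimal sketch of how such a script could be represented for a dialogue program, as an ordered list of slots, each with a system move and the kinds of user moves it expects; the slot names and moves are invented for illustration.

# An illustrative "restaurant" script: ordered slots a dialogue manager steps through.
restaurant_script = [
    {"slot": "get_attention", "system": "Waiter approaches the table.",
     "expects": ["greeting", "request_for_menu"]},
    {"slot": "take_order",    "system": "What would you like to order?",
     "expects": ["dish_name", "drink_name"]},
    {"slot": "serve_food",    "system": "Here is your order.",
     "expects": ["thanks", "complaint"]},
    {"slot": "comment",       "system": "How is everything?",
     "expects": ["positive_comment", "negative_comment", "no_comment"]},
]

def expected_moves(slot_name):
    # What kinds of user utterances the script anticipates at a given point.
    for step in restaurant_script:
        if step["slot"] == slot_name:
            return step["expects"]
    return []

print(expected_moves("take_order"))   # ['dish_name', 'drink_name']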

3.4.1 Aspects of NLP


The development of NLP has been a challenging task because computers traditionally require humans to interact with them in a programming language that is precise, unambiguous, and highly structured. Human speech, however, is rarely precise, and it is highly variable along various social and geographical axes, which include slang, dialects, and registers. To solve these dilemmas, current approaches to NLP use learning-based AI that examines patterns in data to improve a program's own understanding. Typical tasks today include the following:
1. developing subprograms for segmenting sentences, as well as tagging and
parsing the parts of speech
2. applying sophisticated data processing methods capable of yielding outputs
from large and multi-source data sets that consist of both unstructured and
semi-structured information (known as deep analytics)
3. developing methods of information extraction that locate and classify items
in a text into pre-established categories, such as people's names, organizations, expressions of time, and so on (known as named-entity extraction; a toy sketch follows this list)
4. determining which expressions refer to the same entity in a text (known as
co-reference resolution)
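As a toy illustration of the named-entity extraction task, the sketch below uses a small gazetteer and two regular-expression patterns; the entity lists, patterns, and example sentence are assumptions for illustration, and real systems rely on statistically trained taggers rather than hand-written rules.

import re

# A toy gazetteer-and-pattern approach to named-entity extraction.
known_orgs = {"IBM", "Google", "Georgetown University"}
time_pattern = re.compile(r"\b(19|20)\d{2}\b")             # four-digit years
name_pattern = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")  # naive "Firstname Lastname"

def extract_entities(text):
    entities = []
    for org in known_orgs:
        if org in text:
            entities.append((org, "ORG"))
    entities += [(m.group(), "TIME") for m in time_pattern.finditer(text)]
    entities += [(m.group(), "PERSON") for m in name_pattern.finditer(text)]
    return entities

print(extract_entities("Joseph Weizenbaum wrote ELIZA in 1966, long before Google."))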
As even this minimal list shows, NLP has allowed linguists to understand the components of language and their relation to external knowledge representation in a
very precise way. One of the best-known NLP approaches to this internal-versus-external modeling is script theory, especially as developed initially by computer scientist Roger Schank (1980, 1984, 1991), which has had significant implications
for pragmatics and the study of discourse. It assumes that some (perhaps many)
human interactions are governed by internal scripts, which refer essentially to
the real-world knowledge structures that manifest themselves in typical social
situations. They allow people to carry out conversations effectively. The computational task at hand is described by Schank (1984: 125) as follows:
When we read a story, we try to evaluate the reasoning processes of the main character. We
try to determine why he does what he does and what he will do next. We examine what
we would do in a similar situation, and we try to make the same connections that the main
character seems to be making. We ask ourselves, "What is he trying to do? What's his plan? Why did he do what he just did?" Any understanding system has to be able to decipher the
reasoning processes that actors in stories go through. Computer understanding means computers understanding people, which requires that they understand how people formulate
goals and plans to achieve those goals. Sometimes people achieve their goals by resorting
to a script. When a script is unavailable, that is, when the situation is in some way novel,
people are able to make up new plans.

Making contact with a stranger, for instance, requires access to the appropriate cultural script, its contextualization, and the verbal structures that encode it. If the contact occurs in an elevator, the script might call for talking about the weather. By extension, all social actions and interactions can be conceived in terms of this script-language-context complementarity. The enactment of agreements, disagreements, anger, flirtations, and so on can be seen to unfold in a script-like fashion.


Work in contemporary NLP has been using script theory effectively, alongside other theoretical paradigms (discussed above). By decomposing even a simple script-like conversation into its pragmatic, linguistic, and conceptual components, NLP has developed a truly sophisticated array of tasks and research questions that overlap considerably with research agendas in pragmatics and conversational analysis. Some of these are listed below (note that these summarize much
of the foregoing discussion about CL):
1. finding ways to produce conceptually-appropriate machine-readable summaries of chunks of text (automatic summarization)
2. determining which words in a text refer to the same objects, for instance,
matching pronouns and adverbs with preceding (anaphora) or following (cataphora) nouns or names (coreference resolution)
3. classifying discourse texts in terms of their social function (yes-no question,
content question, assertion, directive, and so on), since many can be decoded
in terms of script theory
4. segmenting words into their constituent morphemes (morphological segmentation) and then relating these to their use in a text
5. determining which items in a text refer to proper nouns (people's names,
places, organizations, and so on) (named entity recognition)
6. converting computer language into understandable human language (natural
language generation)
7. understanding which semantic-conceptual rules apply in a certain text, while
others are excluded (natural language understanding)
8. determining the text corresponding to a printed text image (optical character
recognition)
9. tagging the part of speech for each word so that its role in sentences and its
connection to the lexicon can be determined; this is part of disambiguation,
since many words are polymorphic, that is, pertain to different morphological
classes, as, for example, the fact that the word set can be a noun (I bought a
new set of chess pieces), a verb (I always set the table) or adjective (He has too
many set ways of thinking)
10. parsing a sentence effectively, since in addition to being polysemous and polymorphic, natural languages are also polyanalytical, that is, sentences in a
language will have multiple syntactic analyses (Roark and Sproat 2007); different types of parsing systems, such as dependency grammar, optimality theory, and stochastic grammar, are, essentially, attempts to resolve the parsing-representation problem
11. identifying relationships among named entities in a text (who is the son of
whom, what is the connection of one thing to another, and so on)


There are a host of other problems that NLP research faces in devising algorithms
to produce natural language-like outputs. The usefulness of this approach to
linguists is that it allows them to zero in on the various components that make
up something as simple as a sentence or a conversational text. NLP has made
great strides in many areas and, like work on algorithms in various fields of human endeavor (from flight simulation to medical modeling), it has produced
some truly remarkable accomplishments. For example, in the area of speech
recognition technology, voice-activated devices that skip manual inputting are
now routine. The work in this area has shed light on how oral speech relies not
so much on pauses between items, but on other segmental cues. For example,
in speech /naitrait/ is not articulated with a break between /nait/ and /rait/, but
the word could be either a single morpheme, nitrate, or two morphemes, night
rate. So the segmentation process involves not only determining which phonic
cues are phonemic, but also contextual ones that produce the relevant cues to
determine word boundaries.
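A minimal sketch of that segmentation problem: given a tiny pronouncing lexicon (an invented assumption), the function below finds every way of carving a phoneme string, written here in plain letters, into known words.

# Map phoneme strings to written words (a tiny illustrative pronouncing lexicon).
lexicon = {"nait": "night", "rait": "rate", "naitrait": "nitrate"}

def segmentations(phones):
    # All ways of carving a phoneme string into lexicon entries.
    if not phones:
        return [[]]
    results = []
    for i in range(1, len(phones) + 1):
        prefix = phones[:i]
        if prefix in lexicon:
            for rest in segmentations(phones[i:]):
                results.append([lexicon[prefix]] + rest)
    return results

print(segmentations("naitrait"))   # [['night', 'rate'], ['nitrate']]

Both segmentations are phonemically valid; choosing between them is exactly where the contextual cues mentioned above come into play.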

3.4.2 Modeling language


Overall, the transformation of natural language input into algorithms involves developing both a representational code (how the parts of sentences cohere at all levels) and then an execution code, such as HTML or LaTeX. In both cases the
idea is to specify what something is at the same time that the same item is used.
In other words, the software utilizes both the formal information about structure
(metalinguistic information) and information about its function in a unit such as
a sentence (contextual information). Here are a few examples from HTML:
1. Headings are defined with <h> tags (<h1>, <h2>, ...) and specified as follows:

   <h1>This is a heading</h1>
   <h2>This is a heading</h2>
   <h3>This is a heading</h3>

   and so on.

2. Paragraphs are defined with the <p> tag:

   <p>This is a paragraph.</p>
   <p>This is another paragraph.</p>

3. Links are defined with the <a> tag:

   <a href="http://www.mathematics.com">This is a link</a>

NLP has made great strides in producing ML systems. The fundamental goal is understanding the relation between the system (language), its representation (modeling), and how these connect to the outside world, both bringing it into the system and using the system to understand the outside world. The complexity of this
task has been made obvious by the fact that the rule systems employed by computer languages in NLP are intricate and difficult to develop. In my view, the main
goal of NLP is to find simpler languages that have the same kind of ergonomic power as human language. NLP holds great promise for making computer interfaces easier for people, so that they can talk to the computer in natural language rather than learn a specialized language of computer commands. This can be called a meta-ELIZA project, in reference to one of the first programs attempting to simulate speech.
Both CT and CL constitute interdisciplinary hermeneutic modes of investigation, involving linguists, computer scientists, artificial intelligence experts, mathematicians, and logicians in the common goal of unraveling the nature of mathematical and linguistic phenomena by modeling them in the form of algorithms.
To summarize, the algorithms devised by computer scientists are insightful on at
least three counts:
1. They force analysts to unravel the relation between structure and meaning in
the formation of even the simplest sentences and the simplest mathematical
formulas and thus to focus on how the constituents of a mathematical or linguistic form can lead to the production of meaningful wholes by means of the
relation among them (internal information) and with the real world (external
information). The computer cannot do this; the analyst can and must do it in
representing the knowledge system or subsystem involved.
2. They produce machine-testable models that can then be discussed vis-à-vis
the theoretical models of mathematicians and linguists.
3. They emphasize the relation among representation, internal knowledge, and
contextualization and how these might be modeled.

3.5 Computation and psychological realism


In the previous chapter, Colyvan's metaphor of game-playing was adopted to describe how formalists went about their tasks; the same metaphor can be extended to computational mathematicians and linguists, since they too are playing a kind of intellectual game, but with computers rather than with pen and paper. The question is whether or not the games played are psychologically real in any sense, that is, whether they truly mirror what is going on in the brain as it produces mathematics or language, or whether they are exercises in AI. CL as a field emerged from the early failures of MT, as we saw above. And
CT emerged from the rephrasing of standard general problems in mathematics
in computational terms. In MT, the initial task was to translate from one system (S1) to another (S2), seeking equivalences in the structure and the lexicon of the two, but this turned out to be insufficient to produce translations that approximated the abilities of a human translator. So, work in MT led eventually to a focus on semantic systems, real-world knowledge (network theory), pragmatic forms (scripts), and so on and so forth. I argued that the rise of pragmatics and conceptual metaphor theory in linguistics came about, at least indirectly, through the failures of MT and the attempts of CL to solve unexpected problems such as the high density of metaphorical speech in language. In this case the computer was a catalyst in expanding the purview of linguistics. That is to say, what began as an effort to make MT more imitative of human translation morphed into a discipline dedicated to unraveling the nature of language using computer modeling and simulation. Similarly, in CT, it can be argued that the focus on the heuristic modeling of problems, rather than on concrete solutions, including which problems do or do not have a solution, has expanded mathematics epistemologically. So, the research in both CL and CT has led to expanding the research paradigms of both mathematics and linguistics as well as the span of the common ground on which they rest. But we are still left with the question of whether the algorithms are truly reflective of human mental processes and thus truly describe the nature of language and mathematics.
Aware of the implications of this question, computer scientists have been developing algorithms to test theories of language development and even to predict
certain aspects of how languages are acquired in infancy. If there is a match, then the conclusion is, surely, that the algorithms are psychologically real. The computer modeling of language learning has the advantage of making it possible to manipulate the algorithms and the data as the data is assembled. This is an example of black box testing: checking that the output of a program yields what is expected and then, on that basis, inferring the validity of the algorithm and modifying it appropriately.

3.5.1 Learning and consciousness


Algorithms are used best when they are used as testers of human theories. It is
in so doing that a wide array of new theoretical suggestions emerges that may be
psychologically real. How do we recognize a problem as a problem? How do we
learn to solve it? Can this be modeled? If the algorithms are modified to answer
these questions, they will give us insights into our theories and what goes on in
our brains as we devise the theories.


The computer model uses virtual symbols to understand the psychology of real symbols. This is, however, what may actually be happening when children go
from counting to using number symbols. As math educator Richard Skemp (1971:
101) has put it, this shift from concrete to abstract knowledge is the essence of how
we learn mathematics (see also Vygotsky 1961). Computationism is a different way
to approach the study of this shift. A program can be built to mimic, for example,
the problem-solving process in humans, as it goes from concrete to abstract ways
of solving the same problems. Now, whether or not this is true in any psychological
sense is a moot point; it is interesting, and that is all we can ask of a theory.
Computer programs allow us to examine the learning process in a specific way (a toy sketch combining several of the following components appears after the list):
1. Data structure. The actual features (data) of a proof, for example, have a structure that must be specified, with each type of step in a proof stored in an array with all possible selections annotated. This is probably what occurs in memory as we solve a problem. The computational problem is to identify all possibilities in the data structure.
2. Searching procedures. In the program a searching system is inserted that can select the appropriate steps that fit the problem, proof, or discourse text. This includes pattern recognition and recall strategies for determining the best fit. All of these are now easily programmable features.
3. Tree structure. Learning programs organize datasets as trees, as in generative grammar or Markov chains, which show the possible structural relations of the data items to each other.
4. Evaluation. This is the part of a program that evaluates the selections as to their fit to the problem at hand. Unfortunately, these algorithms vary considerably across problems and tasks. But, then again, so does learning across individuals.
5. Memory tasks. For problem-solving to be potentially applicable to all sorts of related problems, tables must be devised to record solutions that have been evaluated and saved for recalculation.
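A toy sketch combining several of these components: states and their annotated steps as the data structure, a frontier that is repeatedly searched, an evaluation function that ranks candidate states by distance to the goal, and a memory of states already tried. The toy problem (reaching a target number from 1 with the operations +1 and *2) is an invented stand-in for a proof or discourse problem.

# Toy illustration: reach a target number from 1 using the operations +1 and *2.
# Data structure: states are integers, annotated with the steps taken so far.
# Searching + evaluation: repeatedly expand the state judged closest to the target.
# Memory: the 'seen' set records states already generated so they are not redone.

def solve(target, start=1, limit=10_000):
    frontier = [(start, [])]
    seen = {start}
    while frontier:
        frontier.sort(key=lambda item: abs(target - item[0]))  # evaluation step
        state, steps = frontier.pop(0)
        if state == target:
            return steps
        for label, nxt in (("+1", state + 1), ("*2", state * 2)):
            if nxt not in seen and nxt <= limit:
                seen.add(nxt)
                frontier.append((nxt, steps + [label]))
    return None

print(solve(21))   # a sequence of '+1'/'*2' steps that reaches 21 (not necessarily the shortest)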
The prospects of completely modeling problem-solving in humans are rather remote. There simply is no computational method that can be devised to solve all
kinds of mathematical problems efficiently and in polynomially-economical time.
But the effort to do so is what is valuable. Given the enormous size of the data that
the computer can handle, it has allowed us to test hypotheses that would otherwise be impossible to carry out. The computer has thus become a very powerful
tool for probing the structure of human learning.
But this still leaves us with the dilemma of psychological realism, because it
is difficult to separate the theory-making process from the activity of thinking and
thus consciousness itself. The result is always undecidable. In his classic book,
Mental models (1983), Johnson-Laird gives us a good overall taxonomy for the
kinds of machines or theoretical algorithm systems that have been used (unconsciously) to model consciousness:
1. Cartesian machines, which do not use symbols and lack awareness of themselves
2. Craikian machines (after Craik 1943) that construct models of reality, but
lack self-awareness
3. self-reflective machines that construct models of reality and are aware of their
ability to construct such models.
Programs designed to simulate human intelligence are Cartesian machines in
Johnson-Laird's sense, whereas animals and human infants are probably Craikian machines. But only human infants have the capacity to develop self-reflective
consciousness, which Maturana and Varela (1973) aptly called autopoietic, that
is, a machine that is capable of self-generation and self-maintenance. To quote
McNeill (1987: 262):
Self-aware machines are able to act and communicate intentionally rather than merely as
if they were acting intentionally (of which Craikian machines are capable). This is because
they can create a model of a future reality and a model of themselves deciding that this
reality should come into being.

As McNeill (1987: 262–264) goes on cogently to argue, self-awareness is tied to linguistic actions. The inner speech that Vygotsky discussed is a manifestation of self-awareness. Unlike a Cartesian machine, a human being can employ self-awareness, at will, to construct models of reality. But this also means social conditioning, a dimension that is completely lacking in AI. As McNeill (1987: 263) states, "We become linguistically conscious by mentally simulating social experience."
Perhaps the view of consciousness that is the most relevant to the topic at
hand is the one put forward by Popper (1935, 1963). Popper classified the world of the mind into three domains. World 1 is the domain in which the mind perceives physical objects and states instinctively, whereby human brains take in information by means of neuronal synapses transmitting messages along nerve paths
that cause muscles to contract or limbs to move. It is also the world of things.
World 1 may describe human-built Cartesian machines and Craikian machines/organisms. World 2 is the domain of subjective experiences. This is the level at
which the concept of Self emerges, as the mind allows humans to differentiate
themselves from the beings, objects, and events of the outside world. Craikian
machines might participate in this world, but likely to a limited degree. It is at
this level that we perceive, think, plan, remember, dream, and imagine; so actual
machines might simulate these faculties but not really possess them in any human sense. World 3 is the domain of knowledge in the human sense, containing
the externalized artifacts of the human mind. It is, in other words, the human-made world of culture, including language and mathematics. This corresponds to Johnson-Laird's self-reflective level; in order to create mathematics one has to possess this level of consciousness, otherwise mathematics would be reduced to counting for survival. The World 1 states become World 2 and World 3 ones through imaginative thought (such as metaphorical thought), not through algorithmic processes. As Hayward (1984: 49) has stated: "we could say that our extended version of Popper's World 3, which includes a very large part of World 1 and of World 2, is formed by interacting webs of metaphor gestalts." There is no evidence that any
Cartesian or Craikian machine has access to this form of consciousness.
No Cartesian or Craikian machine can ever reach World 3 because it has no
historical knowledge that leads to it. Terry Winograd (1991: 220), a leading researcher himself in artificial intelligence, has spotted the main weakness in the
belief that computational artifacts are psychologically real, putting it as follows:
Are we machines of the kind that researchers are building as thinking machines? In asking this kind of question, we engage in a kind of projection: understanding humanity by projecting an image of ourselves onto the machine and the image of the machine back onto ourselves. In the tradition of artificial intelligence, we project an image of our language activity onto the symbolic manipulations of the machine, then project that back onto the full
human mind.

As Nadeau (1991: 194) has also put it, such exercises in theoretical reasoning in
both formalist and computationist models of mind are essentially artifacts: "If consciousness is to evolve on this planet in the service of the ultimate value, we must, I think, quickly come to the realization that reality for human beings is a human product with a human history, and thereby dispel the tendency to view any product of our world-constructing minds as anything more, or other, than a human artifact."
The computer is one of our greatest intellectual achievements. It is an extension of our logical intellect. We have finally come up with a machine that will eventually take over most of the arduous work of the logical calculus. Arnheim's (1969: 73) caveat is still valid today: "There is no need to stress the immense practical usefulness of computers. But to credit the machine with intelligence is to defeat it in a competition it need not pretend to enter." In Sumerian and Babylonian myths there were accounts of the creation of life through the animation of clay (Watson 1990: 221). The ancient Romans were fascinated by automata. By the time of Mary Shelley's Frankenstein in 1818, the idea that robots could be brought to life both fascinated and horrified the modern imagination. Since the first decades of the twentieth century, the quest to animate machines has led to many fascinating achievements, from AI to Google. As William Barrett (1986: 160) has warned, if a self-reflective machine is ever built, it would have a curiously disembodied kind of consciousness, for it would be without the sensitivity, intuitions, and pathos of our human flesh and blood. And without those qualities we are less than wise, certainly less than human.

3.5.2 Overview
The ELIZA program was an early attempt to model human speech, by simply
matching questions and answers on the basis of simple discourse patterns. The
humans who were exposed to ELIZA interpreted the answers as being delivered
by a conscious entity. ELIZA had passed the Turing Test, or Turing's (1936) idea
that if a human cannot distinguish between the answers of a computer and a
human, he or she must conclude that the machine is indeed intelligent. This
raises some deep questions about intelligence and consciousness. So, although
it is well known, the Turing Test is worth revisiting here by way of conclusion to
the theme of computation in language and mathematics.
In 1950, shortly before his untimely death in his early forties, Turing suggested
that one could program a computer in such a way that it would be virtually impossible to discriminate between its answers and those contrived by a human being.
This notion quickly became immortalized as the Turing Test. Consider an observer
in a room which hides on one side a programmed computer and, on the other, a
human being. The computer and the human being can only respond to the observer's questions in writing, say, on pieces of paper which both pass on to the
observer through slits in the wall. If the observer cannot identify, on the basis of
the written responses, who is the computer and who the human being, then he
or she must conclude that the machine is intelligent and conscious. It has passed
the Turing Test.
The counter-argument to the Turing Test came from John Searle (1984) and
his Chinese Room illustration. Searle argued that a machine does not know
what it is doing when it processes symbols, because it lacks intentionality. Just
like a human being who translates Chinese symbols in the form of little pieces
of paper by using a set of rules for matching them with other symbols, or little
pieces of paper, knows nothing about the story contained in the Chinese pieces
of paper, so too a computer does not have access to the story inherent in human
symbols. As this argument made obvious, human intentions cannot be modeled
algorithmically. Intentionality is connected intrinsically with the interpretation
of incoming information and the meaning codes that humans have acquired from
cultural inputs.
The modeling of mathematical and linguistic knowledge cannot be extricated
from the question of intentionality. It cannot be reduced to a Turing machine.
This does not preclude the importance of modeling information in itself, as argued throughout this chapter. Shannon's (1948) demonstration that information of any kind could be described in terms of binary choices between equally probable alternatives is still an important one. Information in this computable model is defined as data that can be received by humans or machines, and as something that is mathematically probabilistic: a ringing alarm signal carries more information than one that is silent, because the latter is the expected state of the alarm system and the former its alerting state. When an alarm is tripped in some way, the feedback process is started and the information load of the system increases (indeed reaches its maximum). Shannon showed, essentially, that the information contained in a signal is inversely related to its probability of occurrence: the more likely a signal, the less information load it carries; the less likely, the more. But this does not solve what can be called the central computational dilemma: how to get a machine to interpret information not in simple probabilistic terms but in ways that relate the information to its historical meanings and to the intentions of the purveyor or conveyor of the information (the Chinese Room dilemma).
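Shannon's point can be stated compactly: a signal with probability p carries −log2 p bits of information, so the rarer the signal, the greater the load. A minimal sketch, with the alarm probabilities invented for illustration:

import math

def self_information(p):
    # Bits of information carried by an event with probability p (Shannon 1948).
    return -math.log2(p)

# Invented probabilities: the alarm is silent almost all of the time.
print(round(self_information(0.999), 5))   # silent alarm: ~0.00144 bits
print(round(self_information(0.001), 2))   # ringing alarm: ~9.97 bits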
This problem seems to be intractable, even though the modeling methods in CT and CL have become increasingly refined, sophisticated, and intelligent. Work that allows computers to produce linguistic outputs that are very close to human speech is improving dramatically, with algorithms that allow a computer
to modify its style of production to take into account abstract pragmatic factors
such as politeness, anger, deference, and other social features of register. But the
question becomes: Can the computer truly understand this (the Chinese Room
dilemma)? Comprehension is just as intractable as anything else in CT or CL. It is
relevant to note that Turing himself was aware of the limitations of computability
theories in general. He proved, in fact, that a machine, unlike humans, would not
stop for a given input and consider it differently from its program.
The premise guiding all computational modeling is that any theory of the
mind can be reduced to the search for the algorithmic procedures that relate mind
states to brain states; that is, the mind's functions can be thought of as attendant
to neurological states and that these, in turn, can be thought of as operations akin
to those that a computer is capable of carrying out. It is a form of black-box theorizing, as mentioned, but it avoids the Chinese Room dilemma and other aspects
of intentionality.


Terry Winograd (mentioned above) developed an early NLP program in the early 1970s that he called SHRDLU, a name that came about by considering the frequency of letters in English, with this stretch of letters (S, H, R, D, L, U) having a moderate frequency in discourse texts. SHRDLU was capable of carrying out a seemingly natural conversation with a user who inputted commands on a keyboard. The machine would respond to commands such as "Find a block which is taller than the one you are holding and put it into the box," and make comments such as "I don't understand which pyramid you mean." The following stretch of dialogue between a person and SHRDLU has now become a classic one in the annals of AI:
Person: Pick up a big red block.
Computer: OK.
Person: Grasp the pyramid.
Computer: I don't understand which pyramid you mean.
Person: Find a block which is taller than the one you are holding and put it into
the box.
Computer: By it, I assume you mean the block which is taller than the one I am
holding.
Computer: OK.
Person: What does the box contain?
Computer: The blue pyramid and the blue block.
Person: What is the pyramid supported by?
Computer: The box.
Person: How many blocks are not in the box?
Computer: Four of them.
Person: Is at least one of them narrower than the one which I told you to pick up?
Computer: Yes, the red cube.

Clearly, SHRDLU passes the Turing Test, but it could not possibly pass the Chinese
Room Test. In fairness, the goal of NLP theories is not to bring the computer to
consciousness, but to get it to reproduce natural language in such a way that it
approximates what humans do when they talk, and thus glean insights from the
modeling process itself. The computer may not know what it is doing, but it does
it well nonetheless.
CL has opened up a truly fascinating debate about the nature of linguistic inquiry and how to conduct linguistic research. It is relevant to note that Chomsky
has often been skeptical of NLP, firmly believing that language is specific to the human species and cannot be reproduced in computer software. As we have discussed throughout, for Chomsky, the speech faculty is constituted by a set of universal principles present in the brain at birth that are subjected to culturally-specific parameters during infancy. The parameter-setting feature of Chomsky's theory assigns some role to experiential factors. But he has always maintained that the role of the linguist is to search out the universal rule-making principles that make up the speech faculty. In reviewing Chomsky's Syntactic Structures, Robert Lees (1959) predicted that it would revolutionize linguistics, rescuing it from its prescientific and piecemeal descriptive practices. Data collection and classificatory assemblages of linguistic facts are interesting in themselves, but useless for the development of a theory of language. Chomsky (1990: 3) himself articulated the main goal of linguistics as the search for an answer to the question: "What is the initial state of the mind/brain that specifies a certain class of generative procedures?"
One of the more zealous advocates and defenders of this perspective is Jerry
Fodor (1975, 1983, 1987). Fodor sees the mind as a repository of formal symbols.
Because symbols take on the structure of propositions in discourse, and so serve thought during speech, he refers to them as mental representations that are decomposable into finite-state rules that are converted to higher structures by conversion rules. Cumulatively, they constitute the brain's language of thought. Like Chomsky, Fodor sees language as a mental organ present in the brain at birth, equipping humans with the ability to develop the specific languages that cultures require of them. The psycholinguist Steven Pinker (1990: 230–231), another staunch formalist, agrees:
A striking discovery of modern generative grammar is that natural languages all seem to be
built on the same basic plan. Many differences in basic structure but different settings of
a few parameters that allow languages to vary, or different choices of rule types from a
fairly small inventory of possibilities ... On this view, the child only has to set these parameters on the basis of parental input, and the full richness of grammar will ensue when those parametrized rules interact with one another and with universal principles. The parameter-setting view can help explain the universality and rapidity of language acquisition: when
the child learns one fact about her language, she can deduce that other facts are also true of
it without having to learn them one by one.

The problem with such views is that, as Rommetveit (1991: 12) has perceptively
remarked, they ignore a whole range of lived phenomena such as background
conditions, joint concerns, and intersubjectively endorsed perspectives. As Rommetveit goes on to observe, we really can never escape the vagueness and indeterminacy of the social situation or of the intentions of the interlocutors when
we engage in discourse, no matter how precise the analyst's assessment may appear to be. Pinker's analysis of language ontogenesis is an acceptable interpretation, among many others, if it is constrained to describing the development of
syntax in the child. But it is not a viable psychological theory, because it ignores
a much more fundamental creative force in the child: the use of metaphorical constructs to fill in knowledge gaps that the child development literature has documented rather abundantly.
From the failure of MT to incorporate performance factors into its algorithms,
work in CL has led indirectly to a refocusing of linguistic inquiry in general. It can
be argued that it brought about a significant number of defections from the Chomskyan camp. A focus on how figurative meaning interconnects with other aspects of language, including grammar, is the most promising direction for CL to take. If nothing else, the plethoric research conducted on the world's languages during the last century has amply documented that syntactic systems are remarkably
alike and rather unrevealing about the nature of how a message is programmed
differentially among people living in different cultures. It has shown, in my opinion, that syntax constitutes a kind of organizing grid for the much more fundamental conceptual-semantic plane.
The question becomes: If metaphor is truly a unique human feature of language and mathematics, is it still programmable? Among the first to model metaphorical cognition computationally were Eric MacCormac (1985) and James M. Martin (1990), who were easily able to model what rhetoricians call frozen metaphors, those that have lost their metaphorical semantics due to frequency of usage, while judiciously leaving out the computational study of creative or novel metaphors, which, as they admitted, are virtually impossible to model. But, despite the difficult computational problems involved, metaphor processing is a rapidly expanding area in NLP. Because of the computer's data-processing and data-mining capacities, the corpus that it can examine for metaphoricity in real speech has become a crucial part of NLP, with deep implications for the automatic identification and interpretation of language that are indispensable for any true NLP.
The turn of the millennium witnessed a technological leap in natural language computation, as manually crafted rules gradually gave way to more robust corpus-based statistical methods. This has also been the case for metaphor research. Recently, the problem of metaphor modeling has become a central one, given the increase in truly sophisticated statistical techniques. However, even the statistically-based work has produced fairly limited results in getting the computer to understand metaphorical meaning. The computer can of course produce new metaphorical language ad infinitum, but it takes a human brain to interpret it. At the same time, work on computational lexical semantics, applying machine learning to open semantic tasks, has opened up many new paths for computer scientists to pursue in programming metaphorical competence. It remains to be seen how far this line of inquiry can proceed. All that can really be done is to examine the trends in computational metaphor research and compare


different types of approaches, so as to identify the most promising system features and techniques in metaphor modeling.
Some research in this area is extremely promising. Terai and Nakagawa (2012), for instance, built a computational model of metaphor understanding based on statistical corpus analysis, which included a dynamic interaction among the intrinsic features in the data. Their model was derived from a consideration of two processes: a categorization process and a dynamic-interaction process. The former was based on class inclusion theory, representing how a target domain is assigned to an ad hoc category of which the source domain, and the actual vehicle selected, is a prototypical member. The model represents how the target assigned to the ad hoc category is influenced and how emergent features are emphasized by dynamic interactions among them. The model, the researchers claim, is able to highlight the emphasized features of a metaphorical expression.
But, as Lachaud (2013) has shown, conceptual metaphorical competence may be impossible to artificially model because of its intrinsic human psychology, so to speak. He investigated if and how EEG (electroencephalogram) coherence would differ between types of metaphor during comprehension. The hypothesis testing implied formalizing an algorithm of conceptual metaphor processing before collecting EEG data from 50 normal adults and looking for condition-specific EEG coherence patterns. His results confirmed the psychological reality of conceptual metaphors. But, interestingly and intriguingly, they also supported alternative explanations of the algorithm and thus of the nature of complex metaphors.
Fan-Pei et al. (2013) looked more specifically at blending theory as the neurological source of metaphor production and comprehension. Previous event-related potential (ERP) studies had suggested that literal mapping occurs during metaphor comprehension. However, their study used a two-stimulus word-to-sentence matching paradigm in order to examine the effects of literal mapping and semantic congruity on metaphor comprehension using words from different domains. ERPs were recorded while 18 participants read short novel metaphors (for example, The girl is a lemon) or literal control sentences (for example, The fruit is a lemon) preceded by either a relevant or irrelevant word. Five conditions were measured: congruent target metaphor, congruent source metaphor, congruent literal, incongruent metaphor, and incongruent literal conditions. Their analysis revealed a significant difference in the P600 amplitudes between incongruent and congruent conditions (the P600 is an event-related potential, or peak in electrical brain activity, measured by electroencephalography). They also found that mapping across remote domains evoked larger P600 amplitudes than mapping across close domains or performing no mapping. The results suggest that the demands of conceptual reanalysis are associated with conceptual mapping and incongruity in both literal and metaphorical language, which supports the notion


in blending theory that there is a shared mechanism for both metaphorical and literal language comprehension. So, the field is still open, needing much more extensive research.
The main issues in the computational modeling of metaphor comprehension include the following:
1. distinguishing algorithmically between conceptual and linguistic metaphor
2. distinguishing between frozen and novel metaphors
3. defining multiword metaphorical expressions
4. programming extended metaphor and metaphor in discourse
Metaphor processing systems that incorporate state-of-the-art NLP methods include the following themes and issues:
1. statistical metaphor processing modalities
2. the incorporation of various lexical resources for metaphor processing
3. the use of large corpora
4. programs for the identification of conceptual and linguistic metaphor
5. metaphorical paraphrasing
6. metaphor annotation in corpora
7. datasets for evaluation of metaphor processing tools
8. computational approaches to metaphor based on cognitive evidence
9. computational models of metaphor processing based on the human brain
Despite the many caveats mentioned in this chapter, in the end, all human knowledge inheres in model-making. Models of nature, of the mind, and so on are how we ultimately understand things. The worst that could occur in science is, as Barrett (1986: 47) has phrased it, that the pseudo-precise language of theorists "leaves us more confused about the matters of ordinary life than we would otherwise be." In computational approaches to mathematics and language, at least as I see it, the goal has been to come up with a simple modeling language that can penetrate the core of the brain's processing capacities. In its search for what it means to be human in everyday situations and to express it in language or mathematics, computer science may not have found the answer, but it has spurred on mathematicians and linguists to search for it in new ways.
It was probably Descartes who originated the idea of a universal or artificial common language in the 1600s, although the quest for a perfect language goes back to the Tower of Babel story. More than 200 artificial languages have been invented since Descartes made his proposal. The seventeenth-century clergyman John Wilkins wrote an essay in which he proposed a language in which words would be built in a nonarbitrary fashion. Volapük, invented by Johann Martin Schleyer, a German priest, in 1879, was the earliest of these languages to gain


moderate currency. The name of the language comes from two of its words meaning "world" and "speak." Today, only Esperanto is used somewhat and studied as an indirect theory of perfect language design. It was invented by the Polish physician Ludwik Lejzer Zamenhof. The name is derived from the pen name Zamenhof used, Dr. Esperanto (1887). The word Esperanto means, as Zamenhof explained it, "one who hopes." Esperanto has a simple and unambiguous morphological structure: adjectives end in /-a/, adverbs end in /-e/, nouns end in /-o/, /-n/ is added at the end of a noun used as an object, and plural forms end in /-j/. The core vocabulary of Esperanto consists mainly of root morphemes common to the Indo-European languages. The following sentence is written in Esperanto: La astronauto, per speciala instrumento, fotografas la lunon = "The astronaut, with a special instrument, photographs the moon." Much like computer languages, there can be no ambiguity in sentences such as this one.
Esperanto espouses the goal of standardizing language so that ideas can be
communicated in the same way across cultures. Some estimates peg the number
of speakers of Esperanto from 100,000 to over a million. It is difficult to accurately
quantify the speakers, because there is no specic territory or nation that uses
the language. Zamenhof actually did not want Esperanto to replace native or indigenous languages; he intended it as a universal second language, providing a
common linguistic vehicle for communication among people from different linguistic backgrounds. The Universala Esperanto-Asocio (Universal Esperanto Association), founded in 1908, has chapters in over a hundred countries. Cuba has
radio broadcasts in Esperanto. There are a number of periodicals published in
Esperanto, including Monato, a news magazine published in Belgium. Some novelists, such as Hungarian Julio Baghy and the Frenchman Raymond Schwartz,
have written works in Esperanto.
It is ironic to note, however, that research on Esperanto indicates that it has a tendency to develop dialects, and that it is undergoing various predictable changes (diachronically speaking), thus impugning its raison d'être. Benjamin Bergen (2001) discovered that even in the first generation of speakers, Esperanto had undergone considerable changes in its morphology and had borrowed words from other languages. So, perfect languages may not be possible after all, whether devised by computers or by humans. The structure of grammar and vocabulary in artificial languages is reduced to a bare outline of natural language grammar and vocabulary, and meaning is generally restricted to a denotative range: one word, one meaning. In a phrase, the idea is to eliminate culture-specific knowledge networks from human language. This is an ideal, but an impossible one to attain, since even artificial languages such as Esperanto apparently develop digressions from the ideal.


So, what have we learned about mathematics and language in general from computationism, from AI, and from artificial languages? As mentioned several times, the most important insight that these approaches have produced inheres in fleshing out patterns that can be modeled and thus compared. As a corollary, it has become obvious that many aspects of mathematics and language have computational structure. Connecting this structure to meaning continues to be a major problem. In computationism, three things stand out, which can be reiterated here by way of conclusion:
1. In the task of writing an algorithm, we may have identified a specific way a mental process operates and, as a consequence, we can better understand or evaluate theories about that process.
2. It may be possible to simulate that process on the computer.
3. It might be possible to design computers that can do things that humans do. This is an open question that requires much more research and theoretical debate.


4 Quantification
It is the mark of a truly intelligent person to be moved by statistics.
George Bernard Shaw (1856-1950)

Introductory remarks
If one were to do a very quick calculation of the number of words consisting of a specific number of letters (2 letters, 3 letters, 4 letters, and so on) as they occur on several pages of common texts (newspapers, blogs, novels, and so on), a pattern would soon become conspicuous. Words consisting of two to four letters (to, in, by, the, with, more) are more frequent overall than words consisting of more letters. If the size of the text is increased, this pattern becomes even more apparent. This in itself is an interesting discovery, reinforcing perhaps an intuitive sense that shorter words are more frequent in all kinds of common communications because they make them more rapid. But there is much more to the story. Grammatical constructions and discourse patterns, too, seem to be governed by the same kind of statistical economy, a fact that is easily discerned today in text messages and other forms of digital communication. Textspeak, as it is called (Crystal 2006, 2008), reveals a tendency to abbreviate words, phrases, and grammatical forms in the same way that once characterized telegrams. The reason in the latter case was to save on the price of sending messages, since each letter would cost a significant amount of money. In textspeak it seems to be a stylistic feature that cuts down on the time required to construct and send messages. The high frequency of shorter words in all kinds of texts and the propensity to abbreviate language forms in rapid communication systems suggest a principle that can be paraphrased simply as the tendency to do more with less. This principle, as it turns out, has been investigated and researched seriously by linguists and mathematicians.
Wherever one looks in both mathematics and language, one will note what can be called an economizing tendency. In other words, there are aspects of both systems (if not many) that can be measured as compression phenomena, and this can lead to various theoretical conclusions about the nature of the two systems. The approach to the study of mathematics and language as governed by laws of statistics, probability, and quantification of various kinds can be allocated to the general rubric of quantification. Statistical-quantitative techniques have been applied to the investigation of the structure of natural languages, to patterns inherent in language learning, to rates of change in language, and so on. The general aim has been to unravel hidden patterns in language. At first,


the use of quantification methods might appear to constitute a simple ancillary technique, aiming to confirm self-evident patterns. This is certainly true in some cases. But the applications of statistical and probabilistic methods to language have also produced unexpected findings that have led to deeper insights, insights that would not have been possible otherwise. All social sciences today make use of statistics as an exploratory tool to make sense of their own particular corpora of data. Statistics allows social scientists to flesh out from the data relevant patterns that can then be mapped against theories and explanatory frameworks. This applies as well to the study of language; and the historical record shows that quantitative methodology (QM) has provided linguists with many valuable insights.
QM can be categorized into several main approaches, each with a specific aim and set of techniques. These are: the statistical-inference testing of collected data, glottochronology, lexicostatistics, stylometry, corpus linguistics, and Zipfian analysis. As will be discussed in this chapter, quantitative approaches have led to a general view of human communication that can be expressed as a general economizing principle that is built into systems such as language and mathematics.
The general study of probability phenomena by mathematicians, including the study of probability itself, can also be located under the rubric of QM, which may seem like a superficial statement, but is intended merely to distinguish the study of quantitative phenomena from formalistic and computational ones. Like the use of QM in linguistics, this approach aims to understand various quantifiable phenomena such as compression, in addition to probability distributions, within mathematics. Let's take a simple example. Devised initially as an abbreviation strategy to ease the cumbersomeness of reading the number of repetitions of the same digit in a multiplication, such as 10 × 10 × 10 × 10 × 10 × 10 = 10^6, exponential notation did indeed render multiplication of this kind more efficient and economical. The brain seems to boggle at complex information-processing tasks. But the use of 6 in superscript form, which stands for the number of times a number is to be used as a factor, greatly simplifies the task at hand. In other words, it saves on the cognitive energy required to process the same information. But this simple notational device did much more than just make multiplication less effortful to process. Right after its introduction it took on a life of its own. In fact, subsequent to its invention mathematicians started to play with exponential notation in an abstract way, discovering new facts about numbers. For example, they discovered that n^0 = 1, thus enucleating a property of zero that was previously unknown. It also led to an arithmetic of exponents, with its own derived laws and properties, such as the following:


(n^a)(n^b) = n^(a+b)
(n^a)(m^a) = (nm)^a
(n^a) ÷ (n^b) = n^(a-b)
(n^a)^b = n^(ab)
n^(-a) = 1/n^a   (n ≠ 0)
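As a quick numerical illustration (a minimal sketch in Python; the particular values of n, m, a, and b are arbitrary choices made here only for demonstration), each of the laws above can be checked directly:

    import math

    # Check the exponent laws listed above for some sample values.
    # The values of n, m, a, and b are arbitrary illustrative choices.
    n, m, a, b = 3, 5, 4, 2

    assert n**a * n**b == n**(a + b)        # (n^a)(n^b) = n^(a+b)
    assert n**a * m**a == (n * m)**a        # (n^a)(m^a) = (nm)^a
    assert n**a // n**b == n**(a - b)       # (n^a) / (n^b) = n^(a-b)
    assert (n**a)**b == n**(a * b)          # (n^a)^b = n^(ab)
    assert math.isclose(n**-a, 1 / n**a)    # n^(-a) = 1/n^a, for n != 0
    assert n**0 == 1                        # n^0 = 1
    print("all of the exponent laws hold for these sample values")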
Exponential numbers also became the catalyst of the theory of logarithms, which
similarly started out as a means of making computations much more efficient and
automatic. Logarithms have since been used in many areas of mathematics, science, and statistics, allowing for all kinds of discoveries to occur in these domains
as well. The relevant point is that a simple notational device invented to make a
certain type of multiplication easier to read was the source of many discoveries,
directly or indirectly. The history of mathematics is characterized by the invention of notational strategies (such as exponents) that have led serendipitously to
unexpected discoveries.
By probabilistic structure, two things are intended in this chapter. First, it refers to aspects in both language and mathematics which can be studied with the tools of probability theory or can be quantified in order to assess them theoretically; and second, it refers to the ways in which information is compressed in both systems. Whatever the case, it is obvious that quantification maps out another area of the common ground shared by linguistics and mathematics, and so we will start this chapter off with a brief historical digression into statistical techniques and their general implications for the study of both.

4.1 Statistics and probability


Except for specific instances, such as probabilistic computational models, statistical methods in mathematics and language are intended as tools for analyzing recurrence and correlation in data. In order to grasp why statistics is intrinsic to the human and social sciences generally, it is useful to take a step back and cast a glance at the origin and evolution of statistics as a tool of scientific investigation.
Statistics grew out of probability theory. Although well known, it is instructive to review the basic notions in probability here. It is interesting to note that probability theory took its initial form from the world of gambling, where it is an advantage to have some estimation of the likelihood that one will win, say, a card game or a roulette outcome. Suppose that the objective of a game is to draw an ace from a deck of cards. What are one's chances of doing so? Let's start with a more general version: How many different ways can four cards be drawn blindly from a


standard deck? The answer is:

52 × 51 × 50 × 49 = 6,497,400
The simple reasoning behind this answer goes like this. Any one of the 52 cards in a standard deck can be drawn first, of course. Each of the 52 possible first cards can be followed by any of the remaining 51 cards, drawn second. Since there are 51 possible second draws for each possible first draw, there are 52 × 51 possible ways to draw two cards from the deck. Now, for each draw of two cards, there are 50 cards left in the deck that could be drawn third. Altogether, there are 52 × 51 × 50 possible ways to draw three cards. Reasoning the same way, it is obvious that there are 52 × 51 × 50 × 49, or 6,497,400, possible ways to draw four cards from a standard deck.
Now, to find out what the chances of getting any of the four aces in a row are, it is necessary to determine, first, the number of four-ace draws there are among the 6,497,400 possible draws. Let's consider each outcome, draw by draw. For the first ace drawn, there are three remaining aces that can be drawn second, or 4 × 3 possibilities. There are two aces left that can be drawn third, or 4 × 3 × 2 possibilities. Finally, after three aces have been drawn, only one ace remains. So, the total number of four-ace arrangements is: 4 × 3 × 2 × 1 = 24. Thus, among the 6,497,400 ways to draw four cards there are 24 ways to draw four aces. The probability of drawing four aces is therefore 24/6,497,400, or about .0000037, which makes it a highly unlikely outcome.
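The counting argument just given can be reproduced in a few lines of code. The following is a minimal sketch in Python; the numbers are simply those of the example above:

    # Ordered ways of drawing four cards from a standard 52-card deck.
    ordered_draws = 52 * 51 * 50 * 49        # 6,497,400

    # Ordered ways of drawing the four aces.
    four_ace_draws = 4 * 3 * 2 * 1           # 24

    probability = four_ace_draws / ordered_draws
    print(ordered_draws, four_ace_draws, probability)   # 6497400 24 ~0.0000037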
This simple, yet instructive, example connects events in gambling with their probability of occurrence. It is the latter that is of relevance to QM, aiming to unravel patterns of occurrence and recurrence within certain phenomena and within systems. The foundations of probability theory were laid by mathematician Girolamo Cardano, himself an avid gambler, in the sixteenth century. Cardano was the first to discuss and calculate the probability of throwing certain numbers and of pulling aces from decks of cards. He presented his results in his Book of games of chance (1663), discussing the likelihood of winning fair games, as well as suggesting ways to cheat. In the subsequent century the French mathematicians Blaise Pascal and Pierre de Fermat developed Cardano's ideas into a branch of mathematics, known as probability theory. The whole idea of taming chance by mathematizing it reveals a desire to conquer uncertainty, as the French writer François de La Rochefoucauld argued in his Maxims (1665). La Rochefoucauld suggested, however, that this is a pipe dream, because uncertainty is an ever-present force in human life no matter how many ingenious mathematical artifacts we create to control it or even to understand it.
Actually, for the sake of accuracy, it should be mentioned that it was in John Graunt's 1662 Observations on the bills of mortality that one can find the first use of what is called descriptive statistics today (see Petty 2010). Graunt presented enormous amounts of data in a few tables in order to show visually what the data imply. His goal was to communicate information economically and effectively. Here is a sample of one of his tables:
Table 4.1: Example of one of Graunt's tables

Buried within the walls of London     3,386
    Whereof the plague                    1
Buried outside the walls              5,924
    Whereof the plague                    5
Buried in total                       9,310
    Whereof the plague                    6

Graunt then went on to derive percentages to show the relative quantities for comparison purposes. From this simple technique of gathering data and displaying it in an organized fashion, the science of statistics crystallized shortly thereafter. It has now become a tool of both mathematicians and linguists to study quantifiable phenomena or else to assay the probabilistic structure of various phenomena within mathematics and language. In effect, it truly defines a large stretch of common ground.

4.1.1 Basic notions


It was in the 1700s that the word Statistik came into circulation in German universities to describe a systematic comparison of data about nations, using the insights
of probability theory. Statistics quickly became a very important branch of mathematics and a useful tool in the then-emerging social sciences. The reason why
statistics, if applied correctly, has predictive value is that its database is random.
Essentially, it is a mathematical method of modeling the randomness in a phenomenon in order to see if there is a pattern hidden within it and if the pattern is
significant. Inference is crucial to science, because it allows the scientist to draw
conclusions from data that accrue from random variation or from naturally occurring information. As a technique, it has great usefulness in mathematics. To quote
Elwes (2014: 318-319):
Many powerful and elegant results can be proved about probability distributions, such as the law of large numbers and the central limit theorem. It is striking that such mathematical calculations often violently disagree with human intuition, famous examples being the Monty Hall problem and the Prosecutor's Fallacy. (Some have even suggested that this trait may be evolutionarily ingrained.) To combat this tendency towards irrationality, people in many walks of life apply techniques of Bayesian Inference to enhance their estimation of risk.


The Monty Hall Problem and the Prosecutor's Fallacy will be discussed subsequently. For now, it is important to note that the key idea is that of the normal distribution, the curve that has the shape of a bell. The curve is a continuous
probability distribution, indicating the likelihood that any real observation will
fall between any two limits as the curve approaches zero on either side. A normal
distribution is characterized mathematically as follows:

f(x, μ, σ) = (1 / (σ√(2π))) e^(-(x - μ)² / (2σ²))

In this formula, μ is the mean (or expectation of the distribution) and σ the standard deviation. Without going into the mathematical details here, suffice it to say that when μ = 0 and σ = 1 the distribution becomes the normal curve. This is a remarkable discovery of statisticians, having changed our whole view of random phenomena. Statistical applications have shown consistently that phenomena of various kinds hide within them a pattern: a specific statistic in random data will tend to occur within three standard deviations of the mean of the curve, as determined by a specific set of the relevant data.

Figure 4.1: The normal curve

The tail ends of the curve are the exceptional ones: given a large enough sample, most measured phenomena will fall between -2 and +2 standard deviations (average deviations) from the mean at 0. So, when a statistical test is applied to the curve and reveals that a variable in the data verges beyond these deviations, then one can infer relevance at different levels of confidence. This implies that in random data it is possible to estimate the probability of occurrence of the value of any variable within it. The total area under the curve is defined to be 1. We can multiply the area by 100 and thus know that there is a 100 percent chance that any value will be somewhere in the distribution. Because half the area of the curve is below the mean and half above it, we know that there is a 50 percent chance that


a randomly chosen value will be above the mean and the same chance that it will
be below it.
This implies that linguistic and mathematical data, when collected, will likely have a pattern in them that shows a normal distribution. Some examples of this will be discussed below. For now, it is remarkable to note that a simple classificatory and probabilistic method can reveal hidden structure and thus allow us to flesh out some implicit principle in it.
The area under the normal curve is equivalent to the probability of randomly
drawing a value in that range. The area is greatest in the middle, where the hump
is, and thins out toward the tails. There are, clearly, more values close to the mean
in a normal distribution than away from it. When the area of the distribution is
divided into segments by standard deviations above and below the mean, the area
in each section is a known quantity. For example, 0.3413 of the curve falls between
the mean and one standard deviation above the mean, which means that about
34 percent of all the values of a random sample are between the mean and one
standard deviation above it. It also means that there is a 0.3413 chance that a value
drawn at random from the distribution will lie between these two points.
The amount of curve area between one standard deviation above the mean
and one standard deviation below is 0.3413 + 0.3413 = 0.6826, which means that
approximately 68.26 percent of the values lie in that range. Similarly, about 95 percent of the values lie within two standard deviations of the mean, and 99.7 percent
of the values lie within three standard deviations.
Figure 4.2: Standard deviations

In order to use the area of the normal curve to determine the probability of occurrence of a given value, the value must first be standardized, or converted to a z-score. To convert a value to a z-score means to express it in terms of how many standard deviations it is above or below the mean. After the z-score is obtained, one can look up its corresponding probability in a table. The formula to compute a z-score is as follows:

z = (x - μ) / σ,   where μ = Mean and σ = Standard Deviation
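To make the procedure concrete, here is a minimal sketch in Python (standard library only) that converts a value to a z-score and recovers the familiar areas under the normal curve with the error function; the sample value, mean, and standard deviation are invented for illustration:

    import math

    def z_score(x, mu, sigma):
        # Express x as the number of standard deviations above or below the mean.
        return (x - mu) / sigma

    def normal_cdf(z):
        # Area under the standard normal curve to the left of z.
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # Illustrative values only: an IQ-style score of 115 with mean 100 and sd 15.
    print(z_score(115, 100, 15))                  # 1.0
    print(normal_cdf(1) - normal_cdf(-1))         # ~0.6827: about 68% within 1 sd
    print(normal_cdf(2) - normal_cdf(-2))         # ~0.9545: about 95% within 2 sd
    print(normal_cdf(3) - normal_cdf(-3))         # ~0.9973: 99.7% within 3 sd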

Needless to say, normal curves result from measuring many naturally-occurring phenomena. In biology it has been found that the logarithm of measures of living
tissue (skin area, weight), length of inert appendages (hair, claws), and certain
physiological measures such as blood pressure all display a normal distribution.
In economic theory, changes in the logarithms of exchange rates, price and stock
market indices, for instance, also display the distribution. It also emerges from
standardized testing scores, learning phenomena, and many more types of information and data.
One theoretical notion that statistical methods have opened up is the idea of
randomness and its importance to science and mathematics. There are now algorithms, called Random Number Generators (RNGs), that are devised to generate
sequences of numbers or symbols that lack any pattern, or at least, appear to be
random. RNGs have led to the development of ways to produce randomness for
the sake of various activities, such as lotteries and PIN numbers. In other words,
statistics has led to the study of randomness, which might have been unthinkable
without it. Random numbers are also used by so-called Monte Carlo methods to
achieve numerical approximations to problems that may be too difficult to solve
exactly. Randomness is the opposite of recursion and if it is found to be characteristic of some systems, then the whole concept of recursion will have to be revisited.
For now, RNGs are very difficult to devise, since randomness seems to be exceptional.
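As an illustration of how such random numbers are put to work, the following minimal Python sketch uses a Monte Carlo method to approximate π by sampling random points in the unit square; the sample size is an arbitrary choice:

    import random

    def estimate_pi(samples=100_000):
        # Count the random points in the unit square that land inside the quarter circle.
        inside = 0
        for _ in range(samples):
            x, y = random.random(), random.random()
            if x * x + y * y <= 1:
                inside += 1
        # The area of the quarter circle relative to the square is pi/4.
        return 4 * inside / samples

    print(estimate_pi())   # close to 3.14, and closer as the sample grows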

4.1.2 Statistical tests


There are three main types of statistical tests that relate to the curve above: significance testing, regression analysis, and correlation analysis. Although well known, it is useful to go over them here since they relate to a common ground on which mathematics and linguistics are embedded.
To illustrate significance, suppose we have 1,000 subjects taking an IQ test in order to determine if males or females are more intelligent (according to the given test). After administering the test we find that the mean (average) score for 500 males is 98 and that for 500 females is 100. Now, one may claim that a 2 % difference in average score is a minuscule one. But since the sample size was 1,000, it becomes de facto significant. With a large sample size, very small differences can be detected as significant. This means that we can be quite sure that the difference is real. Significance tells us, in fact, how sure we can be that a difference or relationship exists. One important concept in significance testing is whether we use a one-tailed or a two-tailed test of significance. This depends on our hypothesis. If it involves the direction of the difference or relationship, then we should use a one-tailed probability. For example, a one-tailed test can be used to test the null hypothesis: "Females will not score significantly higher than males on an IQ test." This hypothesis (indirectly) predicts the direction of the difference. A two-tailed test can be used instead to test the opposite null hypothesis: "There will be no significant difference in IQ scores between males and females." Whenever one performs a significance test, it involves comparing a test value that we have calculated to some critical value for the statistic. It doesn't matter what type of statistic we are calculating (a t-test, a chi-square test, and so on); the procedure to test for significance is the same.
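By way of illustration, here is a minimal sketch of a two-tailed test of the kind just described, assuming the NumPy and SciPy libraries are available; the two samples are simulated stand-ins for the male and female IQ scores, not real data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    males = rng.normal(loc=98, scale=15, size=500)      # simulated scores, mean 98
    females = rng.normal(loc=100, scale=15, size=500)   # simulated scores, mean 100

    # Independent-samples t-test; two-tailed by default.
    t_statistic, p_value = stats.ttest_ind(females, males)
    print(t_statistic, p_value)   # a p-value below .05 is conventionally significant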
Adam Kilgarriff (2005) used basic significance techniques to check for randomness in language, with the null hypothesis that randomness was a feature of language, finding the opposite. Significance testing between corpora of linguistic data has now become a key tool in investigating language, as the Kilgarriff study showed. Basically, corpus linguists test their data with statistics. Psycholinguistic experiments, grammatical elicitation tests, and survey-based investigations also commonly involve statistical tests of some sort. A special type of statistical technique is called the type-token ratio: a token is any instance of a particular morpheme or phrase in a text. Comparing the number of tokens in the text to the number of types of tokens can reveal how large a range of vocabulary is used in the text.
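A minimal Python sketch of the type-token ratio over a toy text (the text itself is only an illustration; a real study would use a corpus):

    # Type-token ratio over a toy text (illustrative only).
    text = "the boy loves the girl and the girl loves the boy"
    tokens = text.lower().split()     # every running word is a token
    types = set(tokens)               # the distinct word forms are the types

    ttr = len(types) / len(tokens)
    print(len(tokens), len(types), round(ttr, 2))   # 11 5 0.45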
The two most common uses of significance tests in corpus linguistics are calculating keywords and collocations. To extract keywords, the statistical significance of every word that occurs in a corpus must be determined, by comparing its frequency with that of the same word in a reference corpus. When looking for a word's collocations, the co-occurrence frequency of that word and everything that appears near it once or more in the corpus is determined statistically. Both procedures typically involve many thousands of significance tests.
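One common way of carrying out such a per-word comparison is the log-likelihood (G²) statistic often used in corpus linguistics; the following is a minimal sketch, with invented frequencies, of how a single keyword test of this kind might be computed:

    import math

    def log_likelihood(freq_study, size_study, freq_ref, size_ref):
        # G2 statistic comparing a word's frequency in a study corpus with its
        # frequency in a reference corpus (a common choice for keyword extraction).
        total = freq_study + freq_ref
        expected_study = size_study * total / (size_study + size_ref)
        expected_ref = size_ref * total / (size_study + size_ref)
        g2 = 0.0
        if freq_study:
            g2 += freq_study * math.log(freq_study / expected_study)
        if freq_ref:
            g2 += freq_ref * math.log(freq_ref / expected_ref)
        return 2 * g2

    # Invented counts: 120 occurrences in a 50,000-word study corpus versus
    # 300 occurrences in a 1,000,000-word reference corpus.
    print(round(log_likelihood(120, 50_000, 300, 1_000_000), 2))
    # Values above 3.84 are conventionally treated as significant at p < .05.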
Regression analysis involves identifying the relationship between a dependent and an independent variable. A relationship is hypothesized, and estimates of the values are used to develop a regression equation. Various tests are then employed to determine if the model is satisfactory, and if the equation can be used to predict the value of the dependent variable. Correlation analysis also deals with relationships among variables. The correlation coefficient is a measure of association between two variables. Values of the correlation coefficient are always between -1 and +1. A correlation coefficient of +1 indicates that two variables are perfectly related in a positive linear sense, while a correlation coefficient of -1 indicates that two variables are perfectly related in a negative linear sense, and a correlation coefficient of 0 indicates that there is no linear relationship between the two variables. Correlation makes no a priori assumption as to whether one variable is dependent on the other(s) and is not concerned with the functional relationship between variables; instead it gives an estimate of the degree of association between the variables, testing for their interdependence. Regression analysis describes the dependence of a variable on one (or more) explanatory variables; it implicitly assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless of whether the path of effect is direct or indirect.
Speelman (2014) gives a comprehensive overview of how these basic statistical techniques inform and guide the conduct of research in corpus linguistics.
Focusing on regression analysis, he explains why it is exceptionally well suited to
compare near-synonyms in corpus data, allowing us to identify the different factors that have an impact on the choice between near synonyms, and to determine
their respective effects.

4.2 Studying properties quantitatively


The use of statistical and probabilistic reasoning in linguistics and mathematics has led to several interesting findings about the economic structure of both systems, which can be defined with a common expression: they do a lot with little. Statistical analyses have also revealed that specific phenomena in each system are governed by intrinsic probability laws connected with their occurrence and recurrence.
In this section, we will look at several key notions that cut across mathematics and language, falling under a general category of QM that is called the Principle of Economy (PE). The PE posits that structures in a system tend towards economy of form without loss of meaning and, in some cases, with an increase in meaning. The PE is an operative principle in studies of optimization in computer science and applied mathematics. Optimization consists in finding a value (algorithm, set of instructions, and so on) from a set of input values that can be maximized or minimized according to a situation to produce the optimal output. The PE is itself a corollary of the Principle of Least Effort (PLE), which claims that human communication and representation tend towards economy and thus optimization (in a general sense), so as to render them efficient and effective. The PE and the PLE crop up statistically in various phenomena, from the frequency of first digits in texts (Benford's Law) to the length of words in common texts (Zipf's Law). In


other words, the computer science notion of optimization is really a derivative of a more inherent tendency in human systems towards efficiency of form.

4.2.1 Benford's Law


Shortly after the advent of the above key notions in statistics, mathematicians started looking at probability features and statistical distributions in various phenomena. One of these came about by surprise: the statistical predictability of digit occurrence. One of the earliest studies in QM was, therefore, the analysis of the so-called first-digit phenomenon (see Raimi 1969, Hill 1998). It was the American astronomer Simon Newcomb who found, in 1881, that if the digits used for a task are not entirely random, but somehow socially based, the distribution of the first digit is not uniform: 1 tends to be the first digit in about 30 % of cases, 2 will come up in about 18 % of cases, 3 in 12 %, 4 in 9 %, 5 in 8 %, etc. Newcomb came to this discovery by noticing that the first pages of books of logarithms were soiled much more than the remaining pages. A number in a table of physical constants is more likely to begin with a smaller digit than a larger digit.
In 1938, physicist Frank Benford investigated listings of data more systematically, finding a similar pattern to the one uncovered by Newcomb in income tax and population figures, as well as in the distribution of street addresses of people listed in phone books. Benford then went on to propose a formula for the first-digit phenomenon, known as Benford's Law. It posits that the proportion of time that d (digit) occurs as a first digit is around:

log10 (1 + 1/d)

More generally, in terms of natural logarithms the formula is:

P(d) = ln (1 + 1/d) / ln (10)

The underlying assumption of Benford's Law is that the sample quantities, expressed in base 10 and in more or less arbitrary units, will be fairly evenly distributed on a logarithmic scale. So, this is why the probability of the leading digit being d clearly approaches:

[log10 (d + 1) - log10 (d)] / [log10 (10) - log10 (1)] = log10 (1 + 1/d) = ln (1 + 1/d) / ln (10)
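A minimal Python sketch of the distribution that these formulas predict; the generalization to an arbitrary base b anticipates Hill's formulation discussed below:

    import math

    def benford(d, base=10):
        # Predicted proportion of numbers whose leading digit is d in the given base.
        return math.log(1 + 1 / d, base)

    for d in range(1, 10):
        print(d, round(benford(d), 3))
    # 1 0.301, 2 0.176, 3 0.125, 4 0.097, 5 0.079, 6 0.067, 7 0.058, 8 0.051, 9 0.046

    # The nine proportions exhaust the possibilities, so they sum to one.
    print(round(sum(benford(d) for d in range(1, 10)), 6))   # 1.0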
Benford's Law applies mainly to data that are distributed uniformly across many orders of magnitude. On the other hand, a distribution that lies within one order of magnitude, such as the heights of human beings, is less likely to conform to the law. However, as the distribution becomes broader the predictive value of
the law increases. So, for example, bacteria multiply profusely and quickly. By the end of, say, 30 days there will be around a trillion bacteria in a dish. It is then that Benford's Law applies rather accurately to describing the digit distribution representing bacteria. The reason is that bacteria grow exponentially, doubling each day. An exponentially-growing quantity moves rightward on a logarithmic scale at a constant rate. Measuring the number of bacteria at a random time, we will reach a random point on the scale, uniformly distributed.
Benford's Law remained a part of mathematical speculation, and was seen as having a rather limited application, until Theodore Hill (1998) provided the first rigorous mathematical explanation of its validity. He showed that the law is not base dependent, applying to any base, b, not just 10, with the frequency of the leading digit given by the generalized formula:

logb (1 + 1/d)

Research in QM has shown that distributions that confirm Benford's Law include statistical data where the mean is greater than the median and the skew is positive; numbers produced through various combinations, such as quantity × unit price; and various calculations, such as multiplicative ones, whose answers fall into a logarithmic distribution. As Havil (2008: 192) suggests, there are at least two main observations to be made vis-à-vis Benford's Law:
One, that if Benford's Law does hold, it must do so as an intrinsic property of the number systems we use. It must, for example, apply to the base 5 system of counting of the Arawaks of North America, the base 20 system of the Tamanas of the Orinoco and the Babylonians with their base 60, as well as to the exotic Basque system, which uses base 10 up to 19, base 20 from 20 to 99 and then reverts to base 10. The law must surely be base independent. The second is that changing the units of measurement must not change the frequency of first significant digits.

What Havil is pointing out here is that Benford's Law must be a law of numbers, not of numeration. As such it is a veritable mathematical discovery. As we shall see below, a version of the law applies to language as well, thus uniting mathematical and linguistic probability phenomena rather unexpectedly.
Benford's Law seems to crop up everywhere. Bartolo Luque and Lucas Lacasa (2009) used it to examine prime numbers. It is known that prime numbers, in very large datasets, are not distributed according to the law. Rather, the first digit distribution of primes seems to be uniform. However, as Luque and Lacasa discovered, smaller datasets (intervals) of primes exhibit a significant bias in first digit distribution. They also noticed another remarkable pattern: the larger the dataset of primes, the more closely the first digit distribution approached uniformity. The researchers wondered, therefore, if there existed any pattern underlying the trend toward uniformity as the prime interval increased to infinity.
The set of all primes is infinite, a fact proved by Euclid, as is well known. From a statistical point of view, one difficulty in this kind of analysis is deciding how to choose numbers randomly in an infinite dataset. So, only a finite interval can be chosen, even if it is not possible to do so completely randomly in a way that satisfies the laws of statistics and probability. To overcome this obstacle, Luque and Lacasa chose several intervals of the shape [1, 10^d]; for example, 1-100,000 for d = 5, and so on. In these sets, all first digits are equally probable a priori. So if a pattern emerges in the first digit of primes in a set, it would reveal something about the first digit distribution of primes within that set. By looking at sets as d increases, Luque and Lacasa thus investigated how the first digit distribution of primes changes as the dataset increases. They found that primes follow a size-dependent Generalized Benford's Law (GBL), which describes the first digit distribution of numbers in series that are generated by power law distributions, such as [1, 10^d]. As d increases, the first digit distribution of primes becomes more uniform.
Significantly, Luque and Lacasa showed that the GBL can be explained by the prime number theorem: specifically, the shape of the mean local density of the sequences is responsible for the pattern. The researchers also developed a framework that provides conditions for any distribution to conform to a GBL. The conditions build on previous research. Luque and Lacasa also investigated the sequence of nontrivial Riemann zeta zeros, which are related to the distribution of primes, and whose distribution is considered to be one of the most important unsolved mathematical problems. Although the distribution of the zeros does not follow Benford's Law, here the researchers found that it does follow a size-dependent GBL, as in the case of the primes.
This is a crucial, if unexpected, finding about primes, one that may lead to solving some of the most intractable problems in prime number theory, such as the Riemann Hypothesis (Derbyshire 2004, Du Sautoy 2004, Sabbagh 2004, Wells 2005, Rockmore 2005). In 1859, Bernhard Riemann presented a paper to the Berlin Academy titled "On the Number of Prime Numbers Less Than a Given Quantity," in which he put forth a hypothesis that remains unsolved to this day. Riemann never provided a proof for his hypothesis, and his housekeeper burnt all his personal papers on his death. It is a proof that is waiting to be made, so to speak, even though it has already led to several significant discoveries in primality. On a number line, the primes become scarcer and scarcer as the numbers on the line grow larger: twenty-five percent of the numbers between 1 and 100, 17 percent of the numbers between 1 and 1,000, and 7 percent of the numbers between 1


and 1,000,000 are primes. Paul Erdős (1934) proved that there is at least one prime number between any number greater than 1 and its double. For example, between 2 and its double 4 there is one prime, 3; between 11 and its double 22 there are three primes, 13, 17, and 19. Riemann argued that the thinning out of primes involves an infinite number of dips, called "zeroes," on the line, and it is these zeroes that encode all the information needed for testing primality. So far no vagrant zero has been found, but at the same time no proof of the hypothesis has ever come forward.
From previous work, Riemann knew that the number of primes around a given number on the line, n, equals the reciprocal of the natural logarithm of that number (the natural logarithm being the number of times we have to multiply e by itself to get the given number). Riemann showed that at around one million, whose natural logarithm is about 13, every 13th number or so is prime. At one billion, whose natural logarithm is 21, about every 21st number is prime. A pattern seems to jut out from such discoveries. So, Riemann asked why primes were related to natural logarithms in this way. He suspected that he might find a clue to his question in a sequence, {1 + 1/2^s + 1/3^s + 1/4^s + … + 1/n^s}, now called the Riemann zeta function. For imaginary numbers the zeta function equals zero. Proving the hypothesis means proving that every exponent makes summing the fractions in the zeta function zero. If the hypothesis is right, then we will know how the primes thin out along the number line. So far computers have been able to verify the hypothesis for the first 50 billion. What kind of proof would be involved in showing that it applies to all? Incredibly, the zeta function is related to the energies of particles in atomic nuclei, to aspects of the theory of relativity, and other natural phenomena.
What is remarkable is that a simple quantification phenomenon that crops up in one domain morphs to another to provide insights into it. Clearly, statistical techniques are indeed revelatory vis-à-vis the hidden properties of various phenomena.

4.2.2 The birthday and coin-tossing problems


The discovery of Benford's Law is truly mindboggling, since it suggests that there is an inherent probability structure to seemingly random phenomena that are exactly quantifiable, using logarithmic-probability techniques. Probability is, in effect, a mathematical way of quantifying chance and thus, in a sense, of taming it for observation. Randomness thus becomes less random, in a manner of speaking. An intriguing example of this is the so-called birthday problem, which has become a classic one in mathematics:

How many people do there need to be in a place so that there is at least a 50 % chance that two of them will share the same birthday?
The way to answer this is to use basic probability thinking, which means asking the same question for different numbers of people, calculating the relevant probabilities, up until the probability first drops below 50 %. So, let's suppose that there are 2 people in a room. The total number of possible arrangements of birthdays in this case is:

365 × 365

If the two people do indeed have different birthdays, then the first one, say A, may have his or hers on any day of the year (365 possibilities) and the second one, say B, may have his or hers on any day except the day of A's birthday. So, there are 364 possibilities for B's birthday. The number of possible pairs of distinct birthdays is thus:

365 × 364

And the probability of one occurring is:

(365 × 364) / (365 × 365) = 364/365

Now, we can generalize this approach to n people. In this case the number of possible birthday arrangements is:

365^n

Assuming that every single person has a different birthday, the same reasoning applies: A may have it on any of 365 days, B on any of the remaining 364 days, C on any of the then remaining 363 days, and so on, until the last, or nth, person, who will have his or her birthday on any of the remaining (365 - n + 1) days in order to avoid the first (n - 1) possibilities. The probability that all n birthdays are distinct is:

364! / ((365 - n)! × 365^(n-1))

The first value of n for which this probability falls below 0.5 is 23. This means that 23 people will do the trick, a truly remarkable finding, if one thinks of it. The graph below summarizes the probability structure of the problem (see Figure 4.3).
More technically, the number of ways of assigning birthdays to people is the size of the set of functions from people to birthdays. How many possible functions are there? The answer is, in symbolic form: |B|^|P|, where |B| is the number of days in the year, 365, and |P| is the number of people in the group.
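The whole calculation can be reproduced with a minimal Python sketch that simply searches for the smallest group size at which the probability that all birthdays are distinct drops below 0.5:

    def all_distinct_probability(n, days=365):
        # Probability that n people all have different birthdays.
        p = 1.0
        for k in range(n):
            p *= (days - k) / days
        return p

    n = 1
    while all_distinct_probability(n) >= 0.5:
        n += 1
    print(n, round(1 - all_distinct_probability(n), 3))   # 23 0.507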
Figure 4.3: Birthday problem

The birthday problem is essentially part of a class of probability problems that involve permutations and combinations of elements. One of the best known ones is the coin toss problem, which is worth revisiting here for the sake of argument and illustration. If a coin is to be tossed eight times in a row, there is only one possible outcome of throwing all heads (H = head, T = tails):
H H H H H H H H Only possible outcome of eight heads thrown in a row
Another way to describe this outcome is to say that it consists of no tails. There
are, however, eight possible outcomes composed of seven heads and one tail.
These can be shown as follows:
H H H H H H H T One possible outcome of seven heads and one tail
H H H H H H T H A second possible outcome of seven heads and one tail
H H H H H T H H A third possible outcome of seven heads and one tail
H H H H T H H H A fourth possible outcome of seven heads and one tail
H H H T H H H H A fifth possible outcome of seven heads and one tail
H H T H H H H H A sixth possible outcome of seven heads and one tail
H T H H H H H H A seventh possible outcome of seven heads and one tail
T H H H H H H H An eighth possible outcome of seven heads and one tail
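The outcome counts for any mixture of heads and tails are binomial coefficients, and a minimal Python sketch (standard library only) reproduces the counts and probabilities discussed below:

    from math import comb

    tosses = 8
    counts = [comb(tosses, tails) for tails in range(tosses + 1)]
    print(counts)              # [1, 8, 28, 56, 70, 56, 28, 8, 1]
    print(sum(counts))         # 256 possible outcomes in all
    print(comb(8, 1) / 2**8)   # 0.03125 = 1/32, seven heads and one tail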
For six heads and two tails, there are 28 outcomes; for five heads and three tails, there are 56 outcomes; and so on. Altogether, the total number of possible outcomes is:

1 + 8 + 28 + 56 + 70 + 56 + 28 + 8 + 1 = 256

So, the probability of getting all heads and no tails in eight tosses is 1/256; the probability of seven heads and one tail is 8/256 = 1/32; the probability of six heads and two tails is 28/256 = 7/64; and so on. In sum, calculating probabilities in various seemingly random phenomena allows us to detect pattern. It allows us to sift the wheat from the chaff of randomness. It also shows that mathematics itself may have an intrinsic probability structure which, when applied to external phenomena, seems to provide fascinating insights into them. Further excursions


into this world of probability will be taken below. In a sense, this type of analysis brings out what can be called the efficiency of events, by which is meant that probability theory looks at how things become streamlined through a trial-and-error process that hides within it a denumerable probability system. This system also brings out that there is a minimal, versus a maximal, way of doing things and that events occur through one or the other, if no artificial interferences are involved. This efficiency of events criterion shows up in two main ways:
1. It shows up in probability distributions, which indicate that the path of least resistance in a coin toss or in determining the likelihood of two birthdays being on the same day has a definite numerical structure that shows how one can achieve something minimally.
2. It shows up in the way we do mathematics through compression (such as exponential notation), which makes it more efficient yet, at the same time, becomes the source of further mathematics.

4.2.3 The Principle of Least Effort


Numerical patterns such as those discussed above have counterparts in language. One of the areas of QM is the study of compression, as mentioned, such as using exponential notation to indicate repeated multiplication (above). There is a similar phenomenon in language, revealing an unconscious tendency within linguistic representation and communication.
This chapter started off with an anecdotal observation: If we count the number of 2-letter, 3-letter, 4-letter, and so on, words in a common text such as a newspaper, we find that they are more frequent than words consisting of, say, 8 letters or 12 letters. The statistical regularity relating the length of a word to its frequency of use has been documented by relevant research. For the sake of historical accuracy, it should be mentioned that the first to look at word length and frequency in this way was the French stenographer J. B. Estoup in his 1916 book, Gammes sténographiques, which describes work he had been conducting on word lengths in French in previous years.
The branch of QM that studies this tendency is called, generally, Zipfian analysis, after the work of Harvard linguist George Kingsley Zipf starting in the late 1920s (for example, Zipf 1929). Essentially, it involves determining the relation between word length and frequency of word usage in specific texts. Zipf presented data and analyses that showed an inverse statistical correlation between a word's length in phonemes and its frequency of usage in texts. Simply put, the shorter the word, the more frequent its occurrence, and vice versa, the longer the word, the less frequent. If this finding could be shown to have general validity, then its


implications would be seen to be a law akin to Benford's Law. It suggests, among other things, that speakers of a language might be choosing the path of least resistance in constructing and getting messages across, economizing the linguistic material used to do so.
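A minimal Python sketch of the kind of count with which a Zipfian analysis begins, using word length in letters as a rough stand-in for length in phonemes; the toy text is an invented illustration, whereas a genuine study would use a large corpus:

    from collections import Counter

    # Toy text standing in for a corpus (illustrative only).
    text = ("the boy saw the girl by the lake and the girl saw the boy "
            "with a telescope near the old observatory")

    tokens = text.split()
    frequency = Counter(tokens)

    # Average word length weighted by frequency of use, versus the unweighted
    # average over distinct words: frequent words pull the average down.
    weighted = sum(len(w) * f for w, f in frequency.items()) / len(tokens)
    unweighted = sum(len(w) for w in frequency) / len(frequency)
    print(round(weighted, 2), round(unweighted, 2))   # 3.76 4.15

    # The most frequent words tend to be the shortest ones.
    for word, freq in frequency.most_common(3):
        print(word, freq, len(word))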
French linguist André Martinet (1955) argued that languages evolved over time
to make communication more economical so as to conserve effort. He called this the
Principle of Economic Change, reviving the notion of a Principle of Economy
(PE) articulated by Whitney in 1877. Martinet posited that complex language forms
and structures tended towards reduction over time through usage, so as to facilitate
communication, making it more rapid and effortless. For example, the opposition
between short and long vowels in Latin, which kept a relatively large inventory of
words distinct in that language, disappeared in the emerging sound systems of
the Romance languages. Latin had ten distinct vowel phonemes, pronounced as
either long or short; for example, the pronunciation of the word spelled os could
mean either "mouth" or "bone," depending on whether the vowel was articulated
long or short (respectively). The Latin vowel system was, to a large extent, reduced in the Romance languages, in line with the PE. Distinctions of meaning
were preserved via a realignment of structures in other parts of the language. In
other words, Martinet found that a reduction in the physical materials of a structural system due to economizing tendencies involved the emergence of different
subsystems to differentiate meaning. The economizing change in one subsystem
(phonology) entails readjustment and realignment in the other subsystems (morphology, syntax). This can be called, simply, reorganization.
Reorganization can be used to explain why isolating (syntactic) languages, that is,
languages in which word order and organization determine meaning, may have
evolved from previous stages in which morphology played a larger role. A classic
example of the operation of this hypothesis is the loss of the Latin declension
system in the Romance languages, as Martinet suggested, which came about in
reaction to the loss of final consonants, and the concomitant reorganization of
grammar along a syntactic axis to compensate for this loss. In a sentence such
as Puer amat puellam ("The boy loves the girl"), discussed previously, the case
structure of the words is what allowed Latin speakers to extract the appropriate
meaning from it. Indeed, the words of the sentence could have been permuted
in any way and the meaning would have remained the same, because the ending
(or lack of ending) on each word informed the speaker or listener what relation
it had to the others. In English, on the contrary, The boy loves the girl and The
girl loves the boy mean different things, because English is an analytic language.
But older stages of English had case structure and thus meaning-cuing processes
that were more dependent on morphology. In both modern-day English and the
Romance languages, syntax has taken over many of the roles of previous
morphology, because changes in phonology brought about the need for grammatical
reorganization (Clivio, Danesi, and Maida-Nicol 2011). Different devices emerged
in the Romance languages to maintain case distinctions; prepositions, for example, became necessary to distinguish case functions. This transfer of the burden
of meaning from morphological structure to syntactic order suggests that syntax
is a later development in language.
Not all meaning is preserved, however, in reorganization. Sometimes it leads
to expansion and, thus, to the discovery of new meanings. This happens not only
in language change, but also in other systems, including mathematics. For example, the use of superscripts in the representation of exponential numbers, which
was introduced in the Renaissance period, led serendipitously to the investigation
of new laws governing numbers, as already discussed.
The Principle of Economy is not, in itself, an explanatory theory of why change
occurs in the first place. Nor are its corollaries. To unravel the causes of change, ultimately one must resort to a theorization of the internal forces at work in change.
The explanatory framework under which such inquiry has been conducted is that
of the Principle of Least Effort (PLE), mentioned above. The PLE in language was
likely discovered by the scholar Guillaume Ferrero in 1894, who articulated
it in an article that laid out previously undetected facts about natural phenomena. Zipf (1929, 1932, 1935, 1949) claimed that its operation was independent of
language and culture. As Van de Walle and Willems (2007: 756) write, Zipf saw language as a self-regulating structure evolving independently from other social
and cultural factors. The PLE is the likely reason why speakers minimize articulatory effort by shortening the length of words and utterances. Through reorganization this leads to change in grammar and vocabulary. The changes, however,
do not disrupt the overall system of language, since they continue to allow people
to interpret the meaning of words and utterances unambiguously and with least
effort or, in some cases, to find new meanings for them.
Initially, Zipf noticed that the length of a specific word (in number of phonemes) and its rank order in the language (its position in order of its frequency of
occurrence) were in a statistically inverse correlation: the higher the rank order of
a word, the shorter it tended to be (made up of fewer phonemes). Articles
(the), conjunctions (and, or), and other function words (to, it), which have a high
rank order in English (and in any other language for that matter), are typically
monosyllabic, consisting of 1–3 phonemes. What emerged as even more intriguing
was that abbreviation and acronymy were used regularly with longer words and
phrases that had gained general and diffuse currency. Modern examples include:
FYI, ad, photo, 24/7, aka, DNA, IQ, VIP, and so on. In some cases, the abbreviated
form eclipsed the full form; photo is now more frequent than photograph in common conversation, as is ad rather than advertisement. These tendencies are now
called forms of compression. In some kinds of texts, compression is actually part
of style. Technical and scientific texts commonly use compressed forms (etc., ibid.,
and so on). The reason here is, again, that these occur frequently and thus need
not be literally spelled out. All this suggests a veritable law of communication:
the more frequent or necessary a form is for communicative purposes, the more
likely it is to be rendered compressed in physical structure. The reason for this
seems to be the tendency to expend the least effort possible in speech, making it
more economical and efficient.
To see how this works, all one has to do is take all the words in a substantial
corpus of text, such as an issue of a daily newspaper or a novel, count the number
of times words of two, three, four, and so on phonemes appear in the text, tabulating at the same time their frequency. If the frequencies are plotted on a histogram,
sorted by length and rank, the resulting curve will be found to approach the shape
of a straight line with a slope of −1. If rank is given by r and frequency by f, the result C of multiplying the two (r × f = C) is approximately constant across texts: that is, the same
word presents the same C in texts of comparable size.
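
As a rough illustration of this procedure (a sketch added here, not part of the original text; the tokenization and the file name newspaper_issue.txt are hypothetical), the following Python fragment counts word frequencies in a text file, ranks them, and prints the product r × f, which should remain roughly constant if the text follows the Zipfian pattern:

import re
from collections import Counter

def zipf_table(path, top=20):
    # Crude tokenization: alphabetic strings only, lower-cased.
    with open(path, encoding="utf-8") as handle:
        words = re.findall(r"[a-z]+", handle.read().lower())
    counts = Counter(words)
    # most_common() yields words from most to least frequent, i.e. by rank.
    for rank, (word, freq) in enumerate(counts.most_common(top), start=1):
        print(rank, word, freq, rank * freq)   # r, word, f, and r x f (roughly constant)

zipf_table("newspaper_issue.txt")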
Mathematical studies of Zipfian curves have confirmed the initial findings:
(1) the magnitude of words tends, on the whole, to stand in an inverse relationship
to the number of occurrences (the more frequent the word, the shorter it tends
to be); and (2) the number of different words in a text seems to be ever larger as the
frequency of occurrences becomes ever smaller. In the figure below (adapted from
Cherry 1957: 104–106), curve A shows the result of a word count made upon James
Joyce's Ulysses, which contains nearly 250,000 word tokens with a vocabulary of
nearly 30,000 lexemes:

Figure 4.4: Zipfian curve of Joyce's Ulysses (curve A: frequency plotted against rank order on logarithmic axes)

Note that the slope of the curve is downward from left to right, approaching the
value of −1 (the straight line in the middle). This result emerges no matter what
type of text is used. Indeed, given a large enough corpus, the exact same type
of curve describes the rank order-frequency pattern in newspapers, textbooks,
recipe collections, and the like. The larger the corpus, the more the curve tends
towards the slope −1. The specific language also does not influence this result.
Indeed, Zipf used data from widely divergent languages and found this to be true
across the linguistic spectrum. Not only words, but also web page requests, document sizes on the web, and the babbling of babies have been found to fit the
Zipfian paradigm. If the different Zipfian curves are compared, they tend to show
the following shape in terms of a logarithmic (rather than linear) function:

Figure 4.5: Zipfian curves (logarithmic function)

The relation of word frequency (p_n) to rank order (n) was formalized by Zipf as
follows:

log p_n = A − B log n

(where A and B are constants and B ≈ 1)
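
A and B can be estimated from ranked frequency counts by a simple least-squares fit on the log-log scale. The sketch below is illustrative only (the frequency list is invented), and the slope B is independent of the base of the logarithm used:

import math

def fit_zipf(frequencies):
    # frequencies: counts already sorted from most to least frequent (rank 1, 2, 3, ...).
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope   # A and B in log p_n = A - B log n

A, B = fit_zipf([3500, 1900, 1200, 880, 720, 600, 510, 450, 400, 360])
print(A, B)   # for Zipf-like data B comes out close to 1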

Shortly after the publication of Zipf's research, the mathematician Benoit Mandelbrot (1954, 1983), who developed fractal geometry, became fascinated by its
implications. He detected in it a version of what is called a scaling law in biology.
As a brilliant mathematician, Mandelbrot also made appropriate modifications to
Zipf's original formula and, generally speaking, it is Mandelbrot's formula that is
used today to study frequency distribution phenomena:
f(k; N, q, s) = [1 / (k + q)^s] / H_{N,q,s}

In this formula, k is the rank of the data, and q and s are parameters of the distribution. N is finite and q ≥ 0. Finally, H_{N,q,s} is as follows:

H_{N,q,s} = Σ_{i=1}^{N} 1 / (i + q)^s
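
The formula is straightforward to evaluate directly. The following sketch (an illustration added here; the parameter values are arbitrary) computes the Zipf-Mandelbrot probability for the first few ranks:

def zipf_mandelbrot_pmf(k, N, q, s):
    # Normalizing constant H_{N,q,s} = sum of 1/(i + q)^s for i = 1..N.
    H = sum(1.0 / (i + q) ** s for i in range(1, N + 1))
    return (1.0 / (k + q) ** s) / H

for k in range(1, 6):
    print(k, round(zipf_mandelbrot_pmf(k, N=1000, q=2.7, s=1.1), 4))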

Since the mid-1950s, research in various disciplines has largely validated the Zipfian paradigm (Miller and Newman 1958, Wyllys 1975, Rousseau and Zhan 1992,
Li 1992, Ridley and Gonzalez 1994, Perline 1996, Nowak 2000). The most frequent
words are economical in form and they account for most of the actual constitution
of sizeable texts, with the first-ranking 15 words accounting for 25 %, the first 100
for 60 %, the first 1,000 for 85 %, and the first 4,000 for 97.5 %. Remarkably, the operation of Zipfian patterns has been found to surface in various types of activities
and behaviors, from numeration patterns (Raimi 1969, Burke 1991, Hill 1998) to the
distribution of city populations. Perhaps the most relevant finding comes from the
Nielsen Norman Group, which examined the popularity of web sites using Zipfian
methodology. It found that the first page is the most popular one (the home page),
the second page is the one that receives the second-most requests, and so on. Other
studies have found that Zipfian curves characterize outgoing page requests:
there are a few pages that everybody looks at and a large number of pages that are
seen only once. The distribution of hypertext references on the web also appears
to manifest a Zipfian distribution.
In early research, Zipf did not bring meaning and cultural diversity into his
statistical analyses. However, when he did, he also found some fascinating patterns. For example, he discovered that, by and large, the number of words (n) in
a verbal lexicon or text was inversely proportional to the square of their meanings (m): (n)(m)^2 = C. In 1958, psycholinguist Roger Brown (1958) claimed that
Zipfian analysis could even be extended to explain the Whorfian concept of codability (Whorf 1956). This notion implies that speech communities encode the
concepts that they need, and this determines the size and composition of their
vocabularies. If speakers of a language need many colors for social reasons (such
as clothing fashion), then they will develop more words for color concepts than do
the speakers of other languages. Codability extends to the grammar (verb tenses,
noun pluralization, and many others), which is a guide to a speech community's
organization of time and space. For instance, if planning ahead of time for future
events is not part of a community's needs, then the verb system will either not have
a future tense-marking system, or else will use it minimally. Thus, vocabulary and
grammar reveal codability. Brown (1958: 235) put it as follows:
Zipf's Law bears on Whorf's thesis. Suppose we generalize the finding beyond Zipf's formulation and propose that the length of a verbal expression (codability) provides an index of
its frequency in speech, and that this, in turn, is an index of the frequency with which the
relevant judgments of difference and equivalence are made. If this is true, it would follow
that the Eskimo distinguishes his three kinds of snow more often than Americans do. Such
conclusions are, of course, supported by extralinguistic cultural analysis, which reveals the
importance of snow in the Eskimo life, of palm trees and parrots to Brazilian Indians, cattle
to the Wintu, and automobiles to the American.

This interpretation of Zipfian theory was critiqued by George Miller (1981: 107) as
follows: "Zipf's Law was once thought to reflect some deep psychobiological principle peculiar to the human mind. It has since been proved, however, that
completely random processes can also show this statistical regularity." But a resurgence of interest in Zipfian analysis today suggests that it may have tapped into
something deep indeed, although some refinement or modification is needed
to guide the tapping. Recent work by Ferrer i Cancho (Ferrer i Cancho and Solé
2001, Ferrer i Cancho 2005, Ferrer i Cancho, Riordan, and Bollobás 2005), for instance, has shown that there are social reasons behind the operation of Zipf's
law. In other words, Zipf's law does not operate blindly but rather in response
to communicative and other pragmatic factors. When there are small shifts in the
effort expended by speaker or hearer, changes occur cumulatively because they
alter the entropy of the whole system. Interestingly, Zipf's law has been found in
other species. For example, McCowan, Hanser, and Doyle (1999) discovered that it
applies to dolphin communication which, like human language, had a slope of −1;
however, in squirrel monkeys it is −0.6, suggesting a simpler form of vocalization.
As Colin Cherry (1957: 103) pointed out a while back, Zipf understood the
relation between effort and language rather insightfully, unlike what his critics
believed:
When we set about a task, organizing our thoughts and actions, directing our efforts toward
some goal, we cannot always tell in advance what amount of work will actually accrue; we
are unable therefore to minimize it, either unconsciously or by careful planning. At best we
can predict the total likely work involved, as judged by our past experience. Our estimate of
the probable average rate of work required is what Zipf means by effort, and it is this, he
says, which we minimize.

In human affairs there are always two forces at work, Zipf asserted: a social force
(the need to be understood), which he called the Force of Unification, and the
personal force, or the desire to be brief, which he called the Force of Diversification.
Clearly, therefore, the implications of Zipfian analysis go far beyond the simple
statistical study of how form (length of words) and frequency of usage correlate.
In a fundamental way, the overall consequence afforded by the work in Zipfian analysis is a specific realization of Gregory Bateson's aim, contained in his
Steps to an ecology of mind (1972), to understand the relation between form and
content, mind and nature, using scientific rather than speculative philosophical
theories. By showing a statistical correlation between the form of communication
and its usage, one will be on a more scientific footing in developing theories of
linguistic change.

4.2.4 Efficiency and economy


The discussion of the PLE as it manifests itself in language and other systems leads
to the notions of efficiency and economy, which can now be defined as the tendency to compress physical material in a system for reasons of adeptness. Economy and efficiency are thus intrinsically intertwined. This can even be seen in
the structural make-up of the forms of language and mathematics. Take, for example, the notion of double articulation, as Martinet (1955) called it, or the fact that
both systems use a small set of symbols to make structures (numbers, words, and
so on) ad infinitum. The presence of this feature in both brings out the fact that
language and mathematics are economical systems. A small set of phonemes in a
language (usually around 50–60), in fact, is sufficient to make words endlessly in
that language. The construction processes are guided by rules of word formation,
of course, but even this constraint does not block the infinity-making potential of
language. For example, in English /p/ can be combined with /f/ (helpful, upflow,
stepfather) within words, but the two cannot be combined in initial or final position, as they can in German, its sister language (Pferd "horse," Knopf "button").
However, /p/ can be combined with /r/ or /l/ in any environment, other than word-final position, to make words ad infinitum. Without double articulation, it would
require an enormous amount of effort to create words with distinct sounds each
time, given the need for huge vocabularies in human situations, and an equally
enormous memory system to remember them. It would require millions of different sounds to create millions of different words, rather than the same sounds
combined in different ways to produce words.
The same principle applies to positional notation systems in mathematics.
With a small set of symbols (digits) one can construct numerals and various numerical representations ad infinitum. The minimal requirement for double articulation to be operative is two symbols. This is the case with the binary digit system,
where all numbers can be represented with 0 and 1. This type of system is found in
many domains of human activity. For example, it was used in the Morse Code as
dashes and dots, and of course it is the basic principle underlying how computer
architecture works (where on-versus-off are the two basic states). Binary symbol
systems constitute a skeletal set of elements (two) from which complex structures
can be formed.
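
The point can be made concrete with a short sketch (added for illustration) showing how the two symbols 0 and 1, combined positionally, suffice to represent any whole number:

def to_binary(n):
    # Repeated division by the base 2; the remainders are the binary digits.
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits
        n //= 2
    return digits or "0"

for n in (5, 12, 255):
    print(n, "->", to_binary(n))   # e.g. 255 -> 11111111: many places, only two symbols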
Double articulation is a manifestation of the operation of the PE. In both
mathematics and language, there are two structural levels: a higher level of first
articulation and a lower level of second articulation. The former consists of
the smallest units available for constructing the larger ones. Complex units that
are made up from this minimal set occur at the secondary level. Now, the units
at the first level lack meaning in themselves, whereas those at the secondary
level (such as morphemes, actual numbers) bear meaning or function. The lower
level units have differential function, that is, they provide the minimal cues for
making distinctions at the higher level. The higher-level units have combinatory
function, since they are combinations from the set of units at the rst level and
thus possess meaning in themselves. Double articulation does not seem to occur
in the signal systems of animals, making it a unique property of human systems.
Nöth (1990: 155) puts it as follows:
Among these features, double articulation most certainly does not occur in natural animal
communication systems. Most probably, not even the ape language Yerkish is decoded as
a system with double articulation. Some authors who ascribe the feature of double articulation to bird calls and other animal languages seem to take the mere segmentability of
acoustic signals for a level of second articulation. However, a prerequisite of a truly phonemic patterning is that the same minimal but meaningless elements are combined to form
new messages. When they are substituted for each other, the substitution results in a semantic difference. This type of patterning seems to be absent from animal communication
systems.

In his classic study of human-versus-animal communication, Hockett (1960)
refers to double articulation as duality of patterning. He proposed a typology of
13 design features that, he suggested, would allow linguists to establish what true
language behavior was. He defined duality of patterning as the feature whereby vocal
sounds have no intrinsic meaning in themselves, but combine in different ways
to form words that do convey meanings.
The term efficiency needs further commentary here, since it has taken on various specialized meanings in science. In general it is defined as the ability of organisms or machines to do something successfully without waste (of time, energy,
and other resources). It is thus related to the concept of optimization in computer
science. In strictly computational terms, it refers to the measure of the extent to
which an input is used optimally for an intended output, with a minimum amount
or quantity of waste, expense, or effort. Efficiency is sometimes associated with
effectiveness. In general, efficiency is a measurable phenomenon, in terms of the
ratio of output to input. Effectiveness is the concept of being able to achieve a
desired result, which is not directly computable.
In the calculus the concept of efficiency dovetails with that of maxima and
minima, which are the turning points of a graph (see figure 4.6).
At a maximum, the derivative of the function f(x), or f′(x), changes sign from
+ to −. At a minimum, f′(x) changes sign from − to +, which can be seen at the
points E and F. We can also see that at the maximum, A, the graph is concave
downward, whereas at the minimum, B, it is concave upward. These are measures
of the extreme values of a function. They are counterparts to efficiency in real-life phenomena.
Figure 4.6: Maxima and minima (graph of y = f(x) showing its turning points)

For example, we could find the largest rectangle that has a given
perimeter or the least dimensions of a carton that is to contain a given volume,
both of which are deemed to have efficiency features.
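
The rectangle example can be worked out numerically with a small sketch (illustrative only): for a fixed perimeter P, the area x(P/2 − x) grows and then shrinks as the width x increases, peaking at the square with side P/4, exactly where the derivative changes sign from + to −:

def best_rectangle(perimeter, steps=100000):
    # Scan possible widths x between 0 and perimeter/2 and keep the largest area.
    best_x, best_area = 0.0, 0.0
    for i in range(1, steps):
        x = (perimeter / 2) * i / steps
        area = x * (perimeter / 2 - x)
        if area > best_area:
            best_x, best_area = x, area
    return best_x, best_area

print(best_rectangle(20))   # approximately (5.0, 25.0): the square with side P/4 = 5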
In a more general framework, efficiency is connected to economy and thus
compression. Perhaps the most salient manifestation of the relation between the
two is in the use of symbols. As Godino, Font, Wilhelmi, and Lurduy (2011: 254)
observe, compression via symbolization is a central aspect of mathematics:
If we consider, for example, the knowledge required to find the number of objects in a set,
it is necessary to use some verbal or symbolic tools, procedures, counting principles, etc.
Consequently, when an agent carries out and evaluates a mathematical practice, it activates
a configuration of objects formed by problems, languages, concepts, propositions, procedures, and arguments. The six types of primary entities postulated extend the traditional
distinction between conceptual and procedural knowledge when considering them insufficient to describe the intervening and emergent object in mathematical activity. The problems
are the origin or reason of being of the activity; the language represents the remaining entities and serves as an instrument for the action; the arguments justify the procedures and
propositions that relate the concepts to each other. The primary objects are related to each
other forming configurations, defined as the networks of intervening and emergent objects
from the systems of practices. These configurations can be socio-epistemic (networks of institutional objects) or cognitive (networks of personal objects).

Without notation, there would be no abstractions, theories, propositions, theorems, and so on in mathematics. There would be only counting and measuring
practices. As Steenrod, Halmos, and Dieudonné (1973) point out, notational systems are compressions of linguistic notions, and a mathematical system without
language would be indecipherable to the brain. This is why we have to explain
each and every math symbol in language, and when the notation leads to new
ideas, then those ideas have to be not only symbolized, but also explained with
language.

4.3 Corpus linguistics


The general study of quantitative phenomena in language comes under the
branch called corpus linguistics. Some of the statistical techniques and concepts in this branch have already been discussed above. A primary target of
study is the statistical analysis of the features that make up style, known more
specifically as stylometry. This involves studying the relative frequencies of such
units as phonemes, syllables, words, and syntactic constructions that can be
associated with a certain literary genre, a specific author, or an individual's style
(idiolect). Relevant data are collected and analyzed statistically to reveal various
things, including the stylistic features inherent in a text, the sources of texts, and the
meaning of historical writings. To establish relationships between the data and
the style (or idiolect), stylometry employs simple inferential statistics (Bod, Hay,
and Jannedy 2003), which, as discussed above, can be used to explore the properties of various subsystems: phonological, morphological, syntactic, semantic.
Corpus linguistics conceptualizes categories as normal probability distributions
and views knowledge of language not as a minimal set of discrete categories but
as a set of gradient categories that may be characterized by such distributions.

4.3.1 Stylometric analysis


Individuals use certain words, phrases, and other linguistic forms consistently as
part of their speaking (or writing) style, known as idiolect, but are barely conscious of doing so. This is part of linguistic identity. The stylometrist uses statistical analysis as a means of establishing the identity of someone, such as the
author of a text, through an analysis of stylistic features of his or her idiolect.
The assumption is that each individual has a unique set of linguistic habits. Of
course, there are interfering factors, such as the fact that an individual's style is
always susceptible to variation from environmental influences, including other
speakers, the media, and changes in language itself. Nevertheless, stylometric research has shown that grammatical and vocabulary styles tend to be fairly stable
and immune from outside influences even as people age. A written text can thus
be examined for lexical and grammatical patterns by classifying these in
specific ways and then measuring them statistically against known style features
of the author. The analysis may, at the very least, be adequate to eliminate an
individual as an author or to narrow down an author from a small group of subjects.
The statistical techniques used include factor and multivariate analysis, Poisson
distributions, and the discriminant analysis of function words (Buckland 2007).

Much suspicion about the validity of stylometry existed until Donald Foster
brought the field into the spotlight with his 1996 study that correctly identified the
author of the pseudonymously authored book, Primary Colors, as Joe Klein (Foster
2001). This led to an upsurge of interest in corpus linguistics generally and more
specifically in stylometry among linguists, literary scholars, and others. Statistical studies of idiolect started to appear in the 2000s. A fascinating study was
carried out by James Pennebaker (2011). Studying the speeches of American presidents, Pennebaker found an inordinate use of the pronoun I in them, relative
to other speech styles and idiolects. The reason is, Pennebaker suggested, that
a president may unconsciously wish to personalize his commitment to specific
causes or issues through I-word use. He discovered, surprisingly, that President
Obama turned out to be the lowest I-word user of any of the modern presidents,
including Truman, who came in second in this regard. He did not interpret this,
however, as humility or insecurity on the part of Obama, but rather as its diametrical opposite (confidence and self-assurance). Pennebaker based this analysis
on his statistical finding that self-assured speakers used I less than others, although most people would assume the opposite. It shows emotional distance from
a cause, not an emotional entanglement in it. In effect, Pennebaker suggests, function words (pronouns, articles, and the like) reveal more about idiolect than do
content words (nouns, adjectives, verbs). These words have an under-the-radar
furtiveness to them, constituting traces of personal identity in the everyday use of
language.
The finding also showed that social and emotional factors change style. The
profession of president is conducive to the use of a specific pronoun. The question becomes: Is it characteristic of other professions? Is it found in certain types
of individuals? These are the questions that a corpus linguistic approach would
attempt to answer. They have obvious implications for the study of style and for
the connection of discourse patterns to external influences.
Pennebaker's work falls under the rubric of stylometry (although this is not
mentioned explicitly as such in it). He started researching the connection between
language forms and personality by looking at thousands of diary entries written
by subjects suffering through traumas and depressions of various kinds. Today,
with social media sites such as Facebook and Twitter, the potential sample size
of diaries has become enormous and can be used to carry out relevant stylometric analyses very effectively. Pennebaker discovered, for instance, that pronouns
were actually indicators of improvements in mental health in many subjects. A recovery from a trauma or a depression requires a form of perspective switching
that pronouns facilitate. They are linguistic symptoms revealing the inner life of
the psyche. The use of function words also correlates with age, gender, and class
differences. Younger people, women, and those from lower classes seem more
frequently to use pronouns and auxiliary verbs than do their counterparts. Lacking power, Pennebaker suggests, requires a more profound engagement with the
thoughts of others.
Perhaps the earliest example of the analysis of a text to determine its authenticity based on a stylistic analysis is Lorenzo Valla's 1439 proof that the
fourth-century document, the Donation of Constantine, was a forgery. Valla based his
argument in part on the fact that the Latin used in the text was not consistent
with the language as it was written in fourth-century documents. Valla thus used
simple logical reasoning. This kind of reasoning can now be made more accurate given
the statistical techniques that corpus linguistics makes available. The basic ones
were laid out for the first time by Polish philosopher Wincenty Lutosławski in
1890. Today, computer databases and algorithms are used to carry out the required
measurements.
With the growing corpus of texts on the Internet, stylometry is being used
more and more to study Internet texts and thus to refine its methods. The main
concept is that of the writer invariant: a property of a text that is invariant in the author's idiolect. To identify this feature, the 50 most common words are identified
and the text is then broken into word chunks of 5,000 items. Each is analyzed to
determine the frequency of the 50 words. This generates a unique 50-word identifier for each chunk.
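
A sketch of this chunking procedure is given below (an illustration added here; the tokenization and the commented-out file name are assumptions, not part of the original description). It finds the 50 most common words, splits the text into 5,000-word chunks, and computes a 50-dimensional frequency vector for each chunk:

import re
from collections import Counter

def writer_invariant_profiles(text, top_n=50, chunk_size=5000):
    words = re.findall(r"[a-z']+", text.lower())
    common = [w for w, _ in Counter(words).most_common(top_n)]
    profiles = []
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        counts = Counter(chunk)
        # Relative frequency of each of the top_n marker words in this chunk.
        profiles.append([counts[w] / len(chunk) for w in common])
    return common, profiles

# common, profiles = writer_invariant_profiles(open("novel.txt").read())   # hypothetical file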

4.3.2 Other techniques


A contemporary statistical technique within corpus linguistics, used in various
areas, is called the artificial neural network (ANN) (Tweedie, Singh, and Holmes
1996). The ANN carries out a nonlinear regression analysis in order to allow a
linguist to generalize the recognition patterns detected in a text. ANNs are simulative algorithms that are constructed to mimic the structure of the mammalian
brain. A large ANN might have hundreds or thousands of processor units, simulating functions such as those of the retina and the eye. ANNs do not carry out
programmed instructions; rather, they respond in parallel (either simulated or
actual) to the pattern of inputs involved. There are also no separate memory addresses for storing data in ANNs. Instead, information is contained in the overall
activation state of the network. ANNs work well in capturing regularities in data
where the diversity and volume are very great. A related approach, known as the
genetic algorithm, comes up with similar extractions of recurrence. It works somewhat like this: if well occurs more than 2 times in every thousand words, then the
text is authored by X.

Perhaps the best-known use of stylometric techniques is in the areas of forensic science and archeological-philological investigations of various kinds. Within
these fields the cognate technique of lexicometry is used, which is simply the measurement of the frequency of words within a text and then plotting the frequency
distribution of a given word in the speech of an individual, a specific genre of text,
and so on. This allows the analyst to determine how a lexical item is used and who
the probable user might be. Thus, lexicometry, like stylometry in general, is used
both as proof of identity and as a heuristic tool (Findler and Viil 1964).
A primary objective of corpus linguistics is to derive a set of general rules of
vocabulary use, sentence formation, and text construction on the basis of the automated analyses of language samples collected in natural speech environments.
Quirk's 1960 survey of English usage and Kucera and Francis's 1967 computational analysis of a carefully chosen corpus of American English, consisting of
nearly 1 million words, are early examples of this kind of analysis. One of the
first offshoots has been the preparation of dictionaries combining prescriptive information (how language should be used) and descriptive information (how it is
actually used).
Corpus linguistics has also produced several other research methods allowing for theoretical generalizations to be made on the basis of actual corpora of
data. Wallis and Nelson (2001) summarize the principles in terms of what they
call the 3A perspective: Annotation, Abstraction, and Analysis. Annotation is the
application of a scheme to texts, such as a structural mark-up, parsing, and other
such rule-based frames; abstraction involves generating a mapping of the data
against the model or scheme used; and analysis is the statistical generalization
of the data in order to determine which models work best. In effect, corpus linguistics has become an important branch of linguistics for validating whether certain
features or patterns in speech samples are relevant to explicating structural and
semantic aspects of a language, in addition to idiolectal characteristics. This adds
a significant empirical component to linguistic theories and models.

4.3.3 The statistics on metaphor


Statistical analyses of language can also shed significant light on controversies,
such as those discussed in the previous chapter, and more specifically on the frequency of literal-versus-metaphorical speech patterns. One of the first statistical
studies in this area was the one by Howard Pollio et al. in 1977. The study found
that the average speaker of English creates approximately 3,000 novel metaphors
per week and 7,000 idioms (frozen metaphors) per week (Pollio, Barlow, Fine,
and Pollio 1977). It became clear from their study that verbal metaphor was
hardly a mere stylistic alternative to literal language. They found, overall, that people
used 1.80 novel and 4.08 frozen metaphors per minute of discourse. Altogether
this totals 5.88 metaphors per minute of speech. These findings came from
transcripts of psychotherapeutic interviews, various essays, and even the 1960
Kennedy-Nixon presidential debates.
Graesser, Mio, and Millis (1989) analyzed the use of metaphor in six TV debates and news programs on the PBS MacNeil/Lehrer News Hour. They counted a total
of 504 unique metaphors in the six debates (repetitions were not counted), which
totaled 12,580 words; 12,580 divided by 504 is 24.96, hence an approximate rate
of one unique metaphor every 25 words. Steen et al. (2010) examined patterns
of metaphor usage in various kinds of discourse using techniques of corpus linguistics, finding that on average one in every seven and a half words is related to
metaphor (Steen et al. 2010: 780).
From these studies has come an impetus for developing algorithms to detect
metaphor in speech and to generate metaphorical discourse, not to interpret it, as discussed in the previous chapter (for example, Steen 2006, Reining and Lönneker-Rodman 2007, Shutova 2010; see relevant studies in Diamantaras, Duch, and
Iliadis 2010). This has led some to put forth a neural theory of metaphor based on
several psycholinguistic and computational studies (for example, Feldman 2006).
Essentially, the extraction of metaphor from texts as well as its computational
modeling involves establishing a probabilistic relationship between concepts and
words via a statistical analysis of language data, then constructing the relevant algorithm and, finally, obtaining a third-party rating of the metaphors the model generated. This type of research was discussed in the previous chapter. The point here
is that it is still ongoing and can fall under several branches, including, and especially, corpus linguistics.
With the advent of social media, the research focus has started to shift towards the use of figurative language in these media. Nguyen, Nguyen, and Hwang
(2015) used a statistical method for the analysis of figurative language in tweets,
determining whether they were sarcastic, ironic, or metaphorical tweets by extracting
two main features (actual term features and emotion patterns). Their study used
two datasets, the Trial set (1,000 tweets) and the Test set (4,000 tweets). Performance was evaluated by cosine similarity to gold-standard annotations. These
are trustworthy corpora that are critical for evaluating algorithms that use annotations. Their proposed method achieved 0.74 on the Trial set. On the Test set, they
achieved 0.90 on sarcastic tweets and 0.89 on ironic tweets. This is a remarkable
finding, showing that in social media, metaphor, especially in its ironic forms, is
very dense.
Overall, the statistics on metaphor corroborate that metaphor is not an exception to literal language, but a common feature (if indeed a major feature) of
discourse. The point here, again, is that corpus linguistics in collaboration with
computational linguistics is useful in corroborating or refuting the theories of linguists.

4.4 Probabilistic analysis


The discussion of compression, Benford's Law, the birthday problem, stylistic patterns, and metaphorical density in speech exemplifies what QM implies in one
of the senses used here, namely, an approach to quantifying seemingly random
phenomena and events in terms of the laws of probability. QM in some domains
of research is essentially probability analysis with a specific statistical inference
objective in mind. The basic idea in QM is that of studying all possible outcomes
of some event or phenomenon, assigning a random variable of probability to the
outcomes. When considered on the whole, the assigned random variables form
a probability distribution. This is the key to understanding probabilistic structure (which is really seeking pattern in randomness) in phenomena and events.
The design of contemporary computational models of, say, metaphor is based
on probability theory. A probability distribution assigns a probability measure to
each subset of the possible outcomes of a random event or phenomenon. Subsets include: events or experiments whose sample space is non-numerical; those
whose sample space is encoded by discrete random variables; and those with
sample spaces encoded by continuous random variables. More complex experiments, such as those involving stochastic processes defined in continuous time,
may demand the use of more general probability measures.
There are many probability distributions, but the two main ones are the discrete and the continuous. In the former the outcomes are considered separately, as in the example of coin-throwing (above); in the latter the goal is to
examine phenomena such as a person's height. A discrete distribution is one in
which the data can only take on certain values, for example integers; a continuous distribution is one in which data can take on any value within a specified
range. For the former, probabilities can be assigned to the values in the distribution: for example, the probability that there are 12 metaphorical tweets in a
sample of 2,000 is 0.15. In contrast, a continuous distribution has an infinite
number of possible values; the probability associated with any particular value
of a continuous distribution is null. It is thus described in terms of probability
density, which can be converted into the probability that a value will fall within a
certain range.

The tossing of a coin is a perfect example of what a probability distribution is
and allows us to do. More specifically, it is called a binomial distribution, which
consists of the probabilities of each of the possible numbers of successes on N trials for independent events that each have a probability of π. For the coin flip, N = 2
and π = 0.5, and the distribution is shown below:

P(x) = [N! / (x!(N − x)!)] π^x (1 − π)^(N−x)

This distribution is based on the existence of equal outcomes. On the other hand,
the Bernoulli distribution describes the tossing of a biased coin (and similar experiments with unequal probabilities). The two possible outcomes are n = 0 (failure)
and n = 1 (success), in which the latter occurs with probability P and the former
with probability Q = 1 − P, with 0 < P < 1. It has the probability density function:

P(n) = P^n (1 − P)^(1−n)
The Bernoulli distribution is the simplest discrete distribution, and it is the building block for other more complicated discrete distributions. The distributions of
a number of such types based on sequences of independent Bernoulli trials are
summarized in the following table (Evans, Hastings, and Peacock 2000: 32).
Table 4.2: Probability distributions

Distribution                       Definition
binomial distribution              number of successes in n trials
geometric distribution             number of failures before the first success
negative binomial distribution     number of failures before the x-th success
There are many other types of distributions that need not concern us here. The
point to be made is that probability distributions both describe and analyze
random events with equal and unequal elements involved. In other words, they
unravel hidden quantitative structure in randomness. Probability considerations
have also been applied to three areas that are relevant to the discussion here since
they, too, reveal different angles from which to view mathematical probabilities
and thus provide insights into mathematics and its description of the world.
The three are: the Monty Hall Problem, the Prosecutor's Fallacy, and Bayesian
Inference.

4.4.1 The Monty Hall Problem


The Monty Hall Problem (MHP) was named after television quiz show host Monty
Hall, who was the presenter of Let's Make a Deal. It was formulated by Steven
Selvin in 1975. The contestants on the show had to choose between three doors
that hid different prizes. The problem goes like this, broken down into stages:
1. There are three doors: A, B, and C. Behind one is a new car; behind the other
two are goats.
2. The contestant chooses one door, say A.
3. He or she has a 1/3 probability of selecting the car.
4. Monty Hall knows where the car is, so he says: "I'm not going to tell you what's
behind door A yet, but I will reveal that there is a goat behind door B."
5. Then he asks: "Will you now keep door A or swap to C?"
The assumption is that the odds are 50/50 between A and C, so that switching
would make no difference. But that is incorrect: C has a 2/3 probability of concealing the car, while A has just a 1/3 probability. This seems to defy common sense,
but probability reasoning says something different. Elwes (2014: 334) explains it
as follows:
It may help to increase the number of doors, say to 100. Suppose the contestant chooses
door 54, with a 1% probability of nding the car. Monty then reveals that doors 153, 5586,
and 88100 all contain wooden spoons. Should the contestant swap to 87, or stick with 54?
The key point is that the probability that door 54 contains the car remains 1 %, as Monty
was careful not to reveal any information which affects this. The remaining 99 %, instead of
being dispersed around all the other doors, become concentrated at door 87. So she should
certainly swap. The Monty Hall problem hinges on a subtlety. It is critical that Monty knows
where the car is. If he doesnt, and opens one of the other doors at random (risking revealing
the car but in fact nding a wooden spoon), then the probability has indeed shifted to .
But in the original problem, he opens whichever of the two remaining doors he knows to
contain a wooden spoon. And the contestants initial probability of 1/3 is unaffected.
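
The result can also be checked empirically. The short Monte Carlo sketch below (an illustration added here, not part of Elwes's discussion) plays the three-door game many times and confirms that sticking wins about one third of the time while switching wins about two thirds:

import random

def play(switch):
    doors = [0, 1, 2]
    car = random.choice(doors)
    choice = random.choice(doors)
    # Monty opens a door that is neither the contestant's choice nor the car.
    opened = random.choice([d for d in doors if d != choice and d != car])
    if switch:
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == car

trials = 100000
print(sum(play(False) for _ in range(trials)) / trials)   # sticking: about 0.33
print(sum(play(True) for _ in range(trials)) / trials)    # switching: about 0.67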

For the sake of historical accuracy, it should be mentioned that the MHP was
similar to the three prisoners problem devised by Martin Gardner in 1959 (see
Gardner 1961). Of course, playing by the rules of probability may mean nothing
if one loses, that is, finds himself or herself at the wrong point in the probability
curve. However, knowing about the existence of the curve leads to many more insights into the nature of real events than so-called common sense. The MHP has
various implications that reach right into the power of probability theory to unravel hidden structure. Our assumption that two choices means 50-50 chances
is true when we know nothing about either choice. If we picked any coin then the
chances of getting a head or tail are, of course, 50-50. But information is what
matters here and changes the game.
The MHP brings out the principle that the more we know, the better our decision will be. If the number of doors in the MHP were 100, this becomes even
clearer, as we saw. As Monty starts eliminating the bad candidates (in the 99 that
were not chosen), he shifts the focus away from the bad doors to the good ones
more and more. After Monty's filtering, we are left with the original door and the
other door. In effect, the information provided by Monty does not improve the
chances of our original choice. Here is where Bayesian Inference (BI) comes into play, which will be discussed below. BI allows us to generalize the MHP as follows, since it allows us to
re-evaluate probabilities as new information is added. The probability of choosing
the desired door improves as we get more information. Without any evidence, two
choices are equally likely. As we gather additional evidence (and run more trials)
we can increase our confidence that A or B is correct. In sum:
1. Two choices are 50-50 when we know nothing about them.
2. Monty helps by filtering out the bad choices on the other side.
3. In general, the more information we have, the more we are able to re-evaluate our
choices.
The MHP makes us realize how subsequent information can challenge previous
decisions. The whole scenario can be summarized with the main theorem in BI,
which is as follows:
The conditional probability of each of a set of possible causes for a given observed
outcome can be computed from knowledge of the probability of each cause and
the conditional probability of the outcome of each cause.

4.4.2 The Prosecutor's Fallacy


Another famous problem in probability that brings out the underlying principle
of structure in randomness and uncertainty is the so-called Prosecutor's Fallacy
(PF). It goes like this (Elwes 2014: 331):
A suspect is being tried for burglary. At the scene of the crime, police found a strand of the
burglar's hair. Forensic tests showed that it matched the suspect's own hair. The forensic
scientist testified that the chance of a random person producing such a matching is 1/2000.
The prosecutor's fallacy is to conclude that the probability of the suspect being guilty must
therefore be 1999/2000, damning evidence indeed.

This is indeed fallacious reasoning. Consider a larger sample. In a city of, say,
2 million people, the number with matching hair samples will be 1/2,000 ×
2,000,000 = 1,000. Now, the probability of the suspect being guilty is a mere
1/1,000. The PF was first formulated by William Thompson and Edward Schumann
in 1987. They showed how real people in court situations made this mistake, including at least one prosecuting attorney. Thompson and Schumann also examined
the counterpart to the PF, which they called the Defense Attorney's Fallacy. The
defense attorney might argue that the hair evidence is worthless because it increases the probability of the defendant's guilt only by a small amount, 1/1,000,
especially when compared to the overall pool of potential suspects (2,000,000).
However, the hair sample is normally not the only evidence, and thus together
with the other evidence it might indeed point towards the suspect.
The key here is, again, that the reasoning involves BI (discussed in the next
section). The fallacy lies in confusing P(E|I) with P(I|E), whereby E = evidence,
I = innocence. If the former is very high, people commonly assume that P(I|E) must
also be high. P(E|I) is the probability that the incriminating evidence would be
observed even when the accused is innocent, known as a false positive; and P(I|E)
is the probability that the accused is innocent, despite the evidence E. The fallacy
thus warns us that in the real world probability considerations are not to be taken at
face value, and that, properly handled, they can provide true insights into situations.

4.4.3 Bayesian Inference


Both the MHP and PF involve Bayesian probability theory, which makes explicit
the role of the assumptions underlying the problems. In Bayesian terms, probabilities are associated with propositions, and express a degree of belief in their truth,
subject to whatever background information happens to be known. In order to
discuss them, even schematically, we must take a step back to briefly describe the
notion of conditional probability. An example of how to envision this concept is
given by Elwes (2014: 330):
In a particular city, 48 % of houses have broadband internet installed, and 6 % of houses
have both cable television and broadband internet. The question is: what is the probability
that a particular house has cable TV, given that it has broadband?

If we represent the separate events as X and Y, the conditional probability required
by the problem, that is, X given Y (symbolized as P(X|Y)), is defined as follows:

P(X|Y) = P(X & Y) / P(Y)   (P(Y) ≠ 0)

So, X = house that has cable and Y = house that has broadband. Given the percentages expressed in the problem, the answer is: P(X|Y) = 0.06/0.48 = 0.125 or
12.5 %. This analysis allows probabilities to be updated as events change. It is
called Bayesian Inference, after the Reverend Thomas Bayes in 1763, who formulated it as follows:

P(X|Y) = P(Y|X) · P(X) / P(Y)
BI has become part of QM and has been used, for example, to help solve the MHP
and Prosecutor's Fallacy problems above, among many other very complex problems. Rather than use the closed reasoning system of formal logic, mathematics
has developed a more comprehensive approach to problems with Bayesian probabilistic reasoning. There are several ways to write the Bayesian formula, which can be used to shed light, for example, on the MHP:

P(Y|X) = P(X|Y) P(Y) / P(X)

Expanding the bottom probability in terms of three mutually exclusive events,
R, S, and T, we get the following (from Havil 2008: 62–63):

P(X) = P(X ∩ R) + P(X ∩ S) + P(X ∩ T)
     = P(X|R) · P(R) + P(X|S) · P(S) + P(X|T) · P(T)
Now, the MHP can be broken down as follows in terms of the Bayesian formula:
1. A = the event "car is behind door A"
2. B = the event "car is behind door B"
3. C = the event "car is behind door C"
4. M_A = the event "Monty opens door A"
5. M_B = the event "Monty opens door B"
6. M_C = the event "Monty opens door C"

Suppose the contestant chooses A; then Monty has the choice of B or C to open,
and this can now be represented as follows:

P(M_B|A) = 1/2
P(M_B|B) = 0
P(M_B|C) = 1

Plugging these into the Bayesian formula, we get:

P(M_B) = P(M_B|A) · P(A) + P(M_B|B) · P(B) + P(M_B|C) · P(C)
       = (1/2)(1/3) + 0 · (1/3) + 1 · (1/3)
       = 1/2

The contestant can stick with his or her choice or switch to another door. If he or
she keeps door A, the probability of winning the car is as follows:

P(A|M_B) = P(M_B|A) · P(A) / P(M_B)
         = (1/2)(1/3) / (1/2)
         = 1/3

If the contestant switches to door C, then the probability of finding the car becomes:

P(C|M_B) = P(M_B|C) · P(C) / P(M_B)
         = (1 · 1/3) / (1/2)
         = 2/3
So, Bayesian Inference makes it clear why the answer is what it is. Now, what
does this all imply? Basically, that some events have a Bayesian structure, and this
means that they are shaped both by chance (uncertainty) and by external intervention.
Mathematics has thus formalized a situation that typifies a whole stretch of real
living that we grasp intuitively.

4.4.4 General implications


Many events and phenomena in Nature, human life, language, and numerical systems seem to obey hidden laws of probability and more specically the Bayesian
laws. In other words, the world seems to have probabilistic structure and its two
main descriptorsmathematics and languageare themselves shaped by this
structure. As mentioned, probability theory started in the eld of gambling but it
was treated formally for the rst time in 1933 by Andrei Kolmogorov who axiomatized it in terms of set theory, thus showing the intrinsic interconnection between
formal and probabilistic structure in mathematics. Kolmogorov suggested that
innite probability spaces are idealized models of real random processes. This is
the cornerstone idea in the use of probability theory to describe phenomena that
seem random but instead reveal hidden structure.
Given specic conditions, BI allows us to compute various probabilities. And
this is where interpretation comes into the picture. What does unveiling a probability structure imply about the phenomenon at hand? The guiding idea is that
the event occurs. This, as we saw, is especially well suited to those dilemmas, illustrated by the MHP and the PF, which suggest that BI models are the most suitable
ones.
To elaborate on this point, let's return to Benford's Law. The law has, as discussed, logarithmic structure. In effect, Newcomb and Benford found that in a
large sample, the first digit, d, obeys the following frequency law (Barrett 2014:
188):
P(d) = log10 [1 + 1/d], for d = 1, 2, 3, . . . , 9
The relevant probabilities are as follows:
P(1) = 0.30
P(2) = 0.18
P(3) = 0.12
P(4) = 0.10
P(5) = 0.08
P(6) = 0.07
P(7) = 0.06
P(8) = 0.05
P(9) = 0.05
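
These values can be recomputed directly from the formula, as in the following short sketch (added for illustration):

import math

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d, p in benford.items():
    print(d, round(p, 2))   # 1: 0.30, 2: 0.18, 3: 0.12, ..., 9: 0.05
print(round(sum(benford.values()), 6))   # the nine probabilities sum to 1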
This shows that the digit 1 is the most likely to occur. Does this pattern apply correspondingly to language, that is, to the frequency of first letters? I applied the
formula to a series of texts in Italian, using a simple concordance algorithm, and
found a striking similarity, whereby the letter p has a 35 % chance of being the first
letter in a word within a large-sized sample. I know of no work investigating this
possibility formally in Italian. But even anecdotal assessments, such as counting the letters that start words in a dictionary, seem to conform to the law. This
may hint at something deeper both within mathematics and language and in their
connection to the real world.
Actually, it was Andrey Markov who ventured into this territory in 1913. He
wanted to determine whether he could characterize a writer's style by the statistics
of the sequences of letters that he or she used. Barrett (2014: 237–238) describes
Markov's intriguing experiment as follows:
Markov looked at an extract from Pushkin of 20,000 (Russian) letters which contained the
entire rst chapter and part of the second chapter of a prose poem, with its characteristic rhyming patterns Markov simplied Pushkins text by ignoring all punctuation marks
and word breaks and looked at the correlations of successive letters according to whether
they were vowels (V) or consonants (C). He did this rather laboriously by hand (no computers then!) and totaled 8,638 vowels and 11,362 consonants. Next, he was interested in
the transitions between successive letters: investigating the frequencies with which vowels

and consonants are adjacent in the patterns VV, VC, CV or CC. He finds 1,104 examples of
VV, 7,534 of VC and CV and 3,827 of CC. These numbers are interesting because if consonants
and vowels had appeared randomly according to their total numbers we ought to have found
3,033 of VV, 4,755 of VC and CV and 7,457 of CC. Not surprisingly, Pushkin does not write at
random. The probability of VV or CC is very different from VC and this reflects the fact that language is primarily spoken rather than written and adjacent vowels and consonants make for
clear vocalization. But Markov could quantify the degree to which Pushkin's writing is nonrandom and compare its use of vowels and consonants with that of other writers. If Pushkin's
text were random then the probability that any letter is a vowel is 8,638/20,000 = 0.43 and
that it is a consonant is 11,362/20,000 = 0.57. If successive letters are randomly placed then
the probability of the sequence VV being found would be 0.43 × 0.43 = 0.185 and so 19,999 pairs of
letters would contain 19,999 × 0.185 = 3,720 pairs. Pushkin's text contained only 1,104. The
probability of CC is 0.57 × 0.57 = 0.325. And the probability of a sequence consisting of one
vowel and one consonant, CV or VC, is 2 × (0.43 × 0.57) = 0.490

Leaving aside the fact that the results could pertain only to the Russian language,
the finding is still remarkable. The implication is that factors such as personal
style, genre, and meaning have an effect on form and structure, and
this can be determined probabilistically. This raises a fundamental question: Why
are numbers and letters not evenly distributed in texts and lists? Moreover, why is
the distribution scale-invariant, that is, measurable with different units?
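The arithmetic in Barrett's passage is easy to reproduce. The sketch below, added as an illustration (it is not Markov's or Barrett's own procedure), computes the pair counts expected under independence from the quoted totals; because it uses the exact proportions 8,638/20,000 and 11,362/20,000 rather than the rounded 0.43 and 0.57, its expected VV count (about 3,731) differs marginally from the 3,720 in the quotation.

def expected_pair_counts(n_vowels, n_consonants, n_pairs):
    """Expected VV / (VC or CV) / CC counts if letters were placed independently."""
    total = n_vowels + n_consonants
    p_v = n_vowels / total
    p_c = n_consonants / total
    return {
        "VV": n_pairs * p_v * p_v,
        "VC_or_CV": n_pairs * 2 * p_v * p_c,
        "CC": n_pairs * p_c * p_c,
    }

# Marginal counts quoted from Markov's Pushkin sample
expected = expected_pair_counts(n_vowels=8_638, n_consonants=11_362, n_pairs=19_999)
print({k: round(v) for k, v in expected.items()})
# {'VV': 3731, 'VC_or_CV': 9814, 'CC': 6454}
# The observed 1,104 VV pairs fall far below the independence expectation,
# which is the sense in which Pushkin's text is non-random.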
Markov's idea has been taken up within QM and the results have been very
interesting. It is now known as a statistical language model, which assigns a probability to a sequence of n words, w, using a probability distribution:

P(w₁, w₂, w₃, …, wₙ)

The idea is then to estimate the probability of certain words, letters, expressions,
and so on in different kinds of texts. This has had, as we saw in the previous chapter, various applications to NLP study. For example, in speech recognition, the
algorithm attempts to match sounds with word sequences, given instructions for
distinguishing homophones and synonymous forms. Texts, moreover, can be ranked
by the probability of a query Q occurring in the text.
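A minimal version of such a model can be sketched as follows, assuming a first-order (bigram) Markov decomposition of P(w₁, …, wₙ), maximum-likelihood estimates, and no smoothing; the boundary tokens <s> and </s> and the function names are illustrative conventions rather than anything prescribed by the text.

from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Estimate P(w_i | w_{i-1}) from a list of tokenized sentences."""
    bigram_counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, curr in zip(padded, padded[1:]):
            bigram_counts[prev][curr] += 1
    return {
        prev: {w: c / sum(nexts.values()) for w, c in nexts.items()}
        for prev, nexts in bigram_counts.items()
    }

def sequence_probability(model, tokens):
    """P(w1, ..., wn) under the chain rule with the bigram assumption."""
    prob = 1.0
    padded = ["<s>"] + tokens + ["</s>"]
    for prev, curr in zip(padded, padded[1:]):
        prob *= model.get(prev, {}).get(curr, 0.0)  # unseen bigrams get probability 0
    return prob

Trained on a tokenized corpus, such a model assigns higher probabilities to well-formed word sequences than to shuffled ones, which is exactly the property that speech recognition and ranking applications exploit.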
This line of research has recently been employed in cryptography and is
called, generally, frequency analysis (FA). The basis of FA is the observation that
the letters of the alphabet are not equally common (as discussed for Italian above).
The following frequency patterns have been noted across large samples of English
texts (Elwes 2014: 345) (see Table 4.3).
Applying FA to texts thus allows us to identify, within a range of probability,
language affinities at the level of phonemic-morphemic structure, in a way
analogous to how Benford's Law allows us to identify number structure.

Table 4.3: Frequency patterns of English letters

Letter   Average number of occurrences    Letter   Average number of occurrences
         per 100 characters                        per 100 characters

e        12.7                             m        2.4
t         9.1                             w        2.4
a         8.2                             f        2.2
o         7.5                             g        2.0
i         7.0                             y        2.0
n         6.7                             p        1.9
s         6.3                             b        1.5
h         6.1                             v        1.0
r         6.0                             k        0.8
d         4.3                             j        0.2
l         4.0                             x        0.2
u         2.8                             q        0.1
c         2.8                             z        0.1
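Counts of the kind shown in Table 4.3 can be approximated for any sample with a few lines of Python; the following is an added sketch, and the sample string is merely a placeholder for a large English corpus.

from collections import Counter

def letter_frequencies_per_100(text):
    """Average occurrences of each letter per 100 alphabetic characters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {letter: 100 * n / total for letter, n in counts.most_common()}

sample = "any sufficiently long stretch of English text can be profiled in this way"
print(letter_frequencies_per_100(sample))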

The fact that logarithmic laws can be extracted from seemingly random data is
a truly remarkable finding. Probability theory has categorized events into three
classes:
1. Independent: each event is not affected by other events
2. Dependent or Conditional: an event is affected by other events
3. Mutually Exclusive: events cannot occur at the same time
Independent events, such as coin tosses, indicate that the elements of the events
do not know the outcome (so to speak). Each coin toss is an isolated event. If we
toss a coin three times and it comes up tails each time, what is the chance of the
next one being a head or a tail? Well, it is 1/2 or 0.50, just like any other toss event.
There is no link between the current coin toss and the previous ones. Independent
events occur throughout Nature and human systems. Connecting them, that is,
giving them meaning, is a human activity, not a probabilistic one. The kind of
probability law that applies to this kind of situation can be called, simply, probability I (for independent), or PI.
PI explains why the so-called gambler's fallacy is indeed fallacious. Basically, it asserts that since we have had three tails, a head as the next outcome is
due and therefore likely to occur with the next coin toss. But, as the PI suggests,
this is not true. As Elwes (2014: 341) elaborates: "The error is that this law makes
probabilistic predictions about average behaviour, over the long term. It makes no
predictions about the results of individual experiments."

Dependent or conditional events are those that are dependent on previous
ones. After taking a card from a 52-card deck there are fewer cards available, so the
probability of drawing, say, an ace now changes. For the first draw the chances
have been discussed already. For the second card the chances are determined as
follows. If the first one was not an ace, then the second one is slightly more likely
to be an ace, because there are still 4 aces left in a slightly smaller deck. The probability of drawing an ace can now easily be worked out with permutation analysis.
This kind of situation is called probability C (for conditional), or PC. By the way,
putting cards back after drawing does not change the mathematics of PC.
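The second-draw arithmetic can be made explicit with exact fractions; the short sketch below is an added illustration using the law of total probability, and the variable names are arbitrary.

from fractions import Fraction

# P(second card is an ace), split by what the first card was (no replacement)
p_first_ace = Fraction(4, 52)
p_second_ace_if_first_ace = Fraction(3, 51)
p_second_ace_if_first_not_ace = Fraction(4, 51)

p_second_ace = (p_first_ace * p_second_ace_if_first_ace
                + (1 - p_first_ace) * p_second_ace_if_first_not_ace)
print(p_second_ace)  # 1/13, the same as the unconditional probability on the first draw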
Mutually Exclusive events are those that cannot occur at the same time. This
is called Probability of Mutual Exclusivity, or PME. PC events involve a both-and
probability, while PME events involve an either-or one. For example, when turning you can go one of two ways, left or right, and the two do not depend on each
other.
The point here is that studying events probabilistically has led to laws of structural possibility that would have otherwise remained unknown. One of the most
intriguing findings lies in the use of the natural logarithm, ln, which is the logarithm with base e = 2.718281828…, defined as follows:

ln x = ∫₁ˣ dt/t,  for x > 0.

This means that e is the unique number with the property that the area of the
region bounded by the hyperbola y = 1/x and the x-axis, and the vertical lines
x = 1 and x = e is 1:

∫₁ᵉ dx/x = ln e = 1.
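The defining property ∫₁ᵉ dx/x = 1 is easy to verify numerically. The fragment below is an added illustration using a simple midpoint rule; the step count is an arbitrary choice.

import math

def integral_of_reciprocal(a, b, steps=100_000):
    """Approximate the integral of 1/x from a to b with the midpoint rule."""
    h = (b - a) / steps
    return sum(h / (a + (i + 0.5) * h) for i in range(steps))

print(integral_of_reciprocal(1, math.e))  # ~ 1.0, i.e. ln(e)
print(integral_of_reciprocal(1, 10))      # ~ 2.3026, i.e. ln(10)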

The natural logarithm shows up in various branches of mathematics, and it has
applicability to the study of various probabilistic events, especially those that involve growth. It has even been used by Google, which gives every page on the
web a score (PageRank), constituting a rough measure of importance. This is a
logarithmic scale. So, a site with PageRank 2 is ten times more popular
than a site with PageRank 1. Logarithms thus describe the root cause for an effect
at the same time that they compress mathematical operations, a dualism that is
consistent with the laws discussed above.
Above all else, probability laws describe stochastic processes, those that
have a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely. These develop over time as probabilistic
rules. Consider the concept of a random walk, introduced by the mathematician George Pólya in 1921:

Choose a point on a graph at the beginning. What is the probability that a random
walker will reach it eventually? Or: What is the probability that the walker will
return to his starting point?
Pólya proved that the answer is 1, making it a virtual certainty. He called it a
1-dimensional outcome. But in higher dimensions this is not the case. A random
walker on a 3-dimensional lattice, for instance, has a much lower chance of returning to the starting point (P = 0.34). This brings us back to the Markov chain
as a relevant model. Say that at any stage of a random walk you flip a coin to decide in which direction to go next. In this case the type of analysis involved is of
the PI variety. The defining characteristic of a Markov chain is that the probability distribution at each stage depends only on the present, not the past. Markov
chains are thus perfect models for random walks and random events. The following figure (from Wikipedia) shows a walk whereby a marker is placed at zero on the
number line and a coin is flipped: if it lands on heads (H) the marker is moved one
unit to the right (+1); if it lands on tails (T), it is moved one unit to the left (−1). There
are 10 ways of landing on +1 (3H and 2T), 10 ways of landing on −1 (2H and 3T),
5 ways of landing on +3 (4H and 1T), 5 ways of landing on −3 (1H and 4T), 1 way of
landing on +5 (5H), and 1 way of landing on −5 (5T) (see Figure 4.7).
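The counts quoted for Figure 4.7 can be confirmed by brute-force enumeration; the following fragment is an added illustration.

from collections import Counter
from itertools import product

# Enumerate all 2**5 = 32 sequences of five flips; H moves +1, T moves -1
positions = Counter(
    sum(+1 if flip == "H" else -1 for flip in seq)
    for seq in product("HT", repeat=5)
)
print(dict(sorted(positions.items())))
# {-5: 1, -3: 5, -1: 10, 1: 10, 3: 5, 5: 1}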
In sum, probability constructs are much more than devices for determining
gambling outcomes. They appear to penetrate the structure of many events. These
are interconnected with Markov models that have formed the basis of formalism
in both mathematics and linguistics (previous chapter), thus bringing out the usefulness of the constructs, even if in constrained ways. As Elwes (2014: 342) points
out:
Markov chains are an excellent framework for modeling many phenomena, including population dynamics and stock-market fluctuations. To determine the eventual behaviour of a
Markov process is a deep problem, as Pólya's 3-dimensional random walk illustrates.

Of course, there are phenomena that do not obey probabilistic-logarithmic laws.
But the fact that probabilistic structure exists in the first place is rather remarkable. It has in fact been found that logarithmic distributions are a general feature
of statistical physics. But what then do we make of datasets that do not conform to
the laws? Any theory would have to explain why some datasets follow the laws and
others do not. Simply put, not all of language or mathematics or natural events
have logarithmic (probabilistic) structure.

Figure 4.7: Markov chain analysis of the random walk problem (from Wikipedia). [The figure enumerates all 32 possible outcomes of five coin flips, labeled First flip through Fifth flip (HHHHH through TTTTT), and the position on the number line that each outcome lands on.]
4.5 Quantifying change in language


Probability distributions have had applications in corpus linguistics as already
discussed. But even before the advent of this branch in the 1980s, statistical techniques were used by linguists to study various phenomena, such as the (assumed)
regularity of change in language. The latter is a truly fascinating area of historical
linguistics that may have implications as well for studying change in mathematical systems, although these implications are beyond the scope of the present
treatment. In effect, change in language can be quantified with the laws of probability, at least within certain limits. These will shed some very important light
on the economic forces within language (PE). In other words, studying change
from the standpoint of QM is a means of penetrating the phenomenon of language
change from a particular angle.
Interest in how language originated goes right back to the ancient world. Starting in the eighteenth century, philologists tackled the origins question and by the
nineteenth century the number of speculative theories had become so profuse
that in 1866 the Linguistic Society of Paris imposed a ban on all discussions related to this topic. A similar prohibition was decreed by the Philological Society
of London a half century later in 1911. Such actions were motivated by the endless conjectures and unfounded models. But all this changed around the middle
part of the twentieth century. In that era, studying the origins of language became a highly scientific enterprise, constituting a new branch of linguistics, called
glottogenetics, whose modus operandi was informed by the blending of findings
from cognate fields such as paleoneurology, archeology, evolutionary biology, and
statistics and probability theory. In the context of this new approach several fascinating questions crystallized: Is it possible to determine mathematically when
the languages in a family diverged to become autonomous codes? Is it possible to
measure the rate of change from the source language? Questions such as these led
to a new quantitative focus in glottogenetics that has shed considerable light on
how languages evolve, despite various limitations and controversies (Embleton
1986, Thomason and Kaufman 1988).

4.5.1 Lexicostatistics and glottochronology


Among the first to consider quantitative methods in historical linguistics were
Kroeber and Chretien (1937), who investigated nine Indo-European languages,
comparing 74 phonological and morphological features. Ross (1950) then put forward a specific proposal for the quantitative study of change, a suggestion elaborated by the American structuralist Morris Swadesh (1951, 1955, 1959, 1971), who

developed the concept of time depth, which became the founding technique in
glottochronology or lexicostatistics. Although there is a difference between the two
today (with lexicostatistics used more generally for the measurement of inherent
tendencies in vocabularies and glottochronology for measuring the diversification
of related languages over time), for the present purposes it is sufficient to say that
they originated with the same purpose, namely to search for statistical regularity
in vocabulary systems and rates of replacement. Swadesh divided the origin and
evolution of language into four primary periods, in synchrony with the major ages:
1. the Eolithic (the dawn of the Stone Age)
2. the Paleolithic (the Old Stone Age)
3. the Neolithic (the New Stone Age)
4. the Historical, spanning the last 10,000 years

Within these time frames he located corresponding stages of linguistic evolution,
suggesting that all languages derived from one source during the Paleolithic
period and diversified in quantitatively determinable ways. Swadesh's reconstruction scenario was challenged on several counts. But his method showed,
once and for all, that a quantitative approach to the age-old question of language origins was conceivable and could be highly productive. Using data from
archeology and anthropology, together with a detailed knowledge of language
reconstruction, Swadesh demonstrated how a plausible primal scene could be
drafted, and how the transition to contemporary languages could be measured
as they branched off the original language root to become autonomous codes. In
other words, he suggested that by using statistical and probabilistic techniques
one could pinpoint when (approximately) languages diverged to become independent speech codes. He called it time depth.
So, how would one measure time depth? More specifically, what linguistic features could serve as input for quantification? Swadesh was
aware that the best available data for comparing languages was vocabulary and its
steady replacement over time. So, crucial to his framework was the notion of core
vocabularies, or the basic vocabularies that all languages are assumed to develop
from common human conditions: words for kin, anatomy, natural objects such
as the sun and the moon, and the like. These, as Swadesh argued, can be used
to estimate the relative length of time that might have elapsed (the time depth)
since two languages within a family began to diverge. The quantitative method
consists of the following three general procedures:
1. First, a core vocabulary appropriate to the language family is established. Swadesh
   claimed that the list should generally contain words for bird, dog, skin, blood,
   bone, drink, eat, and so on, which are concepts that probably exist in all languages.
2. Culturally-biased words, such as the names of specific kinds of plants or animals, are
   to be included in the core vocabulary only if relevant to the analysis at hand.
3. The number of cognates in the core vocabulary can be used to measure time depth,
   allowing for sound shifts and variation. The lower the number of cognates, the longer
   the languages are deemed to have been separated. Two languages that can be shown to
   have 60 % of the cognates in common are said to have diverged before two which have,
   instead, 80 % in common.

In 1953, Robert Lees modified Swadesh's formula for estimating time depth. Lees
assumed that the rate of loss in basic core vocabularies was constant. Allowing
for extraneous factors and interferences such as borrowing and social interventions (the maintenance of certain words for ritualistic reasons), Lees claimed that
the time depth, t, could be estimated within a normal probability distribution and
that it was equal to the logarithm of the percentage of cognates, c, divided by twice
the logarithm of the percentage of cognates retained after a millennium of separation, r:

t = log c / (2 log r)
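As an added illustration of how the formula behaves (the function name and the sample cognate proportions are arbitrary, and r = 0.805 is the per-millennium retention value that Lees is reported, later in this section, to have obtained):

import math

def time_depth(c, r=0.805):
    """Lees's estimate of separation time, in millennia, from the shared-cognate
    proportion c, assuming each language retains a proportion r per millennium."""
    return math.log(c) / (2 * math.log(r))

# Two languages sharing 60 % versus 80 % of the core vocabulary
print(round(time_depth(0.60), 2))  # ~ 1.18 millennia
print(round(time_depth(0.80), 2))  # ~ 0.51 millennia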
As in virtually all the cases discussed so far in this chapter, the key notion is,
again, that of logarithm. Although well known (and brought up frequently in this
chapter), it is worthwhile revisiting this key concept, since it is one of the main
ones in the development of probability theories, appearing, as we have seen, constantly in quantitative analyses of all kinds.
In mathematics a logarithm is the power to which a base, usually 10, must be
raised to produce a given number. If nˣ = a, the logarithm of a, with n as the base,
is x; symbolically, logₙ a = x. For example, 10³ = 1,000; therefore, log₁₀ 1,000 = 3. To
get a sense of how Lees developed his formula, an analogy might be useful. Suppose we wanted to calculate the number of ancestors in any previous generation.
We have 2 parents, so we have 2 ancestors in the first generation. This calculation
can be expressed as 2¹ = 2. Each of our parents has 2 parents, and so we have 2 × 2 =
2² = 4 ancestors in the second generation. Each of the four grandparents has 2 parents, and so we have 4 × 2 = 2 × 2 × 2 = 2³ = 8 ancestors in the third generation. The
calculation continues according to this pattern. In which generation do we have
1,024 ancestors? That is, for which exponent x is it true that 2ˣ = 1,024? We find the
answer by multiplying 2 a number of times until we reach 1,024. But if we know
that log₂ 1,024 = 10, we can estimate the answer much more quickly.
So, like many other mathematical constructs the logarithm is a shortcut and,
like all forms of economical compression, has led to many discoveries. It recurs in
various mathematical functions, such as the constant e, defined as the limit of the
expression (1 + 1/n)ⁿ as n becomes large without bound. Its limiting value is approximately 2.7182818285. As it turns out, e forms the base of natural logarithms;

it appears in equations describing growth and change; it surfaces in formulas for
curves; it crops up frequently in probability theory; and it appears in equations
for calculating compound interest. The remarkable aspect is that e was devised
as a symbol to represent a specific sequence of numbers. Its occurrences and applications in other domains are serendipitous discoveries, not intended initially
or even expected.
As Marcus (2013: 123) puts it, logarithms are more than logarithms, so to speak,
given their use in probability metrics:
We may conjecture with high plausibility that humans are logarithmic beings with respect to
the surrounding nature. They have the tendency to slow its rhythms; in contrast, our actions
via electronic computers are in most cases anti-logarithmic. They are exponential, because,
for the most important problems, the computational time to process their solution is exponential with respect to the size of the input. In these claims we took in consideration the
Weber-Fechner law (19th century) that when a sequence of excitations goes in geometric progression, the corresponding sensations go in arithmetic progression; some research in the
first half of the 20th century led to the plausible conjecture that biological (organic) time is
a logarithmic function of chronological time, while, in the second half, we learned about
experiments indicating that psychological (subjective) time could be a logarithmic function
of chronological time. We also learned the above mentioned fact about the anti-logarithmic
computational behavior of human beings. Another example of a phenomenon going across
a lot of very heterogeneous disciplines is the idea of equilibrium starting with Lagrange's
analytical mechanics, going further in physics, with the remarkable phenomenon of thermodynamic equilibrium, moving into the realms of biology, sociology and the study of strategic
games (Nash equilibrium). Here too some mathematical tools served as a catalyst.

It is not necessary to go here into the detailed mathematical reasoning used by
Lees. Suffice it to say that it is very similar to that used above to calculate the
number of ancestors in any generation. Instead of generations, Lees dealt with
cognates. Remarkably, his formula has produced fairly accurate estimates of
time depth for the Romance languages and other languages with documented
source languages. However, it has also produced ambiguous estimates for other
languages, one of these being the Bantu languages, whose source or protolanguage is not documented. The accuracy of the time depth formula will depend on
the accuracy of the core vocabularies used. Moreover, since logarithms are exponents, the slightest computational error will lead to a high degree of inaccuracy.
But despite such drawbacks, the value of lexicostatistics for contemporary work
on language evolution is undeniable.
The lexicostatistic analysis of core vocabularies also provides a database for
inferring what social and kinship systems were like in an ancient culture, what
kinds of activities people in that culture engaged in, what values they espoused,
and which ones changed over time. The work on PIE (Proto-Indo-European) has

remained the most useful one for establishing core vocabularies more scientifically, for the simple reason that knowledge about this protolanguage is detailed
and extensive (Renfrew 1987, Mallory 1989). Already in the nineteenth century,
linguists had a pretty good idea both of what PIE sounded like and what its core
vocabulary may have been. Speakers of PIE lived around ten to five thousand years
ago in southeastern Europe, north of the Black Sea. Their culture was named Kurgan, meaning barrow, from the practice of placing mounds of dirt over individual graves. PIE had words for animals, plants, parts of the body, tools, weapons,
and various abstract notions.
The core vocabulary notion has been used to reconstruct other language families, to compare variants within them, and to determine time depth. The main
problem is that vocabulary substitution is not constant. For this reason, a number of linguists today reject glottochronology. But if the database is large enough
and the time depth long enough, glottochronology has proven to be highly accurate, suggesting that languages do indeed undergo change regularly (see, for
example, Currie, Meade, Guillon, and Mace 2013). The premise that languages,
like natural substances, are governed by an inbuilt radioactive decay is both
true and false. It is true because languages do indeed change naturally; it is false
because language is also a variable social tool that is subject to factors other than
internal evolutionary tendencies. And this is why probabilistic measures are more
useful than linear metrics not involving logarithmic functions. The assumptions
of glottochronology can be outlined as follows:
1. Vocabulary is replaced at a constant rate in all languages and this rate can be
measured and used to estimate how long ago the language existed and when
it broke off from its family tree branch. But this may not always be the case,
as some ambiguous results using glottochronology have shown.
2. A core vocabulary should encompass common or universal concepts: personal pronouns, kinship terms, anatomical parts, and so on; these may show
some variation (as for example Russian ruka, which covers the same referential domain as two English words, arm + hand); so, the refining of terms is
required according to language family.
3. In lexicostatistical analysis, it is only the cognates (words with a clear common etymological origin) that are used in the time depth measurement. The
larger the percentage of cognates, the more recently the two languages are
said to have separated. But often words that are borrowed from other languages for
various reasons may affect the overall computation, and this should always be
taken into account.
Lees actually obtained another value (from the one above) for the glottochronological constant using a 200-word vocabulary, obtaining a value of 0.805 with
90 % confidence. The constant is: L = 2 ln (r), where L = rate of replacement, ln =
natural logarithm (to the base e), and r = glottochronological constant. Swadesh
had originally constructed a 200-word list but later refined it into a 100-word one.
Problems with this list have been discussed in the relevant literature, but in the
end they seem to be minimal. After refining and elaborating a new core vocabulary database, Lees developed the following formula, which (to the best of my
knowledge) is the standard one today or at least the frame of reference for other
computational frameworks:

t = ln (c) / L
In this formula t = a given period of time from one stage of the language to another, c = proportion of wordlist items retained at the end of that period, and L =
rate of replacement for that word list. Applications of this formula with verifiable
cases (those for which we have knowledge of the source language) have been
fairly successful. Swadesh himself had arrived at the value L = 0.14 for the Romance family, indicating that the rate of replacement consisted of approximately
14 words per millennium. In the case of PIE, the time depth approach accounted
for 87 % of the diversification. Fleming (1973) found a similar accuracy level in the
analysis of the Hamito-Semitic family, matching the results of radiocarbon dating
and blood dating of people related to each other by race.
As mentioned, glottochronology has been controversial from the outset (Gudschinsky 1956, Hoijer 1956, Bergsland and Vogt 1962, Holm 2003, 2005, 2007),
despite the encouraging results it has produced over the years (for example, Chretien 1962, Dobson 1969, Dobson and Black 1979, Renfrew, McMahon, and Trask
2000). Rebuttals to the critiques, such as the ones by Dyen (1963, 1965, 1973, 1975)
and Kuskal, Dyen, and Black (1973) have been somewhat effective in counteracting the general belief that glottochronology is fundamentally useless. Rather, they
suggest that it is in need of even further renement, both methodologically and
mathematically, and must take into account other factors that may be involved in
affecting time depth. In other words, glottochronology has become a fertile interdisciplinary area involving linguists, mathematicians, and computer scientists.
There is substantial work that shows, cumulatively, that it is empirically veriable, within limits and within certain restrictions (Arndt 1959, Hymes 1960, Chretien 1962, Wittman 1969, 1973, Brainerd 1970, Embleton 1986, Callaghan 1991).
Some issues remain, though. The stability of lexemes in Swadesh lists is one of
these (Sankoff 1970, Haarmann 1990). The original Swadesh-Lees formulas have
also been found not to work universally; one reason is that words get replaced
through borrowing, additions, and the like, which fall outside of the formulas.
Historical accidents are not covered by the mathematics and these seem to be determinative in many cases. Language change is not just spontaneous, as change

in organisms and natural substances is (all subject to decay); there is a sociohistorical component that affects change which falls outside of the Swadesh-Lees
paradigm.
Some mathematical linguists have, actually, confronted the main issues in
glottochronology. Van der Merwe (1966) split up the word list into classes that
showed an isomorphic rate of change. Dyen, James, and Cole (1967) allowed the
meaning of each word (realized by different lexemes) to have its own rate. Gleason (1959) and Brainerd (1970) modified the formulas so as to take into account
change in cognation, and Sankoff (1973) did the same for borrowing factors and
synonyms. Embleton (1986) used various simulation models to further refine the
mathematics. Gray and Atkinson (2003) developed a lexicostatistical model that
does not assume constant rates of change, showing that the dating of languages
is still a viable method that can be used to adjust previous estimates using Swadesh-Lees formulas. Similarly, Starostin (1999) made adjustments that allow for the
elimination of borrowing and other accidental interferences in the rate of change.
Starostin's proposals are very intriguing and seemingly viable ones. These include
the following:
1. Since loanwords, words borrowed from one language into another, are a disruptive factor in the calculations, it is relevant to consider the native replacement of items by items from the same language. The failure to do this was a
   major reason why Swadesh's original estimation of the replacement rate was
   under 14 words from the 100-wordlist per millennium, when the real rate is,
   actually, much slower (around 5 or 6). Introducing this correction into the formula effectively cancels out counter-arguments based on the loanword principle. A basic wordlist generally includes a low number of loanwords, but it
   does bring down the time depth calculations as indicated.
2. The rate of change is not really constant, but actually depends on the time period during which the word has existed in the language (in direct proportion
   to the time elapsed; the so-called "aging" of words, understood as gradual
   erosion of the word's primary meaning under the weight of acquired secondary ones).
3. The individual items on the 100-wordlist have different stability rates (for instance, the lexemes for the pronoun I generally have a much lower chance
   of being replaced than a word for, say, yellow).
Starostin's formula takes the above variables into account, including rate of
change and individual stability quotients:

t = √(ln (c) / (Lc))

In this formula, Lc denotes the gradual slowing down of the replacement process due to different individual rates (the less stable lexemes are the first and the
quickest to be replaced), whereas the square root represents the reverse trend,
namely the acceleration of replacement as items in the original wordlist age and
become more apt to shift their meaning. This yields more credible results than the
Swadesh-Lees one. More importantly, it shows that glottochronology can really
only be used as a serious mathematical tool on language families whose phonology is known.
Dyen, Kruskal and Black (1992) used an Indo-European database with 95 languages, finding that glottochronological approaches are rather successful in
predicting time depth. Ringe, Warnow and Taylor (2000) used a quantitative
analysis on 24 Indo-European languages, involving 22 phonological units, 15 morphological structures and 333 lexical ones, again obtaining fairly accurate results
when mapped against known historical factors (such as when the societies
emerged as autonomous entities). Gray and Atkinson (2003) examined a database
of 87 languages with 2,449 lexical items, incorporating cognation research. Other
databases have been drawn up for African, Australian and Andean language
families, among others. As linguists acquire more and more information on the
nature of core vocabularies and as research in quantification methods becomes
ever more accurate, good glottochronological analyses are becoming more and
more a reality, thus validating Swadesh's pioneering work.
But, then, how do we reliably recognize distant relatives whose spellings have
drifted far apart? Why should we even presume that the tree of language is a
tree, as opposed to a sort of network, given that lexical borrowings and language
admixtures are common occurrences? Over the years, historical linguists have
separately tackled such questions with steadily increasing mathematical sophistication. One approach has been to supplant Swadesh's time depth method with cladistic
techniques that account for each word in order to model the actual process of evolution.
Cladistics is a method of classifying animals and plants according to the proportion of measurable characteristics that they have in common. It is assumed
that the higher the proportion of characteristics that two organisms share, the more
recently they have diverged from a common ancestor. In other words, cladistics is
the counterpart to lexicostatistics, but provides more sophisticated mathematical
models that seem to apply as well to language diversification. Gray and Atkinson (2003) have applied sophisticated computational tools (maximum-likelihood
models and Bayesian Inference techniques) for dealing with variable rates. By
breathing new life into glottochronology, research paradigms such as these are
stimulating the cross-fertilization of ideas.
Gray and Atkinson's paper dates the initial divergence of the Indo-European
language family to around 8,700 years ago, with Hittite the first language to split

off from the family tree. They support their theory by taking into account the fact
that Indo-European originated in Anatolia and that Indo-European languages
were transported to Europe with the spread of agriculture. They argue against
the alternative Kurgan hypothesis, which claims that the Kurgan culture of the
Steppes was Indo-European speaking. They used an existing database of core
words compiled by Dyen (discussed above) with software developed in genetics
to construct a family tree and assign dates to it. Their approach is similar to
glottochronology but also different in that it uses new computational-algorithmic
methods to construct the tree and compute the dates. The study thus avoids many
of the problems that frequently arise in work of this type. However, like most
studies in glottochronology the method does not take cultural influences into
account, which interfere with the regularity of change in language.

4.5.2 Economy of change


The idea that change in language is economically-motivated is an implicit one
in glottochronology, since logarithmic functions describe the details of how optimization or compression in change unfolds along a time axis. Saussure's (1879)
work on Hittite was based on considering compression and how it can be used
to deduce various properties in systems. Saussure had proposed to resolve various anomalies in the PIE vowel system by postulating the existence of a laryngeal
sound /h/ that, he claimed, must have caused the changes in the length and quality of adjacent vowels to occur in PIE's linguistic descendants. His suggestion was
based purely on reconstructive reasoning and the rationale of symmetry in phonemic systems. If there is, say, a voiceless phoneme in one category (like /p/ in the
category of the occlusives) then the symmetry of phonological structure requires
the presence of a voiced counterpart (/b/). Phonological systems are economical,
as discussed, but they also require that the elements within them display symmetry. This turns out to be consistently the case, with some exceptions. Saussure's
hypothesis was considered clever, but dismissed as improbable because it could
not be substantiated. However, in 1927, when cuneiform tablets of Hittite were dug
up by archeologists in Turkey, they revealed, according to Kuryłowicz (1927), the
presence of an /h/ sound in that language that occurred in places within words
where Saussure had predicted it should be. Despite the doubts that still linger,
most would agree that the tablets contain the laryngeal (or something approximate to it). As Gamkrelidze and Ivanov (1990: 110) have put it, "it is quite remarkable to note that linguistics can reach more deeply into the human past than the
most ancient records."

The gist of the work in quantitative linguistics generally shows that economical forces were at work in language evolution, a principle elaborated later by Martinet (1955) as the PE, as discussed throughout this chapter. Various other theories
have, of course, been put forward to explain why languages change so predictably.
Ignoring the alternatives for the sake of argument, what stands out is the fact
that the PE covers so many phenomena, such as time depth and cognation factors. The general implication of this virtual law of change is intertwined with the
PLE, namely that reducing the physical effort involved in speaking has an effect
on language structure. Economy is thus tied to effort and efficiency. Compressed
forms (abbreviations, for instance) and systems (syntax-versus-morphology, for
example) lead to efficiency in use. The same applies to mathematics. There are
many episodes in the history of mathematics whereby someone comes up with
an economical method to represent a cumbersome task, as we saw with exponential notation, which, later, leads to discoveries forming the foundations of a new
branch.
Actually, for the sake of historical accuracy, it should be mentioned that the
PE was both implicit and somewhat explicit in linguistics before Martinet. As mentioned above the rst mention of the PE was in Whitney 1877, where it was called
the Principle of Economy. In 1939, Joseph Vendryes (1939: 49) discussed the presence of economic forces in language as did Hjelmslev (1941: 111116) shortly thereafter. Interestingly, Vendryes saw economy as operative not only in phonology, but
in other areas of language, without however seeing a reorganizational system involved among the various levels (discussed above). He also articulated a version of
the PLE by pointing out that the formation of sentences also seems to be regulated
by economy. Basically, the PE posits that a language does several things at once:
(1) it increases its communicative rapidity rate through compression of form, (2) it
gets rid of superfluous material, an idea that was already known in the nineteenth
century and articulated eloquently by Paul Passy in 1890, who also claimed that a
language gives prominence to every necessary element in the system, discarding
or marginalizing the other elements.
Passy was probably influenced by the ideas of Whitney (1877) and Henry
Sweet (1888), both of whom noted two patterns in language change: (1) the
dropping of superfluous sounds and (2) easing the transition from one sound
to another via assimilatory processes. The Romance language family was used
as a litmus test to evaluate the accuracy of Sweet's and others' observations.
Consider the following cognates in three Romance languages, Italian, French,
and Spanish. The Latin words from which they derived are provided as well:

Table 4.4: Cognates in three Romance languages

Latin               Italian    French    Spanish
nocte(m) (night)    notte      nuit      noche
octo (eight)        otto       huit      ocho
tectu(m) (roof)     tetto      toit      techo

It can be seen that Latin ct (pronounced /kt/) developed to tt (= /tt/) in Italian, to it
(= /it/) in Old French, and to ch (= /tʃ/) in Spanish. These are, in effect, sound shifts
that occurred in these languages. Having established the sound shifts, one can
now generalize as follows: Latin words constructed with /kt/ will be pronounced
with /tt/ in Italian, /it/ in French, and /tʃ/ in Spanish. So, our analysis would predict that Latin words such as lacte(m) and factu(m) would develop to the forms
latte and fatto in Italian, lait and fait in French, and leche and hecho in Spanish.
Given the type of consistent results it produced, this method of comparative
analysis was used not only to reconstruct undocumented protolanguages, but also
to understand the nature of sound shifts and thus of change in language generally. In Italian, it can be seen that the first consonant /k/ assimilated completely in
pronunciation to the second one, /t/. Assimilation is, clearly, the process whereby
one sound takes on the characteristic sound properties of another, either partially
or totally. In Old French, the assimilation process was only partial, since the zone
of articulation of the vowel sound /i/ in the mouth is close, but not identical, to
that of /t/. This particular type of assimilation is called vocalization. In Spanish,
the /k/ and /t/ merged, so to speak, to produce a palatal sound, /tʃ/, which is articulated midway between /k/ and /t/. The process is known as palatalization.
As a factor in sound shift, assimilation can easily be seen as a manifestation of
economy; that is to say, in all three Romance languages, the outcome of the cluster /kt/ reflects an attempt to mitigate the gap between the /k/ sound, which is
articulated near the back of the throat, and the /t/ sound which is articulated at
the front end of the mouth. Phonetically, the distance between these two sounds
makes it effortful to articulate the cluster /kt/ (as readers can confirm for themselves by pronouncing the Latin words slowly). Assimilation makes the articulation much more effortless, by either narrowing the distance between /k/ and /t/ or
eliminating it altogether.
Werner Leopold (1930) discovered contradictory tendencies in linguistic
change, which he called distinctness versus economy, meaning that economical forces are sometimes at odds with those aiming to avoid ambiguity (as
discussed in the previous chapter). In fact, the PE focuses on superfluity, but it
is tempered by ambiguity criteria, thus showing that usage is a powerful force in

language. In 1958, Valter Tauli followed up on this dichotomy, suggesting that the
forces driving language change include the following (Tauli 1958: 50):
1. a tendency towards an economy of effort (the PE)
2. a tendency towards clarity (so as to avoid ambiguity)
3. emotional impulses
4. aesthetic tendencies
5. social impulses
Various other studies have been published since Martinet. Virtually all take into
account external factors in the operation of the PE. Interestingly, generative grammars have also given their particular take on the PE. In the Minimalist Program,
economy seems to be inherent in how the rules show economy of form through
various processes, grouped generally under the heading of optimality theory. This is a general model of
how grammars are structured. For example, if a vowel appears only when it is
needed for markedness reasons, in words that would otherwise be without vowels
and in clusters that would otherwise violate certain phonological rules, the process is called economical because it follows from intrinsic properties of optimality
rather than stipulated economy principles. This is, of course, a different approach
to economy, but even in formalist grammars, the concept itself is seen as cropping
up in various places and is thus used to explain tendencies in language.

4.6 Overview
In general, QM has allowed linguists and mathematicians to discover principles
of structure that would have otherwise remained unknown. And it has suggested
that intrinsic forces are at work in the evolution of both mathematics and language. These have been subsumed under two principles here, the PE and the PLE.
The ideas discussed in this chapter, however, bring us back to the most fundamental question of all: Are they truly real or are they simple constructs that
match our views? One could say from the research in QM that the brain compacts
the information that it uses frequently, making it an efficiency-seeking organ. But
there are many dangers involved in correlating the brain with its products and
inferring from the latter what goes on in the former. Nonetheless, had the brain
had a different structure, the PLE might not have manifested itself in language
and mathematics.
Of course, a way around the brain-as-mind-as-brain vexatious circularity is to
eliminate the distinction between inner (mental) and observable (behavioral) processes and to create artificial models of the processes in computer software. The
most radical AI researchers, like Ray Kurzweil (2012), view this as not only plausible

but inevitable. While this seems to be a modern premise, it really is no more
than a contemporary version of an age-old belief that the human mind is a machine programmed to receive and produce information in biologically-determined
ways. The new impetus and momentum that this belief has gained has rekindled
the mind-body problem in a modern form: Is cognition a derivative of individual
experience? Or is it inherent in innate mental structures independently of bodily
processes and individual feelings?
When a mathematician solves or proves an intractable problem by essentially
reducing it to a formula, an equation, or a proof, the way in which it is done puts
the brain's capacities on display. But this cannot explain the process. The concept
of ergonomics (a term coming out of psychology and sociology in relation to the
design of workplaces so that they may provide optimum safety and comfort and
thus enhance productivity rates) may be of relevance here. This notion has been
extended to the study of biological systems and to the study of language.
The term was introduced in 1857 by Wojciech Jastrzębowski and then again
in 1949 by British psychologist Hywel Murrell. The basic premise of ergonomics is
that the design of things tends towards maximum efficiency. A simple, yet still profound demonstration of this is the Pythagorean theorem. The Egyptians had discovered that knotting and stretching a rope into sides of 3, 4, and 5 units in length
produced a right triangle, with 5 the longest side (the hypotenuse). The Pythagoreans were aware of this discovery. It was an ergonomic one. Their aim was
to show that it revealed a general structural pattern, an inherent PE in the world.
Knotting any three stretches of rope according to this pattern, for example, 6, 8,
and 10 units, will produce a right triangle because 6² + 8² = 10² (36 + 64 = 100). As
the historian of science, Jacob Bronowski (1973: 168) has insightfully written, we
hardly recognize today how important this demonstration was. It could no longer
be attributed simply to simple invention. It was a discovery that reached out into
the world:
The theorem of Pythagoras remains the most important single theorem in the whole of mathematics. That seems a bold and extraordinary thing to say, yet it is not extravagant; because
what Pythagoras established is a fundamental characterization of the space in which we
move, and it is the first time that it is translated into numbers. And the exact fit of the numbers describes the exact laws that bind the universe. If space had a different symmetry the
theorem would not be true.

We could conceivably live without the Pythagorean theorem. It tells us what we
know intuitively, that a diagonal distance is shorter than taking an L-shaped path
to a given point. And perhaps this is why it emerged. Theorems such as the one
by Pythagoras substantiate the claim that we seek efficiency and a minimization
of effort in how we do things and how we classify and understand the world. In

the biological realm, research has shown that the human body is designed to seek
maximum efficiency in locomotion and rate of motion. The body is an ergonomic
structure. From this, we are apparently impelled to design our products and artifacts ergonomically, from handles to the design of chairs for maximum comfort.
Language and mathematics too would fall under the rubric of ergonomic structure.
The overall premise that derives from the work in QM is that mathematics and
language are subject to many of the same laws of biological and physical systems.
When mathematics and language go contrary to these laws, it is for social, creative, and inventive reasons. And this happens often, since social and imaginative
forces are as powerful as, if not more powerful than, inbuilt psycho-biological ones. These
allow us to step outside the laws of evolutionary thrust and of the normal distribution. Any model, including a logarithmic one, is an interpretation. But then
why does Benford's Law apply no matter who devised it (namely Newcomb and
Benford)? This takes us back to whether or not mathematics is discovered or invented, to which there is no clear-cut answer. It is both, and the interplay between
invention and discovery is what gives principles such as the PE some validity.
More accurately, discovery occurs through abductive processes but it needs to be
refined to make it stable and viable. Discovery involves a lot of information; theorization steps in to eliminate the superfluous information and refine the discovery
to fit specific needs and ideas.
Clearly, there is a connection between mathematics, language, the mind, and
reality. But is this connection of our own making or is it a reflex of our need to
understand the world? In order to grasp the hermeneutical nature of discovery in
mathematics, this is perhaps the most crucial question of all. It is relevant to note
that What is mathematics? was the title of a significant book written for the general
public by Courant and Robbins in 1941. Their answer to this question is indirect,
that is, they illustrate what mathematics looks like and what it does, allowing us
to come to our own conclusions as to what mathematics is. And perhaps this is the
only possible way to answer the conundrum of mathematics. The same can be said
about music. The only way to answer What is music? is to play it, sing it, or listen
to it. And of course the answer to What is language? is to speak it. A year before,
in 1940, Kasner and Newman published another significant popular book titled
Mathematics and the imagination. Again, by illustration the authors show how
mathematics is tied to imaginative thought. We come away grasping intuitively
that mathematics is both a system of logic and an art, allowing us to investigate
reality. Lakoff and Núñez also approached the topic of what is mathematics in
2000, as mentioned. But rather than illustrate what mathematics does, they made
the claim that it arose from the same conceptual system that led to the origin of
language, being located in the same areas of the brain as language. So, maybe one

can do more than just illustrate what mathematics or language is; one can truly
understand it by comparing the two. Lakoff and Núñez are on the right track, as
will be discussed in the next chapter. As we saw, the two scholars claimed that
mathematical notions and techniques such as proofs are interconnected through
a process of blending. This entails taking concepts in one domain and fusing them
with those in another to produce new ones or to simply understand existing ones.
Changing the blends leads to changes in mathematical structure and to its development. Like language, no one aspect of mathematics can be taken in isolation.
So, what is reality and what is the connection between mathematics, language and reality? Is the calculus just a means of coding reality and then using it,
like a map, to explore reality further? There is no doubt that the calculus is a symbolic artifact and that it allows us to engage with reality. The connection between
symbols and the reality they represent is a dynamic one, with one guiding the
other. By way of conclusion, consider the use of quantification in science. Science
is not based on certainty, but on guesses, theories, paradigms, and probable outcomes. It obeys the same laws of probability that mathematics and language do. To
make their hunches useable or practicable, scientists express them in mathematical language, which gives them a shape that can be seen, modified, and tested.
In some ways, science is the referential domain of mathematics.
It was in the early 1900s when scientists started looking beyond classical
Newtonian physics, discovering gaps within it, and thus looking for new interpretations of observed events. The reason was that the observations and the mathematical equations were out of kilter. Max Planck published a new theory of energy
transfer in 1900 to explain the spectrum of light emitted by certain heated objects,
claiming that energy is not given off continuously, but in the form of individual units that he called quanta. Planck came to this theory after discovering an
equation that explained the results of these tests. The equation is E = N h f, with
E = energy, N = integer, h = constant, f = frequency. In determining this equation,
Planck came up with the constant (h), which is now known as Planck's constant.
The truly remarkable part of Planck's discovery was that energy, which appears to
be emitted in wavelengths, is actually discharged in small packets (quanta). This
new theory of energy revolutionized physics and opened the way for the theory of
relativity.
In 1905, Einstein proposed a new particle, later called the photon, as the
carrier of electromagnetic energy, suggesting that light, in spite of its wave nature, must be composed of these energy particles. The photon is the quantum
of electromagnetic radiation. Although he accepted Maxwell's theory, Einstein
suggested that many anomalous experiments could be explained if the energy
of a Maxwellian light wave were localized into point-like quanta that move independently of one another, even if the wave itself is spread continuously over

space. In 1909 and 1916, he then showed that, if Planck's law of black-body radiation is accepted, the energy quanta must also carry momentum, making them
full-fledged particles. Then, in 1924, Louis de Broglie demonstrated that electrons could also exhibit wave properties. Shortly thereafter, Erwin Schrödinger
and Werner Heisenberg devised separate, but equivalent, systems for organizing the emerging theories of quanta into a framework, establishing the field of
quantum mechanics. The relevant point to be made is that these systems were
all expressed in mathematical language and it was because of this language that
further ideas crystallized to make quantum physics a reality.
Quantum mechanics provides a different view of the atom than classic
physics. The discovery that atoms have an internal structure prompted physicists
to probe further into these tiny units of matter. In 1911, Ernest Rutherford developed a model of the atom consisting of a spherical core called the nucleus, made
up of a dense positive charge, with electrons rotating around this nucleus. Bohr's
proposal was a modification of this model. In 1932, James Chadwick suggested that
the atomic nucleus was composed of two kinds of particles: positively charged
protons and neutral neutrons, and a few years later in 1935, Hideki Yukawa, proposed that other particles, which he called mesons, made up the atomic nucleus.
After that, the picture of the atom grew more complicated as physicists discovered
the presence of more and more subatomic particles. In 1955, Owen Chamberlain
and Emilio Segre discovered the antiproton (a negatively charged proton), and in
1964, Murray Gell-Mann and George Zweig proposed the existence of so-called
quarks as fundamental particles, claiming that protons and neutrons were composed of different combinations of quarks. In 1979, gluons (a type of boson) were
discovered as the carriers of the strong force. This force, also called the strong
interaction, binds the atomic nucleus together. In 1983, Carlo Rubbia discovered
two more subatomic particles, the W particle and the Z particle, suggesting that
they are the carriers of the weak force, also called the weak interaction.
Today, physicists believe that six kinds of quarks exist and that there are
three types of neutrinos, particles that interact with other particles by means of
the weak nuclear interaction. The last kind of neutrino to be directly detected
is known as the tau-neutrino. There may be an underlying unity among three
of the basic forces of the universe: the strong force, the weak force, and the
electromagnetic force that holds electrons to the nucleus.
Now, the point of the above historical excursion into quantum physics is that
the discoveries related to it dovetail perfectly with the rise of group theory, matrix
theory, and other modern-day mathematical theories, which form the basis of quantum physics. The question of which came first, the physics or the mathematics, is a
moot one. In 1927, Heisenberg discovered a general characteristic of quantum mechanics, called (as is well known) the uncertainty principle. It is to physics what
Gödel's undecidability theorem is to logic and mathematics. According to this principle, it is impossible to precisely describe both the location and the momentum
of a particle at the same instant. For example, if we describe a particles location
with great precision, we must give its momentum in terms of a broad range of numbers. In effect, we must force the electron to absorb and then re-emit a photon so
that a light detector can see the electron. We know the precise location of both
the photon source and the light detector. But even so, the momentum spoils our
attempt: The absorption of a photon by the electron changes the momentum. The
electron is therefore moving in a new direction when it re-emits the photon. Thus, detection of the re-emitted photon does not allow us to determine where the electron
was when it absorbed the initial photon.
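In standard textbook notation (a conventional formulation, not a formula quoted from this book), the principle can be stated as a lower bound on the product of the two uncertainties, with Δx the uncertainty in position, Δp the uncertainty in momentum, and h Planck's constant:

```latex
\Delta x \, \Delta p \;\geq\; \frac{\hbar}{2}, \qquad \hbar = \frac{h}{2\pi}
```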
Such phenomena find their codification in the language of functional analysis,
a research area within mathematics that was influenced in large part by the needs
of quantum mechanics. Functional analysis can model the values of physical observables such
as energy and momentum, treated as eigenvalues, using the mathematics of continua, linear operators in Hilbert space, and the like. Essentially,
functional analysis deals with functionals, or functions of functions. It is the result
of conceptual blending whereby diverse mathematical processes, from arithmetic
to calculus, are united because they exhibit very similar properties. A functional,
like a function, is a relationship between objects, but the objects may be numbers,
vectors, or functions. Groupings of such objects are called spaces. Differentiation
is an example of a functional because it denes a relationship between a function
and another function (its derivative). Integration is also a functional. Functional
analysis and its osmosis with quantum mechanics show how discoveries have
always been made: by the analogical blending of previous ideas with new ones.
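The idea of a functional can be made tangible with a small sketch (mine, not drawn from the text) in which differentiation and integration are written as mappings whose argument is itself a function; the particular function and the numerical tolerances are assumptions made only for the illustration.

```python
# A small illustrative sketch (not from the text) of a "functional": a mapping
# whose argument is itself a function. Differentiation returns a new function,
# integration returns a number.
def derivative(f, h=1e-6):
    # functional: function -> function (an approximation of f')
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def integral(f, a, b, n=10_000):
    # functional: function -> number (approximate definite integral by midpoint sums)
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

square = lambda x: x * x
print(derivative(square)(3.0))      # approximately 6.0
print(integral(square, 0.0, 1.0))   # approximately 1/3
```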
Classical mechanics, special relativity, general relativity, and quantum mechanics all utilize the concept of symmetry in their mathematical forms, such as
the symmetry related to rotations in space. A guiding assumption is that fundamental physical laws should look the same no matter which direction one looks.
A physicist can describe this property by saying that the laws are invariant under rotation. But invariance under rotation presupposes a role for the observer.
The variable direction to be used as a result of a rotation is the direction that the
observer chooses. A translation in space is defined as a shift in the measurement
system produced by placing the origin for measurement at a different location. It is
anticipated that the fundamental laws will look the same after a translation. This
property is called invariance under translation. It is of course a theoretical constraint in the minds of physicists. But the concept of invariance has been found to
occur in actual spaces. This kind of symmetry also occurs at the subatomic level.
The mathematical properties of the rotation group, together with the group of
symmetries under interchange of two or more electrons, constrain many of the
properties of the electron orbits and the atomic spectra related to them.
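A toy illustration of what invariance under rotation and translation means (my own example, not the author's): the distance between two points, standing in for a simple physical law, is unchanged when the whole coordinate system is rotated or shifted. The particular points, angle, and shift below are arbitrary choices.

```python
# Assumed toy example (not from the text): the distance between two points is
# invariant under rotation of the plane and under translation of the origin.
import math

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def rotate(p, angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def translate(p, shift):
    return (p[0] + shift[0], p[1] + shift[1])

a, b = (1.0, 2.0), (4.0, 6.0)
theta, shift = 0.7, (10.0, -3.0)

print(distance(a, b))                                        # 5.0
print(distance(rotate(a, theta), rotate(b, theta)))          # 5.0 (up to rounding)
print(distance(translate(a, shift), translate(b, shift)))    # 5.0
```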
The search for an all-encompassing formula to decode the universe is what
mathematics and science are ultimately all about. For a moment in time, it was
thought, at least unconsciously, that Einstein's E = mc² may have been that formula, even though, of course, it was not. But it has imprinted in it a lot of information about reality that could not be expressed in any other way. It says, in a
nutshell, that the speed of light is constant and thus constrains physical reality
in its own way, but it also indicates that mass, length, and time are not. They
are the variables that bring about change. It also says that the changes are infinitesimal at ordinary speeds, becoming appreciable only as we approach the speed of light, but they are there. What happens if
there is a universe where this formula does not hold? It would be unimaginable.
In effect, a formula expresses the otherwise inexpressible. As Wittgenstein (1922)
put it, "Whereof one cannot speak, thereof one must be silent." As the language
of nature, mathematics breaks the silence periodically. E = mc² speaks volumes,
to belabor the point somewhat.
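As a purely numerical aside (mine, not the author's), here is what the formula implies for a single gram of matter, using the standard value of the speed of light:

```python
# A numerical aside (mine, not the author's): the energy locked in one gram of
# matter according to E = mc^2.
SPEED_OF_LIGHT = 2.998e8          # metres per second
mass_kg = 0.001                   # one gram

energy_joules = mass_kg * SPEED_OF_LIGHT ** 2
print(f"{energy_joules:.2e} J")   # about 9e13 joules, roughly 21 kilotons of TNT
```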
But then, no one has really seen an atom and of course light has been
measured indirectly through ingenious techniques. This is actually the point
of physicist Lee Smolin (2013). How, Smolin asks, can quantum and relativity
laws account for the highly improbable set of conditions that triggered the Big
Bang jump-starting the universe? How can quantum scientists ever really test
their timeless cosmic hypotheses? Though time has always been a quantity to
measure, Smolin asserts that in the seventeenth century, scientists wondered
whether the world is in essence mathematical or whether it lives in time. Newton's laws
of motion made time irrelevant, and Einstein's two theories of relativity are,
at their most basic, theories of time, or, better, of timelessness. Galileo, on the
other hand, suggested that time should be regarded as another dimension, and in
1909, the mathematician Hermann Minkowski developed the theory of spacetime,
a feature of the universe that general relativity would later describe as shaped by gravity.
To summarize, science comes to its ideas via models of quantification: in
some cases the models precede the observations, in others they are the only way
these can be grasped. Science changes the mathematics and vice versa; the mathematics is the language of science, changing in ways that are due to the interplay
between the two. The interplay between form, its compression tendencies, and its
proclivity to show statistical regularities in specific ways (such as Benford's and
Zipf's Laws) is also studied by statistics and probabilistic analysis, both of which
have become, as we have seen, powerful tools in the quest to understand both
mathematics and language.

5 Neuroscience
Tears come from the heart, not from the brain.
Leonardo da Vinci (1452–1519)

Introductory remarks
In chapter 2, abduction was discussed as guiding the development of deductive
proofs, that is, in allowing the proof-maker to infer what is needed along the
sequence of statements that make up the proof. The argument was made that,
although proofs show logical structure, especially in the way they are laid out
through a concatenation of statements, the selection of some of the statements
does not come automatically from the concatenation structure itself, but rather
from insights that are akin to metaphorical hunches in language. The source of
the abductive insights has been called a neural blending process, which involves
amalgamating something in one region of the brain with a task at hand so that
it can be better understood and carried out. The concept of blending thus sheds
light on how the two parts of cognition, abduction (imagination) and logic,
constitute a single system that has been called interhemispheric.
In many of the theories and models discussed in previous chapters, the assumption is that they reveal what mathematics and language are all about. Today,
cognitive scientists look to validate these by turning to experimental methods
made available by neuroscience. Whether or not there is a continuity (or ontological osmosis) between brain and mind, the fact is that it is assumed to be there
by theorists. Neuroscience has thus been used by formalists, computationists,
and cognitivists alike to justify their theories, having become an intrinsic litmus-testing tool, so to speak, of both linguistics and mathematics. It may well be the
central disciplinary link between the two.
The question that this train of thought raises is a rather deep epistemological
one: As interesting as it is, does a theory really explain mathematics, language, or
anything else, for that matter? Or is it nothing more than a figment of the fertile
imaginations of linguists and mathematicians, working nowadays with computer
scientists and statisticians? It was Roman Jakobson (1942) who was among the first
to deal with this question, claiming that neurological research is the only one that
can be used in any empirical sense to evaluate the validity of theories and constructs (see also Jakobson and Waugh 1979). Modern-day neuroscience has taken
its cue from Jakobson's suggestion, expanding the research paradigm considerably with sophisticated brain-imaging techniques. Since at least the mid-1990s,
truly significant findings have started to emerge within neuroscience, showing,
for example, that some system-specific theories have broader applicability than
others and, thus, may be assigned greater validity in psychological terms.
For example, the theory of markedness in linguistics (the view that some
units are more fundamental than others) has found fertile ground in the
study of mathematical and musical learning (see, for example, Collins 1969,
Barbaresi 1988, Park 2000, Mansouri 2000, Schuster 2001, Hatten 2004, Arranz
2005, Vijayakrishnan 2007). An important recent study by van der Schoot, Bakker
Arkema, Horsley and van Lieshout (2009) examined the opposition more than-vs.-less than (the first being the unmarked, basic, term) in word problem solving
in 10–12 year old children differing initially in math skills. The researchers found
that the less successful problem solvers utilized a successful strategy only when
the primary term in a problem was the unmarked one. In another significant
study, Cho and Proctor (2007) discovered that classifying numbers as odd or even
with left-right keypresses was carried out more successfully with the mapping
even-right-vs.-odd-left than with the opposite mapping. Calling this a markedness
association of response codes (MARC) effect, the researchers attributed it to compatibility between the linguistic markedness of stimulus and response codes. In
effect, a specific linguistic theory, markedness, has proven itself to be a viable
tool for investigating mathematical learning experimentally.
This brief final chapter looks at some relevant research in neuroscience that
can be used to shed light on various theories and positions vis-à-vis the nature
of mathematics and language, many of which have been discussed in previous
chapters. This survey is necessarily selective in the same way that the discussions
in other parts of the book have been. Nevertheless, the selection has been guided
by the themes that crop up most frequently in the literature.

5.1 Neuroscientific orientations


One of the earliest uses of neuroscientific reasoning in mathematics was with
respect to the so-called Church-Turing thesis, which formalized the principles
underlying computability theory (Church 1935, 1936). The thesis states that any
real-world computation can be translated into an equivalent computation with a
Turing machine, that is, a real-world calculation (as it takes place in the brain)
can be done using the lambda calculus, which is equivalent to using general
recursive functions (see chapter 3). The thesis was applied to cellular automata,
substitution systems, register machines, combinators, and even quantum computing. There were conflicting points of view about the thesis from the outset. One
states that it can be proved, even though a proof has not as yet been discovered,
and another says that it serves mainly as a definition for computation. Support
for the validity of the thesis comes from the fact that every realistic model of computation yet discovered has been shown to be equivalent. The thesis has been
extended to the principle of computational equivalence (Wolfram 2002), which
claims that there are only a small number of intermediate levels of computing
power before a system is universal and that most natural systems are universal.
The relevant point here is that the thesis was believed to mirror what happens in
the brain.
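A minimal sketch of the equivalence the thesis trades on (my own illustration, not an argument from the text): the same computation, addition of non-negative integers, written once as an ordinary recursive function and once in a lambda-calculus style using Church numerals. The helper names below are hypothetical.

```python
# Assumed illustration: the same computation expressed as a general recursive
# function and as a pure lambda-calculus-style encoding (Church numerals).

def add_recursive(m, n):
    # recursive definition of addition on non-negative integers
    return n if m == 0 else 1 + add_recursive(m - 1, n)

# Church numerals: the number n is the function that applies f to x exactly n times
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add_church = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(church_n):
    # decode a Church numeral back into a Python integer
    return church_n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(add_recursive(2, 3), to_int(add_church(two)(three)))   # both print 5
```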
It is the work of McCulloch and Pitts in 1943 that can be called neuroscientific
in the modern sense. The researchers aimed to show that a logical model of nervous activity was consistent with the logical calculus. Using artificial models of
neurons connected together as if in a network, the researchers claimed to show
that the brain produced highly complex patterns in the same way as their models.
Their contribution led to the development of artificial neural networks (ANNs),
which, as we saw, are networks designed to model biological neurons. McCulloch
and Pitts also argued that the features of the network could be expanded to allow
it to learn from new inputs. Then in 1957, Frank Rosenblatt added the notion of
the perceptron to ANNs, whereby inputs are processed by association units programmed to detect the presence of specific features in the inputs.
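The following hedged sketch (mine, not code from the text) shows the kind of learning rule Rosenblatt's perceptron introduced: weights on the inputs are nudged up or down until the unit reliably detects a simple feature, here the logical AND of two binary inputs.

```python
# A hedged sketch of a Rosenblatt-style perceptron (illustrative only):
# the unit learns to signal 1 exactly when both binary inputs are 1.
weights = [0.0, 0.0]
bias = 0.0
LEARNING_RATE = 0.1

training_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

def predict(x):
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else 0

for _ in range(20):                       # a few passes over the data suffice
    for x, target in training_data:
        error = target - predict(x)       # -1, 0, or +1
        weights = [w + LEARNING_RATE * error * xi for w, xi in zip(weights, x)]
        bias += LEARNING_RATE * error

print([predict(x) for x, _ in training_data])   # -> [0, 0, 0, 1]
```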
This type of research was an early version of computational neuroscience,
an orientation within neuroscience attempting to implement formalist and computational models of language and mathematics in computer software designed
to mimic biological software. It did not take hold until the late 1950s when AI
emerged as a branch of computer science and psychology. By the early 1960s,
Hilary Putnam (1961) laid out a research paradigm that would incorporate the
notion of Turing machines into the study of mind. From this a debate emerged between cognitivists and neural network theorists, laying the foundation of another
orientation, sometimes called cognitive neuroscience, as a branch of cognitive
science and a key tool in the investigation of the relation between figurative
language and mathematics.

5.1.1 Computational neuroscience


As its name implies, computational neuroscience is the orientation that derives
from formalist-computational approaches to mathematics and language. Basically, it can be defined as the modeling of brain functions and processes in
terms of the information processing properties of the structures that make up the
brain. The same kinds of computational techniques for analyzing, modeling, and
understanding mathematics and language on computers are extended to study
the behavior of cells and circuits in the brain. In our case, this involves mainly
exploring the computational principles governing the processing of language and
mathematics, including the representation of information by spiking neurons, the
processing of information in neural networks, and the development of algorithms
simulating linguistic and mathematical learning.
Computational neuroscience (CN) focuses on the use of formalist concepts
and techniques in the design of experiments and algorithms for simulating the behavior of neurons and neural networks during processing states. Techniques such
as nonlinear differential equations and applied dynamical systems are applied to
neuronal modeling. The idea here is to understand a natural phenomenon via its
computational counterpart. As discussed in the third chapter, this approach has
led to many interesting insights.
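As a deliberately simplified illustration of what such differential-equation modeling looks like in practice (my own sketch, under assumed parameter values, not an example from the text), the following code integrates a leaky integrate-and-fire neuron with Euler steps and records its spike times:

```python
# Minimal sketch of a standard differential-equation neuron model: a leaky
# integrate-and-fire unit, dV/dt = (-(V - V_rest) + R*I) / tau, integrated with
# Euler steps and reset whenever V crosses a firing threshold. Parameter values
# are assumptions chosen only to make the unit fire.
V_REST, V_THRESHOLD, V_RESET = -65.0, -50.0, -65.0   # membrane potentials (mV)
TAU, RESISTANCE = 10.0, 1.0                          # time constant (ms), resistance
DT, INPUT_CURRENT = 0.1, 20.0                        # step size (ms), injected current

v = V_REST
spike_times = []
for step in range(2000):                             # simulate 200 ms
    dv = (-(v - V_REST) + RESISTANCE * INPUT_CURRENT) / TAU
    v += DT * dv
    if v >= V_THRESHOLD:
        spike_times.append(step * DT)
        v = V_RESET

print(f"{len(spike_times)} spikes; first few at (ms): {spike_times[:3]}")
```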
In an excellent overview of the field, Silva (2011) looks at the validity of the
basic CN approach which, as he correctly asserts, genuinely does aim to understand how the brain and related structures represent and process information via
computational modeling, which attempts to replicate observed data in order to
penetrate the dynamics of brain functioning that is inherent in the data. So, unlike
straightforward computationism (chapter 3), CN starts with experimental observations or measurements, rather than a pure theoretical framework, from which
it constructs a computer (mathematical) model aiming to furnish a set of rules
that are capable of explaining (simulating) properties of the experimental observations, using typical statistical-inferential techniques such as those described in
the previous chapter, and thus setting up a relation paradigm between the data
and the underlying molecular, cellular, and neural systems that produced it.
This whole approach, Silva points out, begins with an inference about how
the data fit together and what the likely rules are that govern the patterns within
it. This is, of course, an abductive process on the part of the neuroscientist (which
seems not to be acknowledged as such in CN) and thus essentially tells us more
about his or her theoretical stance than about the data in any objective sense.
Indeed, in this approach there are many uncontrollable variables, including the
amount and quality of the data and how it was collected, which may limit the applicability of the model. The inference (abduction) is then translated into a quantitative algorithmic framework, which involves expressing the patterns observed in
the data in terms of differential equations or state variables that evolve in space
or time. The translation depends on the abilities of the translator and his or her
particular preferences. The model, Silva admits, is thus nothing more than an
informed guess, and this is where testing it out by carrying out numerical simulations becomes a critical aspect of the whole approach. CN thus seeks answers
to neurological questions in terms of its models compared against the actual experimental data.

Three general outcomes are possible:


1. The model describes the data correctly but cannot make any non-trivial predictions or hypotheses about the underlying neural system. The relevant studies in the CN literature may thus provide a modest contribution to understanding the relevant neural mechanisms through follow-up experimentation
guided by the model, but not a truly significant one.
2. The constructed model contains limitations due to technology or to the mathematics used, making follow-up experimental testing unlikely, since there
would be no known real-world counterparts to the model.
3. The model results in a novel, non-trivial, or unexpected experimental hypothesis that can be tested and verified. This may lead to the design and implementation of new experiments and may lead, in turn, to potentially significant
new experimental findings.
In all cases the basic thrust of the approach is the same: the neuroscientist guesses
which computational model best fits the observations and then attempts to validate it by observing the outputs it produces. The CN literature is full of studies
that iterate this basic approach, but, as Silva admits, they may have had minimal
impact on mainstream neuroscience and thus have made only trivial contributions to a true understanding of brain function. Silva proposes to reorient CN not
as a simple modeling of hypotheses based on mathematical simulations, but as
the systematic analytical investigation of data-based theorems. The goal would
thus be to construct a conjecture that is mathematically sound and conforms to
an experimentally known set of theorems, avoiding the subjective inferences of
traditional CN. The model would need to be evaluated by other neuroscientists
and mathematicians before it is tested out experimentally. His example is an interesting one (Silva 2011: 51):
Consider on-going efforts to decipher the connectome of the mammalian brain; that is,
identifying and mapping the structural connectivity of the networks in the brain at various
scales. At the cellular scale, no one would disagree that the connections between cells
represented by the vast spaghetti of processes that make up the neuropil are a complex
intermingling of curves. This represents a universally accepted qualitative anatomical
statement of fact about the structural connectivity of cellular networks in the brain that
few would argue with. We can translate this agreed upon statement into a mathematical
statement. For example, we can say that the set of edges that connects the vertices that make
up the network of interest are not represented by Euclidean geodesics but by curves that can
be described geometrically as Jordan arcs or some other appropriate mathematical object.
We may decide to characterize the turning numbers (from topology) of similar curves in a
set or use some other math to describe a different property. The point is that we have taken a
simple agreed upon experimental neurobiological statement of fact and have translated it
into a mathematical statement. We have captured some desirable aspect or property about
this experimental axiom within the language of mathematics.

The next step is to set up a model that says something about the set of axioms.
While admitting that this is itself a guess, Silva emphasizes that it is the result
of much trial and error, making it a plausible conjecture that can be tested empirically. This allows CN to break free of the inbuilt limitations of mathematical
models, such as differential equations, and gives it the latitude to write down a
set of axioms and to prove a conjecture from those axioms using whatever mathematics is required. Returning to his example, Silva goes on to make the following
relevant observation:
Again, consider the example from above regarding the significant resources and time being
put into deciphering the structural connectome of the brain. This massive amount of accumulating data is qualitative, and although everyone agrees it is important and necessary to
have it in order to ultimately understand the dynamics of the brain that emerges from the
structural substrate represented by the connectome, it is not at all clear at present how to
achieve this. Although there have been some initial attempts at using this data in quantitative analyses they are essentially mostly descriptive and offer little insights into how the
brain actually works. A reductionist's approach to studying the brain, no matter how much
we learn and how much we know about the parts that make it up at any scale, will by itself never provide an understanding of the dynamics of brain function, which necessarily
requires a quantitative, i.e., mathematical and physical, context. The famous theoretical
physicist Richard Feynman once wrote that people who wish to analyze nature without
using mathematics must settle for a reduced understanding. Nowhere is this more true
than in attempting to understand the brain given its amazing complexity.

Above all else, it is in understanding how we create new language and new mathematics that CN has never really produced satisfying hypotheses. But scholars
such as Sandri (2004) make the argument that creativity can also be modeled
algorithmically; it all depends on the complexity of the model. It was Turing who
discussed a system whose computational power was beyond that of his finite state
machine (Turing Machine). Turing's challenge was an early impetus for developing a so-called hybrid computational system in CN, based on neural networks and
brain automata, which can go beyond the Turing Machine (Sandri 2004: 9). The
model, Sandri asserts, would need to simulate highly integrating activities, like
feedback and novelty-making processes, which are understood as processes that
involve infinitary procedures, ending up in a complex information network, and
computational maps, in which both digital, Turing-like computation and continuous, analog forms of calculus are expected to occur (Sandri 2004: 9).
While this seems to be a significant new trend in CN, it still involves a degree of
circularity: that is, creativity needs to be defined precisely beforehand in order to
develop hybrid algorithms, and this takes us back to the set of problems described
above by Silva. CN is thus in a Catch-22 situation. In a follow-up co-authored
paper (Toni, Spaletta, Casa, Ravera, and Sandri 2007), Sandri reiterates his view
that it is the hybrid development of neural networks and brain automata that will
expand the computational power of models. The authors support their view as
follows (Toni et al. 2007: 67):
The cerebral cortex and brain stem appears primary candidate for this processing. However, also neuroendocrine structures like the hypothalamus are believed to exhibit hybrid
computational processes, and might give rise to computational maps. Current theories on
neural activity, including wiring and volume transmission, neuronal group selection and
dynamic evolving models of brain automata, bring fuel to the existence of natural hybrid
computation, stressing a cooperation between discrete and continuous forms of communication in the CNS. In addition, the recent advent of neuromorphic chips, like those to
restore activity in damaged retina and visual cortex, suggests that assumption of a discrete-continuum polarity in designing biocompatible neural circuitries is crucial for their ensuing
performance. In these bionic structures, in fact, a correspondence exists between the original anatomical architecture and synthetic wiring of the chip, resulting in a correspondence
between natural and cybernetic neural activity. Thus, chip "form" provides a continuum essential to chip function. We conclude that it is reasonable to predict the existence of hybrid
computational processes in the course of many human, brain integrating activities, urging
development of cybernetic approaches based on this modelling for adequate reproduction
of a variety of cerebral performances.

The main point made by Sandri et al. is a valid one, of course. But this was the path
followed by the connectionists, as will be described below. Zyllerberg, Dehaene,
Roelfsma, and Sigman (2011) also argue for hybridity, but with a slightly different slant. Their objective is to model neuronal mechanisms by which multiple
such operations are sequentially assembled into mental algorithms. We outline a
theory of how individual neural processing steps might be combined into serial
programs. Their solution is a hybrid neuronal device, whereby each step involves parallel computation that feeds a slow and serial production system. Thus,
production selection is mediated by a system of competing accumulator neurons
that extends the role of these neurons beyond the selection of a motor action.
An experiment by Weisberg, Keil, Goodstein, Rawson, and Gray (2008), however, seems to show that humans do not process information in the same way
as the algorithms of neuroscientists do. The researchers tested people's abilities
to critically consider the underlying logic of a computational explanation, giving
naïve adults (those with no knowledge of neuroscience), students in a neuroscience course, and neuroscience experts brief descriptions of psychological phenomena followed by one of four types of explanation. The actual information was
irrelevant to the logic of the explanation, as confirmed by the expert subjects. The
subjects evaluated good explanations as more satisfying than bad ones. But those
in the two non-expert groups additionally judged explanations with logically
irrelevant information as more satisfying than those without. The neuroscience
information, in other words, had a particularly striking effect on judgments of
bad explanations, masking otherwise salient problems in these explanations. Although the experts were not fooled by the explanation, the experiment did issue
a warning about the nature of CN explanations and their purported realism.
Bernacer and Murillo (2014) pointed out that some artificial assumptions in
CN may be the reason why the models are hardly realistic. The notion of habit in
neuroscience, they argue, has always been of central importance in the modeling
process, but the problem is that the main conceptualization of what a habit is
comes from the behaviorist tradition, which characterized habits as rigid, automatic, unconscious, and opposed to goal-directed actions (Bernacer and Murillo
2014: 883). The scholars suggest the use of the classic Aristotelian notion of habit
as a new guide for conducting CN research. Aristotle saw habits as acquired dispositions that allowed individuals to perform specific actions. This disposition
can thus be viewed as habit-as-learning, in contrast to the behaviorist habit-as-routine, which the authors claim can be integrated with the Aristotelian definition, since habit can be classified into three main domains:
1. theoretical, or the modeling of learning understood as knowing that x is so
2. behavioral, through which an individual achieves a rational control of emotional behavior (knowing how to behave)
3. technical or learned skills (knowing how to make or to do).
According to the authors, it is the Aristotelian conception of habit that could serve
as a framework concept for neuroscience: "Habits, viewed as a cognitive enrichment of behavior, are a crucial resource for understanding human learning and
behavioral plasticity" (Bernacer and Murillo 2014: 883).

5.1.2 Connectionism
Connectionism was an early counter-trend to computational neuroscience that
continues to provide to this day a serious theoretical alternative to CN constructs
such as neural network theory. The connectionist movement started with Russian
neuroscientist Alexander Luria, who in 1947 suggested that the neural processing of information involved interconnectivity in functional task distribution that
spanned the entire brain. Adopting Jakobson's (1942) idea that the selection of
linguistic units and their combination were neurologically complementary processes, Luria showed that the latter was impaired by lesions in the anterior areas
of the language centers, whereas the former was disrupted when damage occurred
to the posterior areas of the same centers. Luria argued that although a single linguistic function (articulation, comprehension, etc.) could be safely located in a
specific area of the left hemisphere (LH), the overall phenomenon of language as
an expressive and representational code resulted from the interaction of several
cooperative cerebral structures that were connected by a network of synaptic processes. Subsequent aphasiology studies confirmed Luria's basic idea: for example, LH-damaged patients use intonation patterns correctly (Danly and Shapiro
1982), suggesting a right-hemisphere (RH) location for this function; RH-damaged
patients, on the other hand, show little or no control of intonation (for example,
Heilman, Scholes, and Watson 1975, Ross and Mesulam 1979). This kind of work
led to the concept of parallel distributed processes (Rumelhart and McClelland
1986), which has been shedding some light on how Luria's idea of interconnectivity may in fact be the source of the higher mental functions.
Connectionism garnered broad interest in the 1960s and 1970s after widely-publicized split-brain studies conducted by the American psychologist Roger
Sperry and his associates showed that there was much more to the brain than locationism, or the idea that functions can be located in specific brain areas (for example, Sperry 1968, 1973). Split-brain patients, known more technically as commissurotomy patients, are epilepsy subjects who have had their two hemispheres
separated by surgical section of the corpus callosum in order to attenuate the
seizures they tend to suffer. Each of their hemispheres can thus be investigated,
so to speak, in isolation by simply presenting stimuli to them in an asymmetrical fashion. So, any visual or audio stimulus presented to the left eye or left
ear of a split-brain subject could be assessed in terms of its RH effects, and vice
versa any visual or audio stimulus presented to the right eye or right ear could
be assessed in terms of its LH effects. The commissurotomy studies were pivotal
in providing a detailed breakdown of the main psychological functions according
to hemisphere and in how these worked in tandem. Overall, they suggested that
in the intact brain both hemispheres, not just a dominant one, were needed
in a neurologically-cooperative way to produce complex thinking. The split-brain
experiments established, once and for all, that the two hemispheres complement
each other in normal cognitive processing. So, in order to carry out a complex
cognitive task (for example, problem-solving in mathematics, reading, etc.) the integrated cooperation of both hemispheres is required. Cognition, in other words,
is interhemispheric, not just the product of dominant sites or centers in one or the
other of the two hemispheres of the brain.
The use of clinical methods such as aphasiology data and of commissurotomy
experiments as the primary ones in establishing facts about brain functioning started to give way, by the mid-1970s, to the employment of non-clinical
techniques to investigate the brains of normal subjects. They included dichotic
listening (sending signals to the brain via headphones), electroencephalograph
analysis (graphing brain waves with electrodes), and lateral eye movement
(videotaping the movement of the eyes during the performance of some cognitive task). The findings generated by such techniques started casting further
doubt on the idea that neural networks based on computation worked as models
of the mind. By the early 1980s, new experiments confirmed, for instance, that
metaphor was the result of interhemispheric programming and that it could not
be explained in terms of a simple logical calculus.
Many of these techniques have been largely abandoned today for a simple
reason: they have been made obsolete by new technologies such as positron
emission tomography (PET) scanning and functional magnetic resonance imaging (fMRI). These have enabled neuroscientists to observe the brain directly while
people speak, listen, read, solve problems, conduct proofs, and think in general.
These are particularly effective because they do not require any physical contact
with the brain. They produce images similar to X rays that show which parts of
the brain are active while a person carries out a particular mental or physical task.
PET scanning shows the parts of the brain that are using the most glucose (a form
of sugar), and fMRI shows the parts where high oxygen levels indicate increased
activity.
The PET and fMRI studies are gradually confirming that mathematical and
linguistic processing are extremely complex, rather than involving a series of subsystems located in specific parts of the brain (Broca's area, Wernicke's area, and
Penfield's area). The neuronal structures involved are spread widely throughout
the brain, primarily by neurotransmitters, and it now appears certain that different types of tasks activate different areas of the brain in many sequences and
patterns. It has also become apparent from fMRI research that language is regulated, additionally, by the emotional areas of the brain. The limbic system (which
includes portions of the temporal lobes, parts of the hypothalamus and thalamus,
and other structures) may have a larger role than previously thought in the processing of all kinds of information (Damasio 1994).

5.1.3 Modularity
Connectionist neuroscience has led to the notion that the brain is a modular organ, with each module (an agglomeration of neuronal subsystems located in a specific region) organized around a particular task. It is worthwhile repeating here
some earlier observations about how interhemisphericity works. The processing of visual information, for instance, is not confined to a single region of the
RH, although specific areas in the RH are highly active in processing incoming
visual forms. Rather, different neural modules are involved in helping the brain
process visual inputs as to their contents. Consequently, visual stimuli that carry
linguistic information would be converted by the brain into neuronal activities
that are conducive to linguistic, not visual, processing. This is what happens in the
case of American Sign Language. The brain first processes the meanings of visual
signs, extracting the grammatical relations in them, in a connected or distributed
fashion throughout the brain (Hickok, Bellugi, and Klima 2001). As discussed
previously, other visual stimuli carry a different kind of information and are routed
instead to modules that are involved in visual motor commands. This finding
would explain, as already discussed, why tonemes, which serve verbal functions,
call into action the LH. Musical tones instead serve other functions, thus calling
into action the RH.
The connectivity that characterizes modularity has been examined not only
experimentally with human subjects, but also theoretically with computer software. Computer models of connectionism are called, as mentioned, parallel distributed processing (PDP) models. These are designed to show how, potentially,
brain modules interconnect with each other in the processing of information. The
PDP models appear to perform the same kinds of tasks and operations that language and mathematics do (MacWhinney 2000). Contrary to the computational
ones mentioned above, PDP models appear to actually fit the neurological patterns better. As Obler and Gjerlow (1999: 11) put it, in PDP theory, there are no
language centers per se but rather network nodes that are stimulated; eventually one of these is stimulated enough that it passes a certain threshold and that
node is realized, perhaps as a spoken word. This type of modeling has produced
rather interesting ideas, the paramount one being that mathematics and language
appear to form a single interconnected system, as will be discussed subsequently.
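A toy rendering of that description (an assumed example of mine, not from the text): activation spreads in parallel through a small, hypothetical network of associated nodes until some node crosses a threshold and is "realized". The nodes, weights, and threshold are invented purely for illustration.

```python
# Illustrative spreading-activation sketch in the spirit of PDP modeling.
links = {                      # hypothetical weighted associations between nodes
    "seven": {"number": 0.9, "prime": 0.6},
    "number": {"count": 0.7, "word": 0.3},
    "prime": {"number": 0.5},
}
activation = {"seven": 1.0}    # the stimulated input node
THRESHOLD = 1.2

for _ in range(3):             # a few cycles of parallel spreading
    updates = {}
    for node, level in activation.items():
        for neighbour, weight in links.get(node, {}).items():
            updates[neighbour] = updates.get(neighbour, 0.0) + level * weight
    for node, extra in updates.items():
        activation[node] = activation.get(node, 0.0) + extra

realized = [node for node, level in activation.items() if level >= THRESHOLD]
print(activation, "->", realized)   # nodes over threshold count as "realized"
```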
It is relevant to compare CN with PDP models in general. First and foremost,
the former is guided by the computer metaphor and the latter by the network
or web metaphor. CN sees algorithms as more significant, or real, than other
kinds of models, working under the assumption that conscious intelligence works
according to procedures that can be parsed and simulated by computers. PDP sees
computers as simple artifacts on which connective processes can be represented,
thus reversing the CN technique of going from theory to verification and then modification. CN thus is guided by AI; PDP is guided instead by network theories. In
both, however, the idea of neural system is a dominant one.
So, the differences between these two computational orientations can be summarized as follows:

Table 5.1: CN versus PDP models

CN models | PDP models
Based on the computer metaphor | Based on the network metaphor
View information as passing through a rule-based algorithmic system | View information as diffused in patterns of activation across a network
See information as being held in storage containers or bins (computer memory) | See information as distributed across a network (parallel distributions)
See cognition as residing in neurons sending out information to other neurons | Reverse the CN view by modeling information after neuronal structure

5.1.4 Research on metaphor


Given the importance of metaphor in the debates regarding the nature of mathematics (discussed in chapters 1 and 2), it is obvious that the neuroscientific findings on metaphor are of central significance in this regard. Early work established,
first and foremost, that metaphor involves content-related functions in the RH
and form-related ones in the LH, which are interconnected through complex neural networks. It became a test case for connectionism. The evidence for an interhemispheric model came originally from the study of brain-damaged patients. In
1964, the psychologist Weinstein was among the first to conduct a clinical study
demonstrating that patients with RH damage had lost the ability to comprehend
and produce metaphors. This suggested an RH locus for metaphorical meanings.
A study by Winner and Gardner (1977), a little more than a decade later, corroborated Weinstein's finding. The two researchers presented a series of utterances
to various subjects asking them to select one of four pictures that best portrayed
the meaning of the utterance. For the sentence "A heavy heart can really make a
difference" the subjects were shown four pictures from which to choose:
1. a person crying (= metaphorical meaning)
2. a person staggering under the weight of a huge red heart (= literal meaning)
3. a 500-pound weight (= a representation emphasizing the adjective heavy)
4. a red heart (= a representation emphasizing the noun phrase red heart)
Of the three groups of subjects used in the study, aphasics (subjects with LH damage), patients
with RH damage, and a normal control group, only the RH-damaged ones manifested consistent difficulties in identifying metaphorical meanings. In the same
year, Stachowiak, Huber, Poeck, and Kerschensteiner (1977) conducted a similar
type of study and came to the same conclusion. The researchers read subjects stories and then asked them to pick from a set of five drawings the one which best
described what happened to the main character of each one. One of the stories
contained a metaphorical idiom. The groups tested were aphasics, RH patients,
and normals. Like Winner and Gardner, the researchers found that, of the three
groups, the RH patients were the ones who showed the greatest inability to detect
the metaphorical idioms.
In the 1980s, the evidence in favor of an RH involvement in metaphor mounted. Hier and Kaplan (1980) found that RH patients exhibited deficits in explaining
the meaning of proverbs. Wapner, Hamby, and Gardner (1981) discovered that
RH patients tended to exhibit significant difficulties in deriving the metaphorical
point of a story. Brownell, Potter, and Michelow (1984) and Brownell (1988) detected RH involvement in metaphor comprehension, but could not specify what
neural regions of the RH were implicated. Using PET-scanning equipment, Bottini
et al. (1994) showed the right temporal lobe to be the most active one in metaphor.
They also found that the right parietal lobe was active in some metaphorical tasks,
whereas the corresponding lobe in the LH was not.
This whole line of research suggests that metaphor results from an interhemispheric connectivity, originating in the RH and moving over to the LH for its organization into language or some other system, including mathematics. After the
publication of Lakoff and Núñez's study (2000), which claimed that metaphor had
the same neural structure in mathematics, a plethora of neuroscientific studies
surfaced showing that metaphor and math were indeed connected and that a unitary neuroscientific model could be drafted. Pesci (2003) argued persuasively, on
the basis of a literature review connecting mathematics and metaphor, that the
latter seemed to play a critical role in math because it was an efficient transformation mediator of cognition. Lakoff and Núñez's main claim was that we
understand mathematics through conceptual metaphors and thus through linkages between source domains (for example spatial relationships between objects)
and target domains (abstract mathematical notions). These are based on certain
basic schemas of thought, or cross-modal organizational structures, as discussed
in chapter 3. In 2009, Aubry showed how the Lakoffian model works in explaining abstract mathematical conceptions of space. Mowat and Davis (2010), Ernest
(2010), and Zwicky (2010) have argued along the same lines. The gist of this line of
inquiry is that the role of metaphor in mathematics can no longer be ignored. Computational models in neuroscience cannot handle connective phenomena such as
metaphorical blending. And if the relevant research is at all correct, then it is in
studying blending that the greatest insights into the relation between math cognition and language can be gleaned.
Recent work on metaphor processing has largely substantiated the connectionist findings. Some questions have also been raised that require further investigation. For instance, Schmidt-Snoek, Drew, Barile, and Aguas (2015) show
that there are links between sensory-motor words used literally (kick the ball) and
sensory-motor regions of the brain, but find no conclusive evidence to suggest
whether metaphorically used words (kick the habit) also show signs of such embodiment. Nonetheless, their study indicated greater amplitudes for metaphorical than literal sentences, supporting the possibility of different neural substrates
for motion and auditory sentences. The findings are consistent with a sensory-motor (RH) neural categorization of metaphor.
Parallel findings have been documented in a vast array of studies that confirm
RH involvement in metaphor processing (Schmidt and Seger 2009, Diaz, Barrett,
and Hogstrom 2011). A review of the literature, and the controversies it has generated, is the one by Lai, van Dam, Conant, Binder, and Rutvik (2015). By and large,
the studies substantiate the difference between literal and metaphorical cognition
neurologically.

5.2 Math cognition


Research such as that described above suggests that a common neural system
exists for mathematics and language. Much of the research is based on neuroimaging studies of mathematical learning disabilities. Different neural mechanisms
contribute to different aspects of mathematical knowledge, and this is showing up
in research with children: those with a disability such as dyscalculia (severe difficulty in
carrying out the arithmetical operations as a result of a brain disorder) show variable patterns of abnormality at the brain level. Some children with dyscalculia
also have dyslexia, and may show different activation of the verbal networks that
support math cognition, whereas those who have dyscalculia only may show impairments of the parietal number sense system alone. Such evidence has ignited
a theoretical debate between researchers who believe that dyscalculia is caused
by a brain-level deficit of number sense and those who believe that it stems from a
problem in using symbols to access the numerical information. Models of dyscalculia that generate explicit testable hypotheses are being used more and more
to investigate the link between mathematical learning disorders and their neural
correlates.
Starting with the work of Brian Butterworth (1999), Stanislas Dehaene (1997),
Keith Devlin (2000), and Lakoff and Núñez (2000), among others, the field of math
cognition research started burgeoning in the early 2000s, and it has by now provided a
huge database of research findings, theories related to math learning, and insights
into how mathematics intersects with other neural faculties such as language and
drawing. The field has not just produced significant findings about how math is
processed in the brain, but also reopened long-standing philosophical debates
about the nature of mathematics, allowing us to revisit, for example, the Platonist-versus-constructivist one with new empirical information.
Overall, ongoing research in neuroscience suggests that understanding of
number and space is a result of the same kind of brain circuitry that processes
the two phenomena, even though the debate continues as to what areas are involved in number sense versus linguistic awareness. And this leads to a new way of
examining how the brain models the world. Our external experience of quantity
and space, and our symbolic representations of that experience, activate the same
neural networks, as Edward Hubbard and his associates have argued (see, for example, Hubbard et al. 2005). Abstract mathematical concepts such as Cartesian
coordinates or the complex plane might appear to be cultural inventions, but
they may have emerged as concepts because they t in with the architecture of
the brain and thus its cerebral symbolism. So, they are both part of the biology
of cognition, but also shaped by cultural influence, which initiates the abstraction process. This may or may not be verifiable, but it does bring out that the
two dimensions of human knowledge-making, the Umwelt and the Innenwelt,
interact constantly in the production of knowledge, and this interaction is guided
by image schemas such as more than, less than, nearer, farther, and bigger, smaller
that apply to language as well as to mathematics.
Since the circuitry encoding different magnitudes produces blends, one
would expect that the perception of phenomena such as duration, size, and
quantity would affect each other. And this has been shown with so-called interference studies. For example, if subjects are given information indicating that two
trains of different size are travelling at the same speed, they will tend to perceive
the larger train as moving faster.
Guhe et al. (2011) have developed a computational model of how blending
might be simulated. They devised a system by which different conceptualizations
of number can be blended together to form new ones via recognition of common
features, and a judicious combination of their features. The model of number is
based on Lakoff and Núñez's grounding metaphors for arithmetic. The ideas are
worked out using a so-called Heuristic-Driven Theory Projection (HDTP, a method
based on higher-order anti-unification). HDTP provides generalizations between
domains, thus allowing for a mechanism of finding commonalities, and allows for
the transfer of concepts from one domain to another, producing new conceptual
blends.
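To give a feel for the generalization step at the heart of HDTP, here is a first-order sketch of anti-unification (my simplification; HDTP itself uses higher-order anti-unification, and the terms below are invented examples): two structured conceptualizations are compared and their least general common pattern is returned, with fresh variables at the points of difference.

```python
# First-order anti-unification sketch (illustrative only): compute the least
# general generalization of two terms represented as nested tuples.
from itertools import count

_fresh = count(1)   # supplies fresh variable names X1, X2, ...

def anti_unify(t1, t2):
    if t1 == t2:
        return t1                                   # identical parts stay as they are
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # same functor and arity: generalize argument by argument
        return (t1[0],) + tuple(anti_unify(a, b) for a, b in zip(t1[1:], t2[1:]))
    return f"X{next(_fresh)}"                       # mismatch: introduce a variable

# two hypothetical conceptualizations of "adding": object collection vs. path motion
collection = ("combine", "pile_A", "pile_B")
path = ("combine", "step_A", "step_B")
print(anti_unify(collection, path))   # -> ('combine', 'X1', 'X2')
```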
Of course, the work on metonymy is also critical for understanding the connection between mathematics and language, but need not be discussed in any
detail here. The difference between metaphor and metonymy can be reduced to a
simple paraphrase: metaphor amalgamates information, metonymy condenses it.
So, metonymy is operative in giving rise to symbols; metaphor is operative in how
ideas are amalgamated or compressed to produce new conceptualizations. Both
processes reflect blending in general, taking different inputs and amalgamating
them, as mentioned several times already, as can be shown by the following general diagram:

[Figure 5.1 (Blending): a generic space maps onto two input spaces (Input 1 and Input 2), which project into the blend or solution.]

The difference is that in metonymy one of the inputs is actually part of the other.
Again without going into details here, suffice it to say that the concept of blend
covers a broad range of cognitive activities, including metonymy, metaphor, and
irony. Note that the generic space of the model is simply a renaming of the concept-to-be-constructed.

5.2.1 Defining math cognition


Mathematical cognition is not easy to define, although we may all have an intuitive sense of what it is. Generally, it can be defined in two main ways. First, it
is the awareness of quantity, space, and structural patterns inherent among specific kinds of concepts. This definition reflects the possibility that math cognition
may be innate and not necessarily limited to the human species. Second, it can
be defined as the awareness of how symbols stand for concepts and how they
encode them. At this level, math cognition is symbolic cognition and, as Radford
(2010), among many others, has argued, cannot be studied in isolation from contextual factors, and thus from the symbolic practices in which people are reared.
As Radford (2010: 1) puts it, to understand the relation between number sense and
its varied symbolic representations, one must grasp the fundamental role of the
context, the body and the senses in the way in which we come to know. So, those
who would claim that our mathematical symbols match the requirements of our
primal heritage are really reaching into speculation, rather than empirical facts.

A historical starting point for a discussion of math cognition is Immanuel Kant
(2011), although writing and speculation on the ontological nature of mathematics goes back to antiquity, as we have seen throughout this book. Kant's (2011:
278) view is, however, the first modernist one, defining it as knowledge of combining and comparing given concepts of magnitudes, which are clear and certain,
with a view to establishing what can be inferred from them. Kant argued further that this basic intuitive sense becomes explicit when we examine the visible
signs that we use to highlight the structural detail inherent in this type of knowledge. For example, a diagram of a triangle compared to that of a square will show
where the differentiation occurs: one consists of three intersecting lines, while
the other has four parallel and equal sides that form a boundary. As trivial as this
might seem, upon further consideration it is obvious that this kind of visualization is a cultural process designed to make the intuitions manifest. This type of
diagrammatic strategy is based on the brain's ability to synthesize scattered bits
of information into holistic entities that can then be analyzed reflectively.
The problem with defining math cognition is that it cannot be separated
from the various dimensions of math knowledge itself. As Alexander (2012) has
cogently argued, the cognition of mathematics involves three dimensions: pre-math, math, and mathematics. Pre-math is innate and includes some
primitive sense of number and geometry, although even this sense might be
more subtle than one might think. Some animals other than humans may share
the same kind of sense. Math is what we learn as formal skills, from infancy
through all levels of schooling. It is what educators, public policy makers, and
other authorities want everyone to be competent in. Mathematics is a discipline, with its own professional culture, its own sense of correctness built around
rigorous proofs, and various epistemological practices. The boundaries among
the dimensions are fuzzy, and certainly there are cross-influences, although the
distinctions are useful. What neuroscientists call math cognition therefore might
be exclusively based on one of these dimensions, on all three, or on an interaction
of the three.
As Fauconnier and Turner (2002) have argued, if there is a connection among the dimensions it is through blending. This is because a blend, once completed, is available for use in subsequent or additional blends. And in fact, a major modus operandi of mathematics is to build blend upon blend upon blend, within the rigid formal structures that mathematics permits. In this way, mathematicians construct entire edifices of generalizations to solidify their objectives. To quote Turner (2005):
As long as mathematical conceptions are based in small stories at human scale, that is, fitting the kinds of scenes for which human cognition is evolved, mathematics can seem
straightforward, even natural. The same is true of physics. If mathematics and physics stayed within these familiar story worlds, they might as disciplines have the cultural status of something like carpentry: very complicated and clever, and useful, too, but fitting human understanding. The problem comes when mathematical work runs up against structures that do not fit our basic stories. In that case, the way we think begins to fail to grasp the mathematical structures. The mathematician is someone who is trained to use conceptual blending to achieve new blends that bring what is not at human scale, not natural for human stories, back into human scale, so it can be grasped.

Hyde (2011) looked at the relevant literature on math cognition in order to provide a more comprehensive definition of the phenomenon. After going through a set of studies of adults, infants, and animals, he concluded that non-symbolic number sense is supported by at least two distinct cognitive systems: a parallel individuation system that encodes the numerical identity of individual items and an approximate number system that encodes the approximate numerical magnitude, or numerosity, of a set. Of course, some argue that the non-symbolic representation of small numbers is carried out solely by the parallel individuation system, while the non-symbolic representation of large numbers is carried out solely by the approximate number system. Others argue that all numbers are represented by a single system. This debate has been fueled by experiments showing dissociations between small and large number processing, and by contrasting ones showing similar processing of small and large numbers. Hyde argues that the diversity in results is due to subjectivity (Hyde 2011: 150):
When items are presented under conditions that allow selection of individuals, they will be
represented as distinct mental items through parallel individuation and not as a numerical
magnitude. In contrast, when items are presented outside attentional limits (e.g., too many,
too close together, under high attentional load), they will be represented as a single mental
numerical magnitude and not as distinct mental items. These predictions provide a basis
on which researchers can further investigate the role of each system in the development of
uniquely human numerical thought.

In effect, it is difficult, if not impossible, to pin down math cognition to specific parameters and views, given the nature of human and cultural diversity. The research is far too diffuse to allow for a general theory of math cognition.

5.2.2 Charles Peirce


Kant's ideas found their implicit elaboration and amplification in Charles Peirce's Existential Graph Theory (Peirce 1931–1956, vol. 2: 398–433, vol. 4: 347–584), by which visual signs (such as diagrams) are tools that are more powerful than
language as models of reality, because they display how the parts resemble relations among the parts of some different set of entities in other domains. Thus, it can be said that math cognition is especially visible (literally) in the use of diagrams to represent math concepts. Diagrams do not simply portray information, but also the process of thinking about the information as it unfolds in the brain (Peirce, vol. 4: 6). Peirce called diagrams "moving pictures of thought" (Peirce, vol. 4: 8–11) because in their structure we can literally see a given argument. As Kiryuschenko (2012: 122) has aptly put it, for Peirce graphic language allows us to "experience a meaning visually as a set of transitional states, where the meaning is accessible in its entirety at any given here and now during its transformation." If Kant and Peirce are correct, then it is obvious that the role of diagrams, and of visual signs generally, in the neuroscientific study of mathematical cognition is an important one, because they mirror brain structure. The work on math cognition and visualization is actually quite extensive (Shin 1994, Chandrasekaran, Glasgow, and Narayanan 1995, Hammer 1995, Hammer and Shin 1996, 1998, Allwein and Barwise 1996, Barker-Plummer and Bailin 1997, 2001, Kulpa 2004, Stjernfelt 2007, Roberts 2009). So too is the interest in phenomenology among mathematicians, a trend that was prefigured by Peirce's notion of phaneroscopy, which he described as the formal analysis of appearances apart from how they appear to interpreters and apart from their actual material content (see Hartimo 2010). In effect, mathematical diagrams express our intuitions about quantity, space, and relations in a way that seems to parallel mental imagery in general as a means of grasping and retaining reality. The intuitions are probably universal (the first type of definition); the visual representations, which include numerals originally, are products of historical processes (the second type of definition).
The Kantian notion of visual sign extends to numerals, equations, and other mathematical artifacts. Algebraic notation is, in effect, a diagrammatic strategy for compressing information, much as pictography does in referring to specific referents (Danesi and Bockarova 2013). An equation is a graph consisting of signs (letters, numbers, symbols) organized to reflect the order and structure of the events that it aims to represent iconically. It may show that some parts are tied to a strict order, whereas others may be more flexibly interconnected. As Kauffman (2001: 80) observes, Peirce's graphs contain arithmetical information in an economical form:
Peirce's Existential Graphs are an economical way to write first order logic in diagrams on a plane, by using a combination of alphabetical symbols and circles and ovals. Existential graphs grow from these beginnings and become a well-formed two dimensional algebra. It is a calculus about the properties of the distinction made by any circle or oval in the plane, and by abduction it is about the properties of any distinction.

An equation such as the Pythagorean one (c² = a² + b²) is a type of Existential Graph, since it is a visual portrait of the relations among the variables (originally standing for the sides of the right triangle). But, being a graph, it also tells us that the parts relate to each other in many ways other than in terms of the initial triangle referent. It reveals hidden structure, such as the fact that there are infinitely many Pythagorean triples, or sets of integers that satisfy the equation. Expressed in language ("the square on the hypotenuse is equal to the sum of the squares on the other two sides"), we would literally not be able to see this hidden implication. To return to Susan Langer's (1948) distinction between discursive and presentational forms (chapter 2), the equation tells us much more than the statement (a discursive act) because it presents inherent structure holistically, as an abstract form. We do not read a diagram, a melody, or an equation as made up of individual bits and pieces (notes, shapes, symbols), but presentationally, as a totality that encloses and reveals much more meaning.

5.2.3 Graphs and math cognition


In blending theory, further mathematical knowledge occurs by unpacking the inherent information immanent within the medium of graphs, as, for example, Pythagorean triples. All mathematical notation is thus graphic, and this is why it allows us to experiment with referents so that we can see if the experiment leads to further information and knowledge. Reasoning in mathematics does, of course, entail the use of information obtained through other media, including linguistic sentences. However, as neuroscientific research has shown rather convincingly, mental imagery and its expression in diagrammatic form is more powerful and may even predate the advent of vocal language (Cummins 1996, Chandrasekaran, Glasgow, and Narayanan 1995). Even sentences, as Peirce often argued, hide within their logical structure a visual form of understanding that can be easily rendered diagrammatically. This is what linguists have, actually, been doing with their diverse diagrams of linguistic structure.
In sum, diagrams show relations that are not apparent in linguistic statements (Barwise and Etchemendy 1994, Allwein and Barwise 1996). As Radford (2010: 4) remarks, they present encoded and hidden information to us by ways of appearance. Diagrams are inferences that translate hunches visually. These then lead to Peircean abductions. The process is complete after the ideas produced in this way are organized logically (deduction). This suggests a flow model of mathematical cognition that moves from hunches to deduction:

[Figure 5.2: Flow model of math cognition. The flow runs from hunch (guessing) through inference (informed guessing) and abduction (insight) to deduction (logical form).]

Hunches are the brain's attempts to understand what something unknown means initially. These eventually lead to inferences through a matrix of associative devices linking them to previous knowledge, such as induction, analogy, and metaphor. So, the Pythagorean triangle, which came initially from the hunches of builders, led to an inference that all similar triangles may contain the same pattern, and this led to the insight that we call the Pythagorean Theorem, which was given a logical form through proof. Once the form exists, however, it becomes the source for more inferences and abductions, such as the previously hidden concept of Pythagorean number triples. Eventually, it gave rise to a hypothesis, namely that beyond n = 2 the general equation cⁿ = aⁿ + bⁿ has no whole-number solutions, the claim known as Fermat's Last Theorem (Taylor and Wiles 1995). This, in turn, led to many other discoveries (Danesi 2013).
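The hidden structure in question, namely that there are infinitely many Pythagorean triples, can be made concrete with a short computation. The sketch below, given in Python purely for illustration (the function name is invented), uses Euclid's classical parametrization, under which every choice of integers m > n > 0 yields a triple satisfying c² = a² + b²:

# Illustrative only: Euclid's parametrization of Pythagorean triples.
# For any integers m > n > 0, (m*m - n*n, 2*m*n, m*m + n*n) satisfies a^2 + b^2 = c^2.

def euclid_triples(limit):
    """Yield Pythagorean triples (a, b, c) for all parameters m > n with m <= limit."""
    for m in range(2, limit + 1):
        for n in range(1, m):
            a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
            yield a, b, c

for a, b, c in euclid_triples(4):
    assert a * a + b * b == c * c   # the Pythagorean relation always holds
    print(a, b, c)                  # 3 4 5, 8 6 10, 5 12 13, 15 8 17, ...

Since m and n can be chosen without bound, the enumeration never runs out; this is precisely the "hidden" fact that the equation displays and the verbal statement conceals.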
As another example of how unpacking leads to insight, consider imaginary numbers. The motivation for their invention/discovery came from solving quadratic equations that produced the square root of negative numbers. It was not clear, at first, how to resolve this apparent anomaly. So, a hunch that they could be treated like any number surfaced at some point, which led to an inference, namely that the square root of a negative number must exist in some domain, which in turn led to an abduction: the ingenious invention of a diagram, called the Argand diagram, that showed the relation of imaginary numbers to real ones. As is well known, the diagram locates imaginary numbers on one axis and real ones on another. The point z = x + iy is then used to represent a complex number in the Argand plane, displaying its vectorial features in terms of the angle that it forms. The Argand diagram turned out, moreover, to be much more than a simple heuristic device showing how to carry out arithmetical operations with complex numbers; it soon became a source of investigation of the structure of these numbers and numbers in general.
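The same unpacking can be replayed numerically with the built-in complex arithmetic of a modern programming language; the particular number below is arbitrary and chosen only to illustrate the vectorial reading of z = x + iy:

import cmath

z = 3 + 4j               # a complex number z = x + iy: 3 on the real axis, 4 on the imaginary axis
print(z.real, z.imag)    # its Argand-plane coordinates: 3.0 4.0
print(abs(z))            # its length as a vector (modulus): 5.0
print(cmath.phase(z))    # the angle it forms with the real axis, in radians
print(cmath.sqrt(-4))    # the once-anomalous square root of a negative number: 2j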
Needless to say, mathematicians have always used diagrams to unpack hidden structure. For this reason, the relation between mental imagery and math cognition has become a main topic in both neuroscience and psychology. Among the first to investigate this relation empirically was Piaget, who sought to understand the development of number sense in relation to symbolism (summarized in Piaget 1952). In one experiment, he showed a five-year-old child two matching sets of six eggs placed in six separate egg-cups. He then asked the child whether
there were as many eggs as egg-cups (or not); the child replied in the affirmative. Piaget then took the eggs out of the cups, bunching them together, leaving the egg-cups in place. He then asked the child whether or not all the eggs could be put into the cups, one in each cup and none left over. The child answered negatively. Asked to count both eggs and cups, the child would correctly say that there was the same amount. But when asked if there were as many eggs as cups to fill, the child would again answer no. Piaget concluded that the child had not grasped the relational properties of numeration, which are not affected by changes in the positions of objects. Piaget showed, in effect, that five-year-old children have not yet established in their minds the symbolic connection between numerals and number sense (Skemp 1971: 154).

5.2.4 Neuroscientific findings


The neuroscientific study of math cognition has led to a whole series of existential-philosophical questions. For example: Is number sense a cross-species capacity, but the use of symbols to represent numbers a specific human activity?
There exists a substantive literature showing that animals possess an intuitive sense of number, but that they cannot transform their intuitions into usable knowledge and thus act upon the world conceptually, rather than just instinctively. The Alexandrian geometer Pappus may have been among the first to examine math cognition in animals, as he was contemplating the following problem: What is the most efficient way to tile a floor? One can do it with equilateral triangles, equal four-sided figures, or regular hexagons, with the latter having the most area coverage (Flood and Wilson 2011: 36). He then observed that bees instinctively use the hexagon pattern for their honeycombs. Pappus found this to be a truly perplexing phenomenon. But the astonishment is a human one; it is unlikely that bees are aware of their instinctive knowledge. As Uexküll (1909) might have put it, the internal modeling system of bees (the Innenwelt) is well adapted to understanding their particular world (the Umwelt), producing instinctual models of that world.
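Pappus's comparison can be checked with elementary mensuration: of the three regular polygons that tile the plane, the hexagon encloses the most area for a given amount of cell wall. A small illustrative computation (in Python; the function name is invented for exposition):

import math

def area_for_perimeter(n, perimeter=1.0):
    """Area of a regular n-gon with the given perimeter: n * s^2 / (4 * tan(pi / n))."""
    s = perimeter / n
    return n * s * s / (4 * math.tan(math.pi / n))

for sides, name in [(3, "triangle"), (4, "square"), (6, "hexagon")]:
    print(name, round(area_for_perimeter(sides), 4))
# triangle 0.0481, square 0.0625, hexagon 0.0722: the hexagonal cell
# encloses the most area per unit of wall, as in the honeycomb.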
The beginning of neuroscientifically based research on math cognition can probably be traced to Stanislas Dehaene's (1997) work, which is seen by many to have initiated the serious and systematic study of math cognition, bringing forth experimental evidence to suggest that the human brain and that of some chimps come with a wired-in aptitude for math. The difference in the case of the latter is an inability to formalize this innate knowledge and then use it for invention and discovery. This is why certain ideas are found across cultures. One of these, Dehaene claims, is the number line (as discussed previously). But anthropological
evidence scattered here and there (Bockarova, Danesi, and Núñez 2012) would seem to dispute this, since there are cultures where the number line does not exist and thus the kinds of calculations and concepts related to it do not appear. Whatever the truth, it is clear that neuroscience, as Dehaene suggests, can provide answers to many of these conundrums.
Dehaene brings forth evidence that animals such as rats, pigeons, raccoons, and chimpanzees can perform simple calculations, describing ingenious experiments that show that human infants display a parallel manifestation of number sense. Further, Dehaene suggests that this rudimentary number sense is as basic to the way the brain understands the world as our perception of color or of objects in space, and, like these other abilities, our number sense is wired into the brain. But how then did the brain leap from this basic number sense to trigonometry, calculus, and beyond? Dehaene argues that it was the invention of symbolic systems of numerals that started us on the climb towards higher abstract mathematics. He makes his case by tracing the history of numbers, from early times when people indicated a number by pointing to a part of their body (even today, in many societies in New Guinea, the word for six is "wrist"), to early abstract symbols such as Roman numerals (chosen for the ease with which they could be carved into wooden sticks), and to modern numerals and number systems. Dehaene also explores the unique abilities of idiot savants and mathematical geniuses, asking what might explain their special mathematical talent. Using modern imaging techniques (PET scans and fMRI), Dehaene illustrates exactly where numerical calculation takes place in the brain. But perhaps most importantly, Dehaene argues that the human brain does not work like a computer, and that the physical world is not based on mathematics; rather, mathematics evolved to explain the physical world much as the eye evolved to provide sight. His model of math cognition is charted in figure 5.3. It shows that there are verbal and attention components, but overall numeracy and numerical magnitude processes are independent modules of cognition.
Dehaene's arguments are far-reaching. But do they really explain math cognition? Is it a shared instinctual sense with other species, or are we finding simple analogies in those species? This type of speculation has always been evident in the primate language studies, which sought to establish, or else reject, a language instinct in primates. There really has emerged no impartial evidence to suggest that chimpanzees and gorillas are capable of math or language in the same way that humans are, nor that they have the ability or desire to pass on to their offspring what they have learned from their human mentors, despite claims to the contrary. Conditioning effects cannot be ruled out when assessing the reported findings of the primate experiments. Also, there is no way of ascertaining whether the kinds of counting procedures witnessed in other animals are really no more than instinctive responses to stimuli presented by the experimenters, rather than manifestations of true numerical cognition.

[Figure 5.3: Model of numeracy and math cognition, relating linguistic components (the symbolic number system; numeration, the number line, calculation), quantitative components (numerical magnitude processing, magnitude comparison), spatial attention, geometry and measurement, and cognitive skills to early numeracy knowledge and mathematical outcomes.]

Another early neuroscientific study of math cognition is Brian Butterworth's 1999 book, What counts. As he suggests, human civilization is founded on the development and elaboration of number sense and its relation to other faculties. He then puts forward a model of how numbers are formed in the brain, how they get there, and how they are used to explore the world.
He starts with the premise that we all possess an instinctual number sense faculty, which he calls numerosity. This faculty is, purportedly, more basic to human cognition and likely survival than language is. Basically, for Butterworth numbers do not exist in the brain the way verbal forms such as words do; they constitute a separate and unique kind of intelligence with their own brain module, located in the left parietal lobe. But this alone does not guarantee that math cognition will emerge homogeneously in all individuals. Rather, the reason a person falters at math is not a wrong gene or engine part in the left parietal lobe, but the fact that the individual has not fully developed the number sense with which he or she was born, and the reason for this is, of course, environmental and personal psychological factors, not nature. To use Alexander's terminology (above), everyone has pre-math sense, but actual math knowledge needs training and cultivation. It is no coincidence, therefore, that the left parietal lobe controls the movement of fingers, constituting a neurological clue to the evolution of our number sense and explaining why we count on our fingers. The non-linguistic nature of math also might explain why cultures that have no symbols or words for numbers have still managed to develop counting systems for practical purposes. Butterworth presents findings that neonates can add and subtract
even a few weeks old, contrary to Piaget's finding that number sense requires cognitive growth, and that people afflicted with Alzheimer's have unexpected numerical abilities. The diagram below summarizes many of the ideas elaborated by Butterworth. It is taken from his literature review of very low attainment in arithmetic (dyscalculia), which he characterizes as a core deficit in an inherited foundational capacity for numbers (Butterworth 2010). It shows how such a deficit might come about:

[Figure 5.4: Butterworth's model. Panel (a) links symbolic representations of number (numerals, and number words such as "five" and "three") to semantic representations of numerosity (patterns of dots) through a hidden layer. Panels (b) and (c) show the operands and result of an addition encoded in two formats: a semantic (parietal) numerosity code in which a quantity appears as a string of active units, and a symbolic, e.g. verbal (temporal), code in which it appears as a single active position, connected by direct and mediated semantic and symbolic pathways.]

As can be seen, Butterworth connects numerosity with symbolism and semantic pathways, suggesting that the core deficit in dyscalculia may lie in such an inherited system responsible for representing approximate numerosity; but it could also lie in the lack of a minimal system for numerosities less than or equal to four. An alternative proposal holds that the deficit lies in an inherited system for sets of objects and operations on them (numerosity coding), on which arithmetic is built.
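The two coding formats implied by the operand strings in figure 5.4 can be mimicked in a toy sketch; this is only a schematic rendering of the distinction, not Butterworth's actual simulation:

def numerosity_code(n, width=12):
    """Semantic (numerosity) code: n is represented by n active units."""
    return [1] * n + [0] * (width - n)

def symbolic_code(n, width=12):
    """Symbolic code: n is an arbitrary position that carries no magnitude in itself."""
    return [1 if i == n - 1 else 0 for i in range(width)]

# In the numerosity code, adding 3 and 6 is just pooling the active units:
total = sum(numerosity_code(3)) + sum(numerosity_code(6))
print(numerosity_code(total))      # nine active units, the numerosity of the union

# In the symbolic code the same fact must be supplied by an external rule,
# since nothing in the pattern itself says "how many".
print(symbolic_code(3), symbolic_code(6), symbolic_code(9))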
What counts is one of the first books to provide a comprehensive picture of how math cognition emerges and how it supposedly evolves neurologically. It is a significant work, but it leaves several evolutionary questions untouched, at least from my perspective. Finding hard scientific evidence to explain why numerosity emerged in the course of human evolution is, all told, a speculative venture. However, having said this, there is a body of research that is
supportive of Butterworth's basic model: that number sense is instinctual and that it may be separate from language. In one recent study, Izard, Pica, Spelke, and Dehaene (2011) looked at notions of Euclidean geometry in an indigenous Amazonian society. The research team tested the hypothesis that certain aspects of non-perceptible Euclidean geometry map onto intuitions of space that are present in all humans (such as intuitions of points, lines, and surfaces), even in the absence of formal mathematical training. The Amazonian society is called the Mundurucu, and the subjects included adults and age-matched child controls from the United States and France, as well as younger American children without education in geometry. The responses of Mundurucu adults and children converged with those of mathematically educated adults and children and revealed an intuitive understanding of essential properties of Euclidean geometry. For instance, on a surface described to them as perfectly planar, the Mundurucu's estimations of the internal angles of triangles added up to approximately 180 degrees, and when asked explicitly, they stated that there exists one single parallel line to any given line through a given point. These intuitions were also present in the group of younger American participants. The researchers concluded that, during childhood, humans develop geometrical intuitions that spontaneously accord with the principles of Euclidean geometry, even in the absence of training in such geometry.
In such studies, however, one must also keep in mind the possibility of experimenter bias. Moreover, there exists contradictory evidence. For example, using the concept of embodied cognition, Núñez, Edwards, and Matos (1999) argue that mathematics is an inherent skill inside the body-mind complex, with the physical and social context playing a determining role in how and if it develops. In a relevant study, Lesh and Harel (2003) got students to develop their own models of a problem space, guided by instruction. Without the latter, they were incapable of coming up with them. The results of the study led to a whole spate of subsequent studies confirming the findings. It appears that mathematics is not a unified phenomenon, and that awareness of what math is depends on rearing and situation. The many proofs of the Pythagorean theorem provide concrete evidence of this. There is no one proof, but many, depending on who develops the proof, where, and why. Nonetheless, the basic constituents of a proof will not change; the details will, thus also supporting the Butterworth hypothesis indirectly. As Harel and Sowder (2007) have argued, there exists a taxonomy of proof schemes, which is based on the influence of convention vis-à-vis how proofs are modeled and how they are believed.
It can be argued that math and language are, actually, united by several
key evolutionary factors (Cartmill, Pilbeam, and Isaac 1986). The emergence of
abilities such as language and counting must have occurred in tandem, sharing a
large swath of neuro-evolutionary processes, since both are a consequence of four critical events: bipedalism, a brain enlargement unparalleled among species, an extraordinary capacity for tool-making, and the advent of the tribe as the main form of human collective life. Bipedalism liberated the fingers to do several things: count and gesture. Both likely occurred simultaneously, thus negating any uniqueness of finger-use for the math faculty; it was also needed, in fact, for the language faculty. Although other species, including some non-primate ones, are capable of tool use, only in the human species did complete bipedalism free the hand sufficiently to allow it to become a supremely sensitive and precise manipulator and grasper, thus permitting proficient tool making and tool use in the species. Shortly after becoming bipedal, the evidence suggests, the human species underwent rapid brain expansion. In the course of human evolution the size of the brain has more than tripled. Modern humans have a braincase volume of between 1300 and 1500 cc. The human brain has also developed three major structural components that undergird the unique mental capacities of the species: the large dome-shaped cerebrum, the smaller, somewhat spherical cerebellum, and the brainstem. The size of the brain does not determine the degree of intelligence of the individual; this appears to be determined instead by the number and type of functioning neurons and how they are structurally connected with one another. And since neuronal connections are conditioned by environmental input, the most likely hypothesis is that any form of intelligence, however it is defined, is most likely a consequence of upbringing. Unlike the early hominid adult skulls, with their sloping foreheads and prominent jaws, the modern human skull (with biologically insignificant variations) retains a proportionately large size in relation to the rest of the body.
The large brain of modern-day Homo is more than double that of the early tool-makers. This increase was achieved by the process of neoteny, that is, by the prolongation of the juvenile stage of brain and skull development in neonates. As a consequence, human infants go through an extended period of dependency on, and stimulation by, adults. In the absence of this close external bond in the early years of life, the development of the infant's brain would remain incomplete. This strongly suggests that those notions that we hold as universal would dissipate and even become extinct without the support of culture.
Like most other species, humans have always lived in groups. Group life enhances survivability by providing a collective form of shelter. But at some point in their evolutionary history, probably around 100,000 years ago, bipedal hominids had become so adept at tool-making, communicating, and thinking in symbols that they became consciously aware of the advantages of a group life based on a common system of representational activities. By around 30,000 to 40,000 years ago, the archeological evidence suggests, in fact, that hominid
groups became increasingly characterized by communal customs, language, and the transmission of technological knowledge to subsequent generations. The early tribal collectivities have left evidence that gesture (as inscribed on surfaces through pictography) and counting skills occurred in tandem.
The evolutionary evidence can thus be interpreted differently from the interpretations of Dehaene and Butterworth. There is neither right nor wrong in this case, just speculation. Actually, several case studies of brain-damaged patients support the locationist research of Butterworth and Dehaene. Defects in grasping numbers (known as anarithmia) have been shown to be associated with lesions in the left angular gyrus and with Gerstmann's syndrome, which involves the inability to count with one's fingers. Patients with acalculia (an inability to calculate), who might read 14 as 4, have difficulty representing numbers with words. For example, they might have difficulty understanding the meaning of "hundred" in expressions such as "two hundred" and "a hundred thousand." Acalculia is associated with Broca's aphasia and, thus, with the left inferior frontal gyrus. But acalculia has also been found in patients suffering from Wernicke's aphasia, who have difficulties in saying, reading, and writing numbers. This is associated with the left posterior superior temporal gyrus. Patients with frontal acalculia have damage in the pre-frontal cortex. They have serious difficulties in carrying out arithmetical operations (particularly subtraction) and in solving number problems. Dyscalculia is associated with the horizontal segment of the intraparietal sulcus, in both hemispheres. Many studies have confirmed these patterns (Ardila and Rosselli 2002, Dehaene 2004, Isaacs, Edmonds, Lucas, and Gadian 2001, Dehaene, Piazza, Pinel, and Cohen 2003, Butterworth, Varma, and Laurillard 2011).
A number of studies have also found numerosity in non-human animals. As Dehaene (1997) himself showed, when a rat is trained to press a bar 8 or 16 times to receive a food reward, the number of bar presses will approximate a Gaussian distribution with a peak around 8 or 16 presses. When rats are hungrier, their bar-pressing behavior is more rapid. So, by showing that the peak number of bar presses is the same for either well-fed or hungry rats, it is possible to disentangle time and number of bar presses. McComb, Packer, and Pusey (1994) set up hidden speakers in the African savannah to test natural (untrained) behavior in lions. The speakers played a number of lion calls, from 1 to 5. If a single lioness heard, for example, three calls from unknown lions, she would leave, while if she was with four of her sisters, they would go and explore. This suggested to the researchers not only that lions can tell when they are outnumbered, but also that they can do this on the basis of signals from different sensory modalities, suggesting that numerosity is a multisensory cross-species ability.
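A toy simulation conveys what such a response distribution looks like; the numbers below are invented solely for illustration (they are not Dehaene's data), and the assumption that the spread grows with the trained target is a simplification often made for approximate number representations:

import random

def simulated_presses(target, trials=10000, noise=0.2):
    """Draw bar-press counts from a Gaussian centred on the trained target count."""
    return [max(1, round(random.gauss(target, noise * target))) for _ in range(trials)]

for target in (8, 16):
    presses = simulated_presses(target)
    mean = sum(presses) / len(presses)
    print(target, round(mean, 1))    # the peak of each distribution sits near its target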
In 2008, Burr and Ross noted an effect called the numerosity adaptation effect, a perceptual phenomenon in math cognition, demonstrating how non-symbolic
numerical intuition and numerical percepts can impose themselves upon the human brain automatically. Their experiment is summarized in the following example from their study:

[Figure 5.5: The numerosity adaptation effect. Viewers stare at a fixation sign for 30 seconds and then look at a test display; the left side of the display is experienced as more numerous than the right, although the two sides are actually identical (after Burr and Ross 2008).]

The effect shows that non-symbolic numerical intuition can imprint itself upon the human brain directly. In the diagram, a subject should have a strong impression that the display on the lower left is more numerous than the one on the right, after 30 seconds of viewing the upper figure, although both have the same number of dots. The subject might also underestimate the number of dots in the display. The effects are resistant to the manipulation of non-numerical features of the display (size, density, contrast). Since these effects happen automatically, the operation of a largely automatic processing system in the brain appears to be the most likely explanation. As Burr and Ross (2008: 428) observe: "Just as we have a direct visual sense of the reddishness of half a dozen ripe cherries, so we do of their sixishness."
Some critics suggest that the effects are dependent on density and less so on numerosity. Others suggest that numerosity may be related to kurtosis (the perception of sharpness) and, thus, that the effect may be better explained in
terms of texture, such that only the dots falling within the most effectively displayed region are involved in the effects. However, since the display in the experiment was of spots that were uniformly either white or black, the kurtosis effect is inapplicable. It is not the number of dots in the entire display that causes the adaptation but only those within a particular area. At present, there is no real explanation of why adaptation has such a profound effect on numerosity. What the experiment shows, however, is that perception and number sense are intrinsically intertwined, and this brings out the force of contextual factors. The repetition of the same experiment in various cultural contexts would go a long way toward answering this question.

5.3 Mathematics and language


As discussed above, Dehaene and Butterworth, and a host of supporting studies, would suggest that mathematics and language are separate faculties, even though they might intertwine in some tasks, and that they are essentially culture-independent. They also claim that number sense is an innate faculty. But not everyone agrees, and there are data to the contrary, as already discussed several times. Keith Devlin (2000, 2005) asks a key question in this regard: If there is some innate capacity for mathematical thinking, which there must be, otherwise no one could do it, why does it vary so widely, both among individuals in a specific culture and across cultures? The question is a key one. Devlin, unlike Butterworth, connects the math ability to language, since both are used by humans in very similar ways. But this then raises another question: Why, then, do we acquire language easily, with no direct instruction, but have difficulty learning to do math (in many cases)? The answer, according to Devlin, is that we can and do acquire math effortlessly, but that we do not recognize that we are doing math when we do it. As he argues, our prehistoric ancestors' brains were essentially the same as ours, so they must have had basic number sense. But those brains could hardly have imagined how to multiply 15 by 36 or prove Fermat's Last Theorem. In order to conceptualize these, language and training were required.
There are two kinds of math: the hard kind and the easy kind. The easy kind, practiced by ants, shrimp, Welsh Corgis, and the human kind, is innate. But, if we have innate number sense, why do we have to teach math and why do most of us find it so hard to learn? Can we improve our math skills by learning from dogs, cats, and other creatures that do math?


5.3.1 Mathematics and figurative cognition


The last question brings us to the research that links mathematics and language via metaphor. The first major work to make the explicit claim that metaphor is indeed the link is, of course, the one by Lakoff and Núñez (2000), which ultimately stems from blending. Consider the formation of negative numbers. The blending process in this case is manifested by grounding and linking conceptual metaphors. The former are metaphors which encode ideas as grounded in experience. For example, addition develops from the experience of counting objects and then inserting them in a collection. Linking metaphors connect concepts within mathematics that may or may not be based on physical experiences. Some examples of this are the number line, inequalities, and absolute-value properties within an epsilon-delta proof of a limit. Now, linking metaphors can be seen to be the source of negative numbers. They are particular kinds of blends, as Alexander (2012: 28) elaborates:
Using the natural numbers, we made a much bigger set, way too big in fact. So we judiciously collapsed the bigger set down. In this way, we collapse down to our original set of natural numbers, but we also picked up a whole new set of numbers, which we call the negative numbers, along with arithmetic operations, addition, multiplication, subtraction. And there is our payoff. With negative numbers, subtraction is always possible. This is but one example, but in it we can see a larger, and quite important, issue of cognition. The larger set of numbers, positive and negative, is a cognitive blend in mathematics ... The numbers, now enlarged to include negative numbers, become an entity with its own identity. The collapse in notation reflects this. One quickly abandons the (minuend, subtrahend) formulation, so that rather than (6, 8) one uses -2. This is an essential feature of a cognitive blend; something new has emerged.
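Alexander's description corresponds to a familiar construction: start from (minuend, subtrahend) pairs of natural numbers and collapse pairs that differ by a common amount. The sketch below is only an illustrative rendering of that collapse, not Alexander's own formalism:

def collapse(pair):
    """Reduce a (minuend, subtrahend) pair to its simplest representative."""
    m, s = pair
    k = min(m, s)
    return (m - k, s - k)            # e.g. (6, 8) collapses to (0, 2)

def add(p, q):
    return collapse((p[0] + q[0], p[1] + q[1]))

def signed(pair):
    """The familiar signed notation that the pair formulation is abandoned for."""
    m, s = collapse(pair)
    return m - s

print(collapse((6, 8)))              # (0, 2): the blend we write as -2
print(signed((6, 8)))                # -2
print(signed(add((6, 8), (5, 1))))   # (6 - 8) + (5 - 1) = 2; subtraction is always possible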

This kind of connective thinking occurs because of gaps that are felt to inhere in the system. As Godino, Font, Wilhelmi, and Lurduy (2011: 250) cogently argue, notational systems are practical (experiential) solutions to the problem of counting:
As we have freedom to invent symbols and objects as a means to express the cardinality of sets, that is to say, to respond to the question, "how many are there?", the collection of possible numeral systems is unlimited. In principle, any limitless collection of objects, whatever its nature may be, could be used as a numeral system: diverse cultures have used sets of little stones, or parts of the human body, etc., as numeral systems to solve this problem.
All this implies that mathematics is both invented and discovered, not through abstract contemplation, but through the recruitment of everyday cognitive mechanisms that make human imagination and abstraction possible. Fauconnier and Turner (2002) have proposed arguments along the same lines, giving substance
to the notion that ideas in mathematics are based on inferences deriving from experiences and associations within these experiences.
The idea that metaphor plays a role in mathematics seems to have never been held seriously until after Lakoff and Núñez's watershed work, even though, as Marcus (2012: 124) observes, mathematical terms are mainly metaphors:
For a long time, metaphor was considered incompatible with the requirements of rigor and preciseness of mathematics. This happened because it was seen only as a rhetorical device such as "this girl is a flower." However, the largest part of mathematical terminology is the result of some metaphorical processes, using transfers from ordinary language. Mathematical terms such as function, union, inclusion, border, frontier, distance, bounded, open, closed, imaginary number, rational/irrational number are only a few examples in this respect. Similar metaphorical processes take place in the artificial component of the mathematical sign system.

As with language, no one aspect of mathematics can be taken in isolation. Matrix algebra is a more general way of doing arithmetic; Boolean algebra is a more general way of doing algebra; and so on. The connecting links are, typically, conceptual metaphors such as: arithmetic is motion along a path (a notion represented in the number line), sets are containers, geometric figures are objects in space, recurrence is circular, and so on. Many resist the approach taken by Lakoff and Núñez, pointing out that there are strategies other than conceptual metaphor involved in doing math. The main critics, though, come out of the computational camp (discussed briefly above).
As discussed in the opening chapter, already in the 1960s a number of linguists became intrigued by the relation between mathematics and language (Hockett 1967, Harris 1968). Their work contained an important subtext: by exploring the structures of mathematics and language in correlative ways, we might hit upon deeper points of contact and thus arrive at a common cognitive origin for both. Mathematics makes sense when it encodes concepts that fit our experiences of the world: experiences of quantity, space, motion, force, change, mass, shape, probability, self-regulating processes, and so on. The inspiration for new mathematics comes from these experiences, as it does for new language. What was lacking at the time was the concept of the blend, which started appearing only in the early 2000s.
Consider the example of Gödel's famous proof, which, as Lakoff has argued (see Bockarova and Danesi 2012: 45), was inspired by Cantor's diagonal method, as mentioned briefly in the opening chapter. It is worth revisiting here. Gödel proved that within any formal logical system there are results that can be neither proved nor disproved. He found a statement in a set of statements that could be extracted by going through them in a diagonal fashion, now called Gödel's diagonal
lemma. That produced a statement, S, like Cantor's C, that does not exist in the set of statements. Cantor's diagonal and one-to-one matching proofs are mathematical metaphors: associations linking different domains in a specific way (one-to-one correspondences). This insight led Gödel to envision three metaphors of his own (as we saw): (1) the Gödel number of a symbol, which is evident in the argument that a symbol in a system is the corresponding number in the Cantorian one-to-one matching system (whereby any two sets of symbols can be put into a one-to-one relation); (2) the Gödel number of a symbol in a sequence, which is manifest in the demonstration that the nth symbol in a sequence is the nth prime raised to the power of the Gödel number of the symbol; and (3) Gödel's central metaphor, which was Gödel's proof that a symbol sequence is the product of the Gödel numbers of the symbols in the sequence.
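The arithmetic behind these metaphors can be shown in miniature: assign each symbol a code number, then encode a sequence of symbols as the product of successive primes raised to those codes. The tiny symbol table below is invented purely for illustration and is not Gödel's original coding:

def primes(count):
    """Return the first 'count' prime numbers."""
    found = []
    candidate = 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

CODE = {"0": 1, "s": 2, "=": 3, "+": 4}   # a toy symbol table

def goedel_number(sequence):
    """Product of the nth prime raised to the code of the nth symbol."""
    result = 1
    for p, symbol in zip(primes(len(sequence)), sequence):
        result *= p ** CODE[symbol]
    return result

print(goedel_number(["0", "=", "0"]))     # 2**1 * 3**3 * 5**1 = 270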
The proof, as Lakoff argues, exemplifies perfectly how blending works. When the brain identifies two distinct entities in different neural regions as the same entity in a third neural region, they are blended together. Gödel's metaphors come from neural circuits linking a number source to a symbol target. In each case, there is a blend, with a single entity composed of both a number and a symbol sequence. When the symbol sequence is a formal proof, a new mathematical entity appears: a proof number.

5.3.2 Blending theory


The premise that mathematics and language share structural and functional properties comes down to the assumption that they occur in the same neural substratum or, at least, involve the operation of the same neural mechanisms. This entails taking concepts in one domain and blending them with those in another to produce new ones or simply to understand existing ones. Changing the blends leads to changes in mathematical structure and to its development. Blending theory thus makes it possible to connect language and mathematics in a way that goes beyond simple analogies.
Blending can be broken down into two main processes (Danesi 2007). The first one can be described as a clustering of source domains around a target domain. When the topic of ideas comes up in discourse, speakers of English deliver it by navigating conceptually through the various source domains that cluster around it according to need, whim, or situation. For example, the sentence "I can't see why your ideas are not catching on, given that they have deep roots and lie on solid ground" has been put together with four source domains (seeing, attraction, plants, and buildings) from the ICM of ideas.


Not all ICMs manifest a clustering structure. A second major type can be called radiation, which inheres in different target domains being delivered by identical source domains. It can be envisioned as a single source domain radiating outwards to deliver different target domains. For example, the plant source domain above not only allows us to conceptualize ideas ("That idea has deep ramifications"), but also such other abstract concepts as love ("Our love has deep roots"), influence ("His influence is sprouting all over"), success ("His career has borne great fruit"), knowledge ("That discipline has many branches"), wisdom ("His wisdom has deep roots"), and friendship ("Their friendship is starting to bud just now"), among many others. Radiation can be defined more neuroscientifically as the blending of abstract concepts that implicate each other through a specific experiential model or frame of reference (source domain). Radiation, by the way, explains why we talk of seemingly different things, such as wisdom and friendship, with the same metaphorical vehicles. Clustering, on the other hand, explains why we use different metaphorical vehicles. It thus allows people to connect source domains as they talk.
Now, clustering can be seen in how algorithms and proofs are constructed. In the proof that the angles of a triangle add up to 180° (chapter 2), several domains clustered around the proof. First, the domain of angle sizes was involved in determining that the straight line was an angle; second, there was the idea that angles can be dissected into parts and then recombined. In other words, grounding and linking were involved in the proof, clustering around the main task of connecting the statements in the proof.
Radiation can be seen in connective branches such as Cartesian geometry, which blends arithmetic, algebra, and geometry through the image schema of intersecting number lines. The radiation occurs in how these three domains radiate outwards into linkages among each other, showing how arithmetic, algebra, and geometry are highly interrelated: one assumes knowledge of the other. Descartes called this radiative blend, of course, analytic geometry. A number line is itself a rudimentary geometric representation that shows the continuity between positive and negative numbers and a one-to-one correspondence between a specific number and a specific point on the line. Descartes simply drew two number lines intersecting at right angles. The horizontal line is called the x-axis, the vertical one the y-axis, and their point of intersection the origin. This system of two perpendicular intersecting number lines is called, eponymously, the Cartesian plane.
Blending is unconscious, and that is why we are hardly ever aware of what we are doing when we do math. Consider a simple statement such as "7 is larger than 4." This is a metaphor, produced by blending a source domain that involves concepts of size with the target domain of numbers (Presmeg 1997, 2005). The conceptual metaphor that underlies the statement "7 is larger than 4" is numbers
are collections of objects of differing sizes. Similarly, the concept of quantity involves at least two metaphorical blends. The first is the more is up, less is down image schema, which appears in common expressions such as "the height of those functions went up as the numerical value increased" and "the other functions sloped downwards as the numerical values decreased." The other is linear scales are paths, which manifests itself in expressions such as "rational numbers are far more numerous than integers" and "infinity is way beyond any collection of finite sets." As Lakoff (2012b: 164) puts it:
(2012b: 164) puts it:
The metaphor maps the starting point of the path onto the bottom of the scale and maps
distance traveled onto quantity in general. What is particularly interesting is that the logic
of paths maps onto the logic of linear scales. Path inference: If you are going from A to C, and
you are now at in intermediate point B, then you have been at all points between A and B and
not at any points between B and C. Example: If you are going from San Francisco to N.Y. along
route 80, and you are now at Chicago, then you have been to Denver but not to Pittsburgh.
Linear scale inference: If you have exactly $ 50 in your bank account, then you have $ 40,
$ 30, and so on, but not $ 60, $ 70, or any larger amount. The form of these inferences is the
same. The path inference is a consequence of the cognitive topology of paths. It will be true
of any path image-schema.

As the mathematician Freeman Dyson has also asserted, mathematicians are slowly coming to the realization that mathematics is, in a basic way, a product of metaphorical cognition (Dyson, cited in Marcus 2012: 89):
"Mathematics as Metaphor" is a good slogan for birds. It means that the deepest concepts in mathematics are those which link our world of ideas with another. In the seventeenth century Descartes linked disparate worlds of algebra and geometry, with his concept of coordinates. Newton linked the worlds of geometry and dynamics, with his concept of fluxions, nowadays called calculus. In the nineteenth century Boole linked the worlds of logic and algebra, with his concept of symbolic logic, and Riemann linked the worlds of geometry and analysis with his concept of Riemann surfaces. Coordinates, fluxions, symbolic logic, and Riemann surfaces are all metaphors, extending the meanings of words from familiar to unfamiliar contexts. Manin sees the future of mathematics as an exploration of metaphors that are already visible but not yet understood.

The same kind of argument can be made for scientific thinking in general (Black 1962). Science often involves theorizing about things that we cannot see, hear, or touch: atoms, gravitational forces, magnetic fields, and so on. So, scientists use their imagination to "take a look." The result is a metaphorical theory. A classic example of this is the early history of atomic theory (Sebeok and Danesi 2000), which can be sequenced into three main phases: (1) the Rutherford Model, which portrays the atom as a tiny solar system; (2) the Bohr Model, which adds quantized orbits to the Rutherford Model; and (3) the Schrödinger Model, which
posits the idea that electrons occupy regions of space. The three models are rendered in diagram form below (Danesi 2013). These show how radiation works: the Rutherford model radiating outwards (metaphorically) to suggest the Bohr model, which in turn radiates outward towards the Schrödinger model:

[Figure 5.6: Diagram for Rutherford's model of the atom: a nucleus with electrons in orbits.]

[Figure 5.7: Diagram for Bohr's model of the atom: a nucleus surrounded by quantized shells (1st shell = 2 electrons, 2nd shell = 8 electrons, 3rd shell = 18 electrons).]

[Figure 5.8: Diagram for Schrödinger's model of the atom: a nucleus (e.g. 6 protons and 6 neutrons) surrounded by orbits and electron clouds.]

The way in which each model is composed is hardly haphazard, as Black pointed out: each one attempts to model atomic structure according to specific types of experimental data, and each one is generated from a radiative ICM: one target domain linked to separate source domains. The target domain in all three cases is atomic structure; but each diagram provides, literally, a different metaphorical view of the same domain, a domain that is not directly accessible to vision. Rutherford speculated that atomic structure mirrors the solar system, a theory
that may have been influenced by the ancient Pythagorean concept of the cosmos as having the same structure at all its levels, from the microcosmic (the atom) to the macrocosmic (the universe). The Bohr Model is, in effect, an extension of the Rutherford one, and the Schrödinger Model an extension of the previous two. The model envisioned by Rutherford is a first-order blend of the structure of the solar and atomic systems. Bohr began with Rutherford's model as his source domain, but then postulated further that electrons can only move in certain quantized orbits, blending emerging ideas in quantum physics into the Rutherford model. Bohr was thus able to explain certain qualities of emission for hydrogen, but failed for other elements. His was a second-order blend: a blend of a previous blend.
Schrödinger's model, in which electrons are described not by the paths they take but by the regions where they are most likely to be found, can explain certain qualities of emission spectra for all elements. The basic source domain has not changed, but it is now elaborated significantly to account for phenomena that are not covered by the original model. It was in 1926 that Schrödinger used mathematical equations to describe the likelihood of finding an electron in a certain position. Unlike the Bohr model, Schrödinger's model does not define the exact path of an electron but, rather, predicts the odds of the electron's location. This model is thus portrayed as a nucleus surrounded by an electron cloud. Where the cloud is most dense, the probability of finding the electron is greatest; conversely, the electron is less likely to be found in a less dense area of the cloud. This model is a third-order blend: a blend of previous blends. Note, however, that at each stage of the development of atomic theory there is an inherent connectivity. Blending occurs in different orders to produce complex ideas. The trace of the brain's inner blending processes is metaphor, either conceptual or visual (as in the diagrams above). This is also why physicists use metaphor descriptively, referring to sound waves as "undulating" through empty space, atoms as "leaping" from one quantum state to another, electrons as "orbiting" an atomic nucleus, and so on. The physicist K. C. Cole (1984: 156) puts it as follows:
The words we use are metaphors; they are models fashioned from familiar ingredients and nurtured with the help of fertile imaginations. "When a physicist says an electron is like a particle," writes physics professor Douglas Giancoli, "he is making a metaphorical comparison like the poet who says 'love is like a rose.'" In both images a concrete object, a rose or a particle, is used to illuminate an abstract idea, love or electron.

As Robert Jones (1982: 4) has also pointed out, for the scientist metaphor serves as "an evocation of the inner connection among things." It is interesting and relevant to note that the philosopher of science Fernand Hallyn (1990) identified
lead to further connections and insights. Marcus (2012: 184) writes on this theme
insightfully as follows:
When mathematics is involved in a cognitive modeling process, both analogical and indexical operations are used. But the conict is unavoidable, because the model M of a situation A
should be concomitantly as near as possible to A (to increase the chance of the statements
about M to be relevant for A too), but, on the other hand, M should be as far as possible
from A (to increase the chance of M to can be investigated by some method which is not
compatible with the nature of A). A similar situation occurs with cognitive mathematical
metaphors. Starting as cognitive model or metaphor for a denite, specic situation, M acquires an autonomous status and it is open to become a model or a metaphor for another,
sometimes completely different situation. M may acquire some interpretation, but it can also
abandon it, to acquire another one. No mathematical construction can be constrained to
have a unique interpretation, its semantic freedom is innite, because it belongs to a ctional universe: mathematics. Mathematics has a strong impact on real life and the real
world has a strong impact on mathematics, but all these need a mediation process: the replacement of the real universe by a ctional one.

And, as Shorser (2012: 296) asserts, the embedded metaphorical structure in a mathematical model is, ipso facto, its meaning: "In the absence of sensory data, we perceive mathematical objects through cognitive metaphor, imbuing an abstract mathematical object with meaning derived from physical experience or from other mathematical objects, ultimately linking every chain of metaphors back to concepts that are directly based upon physical perceptions."
Mark Turner (2012) refers to knowledge-making as a packing-vs.-unpacking process. One of the characteristics of mathematical and scientific representation is its tendency to compress information into compact forms, such as the diagrams above. When ideas are represented in this way, their structure becomes evident, and new ideas are possible because of the simplification afforded by the compression and abstraction. As Whiteley (2012: 264) puts it, the sum and substance of mathematical modeling is the packing of ideas into ever more abstract blends:
Mathematical modeling can be viewed as a careful, and rich, double (or multiple) blend of two (or more) significant spaces. In general, modeling involves at least one space that is tangible, accessible to the senses, coming with some associated meaning (semiotics) and a question to be answered! The mental space for this physical problem contains some features and properties that, if projected, will support reasoning in the blend. The mental space also includes a number of features and properties that will, if projected, be distracting. Worse, these irrelevant features may suggest alternative blends that are not generative of solutions to the problem. Selective forgetting has been recognized as a crucial skill in modeling with mathematics, sometimes referred to as a form of abstraction.

When ideas are represented in this way, their structural possibilities become evident in the blend itself, which is a kind of snapshot of hidden or suggestive structure, and new ideas are possible because of this. It is this hidden structure packed into a blend that is often the source of discovery in mathematics. Unpacking it accounts for much of how mathematical cognition unfolds. Progress is thus guided by blending upon blending, and so on, ad infinitum.
One can, actually, describe entire systems in terms of n-order blends. For
example, algebra is a second-order blend from arithmetic. The ancient Egyptians
and Babylonians used a proto-form of algebra, and hundreds of years later, so too
did the Greeks, Chinese, and people of India. Diophantus used what we now call
quadratic equations and symbols for unknown quantities. But between 813 and
833, al-Khwarizmi, a teacher in the mathematical school in Baghdad, wrote an influential book on algebra that came to be used as a textbook. As al-Khwarizmi argued, restoration and completion were symbol-manipulating techniques. As such, they enshrined algebra as a separate and powerful branch of arithmetic. It was then that algebra developed into the equation modeling system that it has become. This happened between the fifteenth and seventeenth centuries when, as Bellos (2010: 123) puts it, mathematical sentences moved from rhetorical to symbolic expression. As Bellos (2010: 124) goes on to write:
Replacing words with letters and symbols was more than convenient shorthand. The symbol x may have started as an abbreviation for unknown quantity, but once invented, it
became a powerful tool for thought. A word or an abbreviation cannot be subjected to mathematical operation in the way that a symbol like x can be. Numbers made counting possible;
but letter symbols took mathematics into a domain far beyond language.
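
To see concretely what the shift from rhetorical to symbolic expression involves, consider al-Khwarizmi's classic worked problem, "a square and ten roots equal thirty-nine," which in modern notation is x² + 10x = 39. The sketch below is an illustration added here rather than part of the original discussion, and the use of Python's sympy library is simply an assumption made for concreteness.

```python
# A minimal sketch (illustrative, not from the text): al-Khwarizmi's classic
# problem "a square and ten roots equal thirty-nine" in modern symbolic form.
# Restoration (al-jabr) and balancing (al-muqabala) become mechanical symbol
# manipulations once the unknown quantity has a symbol of its own.
from sympy import symbols, Eq, solve, expand

x = symbols('x')

# The rhetorical statement becomes a manipulable equation:
equation = Eq(x**2 + 10*x, 39)

# Completing the square by hand: add (10/2)**2 = 25 to both sides.
completed = Eq(expand((x + 5)**2), 39 + 25)
print(completed)            # Eq(x**2 + 10*x + 25, 64), i.e. (x + 5)**2 = 64

# Or let the symbol-manipulating system solve the equation directly.
print(solve(equation, x))   # the roots are 3 and -13; al-Khwarizmi kept only the positive root
```

Once the unknown is a symbol, the same few manipulations apply to any equation of the same form, which is the kind of generality Bellos has in mind.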

Algebra made formulas in science possible, greatly enhancing the power of science to explore reality. As Crilly (2011: 104) observes, the desire to find a formula is a driving force in science and mathematics. Perhaps the world's most famous example of this is Einstein's E = mc², which compresses so much information into itself that it defies common sense even to start explaining why this should be so. The formula, devised in 1905, tells us that the energy (E) into which a given amount of matter can change equals the mass (m) of that matter multiplied by the speed of light squared (c²). Using this equation, scientists determined that the fissioning of 0.45 kilograms of uranium would release as much energy as 7,300 metric tons of TNT. Constructing a formula is, in effect, devising a notation for something.
But in so doing, it also becomes a predictive tool, as is evident in the applications
of Einstein's formula. Science is prophecy of a mathematical kind. Mathematical
formulas predict events that have not occurred; and when some new formula predicts them better or shows that the previous formulas are faulty, then replacement
occurs.
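
As a rough arithmetical check on this claim, and strictly as an added illustration (the value of c, the TNT energy equivalent, and the fission mass-defect fraction used below are assumed standard figures, not numbers taken from the text), the formula can be evaluated directly:

```python
# Illustrative sketch: evaluate E = mc^2 for 0.45 kg of uranium and convert the
# fission yield to TNT equivalents. All constants are assumed reference values.
c = 2.998e8                    # speed of light, m/s
mass = 0.45                    # kg of uranium (about one pound)
joules_per_ton_tnt = 4.184e9   # conventional energy of one metric ton of TNT

total_energy = mass * c**2     # full mass-energy of the sample, E = mc^2
fission_fraction = 0.0009      # assumption: roughly 0.09% of the mass is converted in U-235 fission
released = total_energy * fission_fraction

print(f"Full mass-energy: {total_energy:.2e} J")
print(f"Estimated fission yield: {released:.2e} J, "
      f"about {released / joules_per_ton_tnt:,.0f} metric tons of TNT")
```

With these assumed values the estimate falls in the same range (thousands of metric tons) as the 7,300-ton figure cited above; the exact tonnage depends on the conversion fraction one assumes, but the point stands that a one-line formula compresses an enormous amount of physical information.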

5.4 Concluding remarks


The work and debate on the neuroscientific aspects of mathematics initiated by several key works in the late 1990s and early 2000s have brought a wave of experimental seriousness to the question of what mathematics is. But in the end neuroscientific theories are essentially metaphors themselves, as we have argued here and basically throughout this book. They are useful, insightful, and certainly very interesting; but they cannot really explain mathematics in its totality. As in atomic modeling, they can only offer glimpses with n-order blending (a blend of a blend of a blend ...). Gödel made it obvious to mathematicians that mathematics
was made by them, and that the exploration of mathematical truth would go on
forever as long as humans were around. Like other products of the imagination,
mathematics lies within the minds of humans. In effect, mathematics is itself an
attempt to unpack reality.
The main objective of this foray into the common ground that language and mathematics share has been to illustrate how mathematics has been used by linguists to develop models, to refine certain techniques, to develop insights into language, and to investigate the common neural mechanisms involved in generating language and mathematics.
Glottochronology introduced various useful notions into linguistics, such as that of core vocabulary and time depth. It provided a quantitative basis to test and evaluate theories of language change, theories that can be discussed in much more concrete mathematical ways than in purely speculative or inferential ones. Corpus linguistics and statistical analyses of data have injected a critical empirical element into the conduct of linguistic inquiry, bringing linguistics closer in methodology to other social and cognitive sciences. The use of mathematics as a metalanguage in formal grammars has also been a very useful trend in linguistics, since it has raised many questions about the nature of meaning and its relation to grammar, questions tackled directly by computational linguistics. The latter has, actually, brought linguistics into the age of the Internet, since its main research agenda is shaped by finding ways to reproduce natural language in digital forms. But there is one aspect of language that cannot be described so easily: meaning. One way around the problem of meaning is to relegate it to fields outside of linguistics, such as philosophy and semiotics. Another way is to formalize it in terms of algorithmic models. But whatever one does, it remains elusive. Perhaps meaning is something that is scientifically intractable.
In this book, I have attempted to cover some of the more salient areas of the common ground, especially those that have benefited from cross-fertilization. My sense is that the interdisciplinary paradigm is gradually becoming an intrinsic part of both disciplines and that what has been called a hermeneutic approach

is starting to yield significant insights into the question of what math and language are. Linguistics is, in fact, highly flexible as a science, both theoretically and methodologically. Together with traditional forms of fieldwork and ethnographic analysis, the use of mathematics can help the linguist gain insights into language and discourse that would be otherwise unavailable (as we have seen throughout this book). That, in my view, is the most important lesson to be learned from considering the math-language nexus. The more we probe similarities (or differences) in mathematics and language with all kinds of tools, the more we will know about the mind that creates both. That, as Arika Okrent (2009) aptly puts it, should always be the fundamental goal of linguistics (and math cognition, for that matter): "The job of the linguist, like that of the biologist or the botanist, is not to tell us how nature should behave, or what its creations should look like, but to describe those creations in all their messy glory and try to figure out what they can teach us about life, the world, and, especially in the case of linguistics, the workings of the human mind."

Bibliography
Adam, John A. 2004. Mathematics in nature: Modeling patterns in the natural world. Princeton,
NJ: Princeton University Press.
Al-Khalili, Jim. 2012. Paradox: The nine greatest enigmas in physics. New York, NY: Broadway.
Alexander, James. 2012. On the cognitive and semiotic structure of mathematics. In Mariana
Bockarova, Marcel Danesi, and Rafael Núñez (eds.), Semiotic and cognitive science essays
on the nature of mathematics, 134. Munich: Lincom Europa.
Allan, Keith. 1986. Linguistic meaning. New York, NY: Routledge.
Allwein, Gerard and Barwise, Jon (eds.) 1996. Logical reasoning with diagrams. Oxford: Oxford
University Press.
Alpher, Barry. 1987. Feminine as the unmarked grammatical gender: Buffalo girls are no fools.
Australian Journal of Linguistics 7. 169187.
Ambrose, Rebecca C. 2002. Are we overemphasizing manipulatives in the primary grades to the
detriment of girls? Teaching Children Mathematics 9: 1621.
Andersen, Henning. 1989. Markedness theory: The rst 150 Years. In Olga M. Tomic (ed.),
Markedness in synchrony and diachrony, 1116. Berlin: Mouton de Gruyter.
Andersen, Henning. 2001. Markedness and the theory of linguistic change. In Henning Andersen (ed.), Actualization, 1957. Amsterdam: John Benjamins.
Andersen, Henning. 2008. Naturalness and markedness. In: K. Wellems and L. De Cuypere
(eds.), Naturalness and iconicity in language, 101119. Amsterdam: John Benjamins.
Andersen, Peter B. 1991. A theory of computer semiotics. Cambridge: Cambridge University
Press.
Anderson, Myrdene, Senz-Ludlow, Adalira, and Cifarelli, Victor (eds.). 2003. Educational perspectives on mathematics as semiosis: From thinking to interpreting to knowing. Ottawa:
Legas Press.
Anderson, Myrdene, Senz-Ludlow, Adalira, and Cifarelli, Victor. 2000. Musement in mathematical manipulation. In Adrian Gimate-Welsh (ed.), Ensayos semiticos, 663676.
Mexico: Porra.
Andrews, Edna and Tobin, Yishai (eds.). 1996. Toward a calculus of meaning: Studies in
markedness, distinctive features and deixis. Amsterdam: John Benjamins.
Andrews, Edna. 1990. Markedness theory. Durham, NC: Duke University Press.
Andrews, Edna. 2003. Conversations with Lotman: Cultural semiotics in language, literature,
and cognition. Toronto: University of Toronto Press.
Anfindsen, Jens. 2006. Aristotle on contrariety as a principle of first philosophy. Uppsala: Uppsala University Thesis.
Appel, Kenneth and Haken, Wolfgang. 1986. The Four Color Proof suffices. The Mathematical
Intelligencer 8: 1020.
Appel, Kenneth and Haken, Wolfgang. 2002. The Four-Color Problem. In: D. Jacquette (ed.),
Philosophy of mathematics, 193208, Oxford: Blackwell.
Ardila A. and Rosselli M. 2002. Acalculia and dyscalculia. Neuropsychology Review 12: 179
231.
Aristotle. (1952a). Rhetoric. In The works of Aristotle, Vol. 11, W. D. Ross (ed.). Oxford: Clarendon
Press.
Aristotle. (1952b). Poetics. In The works of Aristotle, Vol. 11, W. D. Ross (ed.). Oxford: Clarendon
Press.

Aristotle. 2012. The Organon, trans. R. B. Jones, E. M. Edghill, and A. J. Jenkinson. CreateSpace
Independent Publishing Platform.
Arndt, Walter W. 1959. The performance of glottochronology in Germanic. Language 35. 180
192.
Arnheim, Rudolf. 1969. Visual thinking. Berkeley, CA: University of California Press.
Arranz, Jos I. P. 2005. Towards a global view of the transfer phenomenon. The Reading Matrix
5. 116128.
Ascher, Marcia. 1991. Ethnomathematics: A multicultural view of mathematical ideas. Pacic
Grove, CA: Brooks/Cole.
Association of Teachers of Mathematics. 1980. Language and mathematics. Washington: Association of Teachers of Mathematics.
Aubry, Mathieu. 2009. Metaphors in mathematics: Introduction and the case of algebraic geometry. Social Science Research Network. Available at SSRN: http://ssrn.com/abstract=
1478871 or http://dx.doi.org/10.2139/ssrn.1478871
Babin, Arthur E. 1940. The theory of opposition in Aristotle. Notre Dame, IN: Notre Dame Doctoral Thesis.
Bach, Emmon W. 1989. Informal lectures on formal semantics. Albany, NY: SUNY Press.
Bck, Alan. 2000. Aristotles theory of predication. Leiden: Brill.
Bacon, Roger. 2009. The art and science of logic, trans. Thomas S. Maloney. Toronto: PIMS.
Ball, Deborah and Bass, Hyman (2002). Toward a Practiced-Based Theory of Mathematical
Knowledge for Teaching. In: Elaine Simmt and Brent David, eds., Proceedings of the 2002
Annual Canadian Mathematics Education Study Group/Groupe Canadien dtude en Didactique des Mathematiques, 2327. Sherbrooke, Canada.
Ball, Keith M. 2003. Strange curves, counting rabbits, and other mathematical Explorations.
Princeton, NJ: Princeton University Press.
BarHillel, Yehoshua. 1953. A quasi arithmetical notation for syntactic description. Language
29. 4758.
BarHillel, Yehoshua. 1960. The present status of automatic translation of languages. Advances in Computers 1. 91163.
Barbaresi, Lavinia M. 1988. Markedness in English discourse: A semiotic approach. Parma:
Edizioni Zara.
Barker-Plummer, Dave and Bailin, Sydney C. 1997. The role of diagrams in mathematical proofs.
Machine Graphics and Vision 8: 2558.
Barker-Plummer, Dave and Bailin, Sydney C. 2001. On the practical semantics of mathematical
diagrams. In: M. Anderson (ed.), Reasoning with diagrammatic representations. New York,
NY: Springer.
Barrett, William. 1986. The death of the soul: From Descartes to the computer. New York, NY:
Anchor.
Barrow, John D. 2014. 100 essential things you didnt know about maths & the arts. London:
Bodley Head.
Barthes, Roland. 1964. Elements of semiology. London: Cape.
Barthes, Roland. 1967. Systme de la mode. Paris: Seuil.
Barwise, Jon and Etchemendy, John 1994. Hyperproof. Stanford, CA: CSLI Publications.
Barwise, Jon and Etchemendy, John. 1986. The liar. Oxford: Oxford University Press.
Bateson, Gregory. 1972. Steps to an ecology of mind. New York, NY: Ballantine.
Battistella, Edwin L. 1990. Markedness: The evaluative superstructure of language. Albany, NY:
State University of New York Press.

Battistella, Edwin L. 1996. The logic of markedness. Oxford: Oxford University Press.
Baudouin de Courtenay, Jan. 1894 [1972]. A Baudouin de Courtenay anthology: The beginnings
of structural linguistics, ed. and trans. Edward Stankiewicz. Bloomington, IN: Indiana
University Press.
Beckmann, Petr. 1981. A history of π. New York, NY: St. Martin's.
Belardi, Walter. 1970. Lopposizione privativa. Napoli: Istituto Universitario Orientale di Napoli.
Bellos, Alex. 2010. Here's looking at Euclid: A surprising excursion through the astonishing
world of math. Princeton, NJ: Princeton University Press.
Bellos, Alex. 2014. The grapes of math: How life reects numbers and numbers reect life. New
York, NY: Doubleday.
Belsey, Catherine. 2002. Poststructuralism: A very short introduction. Oxford: Oxford University
Press.
Benford, Frank. 1938. The law of anomalous numbers. Proceedings of the American Philosophical Society 78: 551572.
Benjamin, Arthur, Chartrand, Gary, and Zhang, Ping. 2015. The fascinating world of graph theory. Princeton, NJ: Princeton University Press.
Benthem, Johann van and Ter Meulen, Alice (eds.). 2010. Handbook of logic and language,
2nd ed. Oxford: Elsevier.
Benveniste, Emile. 1946. Structure des relations de personne dans le verbe. Bulletin de la Société de Linguistique de Paris 43. 225236.
Bergen, Benjamin K. 2001. Nativization processes in L1 Esperanto. Journal of Child Language
28. 575595.
Bergin, Thomas G. and Max H. Fisch. 1984. The New Science of Giambattista Vico, 2nd ed.
Ithaca, NY: Cornell University Press.
Bergsland, Knut and Vogt, Hans. 1962. On the validity of glottochronology. Current Anthropology 3. 115153.
Berlinski, David. 2013. The king of innite space: Euclid and his elements. New York, NY: Basic
Books.
Bernacer, Javier and Murillo, Jos Ignacio. 2014. The Aristotelian conception of habit and its
contribution to human neuroscience. Frontiers in Human Neuroscience 8: 883.
Bernstein, Basil. 1971. Class, codes and control: Theoretical studies towards a sociology of
language. London: Routledge.
Bickerton, Derek. 2014. More than nature needs: Language, mind, and evolution. Cambridge,
MA: Harvard University Press.
Billeter, Jean Franois. 1990. The Chinese art of writing. New York, NY: Rizzoli.
Billow, R. M. 1975. A cognitive developmental study of metaphor comprehension. Developmental Psychology 11: 415423.
Black, Max. 1962. Models and metaphors. Ithaca, NY: Cornell University Press.
Blanch, Robert. 1966. Structures intellectuelles. Paris: Vrin.
Blatner, David. 1997. The joy of pi. Harmondsworth: Penguin.
Bloomeld, Leonard. 1933. Language. New York, NY: Holt.
Boas, Franz. 1940. Race, language, and culture. New York, NY: Free Press.
Bocheński, Innocentius M. J. 1961. A history of formal logic. Notre Dame, IN: University of Notre
Dame Press.
Bockarova, Mariana, Marcel Danesi and Rafael Núñez (eds.). 2012. Semiotic and cognitive science essays on the nature of mathematics. Munich: Lincom Europa.

Bod, Rens, Hay, Jennifer and Jannedy, Stefanie. 2003. Probabilistic linguistics. Cambridge:
MIT Press.
Bogoslovksy, Boris B. 1928. The technique of controversy. London: Paul, Trench and Teubner.
Bolinger, Dwight. 1968. Aspects of language. New York, NY: Harcourt, Brace, Jovanovich.
Boole, George. 1854. An investigation of the laws of thought. New York, NY: Dover.
Booth, Andrew D. 1955. Use of a computing machine as a mechanical dictionary. Nature 176.
565.
Booth, Andrew D. and Locke, William N.. 1955. Historical introduction. In W. N. Locke and A. D.
Booth (eds.), Machine translation of languages, 114. New York, NY: John Wiley.
Borel, Émile. 1909. Le continu mathématique et le continu physique. Rivista di Scienza 6: 2135.
Bottini, Gabriella, Corcoran, Rhiannon, Sterzi, Roberto, Paulesu, Eraldo, Schenone, Pietro,
Scarpa, Pina, Frackowiak, Richard S. J., and Frith, Christopher D. 1994. The role of the right
hemisphere in the interpretation of figurative aspects of language: A positron emission
tomography activation study. Brain 117: 12411253.
Brainerd, Barron. 1970. A stochastic process related to language change. Journal of Applied
Probability 7. 6978.
Bronowski, Jacob. 1973. The ascent of man. Boston, MA: Little, Brown, and Co.
Bronowski, Jacob. 1977. A sense of the future. Cambridge, MA: MIT Press.
Brown, Roger. 1958. Words and things: An introduction to language. New York, NY: The Free
Press.
Brown, Roger. 1986. Social psychology. New York, NY: Free Press.
Brownell, Hiram H. 1988. Appreciation of metaphoric and connotative word meaning by brain-damaged patients. In: Christine Chiarello (ed.), Right hemisphere contributions to lexical
semantics, 1932. New York, NY: Springer.
Brownell, Hiram H., Heather H. Potter and Diane Michelow. 1984. Sensitivity to lexical denotation and connotation in braindamaged patients: A double dissociation? Brain and
Language 22. 253265.
Bruno, Giuseppe, Genovese, Andrea, and Improta, Gennaro. 2013. Routing problems: A historical perspective. In: Mircea Pitici (ed.), The best writing in mathematics 2012. Princeton, NJ:
Princeton University Press.
Bryant, Edwin. 2001. The quest for the origins of Vedic culture. Oxford: Oxford University Press.
Buckland, William. 2007. Forensic semiotics. Semiotic Review of Books 10. 916.
Bühler, Karl. 1908 [1951]. On thought connection. In D. Rapaport (ed.), Organization and
pathology of thought, 8192. New York, NY: Columbia University Press.
Bühler, Karl. 1934. Sprachtheorie: Die Darstellungsfunktion der Sprache. Jena: Fischer.
Burke, John and Kincannon, Eric. 1991. Benford's law and physical constants: The distribution
of initial digits. American Journal of Physics 14. 5963.
Burr, David and Ross, John. 2008. A visual sense of number. Current Biology 18: 425428.
Butterworth, Brian. 1999. What counts: How every brain is hardwired for math. Michigan: Free
Press.
Butterworth, Brian. 2010. Foundational numerical capacities and the origins of dyscalculia.
Trends in Cognitive Science 14: 534541.
Butterworth Brian, Varma Sashank, and Laurillard Diana. 2011. Dyscalculia: From brain to education. Science 332: 10491053.
Bybee, Joan. 2006. Frequency of use and organization of language. Oxford: Oxford University
Press.

Callaghan, Catherine A. 1991. Utian and the Swadesh list. In J. E. Redden (ed.), Papers for the
American Indian language conference, held at the University of California, Santa Cruz, July
and August, 1991, 218237. Carbondale, IL: Department of Linguistics, Southern Illinois
University.
Calude, Cristian and Paun, Gheorghe. 1981. The absence of contextual ambiguities in programming languages. Revue Roumaine de Linguistique: Cahiers de Linguistique Thorique et
Applique 18. 91110.
Calude, Cristian. 1976. Quelques arguments pour le caractre nonformel des langages de
programmation. Revue Roumaine de Linguistique: Cahiers de Linguistique Thorique et
Applique 13. 257264.
Cameron, Angus. 2011. Ground zeroThe semiotics of the boundary line. Social Semiotics 21.
417434.
Cann, Ronnie. 1993. Formal semantics: An introduction. Cambridge: Cambridge University
Press.
Cantor, Georg. 1874. Über eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen.
Journal für die Reine und Angewandte Mathematik 77. 258262.
Cappelletti, Marinella, Butterworth, Brian, and Kopelman, Michael. 2006. The understanding
of quantifiers in semantic dementia: A single-case study. Neurocase: The Neural Basis of
Cognition 12. 136145.
Cardano, Girolamo. 1663 [1961]. The book on games of chance (Liber de ludo aleae). New York:
Holt, Rinehart, and Winston.
Carroll, Lewis 1879 [2004]. Euclid and his modern rivals. New York, NY: Dover.
Carroll, Lewis. 1887. The game of logic. New York, NY: Dover.
Cartmill, Matt, Pilbeam, David, and Isaac, Glynn. 1986. One hundred years of paleoanthropology. American Scientist 74: 410420.
Cassirer, Ernst. 1944. An essay on man. New Haven, CT: Yale University Press.
Chaitin, Gregory J. 2006. Meta math. New York, NY: Vintage.
Chandrasekaran, B., Glasgow, Janice, and Narayanan, N. Hari (eds.) 1995. Diagrammatic reasoning: Cognitive and computational perspectives. Cambridge, MA: MIT Press.
Changeux, Pierre, 2013. The good, the true, and the beautiful: A neuronal approach. New
Haven, CT: Yale University Press.
Chartier, Tim. 2014. Math bytes. Princeton, NJ: Princeton University Press.
Cherry, Colin. 1957. On human communication. Cambridge, MA: MIT Press.
Cho, Yang S. and Proctor, Robert W. 2007. When is an odd number not odd? Influence of task
rule on the MARC effect for numeric classification. Journal of Experimental Psychology,
Learning, Memory, and Cognition 33. 832842.
Chomsky, Noam and Halle, Morris. 1968. The sound pattern of English. New York, NY: Harper
and Row.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1966a. Cartesian linguistics: A chapter in the history of rationalist thought.
New York, NY: Harper and Row.
Chomsky, Noam. 1966b. Topics in the theory of generative grammar. The Hague: Mouton.
Chomsky, Noam. 1975. Reections on language. New York, NY: Pantheon.
Chomsky, Noam. 1982. Some concepts and consequences of the theory of government and
binding. Cambridge, MA: MIT Press.

Chomsky, Noam. 1986. Knowledge of language: Its nature, origin, and use. New York, NY:
Praeger.
Chomsky, Noam. 1990. Language and mind. In D. H. Mellor (ed.), Ways of communicating, 56
80. Cambridge: Cambridge University Press.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Chomsky, Noam. 2000. New horizons in the study of language and mind. Cambridge: Cambridge University Press.
Chomsky, Noam. 2002. On nature and language. Cambridge: Cambridge University Press.
Chretien, Douglas. 1962. The mathematical models of glottochronology. Language 38. 1137.
Church, Alonzo. 1935. Abstract No. 204. Bulletin of the American Mathematical Society 41: 332
333
Church, Alonzo. 1936. An unsolvable problem of elementary number theory. American Journal of
Mathematics 58: 345363.
Cienki, Alan, Luka, Barbara J., and Smith, Michael B. (eds.). 2001. Conceptual and discourse
factors in linguistic structure. Stanford, CA: Center for the Study of Language and Information.
Clark, Michael. 2007. Paradoxes from A to Z. London: Routledge.
Clawson, Calvin C. 1999. Mathematical sorcery: Revealing the secrets of numbers. Cambridge,
MA: Perseus.
Clivio, Gianrenzo P., Danesi, Marcel and Maida-Nicol, Sara. 2011. Introduction to Italian dialectology. Munich: Lincom Europa
Cobham, Alan. 1965. The intrinsic computational difficulty of functions. Proceedings of Logic,
Methodology, and Philosophy of Science II, North Holland.
Cole, K. C. 1984. Sympathetic vibrations. New York, NY: Bantam.
Collins, Joan M. 1969. An exploration of the role of opposition in cognitive processes of kindergarten children. Ontario Institute for Studies in Education Theory.
Colyvan, Mark. 2012. An introduction to the philosophy of mathematics. Cambridge: Cambridge
University Press.
Connor, K. and Kogan, N. 1980. Topic-vehicle relations in metaphor: The issue of a symmetry.
In: R. P. Honeck and R. R. Hoffman (eds.), Cognition and gurative language, 238308.
Hillsdale, NJ: Lawrence Erlbaum Associates.
Cook, Stephen. 1971. The complexity of theorem proving procedures. Proceedings of the Third
Annual ACM Symposium on Theory of Computing. pp. 151158.
Cook, Walter A. 1969. Introduction to tagmemic analysis. New York, NY: Holt, Rinehart and
Winston.
Cook, William J. 2014. In pursuit of the traveling salesman problem. Princeton, NJ: Princeton
University Press.
Coseriu, Eugenio. 1973. Probleme der strukturellen Semantik. Tübingen: Tübinger Beiträge zur
Linguistik 40.
Coughlin, Deborah A. 2003. Correlating automated and human assessments of machine translation quality. In MT Summit IX, New Orleans, USA 2327.
Courant, Richard and Robbins, Herbert (1941). What is mathematics? An elementary approach
to ideas and methods. Oxford: Oxford University Press.
Craik, Kenneth. 1943. The nature of explanation. Cambridge: Cambridge University Press.
Crilly, Tony. 2011. Mathematics. London: Quercus.
Cruse, D. Alan. 1986. Lexical semantics, Cambridge, Eng.: Cambridge University Press.

Crystal, David. 2006. Language and the Internet. 2nd ed. Cambridge: Cambridge University
Press.
Crystal, David. 2008. txtng: the gr8 db8. Oxford: Oxford University Press.
Cummins, Robert. 1996. Representations, targets, and attitudes. Cambridge, MA: MIT Press.
Currie, Thomas E., Meade, Andrew, Guillon, Myrtille, and Mace, Ruth. 2013. Cultural phylogeography of the Bantu languages of SubSaharan Africa. Royal Society Publishing.
http://royalsocietypublishing.org/content/280/1762/20130695.
Dalrymple, Mary (ed.). 1999. Semantics and syntax in lexical functional grammar: The resource
logic approach. Cambridge, MA: MIT Press.
Dalrymple, Mary, Lamping, John, and Saraswat, Vijay. 1993. LFG semantics via constraints.
In Proceedings of the Sixth Meeting of the European ACL (97105). Utrecht: University of
Utrecht.
Dalrymple, Mary. 2001. Lexical functional grammar, No. 42 in Syntax and Semantics Series.
New York, NY: Academic Press.
Damasio, Antonio R. 1994. Descartes error: Emotion, reason, and the human brain. New York:
G. P. Putnams Sons.
Danesi, Marcel and Bockarova, Mariana. 2013. Mathematics as a modeling system. Tartu: University of Tartu Press.
Danesi, Marcel and Rocci, Andrea. 2009. Global linguistics: An introduction. Berlin: Mouton
de Gruyter.
Danesi, Marcel. 1987. Formal mothertongue training and the learning of mathematics in
elementary school: An observational note on the Brussels Foyer Project. Scientia Paedogogica Experimentalis 24: 313320.
Danesi, Marcel. 1998. Gender assignment, markedness, and indexicality: Results of a pilot
project. Semiotica 121: 213240.
Danesi, Marcel. 2000. Semiotics in language education. Berlin: Mouton de Gruyter.
Danesi, Marcel. 2001. Layering processes in metaphorization. International Journal of Computing Anticipatory Systems 8: 157173.
Danesi, Marcel. 2002. The puzzle instinct: The meaning of puzzles in everyday life. Bloomington, IN: Indiana University Press.
Danesi, Marcel. 2003. Second language teaching: A view from the right side of the Brain. Dordrecht: Kluwer Academic Publishers.
Danesi, Marcel. 2004a. Poetic logic: The role of metaphor in thought, language, and culture.
Madison, WI: Atwood Publishing.
Danesi, Marcel. 2004b. The liar paradox and the Towers of Hanoi: The ten greatest math puzzles
of all time. Hoboken, NJ: John Wiley.
Danesi, Marcel. 2006. Alphabets and the principle of least effort. Studies in Communication
Sciences 6. 4762.
Danesi, Marcel. 2007. The quest for meaning: A guide to semiotic theory and practice. Toronto:
University of Toronto Press.
Danesi, Marcel. 2008. ProblemSolving in mathematics: A semiotic perspective for educators
and teachers. New York, NY: Peter Lang.
Danesi, Marcel. 2011. George Lakoff on the cognitive and neural foundation of mathematics.
Fields Notes 11 (3). 1420.
Danesi, Marcel. 2013. Discovery in mathematics: An interdisciplinary perspective. Munich:
Lincom Europa.

Danly, M. and Shapiro, B. 1982. Speech prosody in Brocas aphasia. Brain and Language 16:
171190.
Davies, W. Vivien. 1988. Egyptian hieroglyphs. Berkeley: University of California Press.
Davis, Philip J. and Hersh, Reuben. 1986. Descartes' dream: The world according to mathematics.
Boston, MA: Houghton Mifflin.
Dawkins, Richard. 1976. The selsh gene. Oxford: Oxford University Press.
Dawkins, Richard. 1985. River out of Eden: A Darwinian view of life. New York, NY: Basic.
Dawkins, Richard. 1987. The blind watchmaker. Harlow: Longmans
Dawkins, Richard. 1998. Unweaving the rainbow: Science, delusion and the appetite for wonder. Boston, MA: Houghton Mifflin.
De Morgan, Augustus. 1847. Formal logic or the calculus of inference. London: Taylor and Walton.
De Souza, Clarisse S. 2005. The semiotic engineering of humancomputer interaction. Cambridge, MA: MIT Press.
Dehaene, Stanislas. 1997. The number sense: How the mind creates mathematics. Oxford: Oxford University Press.
Dehaene, Stanislas. 2004. Arithmetic and the brain. Current Opinion in Neurobiology 14: 218
224.
Dehaene, Stanislas., Piazza, Manuela, Pinel, Philippe, and Cohen, Laurent. 2003. Three parietal circuits for number processing. Cognitive Neuropsychology 20: 487506.
Denoual, Etienne and Lepage, Yves. 2005. BLEU in characters: Towards automatic MT evaluation in languages without word delimiters. Companion Volume to the Proceedings of the
Second International Joint conference on Natural Language Processing 8186.
Derbyshire, J. 2004. Prime obsession: Bernhard Riemann and his greatest unsolved problem in
mathematics. Washington, DC: Joseph Henry Press.
Derrida, Jacques. 1967. De la grammatologie. Paris: Minuit.
Descartes, René. 1637 [1996]. La géométrie. Paris: Presses Universitaires de France.
Descartes, René. 1641 [1986]. Meditations on first philosophy with selections from the objections and replies. Cambridge: Cambridge University Press.
Devlin, Keith J. 2000. The math gene: How mathematical thinking evolved and why numbers are
like gossip. New York, NY: Basic.
Devlin, Keith. 2005. The math instinct. New York, NY: Thunders Mouth Press.
Devlin, Keith. 2011. The man of numbers: Fibonaccis arithmetic revolution. New York, NY:
Walker and Company.
Dewdney, Andrew K. 1999. A mathematical mystery tour: Discovering the truth and beauty of
the cosmos. New York, NY: John Wiley and Sons.
Diamantaras, Konstantinos, Duch, Wlodek and Iliadis, Lazaros S. (eds.). 2010. Articial Neural
Networks ICANN 2010: 20th International Conference. New York, NY: Springer.
Diaz, Michele T., Barrett, Kyle M., and Hogstrom, Larson J. 2011. The influence of sentence novelty and figurativeness on brain activity. Neuropsychologia 49: 320330.
Dirven, Ren and Verspoor, Marjolijn. 2004. Cognitive exploration of language and linguistics.
Amsterdam: John Benjamins.
Dobson, Annette and Black, Paul. 1979. Multidimensional scaling of some lexicostatistical
data. Mathematical Scientist 1979/4, 5561.
Dobson, Annette. 1969. Lexicostatistical grouping. Anthropological Linguistics 7, 216221.

Doddington, George. 2002. Automatic evaluation of machine translation quality using ngram
cooccurrence statistics. Proceedings of the human language Technology conference (HLT),
San Diego, CA 128132.
Dormehl, Luke. 2014. The formula. New York, NY: Perigree.
Driver, Godfrey R. 1976. Semitic writing: From pictograph to alphabet. Oxford: Oxford University
Press.
Du Sautoy, M. 2004. The music of the primes: Bernhard Riemann and the greatest unsolved
problem in mathematics. New York, NY: HarperCollins.
Dyen Isidore. 1975. Linguistic subgrouping and lexicostatistics. The Hague, Mouton.
Dyen, Isidore (ed.). 1973. Lexicostatistics in genetic linguistics: Proceedings of the Yale conference, April 34, 1971. The Hague: Mouton.
Dyen, Isidore, James, A. T. and Cole, J. 1967. Language divergence and estimated word retention
rate. Language 43. 150171.
Dyen, Isidore, Kruskal, Joseph, and Black, Paul. 1992. An Indo-European classification, a lexicostatistical experiment. Transactions of the American Philosophical Society 82/5.
Dyen, Isidore. 1963. Lexicostatistically determined borrowing and taboo. Language 39, 6066.
Dyen, Isidore. 1965. A lexicostatistical classication of the Austronesian languages. International Journal of American Linguistics, Memoir 19.
Eckman, Fred R., Moravcsik, Edith A., and Wirth, Jessica R. (eds.). 1983. Markedness. New York,
NY: Plenum.
Eco, Umberto. 1984. Semiotics and the philosophy of language. Bloomington, IN: Indiana University Press.
Eco, Umberto. 1992. Interpretation and overinterpretation. Cambridge: Cambridge University
Press, 1992).
Eco, Umberto. 1998. Serendipities: Language and lunacy, translated by William Weaver. New
York, NY: Columbia University Press.
Elk, Victor and Matras, Yaron. 2006. Markedness and language change: The Romani sample.
Berlin: Mouton de Gruyter.
Elwes, Richard. 2014. Mathematics 1001. Buffalo, NY: Firefly.
Embleton, Sheila M. 1986. Statistics in historical linguistics. Bochum: Brockmeyer.
English, Lyn D. (ed.). 1997. Mathematical reasoning: Analogies, metaphors, and images. Mahwah, NJ: Lawrence Erlbaum Associates.
Erdős, Paul. 1934. A theorem of Sylvester and Schur. Journal of the London Mathematical Society 9: 282288.
Ernest, Paul. 2010. Mathematics and metaphor: A response to Elizabeth Mowat & Brent Davis.
Complicity: An International Journal of Complexity and Education 7: 98104
Euclid (1956). The thirteen books of Euclids elements, 3 volumes. New York, NY: Dover.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian. 2000. Statistical distributions. New
York, NY: John Wiley.
Everett, Daniel. 2005. Cultural constraints on grammar and cognition in Pirahã. Current Anthropology 46. 621624.
Eymard, Pierre, Lafon, Jean-Pierre, and Wilson, Stephen S. 2004. The number pi. New York, NY:
American Mathematical Society.
Fan-Pei, Gloria Yanga et al. 2013. Contextual effects on conceptual blending in metaphors: An
event-related potential study. Journal of Neurolinguistics 26: 312326.
Fauconnier, Gilles and Turner, Mark. 2002. The way we think: Conceptual blending and the
minds hidden complexities. New York, NY: Basic.

Feldman, Jerome. 2006. From molecule to metaphor: A neural theory of language. Cambridge,
MA: MIT Press.
Ferrer i Cancho, Ramon and Solé, Ricard V. 2001. Two regimes in the frequency of words and the
origins of complex lexicons: Zipf's law revisited. Journal of Quantitative Linguistics 2001,
8, 165231.
Ferrer i Cancho, Ramon, Riordan, Oliver, and Bollobás, Béla. 2005. The consequences of Zipf's
law for syntax and symbolic reference. Proceedings of the Royal Society of London, Series B, Biological Sciences, 2005, 15. Royal Society of London.
Ferrer i Cancho, Ramon. 2005. The variation of Zipf's law in human language. European Physical Journal 2005, 44, 24957.
Ferrero, Guillaume. 1894. L'inertie mentale et la loi du moindre effort. Revue Philosophique de
la France et de l'Étranger 37. 169182.
Fillmore, Charles J. 1968. The case for case. In E. Bach and R. T. Harms (eds.), Universals in
linguistic theory. London: Holt, Rinehart and Winston.
Findler, Nicholas V. and Viil, Heino. 1964. A few steps toward computer lexicometry. American
Journal of Computational Linguistics. 179.
Fischer, John L. 1958. Social influences in the choice of a linguistic variant. Word 14. 4757.
Fleming, Harold C. 1973. Subclassification in Hamito-Semitic. In Isidore Dyen (ed.), Lexicostatistics in genetic linguistics, 8588. The Hague: Mouton.
Flood, R. and Wilson, R. 2011. The great mathematicians: Unravelling the mysteries of the universe. London: Arcturus.
Fodor, Jerry A. 1975. The language of thought. New York, NY: Crowell.
Fodor, Jerry A. 1983. The modularity of mind. Cambridge, MA: MIT Press.
Fodor, Jerry A. 1987. Psychosemantics: The problem of meaning in the philosophy of mind.
Cambridge, MA: MIT Press.
Fortnow, Lance. 2013. The golden ticket: P, NP, and the search for the impossible. Princeton, NJ:
Princeton University Press.
Foster, Donald. 2001. Author unknown: Tales of a literary detective. New York, NY: Holt.
Foucault, Michel. 1972. The archeology of knowledge, trans. by A. M. Sheridan Smith. New York,
NY: Pantheon.
Fox, Anthony. 1995. Linguistic reconstruction: An introduction to theory and method. Oxford:
Oxford University Press.
Fox, James J. 1974. Our ancestors spoke in pairs: Rotinese views of language, dialect and code.
In R. Bauman and J. Scherzer (eds.), Explorations in the ethnography of speaking, 6588.
Cambridge: Cambridge University Press.
Fox, James J. 1975. On binary categories and primary symbols. In R. Willis (ed.), The interpretation of symbolism, 99132. London: Malaby.
Frege, Gottlob. 1879. Begriffsschrift eine der arithmetischen nachgebildete Formelsprache des
reinen Denkens. Halle: Nebert.
Freiberger, Marianne and Thomas, Rachel. 2015. Numericon: A journey through the hidden lives
of numbers. New York, NY: Quercus.
Friedman, Thomas L. 2007. The world is flat: A brief history of the twenty-first century. New York:
Picador.
Gabelentz, Georg von der. 1901. Die Sprachwissenschaft; ihre Aufgaben, Methoden und bisherigen Ergebnisse. Leipzig: C. H. Tauchnitz.
Galilei, Galileo. 1638 [2001]. Dialogue concerning the two chief world systems, trans. by Stillman Drake. New York, NY: Modern Library.

Gamkrelidze, Thomas V. and Ivanov, Vjaeslav V.. 1990. The early history of Indo-European
languages. Scientific American 262. 110116.
Ganesalingam, Mohan and Herbelot, Aurelie. 2006. Composing distributions: mathematical
structures and their linguistic interpretation. Computational Linguistics 1. 131.
Gardner, Howard. 1985. The minds new science: A history of the cognitive revolution. New York,
NY: Basic Books.
Gardner, Martin. 1961. The 2nd Scientific American book of mathematical puzzles. New York,
NY: Simon and Schuster.
Garnham, Alan. 1991. The mind in action: A personal view of cognitive science. London: Routledge.
Geeraerts, Dirk (ed.). 2006. Cognitive linguistics. Berlin: Mouton de Gruyter.
Gessen, Masha. 2009. Perfect rigor: A genius and the mathematical breakthrough of the century. Boston, MA: Houghton Mifflin Harcourt.
Ghyka, Matila. 1977. The geometry of art and life. New York, NY: Dover.
Gibbs, Raymond W. 1994. The poetics of mind: Figurative thought, language, and understanding. Cambridge: Cambridge University Press.
Gillings, Richard J. 1972. Mathematics in the time of the pharaohs. Cambridge, MA: MIT Press.
Gleason, Henry L., Jr. 1959. Counting and calculating for historical reconstruction. Anthropological Linguistics 2. 2232.
Gödel, Kurt. 1931. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, Teil I. Monatshefte für Mathematik und Physik 38: 173189.
Godel, Robert. 1957. Les sources manuscrites du Cours de linguistique générale de F. de
Saussure. Paris: Minard.
Godino, Juan D., Font, Vicenc, Wilhelmi, Miguel R., and Lurduy, Orlando. 2011. Why is the learning of elementary arithmetic concepts difficult? Semiotic tools for understanding the
nature of mathematical objects. Educational Studies in Mathematics 77: 247265.
Goetzfridt, Nicholas J. 2008. Pacific ethnomathematics: A bibliographic study. Honolulu, HI:
University of Hawaii Press.
Goldberg, Elkhonon and Costa, Louis D. 1981. Hemispheric differences in the acquisition of
descriptive systems. Brain and Language 14: 144173.
Gordon, Alison F. and Chris Pratt. 1998. Learning to be literate, 2nd ed. Oxford: Blackwell.
Graesser, A., Mio, J. and Millis, K. 1989. Metaphors in persuasive communication. In: D. Meutsch
and R. Viehoff (eds.), 131154, Comprehension and literary discourse: Results and problems of interdisciplinary approaches. Berlin: Mouton de Gruyter.
Gray, Russell D. and Quentin D. Atkinson. 2003. Languagetree divergence times support the
Anatolian theory of Indo-European origin. Nature 425. 435439.
Greenberg, Joseph H. 1966. Language universals. The Hague: Mouton.
Greimas, Algirdas J. 1966. Smantique structurale. Paris: Larousse.
Greimas, Algirdas J. 1970. Du sens. Paris: Seuil.
Greimas, Algirdas J. 1987. On meaning: Selected essays in semiotic theory, trans. by P. Perron
and F. Collins. Minneapolis, MN: University of Minnesota Press.
Grice, Paul. 1975. Logic and conversation. In P. Cole and J. Morgan (eds.), Syntax and semantics,
Vol. 3, 4158. New York, NY: Academic.
Gudschinsky, Sarah. 1956. The ABCs of lexicostatistics (glottochronology). Word, 12, 175210.
Guhe, Markus et al. 2011. A computational account of conceptual blending in basic mathematics. Cognitive Systems Research 12: 249265.

Haarmann, Harald. 1990. Basic vocabulary and language contacts; the disillusion of glottochronology. Indogermanische Forschungen 95. 749.
Hadamard, Jacques. 1945. The psychology of invention in the mathematical field. Princeton, NJ:
Princeton University Press.
Haken, Wolfgang and Appel, Kenneth. 1977. The solution of the Four-Color-Map Problem. Scientific American 237: 108121.
Hales, Alfred W. and Jewett, Robert. 1963. Regularity and positional games. Transactions of the
American Mathematical Society 106: 222229.
Halliday, Michael A. K. 1966. Lexis as a linguistic level. Journal of Linguistics 2(1) 1966. 5767.
Halliday, Michael A. K. 1975. Learning how to mean: Explorations in the development of language. London: Arnold.
Halliday, Michael A. K. 1985. Introduction to functional grammar. London: Arnold.
Hallyn, Fernand. 1990. The poetic structure of the world: Copernicus and Kepler. New York, NY:
Zone Books.
Hammer, Eric and Shin, SunJoo. 1996. Euler and the role of visualization in logic. In Seligman, J. and Westersthl, D. (eds.), Logic, language and computation: Volume 1, 271286.
Stanford, CA: CSLI Publications.
Hammer, Eric and Shin, SunJoo. 1998. Eulers visual logic. History and Philosophy of Logic 19:
129.
Hammer, Eric. 1995. Reasoning with sentences and diagrams. Notre Dame Journal of Formal
Logic 35: 7387.
Harel, Guershon and Sowder, Larry. 2007. Toward comprehensive perspectives on the learning
and teaching of proof. In: F. K. Lester (ed.), Second handbook of research on mathematics
teaching and learning, 805842. Charlotte, NC: Information Age Publishing.
Harris, Roy. 1993. The Linguistics Wars. Oxford: Oxford University Press.
Harris, Zellig. 1951. Methods in structural linguistics. Chicago, IL: University of Chicago Press.
Harris, Zellig. 1968. Mathematical structures of language. New York, NY: John Wiley.
Hartimo, Mirja (ed.) 2010. Phenomenology and mathematics. New York, NY: Springer.
Haspelmath, Martin. 2006. Against markedness (and what to replace it with). Journal of Linguistics 42. 2570.
Hatten, Robert S. 2004. Musical meaning in Beethoven: Markedness, correlation and interpretation. Bloomington, IN: Indiana University Press.
Havil, Julian. 2008. Impossible? Princeton, NJ: Princeton University Press.
Hayward, J. W. 1984. Perceiving ordinary magic. Boston, MA: Shambala.
Heath, Thomas L. 1949. Mathematics in Aristotle. Oxford: Oxford University Press.
Hegel, G. W. F. 1807. Phaenomenologie des Geistes. Leipzig: Teubner.
Heilman, Kenneth M., Scholes, R., and Watson, R. T. 1975. Auditory affective agnosia: Disturbed
comprehension of affective speech. Journal of Neurology, Neurosurgery and Psychiatry
38: 6972.
Hersh, Reuben. 1997. What is mathematics really? Oxford: Oxford University Press.
Hertz, Robert. 1973. The preeminence of the right hand: A study in religious polarity. In
R. Needham (ed.). Right and left, 2336. Chicago, IL: University of Chicago Press.
Hickok, Gregory, Bellugi, Ursula, and Klima, Edward S. 2001. Sign language in the brain. Scientific American 284 (6): 5865.
Hier, Daniel B. and Joni Kaplan. 1980. Verbal comprehension deficits after right hemisphere
damage. Applied Psycholinguistics 1. 270294.

Hilbert, David. 1931. Die Grundlagen der elementaren Zahlentheorie. Mathematische Annalen
104: 485494.
Hill, Theodore P. 1998. The first digit phenomenon. American Scientist 86. 35863.
Hirst, Graeme. 1988. Resolving lexical ambiguity computationally with spreading activation
and Polaroid Words. In: S. L. Small, G. W. Cottrell, and M. K. Tanenhaus (eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial
intelligence, 73107. San Mateo, CA: Morgan Kaufmann Publishers.
Hjelmslev, Louis. 1939. Note sur les oppositions supprimables. Travaux de Cercle Linguistique
de Prague 8. 5157.
Hjelmslev, Louis. 1959. Essais linguistique. Copenhagen: Munksgaard.
Hjelmslev, Louis. 1963. Prolegomena to a theory of language. Madison, WI: University of Wisconsin Press.
Hobbes, Thomas. 1656 [1839]. Elements of philosophy. London: Molesworth.
Hockett, Charles F. 1960. The origin of speech. Scientific American 203. 8896.
Hockett, Charles F. 1967. Language, mathematics and linguistics. The Hague: Mouton.
Hoenigswald, Henry M. 1960. Language change and linguistic reconstruction. Chicago, IL:
University of Chicago Press.
Hofstadter, Douglas and Sander, Emanuel. 2013. Surfaces and essences: Analogy as the fuel
and fire of thinking. New York, NY: Basic.
Hofstadter, Douglas. 1979. Gödel, Escher, Bach: An eternal golden braid. New York, NY: Basic.
Hoijer, Harry. 1956. Lexicostatistics: A critique. Language, 32, 4960.
Holm, Hans J. 2003. The proportionality trap. Or: What is wrong with lexicostatistical subgrouping. Indogermanische Forschungen 108. 3846.
Holm, Hans J. 2005. Genealogische Verwandtschaft. In R. Köhler, G. Altmann, R. Piotrowski
(eds.), Quantitative Linguistik; ein internationales Handbuch. Berlin: Walter de Gruyter.
Holm, Hans J. 2007. The new arboretum of IndoEuropean Trees: Can new algorithms reveal the
phylogeny and even prehistory of IE? Journal of Quantitative Linguistics 14. 167214.
Hopper, Paul. 1998. Emergent grammar. In: Tomasello, M. eds. 1998. The new psychology of
language: Cognitive and functional approaches to language structure. Mahwah, NJ: Earlbaum, pp. 155176.
Houdé, Olivier and Tzourio-Mazoyer, Nathalie (2003). Neural foundations of logical and mathematical cognition. Nature Reviews Neuroscience 4: 507514.
Hubbard Edward M., Arman, A. C., Ramachandran V. S., and Boynton, G. M. 2005. Individual
differences among grapheme-color synesthetes: Brain-behavior correlations. Neuron 45:
975985.
Hubbard, Edward M., Diester, Ilka, Cantlon, Jessica, Ansari, Daniel, Opstal, Filip van, and
Troiani, Vanessa. 2008. The evolution of numerical cognition: From number neurons to
linguistic quantifiers. Journal of Neuroscience 12. 1181911824.
Humboldt, Wilhelm von. 1836 [1988]. On language: The diversity of human language-structure
and its influence on the mental development of mankind, P. Heath (trans.). Cambridge:
Cambridge University Press.
Hume, David. 1749 [1902]. An enquiry concerning human understanding. Oxford: Clarendon.
Husserl, Edmund 1970 [1891]. Philosophie der Arithmetik. The Hague: Nijhoff
Hutchins, John. 1997. From rst conception to rst demonstration: The nascent years of machine translation, 19471954. A chronology. Machine Translation 12. 195252.
Hyde, Daniel C. 2011. Two systems of non-symbolic numerical cognition. Frontiers in Human
Neuroscience. 10.3389/fnhum.2011.00150

Hymes, Dell. 1960. Lexicostatistics so far. Current Anthropology 1. 344.


Hymes, Dell. 1971. On communicative competence. Philadelphia, PA: University of Pennsylvania
Press.
Isaacs E. B, Edmonds, C. J., Lucas, A., and Gadian D. G. 2001. Calculation difficulties in children
of very low birthweight: A neural correlate. Brain 124: 17011707.
Isacoff, Stuart. 2003. Temperament: How music became a battleground for the great minds of
Western civilization. New York, NY: Knopf.
Ivanov, Vjaeslav V. 1974. On antisymmetrical and asymmetrical relations in natural languages
and other semiotic systems. Linguistics 119. 3540.
Izard, Véronique, Pica, Pierre, Spelke, Elizabeth S., and Dehaene, Stanislas. 2011. Flexible intuitions of Euclidean geometry in an Amazonian indigene group. PNAS 108: 97829787.
Jakobson, Roman (ed.). 1961. Structure of language and its mathematical aspects. New York,
NY: American Mathematical Association
Jakobson, Roman and Halle, Morris. 1956. Fundamentals of language. The Hague: Mouton.
Jakobson, Roman and Waugh, Linda. 1979. Six lectures on sound and meaning. Cambridge, MA:
MIT Press.
Jakobson, Roman, Fant, Gunnar, and Halle, Morris. 1952. Preliminaries to speech analysis.
Cambridge, MA: MIT Press.
Jakobson, Roman, Karcevskij, Serge, and Trubetzkoy, Nikolai S.. 1928. Proposition au premier
congrs international des linguistes: Quelles sont les mthodes les mieux appropries
un expos complet et pratique de la phonologie dune langue quelconque? Premier
congrs international des Linguistes, Propositions, 3639.
Jakobson, Roman. 1932. Zur Struktur des russischen Verbum. In Charisteria Guilelma Mathesio
Quinquagenario a Discipulis et Circuli Linguistici Pragensis Sodalibus Oblata, 7484.
Prague: Prazsky lingvistick.
Jakobson, Roman. 1936. Beitrag zur allgemeinen Kasuslehre: Gesamtbedeutungen der russischen Kasus. Travaux du Cercle Linguistique de Prague 6, 24488.
Jakobson, Roman. 1939. Observations sur le classement phonologique des consonnes. Proceedings of the Fourth international congress of Phonetic Sciences, 3441.
Jakobson, Roman. 1942. Kindersprache, Aphasie und allgemeine Lautgesetze. Uppsala:
Almqvist and Wiksell.
Jakobson, Roman. 1952. Preliminaries to speech analysis. Cambridge, MA: MIT Press.
Jakobson, Roman. 1956. Two aspects of language and two types of aphasic disturbance. In
R. Jakobson and M. Halle (eds.), Fundamentals of language, 383. The Hague: Mouton.
Jakobson, Roman. 1968. The role of phonic elements in speech perception. Zeitschrift für
Phonetik, Sprachwissenschaft und Kommunikationsforschung 21. 920.
Jespersen, Otto. 1922. Language: Its nature, development and origin. London: Allen and Unwin.
Johnson-Laird, Philip N. 1983. Mental models. Cambridge, MA: Harvard University Press.
Johnson, George. 2013. Useful invention or absolute truth: What is math? In G. Kolata and
P. Hoffman (eds.), The New York Times book of mathematics, 38. New York, NY: Sterling.
Johnson, Mark. 1987. The body in the mind: The bodily basis of meaning, imagination and reason. Chicago, IL: University of Chicago Press.
Jones, Roger. 1982. Physics as metaphor. New York, NY: New American Library.
Kadvany, John. 2007. Positional value and linguistic recursion. Journal of Indian Philosophy 35.
487520.
Kammerer, D. 2014. Word classes in the brain: Implications of linguistic typology for cognitive
neuroscience. Cortex 132: 2751.

Kamp, Hans. 1981. A theory of truth and semantic representation. In J. Groenendijk, T. Janssen,
and M. Stokhof (eds.), Formal methods in the study of language. Centre for Mathematics
and Computer Science, Amsterdam, 114.
Kant, Immanuel. 2011 [1790]. Critique of pure reason, trans. J. M. D. Meiklejohn. CreateSpace
Platform.
Kaplan, Robert and Kaplan, Ellen. 2007. Out of the labyrinth: Setting mathematics free. London:
Bloomsbury Press.
Kaplan, Robert and Kaplan, Ellen. 2011. Hidden harmonies: The lives and times of the
Pythagorean theorem. London: Bloomsbury Press.
Kasner, Edward and Newman, James. 1940. Mathematics and the imagination. New York, NY:
Simon and Schuster.
Kauffman, Louis K. 2001. The mathematics of Charles Sanders Peirce. Cybernetics and Human
Knowing 8: 79110.
Kemp, J. Alan (trans.). 1986. The Tekhne Grammatike of Dionysius Thrax. Amsterdam: John Benjamins.
Kendon, Simon and Creen, Malcolm. 2007. An introduction to knowledge engineering. New
York, NY: Springer.
Kennedy, J. M. 1984. Vision and metaphors. Toronto: Toronto Semiotic Circle.
Kennedy, J. M. 1993. Drawing and the blind: Pictures to touch. New Haven, CT: Yale University
Press.
Kennedy, J. M. and Domander, R. 1986. Blind people depicting states and events in metaphoric
line drawings. Metaphor and Symbolic Activity 1: 109126.
Kennedy, John M. 1999. Metaphor in pictures: Metonymy evokes classification. International
Journal of Applied Semiotics 1. 8398.
Kilgarriff, Adam. 2005. Language is never, ever, ever, random. Corpus Linguistics and Linguistic
Theory 12: 263275.
King, Margaret. 1992. Epilogue: On the relation between computational linguistics and formal
semantics. In Michael Rosner; Roderick Johnson. Computational linguistics and formal
semantics. Cambridge: Cambridge University Press.
King, Ruth. 1991. Talking gender: A nonsexist guide to communication. Toronto: Copp Clark
Pitman Ltd.
Kiryushchenko, Vitaly. 2012. The visual and the virtual in theory, life and scientific practice: The
case of Peirce's Quincuncial map projection. In Mariana Bockarova, Marcel Danesi, and
Rafael Núñez (eds.), Semiotic and cognitive science essays on the nature of mathematics,
6170. Munich: Lincom Europa.
Kochenderfer, Mykel J. 2015. Decision making under uncertainty. Cambridge, MA: MIT Press
Koehn, Philipp. 2010. Statistical machine translation. Cambridge: Cambridge University Press.
Köhler, Reinhard, Altmann, Gabriel, and Grzybek, Peter (eds.). 2015. Quantitative linguistics.
Berlin: Mouton de Gruyter.
Kolmogorov, Andrei N., 1933, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der
Mathematik. Berlin: Springer.
Konnor, Melvin. 1991. Human nature and culture: Biology and the residue of uniqueness. In:
J. J. Sheehan and M. Sosna (eds.), The boundaries of humanity, pp. 103124. Berkeley, CA:
University of California Press.
Kornai, András. 2008. Mathematical linguistics. New York, NY: Springer.
Kosslyn, Stephen M. 1983. Ghosts in the minds machine: Creating and using images in the
brain. New York, NY: W. W. Norton.

Kramsch, Claire. 1998. Language and culture. Oxford: Oxford University Press.
Krawczyk, Daniel C. 2012. The cognition and neuroscience of relational reasoning. Brain Research 1428: 1323.
Kroeber, Alfred L. and Chretien, Charles D. 1937. Quantitative classification of Indo-European
languages. Language 13. 83103.
Kronenfeld, David B., Bennardo, Giovanni, and de Munck, Victor C. (eds.). 2011. A companion to
cognitive anthropology. Chichester: Wiley-Blackwell.
Kruszewski, Mikolai. 1883 [1955]. Writings in general linguistics. Amsterdam: John Benjamins.
Kucera, Henry and Francis, W. Nelson. 1967. Computational analysis of present-day American
English. Providence, RI: Brown University Press.
Kuhn, Thomas S. 1970. The structure of scientic revolutions. Chicago, IL: University of Chicago
Press.
Kulacka, Agnieszka. On the nature of statistical language laws. In: Piotr Stalmaszczyk (ed.),
Philosophy of language and linguistics: Volume I, pp. 151168. Piscataway, NJ: Transaction
Publishers.
Kulpa, Zenon. 2004. On diagrammatic representation of mathematical knowledge. In: A. Sperti,
G. Bancerek, and A. Trybulec (eds.), Mathematical knowledge management. New York, NY:
Springer.
Kuryłowicz, Jerzy. 1927. Schwa indo-européen et hittite. Symbolae grammaticae in honorem
Ioannis Rozwadowski, Vol. 1, 95104. Cracow: Gebethner and Wolff.
Kurzweil, Ray. 2012. How to create a mind: The secret of human thought revealed. New York, NY:
Viking.
Labov, William. 1963. The social motivation of a sound change. Word 19. 273309.
Labov, William. 1967. The effect of social mobility on a linguistic variable. In S. Lieberson (ed.),
Explorations in sociolinguistics, 2345. Bloomington, IN: Indiana University Research
Center in Anthropology, Linguistics and Folklore.
Labov, William. 1972. Language in the inner city. Philadelphia, PA: University of Pennsylvania
Press.
Lachaud, Christian Michel. 2013. Conceptual metaphors and embodied cognition: EEG coherence reveals brain activity differences between primary and complex conceptual
metaphors during comprehension. Cognitive Systems Research 2223: 1226.
Lai, Vicky T., van Dam, Wessel, Conant, Lisa L., Binder, Jeffrey R. and Rutvik, H. Desai. 2015.
Familiarity differentially affects right hemisphere contributions to processing metaphors
and literals. Frontiers in Human Neuroscience, Volume 10.
Lakoff, George and Johnson, Mark. 1980. Metaphors we live by. Chicago, IL: University of Chicago
Press.
Lakoff, George and Johnson, Mark. 1999. Philosophy in the flesh: The embodied mind and its challenge to western thought. New York, NY: Basic.
Lakoff, George and Núñez, Rafael. 2000. Where mathematics comes from: How the embodied
mind brings mathematics into being. New York, NY: Basic Books.
Lakoff, George. 1970. Irregularity in syntax. New York, NY: Holt, Rinehart, & Winston.
Lakoff, George. 1987. Women, fire and dangerous things: What categories reveal about the
mind. Chicago, IL: University of Chicago Press.
Lakoff, George. 2012a. Explaining embodied cognition results. Topics in Cognitive Science 4.
773785.

Lakoff, George. 2012b. The contemporary theory of metaphor. In Marcel Danesi and Sara
Maida-Nicol (eds.), Foundational texts in linguistic anthropology, 12871. Toronto: Canadian Scholars Press.
Lamb, Sydney. 1999. Pathways of the brain: The neurocognitive basis of language. Amsterdam:
John Benjamins.
Lambek, Joachim. 1958. The mathematics of sentence structure. American Mathematical
Monthly 65. 154–170.
Langacker, Ronald W. 1987. Foundations of cognitive grammar. Stanford, CA: Stanford University Press.
Langacker, Ronald W. 1990. Concept, image, and symbol: The cognitive basis of grammar.
Berlin: Mouton de Gruyter.
Langacker, Ronald W. 1999. Grammar and conceptualization. Berlin: Mouton de Gruyter.
Langer, Susanne K. 1948. Philosophy in a new key. New York, NY: Mentor Books.
Laroche, Paula. 2007. On words: Insight into how our words work and don't. Oak Park, IL:
Marion Street Press.
Laurence, William L. 2013. Four-Color proof. In: G. Kolata and P. Hoffman (eds.), Book of mathematics, 135137. New York, NY: Sterling.
Leepik, Peet. 2008. Universals in the context of Juri Lotmans semiotics. Tartu: Tartu University
Press.
Lees, Robert. 1953. The basis of glottochronology. Language 29. 113127.
Lees, Robert. 1957. Review of Syntactic Structures. Language 33. 375407.
Lesh, Robert and Harel, Guershon. 2003. Problem solving, modeling, and local conceptual
development. Mathematical Thinking and Learning 5: 157.
Lévi-Strauss, Claude. 1958. Anthropologie structurale. Paris: Plon.
Lévi-Strauss, Claude. 1971. L'Homme nu. Paris: Plon.
Levine, Robert. 1997. A geography of time: The temporal misadventures of a social psychologist
or how every culture keeps time just a little bit differently. New York, NY: Basic.
Li, Wentian. 1992. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE
Transactions on Information Theory 38. 18421845.
Libertus, M. E., Pruitt, L. B., Woldorff, M. G. and Brannon, E. M. 2009. Induced alpha-band oscillations reect ratio-dependent number discrimination in the infant brain. Journal of
Cognitive Neuroscience 21: 23982406.
Locke, John. 1690 [1975]. An essay concerning human understanding, ed. by P. H. Nidditch.
Oxford: Clarendon Press.
Lorrain, François. 1975. Réseaux sociaux et classifications sociales. Paris: Hermann.
Lotman, Juri. 1991. Universe of the mind: A semiotic theory of culture. Bloomington, IN: Indiana
University Press.
Luhtala, Anneli. 2005. Grammar and philosophy in late antiquity. Amsterdam: John Benjamins.
Luque, Bartolo and Lacasa, Lucas. 2009. The first digit frequencies of primes and Riemann zeta
zeros. Proceedings of the Royal Society A. 10: 1098.
Luria, Alexander R. 1947. Traumatic aphasia. The Hague: Mouton.
Lutosławski, Wincenty. 1890. Principes de stylométrie. Revue des études grecques 41. 61–81.
Macaulay, Ronald. 2009. Quantitative methods in sociolinguistics. New York, NY: Palgrave
Macmillan.
MacCormac, Eric. 1985. A cognitive theory of metaphor. Cambridge, MA: MIT Press.
MacCormick, John. 2012. Nine algorithms that changed the future. Princeton, NJ: Princeton
University Press.

Mackenzie, Dana. 2012. The universe in zero words. London: Elwin Street Publications.
MacNamara, Olwyn. 1996. Mathematics and the sign. Proceedings of PME 20. 369378.
MacWhinney, Brian. 2000. Connectionism and language learning. In: M. Barlow and S. Kemmer
(eds.), Usage models of language, 121150. Stanford: Center for the Study of Language
and Information.
Mallory, James P. 1989. In search of the Indo-Europeans: Language, archaeology and myth.
London: Thames and Hudson.
Malmberg, Bertil. 1974. Langue – forme – valeur: Réflexion sur trois concepts saussuriens.
Semiotica 18. 3–12.
Mandelbrot, Benoit. 1954. Structure formelle des textes et communication. Word 10. 1–27.
Mandelbrot, Benoit. 1977. The fractal geometry of nature. New York, NY: Freeman and Co.
Mansouri, Fethi. 2000. Grammatical markedness and information Processing in the acquisition
of Arabic [as] a second language. Munich: Lincom.
Maor, Eli. 1994. e: The story of a number. Princeton, NJ: Princeton University Press.
Maor, Eli. 2007. The Pythagorean theorem: A 4,000-year history. Princeton, NJ: Princeton University Press.
Marcus, Solomon and Vasiliu, Em. 1960. Mathématique et phonologie: Théorie des graphes
et consonantisme de la langue roumaine. Revue de mathématiques pures et appliquées 5.
319–340.
Marcus, Solomon. 1975. The metaphors and the metonymies of scientific (especially mathematical) language. Revue Roumaine de Linguistique 20, 535–537.
Marcus, Solomon. 1980. The paradoxical structure of mathematical language. Revue Roumaine
de Linguistique 25, 359366.
Marcus, Solomon. 2003. Mathematics through the glasses of Hjelmslev's semiotics. Semiotica
145, 235246.
Marcus, Solomon. 2010. Mathematics as semiotics. In: Thomas A. Sebeok and Marcel Danesi
(eds.), Encyclopedic dictionary of semiotics, 3rd ed. Berlin: Mouton de Gruyter.
Marcus, Solomon. 2013. Mathematics between semiosis and cognition. In: Mariana Bockarova,
Marcel Danesi, and Rafael Núñez (eds.), 99–129. Semiotic and cognitive science essays on
the nature of mathematics. Munich: Lincom Europa.
Markov, Andrey A. 1906 [1971]. Extension of the limit theorems of probability theory to a sum
of variables connected in a chain. In R. Howard. Dynamic probabilistic systems, Volume 1:
Markov chains. New York, NY: John Wiley and Sons
Marr, David. 1982. Vision: A computational investigation into the human representation and
processing of visual information. New York, NY: W. H. Freeman.
Martín-Vide, Carlos and Mitrana, Victor (eds.). 2001. Where mathematics, computer science,
linguistics and biology meet. Dordrecht: Kluwer.
Martin, James M. 1990. A computational model of metaphor interpretation. Boston, MA: Academic.
Martinet, André. 1955. Économie des changements phonétiques. Paris: Maisonneuve and
Larose.
Marx, Karl. 1953 [1858]. Grundrisse der Kritik der politischen Ökonomie. Berlin: Dietz.
Maturana, Humberto R. and Varela, Francisco. 1973. Autopoiesis and cognition: The realization
of the living. Dordrecht: Reidel.

McCarthy, John. 2001. A thematic guide to optimality theory. Cambridge: Cambridge University
Press.
McComb, Karen, Packer, Craig, and Pusey, Anne. 1994. Roaring and numerical assessment in
contests between groups of female lions, Panthera leo. Animal Behavior 47: 379387.
McCowan, Brenda, Hanser, Sean F., and Doyle, Laurance R. 1999, Quantitative tools for comparing animal communication systems: Information theory applied to Bottlenose dolphin
whistle repertoires. Animal Behaviour 62. 11511162.
McCulloch, Warren S. and Pitts, Walter. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5: 115133.
McNeill, David. 1987. Psycholinguistics: A New Approach. New York, NY: Harper & Row.
Mel'čuk, Igor. 2001. Linguistic theory: Communicative organization in natural language. Amsterdam: John Benjamins.
Menninger, Karl. 1969. Number words and number symbols: A cultural history of number. Cambridge, MA: MIT Press.
Merton, Robert K. and Barber, Elinor. 2003. The travels and adventures of serendipity: A study
in sociological semantics and the sociology of science. Princeton, NJ: Princeton University
Press.
Mettinger, Arthur. 1994. Aspects of semantic opposition in English. Oxford: Oxford University
Press.
Mill, James. 2001. Analysis of the phenomena of the human mind. Thoemmes facsimile edition.
Miller, George A. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63, 8197.
Miller, George A. 1981. Language and speech. New York, NY: W. H. Freeman.
Miller, George A. and Newman, E. B. 1958. Tests of a statistical explanation of the rank-frequency
relation for words in written English. American Journal of Psychology 71,
209–218.
Miller, Jon F. 1981. Eliciting procedures for language. In J. F. Miller (ed.), Assessing language
production in children. London: Arnold.
Mitchell, W. J. T. and Davidson, Arnold I. (eds.). 2007. The late Derrida. Chicago, IL: University of
Chicago Press.
Montague, Richard. 1974. Formal philosophy: Selected papers of Richard Montague, ed. with an
introduction by Richmond H. Thomason. New Haven, CT: Yale University Press.
Monti, Martin M. and Osherson, Daniel N. 2012. Logic, language and the brain. Brain Research
1428: 3342.
Morrill, Glyn. 2010. Categorial grammar: Logical syntax, semantics, and processing. Oxford
University Press.
Morris, Charles. 1938. Foundations of the theory of signs. Chicago, IL: University of Chicago
Press.
Morrow, Glenn R. 1970. A commentary on the First Book of Euclid's Elements. Princeton, NJ:
Princeton University Press
Moseley, R. L. and Pulvermüller, F. 2014. Nouns, verbs, objects, actions, and abstractions: Local
fMRI activity indexes semantics, not lexical categories. Brain and Language 132: 2842.
Mowat, Elizabeth and Davis, Brent. 2010. Interpreting embodied mathematics using network
theory: Implications for mathematics education. Complicity: An International Journal of
Complexity and Education 7: 131.
Müller, Cornelia. 2008. Metaphors dead and alive, sleeping and waking: A dynamic view.
Chicago, IL: University of Chicago Press.

Musser, Gary L., Burger, William F., and Peterson, Blake E. 2006. Mathematics for elementary
teachers: A contemporary approach. Hoboken, NJ: John Wiley.
Nadeau, R. L. 1991. Mind, machines, and human consciousness. Chicago, IL: Contemporary
Books.
Nagao, Makoto. 1984. A framework of a mechanical translation between Japanese and English
by analogy principle. In A. Elithorn and R. Banerji (eds.), Articial and human intelligence.
Oxford: Elsevier.
Nave, Ophir, Neuman, Yair, Howard, D., and Perslovsky, L. 2014. How much information should
we drop to become intelligent? Applied Mathematics and Computation 245: 261264.
Needham, Rodney. 1973. Right and left. Chicago, IL: University of Chicago Press.
Neisser, Ulrich. 1967. Cognitive psychology. Englewood Cliffs, NJ: Prentice-Hall.
Neuman, Yair, Assaf, Dan, Cohen, Yohai, Last, Mark, Argamon, Shlomo, Newton, Howard, and
Frieder, Ophir. 2013. Metaphor identification in large texts corpora. PLoS ONE 8: e62343.
Neuman, Yair. 2007. Immune memory, immune oblivion: A lesson from Funes the memorious.
Progress in Biophysics and Molecular Biology 92: 258267.
Neuman, Yair. 2014. Introduction to computational cultural psychology. Cambridge: Cambridge
University Press.
Neumann, John von. 1958. The computer and the brain. New Haven, CT: Yale University Press.
Newcomb, Simon. 1881. Note on the frequency of use of the different digits in natural numbers.
American Journal of Mathematics 4: 3940.
Newell, Allen. 1991. Metaphors for mind, theories of mind: Should the humanities mind? In:
J. J. Sheehan and M. Sosna (eds.), The boundaries of humanity, pp. 158197. Berkeley, CA:
University of California Press.
Nguyen, Hoang Long, Nguyen, Trung Duc and Hwang, Dosam. 2015. KELabTeam: A statistical approach on figurative language sentiment analysis in Twitter. Proceedings of the
9th International Workshop on Semantic Evaluation (SemEval 2015), pages 679683.
Denver, CO, June 45.
Nielsen, Michael. 2012. Reinventing discovery: The new era of networked science. Princeton,
NJ: Princeton University Press.
Nietzsche, Friedrich. 1873 [1979]. Philosophy and truth: Selections from Nietzsche's notebooks
of the early 1870s. Atlantic Heights, NJ: Humanities Press.
Nirenburg, Sergei. 1987. Machine translation: theoretical and methodological issues. Cambridge: Cambridge University Press.
Nöth, Winfried. 1990. Handbook of semiotics. Bloomington, IN: Indiana University Press.
Nowak, Martin A. 2000. The basic reproductive ratio of a word, the maximum size of a lexicon.
Journal of Theoretical Biology 204. 179189.
Núñez, Rafael, Edwards, L. D., and Matos, Filipe J. 1999. Embodied cognition as grounding for
situatedness and context in mathematics education. Educational Studies in Mathematics
39, 4565.
O'Shea, Donal. 2007. The Poincaré Conjecture. New York, NY: Walker.
Obler, Loraine K. and Gjerlow, Kris. 1999. Language and the brain. Cambridge: Cambridge
University Press.
Ogden, Charles K. 1932. Opposition: A linguistic and psychological analysis. London: Paul,
Trench, and Trubner.
Ogden, Charles K. and Richards, Ivor A. 1923. The meaning of meaning. London: Routledge and
Kegan Paul.

Okrent, Arika. 2009. In the land of invented languages: Esperanto rock stars, Klingon poets,
Loglan lovers, and the mad dreamers who tried to build a perfect language. New York:
Spiegel and Grau.
Osborne, Thomas M. 2014. Human action in Thomas Aquinas, John Duns Scotus, and William of
Ockham. Washington, DC: The Catholic University of America Press.
Osgood, Charles E., Suci, George J., and Tannenbaum, Percy H. 1957. The measurement of
meaning. Urbana, IL: University of Illinois Press.
Otte, Michael. 1997. Mathematics, semiotics, and the growth of social knowledge. For the
Learning of Mathematics 17. 4754.
Papadimitriou, Christos H. and Steiglitz, Kenneth. 1998. Combinatorial optimization: Algorithms and complexity. New York, NY: Dover.
Papineni, Kishore, Roukos, Salim, Ward, Todd, and Zhu, Wei-Jing. 2002. BLEU: A method for
automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting of
the Association for Computational Linguistics (ACL), Philadelphia, July 2002, 311–318.
Park, Hye Sook. 2000. Markedness and learning principles in SLA: Centering on acquisition of
relative clauses. Journal of Pan-Pacific Association of Applied Linguistics 4. 87–114.
Parker, Kelly A. 1998. The continuity of Peirce's thought. Nashville, TN: Vanderbilt University
Press.
Parsons, Talcott and Bales, Robert. 1955. Family, socialization, and interaction process. Glencoe, IL: Free Press.
Partee, Barbara, Meulen, Alice Ter, and Wall, Robert. 1990. Mathematical methods in linguistics. New York, NY: Springer.
Partee, Barbara. 1988. Semantic facts and psychological facts. Mind and Language 3. 4352.
Passy, P. 1890. Étude sur les changements phonétiques et leurs caractères généraux. Paris:
Firmin-Didot.
Pavlov, Ivan. 1902. The work of digestive glands. London: Griffin.
Peano, Giuseppe. 1973. Selected works of Giuseppe Peano, H. Kennedy, ed. and trans. London:
Allen and Unwin.
Peirce, Charles S. 1923. Chance, love, and logic. New York, NY: Harcourt, Brace.
Peirce, Charles S. 1931–1958. Collected papers of Charles Sanders Peirce, Vols. 1–8, C. Hartshorne and P. Weiss (eds.). Cambridge, MA: Harvard University Press.
Pennebaker, James W. 2011. The secret life of pronouns. London: Bloomsbury Press.
Penrose, Roger. 1989. The emperor's new mind. Cambridge: Cambridge University Press.
Perline, Richard. 1996. Zipf's law, the central limit theorem, and the random division of the unit
interval. Physical Review 54. 220223.
Pesci, Angela. 2003. Could metaphorical discourse be useful for analysing and transforming individuals' relationship with mathematics? The Mathematics Education into the 21st Century
Project: Proceedings of the International Conference, 224230. Brno, Czech Republic,
September 2003.
Petty, William. 2010. Natural and political observations, mentioned in a following index, and
made upon the bills of mortality by John Graunt, citizen of London; with reference to the
government (1662). EEBO Editions, ProQuest (December 13, 2010)
Piaget, Jean. 1923. Le langage et la pensée chez l'enfant. Neuchâtel: Delachaux et Niestlé.
Piaget, Jean. 1936. L'intelligence avant le langage. Paris: Flammarion.
Piaget, Jean. 1945. La formation du symbole chez l'enfant. Neuchâtel: Delachaux et Niestlé.
Piaget, Jean. 1952. The child's conception of number. London: Routledge and Kegan Paul.
Piaget, Jean. 1955. The Language and thought of the child. Cleveland: Meredian.

Piaget, Jean. 1969. The child's conception of the world. Totowa: Littlefield, Adams and Company.
Pike, Kenneth. 1954. Language in relation to a unified theory of the structure of human behavior. The Hague: Mouton.
Pinker, Steven. 1990. Language acquisition. In: D. N. Osherson and H. Lasnik (eds.), Language: An invitation to cognitive science, 191–241. Cambridge, MA: MIT Press.
Pinker, Steven. 1994. The language instinct: How the mind creates language. New York, NY:
William Morrow.
Pollio, H. and Burns, B. 1977. The anomaly of anomaly. Journal of Psycholinguistic Research 6:
247260.
Pollio, H. and Smith, M. 1979. Sense and nonsense in thinking about anomaly and metaphor.
Bulletin of the Psychonomic Society 13: 323326.
Pollio, H., Barlow, J., Fine, H., and Pollio, M. 1977. Psychology and the poetics of growth: Figurative language in psychology, psychotherapy, and education. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Pólya, George. 1921. Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt
im Strassennetz. Mathematische Annalen 84: 149160.
Popper, Karl. 1935 [2002]. The logic of scientic discovery. London: Routledge.
Popper, Karl. 1963. Conjectures and refutations. London: Routledge and Keagan Paul.
Pos, Hendrik. 1938. La notion d'opposition en linguistique. XIe Congrès International de Psychologie, 246–247.
Pos, Hendrik. 1964. Perspectives du structuralisme. In Études phonologiques dédiées à la
mémoire de M. le Prince N. S. Trubetzkoy, 71–78. Prague: Jednota Ceskych Mathematiku
Fysiku.
Posamentier, Alfred S. 2004. Pi: A biography of the world's most mysterious number. New York,
NY: Prometheus.
Posamentier, Alfred S. and Lehmann, Ingmar. 2007. The (fabulous) Fibonacci numbers.
Amherst, NY: Prometheus.
Pottier, Bernard. 1974. Linguistique générale. Paris: Klincksieck.
Prat, Chantel S. 2012. An fMRI investigation of analogical mapping in metaphor comprehension: The influence of context and individual cognitive capacities on processing demands.
Journal of Experimental Psychology, Learning, Memory, and Cognition 38. 282294.
Presmeg, Norma C. 1997. Reasoning with metaphors and metonymies in mathematics learning.
In L. D. English (ed.), Mathematical reasoning: Analogies, metaphors, and images, 267
280. Mahwah, NJ: Lawrence Erlbaum.
Presmeg, Norma C. 2005. Metaphor and metonymy in processes of semiosis in mathematics
education. In J. Lenhard and F. Seeger (eds.), Activity and sign, 105116. New York, NY:
Springer.
Prince, Alan and Smolensky, Paul. 2004. Optimality theory: Constraint interaction in generative
grammar. Oxford: Blackwell.
Putnam, Hilary. 1961. Brains and behavior. Paper presented at the American Association for
the Advancement of Science, Section L (History and Philosophy of Science) meeting,
December 27, 1961.
Quirk, Randolph. 1960. Towards a description of English usage. Transactions of the Philological
Society. 1960. 4061.
Radford, Luis. 2010. Algebraic thinking from a cultural semiotic perspective. Research in
Mathematics Education 12: 119.

Radford, Luis and Grenier, Monique. 1996. On dialectical relationships between signs and
ideas. Proceedings of PME 20, 179186.
Raimi, Ralph A. 1969. The peculiar distribution of first digits. Scientific American 221. 109–119.
Raju, C. K. 2007. Cultural foundations of mathematics. Delhi: Pearson Longman.
Ramachandran, Vilayanur S. 2011. The telltale brain: A neuroscientist's quest for what makes
us human. New York, NY: Viking.
Reed, David. 1994. Figures of thought: Mathematics and mathematical texts. London: Routledge.
Reining, Astrid and Lönneker-Rodman, Birte. 2007. Corpus-driven metaphor harvesting. In:
Proceedings of the HLT/NAACL-07 Workshop on Computational Approaches to Figurative
Language, 512, Rochester, NY.
Renfrew, Colin, McMahon, April, and Trask, Larry (eds.). 2000. Time depth in historical linguistics. Cambridge, England: The McDonald Institute for Archaeological Research.
Renfrew, Colin. 1988. Archaeology and language: The puzzle of Indo-European origins. Cambridge: Cambridge University Press.
Richards, Ivor A. 1936. The philosophy of rhetoric. Oxford: Oxford University Press.
Richeson, David S. 2008. Eulers gem: The polyhedron formula and the birth of topology.
Princeton, NJ: Princeton University Press.
Ridley, Dennis R. and Gonzales, Emilia A. 1994. Zipf's law extended to small samples of adult
speech. Perceptual and Motor Skills 79, 1534.
Rieux, Jacques and Rollin, Bernard E. 1975. General and rational grammar: The Port-Royal
grammar. The Hague: Mouton.
Ringe, Donald, Warnow, Tandy, and Taylor, Ann. 2002. Indo-European and computational
cladistics. Transactions of the Philological Society 100. 59129.
Roark, Brian and Sproat, Richard W. 2007. Computational approaches to morphology and syntax. Oxford University Press.
Roberts, Don D. 2009. The existential graphs of Charles S. Peirce. The Hague: Mouton.
Roberts, Royston M. 1989. Serendipity: Accidental discoveries in science. New York, NY: John
Wiley.
Robins, Robert H. 1990. Leibniz, Humboldt and comparative linguistics. In: Tullio De Mauro and
Lia Formigari (eds.), Leibniz, Humboldt, and the origins of comparativism, pp. 85102.
Amsterdam: John Benjamins.
Robinson, Abraham. 1974. Non-standard analysis. Princeton, NJ: Princeton University Press.
Robinson, Andrew. 1995. The story of writing. London: Thames and Hudson.
Rochefoucauld, François, Duc de la. 1665 [2006]. Maxims. New York, NY: Dover.
Rockmore, D. 2005. Stalking the Riemann Hypothesis: The quest to find the hidden law of prime
numbers. New York, NY: Vintage.
Rommetveit, Ragnar. 1991. Psycholinguistics, hermeneutics, and cognitive science. In G. Appel and H. W. Dechert (eds.), A case for psycholinguistic cases, 115. Amsterdam: John
Benjamins.
Rosenblatt, Frank. 1957. The perceptron: A perceiving and recognizing automaton (Project Para).
Ithaca, NY: Cornell Aeronautical Laboratory.
Ross, Alan S. C. 1950. Philological probability problems. Journal of the Royal Statistical Society,
Series B 12. 1959
Ross, Elliott D. and Mesulam, Marek Marsel. 1979. Dominant language functions of the right
hemisphere: Prosody and emotional gesturing. Archives of Neurology 36: 144148.
Rotman, Brian. 1988. Towards a semiotics of mathematics. Semiotica 72. 1–35.

Rotman, Brian. 1993. Signifying nothing: The semiotics of zero. Stanford, CA: Stanford University Press.
Rousseau, Ronald and Zhang, Qiaoqiao. 1992. Zipf's data on the frequency of Chinese words
revisited. Scientometrics 24. 201220.
Rumelhart, David E. and McClelland, James L. (eds.). 1986. Parallel distributed processing.
Cambridge, MA: MIT Press.
Russell, Bertrand and Whitehead, Alfred N. 1913. Principia mathematica. Cambridge: Cambridge University Press.
Russell, Bertrand. 1903. The principles of mathematics. London: Allen and Unwin.
Sabbagh, K. 2004. The Riemann Hypothesis: The greatest unsolved problem in mathematics.
New York, NY: Farrar, Strauss & Giroux.
Sadock, Jerrold M. 2012. The modular architecture of grammar. Cambridge: Cambridge University Press.
Samoyault, Tiphaine. 1988. Alphabetical order: How the alphabet began. New York, NY: Viking.
Sandri, G. 2004. Does computation provide a model for creativity? An epistemological perspective in neuroscience. Journal of Endocrinological Investigation 27: 922.
Sankoff, David. 1970. On the rate of replacement of word-meaning relationships. Language 46.
564569.
Sapir, Edward. 1921. Language. New York, NY: Harcourt, Brace, and World.
Saussure, Ferdinand de. 1879. Mémoire sur le système primitif des voyelles dans les langues
indo-européennes. Leipzig: Vieweg.
Saussure, Ferdinand de. 1916. Cours de linguistique générale. Ed. Charles Bally and Albert
Sechehaye. Paris: Payot.
Schank, Roger C. 1980. An artificial intelligence perspective of Chomsky's view of language.
The Behavioral and Brain Sciences 3. 3542.
Schank, Roger C. 1984. The cognitive computer. Reading, MA: Addison-Wesley.
Schank, Roger C. 1991. The connoisseur's guide to the mind. New York, NY: Summit.
Schiffer, Stephen. 1987. Remnants of meaning. Cambridge, MA: MIT Press.
Schlegel, Friedrich von. 1808 [1977]. Über die Sprache und Weisheit der Indier: Ein Beitrag zur
Begründung der Altertumskunde. Amsterdam: John Benjamins.
Schmandt-Besserat, Denise. 1978. The earliest precursor of writing. Scientific American 238.
50–59.
Schmandt-Besserat, Denise. 1992. Before writing, 2 vols. Austin, TX: University of Texas Press.
Schmidt-Snoek, Gwenda L., Drew, Ashley R., Barile, Elizabeth C., and Aguas, Stephen J. 2015.
Auditory and motion metaphors have different scalp distributions: An ERP study. Frontiers
in Human Neuroscience, Volume 9.
Schmidt, Gwenda L. and Seger, Carol A. 2009. Neural correlates of metaphor processing: The
roles of figurativeness, familiarity and difficulty. Brain and Cognition 71: 375–386.
Schneider, Michael S. 1994. Constructing the universe: The mathematical archetypes of nature,
art, and science. New York, NY: Harper Collins.
Schooneveld, Cornelius H. van. 1978. Semantic transmutations. Bloomington, IN: Physsardt.
Schuster, Peter. 2001. Relevance theory meets markedness: Considerations on cognitive effort
as a criterion for markedness in pragmatics. New York, NY: Peter Lang.
Scott, Michael L. 2009. Programming language pragmatics. Oxford: Elsevier
Searle, John R. 1984. Minds, brain, and science. Cambridge, MA: Harvard University Press.
Sebeok, Thomas A. and Danesi, Marcel. 2000. The forms of meaning: Modeling systems theory
and semiotics. Berlin: Mouton de Gruyter.

Sebeok, Thomas A. and Umiker-Sebeok, Jean. 1980. You know my method: A juxtaposition of
Charles S. Peirce and Sherlock Holmes. Bloomington, IN: Gaslight Publications.
Segerstråle, Ullica. 2000. Defenders of the truth: The battle for science in the sociobiology
debate and beyond. Oxford: Oxford University Press.
Selin, Helaine. 2000. Mathematics across cultures. Dordrecht: Kluwer.
Selvin, Steven. 1975. A problem in probability (letter to the editor). American Statistician 29: 67
Semenza, C., Delazer, M., Bertella, L., Granà, A., Mori, I., Conti, F. M., Pignatti, R., Bartha, L., Domahs, F.,
Benke, T., and Mauro, A. 2006. Is math lateralised on the same side as language? Right hemisphere aphasia and mathematical abilities. Neuroscience Letters 406: 285–288.
Senechal, Marjorie. 1993. Mathematical structures. Science 260. 11701173.
Shannon, Claude E. 1948. A mathematical theory of communication. Bell System Technical
Journal 27: 379–423.
Shannon, Claude E. 1951. Prediction and entropy of printed English. Bell System Technical
Journal 30: 50–64.
Sheehan, J. J. 1991. Coda. In: J. J. Sheehan and M. Sosna (eds.), The boundaries of humanity,
259265. Berkeley, CA: University of California Press.
Shin, Soon-Joo. 1994. The logical status of diagrams. Cambridge: Cambridge University Press.
Shorser, Lindsey. 2012. Manifestations of mathematical meaning. In: Mariana Bockarova, Marcel Danesi, and Rafael Núñez (eds.), 295–315. Semiotic and cognitive science essays on
the nature of mathematics. Munich: Lincom Europa.
Shutova, Ekaterina. 2010. Automatic metaphor interpretation as a paraphrasing task. In: Proceedings of NAACL 2010, 10291037, Los Angeles, CA.
Silva, Gabriel A. 2011. The need for the emergence of mathematical neuroscience: Beyond computation and simulation. Computational Neuroscience 5: 51.
Šimić, Jelena and Vuk, Damir. 2010. Machine translation in practice. Proceedings of the
21st central European conference on information and intelligent systems, 415–419.
Varaždin, Croatia.
Singh, Simon. 1997. Fermat's enigma: The quest to solve the world's greatest mathematical
problem. New York, NY: Walker and Co.
Sjoberg, Andree and Sjoberg, Gideon. 1956. Problems in glottochronology. American Anthropologist 58. 296308.
Skemp, Richard R. 1971. The psychology of learning mathematics. Harmondsworth: Penguin.
Smith, Kathleen W., Balkwill, Laura-Lee, Vartanian, Oshin, and Goel, Vinod. 2015. Syllogisms delivered in an angry voice lead to improved performance and engagement of a
different neural system compared to neutral voice. Frontiers in Human Neuroscience 10
(10.3389/fnhum.2015.00222).
Smolin, Lee. 2013. Time reborn: From the crisis in physics to the future of the universe. Boston,
MA: Houghton Mifflin Harcourt.
Smullyan, Raymond. 1997. The riddle of Scheherazade and other amazing puzzles, ancient and
modern. New York, NY: Knopf.
Speelman, Dirk. 2014. Logistic regression: A confirmatory technique for comparisons in corpus
linguistics. Amsterdam: John Benjamins.
Sperber, Dan and Wilson, Deirdre. 1986. Relevance: Communication and cognition. Cambridge,
MA: Harvard University Press.
Sperry, Roger W. 1968. Hemisphere disconnection and unity in conscious awareness. American
Psychologist 23: 723733.

Sperry, Roger W. 1973. Lateral specialization of cerebral function in the surgically separated
hemispheres. In: P. J. Vinken and G. W. Bruyn (eds.), The psychophysiology of thinking, 273–
289. Amsterdam: North Holland.
Stachowiak, F., Huber, W., Poeck, K., and Kerschensteiner, M. 1977. Text comprehension in
aphasia. Brain and Language 4: 177195.
Starostin, Sergei. 1999. Methodology of long-range comparison. In Vitaly Shevoroshkin and
Paul J. Sidwell (eds.), Historical linguistics and lexicostatistics, 6166. Melbourne.
Steen, G. J., Dorst, A. G., Herrmann, J. B., Kaal, A. A. and Krennmayr, T. 2010. Metaphor in usage.
Cognitive Linguistics 21: 765796.
Steen, Gerard J. 2006. Finding metaphor in grammar and usage. Amsterdam: John Benjamins.
Steenrod, Norman, Halmos, Paul, Schiffer, Menahem N., and Dieudonné, Jean A. 1973. How to
write mathematics. New York, NY: Springer.
Stewart, Ian. 1995. Nature's numbers. New York, NY: Basic Books.
Stewart, Ian. 2008. Taming the infinite. London: Quercus.
Stewart, Ian. 2013. Visions of infinity. New York, NY: Basic Books.
Stjernfelt, Frederik. 2007. Diagrammatology: An investigation on the borderlines of phenomenology, ontology, and semiotics. New York, NY: Springer.
Swadesh, Morris. 1951. Diffusional cumulation and archaic residue as historical explanations.
Southwestern Journal of Anthropology 7, 121.
Swadesh, Morris. 1955. Towards greater accuracy in lexicostatistic dating. International Journal
of American Linguistics 21. 121137.
Swadesh, Morris. 1959. Linguistics as an instrument of prehistory. Southwestern Journal of
Anthropology 15. 2035.
Swadesh, Morris. 1971. The origins and diversification of language. Chicago, IL: Aldine-Atherton.
Sweet, Henry. 1888. A history of English sounds from the earliest period. Oxford: Clarendon.
Tagliamonte, Sali. 2006. Analysing sociolinguistic variation. Cambridge: Cambridge University
Press.
Tall, David. 2013. How humans learn to think mathematically. Cambridge: Cambridge University
Press.
Tanaka-Ishii, Kumiko and Ishii, Yuichiro. 2008. Sign and the lambda term. Semiotica 169. 123
148.
Tanaka-Ishii, Kumiko and Ishii, Yuichiro. 2007. Icon, index, symbol and denotation, connotation, metasign. Semiotica 166. 124–135.
Tarski, Alfred. 1933 [1983]. Logic, semantics, metamathematics, Papers from 1923 to 1938, ed.
John Corcoran. Indianapolis, IN: Hackett Publishing Company.
Tauli, Valter. 1958. The structural tendencies of languages. Helsinki:
Taylor, Richard and Wiles, Andrew. 1995. Ring-theoretic properties of certain Hecke algebras.
Annals of Mathematics 141. 553572.
Teraia, A. and Nakagawa, M. 2012. A corpus-based computational model of metaphor understanding consisting of two processes. Cognitive Systems Research 1920: 3038.
Thibault, Paul J. 1997. Re-reading Saussure: The dynamics of signs in social life. London: Routledge.
Thom, Ren. 1975. Structural stability and morphogenesis: An outline of a general theory of
models. Reading: Benjamin.
Thom, Ren. 2010. Mathematics. In: Thomas A. Sebeok and Marcel Danesi (eds.), Encyclopedic
dictionary of semiotics, 3rd ed. Berlin: Mouton de Gruyter.

Thomason, Sarah Grey and Kaufman, Terrence. 1988. Language contact, creolization, and genetic linguistics. Berkeley, CA: University of California Press.
Thomson, William and Schumann, Edward. 1987. Interpretation of statistical evidence in criminal trials. Law and Human Behavior 11: 167187.
Tiersma, Peter M. 1982. Local and general markedness. Language 58. 832849.
Titchener, Edward B. 1910. A textbook of psychology. Delmar: Scholars Facsimile Reprints.
Tomic, Olga M. (ed.). 1989. Markedness in synchrony and diachrony. Berlin: Mouton de Gruyter.
Toni, R., Spaletta, G., Casa, C. D., Ravera, S., and Sandri, G. 2007. Computation and brain processes, with special reference to neuroendocrine systems. Acta Biomedica 78: 67–83.
Trubetzkoy, Nikolai S. 1936. Essai d'une théorie des oppositions phonologiques. Journal de
Psychologie 33. 5–18.
Trubetzkoy, Nikolai S. 1939. Grundzüge der Phonologie. Travaux du Cercle Linguistique de
Prague 7 (entire issue).
Trubetzkoy, Nikolai S. 1968. Introduction to the principles of phonological description. The
Hague: Martinus Nijhoff.
Trubetzkoy, Nikolai S. 1975. Letters and notes, ed. R. Jakobson. The Hague: Mouton.
Turing, Alan. 1936. On computable numbers, with an application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society 42: 230265.
Turing, Alan. 1950 [1963]. Computing machinery and intelligence. In: E. A. Feigenbaum and
J. Feldman (eds.), Computers and thought, 123134. New York, NY: McGraw-Hill.
Turner, Mark. 2005. Mathematics and narrative. thalesandfriends.org/en/papers/pdf/
turnerpaper.pdf.
Turner, Mark. 2012. Mental packing and unpacking in mathematics. In: Mariana Bockarova,
Marcel Danesi, and Rafael Núñez (eds.), Semiotic and cognitive science essays on the
nature of mathematics, 123–134. Munich: Lincom Europa.
Tweedie, Fiona J., Singh, S., and Holmes, David I. 1996. Neural network applications in stylometry: The Federalist Papers. Computers and the Humanities 30: 110.
Tymoczko, Thomas. 1978. The Four-Color Problem and its philosophical significance. Journal of
Philosophy 24: 57–83.
Uexküll, Jakob von. 1909. Umwelt und Innenwelt der Tiere. Berlin: Springer.
Van de Walle, Jürgen and Willems, Klaas. 2007. Zipf, George Kingsley (1902–1950). In Encyclopedia of languages and linguistics, 2nd ed., K. Brown, ed.; Vol. 13, 756–757. Oxford: Elsevier
Science.
Van der Merwe, Nikolaas J. 1966. New mathematics for glottochronology. Current Anthropology
7. 485–500.
Van der Schoot, M., Bakker Arkema, A. H., Horsley, T. M., and van Lieshout, E. C. D. M. 2009.
The consistency effect depends on markedness in less successful but not successful
problem solvers: An eye movement study in primary school children. Contemporary Educational Psychology 34: 5866.
Van Eyck, Jan and Kamp, Hans. 1997. Representing discourse in context. In: J. van Benthem and
A. ter Meulen (eds.) Handbook of logic and language, volume 3, 179237. Amsterdam:
Elsevier.
Varelas, Maria. 1989. Semiotic aspects of cognitive development: Illustrations from early mathematical cognition. Psychological Review 100. 420431.
Vendryes, J. 1939. Parler par économie. In: C. Bally and G. Genve (eds.), Mélanges de linguistique offerts à Charles Bally, 49–62. Geneva: Georg & Co.

Venn, John. 1880. On the employment of geometrical diagrams for the sensible representation
of logical propositions. Proceedings of the Cambridge Philosophical Society 4: 4759.
Venn, John. 1881. Symbolic logic. London: Macmillan.
Verene, Donald P. 1981. Vico's science of imagination. Ithaca, NY: Cornell University Press.
Vijayakrishnan, K. J. 2007. The grammar of Carnatic music. Berlin: Mouton de Gruyter.
Vygotsky, Lev S. 1961. Thought and language. Cambridge, MA: MIT Press.
Walker, C. B. F. 1987. Cuneiform. Berkeley, CA: University of California Press.
Wallis, Sean and Nelson, Gerald. 2001. Knowledge discovery in grammatically analysed corpora. Data Mining and Knowledge Discovery 5: 307340.
Wallon, Henri. 1945. Les origines de la pensée chez l'enfant. Vol. 1. Paris: Presses Universitaires
de France.
Wang, Xiaolu and He, Daili. 2013. A review of fMRI Investigations into the neural mechanisms of
metaphor comprehension. Chinese Journal of Applied Linguistics 38: 234239.
Wapner, Wendy, Hamby, Suzanne, and Gardner, Howard. 1981. The role of the right hemisphere
in the apprehension of complex linguistic materials. Brain and Language 14: 1533.
Watson, L. 1990. The nature of things. London: Houghton and Stoughton.
Waugh, Linda. 1979. Markedness and phonological systems. LACUS (Linguistic Association of
Canada and the United States) Proceedings 5: 155165.
Waugh, Linda. 1982. Marked and unmarked: A choice between unequals in semiotic structure.
Semiotica 39: 211216.
Weaver, Warren. 1955. Translation. In: W. N. Locke and A. D. Booth (eds.), Machine Translation
of languages, 1523. New York, NY: John Wiley.
Weinreich, Uriel. 1953. Languages in contact: Findings and problems. The Hague: Mouton.
Weinreich, Uriel. 1954. Is a structural dialectology possible? Word 10: 388400.
Weinstein, Edward A. 1964. Affections of speech with lesions of the nondominant hemisphere. Research Publications of the Association for Research on Nervous and Mental
Disorders 42: 220225.
Weisberg, Donna Skolnick, Keil, Frank C., Goodstein, Joshua, Rawson, Elizabeth, and Gray,
Jeremy R. 2008. The seductive allure of neuroscience explanations. Journal of Cognitive
Neuroscience 20: 470477.
Weizenbaum, Joseph. 1966. ELIZA: A computer program for the study of natural language communication between man and machine. Communications of the ACM 9: 36–45.
Weizenbaum, Joseph. 1976. Computer power and human reason: From judgment to calculation.
New York, NY: W. H. Freeman.
Wells, David. 2005. Prime numbers: The most mysterious figures in math. Hoboken: John Wiley.
Wells, David. 2012. Games and mathematics: Subtle connections. Cambridge: Cambridge University Press.
Werner, Alice. 1919. Introductory sketch of the Bantu languages. New York, NY: Dutton.
Werner, Heinz and Kaplan, Bernard. 1963. Symbol formation: An organismic-developmental
approach to the psychology of language and the expression of thought. New York, NY:
John Wiley.
Wheeler, Marilyn M. 1987. Research into practice: Children's understanding of zero and infinity.
Arithmetic Teacher 35: 4244.
Whiteley, Walter. 2012. Mathematical modeling as conceptual blending: Exploring an example
within mathematics education. In: Mariana Bockarova, Marcel Danesi, and Rafael Núñez
(eds.), 256–279. Semiotic and cognitive science essays on the nature of mathematics.
Munich: Lincom Europa.

Whitney, W. D. 1877. The Principle of Economy as a phonetic force. Transactions of the American
Philological Association 8: 123134.
Whorf, Benjamin Lee. 1956. Language, thought, and reality, J. B. Carroll (ed.). Cambridge, MA:
MIT Press.
Wiener, Norbert. 1948. Cybernetics, or control and communication in the animal and the machine. Cambridge, MA: MIT Press.
Wierzbicka, Anna. 1996. Semantics: Primes and universals. Oxford: Oxford University Press.
Wierzbicka, Anna. 1997. Understanding cultures through their key words. Oxford: Oxford University Press.
Wierzbicka, Anna. 1999. Emotions across languages and cultures: Diversity and universals.
Cambridge: Cambridge University Press.
Wierzbicka, Anna. 2003. Crosscultural pragmatics: The semantics of human interaction. New
York, NY: Mouton de Gruyter.
Wiles, Andrew. 1995. Modular elliptic curves and Fermat's last theorem. Annals of Mathematics, Second Series 141: 443–551.
Wilson, E. O. and Harris, M. 1981. Heredity versus culture: A debate. In: J. Guillemin (ed.), Anthropological realities: Reading in the science of culture, 450465. New Brunswick, NJ:
Transaction Books.
Winner, Ellen and Gardner, Howard. 1977. The comprehension of metaphor in brain-damaged
patients. Brain 100: 717729.
Winner, Ellen. 1982. Invented worlds: The psychology of the arts. Cambridge, MA: Harvard University Press.
Winograd, Terry. 1991. Thinking machines: Can there be? Are we? In: J. J. Sheehan and M. Sosna
(eds.), The boundaries of humanity, 198223. Berkeley, CA: University of California Press.
Wittgenstein, Ludwig. 1921. Tractatus logico-philosophicus. London: Routledge and Kegan
Paul.
Wittgenstein, Ludwig. 1953. Philosophical investigations. New York, NY: Macmillan.
Wittmann, Henri. 1969. A lexico-statistic inquiry into the diachrony of Hittite. Indogermanische
Forschungen 74: 110.
Wittmann, Henri. 1973. The lexicostatistical classication of the French-based Creole languages. Lexicostatistics in genetic linguistics: Proceedings of the Yale conference, 8999.
The Hague: Mouton.
Wolfram, Stephen. 2002. A new kind of science. Champaign, IL: Wolfram Media.
Wundt, Wilhelm. 1880. Grundzüge der physiologischen Psychologie. Leipzig: Engelmann.
Wundt, Wilhelm. 1901. Sprachgeschichte und Sprachpsychologie. Leipzig: Engelmann.
Wyllys, Ronald E. 1975. Measuring scientific prose with rank-frequency (Zipf) curves: A new
use for an old phenomenon. Proceedings of the American Society for Information Science
12: 3031.
Yancey, A., Thompson, C., and Yancey, J. 1989. Children must learn to draw diagrams. Arithmetic Teacher 36: 1519.
Zipf, George K. 1929. Relative frequency as a determinant of phonetic change. Harvard Studies
in Classical Philology 40: 1–95.
Zipf, George K. 1932. Selected studies of the principle of relative frequency in language. Cambridge, MA: Harvard University Press.
Zipf, George K. 1935. The psycho-biology of language: An introduction to dynamic philology.
Boston, MA: Houghton-Mifflin.

Zipf, George K. 1949. Human behavior and the principle of least effort. Boston, MA: Addison-Wesley.
Zwicky, Arnold and Sadock, Jerrold. 1975. Ambiguity tests and how to fail them. In: J. Kimball
(ed.) Syntax and semantics 4, New York, NY: Academic Press.
Zwicky, Jan. 2010. Mathematical analogy and metaphorical insight. For the Learning of Mathematics 30: 914.
Zylberberg, A., Dehaene, S., Roelfsema, P. R., and Sigman, M. 2011. The human Turing machine:
A neural framework for mental programs. Trends in Cognitive Science 15: 293300.

Index
abduction 86, 91, 92, 145, 255, 258, 273, 274,
275
Abel, Niels Henrik 155
acalculia 59, 282
agglutinative 7, 23, 54, 55
Aiken, Howard 138
algorithm 1, 5, 34, 3740, 4547, 50, 51,
8487, 126, 127, 129, 132143, 145, 147,
148, 152154, 156, 162, 167, 169, 171,
172, 178, 188, 189, 202, 221, 223, 231,
232, 258, 265, 294
allophone 114, 115
alphabet 26, 91, 109, 153, 232, 273
ambiguity 11, 2529, 107, 160163, 170, 172,
191, 247, 248
analogy 17, 67, 90, 116, 169, 239, 275
anomalous 42, 63, 104, 170, 174, 175, 251
anthropic principle 102
aphasia 59, 282
Appel, Kenneth 38, 85, 91
argumentation 8, 32, 33
Aristotle 1012, 17, 36, 49, 65, 66, 70, 71, 73,
94, 262
arithmetic x, 2, 79, 11, 14, 15, 21, 25, 33, 34,
59, 69, 7072, 102, 134, 142, 168, 194,
240, 253, 268, 269, 273, 275, 279, 282,
285, 288, 293
Arnauld, Antoine 31
artificial intelligence (AI) 36, 47, 50, 68, 96,
111, 134, 137, 138, 179, 183, 192
artificial language 133, 190
artificial neural network (ANN) 221, 222
associationism 49, 50
axiom 1, 8, 10, 13, 21, 25, 27, 28, 30, 31, 36,
38, 7274, 79, 84, 85, 87, 108, 130, 230,
259, 260
Babbage, Charles 138
Bacon, Francis 71
Bacon, Roger 12
Bar-Hillel Paradox 160162, 170
Bar-Hillel, Yehoshua 116, 117, 160, 169, 173
Barber Paradox 83
BASIC 146, 175

Bayes, Thomas 229


Bayess Theorem 140
Bayesian Inference 39, 168, 172, 225,
227230
Benford, Frank 203
Benfords Law 202, 203, 206, 210, 224, 231,
232, 250, 254
Bernoulli distribution 225
binary logic 92
binary number (binary digit) 7, 33, 52, 141,
143, 154, 185, 216
binomial distribution 225
birthday problem 206209, 224
blending 4, 15, 50, 56, 57, 60, 62, 64, 65, 120,
123, 145, 189, 190, 237
Boas, Franz 9, 16, 21
Boole, George 33, 92, 97, 289
Boolean algebra 28, 34, 36, 42, 73, 286
Booth, Andrew D. 160
Borel, Émile 24, 25, 154
Butterworth, Brian 268, 278280, 282, 284
calculus 28, 55, 67, 157, 158, 217, 251, 253,
289
Cantor set 92
Cantor, Georg 8890, 96, 101, 118
Cantor's diagonal method 56
Cantor's proof 89
Cantorian logic 88, 91
Cardano, Girolamo 196
cardinality 90, 91, 285
Carroll, Lewis 82, 85
catastrophe 15
categorial grammar 116, 117
categorical logic 32, 98
Chomsky, Noam 1, 3, 10, 14, 1820, 22, 23,
37, 4042, 45, 50, 57, 63, 104108, 110,
112, 113, 124, 126, 130, 143, 161, 186, 187
Church-Turing thesis 256
circularity 82, 83, 248, 260
Cobham-Edmonds thesis 147
codability 214
cognate 5, 55, 222, 237, 239, 240, 241, 246
cognition 47, 51, 225, 263

cognitive linguistics 5, 11, 21, 42, 68, 118–121, 123, 124
cognitive neuroscience 60, 257
cognitive science 2, 3, 37, 4651, 53, 61, 122,
141
cognitivism 128
coin-tossing problem 206, 225
Collatz, Lothar 93
collocation 170, 201
communicative competence 126
completeness 8, 29, 81, 82, 84, 92, 129, 146,
158
compositional semantics 107
compositionality 44, 117, 118
compression 37, 40, 53, 55, 193, 194, 212,
218, 224, 219, 245, 246, 254
computability theory (CT) 37, 147, 149,
151158, 172, 185, 256
computation 1, 21, 36, 37, 39, 41, 43, 45, 47,
49, 51, 52, 55, 59, 132, 134, 138144,
146148, 150154, 156158, 160, 162,
168, 179185, 188, 189, 192, 194, 195,
217, 262
computational linguistics (CL) 21, 46, 133,
159, 161, 163, 165, 167, 169, 171, 173
computational neuroscience (CN) 60,
257262
computer model 33, 3740, 60, 134, 137, 138,
140, 142, 143, 181
computer program 84, 85, 127, 133, 136, 140,
144146, 148, 181
computer proof 86, 87, 95
computer science 5, 36, 41, 68, 127, 143
concept
basic 164
subordinate 164
superordinate 164, 165
conceptual metaphor theory (CMT) 6265,
119, 120, 123, 126
conceptual metaphor 50, 56, 6164, 90, 119,
120, 123, 165, 166, 173, 189, 190, 267,
285, 286
conditioning 49, 50
conjecture 38, 39, 80, 81, 85, 87, 88, 93
connectionism 262, 263, 265, 266
connectivity 6, 24, 58, 60, 64, 80, 88, 89, 93,
259, 262, 265, 267

connotation 166
consistency 24, 25, 27, 29, 38, 72, 81, 82, 92,
129, 139, 140, 146
contradiction 77, 78, 84, 87, 93, 94, 129
constructivism 14, 15
context 103, 110113, 160, 161
conversation 4042, 53, 126, 140, 166, 173,
175178, 186, 211
core vocabulary 191, 238242, 244, 245
coreference 41, 177
corpus linguistics 53, 194, 201, 202,
219224, 294
correlation 195, 200, 209, 211
correlation coefficient 201, 202
creativity 50, 64, 260
cybernetics 39
De Morgan, Augustus 96
decidability 8, 36, 8184, 86, 129, 137, 146,
153
decimal number 7, 52, 71, 154
deduction 4, 64, 72, 74, 77, 87, 91, 145, 274,
275
deep structure 1820, 22, 40, 42, 105108,
111, 112, 161
Dehaene, Stanislas xi, 57, 58, 261, 268, 276,
277, 280, 282, 284
deixis 41
Democritus 70
denotation 165
Descartes, René 2, 11, 17, 20, 69, 71, 190, 288
Devlin, Keith 268, 284
diagram 13, 29, 33, 43, 98, 99, 101, 273275
dialogue 40, 70, 146, 174
disambiguation 170, 177
discourse 40, 62, 116, 133, 134, 172, 223
distinctive feature 114116
double articulation 1, 216
Eckert, J. Presper 138
economy 6, 110, 193, 202, 216, 218, 245, 247,
248
efficiency 6, 216, 246
ELIZA 174, 179, 184
embodied cognition 51, 280
emergence 15, 21, 53
Enlightenment x, 50, 67, 71, 79

Epimenides 83, 84
ergonomics 249
Esperanto 191
ethnomathematics 58, 79
ethnosemantics 40
Euclid 2628, 66, 69, 77, 78, 80, 85, 91,
134136, 205
Euclidean geometry 2, 8, 29, 31, 64, 280
Euler, Leonhard 80, 98, 99, 101, 148151
Euler diagram 98, 99
Existential Graph 99, 272274
exponent 37, 90, 194, 195, 204, 211, 239, 240
Fermat, Pierre de 196
Fermats Last Theorem 31, 275, 284
Ferrero, Guillaume 211
Fibonacci, Leonardo 71
Fibonacci sequence 111
Ficino, Marsilio 71
fifth postulate (axiom) 29, 36, 73
figurative xi, 4, 11, 42, 43, 62, 64, 116, 118,
125, 128, 160, 169, 188, 223, 257, 285
Fodor, Jerry 187
formal grammar 1012, 42, 43, 104, 108, 111,
133, 146, 163
formal linguistics 22, 23, 66, 67, 103, 104, 110
formal mathematics 26, 37, 66, 67, 69, 81,
86, 96, 108
formal semantics 114, 116118
formalism xii, 59, 13, 15, 2326, 36, 51, 67,
68, 124, 125, 127130, 133
formalist hypothesis 16, 17
Foster, Donald 220
Four Color Theorem 38, 85, 86
fractal 92, 93, 95, 96
Frege, Gottlob 35, 82
Fundamental Theorem of Arithmetic 134
Galileo 71, 89, 254
Galois, Évariste 155, 156
Gao, Yuqing 170
generativism 21, 161
genetic algorithm 221
geometry, x 9, 30, 31, 67, 69, 70, 72, 73, 80,
94, 288
glottochronology 194, 237, 238, 241243,
245, 294


Glue Theory 117


Gödel, Kurt 4, 56, 84, 118, 127
Gödel's diagonal lemma 56, 286
Gödel's proof 4, 29, 36, 56, 84, 286, 287
Goldbach, Christian 80
Goldbachs Conjecture 80
Google 141, 167, 170, 234
Google Translate 162, 167
Government and Binding 22
Gowers, Tim 39, 40
grammar 104, 108, 110
graph 274
group theory 155
Grover's reverse phone algorithm 153
Haken, Wolfgang 38, 85, 91
halting problem 84, 86, 154
Hamiltonian cycle (circuit) 39, 148, 152
Harris, Zellig 40
Hegel, Georg Wilhelm 71, 72
Heraclitus 66
hermeneutics xi, xii, 3, 6, 55, 65, 158, 179,
250, 294
Hilbert, David 34, 129, 130
Hilberts program 129, 130
Hobbes, Thomas 2, 11, 71
Hockett, Charles 1, 3, 103, 217
Hollerith, Herman 138
Humboldt, Wilhelm von 16, 21
Hume, David 49
Husserl, Edmund 72
Hymes, Dell 126
hyperreal number 157, 158
idealized cognitive modeling (ICM) 120, 122,
123, 287, 288, 290
image schema 120, 121, 269, 288, 289
imaginary number 35, 206
index 55
indexicality 41
induction 72, 7577, 87, 89, 95, 150, 275
inference 8, 17, 33, 38, 48, 62, 170, 97
infinitesimal 157, 158
Innenwelt 102, 269, 276
integer 52, 75, 80, 8991, 96, 97, 130, 224,
274, 289
intentionality 102, 184, 185

interdisciplinary x, xi, 5, 6, 17, 40, 65, 179, 294
interhemispheric 255, 263, 264
Interlingua 162, 163
Internet 5, 39, 126, 133, 134, 141, 163, 221,
228, 294
isolating 54, 55, 210
Jacquard, Joseph-Marie 138
Jakobson, Roman 1, 3, 24, 114, 255, 262
Jespersen, Otto 9
Kant, Immanuel 71, 271273
keyword 201
Kleene Star 109
knowledge network 139, 140, 163165
knowledge representation 139, 140, 163165
Kolmogorov, Andrei 230
Königsberg Bridges Problem 149–152
Lakoff, George xi, xii, 25, 23, 4143, 48, 51,
56, 57, 6265, 68, 89, 119, 120, 250, 251,
267269, 285287, 289
lambda calculus 117, 256
Lambek calculus 117
Lancelot, Claude 13
langue 20, 21, 67, 68, 103, 126
Leibniz, Gottfried Wilhelm 8, 9, 11, 71, 138
Leopold, Werner 247
lexeme 22, 53, 108, 114116, 173, 212,
242244
lexical ambiguity 161, 162
lexical class 61
lexical field 63, 173
lexical insertion 22, 42, 104, 106, 107,
114116
lexical semantics 114, 188
lexical tree 115
lexicon 107, 114, 117, 177, 180, 214
lexicostatistics 194, 237, 238, 240, 244
Liar Paradox 83, 84
linguistic competence 20, 21, 107, 108, 114,
126
linguistic metaphor 62, 63, 120, 123
literal meaning 1012, 43, 63, 116, 123, 163,
189, 190, 222, 268, 290
Lobachevskian geometry 30, 31, 85

Locke, John 11, 49


logarithm 195, 200, 203, 204, 206, 231, 233,
239241, 245, 250
logic 68, 3235, 66, 69, 70, 85, 91, 92, 94,
98
logical calculus 8, 9, 25, 31, 36, 97, 183, 260,
264
logicism 130
lógos 66, 70, 88, 91, 128
loop 38, 86, 87, 137, 143, 146, 154, 164
Luria, Alexander 262, 263

M-Set 95
Machine Translation (MT) 5, 39, 142, 159,
160, 163, 167, 169, 174
Machine-learning (ML) 139, 188
Mandelbrot, Benoit 85, 96, 213
mapping 4, 9, 13, 118, 120122, 189
markedness 248, 256
Markov, Andrey A. 1, 43, 231, 232
Markov chain 21, 86, 168, 181, 235, 236
Markov state 105
Martinet, André 210, 216, 246, 248
Marx, Karl 72
math cognition xi, xii, 58, 65, 268283, 293
mathematical knowledge 101103
Mauchley, John 138
McCarthy, John 139
Mean Length of Utterance (MLU) 53, 54
metalanguage 84
metaphor 119124
Mill, James 49
Minimalist Program 23, 37, 248
mirror neuron 92
model 4045, 144146
modularity 264, 265
Montague, Richard 116, 124
Montague grammar 116
Monty Hall Problem 198, 225227, 229
morpheme 18, 22, 5355, 114, 177, 178, 191,
201, 217
morphological index 54
morphology 10, 54, 177, 191, 210, 211, 219,
237, 246
mythos 66, 70, 71

Nagao, Makoto 169


Natural Language Processing (NLP) 133, 134,
140, 174179, 186, 188, 190, 232
natural language 174
natural logarithm 206, 234, 242
network 62, 128, 150, 152, 164169, 265
Neumann, John von 132, 138
neural circuit 56, 58
neural network 60, 139, 140, 257, 258,
260262, 264, 265, 269
neural structure 56
neuroscience 56, 57, 255, 256
Newcomb, Simon 203, 231, 250
Nichomachus 135, 136
Nietzsche, Friedrich 72
non-contradiction 94
normal curve 198
normal distribution 198
notation 26, 37
null hypothesis 201
number sense 268270, 272, 275280, 284
numeracy 277, 278
numerosity 272, 278, 279, 282284
optimality 24, 110, 248
optimization 152, 202, 203, 217
P = NP 130, 132, 133, 147149, 151154, 156,
157
Pāṇini 8–10, 13
Pappus 276
paradox 78, 82, 129
Parallel Distributed Processing (PDP) 60, 265,
266
Parmenides 82
parole 21, 67, 68, 103, 107, 126
parsing 107, 170, 176, 177, 222
Pascal, Blaise 138, 196
Pavlov, Ivan 49, 50
Peano, Giuseppe 34, 72
Peanos axioms 34
Peirce, Charles S. 31, 77, 81, 85, 91, 99, 100,
272274
Pennebaker, James 220, 221
Perelman, Grigory 88
phenomenology 72, 273
phoneme 18, 40, 110, 209212, 216, 219, 245


phonological rule 106, 108, 248


phonology 10, 59, 106, 159, 211, 245, 246
phrase structure rule 19, 20, 107, 108
phrase structure 19, 21, 22
Piaget, Jean 48, 275, 276, 279
Plato 14, 17, 69, 71
Platonism 14, 15, 58, 103, 269
Poincaré, Henri 87, 88, 101
Polymath Project 39
polynomial time 147
polysemy 161163
Popper, Karl 182, 183
Port-Royale Circle 16
postulate 13, 25, 72
Prague School 114
presentational 80
Principle of Economy (PE) 202, 210, 211
Principle of Least Effort (PLE) 209
probability 55, 56, 195, 224, 225, 228230,
248
problem-solving 47, 48, 50, 59, 60, 86, 139,
140, 151, 181, 263
proof 72, 73
proof by exhaustion 38, 85, 87, 94
propositional logic 1, 43, 67, 107, 140
Prosecutors Fallacy 227, 228
Putnam, Hilary 257
Pythagoras 14, 66, 69, 70, 79
Pythagorean theorem 14, 31, 35, 78, 94, 102,
249, 274, 275, 280, 291
QED 80
quadratic time 132
Quadrivium 69
quantification 52, 193, 194, 248
quantum computing 256
quantum physics 153, 252, 253, 291
Quintilian 12
random number 154, 157, 200
random walk 234236
randomness 155, 197, 200, 201, 206, 208,
224, 225, 227
recursion 2, 36, 111113, 143, 200
reductio ad absurdum 77, 84, 87, 94
regression 200202, 221
Relevance Theory 118

reorganization 210, 211, 246


representation 37, 47, 51, 106, 139
retroactive data analysis 38, 40
reverse mathematics 129, 130
Riemann, Bernhard 30, 206
Riemannian geometry 30, 31, 85
Riemann zeta function 205, 206
Robinson, Abraham 157, 158
rule 7, 9, 18
Russell, Bertrand 35, 8284, 102
Saussure, Ferdinand de 7, 20, 21, 28, 67, 68,
103, 126, 245
scaling law 213
Schank, Roger 176
Schikard, Wilhelm 138
Scotus, John Duns 13
script theory 138
Searle, John 184
self-referentiality 83, 84
semiogenesis 15
set theory 96, 97
Shannon, Claude 33, 168, 185
SHRDLU 186
Sierpinski Carpet 92, 93
significance test 200, 201
Socrates 15, 70
source domain 63
Sperry, Roger 263
split-brain 263
standard deviation 1991
statistics 195
straticational grammar 10
structuralism 3, 21, 28
stylometry 53, 194, 219222
surface structure 1820, 22, 40, 42, 105108,
111, 112, 161
Swadesh, Morris 237239, 242244
syllogism 8, 32, 33, 36, 70, 73, 8183, 98
symbolic logic 82, 97
syntax 1820
syntax hypothesis 17
tagmemics 10
target domain 63
Tarski, Alfred 83
text theory 172, 173
textspeak 193

Thales 80
Thom, René 15, 102, 130
Thrax, Dionysius 12, 66
time depth 238244, 246, 294
topology 101, 151, 259, 289
transfinite number 35, 91
transformational rule 104, 105
transformational-generative grammar 104
Traveling Salesman Problem (TSP) 147, 148,
152, 153
tree diagram 43, 44, 46, 104, 105, 115
Trivium 69
Turing, Alan 1, 84, 127, 141, 154, 184, 185
Turing machine 2, 86, 141, 143, 153, 185, 257,
260
Turing Test 184
Uexküll, Jakob von 101, 276
Umwelt 101, 102, 269, 276
undecidability 36, 8284, 253
Unexpected Hanging paradox 129, 130, 147,
159
Universal Grammar (UG) 1
Valla, Lorenzo 221
Vendryes, Joseph 246
Venn diagram 98
Venn, John 98100
Vico, Giambattista 17
Vygotsky, Lev 48, 49, 128, 181, 182
Weaver, Warren 160
Weizenbaum, Joseph 174
well-formedness 63
Whitehead, Alfred North 35, 82, 83, 102
Wiener, Norbert 39
William of Ockham 13
Winograd, Terry 183, 186
Wittgenstein, Ludwig 35, 82, 254
Wundt, Wilhelm 105
Zamenhof, Ludwik Lejzer 191
Zeno of Elea 78, 82
Zipf, George Kingsley 209, 211, 213
Zipf's law 214, 215
Zipfian analysis 194, 209, 214
Zipfian curve 212
