Evolution of Semantic Systems

Bernd-Olaf Küppers, Udo Hahn, Stefan Artmann (Editors)
Editors

Bernd-Olaf Küppers
Frege Centre for Structural Sciences
Friedrich Schiller University Jena
Jena, Germany

Stefan Artmann
Frege Centre for Structural Sciences
Friedrich Schiller University Jena
Jena, Germany

Udo Hahn
Frege Centre for Structural Sciences, and
Jena University Language & Information Engineering (JULIE) Lab
Friedrich Schiller University Jena
Jena, Germany
Complex systems in nature and society make use of information for the development
of their internal organisation and the control of their functional mechanisms.
Alongside technical aspects of storing, transmitting and processing information, the
various semantic aspects of information—such as meaning, sense, reference and
function—play a decisive part in the analysis of such systems. This raises important
questions:
– What are paradigmatic examples of semantic systems—formal languages, nat-
ural languages, programming languages and corresponding coding artefacts
(programs), ontologies, the genetic code, neural nets, animal signal systems, the
World Wide Web or social organisations?
– Are there universal principles underlying the origin and evolution of semantic
systems?
– Can we draw a rigid line of demarcation between semantic and non-semantic
systems, or is the transition between the two types of systems a smooth one?
– Does the understanding of semantic systems necessarily imply a “naturalisation”
of semantics?
A large variety of scientific disciplines are required to cooperate if adequate
answers to these and further related questions are to be developed. With the aim
of fostering a better understanding of semantic systems from an evolutionary and
multidisciplinary perspective, this volume collates contributions by philosophers
and natural scientists, linguists and information and computer scientists. They do
not follow a single research paradigm—rather, they shed, in a complementary way,
new light upon some of the most important aspects of the evolution of semantic
systems. The eleven chapters of this volume are arranged according to an increasing
level of specificity of their respective topics.
The contributions of Dagfinn Føllesdal and Stefan Artmann are concerned with
philosophical questions that arise when fundamental concepts of semantics, such as
meaning and reference, become the object of analytical and pragmatist philosophy.
In “The Emergence of Reference”, Føllesdal asks which properties objects must
possess so that they can be referred to. He presents three such features. First of
all, objects must have numerous properties that, most frequently, are only partially
known by us but which can be explored. Secondly, objects must be temporal entities
that retain their identity when changing their properties. Thirdly, while we may
falsely ascribe properties to an object, it must be possible that our false ascriptions
concern the same object as do our corrected, true ascriptions. Føllesdal shows how
these three features give objects such great epistemological importance for human
cognition and how they enable language to keep track of objects.
Artmann’s paper on “Pragmatism and the Evolution of Semantic Systems” also
investigates reference, yet it takes its point of departure in pragmatist philosophy,
namely in Charles S. Peirce’s evolutionary pragmatism and its naturalistic recon-
struction by John Dewey. How do these philosophers understand the origin and
development of meaning, and how might their theories contribute to an explanation
of the evolution of semantic systems? Artmann maintains that pragmatism could
play a very important role in our understanding of the dynamics of meaning if it were
to integrate theories and models from structural science. His analyses of classical
pragmatism lead to the project of developing a modern version of pragmatist
thought, which he calls “structural pragmatism”.
After Føllesdal’s and Artmann’s philosophical papers, the next three chapters
discuss abstract structural-scientific concepts that are necessary for the development
of an evolutionary theory of semantic systems: the concepts of system, semantics
and structure.
Systems, in general, are the topic of Klaus Kornwachs’ paper on “System
Surfaces—There Is Never Just Only One Structure”. Kornwachs focuses on the
concept of the surface of a system: does it refer to the top layer of a real object or
to a construction of the mind that is useful when the world is conceived of in
terms of systems? Kornwachs tackles this question by looking into the history of
systems theory and reconstructing the relationships that philosophers and scientists
have postulated between the ideas of system and order, on the one hand, and between
the ideas of system and process, on the other. How system theorists think about those
relationships will have a great influence on how they conceptualise the evolution of
semantic systems.
In “Elements of a Semantic Code”, Bernd-Olaf Küppers presents a new approach
to the objectification and quantification of meaning-bearing information. His theory
of semantic code addresses general aspects of semantics, such as novelty, pragmatic
relevance, selectivity and complexity. He integrates these into a reservoir of value
elements from which a recipient configures a value scale for the particular evaluation
of information. Küppers’ approach to a quantified theory of semantic information
is a new example of how structural sciences—such as cybernetics, information
theory and systems theory—proceed when analysing complex systems in nature and
society: the comprehensive, abstract structures that they describe are not restricted
to a particular class of complex systems, such as natural or artificial, living or
non-living ones.
Both Kornwachs’ theory of systems and Küppers’ theory of semantics follow a
structural-scientific approach to the evolution of semantic systems. In doing so, they
rely heavily upon the concept of structure. In “Talking about Structures”, Rüdiger
Inhetveen and Bernhard Schiemann show how that concept has become, during the
The Emergence of Reference

Dagfinn Føllesdal
Abstract The central theme of this volume, the evolution of semantic systems,
raises issues about the general idea of a system, what differentiates semantic
systems from other systems, and about the two key topics of semantics: meaning
and reference. The paper focuses on the emergence of reference and how this is
connected with individuation, reification and the notion of an object. Objects have
three features that are crucial for reference: First: They are the bearers of a (usually)
large number of properties and relations. Normally we know only a small number
of these, but the object is conceived of as having numerous further properties that
we do not know yet, but which are there to be explored. Secondly: Objects, except
mathematical ones and a few others, change over time. One and the same object
can have a property at one time and lack it at another time. The object remains
identical through changes. Modalities come in at this point; not only are there the
actual changes, but there are also possible ones, there are accidents and there are
necessities. Or, at least, so we say. Finally: There is our fallibility. We may have
false beliefs about objects. We may seek to correct these beliefs, but all the while
our beliefs, true or false, are of the objects in question. A belief, or set of beliefs, is
not about whichever object happens best to satisfy our beliefs ([1]). A semantics that
merely sought to maximize our set of true beliefs would poorly reflect the role that
objects play in epistemology. These three important features of objects are reflected
in the way names and other referring expressions function in language. Objects are
crucially important for our daily lives, our cognition and our semantic theory, and
keeping track of objects is an important feature of language. How this happens is a
major theme of the paper.
D. Føllesdal ()
Department of Philosophy, Stanford University, and Center for the Study of Mind in Nature,
University of Oslo, 0316 Oslo, Norway
e-mail: dagfinn@csli.stanford.edu
1 Introduction
Among the questions that have been discussed at the conference, I will concentrate
on two, and for each of them address a number of the sub-questions that have been
indicated:
Question 1: Semantic systems—What are paradigmatic examples of semantic sys-
tems: formal languages, natural languages, programming languages, ontologies,
the genetic code, neural nets, animal signal systems, computer systems and social
systems?
Question 2: Semantic versus non-semantic systems—Does there exist a rigid line
of demarcation between semantic and non-semantic systems, or is the transition
between both types of systems a smooth one?
2 Semantic Systems
What is a system? As a first approximation, which may be too broad but suffices for
our purposes, we may define a system as a structure, that is, a set of entities (nodes)
with relations defined between them. What then are semantic systems?
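Read this way, a system is what a programmer would call a labelled graph. A minimal sketch of the definition, with purely illustrative names and data, might look like this:

```python
# A minimal rendering of the definition above: a system is a set of
# entities (nodes) together with relations defined between them.
# All names and data here are illustrative, not from the text.

def is_system(nodes, relation):
    """A binary relation belongs to the system only if it relates
    entities that are members of the system's node set."""
    return all(x in nodes and y in nodes for (x, y) in relation)

nodes = {"a", "b", "c"}
relation = {("a", "b"), ("b", "c")}
print(is_system(nodes, relation))
```

On this reading, what distinguishes one system from another is not the entities alone but which relations are defined over them, which is what the slide-rule example below exploits.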
In traditional linguistics it is customary to distinguish between syntax, phonology
and semantics. Are these three disciplines autonomous, or do they depend upon one
another, and if so, how?
Syntax depends on semantics: syntax studies those structural features that are
relevant to meaningfulness. Likewise, semantics is dependent upon syntax: while
syntax focuses on those substructures, or expressions, that occur in communicative
activity and studies their structure, semantics is supposed to explain how these
expressions that occur in normal speech and writing have a meaning, how they
make sense and function in communication and what meaning they have. Phonology
explores differences in sound where they matter for semantics.
Now on to the second main question: Does there exist a rigid line of demarcation
between semantic and non-semantic systems, or is the transition between both types
of systems a smooth one?
My tentative answer to this question is: a system becomes a semantic system
when it becomes endowed with meaning. Let us see how.
3.1 Semantics
Slide rules are now obsolete; students nowadays have barely heard of them and do
not know how they function.
A slide rule is a simple system: two rods with numerical scales that slide against
one another. Through the use we make of it, it becomes a simple semantic system.
This semantic system incorporates a mathematical structure: instead of multiplying
numbers, one adds the corresponding exponents (logarithms). The rods and their movements are
given meaning, which makes them part of a semantic system that also includes some
mathematics.
The rods could be given other meanings; one could use them for simple addition
and also for many other purposes. It is not the physical system itself that is semantic,
but the system under an interpretation.
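The mechanism of multiplying by adding logarithmic lengths can be sketched in a few lines; this is an illustration added here, not part of the original text, and the function name is invented:

```python
import math

# A slide rule multiplies two numbers by adding the lengths that encode
# their logarithms: log(a) + log(b) = log(a * b).
# An actual slide rule reads the result off a physical scale with
# limited precision; this sketch computes it exactly.

def slide_rule_multiply(a, b):
    combined_length = math.log(a) + math.log(b)  # "slide" one scale along the other
    return math.exp(combined_length)             # read the product off the scale

print(round(slide_rule_multiply(3, 4), 6))
```

The same two rods under a different interpretation, say linear scales, would implement addition instead, which is exactly the text's point: semantics attaches to the system under an interpretation, not to the rods themselves.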
3. Holism: Changes in one part of the structure tend to affect the whole structure
   and make the parts function differently. Thus, for example, contrasts between
   elements play an important role in semantics and also in other branches of
   linguistics. An example of the latter is Roman Jakobson’s treatment of phonemes.
Many of the structuralists emphasized binary opposition as particularly signifi-
cant here. However, the emphasis on binary opposition may in part be due to its
being easier to spot and easier to handle than more complex structural features.
4. The semantic interpretation is affected by new knowledge: the inseparability of
   theory and theory change, on the one side, and meaning, on the other. This is a topic
that will be addressed by Alexa McCray and Joon Lee in this volume. This is
a main point in Quine: each “translation manual” is a new way of separating
theory and meaning; all that matters for correct translation is fit with the publicly
observable evidence.
Frege contributed more than anybody else to developing semantics.1 Two of his
observations will be particularly important for what follows:
1. Semantics has two branches: Theory of sense and theory of reference. This
distinction is partly motivated by trying to account for three phenomena: How
can identity statements be both true and informative? How can empty names be
meaningful? How do we determine what a name refers to?
2. Frege proposed a unified semantics: There are three basic kinds of expressions,
singular terms, general terms and sentences, which all function in the same way:
they have a sense and this sense determines their reference. Carnap’s method of
intension and extension basically functions in the same way.
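Frege's two-step picture, on which an expression has a sense and the sense determines the reference, can be mimicked in a toy lookup. The data below is the classic morning star/evening star illustration, stocked here for the example rather than taken from the text:

```python
# Toy model of Frege's unified semantics: expression -> sense -> reference.
# Different senses may determine the same reference, which is why an
# identity statement can be both true and informative.

sense = {
    "the morning star": "brightest body seen in the morning sky",
    "the evening star": "brightest body seen in the evening sky",
}
reference = {
    "brightest body seen in the morning sky": "Venus",
    "brightest body seen in the evening sky": "Venus",
}

def refer(expression):
    # The sense of an expression determines its reference.
    return reference[sense[expression]]

print(sense["the morning star"] != sense["the evening star"])   # distinct senses
print(refer("the morning star") == refer("the evening star"))   # same reference
```

The empty-names problem fits the same scheme: an expression can have an entry in the sense table without its sense determining any reference at all.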
5 Quine
We will contrast some of Frege’s views with those of Quine and we will see that
there are some important points where neither of them can be right.
Quine criticized a notion of sense that gave no identity criteria for sense. “No
entity without identity,” he said in 1957, and he insisted on this point throughout the
rest of his life.2 Quine was more satisfied with the theory of reference, including
a theory of truth. From 1941 on, Quine used his views on reference to criticize the
modal notions, which he regarded as closely connected with the notion of sense.

1 Particularly important are Frege’s Über Sinn und Bedeutung [4] and Der Gedanke [5]. Wolfgang
Künne’s recent book, Die Philosophische Logik Gottlob Freges: Ein Kommentar, contains some
of the main texts and gives a very valuable overview and analysis of Frege’s ideas in this field
(cf. [9]).
Quine presented a number of arguments against the modalities. In the beginning
he pointed to their lack of clarity: necessity can be defined in terms of possibility
and both notions in terms of possible worlds. However, possible worlds in turn can
hardly be defined without appeal to possibility, and thereby we are back in a small
circle again, similar to the one that arises in the theory of meaning when we define
synonymy and analyticity in terms of meaning and the latter term again in terms
of the former. In both cases one moves in a small circle that should satisfy no
philosopher.
In later writings Quine reinforced his criticism of the modalities with ever
stronger arguments. All of these arguments brought out a severe lack of clarity about
what the objects are that we talk about in modal contexts. Quine took this as an
indication that there is something fundamentally wrong with the modal notions, like
necessity and possibility.
We shall not review these arguments here, but will mention the last of them,
which Quine presented in Word and Object [11]. Quine here gave an argument that
showed that modal distinctions collapse when we quantify into modal contexts.
That is, not only is everything that is necessary true, but also conversely, everything
that is true, is necessary. So there was no point in the modal notions. Carnap and
others had held that without such quantification modal notions would be of no use.
So Quine felt that he had clinched his case after 19 years of criticism against the
modal notions.
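A standard class-abstraction reconstruction of this collapse argument, sketched here along the lines usual in the literature rather than quoted from Quine, runs as follows. Let p be any true sentence:

```latex
\begin{align*}
&\text{(1)}\quad p
  && \text{assumption: } p \text{ is true}\\
&\text{(2)}\quad \{x : x = x \wedge p\} = \{x : x = x\}
  && \text{from (1), by class abstraction}\\
&\text{(3)}\quad \Box\bigl(\{x : x = x\} = \{x : x = x\}\bigr)
  && \text{necessity of self-identity}\\
&\text{(4)}\quad \Box\bigl(\{x : x = x \wedge p\} = \{x : x = x\}\bigr)
  && \text{from (2), (3), substitutivity of identicals}\\
&\text{(5)}\quad \Box p
  && \text{from (4), by class abstraction}
\end{align*}
```

Since p was an arbitrary truth, every truth comes out necessary. The contested step is (4), which applies substitutivity of identicals inside a modal context; this is the Fregean assumption about reference whose failure the following section diagnoses.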
6 Two-Sorted Semantics3
Quine’s argument was disastrous for the philosophical approach to the modal
notions at that time. Formalizing it shows that all of its assumptions were universally
accepted Fregean views on reference, shared by Carnap and everybody else
writing on the modalities at that time. However, the argument is too catastrophic.
An analysis of the argument showed that it leads to a similar collapse of all non-
extensional notions, not only the logical modalities but also causality, probability,
knowledge and belief and legal and ethical notions.
2 Quine used the famous phrase in his presidential address to the Eastern Division of the American
Philosophical Association in December 1957, printed in the Association’s Proceedings and
Addresses for 1958, reprinted in Ontological Relativity and Other Essays 1969 [12, p. 23]. The idea
is clearly there much earlier. Thus in “On what there is” (1948), he writes: “But what sense can be
found in talking of entities which cannot meaningfully be said to be identical with themselves and
distinct from one another?” [10, p. 4] He used it again in many later writings, for example, in [12,
p. 23], [13, p. 102] and [15, pp. 40, 75].
3 This section and the next are a summary of some sections in my dissertation [3].
at this point; not only are there the actual changes, there are also possible ones,
there are accidents and there are necessities. Or, at least, so we say.
3. Fallibility
Finally: There is our fallibility. We may have false beliefs about objects. We
may seek to correct these beliefs, but all the while our beliefs, true or false, are
of the objects in question. A belief, or set of beliefs, is not about whichever
object happens best to satisfy our beliefs. A semantics that merely sought to
maximize our set of true beliefs would poorly reflect the role that objects play in
epistemology.
9 Quine on Reification
Starting in Word and Object [11] Quine discussed some features of the role of
objects in our lives. In the 1990s this became one of his main concerns. Here is
an example, from 1995 [16, p. 350]4:
As Donald Campbell puts it, reification of bodies is innate in man and the other higher
animals. I agree, subject to a qualifying adjective: perceptual reification (1983). I reserve
“full reification” and “full reference” for the sophisticated stage where the identity of a
body from one time to another can be queried and affirmed or conjectured or denied
independently of exact resemblance. [Distinct bodies may look alike, and an identical object
may change its aspect.] Such [discriminations and] identifications depend on our elaborate
theory of space, time and of unobserved trajectories of bodies between observations.
Quine had already drawn this distinction between perceptual reification and full
reification in 1990, although without this terminology. Here is a passage from a lecture
Quine gave that year ([14, p. 21], [15, pp. 35–40]):
I wonder whether a dog ever gets beyond this stage. He recognizes and distinguishes
recurrent people, but this is a qualitative matter of scent. Our sophisticated concept of
recurrent objects, qualitatively indistinguishable but nevertheless distinct, involves our
elaborate schematism of intersecting trajectories in three-dimensional space, out of sight,
trajectories traversed with the elapse of time. These concepts of space and time, or the
associated linguistic devices, are further requisites on the way to substantial cognition.
If I am right in what I have been saying so far, names are normally introduced for
the following three purposes:
(i) When we are interested in further features of the object beyond those that were
mentioned in the description that was used to draw our attention to the object.
4 The italics are Quine’s.
of the object. If we are not moved by any of these three reasons we will normally
not introduce names or pronouns.
Here, the first of the descriptions may be by far the most frequently used. On the
Fregean view it would therefore be likely to be replaced by a name. However, I
doubt that any of you have ever introduced a name instead of this description which
you use so often. The reason is that you are not interested in any other feature of
this number than that it happens to be the present value of your bank account. If you
were particularly interested in other features of that number, perhaps because you
are a number theoretician, you would switch to using a numeral.
You may, however, occasionally have used the second description, or others like it,
in order to pick out an object or a person. If you want to say something more
about that person, regardless of whether the person keeps the glasses on or not, then
you introduce the person’s name, if you know it, or you may use a pronoun, like
“he” or “she.”
In my view, the notion of a genuine singular term is not fundamentally a modal
notion; it is not a notion that requires appeal to necessity or essentialism for its
definition or clarification. That genuine singular terms refer to the same object “in
all possible worlds” to use modal jargon is not definitory of such terms. It merely
follows from the fact that these terms are referring expressions. Preservation of
reference is basic to our use of language even outside of modal logic, for example
in connection with our talk of an object as changing over time.
I have argued that we use genuine singular terms to signal our intention to track
some particular object through changes, etc. This view is normally taken to mean
that genuine singular terms are guaranteed to keep their reference. However, the
argument for this view in my dissertation is a conditional argument: it says that
if quantification is to make sense, then one has to keep on referring to the same
objects “in all possible worlds.” There is no guarantee that one succeeds in this, and
the argument says that if one gets confused about the references of one’s terms, so
that they sometimes relate to one object, sometimes to another, then quantification
ceases to make sense.
If genuine singular terms were guaranteed to keep their reference, then appeal
to a causal tie which connects the term with its object would seem an attractive
solution. The tie would hold regardless of confusions and mistakes. Gareth Evans
was, I believe, the first to propose such a causal theory of reference [2, pp. 187–208].
The same guaranteed preservation of reference is characteristic of many versions
of the transmission theory, which was proposed by Peter Geach and Saul Kripke.
Geach is very clear on this. He wrote in 1969 [6, pp. 287–300, pp. 153–165]5:
I do indeed think that for the use of a word as a proper name there must in the first instance
be someone acquainted with the object named. But language is an institution, a tradition;
and the use of a given name for a given object, like other features of language, can be handed
down from one generation to another; the acquaintance required for the use of a proper
name may be mediate, not immediate. Plato knew Socrates, and Aristotle knew Plato, and
Theophrastus knew Aristotle, and so on in apostolic succession down to our own times; that
is why we can legitimately use “Socrates” as a name the way we do. It is not our knowledge
of the chain that validates our use, but the existence of such a chain, just as according to
Catholic doctrine a man is a true bishop if there is in fact a chain of consecrations going
back to the apostles, not if we know that there is.
However, I regard these accounts as unsatisfactory, for several reasons that I will not
discuss here, one of them being their inability to account for reference change.
Let me end by sketching my own view on the tie between genuine singular terms and
their objects. Adherents to causal theories of reference and transmission theories of
reference of the kind described by Geach tend to regard the tie between singular
terms and their objects as a matter of ontology. According to them, we must
distinguish between the ontological issue of what a name as a matter of fact refers
to and the epistemological issue of how we find out what it refers to. I look upon the
ontological and the epistemological issue as much more closely intertwined. This is
largely because language is a social institution. What our names refer to—and not
only how we find out what they refer to—depends upon evidence that is publicly
available in situations where people learn and use language.
Rigidity is not something that is achieved through the introduction of a genuine
singular term in our language. Sameness of reference is never guaranteed. There is
always a risk that in spite of the best of our efforts, we get mixed up, and if the
mix-up is persistent and community-wide, the term may even come to change its
reference. A name does not always continue to refer to the object on which it was
bestowed at the original “baptism.”
Rigidity, or genuineness, as I see it, is not incompatible with such a reference
shift. Instead, I look upon rigidity as an ideal, something like a Kantian regulative
idea, which prescribes the way we use language to speak about the world. When
we use a name or another genuine singular term, we signal to our listener that we
want to refer to a particular object and will do our best to keep on referring to that
same object as long as we use the term. There is in our use of names and other
genuine singular terms a normative pull towards always doing our best to keep track
of the reference and keep on referring to it. Sometimes we go wrong, and it is unclear
both what we believe and what our beliefs are about until a new usage has been
established.

5 My italics.
All our talk about change, about causation, ethics, and knowledge and belief,
as well as about the other modalities, presupposes that we can keep our singular
terms referring to the same objects. To the extent that we fail, these notions become
incoherent.
To conclude this brief discussion of reference: I hold that there are genuine
singular terms, or rigid designators, in our language and that they are indispensable
for our talk about change, causality, modality, etc. However, my view differs from
other current views on reference mainly in that I do not regard preservation of
reference as automatically achieved through our use of singular terms, but as
something we try to achieve. This is what I mean by “normative pull” and also
by what I call a “regulative idea.” This view on reference could therefore also be
called a normative view on reference.
References
1. Davidson, D.: Truth and meaning. Synthese 17, 304–323 (1967). Reprinted in Davidson’s
Inquiries into Truth and Interpretation, pp. 17–36. Clarendon Press, Oxford (1984)
2. Evans, G.: The causal theory of names. Aristotelian Society Supplementary Volume xlvii, 187–
208 (1973)
3. Føllesdal, D.: Referential Opacity and Modal Logic. Dissertation, Harvard University (1961)
(Oslo University Press, 1966, also Routledge, 2004)
4. Frege, G.: Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosophische Kritik
100, 25–50 (1892). Translated as ‘On Sense and Reference’ by M. Black in Translations from
the Philosophical Writings of Gottlob Frege, 3rd edn., P. Geach and M. Black (eds. and trans.).
Blackwell, Oxford (1980)
5. Frege, G.: Der Gedanke. Beiträge zur Philosophie des Deutschen Idealismus I, Heft 2,
58–77 (1918). Translated as ‘The Thought; A Logical Inquiry’ by A.M. and Marcelle Quinton
in Mind 65, 289–311 (1956)
6. Geach, P.: The perils of Pauline. Rev. Metaphys. 23, 287–300 (1969). Reprinted in Geach,
Logic Matters, pp. 153–165. Blackwell, Oxford (1972)
7. Kripke, S.: Identity and necessity. In: Munitz, M.K. (ed.) Identity and Individuation. New York
University Press, New York (1971)
8. Kripke, S.: Naming and necessity. In: Davidson, D., Harman, G. (eds.) Semantics of Natural
Language. Reidel, Boston (1972) (also Harvard University Press, Cambridge, 1980)
9. Künne, W.: Die Philosophische Logik Gottlob Freges: Ein Kommentar. Klostermann Rote
Reihe, Frankfurt am Main (2010)
10. Quine, W.V.: On what there is. Rev. Metaphys. 2, 21–38 (1948). [Reprinted in: From a Logical
Point of View, pp. 1–19. Harvard University Press, Cambridge (1953, 1961); Harper Torchbook,
Harper & Row, New York (1963)]
Pragmatism and the Evolution of Semantic Systems

Stefan Artmann
Abstract This paper discusses two classical pragmatists and their theories about
the relation of semantics and pragmatics: Peirce’s evolutionary pragmatism and its
naturalistic reconstruction by Dewey. Two questions are addressed: how do these
philosophers understand the origin and development of meaning and how might
their theories contribute to an explanation of the evolution of semantic systems?
Pragmatism could play a very important role in our understanding of the dynamics
of meaning if it integrated theories and models of structural science. This would
lead to a new version of pragmatist thought, ‘structural pragmatism.’
S. Artmann ()
Frege Centre for Structural Sciences, Friedrich Schiller University Jena, Jena, Germany
e-mail: stefan.artmann@uni-jena.de

1 Introduction
Peirce published the pragmatic maxim in his 1878 paper, How to Make Our Ideas
Clear [1]. The maxim tells us how we, as rational agents, ought to elucidate the
meaning of our conceptions: “Consider what effects, which might conceivably have
practical bearings, we conceive the object of our conception to have. Then, our
conception of these effects is the whole of our conception of the object” [1, p. 132].
To follow the maxim we must distinguish, in our imagination, a conception from its
object. We then imagine what effects that object can cause and focus on those
effects which, as far as we can imagine, have practical consequences in the sense
that they might influence the actions we could perform. After we have developed
as precise and complete an idea as possible of those consequences, we know as
clearly as possible what our initially vague conception of the object ought to mean:
namely, the totality of potential effects of the object on our possible actions.
For Peirce, the pragmatic maxim is a general tool for the semantic analysis
of conceptions. He suggests using the maxim, not only to make clear ideas
that obviously are practically relevant, but also to elucidate the most abstract
Peirce argues for his conception of reality and his hope that we shall know the
real, by trying to give us an absolutely reliable guarantee that we are not hoping in
vain. Peirce’s reasoning takes on the form of an evolutionary metaphysics. Why? Or,
as a pragmatist would ask, what is the practical relevance of the object of Peirce’s
conception of evolution? Let us start, for example, by vaguely imagining a process
in which neither the same thing is repeated again and again nor do ever-new events
succeed each other in a completely random fashion.1 An evolutionary process connects
regularity and randomness so that we can observe continuous transformations of
existent process components. Taking part in such an evolution, we may suppose that
we perform our actions in an environment that is stable over certain stretches of time
so that, e.g., even catastrophic events do not lead to a sudden change in fundamental
causal laws. We may suppose, too, that our actions can intentionally change our
environment with rather predictable results and that we usually have time enough
to adapt to unforeseeable environmental changes. In short, we can develop habits in
such an ambience and can change it to adjust it to our habits.2
If the cognitive and behavioral evolution of the community of rational agents
is part of the evolution of what is real, then those rational agents are justified in
conceiving, first, that they can learn continually from their unsuccessful actions and
adapt themselves to reality, and second, that they can adapt, to a certain degree,
what is real to their conceptions. For Peirce, the conditional clause of the preceding
sentence must express a truth. Otherwise, his pragmatic maxim would lead to
philosophical confusion: it could then not be applied to its components (here, the
conception of reality) in order to make our conception of the whole maxim clear.
Peirce’s attempt to prove the reality of learning and adaptation can be recon-
structed succinctly as an evolutionary narrative in seven chapters.
1. From random to iconic events. To get started ex nihilo, an evolutionary process
depends on events that are random in an absolute sense: they are completely
unforeseeable and thus uncontrollable (cf. Peirce’s speculative concept of the
First and its cosmogonic interpretation as the “womb of indeterminacy” in [3,
p. 273 ff.]). Such events can be conceived of only after they have occurred.
However, if similar events happen, any one of those events represents all
the events that are related to each other by their similarity. Such events are no
longer absolutely random, since some of their features are known before they
happen. Using Peirce's semiotic terms, we can say that such events are ‘iconic’
1 Peirce criticizes evolutionary theories that, as he sees them, overemphasize either the role of
necessity (e.g., August Weismann’s) or the role of chance (e.g., Charles Darwin’s), in his 1893
paper, Evolutionary Love (see [2, p. 358 ff.]).
2 For Peirce, the main evolutionary principle is thus neither necessity nor chance but habit (see [2,
p. 360 ff.]).
Pragmatism and the Evolution of Semantic Systems 17
ones: “I call a sign which stands for something merely because it resembles it,
an icon” [4, p. 226].
2. From iconic to indexical events. It may happen that, when an iconic event of
similarity class i1 occurs, an iconic event of similarity class i2 sometimes occurs
shortly thereafter. Both events are then spatio-temporally connected to each other
in a particular way. In Peirce’s semiotic terminology, such events are indexical
representations of each other (see [4, p. 226] for the definition of the indexical
sign, which “[...] signifies its object solely by virtue of being really connected
with it.”). The more often an i1-event occurs under circumstances that are
themselves similar to each other, the more precisely the probability that an
i2-event will occur shortly thereafter becomes fixed. Thus the following
generalization becomes possible: “[...] experiences whose conditions are the
same will have the same general characters” [5, p. 169].
3. From indexical to symbolic events. Why does the probability that an i2-event
occurs, given an i1-event under particular circumstances of similarity class c,
get fixed more and more precisely? After very small regularities between
indexical events have appeared, there emerges a new type of event that is rule-
based or, in Peirce’s semiotic terms, ‘symbolic’ (see [4, p. 225 f.] on tokens or
symbols). The rules are of the following form: If an i2-event is more probable
after an i1-event has happened under c-circumstances than after an event of
any other similarity class, and if an i2-event is desired, then the given situation
shall be transformed into c-circumstances in a way that makes an i1-event most
probable. Knowledge of such rules is an epistemic precondition of becoming a
rational agent who follows those rules to effectuate events in a controlled fashion.
Rational agents conceive of their actions as symbolic representations of indexical
relations of events, and they develop the habit of performing particular actions,
given specific situations, in order to bring about particular indexical relations (see
[3, p. 257 ff. and p. 262 ff.] on synthetic consciousness and habit, respectively).
In this way, rational agents proactively adapt the environment in which they act
to their desires, so that they control it better and better. In doing so, they
reorganize their ambience so that the probability that an i2-event occurs, given
an i1-event under c-circumstances, gets fixed more and more precisely.
4. Evolution as symbolization. Peirce calls the process in which events become
symbolic “[...] that process of evolution whereby the existence comes more
and more to embody those generals which were [...] said to be destined, which is
what we strive to express in calling them reasonable” [6, p. 343]. The definition
of evolution given above, as a process in which regularity and randomness are
so related that we may expect to be acting in a stable world, can now be
extended. Symbolization means, for Peirce, the establishment of regularities by
rule-following actions. As more and more events become symbolic, the general
relation of the regular and the random is itself evolving. The rules that are
effective in an evolutionary process themselves result from evolution. According
to Peirce, this holds not only for the rules we follow in our actions. The physical
laws of nature, too, have been developing
18 S. Artmann
in cosmic evolution and can change further (see [3, p. 279]). Since Peirce
does not allow for any regularity that is not explainable evolutionarily, he must
understand the beginning of an evolutionary process as a ‘state’ of unbounded
irregularity—more precisely, as an absolutely random event (see [3, p. 278]).
5. Evolution as meta-symbolization. Yet why did the absolutely random origin of
all that really exists start an evolutionary trend towards symbolization, i.e., ever-
growing regulation? This trend must be seen as the realization of a rule that, in
a sense, necessarily applies itself to itself and amplifies its own validity as soon
as it randomly starts to become effective at some time:
[...] in the beginning,—infinitely remote,—there was a chaos of unpersonalised feeling,
which being without connection or regularity would properly be without existence. This
feeling, sporting here and there in pure arbitrariness, would have started the germ of
a generalizing tendency. Its other sportings would be evanescent, but this would have
a growing virtue. Thus, the tendency to habit would be started; and from this with the
other principles of evolution all the regularities of the universe would be evolved. At
any time, however, an element of pure chance survives and will remain until the world
becomes an absolutely perfect, rational, and symmetrical system, in which mind is at
last crystallized in the infinitely distant future. [7, p. 297]
Thus, the self-applying meta-rule Become symbolic! asserts itself over absolute
randomness. The evolutionary trend towards symbolization will itself become
symbolized at some point. Of course, Peirce sees himself as the philosopher
who achieves this feat: he considers his philosophy the generally valid conception
of evolving reality, to which scientific research will contribute the details (see [3,
p. 247] and [7, p. 297]).
6. God as foundation of evolution. We can apply the pragmatic maxim to itself
in order to test its practical consequences; this is how we started our exploration
of Peirce’s metaphysics. But in the case of the fundamental principle of evolution,
doubts might arise: how can it be self-applying and ensure its own effectiveness?
Peirce’s answer specifies a being to which we are justified in ascribing such a
self-founding power: God (see, e.g., concerning God as foundation of our evolving
knowledge of reality: “[...] God’s omniscience, humanly conceived, consists in
the fact that knowledge in its development leaves no question unanswered” [8,
p. 236]). The fundamental principle of evolution is, in a sense, the pragmatic
maxim writ large that the creator of the evolving universe is following: God’s
conceptions are evolutionarily realized in creation by rational agents who give
God’s conceptions real meanings and make God’s conceptions clear by partaking
in creation.
7. In God we trust ourselves as rational beings. How do we, as rational
agents, know of the existence of God in whose creation we are cocreatively
acting? Peirce appeals to us to bring ourselves into a state of ‘musement,’
i.e., of non-intentional, meditative observation and thought (see [9, p. 436 ff.]).
Peirce prophesies that in such a state we conceive of the necessarily vague
hypothesis of the reality of a creative infinite being we cannot fully understand:
God. Due to that hypothesis, we develop an ardent love of reality as the totality of
what has been created in a way that allows us to realize our rational conceptions
Peirce’s evolutionary speculation about the reason why the pragmatic maxim is valid
ends in an appeal to experience the universe meditatively as God’s creation. Now we
may wonder and ask: does Peirce direct his religious appeal to the same addressee
to whom he recommends his pragmatic maxim?
Peirce’s metaphysics of evolution leads to an absolute version of the pragmatic
maxim: the evolution of what is real and the possibility of its being known are
based upon a creator who explicates, in his creation, the practical consequences
of the objects of his conceptions. Those consequences are nothing but evolving
reality and become known as real by rational agents. In this sense, Peirce’s idea
that the community of rational agents progresses towards true conceptions, so
that the real will have been symbolized at the end of their research, is essentially
an eschatological one: rational agents will finally know what has been, as vague
conceptions, in the mind of God at creation by conceiving, as cocreators, what is
real as clearly as possible and controlling their actions accordingly.
The original addressee of Peirce’s maxim is a rational agent who wants to
have clear conceptions. Peirce does not search for the foundation of his maxim by
applying it to itself within the scope of its procedure of making conceptions clear,
i.e., with respect to possible actions of rational agents. Instead, Peirce’s self-application of
the pragmatic maxim proceeds by speculations that elevate conceptions needed for
the explication of the maxim into an absolute form. Peirce is not focused anymore
on the relation of our everyday and philosophical conceptions to the world in which
we act. He is not focused anymore on the not completely controllable environment
by which we are brought to think about the potential effects of the objects of our
conceptions on our possible actions. And he is not focused anymore on the causal
relations, which we explore experimentally in stretches of time we can control to
some degree, between the objects of our conceptions and our actions.
Peirce lets himself be swept away by his absolutist leanings and addresses his
evolutionary metaphysics to an observer who is musing about the cosmos as a
whole. Such a meditative, Cartesian program of philosophy is one that Peirce, as
he is obliged to do as a pragmatist, vehemently criticizes in earlier texts as a project
of philosophical research that is doomed to failure from the beginning (see, e.g.,
[1, p. 125 ff.], on belief in a Cartesian and in a pragmatistic sense). But in realizing
his own pragmatist program, by speculating about the metaphysical foundation of
his pragmatic maxim, Peirce adapts to a style of philosophizing that would be quite
satisfying to a philosopher in the Cartesian tradition. To its original addressee the
pragmatic maxim hands over a semantic tool that helps her to make her conceptions
clear and her actions rational. For that, it is not necessary
that the maxim gives her a guarantee that conceptions made clear, and actions made
rational, by the maxim are clear and rational sub specie aeternitatis because they
contribute to making real, and known, the creative intentions of God.
of the indexical relation between the object and its effects on our actions. If, for
example, an action did not reach its intended aim, we change—then and now, at
least—our conceptions of the objects implicated in the action because we want to
understand what went wrong. Peirce’s pragmatic maxim is, thus, a postulate that
we should make our symbolic representations of the world clear by exploring the
indexical relations of its objects to their effects on our actions.
Dewey defines his own version of pragmatism, which he calls ‘instrumentalism’,
in a way that shows, on the one hand, how close his thought is to Peirce’s, and
indicates, on the other, where he deviates from Peirce:
Instrumentalism is an attempt to establish a precise logical theory of concepts, of judgments
and inferences in their various forms, by considering primarily how thought functions
in the experimental determinations of future consequences. That is to say, it attempts to
establish universally recognized distinctions and rules of logic by deriving them from the
reconstructive or mediative function ascribed to reason. [11, p. 9]
and culture as it has already been for the exploration of nature since Galileo and
Newton (see [13, p. 18]). Pragmatist reconstruction, “the method of observation,
theory as hypothesis, and experimental test” [13, p. 10], is the semantic instrument
by which Dewey grapples with any problem of meaning, knowledge, and reason for
the purpose of letting reason understand itself as the means of experimentation.
Peirce’s evolutionary metaphysics has the function of proving that the key insight
of pragmatism expressed in the pragmatic maxim, concerning the intrinsic relation
of meaning and action, is true. His speculations about the evolving cosmos,
however, lose sight of the primary addressee of pragmatism: the rational agent who
tries to make her conceptions clear in a world she cannot control completely by
her own finite means. Dewey, in contrast, recommends evolutionary thinking (more
precisely, Darwinian thinking) to the rational agent who has followed philosophers
such as Peirce, so that she can rediscover the specific circumstances under which she
tries to make her conceptions clear. For Dewey, Darwin’s main scientific as well
as philosophical merit is to have transferred the experimental method of modern
science to research on organisms, including human behavior and consciousness.
Darwin achieved this feat by destroying “the dogma of fixed unchangeable types
and species, of arrangement in classes of higher and lower, of subordination of the
transitory individual to the universal or kind” [13, p. 76]. The Darwinian theory
of the origin of species by natural selection explains how general forms, which
are called ‘species,’ evolve through the competitive and cooperative interactions of
individual organisms. Without falling back into simply registering a myriad of single
facts, Darwin introduced, into biology, a genetic logic of the context-dependent
formation of the general from the individual (see [14, p. 41]).
Stated as abstractly as possible, Darwin’s insight is that “[t]here is [...] such
a thing as relative—that is relational—universality” [13, p. 13]. Philosophy, Dewey
challenges his fellow pragmatists, has to develop as comprehensive an understand-
ing of that insight as is possible for human reason. Philosophy would thus become
the science of mechanisms through which generalizations are developed under
particular circumstances and would use those mechanisms in order to explain why
people interpret their life as they do and to give them the means through which they
can meliorate their situation according to their own preferences (see [14, p. 43 f.]).
From Peirce, Dewey adopts the general name for generative mechanisms where
phenomena of meaning, knowledge, and reason are concerned: he calls them
‘habits.’ As to humans, Dewey defines ‘habit’ as “[...] activity which is influenced
by prior activity and in that sense acquired; which contains within itself a certain
ordering or systematization of minor elements of action; which is projective,
dynamic in quality, ready for overt manifestation; and which is operative in some
subdued subordinate form even when not obviously dominating activity” [12,
p. 31]. Habits are mechanisms that produce behavioral patterns which transcend
a particular context but are context-sensitive in the sense that they have been
developed in the life history of a human being and are thus actualized only if a
context is experienced, by that individual, as meeting certain conditions.
For Dewey, evolution of life consists in the transformation of habits leading to
more and more flexible and powerful behavioral patterns and to environments that
are controlled better and better (see [13, p. 82]). This includes also human reason,
which is not only a product of evolution but also a powerful motor of evolution, since
it is the most advanced means by which existent circumstances can be transformed
and controlled (see [15, p. 123]). All aspects of human reason—including its most
potent habit, the method of modern science (see [16, p. 99])—are evolutionarily
explainable phenomena. It does not come as a surprise that Dewey also wants to
give an evolutionary account of the invention of language, which he considers the
most important event in the history of humanity (see [16, p. 121]). Language is of
utmost relevance also to an ontogenetic understanding of individual rational agents
since linguistic structures constitute the fundamental categories of human cognition
(see [13, p. 86 f.]). Reason, or, to use the designation preferred by Dewey,
intelligence, is nothing but the habit of regarding an object as representing another
one, i.e., as a sign or symbol (see [16, p. 170]).
In one of his chief works, Experience and Nature, Dewey tells a story of how
symbols originate through the interaction of humans. Signs neither make up
the material traces of spiritual entities, such as Platonic ideas, nor result
from the expression of mental states, such as pain or joy. Things become symbols
when they are marked, by interacting humans, as having particular functions in
the performance of social behavior (see [10, p. 138]). Before they can symbolize
something, objects must first be made meaningful by human gestures and sounds
that have become more than just physical events (see [10, p. 138 f. and p. 142]). How
does this come about? Here is Dewey’s story:
A requests B to bring him something, to which A points, say a flower. There is an original
mechanism by which B may react to A’s movement in pointing. But natively such a reaction
is to the movement, not to the pointing, not to the object pointed out. But B learns that the
movement is a pointing; he responds to it not in itself, but as an index of something else. His
response is transferred from A’s direct movement to the object to which A points. Thus he
does not merely execute the natural acts of looking or grasping which the movement might
instigate on its own account. The motion of A attracts his gaze to the thing pointed to; then,
instead of just transferring his response from A’s movement to the native reaction he might
make to the thing as stimulus, he responds in a way which is a function of A’s relationship,
actual and potential, to the thing. The characteristic thing about B’s understanding of A’s
movement and sounds is that he responds to the thing from the standpoint of A. He perceives
the thing as it may function in A’s experience, instead of just ego-centrically. Similarly, A
in making the request conceives the thing not only in its direct relationship to himself,
but as a thing capable of being grasped and handled by B. He sees the thing as it may
function in B’s experience. Such is the essence and import of communication, signs and
meaning. Something is literally made common in at least two different centres of behavior.
In the terminology of Peirce’s semiotics, we can say that, for Dewey, the origin
of signs is in the symbolization, by a human being (B), of indexical relations
(pointing) between human actions (advancement of A’s arm towards object) and
physical objects (flower). ‘Symbolization’ means that B learns to consider the object
to which A has an indexical relation as being in that relation. Then the object is seen,
by B, as an object that is a means for A to involve B into A’s action so that one of A’s
ends may come into being with B’s help. Thus, the object, as a communicative tool,
has become meaningful to A and B (see [10, p. 146 f.]). Being symbolized,
the indexical relation of A to the object has become more than just a physical
incident: first, A and B have developed a common understanding of the object as
tool and, second, both A and B can symbolically use the same form of indexical
relation in other contexts in order to communicate the same type of meaning to each
other. Meanings are generalizing methods of action, or habits, which philosophers,
treating them as so-called ‘ideas,’ tend to cut off from “their naturalistic origin in
communication or communal interaction” [10, p. 149].
To characterize symbolization in terms of means and ends, Dewey uses the
concepts of the instrumental and the consummatory. An object used as a symbol
stands generically for something that does not exist yet: it is thus instrumental to
future consequences of its being used as a symbol. However, the very same object
still is a physical entity that can become, e.g., the object of esthetic perception by
which its sensual properties are consummatorily experienced in themselves (see [16,
p. 188]). The same duality of the consummatory and the instrumental applies to
social action and thus to communication, which is both a means to an end and an
end in itself: “For there is no mode of action as fulfilling and as rewarding as is
concerted consensus of action. It brings with it the sense of sharing and merging
in a whole” [10, p. 145]. Communication, by giving everyone in a community the
chance to participate in the shared understanding and common development of
habits that are socially valued, is even the highest end in itself (see [10, p. 157
and p. 159]). Dewey’s emphatic characterization of the consummatory character
of communication as a deeply meaningful “communion” [10, p. 159] lets him
write that “[i]n communication, such conjunction and contact as is characteristic of
animals become endearments capable of infinite idealization; they become symbols
of the very culmination of nature” [10, p. 157]. From here it is just one small
step to a religiously sounding praise of communication: “[...] when the emotional
force, the mystic force one might say, of communication, of the miracle of shared
life and shared experience is spontaneously felt, the hardness and crudeness of
contemporary life will be bathed in the light that never was on land or sea” [13,
p. 164]. Such an emphatic promise of salvation indicates a principal problem that a
pragmatist ought to have with Dewey’s evolutionary theory as naturalistic support
of instrumentalism.
Dewey defines his pragmatist category of God as denoting “the unity of all ideal
ends arousing us to desire and actions” [17, p. 42]. Made clear by applying the
pragmatic maxim, his concept of God refers to the feeling that a rational agent has
when letting her actions be guided by her imagination of the unity of nature. “It is
this active relation between ideal and actual to which I would give the name ‘God’ ”
[17, p. 51]. For Dewey, we literally are evolving God if we reconstruct nature in
order to perfect it. We then feel the community of rational agents to be the driving
force of evolution. Dewey’s naturalistic holism thus becomes indistinguishable from
what may be called ‘community pantheism.’
Dewey applies his evolutionary naturalism to nature as a whole and thereby weakens
his strategy of keeping even his most abstract arguments tightly connected to
situations in which agents try to solve problems of life rationally by their finite
means. In analogy to my pragmatist criticism of Peirce’s evolutionary metaphysics,
I propose that a pragmatist should give a negative answer to the question whether
Dewey directs his pantheistic appeal to the same addressee to whom he recommends
his method of pragmatist reconstruction. Dewey’s original addressee, a finitely
rational agent, wants to understand how she can reconstruct the environment she
lives in, according to preferences that should meet criteria of rationality. Yet the
addressee of Dewey’s holistic version of naturalism is an agent who yearns for
an unshakable feeling that unfailingly motivates her to act rationally. To such an
agent, Dewey offers his advice to interpret social communication as a secular
ersatz for Holy Communion. If she accepts Dewey’s offer, the agent regards herself
as a tool of evolving nature. From rational reconstruction, via social fusion, to
cosmic religion: this cannot be the way of a philosophical program that is meant to help
individual rational agents to make their conceptions clear and their actions rational,
by developing a pragmatist understanding of meaning, knowledge, and reason.
The genuine starting point of any pragmatist inquiry into meaning is the situation
where a rational agent is trying to solve problems, by her finite means, in an
environment she cannot control completely. The reason why Peirce and Dewey
went wrong on their way towards an evolutionary theory of meaning is that both
philosophers tried to find some absolute point of reference to be reached from their
starting point. In doing so, they paradoxically used evolutionary reasoning, which,
one would rather suspect, leads towards a relativistic stance on all philosophical
questions. Peirce wanted to prove that our conceptions will lead, in the long run,
to our knowing the real since we are cocreating the rational world by experimenting
with our conceptions. The evolutionary process as realization of truth is, as a whole,
Peirce’s absolute point of reference. Dewey showed a moderated, yet analogous,
tendency: he wanted to provide his followers with an irresistible motivation to
act rationally and did so by appealing to the socio-cosmic emotion of being an
instrumentality of evolutionary nature as a whole. Yet the finitely rational agent,
whom the pragmatist philosopher tries to help solve her theoretical and practical
problems, originally asks neither for an unshakable foundation of her rational
conceptions nor for an overwhelming motive for acting rationally.
However, Peirce’s pragmatic maxim expresses the basic method of any pragma-
tist inquiry into semantics. And Dewey’s pragmatic reconstruction is the principal
strategy for any pragmatist who sees continuity between scientific and philosoph-
ical approaches to rationality. Even beyond methodology, Peirce’s and Dewey’s
evolutionary thinking is full of important contributions to an understanding of
semantic systems. Peirce’s semiotics, for example, provides a systematic framework
for the description of sign processing. And Dewey’s insight that an evolutionary
understanding of meaning is in need of a general logic of context-dependence is
still highly pertinent.
Yet how can we make the best possible use of those pragmatist contributions to
the evolutionary theory of semantic systems? By interpreting them as abstract, yet
not formalized, descriptions of basic structures that determine any relation,
theoretical or practical, of an agent to its environment. By ‘structure,’ we mean
general relational forms that are not restricted to a particular empirical field of
research but may occur in many diverse fields. A scientific
research program that analyzes a wide variety of empirical phenomena by means of
a theory of some structure is called ‘structural science’ (see [19–21]). For example,
Peirce’s structural concept of sign as general relation of representation may apply
not only to human languages but also to animal communication systems.
From a Deweyan perspective, each structural science addresses a particular
aspect of habit, which is the most general and comprehensive relation of agents and
environments, and develops formal models of that aspect with the structure of habit
serving as a vanishing point. When Dewey sketches, for example, the sensorimotor
coordination of an organism in its environment, we are reminded of the description
of a cybernetic feedback loop avant la lettre (see Dewey’s 1896 paper, The Reflex
Arc Concept in Psychology [22]). His remarks on context-dependence should be
connected with Bernd-Olaf Küppers’ project of developing a structural science
of boundary conditions, called ‘peranetics’ (see [23, p. 314]). Dewey’s general
picture of human rationality and Herbert A. Simon’s structural science of bounded
rationality deeply agree on the necessity of searching for “[...] relative invariants:
regularities that hold over considerable stretches of time and ranges of systems”
[24, p. 36]. More examples of research programs of structural science that address
particular aspects of Dewey’s general concept of habit can easily be found.
Analyzing the situation where a rational agent is trying to solve problems, by
her finite means, in an environment she cannot control completely, a pragmatist
philosophy may make heavy use of the methods, theories, and results of the structural
sciences. If, in doing so, that philosophy also tries to make the concepts of the
structural sciences clear, then I propose to call such a philosophy ‘structural pragmatism.’
One of the most important differences between structural pragmatism and Dewey’s
instrumentalism is that the former, while interpreting structural science in the light
of the relation of agents to their environment, does not interpret that relation, or
any of its aspects, in an absolutist manner. Though the interaction of agents and
their environments is structured by what Dewey calls ‘habits,’ the cosmos, as the
totality of environments past, present, and future, should not be seen as one evolving
total habit, if we want to remain pragmatist philosophers and do not want to turn into
theologians who cross, a little too easily, the boundary between the actions of
finite beings and the ‘doings’ of an infinite being.
References
1. Peirce, Ch.S.: How to make our ideas clear. In: Houser, N., Kloesel, C. (eds.) The Essen-
tial Peirce. Selected Philosophical Writings, vol. 1 (1867–1893), pp. 124–141. Indiana
University Press, Bloomington (1992)
2. Peirce, Ch.S.: Evolutionary love. In: Houser, N., Kloesel, C. (eds.) The Essential Peirce.
Selected Philosophical Writings, vol. 1 (1867–1893), pp. 352–371. Indiana University Press,
Bloomington (1992)
3. Peirce, Ch.S.: A guess at the riddle. In: Houser, N., Kloesel, C. (eds.) The Essential Peirce.
Selected Philosophical Writings, vol. 1 (1867–1893), pp. 245–279. Indiana University Press,
Bloomington (1992)
4. Peirce, Ch.S.: On the algebra of logic: a contribution to the philosophy of notation [Excerpts].
In: Houser, N., Kloesel, C. (eds.) The Essential Peirce. Selected Philosophical Writings, vol. 1
(1867–1893), pp. 225–228. Indiana University Press, Bloomington (1992)
5. Peirce, Ch.S.: The probability of induction. In: Houser, N., Kloesel, C. (eds.) The Essential
Peirce. Selected Philosophical Writings, vol. 1 (1867–1893), pp. 155–169. Indiana University
Press, Bloomington (1992)
6. Peirce, Ch.S.: What pragmatism is. In: Peirce Edition Project (ed.) The Essential Peirce.
Selected Philosophical Writings, vol. 2 (1893–1913), pp. 331–345. Indiana University Press,
Bloomington (1998)
7. Peirce, Ch.S.: The architecture of theories. In: Houser, N., Kloesel, C. (eds.) The Essential
Peirce. Selected Philosophical Writings, vol. 1 (1867–1893), pp. 285–297. Indiana University
Press, Bloomington (1992)
8. Peirce, Ch.S.: An American Plato: review of Royce’s ‘Religious aspect of philosophy’. In:
Houser, N., Kloesel, C. (eds.) The Essential Peirce. Selected Philosophical Writings, vol. 1
(1867–1893), pp. 229–241. Indiana University Press, Bloomington (1992)
9. Peirce, Ch.S.: A neglected argument for the reality of God. In: Peirce Edition Project (ed.) The
Essential Peirce. Selected Philosophical Writings, vol. 2 (1893–1913), pp. 434–450. Indiana
University Press, Bloomington (1998)
10. Dewey, J.: The Later Works, 1925–1953, vol. 1: 1925. Experience and Nature. Southern Illinois
University Press, Carbondale (1981)
11. Dewey, J.: The development of American pragmatism. In: Hickman, L.A., Alexander, Th.M.
(eds.) The Essential Dewey, vol. 1: Pragmatism, Education, Democracy, pp. 3–13. Indiana
University Press, Bloomington (1998)
12. Dewey, J.: The Middle Works, 1899–1924, vol. 14: 1922. Human Nature and Conduct.
Southern Illinois University Press, Carbondale (1983)
13. Dewey, J.: Reconstruction in Philosophy. Enlarged edition. Mentor Books, New York (1950)
14. Dewey, J.: The influence of Darwinism on philosophy. In: Hickman, L.A., Alexander, Th.M.
(eds.) The Essential Dewey, vol. 1: Pragmatism, Education, Democracy, pp. 39–45. Indiana
University Press, Bloomington (1998)
15. Dewey, J.: Does reality possess practical character? In: Hickman, L.A., Alexander, Th.M.
(eds.) The Essential Dewey, vol. 1: Pragmatism, Education, Democracy, pp. 124–133. Indiana
University Press, Bloomington (1998)
16. Dewey, J.: The Later Works, 1925–1953, vol. 4: 1929. The Quest for Certainty. Southern
Illinois University Press, Carbondale (1984)
17. Dewey, J.: A Common Faith. Yale University Press, New Haven (1934)
18. Dewey, J.: Evolution and ethics. In: Hickman, L.A., Alexander, Th.M. (eds.) The Essential
Dewey, vol. 2: Ethics, Logic, Psychology, pp. 225–235. Indiana University Press, Bloomington
(1998)
19. von Weizsäcker, C.-F.: The Unity of Nature. Farrar Straus Giroux, New York (1980)
20. Küppers, B.-O.: Die Strukturwissenschaften als Bindeglied zwischen Natur- und
Geisteswissenschaften. In: Küppers, B.-O. (ed.) Die Einheit der Wirklichkeit. Zum
Wissenschaftsverständnis der Gegenwart, pp. 89–105. Fink, München (2000)
21. Artmann, S.: Historische Epistemologie der Strukturwissenschaften. Fink, München (2010)
22. Dewey, J.: The reflex arc concept in psychology. In: Hickman, L.A., Alexander, Th.M. (eds.)
The Essential Dewey, vol. 2: Ethics, Logic, Psychology, pp. 3–10. Indiana University Press,
Bloomington (1998)
23. Küppers, B.-O.: Nur Wissen kann Wissen beherrschen. Macht und Verantwortung der
Wissenschaft. Fackelträger, Köln (2008)
24. Simon, H.A.: Cognitive science: the newest science of the artificial. Cognit. Sci. 4, 33–46
(1980)
System Surfaces: There Is Never Just Only One
Structure
Klaus Kornwachs
I know an all-commanding word; a deaf man hears it miles away. It works on
busily, on and on, with uncomprehended magic; it is nowhere and is everywhere,
now burdensome, now convenient. It fits each and every case: the word, it is
called "system."1
Franz Grillparzer (1791–1872)
1 Franz Grillparzer (1791–1872), Austrian poet; cf. [32, GG VI 316, p. 463].
K. Kornwachs
Brandenburgische Technische Universität Cottbus, Cottbus, Germany
e-mail: Klaus@Kornwachs.de
A few examples may illustrate this point of view. In Attic philosophy, a system
was for Aristotle a whole entity; a tragedy or the state served him as
examples.6 Here we have two meanings: the noun σύστημα expresses an active
putting together, a combination of different and equal things, whereas the
Greek verb συνιστάναι describes the action of unifying, ordering, putting
together. In the later Stoic philosophy, "system" was an overall concept for
the things used in the practical sciences.
With his Monadologie, G. W. Leibniz intended to give an elementary theory of
everything, of every object and every process [78]. Originally conceptualized
to answer the question of substances in ontology, together with a Theodizee
(how a good God can allow or tolerate the evil in the world), all monads form a
2 The following section has been adapted and modified from a former paper; cf. [68].
3 Here we do not use the term "ontology" as it is used in the realm of computer science, e.g., in
artificial intelligence and database technology (universe of discourse). We refer to the original
term as it is used in philosophy: ontology is the teaching about being (Sein), existing things
(Seiendes), and the modalities thereof.
4 Cf. for instance Heraclitus: Über die Natur, Fragments 1 and 2 [15, p. 77] or Fragments 10–13
[15, p. 131 f.]. For medieval thinkers take as an example Henry of Ghent [29, p. 564 ff.]. Cf. also
Hübner's article about order (Ordnung) [50, pp. 1254–1279].
5 Like the Aristotelian book (1965) with the title De generatione et corruptione [4].
6 Cf. [3, Poetics, Chap. 18, 1456a11] or [5, Nicomachean Ethics, Book 9, 1168b32].
whole of being, made stable by a god who designed the best of all possible
worlds. This pre-established harmony was an order that assigned the
characteristic of an existing system to the world. Moreover, the pre-designed
system structures could be explained as the laws of the whole system. Each
monad differs from every other, so that the monads represent a kind of
subsystem, and Leibniz introduced a certain hierarchy between the different
sorts of monads. Nevertheless, no monad could substitute for another or be
removed from the best of all possible worlds.
The reason this example is elaborated here is that, in such a conception of
system, the really existing world and its objects have the character of a
system because they have been designed or created as systems. Systems are thus
real things; they can be distinguished from each other, and all systems form a
giant supersystem that rules everything. The system characteristic corresponds
to the design that presents the best thinkable order. Here, order and existence
coincide: order is not a perspective or the result of an evaluation, but a
genuine property of existing things and of the world. This concept may serve
as a paradigm for what we call classical ontology.
R. Descartes changed the number of substances, reducing it to two: res extensa
and res cogitans [21]. Speaking about systems, he referred to a complete and
ordered set of knowledge. To know something about an object means to know its
shape (Gestalt), its structure, the border between object and environment, its
extension, its existence, and its individuality.7 Descartes tried to establish
philosophy as an autonomous edifice of knowledge. Thus, in his "Principles of
Philosophy" [21], he aimed to certify, to justify, and to found philosophy as a
system in itself. For Descartes, any systematization means to proceed in an
ordering manner; a system is the result of such a process. In the first
instance, the laws of the system are the laws of the system description. But
this description can be successful only if there are coincidences between the
laws, the methods of description, and the properties of the object described.
Because an object has the property of being a system, it can be described as a
system. Therefore a description that uses the mathematical character of nature
can be successful, because it applies the principles of mathematics.
It is not the things that are systems, but our concepts of them. Prepared by
Descartes, this is a dramatic change, because it de-ontologizes the system
concept and shifts the concept of order from ontology to description. Later,
the system of Pure Reason was conceptualized by Kant [53] as an epistemological
prerequisite that precedes all possible human experience, like geometry, the
Aristotelian syllogism, or even Newtonian mechanics (cf. also [52]). The system
character of this precondition for the possibility of empirical experience lies
in the order which we believe to find through experiences. Subsequently we
believe we find this order in nature, and in the nature of things as well
(Fig. 1).
G. W. Hegel proposed a system of historic development ruled by dialectical laws
corresponding to his System of Philosophy [38, pp. 11–151]. Exactly here we have
7 Cf. a comprehensive summary in [112, p. 1463].
Fig. 1 Corresponding coincidence of system description, method, and the characteristic properties
of the described object, according to Descartes
8 For a more extensive presentation of the history of the system concept cf. [35, 102, 104].
9 Heidegger [40] pointed out that a system encompasses all that can be deduced. It induces
structures and changes all knowledge of science, it saves it (the knowledge; the author) and brings
it to communication; it may be conceptualized as the totality of grounding relations, in which the
matters of science (as objects, Germ.: "Gegenstände"; the author) are represented with respect to
their basis, and where they can be understood conceptually [40, p. 56] (transl. by the author). This
has also been explicitly pronounced by Heidegger in his famous SPIEGEL-Gespräch (1976) with
Augstein [42]: What will come after philosophy? Cybernetics. But this will not be philosophy
anymore. (Transl. by the author.)
10 Like cybernetics or control theory [109], automata theory [88, 91], theory of syntax [20],
game theory [90], theory of communication [57, 58, 103], and general system theory [9]. All these
theories deal with state space approaches: at least a homomorphism between syntax theory and
automata theory can be shown, discrete cybernetic systems are automata, and continuous automata
can be modeled as control circuits. Games can be modeled as automata, and the theory of
communication can be formulated either as the statistics of stochastic signal processes or even as
noisy automata (i.e., communication channels, receivers, and emitters). These theories may be
considered classical theories (see Sect. 4).
equations in analogy to the equations of classical mechanics. They show the
motion of the system within its n-dimensional state space along its
trajectories. Using all that, control theory has developed the concept of the
behavior of a system: it is represented by the dynamics, which depend on state
transition laws and boundary conditions. If laws and boundary conditions are
known, the properties of observability, stability, controllability, and
reachability can be investigated [93]. In this concept, a structure is
represented by the connections between the elements or subsystems. In classical
mechanics these connections are given by forces; in system theory this is
generalized to influences or interactions. The decisive dependence between
behavior and structure is given by the postulate that the overall behavior of a
system can in principle be calculated,11 if the structure and the basic
behavior of the elements are known.
Formally written, the system can be represented by a graph, and the structure
can be expressed as in the following example (Fig. 2).12
According to this, a system can be defined by a quintuple

S = {α, σ, δ, λ, T},   (1)

where α comprises the sets of variables,

α = {α_x, α_y, α_z, t}.   (2)
11 In this rigidity the statements hold exactly only for linear-type behavior of the elementary
behaviors.
12 The structure shown in Fig. 2 serves as an example.
(= the dynamics of the system) δ and the output function λ represent the
overall behavior of the system, written as

Φ = {δ, λ}.   (4)
Iterations are possible down to the desired level [86]. Nevertheless, some
presuppositions have to be made for this quite simple framework: the structure
should be constant, i.e.,

σ = {σ_ij} = const,   (6)
as well as

α = {α_x, α_y, α_z, t},   (7)

which must be stable during the observation or working time of the system. As
a consequence the dynamics

Φ = {δ, λ}   (8)
do not change their characteristics. If we simplify the overall behavior,
taking the output function λ as identical with the mapping

λ: α_z → α_y,   (9)
then we can write the local behavior λ_i of the subsystem (or element) indexed
with i as

y_i = λ_i(z_i, x_i)   (10)

with

x_i(t) ∈ {α_x},  z_i(t) ∈ {α_z},  y_i(t) ∈ {α_y},  z_i(t) = y_i(t).   (11)
If the subsystems are connected according to a given structure

σ = {σ_ij},   (12)

more than one x_j influences the i-th subsystem. Writing the local dynamics
λ_i as linear operators, we can calculate the overall behavior using

K = {κ_ij},   (13)

with δ_ij as the Kronecker symbol, by solving the set of linear equations

Φ X = Y − Φ K Y.   (14)
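The postulate that the overall behavior follows from the structure and the local behaviors can be sketched numerically. The following is a minimal illustration, not the author's own formalism: the scalar gains in `lam`, the coupling matrix `sigma`, and the input `x` are all invented for the example.

```python
import numpy as np

# Local behaviors lambda_i of three subsystems, taken as scalar gains
# (the simplest case of linear operators).
lam = np.diag([0.5, 1.0, 2.0])

# Structure sigma = {sigma_ij}: sigma[i, j] = 1 if the output of
# subsystem j feeds subsystem i, else 0 (an invented coupling).
sigma = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])

# External inputs x_i.
x = np.array([1.0, 0.0, 0.0])

# Each subsystem sees its external input plus the outputs routed to it:
#   y = lam @ (x + sigma @ y)  =>  (I - lam @ sigma) @ y = lam @ x
y = np.linalg.solve(np.eye(3) - lam @ sigma, lam @ x)

print(y)  # overall behavior, computed from structure + local behaviors
```

If the structure matrix or the local gains change, the same solve yields the new overall behavior; this is exactly the sense in which the behavior "can be calculated in principle" once structure and elementary behaviors are known.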
After this more formal side of the system concept, the question may arise how,
and by whom, the borderline between system and environment (= nonsystem) is
drawn. If we take an object, we constitute a kind of surface by observation.
Usually we can separate objects from each other; in most cases we can divide
objects into several parts as sub-objects, i.e., we can make a cut between
object and environment by an action we could call preparation. In the
Philosophy of Technology this is well known as the transition from
presence-at-hand (as an objet trouvé) to readiness-to-hand (the tool).13 This
constitutes an inside and an outside. The same can be done conceptually: we may
describe a system from an outside perspective, as done for example by
behaviorism with its concept of a black box, or from a rather inside
perspective, using analytical methods to find out structures and elements. Both
views can be justified very well; an extended discussion can be found in terms
of the exo- and endo-view [6].
Using Descartes' notion of extension, one could generalize the concept of the
system surface to four physical dimensions. A four-dimensional extension
defines the topological spatial limitation of the extension as well as the time
limitation of the existence of an object, i.e., its very beginning and its very
end. There are border areas of this concept: the atoms of Democritus are not
divisible and therefore have no beginning and no end in time.14 Fractal objects
like the Menger sponge or Cantor dust are another example: potentially they
have an infinite surface. Looking at very large systems in cosmology, or even
at large technical systems, the concept of subsystems and elements is only
meaningful if they can be "reached," in terms of interaction by the whole
system, in an appropriate time.
For dynamic black box systems, the set of independent (i.e., influencing)
variables and dependent (influenced) variables constitutes the surface of a
system, together with the constraints (initial and boundary conditions). The
latter are defined, or put into practice, by the preparation as a precondition
of an experiment.
This raises the question of how we can observe obviously or seemingly existing
surfaces, i.e., how borders between the inner and the outer side of objects and
"systems" (as objects, processes, or ordered sets of them) can be described as
systems (Fig. 4 and Table 1).
To solve this problem, we may make a very rough distinction between different
types of how a surface could be constituted [68].
System surfaces recognized by observation behave like simple objects. There is
a clear-cut borderline between the box (as a synonym for an object) and its
environment. The simple square in Fig. 5 may symbolize this isolation from a
surrounding environment. The observation can be verified by stating that a
selected property (the value of a variable, like color, etc.) changes twice if
one passes across the borderline, symbolized by the arrow.
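This criterion can be illustrated with a toy scan; the sample values are invented, and the only structural point is that crossing one simple object changes a selected property exactly twice, once on entry and once on exit:

```python
# A 1-D "scan" of a selected property (say, color coded as an integer)
# sampled along a line that crosses the box, as along the arrow of Fig. 5.
scan = [0, 0, 0, 1, 1, 1, 1, 0, 0]  # invented sample values

# Count how often the property value changes along the path.
changes = sum(1 for a, b in zip(scan, scan[1:]) if a != b)

# Exactly two changes: the line enters and leaves one simple object.
print(changes)  # -> 2
```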
13 Cf. for this distinction [41, Sect. 15, p. 69 ff.]: between a present object, found by man, i.e.,
Vorhandenes = presence-at-hand (as an objet trouvé), and a tool as Zuhandenes = readiness-to-hand.
For the translation of these concepts by J. Macquarrie and E. Robinson cf. [41, p. 98 f.].
14 Cf. Democritus in [15], pp. 396–405 (fragments 68 A) and p. 418 (fragment 68 A 71). Notation
according to [22], vol. 2, p. 41 ff. and p. 95 ff., respectively.
Fig. 4 Outside (left) and inside (right) aspect of an information transfer through a system surface
Table 2 The four left-hand columns: surfaces of different system types. According to the different
surfaces, the transfer of material, information, and energy flow may be observed. The two
right-hand columns: by the interest of the author of the system and by changing constraints, the
definition of the initial surface of the system in question is extended

| System type | Inner surface | Outer surface | Transfer by | Extension of the given surface by | Changed perspective given by |
|---|---|---|---|---|---|
| EDP, computer | Layer behind I/O elements, compiler | Screen, display, keyboard, sensors | Signals | Ubiquitous computing^a | Developer of a technical "system" |
| Biology | Organ surface, membranes | Epidermis, skin | Metabolism, diffusion, osmotic process | Transition from individual organism to ecosystem^b | Changing boundary conditions, implementation of inner binding interactions |
| Physics | ψ-function | Observable | Measurement | Transition from particles to an ensemble | Free choice by model building |
| Technology (car) | Motor surface | Instrument panel | Sensors, actors (force transmission) | Device ⇒ organizational closure | Functional interest, e.g., externalizing a given function |
| Organization | Procedure | Legality | Decisions | Super-organization | Doubts in competence and legitimacy |
| Networks, nets | Connectivity | Devices | Information | Communication strategy, connection policies | Networks in competition |
| General systems | State | Signals → signs → information | δ-, λ-function | Embedded system ⇒ embedding system | Model building and control interest |

a Cf. for ubiquitous computing [69]
b Another extension could be considered within this context: the transition from an individual
entity to a swarm; cf. [12]
In a further step we would like to contrast two epistemological points of view:
realism and descriptionism.
In realism, systems exist as such, and therefore there are also objectively
given laws of such systems. Natural or artificial systems like electronic
devices, physical objects, and technical devices all obey natural, i.e.,
physical laws. Even within the realm of society and organization, systems like
societies, communities, or institutions exist in a plain ontological manner
[83].
In the context of descriptionism, systems do not exist as outer objects
(Gegenstand); rather, systems are descriptions of selected facts, properties,
and processes. There are laws according to which the facts, properties, and
processes can be understood, and there are laws of the system description,
dependent on the chosen language of description, like mathematics, grammar,
logic, control theory, or cybernetics. Thus, natural or artificial systems
like electronic devices, physical objects, and technological devices are
described as a connected and wired net of electronic elements, as physical
objects, or as techno-organizational systems. They are modeled, for example,
with the help of the equations in Chap. 1.2, or by a state space approach with
coupled equations, or by event-oriented simulation models. Even societies,
institutions, or organizations are modeled in analogy to the known laws of
biology, physics, or control theory.
It is clear that descriptionism makes no statement about reality and ontology.
The modeling approaches use knowledge about the field in which the Gegenstand
is to be modeled as a system. To use physical knowledge to find appropriate
state space equations, or to use cybernetic knowledge to find the appropriate
control equation, is not yet a statement that the Gegenstand obeys physical
laws, but a modeling strategy using obvious or presumed analogies. In other
words: whoever writes a model of a Gegenstand is the author of the system.
Each system has an author [59].
Let us unfold the position of descriptionism in more detail. A good test is
the question: what is an open system?
If we had, according to Kant, a pre-theory, we could observe such systems:
openness and systemicity (the property of being able to be a system) would be
the necessary forms of Anschauung (i.e., preceding all experience) in order to
be able to have system experiences.
A potential answer could come from thermodynamics: if one is able to describe
the constituent parts of a system in terms of non-equilibrium thermodynamics
(NEThD), then it can be calculated as a NEThD system. Some characteristics can
then be explained, like the difference between inside and outside entropy.
Other traits, like reduplication or growing and shrinking, cannot be explained
in terms of NEThD.
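The entropy bookkeeping behind this answer can be sketched in a few lines. The numbers are purely illustrative; the only structural point is the NEThD balance dS/dt = d_eS/dt + d_iS/dt, with internal production d_iS/dt ≥ 0:

```python
# Entropy balance of an open system: dS/dt = dSe/dt + dSi/dt,
# where the internal production dSi/dt is non-negative (second law),
# while the exchange term dSe/dt may have either sign.
dt, S = 0.1, 10.0          # time step and initial entropy (arbitrary units)
for _ in range(1000):
    dSi = 0.3              # internal entropy production rate, >= 0
    dSe = -0.3             # entropy exported to the environment
    S += (dSi + dSe) * dt  # net change vanishes: a steady state
print(S)  # stays at 10.0: export exactly balances production
```

An open system can thus hold its entropy constant only because it exports entropy; an isolated system (dSe = 0) would see S grow monotonically with dSi.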
The approach of constructivism, put forward by Heinz von Foerster and others
[25, 26], states that any "thing" we can observe is calculated by the brain,
based on sensory data and already available models or model-like imaginations.
The question of true or false is not relevant: the justification and the
validation of the model are at least given by its successful application with
respect to the respective survival function for the "user." Nothing is said
about the truth of the model, only about its successfulness.
Descriptionism tries to modify the strict position of constructivism: each
attempt to model a phenomenon or a Gegenstand is constrained by the purpose of
this modeling, such as control, explanation, forecast, simulation, or technical
production. Each modeling process draws a borderline between the parts
belonging to the system and other entities not belonging to it. Such a
borderline constitutes a surface of the system. This surface can be
conceptualized as an expression of the interests of the model builder with
respect to his Gegenstand; in this case there is no other one. Thus each
system, as a system description, has an author who pursues his interests when
dealing with his system description.
This is obvious in system theory. Within the context of the Philosophy of
Science, this authorship should be recognized as a basic trait of scientific
modeling. In other words: system theory is a certain kind of Philosophy of
Science. Any general system theory is a tool for describing, modeling, and
mathematizing interesting fields of objects.
Selected fields of objects which we describe as systems seem to behave like
separate entities (say, objects with surfaces). Normally we talk about them as
if they were existing systems with an ontological status. Nevertheless, we
should take into account that it is our choice of description types (or, say,
model types) which generates the condition under which a system description is
adequate with respect to our interest in the field of objects. One could
accept reality, i.e., circumstances independent of us, as a hypothesis; but
without a modeling scheme we cannot make any observational or descriptive
statements. According to Kant, each observation presupposes basic concepts,
stemming from a preliminary description or pre-theory. Thus system laws are
not natural laws but laws of description. Laws of description are always
formal laws.
We conclude that a system description is not only given mathematically a
priori, so to say as a mere specification of set theory.15 There is of course
an actual relation between the level of description and the level of objects,
since each modeling uses knowledge about the field of objects.
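The set-theoretic notion of a system as a relation (detailed in footnote 15, after [86]) can be made concrete in a short sketch; the sets and the summing rule here are invented purely for illustration:

```python
from itertools import product

# Objects V_i of the system: two input variables and one output variable.
V1, V2 = {0, 1}, {0, 1}      # input variables, indices in I_X
V3 = {0, 1, 2}               # output variable, index in I_Y

# A general system is a relation S on the Cartesian product V1 x V2 x V3;
# here the relation holds when the output equals the sum of the inputs.
S = {(x1, x2, y) for (x1, x2, y) in product(V1, V2, V3) if y == x1 + x2}

print(sorted(S))  # -> [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 2)]
```

Anything not in the relation simply does not belong to the system; drawing the surface amounts to choosing which variables count as V_i at all.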
If we conceive the inputs and outputs of an observable system in terms of
information, i.e., if we model the input and output variables as representing
information acting between the objects, conceptualized as systems, then we can
correlate changes within the field of objects with changes of system descriptions.16
15 The set-theoretic definition of a system is given according to [86, p. 254 ff.]. A general system
is a relation on abstract sets, S ⊆ ×{V_i : i ∈ I}, with × as the Cartesian product and I as the
index set. If I is finite, this reads S ⊆ V_1 × V_2 × … × V_n; the components V_i of the relation
are called the objects of the system. They can be divided into the input variables
X ⊆ {V_i : i ∈ I_X} and the output variables Y ⊆ {V_i : i ∈ I_Y}, such that an input/output system
can be written as S ⊆ X × Y. The definitions in Chap. 1.2 can be regarded as specifications of
this concept.
16 Let us presuppose that some modifications in the mathematical apparatus of system theory
are possible. These modifications are discussed in [87, 108], and [61] with respect to numerous
applications and disciplines.
Due to this, system theory loses its purely mathematical character and becomes
a tool and part of an empirical science. As such it is subject to a
transcendental-pragmatic procedure of foundation,17 which conceives system
design and modeling as a question of practical philosophy, as well as the use
of system descriptions in order to control the relevant field of interest.
One could argue that there is a contradiction between the two positions
developed above: on the one hand, a system is a construction using mathematical
tools; on the other hand, system theory is an expression and part of empirical
science.
As far as we know, the method of a description and its elementary units, the
forms of statements, similar to mathematics and logic, cannot be found from
empirical data. They are analytical judgments a priori according to Kant.20
This was at least the view of the nineteenth and early twentieth century: as far
17 Structurally seen, a method to found a system of rules, e.g., for moral philosophy, in a
pragmatic-transcendental way (acc. to [2]) is coextensive with a method of foundation for the
rules of the use of a system. In more detail: any pragmatic interpretation of a scientific law leads
to a rule of action only if the validity of the law, and the acceptance of the necessity of its
existence, are unanimously accepted as an a priori in the realm of a dispute procedure. If, when
designing a system model, validity and acceptance cannot be denied without contradiction, then
one has a constitutive presupposition for the design itself.
18 For this section cf. also [63, Chap. I.4.].
19 Oral communication by Walter v. Lucadou, 1974.
20 This is not very original, but it tends to be forgotten quite often. The relevant passage in
Kant's Critique of Pure Reason: Philosophical cognition is the cognition of reason by means of
conceptions; mathematical cognition is cognition by means of the construction of conceptions.
The construction of a conception is the presentation a priori of the intuition which corresponds
to the conception. For this purpose a nonempirical intuition is a requisite, which, as an intuition,
is an individual object; while, as the construction of a conception (a general representation), it
must be seen to be universally valid for all the possible intuitions which rank under that
conception. ["Die philosophische Erkenntnis ist die Vernunfterkenntnis aus Begriffen, die
mathematische aus der Konstruktion der Begriffe. Einen Begriff konstruieren heißt: die ihm
korrespondierende Anschauung a priori darstellen. Zur Konstruktion eines Begriffs wird also
eine nicht empirische Anschauung erfordert, die folglich, als Anschauung, ein einzelnes Objekt
ist, aber nichtsdestoweniger, als die Konstruktion eines Begriffs (einer allgemeinen Vorstellung),
Allgemeingültigkeit für alle möglichen Anschauungen, die unter denselben Begriff gehören, in
der Vorstellung ausdrücken muss."] (Cf. [53, B 741], translated by J. M. D. Meiklejohn; cf.
http://www.gutenberg.org/cache/epub/4280/pg4280.html.)
21 Einstein paraphrased this Kantian position in his famous lecture to the Prussian Academy of
Science on "Geometry and Experience": As far as mathematical propositions refer to reality, they
are not certain; and as far as they are certain [well proved], they do not refer to reality. Cf.
[23, p. 119 f.] (transl. by the author).
22 See textbooks on orbital calculations [11].
23 These conditions are not valid if the concept of derivability and the concept of consequence
(i.e., the concepts of provability and truth) do not coincide anymore. This is the case in
higher-level calculi, as proved by Gödel [31]. This leads to many restrictions for system
descriptions when dealing with complex systems that must be modeled by such higher-level
calculi. Cf. [63, Chap. III.2.].
24 See above; for [17, Part. Sect. 9–11], this was a pseudo-problem of philosophy.
25 Heisenberg [45, p. 92] reported a conversation with Albert Einstein in the year 1927.
According to that, Einstein said: From a heuristic point of view it may be useful to remember
what we really observe. In principle it is completely wrong to ground a theory on observable
entities only. Reality runs the other way around: it is the theory which decides what we can
observe. (Transl. by the author.) Later, Heisenberg adopted this point of view: We have to
remember that what we observe is not nature in itself, but nature exposed to our method of
questioning. Cf. [44, pp. 58, 41].
The choice of the concept corresponds to the definition of the borderline
between the inside and the outside of the system. The measurement device is
described in terms of systems. The property constituting the measurement
device is the ability to produce information ruled by the concept. This
property is invariant with respect to its physical realization; it is rather a
function.26
Since the choice of the concept constitutes instrumentally what will be
measured, i.e., by which concept system and information are ruled, this choice
also determines to what extent a system description will lead to empirical
consequences. That means that this choice determines whether it is possible to
transform the description into a physically realizable, instrumentally
mediated function. In general, the use of our language neglects this
difference and calls the measurement device, as part of the field of objects,
as well as the process of measuring, a "system."
Indeed, one is forced to make a choice on the level of description due to the
empirical consequences. The protocol statements (according to [16, p. 232 f.])
containing empirical terms, which can be formulated on the level of objects
with the help of observation, can only be semantically valid if they are
interpreted on the level of description.27 Beyond the levels of description of
the system object, the measurement device, and the measurement, statements on
the formal structures of description are possible: these are mathematical and
logical statements with a priori characteristics. A reduction from this
meta-level to the level of description or the level of objects is not
possible, for reasons discussed in the Philosophy of Science.28 Therefore
there are no a priori statements within a field of objects of a
system-theoretical description.
System theory is a construction if it is operating on a conceptual level,
i.e., if it is used for description. Thus system theory expresses a theory of
possible forms of description with respect to available concepts.
Possible forms of observation are coextensive with possible structures.29
Structure is an extensional concept, not a concept of observation.30 System
theory could therefore be a good candidate for a theory of possible
structures.31 Thus system theory remains on the level of description.
Nevertheless system theory may become
26 The concept of function has not yet been made explicit. Nevertheless it is comprehensible in
terms of a mathematical function as a mapping; cf. [60].
27 This is expressed by the so-called Ramsey theorem; cf. [16, pp. 246–254].
28 For reductionism see also [82] and [63, Chap. III.4.].
29 This is due to the fact that we are used to designing concepts. Adopting the picture of Klix
[56, pp. 536–549], it is clear that each observation-based science presupposes the possibility of
invariants. Invariant properties can be characterized by a structural difference from what one can
observe. Each object seems to be more highly structured than its environment: this is what an
observer states if he speaks about an object. What can potentially be observed through the forms
of observation are possible structures of objects. These and only these structures correspond with
the possible forms of observation. This is the reason for using the term "coextensive."
30 This is not a concept which could be interpreted as an operative measurement rule
(Messvorschrift). To each observable there is a concept, but not vice versa; cf. [16, p. 59 ff.].
31 Here structure has the meaning we ascribed to it in Sect. 1, Fig. 2.
During the twentieth century new phenomena have been discovered which forced us to leave the system-theoretical paradise of linear relations and normal state space approaches.
Thus in control theory the non-interaction-free observer is well known. Here the change of an observed behavior due to the interaction between system and observer is described. This induces a new structure of the state space approach for the description of the observed system. One may find this in textbooks of control theory: it is striking to discover there the dual state spaces usually applied in quantum mechanics.35 From a phenomenological point of view one has the same effect—the microscopic particles like electrons are so small that their physical behavior
32
The concept of preparation is discussed intensively with respect to quantum mechanical
measurement. The preparation of the constraints is an ontic presupposition for the availability
of a system and therefore for an experiment. On the level of description we have the definition of
the constraints as initial and boundary conditions, together with the dynamics of a system in order
to make predictions.
33
As proposed by Weizsäcker [107, p. 259].
34
This does not contradict the statement above according to which system theory could be a theory
of possible structures. A measurement device presupposes an elementary structure, realized in
an apparatus that is defined by the concept, which is the base of the observed variable. Thus an
observed structure must have a relation to the elementary structure of the measurement device.
This relation is contained within a theory about the field of objects in question.
35
Cf. [93, Chap. 4, particularly Chap. 4.4, pp. 233–271]. In quantum theory Hilbert spaces are applied. When controlling systems that are observed in a non-interaction-free way, dual spaces like Hilbert spaces are applied, too, e.g., for linear time-discrete systems. Even the problem of nondistinguishable states is discussed there.
System Surfaces: There Is Never Just Only One Structure 49
36
Cf. Heisenberg’s original paper [43]; with respect to this see also the canonical textbook of
Landau and Liefschitz [76, vol. 3, Sect. 16, p. 49ff.].
37
Nevertheless, an estimation of these properties with observers free from interaction is also restricted when dealing with nonlinear systems. For exact definitions see textbooks of control and mathematical system theory. For illustration, here are some verbal definitions: Controllability is given if it is possible to "move" the system through the state space from a defined start state into a finite state (or area of states) of interest in a finite time by applying an input. Stability is the property of the system to remain at, or come back to, a finite state or area of states asymptotically, invariant under a given input. Reachability: the system behavior has an area or points within the state space that can be reached by a given input from an arbitrary starting point in a finite time. Feed-forward (or feed-through): the system output can be controlled to a given degree directly by the input; to this degree the system can be considered as a channel for the input. Observability means that one can estimate the starting point of a system by measuring the input and the output dynamics. Cf. [80, 93]. A theoretically satisfying formalization of the control problem has been given in [8, 51, 97]. For its history cf. also [111].
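For linear time-invariant systems these verbal definitions become concrete: controllability, for instance, reduces to a rank test on the Kalman controllability matrix. A minimal sketch in Python (the matrices A and B are illustrative assumptions, not taken from the text):

```python
import numpy as np

def controllability_matrix(A, B):
    """Kalman controllability matrix [B, AB, A^2 B, ..., A^(n-1) B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

# Illustrative system dx/dt = A x + B u (a double integrator).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

C = controllability_matrix(A, B)
# Full rank (= n) means every state can be reached by some input u(t),
# i.e., the system can be "moved" anywhere in finite time.
print(np.linalg.matrix_rank(C) == A.shape[0])  # True: controllable
```

The same rank idea, applied to the pair (A, Cᵀ), yields the dual observability test, mirroring the duality of state spaces mentioned above.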
38
The picture of incompressibility of an algorithmic description of complex processes has been
developed independently by Kolmogorov and Chaitin [18, 19, 58].
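Kolmogorov complexity itself is uncomputable, but the incompressibility picture of footnote 38 can be illustrated with a general-purpose compressor, which bounds the description length from above. A hedged sketch (the data and the use of zlib as a stand-in are my assumptions):

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    # Upper bound on the description length: Kolmogorov complexity is
    # uncomputable, so a compressor only approximates it from above.
    return len(zlib.compress(data, 9))

regular = b"01" * 5000                # a highly regular sequence
random.seed(0)
noisy = bytes(random.getrandbits(8) for _ in range(10000))  # chance-like data

print(compressed_size(regular) < 200)   # True: the pattern compresses drastically
print(compressed_size(noisy) > 9000)    # True: noise is (nearly) incompressible
```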
knowledge of structure and elementary behavior, as long as the newly observed behavior does not contradict the theoretical description of the system in a retrodictive way.39
Nevertheless the generation of new structures from chaotic states of motion (within state space), i.e., the generation of attractors,40 has led to exaggerated pretensions. Particularly when these models have been applied to fields of interest beyond the objects of the natural sciences or purely abstract mathematical problems, there have been a lot of disappointments and misleading interpretations.
The lines of development indicated in Table 4 use the system-theoretical description in a similarly non-classical way: normally, systems in AI are large computer programs that manipulate data and symbols and operate on data sets structured by diverse relations. These operations are governed by finite class logic, using predicates. Often logical calculi are called systems, too. It has been shown that the application of purely deductive methods in this field is not very successful when trying to generate an answer to a given problem using knowledge bases.
There are two possibilities to overcome this obstacle: a so-called default logic has been developed, with which one can formalize general rules and exceptions and derive reliable, i.e., logically true statements from them (e.g., all birds can fly, except kiwis). The other way is not very pleasant: some inference machines, as parts of so-called expert systems, performing conclusions from statements contained in a knowledge base, also include a syllogism called abduction.41 This syllogism is logically wrong, but it is applied very often in everyday reasoning. It serves as a plausibility argument, and it is also used in the technical sciences to support the search for functional hypotheses and technical means (cf. [49], especially [28, p. 280 ff.]). This leads to an uncertainty if one considers the reliability, i.e., the derivability, of those consequential statements as answers to the questions posed. Therefore one can only talk about an uncertain plausibility of the results of expert system procedures.
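The abductive step described above can be sketched in a few lines: from a rule A → B and an observed fact B(e), an inference machine proposes A(e) as a mere hypothesis. The rule base below is an illustrative assumption, not taken from any actual expert system:

```python
# Abduction: given rules "A -> B" and an observed fact B(e), propose A(e).
# This inference is logically invalid (affirming the consequent), but it is
# useful as a plausibility argument, which is how expert systems employ it.

rules = [
    ("has_fever", "is_ill"),        # A -> B
    ("has_infection", "is_ill"),
    ("is_bird", "can_fly"),
]

def abduce(observation, rules):
    """Collect every antecedent A whose consequent matches the observation."""
    return [a for (a, b) in rules if b == observation]

print(abduce("is_ill", rules))   # ['has_fever', 'has_infection']
```

The result is a list of competing hypotheses, not proofs; each still requires independent confirmation, which is exactly the "uncertain plausibility" discussed above.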
The disciplines known under the name of cognitive science also use the term "system" in a way that diverges from the classical concept. Included here are robotics, which processes sensory data in order to produce higher-level patterns of behavior, as well as neurophysiologically oriented models of neuronal nets, simulated by physical models (like spin glasses). It became clear that the usual concept of programming (i.e., writing an algorithm) cannot be applied directly to neuronal nets, since the temporal development of the net states cannot be predicted while the net is in the learning phase. Sometimes the concept of emergence is used. It is possible to find models that make the appearance of attractors understandable, but time and location cannot
39
A critique of the concept of emergence is discussed in [67] and [70].
40
Attractors are points or finite areas within the state space into which the system moves and in which it remains asymptotically. Therefore systems with attractors are stable, but not necessarily controllable, since stability and reachability are only necessary but not sufficient conditions for controllability. See also textbooks on control and system theory.
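The asymptotic behavior described in footnote 40 can be illustrated with a one-dimensional map: every starting point is drawn to the same fixed-point attractor. The map x → cos(x) is an illustrative choice, not taken from the text:

```python
import math

def step(x):
    # One iteration of x -> cos(x); its fixed point x* = cos(x*) ≈ 0.739
    # attracts every real starting value.
    return math.cos(x)

finals = []
for x0 in (10.0, -3.0, 0.7):      # three very different initial states
    x = x0
    for _ in range(100):
        x = step(x)
    finals.append(x)

print(finals)  # all trajectories settle at the same attractor ≈ 0.739085
```

That the final state no longer depends on the initial state is exactly why attractors say nothing about controllability: the input history is forgotten.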
41
In contrast to the modus ponens $([\forall x\,(A(x) \rightarrow B(x)) \wedge A(e)] \rightarrow B(e))$, abduction concludes from the particular to the general: $([\forall x\,(A(x) \rightarrow B(x)) \wedge B(e)] \rightarrow A(e))$. The propositional expression $([(a \rightarrow b) \wedge b] \rightarrow a)$ is falsifiable, i.e., not always true.
be predicted. It is no wonder that the concept of emergence and the non-classical phenomenology have offered a seductive terminology for explaining badly understood phenomena42 in social or psychical systems. Therefore we would exclude here the so-called sociological system theories after Niklas Luhmann and other theories of action, since they actually use the system-theoretical vocabulary for metaphorical purposes only. This does not lead to real models which could be understood or criticized in a formal, mathematical, or system-theoretical mode (according to Tables 3 and 4, both left rows). In the scientific community of the humanities and social sciences the term "system theory" is used nearly exclusively in the sense of Luhmann's theory; this branch of development has also been included in Table 4.
A presupposition for a classical system definition is the robustness of a system—
its structure, behavior, and surface should be constant during the observation or use
(like simulation or control processes).
All these new phenomena share a common ground. The non-interaction-free observer, with its alteration of the observed behavior, forces a new state space structure for the description. In chaotic systems the dynamic equations have no analytic solutions, only recursive procedures. The behavior depends extremely sensitively upon initial conditions; nothing can be said about stability. New structures appear from unordered relations, e.g., the generation of attractors, and we cannot conclude anymore from structure and elements to the overall behavior. All these changes can be related either to the change which we observe in contrast to what we expect (forecast and protocol) or to a change of the frame of observation, i.e., we have changed our view on the system and therefore the surface of the system. The changes appearing to us within the field of objects are due to a change of perspective on the level of description, such as a change in drawing the borderline between system and environment, a change in constraints and preparation, or an altered embedding into the realm of other co-systems.
As an example, a self-changing system within the realm of the endo-view may serve: it can be described within the realm of the exo-view as a mechanical problem (see Fig. 8).
As a first approach we could describe the change of concept: a classical system has a constant structure

$\sigma = \{\sigma_{ij}\} = \mathrm{const.}$   (17)

and the dimensions of the control

$\alpha = \{\alpha_x, \alpha_y, \alpha_z, t\}$   (18)

remain constant in time. Moreover, the dynamics $\Delta = \{\delta, \lambda\}$ does not change its characteristics.
42
Badly understood phenomena are those which one has not succeeded in reducing to law-like principles within the realm of a theory.
Basic questions: formalization of knowledge, simulation of cognitive processes | understanding and designing of cognitive processes | understanding of social systems
Authors: Winston [110], Newell, Simon [92], Weizenbaum [106] | McCulloch [84], Palm [94] | Parsons [95], Luhmann [83]
i.e., the commutator has the physical dimension of action. (For more details see
also [65].)
It was discussed in quantum theory in the mid-1980s of the last century whether quantum mechanics is an axiomatically reduced system theory.43 Taking
43
I refer here to several oral communications in talks with F. H. Zucker in Cambridge, MA, in 1981 and 1982. Cf. also some hints in [113] and presumably in [114]. I have not yet found an extensive exploration of this hypothesis in the literature, but it could be an interesting research question.
the standpoint of descriptionism, the meaning of this conjecture is that the question whether the quantum world per se would be a world of chance is no longer relevant. Quantum mechanics as a system-theoretical, non-classical description of the micro-world allows a description on this level only up to probabilities. Nevertheless the probability distribution can be predicted very exactly (the spectra of possible values of observables and the probability distribution when measured frequently). We are dealing with a lack of information in principle.
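That only the probability distribution is predictable can be sketched for a two-state system: from a state vector and an observable one computes the spectrum of possible values and their Born-rule probabilities. The observable and state below are illustrative assumptions:

```python
import numpy as np

# Illustrative observable (Pauli-x) and a normalized state vector.
sigma_x = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
psi = np.array([1.0, 0.0])          # an eigenstate of sigma_z

# Spectrum of possible measurement values ...
eigvals, eigvecs = np.linalg.eigh(sigma_x)

# ... and the Born-rule probability of each outcome: |<e_i|psi>|^2.
probs = np.abs(eigvecs.conj().T @ psi) ** 2

print(eigvals)  # [-1.  1.]
print(probs)    # [0.5 0.5]: the single outcome remains undetermined
```

The individual measurement result stays unpredictable in principle; only the distribution over many repetitions is fixed by the description.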
Furthermore, there are fields of objects in which a system-theoretical description forces us to introduce non-classical, i.e., complementarily related observables. Table 5 shows some examples from diverse fields of objects.44
Examples of a classical theory are the descriptions of classical mechanics, wave optics, electromagnetism (Maxwell's equations), the general and special theories of relativity, the theory of linear circuits, and linear cybernetics like control theory without an influencing observer. A classical theory can be characterized by the properties of locality (L), determinacy (D), and predictability (P). In terms of descriptionism these are properties of the system description and not properties in the field of objects.
The principle of locality could also be named the principle of complete distinction, since one can always decide whether an object as a part of a system has a certain property or not. Furthermore, it is always decidable whether the whole system is in a defined state or not. The states can always be separated from each other, and the system is in only one state at the same time.
A typical violation of L can be found in quantum theory and in the attempt to remove it by the theory of hidden variables [10]. The non-locality in quantum theory leads to the fact that a localization of an actual state of the system in the state space is no longer possible. The theory of hidden variables tries to avoid this by arguing that the collapse of the quantum mechanical wave function into a defined, later measured state is determined completely by variables which are independent of time and space and which are not available for the system description in principle.
The principle of determinacy D assumes that each event has at least one cause (principle of sufficient causation), and each cause delivers at least one effect. A violation of this principle we call chance. It can be interpreted in three ways:
1. There is a contingent, i.e., not necessary, lack of information about all sufficient causes, as in the Brownian motion of molecules, which could be removed in principle. If one states that
$\frac{1}{2}\, m\, v^2 = \frac{3}{2}\, k\, T$   (21)
44
A detailed discussion of such complementarily related observables has been given by Röhrle [99]. He tried to show that the appearance of complementarily related entities within a system description indicates that the field of interest is not yet satisfactorily understood.
45
This has been shown by Kornwachs et al. [63, 71, 72] using a measure of complexity which is
defined by the degree of time dependence of system structure.
any prediction will become impossible. This holds as well for the case that the laws are incompletely known. In other words: chance and chaos are unpredictable by definition.
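The unpredictability of chaos can be made concrete with the logistic map, a standard (here merely illustrative) example of sensitive dependence on initial conditions: two states differing by 10⁻¹² become macroscopically distinguishable after a few dozen iterations:

```python
def logistic(x, r=4.0):
    # Logistic map in its fully chaotic regime (r = 4).
    return r * x * (1.0 - x)

a, b = 0.2, 0.2 + 1e-12     # two almost indistinguishable initial states
max_gap = 0.0
for _ in range(100):
    a, b = logistic(a), logistic(b)
    max_gap = max(max_gap, abs(a - b))

print(max_gap)  # of order 1: the 1e-12 difference has been amplified
```

Since any measurement of the initial state has finite precision, long-term prediction fails even though the law itself is deterministic and known exactly.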
After this short discussion it is possible to arrange the different degrees of violation of the properties L, P, and D and to obtain a "measure" for the degree of violation of being "classical."
The theories, or system descriptions respectively, are listed in Table 6. The comments therein refer to the different forms of "non-classicity." In cases (1) and (2), a system surface could be defined according to the determining (stochastic or not) variables and independent variables. In cases (3) and (4) the borderline between quantum objects and the observer (including the classical measurement device) can be shifted—this is named the Heisenberg cut.46 Synergetic or chaotic (i.e., nonlinear) systems (5) can be described with defined surfaces (as physical systems), like nonequilibrium thermodynamic systems (6), since the generation of new structures is due to the change of the decisive (control) parameters, which can be regarded as constraints.
In case (7) Roger Penrose has introduced the distinction between weak and strong determinism [96, p. 422]. In the case of strong determinism it is postulated that the whole history of the universe is fixed for all times. This can be expressed by the picture of Feynman graphs of elementary processes in particle physics (physical states and space-time). Even Albert Einstein was a proponent of this strong determinism; nevertheless he did not see any contradiction to the free will of man. The Many Worlds Interpretation of quantum mechanics may also be counted among the class of strongly deterministic theories. Weak determinism is characterized by the assumption that the dynamics of the world is not fixed for all times but determined by preceding and present events. That is, an actual process determines the succeeding process iteratively, but each process has some uncertainties. The determination of the succeeding process is highly sensitive with respect to this uncertainty. Thus the possibility to define a surface remains uncertain, too.
In case (8) we have the strongest violation of the LPD properties. With respect to calculability it is possible to make a cross-characterization as in Table 7.
We may briefly discuss these four fields:
Machines
The paradigm of a machine is based on just this classical system description. The usual dynamics for continuous variables (or observables) is given by the equations of motion, as known from classical Hamiltonian dynamics. It has been proved
46
Cf. [46]. For modern discussion see, e.g., [7, 13].
may serve, given by n particles with fixed distances. If the differential forms cannot be integrated, we have non-holonomic constraints.
In Newtonian mechanics we have a six-dimensional phase space, since the velocity

$\frac{dx}{dt}$   (23)

of a particle is independent of the location vector x (also expressible as a continuous state space), and the acceleration

$\frac{d^2x}{dt^2}$   (24)

depends only on a function of the phase (or state). The equation of motion is written as

$\frac{d^2x}{dt^2} = F\!\left(\frac{dx}{dt}, t, x\right).$   (25)

The function F expresses the respective forces, divided by a constant factor (in classical mechanics it is the mass m).
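Equation (25) can be integrated numerically once a concrete force law is assumed. The sketch below uses semi-implicit Euler steps and the illustrative choice F = −kx (a harmonic oscillator); neither is prescribed by the text:

```python
import math

# Semi-implicit Euler integration of d^2x/dt^2 = F(dx/dt, t, x) / m
# with the illustrative force law F = -k*x (a harmonic oscillator).
def F(v, t, x, k=1.0):
    return -k * x

m, dt = 1.0, 0.001
x, v = 1.0, 0.0
steps = int(2 * math.pi / dt)          # integrate over one period
for n in range(steps):
    v += (F(v, n * dt, x) / m) * dt    # update velocity from the force
    x += v * dt                        # then position from the velocity

print(x)  # close to the initial x = 1 after a full period
```

The state (x, v) traced by this loop is exactly a point moving through the phase space described above.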
Having holonomic constraints one can reduce the dimensionality of the state space to n = 2. With non-holonomic constraints it is always locally possible to eliminate the velocity

$v = \frac{dx}{dt}.$   (26)
Partially constrained systems are called, according to Haken [36], synergetic systems, whereas the maximal possible number of constraints leads to the components of the velocity vector

$\frac{dx_i}{dt} = f_i(x_1, \ldots, x_n) \quad \text{with } i = 1, \ldots, n$   (27)

for n particles and n possible constraints. Rosen [101] has considered such maximally constrained systems as an explicit possibility to characterize a machine as a physical system.
Rosen [101] further showed that, for a maximally constrained system with a given set of parameters describing the coupling with external forces (like masses and others), written as

$\frac{dx_i}{dt} = f_i(x_1, \ldots, x_n, \beta_1, \ldots, \beta_n)$   (28)

with a parameter $\beta \in B$ from the parameter space B, for a given state $x \in X$ each $\beta \in B$ defines a single tangential vector of a tangent space $T(X)$ at the point x. Therefore such parameters $\beta$ can be conceptualized as control or input for a machine. The task of a control is to find an appropriate $x(t) \in T$ for a given $\beta(t)$. The main question is what the system does when influenced by a given input or control. Another question is to find an appropriate control $\beta(t)$ such that $x(t)$ may have some defined or aimed-at optimal properties.
The decisive point for Rosen [101] is to use the locality of the system. The equation above, written in vector presentation as

$v(t) = \frac{dx}{dt} = f(x, \beta),$   (29)

can be regarded as a local relation, i.e., it is only valid within the neighborhood of the state in which the system actually can be found. This state is such that a certain condition holds for the respective components of the parameter vector (i.e., the control $\beta$). In this case, the control vector $\beta$ may contain further parameters independent of time and the actual state. For such a parameter vector we can write an expression which is invariant with respect to time and state. This program could be interpreted
as a surface of such a device when described as a system.47
47
Nevertheless, it is not yet clear how the interaction between variables dependent upon time and
location and invariant variables can be explained in an ontologically satisfying manner. From a
phenomenological point of view one can model this within a system description if one has an idea
how this interaction could be hypothesized. Cf. [64, 101].
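Rosen's reading of β as a control selecting the local tangent vector can be sketched numerically. The right-hand side f(x, β) = −x + β below is an illustrative assumption; it merely shows how choosing β "moves" the system to a desired state:

```python
# In dx/dt = f(x, beta), the control beta picks out the tangent vector at
# each state x. With the illustrative choice f(x, beta) = -x + beta, a
# constant control steers the state toward x = beta.
def f(x, beta):
    return -x + beta

dt, x, beta = 0.01, 0.0, 3.0
for _ in range(2000):          # integrate up to t = 20
    x += f(x, beta) * dt       # follow the locally selected tangent vector

print(x)  # approximately 3.0: the input has "moved" the state there
```

Here the time-invariant value of β plays the role of the parameter "program" discussed above, while x(t) carries the time-dependent behavior.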
Ordinary chance
In this case one is forced to apply statistical descriptions and to calibrate the probability distributions against experience. In order to define probability at all, one has to limit the number of possible states, items, or events; otherwise one is not able to normalize the probability distribution to
$\sum_{i=1}^{n} p(x_i) = 1.$   (33)
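Equation (33) presupposes a finite set of states; only then can relative frequencies be normalized into a probability distribution. A minimal sketch with illustrative data:

```python
from collections import Counter

# Estimating p(x_i) from observed events over a finite set of states;
# the finiteness of the state set is what makes normalization possible.
observations = ["a", "b", "a", "c", "a", "b"]   # illustrative data
counts = Counter(observations)
total = sum(counts.values())
p = {state: n / total for state, n in counts.items()}

print(p["a"])                              # 0.5
print(abs(sum(p.values()) - 1.0) < 1e-9)   # True: normalized to 1
```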
48
It is really astonishing that processes with changing structures and patterns of behavior, like decay and growth, cannot be modeled satisfactorily with the usual system-theoretical descriptions based on elementary set theory and mappings. This would require designing sets mathematically which can generate or annihilate their own elements without a choice process from another finite set of already "stored" elements. This problem corresponds to the fact that there is no really calculable GOTO in the theory of algorithms.
49
Introducing a wave function in quantum mechanics is an actual strategy to do so. The wave function as a state is not an observable, but it obeys a linear "equation of motion," the well-known Schrödinger equation. Mathematical operations on this wave function allow a calculation of the possible values (eigenvalues) of observables.
References
29. Gent, H.: Summa Quaestionum Ordinarium, vol. 2 (reprint). Franciscan Institute Press, St. Bonaventure (1953)
30. Glansdorff, P., Prigogine, I.: Structure, Stabilité et Fluctuations. Masson, Paris. English:
Thermodynamic Theory of Structure. Stability and Fluctuations. Wiley, London (1971)
31. Gödel, K.: Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik 38, 173–198 (1931)
32. Grillparzer, F.: Epigramme. Hanser, München (1960)
33. Habermas, J.: Erkenntnis und Interesse. Suhrkamp, Frankfurt/Main (1971)
34. Habermas, J.: Theorie der Gesellschaft oder Sozialtechnologie? Einleitung. In: Habermas, J.,
Luhmann, N. (eds.) Theorie der Gesellschaft oder Sozialtechnologie, pp. 142 ff. Suhrkamp,
Frankfurt/Main (1971)
35. Hager, F.-P.: System. I. Antike. In: Ritter, J., Gründer, K. (eds.) Historisches Wörterbuch der
Philosophie, Bd. 10, pp. 824–825. Wiss. Buchgesellschaft, Darmstadt (1998)
36. Haken, H.: Synergetics. An Introduction. Springer, Berlin (1978)
37. Haken, H.: Synergetics and the problem of Selforganization. In: Roth, G., Schwegler, R. (eds.)
Selforganizing Systems. Campus Verlag, Frankfurt (1979)
38. Hegel, G.W.F.: Phänomenologie des Geistes. Vorrede (Bamberg, Würzburg, 1807), pp. 11–15.
Ullstein, Frankfurt/Main (1970)
39. Hegel, G.W.F.: Grundlinien der Philosophie des Rechts. Vorrede (Berlin, 1821). In:
Hauptwerke in sechs Bänden, Bd. 5. Meiner, Hamburg (1999)
40. Heidegger, M.: Identität und Differenz. Neske, Pfullingen (1957)
41. Heidegger, M.: Sein und Zeit, 11. Aufl. Niemeyer, Tübingen (1967). Engl.: Being and Time;
translated by J. Macquarrie and E. Robinson. Blackwell, New York (1962), Reprint 2005
42. Heidegger, M.: Nur ein Gott kann uns retten. Der Spiegel 23, 209 (1976)
43. Heisenberg, W.: Über quantentheoretische Umdeutung kinematischer und mechanischer
Beziehungen. Zeitschrift für Physik 33, 879–893 (1925)
44. Heisenberg, W.: Physik und Philosophie. Hirzel, Stuttgart (1962); Engl.: Physics and
Philosophy: The Revolution in Modern Science (1958). Lectures delivered at University of
St. Andrews, Scotland, Winter 1955–1956. Harper & Broth, New York (1958), Reprint 2007
45. Heisenberg, W.: Der Teil und das Ganze. Piper, München (1969)
46. Heisenberg, W.: Ist eine deterministische Ergänzung der Quantenmechanik möglich? (1935).
In: Von Meyenn, K. (ed.) Wolfgang Pauli: Scientific Correspondence with Bohr, Einstein,
Heisenberg, vol. II, pp. 409–418. Springer, Berlin (1985)
47. Helbing, D., Farkas, I., Vicsek, T.: Simulating dynamical features of escape panic. Nature
407, 487–490 (2000)
48. Hoare, C.A.R.: An axiomatic basis for computer programming. Comm. ACM 12(10), 576–
580, 583 (1969)
49. Hubig, C.: Mittel. Transcript, Bielfeld (2002)
50. Hübner, W.: Ordnung. II. Mittelalter. In: Ritter, J., Gründer, K. (eds.) Historisches Wörterbuch
der Philosophie, Bd. 6, Sp. 1254–1279. Wiss. Buchgesellschaft, Darmstadt (1984)
51. Kalman, R.E.: On the general theory of control systems. In: Proceedings of the First IFAC Congress, Moscow. Oldenbourg, München (1960)
52. Kant, I.: Allgemeine Naturgeschichte und Theorie des Himmels oder Versuch von der
Verfassung und dem mechanischen Ursprung des ganzen Weltgebäudes, nach Newtonschen
Grundsätzen abgehandelt (Königsberg, Leipzig, 1755). In: Gesammelte Schriften, hrsg. von
der Kgl. Preu. Akad. der Wissenschaften. Georg Reimer, Berlin (1910)
53. Kant, I.: Kritik der reinen Vernunft. Riga 1781, Bd. 37a. Felix Meiner, Hamburg (1956)
54. Kant, I.: Kritik der reinen Vernunft. Riga 1781. In: Werke (Akademieausgabe), hrsg. von Wilhelm Weischedel, Norbert Hinske. 6 Bde. Suhrkamp, Frankfurt am Main (1968)
55. Klir, G.J.: An Approach to General System Theory, p. 282. Van Nostrand, New York (1969)
56. Klix, F.: Information und Verhalten. VEB Deutscher Verlag der Wissenschaften, Berlin (1973)
57. Kolmogorov, A.N.: Foundations of the Theory of Probability. Chelsea Publishing, New York (1950)
58. Kolmogorov, A.N.: Three approaches to the quantitative definition of information. Prob.
Inform. Transm. 1, 1–7 (1965)
59. Kornwachs, K.: Technik impliziert ihre Verwendung. Bild der Wissenschaft, Heft 9, 120–121
(1980)
60. Kornwachs, K.: Function and information - towards a system-theoretical description of the
use of products. Angewandte Systemanalyse 5(2), 73–83 (1984)
61. Kornwachs, K. (Hrsg.): Offenheit - Zeitlichkeit - Komplexität. Zur Theorie der offenen
Systeme. Campus, Frankfurt (1984)
62. Kornwachs, K.: Utility and costs of information. A systematical approach to an information-
cost-estimation. Angewandte Systemanalyse 5(3–4), 112–121 (1984)
63. Kornwachs, K.: Offene Systeme und die Frage nach der Information. Habilitation Thesis,
University of Stuttgart (1987)
64. Kornwachs, K.: Complementarity and cognition. In: Carvallo, M.E. (ed.) Nature, Cognition
and Systems, pp. 95–127. Kluwer, Amsterdam (1988)
65. Kornwachs, K.: Information und der Begriff der Wirkung. In: Krönig, D., Lang, R. (Hrsg.)
Physik und Informatik – Informatik und Physik. Informatik Fachberichte, Nr. 306, S. 46–56.
Springer, Berlin (1991)
66. Kornwachs, K.: Pragmatic information and system surface. In: Kornwachs, K., Jacoby, K.
(eds.) Information – New Questions to a Multidisciplinary Concept, pp. 163–185. Akademie,
Berlin (1996)
67. Kornwachs, K.: Pragmatic information and the emergence of meaning. In: Van de Vijver, G.,
Salthe, S., Delpos, M. (eds.) Evolutionary Systems, pp. 181–196. Boston, Kluwer (1998)
68. Kornwachs, K.: System ontology and descriptionism: Bertalanffy’s view and new develop-
ments. TripleC 2(1), 47–62 (2004)
69. Kornwachs, K.: Vulnerability of converging technologies – the example of ubiquitous
computing. In: Banse, G., Hronszky, I. (eds.) Converging Technologies, pp. 55–88. Edition
Sigma, Berlin (2007)
70. Kornwachs, K.: Nichtklassische Systeme und das Problem der Emergenz. In: Breuninger, R.,
et al. (Hrsg.) Selbstorganisation. 7. Ulmer Humboldt-Kolloquium. Bausteine der Philosophie
Bd. 28, S. 181–231. Universitätsverlag, Ulm (2008)
71. Kornwachs, K., von Lucadou, W.: Komplexe Systeme. In: Kornwachs, K. (Hrsg.) Offenheit –
Zeitlichkeit – Komplexität, pp. 110–165. Campus, Frankfurt (1984)
72. Kornwachs, K., von Lucadou, W.: Open systems and complexity. In: Dalenoort, G. (ed.) The
Paradigm of Self Organization, pp. 123–145. Gordon & Breach, London (1989)
73. Kuhn, T.S.: Die Struktur wissenschaftlicher Revolution, 2nd edn. Suhrkamp, Frankfurt/Main
(1973)
74. Küppers, B.-O.: Die Strukturwissenschaften als Bindeglied zwischen Natur- und Geis-
teswissenschaften. In: Küppers, B.-O. (Hrsg.) Die Einheit der Wirklichkeit – Zum Wis-
senschaftsverständnis der Gegenwart, S. 89–106. W. Fink, München (2000)
75. Kuznetsov, Yu.A.: Elements of Applied Bifurcation Theory. Springer, New York (1995)
76. Landau, L.D., Lifschitz, E.M.: Lehrbuch der Theoretischen Physik. Bd. III: Quantenmechanik. Akademie-Verlag, Berlin (Ost) (1967)
77. Laszlo, E., Margenau, H.: The emergence of integrative concepts in contemporary science. Philos. Sci. 39(2), 252–259 (1972)
78. Leibniz, G.W.: Monadologie. In: Leibniz, G.W. (ed.) Hauptwerke. Hrsg. von G. Krüger,
pp. 131–150. Kröner Verlag, Stuttgart (1958)
79. Lenk, H.: Wissenschaftstheorie und Systemtheorie. In: Lenk, H., Ropohl, G. (eds.) Sys-
temtheorie als Wissenschaftsprogramm, S. 239–269. Athenäum-Verl., Königstein (1978)
80. Locke, M.: Grundlagen einer Theorie allgemeiner dynamischer Systeme. Akademie verlag,
Berlin (Ost) (1984)
81. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963)
82. von Lucadou, W., Kornwachs, K.: The problem of reductionism from a system theoretical
viewpoint - how to link Physics and Psychology. Zeitschrift für Allgemeine Wissenschafts-
theorie XIV/2, 338–349 (1983)
83. Luhmann, N.: Soziale Systeme. Suhrkamp, Frankfurt/Main (1984)
84. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943)
85. Meixner, J.: Zur Thermodynamik der irreversiblen Prozesse. Zeitschrift für physikalische
Chemie (B) 53, 235, 238–242 (1943)
86. Mesarovic, M.D.: A mathematical theory of general systems. In: Klir, G. (ed.) Trends in
General System Theory, pp. 251–269. Wiley-Interscience, New York (1972)
87. Morin, K., Michalski, K., Rudolph, E. (eds.): Offene Systeme II – Logik und Zeit. Klett-Cotta,
Stuttgart (1981)
88. von Neumann, J.: The general and logical theory of automata. In: Jeffress, L.A. (ed.) Cerebral Mechanisms in Behavior. Wiley, New York (1951). Dt.: Allgemeine und logische Theorie der Automaten. Übersetzt von D. Krönig. In: Kursbuch 8, März, pp. 139–192 (1967)
89. von Neumann, J.: The Computer and the Brain (8th printing). Yale University Press,
New Haven (1974)
90. von Neumann, J., Morgenstern, O.: Spieltheorie und wissenschaftliches Verhalten. Physica,
Würzburg (1961)
91. Neumann, J.V.: In: Burks, A. (ed.) Theory of Self-Reproducing Automata. Urbana, London
(1966)
92. Newell, A., Simon, H.A.: Human Problem Solving. Prentice Hall, Englewood Cliffs (1972)
93. Padulo, L., Arbib, M.A.: System Theory - A Unified State Space Approach to Continuous
and Discrete Systems. Hemisphere Publ., Washington (1974)
94. Palm, G.: Neural Assemblies. Springer, Berlin (1981)
95. Parsons, T.: The Social System. Free Press, New York (1957)
96. Penrose, R.: Computerdenken – die Debatte um Künstliche Intelligenz, Bewusstsein und die Gesetze der Physik. Spektrum, Heidelberg (1991)
97. Pontrjagin, L.S., Boltjanskij, V.G., Gamkrelidze, R.V., Miscenko, E.F.: Mathematische Theorie optimaler Prozesse. VEB Deutscher Verlag der Wissenschaften, Berlin (Ost) (1964)
98. Prigogine, I.: Vom Sein zum Werden. Piper, München (1979)
99. Röhrle, E.A.: Komplementarität und Erkenntnis. – von der Physik zur Philosophie. Lit,
Münster (2000)
100. Ropohl, G.: Systemtheorie der Technik. Hanser, München (1979); 2nd edition: Allgemeine Technologie. Hanser, München (1999); 3rd edition: Karlsruher Universitätsverlag, Karlsruhe (2009)
101. Rosen, R.: Causal structures in brains and machines. Int. J. Gen. Syst. 12, 107–116 (1986)
102. Schulz, R.: Systeme, biologische. In: Ritter, J., Gründer, K. (eds.) Historisches Wörterbuch
der Philosophie, Bd. 10, pp. 856–862. Wiss. Buchgesellschaft, Darmstadt (1998)
103. Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1949/1969). Deutsch: Mathematische Grundlagen der Informationstheorie. R. Oldenbourg, München (1976)
104. Strub, Ch.: System. II. System und Systemkritik in der Neuzeit. In: Ritter, J., Gründer, K. (eds.) Historisches Wörterbuch der Philosophie, Bd. 10, pp. 825–856. Wiss. Buchgesellschaft, Darmstadt (1998)
105. Thom, R.: Structural Stability and Morphogenesis. Benjamin, Reading (1975)
106. Weizenbaum, J.: Die Macht des Computers und die Ohnmacht der Vernunft. Suhrkamp,
Frankfurt a.M. (1977)
107. von Weizsäcker, C.F.: Die Einheit der Natur, S. 352. Hanser Verlag, München (1971)
108. von Weizsäcker, E.U.: Erstmaligkeit und Bestätigung als Komponenten der Pragmatischen
Information. In: von Weizsäcker, E.U. (Hrsg.) Offene Systeme I, S. 82–113. Klett, Stuttgart
(1974)
109. Wiener, N.: Kybernetik – Regelung und Nachrichtenübertragung in Lebewesen und Maschi-
nen. Rowohlt, Reinbeck, 1968 (1971), Econ Düsseldorf, Wien (1963), neue Auflage (1980)
110. Winston, P.H.: Artificial Intelligence, 3rd edn. Addison-Wesley, Reading (1992)
111. Wunsch, G.: Geschichte der Systemtheorie – Dynamische Systeme und Prozesse,
Oldenbourg, München, Wien (1985)
66 K. Kornwachs
112. Zahn, M.: System. In: Krings, H., Baumgartner, H.M., Wild, Ch. (eds.) Handbuch
Philosophischer Grundbegriffe, Band 5, S. 1458–1475. Kösel Verlag, München (1974)
113. Zucker, H.F.: Information, Entropie, Komplementarität und Zeit. In: von Weizsäcker, E.U.
(eds.) Offene Systeme I, S. 35–81. Klett, Stuttgart (1974)
114. Zucker, H.F.: Phenomenological Evidence and the “Ideal” of Physics. Working Paper, Max
Planck Institut zur Erforschung der Lebensbedingungen, Starnberg, Probable (1978)
Elements of a Semantic Code
Bernd-Olaf Küppers
Information is based upon signs and symbols and sequences of these. Such
sequences may be rich in content, that is, they can have a sense or a meaning.1 The
content, in turn, may become operative in various ways, because of the different
kinds of reactions that it can induce in the recipient. In general, three dimensions
of information can be distinguished. The “syntactic” dimension is understood as
the ordered arrangement of symbols and the relationships between them. The
“semantic” dimension includes the relationships between the symbols and also that
for which they stand. Finally, the “pragmatic” dimension includes the relationships
between the symbols, that for which they stand, and also the effect that they have
upon the recipient.

1 Here the two terms sense and meaning are used synonymously, and the distinction between them
that is usual in linguistic philosophy is not retained.

B.-O. Küppers
Frege Centre for Structural Sciences, Friedrich Schiller University Jena, Jena, Germany
e-mail: bernd.kueppers@uni-jena.de
However, resolving the concept of information into a syntactic, semantic and
pragmatic dimension is only justified within the framework of a systematic analysis.
Strictly speaking, the three dimensions make up an indissoluble unity, as one cannot
allude to one of them on its own without automatically invoking the other two.
Syntax only takes on a meaning when both the symbols as such and their
interrelationships have been stipulated. Semantics, in turn, presuppose a syntactic
structure as carrier of the semantics, even though semantics go further than this.
The same applies to pragmatics, since the sense and the meaning of information do
not disclose themselves unless the symbols refer to the real world. Sense and meaning
of information can only be objectified through the pragmatic effects that they have
upon the recipient and which trigger his interactions with the surrounding world.
Because of the intimate intermeshing of semantics and pragmatics, they are
frequently considered together as the so-called “semantico-pragmatic” aspect of
information. However, the pragmatic aspect is accompanied by further facets of
semantic information, expressed in its novelty, complexity, selectivity and other
features. Nevertheless, it is purely a matter of definition whether one considers these
as independent properties of semantic information or perhaps as various aspects of
the pragmatic component that endow the latter with a fine structure. In any case the
semantic aspect of information is something like a “hinge” between its syntactic and
its pragmatic level.
The syntactic and the pragmatic aspects are ultimately equivalent to the structural
and functional aspects of information. Even though function necessarily presup-
poses a structure as its carrier, this does not mean that functions can be traced
back to their structures. Human language may provide a good example of this:
here, the meaning and thus the pragmatic—i.e., functional—relevance of words and
sentences are fixed through their syntax, but nonetheless cannot be derived from it.
Rather, the semantics appear to stand in a “contingent” relationship to the syntax. In
other words, the semantics of a syntactic structure are not necessarily the way they
are. They are governed neither by chance nor by law.
Moreover, in human communication the exchange of semantic information is
always an expression of human mental activities with all their individual and
unique features. Can these be blanked out, allowing us to approach the general
characteristics of semantic information? Above all: is there a semantic code that
embodies the principles according to which meaningful information is constituted?
Questions of this kind appear to go well, and irrevocably, beyond the scope of
the exact sciences, which are based upon objectification, idealisation, abstraction
and generalisation. It is therefore understandable that the founding fathers of
communication theory—Hartley, Shannon and others—were seeking a scientific
approach to the concept of information that is independent of the meaning of
information but that provides a measure of the amount of information.
In fact, communication technology is not concerned with the actual content
of information; rather, it sees its primary task in passing the information in a
reliable way from the sender to the receiver. The path of this passage, the so-called
I_k ∼ 1/p_k,   (1)

or, in the standard binary form,

I_k = ld(1/p_k),   (2)
where we have introduced the binary (or dyadic) logarithm ld. This definition thus
also takes account of the conventional use of binary coding in communications
technology. If, in the simple case, a source of messages consists of N messages,
which all have the same expectation value for the recipient, then the recipient needs
I = ld(N)   (3)
binary decisions to select a given message. For this reason the quantity I is also
termed the “decision content” of the message in question. In fact, the decision
content had been proposed by Hartley [7] as a measure of the quantity of information
some years before Shannon formulated the general expression above. Even though
definition (2) does not consider the meaning of information, it points at least
indirectly to a semantic aspect, namely that of the novelty of an item of information,
insofar as it is correlated with its inverse expectation probability.
The prior probabilities pk are fixed by the recipient’s prior knowledge. They are,
however, objectively determined quantities, i.e., they are the same for all recipients
that share the same facilities for, and methods of, obtaining knowledge. One can
also say that the information measure developed by Shannon is “objectively subject-
related” [20]. In other words: information is not an absolute entity, but only a relative
one—a fact which is also termed the “context-dependence” of information.
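Both measures are easy to state in code. A minimal sketch in Python (the function names are illustrative, not taken from the text):

```python
import math

def decision_content(n_messages: int) -> float:
    """Hartley's decision content, Eq. (3): I = ld(N) binary decisions."""
    return math.log2(n_messages)

def information_content(p: float) -> float:
    """Shannon's information of a single message, I_k = ld(1/p_k)."""
    return math.log2(1 / p)

# Selecting one of eight equally likely messages takes three binary decisions:
assert decision_content(8) == 3.0
# With equal expectation probabilities p_k = 1/N, both measures coincide:
assert information_content(1 / 8) == decision_content(8)
```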
If the gain in information is calculated simply as the difference between two mean
values, this may lead to negative values, which would contradict our everyday picture
of a gain, which we associate with a real increase in information. However, as Rényi
[14] has shown, this deficiency can be resolved when one calculates the gain not by
considering the difference between two mean values, but rather by first determining
the gain in information for each single message and then calculating the mean value
H(Q|P) = Σ_k q_k [I_k(p_k) − I_k(q_k)] = Σ_k q_k ld(q_k/p_k),   (8)

with

H(Q|P) ≥ 0.   (9)
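Rényi's message-by-message construction of the gain, Eq. (8), and its non-negativity, Eq. (9), can be checked numerically. A minimal sketch (names are illustrative):

```python
import math

def information_gain(q, p):
    """Renyi's mean gain, Eq. (8): sum_k q_k * ld(q_k / p_k)."""
    return sum(qk * math.log2(qk / pk) for qk, pk in zip(q, p) if qk > 0)

p = [0.5, 0.25, 0.25]   # prior expectation probabilities
q = [0.7, 0.2, 0.1]     # probabilities after the message is received

assert information_gain(q, p) >= 0       # Eq. (9): the gain is never negative
assert information_gain(p, p) == 0       # unchanged expectations: zero gain
```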
Thus, within this concept, the information content is defined through the complexity
of a sequence of symbols. It provides, as it were, a measure of the “bulk” of the
information contained in a message.
Like Shannon’s definition of information, algorithmic information takes no
account of the actual content of information. Rather, the algorithmic measure
depends solely upon the complexity of an item of information, which in this case
means the aperiodicity of its syntax. From a technological point of view, those
restrictions are perfectly justified. However, in the natural sciences such a concept
of information soon reaches the limits of its usefulness. This applies above all in the
study of biological systems. This is because the results of modern biology, especially
in the area of genomic analysis, show clearly that a profound understanding of
biological phenomena requires access to the aspect of meaningful information. This
applies not least to the problem of the origin of life itself, which is equivalent to that
of the origin and evolution of information that has a meaning for the inception and
sustenance of vital functions [9].
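Algorithmic complexity itself is not computable, but compressed length is a commonly used upper-bound proxy for the aperiodicity just described. A hedged sketch using Python's standard zlib module:

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Upper-bound proxy for algorithmic complexity: length after compression."""
    return len(zlib.compress(data, 9))

periodic = b"AB" * 500            # highly ordered: a short rule generates it
random.seed(0)
aperiodic = bytes(random.getrandbits(8) for _ in range(1000))  # no short description

# The periodic sequence compresses drastically; the aperiodic one hardly at all.
assert compressed_size(periodic) < 50 < compressed_size(aperiodic)
```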
Insofar as syntax, semantics and pragmatics are intimately interrelated (albeit in
ascending order of complexity), we are confronted in a fundamental way with the
question of whether the semantico-pragmatic level of information is accessible to
an exact, scientific analysis. Here—as we have already pointed out—we encounter
major difficulties, because according to our traditional understanding of science the
level of sense and meaning evades the kind of objectification and quantification that
characterises the exact sciences.
The nature of these issues can be illustrated by the human language. As
syntax, semantics and pragmatics represent increasingly complex dimensions of
information, one’s first thought might be to try and derive the semantics of a
linguistic expression from the lower level of its syntax, i.e., from the sequence of
phonemes, letters or other symbols which constitute this expression. However, this
thought is short-lived, since there is no necessary connection between the syntax and
the semantics of a message that would allow us to move logically from the one to the
other. A garbled or incomplete text, for example, cannot in general be reconstructed
by analysing the available fragments. This can at best be done against a background
of additional information, which can serve as a semantic reference frame for the
reconstruction.
In linguistics, the relationship between syntax and semantics is referred to as
“arbitrary”, because it does not allow any necessary connection to be discerned.
According to this view, syntax is indeed a carrier of semantics, but semantics are not
caused by the syntax. Rather, the irreducibility of semantics seems to be an example
of “supervenience”, i.e., the relationship of non-causal dependence between two
properties A and B.
A plausible explanation is that the arbitrariness of the relationship
between syntax and semantics is a consequence of a convention that is determined
by the community using the language in question. Nonetheless, the use of the terms
“arbitrary” and “conventional” can be criticised, as they still imply a causation
principle, even if this is only in the form of the fixation of the semantics by the
decision of the language community. For this reason, it might be better to describe
the relationship between syntax and semantics as “contingent”, as this term is
neutral in respect of causation; it merely expresses the fact that the relationship
between syntax and semantics is not of necessity that which one observes it to be.
Syntax and semantics are thus independent dimensions of information insofar as
their interrelationship is not law-like, even though they are indissolubly connected
to one another. Furthermore, the term “semantics” is strongly associated with
numerous connotations such as sense, meaning, usefulness, functionality, value,
content and so forth. These in turn leave a relatively large scope for determining
the semantic dimension of information.
There are two possible ways of coping with this problem. One is to regard
precisely the multiplicity as a characteristic feature of semantics. In that case, one
would retain the variety of possible definitions and try to build up a theory of
semantics on the most general basis possible. The other is to attempt to narrow down
the broad field of possible definitions, in order to sharpen the concept of semantics.
In that case one would arrive at a definition that would be precise, but at the cost of
considerable one-sidedness.
In the latter respect, the greatest progress has been made by the analytical
philosophy of language. This already possesses an elaborate theory of semantics,
at the centre of which is the logical construction of human language. For example,
within this theory strict distinction is made between concepts such as “sense” and
“meaning”, which in non-scientific usage are largely synonymous.
According to Frege, the founder of logical semantics, the “meaning” of a symbol
or an expression is the object that it denotes, while the “sense” is the manner in
which it is presented. Thus, the terms “evening star” and “morning star” have the
same meaning, as they refer to the same object, the planet Venus; however, as Frege
explains, they have different senses, because Venus is presented in the expression
“morning star” differently from the way in which it is presented in the expression
“evening star”. The fact that the morning star and the evening star are the same
object requires no further justification. However, the statement that the “star” we
see close to the rising sun is the same “star” that appears in the evening sky after
sunset is not an obvious truth. Rather, it is a discovery that we owe to the Babylonian
astronomers. Semantic precision of this kind was applied by Frege not only to proper
nouns, but also to entire sentences. Ultimately, they led to profound insights into the
truth value of linguistic statements.
Even though logical depth is a unique property of human language, we may consider
the possibility that language is a general principle of natural organisation that exists
independently of human beings. In fact, there is much evidence in support of this
hypothesis. Thus, it is generally accepted that various forms of communication exist
already in the animal kingdom, as revealed by comparative behavioural research. We
also know that even plants use a sophisticated communication system, employing
certain scent molecules, to inform each other and in this way to protect themselves against the
threat of danger from pests. And it has long been known that bacteria communicate
with one another by releasing chemical signals.
However, more surprising is the fact that essential characteristics of human
language are reflected even in the structure of the genetic information-carriers. This
analogy does not just consist in a mere vague correspondence; rather, it embraces
largely identical features that are shared by all living beings.
Let us consider some facts: the carriers of genetic information, the nucleic acids,
are built up from four classes of nucleotide, which are arranged in the molecules
like the symbols of a language. Moreover, genetic information is organised hierar-
chically: each group of three nucleotides forms a code-word, which can be compared
to a word in human language. The code-words are joined up into functional units,
the genes. These correspond to sentences in human language. They in turn are linked
up into chromosomes, which are higher-order functional units, comparable to a
long text passage. As in a written language, the “genetic text” includes punctuation
marks, which label—for example—the beginning and end of a unit to be read. Last
but not least, the chemical structure of the nucleic acids even imposes a uniform
reading direction.
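The hierarchy just described (symbols, code-words, punctuation marks, a fixed reading direction) can be sketched in code. A minimal toy translator in Python; the codon table is deliberately truncated to the four standard assignments actually used in the example:

```python
# Deliberately truncated codon table; only the four standard genetic-code
# assignments used below are included. '*' marks a stop codon.
CODON_TABLE = {"ATG": "M", "GGC": "G", "TTC": "F", "TAA": "*"}

def read_gene(dna: str) -> str:
    """Read non-overlapping triplets in the fixed reading direction
    until a stop "punctuation mark" is reached."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# ATG (start), GGC, TTC, then TAA (stop) reads as the peptide "MGF":
assert read_gene("ATGGGCTTCTAA") == "MGF"
```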
In addition to the parallels described here, there are further fundamental corre-
spondences between the structure and function of genes and that of human language.
These include, especially, the vast aperiodicity of nucleotide sequences and the
context-dependence of genetic information. The context in turn is provided by
the physical and chemical environment, which confers an unambiguous sense upon
the (in itself) plurivalent genetic information [10].
Just as a printing error in a written text can distort the meaning, the replacement
of a single nucleotide in the genome can lead to collapse of the functional order and
thus to the death and decay of the organism. This shows that genetic information
also has a semantic dimension; in other words, it possesses functional significance
for the sustenance of life processes. As the dynamics of life are encoded in the genes,
it would seem only consistent to speak of the existence of a molecular language.
Naturally, the analogy between the language of genes and that of humans would
immediately break down if we were to make the entire richness of human language
the measure of our comparison with the genetic language. The language of Nature
is more to be understood as an all-pervading natural phenomenon, which has found
its most elementary form of expression in the language of the genes and its highest
in human language.
The use of terms such as “genetic information” and “the language of genes” is in
no way an illegitimate transfer of linguistic concepts to the non-linguistic realm of
molecules. On the contrary: the existence of a genetic, molecular language appears
to be an indispensable prerequisite for the construction of living systems, as their
complex functional organisation arises along the path of material instruction and
communication. In fact, the unfolding and expression of the tremendous amount of
information stored in the genome takes place stepwise in the form of an exceedingly
intricate process of communication between gene and gene product. However, any
communication requires certain rules and these can only be understood by an
appropriate reference to the model of language.
In view of the broad-ranging parallels between the structures of human language
and the language of genes, recent years have seen even the linguistic theory of
Chomsky move into the centre of molecular-genetic research [16]. This is associated
with the hope that the methods and formalisms developed by Chomsky will also
prove suited to the task of elucidating the structures of the molecular language of
genetics. How far this hope will carry, we must wait to see. However, in the most
general sense it may be expected that the application of linguistic methods to biology
will open up completely new theoretical perspectives.
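As a toy illustration of the Chomskyan view, a gene-like unit (start codon, codon triplets, stop codon) can be described by a regular expression, i.e., a grammar at the lowest level of the Chomsky hierarchy. This is purely illustrative and far simpler than any grammar used in actual molecular-genetic research:

```python
import re

# A gene-like unit as a regular language: a start codon, any number of
# codon triplets, then one of the three stop codons ("punctuation marks").
GENE = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")

assert GENE.fullmatch("ATGGGCTTCTAA")        # start, two codons, stop
assert GENE.fullmatch("GGCTTC") is None      # no start/stop punctuation
```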
Like the philosophy of language, biology uses a special concept of semantics.
This stands here, as we have already stated, for the plan-like and purpose-directed
self-construction of living matter, which finally results in its extreme functional order.
Thus, biological semantics could just as well be termed “functional semantics”.
However, functions can as little be inferred from their underlying structures as the
semantics of a linguistic expression can be determined from its syntax. Like the
semantics of linguistic expressions, the semantics of the genetic information appear
to present an irreducible property of living matter. Notwithstanding, any progress
that linguistic research achieves concerning the problem of semantics will be an
enormous help for a better understanding of the corresponding problem in biology.
This is also demonstrated by the following example. Even though the language
of the genes lacks the logical depth of human language, we can draw a surprising
parallel between the two concerning the distinction between “sense” and “meaning”.
Genetic information is passed down from parent to daughter organism primarily as
an instruction, a mere “pre-scription”, which is stored in the hereditary molecules.
This form of existence can be compared to the meaning of a linguistic expression.
However, the genetic information does not take effect until it is expressed in the
fertilised ovum, and this expression is accompanied by a permanent re-assessment
and re-evaluation of the information. In this process, the context-dependence of
the genetic information comes into effect—precisely the property that imparts the
“sense” to a word. In accordance with Frege, we can say that the terms “instruction”
and “information” have the same meaning but different senses.
If we wish to know what the language of Nature is, then we must first look for the
general structure of language. However, we cannot solve this problem by simply
making an abstraction based on human language, as this has unique features that are
due to the unique position of humans as thinking beings. In human language we can
pronounce judgements; formulate truths; and express opinions, convictions, wishes
and so forth. Moreover, human language has various manifestations at the level
of interpersonal communication: there are picture languages, gesture languages,
spoken languages and written languages. Therefore, to uncover the general structure
of language, we must develop our concept of language from the bottom up. Setting
out from the most general considerations possible, we must endeavour to develop
an abstract concept of language that reflects human language and at the same time
is free from its complex and specific features.
This constraint immediately highlights an almost trivial aspect of language, one
that is basic for any process of communication: successful communication is only
possible when the partners in communication use a common pool of symbols.
Further to that, sequences of such must also be structured according to rules and
principles which are known to both the sender and the recipient. Such syntactic
structures then form the framework for a symbol language shared by sender and
recipient.
However, the syntactic structure must fulfil yet another condition: the sequences
of symbols from which a language is built up must have an aperiodic structure.
This is because only aperiodic sequences are able to encode sufficiently complex
information. This can also be shown to be the case for human language.
Let us consider the language of symbols in more detail. Its character can be
illustrated by reference to the communicative processes that take place in living
matter. Here, the language is initially instructive in nature, that is—setting out from
genetic information—it sets up the signposts that the innumerable elementary processes
have to follow. However, the organism is anything but a piece of mechanical
clockwork that follows a rigid set of symbols. It is better viewed as an extremely
complex system of internal feedback loops, one that has continually to orientate
and adjust its internal processes in order to preserve itself and its capacity for
reproduction. To do this, it must continually take up new information from its
environment (both inside and outside the organism) and evaluate and process this
information.
The interactive procedures of expression and continual re-evaluation of infor-
mation in living matter go beyond mere instruction, insofar as this level of
signal-processing takes place with reference to the entire context of the signals. To
take account of the full breadth of this aspect of communication, let us look at human
The use of terms such as “information” and “language” in biology is often criticised
as a naturalistic fallacy. Both terms, it is claimed, fail to characterise natural
objects, being illegitimate projections from the non-scientific area of interpersonal
communication and understanding into the area of natural sciences. However, this
charge is not even justified in a scientific-historical sense. This is because the
concept of information in biology is rooted not in the idea of communication but
microstates possess the same prior probability p_i = 1/W, so that the above equation
adopts the form

S = k ln W,   (12)

where W is the number of possible microstates that make up a macrostate. This
number is also called the "thermodynamic probability"; it should however not
be confused with the mathematical probability, which always lies in the range
0 ≤ p ≤ 1.
Thus, apart from its sign, the Shannon entropy has the same mathematical
structure as the Boltzmann entropy as given by (11). Strictly speaking, informa-
tional entropy and thermodynamic entropy are two structurally equivalent partition
functions, of which one refers to the distribution of messages and the other to the
distribution of energy among the various quantum states.
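The structural equivalence is a one-line computation: for W equally probable microstates, Shannon's measure gives ld(W) bits while Boltzmann's gives k ln W, and the two differ only by the constant factor k ln 2. A minimal check in Python:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def shannon_entropy_uniform(w: int) -> float:
    """Shannon's measure for W equally probable alternatives: ld(W) bits."""
    return math.log2(w)

def boltzmann_entropy(w: int) -> float:
    """Boltzmann's S = k ln W, Eq. (12)."""
    return K_B * math.log(w)

W = 2 ** 20
# The two partition functions differ only by the constant factor k * ln 2:
assert math.isclose(boltzmann_entropy(W),
                    K_B * math.log(2) * shannon_entropy_uniform(W))
```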
In consequence of the structural aspects described above, it is therefore justifiable
to equate information with negentropy. Boltzmann had already recognised this when
he laid the statistical foundations of entropy, pointing out that the entropy function
is at the same time related to the information that we have about the system [2].
Indeed, for the case of an ideal gas this is immediately obvious: the temperature of
such a gas is directly proportional to the mean kinetic energy of the gas molecules
and represents an intensive quantity—that is, it depends not upon the number of
particles for which the mean value is calculated. It necessarily follows from this that
there must be another quantity, complementary to temperature, that is proportional
to the size or extent of the system. Were this not the case, then our knowledge of the
total thermal energy of the system would be incomplete. This extensive quantity is
the entropy.
The entropy function thus is a measure of the loss of information that arises
in statistical physics when one dispenses with exact knowledge of the microstate.
In this way, Boltzmann brought into focus—for the first time—the fundamental
dependence of scientific statements upon the means and abstractions used to
describe them.
The view of information and negentropy as being equivalent furnishes the con-
cept of information not with a naturalistic, but rather with a structural interpretation.
This leads to the collapse of the frequently expressed criticism that science wrongly
regards information as a natural entity. It now becomes clear that this criticism
rests upon a fundamental misunderstanding. In fact, the natural sciences rarely
concern themselves directly with natural entities. Of much more importance for
a causal understanding of Nature are the relationships between such entities. This
becomes especially clear with the example of physical entropy. Although it is true
that entropy is an object of scientific discourse that can be described by a physical
quantity, it is not a natural entity in the same sense that trees, stones or other material
objects are. Rather, the entropy of a system is a statistical partition function, the
objective nature of which lies exclusively in the objective nature of the distribution
of energy that it expresses.
Information is precisely not a “natural” entity in the narrow sense, but rather
a “structural” entity that is structurally equivalent to the physical entropy. Conse-
quently, information theory should be assigned neither to the natural sciences nor to
the humanities. It belongs to the rapidly increasing branch of the so-called structural
sciences. Their programme is to describe the comprehensive, abstract structures of
reality, independently of where we encounter them and of whether they characterise
living or non-living systems, natural or artificial ones. The structural sciences form
a self-contained scientific view of reality, seeking the elements that combine the
traditional natural sciences and the humanities.
In general, one obtains the laws of structural sciences by removing all the
empirical constants from empirical laws and replacing them with logical constants.
The structural laws then have the same syntactic structure as the empirical laws
from which they were derived. In science, there are numerous examples of such
structural equivalences. A very simple example from physics is the structural
parity between the law of gravity, which describes the attraction between two masses,
and Coulomb's law, which describes the force between two charges. One may also
mention the Fourier equation, which is applied in electrodynamics, hydrodynamics
and thermodynamics. Another well-known example is that of the Lotka–Volterra
equations, which are used for the description of dynamic systems not only in
biology, but just as much in chemistry and economics.
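The point that structural laws abstract from their subject matter can be made concrete with the Lotka–Volterra system: the same pair of coupled equations applies whether x and y denote prey and predators, reacting chemical species or competing firms. A minimal Euler-integration sketch, with coefficients chosen arbitrarily for illustration:

```python
def lotka_volterra_step(x, y, a, b, c, d, dt):
    """One Euler step of dx/dt = a*x - b*x*y, dy/dt = -c*y + d*x*y.
    The structural law is the same whatever x and y stand for."""
    return x + dt * (a * x - b * x * y), y + dt * (-c * y + d * x * y)

x, y = 10.0, 5.0
for _ in range(1000):
    x, y = lotka_volterra_step(x, y, a=1.0, b=0.1, c=1.5, d=0.075, dt=0.01)

# The coupled oscillation keeps both quantities positive:
assert x > 0 and y > 0
```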
Above all, the emergence of the structural sciences was due to the investigation
of complex systems. As there are no additional natural laws that apply only to
complex systems, their underlying laws must be sufficiently abstract to do justice
to the vast diversity of complex phenomena, each with its individual characteristics.
This explains the spectacular growth of the structural sciences in our day. Alongside
information theory, the middle of the last century saw the establishment of cyber-
netics, game theory, systems theory and semiotics. The classical structural sciences
have been enriched in recent decades by disciplines as important as complexity
theory, network theory, synergetics and the theory of self-organisation. At the same
time, the structural sciences form a mould for possible sciences, comparable with
mathematics, which one may regard as a prototype of a structural science. Unlike
mathematics, however, the structural sciences develop only through interaction
with the experienceable structures of reality. And, most importantly: the structural
sciences also contain the key that opens the door to the semantic dimension of
information.
This measure, in turn, has the same mathematical structure as the entropy of a
message source (4). However, in place of the weighted messages, relation (14)
contains the weighted values W_k of a chosen message I_j. At the same time, the
number k is a measure of the fine structure of the evaluation scale: the greater k is,
the sharper, i.e., the more differentiated, is the evaluation by the recipient. In the
limiting case, where the only value a recipient attaches to a message is its novelty,
(14) reduces to the information measure (2) of classical information theory.

The information value W(I_j) is a relative and subjective measure insofar as it
depends upon the evaluation criteria of the recipient. However, for all recipients
who use the same elements of the semantic code, and who for a given message I_j
assign the same weights to these elements, W(I_j) is an objective quantity.
The elements of the semantic code can be—at least in principle—quantified. For
novelty value, Shannon’s definition of information content is a good example.
For complexity, the measure offered by algorithmic information theory seems
appropriate. Let us look more closely at the latter, because it allows far-reaching
conclusions to be drawn concerning the context-dependence of information. To
investigate this, we take up the central thesis of philosophical hermeneutics, i.e., that
one can only understand something if one has already understood something else
(for this issue see [12]). This thesis we now put onto a quantitative basis. Expressed
in the language of information theory, the question would be: how much information
is needed in order to understand another piece of (meaningful) information?
An exact answer to this question seems at first glance impossible, as it still
contains the problematical concept of “understanding”. However, surprisingly, an
answer is nonetheless possible, even though one must restrict the consideration
to the minimum condition for any understanding. This minimum condition is the
Σ_{i=1}^{n−11} 2^i = 2^{n−10} − 2.   (15)

As no algorithm with K < n − 10 can generate more than one binary sequence, there
are fewer than 2^{n−10} ordered binary sequences. These make up one 2^{10}-th of all
n-membered binary sequences. This means that among all binary sequences of
length n only about every thousandth sequence is non-random and possesses a
complexity K < n − 10.
To summarise the result: in order to understand a piece of information, one
invariably needs background information that has at least the same degree of
complexity as the information that is to be understood. This is the sought-for answer
to the question of how much information is needed to understand some other piece
of information. This finding gives the phenomenon of the context-dependence of
information and language a highly precise form.
7 Concluding Remark
According to the model of the semantic code developed in this paper, the specific
content of meaningful information is constituted by the superposition and the
weighting of the general aspects of that information. It is quite conceivable
that meaningful information may be assembled according to the same pattern in
the brain. Thus, theoretical and experimental research suggests that brain cells,
stimulated by sensory impressions, join up into a synchronous oscillating ensemble,
whereby the individual information segments, scattered over different parts of the
brain, are united into a self-consistent cognitive structure. In the light of the above
considerations the synchronisation of oscillatory activity appears as the neuronal
expression of a principle that we have identified as the generating principle of
semantics. It may be that this principle will prove to be a useful model for the
so-called neural code that neurobiology is seeking and which is assumed to be the
guiding principle for the constitution of meaningful information in the brain.
References
1. Bar-Hillel, Y., Carnap, R.: Semantic information. Br. J. Philos. Sci. 4, 147–157 (1953)
2. Boltzmann, L.: Vorlesungen über Gastheorie, 2 Bde. J.A. Barth, Leipzig (1896/1898)
3. Cassirer, E.: Substanzbegriff und Funktionsbegriff. Wissenschaftliche Buchgesellschaft,
Darmstadt (1980)
4. Chaitin, G.J.: On the length of programs for computing finite binary sequences. J. Assoc.
Comput. Mach. 13, 547–569 (1966)
5. Deleuze, G.: Woran erkennt man den Strukturalismus?, p. 8. Merve, Berlin (author's translation) (1992)
6. Eigen, M.: Selforganisation of matter and the evolution of biological macromolecules.
Naturwissenschaften 58, 465–523 (1971)
7. Hartley, R.V.L.: Transmission of information. Bell Syst. Tech. J. 7, 535–563 (1928)
8. Kolmogorov, A.N.: Three approaches to the quantitative definition of information. Problemy
Peredachi Informatsii 1, 3–11 (1965)
9. Küppers, B.-O.: Information and the Origin of Life. MIT, New York (1990)
10. Küppers, B.-O.: The context-dependence of biological information. In: Kornwachs, K.,
Jacoby, K. (eds.) Information. New Questions to a Multidisciplinary Concept, pp. 135–145.
Akademie Verlag, Berlin (1995)
11. Küppers, B.-O.: Der semantische Aspekt von Information und seine evolutionsbiologische
Bedeutung. Nova Acta Leopoldina 294, 195–219 (1996)
12. Küppers, B.-O.: Information and communication in living matter. In: Davies, P.,
Gregersen, N.H. (eds.) Information and the Nature of Reality, pp. 170–184. Cambridge
University Press, Cambridge (2010)
13. MacKay, D.M.: Information, Mechanism and Meaning. MIT, New York (1969)
14. Rényi, A.: Probability Theory. Elsevier, Amsterdam (1970)
15. Schrödinger, E.: What is Life? Cambridge University Press, Cambridge (1944)
16. Searls, D.B.: The language of genes. Nature 420, 211–217 (2002)
17. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423,
623–656 (1948)
18. Solomonoff, R.J.: A formal theory of inductive inference. Inform. Control 7, 1–22, 224–254
(1964)
19. Weaver, W., Shannon, C.E.: The Mathematical Theory of Communication, p. 8. University of
Illinois Press, London (1963)
20. von Weizsäcker, C.F.: The Unity of Nature. Farrar Straus Giroux, New York (1981)
Talking About Structures
Abstract In the first part, we try to give a sufficiently precise definition of the
concept of “structure” as a term that denotes abstract entities. Some examples are
included. The second part is devoted to a short analysis of how the concept is used in
different scientific areas, reaching from mathematics and physics to the humanities
and computer science.
1 Introduction
During the second half of the past century a series of books were published
containing the term “structure” in their title. Let us only mention the following:
– Structure and Appearance (1961) by Nelson Goodman
– Strukturwandel der Öffentlichkeit (1962) by Jürgen Habermas
– The Structure of Scientific Revolutions (1962) by Thomas Kuhn
– Erfahrung und Struktur (1968) by Friedrich Kambartel
– La struttura assente (1968) by Umberto Eco
– The Logical Structure of Mathematical Physics (1971) by Joseph D. Sneed
Let us begin with a look at the word “structure” and its use in science and
philosophy. In the English language the word “structure” was first1 used in the
beginning of the seventeenth century in connection with buildings. It gradually
became used metaphorically and reached science at the end of the nineteenth
century. In 1919 the mathematician and philosopher Bertrand Russell (cf. [14],
p. 60f.) was the first to give an explicit definition of this notion, and it was based
on abstraction. In 1928 this definition was also given by Rudolf Carnap in his book
Der logische Aufbau der Welt (which was translated into English under the title
The logical structure of the world) [5]. Let us first recall Russell's definition. In his
Introduction to Mathematical Philosophy he states [14, p. 61] that he calls "structure"
a word which, important as it is, is never (so far as we know) defined in precise terms by
those who use it.
1 According to An Etymological Dictionary of the English Language by Walter W. Skeat, Oxford
1910 [16].
The diagram shown in this definition has a property nowadays called commutative
and concisely written as

Q ∘ S = S ∘ P.
The modern reader of Russell’s text will recognize in it a definition of structure by
abstraction: the first step being a definition of structural equality or—in Russell's
terms: "likeness" or "similarity"—and the second step being a restriction of the
language used when talking about structures to terms that are invariant with respect
to the equivalence relation in question. In this way structures are abstract objects
represented by “concrete” relations; or—as mathematicians will say—structures
are equivalence classes of relations with respect to the equivalence relation called
“likeness”.
All that means that in order to define a structure we need an “instance” first, that
is: a set (Russell used the term “field”), one or more relations defined on it, and a
kind of list naming all pairs (or triples) of elements having the relation in question.
Thus the problem of representing structures arises.
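Russell's "likeness" can be tested by brute force for small finite instances; a sketch in Python, where the function name `alike` is our own:

```python
from itertools import permutations

# Two finite relational instances are "alike" (isomorphic) if some bijection S
# between their fields makes the diagram commute: (a, b) is in P exactly when
# (S(a), S(b)) is in Q. Brute force over bijections is fine for tiny fields.
def alike(field1, rel1, field2, rel2):
    if len(field1) != len(field2):
        return False
    for perm in permutations(field2):
        s = dict(zip(field1, perm))          # candidate bijection S
        if {(s[a], s[b]) for (a, b) in rel1} == set(rel2):
            return True
    return False

# A single-pair relation on {S, P} is alike to "precedes" on {0, 1}:
print(alike(["S", "P"], [("S", "P")], [0, 1], [(0, 1)]))   # True
```

Structures, as equivalence classes under likeness, are exactly what such a check leaves invariant.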
In simple cases it is indeed a finite list that solves our problem. And in the
most simple case we have just one object (to have an example: a person P) and
one relation connecting this object with itself (in our example: P's knowing (k)
himself). Thus our "list" consists of a single line telling us that k(P, P). This can be
visualized as
[Diagram: a single node P with an arrow looping back to itself, labelled k.]
Forgetting now the example and talking abstractly, we get something we may call the
singleton structure, represented by the following graph:
[Diagram: a single node with a loop.]
Note the emphasis on the definite article: abstract objects are individuals. The next
case is mentioned here only to have a name for it in our later considerations: we have
two objects (example: Socrates, S, and Plato, P), with the relation “being teacher
of”, visualized as
[Diagram: S → P, arrow labelled t.]
2 With a symmetric relation "is different from", so the arrowhead is suppressed.
3 "[. . . ] the first definition, one generally employed, of the concept of structure: the presence of two
terms and the relationship between them." [7, p. 19].
90 R. Inhetveen and B. Schiemann
[Diagram: objects s1, s2 and their negations not s1, not s2.]
4 Picture taken from the "Königsberg" article in wikipedia.de.
[Diagram: the successor structure of the natural numbers: . . . • → • . . . , with nodes n and n + 1.]
At first sight there are no relations visible: they are hidden in the innocent term
“structure-type”, as a “real” example will show:
(defstruct customer
  (name "Miller")
  (first-name "John")
  (assets 1000000)   ; in dollars
  ...)
Structures in LISP
As a little more complex example we may take ordinary table salt (cf. Fig. 4),5
where a cube consisting of chlorine and sodium ions represents this partial
structure (cf. Fig. 5).
Let us emphasize again that table salt does not look as shown in these pictures;
instead we are giving a description, using spatial images, of what is in fact invisible:
the ions as well as the chemical forces connecting them.
In even more complex cases where this kind of visualization fails, mathematics
has given us the means to master this situation too: we can (sometimes) describe
such structures axiomatically. The best known example here is of course euclidean
geometry, where David Hilbert in his Grundlagen der Geometrie (1899) [9]
5 Picture taken from wikipedia.org/wiki/Bild:NaCl-Ionengitter.png
delivered a complete description of the structure behind the theory. But this was
not clearly expressed in the text—mainly because Hilbert used the pictorial terms
of Euclid’s geometry instead of “dummy” words for the objects and relations
constituting the structure at hand. Only in 1942 did Heinrich Scholz deliver a
"translation" of Hilbert's text into a purely structural one, transforming for example
Hilbert’s axiom
I.1 For two points A, B there always exists a straight line a which contains each of the two
points A, B.
Mathematics. With the last example we have entered the realm of mathematics,
which nowadays is "the" science of structures. That is to say, mathematics deals with
structures. So there must of course be a clear definition of the concept—and there
is. In full generality it can be found in the monumental work of the Bourbaki-group,
to be precise: in chapter four of the volume devoted to set theory. In the booklet
of results a less rigorous definition is given, with a note reading (see [4, p. 52, note
()]; my translation, R.I.):
The reader will note that the indications given in this paragraph remain rather vague; they
are here only for heuristic reasons and it seems nearly impossible to give the general and
precise definitions concerning structures outside the frame of formal mathematics (see Set
Theory, chap. IV).6
That is why we shall not repeat this definition here. In fact it differs from Russell's
definition not in essence but in generality. Instead of going into technical details we
therefore shall give a short sketch of what is done with structures in mathematics,
once they are defined.
First, three basic sorts of structures are distinguished: order structures (as,
e.g., the natural numbers with their “natural” order), algebraic structures (as, e.g., the
ring of the integers), and topological structures (as, e.g., the common euclidean
space with its metric). Second, operations are defined to gain new structures from
these by canonical methods: substructures, quotient structures, product structures,
and some more. These methods are common to all types of structures. Third—
and most important—multiple structures are defined, that is: sets carrying two or
6 Original text: Le lecteur notera que les indications données dans ce paragraphe restent assez
vagues; elles ne sont là qu'à titre heuristique et il ne semble guère possible d'énoncer des définitions
générales et précises concernant les structures, en dehors du cadre de la Mathématique formelle
(voir Ens., chap. IV).
more different types of structures: the most prominent example being the reals as an
ordered field with a “natural” metric.
A further central idea using multiple structures consists in the compatibility of the
two or more structures at hand. As an example of what is meant, the associative law
may serve, saying that the upper right and the upper left triangles in the following
diagram commute (Fig. 6):
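Compatibility can also be seen in a more elementary instance than the associative-law diagram: on the integers, the order and the addition are compatible because a ≤ b implies a + c ≤ b + c. A brute-force Python check, where the function name `compatible` is our own:

```python
# Compatibility of two structures on one set: an order and an operation are
# compatible here when the operation preserves the order (translation
# invariance). Checked exhaustively on a small slice of the integers.
def compatible(elements, leq, op):
    return all(leq(op(a, c), op(b, c))
               for a in elements for b in elements for c in elements
               if leq(a, b))

ints = range(-3, 4)
print(compatible(ints, lambda a, b: a <= b, lambda a, b: a + b))  # True
print(compatible(ints, lambda a, b: a <= b, lambda a, b: a * b))  # False: c < 0 flips the order
```

Multiplication fails the check precisely because multiplying by a negative element reverses the order, which is why the ordered-field axioms restrict that case.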
The main advantage of studying structures in place of examples is—of course—the
generality of results. In view of the abundance of groups it is indeed a deep theorem
(first proved by Cayley) that every group is isomorphic to a group of permutations. It
illustrates at the same time what are the two main tasks in mathematics as a science
of structures: to give really structural theorems and to give manageable standard
representations of the structures studied.
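Cayley's theorem can be checked concretely for a small group; the sketch below, with Z_4 as an assumed example, builds the left-translation permutations and verifies that composition mirrors the group operation:

```python
# Cayley's theorem for Z_4: each element g acts by left translation
# x -> (g + x) mod 4, giving a permutation of the group; the map
# g -> permutation is an injective homomorphism into S_4.
n = 4
perms = {g: tuple((g + x) % n for x in range(n)) for g in range(n)}

# Homomorphism: composing the permutations of g and h gives that of g + h.
for g in range(n):
    for h in range(n):
        composed = tuple(perms[g][perms[h][x]] for x in range(n))
        assert composed == perms[(g + h) % n]

print(sorted(perms.values()))
# [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]
```

The four distinct permutations form a subgroup of S_4 isomorphic to Z_4, which is the "manageable standard representation" the text mentions.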
Historians of mathematics, as, e.g., Dieudonné (see [6, p. 59]), trace this kind
of thinking in structures back to the famous Erlanger Programm by Felix Klein
[11], where a geometry is characterized by a group and the objects studied are
the invariants under the operations of this group. Here we meet again the idea of
invariance mentioned at the beginning that is unavoidable whenever abstraction is
being used.
Physics. Let us now look at physics which means theoretical physics here. One of
the great figures of this discipline, Heinrich Hertz, wrote in his Die Prinzipien der
Mechanik in neuem Zusammenhange dargestellt of 1894 [8] on the first page of his
introduction (Fig. 7):
We make us inner images or symbols of the outer objects, and we make them such, that
the logically necessary consequences of the images always are the images of the naturally
necessary consequences of the depicted objects.
Transforming this thought into a diagram, we get Fig. 8, where the diagram is meant
to be commutative, that is: the two relations of “consequence” are “similar” in
Russell’s terms.
Now it is not difficult to translate what Hertz has said into the language of
structures, with the result that the outer objects (i.e., the real
world) have the same structure as the "inner images" we make of them. In short:
theory and reality are structurally isomorphic.
So in physics too we find structural thinking towards the end of the nineteenth
century. Nowadays physicists are more modest in their assertions on “reality”. They
talk of models (in place of “inner images”) which they try to make successful for
descriptions of the outer world, avoiding ontological commitments. But structural
thinking remains in use as Hertz had suggested. The way of doing it consists in an
extensive use not just of mathematics but also of structural mathematics.
One result of this way to work is that physics itself has become a structural
science—though it was not explicitly mentioned in the long list of structural
sciences presented by Professor Küppers [12]. To see how this is to be understood
let us start from what has been called “structural laws”. We do not maintain that
the examples given (the law of gases and the law of supply and demand; entropy in
thermodynamics and in information theory) are not good candidates for structural
laws or conceptions. Instead we would like to add two more candidates for structural
laws.
The first example remains on the level of abstraction known from Küppers’s
examples; it’s the law of “harmonic oscillation”:
ẍ + c x = 0 (1)

F = m a (2)
Here the “acceleration” a is a quantity that can be measured directly, and the
same holds for the factor that turns proportionality into equality, that is, for the
“mass” m. But F is a sophisticated entity: it has been called a “theoretical term”,8
thus indicating that a direct measurement is impossible: each measuring device must
already make use of a law, e.g., Hooke's law, to work properly. Therefore Newton's
second "law" was sometimes read as a definition of mass.
A structural look at the above formula gives another picture: the letter F is used
as a symbol relating Newton's "law" to at least one other law containing the same
symbol for a special case of force, as e.g.:
and many more. Setting one of these terms equal to the F of Newton's law we get
a differential equation whose solution, with "fitting" initial or boundary conditions,
is called an explanation of the corresponding phenomenon (mechanical oscillation,
planetary movement, trajectories of charged particles in magnetic fields). Newtonian
mechanics in this sense supplies us with what is called the structure of an
explanation. It is itself called a structural science exactly for this reason.
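How a special force law plugged into F = m a yields an explanation can be sketched numerically; the constants m and k and the integration scheme below are illustrative choices of ours, not from the text:

```python
import math

# Instantiating Newton's F = m*a with one special force law, Hooke's F = -k*x,
# gives the oscillator equation x'' + (k/m)*x = 0. A small semi-implicit Euler
# loop integrates it from the initial conditions x(0) = 1, v(0) = 0.
m, k = 1.0, 4.0                              # assumed demo values: omega = sqrt(k/m) = 2
x, v, dt = 1.0, 0.0, 0.001
for _ in range(int((math.pi / 4) / dt)):     # integrate to t = pi/4, a quarter period
    v += (-k / m) * x * dt                   # velocity update from the force law
    x += v * dt                              # position update
# analytic solution is x(t) = cos(2t), so x should now be close to 0
print(abs(x) < 0.05)  # True
```

The "fitting" initial conditions select one solution of the structural equation; swapping in a different force term would change the trajectory but not the structure of the explanation.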
In the ongoing efforts of establishing so-called “unified theories” for the four
fundamental interactional forces of physics we see another example of structural
thinking. The final result, a GUT (grand unified theory), is not yet reached, but,
interestingly, we “know” some of its formal—and that means: structural—properties
7 The quotation is from the third edition (1871), p. 13.
8 By Sneed (1971) [17], for example.
and several years later, in 1974, his colleague Louis Althusser added [1, p. 57]:
[the] “flirt” with structuralist terminology has surely exceeded the allowed measure.
As indicated near the beginning of this talk, computer science uses structures
in different places, and hopefully in a better way than we have just seen for the
humanities. Everyone would agree that a huge amount of data, e.g., the sales
statistics of big companies, needs structures to be stored in a controlled way. The
success of computer science in building structures is on the one side related to the
terms database (DB), Entity-Relationship model (ER), and the Structured Query
Language (SQL). If one wants to measure the success of this structuring technique,
just take the amount of money that Oracle, the second biggest enterprise software
company, makes with it.
On the other side, techniques motivated by the possibilities given by computer-reasoned ontologies lead to program structures that are able to model the real-
References
9 University of Erlangen-Nuremberg, Department of Information Systems II and the Chair for
Artificial Intelligence.
Abstract The aim of this chapter is to present a formal theory of the structure
of information that will support a variety of statements about documents in various
media, their internal structure, and how they function in the world at large. We begin
by sketching an approach to anchoring symbolic systems in human cognition and
discuss various levels of intentionality that occur. We then consider compositionality
in different symbolic systems and the sometimes complex coreference relations that
arise from that. This theory is the basis of a program for translating natural language
into logical form, and this is described. Then the theory is applied to the specific case
of diagrams as information-bearing objects, and a logical theory of Gantt charts
is constructed as an illustration. Finally there is a discussion of issues raised with
respect to various modalities and various manifestations of symbolic artifacts.
1 Introduction
Search engines today are very good at finding information in text resources. But very
often the best answer to a question is in a diagram, a map, a photograph, or a video.
For example, consider the questions
What is the Krebs cycle?
How has the average height of adult American males varied over the last
100 years?
How did the Native Americans get to America?
In this paper we will assume that we have a coherent notion of causality, as in [11],
and change of state, as in [15], as well as a theory of commonsense psychology at
least rich enough to account for perception, planning and intentional behavior, and
Toward a Formal Theory of Information Structure 103
what we here call “cognizing,” that is, taking some cognitive stance toward, such
as belief, thinking of, wondering about, and so on. We will refer to the contents of
thoughts and beliefs as “concepts,” a general notion that not only subsumes proposi-
tions [5] but also includes nonpropositional concepts like “dog” and “near,” images,
vague feelings of apprehension, and so on. We will assume the “ontologically
promiscuous” notation of Hobbs [7], but for typographical convenience, we will
abuse it by using propositions as arguments of other propositions, where a proper
treatment would reify the corresponding eventualities and use those as arguments.
Some of the inferences below are defeasible, and thus the underlying logic must
support a treatment of defeasible inference. There are many frameworks for this,
e.g., McCarthy [19] and Hobbs et al. [14]. To minimize notational complexity,
defeasibility is not made explicit in the axioms in this paper.
The basic pattern that symbols rest on is the perception of some external stimulus
causing an agent to cognize a concept.
It is important to note, however, that perception can trigger many concepts and that
not everything that is cognized needs to be what is perceived. Perceiving a bell can
cause an agent to cognize food (as well as the bell). This makes symbols possible.
104 J.R. Hobbs and R. Mulkar-Mehta
Where the concept cognized is propositional, we could talk about its truth in the
world. That is, it is not only true that e is cognized; it also holds in the real
world: holds(e, w). However, this will play no role in this paper. The meanings of
symbols will be strictly in terms of the concepts they invoke in the recipient.
Communication begins when another agent presents an entity causing the first
agent to perceive it.
goal(g, b) ∧ executable(g, b) ⊃ cause(goal(g, b), g) (5)
(All axioms are universally quantified on the variables in the antecedents of the
highest-level implication.) If g is a goal of b's and is executable by b (or achievable
by an executable action), then its being a goal will cause it to actually occur. We
won’t explicate executable here, but it means that g is (achievable by) an action of
which b is the agent, and all the preconditions for the action are satisfied.
When an observer a uses this causal rule, he is recognizing the intention that lies
behind the occurrence of the event.
It is most common in human communication that the intention is recognized.
Agent b knows that presenting x causes a to perceive x, which causes a to cognize
concept c. b has the goal that a cognize c. So that causes b to present x. Agent a
comes up with exactly this causal explanation of b's action of presentation, so not
only does a cognize c but a also recognizes b's goal that a cognize c.
This recognition relies on agents’ knowing a defeasible rule that says that
goal(g1, b) ∧ cause(g2, g1) ⊃ goal(g2, b) (6)
That is, if an agent b has a goal g1 and g2 tends to cause g1 , then b may have g2 as
a goal as well.
In the case of communication, g1 is cognize(a, c) and g2 is present(b, x, a). The
recipient observes the event of the presenting, uses axiom (5) to infer abductively
that it is intentional, and uses axiom (6) together with schema (4) to recognize that
b intends for a to cognize c.
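The abductive use of rule (6) can be sketched as one step of backward chaining over facts; the tuple encoding, the hardcoded agent "b", and the function name `abduce_goals` are our own simplifications:

```python
# Facts as simple tuples: ("goal", g, agent) and ("cause", g2, g1).
facts = {
    ("goal", "cognize(a,c)", "b"),                  # b wants a to cognize c
    ("cause", "present(b,x,a)", "cognize(a,c)"),    # presenting x causes cognizing c
}

def abduce_goals(facts):
    new = set(facts)
    for (_, g2, g1) in {f for f in facts if f[0] == "cause"}:
        if ("goal", g1, "b") in facts:
            new.add(("goal", g2, "b"))   # rule (6): a cause of a goal may itself be a goal
    return new

# The observer can now attribute the presenting itself to b as a goal:
print(("goal", "present(b,x,a)", "b") in abduce_goals(facts))  # True
```

A fuller treatment would make this inference defeasible, as the text notes; here it fires unconditionally for brevity.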
We can get to full Gricean nonnatural meaning [6] by decomposing Rule (6) into
two rules:
goal(g1, b) ∧ cause(g2, g1) ⊃ goal(cause(g2, g1), b) (7)

goal(cause(g2, g1), b) ⊃ goal(g2, b) (8)
That is, if an agent b has a goal g1 and g2 tends to cause g1 , then b may have as a
goal that g2 cause g1 . Moreover, if an agent b has as a goal that g2 cause g1 , then b
has the goal g2 .
When g1 is cognize(a, c) and g2 is present(b, x, a), a uses axioms (7) and (8)
to explain the presentation; then a will recognize not only b’s intention to have a
cognize c but also b’s intention that a do so by virtue of the causal relation between
perceiving x and cognizing c. This is the definition of Gricean nonnatural meaning.
In order for this sort of communication to work, it must be mutually known
between a and b that perceiving x causes cognizing c. Mutual belief can be
characterized by three properties. First, if a group mutually believes something, each
of the members believes it. Second, if a group mutually believes something, then it
mutually believes that it mutually believes it; this allows one to step up to arbitrary
levels of embedding of belief inside belief. Third, mutual belief can be successively
approximated by shared beliefs. That is, if we both believe something, we probably
but not necessarily mutually believe it; if we both believe we both believe it, it is
even more likely that we mutually believe it. The more levels of embedding we add
on, the more difficult it is to construct examples where mutual belief fails, and thus,
the more likely it is that mutual belief holds.
Communicative conventions are causal rules having the form of (7) and (8),
where g2 is the presentation of a symbol and g1 is the act of cognizing its
meaning. Communicative conventions grow up in different social groups and
become mutually believed within the groups. Thus, the structure of a communicative
convention is
for a specific x and a specific c. That is, a social group s that agents a and b
are members of mutually believes the causal relation between presenting x and
cognizing c. For example, x might be a red flag with a white diagonal, s might
be the community of boaters, and c the concept that there is a scuba diver below.
These communicative conventions can originate and take hold in a group in many
different ways. The culture of a group consists in large part of a number of such
rules.
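A minimal sketch of a convention table, using the scuba-flag example from the text; the dictionary encoding and the names `conventions` and `interpret` are illustrative choices of ours:

```python
# A communicative convention as a mutually believed pairing, keyed by the
# community s in which it holds: means(x, c, s).
conventions = {
    ("red-flag-white-diagonal", "boaters"): "scuba diver below",
}

def interpret(symbol, community):
    # Returns the concept c the symbol conveys in that community, or None
    # if no convention is mutually believed there.
    return conventions.get((symbol, community))

print(interpret("red-flag-white-diagonal", "boaters"))   # scuba diver below
print(interpret("red-flag-white-diagonal", "hikers"))    # None
```

The lookup's dependence on the community argument mirrors the point that the same symbol may convey nothing, or something else, outside the group in which the convention grew up.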
Note that there is nothing particularly advanced about the arbitrariness of the
symbol x. That is already there in the most primitive stage, in the connection
between the bell and the food.
This completes the sketch of how the meaning of atomic symbols can be
grounded in a theory of cognition: in our scheme, x is a symbol that means or
means(x, c, s)
We will leave out the third argument in the development of the theory of diagrams
below; the community is simply the set of people expected to be able to understand
the diagrams.
Like a folk theory of belief, a folk theory of intention is a useful thing for a social
animal to have. It allows individuals to predict the behavior of other individuals
with some degree of accuracy. A stark example of this point is the difference in our
reaction to walking down a sidewalk a few feet from a bunch of cars hurtling past
at 50 miles an hour and our reaction to standing on a mountain slope a few feet
from a landslide. The difference is entirely a matter of intention. Once a theory of
intention has evolved, it can be deployed in the construction of a much richer theory
of meaning, as described here, and for example allows agents to interpret utterances
as deliberately deceptive.
We next turn to how more complex symbolic objects convey more complex
meanings in different modalities.
An atomic symbol, i.e., one that does not have interpretable parts, corresponds to
some concept. Atomic symbols can be composed in various ways, depending on
the type of symbol system, and the meaning of the composition is determined by the
meaning of the parts and the mode of composition. These composite elements can
then be components in larger structures, giving us symbolic structures of arbitrary
complexity. This is illustrated in the commuting diagram of Fig. 1.
Composition in symbol systems occurs when entities x and y, meaning c1 and c2 ,
respectively, are presented with a relation R1 between them, where R1 conveys the
relation R2 in the target domain. Thus, we have three causal relations.
The relation R1 .x; y/ can be thought of as just another entity in the symbol
system, so it is subject to full Gricean interpretation just as atomic symbols are,
and it can similarly be involved in the conventions of some community.
With respect to the concepts invoked, we will confine ourselves here to proposi-
tional concepts. The advantage of having a flat notation in which anything can be
reified is that when composite concepts are constructed, we can view this as simply
a conjunction of what is already cognized with the new relations conveyed by the
composition.
Speech (and text as spoken) takes place in time, so the only compositional
relation possible on the left side of the diagram is concatenation. In discourse
beyond the sentence, concatenation generally conveys a coherence relation based on
causality, similarity, interlocking change of state, or figure-ground [8]. In a sense,
the adjacency of the segments of discourse says that the states or events described
in each segment are related somehow, and the hearer has to figure out how. The
particular coherence relations that occur are the most common kinds of relations
that typically obtain among states and/or events.
Within sentences, the composition of smaller units into larger units conveys
primarily a predicate-argument relation between the meanings of the components.
Thus, when we concatenate “men” and “work” into “men work,” we are indicating
that the referent of “men” is an argument or role-filler in the event denoted by
“work.” This view of syntax as conveying predicate-argument (and modification)
relations through adjacency of constituents is elaborated in [10], in which an
extensive grammar of English is developed in a manner similar to the Head-
driven Phrase Structure Grammar of Pollard and Sag [24]. Section 6 describes
an implementation of this idea for generating logical forms for natural language
sentences.
Protolanguage [2] is more like discourse. Two words, phrases, or larger segments
of natural language are adjacent, and it is up to the hearer to figure out the relation
between them. Children’s language in the two-word phase is an example of this;
“mommy sock” might mean that this sock belongs to mommy, or that mommy
should put this sock on or take it off the child, or any number of other relations.
The language of panic is another example of protolanguage, for example, when
someone runs out of his office shouting, “Help! Heart attack! John! My office! CPR!
Just sitting there! 911! Help! Floor!” we understand the parts and hypothesize the
relations that build them into a coherent scenario. The nominal compound can be
seen as a relic of protolanguage in modern English; the relation between the two
nouns might be a predicate-argument relation, as in “language origin”, but it might
be any other relation as well, as in “turpentine jar”.
Hobbs [12] tells an incremental story of the evolution of syntax that begins with
protolanguage. Then constraints are added to the effect that adjacency is interpreted
as predicate-argument relations. Further steps extend this to discontinuous elements,
as in “Pat is likely to go,” where “Pat” is the subject of “go.” Long-distance
dependencies are seen as language-specific constraints on possible predicate-
argument relations between noun phrases and adjacent clauses.
Tables are another kind of complex symbolic object. In tables, the elements in
individual cells refer to some concept. The manner of composition is placement of
these cells in a vertical and horizontal arrangement with other cells. Generally, the
aggregate represents a set of relations: The item in a cell that is not the first in its row
stands in some relation to the first element in the row. The relation is the same for
all elements in that column and is often explicitly labeled by a header at the top of
the column. For example, in a table of United States presidents, we might have the
year 1732 in one cell. The label on the row may be “George Washington” and the
label on the column “Birth date.” This spatial arrangement then conveys the relation
birthdate(GW, 1732).
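The row/column reading rule can be sketched directly; the header names follow the presidents example in the text, and the helper name `relations` is ours:

```python
# A table as a spatial arrangement of cells: each non-first cell in a row
# stands in the column-header relation to the row's first element.
table = {
    "header": ["President", "Birth date"],
    "rows": [["George Washington", "1732"]],
}

def relations(table):
    rel_names = table["header"][1:]          # column headers name the relations
    return [(rel, row[0], cell)
            for row in table["rows"]
            for rel, cell in zip(rel_names, row[1:])]

print(relations(table))  # [('Birth date', 'George Washington', '1732')]
```

Each emitted triple is exactly the relation the spatial arrangement conveys, e.g. birthdate(GW, 1732).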
A beep in your car can mean several things—you haven’t fastened your seat belt,
the door is open, your lights are still on. When a sequence of beeps is combined
with a property of that sequence, namely, a rapid frequency, the composite meaning
can be more precise—your lights are still on.
A map is an interesting example of a complex visual symbolic object. There
are underlying fields, indicated by colors, that represent regions of various political
or geologic types. Icons are overlaid on these fields in a way that bears at least a
topological relation to the reality represented, and perhaps a geometric relation.
Generally there is a mixture of topological and geometric correspondences; the
width of a road on a map is usually not proportional to its width in reality. The
icons represent entities of interest. The icons can have internal structure representing
different categories of entities; for example, the size of a city may be represented
by the number of circles around a center dot. Labels represent names. Information
is conveyed by the possible spatial relations of adjacency, distance, and orientation.
For example, labels naming cities are placed near the icon representing the city.
In a process diagram, as in maps, icons represent entities of different types.
Individual states may be represented by a set of icons standing in particular
spatial relationships to each other, representing the relations among the entities that
characterize the state. Adjacent states may be connected by arrows, the direction of
the arrow indicating temporal order and thus state transitions [23, 26].
Documents are composed of blocks of text and other symbolic elements in a
particular spatial relationship. Some spatial relationships are tightly constrained.
Titles need to be at the top of the document; the relation conveyed is that the title
somehow indicates the content of the body of the document. Paragraphs meant
to be read in sequence should be adjacent (with exceptions for page and column
breaks). Paragraph breaks should also correlate with the coherence structure of the
text, although often in a complex way. Other spatial relationships can be looser.
A diagram, map, photograph, or sidebar only needs to be relatively near the block
110 J.R. Hobbs and R. Mulkar-Mehta
of text describing roughly the same thing, and the ordering of such figures should
respect the order of references to them in the body of the text.
Web sites and PowerPoint presentations are complex symbolic objects amenable
to similar analysis.
The performance of an individual in face-to-face conversation [1] is also a
complex symbolic object with its own compositional principles. The principal
component symbolic elements are the speech stream, prosody, facial expressions,
gaze direction, body posture, and gestures with the hands and arms. The primary
mode of composition is temporal synchrony. The relationship conveyed by temporal
synchrony could be either that the content of the two presentations is the same, as
when someone makes a rolling gesture while describing Tweety rolling down the
hill, as in McNeill's experiments [20], or that they are parallel actions, as when someone freezes
her arm in midgesture to hold her turn while beginning a new clause. Lascarides
and Stone [17] have shown that the possible relations among gestures and between
gestures and speech are the same as the coherence relations that obtain between
adjacent segments of discourse, namely, those based on causality, similarity and
contrast, elaboration, interlocking change of state, and figure-ground.
A play is a more complex composite symbolic object, but its structure is equally
amenable to a compositional account in terms of adjacency relations among the
symbolic components and conjoined relations in the meaning.
In Sect. 6 we describe a program implementing these ideas to translate natural
language into logical form. In Sect. 7 we develop more fully a theory of diagrams
to illustrate the application of this theory of information structure to a special case
of substantial importance. But first some comments about coreference.
5 Coreference
6 Compositional Semantics
Hobbs [7, 9] defines a logical formalism for representing natural language in which
the logical form of a sentence is a flat conjunction of atomic propositions in which
all variables are existentially quantified with the widest possible scope. What
more standard notations represent by scope is represented in this notation by
predicate-argument relations, through the liberal use of reification of such things as
possible eventualities and typical elements of sets.
In such a notation, it is very straightforward to see how compositional semantics
can be organized around the idea that concatenation within sentences means
predicate-argument relations. Morphemes provide one or more predications, that
is, an n-ary predicate applied to n arguments, where the arguments are existentially
quantified variables. When two adjacent strings are recognized as part of the same
constituent, we have recognized the identity of two of the variables. This effects a
recognition of the predicate-argument relation. Consider a simple example.
Tall men succeed.
We will ignore the plural and present tense, although they are not difficult to
handle. The word "tall" contributes the proposition tall(x1). The morpheme "man"
contributes man(x2). The word "succeed" contributes succeed(x3).
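The composition step—recognizing adjacent strings as one constituent and thereby identifying variables—can be sketched minimally in Python; the helper names are hypothetical, not the LFToolkit implementation:

```python
import itertools

_fresh = itertools.count(1)

def predication(pred):
    """A morpheme contributes an atomic predication over a fresh,
    existentially quantified variable."""
    return (pred, f"x{next(_fresh)}")

def identify(lf, keep, drop):
    """Recognizing two adjacent strings as parts of one constituent
    identifies two of their variables."""
    return [(p, keep if v == drop else v) for (p, v) in lf]

# "Tall men succeed." -- ignoring plural and tense, as in the text
tall, man, succeed = predication("tall"), predication("man"), predication("succeed")
lf = [tall, man, succeed]
lf = identify(lf, man[1], tall[1])      # "tall" + "man" form one constituent
lf = identify(lf, man[1], succeed[1])   # "tall men" is the subject of "succeed"
# lf is now a flat conjunction: tall, man, and succeed all predicated
# of the same existentially quantified variable
```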
7 A Theory of Diagrams
In this section we develop a theory of diagrams and apply it to defining Gantt charts.
A Gantt chart is a particularly common type of diagram in certain corporate cultures;
it graphs tasks and subtasks in a project against a time line. We need to rely on
concepts from several background theories, not all of which have been described in
published papers.
Toward a Formal Theory of Information Structure 113
Strings: We assume there are strings of characters. They are usually symbolic
objects, so they can be the first argument of the predicate means.
Time: In the development on Gantt charts, reference is made to concepts in
OWL-Time [13]. This ontology posits temporal entities (i.e., intervals and instants),
the beginnings and ends of intervals, a before relation, Allen interval relations
like intMeets, and temporal aggregates, which are sets of nonoverlapping, ordered
temporal entities. It also handles durations and clock and calendar terms. The three
predicates we need here are the following:
The one predicate we need from a theory of causality or processes [11] is enables:
For Gantt charts we need a simple ontology of projects, with the following three
predicates.
Project(p): p is a project.
taskIn(z, p): z is one of the tasks of project p.
milestoneIn(m, p): m is one of the milestones of project p.
A project is a composite entity among whose components are tasks and milestones,
which are events. The project and its parts can have names.
Space: The actual drawing of a diagram will involve mapping the ontology of
diagrams to an ontology of space. Some portion of space will have to be chosen as
the ground. This will define the vertical and horizontal directions and the above and
rightOf relations. In addition, the articulation between the theory of diagrams and
the theory of space would have to specify the kinds of spatial regions that realize
different kinds of diagram objects.
A diagram consists of various diagram objects placed against a ground, where each
diagram object has a meaning. We can take the ground to be a planar surface, which
thus has points. Diagram objects can have labels placed near them, and generally
they indicate something about the meaning. Diagram objects, points, frameworks,
meanings, and labels are discussed in turn, and then it is shown how these can be
used to define Gantt charts.
Diagram objects can be classified in terms of their dimensionality. In a spatial
ontology in general, we would have to specify both a dimension of the object and
the dimension of the embedding space, but in this theory of diagrams, we will take
our embedding space to be a two-dimensional plane. Thus, there are three types of
diagram objects:
0DObject(x) ⊃ DiagramObject(x)
1DObject(x) ⊃ DiagramObject(x)
2DObject(x) ⊃ DiagramObject(x)
Zero-dimensional diagram objects in diagrams are the class of diagram objects that
are treated as having zero dimensions in the context of the diagram. Of course,
in a spatial ontology they would actually be small regions generally with some
symmetries. Three types of zero-dimensional diagram objects are dots, tickmarks,
and diamonds.
Dot(x) ⊃ 0DObject(x)
Tickmark(x) ⊃ 0DObject(x)
Diamond(x) ⊃ 0DObject(x)
Curve(x) ⊃ 1DObject(x)
Three important kinds of curves are lines, rays (half-lines), and line segments.
Line(x) ⊃ Curve(x)
Ray(x) ⊃ Curve(x)
LineSegment(x) ⊃ Curve(x)
A line has no beginning or end. A ray has a unique beginning but no end. A line
segment has both a unique beginning and a unique end.
Line(x) ⊃ [¬(∃ p1)[begins(p1, x)] ∧ ¬(∃ p2)[ends(p2, x)]]
Ray(x) ⊃ [(∃! p1)[begins(p1, x)] ∧ ¬(∃ p2)[ends(p2, x)]]
LineSegment(x) ⊃ (∃! p1, p2)[begins(p1, x) ∧ ends(p2, x)]
LineSegment(x) ⊃ (∀ p)[beginningOf(x) = p ≡ begins(p, x)]
LineSegment(x) ⊃ (∀ p)[endOf(x) = p ≡ ends(p, x)]
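Read procedurally, these axioms classify a linear object by which unique endpoints exist. A minimal sketch, with boolean flags standing in for the existential claims:

```python
def classify_linear(has_beginning, has_end):
    """Classify a one-dimensional diagram object by whether a unique
    beginning point and a unique end point exist, following the axioms:
    a line has neither, a ray has only a beginning, a line segment both."""
    if has_beginning and has_end:
        return "LineSegment"
    if has_beginning:
        return "Ray"
    if not has_end:
        return "Line"
    return "unclassified"  # an end without a beginning is not axiomatized

assert classify_linear(False, False) == "Line"
assert classify_linear(True, False) == "Ray"
assert classify_linear(True, True) == "LineSegment"
```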
It will be useful to have a term Linear that covers all three types of linear diagram
objects.
A line segment “in” a linear diagram object is one that is wholly contained in it.
lineSegmentIn(x, y) ≡
  LineSegment(x) ∧ Linear(y) ∧ (∀ p)[pointIn(p, x) ⊃ pointIn(p, y)]
Arrow(x) ⊃ Curve(x)
Arrow(x) ⊃ (∃! p1, p2)[begins(p1, x) ∧ ends(p2, x)]
An arrow may be straight or curved. It may overlap in part with other arrows.
An arrow is similar to a line segment but is realized in the underlying spatial
ontology differently, e.g., with a small triangle for its end.
It will be useful to talk about an arrow as a ternary relation.
A ground consists of points and any diagram object consists of points, in some
loose sense of “consist of”; that is, for any ground and any diagram object there is a
corresponding set of points.
[Ground(x) ∨ DiagramObject(x)] ⊃
  (∃ s)(∀ p)[member(p, s) ≡ pointIn(p, x)]
0DObject(x) ⊃ (∃! p) pointIn(p, x)
For convenience we will say that the single point in a zero-dimensional object both
begins and ends it.
0DObject(x) ⊃ (∀ p)[pointIn(p, x) ≡ [begins(p, x) ∧ ends(p, x)]]
begins(p, x) ∧ Linear(x) ⊃ pointIn(p, x)
ends(p, x) ∧ Linear(x) ⊃ pointIn(p, x)
Points in the ground are partially ordered by an above relation and a rightOf relation.
above(p1, p2, g) ⊃ Ground(g) ∧ pointIn(p1, g) ∧ pointIn(p2, g)
rightOf(p1, p2, g) ⊃ Ground(g) ∧ pointIn(p1, g) ∧ pointIn(p2, g)
horizontal(x, g) ≡ Linear(x)
  ∧ ¬(∃ p1, p2)[pointIn(p1, x) ∧ pointIn(p2, x) ∧ above(p1, p2, g)]
vertical(x, g) ≡ Linear(x)
  ∧ ¬(∃ p1, p2)[pointIn(p1, x) ∧ pointIn(p2, x) ∧ rightOf(p1, p2, g)]
A horizontal ray all of whose points are to the right of its beginning is a rightward
positive ray.
A vertical ray all of whose points are above its beginning is an upwardly positive ray.
A vertical ray all of whose points are below its beginning is a downwardly positive
ray.
A special kind of line segment needed for Gantt charts is a horizontal bar.
HBar(x) ⊃ (∃ g)[horizontal(x, g)] ∧ LineSegment(x)
When realized spatially, it will generally be thicker than other line segments.
Diagrams are constructed by placing points in diagram objects at points in the
ground or in another diagram object. The at relation expresses this.
at(p1, p2) ⊃
  (∃ x1, x2)[pointIn(p1, x1) ∧ pointIn(p2, x2) ∧ DiagramObject(x1)
    ∧ [Ground(x2) ∨ DiagramObject(x2)] ∧ x1 ≠ x2]
0DObject(x) ⊃
  (∀ p)[at(x, p) ≡ (∃ p1)[pointIn(p1, x) ∧ at(p1, p)]]
Typically, frameworks (see below) will be placed with respect to some points in the
ground, and other diagram objects will be placed with respect to the framework or
other diagram objects.
The relations of a diagram as a composite entity include its at relations. To say
this formally we can reify the at relation. Thus, at0 .r; p1 ; p2 / means that r is the at
relation between p1 and p2 . We can then say that r is a member of the relations of
the diagram.
7.4 Frameworks
Framework(f) ⊃
  (∃ s)(∀ x)[member(x, s) ≡ DiagramObject(x) ∧ componentOf(x, f)]
RCoordinateSystem(f) ⊃ Framework(f)
RCoordinateSystem(f) ⊃ (∃ g)[Ground(g) ∧ groundFor(g, f)]
RCoordinateSystem(f) ∧ groundFor(g, f) ⊃
  (∃ x, y)[xAxisOf(x, f) ∧ yAxisOf(y, f) ∧ rtPositive(x, g)
    ∧ [upPositive(y, g) ∨ dnPositive(y, g)]]
xAxisOf(x, f) ⊃ RCoordinateSystem(f) ∧ componentOf(x, f)
yAxisOf(y, f) ⊃ RCoordinateSystem(f) ∧ componentOf(y, f)
xAxisOf(x, f) ⊃ axis(x)
yAxisOf(y, f) ⊃ axis(y)
Two points have the same x-coordinate if there is a vertical line that contains both
of them. Similarly, same y-coordinate.
sameX(p1, p2, f) ≡
  (∃ l, g)[groundFor(g, f) ∧ vertical(l, g) ∧ pointIn(p1, l) ∧ pointIn(p2, l)]
sameY(p1, p2, f) ≡
  (∃ l, g)[groundFor(g, f) ∧ horizontal(l, g) ∧ pointIn(p1, l) ∧ pointIn(p2, l)]
The xValue of a point p is a point p1 in the x axis with the same x-coordinate.
Similarly for the yValue.
It will be convenient below to talk about the yValue of a horizontal line segment.
7.5 Meanings
Associated with every object in a diagram is its meaning. Meaning for diagrams
is thus a function mapping diagram objects into entities provided by some other
ontology. Meaning is conveyed by the predication means(x, c) introduced above,
DiagramObject(x) ⊃ (∃ c) means(x, c)
7.6 Labels
A label is a textual object that can be associated with objects in a diagram. The two
basic facts about labels cannot be defined with precision without making reference
to the cognition of the reader of the diagram.
1. A label is placed near the object it labels, in a way that allows the reader of the
diagram to uniquely identify that object.
2. The content of the label as a string bears some relation to the meaning of the
object that it labels, in that perceiving the string causes one to think of the
meaning.
Specifying the first of these completely is a very hard technical problem [4]. For
example, often on a map one cannot correctly associate the name of a town with a
dot on the map without doing the same for all nearby towns, and associating a curve
on a map with the name of a road often requires abductive inferences about shortest
paths and consistency of line thickness. Here we will simply say that a label can
be placed at an object and leave it to component-specific computation to determine
what at means in some context.
label(l, x) ⊃ string(l) ∧ DiagramObject(x) ∧ at(l, x)
The second property of labels is also a difficult technical, or even artistic, problem.
But a very common subcase is where the label is a name. The whole purpose of a
name is to cause one to think of the object when one perceives the name, so it serves
well for this property of labels.
label(l, x) ⊃ (∃ c)[means(l, c) ∧ means(x, c)]
GanttChart(g, p) ≡ Diagram(g) ∧ Project(p)
  ∧ (∃ f, x, y, t, s)[RCoordinateSystem(f) ∧ componentOf(f, g) ∧ xAxisOf(x, f)
    ∧ rtPositive(x) ∧ means(x, t) ∧ TimeLine(t) ∧ yAxisOf(y, f)
    ∧ [upPositive(y) ∨ dnPositive(y)]
    ∧ means(y, s)
    ∧ (∀ z)[member(z, s) ≡ taskIn(z, p)]]
A Gantt chart has horizontal bars representing the interval during which a task is
executed.
Because a task is an event, OWL-Time allows instants as the beginnings and ends
of tasks. This axiom says that a Gantt chart has a set of components which are
horizontal bars representing tasks and the beginning of the bar represents the starting
time of the task and the end of the bar represents the finishing time of the task.
Similarly, a Gantt chart has diamonds representing milestones.
A Gantt chart often has arrows going from the end of one bar to the beginning of
another, indicating the first bar’s task is a prerequisite for the second bar’s task.
A diamond can also be the source and/or target of an arrow.
A bar in a Gantt chart may have labels for the date at its beginning and end.
The cardinality statement is a way of saying there is either zero or one label.
Similarly, a diamond in a Gantt chart may have a label for a date.
Points on the y-axis of a Gantt chart can be labeled with task names.
Points in the x-axis of a Gantt chart can be labeled with dates or times.
Line segments in the x-axis of a Gantt chart can be labeled with calendar intervals.
Further elaborations are possible. The labels can have internal structure. For
example, labels for subtasks may be indented. Labels for time intervals may be
broken into a line for months, a line below for weeks, and so on.
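The Gantt-chart ontology above can be mirrored in a toy in-memory model: bars mean task intervals, diamonds mean milestones, and arrows mean that one task's completion enables another. This is a sketch only, and all names are ours rather than the chapter's:

```python
from dataclasses import dataclass, field

@dataclass
class Bar:            # horizontal bar: means a task's execution interval
    task: str
    start: int        # beginning of bar -> starting time of the task
    end: int          # end of bar -> finishing time of the task

@dataclass
class Diamond:        # zero-dimensional object: means a milestone
    milestone: str
    time: int

@dataclass
class GanttChart:
    project: str
    bars: list = field(default_factory=list)
    diamonds: list = field(default_factory=list)
    arrows: list = field(default_factory=list)  # (from_task, to_task): enablement

chart = GanttChart("Book project")
chart.bars.append(Bar("draft", 1, 6))
chart.bars.append(Bar("review", 6, 9))
chart.diamonds.append(Diamond("manuscript delivered", 9))
chart.arrows.append(("draft", "review"))  # draft's end enables review's start
```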
In order for communication to work, perception of the symbol must occur. Humans
are able to perceive optical, acoustic, and chemical phenomena, as well as pressure
and temperature. Of these modalities the optical and acoustic are by far the most
important, because they offer the richest possibilities for composition. Artifact
agents of course have other modalities.
Communication requires one or more devices. There must be a manner in
which the presentation is carried out. Allwood [1] categorizes these into primary,
secondary, and tertiary. The primary devices or media are the ones that are human
body parts and processes. The voice is used for grunts, speech and song. The hands,
arms, body, face, and head are used for gesture. Even at this level some encoding
must be done; we need to find words for the concepts we wish to convey, and these
must be mapped into sequences of articulatory gestures.
The secondary media are those involving devices external to the human body,
such as marks on paper as in writing and drawings, computer terminals, telephones,
videotape, and so on. These typically require multiple encodings, where the
final code is known to the intended audience. The various media have different
advantages and disadvantages that can be exploited for different kinds of represented
content. For example, visual spatial representations can exploit more dimensions for
conveying relationships than can auditory temporal representations. Hovy and Arens
[16] catalog many of these correlations.
manifest(x1, x) ∧ means(x, c, s) ⊃ means(x1, c, s)
(This does not take into account translations, where the s’s differ.)
Thus, to use the example of Pease and Niles [22], there is an entity called Hamlet.
The performance of Hamlet manifests Hamlet. The performance of Hamlet in a
particular season by a particular company manifests that, and a performance on a
particular night may manifest that. A videotape of that particular performance man-
ifests the performance, and every copy of that videotape manifests the videotape. A
similar story can be told about the script of the play, a particular edition of the script,
and a particular physical book with that edition of the script as its content.
The above rule should be thought of as defeasible, because variations exist, lines
can be dropped, and printer’s errors occur. More precisely, if some proposition
occurs in the content of one symbolic entity then defeasibly it occurs in the content
of symbolic entities that manifest it.
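This defeasible inheritance of content through manifestation can be sketched as follows; the names are illustrative, and the recorded exceptions stand in for dropped lines and printer's errors:

```python
# Content defeasibly propagates down a chain of manifestations: a
# proposition in the content of a symbolic entity occurs in the content
# of the entities that manifest it, minus the exceptions at each stage.
def inherited_content(content, manifest_chain, exceptions):
    """Propagate content through successive manifestations, removing
    the exceptions recorded at each stage (the source of defeasibility)."""
    current = set(content)
    for stage in manifest_chain:
        current -= exceptions.get(stage, set())
    return current

hamlet = {"line1", "line2", "line3"}
chain = ["season production", "tonight's performance"]
exceptions = {"tonight's performance": {"line2"}}   # a dropped line
# inherited_content(hamlet, chain, exceptions) == {"line1", "line3"}
```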
9 Summary
Acknowledgments We have profited from discussions with and work by Eduard Hovy, Nishit
Rathod, Midhun Ittychariah, and Paul Martin. The opinions expressed here, however, are entirely
our own. This material is based in part upon work supported by the Defense Advanced Research
Projects Agency (DARPA), through the Department of the Interior, NBC, Acquisition Services
Division, under Contract No. NBCHD030010. It was also supported in part by DARPA’s “Learning
by Reading: The Möbius Project” (NBCHD030010 TO #0007), and in part under the IARPA
(DTO) AQUAINT program, contract N61339-06-C-0160.
References
23. Pineda, L., Garza, G.: A model for multimodal reference resolution. Comput. Linguist. 26(2),
139–194 (2000)
24. Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. University of Chicago Press
and CSLI Publications, Chicago (1994)
25. Rathod, N.: LFToolkit. Available at http://www.isi.edu/hobbs/LFToolkit/index.html (2008)
26. Wahlster, W., André, E., Finkler, W., Profitlich, H.-J., Rist, T.: Plan-based integration of natural
language and graphics generation. Artif. Intell. 63(1–2), 387–427 (1993)
Mechanics and Mental Change
Jon Doyle
In memoriam
Joseph Arthur Schatz, 1924–2007
1 Vive la Résistance
J. Doyle ()
Department of Computer Science, North Carolina State University, Raleigh, NC, USA
e-mail: Jon Doyle@ncsu.edu
behavior. Indeed, some find it easier to change behavior in large ways than in small,
but few find change easy, either to effect or to accept.
The prominence of resistance to change in human nature raises doubts about the
standard conception of ideal rationality in thought and action. The foundations of the
ideal rationality put forward by decision theory and economics suppose that any bit
of new information can lead to arbitrarily large changes arbitrarily quickly. One can
easily tell someone “Everything you know is wrong!”, but few outside of fantastic
fiction can act on such advice even if they believe it. Real people can accept new
information without connecting it with prior beliefs to draw new consequences or
to notice inconsistencies and must search and think carefully to ensure that they do
not overlook some relevant fact in making decisions.
Resistance to change therefore impacts the very nature of human belief. Infer-
ences based on mundane prejudices and stereotypes permeate human knowledge
and provide commonly useful conclusions unless one makes the effort to distinguish
the atypical aspects of the current circumstances. Failure to find or acknowledge
these atypical aspects thus leads to systems of belief corrupted by inappropriate
assumptions. More generally, humans organize knowledge into numerous hierar-
chies of abstractions that reflect commonalities and differences in meanings. Failure
to find or acknowledge commonalities and to restructure abstractions to reflect them
produces conceptual organizations that omit useful connections between concepts
and so impede effective recognition of and response to changing circumstances.
A robust theory of rational reasoning and action falls short without a good
account of the origins and character of resistance to change. Philosophy and
psychology have developed several ways of understanding resistance to change,
especially the notions of habit, refraction, and entrenchment, but suffer a striking
omission: the long-standing notions of inertia and force in common informal usage
in everyday descriptions. Mathematicians from Galileo to Euler showed how to
use these concepts precisely in characterizing the behavior of physical bodies, but
psychology has suffered from the lack of such concepts except for use in informal
metaphor and analogy. One sees this lack clearly in the case of theories of ideal
rationality, in which the characteristic unbounded response to bounded impetus
shows none of the proportional response embodied in mechanical laws of force and
inertia.
The following presents an account of resistance to change in psychological
systems that reveals an underlying mechanical character to psychology. This
account employs an extension of the mathematical formulation of mechanics to
cover minds and persons with mechanical minds and bodies. As detailed in [14],
which develops the formalism in axiomatic terms along with the appropriate math-
ematical structures, the extended mechanics separates the fundamental properties
and structures of position, motion, mass, and force from special characteristics of
continuous space and extends these notions to hybrid systems that combine different
types of positions, masses, and forces. The presentation here omits the axiomatics in
favor of an annotated example of mechanics in a specific type of reasoner developed
in artificial intelligence.
in Minsky’s [24] theory of the “society of mind,” but few have sought to formalize
refraction directly. The most relevant formalizations do not address reaction as much
as they address conflict, specifically decision making in which mental subagencies
argue with each other [4] or have conflicting preferences [15]. Although one can
view refraction through these conceptual lenses, this view does not capture the
relation between the imposed change and the reaction to it.
A ∗ x = (A ∸ ¬x) + x, (2)
stating that revision by x should be the same as adding x after first removing
anything contradicting x.
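Equation (2) can be illustrated over a toy belief base of bare literals. This is a sketch only: the real operations act on deductively closed theories, and here contraction amounts to deleting a literal:

```python
# Levi identity over literals: revision by x first contracts by the
# negation of x, then expands by x.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def contract(base, x):
    """Remove x from the base (trivial here; hard in closed theories)."""
    return base - {x}

def expand(base, x):
    return base | {x}

def revise(base, x):
    """A * x = (A -. ~x) + x: remove what contradicts x, then add x."""
    return expand(contract(base, negate(x)), x)

base = {"p", "~q"}
base = revise(base, "q")
# base == {"p", "q"}: "~q" was contracted before "q" was added
```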
Simply naming these types of operations does not define them, inasmuch as one
might be able to remove a statement from a theory in many different ways, up to
and including removal of all statements except the tautologies, which cannot ever
be removed from deductively closed theories. Accordingly, the formal development
of these operations begins by understanding the types of changes that fit with each
of these operations.
We will divide the phenomenological approaches to analyzing mental change
into two subclasses, comparative theories that provide means of comparing the relative
size of different changes, and characteristic theories that seek to characterize the
superficial properties of the starting and ending states connected by changes. We will
later contrast these phenomenological approaches with substantial theories that seek
to obtain comparative and characteristic properties of revision from more detailed
assumptions about the substance and structure of mental states.
X ≤_W Y (3)
over possible worlds such that each binary relation ≤_W is a preordering of possible
worlds, that is, satisfies the condition
X ≤_W X (4)
X ≤_W Z (5)
whenever
X ≤_W Y ≤_W Z. (6)
Each relation also satisfies the condition
W ≤_W X, (7)
stating that no world lies closer to W than W itself. One might expect
X ≤_W Y (8)
to hold iff
d(W, X) ≤ d(W, Y) (9)
for some measure d of distance between worlds,
but the axioms of comparative similarity do not require any structure of distance,
much less Euclidean distance, on the set of possible worlds. A comparative
similarity relation provides a way to compare changes with respect to the same
origin, but need not provide any way of comparing changes with different starting
points. One can capture that broader range of comparisons instead by considering
a preorder ≾ over all pairs of worlds, namely a binary relation ≾ satisfying the
conditions
1. (X, Y) ≾ (X, Y),
2. (X, Y) ≾ (X″, Y″) whenever (X, Y) ≾ (X′, Y′) ≾ (X″, Y″), and
3. (X, X) ≾ (X, Y) (or even the stronger condition (X, X) ≾ (Y, Z)).
One easily verifies that each such preorder over pairs of worlds induces a
comparative similarity relation by defining
Y ≤_X Z (10)
to hold iff (X, Y) ≾ (X, Z).
We say that a numeric function s represents a pair-based similarity relation ≾ just
in case s(X, Y) ≤ s(X′, Y′) whenever (X, Y) ≾ (X′, Y′).
Obviously such representations are purely ordinal in character and can make
comparable alternatives that were incomparable in the represented partial preorder.
We may thus regard the formalism of pair-based comparative similarity as a simple
(even simple-minded) means with which to describe different levels of difficulty
when moving from one world to another.
Although the notion of similarity of worlds does not in itself rest on any notion
of difficulty or of resistance to change, at least some notions of similarity and
dissimilarity track the ease and difficulty of changing from one mental state to
another, as will be seen later.
To go further than the abstract notion of comparative similarity, one must be more
specific about the nature of mental states. One important example is the theory
of Alchourrón, Gärdenfors, and Makinson (AGM) [1, 16], who laid down axioms
intended to characterize the observable properties of a notion of logically rational
belief revision. Their axioms for contraction are as follows.
(∸1) A ∸ x is a theory whenever A is
(∸2) A ∸ x ⊆ A
(∸3) If x ∉ Cn(A), then A ∸ x = A
(∸4) If ⊬ x, then x ∉ Cn(A ∸ x)
(∸5) If ⊢ x ↔ y, then A ∸ x = A ∸ y
(∸6) A ⊆ Cn((A ∸ x) + x) whenever A is a theory
(∸7) (A ∸ x) ∩ (A ∸ y) ⊆ A ∸ (x ∧ y) whenever A is a theory
(∸8) If x ∉ A ∸ (x ∧ y), then A ∸ (x ∧ y) ⊆ A ∸ x whenever A is a theory
These axioms mainly state fairly intuitive conditions, for example, that A ∸ x is
always included in A (∸2); that contraction leaves A unchanged if x ∉ Cn(A) (∸3);
that the contraction omits x as long as ⊬ x (∸4); that contractions by equivalent
statements yield the same result (∸5); and that adding x to the result of contracting
by x only yields conclusions present prior to the contraction (∸6).
AGM showed that a range of natural revision functions based on ideas similar to
Rescher’s preference-based revisions satisfy all their axioms, with different types of
AGM revisions corresponding to the way revision proceeds when there are multiple
The AGM axioms characterize changes of entire theories, but do not explicitly
relate changes affecting different beliefs. Gärdenfors and Makinson [17] studied
theory revision from the perspective of individual statements and showed that the
outcomes-based AGM approach implies the existence of an ordering of beliefs by
relative epistemic entrenchment, such that the effect of adopting a new belief is
to abandon the least entrenched prior beliefs necessary to ensure consistency. The
ordering x < y of two statements in a theory means that one gives up x before y in
contractions when one cannot keep both. They formulate the following set of axioms
stating that the entrenchment ordering must be transitive (≤1); exhibit a general type
of reflexivity (≤2); must give up at least one conjunct when giving up a conjunction
(≤3); must regard propositions not in the current beliefs to be minimally entrenched
(≤4); and must regard propositions to be maximally entrenched only if they are
logically valid (≤5).
(≤1) If x ≤ y and y ≤ z, then x ≤ z
(≤2) If x ⊢ y, then x ≤ y
(≤3) Either x ≤ x ∧ y or y ≤ x ∧ y
(≤4) If A is consistent, then x ≤ y for all y iff x ∉ A
(≤5) If x ≤ y for all x, then ⊢ y
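The reading of the entrenchment ordering as "give up the less entrenched statement first" can be illustrated with a toy numeric ranking; the ranks are our device for the sketch, since the axioms require only an ordering:

```python
# When a contraction cannot keep both of two statements (e.g. when
# giving up their conjunction), the one given up first is the one that
# is no more entrenched.
def give_up_one_of(beliefs, rank, x, y):
    """Drop whichever of x, y is given up first under entrenchment."""
    victim = x if rank[x] <= rank[y] else y
    return beliefs - {victim}

beliefs = {"p", "q"}
rank = {"p": 1, "q": 2}          # p is less entrenched than q
# give_up_one_of(beliefs, rank, "p", "q") == {"q"}
```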
Gärdenfors and Makinson then prove that the axioms for contraction and entrench-
ment are equivalent in the sense that every contraction function corresponds to
some entrenchment ordering, and every entrenchment ordering corresponds to some
contraction function. Formally, x ĺ y holds iff either
x … A ´ .x ^ y/ or `x^y I (14)
correspondingly,
y ∈ A ∸ x (15)
holds iff
The Lewis, AGM, and Rescher conceptions provide a framework for formalizing
comparisons between changes that let us say that one change is bigger than another.
Although one expects to find such comparisons in any approach to understanding
habit and refraction, mere comparisons do not provide insight into why one change
ranks bigger than another, nor do they provide cardinal measures to quantify just
how much bigger one change is than another. To understand why one change
is harder than another, and just how much harder it is, we must look beyond
mere comparisons to theories that explain difficulty in terms of the substance and
organization of mental states. To do this, we again turn to Quine, who painted a
picture of substantial origins for mental change in his image of the “web of belief”
[30, 31], in which belief revision restructures a network of connections between
beliefs, with the resulting changes guided by the principle of minimum mutilation.
One key element of the Quinian conception is that connections between beliefs
reflect some type of coherence between the beliefs. Coherence in this conception
demands at least consistency between beliefs, and in some cases that some beliefs
explain others, but in all cases that the overall set of beliefs is the central focus
of attention. Yet Quine’s picture also carries within it another conception, one in
which it is direct connections between beliefs that take center stage. Think of a
physical web, whether that of a spider or a fisherman’s netting, and one thinks of
pokes and tugs moving and stretching smaller or larger portions of the web, and
of the elements of the web exhibiting degrees of stiffness, elasticity, and the like.
Such an approach to mental change has been explored in artificial intelligence in
two ways: structural or foundational methods based on reasons and dependencies
and the foundations they provide for beliefs and other attitudes, and preferential
methods based on domain-specific preferences among beliefs and how they guide
rational reasoning.
A ⊩ C, (17)
(A, B) ⊩ C, (18)
(∅, ∅) ⊩ {a}. (20)
The typical RMS uses these inferential records to construct a set of conclusions
that contain all and only the conclusions demanded by the recorded reasons, so that
every belief in the set of conclusions is a consequence of some reason for which the
antecedents are present (In) and the defeaters are absent (Out). Formally, a reason
(A, B) ⊩ C (21)
imposes the constraint
[A ⊆ X ∧ X ∩ B = ∅] → [C ⊆ X]. (22)
A reason
(A, B) ⊩ C (23)
is valid if
A ⊆ X ∧ X ∩ B = ∅. (24)
The requirement that each conclusion be supported by some valid reason, which
one might call local grounding, represents one position on a spectrum of possible
grounding conditions [5, 12]. The original RMS imposed a strong global grounding
requirement, namely that each conclusion is supported by a noncircular argument
from the base reasons [3]. One can also consider intermediate organizations that
divide memory into multiple working sets or locales and require strict grounding of
conclusions within each locale but only local grounding across locales [13].
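The validity and grounding conditions just stated admit a compact sketch in Python; the encoding of a reason as a triple of In-antecedents, Out-defeaters, and a conclusion, and all function names, are illustrative rather than Doyle's:

```python
def valid(reason, X):
    """A reason (ins, outs, conclusion) is valid in belief set X
    when its In-antecedents are present and its Out-defeaters absent."""
    ins, outs, _ = reason
    return ins <= X and not (outs & X)

def locally_grounded(X, reasons):
    """Local grounding: every conclusion in X is supported by some valid reason."""
    return all(any(valid(r, X) and r[2] == c for r in reasons) for c in X)

def closed(X, reasons):
    """Every valid reason's conclusion is actually held."""
    return all(r[2] in X for r in reasons if valid(r, X))

# A premise-like reason with empty antecedents, and a defeasible one:
reasons = [
    (frozenset(), frozenset(), 'a'),            # always valid, supports 'a'
    (frozenset({'a'}), frozenset({'c'}), 'b'),  # supports 'b' unless 'c' is In
]
assert locally_grounded({'a', 'b'}, reasons) and closed({'a', 'b'}, reasons)
```

Note that {'a', 'b'} satisfies both conditions, while a set containing an unsupported belief would fail local grounding.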
In a foundational approach, one effects changes in conclusions indirectly by
making changes in the base beliefs. One adds reasons that generate beliefs rather
than simply adding beliefs directly. This requires systematic addition of reasons
in the course of reasoning, for example, through comprehensive recording of the
reasoning history [19] or using episodic summarization methods such as chunking
[20]. The reasons persist in memory until explicitly removed, even as the set of
conclusions changes due to changes in the set of base reasons. In fact, by employing
reason structures that provide for uniform nonmonotonic defeasibility, one need not
ever remove reasons, but instead can defeat any reason marked for removal [4].
In the foundational setting, reasoned contraction involves tracing underlying
derivations recursively. To remove some conclusion, one looks to its supporting
reason or argument and seeks to defeat some assumption or reason involved in
this support, as in the technique of dependency-directed backtracking [3, 35]. The
removal of conclusions by support tracing need not be immediate, because when
one assumption is removed or defeated, one must reevaluate the reasons by which
it supported other statements. If those supported statements have alternative means
of support in other reasons, one must repeat the support-tracing process and make
further changes until one finally removes the target conclusion.
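The support-tracing loop can be sketched as follows; for simplicity, the sketch restricts itself to monotonic reasons (pairs of In-antecedents and a conclusion) and models defeat as marking a reason inactive rather than deleting it. The encoding and names are illustrative, not an RMS implementation:

```python
def conclusions(reasons, defeated):
    """Fire undefeated monotonic reasons (ins, conclusion) to a fixed point."""
    X, changed = set(), True
    while changed:
        changed = False
        for i, (ins, concl) in enumerate(reasons):
            if i not in defeated and ins <= X and concl not in X:
                X.add(concl)
                changed = True
    return X

def contract(reasons, defeated, target):
    """Remove `target` by support tracing: defeat one currently valid
    supporting reason, recompute, and repeat while alternative support
    still derives `target`."""
    while target in conclusions(reasons, defeated):
        for i, (ins, concl) in enumerate(reasons):
            if (i not in defeated and concl == target
                    and ins <= conclusions(reasons, defeated)):
                defeated.add(i)   # defeat rather than delete
                break
    return defeated

# 'b' has two independent supports; both must be defeated to remove it.
reasons = [(frozenset(), 'a'), (frozenset({'a'}), 'b'), (frozenset(), 'b')]
defeated = contract(reasons, set(), 'b')
assert conclusions(reasons, defeated) == {'a'}
```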
Standard computational measures of time and space complexity have limited appeal
as methods of judging difficulty of change, for the wide variability of time and
space requirements due to variations in algorithms and architectures means that
the difficulty of a particular change bears little relation to specific numerical
measures of time and space.
Rather than looking to standard computational measures of difficulty, one can
instead employ noncomputational comparisons of difficulty of change related more
directly to the reason maintenance approach itself [5]. Such comparisons can take
the form of Lewis-like relations
$X \preceq_W Y$ (25)
of comparative similarity of belief states $W, X, Y$ that capture the intent, if not the
practice, of reason maintenance revisions. For example, one can compare change
difficulty in terms of the set of changed conclusions, so that
$X \preceq_W Y$ iff $X \mathbin{\triangle} W \subseteq Y \mathbin{\triangle} W$ , (26)
where $X \mathbin{\triangle} Y$ denotes the symmetric difference
$(X \setminus Y) \cup (Y \setminus X)$ . (27)
One can also measure change difficulty in terms of the number of changes made, in
which case
$X \preceq_W Y$ iff $|X \mathbin{\triangle} W| \leq |Y \mathbin{\triangle} W|$ . (28)
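These two comparisons, by the set and by the number of changed conclusions relative to the prior state $W$, can be sketched directly in Python, encoding belief states as sets (the function names are ours):

```python
def closer_by_set(X, Y, W):
    """X is at least as close to W as Y when X's changed conclusions
    (the symmetric difference with W) are a subset of Y's."""
    return (X ^ W) <= (Y ^ W)

def closer_by_count(X, Y, W):
    """Comparison by the number of changed conclusions."""
    return len(X ^ W) <= len(Y ^ W)

W = {'p', 'q'}
X = {'p', 'q', 'r'}   # one change relative to W
Y = {'p', 'r', 's'}   # three changes relative to W
assert closer_by_set(X, Y, W) and closer_by_count(X, Y, W)
```

The set-based ordering is only partial: two revisions that change incomparable sets of conclusions are unranked, whereas counting changes always yields a total (pre)order.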
One can connect reasoned foundations with entrenchment in another way as well.
As noted earlier, one can interpret nonmonotonic reasons as constraints on sets of
beliefs, mandating that certain conclusions be held if certain antecedents are held
and others are not. In this interpretation, one can regard reasons as expressing
intentions of the reasoner about the content of its mental state, namely that
conclusions of valid reasons must be believed.
However, one can also interpret nonmonotonic reasons as expressing preferences
about the content of its mental state as well [5, 6], somewhat akin to Rescher’s revi-
sion preferences and the preference-order character of the epistemic entrenchment
relation. The preferential content of reasons indicates that indicated nonmonotonic
conclusions should be believed in preference to indicated nonmonotonic qualifica-
tions, in the sense that sets of conclusions X satisfying
$[A \not\subseteq X] \vee [A \subseteq X \wedge X \cap B = \emptyset \wedge C \subseteq X]$ (29)
are preferred to sets of conclusions satisfying
$A \subseteq X \wedge X \cap B \neq \emptyset$ , (30)
and in particular to those satisfying
$[A \subseteq X \wedge X \cap B \neq \emptyset \wedge C \not\subseteq X]$ . (31)
The substantive theories of mental change go beyond the ideals of pure logical
and economic rationality by relating the shape of mental changes to the origins
of mental changes. The AGM theory of belief revision, for example, says nothing
about how the entrenchment order changes, only that it exists and guides revisions,
although some subsequent theories treat some revisions as shifting the base of
the entrenchment ranking. Similarly, standard theories of economic rationality say
much about how probabilistic expectations change with new evidence, but almost
nothing about how or when preferences change. Artificial intelligence addresses
these questions with theories of problem solving in which reducing goals to subgoals
changes the preferences of the reasoner, in which chunking or related mechanisms
change the set of base reasons and hence the foundations of mental states, and in
which base and derived reasons express preferences that shape the revisions they
induce.
5 A Mechanical Perspective
Even though one can interpret a variety of systems in terms of reasons, reasons still
represent only one means for assessing difficulty of change. To find a more general
identification of generators of difficulty in change, we look to mechanics and its
notions of mass, inertia, and constitutive resistive forces. In the present discussion,
the base reasons of the foundations approach constitute the mass of the reasoner,
the derived conclusions constitute its position, changes in conclusions constitute its
velocity, and changes in base reasons constitute its change in mass. Reasoning in
such reasoners generates forces that depend on the character of the reasoner, on
how the reasoner conducts volitional and deliberative processes.
Everyday references to habit, refraction, and entrenchment make informal use
of mechanical notions in addition to the neutral and psychological language
used above. In this long-standing but informal usage, refraction becomes reaction
to force, and entrenched habits require force to change. This usage probably predates
the development of mechanics as a formal, mathematical subject, but ever since the
Newtonian era, application of mechanical concepts to anything but physical systems
has involved only metaphor and analogy.
Recently, however, axiomatic work in rational mechanics and formal work in
artificial intelligence have laid the groundwork for a change in the theoretical
status of psychological application of mechanical concepts. Artificial psychological
systems such as RMS satisfy the axioms of mechanics, and thus constitute systems
to which mechanical concepts properly apply without recourse to metaphor or
analogy [14].
Modern rational mechanics [26, 27, 37] supports applications to psychology by
providing a formal theoretical structure that offers several advantages over the older
mechanical tradition. Three of these advantages offer special benefit in formalizing
psychology in mechanical terms.
First, rational mechanics cleanly separates general mechanical requirements
from the properties of special materials. Modern theories of mechanics regard
bodies as subject to general laws applying to all types of materials, laws that set
out the properties of space, time, bodies, and forces, and that relate forces on bodies
to the motions caused by the forces.
None of the central mechanical laws say anything about which forces exist, or
even that any particular forces exist. Such statements instead come in special laws of
mechanics, such as laws of dynamogenesis that characterize the origin of forces. The
most general of these set out the laws of inertial forces. Traditional presentations of
mechanics give a distinguished role to this force through the Newton–Euler equation
$f = \dot{p}$ , (32)
in which one takes the quantity f to aggregate all the non-inertial forces acting on
the body. The more general law
$f - \dot{p} = 0$ (33)
We represent cognitive states using vectors in the binary vector space $\mathcal{D} = (\mathbb{Z}_2)^D$
over the scalars $\mathbb{Z}_2$. In the RMS terminology, 1 means In and 0 means Out. Addition
and multiplication act componentwise, so that
$x + x = 0$ (36)
and
$x y = x \cap y$ . (38)
For infinite D, we consider only vectors representing the finite and cofinite subsets.
In this space, orthogonal transformations are permutations.
We write $x_t \in \mathcal{D}$ to denote the position at discrete instant $t$ and
$\dot{x}_t = x_t - x_{t-1}$ (39)
to denote the trailing velocity. This trailing velocity corresponds to the change
signals used in some automated reasoners in triggering rules [3]. The leading
acceleration
$\ddot{x}_t = \dot{x}_{t+1} - \dot{x}_t$ (40)
Fig. 1 The kinematical relationships among position variables in time. The boxed quantities
denote conventional labels for the quantities of interest at instant $t$, with a reasoning agent
observing $x_t$ and $\dot{x}_t$ and choosing $\ddot{x}_t$
reflects the additions and removals indicated by steps of reasoning. We depict these
kinematical quantities in Fig. 1.
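Over $(\mathbb{Z}_2)^D$, addition is componentwise exclusive-or, so subtraction coincides with addition and differences of positions are symmetric differences of the represented belief sets. A minimal sketch of the trailing velocity and leading acceleration, encoding vectors as frozensets of In-beliefs (the encoding is illustrative):

```python
def velocity(xs):
    """Trailing velocities v[t] = x[t] - x[t-1]; over Z2, minus is XOR."""
    return [x ^ x_prev for x_prev, x in zip(xs, xs[1:])]

def acceleration(vs):
    """Leading accelerations a[t] = v[t+1] - v[t]."""
    return [v_next ^ v for v, v_next in zip(vs, vs[1:])]

# Positions as frozensets of In-beliefs; 1 = In, 0 = Out.
xs = [frozenset(), frozenset({'a'}), frozenset({'a', 'b'}), frozenset({'b'})]
vs = velocity(xs)
assert vs == [frozenset({'a'}), frozenset({'b'}), frozenset({'a'})]
assert acceleration(vs) == [frozenset({'a', 'b'}), frozenset({'a', 'b'})]
```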
We denote the mass at an instant by $m_t \in \mathcal{D}$, and the leading mass flux by
$\dot{m}_t = m_{t+1} - m_t$ . (41)
The momentum
$p_t \in \mathcal{D} \times \mathcal{D}$ (42)
pairs mass with velocity and changes according to
$\dot{p}_t = p_{t+1} - p_t = (\dot{m}_t, \ddot{x}_t)$ . (43)
The force
$f_t \in \mathcal{D} \times \mathcal{D}$ (44)
satisfies
$f_t = \dot{p}_t = (\dot{m}_t, \ddot{x}_t)$ . (45)
The total force
$f_t = f_t^a + f_t^s$ (46)
on the reasoner combines the applied force $f_t^a$ of environment on the reasoner with
the self-force $f_t^s$ of the reasoner on itself.
In this setting, it is natural to interpret reasons as conditional invariants of the
motion. In particular, an interval reason
$r = A_r \,\backslash\!\backslash\, B_r \Vdash C_r \,\backslash\!\backslash\, D_r$ (47)
and acceleration
$\ddot{x}_t = \begin{cases} C_r \bar{x}_t + D_r x_t - \dot{x}_t & \text{if } A_r \bar{x}_t + B_r x_t = 0 \\ -\dot{x}_t & \text{otherwise,} \end{cases}$ (51)
where $\bar{x}_t$ denotes the componentwise complement of $x_t$.
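On the reading adopted here, which is an interpretive reconstruction of the interval-reason dynamics, a reason with components A_r, B_r, C_r, D_r adds the conclusions C_r and removes D_r whenever A_r is In and B_r is Out, and otherwise merely halts the motion. A hypothetical sketch, with set encodings standing in for binary vectors:

```python
def step(x, reason):
    """One reasoning step under an interval reason r = (A, B, C, D):
    if A is In (A <= x) and B is Out (B disjoint from x), add C and
    remove D; otherwise the frictional term leaves x unchanged.
    This update rule is an interpretive sketch, not Doyle's formulation."""
    A, B, C, D = reason
    if A <= x and not (B & x):
        return (x - D) | C
    return x

r = (frozenset({'a'}), frozenset(), frozenset({'c'}), frozenset({'b'}))
assert step(frozenset({'a', 'b'}), r) == frozenset({'a', 'c'})
assert step(frozenset({'b'}), r) == frozenset({'b'})   # antecedent fails
```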
With reasoning mediated by reasons in this way, we can calculate the work and
effort expended in reasoning as follows. The power
$P_t = \dot{x}_{t+1} \cdot f_t$ (52)
exerted across interval .t; t C 1/ is found in the inner product of the force acting
across that interval with the velocity across that same interval. The differing
temporal subscripts of velocity and force in this formula reflect the difference
between leading forces and trailing velocities. We calculate the instantaneous power
Pt to be
$P_t = \dot{x}_{t+1} \cdot f_t$ (53)
$\phantom{P_t} = \dot{x}_{t+1} \cdot (\dot{m}_t, \ddot{x}_t)$
$\phantom{P_t} = |(\dot{m}_t, \dot{x}_{t+1} \ddot{x}_t)|$ (54)
$\phantom{P_t} = |(\dot{m}_t, \dot{x}_{t+1}(\dot{x}_{t+1} - \dot{x}_t))|$
$\phantom{P_t} = |(\dot{m}_t, \dot{x}_{t+1})| - |(0, \dot{x}_{t+1} \dot{x}_t)|$ , (55)
here using the norm that counts the number of 1s in a vector corresponding to the
ordinary inner product of binary vectors. The work expended across some interval,
therefore, is the integral (sum) of the power over that interval, and we may use this
mechanical work as a measure of the mental effort of reasoning.
We examine this formula in greater detail to understand how this measure of
effort works in the context of reasoning. Different reasoners operate in different
ways, and we find that the overall effort of reasoning varies accordingly.
One can divide reasoning models into two types that differ in the treatment of
time. In what one might call the internal time model, one identifies instants with
steps of reasoning, no matter how long separates these steps in the world at large.
In what one might call an external time model, one regards steps of reasoning as
separated by intervals during which the reasoner does nothing. Both of these models
offer useful insights.
We consider the internal time model first. This model corresponds best to a notion
of deliberate reasoning, in which every step of reasoning involves some change to
memory or outlook. In (55) we see that the power expended across a basic unit
of time is the change of mass and position minus a cross term $\dot{x}_{t+1} \dot{x}_t$ involving
velocity at successive intervals. This cross term vanishes in deliberate reasoning
because normally one does not immediately retract a conclusion one has just drawn,
or draw a conclusion one has just retracted; there would be no point to it. In this
setting, therefore, we obtain the magnitude of the power expended by the step of
reasoning by
$P_t = |(\dot{m}_t, \dot{x}_{t+1})|$ (56)
$\phantom{P_t} = |\dot{m}_t| + |\dot{x}_{t+1}|$ . (57)
In this case, the work of a step of reasoning just adds together the number of changes
made in memory and attitudes, so the effort involved in a chain of reasoning steps
consists of the cumulative number of changes made in memory and attitudes across
the span of reasoning.
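Under this deliberate-reasoning assumption, the cumulative effort of a chain of reasoning steps reduces to counting changes, and can be sketched as follows (set encodings and names are illustrative):

```python
def effort(masses, positions):
    """Cumulative effort of deliberate reasoning: at each step, the number
    of changes to memory (base reasons) plus the number of changes to
    attitudes (conclusions), summed over the span of reasoning."""
    mass_changes = [len(m1 ^ m0) for m0, m1 in zip(masses, masses[1:])]
    posn_changes = [len(x1 ^ x0) for x0, x1 in zip(positions, positions[1:])]
    return sum(mass_changes) + sum(posn_changes)

ms = [frozenset(), frozenset({'r1'}), frozenset({'r1', 'r2'})]
xs = [frozenset(), frozenset({'a'}), frozenset({'a', 'b'})]
assert effort(ms, xs) == 4   # two reason additions + two conclusion additions
```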
In the external time model, we regard steps of reasoning as exerting impulse
forces on the reasoner, with the reasoner exhibiting inertial (force-free) motion
between steps of reasoning. The “Simple Impulse” table in Fig. 2 illustrates the
application of a simple impulse spatial force akin to the internal-time model just
discussed. This impulse expends an effort of $|\alpha|$ in the time step in which it is
applied, according to the preceding calculation. In the subsequent inertial motion,
of course, the force ft vanishes, and so by (53) the power vanishes as well, so the
total effort expended in a chain of reasoning steps again equals the cumulative sum
of the number of changes to memory and attitudes, with the inertial motions doing
no work.
Inertial motion takes a cyclic form in the discrete space $\mathcal{D}$ due to the algebraic
characteristic that $x + x = 0$. As Fig. 2 indicates, inertial motion with velocity $\alpha$
starting from a position $\alpha$ thus traverses the trajectory $\alpha, 0, \alpha, 0, \ldots$ . It is certainly
not commonplace to think of reasoners as cycling the last set of conclusions in
this way. In standard artificial intelligence mechanizations, one instead regards a step
of reasoning as changing the set of conclusions from one set to another and then
leaving it there until the next step of reasoning, as in the internal-time picture
of motion. Accommodating this expectation requires one to modify the simplistic
picture of reasoning seen in the internal time model.
One easily obtains a more familiar picture of reasoning by regarding steps of
reasoning as exerting two impulses, corresponding to the rising and falling edges
of a pulse, as depicted in the “Up and Down” table of Fig. 2. That is, the force of
the first half of a step of reasoning changes the velocity so as to effect the desired
change of position, and the force of the second half of the step of reasoning changes
the velocity back to zero by simply reversing (repeating) the thrust of the first half.
This produces a pattern of motion of start–stop steps separated by zero-velocity
intervals. This start–stop pattern of forces is in fact the pattern of reason forces, in
which the frictional force component $-\dot{x}$ provides the falling impulse. This does
not involve twice the mechanical effort of the internal time and simple external
time pictures, however, because the falling impulses, matched with zero velocities
in (54), contribute nothing to the cumulative effort.
Note that mechanical work only measures the effort of making the change
itself and does not include any effort involved in evaluating the applicability of
some reasoning habit, of searching for the appropriate inference to perform, or
of deciding among alternative inferences, if these activities are not effected by
means of reasoning steps themselves. If generating a force $\alpha$ requires effort $|\alpha|$,
for instance, then the Simple Impulse motion involves an additional effort of $|\alpha|$,
while the Up and Down motion involves an additional effort of $2|\alpha|$. The effort
associated with such activities, however, depends on the organization and realization
of the mind. For example, there need be no effort involved in producing the falling
impulse of Up and Down as this value is already available as the velocity. Or for
another example, evaluating the applicability of a set of reasons by a serial scan of
databases of reasons and conclusions likely involves more effort than by parallel
evaluations conducted by reasons wired together like neurons or Boolean circuits.
Fig. 3 Reasoned decomposition of a transition between equilibrium states $x_t$ and $x_{t+1}$ into a
series of “microtransitions” between microstates $x_{t,i}$ effected by a series of reason applications $r_{t,i}$
The effort of reasoning also involves the work of focusing attention on reaching
the intended conclusions and on not pursuing distractions arising in the course
of reasoning. Autonomous habits of reasoning or perceptions can exert forces of
distraction on the reasoner and forces that move it either back toward where it
started or merely off the intended path. The reasoner must notice these backwards
steps and diversions and force itself back onto the intended path. One pays a double
price for refraction and distraction, as both these and the refocusing actions do
work and add to the total effort of reasoning. The same holds true for switching
between separate reasoning activities being pursued simultaneously, for the effort of
switching contexts adds to the efforts attributable to the separate reasoning activities.
The effort involved in a single step of reasoning also enters into the comparisons
made by coherence-based conceptions of mental change indirectly because these
large-scale changes to mental state typically involve an extended process of smaller
reasoning steps. Mechanically, the coherence-based changes take the form of elastic
response. The stipulated change, or step of reasoning that triggers a change to a
new coherent state of mind, consists of an external deformation. This deformation
requires a change to a new “relaxed” or equilibrium state. In RMS revision, this
relaxation process consists of a sequence of smaller reasoning steps, which we
depict in Fig. 3. As the base reasons constituting the mass of the reasoner grow
through learning and experience, the time needed to effect revisions can also grow,
as revisions involve more and more reasons.
Nonmonotonic reasoning exhibits an elastic character even more strongly. One
can view the defeat of an otherwise valid nonmonotonic reason as producing
a deformed configuration of the reasoner. Defeat or removal of this defeater
forces restoration of the state of mind prior to imposition of the deformation,
with the nonmonotonic reason acting like a spring element in shaping the mental
configuration. The character of such elastic response is more complex in reasoning
than in simple physical materials like springs in that RMS-based reasoners search
for equilibria in both relaxation and restoration processes, and the equilibrium
resulting from a restoration need not coincide with the one existing prior to the
deformation. One must look at more complicated physical systems than individual
springs to see similar indeterminacy of equilibrium transitions.
7 Conclusion
Modern rational mechanics provides concepts of mass, force, work, and elasticity
that constitute a new analytical framework and vocabulary for characterizing limits
to mental change. These concepts go beyond those available in theories of ideal
logical and economic rationality that focus on mere comparative difficulty, and
beyond computational and neurophysiological theories that focus on measures
only loosely related to mental effort. Mechanics also provides other means for
characterizing limitations on mental change, including bounds on the forces exerted
in reasoning and forms of constitutional rigidity (see [14]).
References
1. Alchourrón, C., Gärdenfors, P., Makinson, D.: On the logic of theory change: partial meet
contraction functions and their associated revision functions. J. Symbolic Logic 50, 510–530
(1985)
2. Brooks, R.A.: A robust layered control system for a mobile robot. IEEE J. Robot. Automat.
2(1), 14–23 (1986)
3. Doyle, J.: A truth maintenance system. Artif. Intell. 12(2), 231–272 (1979)
4. Doyle, J.: A model for deliberation, action, and introspection. AI-TR 581, Massachusetts
Institute of Technology, Artificial Intelligence Laboratory (1980)
5. Doyle, J.: Some theories of reasoned assumptions: an essay in rational psychology. Technical
Report 83–125, Department of Computer Science, Carnegie Mellon University, Pittsburgh
(1983)
6. Doyle, J.: Reasoned assumptions and Pareto optimality. In: Proceedings of the 9th International
Joint Conference on Artificial Intelligence, pp. 87–90. Morgan Kaufmann, San Francisco
(1985)
7. Doyle, J.: Artificial intelligence and rational self-government. Technical Report CS-88-124,
Carnegie-Mellon University Computer Science Department (1988)
8. Doyle, J.: Constructive belief and rational representation. Comput. Intell. 5(1), 1–11 (1989)
9. Doyle, J.: Rational belief revision. Presented at the third international workshop on
nonmonotonic reasoning, Stanford Sierra Camp, CA, June 1990
10. Doyle, J.: Rational belief revision (preliminary report). In: Fikes, R.E., Sandewall, E.
(eds.) Proceedings of the Second Conference on Principles of Knowledge Representation and
Reasoning, pp. 163–174. Morgan Kaufmann, San Mateo (1991)
11. Doyle, J.: Reason maintenance and belief revision: foundations vs. coherence theories. In:
Gärdenfors, P. (ed.) Belief Revision, pp. 29–51. Cambridge University Press, Cambridge
(1992)
12. Doyle, J.: Reasoned assumptions and rational psychology. Fundamenta Informaticae 20(1–3),
35–73 (1994)
13. Doyle, J.: Final report on rational distributed reason maintenance for planning and replanning
of large-scale activities (1991–1994). Technical Report TR-97-40, ADA328535, Air Force
Research Laboratory (1997)
14. Doyle, J.: Extending Mechanics to Minds: The Mechanical Foundations of Psychology and
Economics. Cambridge University Press, London (2006)
15. Doyle, J., Wellman, M.P.: Impediments to universal preference-based default theories. Artif.
Intell. 49(1–3), 97–128 (1991)
16. Gärdenfors, P.: Knowledge in Flux: Modeling the Dynamics of Epistemic States. MIT,
Cambridge (1988)
17. Gärdenfors, P., Makinson, D.: Revisions of knowledge systems using epistemic entrenchment.
In: Vardi, M.Y. (ed.) Proceedings of the Second Conference on Theoretical Aspects of
Reasoning About Knowledge, pp. 83–95. Morgan Kaufmann, Los Altos (1988)
18. Gärdenfors, P.: The dynamics of belief systems: foundations vs. coherence theories. Revue
Internationale de Philosophie 172, 24–46 (1990)
19. de Kleer, J., Doyle, J., Steele Jr., G.L., Sussman, G.J.: AMORD: explicit control of
reasoning. In: Proceedings of the 1977 Symposium on Artificial Intelligence and Programming
Languages, pp. 116–125. ACM, New York (1977)
20. Laird, J.E., Rosenbloom, P.S., Newell, A.: Chunking in Soar: the anatomy of a general learning
mechanism. Mach. Learn. 1(1), 11–46 (1986)
21. Levesque, H.J.: A logic of implicit and explicit belief. In: Proceedings of the National Confer-
ence on Artificial Intelligence, American Association for Artificial Intelligence, pp. 198–202.
AAAI Press, Palo Alto, CA, USA (1984)
22. Lewis, D.: Counterfactuals. Blackwell, Oxford (1973)
23. McCarthy, J.: Programs with common sense. In: Proceedings of the Symposium on
Mechanisation of Thought Processes, vol. 1, pp. 77–84. Her Majesty’s Stationery Office,
London (1958)
24. Minsky, M.: The Society of Mind. Simon and Schuster, New York (1986)
25. Newell, A., Simon, H.A.: Human Problem Solving. Prentice-Hall, Englewood Cliffs (1972)
26. Noll, W.: The foundations of classical mechanics in the light of recent advances in continuum
mechanics. In: Henkin, L., Suppes, P., Tarski, A. (eds.) The Axiomatic Method, with Special
Reference to Geometry and Physics; Proceedings of an International Symposium held at the
University of California, Berkeley, December 26, 1957–January 4, 1958. Studies in Logic and
the Foundations of Mathematics, pp. 266–281. North-Holland, Amsterdam (1958). Reprinted
in [28]
27. Noll, W.: Lectures on the foundations of continuum mechanics and thermodynamics. Arch.
Rational Mech. Anal. 52, 62–92 (1973). Reprinted in [28]
28. Noll, W.: The Foundations of Mechanics and Thermodynamics: Selected Papers. Springer,
Berlin (1974)
29. Quine, W.V.: Philosophy of Logic. Prentice-Hall, Englewood Cliffs (1970)
30. Quine, W.V.O.: Two dogmas of empiricism. In: From a Logical Point of View: Logico-
Philosophical Essays, 2nd edn., pp. 20–46. Harper and Row, New York (1953)
31. Quine, W.V., Ullian, J.S.: The Web of Belief, 2nd edn. Random House, New York (1978)
32. Rescher, N.: Hypothetical Reasoning. North Holland, Amsterdam (1964)
33. Scott, D.S.: Domains for denotational semantics. In: Nielsen, M., Schmidt, E.M. (eds.)
Automata, Languages, and Programming: Ninth Colloquium. Lecture Notes in Computer
Science, vol. 140, pp. 577–613. Springer, Berlin (1982)
34. Shand, A.F.: The Foundations of Character: Being a Study of the Tendencies of the Emotions
and Sentiments, 2nd edn. Macmillan and Company, London (1920)
35. Stallman, R.M., Sussman, G.J.: Forward reasoning and dependency-directed backtracking in a
system for computer-aided circuit analysis. Artif. Intell. 9(2), 135–196 (1977)
36. Sussman, G.J., Stallman, R.M.: Heuristic techniques in computer-aided circuit analysis. IEEE
Trans. Circ. Syst. CAS-22(11) (1975)
37. Truesdell, C.: A First Course in Rational Continuum Mechanics, vol. 1. Academic, New York
(1977)
38. Ullman, S.: Visual routines. AI Memo 723, MIT AI Lab, Cambridge (1983)
39. del Val, A.: Non-monotonic reasoning and belief revision: syntactic, semantic, foundational,
and coherence approaches. J. Appl. Non-Classic. Logics 7(2), 213–240 (1997). Special Issue
on Inconsistency-Handling
Semantic Technologies: A Computational
Paradigm for Making Sense of Qualitative
Meaning Structures
Udo Hahn
Abstract The World Wide Web has become the largest container of human-
generated knowledge but the current computational infrastructure suffers from
inadequate means to cope with the meaning captured in its interlinked documents.
Semantic technologies hold the promise to help unlock this information compu-
tationally. This novel paradigm for the emerging Semantic Web is characterized
by its focus on qualitative data, which are, from the perspective of computers,
un(der)structured, with the goal of ‘understanding’ Web documents at the content level. In this
contribution, fundamental methodological building blocks of semantic technologies
are introduced, namely terms (symbolic representations of very broadly conceived
‘things’ in the real world), relations between terms (assertions about states of the
world), and formal means to reason over relations (i.e. to generate new, inferentially
derived assertions about states of the world as licensed by rules). We also discuss
briefly the current status and future prospects of semantic technologies in the Web.
U. Hahn
Department of German Linguistics, Friedrich Schiller University Jena, Jena, Germany
e-mail: udo.hahn@uni-jena.de
studied in physics, economics or medicine that employ the rules of special branches
of mathematics, calculus and statistics, in particular. Data generated on this basis
have a clear meaning since they rely on well-defined algebraic and metrical criteria
and conditions of measurement. Computers as hardware devices and the vast
majority of software running on them are typically seen to support numerical
computation schemes only involving quantitative data (e.g. solving differential
equations, performing matrix multiplications or eigenvalue computations) and the
processing of large amounts of numerical data—the reason why computers for
decades were also termed ‘number crunchers’. A typical computational vehicle to
organize and manage large volumes of such data in a thematically coherent way are
database systems. They impose additional structure on these data as defined by the
underlying organizational structure (e.g. relational schemata) of the database. In this
way, quantitative data, in addition, become structured data.
Yet, the overwhelming part of our knowledge about the world is not quantita-
tively represented. It is rather encoded by non-numeric, non-metrical qualitative
symbol systems whose intrinsic properties, one might suspect, seem to resist
mathematical rigor, most prominently everything we express in some form of
natural language. The reasons for using natural languages in our everyday and
scientific communication are manifold. They are humans’ most natural communi-
cation vehicle (we all speak at least one language, our mother tongue), and they
are so rich in expressivity and flexible to use. Natural languages allow us not
only to formulate crisp claims with precise statements but also to add information
that overlays these assertions. We may, for instance, establish different temporal
axes for an utterance (whether it holds in the past, the present or the future),
or attach different degrees of certainty (with lexical markers such as ‘possibly’,
‘perhaps’, etc.) or modality (using ‘should’, ‘may’, ‘could’, etc.) to an utterance.
These linguistic means are necessary to formulate intentionally vague assumptions,
claims, guesses and opinions, or to stipulate hypotheses. Using natural language, we may
inform, comment on, challenge, criticize, convince or persuade others. This versatile
communication power, however, comes with a high potential for ambiguity (i.e.
multiple interpretations of the same underlying utterance). Natural languages are
adaptive and dynamic systems that undergo continuous changes of the lexical
(vocabulary) system and (to a lesser degree though) the grammar system, i.e.
the rules guiding the construction (and analysis) of utterances at the syntactic
(structural) and semantic (meaning) level. Moreover, as far as linguists can tell, the
lexical and grammar systems are very large and interact heavily at all levels of usage
and description.
Altogether, these properties of natural languages make it hard for computers
to deal with the contents contained in qualitative raw data such as letters, notes,
manuals, scientific or newspaper articles, etc. Today’s computers primarily allow
straightforward text editing and formatting, deleting and inserting character strings
at the linear level of so-called string processing in text processors. But standard
computer programs dealing with textual data as character sequences only have no
clue at all what the content of the underlying text really is, e.g. what the facts
encoded in it are, whether these facts are true or consistent relative to what is
commonly shared knowledge, whether the statements being made are well-argued
assertions, unfounded claims, certain or uncertain hypotheses, or merely vague
guesses. In a word, the semantics of natural language statements concerned with the
meaning of single linguistic utterances and their contextual embedding both relative
to the discourse setting in which they are uttered and the background knowledge
to which they refer currently remain outside the scope of standard computational
facilities.
This lack of semantic competence of today’s computational machinery is not at
all restricted to the understandability of natural language utterances but carries over
to other kinds of qualitative symbol systems in which hard-to-interpret sequences of
textually encoded symbols, on the one hand, or visual data (such as tables, figures,
drawings, pictures, and movies) as well as audio data (such as sounds, music, and
spoken language), on the other hand, may constitute meaningful contents. Consider
as yet another example the case of the four chemical bases Guanine, Thymine,
Cytosine, and Adenine from which DNA sequences are formed. For many years,
powerful string readers (implemented in high-throughput sequencing robots) have
been available, which read out tons of data constituted by sequences of this 4-letter
vocabulary and compare pairs of DNA strings for similarity by checking insertions,
deletions, or permutations of characters or character substrings, most notably using
variants of the famous BLAST string processing algorithm [1].1 But only few
algorithmic means are available, on a larger systematic scale, to automatically link
a syntactically decoded DNA substring to its biomedical meaning or function, e.g.
where a piece of a DNA string encodes protein (mal)functions or, even more so,
diseases. Obviously, not only the semantics of natural languages but also that of
biological ones remain, by and large, terra incognita for computing machines.
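BLAST itself relies on heuristic local alignment, but the underlying idea of scoring insertions, deletions, and substitutions can be illustrated with the simpler global edit (Levenshtein) distance; the function below is a generic textbook sketch, not the BLAST algorithm:

```python
def edit_distance(s, t):
    """Minimal number of insertions, deletions, and substitutions
    turning s into t, computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # (mis)match
        prev = curr
    return prev[-1]

assert edit_distance("GATTACA", "GCATGCA") == 3
```

Syntactic similarity of this kind says nothing, of course, about whether the aligned substrings have the same biological function, which is precisely the semantic gap at issue.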
From the perspective of computers at least, qualitative symbol systems such
as natural or bio-molecular languages (for further analogies, see Searls’ seminal
article [40]) appear as being unstructured because computers have no built-in
interpretation schemes to decode the intended meaning from sequences of their
respective base vocabularies, words and chemical bases, respectively, and the
structural configurations they are embedded in according to the underlying syntax.
Information contained in such qualitative raw data is, however, at the core of the
emerging Semantic Web (Web 3.0)2 [5].3 Technologies are being developed to cope
with the meaning of this type of data and, thus, determine the structure inherent
to qualitative data computationally. The challenge for this undertaking is huge
because humans, unlike computers, are particularly proficient at making sense of
1 http://blast.ncbi.nlm.nih.gov/Blast.cgi
2 http://www.w3.org/2001/sw/
3 Seth Grimes discusses the myth of the 80% rule (80% of the data on the Web are stipulated
to be un- or semi-structured, whereas only the remaining 20% are structured and thus easily
interpretable by machines) that has evolved from several hard-to-track sources (mostly observations
made by IT companies, e.g. database, network and storage suppliers); see http://clarabridge.com/default.aspx?tabid=137&ModuleID=635&ArticleID=551. A summary of the arguments related to
the proliferation of unstructured data is contained in [41].
154 U. Hahn
such (seemingly) unstructured data, interpret them in context and infer relations
between single data items based on previous individual and community knowledge.
Interestingly, the semantic interpretation gap we witness not only relates to socially
defined and human-rooted symbol systems like natural languages but applies to ones
determined solely by nature, like bio-molecular languages, as well.
Why do computers do such a good job when it comes to temperatures, air
pressure, trading indicators and physical parameters of nutrition consumption
but fail miserably when the semantics of qualitative symbolic systems such as
natural or bio-molecular languages come into play? After all, it seems that the
meaning of a natural language statement such as ‘Jack played tennis with John’
is almost impossible to represent by some standard form of calculus statement (as
is commonly used for quantitative data), since prima facie no formal operation
involving summation, multiplication, square roots, integrals or differentiation, etc.
plays a role here. Although some aspects of a tennis match can be captured directly
by quantitative statements targeting, e.g. the velocity of a tennis ball, the length
of the tennis match in time, the oxygen consumption of the players, all these
aspects are merely peripheral for an adequate meaning representation capturing
the very essence of a tennis playing event. Accordingly, focusing on the relevant
actors (the players), the rules of the game, its location and the framing conditions
required to play a tennis match outdoors (versus indoors tennis) are as important as
features like the capabilities and actual performance of the players, decisions of the
referee, decisive breaks, weak services and deadly volleys, and the final outcome
(who won?). What is seriously lacking to capture these crucial information pieces
explicitly is an appropriate formal representation system for such a setting so that
computers can ‘understand’ what’s going on (a tennis match, not a football game,
not a theater play). What is needed here is a system comparable in mathematical
rigor to ones manipulating numerical data, one that can reasonably deal with
‘qualitative’ rather than (only) quantitative data in a computationally feasible way.
Yet, is computation without numbers only wishful thinking, or is it a serious
alternative?
Fortunately, there have been efforts already to capture this qualitative slice
of reality with formal, computational means. One branch of these activities is
rooted in computer science, where researchers in the field of artificial intelligence
(AI) have dealt with the representation (i.e. formalization) of common-sense
knowledge and corresponding reasoning (i.e. computation) procedures for decades
(see, e.g. [10, 35] or [38]). The fundamental goal of this work is to find a
formal foundation to describe the behaviour of human (and virtual) agents acting
intelligently, perhaps collaboratively, in the real (and virtual) world, i.e. search,
reason, plan, learn and solve problems, but also visually and acoustically perceive
information in order to individually experience, reflect on and subsequently change
states of the real (or virtual) world. By doing so, such agents converse with other
agents (i.e. talk) about the world’s past, current and future states. This goal is
much broader than the ambitions of linguistic semanticists who, for more than a
century, have been studying the relevant entities, their intrinsic properties and the
interdependencies among these entities relevant for formal meaning analysis of
natural language utterances (see, e.g. [11]). One of the main insights they gained
(earlier than AI researchers, by the way) was that a reasonable candidate for formal
semantic analysis of natural language (though certainly not the only one) were
systems based on formal logics (for an overview, see, e.g. [6])—several variants
of predicate logics (first as well as higher-order) and even more sophisticated
formal systems such as modal logics of different types (including temporal, spatial
or deontic logics, just to mention a few of the more powerful and thus more
expressive systems). Since logical computation, irrespective of the fact whether
it is done deductively, inductively, abductively, etc., is completely different from
numerical computation, we claim that computers are mostly ignorant of qualitative
structure computing because they were (and still are) not directly equipped with
computational means to deal with a suitable form of mathematics, formal logics, in
particular.4
One caveat may be appropriate here. We certainly do not claim that formal
semantic computations for qualitative symbolic systems, such as natural languages,
can only be carried out within a logical representation framework. Recent efforts
in probabilistic semantics (including Bayesian, game and information theoretical
approaches) clearly demonstrate that statistical approaches (e.g. co-occurrence
statistics, tf-idf metrics, latent semantic analysis), together with the sheer mass of
data on the Web [22], also provide a reasonable framework for meaning analysis,
in particular for all kinds of classificatory problems (see, e.g. [33]). Hence, we
obviously have to dig a little deeper to uncover the essential objects of interest for
semantic analysis of meaning structures of natural languages and other qualitative
symbol systems.
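As a minimal illustration of one of the statistical techniques just mentioned, the following sketch computes plain tf-idf weights over a toy corpus (corpus and names are purely illustrative; real systems use far larger collections and refined weighting variants):

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Plain tf-idf: term frequency weighted by inverse document
    frequency, so terms occurring in many documents get discounted."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return scores

docs = [["jack", "played", "tennis"],
        ["john", "played", "tennis"],
        ["jack", "likes", "roses"]]
w = tf_idf(docs)
# 'likes' occurs in one document only, so it outweighs the widespread 'tennis'
```

The point of the sketch is only that such weights are computed from distributional counts, without any logical representation of meaning.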
4 One of the most remarkable exceptions to this observation is due to the introduction of the
logic programming paradigm around 1980 and PROLOG, its primary programming language
representative, which was based on Horn clause logic [13].
2.1 Terms
The basic, non-decomposable units of semantic analysis are here called ‘terms’;
they refer to the objects (or entities) natural language utterances (but also visual
or auditory data streams) are about, without taking into account the way they
are combined in terms of assertions (see Sect. 2.2). Considering our example
from above (‘Jack played tennis with John’), the basic terms contained in this
sentence are the players, ‘Jack’ and ‘John’, as well as the event they enjoy, namely
‘playing tennis.’ Still, the notion of a ‘term’ remains open for further clarification.
One prominent distinction for semantically different forms of terms sets concrete
individual entities (e.g. ‘my friend John’ or ‘my tennis match this morning’) apart
from class- or type-level entities (e.g. all persons ever born, or all tennis matches
that have been or will ever be played). We may, tentatively, define (the semantics
of) the class of tennis matches by the union of all instances of (all individual) tennis
matches. Moving from directly observable instances in the concrete physical world
to theoretically constructed types is a major conceptual abstraction from perceived
reality that will allow us to further organize the set of all types by means of semantic
relations, or, alternatively, assign inherent properties, attributes, conditions, and
constraints to them on a definitional basis (see below).
However, defining a class as a set of individual instances might not be fully
satisfactory from a semantically ‘constructive’ point of view since this is an entirely
formal approach to term definition. In particular, it does not clarify the intrinsic
conceptual properties that allow us to distinguish my tennis match this morning
from a particular rose in your garden and our neighbour’s car (besides the fact that
they constitute different instances and are assigned different, completely unrelated
types). What is missing here is a constructive semantic specification of a term, either
implicitly by stating terminological semantic relations it has to other terms (at the
class/type level), or explicitly by a definition of the meaning of a tennis match, a rose
and a car. Such a definition should be exhaustive enough so that any other meaning
of a term (at the class/type level) can be excluded from the one being defined.
There are two foundational types of terminological semantic relations by which
terms can be linked, a taxonomic and a partonomic one.5 The first one relates a
semantically specific to a semantically more general term. In linguistic semantics
(see, e.g. [15]), this specific-to-general relation is called hyponymy (or general-to-
specific hypernymy), in the AI knowledge representation community it is called
‘Is-a’, and in documentation languages within information science it is called
‘narrower’ (more specific) versus ‘broader’ (more general) term. For example,
a ‘tennis match’ (hyponym) is a specific ‘sports event’ (hypernym), a ‘rose’ is a
specific ‘flower’, and a ‘car’ is a specific ‘vehicle’. The generality of the hypernyms
can easily be shown since there are sports events other than tennis matches,
5 We distinguish terminological semantic relations from assertional semantic relations, the latter
covering the multitude of empirical assertions such as interacts-with, is-effective-for, is-mother-of,
and is-located-in.
flowers other than roses and vehicles other than cars. The second terminological
relation binds terms indicating a whole to terms indicating the whole’s constituent
parts. In linguistic semantics, the part-to-whole relation is called partonymy (or
whole-to-part holonymy), the AI knowledge representation and information science
communities call it ‘Part-of’ (or ‘Has-part’). For example, a (prototypical) ‘rose’
has as (non-exhaustive) constituent parts ‘root’, ‘leaves’, and ‘thorns’, while a ‘car’
has ‘engine’, ‘brakes’, ‘tires’ and ‘seats’ as its parts. Linguistic lexicons make heavy
use of such semantic relationships (see, e.g. WORDNET),6 but also thesauri and
terminologies for various science domains are based on these relations (see, e.g. the
Unified Medical Language System (UMLS) Semantic Network),7 as well as a bunch
of common-sense and domain-specific ontologies (for the life sciences, see, e.g. the
Foundational Model of Anatomy (FMA)8 or the Gene Ontology (GO)9). Although
one might think that both relation types can be attributed a fairly straightforward,
non-controversial semantics, this is true neither for the taxonomic (see, e.g. [9]), nor
for the partonomic case (see, e.g. [39]).
If terms are structured in this way, a semantic network (formally, a directed
graph) can be built from such individual terminological relations, where terms
form the vertices and terminological semantic relations form the edges of a concept
graph. One may navigate in such a graph by following the edges (links) according
to certain graph traversal strategies (e.g. moving from a term only following its
‘Is-a’ links should lead to more and more general terms; for example, ‘rose’ Is-a
‘flower’ Is-a ‘plant’ Is-a ‘eukaryote’). Note that relational encodings of this type
constitute an implicit semantic characterization of the meaning of terms because
terms are only defined indirectly, relative to other terms in the network. Still, this
representation format is already manageable by a computer, since we have mapped a
system of unstructured terms to a formal construct (a directed graph) that is directly
representable in computers (e.g. as a term–term adjacency matrix) and processable
(e.g. finding fully connected relation paths in that matrix via connectivity search).
Perhaps, the most prominent and advanced example of this representation approach
is Conceptual Graphs [45].
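The graph encoding described above can be sketched as follows; the terms and edges mirror the examples in the text, while the dictionary representation is merely one possible machine format among many (an adjacency matrix would work equally well):

```python
# A toy terminological network as a directed, edge-labelled graph:
# vertices are terms, labelled edges are terminological relations.
EDGES = {
    ("rose", "is_a"): "flower",
    ("flower", "is_a"): "plant",
    ("plant", "is_a"): "eukaryote",
    ("tennis match", "is_a"): "sports event",
    ("car", "is_a"): "vehicle",
    ("engine", "part_of"): "car",
}

def hypernym_chain(term: str) -> list[str]:
    """Graph traversal following only 'is_a' links: each step
    yields a semantically more general term."""
    chain = []
    while (term, "is_a") in EDGES:
        term = EDGES[(term, "is_a")]
        chain.append(term)
    return chain

print(hypernym_chain("rose"))  # → ['flower', 'plant', 'eukaryote']
```

Note that the meaning of ‘rose’ is still only characterized implicitly here, relative to the other terms it is linked to.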
The standard approach to an explicit, constructive textual definition is the
Aristotelian type. We specify the definiendum (what has to be defined) by a series
of properties (idealiter an exhaustive list of defining attributes and accompanying
constraints), the definiens, such that no other term (at the type level) also fulfills
the conjunction of all these properties. For instance, we may define ‘tennis’ as a
ball game played by either two or four players involving a yellow ball (of a certain
diameter and texture made from a special material). Furthermore, the game is played
either on a sand, a lawn or a rubber court (a size-wise well-defined quadrangle
divided in half by a net of a certain height) where each player uses a tennis
6 http://wordnet.princeton.edu/
7 http://semanticnetwork.nlm.nih.gov/
8 http://sig.biostr.washington.edu/projects/fm/
9 http://www.geneontology.org/
racket for playing the tennis ball. Additional rules of the game (e.g. who serves
when, from which position on the court, legal ball moves (in- and out-conditions),
the scoring of points, winning conditions for a game, a set, and a match, etc.) might
be made explicit as well, if necessary. Obviously, this task is truly open-ended since
it is hard, if not outright impossible, to decide when the list of properties is really
complete.
For human speakers of the same language, with a reasonable level of semantic
competence, this indeed helps clarify the meaning of a term. This is so because
we have a genus proximum (‘ball game’, a general class of games which subsumes
tennis but also other ball games such as football, hockey, basketball), with various
differentia specifica (e.g. 2 or 4 players, sand, lawn or rubber court, i.e. properties
which in their entirety distinguish tennis from all other alternative ball games).
However, such a natural language definition cannot be directly used for formal
semantic analysis since natural language is not per se a formal language (we will
come back to this problem in Sect. 2.2).
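The Aristotelian scheme of genus proximum plus differentia specifica can be mimicked in a toy data structure; all attribute names and values below are illustrative assumptions of ours, not a proposed vocabulary:

```python
# Hypothetical Aristotelian-style definitions: a definiendum is specified
# by a genus proximum plus a set of differentia specifica.
DEFINITIONS = {
    "tennis": {
        "genus": "ball game",
        "differentiae": {"players": frozenset({2, 4}),
                         "equipment": "racket",
                         "court": frozenset({"sand", "lawn", "rubber"})},
    },
    "football": {
        "genus": "ball game",
        "differentiae": {"players": frozenset({22}),
                         "equipment": "none",
                         "court": frozenset({"lawn"})},
    },
}

def discriminates(term_a: str, term_b: str) -> bool:
    """True if both definitions share the genus but differ in at least
    one differentia, i.e. the properties in their entirety tell the
    two terms apart, as the Aristotelian scheme requires."""
    a, b = DEFINITIONS[term_a], DEFINITIONS[term_b]
    return a["genus"] == b["genus"] and a["differentiae"] != b["differentiae"]

print(discriminates("tennis", "football"))  # → True
```

The open-endedness discussed above shows up directly: nothing in such a structure tells us when the list of differentiae is complete.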
Natural language is a poor candidate to serve as a formally adequate semantic
meta language for several reasons. Among the most pressing problems encountered
are a systematic semantic overloading and excessive proliferation of meaning
variants for single terms (and term variants for a single form of meaning, as well).
First, very often natural languages provide several alternatives for denoting the same
term by different linguistic expressions (e.g. ‘car’ and ‘automobile’ are synonymous
at the monolingual level, while ‘car’, ‘Auto’, ‘voiture’, etc. are synonymous at
the multilingual level of several natural languages—they all denote the same
semantic type). Second, one term label (its natural language name) may denote
different types of entities (e.g. the English term name ‘bank’ denotes at the
same time a river bank, a piece of furniture and a financial institute, and is thus
semantically ambiguous). Furthermore, definitions are, in principle, required for
each natural language term. On the one hand, their number is huge (size estimates
for comprehensive natural language lexicons run up to 150,000 lexical items
covering the general language portion of a specific natural language,10 not
counting the possible extensions of this lexical set by one or more, e.g. scientific,
sublanguages). Even worse, a substantial part of such a lexicon is intrinsically hard
to define uncontroversially (consider, e.g. loaded terms such as ‘freedom’, ‘dignity’,
or ‘respect’; or colors such as ‘green’, ‘red’ or ‘blue’; or moods such as ‘happy’,
‘angry’, ‘sad’ or ‘stubborn’).
We here face the problem of all-embracing and at the same time often rather
complex definition tasks. Sciences in general, quantitatively based ones such as
physics or chemistry, in particular, profit from a reductionist view on the world
where only a constrained slice of the reality has to be taken care of for theoretical
considerations, usually abstracting away from many idiosyncrasies one encounters
in reality. For instance, under the particular view of physics or chemistry different
10 This number is based on the lexical population statistics of WORDNET, the most comprehensive
lexical database for the English language.
degrees and shapes of rusty materials such as iron are simply of no concern (but
theoretical abstractions such as iron’s atomic weight and binding constraints relative
to other chemical elements are), whereas building and construction engineers have
to struggle heavily with such phenomena in their daily professional work. Other
kinds of sciences, e.g. jurisprudence, medicine, etc., suffer as well from not being
able to reduce reality by definition. Natural language semanticists also face the
dilemma that virtually all lexical items of a natural language have to be defined,
in principle. Aside from the sheer size and multitude of semantic descriptions,
complexity is yet another hard problem for linguistic semanticists. While physics
may focus on fundamental notions such as ‘mass’, ‘energy’, and ‘electrons’, from
which complex theories are built in a compositional and constructive way, linguists
have neither quantitative parameters, nor exhaustive and uncontroversial criteria for
defining the semantics of qualitative notions such as ‘sunshine’, ‘smell’, ‘politics’,
‘fun’ or ‘justice’.
Terms have their merits for text analytics in the sense that they characterize what
a document is about, i.e. its theme(s) or topic(s). Such a level of analysis, however,
does not tell us anything about assertions, i.e. the statements, claims or hypotheses
being made in a document, which relate basic entities mentioned in this context so
that, e.g. a truth status can be assigned to them (e.g. the assertion ‘Jack played tennis
with John’ is either true or false under well-specified spatio-temporal conditions,
whereas ‘Jack’ cannot be assigned such a truth value at all).
11 Formally speaking, we here outline a system which is usually referred to as typed first-order logic
(FOL; see, e.g. [18]). Throughout the discussion, we avoid going into low-level details of FOL, in
particular in what concerns the use of quantifiers.
For illustration purposes, consider our tennis match example from above. The
sentence ‘Jack played tennis with John’ describes a particular sports event, an
instance of the semantic type Playing tennis. The semantic description of a tennis
match requires, among other arguments, players as constituent entities. Players are
human actors, such as specific persons, e.g. Johni and Jackj (the indices i and j
denote special instances of all persons whose given name is either John or Jack,
respectively; for the sake of readability, we will leave out such indices whenever
possible). We may roughly characterize this assertion as Playing tennis ({Jack,
John}ACT); the (nonstandard) set notation merely indicates the players involved,
more generally the ACTors of the tennis match. Of course, this representation is
incomplete in many ways; e.g. one may consider the LOCation, the DURation,
or even the RESult of the match as being crucial knowledge for a tennis match
that has to be specified as well. If we integrate these considerations and insert
type-specific placeholders (actually, typed variables) into predicates, we end up
with a representation structure like the following one: Playing tennis ({Jack,
John}ACT; LOC; DUR; RES). For an n-ary predicate, we have n argument
positions associated with it (so the above predicate is 4-ary, the first argument is
already instantiated by two actors, whereas the remaining three argument positions
remain uninstantiated so far, i.e. only their type requirements are made explicit).
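A sketch of how such a partially instantiated, typed predicate might be rendered as a data structure (the slot names follow the text; the Python encoding itself is just one illustrative choice):

```python
from dataclasses import dataclass

# The 4-ary predicate from the text, Playing tennis ({Jack, John}ACT;
# LOC; DUR; RES): each argument position carries a type requirement;
# only ACT is instantiated, the other slots state their type only.
@dataclass
class Assertion:
    predicate: str
    args: dict  # slot name -> (required type, filler or None)

playing_tennis = Assertion(
    predicate="Playing_tennis",
    args={
        "ACT": ("Person", {"Jack", "John"}),
        "LOC": ("Location", None),
        "DUR": ("TimeSpan", None),
        "RES": ("Result", None),
    },
)

uninstantiated = [slot for slot, (_, filler) in playing_tennis.args.items()
                  if filler is None]
print(uninstantiated)  # → ['LOC', 'DUR', 'RES']
```

The open-ended complement/adjunct problem discussed next is visible even here: nothing fixes how many further slots (TIME, MANNER, …) the frame should carry.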
One particular problem we here encounter for semantic descriptions is that, for
virtually each predicate one can think of, the number n cannot be fixed a priori. This
problem resembles one well-known in linguistics, namely the distinction between
(syntactically required) complements (e.g. the number and types of grammati-
cally necessary (in)direct objects, the subject, etc.) and (syntactically/semantically
possible) adjuncts (e.g. different types of optional spatial, temporal, and manner-
type extensions; as an example, consider the extended example sentence ‘Jack
played tennis with John [on a sand court]LOC1 [in Florida]LOC2 [on a Sunday
morning]TIME [in an A-level tournament]MANNER1 [under extremely humid weather
conditions]MANNER2 …’).
The additional information provided by assertions is the composition of terms
under a particular predicate name. Several assertions can be combined as sets of
assertions. Such a store of knowledge is usually called a knowledge base. We may,
of course, add or delete assertions from such a knowledge store in the same way
as we enter or delete entries in a database. But there are no internal mechanisms
to compute new assertions (e.g. we may conclude from Jack’s and John’s playing
outdoor tennis that, at the time of playing, no heavy rainfall occurred in the area of
the tennis court). Given these considerations, knowledge bases as sets of relation
tuples bear a strong similarity to the type of ‘tabular’ knowledge contained in
databases. It might also be worthwhile to emphasize the fact that the transformation
of the informal natural language utterance ‘Jack played tennis with John’ into the
formal assertion Playing tennis ({Jack, John}ACT) constitutes a perfect example of
moving from unstructured data to structured ones. Note also that this assertional
format, unlike the original sentence, is readily manageable and processable by
machine. The distinctive feature, however, that sets knowledge bases apart from
squash, football, hockey or checkers. This problem cannot be solved within any
symbol-based approach (such as the logical one) since such an approach deals only
with symbols (and, for logics, truth values assigned to symbols), which need some
sort of interpretation; when we leave the realm of logical specification, this
interpretation is almost always natural language-bound. As an alternative, one might consider
symbol-grounding approaches which abstract away from symbolic interpretation
problems and relate symbols to physical parameters and perceptual data in the real
world (e.g. spatio-temporal data points in a 3-D representation), which might then be
susceptible to additional quantitative analysis (as outlined, e.g. by [14, 24] or [20]).
In the long run, the symbol-grounding approach, with its direct ties to perceived
reality, might link our human understanding of the world to quantitative categorical
description systems mentioned in the introductory section. A serious problem for
this approach is, however, to adequately deal with non-groundable terms which
address non-physical entities (such as ‘freedom’, ‘art’ or ‘knowledge’).
following one: ‘Jack played tennis with John. Jack’s racket got damaged when he
threw it on the ground and so he lost the match.’ To grasp the plot of that story
we have to know that a racket is the only device to play tennis. Tennis can only be
successfully played with fully operational rackets. If a racket gets damaged in the
course of the match and no proper substitute racket is available, the player with the
damaged racket is most likely to lose the game because the player’s ability to hit
and smash the tennis ball properly is heavily impaired.
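The background knowledge needed to grasp the racket story can be caricatured as a small set of if-then rules with a naive forward-chaining loop; the fact and rule names are, of course, our own illustrative shorthand, not a serious knowledge representation:

```python
# World knowledge as if-then rules over symbolic facts: if all premises
# hold, the conclusion may be added to the set of derived assertions.
RULES = [
    ({"racket_damaged", "no_substitute_racket"}, "hitting_impaired"),
    ({"hitting_impaired", "playing_tennis"}, "likely_loses_match"),
]

def forward_chain(facts: set[str]) -> set[str]:
    """Naive forward chaining: apply rules until a fixpoint is reached."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

facts = {"playing_tennis", "racket_damaged", "no_substitute_racket"}
print("likely_loses_match" in forward_chain(facts))  # → True
```

What makes the story hard for machines is not this loop but acquiring and formalizing the rules themselves, i.e. the common-sense knowledge that rackets are the only device for playing tennis.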
In the quantitative view of the world, a light-weight correspondence can be
established between inference rules and algebraic composition rules, e.g. addition
yields a new number (the sum) derived from a series of argument numbers. Whereas
numerical addition works for some domains (e.g. when I add 5,000 € to my account
which originally had a balance of 50,000 €, I end up with a new amount of
55,000 € on my account), for others it does not (e.g. addition is not defined for
gene sequences or natural language utterances or speech signals as raw data input).
Consequently, other operations on qualitative signature strings, e.g. splicing and
copying (for molecular biology) or concatenating or parsing (for natural language
processing), are required here.
or their verbal definitions (giving rise to the problems just discussed when natural
language is used as a meta language for semantic description).
The life sciences are one of the most active scientific areas in which we observe
a long-standing tradition in providing, actively maintaining and newly developing
such knowledge repositories. The most important and freely available ones are
organized in large umbrella systems. For instance, the Unified Medical Language
System (UMLS)12 assembles more than 100 medical source terminologies ranging
from anatomy, pathology to diagnostic and clinical procedures. In a similar vein,
the Open Biological Ontologies (OBO)13 and the NCBO BIOPORTAL14 constitute
a collection of more than 80 and 200, respectively, biological ontologies, again
covering many of the specialties in this field (e.g. genes and proteins, experimental
methods, species, and tissues). These terminologies and ontologies vary tremen-
dously in size (some few hold more than 70,000 base terms and 150,000 relations,
others are small and cover no more than 1,000 terms and relations). Though these are
valuable resources on their own, semantic integration is usually out of reach because
the terminologies/ontologies within one of these umbrellas are not really linked
to each other. This issue is dealt with in the field of ontology alignment [17],
with its emphasis on finding lexical and structural indicators for matching and
connecting different, yet semantically overlapping, knowledge resources in a larger
and coherent knowledge assembly.
Whereas the life sciences are a particularly impressive and active application area
for terminologies and ontologies, other areas, such as agriculture,15 law,16 or arts and
architecture,17 are also involved in such initiatives, though less committed. Finally,
the requirements for dealing with everyday common-sense knowledge have also
led to the emergence of general-purpose ontologies such as SENSUS18 and CYC19.
There are also borderline cases where lexical databases such as WORDNET20 and
FRAMENET21 (both for the general English language) are considered as ontologies
as well because they make heavy use of (verbally defined) semantic relations as part
of their specifications. Two problems are common to all these approaches: first, how
to cover a really large proportion of subject- as well as domain-specific terms and
relations from a chosen domain; second, what types of semantic relations should be
encoded?
12 http://www.nlm.nih.gov/research/umls/
13 http://www.obofoundry.org/
14 http://bioportal.bioontology.org/
15 http://agclass.nal.usda.gov/
16 http://www.loc.gov/lexico/servlet/lexico/liv/brsearch.html?usr=pub-13:0&op=frames&db=GLIN
17 http://www.getty.edu/research/tools/vocabularies/aat/index.html
18 http://www.isi.edu/natural-language/resources/sensus.html
19 http://www.cyc.com/
20 http://wordnet.princeton.edu/
21 https://framenet.icsi.berkeley.edu/fndrupal/
As to the first question, virtually thousands of person years have already been
spent world-wide on the manual creation and maintenance of terminological and
ontological resources. The results, though impressive in terms of the sheer
size of some repositories, are often controversial within the respective domains
when it comes to the underlying principles of the organization of terms or specific
content areas, i.e. particular branches of a terminology. This is, to some degree, due
to vague, and often even lacking, formal semantic criteria for what a particular relation
really should mean (see, e.g. [30, 43, 44], and also the remarks below). Note that
any form of terminological specification which rests on natural language inherits,
by default, all known problems of natural languages as specification languages
(being inherently vague, imprecise, ambiguous, etc.). Attempts at automating the
construction process of lexicons, terminologies or ontologies are still in their infancy
(this area of research is called lexical/semantic acquisition (see, e.g. [8]) or ontology
learning (see, e.g. [12])) and currently do not deliver reasonable and sound results,
at least not on a larger scale.
As to the second question, concerning the types of semantic relations to be
taken into consideration for terminology/ontology building, several disciplines
contributed to such efforts. In structural linguistic semantics, as well as in the broad
field of documentation languages (including, e.g. nomenclatures, terminologies,
thesauri, classifications for scientific domains), which is part of information (or
library) science, we witness a tradition of dealing with terms (in linguistic jargon,
lemmata or lexical entries) and informal semantic relations among terms. These
activities have mostly been focusing on the specification of synonyms, taxonomic
and partonomic relations. Taxonomic and partonomic relations, e.g. Heart Is-a
Organ and Heart Has-part Heart Chamber, cover terminological relationships but
do not specify any empirical or propositional ones as, e.g. ‘Tennis playing’ or
‘Is-effective-for’ do (when combined with proper arguments). Only in the life
sciences domain do such propositional relations play an admittedly marginal, yet
currently growing, role (most ontologies lack them; for those which incorporate them,
their proportion is usually less than 5–10% of all relations available, while around 80%
deal with taxonomic and 10–15% with partonomic relations (as of the end of 2011)).
These semantic relations are assigned a priori to different terms by linguists,
information scientists, terminologists, ontologists, or domain experts. There is also
usually no serious attempt at specifying the regularities holding between hyponyms
and hypernyms or partonyms and holonyms at a more general and formal level, at
least in terminologies. Work on ontologies has made some progress in this field of
research. One stream of activities deals with the specification of algebraic properties
between terms linked by various semantic relations. For instance, the Is-a relation is
assigned the properties of being transitive, antisymmetric, and reflexive (for further
relations which also obey algebraic properties, see [42]).
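To illustrate what a reasoner gains from such algebraic properties, the following sketch computes the transitive closure of a handful of asserted Is-a pairs (the naive fixpoint loop is for exposition only; production reasoners use far more efficient algorithms):

```python
# Is-a is specified as transitive, so a reasoner may derive every pair
# (a, d) for which an Is-a chain a -> ... -> d exists among the
# explicitly asserted pairs.
def transitive_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

IS_A = {("rose", "flower"), ("flower", "plant"), ("plant", "eukaryote")}
print(("rose", "eukaryote") in transitive_closure(IS_A))  # → True
```

Antisymmetry, by contrast, acts as a consistency check: if both (a, b) and (b, a) were derived for distinct terms, the taxonomy would be flagged as ill-formed.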
Another way of dealing with the (formal) semantics of the Is-a relation is by
defining necessary and sufficient criteria for definitional properties of terms and
computing (rather than defining) relationships on the basis of these specifications.
This brings us back to the Aristotelian approach to term definition where a
genus proximum (the hypernym) is complemented by a set of differentia specifica
In the previous section, we focused primarily on the structure of terms and relations
and how these constructs are used in terminologies and ontologies. We now turn
to computational issues related to them—representation conventions and reasoning
engines using these shared formats. We already introduced inference rules as a
formal means to mechanically generate additional assertions from a set of a priori
given assertions where the extension of the original assertion set is licensed by
rules (truth-preserving ones when logical calculi are applied deductively). Formally,
these deductive extensions are already logically implied by the initial assertion
set together with the supplied rules and thus constitute ‘new’ knowledge only
from the perspective of a human observer.
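The deductive extension of an assertion set by rules can be sketched as a forward-chaining fixpoint computation. The single rule and the facts below are illustrative assumptions, not taken from the chapter:

```python
# Sketch: the deductive extension of an assertion set, computed by forward
# chaining until a fixpoint is reached. Facts are (subject, relation,
# object) triples; the single (illustrative) rule is:
#   (x is_a C) and (C subclass_of D)  =>  (x is_a D)
def deductive_closure(facts):
    facts = set(facts)
    while True:
        new = {(x, "is_a", d)
               for (x, r1, c) in facts if r1 == "is_a"
               for (c2, r2, d) in facts if r2 == "subclass_of" and c2 == c}
        if new <= facts:  # fixpoint: nothing not already derived
            return facts
        facts |= new

initial = {("Socrates", "is_a", "Human"),
           ("Human", "subclass_of", "Mortal")}
closure = deductive_closure(initial)
# ("Socrates", "is_a", "Mortal") is now explicit, although it was
# logically implied by the initial assertions all along
```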
These mechanics, their formal underpinnings and implications have been well
known in the AI knowledge representation community for quite a while. However,
problematic formal properties of reasonably powerful logical calculi (e.g. various
forms of description logics (DL), first-order predicate logic (FOPL), as well as
even more sophisticated, e.g. modal, logics) concerning their computational
tractability (e.g. the lacking decidability of FOPL, the NP-completeness of
subsumption in DL, etc.)22 are a strong impediment to the application of these
formal systems in successful real-world applications.
The Semantic Web as a data container adds further constraints to the usability
of such formal systems. Prime among them is the requirement that supersized
volumes of data have to be accounted for. Under these circumstances, completeness
of (formal) specifications and their consistency can never be guaranteed on the
Web. Moreover, the Semantic Web is constantly growing, so that no fixed set of
data or knowledge items (the Web considered as a static knowledge base) can be
assumed; data snapshots are outdated the very moment they are taken. Not to
mention the fact that Web data are notoriously ‘dirty’, i.e. uncertain, incomplete,
vague, imprecise, contradictory, etc., and that contributions vary significantly in
terms of quality, trustworthiness and objectivity. Hence, rather than pursuing a
‘strong’ approach to formal knowledge representation and reasoning that requires
clean and, therefore, artificial toy-world assumptions about the nature of the raw
data, the emerging semantic technologies aim at a compromise between enhanced
expressivity of the underlying representation structures and the operations that can
be performed on them in the large, i.e. on a Web-scale basis, while also being aware
of ‘hostile’ degradations and deformations of these data.
The formal mechanisms which we have discussed in the previous sections
constitute the core of the emerging semantic computing paradigm. Relational tuples
with reference to some state of the world have long been known in linguistic
semantics under the heading of predicate-argument structures (although formal reasoning
22 http://www.cs.man.ac.uk/~ezolin/dl/
and the inferential derivation of new assertions has not been much of a concern in
this field), whereas the roots of description logics (and thus the rich potential for
terminological reasoning) can be traced in the history of AI back to the early eighties
of the past century (in particular, their emergence from ‘structured’ knowledge
representation formalisms, such as early semantic networks, frames or scripts). So
the structural format and inferential power of knowledge representation schemes is
not the ground-breaking innovation behind semantic computing given this legacy.
The reason why these methods have recently become so important for semantic
computing is that standards have emerged and a community has formed that
adheres to them in its daily work. Relational tuples, for instance, became a
W3C standard under the name of Resource Description Framework (RDF).23 RDF
is based on the idea of identifying ‘things’ using Web identifiers (called Uniform
Resource Identifiers (URIs)), and describing resources in terms of simple properties
and property values. The convention basically requires statements about the real
world to be made in the format of tuples which are composed of three basic entities,
namely (subject—predicate—object). The ‘thing’ a statement is about is called the
subject. The part that identifies the property or characteristic of the subject that a
statement specifies is called the predicate, and the part that identifies the value of
that property is called the object. For instance, if a photograph under http://www.
PicofJohn.jpg (the subject) depicts (the predicate) ‘John’ as a tennis player (the
object), then we might capture this information as an RDF statement in the following
way: (http://www.PicofJohn.jpg, http://purl.org/dc/elements/1.1/depiction, ‘John as
a tennis player’). Huge amounts of such triples are then collected in so-called RDF or
Large Triple Stores which constitute, in essence, a structured database of assertions
about the world.24
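The triple format and a (toy) triple store can be sketched in a few lines. The class below is a minimal illustration of the idea, not a description of any production RDF store:

```python
# A minimal sketch of a triple store: statements are (subject, predicate,
# object) tuples, and None acts as a wildcard in SPARQL-style triple
# patterns. This toy class is an illustration only.
class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, s=None, p=None, o=None):
        # None matches anything, as in a SPARQL triple pattern
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
store.add("http://www.PicofJohn.jpg",
          "http://purl.org/dc/elements/1.1/depiction",
          "John as a tennis player")

# retrieve every statement about the photograph
results = store.match(s="http://www.PicofJohn.jpg")
```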
RDF is quite limited in terms of expressivity and can be coupled with the Simple
Knowledge Organization System (SKOS)25 for lightweight knowledge sharing and
linking on the Web (e.g. using informal thesauri and classifications). An extension
of RDF, RDF-SCHEMA (RDFS) or the RDF Vocabulary Description Language,26 makes it
possible to state semantic relations among the terms used in RDF statements. Semantically
even more powerful, however, is to link RDF structures with inference rules [27]
such that an embedding of RDF into OWL is achieved. OWL corresponds to a family
of description logics which incorporate different expression layers, ranging from
full OWL via OWL-DL to OWL-LITE. While expressivity decreases from the full to
the lite variant, computational properties become more manageable (e.g. full OWL
or OWL with the Semantic Web Rule Language (SWRL) are undecidable, whereas
OWL-DL and OWL-LITE are decidable; see, e.g. [34]). OWL, unlike RDF, has not
yet become an industry standard, but it has become a de facto standard within the
Semantic Web community.
23 http://www.w3.org/RDF/
24 http://www.w3.org/wiki/LargeTripleStores
25 http://www.w3.org/2001/sw/wiki/SKOS
26 http://www.w3.org/TR/rdf-schema/
Semantic Technologies 169
URI, RDF, RDFS, SKOS and OWL are computable representation schemes. This
is fine from a methodological perspective, but unless massive amounts of content
are encoded and made publicly accessible in these formats, they are not of much
help for large-scale semantic computing. Fortunately, the vast majority of ontologies
developed these days are not only specified in RDF format but, even more so, are
made available in some form of OWL. Hence, large amounts of not only common-
sense knowledge representation structures (SENSUS, CYC) but also domain-specific,
special-purpose knowledge (e.g. for medicine (FMA,27 NCI,28 etc.) or biology
(GO,29 RO,30 etc.)) can be reused for semantic computation.
The semantics of qualitative knowledge of the world has long been disregarded
by computer science and industry. This is now beginning to change, as
computationally adequate representation formats (RDF, OWL, etc.) and correspond-
ing software (e.g. RDF databases, OWL classifiers) are being made available and
robust enough for large-scale knowledge processing. While the prospects look
very positive for semantic technologies, the field will have to solve several crucial
problems some of which are outlined below.
1. Currently, a large diversity and heterogeneity characterizes the wealth of avail-
able and emerging ontologies. What is still lacking, though, are robust mechanisms
that interlink single ontologies by means of automatic or computer-supported inter-
active ontology alignment. The problems encountered are: different perspectives
(views) in different ontologies on the same subject area (e.g. should a disease
ontology assume ideal organism functions for reference purposes or assume
pathologies, and if so, which ones, as a comparison standard?), different levels of
granularity of knowledge (shallow (layman) versus detailed (expert) knowledge)
and different levels of definitional rigidity and empirical correctness—to mention
just a few hurdles for the integration and interoperability of collections of
ontologies.
2. Many domains of discourse lack ontological descriptions or, if available, suffer
from limited depth and coverage of their specifications. Since building and
maintaining semantic descriptions is extremely time-consuming and intellectu-
ally challenging, automatic knowledge acquisition and ontology learning have
become premium research topics. The grand idea is to use raw Web data
(free text, definitional statements, e.g. from WIKIPEDIA, perhaps complemented
27 http://trac.biostr.washington.edu/trac/wiki/FMAInOWL
28 http://ncicb.nci.nih.gov/download/evsportal.jsp
29 http://www.geneontology.org/GO.format.shtm
30 http://obofoundry.org/ro/
31 http://www.w3.org/wiki/SemanticWebArchitecture
References
1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search
tool. J. Mol. Biol. 215(3), 403–410 (1990)
2. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The
Description Logic Handbook: Theory, Implementation, and Applications, 2nd edn. Cambridge
University Press, Cambridge (2007)
32 http://www.w3.org/standards/semanticweb/data
33 https://www.mturk.com/mturk/welcome
3. Baader, F., Horrocks, I., Sattler, U.: Description logics. In: van Harmelen, F., Lifschitz, V.,
Porter, B. (eds.) Handbook of Knowledge Representation, pp. 135–180. Elsevier, Amsterdam
(2008)
4. Baader, F., Nutt, W.: Basic description logics. In: Baader, F., Calvanese, D., McGuinness, D.L.,
Nardi, D., Patel-Schneider, P.F. (eds.) The Description Logic Handbook: Theory, Implementa-
tion, and Applications, 2nd edn., pp. 47–100. Cambridge University Press, Cambridge (2007)
5. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web: a new form of Web content that is
meaningful to computers will unleash a revolution of new possibilities. Scientif. Am. 284(5),
34–43 (2001)
6. van Benthem, J., ter Meulen, A.: Handbook of Logic and Language, 2nd edn. Elsevier,
Amsterdam (2010)
7. Blythe, J., Hobbs, J.R., Domingos, P., Kate, R., Mooney, R.: Implementing weighted abduction
in Markov Logic. In: IWCS 2011 — Proceedings of the 9th International Conference on
Computational Semantics, pp. 55–64 (2011)
8. Boguraev, B., Pustejovsky, J. (eds.): Corpus Processing for Lexical Acquisition. MIT,
New York (1996)
9. Brachman, R.J.: What Is-A is and isn’t: an analysis of taxonomic links in semantic networks.
IEEE Comput. 16(10), 30–36 (1983)
10. Brachman, R., Levesque, H.: Knowledge Representation and Reasoning. Morgan Kaufmann,
Los Altos (2004)
11. Chierchia, G., McConnell-Ginet, S.: Meaning and Grammar: An Introduction to Semantics,
2nd edn. MIT, Cambridge, MA (2000)
12. Cimiano, P.: Ontology Learning and Population from Text. Springer, Berlin (2006)
13. Clocksin, W.F., Mellish, C.S.: Programming in Prolog. Springer, Berlin (2003)
14. Cottrell, G.W., Bartell, B., Haupt, C.: Grounding meaning in perception. In: GWAI-90 —
Proceedings of the 14th German Workshop on Artificial Intelligence, pp. 307–321 (1990)
15. Cruse, D.A.: Lexical Semantics. Cambridge University Press, Cambridge (1986)
16. Domingos, P., Lowd, D.: Markov Logic: An Interface Layer for Artificial Intelligence.
Morgan & Claypool, San Rafael (2009)
17. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, Berlin (2007)
18. Fox, C., Lappin, S.: An expressive first-order logic with flexible typing for natural language
semantics. Logic J. Int. Group Pure Appl. Logic 12, 135–168 (2004)
19. Giunchiglia, F., Walsh, T.: A theory of abstraction. Artif. Intell. 57(2–3), 323–389 (1992)
20. Glenberg, A.M., Robertson, D.A.: Symbol grounding and meaning: a comparison of high-
dimensional and embodied theories of meaning. J. Memory Lang. 43(3), 379–401 (2000)
21. Goldreich, O.: Foundations of Cryptography. Volume 1: Basic Tools. Oxford University Press,
Oxford (2007)
22. Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE Intell. Syst.
24(2), 8–12 (2009)
23. Hansen, E.A., Zilberstein, S.: Monitoring and control of anytime algorithms: a dynamic
programming approach. Artif. Intell. 126(1–2), 139–157 (2001)
24. Harnad, S.: The symbol grounding problem. Physica D 42(1–3), 335–346 (1990)
25. Hobbs, J.R.: Granularity. In: IJCAI-85 — Proceedings of the 9th International Joint Conference
on Artificial Intelligence, pp. 432–435 (1985)
26. Hobbs, J.R., Stickel, M.E., Appelt, D.E., Martin, P.A.: Interpretation as abduction. Artif. Intell.
63(1–2), 69–142 (1993)
27. Horrocks, I., Patel-Schneider, P.F., van Harmelen, F.: From SHIQ and RDF to OWL: the making
of a Web Ontology Language. J. Web Semant. 1(1), 7–26 (2003)
28. Jurafsky, D., Martin, J.H.: Speech and Language Processing. An Introduction to Natural
Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice
Hall, Englewood Cliffs (2009)
29. Kittur, A., Kraut, R.E.: Harnessing the wisdom of the crowds in WIKIPEDIA: quality through
coordination. In: CSCW ’08 — Proceedings of the 2008 ACM Conference on Computer
Supported Cooperative Work, pp. 37–46 (2008)
30. Köhler, J., Munn, K., Rüegg, A., Skusa, A., Smith, B.: Quality control for terms and definitions
in ontologies and taxonomies. BMC Bioinformat. 7, 212 (2006)
31. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT,
New York (2009)
32. Lutz, C.: Adding numbers to the SHIQ description logic: first results. In: KR-02 — Proceedings
of the 8th International Conference on Principles and Knowledge Representation and Reason-
ing, pp. 191–202 (2002)
33. Manning, C., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT,
New York (1999)
34. Motik, B.: On the properties of metamodeling of OWL. J. Logic Comput. 17(4), 617–637
(2007)
35. Nilsson, N.: Principles of Artificial Intelligence. Morgan Kaufmann, Los Altos (1980)
36. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
Morgan Kaufmann, Los Altos (1988)
37. Ricci, F., Rokach, L., Shapira, B.: Recommender Systems Handbook. Springer, Berlin (2011)
38. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Prentice Hall,
Englewood Cliffs (2010)
39. Schulz, S., Kumar, A., Bittner, T.: Biomedical ontologies: what part-of is and isn’t. J. Biomed.
Informat. 39(3), 350–361 (2006)
40. Searls, D.B.: The language of genes. Nature 420(6912), 211–217 (2002)
41. Shariff, A.A., Hussain, M.A., Kumar, S.: Leveraging unstructured data into intelligent
information: analysis & evaluation. In: Proceedings of the 2011 International Conference on
Information and Network Technology, pp. 153–157 (2011)
42. Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus,
F., Rector, A.L., Rosse, C.: Relations in biomedical ontologies. Genome Biol. 6, R46 (2005)
43. Smith, B., Köhler, J., Kumar, A.: On the application of formal principles to life science data:
a case study in the Gene Ontology. In: DILS 2004—Data Integration in the Life Sciences,
pp. 79–94 (2004)
44. Smith, B., Rosse, C.: The role of foundational relations in the alignment of biological ontolo-
gies. In: MedInfo 2004—Proceedings of the 11th World Congress on Medical Informatics,
pp. 444–448 (2004)
45. Sowa, J.F.: Conceptual Structures: Information Processing in Mind and Machine. Addison-
Wesley, Reading (1984)
46. Wiebe, J., Wilson, T., Bruce, R., Bell, M., Martin, M.: Learning subjective language. Comput.
Linguist. 30(3), 277–308 (2004)
What Are Ontologies Good For?
Ian Horrocks
I. Horrocks
Oxford University, Department of Computer Science, Oxford, UK
e-mail: ian.horrocks@cs.ox.ac.uk

1 Introduction
features that distinguish them: the well-known “tree of Porphyry”, for example,
identifies animals and plants as sub-categories of living things distinguished by
animals being sensitive and plants being insensitive (see Fig. 1).
In contrast, in computer science an ontology is an engineering artifact, usually
a so-called “conceptual model” of (some aspect of) the world; it introduces
vocabulary describing various aspects of the domain being modelled and provides
an explicit specification of the intended meaning of the vocabulary by describing
the relationships between different vocabulary terms. These relationships do, how-
ever, invariably include classification-based relationships not unlike those used in
Porphyry’s tree.
In a logic-based (formal) ontology, the vocabulary can be thought of as predicates
and constants. For example, Animal and Plant can be thought of as (unary)
predicates and Socrates, Plato and Aristotle as constants. Relationships between
vocabulary terms can be specified using axioms, i.e. logical sentences over these predicates and constants.
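In first-order notation, axioms of this kind might read as follows (a reconstruction over the vocabulary above; these are not necessarily the author's own examples):

```latex
\forall x\,[\mathit{Plant}(x) \rightarrow \lnot \mathit{Animal}(x)]
\qquad
\mathit{Animal}(\mathit{Socrates})
```

The first sentence states that nothing is both a plant and an animal; the second asserts that the individual denoted by Socrates is an animal.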
The Web Ontology Language (OWL) [19] is currently by far the most widely
used ontology language. OWL was developed by a World Wide Web Consortium
(W3C) working group in order to extend the capabilities of the Resource Description
Framework (RDF), a language for representing basic information about entities and
relationships between them [13]. OWL exploited existing work on languages such
as OIL [4] and DAML+OIL [12] and, like them, was based on a description logic
(DL).
Description logics (DLs) are a family of logic-based knowledge representation
formalisms; they are descendants of Semantic Networks [29] and KL-ONE [2].
These formalisms all adopt an object-oriented model, similar to the one used by
Plato and Aristotle, in which the domain is described in terms of individuals,
concepts (usually called classes in ontology languages), and roles (usually called
relationships or properties in ontology languages). Individuals, e.g., “Socrates”,
are the basic elements of the domain; concepts, e.g., “Human”, describe sets
of individuals having similar characteristics; and roles, e.g., “hasPupil” describe
relationships between pairs of individuals, such as “Socrates hasPupil Plato”.
As well as atomic concept names such as Human, DLs also allow for concept
descriptions to be composed from atomic concepts and roles. Moreover, it is
possible to assert that one concept (or concept description) is subsumed by (is a
sub-concept of), or is exactly equivalent to, another. This allows for easy extension
of the vocabulary by introducing new names as abbreviations for descriptions. For
example, using standard DL notation, we might write:

HappyParent ≡ Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)

This introduces the concept name HappyParent and asserts that its instances are
just those individuals that are instances of Parent, and all of whose children are
instances of either Intelligent or Athletic.
Another distinguishing feature of DLs is that they are logics and so have a formal
semantics. DLs can, in fact, be seen as decidable subsets of first-order predicate
logic, with individuals being equivalent to constants, (atomic) concepts to unary
predicates and (atomic) roles to binary predicates. Similarly, complex concepts are
equivalent to formulae with one free variable, and standard axioms are equivalent
to (bi-) implications with the free variable universally quantified at the outer level.
For example, the concept used above to describe a happy parent is equivalent to the
following formula (with one free variable x):

Parent(x) ∧ ∀y[hasChild(x, y) → (Intelligent(y) ∨ Athletic(y))]

and the DL axiom introducing the concept name HappyParent is equivalent to the
following bi-implication:

∀x[HappyParent(x) ↔ Parent(x) ∧ ∀y[hasChild(x, y) → (Intelligent(y) ∨ Athletic(y))]]
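Read set-theoretically, the definition says that the HappyParent instances are exactly those Parent instances all of whose children fall under Intelligent or Athletic. A small interpretation can make this concrete (all individuals and set memberships below are invented for illustration):

```python
# Sketch of the set-based reading of the HappyParent definition: an
# individual is a HappyParent iff it is a Parent and every one of its
# children is Intelligent or Athletic. All names here are illustrative.
parent = {"mary", "tom"}
intelligent = {"ann"}
athletic = {"bob"}
has_child = {("mary", "ann"), ("mary", "bob"), ("tom", "eve")}

def happy_parent(x):
    children = {y for (p, y) in has_child if p == x}
    return x in parent and all(
        y in intelligent or y in athletic for y in children)
```

Here mary qualifies (both of her children are intelligent or athletic), whereas tom does not (eve is neither).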
SHOIN also provides for transitive roles, allowing us to state, e.g., that if y is an
ancestor of x and z is an ancestor of y, then z is also an ancestor of x, and for
inverse roles, allowing us to state, e.g., that if z is an ancestor of x, then x is also
a descendant of z. The constructors provided by OWL, and the equivalent DL
syntax, are summarised in Fig. 2.
In DLs it is usual to separate the set of statements that establish the vocabulary to
be used in describing the domain (what we might think of as the schema) from the
set of statements that describe some particular situation that instantiates the schema
(what we might think of as data); the former is called the TBox (Terminology Box)
and the latter the ABox (Assertion Box). An OWL ontology is simply equivalent
to a set of SHOIN TBox and ABox statements. This mixing of schema and
data is quite unusual (in fact ontologies are usually thought of as consisting only
of the schema part), but does not affect the meaning—from a logical perspective,
SHOIN KBs and OWL ontologies are just sets of axioms.
The main difference between OWL and SHOIN is that OWL ontologies use
an RDF-based syntax intended to facilitate their use in the context of the Semantic
Web. This syntax is rather verbose and not well suited for presentation to human
beings. Figure 3, for example, illustrates how the description of happy parent given
above would be written in OWL’s RDF syntax.
Ontologies and ontology-based information systems are closely related in both form
and function to databases and database systems: both support the development of
domain models that can be queried and updated. Figure 4, for example, illustrates
a database entity relationship (ER) schema that models (a fragment of) a university
domain in much the same way as the conceptual (TBox) part of an ontology, and
(some of) the information it captures could be expressed using the following DL
axioms:
where (1) states that every course is taught by exactly one teacher, (2) states that
only courses can be taught (i.e., being taught by something implies being a course),
and (3) states that graduate students are a subclass of students.
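In DL notation, axioms matching these three glosses might be written as follows (a sketch: the role name \(\mathit{taughtBy}\) and the exact counting syntax are assumptions rather than the author's notation):

```latex
\begin{aligned}
&(1)\quad \mathit{Course} \sqsubseteq\; =\!1\,\mathit{taughtBy}.\mathit{Teacher}\\
&(2)\quad \exists\,\mathit{taughtBy}.\top \sqsubseteq \mathit{Course}\\
&(3)\quad \mathit{GradStudent} \sqsubseteq \mathit{Student}
\end{aligned}
```

Axiom (1) is a qualified number restriction, (2) a domain restriction on the role, and (3) a plain concept inclusion.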
Although there are clear analogies between databases and OWL ontologies, there
are also important differences. Unlike databases, OWL has a so-called open world
semantics in which missing information is treated as unknown rather than false,
and OWL axioms behave like inference rules rather than database constraints. In
the above axioms, for example, it is stated that only courses can be taught; in
OWL, if we know that IntroductionToAI is taught by Dave, this leads to the
implication that IntroductionToAI is a Course—if we were to query the ontology
for instances of Course, then IntroductionToAI would be part of the answer. In a
database setting the schema is interpreted as a set of constraints on the data: adding
the fact that IntroductionToAI is taught by Dave without IntroductionToAI being
already known to be a Course would lead to an invalid database state, and such an
update would therefore be rejected by a database management system as a constraint
violation.
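The contrast between the inference-rule reading and the constraint reading can be sketched as follows (hypothetical code, not any actual OWL or database API):

```python
# Sketch: "only courses can be taught" read as an inference rule (OWL
# style) versus as an integrity constraint (database style).
courses = {"Logic101"}
taught_by = {}

def owl_add_taught_by(subject, teacher):
    # OWL reading: being taught by someone implies being a Course,
    # so membership is inferred rather than checked.
    taught_by[subject] = teacher
    courses.add(subject)

def db_add_taught_by(subject, teacher):
    # Database reading: the update is rejected unless the subject is
    # already known to be a Course.
    if subject not in courses:
        raise ValueError("constraint violation: %r is not a Course" % subject)
    taught_by[subject] = teacher

owl_add_taught_by("IntroductionToAI", "Dave")
# IntroductionToAI is now inferred to be a Course, so it would be part
# of the answer to a query for Course instances
```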
In contrast to databases, OWL also makes no unique name assumption (UNA).
For example, from axiom (1) we know that a course can be taught by only one
teacher, so additionally asserting that IntroductionToAI is taught by David would
lead to the implication that Dave and David are two names for the same individual.
In a database setting this would again be treated as a constraint violation. Note that
in OWL it is possible to assert (or infer) that two different names do not refer to
the same individual; if such an assertion were made about Dave and David, then
asserting that IntroductionToAI is taught by both Dave and David would make
the ontology inconsistent. Unlike database management systems, ontology tools
typically do not reject updates that result in the ontology becoming wholly or partly
inconsistent; instead, they simply provide a suitable warning.
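The equality inference triggered by a functional role in the absence of the UNA can be mimicked as follows (a toy sketch; real reasoners use far more general equality handling):

```python
# Sketch: without a unique name assumption, a functional role forces
# co-reference. If a course may have only one teacher and acquires two
# teacher names, the names are inferred to denote the same individual.
same_as = {}  # inferred co-reference: name -> canonical name

def canon(x):
    while x in same_as:
        x = same_as[x]
    return x

teacher_of = {}

def record_teacher(course, teacher):
    t = canon(teacher)
    if course in teacher_of and canon(teacher_of[course]) != t:
        # functional role: merge the two names instead of rejecting
        same_as[t] = canon(teacher_of[course])
    else:
        teacher_of[course] = t

record_teacher("IntroductionToAI", "Dave")
record_teacher("IntroductionToAI", "David")
# Dave and David now denote the same individual
```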
The treatment of schema and constraints in a database setting means that they can
be ignored at query time—in a valid database instance all the schema constraints
must already be satisfied. This makes query answering very efficient: in order to
[Fig. 4: an ER schema relating Teacher, Course and Student via the relationships TEACHING (roles Tof/Tby) and ENROLLING (roles Ein/Eof), annotated with cardinality constraints such as (1,1) and (0,∞).]
4 Ontology Reasoning
Ontologies are often large and complex: the well-known SNOMED clinical terms
ontology includes, for example, more than 400,000 class names [25]. Building and
maintaining such ontologies is very costly and time consuming, and providing
tools and services to support this “ontology engineering” process is of crucial
importance to both the cost and the quality of the resulting ontology. State-of-the-art
ontology development tools, such as SWOOP [14], Protégé 4 [17], and TopBraid
Composer (see http://www.topbraidcomposer.com/), use a DL reasoner, such as
FaCT++ [28], Racer [9] or Pellet [23], to provide feedback to the user about the
logical implications of their design. This typically includes (at least) warnings about
inconsistencies and redundancies.
An inconsistent (sometimes called unsatisfiable) class is one whose description
is “over-constrained”, with the result that it can never have any instances. This
is typically an unintended feature of the design—why introduce a name for a
class that can never have any instances—and may be due to subtle interactions
between axioms. It is, therefore, very useful to be able to detect such classes
and bring them to the attention of the ontology engineer. For example, during
the development of an OWL ontology at the NASA Jet Propulsion Laboratory,
the class “OceanCrustLayer” was found to be inconsistent. This was discovered
(with the help of debugging tools) to be the result of its being defined to be
both a region and a layer, one of which (layer) was a 2-dimensional object and
the other a 3-dimensional object, where the axioms describing 2-dimensional and
3-dimensional objects ensured that these two classes were disjoint (had no instances
in common). The inconsistency thus highlighted a fundamental error in the design
of the ontology, discovering and repairing which obviously improved the quality of
the ontology.
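The kind of check involved can be sketched over an explicit class graph. The encoding below is a hypothetical simplification of what a DL reasoner actually does:

```python
# Sketch: detecting an over-constrained ("inconsistent") class in the
# spirit of the OceanCrustLayer example: a class whose superclasses
# include two classes declared disjoint can never have instances.
# The class graph below is a hypothetical simplification.
subclass_of = {
    "OceanCrustLayer": {"Region", "Layer"},
    "Region": {"ThreeDObject"},
    "Layer": {"TwoDObject"},
}
disjoint = {frozenset(["TwoDObject", "ThreeDObject"])}

def superclasses(c):
    seen, stack = set(), [c]
    while stack:
        for s in subclass_of.get(stack.pop(), ()):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen

def inconsistent(c):
    supers = superclasses(c) | {c}
    return any(d <= supers for d in disjoint)
```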
It is also possible that the descriptions in the ontology mean that two classes
necessarily have exactly the same set of instances, i.e., that they are alternative
names for the same class. This may be desirable in some situations, e.g., to capture
the fact that “Myocardial infarction” and “Heart attack” mean the same thing. It
could, however, also be the inadvertent result of interactions between descriptions,
and so it is also useful to be able to alert users to the presence of such “synonyms”.
For example, when developing a medical terminology ontology a domain expert
added the following two axioms:
AspirinTablet ≡ ∃hasForm.Tablet
AspirinTablet ⊑ AspirinDrug
intending to capture the information that aspirin tablets are just those aspirin drugs
that have the form of a tablet. Instead, these axioms had the effect of making
every kind of tablet be an aspirin tablet. This was immediately corrected when the
reasoner alerted the domain expert to the unexpected equivalence between Tablet
and AspirinTablet.
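The overgeneration can be seen in a toy model of the faulty axiom (all names below are illustrative):

```python
# Sketch: under the faulty reading "AspirinTablet is equivalent to
# 'something having tablet form'", membership in AspirinTablet depends
# only on the form, so any tablet-shaped drug qualifies.
has_form = {("aspirin_100mg", "tablet"), ("paracetamol_500mg", "tablet")}

def aspirin_tablet(x):
    # the equivalence reads: x is an AspirinTablet iff x has some tablet form
    return any(s == x and f == "tablet" for (s, f) in has_form)
```

The unintended consequence is that the paracetamol tablet also counts as an aspirin tablet.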
In addition to checking for inconsistencies and synonyms, ontology development
tools usually check for implicit subsumption relationships and update the class
hierarchy accordingly. This is also a very useful design aid: it allows ontology
developers to focus on class descriptions, leaving the computation of the class
hierarchy to the reasoner.
5 Ontology Applications
The availability of tools and reasoning systems such as those mentioned in Sect. 3
has contributed to the increasingly widespread use of OWL. Applications of OWL
are particularly prevalent in the life sciences where it has been used by the
developers of several large biomedical ontologies, including the SNOMED, GO
and BioPAX ontologies mentioned above, the FMA [5] and the National Cancer
Institute thesaurus [10].
Many ontologies are the result of collaborative efforts within a given community
and are developed with the aim of facilitating information sharing and exchange.
Some ontologies are even commercially developed and subject to a licence fee. In
most cases, an ontology focuses on a particular domain, although there are some
well-known “foundational” or “upper” ontologies, such as DOLCE1 and SUMO,2
whose coverage is more general; their aim is, however, mainly to provide a carefully
formalised basis for the development of domain-specific ontologies.
1 http://www.loa-cnr.it/DOLCE.html
2 http://www.ontologyportal.org/
Many OWL ontologies are now available on the web—an OWL ontology is
identified by a URI, and the ontology should, in principle, be available at that
location. There are also several well-known ontology libraries, and even ontology
search engines such as SWOOGLE,3 that can be used to locate ontologies. In
practice, however, applications are invariably built around a predetermined ontology
or set of ontologies that are well understood and known to provide suitable coverage
of the relevant domains.
The importance of reasoning support in ontology applications was highlighted
by recent work on the MED ontology in which potentially critical errors were dis-
covered (see Sect. 1). Similarly, an extended version of the SNOMED ontology was
checked using an OWL reasoner, and a number of missing subClassOf relationships
found. This ontology is being used by the UK National Health Service (NHS) to
provide “A single and comprehensive system of terms, centrally maintained and
updated for use in all NHS organisations and in research” and as a key component
of their multi-billion pound “Connecting for Health” IT programme. An important
feature of this system is that it can be extended to provide more detailed coverage
if needed by specialised applications. For example, a specialist allergy clinic may
need to distinguish allergies caused by different kinds of nut, and so may add new
terms to the ontology such as AlmondAllergy:
Using a reasoner to insert this new term into the ontology will ensure that it is
recognised as a subClassOf NutAllergy. This is clearly of crucial importance in
order to ensure that patients with an AlmondAllergy are correctly identified in the
national records system as patients having a NutAllergy.
Ontologies are also widely used to facilitate the sharing and integration of infor-
mation. The Neurocommons project,4 for example, aims to provide a platform for
sharing and integrating knowledge in the neuroscience domain. A key component
is an ontology of annotations that will be used to integrate available knowledge on
the web, including major neuroscience databases. Similarly, the OBO Foundry5 is
a library of ontologies designed to facilitate information sharing and integration in
the biomedical domain.
In information integration applications the ontology can play several roles: it
can provide a formally defined and extensible vocabulary for use in semantic
annotations, it can be used to describe the structure of existing sources and the
information that they store, and it can provide a detailed model of the domain against
which queries can be formulated. Such queries can be answered by using semantic
annotations and structural knowledge to retrieve and combine information from
3 http://swoogle.umbc.edu/
4 http://sciencecommons.org/projects/data/
5 http://www.obofoundry.org/
multiple sources [26]. It should be noted that the use of ontologies in information
integration is far from new and has already been the subject of extensive research
within the database community [1].
6 Discussion
References
1. Batini, C., Lenzerini, M., Navathe, S.B.: A comparative analysis of methodologies for database
schema integration. ACM Comput. Surv. 18(4), 323–364 (1986)
2. Brachman, R.J., Schmolze, J.G.: An overview of the KL-ONE knowledge representation
system. Cognit. Sci. 9(2), 171–216 (1985)
3. Derriere, S., Richard, A., Preite-Martinez, A.: An ontology of astronomical object types for
the virtual observatory. In: Proceedings of Special Session 3 of the 26th Meeting of the IAU:
Virtual Observatory in Action: New Science, New Technology, and Next Generation Facilities
(2006)
4. Fensel, D., van Harmelen, F., Horrocks, I., McGuinness, D., Patel-Schneider, P.F.: OIL: an
ontology infrastructure for the semantic web. IEEE Intell. Syst. 16(2), 38–45 (2001)
5. Golbreich, C., Zhang, S., Bodenreider, O.: The foundational model of anatomy in OWL:
experience and perspectives. J. Web Semant. 4(3), 181–195 (2006)
6. Goodwin, J.: Experiences of using OWL at the ordnance survey. In: Proceedings of the First
OWL Experiences and Directions Workshop. CEUR Workshop Proceedings, vol. 188, CEUR
Online Workshop Proceedings (http://ceur-ws.org/) (2005)
7. Grau, B.C., Horrocks, I., Kazakov, Y., Sattler, U.: Just the right amount: extracting modules
from ontologies. In: Proceedings of the Sixteenth International World Wide Web Conference
(WWW 2007). ACM Press, New York (2007a)
8. Grau, B.C., Kazakov, Y., Horrocks, I., Sattler, U.: A logical framework for modular integration
of ontologies. In: Proceedings of the 20th International Joint Conference on Artificial
Intelligence (IJCAI 2007), pp. 298–303 (2007b)
9. Haarslev, V., Möller, R.: RACER system description. In: Proceedings of the International Joint
Conference on Automated Reasoning (IJCAR 2001). Lecture Notes in Artificial Intelligence,
vol. 2083, pp. 701–705. Springer, Berlin (2001)
10. Hartel, F.W., de Coronado, S., Dionne, R., Fragoso, G., Golbeck, J.: Modeling a description
logic vocabulary for cancer research. J. Biomed. Informat. 38(2), 114–129 (2005)
11. Horrocks, I., Sattler, U.: A tableaux decision procedure for SHOIQ. In: Proceedings of
the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), pp. 448–453.
Professional Book Center. AAAI Press, Menlo Park (2005)
12. Horrocks, I., Patel-Schneider, P.F., van Harmelen, F.: Reviewing the design of DAML+OIL:
an ontology language for the semantic web. In: Proceedings of the 18th National Conference on
Artificial Intelligence (AAAI 2002), pp. 792–797. AAAI Press (2002), ISBN 0-262-51129-0
13. Horrocks, I., Patel-Schneider, P.F., van Harmelen, F.: From SHIQ and RDF to OWL: the
making of a web ontology language. J. Web Semant. 1(1), 7–26 (2003) ISSN 1570–8268
14. Kalyanpur, A., Parsia, B., Sirin, E., Cuenca-Grau, B., Hendler, J.: SWOOP: a web ontology
editing browser. J. Web Semant. 4(2) (2005a)
15. Kalyanpur, A., Parsia, B., Sirin, E., Hendler, J.: Debugging unsatisfiable classes in OWL
ontologies. J. Web Semant. 3(4), 243–366 (2005b)
16. Kershenbaum, A., Fokoue, A., Patel, C., Welty, C., Schonberg, E., Cimino, J., Ma, L., Srinivas,
K., Schloss, R., Murdock, J.W.: A view of OWL from the field: use cases and experiences.
In: Proceedings of the Second OWL Experiences and Directions Workshop. CEUR Workshop
Proceedings, vol. 216 (http://ceur-ws.org/). CEUR (2006)
17. Knublauch, H., Fergerson, R., Noy, N., Musen, M.: The Protégé OWL Plugin: an open
development environment for semantic web applications. In: McIlraith, S.A., Plexousakis, D.,
van Harmelen, F. (eds.) Proceedings of the 3rd International Semantic Web Conference (ISWC
2004). Lecture Notes in Computer Science, vol. 3298, pp. 229–243. Springer, Berlin (2004),
ISBN 3-540-23798-4
18. Lacy, L., Aviles, G., Fraser, K., Gerber, W., Mulvehill, A., Gaskill, R.: Experiences using
OWL in military applications. In: Proceedings of the First OWL Experiences and Directions
Workshop. CEUR Workshop Proceedings, vol. 188 (http://ceur-ws.org/). CEUR (2005)
19. Patel-Schneider, P.F., Hayes, P., Horrocks, I.: OWL Web Ontology Language semantics and
abstract syntax. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/owl-semantics/
20. Rector, A., Rogers, J.: Ontological and practical issues in using a description logic to represent
medical concept systems: experience from GALEN. In: Reasoning Web, Second International
Summer School, Tutorial Lectures. LNCS, vol. 4126, pp. 197–231. Springer, Heidelberg
(2006)
21. Ruttenberg, A., Rees, J., Luciano, J.: Experience using OWL DL for the exchange of biological
pathway information. In: Proceedings of the First OWL Experiences and Directions Workshop.
CEUR Workshop Proceedings, vol. 188 (http://ceur-ws.org/). CEUR (2005)
22. Sidhu, A., Dillon, T., Chang, E., Sidhu, B.S.: Protein ontology development using OWL.
In: Proceedings of the First OWL Experiences and Directions Workshop. CEUR Workshop
Proceedings, vol. 188 (http://ceur-ws.org/). CEUR (2005)
23. Sirin, E., Parsia, B., Cuenca Grau, B., Kalyanpur, A., Katz, Y.: Pellet: a practical OWL-DL
reasoner. J. Web Semant. 5(2), 51–53 (2007)
24. Soergel, D., Lauser, B., Liang, A., Fisseha, F., Keizer, J., Katz, S.: Reengineering thesauri for
new applications: the AGROVOC example. J. Digital Inform. 4(4) (2004)
25. Spackman, K.: Managing clinical terminology hierarchies using algorithmic calculation of
subsumption: experience with SNOMED-RT. J. Am. Med. Informat. Assoc. (2000) Fall
Symposium Special Issue
26. Stevens, R., Baker, P., Bechhofer, S., Ng, G., Jacoby, A., Paton, N.W., Goble, C.A., Brass,
A.: Tambis: transparent access to multiple bioinformatics information sources. Bioinformatics
16(2), 184–186 (2000)
27. SWEET: Semantic web for earth and environmental terminology (SWEET). Jet Propulsion
Laboratory, California Institute of Technology (2006), http://sweet.jpl.nasa.gov/
28. Tsarkov, D., Horrocks, I.: FaCT++ description logic reasoner: system description. In: Pro-
ceedings of the International Joint Conference on Automated Reasoning (IJCAR 2006).
Lecture Notes in Artificial Intelligence, vol. 4130, pp. 292–297. Springer, Berlin (2006)
29. Woods, W.A.: What’s in a link: foundations for semantic networks. In: Brachman, R.J.,
Levesque, H.J. (eds.) Readings in Knowledge Representation, pp. 217–241. Morgan Kaufmann
Publishers, San Francisco, California (1985); Previously published In: Bobrow, D.G., Collins,
A.M. (eds.) Representation and Understanding: Studies in Cognitive Science, pp. 35–82.
Academic, New York (1975)
Taxonomic Change as a Reflection of Progress
in a Scientific Discipline
Abstract The terminology that is used to index the world’s biomedical literature
has its roots in an indexing system developed in the late nineteenth century. In the
early 1960s that terminology, now known as the Medical Subject Headings (MeSH),
was updated and reframed for information retrieval purposes in the context of
MEDLINE, the US National Library of Medicine’s bibliographic retrieval system.
MeSH is a semantic system in its own right and responds, just as all such systems
do, to both internal and external forces. We conducted a study of the evolution of
MeSH over the last 45 years. We hypothesized, and our analyses confirmed, that
some changes reflect internal considerations, such as developing a more principled
ontological structure, and others reflect important external forces, most notably
the development of biomedical knowledge itself. Our work has implications for
research in ontology evolution and for research that is concerned with the conceptual
modeling and evolution of a knowledge domain.
1 Introduction
The terminology that is used to index the world’s biomedical literature has its roots
in an indexing system developed in the late nineteenth century [1]. In the early 1960s
that terminology, known as the Medical Subject Headings (MeSH), was updated and
reframed for information retrieval purposes in the context of MEDLINE, the US
National Library of Medicine’s bibliographic retrieval system. MeSH began as a
loose collection of indexing terms and has over the years grown to become a highly
structured arrangement of tens of thousands of terms that are used to index hundreds
of thousands of articles per year. MeSH is technically a thesaurus [2], though it is
also increasingly referred to as a biomedical ontology [3–5]. For the purposes of the

1.1 MeSH

MeSH is the controlled terminology designed in the 1960s by the National Library
of Medicine (NLM) for the purpose of indexing and retrieving the biomedical
literature. The scope of MeSH is very broad (all of biomedicine) and, according
to Nelson et al., the goal of the terminology is “to provide a reproducible partition of
concepts relevant to biomedicine for purposes of organization of medical knowledge
and information” [8].
MeSH currently contains over 26,000 descriptors, arranged in a hierarchical
structure of 16 top-level categories with up to 11 levels of embedding.
MeSH top-level categories include specific topics such as anatomy, organisms,
diseases, drugs, and procedures, as well as categories that cover large disciplinary
areas, such as anthropology, education, and sociology; technology, industry, and
agriculture; humanities; and information science. There are also some 180,000
“entry terms,” which are synonyms or near-synonyms. For example, “Antisocial
Personality Disorder” has five entry terms: one is a lexical variant, “Personality
Disorder, Antisocial,” and the others are closely related terms, “Antisocial
Personality,” “Dyssocial Behavior,” “Psychopathic Personality,” and “Sociopathic
Personality.” The entry terms share the same unique identifier (DUID) with the
MeSH descriptor so that searching with an entry term gives the same search results
as searching with the preferred descriptor name.
Each MeSH descriptor also has one or more tree numbers, representing its place
in the MeSH hierarchy. Any given descriptor can appear in multiple hierarchies,
1.2 MEDLINE
Fig. 1 Sample MEDLINE citation record. MeSH terms with an asterisk are major topics of the
article
cases, when this happens, the older term becomes an entry term to the newer one in
MeSH, thus ensuring that if users search under the older term, they will still retrieve
the relevant articles.
Both the annual MeSH and MEDLINE files are freely available for download in
XML format [12].
In the context of recent research on the Semantic Web, there has been renewed
interest in ontology evolution, with a focus on versioning and change management
[22–26]. It has become clear that ontologies are not static artifacts; rather, just
like the knowledge they encapsulate, they necessarily evolve. The approach
to ontology evolution in these cases is to develop methods for recognizing both
terminological change and structural change with the explicit goal of ensuring con-
tinued (inter)operability. In some cases, tools have been developed for revealing the
differences between various versions of an ontology such that ontology developers
are able to manage modifications as the ontology grows.
While insights from ontology versioning methods are pertinent to the work
we describe here, our specific techniques for assessing the evolution of MeSH
are closely related to research on ontology alignment and mapping [27–29].¹
Euzenat and Shvaiko propose several levels (and associated techniques) at which
the ontology mapping process may occur [27]. Following the work of Kang and
Naughton [30], they make a high-level distinction between element-level analysis,
on the one hand, encompassing a variety of string, name-based, and linguistic
techniques for assessing changes in terminology, and structure-level analysis, on the
other hand, including taxonomic and topological similarity measures for assessing
changes in overall structure.
2.1 Materials
¹ This is not remarkable if we view each of our ten slices of MeSH as separate (though, of course,
closely related) ontologies which are then aligned with each other across the years.
² Notably, in 2009 the printed MeSH was discontinued.
194 A.T. McCray and K. Lee
2.2 Methods
“Behavior and Behavior Mechanisms” and is, thus, assigned the normalized tree
number “6.1.2.” This tree number assignment and normalization preserves the order
within the same year and makes the comparison of the node position of a given term
across different years possible.
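The normalization scheme described above can be sketched in a few lines of Python. This is a reconstruction under stated assumptions, not the authors' code: the category letter is mapped to its position in an assumed top-level letter sequence (with F as the 6th category, as in the text), and every other tree-number component is mapped to its rank among the sorted siblings present in a given year's file.

```python
from collections import defaultdict

# Assumed order of the 16 MeSH top-level category letters (F = 6th, as in the text).
CATEGORIES = "ABCDEFGHIJKLMNVZ"

def normalize(tree_numbers):
    """Map alphanumeric MeSH tree numbers (e.g. 'F03.900') to normalized
    ordinal tree numbers: the category letter becomes its position among
    the top-level categories, and each further component becomes its rank
    among the sorted siblings present in the input."""
    children = defaultdict(set)
    paths = {}
    for tn in tree_numbers:
        comps = [tn[0]] + tn[1:].split(".")  # 'F03.900' -> ['F', '03', '900']
        paths[tn] = comps
        for i in range(2, len(comps) + 1):
            children[tuple(comps[:i - 1])].add(tuple(comps[:i]))
    rank = {}
    for kids in children.values():
        for i, kid in enumerate(sorted(kids), 1):
            rank[kid] = i
    return {
        tn: ".".join([str(CATEGORIES.index(comps[0]) + 1)]
                     + [str(rank[tuple(comps[:i])]) for i in range(2, len(comps) + 1)])
        for tn, comps in paths.items()
    }
```

On a toy input the ranks reflect only the listed siblings; "F03.900" would normalize to "6.3.16" only when all sixteen children of F03 present in the real 2008 file are supplied.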
Term-Level Changes To be able to study and explore the structural
and terminological changes in MeSH over time, we built an interactive web
application, which we call the MEVO (MeSH Evolution) Browser. MEVO has three
distinct views: a tree structure view, a year-by-year view, and a term view, grouped
by DUID. These three views are fully hyperlinked with each other so that traversing
from one view to another can be done easily. Figure 4 shows a MEVO screenshot.
The screenshot shows the evolution of the descriptor, DUID D019966, from 1963
to 2008. In 1963 only “Drug Addiction” appeared in MeSH. In 1968 “Drug Abuse”
was added, and in 1973 “Glue Sniffing” joined this group of co-existing MeSH
descriptors. By 1978, “Drug Addiction” had been dropped, and “Drug Dependence”
was introduced. From 1983 to 1993 a different set of four descriptors co-existed
and all of the earlier ones were dropped. In 2008 the official name of DUID
D019966 is “Substance-Related Disorders” and each of the other eight historical
terms has become an entry term. For any given year the terms listed in Fig. 4 had
their own DUIDs. For example, “Organic Mental Disorders, Substance-Induced,”
“Substance Abuse,” “Substance Dependence,” and “Substance Use Disorders”
co-existed in 1993 with four different DUIDs. By 1998 they had all been merged with
“Substance-Related Disorders,” and they now share one DUID. DUIDs are never
reused, but they may be deleted.
Just below the term display in Fig. 4, a portion of the fully navigable tree structure
is shown, with a focus here on the term “Drug Addiction.” Scrolling down the
display reveals the tree structures for each of the years when “Drug Addiction”
existed as a MeSH descriptor (1963–1973). MEVO is fully interactive. Clicking
on any term gives a view that focuses specifically on that term and its historical
evolution. Clicking on a year displays the tree structures, terms, and additions and
deletions for that year and is an entry point for any of the other years.
Global Changes
Terminology When comparing MeSH terms across multiple years we did so at the
string, lexical, and descriptor levels. That is, our comparisons looked, respectively,
for the absence or presence of
1. The same string of characters
2. The same term, but where the actual strings might be different because of lexical
variation³
3. Different terms, but where the terms have the same MeSH descriptor DUID⁴
The results at each of these different levels can be quite different. For example,
in 1978 a MeSH descriptor was expressed as “Psychiatry, Community,” and by
1983 it had become “Community Psychiatry.” So, while the strings are different,
they vary only by word order. We would consider this neither a deletion nor
an addition at the lexical level. At the string level, however, we would say the
1978 string had been deleted and the 1983 string had been added. Similarly, in 1973
a MeSH descriptor was expressed as “Mongolism,” and by 1978 it had become
“Down’s Syndrome.” These are completely different strings and terms, and thus
at both the string and lexical levels they would be counted as a deletion and
an addition. However, at the descriptor level, because they share the same DUID
they are counted as equivalent.
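The three comparison levels can be illustrated with a small sketch. The `toy_norm` function below is only a crude stand-in for the UMLS "norm" program (which also handles inflection and much richer variation), and the DUIDs in the example are hypothetical placeholders:

```python
import re

def toy_norm(term):
    """Crude stand-in for the UMLS 'norm' lexical normalizer:
    lowercase, strip punctuation, sort the words."""
    return " ".join(sorted(re.findall(r"[a-z0-9']+", term.lower())))

def deletions(old, new):
    """old, new: dicts of descriptor string -> DUID for two years.
    Returns apparent deletions at the string, lexical, and descriptor levels."""
    new_norms = {toy_norm(t) for t in new}
    new_duids = set(new.values())
    string_level = {t for t in old if t not in new}
    lexical_level = {t for t in old if toy_norm(t) not in new_norms}
    descriptor_level = {t for t, d in old.items() if d not in new_duids}
    return string_level, lexical_level, descriptor_level

# Hypothetical DUIDs, for illustration only.
old = {"Psychiatry, Community": "D-0001", "Mongolism": "D-0002"}
new = {"Community Psychiatry": "D-0001", "Down's Syndrome": "D-0002"}
s, l, d = deletions(old, new)
# Both old strings disappear at the string level; only "Mongolism" is a
# lexical-level deletion ("Psychiatry, Community" survives as a word-order
# variant); there are no descriptor-level deletions, since both DUIDs persist.
```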
Structure From one year to another across our ten time slices, we compared
descriptor additions and deletions, changes in depth and breadth of hierarchies,
and movement of descriptors within hierarchies. We distinguished true deletions,
i.e., when a string is entirely deleted from MeSH, from those cases when a string
is no longer a descriptor in its own right, but rather has become an entry term to
a different descriptor. We introduced the notion of “demotion” to account for these
latter cases. To measure structural change, we used our normalized tree numbers and
adopted Resnik’s revision [34] of Wu–Palmer’s similarity measure [35]. Similarity
s between terms c and c′ is defined as follows:

    s(c, c′) = 2 · p(c ∧ c′) / (p(c) + p(c′)) ,    (1)

where p(c) is the number of edges from the root node to c, and c ∧ c′ is the most
specific common ancestor of c and c′. Similarity s becomes 1 when c and c′ are the
same (i.e., when there has been no movement) and approaches 1 when there is a minor
movement, for example, from one deeply embedded node to another where the
³ We used the UMLS lexical programs, and specifically the “norm” program, to evaluate lexical
variation [31, 32].
⁴ These distinctions are similar, but not identical, to the sui (string), lui (lexical variant), and cui
(concept) distinctions in the Unified Medical Language System [33]. The DUID often functions
like a concept, where its entry terms are synonymous with each other, but just as often the DUID
includes a combination of synonymous entry terms and terms that are only closely related to each
other.
nodes share a proximal ancestor. Similarity approaches zero when both c and c 0
are deeply embedded and their common ancestor is close to the root. For example,
if a term were to move from being a deeply embedded descendant of “Behavior
and Behavior Mechanisms” (which lies close to the root) in 1998 to become a
descendant of Mental Disorders (which also lies close to the root) in 2003, this
would signal a significant change in interpretation of that term. Because distance
and similarity are inversely related to each other (the greater the distance the less
similar the terms are), we defined the distance d between terms c and c′ as follows:

    d(c, c′) = 1 − s(c, c′) .    (2)

Distance d becomes zero when c and c′ are the same, and approaches 1 when c
and c′ are further apart.
An example illustrates this. “Peer Review” has the same MeSH tree number in 1998
and 2003, but two different normalized tree numbers, “6.1.12.2.4” and “6.1.13.3.5,”
respectively. This means that new terms were added in 2003 at the grandparent and
parent levels as well as at the same level. (Recall that each dot represents one level
of embedding.) The distance between the position of “Peer Review” in 1998 and its
new position in 2003 is calculated as follows:

    d = 1 − 2 · p(6.1) / (p(6.1.12.2.4) + p(6.1.13.3.5)) = 1 − 2/8 = 0.75 .    (3)
When a term is added or deleted from one time period to the next, we define
distance d as follows:

    d(c) = 1/p(c) · w ,    (4)

where p(c) is the number of edges from the root node to c and w is the weight
factor.
We assigned a weight factor of 3 for both addition and deletion, and a weight
factor of 2 for demotion. We gave greater weight to additions and deletions because
they have a greater impact on the overall vocabulary. For example, “Substance-Related
Disorders” was added in 1998 as F03.900 (6.3.16), where the distance is 1/2 · 3 = 1.5.
Demotions, on the other hand, reflect an internal change where a descriptor
is not fully removed, but, rather, is incorporated within another existing descriptor
as an entry term. For example, “Dementia, Presenile” and “Alzheimer’s Disease”
co-existed as descriptors until 1993, and then became entry terms of “Alzheimer’s
Disease” in 1998. The distance for “Dementia, Presenile” (normalized tree number
6.3.5.4.1.4.2) between 1993 and 1998 is calculated as follows (with the demotion
weight of 2):
    d = 1/p(6.3.5.4.1.4.2) · 2 = 2/6 = 0.33 .    (5)
Finally, when there is more than one tree number for a given term in the
same year, we calculated all possible pairs of those tree numbers across years and
(arbitrarily) chose the minimum distance. The overall structural change between two
years is the sum of all distances for each individual term. To compare this change
with other year pairs, we divided the sum of all distances for each term across the
two years by the total number of unique terms that existed in these two years.
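The distance measures described above can be sketched directly from the normalized tree numbers. The sketch below is our reconstruction, assuming p(c) is counted as the number of dots in a normalized tree number (components minus one), which reproduces the worked 1.5 and 0.33 examples in the text:

```python
def depth(tn):
    """p(c): edges below the section root, i.e. components minus one,
    so depth('6.3.16') = 2."""
    return tn.count(".")

def ancestor_depth(a, b):
    """Depth of the most specific common ancestor of two normalized numbers."""
    n = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        n += 1
    return n - 1

def movement_distance(a, b):
    """Distance for a term that moved: 1 minus the Wu-Palmer/Resnik similarity."""
    return 1 - 2 * ancestor_depth(a, b) / (depth(a) + depth(b))

def weighted_distance(tn, w):
    """Distance for additions/deletions (w = 3) and demotions (w = 2)."""
    return w / depth(tn)

def yearly_change(per_term_distances, n_unique_terms):
    """Overall structural change between two years: the sum of per-term
    distances (taking the minimum over tree-number pairs for multi-treed
    terms), divided by the number of unique terms in the two years."""
    return sum(per_term_distances) / n_unique_terms
```

Here `weighted_distance("6.3.16", 3)` reproduces the 1.5 of the addition example, and `weighted_distance("6.3.5.4.1.4.2", 2)` the 0.33 of the demotion example.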
Literature We used the Entrez programming utilities [36] to retrieve all of the
articles indexed with descriptors from the F section of MeSH in each of our 10 time
periods and also the full set of articles published during each of those same periods.
So as not to overestimate the relative growth of the psychiatry and psychology
literature over time, we normalized the growth of articles by dividing the number of
articles indexed by descriptors from the F section of MeSH by the total number of
articles published during any given time period.
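This normalization is a simple per-slice ratio; a minimal sketch with hypothetical citation counts:

```python
def normalized_counts(f_indexed, total):
    """Fraction of all MEDLINE citations in each period that are indexed
    with F-section descriptors, so that overall database growth does not
    masquerade as growth of the psychiatry/psychology literature."""
    return {year: f_indexed[year] / total[year] for year in f_indexed}

# Hypothetical counts, for illustration only.
share = normalized_counts({1968: 50, 1973: 120}, {1968: 1000, 1973: 2000})
```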
3 Results
Using the MEVO browser, we explored changes in individual MeSH terms or groups
of MeSH terms over time. Tables 2 and 3, for example, show the evolution of term-
level changes for two interesting cases, autism spectrum disorders (ASD) and human
sexuality.
Fig. 5 Total number of additions and deletions of psychiatry and psychology terms by year
Growth in Literature Table 5 shows the growth in the literature over the years.
The table shows the increase in the number of MEDLINE citations indexed
by MeSH F section terms normalized by the overall increase in citations over the
years. Row c in the table indicates a burst of research in psychiatry and psychology
between 1968 and 1973 and then again between 1988 and 1993.
4 Discussion
Our study of the psychiatric and psychologic terms in MeSH revealed a large
number of both structural and terminological changes in the last 45 years. We
found that some changes reflect internal considerations, such as developing a more
principled ontological structure. This was particularly evident in the early years
when the vocabulary was beginning to take shape. The 1963 version had very little
explicit structure; by 1973 three top-level categories had been introduced; and by
1978 a fully formed hierarchical structure with tree numbers had been established.
In the early years there was one top-level F node, “psychiatry and psychology.”
Table 6 shows that by 1973, there were three top-level nodes, by 1978 there were
four, and in the last decade there have been four slightly different top-level nodes.
The changes in the top-level categories between 1993 and 1998 represent a clear
ontological “clean-up,” the result of which was that behaviors were clearly separated
from disorders. In addition, between 1993 and 1998 a large number of terms were
deleted from the F section. (See Fig. 5 above.) In many cases, as has already been
noted, the deleted term did not disappear from MeSH entirely, but rather, it survived
as an entry term. In some other cases, the deleted term was moved to a different
section of MeSH, e.g., “decision theory,” which had been a child of “psychological
techniques,” was moved outside of F to two different MeSH categories (H and L,
where it appears under Mathematical Computing). This drop in the number of terms
in 1998 is another indicator of a thorough review and revision of the psychiatry and
psychology section in this time period.
The impact of these changes, and others at more granular levels, is not to
be underestimated. Search algorithms depend on the structure of MeSH when
performing queries, and an ill-formed structure will lead to incomplete results at
best. In the past, (expert) users were more aware of the structure of MeSH when
formulating queries, and could, perhaps, better control the search results. Today, the
Fig. 6 Changes in MeSH and the literature over the years, with respect to Psychiatry and
Psychology
PubMed search engine transforms every query, among other things, into a search of
the query term together with all of its descendants.
It is the thesis of this paper that the evolution of biomedical terminology reflects
the development of biomedical knowledge itself. Figure 6 provides evidence that
this is indeed the case.
The figure shows the relationship between the (normalized) growth of psychiatry
and psychology articles since 1963 and the degree of change in the F section of
MeSH over that same time period. Notice that the peak in the literature in the 1968–
Taxonomic Change as a Reflection of Progress in a Scientific Discipline 203
1973 time frame is reflected in extensive changes to MeSH in the subsequent time
period 1973–1978. Because we sampled every 5 years, this is only an approximation
of when the actual changes to MeSH were made. Nonetheless, the figure indicates
that a burst in activity in the literature is soon followed by major activity in the
indexing vocabulary itself, thereby reflecting progress in the discipline. Studying
this activity in greater detail by looking at the changes in individual terms or clusters
of closely related terms reveals a good deal about the state of knowledge itself.
Several examples will illustrate this point.
Autism is one of a spectrum of disorders that encompass a range of highly
heritable, complex childhood neurodevelopmental disorders, usually diagnosed
before the age of three in children from all socioeconomic, ethnic, and cultural
backgrounds [37, 38]. ASDs are four times more likely to occur in boys than in girls,
and family studies show a strong, though not yet well-defined, genetic contribution
to this condition.
The 2008 MeSH recognizes four pervasive child development disorders:
“Asperger syndrome,” “autistic disorder,” “Rett syndrome,” and “childhood
schizophrenia,” although this latter term is said to be “an obsolete concept,
historically used for childhood mental disorders thought to be a form of
schizophrenia.” ASDs last throughout a lifetime and are often associated with
other comorbid conditions. Table 2 above shows that there has been a growth in the
number of terms used to describe the phenomenon and shows which terms coexisted
with each other in any given time period. Figure 7 shows the growth of the ASD
literature over the years.
The numbers on the graph indicate the number of ASD citations in each time
period and the superimposed line normalizes the number according to the overall
number of citations in all of MEDLINE during that same time period. Until 1973,
there were only two MeSH descriptors to describe the ASD literature. (See Table 2
above.) By 1993 there were five distinct descriptors, but Asperger syndrome was
not yet among them. What this tells us is that the descriptor “Autism” actually
had a rather different (more diffuse) meaning in 1963 when it only shared the
Fig. 8 Placement of the descriptor “Homosexuality” in the MeSH hierarchy over time
ASD conceptual space with “childhood schizophrenia” from the meaning it had in
1993 when there were several other concepts in the same space. That is, “infantile
autism,” “Rett syndrome,” and “pervasive child development disorders” were now
also available for indexing. (Interestingly, a number of articles during these various
time periods are concerned specifically with differentiating ASD disorders from
each other [39, 40].) Any study that looks at the trends in ASD research would do
well to recognize these shifts in meaning (and knowledge) over time.
The changes affecting the descriptor “Homosexuality” serve as another interesting
example of how the MeSH terminology reflects a shift in our understanding of a
particular phenomenon. Figure 8 shows the placement of the
descriptor “Homosexuality” in the MeSH hierarchy during each of our time periods.
Notice that in the early years “Homosexuality” is a child of “Sex Deviation” in
the hierarchy and appears in the “Mental Disorders” subtree. By 1978 “Homosex-
uality” appears exclusively under the “Behavior” subtree. The significant change
in the structural placement of this descriptor can be directly tied to a statement
published by the American Psychiatric Association (APA) in late 1973. In this
policy document, which resulted in the removal of the term “homosexuality”
from the Diagnostic and Statistical Manual of Mental Disorders [41], the APA
5 Conclusion
Our study of the evolution of the MeSH terminology since its early beginnings in
the 1960s to the present day revealed a wide range of interesting terminological
and structural changes. Because early versions of MeSH are only available in print
form, we reconstructed the data from the printed books, and then only for one
section of MeSH, sampled every five years. For the results of our study to be
broadly applicable, all earlier versions of MeSH would need to be made available
in digital form. With these data in hand, more sophisticated and nuanced studies of
the evolution of biomedical knowledge would be possible. The MeSH terminology
provides a rich window into biomedical research, very clearly showing the evolution
in our understanding of biomedical phenomena. The work described here can
be seen as a contribution both to research that is concerned with understanding
the conceptual structure and evolution of a knowledge domain and to research
whose goal it is to develop optimal methodologies for understanding and tracking
ontological change.
Acknowledgements We thank the MeSH staff at the National Library of Medicine for providing
us electronic versions of MeSH tree structure files. We thank Malika McCray for data entry and
quality control. This work was supported in part by Harvard Catalyst—The Harvard Clinical and
Translational Science Center (NIH Grant #1 UL1 RR 025758-01 and financial contributions from
Harvard University and participating academic health care centers).
References
1. Library of the Surgeon-General’s Office (U.S.): Index-Catalogue of the Library of the Surgeon-
General’s Office. Government Printing Office, Washington (1880–1961)
2. Guidelines for the construction, format, and management of monolingual thesauri. ANSI/NISO
Z39.19-1993
3. Cote, R.G., Jones, P., Apweiler, R., Hermjakob, H.: The Ontology lookup service, a lightweight
cross-platform tool for controlled vocabulary queries. BMC Bioinformat. 7, 97 (2006)
4. Lambrix, P., Tan, H., Jakoniene, V., Strömbäck, L.: Biological ontologies. Semantic Web,
pp. 85–99. Springer, Berlin (2007)
5. Reed, S., Lenat, D.: Mapping ontologies into cyc. In: Proceedings of the AAAI Conference
2002 Workshop on Ontologies for the Semantic Web. AAAI Press (2002)
6. Gruber, T.: Toward principles for the design of ontologies used for knowledge sharing. Int. J.
Human Comput. Stud. 43, 907–928 (1995)
7. McCray, A.T., Nelson, S.J.: The representation of meaning in the UMLS. Methods Inf. Med.
34, 193–201 (1995)
8. Nelson, S.J., Johnston, D., Humphreys, B.L.: Relationships in medical subject headings. In:
Bean, C.A., Green, R. (eds.) Relationships in the Organization of Knowledge, pp. 171–184.
Kluwer, New York (2001)
9. MeSH Vocabulary Changes. National Library of Medicine. http://www.nlm.nih.gov/mesh/2009/introduction/intro_voc_change.html. Accessed 28 Feb 2013
10. MEDLINE/PubMed Data maintenance overview. U.S. National Library of Medicine. http://www.nlm.nih.gov/bsd/licensee/medline_maintenance.html. Accessed 28 Feb 2013
11. Medical subject headings: citation maintenance tasks in XML format. U.S. National Library
of Medicine. http://www.nlm.nih.gov/mesh/gcmdoc2008.html. Accessed 28 Feb 2013
12. MeSH Memorandum of Understanding. U.S. National Library of Medicine. http://www.nlm.nih.gov/mesh/2009/download/termscon.html. Accessed 28 Feb 2013
13. Chen, C., Paul, R.: Visualizing a knowledge domain’s intellectual structure. Computer 34, 65–
71 (2001)
14. Chen, C.: Searching for intellectual turning points: progressive knowledge domain visualiza-
tion. Proc. Natl. Acad. Sci. USA 101(Suppl 1), 5303–5310 (2004)
15. Chen, T., Lee, M.: Revealing themes and trends in the knowledge domain’s intellectual
structure. Lect. Notes Comput. Sci. 4303, 99–107 (2006)
16. Kleinberg, J.: Temporal dynamics of on-line information streams. Data Stream Management:
Processing High-Speed Data Streams. Springer, Berlin (2005)
17. Kontostathis, A., Galitsky, L.M., Pottenger, W.M., Roy, S.: A survey of emerging trend detec-
tion in textual data mining. Survey of Text Mining: Clustering. Springer (ISBN 0387955631)
(2003)
18. Mane, K.K., Börner, K.: Mapping topics and topic bursts in PNAS. Proc. Natl. Acad. Sci. USA
101(Suppl 1), 5287–5290 (2004)
19. Roy, S., Gevry, D., Pottenger, W.M.: Methodologies for trend detection in textual data mining.
In: Proceedings of the Textmine, vol. 2, pp. 1–12 (2002)
20. Shiffrin, R.M., Börner, K.: Mapping knowledge domains. Proc. Natl. Acad. Sci. USA
101(Suppl 1), 5183–5185 (2004)
21. White, H.D., Lin, X., Buzydlowski, J.W., Chen, C.: User-controlled mapping of significant
literatures. Proc. Natl. Acad. Sci. USA 101(Suppl 1), 5297–5302 (2004)
22. Noy, N., Kunnatur, S., Klein, M., Musen, M.: Tracking changes during ontology evolution.
Lecture Notes in Computer Science, pp. 259–273. Springer, Heidelberg (2004). http://link.
springer.com/chapter/10.1007%2F978-3-540-30475-3 19
23. Klein, M., Fensel, D.: Ontology versioning on the Semantic Web. Proceedings of the SWWS,
pp. 75–91. Stanford University (2001)
24. Noy, N.F., Musen, M.A.: PROMPTDiff: a fixed point algorithm for comparing ontology
versions. In: 18th National Conference on Artificial Intelligence. AAAI-2002. AAAI Press
(2002)
25. Klein, M., Fensel, D., Kiryakov, A., Ognyanov, D.: OntoView: comparing and versioning
ontologies. ISWC 2002, Sardinia, Italy, 2002
26. Voelkel, M., Groza, T.: SemVersion: an RDF-based ontology versioning system. In: Proceed-
ings of the IADIS international conference WWW/Internet. Stanford University (2006)
27. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, Berlin (2007)
28. Zhang, S., Bodenreider, O.: Experience in aligning anatomical ontologies. Int. J. Semant. Web
Inf. Syst. 3, 1–26 (2007)
29. Flouris, G., Plexousakis, D.: Handling ontology change: survey and proposal for a future
research direction. TR-362, FORTH-ICS (2005)
30. Kang, J., Naughton, J.: On schema matching with opaque column names and data values. In:
Proceedings of the 22nd International Conference on Management of Data (SIGMOD), pp.
205–216. ACM (2003). http://dl.acm.org/citation.cfm?doid=872757.872783
31. Specialist NLP Tools. http://specialist.nlm.nih.gov. Accessed 28 Feb 2013
32. McCray, A.T.: The nature of lexical knowledge. Methods Inf. Med. 37, 353–360 (1998)
33. Unified Medical Language System. http://www.nlm.nih.gov/pubs/factsheets/umls.html.
Accessed 28 Feb 2013
34. Resnik, P.: Semantic similarity in a taxonomy: an information-based measure and its applica-
tion to problems of ambiguity in natural language. J. Artif. Intell. Res. 11, 95–130 (1999)
35. Wu, Z., Palmer, M.: Verb semantics and lexical selection. In: Proceedings of the 32nd
Annual Meeting on Association for Computational Linguistics. Association for Computational
Linguistics, Las Cruces, New Mexico (1994)
36. Entrez Utilities. http://www.ncbi.nlm.nih.gov/books/NBK25500/. Accessed 28 Feb 2013
208 A.T. McCray and K. Lee
37. Folstein, S.E., Rosen-Sheidley, B.: Genetics of autism: complex aetiology for a heterogeneous
disorder. Nat. Rev. Genet. 2, 943–955 (2001)
38. Fombonne, E.: Epidemiological surveys of autism and other pervasive developmental disor-
ders: an update. J. Autism. Dev. Disord. 33, 365–382 (2003)
39. Dauner, I., Martin, M.: Autismus Asperger oder Fruehschizoprenie? Zur nosologischen
Abgrenzung beider Krankheitsbilder. Padiatr. Padol. 13, 31–38 (1978)
40. Howlin, P.: Outcome in high-functioning adults with autism with and without early language
delays: implications for the differentiation between autism and Asperger syndrome. J. Autism.
Dev. Disord. 33, 3–13 (2003)
41. Diagnostic and Statistical Manual of Mental Disorders, 4th edn. DSM-IV. American Psychi-
atric Association, Washington, DC (1994)
42. American Psychiatric Association: Homosexuality and Sexual Orientation Disturbance: Pro-
posed Change in DSM-II, 6th printing, p. 44. Position statement (1973)
Crystallizations as a Form of Scientific Semantic
Change: The Case of Thermodynamics
C. Ulises Moulines
1 Introduction
Scientific disciplines consist, among other things, of particular units usually called
“scientific theories”. These units are relatively well-defined conceptual systems
in the sense that they are built out of a particular set of specific concepts of
their own, which they apply in specific ways to a given portion of empirical reality—
that portion a particular theory is supposed to deal with. In any given scientific
theory, the semantics of its specific concepts and the methodology of its application
to an intended range of experience are essentially intertwined. Now, scientific
theories (and, especially, empirical scientific theories) are not static entities:1 A
full grasping of their nature has to be undertaken from a diachronic perspective.
Scientific theories, and the conceptual frameworks they are associated with, are
like organisms—it makes sense to say of them that they change. Since theories
are the stuff (or, more cautiously, one stuff) scientific disciplines are made of, major
changes in the theories that make up a discipline also imply major changes in the
discipline itself. An interesting task for the philosophy of science is to determine the
different types of change that may occur within a given discipline. Scientific theories
are in a perpetual state of change, and so are the disciplines containing them. This is
no surprise. Science is constantly in flux. However, a great number of these changes
have no deep epistemological and semantical significance, since they don’t involve a
conceptual change—or, at most, the theory’s concepts change extensionally but not
intensionally. Consider, for example, the quite usual case where a given parameter in
a given theory is determined with some more exactness than before (say, by adding
some decimals); or consider the case where a theory that has already been applied
successfully to a number of systems is subsequently applied to a new system that
looks very much like the previous ones. In such cases, there is certainly a change
in the theory and it may be quite important for the practice of science or for its
technological applications, but it is not one that we should take seriously as
a genuine conceptual change. Sure, the theory’s concepts change a bit, since their
extensions change a bit, but not so their intensions in the sense of their position in the
theory’s conceptual frame—the interrelationships between the theory’s fundamental
concepts and the laws they appear in remain the same. These are not changes that
should worry the philosophers of science much.
Consider a discipline D (say, physics, or a branch of physics) that consists of
the theories T1, T2, …, Tn. In a first step, we may consider two sorts of changes
in D that are significant from an epistemological and semantical point of view.
In one case, there may be a significant change in one of the theories constituting
D, say in Ti , although Ti remains “the same”—in a sense of “the same” that still
has to be explicated. In this case, strictly speaking, there is a conceptual change
in Ti but not in D as a whole, since D still consists of the same set T1, T2, …,
Tn . On the other hand, there may be more substantial changes in D. One type of
them appears to be the case where Ti is eliminated from D and completely replaced
by a Tk that is incompatible with the theory eliminated. Clearly, this is an important
change for D itself. A second possibility is that, while Ti is retained, a new theory Tk
appears to which Ti has a particular sort of subordinate relationship. Finally, we may
consider the case where Ti changes gradually until, after a more or less long process,
1 In the following, by "a (scientific) theory" I always refer only to empirical theories. Problems related to the nature of purely mathematical theories lie outside the scope of this paper.
Crystallizations as a Form of Scientific Semantic Change 211
it eventually results in a new Ti′, which has some kind of "family resemblance" with
the original Ti but is actually a different theory. In all three kinds of changes just
mentioned, the evolution of the discipline itself is accompanied by (or, rather,
conditioned by) more or less drastic conceptual changes in its theories.
All of this is still rather vague and metaphorical. The question is whether we
can make these intuitive distinctions between different kinds of changes formally
precise. I think we can, but only if we don’t conceive of theories as just being axiom
systems in the way classical philosophy of science did. We have to abandon the
statement view of theories and work instead with a non-statement or “modellistic”
view. Not the notion of a scientific statement but that of a scientific model is to be
regarded as the basic unit when dealing with the analysis of theories in general, and
more particularly with the different sorts of conceptual changes that show up in a
diachronic perspective on science. This is an insight that has gained momentum
in the recent development of philosophy of science among several authors and
schools. One such modellistic approach is what is known as the “structuralist
view of theories”, or “structuralism” for short, set forth by Joseph D. Sneed,
Wolfgang Stegmüller, Wolfgang Balzer and myself, among others. Here, I’m going
to use some of the methodological instruments of structuralism to deal with the
questions of conceptual change at stake, and in particular with the last kind of
conceptual evolution listed above. For those readers who are not well-acquainted
with structuralist methodology in general, a brief summary of the notions required
for our present purposes is laid out in the next section.2
The first thing to notice is that, according to structuralism, the term “theory” is
polysemic: Its ordinary use denotes in fact different kinds of entities, which differ
according to their levels of aggregation. For the present purposes, it suffices to
consider two of the formal concepts that structuralism has proposed as explications
of the intuitive notion of a theory: theory-elements and theory-nets. A theory-net
is a hierarchically organized array of theory-elements with a “common basis” and
a common conceptual framework. The “essential identity” of a theory-element is,
in turn, given by two constituents: a homogeneous class of models and a domain
of intended applications described by partial structures of the models (their relative
non-theoretical parts). (Besides these two constituents, according to structuralism, a
“normal” theory-element is also constituted by other, more complex model-theoretic
structures; nevertheless, for the purposes of the present paper, we may forget about
them.)
2 The standard exposition of the structuralist metatheory (with several examples of application) is [1].
Now according to the model-theoretic tradition going back to the works of Alfred
Tarski and Patrick Suppes, a model is a structure of the form

    ⟨D1, …, Dm; A1, …, An; R1, …, Rp; f1, …, fq⟩,    (1)

where the Di are the theory's basic domains (its "ontology"), the Ai are auxiliary
sets of mathematical entities (usually numbers or vectors), the Ri are relations on
(some of) the basic sets, and the fi are (metrical) functions on (some of) the basic
and auxiliary sets.
It is not very important how we determine these models—whether by thoroughly
formal means, or semi-formally, or in ordinary language. But it seems that the
most practical method is to define the class of a theory’s models by means of the
Suppesian method of defining a so-called set-theoretical predicate, i.e. a predicate
defined solely by means of the tools of naive or semi-formal set theory. (We’ll see
several examples of such predicates below.)
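A toy sketch in Python may make the Suppesian method concrete (the class, the sample axiom, and all names here are my own illustration, not taken from the text): a structure is a tuple of domains, auxiliary sets, relations and functions, and a set-theoretical predicate is simply a Boolean test over such structures.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# A toy structure <D; A; R; f>: one basic domain, one auxiliary set,
# one relation and one metrical function (i.e. m = n = p = q = 1).
@dataclass(frozen=True)
class Structure:
    D: FrozenSet[str]             # basic domain (the "ontology")
    A: FrozenSet[float]           # auxiliary set of mathematical entities
    R: FrozenSet[tuple]           # a relation on the basic domain
    f: Callable[[str], float]     # a metrical function D -> A

def is_model(x: Structure) -> bool:
    """Set-theoretical predicate: x is a model iff its components are
    well-typed and satisfy the (purely illustrative) axiom f(d) > 0."""
    well_typed = (all(x.f(d) in x.A for d in x.D)
                  and all(a in x.D and b in x.D for (a, b) in x.R))
    axiom = all(x.f(d) > 0 for d in x.D)
    return well_typed and axiom

m = Structure(D=frozenset({"a", "b"}),
              A=frozenset({1.0, 2.0}),
              R=frozenset({("a", "b")}),
              f=lambda d: 1.0 if d == "a" else 2.0)
print(is_model(m))   # True: m satisfies the predicate
```

The class M of the predicate's models is then just the collection of all structures for which `is_model` returns True.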
A homogeneous class of models, M, is a class of structures all satisfying the same
set-theoretical predicate. In empirical theories, a domain of intended applications,
I, is associated with this class of models: This is a class of substructures of some of
the elements of M, containing (almost) all the Di and Ai, but only some of the Ri
and fi —those that are non-theoretical with respect to the theory in question, i.e.
those whose values may be determined without presupposing the substantial laws
used to define the set-theoretical predicate which, in turn, determines M. The other
relations and functions, if any, are called “relatively theoretical”, or “T -theoretical”.
The domain of intended applications of a given theory represents the physical
systems to which the theory is supposed to apply, where it has to be tested, or in
other words, its empirical basis (in a relative sense).
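The substructure relation just described can be illustrated in the same sketchy spirit (the dict encoding and the sample magnitudes are mine): an intended application keeps the domains and the non-theoretical components of a model and drops the T-theoretical ones.

```python
def reduct(model: dict, theoretical: set) -> dict:
    """Return the partial substructure of `model` obtained by dropping
    every component listed as T-theoretical."""
    return {name: comp for name, comp in model.items()
            if name not in theoretical}

# A miniature model: P, V, t are non-theoretical magnitudes, while
# Q and W are theoretical relative to the theory in question.
model = {"K": {"k1"},            # bodies
         "P": {"k1": 1.0}, "V": {"k1": 2.0}, "t": {"k1": 300.0},
         "Q": {"k1": 5.0}, "W": {"k1": 12.5}}

intended_application = reduct(model, theoretical={"Q", "W"})
print(sorted(intended_application))   # ['K', 'P', 'V', 't']
```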
Though the distinction between T -theoretical and T -non-theoretical notions is
an important constituent of the structuralist metatheory, its application to concrete
case studies is usually a quite intricate issue, which involves a lengthy and careful
discussion. For lack of space, I’ll not go into this issue in the case study analysed in
this paper.
Thus, a theory-element is (in this highly simplified version of structuralist
methodology) an ordered pair ⟨M, I⟩.
A theory-net is a finite class of theory-elements ordered by a relation of
specialization and having a first element. This first element is what may be called
“the basic theory-element”, T0 (D hM0 ; T0 i). It is the one whose models are
determined only by the fundamental law(s) of the theory in question. Taken in
isolation, the fundamental law(s) in T0 usually have an empirically almost vacuous
nature—they are “guiding principles” rather than statements of fact. But for this
very reason, they constitute the theory’s core. All other theory-elements Ti of the
net come out of T0 by a process of specialization, i.e. by adding more special (and
more contentful) laws and conditions to the fundamental law(s), which frequently
involve the addition of new, more specific concepts, and by restricting the range
of intended applications to some specified systems. The relation of specialization
between theory-elements (let’s symbolize it by “”) may be constructed out of
Fig. 1 A theory-net
Kuhn called “normal science” and Imre Lakatos “a research program”. Example:
the evolution of Newtonian mechanics from the end of the seventeenth century
to the beginning of the twentieth century.3 In this kind of change, the “core
semantics” of the fundamental concepts appearing in the conceptual framework
of T0 is preserved, but there appear some minor semantic changes due to the
fact that the concepts get more concrete interpretations in different ways in
the theory-elements that are specializations of T0 . A further conceptual change
may possibly come from the appearance or disappearance of secondary concepts
(“parameters”) in the specializations that come and go.
– Replacement: A theory, i.e. a theory-net, aimed at systematizing a given domain
of intended applications is completely replaced by another theory-net (with a
very different conceptual framework) aimed at more or less the same domain
of intended applications. Formally, the older basic theory-element ⟨M0, I0⟩ gets
replaced by a new basic theory-element ⟨M*0, I*0⟩, where M0 and M*0 are very
different, while I0 and I*0 overlap to a substantial degree. This corresponds
to one sense of what Kuhn has called a “scientific revolution”. Example: the
replacement of phlogiston theory by the oxidation theory.4 This is the more
extreme case of conceptual change within a discipline, though (pace Kuhn) we
may not speak of total semantic incommensurability since the interpretation of
the concepts describing I0 and I*0 cannot become completely disparate (unless
one wants to claim that there is absolutely no way to compare the relative merits
and demerits of the respective theory-nets—a claim that Kuhn himself apparently
didn’t want to make and that would be historically implausible anyway).
– Embedding: The models of a previous theory-net are (approximatively, and
perhaps not completely) embedded into the models of a more complex theory.
“Embedding” is to be understood here as meaning that the models of the first
theory (at least a substantial part of them) become partial structures of the models
of the second theory. This roughly corresponds to another sense of Kuhnian
“revolutions”. Example: the (approximative) embedding of Keplerian planetary
theory into Newtonian mechanics.5
– Crystallization: After the breakdown of a previous theory-net, and through a long
and gradual process, the models of a new theory are constructed in a piecemeal
fashion, with many intermediate, fragmentary states before a fully developed new
theory-net appears. Such a process is neither “normal” nor “revolutionary” in
anything like Kuhn’s sense, and I know of no other philosophers of science who
have dealt with this kind of process (with the possible exception of Yehouda
Elkana [6]). But I contend that it is quite frequent in the history of science.
Crystallization is much more complex than the previously mentioned types of
diachronic structures; it contains some elements of the other types but it also
has its own features. The slow emergence of phenomenological thermodynamics
3 The detailed structuralist reconstruction of this example will be found in [1, Ch. V].
4 For a structuralist reconstruction of this example see [2].
5 This example is dealt with in [1, Ch. VII].
after the breakdown of the caloric theory seems to be a clear case in point. In
this process, the work of Clausius analysed in this paper played a crucial role
(though it was only a part of the whole process). In the crystallization of a new
theory, some of the old concepts are completely given up, but some others are re-
interpreted in a new fashion (according to new fundamental laws and in view of
new intended applications), and also some totally new concepts are introduced.
In the next sections, a portion of the crystallization of phenomenological thermody-
namics in the nineteenth century is reconstructed according to the general scheme
of evolving theory-nets. This portion of the historical process will be reconstructed
as three interrelated but essentially different theory-nets developed by Clausius, and
we’ll see how Clausius’s conceptual system gradually evolved.
The “paradigm” for the theory of heat from the end of the eighteenth century until
the mid-1820s was the caloric theory developed by Lavoisier, Laplace, Biot, and
others. This was a quite successful theory until it began to break down in the late
1820s due to "internal" (conceptual) as well as "external" (application-related)
“anomalies”. A new “paradigm” for the theory of heat, i.e. a full-fledged theory
which was fully satisfactory both from the conceptual-mathematical and from the
empirical point of view, was to appear only much later, in 1876, with J. Willard
Gibbs’s On the Equilibrium of Heterogeneous Substances (first part) [7]. There we
have for the first time what we now call “phenomenological thermodynamics” in
its presently acknowledged form, almost indistinguishable from what is laid out
in contemporary textbooks on thermodynamics. In the meantime, that is, between
the 1820s and the 1870s we are confronted with a long and chaotic period of
partial attempts at the construction of a fully satisfactory theory of heat, which were
partially successful but which also partially failed, with very different theoretical
concepts and principles. In this long process, the names of Carnot, Clapeyron,
Regnault, Mayer, Helmholtz, Joule, Kelvin, Rankine, and above all Clausius,
played an important role. These scientists were working from different standpoints
and obtaining different, sometimes incompatible results. The culmination of this
process was Gibbsian thermodynamics.6 Clausius’s work was an essential part of
this process, itself a fragmentary crystallization within the general crystallization
process. This fragment is what is to be reconstructed in the next pages.
In the 1850s, Clausius published three papers on the theory of heat: “Über die
bewegende Kraft der Wärme” (1850), “Über eine veränderte Form des zweiten
Hauptsatzes der mechanischen Wärmetheorie” (1854), and “Über die Art der
Bewegung, die wir Wärme nennen” (1857) [3–5]. The last paper amounts to the
6 For the history of the emergence of Gibbsian thermodynamics see [8].
first consistent exposition of the kinetic theory of gases, and in this sense, it is the
forerunner of statistical mechanics; as such, it is of no concern to us here, since we
are dealing with the development of (phenomenological) thermodynamics—which
is what Clausius, and others, then called the "mechanische Wärmetheorie"—the
“mechanical theory of heat”. So, let’s restrict our attention to the papers of 1850 and
1854. The 1850 essay [3] contains one theory, while the 1854 paper [4] contains
two theories, which presumably were conceived at different times. Each one of
these theories may consistently be represented as a theory-net in our sense. To
abbreviate, we’ll denote the 1850 theory(-net) by “Cl-1”, while the two theories (i.e.
theory-nets) laid out in the 1854 paper will be denoted by “Cl-2.1” and “Cl-2.2”,
respectively.
Clausius’s paper of 1850 [3] is divided into two parts, respectively, entitled: “I.
Consequences of the principle of the equivalence of heat and work”, and “II.
Consequences of Carnot’s principle in connection with the former”.7 Given the
form of their presentation, one could be misled into thinking that Clausius lays out
two different fundamental principles: the principle of the heat/work equivalence,
and the hypothesis that Clausius considers to be the core of the so-called “Carnot
principle”, namely that “whenever work produces heat while not producing a
permanent transformation in the state of the working body, a certain amount of
heat goes from the hot body to a cold one” [3, p. 4]. Now, this interpretation,
which is still to be found in contemporary expositions, seems to be misleading,
from the point of view both of Clausius's original text and of a coherent formal
reconstruction of it. In fact, Clausius himself stresses several times that both
principles are “intimately linked”. It is more adequate, as we shall see, to interpret
both “principles” as two specifications of a more general and fundamental principle
that Clausius takes implicitly as his starting point.
To find out what the models of Cl-1 look like, two preliminary questions have
to be dealt with: First, what is the theory’s “basic ontology”, that is, what kinds of
objects are assumed, and second, what relations and functions defined over those
objects does one need to formulate the theory’s laws.
1. Clausius speaks again and again of bodies, that is, of concrete, spatio-temporally
located entities that may be in physical contact and/or have component parts. In
some passages of his paper, he also has to speak about a more abstract kind of
entities: chemical substances to which the concrete bodies may belong; but this
he does only for some specializations of his theory. This means that his basic
7 Here, as in the rest of the paper, I translate as truthfully as I can from the original German text. The page numbers indicated in the quotations are those of the original essay.
8 One needs a continuous sequence of states as arguments of differentiable magnitudes. Clausius assumes throughout his text without further ado that most thermodynamic magnitudes are differentiable functions.
9 Clausius assumes that, according to Joule's experiments, the value of A should be approximately 1/421. But A's concrete value is irrelevant for the formal reconstruction of the theory.
10 A further interesting aspect of U should be noted: Clausius explicitly avows that he introduces this magnitude in order to formulate a differential equation that will allow for a derivation of the ideal gas law, U making sense only in the context of the equation at stake. In other words, U appears to be Cl-1-theoretical. The same goes for the function h introduced later on—see below.
11 Clausius writes at the end of his paper that experimental results suggest that the actual value of a should lie "around 273"; but, again, the concrete value is unimportant for the formal reconstruction.
Having settled the conceptual framework of Cl-1, the next step is to provide
an adequate reconstruction of the fundamental law(s) of the basic theory-element
of Cl-1. A general metatheoretical hypothesis of structuralism is that, in theories
having attained a certain degree of unity and structural complexity, there will be just
one fundamental law, playing the role of a guiding principle and having (almost) no
empirical content when taken in isolation. At first sight, Cl-1 seems to contravene
this hypothesis, since Clausius explicitly states two axioms: the Q/W-equivalence
and “Carnot’s Principle”. However, I’ll argue that appearances mislead us here and
that the metatheoretical just-one-fundamental-law hypothesis continues to be valid
in the present case.
Clausius formulates the Q/W-equivalence principle in the first part of his article
[3, p. 17]:
the heat consumed divided by the work produced = A.
As for the Carnot Principle, which is relevant for the second part of the essay,
Clausius states it in a purely informal way:
to the production of work as equivalent there corresponds a mere transfer of heat from a hot
body to a cold one [3, p. 30].
It is striking that, when Clausius goes on to formulate the Carnot Principle in the
second part of his article, he presupposes that there is an equivalence of work and
heat, that is, he presupposes the first principle. And, as already remarked, there are
other passages in the article where he emphasizes that both principles are “intimately
connected”. Therefore, we are confronted with two possible interpretations: (a)
the equivalence principle is the theory’s only fundamental law and the Carnot
Principle is a specialization thereof in the structuralist sense; or (b) both principles
constitute in fact two aspects of one and the same (implicit) guiding principle that
is to be formulated synthetically. Both interpretations are coherent with the rest of
the original text. Nevertheless, I favor the second one since it is easy to make it
explicit and, moreover, it seems to me to be more in accordance with Clausius’s
original intentions. Indeed, if we admit that Q and W are differentiable functions
(as Clausius himself presupposes all the time), then both principles can be stated
immediately for any body k in state z as:
Equivalence Principle:

    dQ(k, z)/dW(k, z) = A ∈ ℝ;    (2)

Carnot's Principle:

    dW(k, z) > 0 → dQ(k, z) < 0.    (3)
This is the fundamental law of Cl-1. It is easy to see that, as is usual in such kinds
of laws, this formula by itself is almost devoid of empirical content since Q and
W , taken in isolation, cannot be concretely determined (they are Cl-1-theoretical).
They have to be put in some kind of testable relationship with the other primitive
magnitudes of the theory: P, V, t. And, in fact, in his argumentation, Clausius
always presupposes that Q and W are dependent on at least these magnitudes.
Speaking in more formal terms, it may be said that Q and W are functionals of
P, V, t, that is, there are functions f^Q and f^W (which are functions of functions)
expressing the general form of the dependence of Q and W on P, V, t.
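In programming terms such a functional is a higher-order function: it takes the state functions P, V, t themselves as arguments and returns Q as a new function of the state. A minimal sketch (the concrete form of the dependence is purely illustrative, not Clausius's):

```python
from typing import Callable

StateFn = Callable[[str], float]   # a magnitude as a function of the state z

def f_Q(P: StateFn, V: StateFn, t: StateFn) -> StateFn:
    """A functional: maps the state functions P, V, t to a heat
    function Q. The particular formula below is only illustrative."""
    return lambda z: P(z) * V(z) + 0.5 * t(z)

P = lambda z: {"z0": 1.0, "z1": 2.0}[z]
V = lambda z: {"z0": 3.0, "z1": 1.5}[z]
t = lambda z: {"z0": 280.0, "z1": 300.0}[z]

Q = f_Q(P, V, t)     # Q is itself a function of the state
print(Q("z0"))       # 1.0*3.0 + 0.5*280.0 = 143.0
```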
The preceding considerations allow now for a formal definition of the models
of the basic theory-element of the net Cl-1. For greater perspicuity, let us omit the
explicit indication of the formal features of each single component of the models
(i.e. whether it is a finite or infinite set, a differentiable function, and so on) as well
as the arguments of the magnitudes (clearly the latter are always functions from
physical bodies and their states into real numbers), whenever there is no danger of
confusion.
Cl-1: x ∈ M0[Cl-1] iff there exist K, Z, P, V, t, Q, W, A, f^Q, f^W such that
(0) x = ⟨K, Z, ℝ, A, P, V, t, Q, W⟩
(1) Q = f^Q(P, V, t, …)
(2) W = f^W(P, V, t, …)
(3) A > 0
(4) dQ = A·dW
We could synthesize conditions (1)–(4) into one “big” formula:
Thus, the net’s basic theory-element is: hM0 [Cl-1]I I[CI-1]i, as defined.
12 It is not clear in the original text whether Clausius wanted to conceive A as a positive or a negative value. In the first case, of course, he should have worded his "Equivalence Principle" slightly differently. But this is a minor point.
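Conditions (0)–(4) can also be read as an executable membership test. The sketch below (the discretization of dQ and dW, the tolerance, and the sample data are my additions) checks condition (3) and a finite-difference version of condition (4) along a sequence of states:

```python
def in_M0_Cl1(states, Q, W, A, tol=1e-9):
    """Finite-difference check of the Cl-1 conditions: (3) A > 0 and
    (4) dQ = A * dW along a sequence of states; conditions (1)-(2)
    are taken to be encoded in how Q and W were computed."""
    if not A > 0:                       # condition (3)
        return False
    for z0, z1 in zip(states, states[1:]):
        dQ = Q[z1] - Q[z0]              # finite-difference heat increment
        dW = W[z1] - W[z0]              # finite-difference work increment
        if abs(dQ - A * dW) > tol:      # condition (4)
            return False
    return True

A = 1 / 421                             # Joule's approximate equivalent (cf. footnote 9)
states = ["z0", "z1", "z2"]
W = {"z0": 0.0, "z1": 842.0, "z2": 2105.0}
Q = {z: A * W[z] for z in states}       # heat consumed tracks work produced
print(in_M0_Cl1(states, Q, W, A))       # True
```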
    f^W = R·(dV dt)/V    (p. 15);    (6)

    f^Q = (d/dt (dQ/dV) - d/dV (dQ/dt))·dV dt    (p. 17).    (7)
As Clausius himself recognizes, the second specification is somewhat problematic
since the notion of a “heat differential” is itself physically and mathematically
dubious. However, by combining it with the less problematic determination of
f W , Clausius comes to the conclusion that one can establish a more plausible
differential equation if we postulate that there must exist an “arbitrary” function U
that allows for a mathematically correct formulation. (I have already remarked that
this “U ” is “our” internal energy.) Therefore, here we have an example of a brand-
new theoretical concept introduced for the purpose of giving the theory a coherent
conceptual framework; this new notion only makes sense if we already assume the
validity of the theory’s laws, and in particular the following specialization of the
fundamental law:
    dQ = dU + A·R·((a + t)/V)·dV.    (8)

This is the most general specialization for ideal gases. Using the additional primitive
This is the most general specialization for ideal gases. Using the additional primitive
concepts we have already indicated above (U, R, cV, cP, a, and the implicit notion
of the set of gaseous substances) we come to the following definition of the
models of this specialization:
G: x ∈ MG iff there exist K, Z, Σ, P, V, t, Q, W, U, R, cV, cP, A, a, f^Q, f^W (with Σ the set of gaseous substances) such that
(0) x = ⟨K, Z, Σ, ℝ, A, a, P, V, t, Q, W, U, R, cV, cP⟩
(1) ⟨K, Z, ℝ, A, P, V, t, Q, W⟩ ∈ M0[Cl-1]
(2) dQ = dU + A·R·((a + t)/V)·dV
For Clausius, the second term of the sum in (2) represents the “external work” done
by the gas. One obtains a specialization of this specialization by just assuming that
this work has simply the form "p dV". In such a case, by a simple calculation we obtain
the law of ideal gases. We may call this specialization of the specialization G "IG":
We are now in a position to build up the graph representing the net Cl-1 (Fig. 2),
where the arrows represent successive specializations.
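The shape of the net, a basic element Cl-1 with the successive specializations G (ideal gases) and IG (work of the form p dV), can be sketched as a tiny graph; the child-to-parent encoding is my own choice:

```python
# A theory-net: theory-elements plus a specialization relation,
# stored here as child -> parent edges ending in the basic element.
specializes = {"G": "Cl-1", "IG": "G"}          # IG specializes G specializes Cl-1

def specialization_chain(element):
    """Walk the specialization arrows up to the basic theory-element."""
    chain = [element]
    while chain[-1] in specializes:
        chain.append(specializes[chain[-1]])
    return chain

print(specialization_chain("IG"))   # ['IG', 'G', 'Cl-1']
```

Every chain ends in the same first element, which is exactly the "common basis" requirement on a theory-net.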
In his 1854 paper, Clausius [4] makes a new attempt at establishing phenomeno-
logical thermodynamics on a well-founded basis. This attempt is more restricted
than Cl-1: It almost exclusively deals with cycles (“Kreisprozesse”). He defines
a cycle as a “transformation” (“Verwandlung”) through which a body comes
back to its original state. Clearly, this new concept is inspired by Carnot’s (and
Clapeyron’s) pioneering theoretical work on the steam engine. However, it is only
in Clausius’s second theory that it appears as an “official” scientific notion of the
new discipline of thermodynamics. Another important difference with respect to the
first article is that Clausius presents now the “two principles of thermodynamics”
in a completely independent manner, as if they corresponded to two different
theories. The article is divided into two sharply separated parts with quite different
titles, the second part being much longer and better articulated. Even by the style of
exposition, one gets the impression that the two parts were written at two different
periods of the development of Clausius’s thought. The two principles now get the
wording:
I. “Principle of the equivalence of heat and work”
II. “Principle of the equivalence of transformations”.
Formally, there is no other way but to reconstruct the two parts as two different
theory-nets, which we will call “Cl-2.1” and “Cl-2.2”, respectively. Let’s analyse
first Cl-2.1. Unlike in the 1850 article, Clausius now states the heat/work
equivalence principle by immediately introducing the “anonymous” function U as
a basic concept, and he makes the following symptomatic remark about it: “U [is
a magnitude] that we cannot specify at present, but of which we know at least that
it is completely determined by the initial and the final state of the body” (p. 130).
Consequently, he formulates the fundamental law of Cl-2.1 as follows:
Q = U + A·W    (13)
224 C.U. Moulines
Comment: The symbol “W” employed here by Clausius no longer refers to the
“total work”, as in Cl-1, but only to the “external work”.
The first specialization considered refers to cycles. For such cases, Clausius
writes: U = 0. Presumably, what he wants to express is what we would now write as

∮ dU = 0    (14)

together with the fundamental law in differential form,

dQ = dU + A·p·dV.    (15)
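Combining (14) with (15) over a full cycle makes the role of U explicit: the internal-energy term integrates to zero and only the work term survives (a reconstruction in modern notation, not spelled out in the text):

```latex
\oint \mathrm{d}Q \;=\; \oint \mathrm{d}U \;+\; A \oint p\,\mathrm{d}V
\;=\; A \oint p\,\mathrm{d}V .
```

That is, for cycles the heat taken up equals A times the external work performed, which is just Principle I restricted to cycles.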
Towards the end of this first part of the essay, Clausius suggests that one could
consider further specializations of this specialization “by applying this formula to
certain kinds of bodies. For the two most important cases, namely for ideal gases
and for steam at a maximum density, I have developed this more special application
in my preceding essay” (p. 133). Obviously, he is referring to the net Cl-1.
Let us now undertake the structuralist reconstruction of Cl-2.1. Clearly, the
conceptual framework is essentially the same as for the models of the basic theory-
element of Cl-1, except for the additional primitive concept “U”. That is, the basic
models have the form ⟨K, Z, <, P, V, t, Q, W, U⟩. Formally, this means that the
models of M₀[Cl-1] now become substructures of the models of M₀[Cl-2.1].
M₀[Cl-2.1] is determined by the same conditions as M₀[Cl-1], except for the
introduction of an additional “functional” to express the dependence of U on
V and t [4, p. 131] and for the new form of the fundamental law:
Cl-2.1: x ∈ M₀[Cl-2.1] iff: ∃ K, Z, P, V, t, Q, W, A, f_Q, f_W, f_U, such that
(0) x = ⟨K, Z, <, A, P, V, t, Q, W, U⟩
(1) Q = f_Q(P, V, t, …)
(2) W = f_W(P, V, t, …)
(3) A > 0
(4) U = f_U(V, t)
(5) Q = U + A·W
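The set-theoretic definition above can be mimicked computationally. The following sketch (all names hypothetical; a toy encoding for illustration, not part of the structuralist apparatus itself) represents a potential model as a finite sequence of states with per-state values of the magnitudes, and checks conditions (3)–(5):

```python
# Toy encoding of the basic theory-element M0[Cl-2.1].
# A "structure" x is a dict: state indices Z, the constant A (heat
# equivalent of work), the functional f_U, and per-state values of
# V, t, Q, W, U.

def is_model_Cl21(x, tol=1e-9):
    """Check conditions (3)-(5) of M0[Cl-2.1] on a finite structure x."""
    if x["A"] <= 0:                       # condition (3): A > 0
        return False
    f_U = x["f_U"]
    for z in range(len(x["Z"])):
        # condition (4): U is determined by V and t alone
        if abs(x["U"][z] - f_U(x["V"][z], x["t"][z])) > tol:
            return False
        # condition (5): the fundamental law Q = U + A*W at every state
        if abs(x["Q"][z] - (x["U"][z] + x["A"] * x["W"][z])) > tol:
            return False
    return True

# A tiny two-state example satisfying the axioms.
f_U = lambda V, t: V + t                  # an arbitrary invented functional
x = {
    "Z": [0, 1], "A": 0.5, "f_U": f_U,
    "V": [1.0, 2.0], "t": [3.0, 4.0],
    "U": [4.0, 6.0],                      # f_U applied state-wise
    "W": [2.0, 2.0],
    "Q": [5.0, 7.0],                      # U + A*W state-wise
}
print(is_model_Cl21(x))                   # prints True
```

Conditions (1)–(2) merely assert the existence of some functionals f_Q and f_W, so a finite structure satisfies them trivially and they are omitted from the check.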
8 Specializations of Cl-2.1
The first specialization of this theory-net deals with cycles. Intuitively, a cycle is a
process, that is, in our terms, a sequence Z of states with a first element (the “initial
state”) and a last element (the “final state”) such that the values of all fundamental
magnitudes are the same in the two states. We could define this notion in completely
formal terms within our model-theoretic framework, but since this involves some
technicalities, for reasons of brevity, let’s just presuppose that the notion of a cycle
has already been defined. Then, the specialization looks like this:
C: x ∈ M[C] iff: ∃ K, Z, P, V, t, Q, W, A, U such that
(0) x = ⟨K, Z, <, A, P, V, t, Q, W, U⟩ and x ∈ M₀[Cl-2.1]
(1) Z is a cycle
(2) ∀k ∈ K: ∮ (dU(k, z)/dz) dz = 0
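Condition (2) holds automatically for any cycle precisely because U is a state function (condition (4)): its increments around a closed path telescope to zero. A quick numerical illustration (the path and the functional are invented for the example):

```python
# Because U = f_U(V, t) depends only on the state, the sum of its
# increments around any closed path in the (V, t) plane vanishes --
# the discrete analogue of the cycle condition (2).

def cyclic_integral_dU(f_U, path):
    """Sum of U-increments along a closed path of (V, t) states."""
    total = 0.0
    for i in range(len(path)):
        V0, t0 = path[i]
        V1, t1 = path[(i + 1) % len(path)]   # wrap around: a cycle
        total += f_U(V1, t1) - f_U(V0, t0)
    return total

f_U = lambda V, t: 2.0 * V + 0.3 * t * t     # an arbitrary state function
path = [(1.0, 300.0), (2.0, 350.0), (1.5, 400.0), (1.0, 320.0)]
print(abs(cyclic_integral_dU(f_U, path)) < 1e-6)   # prints True
```

The same computation on a quantity that is *not* a state function (e.g. heat Q, whose increments depend on the path taken) would in general yield a nonzero cyclic sum, which is exactly why Q and W need the functionals f_Q and f_W rather than a condition like (4).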
As remarked, the other specialization Clausius considers in this part of the
article is completely independent of the previous one: there it is assumed that W
takes a particularly simple form, P dV. We may call it the “uniform pressure”
specialization, UP.
UP: x ∈ M[UP] iff: ∃ K, Z, P, V, t, Q, W, A, U such that
(0) x = ⟨K, Z, <, A, P, V, t, Q, W, U⟩ and x ∈ M₀[Cl-2.1]
(1) dQ = dU + A·P·dV
As Clausius himself indicates, from this latter specialization one could obtain the
law of ideal gases as well as the laws for steam systems. This means that we can
embed one part of the previous theory-net Cl-1 into the new net Cl-2.1. This is a
typical move in a crystallization process.
The graph of the theory-net Cl-2.1 is accordingly (Fig. 3).
As already noted, the theory-net Cl-2.2 is exclusively devoted to the most general
treatment of the “Second Principle” as completely independent of the “First
Principle”. Its intended domain of application consists solely of cycles.
To begin with, Clausius provides an intuitive formulation of the “Second
Principle” that he considers to be more general than the one attributed to previous
researchers (notably Carnot and Clapeyron): “Heat can never go from a cold to a
hotter body unless another transformation takes place that depends on the first one”
[4, p. 134]. Clausius sets himself the task of finding a mathematically appropriate
formulation of this principle. The first step consists in introducing the new notion of
“equivalence value” (“Aequivalenzwerth”) of a transformation (of heat into work or
conversely). His proposal is

Equivalence value of W: f(t) =df Q · (1/T(t)).
Here, there is a new “anonymous” function T , of which Clausius says: “T is
an unknown function of the temperature appearing in the equivalence values” [4,
p. 143]. This is for us again an example of a new theoretical concept introduced for
the purpose of providing a coherent framework for the theory. Only at the end of the
paper does Clausius suggest that T may simply be regarded as the absolute temperature.
10 Specializations
(2) (1/T(k, z)) · (dT(k, z)/dt(k, z)) = A/C(k, z)
The other specialization line concerns irreversible cycles.
IRREV: x ∈ M[IRREV] iff:
(0) x ∈ M₀[Cl-2.2]
(1) ∀k ∈ K: ∮ (dQ(k, z)/T(k, z)) dz > 0
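The sign of the inequality can be illustrated by the paradigmatic irreversible process, heat passing directly from a hotter to a colder body. The sketch below follows Clausius's 1854 convention, under which uncompensated transformations have positive total equivalence value; the numerical temperatures and heat quantity are invented for the example:

```python
# Equivalence value of the transformation "heat Q passes from a body at
# T_hot to a body at T_cold": the colder body receives Q (contributing
# +Q/T_cold), the hotter gives it up (contributing -Q/T_hot).

def equivalence_value(Q, T_hot, T_cold):
    return Q / T_cold - Q / T_hot

N = equivalence_value(100.0, 500.0, 300.0)   # toy values (heat units, K)
print(N > 0)   # prints True: an irreversible transfer has positive value
```

When T_hot = T_cold the value vanishes, the limiting reversible case; a negative total value would describe the forbidden process of heat going unaided from cold to hot, which is exactly the content of Clausius's intuitive formulation of the Second Principle quoted above.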
From this, we may obtain a further specialization by admitting Regnault’s
additional hypothesis mentioned above.
REGN: x ∈ M[REGN] iff:
(0) x ∈ M[IRREV]
(1) dQ/dV = A·P
Within this same branch of the net, we may get still another specialization by simply
incorporating the ideal gas law we have already reconstructed, M[IG]. It is not
necessary to repeat its formulation here.
By combining M[REGN] with M[IG], that is, by constructing a specialization
common to both that we may call “AT” (for “absolute temperature”), or, in
model-theoretic terms, by taking M[AT] = M[REGN] ∩ M[IG], the following
result may be obtained:

If x ∈ M[AT], then T = 273 + t.
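The intersection M[AT] = M[REGN] ∩ M[IG] can be read extensionally: a structure belongs to M[AT] exactly when it satisfies both defining predicates. A minimal sketch of this model-theoretic move (the two predicates are stand-ins reduced to single numeric conditions, with an invented gas constant R; this is purely illustrative, not the full reconstruction):

```python
# Model classes as predicates over finite structures; the combined
# specialization is their conjunction, mirroring M[AT] = M[REGN] ∩ M[IG].

def in_REGN(x):
    # stand-in for Regnault's condition dQ/dV = A*P
    return abs(x["dQdV"] - x["A"] * x["P"]) < 1e-9

def in_IG(x):
    # stand-in for the ideal-gas law P*V = R*(273 + t)
    return abs(x["P"] * x["V"] - x["R"] * (273.0 + x["t"])) < 1e-9

def in_AT(x):
    # membership in the intersection of the two model classes
    return in_REGN(x) and in_IG(x)

x = {"A": 0.5, "P": 2.0, "dQdV": 1.0, "V": 1.0, "R": 2.0 / 573.0, "t": 300.0}
print(in_AT(x))   # prints True: x lies in both model classes
```

Constructing a specialization as an intersection, rather than by descending a single branch, is what lets the net tie the empirical temperature t to the theoretical function T.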
Accordingly, the theory-net Cl-2.2 has the form (Fig. 4).
We may now compare the three theory-nets implicit in Clausius’s writings
examined in our exposition, [Cl-1], [Cl-2.1], and [Cl-2.2]. It is to be noted that
some (but not all) specializations of [Cl-1] are embedded into the nets [Cl-2.1] and
[Cl-2.2], while some of those in [Cl-2.1] are embedded into [Cl-2.2].
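This comparison can be made concrete by representing each net as a set of labelled theory-elements with a parent map; which specializations carry over is then a set-theoretic question. A toy sketch (the node labels follow the text, but the net structure is simplified to a single-parent tree, e.g. AT is listed under IG although in the text it specializes both REGN and IG; the embedding relation is reduced to shared labels):

```python
# Theory-nets as {node: parent} dicts; the basic theory-element has
# parent None, every other node is a specialization of its parent.
Cl_1   = {"Cl-1": None, "IG": "Cl-1", "STEAM": "Cl-1"}
Cl_2_1 = {"Cl-2.1": None, "C": "Cl-2.1", "UP": "Cl-2.1",
          "IG": "UP", "STEAM": "UP"}
Cl_2_2 = {"Cl-2.2": None, "IRREV": "Cl-2.2", "REGN": "IRREV",
          "IG": "REGN", "AT": "IG"}

def embedded(net_a, net_b):
    """Labels of net_a that reappear as nodes of net_b."""
    return sorted(set(net_a) & set(net_b))

print(embedded(Cl_1, Cl_2_1))   # prints ['IG', 'STEAM']
```

Running `embedded(Cl_2_1, Cl_2_2)` likewise returns only `['IG']`: exactly the pattern of partial embedding, with some but not all specializations preserved, that characterizes a crystallization process.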
References
1. Balzer, W., Moulines, C.U., Sneed, J.D.: An Architectonic for Science. D. Reidel, Dordrecht
(1987)
2. Caamaño, M.C.: A Structural Analysis of the Phlogiston Case. Erkenntnis 70(3), 331–364
(2009)
3. Clausius, R.J.: Über die bewegende Kraft der Wärme. Poggendorff’s Annalen der Physik XCII
(1850)
4. Clausius, R.J.: Über eine veränderte Form des zweiten Hauptsatzes der mechanischen
Wärmetheorie. Annalen der Physik 93 (1854)
5. Clausius, R.J.: Über die Art der Bewegung, die wir Wärme nennen. Annalen der Physik 100
(1857)
6. Elkana, Y.: The Discovery of the Conservation of Energy. Hutchinson, London (1974)
7. Gibbs, J.W.: On the Equilibrium of Heterogeneous Substances, Part I, Transactions of the
Connecticut Academy of Arts and Sciences, vol. 3 (1876), reprinted in The Scientific Papers
of J. Willard Gibbs (ed. by Bumstead, H.A., Van Name, R.G.), New York; 1906 (2nd edition,
New Haven, 1957)
8. Moulines, C.U.: The Classical Spirit in J. Willard Gibbs’s Classical Thermodynamics. In:
Martinas, K., Ropolyi, L., Szegedi, P. (eds.) Thermodynamics: History and Philosophy. World
Scientific, Singapore (1991)
Index

Abstraction, 88, 89, 94, 95, 97
Artificial intelligence, 127–130, 135, 140, 141, 147
Automated reasoning, 175, 176, 186
Behaviour, 48, 50, 52
Biological language, 153, 154
Compatibility, 94
Compositional semantics, 111
Computational complexity, 167, 168, 170
Conceptual framework, 209–211, 213, 214, 219, 221, 224, 227
Context dependence, 69
Convention, 105–107
Coreference, 101, 102, 110
Crystallization, 209, 214, 215, 225, 229
Formal logics, 155, 161, 166, 167
Formal reasoning, 154, 162, 166, 167, 171
Formal semantics, 161, 165, 166
Genetic information, 73–75, 80
Genuine singular term(s), 6, 8–11
Grand unified theory, 96
Information
  structure, 101, 102, 110, 112, 124
  value, 82
Instrumentalism, 21, 24, 28
Invariance, 89, 94
Knowledge evolution, 189, 190, 206
Knowledge representation, 154, 167, 168, 170, 171
Natural language, 152–154, 158, 161, 165, 166, 171
Natural language processing, 166, 170, 171
Neural code, 84
Non-classical systems, 31, 54
Normative view on reference, 10, 11
Ontology(ies), 32–34, 41, 43, 44, 56, 97, 98, 113, 114, 116, 119, 120, 124, 163, 164, 166, 169, 171, 175, 180, 181, 183, 184, 186, 189, 190, 193
  deployment, 175, 181, 183, 186
  design, 175, 181, 186
OWL. See Web Ontology Language (OWL)
(Phenomenological) thermodynamics, 209, 214–216, 223, 229
Pragmatic maxim, 14, 16, 19, 20, 22, 26, 27
Pragmatism, 13, 14, 20, 22
Pragmatism, structural, 14, 26, 28
Preferential belief revision, 131, 134, 139
PubMed, 191, 202
Qualitative data, 152–154, 163, 169–171
Quantitative data, 151, 152, 154, 163, 170
Quantum mechanics, 48, 54, 57
Rational belief revision, 133
RDF (Resource Description Framework), 168
Reasoning, 127–129, 131, 134–137, 140, 143, 145, 148
Reconstruction, 21, 22, 25–27
Reference, 1, 10, 11
Reification, 1, 7
Representation, 89, 92, 94
Resistance to change, 128–130, 133, 134
Rigid designator(s), 8, 11
Semantic computation, 155, 163, 166–171
Semantic information, 67, 68, 79–82
Semantic relations, 156, 157, 165, 166
Semantic representation language, 154, 155, 157–161, 165, 166, 169, 170
Semantic technologies, 151, 163, 166, 167, 169, 171
Semantic Web, 153, 166–168, 171
Semantics of biological language, 153
Semantics of natural language, 153–156, 161, 164, 170, 171
Semiotic square, 90
Structural belief revision, 136
Structural equality, 89, 95, 98
Structural science(s), 13, 14, 27, 28, 79
Structuralism, 209, 211, 213, 219, 224, 227
Structure, 31, 33–39, 47–50, 52, 56, 57, 87–89, 93, 97
Structured data, 152, 153, 160, 166, 171
Symbol / symbolic system(s), 101, 102, 107, 112, 124
System, 31–36, 41–49, 51, 52, 57, 59, 61
  border, 31, 38, 39, 41, 47, 52
  surface, 31, 38–41, 44, 46, 52, 57
Theory
  element(s), 211–214, 216, 223, 225
  net(s), 211–217, 221, 223–225, 228, 229
Thermodynamics, 43, 56, 57
Unstructured data, 153, 160, 166, 171
Value scale, 67, 82
Web Ontology Language (OWL), 168, 177, 180–182, 184–186