SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the
views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or
proceedings volumes, but not papers that have already appeared in print. Except for papers by our external
faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or
funded by an SFI grant.
©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure
timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights
therein are maintained by the author(s). It is understood that all persons copying this information will
adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only
with the explicit permission of the copyright holder.
Language Networks: their structure, function and evolution
Ricard V. Solé,(1,2) Bernat Corominas Murtra,(1) Sergi Valverde(1) and Luc Steels(3,4)

(1) ICREA-Complex Systems Lab, Universitat Pompeu Fabra (GRIB), Dr Aiguader 80, 08003 Barcelona, Spain
(2) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
(3) Sony CSL Paris, 6 Rue Amyot, 75005 Paris, France
(4) Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
Several important recent advances in various sciences (particularly biology and physics) are based on complex network analysis, which provides tools for characterising the statistical properties of networks and explaining how they may arise. This article examines the relevance of this trend for the study of human languages. We review some early efforts to build up language networks, characterise their properties, and show in which direction models are being developed to explain them. These insights are relevant both for studying fundamental unsolved puzzles in cognitive science, in particular the origins and evolution of language, and for recent data-driven statistical approaches to natural language.
FIG. 1 How to build language networks. Starting from a given text (a) we can define different types of relationships among words, including precedence relations and syntactic relations. In (b) we show them using blue and black arrows, respectively. In figures (c) and (d) the corresponding co-occurrence and syntactic networks are shown. Paths on network (c) can be understood as the potential universe of sentences that can be constructed with the given lexicon. An example of such a path is the sentence indicated in red. Nodes have been coloured according to the total (in- and out-) word degree, highlighting key nodes during navigation (the higher the degree, the lighter the colour). In (d) we build the corresponding syntactic network, taking dependency syntax (50) as the descriptive framework and assuming that arcs begin in complements and end in the nucleus of the phrase, with the verb as the nucleus of well-formed sentences. The previous sentence appears now dissected into two different paths converging towards "try". An example of the global pattern found in a larger network is shown in (e), which is the co-occurrence network of a fragment of Moby Dick. The hubs in this graph are the, a, of and to.
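The construction in panels (a)-(c) is easy to reproduce. The following is a minimal illustrative sketch (not the authors' code; the toy sentence and function name are invented for illustration) that builds an undirected co-occurrence graph from adjacent word pairs and picks out the highest-degree hub:

```python
from collections import defaultdict

def cooccurrence_graph(text):
    """Build an undirected co-occurrence graph: words are nodes,
    and an edge links every pair of adjacent words in the text."""
    words = text.lower().split()
    graph = defaultdict(set)
    for w1, w2 in zip(words, words[1:]):
        if w1 != w2:
            graph[w1].add(w2)
            graph[w2].add(w1)
    return dict(graph)

# Toy text, standing in for a real corpus fragment.
text = "the cat sat on the mat and the dog sat on the rug"
g = cooccurrence_graph(text)

# Degree = number of distinct neighbours; hubs are high-degree words.
degrees = {w: len(nbrs) for w, nbrs in g.items()}
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])  # -> the 6
```

As in panel (e), the function words dominate: even in this toy text the article "the" is the hub.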
to be shared by most complex networks, both natural and artificial. The first is their small world structure. In spite of their large size and sparseness (i.e. a rather small number of links per unit) these webs are very well connected: it is very easy to reach a given element from another one through a small number of jumps (29). In social networks, it is said that there are just six degrees of separation (i.e. six jumps) between any two randomly
chosen individuals in a country. The second is less obvious, but no less important: these webs are extremely heterogeneous: most elements are connected to one or two other elements and only a handful of them have a very large number of links. These hubs are the key components of web complexity. They support high efficiency of network traversal but are, for the same reason, their Achilles heel. Their loss or failure has very negative consequences for system performance, sometimes even promoting a system's collapse (30).

Language is clearly an example of a complex dynamical system (31; 32). It exhibits highly intricate network structures at all levels (phonetic, lexical, syntactic, semantic), and this structure is to some extent shaped and reshaped by millions of language users over long periods of time, as they adapt and change it to their needs as part of ongoing local interactions. The cognitive structures needed to produce, parse and interpret language take the form of highly complex cognitive networks as well, maintained by each individual and aligned and coordinated as a side effect of local interactions (33). These cognitive structures are embodied in brain networks which themselves exhibit non-trivial topological patterns (25). All these types of networks have their own constraints and interact with the others, generating a dynamical process that leads language to be as we find it in nature.

The hypothesis being explored in this article is that the universal laws governing the organization and evolution of networks are an important additional causal factor in shaping the nature and evolution of language (34). Commonalities seen in all languages might be an indicator of the presence of such common, inevitable generative mechanisms, independent of the exact path followed by evolution. If this is the case, the origins of language would be accessible to scientific analysis, in the sense that it would be possible to predict and explain some of the observed statistical universals of human languages in terms of instantiations of general laws. Current results within the new field of complex networks give support to this possibility.

[…] interactions has been shown to influence the emergence of a self-consistent language (36).

Box 1. Measuring network complexity

Several key statistical measures can be performed on a given language network, as we illustrate here by focusing on the co-occurrence of words. If W = {Wi} is the set of words (i = 1, ..., N) and {Wi, Wj} is a pair of words, a link can be defined based on a given choice of word-word relation. The number of links of a given word is called its degree. The total number of links is indicated by L, and the average degree <k> is simply <k> = 2L/N.

Path length: defined as the average minimal distance between any pair of elements.

Clustering coefficient: the probability that two vertices (e.g. words) that are neighbors of a given vertex are neighbors of each other. It measures the relative frequency of triangles in a graph.

Random graphs: obtained by linking nodes with some probability. The simplest example is the so-called Erdős-Rényi (ER) graph, for which any two nodes are connected with some given probability. For a random graph, clustering is very small (with C ~ 1/N) and D ≈ log N / log<k>.

Small world (SW) structure: these networks have high clustering (and thus many triangles) but also a very short path length, with D ~ log N as in random graphs. A small world graph can be defined as a network such that D ~ Drand and C >> Crand.

Degree distributions: defined as the frequency P(k) of having a word with k links. Most complex networks are characterized by highly heterogeneous distributions, following a power-law (scale-free) shape P(k) ~ k^(-γ), with 2 < γ < 3.
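The measures defined in Box 1 can be computed directly on an adjacency-list representation. A minimal sketch in plain Python (the toy graph is a made-up example, not data from the paper):

```python
from collections import deque
from itertools import combinations

# Toy undirected graph as an adjacency dict (hypothetical example).
g = {
    "the": {"cat", "dog", "mat"},
    "cat": {"the", "dog"},
    "dog": {"the", "cat"},
    "mat": {"the"},
}

N = len(g)
L = sum(len(nbrs) for nbrs in g.values()) // 2   # each edge counted twice
avg_degree = 2 * L / N                           # <k> = 2L/N

def clustering(g, v):
    """Fraction of pairs of neighbours of v that are themselves linked."""
    nbrs = g[v]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
    return 2 * links / (len(nbrs) * (len(nbrs) - 1))

def path_length(g):
    """Average shortest-path distance over all node pairs, via BFS."""
    total, pairs = 0, 0
    for src in g:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(d for n, d in dist.items() if n != src)
        pairs += len(dist) - 1
    return total / pairs

C = sum(clustering(g, v) for v in g) / N         # average clustering
print(L, avg_degree, round(C, 3), round(path_length(g), 3))  # -> 4 2.0 0.583 1.333
```

Comparing C against the C ~ 1/N expected for an Erdős-Rényi graph of the same size is exactly the C/Crand diagnostic used in Table I.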
TABLE I Statistical universals in language networks. The data shown are based on different studies involving different corpora and different languages (41-44; 48). The small world character of the webs is reflected in the short path lengths and the high clustering. Here we compare the value of C with the one expected from a purely random graph, Crand. The resulting fraction C/Crand indicates that the observed clustering is orders of magnitude larger than expected at random.
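The cumulative distribution used to smooth P(k) in Fig. 3, P>(k) = Σ_{j>k} P(j), is straightforward to compute from a degree sequence. A small sketch (the degree sequence is invented for illustration):

```python
from collections import Counter

# Hypothetical degree sequence of a small network.
degrees = [1, 1, 1, 2, 2, 3, 6]

N = len(degrees)
pk = {k: c / N for k, c in Counter(degrees).items()}   # P(k)

def cumulative(pk, k):
    """P>(k) = sum of P(j) for j > k; smooths fluctuations in P(k)."""
    return sum(p for j, p in pk.items() if j > k)

print([round(cumulative(pk, k), 4) for k in sorted(pk)])  # -> [0.5714, 0.2857, 0.1429, 0]
```

For a power law P(k) ~ k^(-γ), the cumulative distribution also follows a power law, with exponent γ - 1, which is why the log-log plots in Fig. 3 remain straight lines.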
FIG. 3 Scaling laws in language webs. Here three different corpora have been used and the co-occurrence networks have been analysed for: (a) Basque, (b) English and (c) Russian. Specifically, we computed the degree distribution P(k) (Box 1) and, in order to smooth the fluctuations, the cumulative distribution has been used, defined as P>(k) = Σ_{j>k} P(j). Each corpus has 10^4 lines of text. Although some differences exist (44), their global patterns are rather similar, thus suggesting common principles of organization.

FIG. 4 Defining a bipartite graph of word-meaning association. A set of words (signals) W is linked to a set R of objects of reference. Here some words are connected to a single object (W1, W2, W4) and others are polysemous (W3). Some objects are linked to more than one word: W1 and W2 are thus synonyms.

This is highly relevant for two reasons. First, it shows that language networks indeed exhibit certain significant statistical properties, confirming the discovery of a new type of universals in human languages similar to the universals found in (statistical) physics. Second, it strengthens the case that these statistical properties are a consequence of universal laws governing all types of evolving networks, independent of the specific cognitive and social processes that generate them. In the next section, we explore this topic further.

III. NETWORK GROWTH AND EVOLUTION

What could the structural principles for language networks be? Language networks are clearly shaped at two different scales (at least). The first is linked to language acquisition by individuals, and the second to its emergence within human populations. The ontogeny of language is strongly tied to underlying cognitive potentials in the developing child. But it is also a product of the exposure of individuals to normal language use (without formal training). On the other hand, the child (or adult) is not isolated from other speakers. The emergence and constant change of language needs to be understood in terms of a collective phenomenon (32), with continuous alignment of speakers and hearers to each other's language conventions and ways of viewing the world (33). Although the two previous views seem in conflict, they are actually complementary. It depends on what level of observation is considered. Without a social context in which a given language is sustained through invention, learning and consensus, no complex language develops. But unless a minimal neural substrate is present, it is not possible for language users to participate in this process.

The next question is: what is the nature of the shaping that affects the statistical properties of network growth? This question is still wide open, and some successful approaches based on simple rules of network growth have been presented (48; 55). An interesting approach, based on constraints operating on communication and coding, can be defined in terms of two forces: one that pushes towards communicative success, and another that pushes towards least effort, a principle already used by Zipf (5) and Simon (56). The first step is to define idealised, simpler models of communicative interactions in the form of language games (57), possibly instantiated in embodied agents (58). In the simplest language game (usually called the Naming Game) a mapping between signals (words) and objects of reference is maintained by each agent (Box 2). This map defines a bipartite network (Fig. 4). The most successful communication obviously arises when the individual lexicons take the form of the same one-to-one mapping between names and objects. Many simulations and theoretical investigations have now shown that this state can be rapidly reached when agents attempt to optimize their mutual understanding and adjust weights between signals and objects (37).

Box 2. Word-meaning association networks for Naming Games

Let us consider an external world defined as a finite set of m objects of reference (i.e. meanings):

    R = {R1, ..., Ri, ..., Rm}    (1)

and the previous set of words (signals), now used to label them, W = {Wi}. If a word Wi is used to name a given object of reference Rj, then a link will be established between both. Let us call A = (aij) the matrix connecting them, where 1 ≤ i ≤ n and 1 ≤ j ≤ m. Here aij = 1 if the word Wi is used to refer to Rj, and zero otherwise. A graph is then obtained, the so-called lexical matrix, including the two types of elements (words and meanings) and their links. This is known as a bipartite graph. An example of such a graph is shown in Fig. 4, where A reads:

        | 1 1 0 0 |
    A = | 0 0 1 0 |    (2)
        | 0 0 1 0 |
        | 0 0 0 1 |

This matrix enables the computation of all relevant quantities concerning graph structure.

Next, we not only consider communicative success but also the cost of communication between hearer and speaker, in order to connect lexical matrices as studied in language games with the kind of statistical regularities made visible by language network analysis. As noted by George Zipf, the conflict between speaker and hearer might be explained by a lexical tradeoff, which he named the principle of least effort (5). This principle can be made explicit using the lexical matrix, when the relative efforts of the hearer and the speaker are properly defined (13). If we indicate by Es and Eh the speaker's and hearer's efforts, respectively, it is possible to assign a weight to their relative contribution by means of a linear function:

    Ω(λ) = λEs + (1 − λ)Eh    (3)

where 0 < λ < 1 is a parameter. Effort is defined in terms of information-theoretic measures (59). For Es, the number of different words in the lexicon can be used (the signal entropy), whereas for Eh, the ambiguity for the hearer. The minimal effort for the speaker is obtained when a single word refers to many objects (Fig. 4a), but this is maximal effort for the hearer (most ambiguous lexical network). The opposite case is indicated in Fig. 4c. It provides minimal effort for the hearer, because the speaker is using one word for each object, but that means maximal effort for the speaker.

By tuning λ, we can move from one case to the other. What happens in between? Interestingly, a sharp change occurs at some intermediate value λc, where we observe a rapid shift from no-communication to one-to-one networks. This is a phase transition similar to the ones found in physics. At this phase transition (Fig. 4b) we recover several quantitative traits of human language. The first is Zipf's law: if we map the number of links of a word to its frequency (as occurs in real language), the previous scaling relation is recovered close to the critical point λc. Such emergence of scaling close to critical points shares a number of characteristics with scenarios observed in physical systems under conflicting forces (19). An additional finding is that a bipartite graph with a word frequency distribution following Zipf's law can also impose strong constraints on syntactic networks (60; 61). It is worth mentioning that the presence of a gap has immediate consequences for understanding the origins of the unique character of human language compared to other species: a complexity gap in the patterning of word interactions would need to be overcome in order to achieve symbolic reference.

IV. DISCUSSION

This article argued that there are statistical universals in language networks, which are similar to the features found in other 'scale-free' networks arising in physics, biology and the social sciences. This observation is very exciting from two points of view. First, it points to new types of universal features of language, which do not focus on properties of the elements in language inventories, as does the traditional study of universals (e.g. phoneme inventories or word order patterns in sentences), but rather on statistical properties. Second, the pervasive nature of these network features suggests that language must be subject to the same sort of self-organization dynamics as other natural and social systems, and so it makes sense to investigate whether the general laws governing […]
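Equation (3) can be explored numerically. The sketch below evaluates Ω(λ) for the lexical matrix of Eq. (2) under simplifying assumptions that are ours, not necessarily the authors': objects of reference are used uniformly, each object is named by choosing uniformly among its synonyms, Es is the entropy of the resulting word distribution, and Eh is the average per-word ambiguity.

```python
from math import log2

# Lexical matrix of Eq. (2): A[i][j] = 1 if word i names object j.
A = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

def efforts(A):
    """Sketch of information-theoretic efforts (our simplifying
    assumptions: uniform object use, uniform choice among synonyms)."""
    m = len(A[0])
    # p(word i): each object contributes 1/m, split among its synonyms.
    pw = []
    for row in A:
        p = 0.0
        for j in range(m):
            if row[j]:
                syn = sum(A[k][j] for k in range(len(A)))
                p += (1 / m) / syn
        pw.append(p)
    Es = entropy(pw)                 # speaker effort: signal entropy
    # Hearer effort: average ambiguity (uniform entropy over the
    # objects each word can refer to), weighted by word frequency.
    Eh = sum(pw[i] * log2(sum(row)) for i, row in enumerate(A) if sum(row))
    return Es, Eh

def omega(lmbda, A):
    Es, Eh = efforts(A)
    return lmbda * Es + (1 - lmbda) * Eh   # Eq. (3)

Es, Eh = efforts(A)
print(round(Es, 3), round(Eh, 3), round(omega(0.5, A), 3))  # -> 1.75 0.5 1.125
```

Sweeping λ from 0 to 1 and minimizing Ω over candidate matrices is how the sharp transition at λc is located numerically.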
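The Naming Game dynamics mentioned in Section III can be simulated in a few lines. This is a minimal single-object sketch (agent count, round count and update rules are illustrative simplifications of the model cited in the text): on success both agents collapse their inventories to the used word; on failure the hearer adds it. The population reaches a shared name.

```python
import random

random.seed(1)

# Minimal Naming Game sketch for a single object: each agent keeps an
# inventory of candidate names; repeated pairwise games lead to consensus.
N_AGENTS = 20
names_invented = 0
inventories = [set() for _ in range(N_AGENTS)]

def play_round():
    global names_invented
    s, h = random.sample(range(N_AGENTS), 2)      # speaker, hearer
    if not inventories[s]:
        names_invented += 1
        inventories[s].add(f"w{names_invented}")  # invent a new name
    word = random.choice(sorted(inventories[s]))
    if word in inventories[h]:                    # success: both collapse
        inventories[s] = {word}
        inventories[h] = {word}
    else:                                         # failure: hearer learns it
        inventories[h].add(word)

for _ in range(20000):
    play_round()

# Number of distinct names surviving across the population.
distinct = {w for inv in inventories for w in inv}
print(len(distinct))
```

With enough rounds the competing synonyms are eliminated and a single convention survives, the one-to-one limit described above.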
[Fig. 6 schematic: syntactic, co-occurrence and semantic networks, coupled through acquisition, production, evolution, communication, meaning consensus and a social interaction network, under cognitive constraints, constraints in categorization and conceptualization, and physical/articulatory constraints.]
FIG. 6 The network of language networks. The three classes of language networks are indicated here, together with their links.
They are all embedded within embodied systems belonging themselves to a social network. The graph shown here is thus a
feedback system in which networks of word interactions and sentence production shape, and are shaped by, the underlying
system of communicating agents. Different constraints operate at different levels. Additionally, a full understanding of how
syntactic and semantic networks emerge requires an exploration of their ontogeny and phylogeny.
[…] BCM thank Murray Gell-Mann, David Krakauer, Eric D. Smith and Constantino Tsallis for useful discussions on language structure and evolution. This work has been supported by grants FIS2004-0542, the IST-FET ECAGENTS project of the European Community funded under EU R&D contract 011940, the IST-FET DELIS project under EU R&D contract 001907, by the Santa Fe Institute and the Sony Computer Science Laboratory.

References

[1] Hauser, M.D., Chomsky, N. and Fitch, W.T. (2002) The faculty of language: what is it, who has it, and how did it evolve? Science 298, 1569-1579.
[2] Jackendoff, R. (1999) Possible stages in the evolution of the language capacity. Trends Cogn. Sci. 3(7), 272-279.
[3] Maynard-Smith, J. and Szathmáry, E. (1995) The Major Transitions in Evolution. Oxford U. Press, Oxford.
[4] Greenberg, J.H. (1963) Universals of Language. MIT Press, Cambridge, MA.
[5] Zipf, G.K. (1949) Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA.
[6] Ladefoged, P. and Maddieson, I. (1996) The Sounds of the World's Languages. Blackwell, UK.
[7] Pinker, S. and Bloom, P. (1990) Natural language and natural selection. Behav. Brain Sci. 13, 707-784.
[8] Chomsky, N. (2002) Interview on minimalism. In: Belletti, A. and Rizzi, L. (eds.), On Nature and Language. Cambridge University Press, New York.
[9] Chomsky, N. (2000) Minimalist inquiries: the framework. In: Martin, R., Michaels, D. and Uriagereka, J. (eds.), Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, pp. 89-155. MIT Press, Cambridge, MA.
[10] Piattelli-Palmarini, M. and Uriagereka, J. (2004) The immune syntax: the evolution of the language virus. In: Jenkins, L. (ed.), Variation and Universals in Biolinguistics, pp. 341-377. Elsevier.
[11] Newport, E.L. (1990) Maturational constraints on language learning. Cogn. Sci. 14, 11-28.
[12] Steels, L. (2005) The emergence and evolution of linguistic structure: from lexical to grammatical communication systems. Connection Sci. 17(3-4), 213-230.
[13] Ferrer i Cancho, R. and Solé, R.V. (2003) Least effort and the origins of scaling in human language. Proc. Natl. Acad. Sci. USA 100, 788-791.
[14] Kirby, S. and Hurford, J. (1997) Learning, culture and evolution in the origin of linguistic constraints. In: Husbands, P. and Harvey, I. (eds.), ECAL97, pp. 493-502. MIT Press.
[15] Nowak, M.A., Plotkin, J.B. and Krakauer, D.C. (1999) The evolutionary language game. J. Theor. Biol. 200, 147-162.
[16] Greenberg, J.H. (2002) Indo-European and its Closest Relatives: the Eurasiatic Language Family, Volumes I and II. Stanford University Press, Stanford.
[17] Prigogine, I. and Nicolis, G. (1977) Self-Organization in Non-Equilibrium Systems: From Dissipative Structures to Order Through Fluctuation. J. Wiley & Sons, New York.
[18] Kauffman, S. (1993) The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press.