Emergence of Scaling in Random Networks

Albert-L´ aszl´ o Barab´ asi∗ and R´ eka Albert
Department of Physics, University of Notre-Dame, Notre-Dame, IN 46556

arXiv:cond-mat/9910332v1 [cond-mat.dis-nn] 21 Oct 1999

Systems as diverse as genetic networks or the world wide web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature is found to be a consequence of the two generic mechanisms that networks expand continuously by the addition of new vertices, and new vertices attach preferentially to already well connected sites. A model based on these two ingredients reproduces the observed stationary scalefree distributions, indicating that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.

∗ alb@nd.edu

1

ranging from molecular biology to computer science (1). driven by the computerization of data acquisition. the topology of these networks is largely unknown. or describe the world wide web (www). and the edges characterize the social interaction between them (4). living systems form a huge genetic network. networks of complex topology have been described using the random graph theory of Erd˝ os and R´ enyi (ER) (7). raising the possibility of understanding the dynamical and topological stability of large networks. 6). such topological information is increasingly available. whose vertices are proteins and genes. In this paper we report on the existence of a high degree of self-organization characterizing the large scale properties of complex networks. but in the absence of data on large networks the predictions of the ER theory were rarely tested in the real world. the probability P (k ) that a vertex in the network interacts with k other vertices decays as a power-law. where vertices are individuals or organizations. the edges representing the chemical interactions between them (2). whose vertices are the nerve cells.The inability of contemporary science to describe systems composed of non-identical elements that have diverse and nonlocal interactions currently limits advances in many disciplines. For example. whose vertices are the elements of the system and edges represent the interactions between them. following P (k ) ∼ k −γ . Due to their large size and the complexity of the interactions. At a different organizational level. To understand the origin of this scale invariance. This result indicates that large networks self-organize into a scale-free state. connected by axons (3). whose vertices are HTML documents connected by links pointing from one page to another (5. independent of the system and the identity of its constituents. a large network is formed by the nervous system. we show that existing network models fail to 2 . Exploring several large databases describing the topology of large networks that span as diverse fields as the www or the citation patterns in science we show that. Traditionally. But equally complex networks occur in social science. The difficulty in describing these systems lies partly in their topology: many of them form rather complex networks. However. a feature unexpected by all existing random network models.

the vertices representing generators. indicating that the probability that k documents point to a certain webpage follows a power-law. the edges corresponding to the high voltage transmission lines between them (10). two key features of real networks. Using a model incorporating these two ingredients. the scaling region is less prominent. The collaboration graph of movie actors represents a well documented example of a social network. detailed topological data is available only for a few.1 ± 0.1 (Fig. the edges representing links to the articles cited in a paper. following P (k ) ∼ k −γactor . with γwww = 2.3 ± 0. complex network is formed by the citation patterns of the scientific publications. The above examples (12) demonstrate that many large random networks share the com3 . A network whose topology reflects the historical patterns of urban and industrial development is the electrical powergrid of western US. consequently. Finally. Due to the relatively modest size of the network. Recently Redner (11) has shown that the probability that a paper is cited k times (representing the connectivity of a paper within the network) follows a power-law with exponent γcite = 3. 1A). 1B) (9). the vertices standing for papers published in refereed journals. The probability that an actor has k links (characterizing his or her popularity) has a power-law tail for large k .1 ( Fig. we show that they are responsible for the powerlaw scaling observed in real networks. we argue that these ingredients play an easily identifiable and important role in the formation of many complex systems. 1C). where a vertex is a document and the edges are the links pointing from one document to another. containing only 4941 vertices. The topology of this graph determines the web’s connectivity and. implying that our results are relevant to a large class of networks observed in Nature. but is nevertheless approximated with a power-law with an exponent γpower ≃ 4 (Fig. our effectiveness in locating information on the www (5). Finally.incorporate growth and preferential attachment. two actors being connected if they were cast together in the same movie. A more complex network with over 800 million vertices (8) is the www. where γactor = 2. transformers and substations. a rather large. While there are a large number of systems that form complex networks. Information about P (k ) can be obtained using robots (6). Each actor is represented by a vertex.

the power-law tail characterizing P (k ) for the studied networks indicates that highly connected (large k ) vertices have a large chance of occurring. without modifying N . For example. In contrast. In the small world model recently introduced by where λ = N    k Watts and Strogatz (WS) (10). P (k ) is still peaked around z . or reconnected (WS model).1 and 4 which is unexpected within the framework of the existing network models. a common feature of these systems is that the network continuously expands by the addition 4 . the research literature constantly grows by the publication of new papers. With probability p each edge is reconnected to a vertex chosen at random.  that a  N −1 k  p (1 − p)N −1−k . N . dominating the connectivity. that are then randomly connected (ER model). with an exponent γ between 2. a large k ) decreases exponentially with k . they form by the continuous addition of new vertices to the system. and connect each pair of vertices with probability p. thus vertices with large connectivity are practically absent. most real world networks are open. There are two generic aspects of real networks that are not incorporated in these models. leading to a small-world phenomenon (13). the actor network grows by the addition of new actors to the system. increases throughout the lifetime of the network. both models assume that we start with a fixed number (N ) of vertices. The long-range connections generated by this process decrease the distance between the vertices. each vertex being connected to its two nearest and next-nearest neighbors. Consequently. In contrast. The random graph model of ER (7) assumes that we start with N vertices. following a power-law for large k . while for finite p. For p = 0 the probability distribution of the connectivities is P (k ) = δ (k − z ). often referred to as six degrees of separation (14). In the model the probability vertex has k edges follows a Poisson distribution P (k ) = e−λ λk /k !. N vertices form a one-dimensional lattice. A common feature of the ER and WS models is that the probability of finding a highly connected vertex (that is. but it gets broader (15). thus the number of vertices. First. the www grows exponentially in time by the addition of new web pages (8). where z is the coordination number in the lattice.mon feature that the distribution of their local connectivity is free of scale.

2A).9 ± 0. 2A demonstrates. a new actor is cast most likely in a supporting role.of new vertices that are connected to the vertices already present in the system. a newly created webpage will more likely include links to well known. Second. the random network models assume that the probability that two vertices are connected is random and uniform. the system organizes itself into a scale-free stationary state. subsequently. For example. Consequently. After t timesteps the model leads to a random network with t + m0 vertices and mt edges. at every timestep we add a new vertex with m(≤ m0 ) edges that link the new vertex to m different vertices already present in the system. In contrast. the probability that a new actor is cast with an established one is much higher than casting with other less known actors. To incorporate the growing character of the network. The development of the power-law scaling in the model indicates that growth and pref5 . well known actors. As the power-law observed for real networks describes systems of rather different sizes at different stages of their development. independent of the system size m0 + t).1 (Fig. but there is a higher probability to be linked to a vertex that already has a large number of connections. or a new manuscript is more likely to cite a well known and thus much cited paper than its less cited and consequently less known peer. indicating that despite its continuous growth. starting with a small number (m0 ) of vertices. Indeed. as Fig. popular documents with already high connectivity. Similarly. We next show that a model based on these two ingredients naturally leads to the observed scale invariant distribution. we assume that the probability Π that a new vertex will be connected to vertex i depends on the connectivity ki of that vertex. such that Π(ki ) = ki / j kj . with more established. These examples indicate that the probability with which a new vertex connects to the existing vertices is not uniform. To incorporate preferential attachment. This network evolves into a scale-invariant state with the probability that a vertex has k edges following a power-law with an exponent γmodel = 2. most real networks exhibit preferential connectivity. P (k ) is independent of time (and. it is expected that a correct model should provide a distribution whose main features are independent of time.

a scaling property that could be directly tested once time-resolved data on network connectivity becomes available. The probability that a vertex i has a connectivity smaller than k . this property can be used to calculate γ analytically. P (ki(t) < k ). we obtain that P (ti > m2 t/k 2 ) = 1 − P (ti ≤ m2 t/k 2 ) = 1 − m2 t/k 2 (t + m0 ). While at early times the model exhibits power-law scaling. leading with time to some vertices that are highly connected. which gives ki (t) = m(t/ti )0. Thus older (smaller ti ) vertices increase their connectivity at the expense of the younger (larger ti ) ones. Such a model (Fig. thus an initial difference in the connectivity between two vertices will increase further as the network grows. but preferential attachment is eliminated by assuming that a new vertex is connected with equal probability to any vertex in the system (that is. 2C). where ti is the time at which vertex i was added to the system (see Fig. namely growth and preferential attachment. Due to the preferential attachment. Π(k ) = const = 1/(m0 + t − 1)). 2B) leads to P (k ) ∼ exp(−βk ). a ”rich-gets-richer” phenomenon that can be easily detected in real networks. At each time step we randomly select a vertex and connect it with probability Π(ki ) = ki / j kj to vertex i in the system. Model A keeps the growing character of the network. To verify that both ingredients are necessary. which. The rate at which a vertex acquires edges is ∂ki /∂t = ki /2t. The probability density P (k ) can be obtained from P (k ) = ∂P (ki (t) < k )/∂k . after T ≃ N 2 timesteps the system reaches a state in which all vertices are connected. In model B we start with N vertices and no edges. at 6 . P (k ) is not stationary: since N is constant. The failure of models A and B indicates that both ingredients. a vertex that acquired more connections than another one will increase its connectivity at a higher rate. indicating that the absence of preferential attachment eliminates the scale-free feature of the distribution.erential attachment play an important role in network development. we investigated two variants of the model. Assuming that we add the vertices at equal time intervals to the system. Furthermore. and the number of edges increases with time. are needed for the development of the stationary power-law distribution observed in Fig.5 . can be written as P (ti > m2 t/k 2 ). 1.

including such important examples as genetic or signaling networks in biological systems. independent of m. While these and other system-specific features could modify the exponent γ . For this we need to model these systems in more detail. We often do not think of biological systems as open or growing. Finally. including business networks (17. observed in all systems for which detailed data has been available to us.1 and 4. social networks (describing individuals or organizations). For example. if we assume that a fraction p of the links is directed. while in general Π(k ) could have an arbitrary nonlinear form Π(k ) ∼ k α . k3 giving γ = 3. Consequently. leads to the stationary solution P (k ) = 2m2 . possible scale-free features of genetic and signaling networks could reflect the evolutionary history dominated by growth and aggregation of different constituents. for which so far less topological information is available. Growth and preferential attachment are mechanisms common to a number of complex systems. etc. 18). With the fast advances in mapping 7 .long times. we expect that the scaleinvariant state. our model offers the first successful mechanism accounting for the scale-invariant nature of real networks. since their features are genetically coded. While it reproduces the observed scale-free distribution. that is Π(k ) ∼ k . but by adding (and sometimes removing) connections between established vertices. its applicability reaching far beyond the quoted examples. the proposed model cannot be expected to account for all aspects of the studied networks. in the model we assumed linear preferential attachment. the exponents obtained for the different networks are scattered between 2. some networks evolve not only by adding new vertices. leading from simple molecules to complex organisms. which is supported by numerical simulations (16). For example. simulations indicate that scaling is present only for α = 1. However. However. is a generic property of many complex networks. transportation networks (19). Furthermore. However. we obtain γ (p) = 3 − p. it is easy to modify our model to account for exponents different from γ = 3. A better description of these systems would help in understanding other complex systems as well.

D. 130 (1999). 2. 92 (1999). Albert. Publ. J. References and Notes 1. Sci. Am 280. Cambridge. R. Wasserman and K. London. Gallagher and T. 10.out genetic networks. 7. Adamic. answers to these questions might not be too far. 17 (1960). the distribution of searches [B. Strogatz. 6. Sci 5. and A. E. Science 280. A. 1994). since the scale-free inhomogeneities are the inevitable consequence of self-organization due to the local decisions made by the individual vertices. the www displays a number of other scale-free features. 12. J. characterizing the organization of the webpages within a domain [B. Lukose. Nature 401. Science 284. Jeong and A. B. 131 (1998). based on information that is biased towards the more visible (richer) vertices. Appenzeller. or the number of links per webpage (6). R´ enyi. 80 (1999). Similar mechanisms could explain the origin of the social and economic disparities governing competitive systems. Bollob´ as. 131 (1999)]. A. Nature 393. Redner. We also studied the neural network of the worm Caenorhabditis elegans (3. 5. Iyengar. 440 (1998). Weng. (Cambridge University Press. Random Graphs (Academic Press. S. S. Nature 400. A. Math. T. H. 11. Huberman. Pirolli. Lawrence and C. 79 (1999). Bhalla. Hung. Watts and S. irrespective of the nature and the origin of this visibility. 95 (1998)]. European Physical Journal B 4. R. 1985).edu∼networks. P. Science 284. see also http://www. G. Huberman and L. S. Laurent. S. 3. Service. 96 (1999). Pitkow and R. 10) and the benchmark diagram of a computer chip 8 . Science 284. J. P. Acad. Note that in addition to the distribution of incoming links. C. Barab´ asi. Faust. R. 8. Science 284. 9. Nature 401. 107 (1999). Social Network Analysis. L. R. Inst. Members of the Clever project. F. Koch and G. Erd˝ os.nd. L. Giles. 4. 98 (1998). H.-L. 54 (June 1999). Science 280. U.

15. A. NJ. Banavar. Amaral. Rinaldo.edu/∼cheese/ispd98. 60 (1967). R. A. J. 1990). Kochen (ed. A. Maritan and A. However. B. N. Barth´ el´ emy and L. N. private communication]. Rev. private communication]. J. Amaral and M. 13. 107 (1999). Six Degrees of Separation: A play (Vintage Books. C. 14.cs.html).) The Small World (Ablex. M. B. 9 . 16. Milgram. 82. Psychol. Watts for providing the C. 19. elegans and the power grid data. We find that P (k ) for both is consistent with power-law tails. M. 1989). Tu. despite the fact that for C. N. 130 (1999). Science 284. 20. Amaral for helpful discussions. We thank D. W. choosing m randomly will not change the exponent γ [Y. Norwood. Nature 399. 17. elegans the relatively small size of the system (306 vertices) limits severely the data quality.(http://vlsicad. Note that preferential attachment was also used to model correlations between stock prices [L. while for the wiring diagram of the chips vertices with over 200 edges have been eliminated from the database. Arthur. J. 15 (1999). New York. Today 2. Phys. This work was partially supported by NSF Career Award DMR-9710998. 18. Barth´ el´ emy. Lett.ucla. Jeong for collecting the data on the www and L. Tjaden for supplying the actor data. Note that for most networks the connectivity m of the newly added vertices is not constant. A. H. Guare. S.

FIGURES 10 10 10 10 10 10 -1 -2 A 10 0 B 10 0 C 10 -3 -2 10 -1 P(k) 10 -4 -4 10 -2 -5 10 -6 10 -3 -6 10 0 10 1 10 2 10 3 10 -8 10 0 -4 10 10 1 10 2 10 3 10 4 10 0 10 1 k FIG. (C) Powergrid data. (B) γwww = 2. N = 325. (A) Actor collaboration graph with N = 212. The distribution function of connectivities for various large networks. 1.1 and (C) γpower = 4.3. N = 4.46 (6). k = 5. 250 vertices and average connectivity k = 28. 729. 941. k = 2. (B) World wide web.78. The dashed lines have slopes (A) γactor = 2.67. 10 .

(A) The power-law connectivity distribution at t = 150. in the case of m0 = m = 1 (o). The dashed line has slope 0. (C) Time evolution of the connectivity for two vertices added to the system at t1 = 5 and t2 = 95. 11 . (B) The exponential connectivity distribution for model A. using m0 = m = 5. m0 = m = 5 (3) and m0 = m = 7 (△). 2. 000 (o) and t = 200.10 0 A 10 -2 10 0 B 10 3 C 2 10 -2 10 P(k) P(k) 10 -4 ki(t) -4 10 -6 10 10 1 10 -8 10 0 10 1 10 2 10 3 10 -6 0 20 40 60 80 10 0 1 2 3 4 5 10 10 10 10 10 10 0 k k t FIG.9. m0 = m = 3 (2).5. The slope of the dashed line is γ = 2. 000 (2) as obtained from the model (see text).

Sign up to vote on this title
UsefulNot useful