
Small Worlds

CS 249B: Science of Networks
Week 04: Thursday, 02/21/08
Daniel Bilar
Wellesley College
Spring 2008

Goals this lecture and next lecture


Methods to read and recall papers
  Read: Ideas, not words
  Retain/Retrieve: Concept maps

Review of Power Law
  Relationship to Zipf and Pareto

Empirical Small World
  Kevin Bacon game
  Erdős number
  Milgram's letters
  Dodds' global experiment

Efficient reading
"t ws drk nd strmy nght; th rn fll n trrnts--xcpt t ccsnl ntrvls, whn t ws chckd by vlnt gst f wnd whch swpt p th strts (fr t s n Lndn tht r scn ls), rttlng lng th hstps, nd frcly gttng th scnty flm f th lmps tht strggld gnst th drknss."
-Edward George Bulwer-Lytton, Paul Clifford (1830)
Text has all vowels removed (~30%) -> still readable
(Shannon entropy of single character in English is ~1-2 bits)
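The vowel-removal trick can be tried out with a minimal Python sketch. It assumes the restored opening words of the quoted sentence ("It was a dark and stormy night"); the helper names `strip_vowels` and `vowel_share` are illustrative and not part of the lecture.

```python
VOWELS = set("aeiouAEIOU")

def strip_vowels(text):
    """Remove vowels from every word and drop words that become empty."""
    stripped = ("".join(ch for ch in word if ch not in VOWELS)
                for word in text.split())
    return " ".join(word for word in stripped if word)

def vowel_share(text):
    """Fraction of alphabetic characters that are vowels."""
    letters = [ch for ch in text if ch.isalpha()]
    return sum(ch in VOWELS for ch in letters) / len(letters)

# Assumption: restored opening words of the sentence quoted on the slide.
sentence = "It was a dark and stormy night"
compressed = strip_vowels(sentence)
share = vowel_share(sentence)

print(compressed)                      # t ws drk nd strmy nght
print(f"vowel share: {share:.0%}")     # 7 of 24 letters
```

For this short fragment the vowels make up 7 of 24 letters, in line with the roughly 30% quoted on the slide.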

For our purpose (scientific texts), just remember:


About 75% of the words are redundant for getting the concepts! Leverage that fact.
A great deal can be gleaned from the structural makeup alone.

Let's practice this with Barabási's paper



Review: Power laws in distributions


A power law is a function f(x) where the value y is proportional to some power of the input x:
$f(x) \sim x^{-\alpha}$

[Figure: log(p(x)) plotted against log(x), with the rescaling x -> b*x]

Properties
So-called "fat tail"
  Large x is rare but possible
Asymmetric
Straight line on a log-log plot
  Slope is $-\alpha$
Scale free
  $p(bx) = (bx)^{-\alpha} = b^{-\alpha} x^{-\alpha}$
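A short Python sketch can check the two properties numerically: the log-log plot is a straight line, and rescaling x by b only multiplies p(x) by a constant. The exponent name `alpha`, its value 2.5, and the factor b = 10 are assumptions chosen for illustration, not values from the slides.

```python
import math

def power_law(x, alpha):
    """Unnormalized power law p(x) = x^(-alpha)."""
    return x ** (-alpha)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

alpha = 2.5   # illustrative exponent (assumption)
b = 10.0      # illustrative rescaling factor (assumption)
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [power_law(x, alpha) for x in xs]

# Straight line on a log-log plot: the fitted slope equals -alpha.
slope = loglog_slope(xs, ys)

# Scale free: p(b*x) / p(x) is the same constant b^(-alpha) for every x.
ratios = [power_law(b * x, alpha) / power_law(x, alpha) for x in xs]

print("slope:", slope)
print("ratios:", ratios)
```

Because the ratio does not depend on x, the curve looks the same at every scale, which is what "scale free" refers to.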

Zipf's law and Power laws


George Kingsley Zipf, a Harvard linguistics professor, sought to determine the 'size' of the 3rd or 8th or 100th most common word.
Size here denotes the frequency of use of the word in English text (not the length of the word itself).
Zipf's law states that the size n of the r-th largest occurrence of the event is inversely proportional to its rank r:

$n \sim J r^{-\beta}$, with $\beta$ close to 1 and J the size of the rank-1 occurrence

Pareto and Power Laws


The Italian economist Vilfredo Pareto was interested in the distribution of incomes.
Pareto's law is expressed in terms of the (complementary) cumulative distribution function (the probability that a person earns x or more):

$P[X > x] \sim x^{-k}$

Relationship Zipf and Pareto


Zipf's representation of data is the discrete-valued, flipped counterpart of Pareto's.
Zipf's phrase "The r-th largest city has n inhabitants" is equivalent to saying "r cities have n or more inhabitants".
This is exactly the definition of the Pareto distribution, except the x and y axes are flipped. Whereas for Zipf, r is on the x-axis and n is on the y-axis, for Pareto, r is on the y-axis and n is on the x-axis.

Simply inverting the axes, we get that if the rank exponent is $\beta$, i.e. $n \sim r^{-\beta}$ for Zipf (n = income, r = rank of person with income n), then the Pareto exponent is $1/\beta$, so that $r \sim n^{-1/\beta}$ (n = income, r = number of people whose income is n or higher).
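The axis flip can be checked on synthetic data with a small Python sketch. It builds sizes that follow Zipf's law exactly, counts how many items are at least as large as each size, and fits the slope of that count on a log-log scale. The symbol `beta` for the rank exponent and the values of `J` and `beta` are assumptions made up for illustration.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

J = 1_000_000.0    # hypothetical size of the rank-1 item
beta = 0.8         # hypothetical Zipf rank exponent
ranks = list(range(1, 1001))

# Zipf view: size n as a function of rank r.
sizes = [J * r ** (-beta) for r in ranks]

# Pareto view: number of items whose size is n or more.
counts = [sum(1 for s in sizes if s >= n) for n in sizes]

zipf_slope = loglog_slope(ranks, sizes)      # expected: -beta
pareto_slope = loglog_slope(sizes, counts)   # expected: -1/beta

print("Zipf slope:", zipf_slope)
print("Pareto slope:", pareto_slope)
```

The count of items with size n or more is just the rank again, so the Pareto plot is the Zipf plot with its axes swapped and the slope inverted.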

What do we mean by "small"?


"Small" means that almost every element of the network is somehow "close" to almost every other element, even those that are perceived as likely to be far away.
"Close" and "far away" are explained in context.

The Kevin Bacon Game


Invented by Albright College students in 1994: Craig Fass, Brian Turtle, Mike Ginelly
Goal: Connect any actor to Kevin Bacon, by linking actors who have acted in the same movie.
The Oracle of Bacon website uses the Internet Movie Database (IMDB.com) to find the shortest link between any two actors:
Boxed version of the Kevin Bacon Game

http://oracleofbacon.org/

The Kevin Bacon Game


An Example

Kevin Bacon
Mystic River (2003)

Tim Robbins
Code 46 (2003)

Om Puri
Yuva (2004)

Rani Mukherjee
Black (2005)

Amitabh Bachchan
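Finding such a chain is a shortest-path search on the co-star graph. The Python sketch below runs a breadth-first search on a toy graph that contains only the four films and five actors from the example above, so the resulting number holds for this toy graph only, not for the full movie database. The function names are illustrative.

```python
from collections import deque

# Toy data: only the films and actors named in the example chain.
films = {
    "Mystic River (2003)": ["Kevin Bacon", "Tim Robbins"],
    "Code 46 (2003)": ["Tim Robbins", "Om Puri"],
    "Yuva (2004)": ["Om Puri", "Rani Mukherjee"],
    "Black (2005)": ["Rani Mukherjee", "Amitabh Bachchan"],
}

def build_costar_graph(films):
    """Link two actors whenever they appear in the same film."""
    graph = {}
    for cast in films.values():
        for a in cast:
            for b in cast:
                if a != b:
                    graph.setdefault(a, set()).add(b)
    return graph

def shortest_chain(graph, source, target):
    """Breadth-first search; returns the list of actors on a shortest chain, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        if actor == target:
            break
        for costar in graph.get(actor, ()):
            if costar not in parent:
                parent[costar] = actor
                queue.append(costar)
    if target not in parent:
        return None
    chain = []
    node = target
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain[::-1]

graph = build_costar_graph(films)
chain = shortest_chain(graph, "Kevin Bacon", "Amitabh Bachchan")
bacon_number = len(chain) - 1

print(" -> ".join(chain))
print("links in this toy graph:", bacon_number)   # 4
```

Breadth-first search explores actors in order of increasing distance, so the first time it reaches the target it has found a chain with the fewest links.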

The Kevin Bacon Game


Total # of actors in database: ~550,000
Average path length to Kevin: 2.79
Actor closest to "center": Rod Steiger (2.53)
Rank of Kevin, in closeness to center: 876th
Most actors are within three links of each other!
Center of Hollywood?

Erdős Number
Number of links required to connect scholars to Erdős, via co-authorship of papers
Erdős wrote 1500+ papers with 507 co-authors.
Jerry Grossman's (Oakland Univ.) website allows mathematicians to compute their Erdős numbers: http://www.oakland.edu/enp/
Paul Erdős (1913-1996)

Connecting path lengths, among mathematicians only:


average is 4.65
maximum is 13

Erdős Number
Paul Erdős
Paul Erdős and Guantao Chen (1994). Ramsey problems involving degrees in edge-colored complete graphs of vertices belonging to monochromatic subgraphs. European J. Combin. 14

Guantao Chen
Guantao Chen and Michael Stewart (2004). An interlacing result on normalized Laplacians. SIAM J. Discrete Math. 18 (2004), no. 2

Michael Stewart
Michael Stewart and George Cybenko (1992). The linear algebra of perfect reconstruction. NATO Adv. Sci. Inst. Ser. E Appl. Sci., no. 232

George Cybenko
George Cybenko and Daniel Bilar (1999). Machine Learning Applications in Grid Computing. Proceedings of the 37th Allerton Conference on Communication, Control, and Computing

Daniel Bilar

Milgram's experiment (1960s)


[Figure: map marking NE (Nebraska) and MA (Massachusetts)]

Given a target individual and a particular property, pass the message to a person you correspond with who you think is closest to the target.

Milgram's small-world experiment


Random people from Nebraska were to send a letter (via intermediaries) to a stockbroker in Boston
They could only send to someone with whom they were on a first-name basis.

296 senders from Boston and Omaha
20% of senders reached the target
Average chain length = 6.5
"Six degrees of separation"

Small world experiments 2003


Experiment at Columbia
See paper Dodds (2003) in reading list

Targets
18 targets
13 different countries

Results so far
60,000+ participants
24,163 message chains
384 reached their targets

Average path length approx. 4.0

image by Stephen G. Eick http://www.bell-labs.com/user/eick/index.html (unrelated to small world experiment)



Targets
a professor at an Ivy League university
an archival inspector in Estonia
a technology consultant in India
a policeman in Australia
a veterinarian in the Norwegian army
No US-Swiss, Simpsons-loving computer scientist yet

Attrition
[Figure: per-step attrition rates with 95% confidence intervals ($r_L$ is attrition)]
~37% participation rate after the first step
Probability of a chain of length 10 getting through: $0.37^{10} \approx 5 \times 10^{-5}$. That is very small: only about one out of 20,000 chains would make it.
Actual # of completed chains: 384 (1.6% of all chains)
Note: Small changes in attrition rates lead to large changes in completion rates
E.g., a 15% decrease in attrition rate would lead to an 800% increase in completion rate
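The arithmetic on this slide can be reproduced in a few lines of Python. The sketch assumes that every step has the same participation rate of 37% and that the "15% decrease" is a relative decrease in the attrition rate (0.63 -> 0.63 * 0.85); both are simplifying assumptions, and the function name is illustrative.

```python
def completion_probability(participation_rate, chain_length):
    """Probability that every one of `chain_length` steps forwards the message."""
    return participation_rate ** chain_length

participation = 0.37              # from the slide
attrition = 1 - participation     # 0.63
length = 10

p_base = completion_probability(participation, length)

# Assumption: "15% decrease in attrition" read as a relative decrease.
lower_attrition = attrition * 0.85
p_improved = completion_probability(1 - lower_attrition, length)
increase_pct = (p_improved / p_base - 1) * 100

# Observed completion share from the reported counts.
observed_share = 384 / 24163

print(f"baseline completion probability: {p_base:.2e}")
print(f"about one chain in {1 / p_base:,.0f}")
print(f"increase after lowering attrition: {increase_pct:.0f}%")
print(f"observed completion share: {observed_share:.1%}")
```

Under these assumptions the increase comes out somewhat above 800%, the same order of magnitude as the figure on the slide; the exponent of 10 is what turns a modest per-step change into a large overall one.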

Estimating chain length


<L> = 4.05 for all completed chains
But this covers only completed chains. So what? Bias!
Shorter chains are more likely to be completed than longer chains
This leads to an underestimation of <L>

Dodds et al. estimated 'recovered' chain lengths for uncompleted chains


L* = estimated 'true' median chain length
Intra-country chains: L* = 5
Inter-country chains: L* = 7
All chains: L* = 7
For Milgram (1960s): L* ~ 8-9 hops
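The bias and its correction can be illustrated with a toy Python sketch. It is not the procedure from the paper: it assumes a made-up distribution of true chain lengths and one constant per-step retention rate, then reweights the completed chains by the inverse of their survival probability. All names and numbers are hypothetical.

```python
def observed_completed(true_counts, retention):
    """Expected completed chains per length L when each of the L steps
    succeeds independently with probability `retention`."""
    return {L: n * retention ** L for L, n in true_counts.items()}

def recover_counts(completed_counts, retention):
    """Undo the survival bias by dividing by the survival probability."""
    return {L: n / retention ** L for L, n in completed_counts.items()}

def mean_length(counts):
    """Mean chain length weighted by the number of chains."""
    total = sum(counts.values())
    return sum(L * n for L, n in counts.items()) / total

# Hypothetical true distribution of chain lengths (length -> number of chains).
true_counts = {3: 50, 4: 100, 5: 200, 6: 300, 7: 200, 8: 100, 9: 50}
retention = 0.37   # assumption: the same rate at every step

completed = observed_completed(true_counts, retention)
recovered = recover_counts(completed, retention)

true_mean = mean_length(true_counts)
observed_mean = mean_length(completed)
recovered_mean = mean_length(recovered)

print("true mean length:     ", true_mean)
print("completed-only mean:  ", round(observed_mean, 2))
print("recovered mean length:", round(recovered_mean, 2))
```

Long chains survive far less often than short ones, so the completed-only mean sits below the true mean, and dividing by the survival probability restores it in this idealized setting.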

Successful/Unsuccessful Chains
Successful chains disproportionately used
  weak ties (What's that? Google "Granovetter weak ties")
  professional ties (34% vs. 13%)
  ties originating at work/college target's work (65% vs. 40%)

... and disproportionately avoided
  hubs (8% vs. 1%) (+ no evidence of funnels)
  family/friendship ties (60% vs. 83%)

Selection Strategy leading to success seems to be


Geography, then Work

Six degrees of separation?


Participants are not perfect in routing messages
They use only local information
"The accuracy of small world chains in social networks" (Killworth, McCarty et al.)
Analyzes 10,920 shortest-path connections between 105 members of an interviewing bureau, together with the individuals' actual route paths
The mean small-world path length (3.23) is 40% longer than the mean of the actual shortest paths (2.30)

The study suggests that people make a less-than-optimal small-world choice more than half the time.
In other words: the existence of a short chain is one thing, finding it quite another
Research on this made Kleinberg (Cornell) famous
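The gap between a chain existing and a chain being found can be seen in a toy simulation. The Python sketch below compares true shortest paths (breadth-first search with global knowledge) with greedy forwarding that uses only local information: each node hands the message to whichever neighbour is closest to the target along the ring. The ring-with-random-shortcuts graph, its size, and the seed are assumptions for illustration; this is neither the interviewing-bureau data nor Kleinberg's exact model.

```python
import random
from collections import deque

def ring_distance(a, b, n):
    """Distance between positions a and b along a ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)

def build_ring_with_shortcuts(n, shortcuts, rng):
    """Ring where each node knows its two neighbours, plus random long-range links."""
    graph = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    for _ in range(shortcuts):
        a, b = rng.sample(range(n), 2)
        graph[a].add(b)
        graph[b].add(a)
    return graph

def bfs_length(graph, source, target):
    """Length of a true shortest path (requires global knowledge)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    raise ValueError("target unreachable")

def greedy_length(graph, source, target, n):
    """Forward to the neighbour closest to the target on the ring (local knowledge only)."""
    steps = 0
    current = source
    while current != target:
        current = min(graph[current], key=lambda w: ring_distance(w, target, n))
        steps += 1
    return steps

rng = random.Random(0)            # fixed seed so runs are repeatable
n = 200
graph = build_ring_with_shortcuts(n, shortcuts=40, rng=rng)
pairs = [tuple(rng.sample(range(n), 2)) for _ in range(300)]

greedy = [greedy_length(graph, s, t, n) for s, t in pairs]
optimal = [bfs_length(graph, s, t) for s, t in pairs]

mean_greedy = sum(greedy) / len(greedy)
mean_optimal = sum(optimal) / len(optimal)
suboptimal_share = sum(g > o for g, o in zip(greedy, optimal)) / len(pairs)

print("mean greedy path length: ", mean_greedy)
print("mean shortest path length:", mean_optimal)
print("share of suboptimal greedy routes:", suboptimal_share)
```

Greedy forwarding always terminates here because one ring neighbour is always a step closer to the target, but it can miss shortcuts that first lead away from the target, so its routes are never shorter and often longer than the true shortest paths.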

Why should we perceive the world as anything other than small?


It is remarkable because
1. The network is numerically large in the sense that the world contains n >> 100 people. In the real world, n is on the order of billions.
2. The network is sparse in the sense that each person is connected to an average of only k other people, which is, at most, on the order of thousands (Kochen 1989), hundreds of thousands of times smaller than the population of the planet.
3. The network is decentralized in that there is no dominant central vertex to which most other vertices are directly connected. This implies a stronger condition than sparseness: not only must the average degree k be much less than n, but the maximal degree k_max over all vertices must also be much less than n.
4. The network is highly clustered, in that most friendship circles are strongly overlapping. That is, we expect that many of our friends are friends also of each other.

See Watts (1999), Networks, Dynamics, and the Small-World Phenomenon
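The sparse, decentralized, and clustered conditions can each be measured on a graph. The Python sketch below does so for a ring lattice, a standard toy network in which every vertex is linked to its k nearest neighbours; the lattice and the values n = 1000, k = 10 are assumptions chosen for illustration and are not taken from the slides.

```python
from itertools import combinations

def ring_lattice(n, k):
    """Each vertex is linked to its k nearest ring neighbours (k even)."""
    graph = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            graph[v].add(w)
            graph[w].add(v)
    return graph

def clustering(graph, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = graph[v]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

n, k = 1000, 10    # illustrative sizes (assumption)
graph = ring_lattice(n, k)

degrees = [len(graph[v]) for v in graph]
mean_degree = sum(degrees) / n          # sparse: much smaller than n
max_degree = max(degrees)               # decentralized: k_max much smaller than n
mean_clustering = sum(clustering(graph, v) for v in graph) / n   # clustered

print("n =", n)
print("mean degree k =", mean_degree)
print("max degree k_max =", max_degree)
print("mean clustering =", round(mean_clustering, 3))
```

A pure ring lattice meets these three conditions while its path lengths grow with n, which is why a world that satisfies them and is still "small" is worth remarking on.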


For next Monday


60 min: Read actively Watts (1999), Networks, Dynamics, and the Small-World Phenomenon, using our techniques
30 min: Peruse Dodds (2003)
