Small Worlds

CS 249B: Science of Networks
Week 04: Thursday, 02/21/08
Daniel Bilar
Wellesley College, Spring 2008
Goals this lecture and next lecture

Methods to read and recall papers
Read: ideas, not words
Retain/retrieve: concept maps
Review of Power Law

Relationship to Zipf and Pareto
Empirical Small world

Kevin Bacon game
Erdős number
Milgram's letters
Dodds' global experiment
Efficient reading
"t ws drk nd strmy nght; th rn fll n trrnts--xcpt t ccsnl ntrvls, whn t ws chckd by vlnt gst f wnd whch swpt p th strts (fr t s n Lndn tht r scn ls), rttlng lng th hstps, nd frcly gttng th scnty flm f th lmps tht strggld gnst th drknss."
-Edward George Bulwer-Lytton, Paul Clifford (1830)
The text has all vowels removed (~30%) and is still readable.
(Shannon entropy of single character in English is ~1-2 bits)
For our purpose (scientific texts), just remember:

About 75% of the words are redundant for getting the concepts! Leverage that fact: very much can be gleaned just from a paper's structural makeup.
Let's practice this with Barabási's paper.

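Review: Power laws in distributions

A power law is a function $f(x)$ where the value $y$ is proportional to some power of the input $x$:
$f(x) \sim x^{-\alpha}$
[Figure: $\log p(x)$ versus $\log x$, marking the points $x$ and $b x$]
Properties
So-called fat tail: large $x$ is rare but possible
Asymmetric
Straight line on a log-log plot, with slope $-\alpha$
Scale free: for $p(x) \sim x^{-\alpha}$, $p(bx) = (bx)^{-\alpha} = b^{-\alpha} x^{-\alpha}$, so rescaling $x$ by $b$ only multiplies $p$ by the constant $b^{-\alpha}$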
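A quick numerical check of the last two properties, as a sketch. The exponent α = 2.5 and the sample points are illustrative assumptions, not values from the lecture. The least-squares fit on log-log axes recovers the slope exactly here only because the data are noise-free; for real empirical data, maximum-likelihood estimation is generally preferred over fitting a line to a log-log plot.

```python
import math


def power_law(x: float, alpha: float, c: float = 1.0) -> float:
    """Unnormalized power law: c * x**(-alpha)."""
    return c * x ** (-alpha)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


alpha = 2.5  # illustrative exponent (assumption)
xs = [1, 2, 5, 10, 50, 100, 1000]
ys = [power_law(x, alpha) for x in xs]

# Straight line on a log-log plot: the fitted slope is -alpha.
print(round(loglog_slope(xs, ys), 6))  # -2.5

# Scale free: p(b*x) / p(x) is the same constant b**(-alpha) for every x.
b = 3.0
ratios = [power_law(b * x, alpha) / power_law(x, alpha) for x in xs]
print(all(math.isclose(r, b ** (-alpha)) for r in ratios))  # True
```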
Zipf's law and power laws

George Kingsley Zipf, a Harvard linguistics professor, sought to determine the "size" of the 3rd, 8th, or 100th most common word. Size here denotes the frequency of use of the word in English text (not the length of the word itself). Zipf's law states that the size $n$ of the $r$th largest occurrence of the event is inversely proportional to its rank $r$:
$n = J r^{-\beta}$, with $\beta$ close to 1 and $J$ the size of the rank-1 (largest) occurrence
For example, with $\beta = 1$ the 2nd most common word occurs about half as often as the most common one.
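The rank-size view can be computed directly from word counts. A minimal sketch, using a made-up toy sentence rather than a real corpus (Zipf's law only shows up reliably over large texts); `rank_size` and `zipf_prediction` are hypothetical helper names, and $\beta = 1$ is assumed for the prediction.

```python
from collections import Counter


def rank_size(words):
    """Return (rank, count) pairs; rank 1 is the most frequent word."""
    sizes = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(sizes, start=1))


def zipf_prediction(J, r, beta=1.0):
    """Zipf's law n = J * r**(-beta), where J is the size of rank 1."""
    return J * r ** (-beta)


text = "the cat and the dog and the bird saw the cat"  # toy text (assumption)
pairs = rank_size(text.split())
print(pairs)  # [(1, 4), (2, 2), (3, 2), (4, 1), (5, 1), (6, 1)]

# Compare observed sizes with the Zipf prediction anchored at the rank-1 size.
J = pairs[0][1]
for r, n in pairs:
    print(r, n, round(zipf_prediction(J, r), 2))
```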
Pareto and Power Laws

The Italian economist Vilfredo Pareto was interested in the distribution of incomes. Pareto's law is expressed in terms of the (complementary) cumulative distribution function, i.e. the probability that a person earns $x$ or more:
$P[X > x] \sim x^{-k}$
Relationship between Zipf and Pareto

Zipf's representation of data is the discrete-valued, flipped counterpart of Pareto's.
Zipf's phrase "the $r$th largest city has $n$ inhabitants" is equivalent to saying "$r$ cities have $n$ or more inhabitants".
This is exactly the definition of the Pareto distribution, except that the x and y axes are flipped: for Zipf, $r$ is on the x-axis and $n$ is on the y-axis, whereas for Pareto, $r$ is on the y-axis and $n$ is on the x-axis.
Simply inverting the axes, if the rank exponent is $\beta$, i.e. $n \sim r^{-\beta}$ for Zipf ($n$ = income, $r$ = rank of the person with income $n$), then the Pareto exponent is $1/\beta$, so that $r \sim n^{-1/\beta}$ ($n$ = income, $r$ = number of people whose income is $n$ or higher).
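A small check of the axis flip, as a sketch assuming exact Zipf data $n_r = J r^{-\beta}$ with the illustrative values $J = 1000$, $\beta = 1.25$ (not from the lecture). Using each size as a threshold and counting how many items are at least that large gives back the rank $r$, and the log-log slope of that count against size comes out as $-1/\beta$.

```python
import math


def zipf_sizes(J, beta, N):
    """Exact Zipf data: size of the r-th largest item is J * r**(-beta), r = 1..N."""
    return [J * r ** (-beta) for r in range(1, N + 1)]


def count_at_least(sizes, x):
    """Pareto view: how many items have size >= x."""
    return sum(1 for s in sizes if s >= x)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


J, beta, N = 1000.0, 1.25, 50  # illustrative values (assumption)
sizes = zipf_sizes(J, beta, N)

# Flip the axes: use each size as a threshold and count items at least that large.
counts = [count_at_least(sizes, n) for n in sizes]
print(counts[:5])  # [1, 2, 3, 4, 5]

# The Pareto-style slope of count versus size is -1/beta.
print(round(loglog_slope(sizes, counts), 6))  # -0.8
```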
What do we mean by "small"?

"Small" means that almost every element of the network is somehow close to almost every other element, even those that are perceived as likely to be far away.
"Close" and "far away" are explained in context.
The Kevin Bacon Game

Invented by Albright College students in 1994: Craig Fass, Brian Turtle, Mike Ginelly.
Goal: connect any actor to Kevin Bacon by linking actors who have acted in the same movie.
The Oracle of Bacon website uses the Internet Movie Database (IMDb.com) to find the shortest link between any two actors: http://oracleofbacon.org/
[Figure: boxed version of the Kevin Bacon Game]

An Example
Kevin Bacon
Mystic River (2003)
Tim Robbins
Code 46 (2003)
Om Puri
Yuva (2004)
Rani Mukherjee
Black (2005)
Amitabh Bachchan
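Finding a Bacon number is a breadth-first search on the co-star graph. A minimal sketch, assuming a toy database that contains only the four movies and casts from the example path above (the real Oracle works on the full IMDb data, where shorter paths may exist); `build_costar_graph` and `bacon_number` are hypothetical names.

```python
from collections import deque

# Toy cast lists taken from the example path above (assumption: nothing else is known).
movies = {
    "Mystic River (2003)": ["Kevin Bacon", "Tim Robbins"],
    "Code 46 (2003)": ["Tim Robbins", "Om Puri"],
    "Yuva (2004)": ["Om Puri", "Rani Mukherjee"],
    "Black (2005)": ["Rani Mukherjee", "Amitabh Bachchan"],
}


def build_costar_graph(movies):
    """Link every pair of actors who appear in the same movie."""
    graph = {}
    for cast in movies.values():
        for a in cast:
            graph.setdefault(a, set())
            for b in cast:
                if a != b:
                    graph[a].add(b)
    return graph


def bacon_number(graph, source, target):
    """Breadth-first search: number of co-star links from source to target (None if unreachable)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        if actor == target:
            return dist[actor]
        for nxt in graph.get(actor, ()):
            if nxt not in dist:
                dist[nxt] = dist[actor] + 1
                queue.append(nxt)
    return None


graph = build_costar_graph(movies)
print(bacon_number(graph, "Kevin Bacon", "Amitabh Bachchan"))  # 4
```

Breadth-first search visits actors in order of distance, so the first time it reaches the target the count is the shortest link length.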

Center of Hollywood?
Total # of actors in database: ~550,000
Average path length to Kevin: 2.79
Actor closest to center: Rod Steiger (2.53)
Rank of Kevin, in closeness to center: 876th
Most actors are within three links of each other!
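The "closeness to center" figures above can be sketched as ranking actors by their average shortest-path distance to everyone they can reach. This is one plausible reading of the ranking, not necessarily the Oracle of Bacon's exact formula, and the five-actor line graph is hypothetical.

```python
from collections import deque


def distances_from(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_distance(graph, source):
    """Mean shortest-path length from source to the other nodes it can reach."""
    d = distances_from(graph, source)
    others = [v for node, v in d.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


def rank_by_centrality(graph):
    """Most central first: smallest average distance (ties broken by name)."""
    return sorted(graph, key=lambda n: (average_distance(graph, n), n))


# Hypothetical co-star graph: A - B - C - D - E
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

print(average_distance(graph, "C"))  # 1.5
print(rank_by_centrality(graph))     # ['C', 'B', 'D', 'A', 'E']
```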
Erdős Number
Number of links required to connect scholars to Erdős via coauthorship of papers. Erdős wrote 1500+ papers with 507 co-authors. Jerry Grossman's (Oakland Univ.) website allows mathematicians to compute their Erdős numbers: http://www.oakland.edu/enp/
Paul Erdős (1913-1996)
Connecting path lengths, among mathematicians only:

Average: 4.65
Maximum: 13
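Both statistics (average and maximum connecting path length) come from breadth-first search distances over all connected pairs. A sketch assuming a toy coauthorship graph that contains only the five-person chain listed below; the real figures need the full collaboration graph, and `bfs` and `path_length_stats` are hypothetical names.

```python
from collections import deque

# Toy coauthorship graph: only the chain listed below (assumption).
chain = ["Paul Erdős", "Guantao Chen", "Michael Stewart", "George Cybenko", "Daniel Bilar"]
graph = {name: set() for name in chain}
for a, b in zip(chain, chain[1:]):
    graph[a].add(b)
    graph[b].add(a)


def bfs(graph, source):
    """Shortest-path distance from source to every reachable author."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def path_length_stats(graph):
    """Average and maximum shortest-path length over all connected pairs."""
    nodes = sorted(graph)
    lengths = []
    for i, u in enumerate(nodes):
        dist = bfs(graph, u)
        # Each unordered pair is counted once: only partners later in the sorted list.
        lengths.extend(dist[v] for v in nodes[i + 1:] if v in dist)
    return sum(lengths) / len(lengths), max(lengths)


print(bfs(graph, "Paul Erdős")["Daniel Bilar"])  # 4 (Erdős number along this chain)
print(path_length_stats(graph))                  # (2.0, 4)
```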
Erdős Number
Paul Erdős
Paul Erdős, Guantao Chen (1994). Ramsey problems involving degrees in edge-colored complete graphs of vertices belonging to monochromatic subgraphs. European J. Combin. 14
Guantao Chen
Guantao Chen and Michael Stewart (2004). An interlacing result on normalized Laplacians. SIAM J. Discrete Math. 18 (2004), no. 2
Michael Stewart
Michael Stewart and George Cybenko (1992). The linear algebra of perfect reconstruction. NATO Adv. Sci. Inst. Ser. E Appl. Sci., no. 232
George Cybenko
George Cybenko and Daniel Bilar (1999). Machine Learning Applications in Grid Computing. Proceedings of the 37th Allerton Conference on Communication, Control, and Computing
Daniel Bilar
Milgram's experiment (1960s)

[Figure: map of the United States with Nebraska (NE) and Massachusetts (MA) marked]
Given a target individual and a particular property, pass the message to a person you correspond with who you think is closest to the target.
Milgram's small-world experiment

Random people from Nebraska were to send a letter (via intermediaries) to a stockbroker in Boston.
They could only send it to someone with whom they were on a first-name basis.
296 senders from Boston and Omaha; 20% of senders reached the target.
Average chain length = 6.5, hence "six degrees of separation".
Small-world experiments (2003)

Experiment at Columbia
See paper Dodds (2003) in reading list
Targets
18 targets in 13 different countries
Results so far
60,000+ participants
24,163 message chains
384 reached their targets
Average path length approx. 4.0
image by Stephen G. Eick http://www.bell-labs.com/user/eick/index.html (unrelated to small world experiment)

Targets
a professor at an Ivy League university, an archival inspector in Estonia, a technology consultant in India, a policeman in Australia, a veterinarian in the Norwegian army
No US-Swiss, Simpsons-loving computer scientist yet
Attrition
[Figure: per-step attrition rates $r_L$, shown with 95% confidence intervals]
~37% participation rate after the first step, so the probability of a chain of length 10 getting through is $0.37^{10} \approx 5 \times 10^{-5}$. That is very small: only about one out of 20,000 chains would make it.
Actual # of completed chains: 384 (1.6% of all chains).
Note: small changes in attrition rates lead to large changes in completion rates. E.g., a 15% decrease in the attrition rate would lead to an 800% increase in the completion rate.
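The arithmetic above, as a sketch. It assumes every holder forwards independently with the same probability 0.37, and it reads "a 15% decrease in attrition" as a relative cut of the per-step attrition rate 0.63 for chains of length 10; under that reading the completion rate rises by roughly 870%, in line with the ~800% quoted. `completion_probability` is a hypothetical helper.

```python
def completion_probability(participation: float, length: int) -> float:
    """Chance that a chain survives `length` steps when each holder independently
    forwards the message with probability `participation` (assumption)."""
    return participation ** length


p = completion_probability(0.37, 10)
print(f"{p:.1e}")  # 4.8e-05
print(f"about 1 in {1 / p:,.0f}")

# Observed completion: 384 of 24,163 chains.
print(f"{384 / 24163:.1%}")  # 1.6%

# Sensitivity: cut the per-step attrition (1 - 0.37) by 15%, relative.
attrition = 1 - 0.37
better = completion_probability(1 - 0.85 * attrition, 10)
increase = better / p - 1
print(f"{increase:.0%}")
```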
Estimating chain length

$\langle L \rangle = 4.05$ for all completed chains.
But this covers only completed chains. So what? Bias! Shorter chains are more likely to be completed than longer chains, which leads to an underestimation of $\langle L \rangle$.
Dodds et al. estimated ("recovered") the lengths that uncompleted chains would have had.

$L^*$ = estimated "true" median chain length
Intra-country chains: $L^* = 5$
Inter-country chains: $L^* = 7$
All chains: $L^* = 7$
For Milgram (1960s): $L^* \approx$ 8-9 hops
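The bias and a correction for it can be sketched with a hypothetical length distribution. Completed chains of length $L$ are over-represented in proportion to their survival probability; dividing the observed share of each $L$ by that probability (the product of per-step survival rates $1 - r_i$) and renormalizing undoes the bias. This is an assumed reweighting in the spirit of Dodds et al.'s recovery of chain lengths, not a reproduction of their exact estimator; it works with means rather than the median $L^*$, and the 0.37 participation rate is reused from the attrition discussion. With these numbers, completed chains average about 3.4 hops while the hypothetical true mean is 6.0.

```python
def completed_distribution(true_dist, participation):
    """Share of each length L among *completed* chains, if every one of the L
    forwarding steps survives with probability `participation`."""
    weights = {L: p * participation ** L for L, p in true_dist.items()}
    total = sum(weights.values())
    return {L: w / total for L, w in weights.items()}


def corrected_distribution(observed, attrition_by_step):
    """Reweight each observed length L by 1 / prod_{i<L} (1 - r_i), then renormalize."""
    weights = {}
    for L, share in observed.items():
        survival = 1.0
        for r in attrition_by_step[:L]:
            survival *= 1.0 - r
        weights[L] = share / survival
    total = sum(weights.values())
    return {L: w / total for L, w in weights.items()}


def mean_length(dist):
    return sum(L * p for L, p in dist.items())


true_dist = {3: 0.2, 5: 0.3, 7: 0.3, 9: 0.2}  # hypothetical "true" lengths
q = 0.37                                      # per-step participation
observed = completed_distribution(true_dist, q)
recovered = corrected_distribution(observed, [1 - q] * 10)

print(round(mean_length(true_dist), 2), round(mean_length(observed), 2))
print({L: round(p, 3) for L, p in recovered.items()})
```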
Successful/Unsuccessful Chains
Successful chains disproportionately used:
weak ties (What's that? Google "Granovetter weak ties")
professional ties (34% vs. 13%)
ties originating at work/college
the target's work (65% vs. 40%)
Successful chains disproportionately avoided:
hubs (8% vs. 1%) (+ no evidence of funnels)
family/friendship ties (60% vs. 83%)
The selection strategy leading to success seems to be:

Geography, then work
Six degrees of separation?

Participants are not perfect at routing messages: they use only local information.
"The accuracy of small world chains in social networks" (Killworth, McCarty et al.) analyzed 10,920 shortest-path connections between 105 members of an interviewing bureau, together with the individuals' actual routing paths. The mean small-world path length (3.23) is 40% longer than the mean of the actual shortest paths (2.30).
The study suggests that people make a less than optimal small-world choice more than half the time. In other words, the existence of a short chain is one thing; finding it is quite another. Research on this made Kleinberg (Cornell) famous.
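The gap between the existence of a short chain and finding it can be illustrated with a toy network: a ring of 12 people, each knowing their two neighbors, plus one hypothetical long-range tie between positions 11 and 6. Each holder forwards greedily to the acquaintance whose position looks closest to the target, using only local information as Milgram's participants did. This is a sketch under those assumptions, not Kleinberg's model and not the Killworth-McCarty data.

```python
from collections import deque


def ring_distance(a, b, n):
    """Position-based distance on a ring of n nodes: the only 'local knowledge' used."""
    d = abs(a - b) % n
    return min(d, n - d)


def make_ring(n, shortcuts):
    """Ring lattice (each node linked to its two neighbors) plus extra shortcut edges."""
    graph = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    for u, v in shortcuts:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def greedy_route(graph, source, target, max_steps=100):
    """Milgram-style forwarding: hand the message to the neighbor that looks closest
    to the target. Holders see their own neighbors, not the shortcuts beyond them."""
    n = len(graph)
    path = [source]
    while path[-1] != target and len(path) <= max_steps:
        here = path[-1]
        path.append(min(graph[here], key=lambda v: (ring_distance(v, target, n), v)))
    return path


def shortest_path_length(graph, source, target):
    """Breadth-first search with global knowledge of the network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


graph = make_ring(12, shortcuts=[(11, 6)])  # hypothetical network
route = greedy_route(graph, 0, 5)
print(route)                                              # [0, 1, 2, 3, 4, 5]
print(len(route) - 1, shortest_path_length(graph, 0, 5))  # 5 3
```

Greedy forwarding takes 5 hops because the shortcut starts at node 11, which looks farther from the target; breadth-first search with global knowledge finds the 3-hop route 0, 11, 6, 5.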
Why should we perceive the world as anything other than small?

It is remarkable because:
1. The network is numerically large in the sense that the world contains $n \gg 100$ people. In the real world, $n$ is on the order of billions.
2. The network is sparse in the sense that each person is connected to an average of only $k$ other people, which is at most on the order of thousands (Kochen 1989), hundreds of thousands of times smaller than the population of the planet.
3. The network is decentralized in that there is no dominant central vertex to which most other vertices are directly connected. This implies a stronger condition than sparseness: not only must the average degree $k$ be much less than $n$, but the maximal degree $k_{max}$ over all vertices must also be much less than $n$.
4. The network is highly clustered, in that most friendship circles are strongly overlapping. That is, we expect that many of our friends are also friends of each other.
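The four conditions correspond to measurable graph quantities: size $n$, average degree $k$, maximal degree $k_{max}$, and clustering. A minimal sketch computing them on a hypothetical six-person friendship network (far too small to satisfy condition 1, so it only shows the bookkeeping). Clustering is measured as the average local clustering coefficient, using the common convention that people with fewer than two friends contribute 0; the helper names are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def local_clustering(graph, node):
    """Fraction of pairs of the node's friends who are themselves friends."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: fewer than two friends counts as 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


def summarize(graph):
    """Quantities behind conditions 1-4: size, mean degree, max degree, clustering."""
    degrees = [len(nbrs) for nbrs in graph.values()]
    n = len(graph)
    return {
        "n": n,
        "avg_degree": sum(degrees) / n,
        "max_degree": max(degrees),
        "avg_clustering": sum(local_clustering(graph, v) for v in graph) / n,
    }


# Hypothetical friendship network: two triangles sharing person C, plus one pendant friend F.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"), ("D", "E"), ("C", "E"),
         ("E", "F")]
stats = summarize(build_graph(edges))
print(stats)
```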
See Watts (1999), "Networks, Dynamics, and the Small-World Phenomenon"
For next Monday

60 min: Read Watts (1999), Network Dynamics and Small World Networks, actively, using our techniques
30 min: Peruse Dodds (2003)