CS 249B: Science of Networks Week 04: Thursday, 02/21/08 Daniel Bilar Wellesley College Spring 2008
Efficient reading
"t ws drk nd strmy nght; th rn fll n trrnts--xcpt t ccsnl ntrvls, whn t ws chckd by vlnt gst f wnd whch swpt p th strts (fr t s n Lndn tht r scn ls), rttlng lng th hstps, nd frcly gttng th scnty flm f th lmps tht strggld gnst th drknss."
-Edward George Bulwer-Lytton, Paul Clifford (1830)
Text has all vowels removed (~30%) -> still readable
(Shannon entropy of a single character in English is ~1-2 bits)
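The vowel-stripping trick can be tried on any sentence. The snippet below is a minimal sketch, assuming that "vowel" means a, e, i, o, u (so "y" is kept, as in "strmy") and that words which vanish entirely are dropped; the helper name `strip_vowels` is made up for this illustration.

```python
VOWELS = set("aeiouAEIOU")  # assumption: 'y' is not treated as a vowel

def strip_vowels(text):
    """Remove vowels from every word and drop words that become empty."""
    words = ("".join(ch for ch in word if ch not in VOWELS)
             for word in text.split())
    return " ".join(w for w in words if w)

original = "It was a dark and stormy night"
stripped = strip_vowels(original)

# Share of characters (including spaces) that disappeared.
removed_fraction = 1 - len(stripped) / len(original)
print(stripped)
print(round(removed_fraction, 2))
```

The shortened text stays readable because English spelling is redundant, which is the point of the entropy remark.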
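Properties
So-called fat tail: large x is rare but possible
Asymmetric
Straight line on a log-log plot; the slope is $\alpha$
Scale free: for $p(x) = x^{\alpha}$, $p(bx) = (bx)^{\alpha} = b^{\alpha} x^{\alpha}$
[Figure: log-log plot with x and b*x marked on the log(x) axis]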
[Figure: n plotted against rank]
$P[X > x] \sim x^{-\gamma}$
Simply inverting the axes: if the rank exponent is $\beta$, i.e. $n \sim r^{-\beta}$ for Zipf (n = income, r = rank of the person with income n), then the Pareto exponent is $\gamma = 1/\beta$, so that $r \sim n^{-1/\beta}$ (n = income, r = number of people whose income is n or higher)
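The axis inversion can be checked numerically. The sketch below is an illustration only: it writes the rank exponent as `beta` (a symbol chosen here), builds synthetic incomes that follow Zipf's law exactly, and fits log-log slopes in both directions. The value 0.8, the scale 1000.0, and the helper `loglog_slope` are assumptions made for this example, not data from the lecture.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

beta = 0.8  # assumed rank exponent, for illustration only
ranks = list(range(1, 1001))
incomes = [1000.0 * r ** (-beta) for r in ranks]  # Zipf: n ~ r^(-beta)

# Incomes strictly decrease with rank, so the person of rank r has
# exactly r people with an income at least as high: the rank is the
# Pareto count.
zipf_slope = loglog_slope(ranks, incomes)    # income against rank
pareto_slope = loglog_slope(incomes, ranks)  # count against income

print(zipf_slope, pareto_slope)
```

On exact power-law data the two fitted slopes come out as -beta and -1/beta; on real data both fits would be noisy.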
http://oracleofbacon.org/
Kevin Bacon
Mystic River (2003)
Tim Robbins
Code 46 (2003)
Om Puri
Yuva (2004)
Rani Mukherjee
Black (2005)
Amitabh Bachchan
Erdős Number
Number of links required to connect scholars to Erdős, via co-authorship of papers
Erdős wrote 1500+ papers with 507 co-authors
Jerry Grossman's (Oakland Univ.) website allows mathematicians to compute their Erdős numbers: http://www.oakland.edu/enp/
Paul Erdős (1913-1996)
Erdős Number
Paul Erdős
Paul Erdős, Guantao Chen (1994). Ramsey problems involving degrees in edge-colored complete graphs of vertices belonging to monochromatic subgraphs. European J. Combin. 14
Guantao Chen
Guantao Chen and Michael Stewart (2004). An interlacing result on normalized Laplacians. SIAM J. Discrete Math. 18 (2004), no. 2
Michael Stewart
Michael Stewart and George Cybenko (1992). The linear algebra of perfect reconstruction NATO Adv. Sci. Inst. Ser. E Appl. Sci., no. 232
George Cybenko
George Cybenko and Daniel Bilar (1999). Machine Learning Applications in Grid Computing. Proceedings of the 37th Allerton Conference on Communication, Control, and Computing
Daniel Bilar
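An Erdős number is a shortest-path length in the co-authorship graph, so it can be computed with breadth-first search. The sketch below uses only the four papers listed above as edges; the function name `collaboration_distance` is made up for this illustration, and a fuller co-authorship graph could only shorten the result.

```python
from collections import deque

def collaboration_distance(coauthors, source, target):
    """Breadth-first search: fewest co-authorship links from source to target."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        if person == target:
            return distance[person]
        for other in coauthors.get(person, ()):
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    return None  # no chain of co-authors connects the two

# One edge per paper on the slide.
papers = [
    ("Paul Erdős", "Guantao Chen"),
    ("Guantao Chen", "Michael Stewart"),
    ("Michael Stewart", "George Cybenko"),
    ("George Cybenko", "Daniel Bilar"),
]

coauthors = {}
for a, b in papers:
    coauthors.setdefault(a, set()).add(b)
    coauthors.setdefault(b, set()).add(a)

erdos_number = collaboration_distance(coauthors, "Paul Erdős", "Daniel Bilar")
print(erdos_number)
```

The same search gives a Bacon number when the edges are shared films instead of shared papers.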
Given a target individual and a particular property, pass the message to a person you correspond with who you think is closest to the target.
296 senders from Boston and Omaha
20% of senders reached the target
Average chain length = 6.5
"Six degrees of separation"
Targets
18 targets in 13 different countries
Results so far
60,000+ participants
24,163 message chains
384 reached their targets
Targets
a professor at an Ivy League university, an archival inspector in Estonia, a technology consultant in India, a policeman in Australia, a veterinarian in the Norwegian army
No US-Swiss, Simpsons-loving computer scientist yet
Attrition
[Figure: per-step attrition rates with 95% confidence intervals (r_L is attrition)]
~37% participation rate after the first step
Probability of a chain of length 10 getting through: $0.37^{10} \approx 5 \times 10^{-5}$
That is very small: only one out of 20,000 chains would make it
Actual number of completed chains: 384 (1.6% of all chains)
Note: small changes in attrition rates lead to large changes in completion rates
E.g., a 15% decrease in attrition rate would lead to an 800% increase in completion rate
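The arithmetic above can be reproduced in a few lines. This is a rough sketch that assumes the same participation rate at every step, which is a simplification: the slide's figure reports a rate per step, so the sensitivity computed here is not expected to match the slide's 800% figure exactly. The function name `completion_probability` is made up for this illustration.

```python
def completion_probability(participation, length):
    """Chance that a chain survives `length` steps at a constant per-step rate."""
    return participation ** length

participation = 0.37
base = completion_probability(participation, 10)

# Lower the attrition rate by 15% (relative) and compare.
attrition = 1 - participation
lower_attrition = attrition * (1 - 0.15)
improved = completion_probability(1 - lower_attrition, 10)
ratio = improved / base

print(base)   # close to 5 x 10^-5, about one chain in 20,000
print(ratio)  # how many times more chains complete
```

Because the per-step rate is raised to the tenth power, a modest change in attrition multiplies through every step of the chain.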
Successful/Unsuccessful Chains
Successful chains disproportionately used
weak ties (What's that? Google "Granovetter weak ties")
professional ties (34% vs. 13%)
ties originating at work/college
target's work (65% vs. 40%)
disproportionately avoided
hubs (8% vs. 1%) (+ no evidence of funnels)
family/friendship ties (60% vs. 83%)
Study suggests that people make a less than optimal small-world choice more than half the time.
In other words: the existence of a short chain is one thing, finding it is quite another.
Research on this made Kleinberg (Cornell) famous.
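The gap between a short chain existing and a short chain being found can be illustrated with a toy model. The sketch below is not Kleinberg's model or the study's method: it assumes people sit on a ring, know their two neighbours plus a few random long-range contacts, and always forward to the contact closest to the target. All names, sizes, and the random seed are assumptions made for this example.

```python
import random
from collections import deque

def ring_distance(a, b, n):
    """Distance between positions a and b on a ring of n people."""
    d = abs(a - b)
    return min(d, n - d)

def build_ring_with_shortcuts(n, shortcuts, rng):
    """Ring lattice plus `shortcuts` random long-range acquaintances."""
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    for _ in range(shortcuts):
        a, b = rng.sample(range(n), 2)
        adj[a].add(b)
        adj[b].add(a)
    return adj

def greedy_chain_length(adj, source, target, n):
    """Forward to the contact who looks closest to the target (local knowledge)."""
    current, steps = source, 0
    while current != target:
        # A ring neighbour is always one step closer, so this terminates.
        current = min(adj[current], key=lambda v: ring_distance(v, target, n))
        steps += 1
    return steps

def shortest_chain_length(adj, source, target):
    """Breadth-first search: the best chain given global knowledge."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return distance[v]
        for w in adj[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return None

n = 200
rng = random.Random(1)
network = build_ring_with_shortcuts(n, 40, rng)
pairs = [tuple(rng.sample(range(n), 2)) for _ in range(50)]
greedy = [greedy_chain_length(network, s, t, n) for s, t in pairs]
optimal = [shortest_chain_length(network, s, t) for s, t in pairs]
print(sum(greedy) / len(greedy), sum(optimal) / len(optimal))
```

Greedy forwarding can never beat the breadth-first optimum, since each sender sees only their own contacts rather than the whole network.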
1. The network is numerically large in the sense that the world contains n >> 100 people. In the real world, n is on the order of billions.
2. The network is sparse in the sense that each person is connected to an average of only k other people, which is, at most, on the order of thousands (Kochen 1989), hundreds of thousands of times smaller than the population of the planet.
3. The network is decentralized in that there is no dominant central vertex to which most other vertices are directly connected. This implies a stronger condition than sparseness: not only must the average degree k be much less than n, but the maximal degree k_max over all vertices must also be much less than n.
4. The network is highly clustered, in that most friendship circles are strongly overlapping. That is, we expect that many of our friends are also friends of each other.
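The four conditions translate into quantities that can be measured on any graph: the size n, the average degree k, the maximal degree k_max, and a clustering coefficient. The sketch below is a minimal illustration on a made-up four-person graph; it uses the average local clustering coefficient as one common way to quantify "friends of friends are friends", which is an assumption about the intended measure.

```python
def degree_stats(adj):
    """Return (n, average degree k, maximal degree k_max)."""
    degrees = [len(neighbours) for neighbours in adj.values()]
    return len(adj), sum(degrees) / len(degrees), max(degrees)

def average_clustering(adj):
    """Mean over vertices of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for neighbours in adj.values():
        nbrs = list(neighbours)
        k = len(nbrs)
        if k < 2:
            continue  # fewer than two friends: contributes 0
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

# Toy friendship graph: a triangle (0, 1, 2) plus person 3 who knows only 2.
friends = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
}

n, k, k_max = degree_stats(friends)
clustering = average_clustering(friends)
print(n, k, k_max, clustering)
```

On a real social network the conditions would read: n very large, k and k_max both far below n, and clustering well above what a random graph of the same density would give.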