December 30, 2021
1 Abstract
In this case study we look at the small-world phenomenon: what it is, and how the small-world challenge produced a very interesting result. We then generalise its implications to today's world, with its richer inter-connectivity, and dive into the challenges faced while analysing such networks. Issues specific to streaming data are discussed, and algorithms that solve these problems are covered at length. Finally, we take a real-life social media example to see the points discussed in action.
2 Introduction
The small-world phenomenon is the hypothesis that any two people picked at random from a crowd are connected through a short chain of acquaintances. Stanley Milgram's experiment, carried out in response to the challenge of 'the small world problem', showed the highly inter-connected nature of the world around us.
Milgram experimented by sending postcards and a letter addressed to a colleague of his in Boston. The participants were randomly chosen from Wichita, Kansas and Omaha, Nebraska, as these places are distant from Boston both geographically and culturally, and hence provide a good barometer of the connections that society has built. The letters provided to the participants contained the profile of the addressee and the reason for the experiment, while the postcard was to be filled in with the name of the person the letter was being forwarded to. Participants were asked to send the letter directly to the addressee if they knew them personally (in the experiment, a first-name basis was considered 'personal' enough); otherwise they were to send the letter to someone in their circle of friends and relatives whom they thought might know the addressee. Based on the results of the experiment, it took on average six levels of acquaintance to reach the destination, giving birth to the 'six degrees of separation'. The term itself was not coined by Milgram, but the implications of his findings were widely accepted.
With the rise of the internet, and with it social media, the inter-connectivity of people has risen further. This piqued interest in exploring the phenomenon on a global basis, especially given the amount of data now available. Because the data being generated is vast, managing and curating it is an issue that cannot be tackled easily, on top of the vast network that already exists. Human society is a living organism in its own right, and hence meticulous observation and research are required to mimic even minor aspects of its general day-to-day workings.
Some common models for creating small-world networks - the Erdős–Rényi model, the Watts–Strogatz model and the Barabási–Albert model - are used to tackle the problem of modelling societal networks. These models create graphs whose connection topology is neither completely regular nor completely random. Data streaming algorithms, meanwhile, help manage the computation of large continuous data loads within a fixed, small memory. We look further into these algorithms below.
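As an illustration, the Watts–Strogatz model can be sketched in a few lines of Python. This is a minimal sketch under our own naming; it assumes an undirected graph stored as a set of node pairs.

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Small-world graph: a ring lattice of n nodes, each linked to its k
    nearest neighbours, with each edge rewired with probability p."""
    rng = random.Random(seed)
    lattice = []
    # Regular ring lattice: connect each node to k/2 neighbours on each side.
    for u in range(n):
        for j in range(1, k // 2 + 1):
            lattice.append((u, (u + j) % n))
    graph = set()
    for u, v in lattice:
        if rng.random() < p:
            # Rewire the far endpoint to a random node, avoiding
            # self-loops and duplicate edges.
            w = rng.randrange(n)
            while w == u or (u, w) in graph or (w, u) in graph:
                w = rng.randrange(n)
            graph.add((u, w))
        else:
            graph.add((u, v))
    return graph
```

With p = 0 the graph is the regular ring lattice; even a small p introduces the long-range shortcuts that produce small-world behaviour (short average path lengths with high clustering).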
A data stream model (S) is an appropriate model for organizations and occupations that deal with live events. Sports analytics, banking, telecommunications - these industries stand to gain immense insight from the real-time data they generate. A data stream stresses transmission lines, storage and computation simultaneously, so the issue is no longer just storage: even as the streaming data is being read, there is always the question of the data that is no longer present. Data points can only be read in the order in which they arrive, and the true length of the stream is unknown.
A data point in a data stream model (S) is a single element of the universe (U), the set of all values the stream may contain. The algorithms must go through the points in the order in which they arrive and compute a function on S.
Based on how the data points in U are presented in S, there are two models:
1. Cash Register Model: each arriving data point increments the count of some item; items arrive in arbitrary order and counts only increase.
2. Turnstile Model: each arriving data point may either increment or decrement the count of an item, so counts can go down as well as up.
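The distinction between the two standard stream models can be illustrated by how the frequency vector f of the universe evolves under updates; this is a minimal sketch with our own function name:

```python
def apply_updates(updates, universe_size):
    """Maintain the frequency vector f of a stream of (item, count) updates.
    In the cash register model every count is a positive increment; in the
    turnstile model counts may also be negative (deletions)."""
    f = [0] * universe_size
    for item, count in updates:
        f[item] += count
    return f

# Cash register: only increments.
cash = apply_updates([(0, 1), (2, 1), (0, 1)], 3)        # f = [2, 0, 1]
# Turnstile: increments and decrements.
turnstile = apply_updates([(0, 1), (0, 1), (2, -1)], 3)  # f = [2, 0, -1]
```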
The basic techniques used for designing these algorithms are sampling and sketching.
1. Sampling: in the sampling framework, the algorithm chooses each incoming data point with a pre-defined probability. A sampled item is added to memory; otherwise it is discarded. Any function that needs to be computed is then computed on the data points stored in memory. Sampling helps in answering queries about distinct values, quantiles or frequently occurring items in a data stream.
2. Sketching: the algorithm maintains a compact summary (a sketch) of the entire stream, typically a small set of counters updated via random hash functions or projections, on which the desired function is then approximated.
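The sampling framework described above can be sketched with the classic reservoir sampling technique, which keeps a uniform sample of k items from a stream of unknown length (a sketch under our own naming; the text does not prescribe a specific sampler):

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Uniformly sample k items from a stream without knowing its length."""
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(stream):
        if i < k:
            reservoir.append(x)         # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # keep x with probability k/(i+1)
            if j < k:
                reservoir[j] = x
    return reservoir
```

Any function needed later (quantiles, heavy hitters, and so on) is then evaluated on the reservoir rather than on the full stream.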
3 Sampling Technique Streaming Algorithms
3.1 Finding Distinct Elements in Data Stream
3.1.1 AMS Algorithm
Given by Alon, Matias and Szegedy, the AMS algorithm approximately counts the number of unique data points present in the data stream. AMS relies on the observation that, after applying a hash function, the data points in S are uniformly distributed, so on average one of the distinct values satisfies ρ(h(x)) ≥ log2(F0), where F0 is the number of distinct items in S and ρ(y) denotes the number of trailing zeros in the binary representation of y.
Algorithm
1. Pick a random function h : [n] → [n] from a pairwise independent hash family;
2. z ← 0;
3. while an item x arrives do
4.     if ρ(h(x)) > z then
5.         z ← ρ(h(x));
6. Return 2^(z+1/2)
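The steps above can be sketched in Python. The particular pairwise independent hash family (a·x + b mod p for a Mersenne prime p) and the trailing-zero implementation of ρ are standard choices of ours, not prescribed by the text:

```python
import random

def rho(y):
    """Number of trailing zero bits in y; treat y = 0 as all zeros."""
    return (y & -y).bit_length() - 1 if y else 31

def ams_distinct(stream, seed=0):
    """AMS estimator for the number of distinct items F0 in a stream."""
    rng = random.Random(seed)
    p = (1 << 31) - 1                      # Mersenne prime modulus
    a, b = rng.randrange(1, p), rng.randrange(p)
    z = 0
    for x in stream:
        z = max(z, rho((a * x + b) % p))   # pairwise independent hash
    return 2 ** (z + 0.5)                  # the 2^(z+1/2) estimate
```

The estimate is only guaranteed up to a constant factor with constant probability; in practice one runs several independent copies and takes the median.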
3.1.2 BJKST Algorithm
Given by Bar-Yossef, Jayram, Kumar, Sivakumar and Trevisan, BJKST refines AMS by keeping a small buffer B of (item, ρ-value) pairs, which yields a (1 ± ε) approximation of F0.
Algorithm
1. Pick a random function h : [n] → [n] from a pairwise independent hash family;
2. z ← 0; B ← ∅;
3. while an item x arrives do
4.     if ρ(h(x)) ≥ z then
5.         add (x, ρ(h(x))) to B;
6.     if |B| ≥ c/ε² then
7.         while |B| ≥ c/ε² do
8.             z ← z+1;
9.             shrink B by removing all (x, ρ(h(x))) with ρ(h(x)) < z;
10. Return |B| · 2^z
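The buffer-shrinking step above belongs to the BJKST distinct-elements estimator; a compact Python sketch follows (the constant c = 96 and the hash family are our own assumed choices):

```python
import random

def rho(y):
    """Number of trailing zero bits in y; treat y = 0 as all zeros."""
    return (y & -y).bit_length() - 1 if y else 31

def bjkst_distinct(stream, eps=0.2, seed=0):
    """BJKST estimator for F0: buffer B holds (item, rho) pairs, and the
    threshold z rises whenever B outgrows its c / eps^2 capacity."""
    rng = random.Random(seed)
    p = (1 << 31) - 1
    a, b = rng.randrange(1, p), rng.randrange(p)
    cap = int(96 / eps ** 2)      # c / eps^2, with c = 96 assumed
    z, B = 0, {}
    for x in stream:
        r = rho((a * x + b) % p)
        if r >= z:
            B[x] = r
            while len(B) >= cap:  # shrink step of the pseudocode
                z += 1
                B = {i: ri for i, ri in B.items() if ri >= z}
    return len(B) * 2 ** z
```

Unlike AMS, which returns only a power of two, BJKST's |B| · 2^z can land anywhere, which is why it achieves a (1 ± ε) guarantee.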
3.2 Sketching Technique Streaming Algorithms
3.2.1 Indyk's Algorithm
Indyk's algorithm estimates the p-th frequency moment (the Lp norm of the frequency vector) of a turnstile stream. It maintains k counters Ej, updated using a matrix M whose entries M[si, j] are drawn from a p-stable distribution.
Algorithm
1. for j ← 1 to k do
2.     Ej ← 0;
3. while a pair (si, Ui) arrives do
4.     if Ui = + then
5.         for j ← 1 to k do
6.             Ej ← Ej + M[si, j];
7.     else
8.         for j ← 1 to k do
9.             Ej ← Ej − M[si, j];
10. Return median(|Ej|^p) · scalefactor(p), for 1 ≤ j ≤ k
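This median-of-counters routine is Indyk's p-stable sketch. For p = 1 the p-stable distribution is the standard Cauchy distribution, whose absolute value has median 1, so scalefactor(1) = 1. A minimal sketch for the L1 norm, with our own parameter choices:

```python
import math
import random

def indyk_l1(stream, universe_size, k=101, seed=0):
    """Estimate the L1 norm of the frequency vector of a turnstile stream
    using Indyk's p-stable sketch with p = 1 (Cauchy random variables)."""
    rng = random.Random(seed)
    # M[s][j] ~ standard Cauchy, via inverse transform sampling.
    M = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(k)]
         for _ in range(universe_size)]
    E = [0.0] * k
    for s, u in stream:           # u is +1 or -1, as in the pseudocode
        for j in range(k):
            E[j] += u * M[s][j]
    # median of |Cauchy| is 1, so scalefactor(1) = 1.
    return sorted(abs(e) for e in E)[k // 2]
```

Each counter Ej is distributed as ||f||1 times a Cauchy variable, so the median of |Ej| over the k counters concentrates around the true L1 norm as k grows.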
4 Theory at Work: LinkedIn
LinkedIn was founded in 2002 and launched in 2003. The founders Reid Hoffman, Allen Blue, Konstantin Guericke, Eric Ly, and Jean-Luc Vaillant sought to create a social network for professional development as opposed to social development. LinkedIn's user base has grown tremendously: as of October 2021, LinkedIn reported about 790.4 million registered users from over 200 countries.
With its bi-directional (undirected) graph structure forming connections, if the average user has 400 connections, the total number of connection edges comes to about 790.4 × 10^6 × 400 / 2 ≈ 1.6 × 10^11 edges.
The greater density of LinkedIn's network reflects the functionality of the platform: users readily connect with people outside their immediate friend circle, creating a denser network. We would also expect heavy overlap between the connection populations of different users.
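The back-of-the-envelope edge count can be checked directly; the 400-connection average is an assumed figure carried over from the discussion above:

```python
users = 790.4e6          # registered users reported as of October 2021
avg_connections = 400    # assumed average connections per user
# Each undirected connection is shared by two users, so divide by 2.
edges = users * avg_connections / 2
print(f"{edges:.3e} edges")  # ≈ 1.58e11 edges
```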
5 Conclusion
We took a brief look at the small-world phenomenon to understand what it is, and at Milgram's experiment that led to the 'six degrees of separation'. We then saw the difficulties faced in creating a synthesized network and the challenges streaming data can pose.
Next, we looked at how data streaming algorithms work, including some specific ones - AMS, BJKST, Indyk and Count-Min - each providing its own way of solving a particular problem for a particular type of streaming data model.
We finished with a small analysis of LinkedIn and how the small-world phenomenon works in real life.