
Online Social Networks and the Small World Phenomenon: An Algorithmic Perspective


Author: Anubhav Kandwal

December 30, 2021

1 Abstract
In this case study we look at the small world phenomenon: what it is, and how the small world challenge produced a very interesting result. We then generalise its implications to today's world, with its richer inter-connectivity, and dive into the challenges faced while analysing such networks. Issues arising from streaming data are discussed, along with algorithms that solve them at length. Finally, we take a real-life social media example to see the points discussed in action.

2 Introduction
The small world phenomenon is a hypothesis stating that any two random people picked from a crowd are connected through some chain of acquaintances. Stanley Milgram's experiment, carried out in response to the challenge of 'the small world problem', showed the increasing inter-connectivity of the world around us.
Milgram experimented by sending postcards and a letter addressed to his colleague in Boston. The participants were randomly chosen from Wichita, Kansas and Omaha, Nebraska, as these places are distant from Boston both geographically and culturally, thus providing a good barometer of the connections that society has built. The letters given to the participants contained the profile of the addressee and the reason for the experiment, while the postcard was to be filled in with the name of the person the letter was being forwarded to. Participants were asked to send the letter directly to the addressee if they knew them personally (in the experiment, being on a first-name basis was considered 'personal' enough); otherwise, they were to forward the letter to someone in their circle of friends and relatives who they thought might know the addressee. Based on the results of the experiment, it took on average six levels of acquaintance for a letter to reach its destination, giving birth to 'six degrees of separation'. Although the term itself was not coined by Milgram, the implications of his findings were widely accepted.
With the rise of the internet, and with it social media, the inter-connectivity of people has risen. This has piqued interest in exploring the phenomenon on a global basis, especially given the amount of data now available. Because the data being generated is vast, managing and curating it is an issue that cannot be tackled easily, compounded by the vast network that already exists. Human society is a living organism of its own, and hence meticulous reading and research are required to mimic even minor aspects of its general day-to-day workings.
Some common small-world network models, the Erdős–Rényi model, the Watts–Strogatz model and the Barabási–Albert model, are used to tackle the problem of modelling societal networks. These models create graphs whose connection topology is neither completely regular nor completely random. Data streaming algorithms, in turn, help manage the computation of large continuous data loads within a static, low memory budget. We look further into these algorithms below.
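As an illustration, all three generator models are available in the widely used networkx Python library; the following sketch (the library choice and parameter values are our own, not part of the original study) builds one instance of each and prints the small-world signature of the Watts–Strogatz graph: short average path length combined with high clustering.

import networkx as nx

n = 1000                                                 # number of nodes
er = nx.erdos_renyi_graph(n, p=0.01)                     # Erdős–Rényi: each edge appears independently with probability p
ws = nx.connected_watts_strogatz_graph(n, k=10, p=0.1)   # Watts–Strogatz: ring lattice with random rewiring
ba = nx.barabasi_albert_graph(n, m=5)                    # Barabási–Albert: preferential attachment, m edges per new node

# A small-world network shows a short average path length together
# with much higher clustering than a random graph of the same size.
print(nx.average_shortest_path_length(ws), nx.average_clustering(ws))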
A data stream model (S) is an appropriate model for organisations and occupations that deal with live events. Sports analytics, banking, telecommunications: these industries stand to gain immense insight from the real-time data they generate. A data stream stresses transmission lines, storage and computation simultaneously, so the issue is no longer just about storage; even while the streaming data is being read, there is always the question of the data that is no longer present. Data points can only be read in the order in which they arrive, and the true length of the stream is unknown.
Each data point in a data stream model (S) is a single element of the universe (U), defined as the complete set of items the stream can contain. The algorithms must go through the points in the order in which they arrive and compute a function on S.
Based on how the data points of U are presented in S, there are two models:

1. Cash register model: data points arrive one at a time, in arbitrary order, and each arrival only adds to the underlying dataset.

2. Turnstile model: a dataset is maintained, initially empty. Every item from the stream is paired with a '+' or '−' value; for the former the data point is added, and for the latter it is removed from the dataset. This model reflects the change in the dataset over a certain period of time.
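To make the distinction concrete, here is a minimal Python sketch (our illustration, not taken from the source) of how a frequency table evolves under each model:

from collections import Counter

# Cash register model: items only arrive, so counts only ever grow.
arrivals = ["a", "b", "a", "c"]
counts = Counter()
for x in arrivals:
    counts[x] += 1

# Turnstile model: each item carries a '+' (add) or '-' (remove) flag,
# so the dataset reflects the net change over the observed period.
updates = [("a", "+"), ("b", "+"), ("a", "-")]
dataset = Counter()
for x, sign in updates:
    dataset[x] += 1 if sign == "+" else -1
# dataset is now {"a": 0, "b": 1}: one net occurrence of "b".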

Basic techniques used for designing these algorithms are sampling and sketching.
1. Sampling: in the sampling framework, the algorithm chooses each incoming data point with a pre-defined probability. An item thus selected is added to memory; otherwise it is discarded. Any function that needs to be computed is then computed over the data points stored in memory. Sampling helps in answering queries about distinct values, quantiles or frequently occurring items in a data stream (a reservoir-sampling sketch follows this list).

2. Sketching: dimensionality reduction is performed by projecting the data along random vectors. The projections thus formed are called 'sketches'. The data present in a sketch reflects some statistical information about the data stream, but is not identical to the data in the stream.
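As an illustration of the sampling framework, the classic reservoir sampling procedure keeps a fixed-size uniform sample of a stream of unknown length (this sketch is ours; the text above does not prescribe a specific sampling rule):

import random

def reservoir_sample(stream, k):
    # Maintain a uniform random sample of k items using O(k) memory.
    sample = []
    for t, x in enumerate(stream):
        if t < k:
            sample.append(x)             # fill the reservoir first
        else:
            j = random.randrange(t + 1)  # item t survives with probability k/(t+1)
            if j < k:
                sample[j] = x
    return sample

Any function we need, such as quantiles or heavy hitters, is then computed on the sampled points instead of on the full stream; sketching-based structures are illustrated by the Count-Min sketch in Section 3.2.1.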

3 Sampling Technique Streaming Algorithms
3.1 Finding Distinct Elements in Data Stream
3.1.1 AMS Algorithm
Given by Alon, Matias and Szegedy, to approximately count the number of unique data points present in the data stream. AMS works on the premise that, after applying a hash function, the data points in S are uniformly distributed, so on average one of the distinct values will satisfy ρ(h(x)) ≥ log2(F0), where F0 is the number of distinct items in S and ρ(y) denotes the number of trailing zeros in the binary representation of y.

Algorithm
1. Pick a random function h: [n] → [n] from a pairwise independent hash family;
2. z ← 0;
3. while an item x arrives do
4. if ρ(h(x)) > z then
5. z ← ρ(h(x));
6. Return 2^(z+1/2)
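A compact Python rendering of the algorithm above (the specific hash construction, a·x + b mod p, is one standard pairwise independent family; the prime p and the handling of ρ(0) are our own choices):

import random

def ams_distinct_count(stream, n):
    p = 2 ** 61 - 1                        # a prime much larger than n
    a, b = random.randrange(1, p), random.randrange(p)

    def h(x):                              # pairwise independent hash into [n]
        return ((a * x + b) % p) % n

    def rho(y):                            # trailing zeros in binary form of y
        if y == 0:
            return 0                       # convention chosen for this sketch
        z = 0
        while y % 2 == 0:
            y //= 2
            z += 1
        return z

    z = 0
    for x in stream:                       # one pass, O(1) memory
        z = max(z, rho(h(x)))
    return 2 ** (z + 0.5)                  # the 2^(z+1/2) estimator

In practice one runs several independent copies and combines their outputs (for instance by taking a median) to reduce the variance of the estimate.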

3.1.2 BJKST Algorithm


Given by Bar-Yossef, Jayram, Kumar, Sivakumar and Trevisan. It keeps a set of sampled items; running independent copies in parallel and returning the median of their outputs boosts the accuracy. The general idea of the algorithm is:
1. Use a set B to maintain sampled items;
2. When the set is full, it is reduced and the sampling probability becomes smaller;
3. Finally, the reduced number of items can be used to give an approximation of the number of distinct data points.
Algorithm
1. Pick a random function h: [n] → [n] from a pairwise independent hash family;
2. z ← 0;
3. B ← ∅;
4. while an item x arrives do
5. if ρ(h(x)) ≥ z then
6. B ← B ∪ {(x, ρ(h(x)))};
7. while |B| ≥ c/ϵ^2 do
8. z ← z + 1;
9. shrink B by removing every (x, ρ(h(x))) with ρ(h(x)) < z;
10. Return |B| · 2^z
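A minimal single-copy Python sketch of BJKST (the hash family, the guard for ρ(0), and the default constants are our illustrative choices; a real deployment runs independent copies and takes the median):

import random

def bjkst_distinct_count(stream, n, eps=0.5, c=16):
    p = 2 ** 61 - 1
    a, b = random.randrange(1, p), random.randrange(p)

    def h(x):
        return ((a * x + b) % p) % n

    def rho(y):
        if y == 0:
            return 0
        z = 0
        while y % 2 == 0:
            y //= 2
            z += 1
        return z

    z = 0
    B = {}                                 # sampled items: x -> rho(h(x))
    cap = c / eps ** 2                     # the c/eps^2 size bound
    for x in stream:
        r = rho(h(x))
        if r >= z:
            B[x] = r
            while len(B) >= cap:           # bucket full: raise the threshold
                z += 1
                B = {k: v for k, v in B.items() if v >= z}
    return len(B) * 2 ** z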

3.1.3 Indyk’s Algorithm


Sampling-based approximation algorithms do not translate directly to the turnstile data model. This is where Indyk's algorithm is helpful. It approximates the frequency moment Fp of a turnstile stream (taking p close to 0 yields an approximation of F0) using a sketch matrix M whose entries are drawn from a p-stable distribution.

• Input: a sequence of pairs of the form (si, Ui), where si ∈ [n] and Ui ∈ {+, −}.
• Output: F0 of the dataset represented by the stream.
Algorithm

1. for j ← 1 to k do
2. Ej ← 0;
3. while a pair (si, Ui) arrives do
4. if Ui = + then
5. for j ← 1 to k do
6. Ej ← Ej + M[si, j];
7. else
8. for j ← 1 to k do
9. Ej ← Ej − M[si, j];
10. Return median |Ej|^p · scalefactor(p), over 1 ≤ j ≤ k
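The sketch below specialises the algorithm to p = 1, where the standard Cauchy distribution is 1-stable and scalefactor(1) = 1 (the median of the absolute value of a standard Cauchy variable); it estimates F1, the sum of absolute net frequencies, of a turnstile stream. Storing the full n × k matrix M explicitly is our simplification; the original construction generates its entries pseudo-randomly.

import numpy as np

def indyk_f1_sketch(stream, n, k=100, seed=0):
    rng = np.random.default_rng(seed)
    M = rng.standard_cauchy(size=(n, k))   # 1-stable entries, one row per item
    E = np.zeros(k)
    for s, u in stream:                    # u is +1 for '+', -1 for '-'
        E += u * M[s]                      # Ej accumulates u * M[s, j]
    return float(np.median(np.abs(E)))     # median|Ej| * scalefactor(1)

For example, the stream [(3, +1), (3, +1), (7, -1)] has net frequencies f3 = 2 and f7 = −1, so F1 = 3; by 1-stability each Ej is Cauchy with scale 3, making the median of |Ej| concentrate near 3.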

3.2 Estimating Frequency


3.2.1 Count-Min Sketch
Introduced by Cormode and Muthukrishnan, this algorithm follows the sketching technique to approximate the frequency counts of the data points in the stream. The algorithm requires a fixed array C of counters, of width w and depth d, all initially zero. Every row i has its own hash function hi, mapping elements of the data stream to a column in [w].
Whenever an INSERT of an item x happens,
C[i, hi(x)] ← C[i, hi(x)] + 1, for every row 1 ≤ i ≤ d.
Whenever a DELETE of an item x is done,
C[i, hi(x)] ← C[i, hi(x)] − 1, for every row 1 ≤ i ≤ d.
The estimated frequency of an element x is f̂(x) ≜ min C[i, hi(x)], over 1 ≤ i ≤ d.
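A minimal Python implementation of the structure just described (the default width and depth are illustrative; the standard analysis sets w ≈ e/ϵ and d ≈ ln(1/δ)):

import random

class CountMinSketch:
    P = 2 ** 61 - 1                        # prime for the hash family

    def __init__(self, w=2000, d=5):
        self.w, self.d = w, d
        self.C = [[0] * w for _ in range(d)]      # d rows of w counters
        self.hashes = [(random.randrange(1, self.P), random.randrange(self.P))
                       for _ in range(d)]         # one hash per row

    def _h(self, i, x):                    # row i's hash of item x into [w]
        a, b = self.hashes[i]
        return ((a * hash(x) + b) % self.P) % self.w

    def update(self, x, delta=1):          # INSERT (+1) or DELETE (-1)
        for i in range(self.d):
            self.C[i][self._h(i, x)] += delta

    def estimate(self, x):                 # point query: min over the rows
        return min(self.C[i][self._h(i, x)] for i in range(self.d))

With non-negative frequencies the estimate never undercounts: every row holds the true count of x plus whatever collided into the same cell, and taking the minimum picks the least-contaminated row.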

4 Theory at Work: LinkedIn
LinkedIn was founded in 2002 and launched in 2003. The founders Reid Hoffman, Allen Blue, Konstantin Guericke, Eric Ly, and Jean-Luc Vaillant sought to create a social network for professional development as opposed to social development. LinkedIn's user base has grown tremendously: as of October 2021, LinkedIn reported about 790.4 million registered users from over 200 countries.
With its bi-directional graph structure, every connection forms a single undirected edge between two members. If the average member has 400 connections, the total number of connection edges is on the order of 10^11 (a worked estimate follows below). The density of LinkedIn's network reflects the functionality of the platform: users readily connect with people outside their immediate friend circle, creating a denser network. We would expect heavy overlap in the populations as well.
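A back-of-the-envelope estimate using the figures above (our arithmetic, counting each undirected connection once):

E = N · d̄ / 2 = (790.4 × 10^6 × 400) / 2 ≈ 1.6 × 10^11 connection edges.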

5 Conclusion
We had a brief look at the small world phenomenon to understand what it is, and at Milgram's experiment that led to the 'six degrees of separation'. We then saw the difficulties faced in creating a synthesized network and the challenges streaming data can pose.
Next, we looked into how data streaming algorithms work, covering some specific ones: AMS, BJKST, Indyk's and Count-Min, each providing its own unique way of solving a particular problem for a particular type of streaming data model.
We finished with a small analysis of LinkedIn and how the small world phenomenon plays out in real life.
