
The purpose of this assignment is the programmatic handling of the time-varying Stack Overflow Temporal Network. A description of this network and its data can be found at
https://snap.stanford.edu/data/sx-stackoverflow.html. Each edge of this network is
associated with a timestamp that corresponds to the time at which it was created.
The set of directed edges of the network, with the corresponding timestamps, is stored in
the sx-stackoverflow.txt file as consecutive triples (source_id, target_id,
timestamp), where source_id is the ID of the edge's source node, target_id is the ID of the
edge's target node, and timestamp is the time at which the edge was created.

Therefore, the available data for the network in question can be represented as time-
correlated edges of the format:

$$e_{ij}(t) = \langle v_i, v_j, t \rangle \quad \text{for } t_{\min} \le t \le t_{\max} \tag{1}$$

where $t_{\min}$ is the oldest timestamp in the available data set and $t_{\max}$ is the most recent
one. The time interval $T = [t_{\min}, t_{\max}]$ is divided into $N$ non-overlapping time periods
$\{T_1, T_2, \dots, T_j, \dots, T_N\}$ of equal duration $\delta t$ by considering $N + 1$ time points
$\{t_0, t_1, t_2, \dots, t_{j-1}, t_j, \dots, t_{N-1}, t_N\}$ such that:

$$t_j = t_{\min} + j \cdot \delta t \quad \text{for } 0 \le j \le N \tag{2}$$

where $\Delta T = t_{\max} - t_{\min}$ (3) and $\delta t = \dfrac{\Delta T}{N}$ (4). Accordingly, the $j$-th time
period is defined by the following relation:

$$T_j = \begin{cases} [t_{j-1}, t_j), & 1 \le j \le N-1; \\ [t_{j-1}, t_j], & j = N. \end{cases} \tag{5}$$
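
Relations (2) to (5) can be sketched in a few lines. This is an illustrative sketch only: the function names `time_points` and `period_index` are hypothetical, and the choice to pin the last point to $t_{\max}$ (rather than recompute it) is an assumption made to avoid floating-point drift.

```python
from typing import List


def time_points(t_min: float, t_max: float, n: int) -> List[float]:
    """Return the N + 1 points t_j = t_min + j * delta_t for 0 <= j <= N."""
    if n < 1:
        raise ValueError("N must be at least 1")
    delta_t = (t_max - t_min) / n
    points = [t_min + j * delta_t for j in range(n)]
    points.append(t_max)  # pin t_N to t_max instead of recomputing it
    return points


def period_index(t: float, t_min: float, t_max: float, n: int) -> int:
    """Return j in 1..N such that t lies in T_j as defined by relation (5)."""
    if not t_min <= t <= t_max:
        raise ValueError("timestamp outside [t_min, t_max]")
    if t == t_max:
        return n  # only the last period is closed on the right
    delta_t = (t_max - t_min) / n
    return min(int((t - t_min) // delta_t) + 1, n)
```

For example, with $t_{\min} = 0$, $t_{\max} = 100$ and $N = 4$, a timestamp of 25 falls in $T_2$ because $T_1 = [0, 25)$ is open on the right, while a timestamp of 100 falls in $T_4$.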

For each of the time periods $T_j$, $1 \le j \le N$, we can consider the corresponding subgraph of
the full network, $G[t_{j-1}, t_j] = (V[t_{j-1}, t_j], E[t_{j-1}, t_j])$, where $V[t_{j-1}, t_j]$ is the set of vertices
that appear as endpoints of the network edges created during the time period $T_j$, and
$E[t_{j-1}, t_j]$ is the set of network edges created during that period.
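
One possible way to build the vertex and edge sets of every subgraph is sketched below. It assumes a non-empty list of `(source_id, target_id, timestamp)` triples, and it stores each edge set as a Python set, so repeated edges within a period collapse into one (a simplification, not a requirement of the text). The name `split_by_period` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[int, int]


def split_by_period(
    edges: Iterable[Tuple[int, int, int]], n: int
) -> Dict[int, Tuple[Set[int], Set[Pair]]]:
    """Group timestamped edges into N periods and return {j: (V_j, E_j)}."""
    edges = list(edges)
    t_min = min(t for _, _, t in edges)
    t_max = max(t for _, _, t in edges)
    delta_t = (t_max - t_min) / n
    edge_sets: Dict[int, Set[Pair]] = defaultdict(set)
    for u, v, t in edges:
        # The last period is closed on the right, all others are half-open.
        j = n if t == t_max else min(int((t - t_min) // delta_t) + 1, n)
        edge_sets[j].add((u, v))
    subgraphs = {}
    for j in range(1, n + 1):
        e_j = edge_sets.get(j, set())
        v_j = {node for edge in e_j for node in edge}  # endpoints of E_j
        subgraphs[j] = (v_j, e_j)
    return subgraphs


# Made-up example: four edges split into N = 2 periods.
example = [(1, 2, 0), (2, 3, 10), (3, 1, 50), (4, 1, 100)]
subgraphs = split_by_period(example, 2)
```

If NetworkX is used, each edge set could then be loaded into a directed graph with `G = nx.DiGraph()` followed by `G.add_edges_from(e_j)`.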

A more rigorous description of the temporal evolution of the network, in the context of the
edge-prediction problem, requires some additional definitions. Specifically, during the
transition of the network from the time period $T_j$ to the time period $T_{j+1}$, we are interested
in the set of vertices common to the time intervals $[t_{j-1}, t_j]$ and $[t_j, t_{j+1}]$. This set is
denoted $V^*[t_{j-1}, t_{j+1}]$ and is given by the relation:

$$V^*[t_{j-1}, t_{j+1}] = V[t_{j-1}, t_j] \cap V[t_j, t_{j+1}] \quad \text{for } 1 \le j \le N-1 \tag{6}$$
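
Relation (6) is a plain set intersection, which the following sketch applies to every pair of consecutive periods. The function name `common_vertex_sets` and the list-based input layout are assumptions made for illustration.

```python
from typing import Dict, List, Set


def common_vertex_sets(vertex_sets: List[Set[int]]) -> Dict[int, Set[int]]:
    """Map each j (1 <= j <= N-1) to the vertices shared by T_j and T_{j+1}.

    vertex_sets[k] is assumed to hold the vertex set of period T_{k+1}.
    """
    return {
        j: vertex_sets[j - 1] & vertex_sets[j]
        for j in range(1, len(vertex_sets))
    }


# Made-up vertex sets for N = 3 periods.
common = common_vertex_sets([{1, 2, 3}, {2, 3, 4}, {4, 5}])
```

With these sample sets, the pair $(T_1, T_2)$ shares vertices 2 and 3, and the pair $(T_2, T_3)$ shares only vertex 4.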

Correspondingly, we restrict the sets $E[t_{j-1}, t_j]$ and $E[t_j, t_{j+1}]$ to those subsets
of edges whose endpoints both belong to the set $V^*[t_{j-1}, t_{j+1}]$. These restricted edge sets are
denoted $E^*[t_{j-1}, t_j]$ and $E^*[t_j, t_{j+1}]$ and are given by the relations:

$$E^*[t_{j-1}, t_j] = \{(u, v) \in E[t_{j-1}, t_j] : u \in V^*[t_{j-1}, t_{j+1}] \text{ and } v \in V^*[t_{j-1}, t_{j+1}]\} \tag{7}$$

$$E^*[t_j, t_{j+1}] = \{(u, v) \in E[t_j, t_{j+1}] : u \in V^*[t_{j-1}, t_{j+1}] \text{ and } v \in V^*[t_{j-1}, t_{j+1}]\} \tag{8}$$
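
Relations (7) and (8) apply the same filter to two different edge sets, so one helper can serve both. The sketch below uses hypothetical names (`restrict_edges`, `e_prev`, `e_next`) and made-up edges.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]


def restrict_edges(edge_set: Set[Pair], common: Set[int]) -> Set[Pair]:
    """Keep only the edges whose two endpoints both lie in `common`."""
    return {(u, v) for (u, v) in edge_set if u in common and v in common}


# Made-up edge sets for two consecutive periods.
e_prev = {(1, 2), (2, 3), (3, 4)}
e_next = {(2, 4), (4, 3), (5, 2)}
v_prev = {node for edge in e_prev for node in edge}
v_next = {node for edge in e_next for node in edge}
v_common = v_prev & v_next  # relation (6)

e_prev_restricted = restrict_edges(e_prev, v_common)  # relation (7)
e_next_restricted = restrict_edges(e_next, v_common)  # relation (8)
```

Here vertex 1 appears only in the earlier period and vertex 5 only in the later one, so the edges (1, 2) and (5, 2) are dropped.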

The programmatic handling of the aforementioned time-varying network consists of
writing code in either the MATLAB or the Python programming environment to perform the
following procedures:
1. Calculate the time instants $t_{\min}$ and $t_{\max}$.
2. Divide the total time interval $T = [t_{\min}, t_{\max}]$ into the periods $\{T_1, T_2, \dots, T_j, \dots, T_N\}$ and
calculate the corresponding time points $\{t_0, t_1, t_2, \dots, t_{j-1}, t_j, \dots, t_{N-1}, t_N\}$ as a function
of the parameter $N$. The user must be able to set the parameter $N$ before
running the program.
3. Programmatic representation (either via the adjacency matrix or through a structure native to
the tool you will use, e.g. a Graph object of the Python NetworkX module) of the set
of subgraphs $G[t_{j-1}, t_j]$ for $1 \le j \le N$.
4. For each of the subgraphs $G[t_{j-1}, t_j]$, $1 \le j \le N$, calculate and plot the distribution of
the values of the following centrality measures:
I. Degree Centrality
II. In-Degree Centrality
III. Out-Degree Centrality
IV. Closeness Centrality
V. Betweenness Centrality
VI. Eigenvector Centrality
VII. Katz Centrality
5. For each pair of consecutive subgraphs $(G[t_{j-1}, t_j], G[t_j, t_{j+1}])$, $1 \le j \le N-1$, calculate the sets $V^*[t_{j-1}, t_{j+1}]$,
$E^*[t_{j-1}, t_j]$ and $E^*[t_j, t_{j+1}]$.
6. For each pair of nodes $(u, v)$ with $u, v \in V^*[t_{j-1}, t_{j+1}]$, and for each set $V^*[t_{j-1}, t_{j+1}]$ with
$1 \le j \le N-1$, calculate the following similarity matrices:
a. $S_{GD} = [S_{GD}(u, v)]$, where $S_{GD}(u, v) = -(\text{length of the shortest path between } u \text{ and } v)$ [Graph Distance]
b. $S_{CN} = [S_{CN}(u, v)]$, where $S_{CN}(u, v) = |\Gamma(u) \cap \Gamma(v)|$ [Common Neighbors] and $\Gamma(u)$ is
the set of neighbors of node $u$.
c. $S_{JC} = [S_{JC}(u, v)]$, where $S_{JC}(u, v) = \dfrac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$ [Jaccard's Coefficient]
d. $S_A = [S_A(u, v)]$, where $S_A(u, v) = \sum_{z \in \Gamma(u) \cap \Gamma(v)} \dfrac{1}{\log |\Gamma(z)|}$ [Adamic / Adar]
e. $S_{PA} = [S_{PA}(u, v)]$, where $S_{PA}(u, v) = |\Gamma(u)| \cdot |\Gamma(v)|$ [Preferential Attachment]

Caution: The above similarity matrices are calculated for the set of nodes that are
common to two consecutive subgraphs, i.e. the set $V^*[t_{j-1}, t_{j+1}]$, but on the basis of the
edge set $E^*[t_{j-1}, t_j]$. These are the edges of the earlier time period that are formed
between vertices belonging to the common node set $V^*[t_{j-1}, t_{j+1}]$.
7. For each of the similarity matrices $S_{GD}$, $S_{CN}$, $S_{JC}$, $S_A$ and $S_{PA}$ calculated in the previous
step, and for each of the node sets $V^*[t_{j-1}, t_{j+1}]$, extract the top $p_{GD}\%$, $p_{CN}\%$, $p_{JC}\%$, $p_A\%$
and $p_{PA}\%$ (highest) similarity values and the pairs of vertices to which they correspond.
The percentage of these vertex pairs that actually belong to the set $E^*[t_j, t_{j+1}]$
is the success rate of each measure in predicting future edges.
Calculate the success rate of each similarity measure for each set
$V^*[t_{j-1}, t_{j+1}]$ (that is, for each pair of consecutive subgraphs). The values of the
parameters $p_{GD}\%$, $p_{CN}\%$, $p_{JC}\%$, $p_A\%$ and $p_{PA}\%$ are supplied by the user
before running the program.
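
Steps 6 and 7 can be sketched for the four neighbor-based measures as follows. Several choices here are assumptions that the text does not settle: neighbors are taken ignoring edge direction, self-loops are skipped, node pairs are unordered, a predicted pair counts as a hit if it appears in the later edge set in either direction, and at least one pair is always kept in the top $p\%$. Graph Distance is left out because it needs a shortest-path search. All function names are hypothetical.

```python
import math
from itertools import combinations


def neighbors(edges):
    """Build undirected neighbor sets from directed edges, skipping self-loops."""
    gamma = {}
    for u, v in edges:
        if u == v:
            continue
        gamma.setdefault(u, set()).add(v)
        gamma.setdefault(v, set()).add(u)
    return gamma


def similarity_scores(nodes, edges):
    """Return {measure: {(u, v): score}} over unordered pairs of `nodes`."""
    gamma = neighbors(edges)
    scores = {"CN": {}, "JC": {}, "A": {}, "PA": {}}
    for u, v in combinations(sorted(nodes), 2):
        gu, gv = gamma.get(u, set()), gamma.get(v, set())
        common, union = gu & gv, gu | gv
        scores["CN"][(u, v)] = len(common)
        scores["JC"][(u, v)] = len(common) / len(union) if union else 0.0
        # A common neighbor z is adjacent to both u and v, so log|Γ(z)| > 0.
        scores["A"][(u, v)] = sum(1.0 / math.log(len(gamma[z])) for z in common)
        scores["PA"][(u, v)] = len(gu) * len(gv)
    return scores


def success_rate(pair_scores, p_percent, future_edges):
    """Fraction of the top p% scored pairs found in `future_edges`."""
    ranked = sorted(pair_scores, key=pair_scores.get, reverse=True)
    if not ranked:
        return 0.0
    k = max(1, math.ceil(len(ranked) * p_percent / 100))
    top = ranked[:k]
    hits = sum((u, v) in future_edges or (v, u) in future_edges for u, v in top)
    return hits / len(top)


# Made-up data: common nodes, restricted earlier edges, restricted later edges.
nodes = {1, 2, 3, 4}
earlier = {(1, 2), (1, 3), (2, 3), (3, 4)}
later = {(4, 1), (2, 4)}
scores = similarity_scores(nodes, earlier)
rate_jc = success_rate(scores["JC"], 50, later)
```

Ties between equal scores are broken by the order in which pairs were generated, which is arbitrary; a real evaluation might prefer a documented tie-breaking rule. NetworkX also ships link-prediction helpers such as `nx.jaccard_coefficient`, `nx.adamic_adar_index` and `nx.preferential_attachment` for undirected graphs, which could replace the hand-written scores.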
