In this final report, we introduce the concept of Graph Signal Processing (GSP) and two of its "Blind Detection" applications. Because many real-world graphs and their data are large, random, and complex, standard data-analysis tools fall short; GSP instead models the values on graph nodes as signals, so that more involved analysis can be carried out on these graph signals. Through the applications "Blind Community Detection" and "Blind Identification of Stochastic Block Models", we show how GSP techniques can make such graph problems easier to solve. Section 2 covers the background knowledge of GSP. Sections 3 and 4 introduce the two "Blind Detection" applications respectively, and Section 4 also contains an innovative part: a new simulation of the algorithms introduced in the "Blind Identification of Stochastic Block Models" application.
Graphs are a powerful tool that lets us model various types of real-world data and analyze the data and their interactions in a more comprehensive way. For standard, regularly structured data, ordinary graph theory suffices. For irregular, random, and more complex data, however, we derive a technique called "Graph Signal Processing (GSP)" from the conventional ideas of Digital Signal Processing (DSP). The first step of GSP is to convert signals from the "sequences" of DSP into the "vectors" of GSP:

x[n], n = 0, …, N − 1  ⟶  x := [x_1, x_2, …, x_N]^T,

where each entry of x is the value assigned to one node of the graph. Other elements of DSP can then also be converted to GSP versions that help analyze graphs carrying data, especially tools built from the adjacency matrix and the Laplacian matrix. Let us introduce some of these DSP-to-GSP conversions that will help with the graph problems discussed later.
We have seen how signals are represented in both DSP and GSP. We can likewise convert the system-processing method (the filter) and the signal transforms (the Z-transform and the Discrete Fourier Transform) from DSP to GSP. In DSP, a signal passed through the shift filter h_shift(z) = z^{-1} produces a shifted output signal; this is closely related to the Z-transform, since both rest on the polynomial representation of the signal. This basic shift filter is the prototype of a signal-processing system. In GSP, a shift filter is likewise realized by a matrix H, with s_out = H · s_in. Graph theory also gives us two special matrices of a graph, the adjacency matrix A and the Laplacian matrix L, and each can also act as a shift filter; both will be introduced in detail in the applications below. Since the Z-transform and Discrete Fourier Transform in DSP are familiar from previous ECE courses, let us focus on their GSP counterparts. As mentioned above, the Z-transform in GSP can simply be modeled by shift-filter matrices, thanks to the polynomial representation of the signal in powers z^{-n}, n = 0, …, N − 1. The Fourier Transform in GSP can also be modeled as a matrix: the inverse of the matrix V of eigenvectors of the shift matrix. The graph Fourier Transform of a graph signal x is then the decomposition

x̂ = V^{-1} x.
With the understanding of basic concepts of GSP, we would like to use tools derived from GSP to
analyze specific graph problems – the “Blind Detection” problems in the next two sections.
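As a concrete illustration of the vector representation and the graph Fourier Transform described above, here is a minimal numpy sketch; the 4-node path graph and the signal values are our own toy choices, not taken from the applications below:

```python
import numpy as np

# Small undirected path graph on 4 nodes, described by its adjacency matrix A.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))          # degree matrix
L = D - A                           # Laplacian matrix L = D - A

# Eigendecomposition L = V Lambda V^T (V is orthonormal since L is symmetric).
eigvals, V = np.linalg.eigh(L)

# A graph signal is just a vector with one value per node.
x = np.array([1.0, 2.0, 3.0, 4.0])

# Graph Fourier transform: project the signal onto the eigenbasis.
x_hat = V.T @ x                     # for symmetric L, V^{-1} = V^T

# The inverse transform recovers the original signal.
x_rec = V @ x_hat
print(np.allclose(x, x_rec))        # True
```

Note that the smallest eigenvalue of L is 0, matching the λ_1 = 0 convention used later in the report.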
The blind community detection problem is a setting in which we can straightforwardly apply the ideas of graph signal processing (GSP) as models for real-world problems and applications. Unlike the usual forward flow through a GSP filter, blind detection uses the GSP model in reverse: we are given only the observed graph signals and the desired number of communities K as inputs, and we want to infer the community structure of the graph from those observed signals. We therefore model the observed graph signals as outputs of a graph filter driven by unknown "low-rank" input excitations.
As mentioned in the GSP section above, both the adjacency matrix and the Laplacian matrix of a graph can represent a shift of graph signals, and either can serve as a Graph Shift Operator (GSO) that captures graph information such as community structure. To avoid certain analytical and numerical difficulties that arise when using the adjacency matrix A, we use the Laplacian matrix L as the GSO in our Blind Community Detection problem. Another important feature of our solutions is the "low-rank" assumption. Previous research applying GSP techniques to graph problems required all observed output or input signals to be "full-rank". That assumption limits which graph problems can be solved, since the amount and complexity of the work grows rapidly for graphs with many more nodes. Here, instead, we introduce a model with a "low-rank" feature that significantly decreases the complexity of the problem. We summarize the procedures as two methods, "BlindCD" and "Boosted BlindCD", both with the "low-rank" feature. In the following sections, we introduce the background knowledge and the details of the "BlindCD" and "Boosted BlindCD" algorithms respectively.
3.1 Background Knowledge
We pick the Laplacian matrix L as the GSO used to define the graph shift filter and to capture information about the graph communities. The Laplacian matrix L of the graph can be written in two equivalent ways:

L := D − A
L := V Λ V^T

Here D is the diagonal matrix containing the degrees of the nodes of graph G, and A is the adjacency matrix of G. V is the matrix whose columns are the eigenvectors of L, and Λ is the diagonal matrix of the corresponding eigenvalues, Λ = Diag([λ_1, …, λ_N]), where λ_1 = 0 is the smallest and λ_N the largest eigenvalue. Another of the most important definitions: a graph signal in this problem is modeled as a function on the nodes of the graph G and written as a vector x := [x_1, x_2, …, x_N]^T, where each element of x represents the value on the corresponding node. Next, we define the graph filter used in the model as H(L):
H(L) := Σ_{t=0}^{T_d − 1} h_t L^t = V ( Σ_{t=0}^{T_d − 1} h_t Λ^t ) V^T,
where T_d represents the order of the graph filter and the other components are as introduced above. Another important definition we would like to introduce is the frequency response ĥ of the graph filter, defined as:
ĥ_i := h(λ_i) = Σ_{t=0}^{T_d − 1} h_t λ_i^t
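The two forms of the graph filter H(L) given above, the vertex-domain polynomial in L and the spectral form through the frequency response ĥ, can be checked numerically. A small sketch, with illustrative filter taps h_t of our own choosing:

```python
import numpy as np

# Laplacian of a 4-node path graph (a toy example of our own).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
eigvals, V = np.linalg.eigh(L)           # L = V Lambda V^T

h = np.array([1.0, -0.3, 0.05])          # illustrative taps h_t, order T_d = 3

# Vertex-domain form: H(L) = sum_t h_t L^t.
H_poly = sum(ht * np.linalg.matrix_power(L, t) for t, ht in enumerate(h))

# Spectral form: H(L) = V diag(h_hat) V^T with h_hat_i = sum_t h_t lambda_i^t.
h_hat = sum(ht * eigvals ** t for t, ht in enumerate(h))
H_spec = V @ np.diag(h_hat) @ V.T

print(np.allclose(H_poly, H_spec))       # True
```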
The frequency response ĥ of the graph filter is an important parameter affecting the performance of the "BlindCD" algorithm; we will discuss it in detail in later sections. We can now formulate our model simply as y = H(L)x, where y is an observed graph signal, and our goal is to retrieve the community structure from this simple model. Let us write the model in more detail:

y_l = H(L) x_l + w_l,  l = 1, …, L,

where y_l is an observed graph signal as mentioned above, H(L) is our graph filter based on the Laplacian matrix L of the graph, and w_l collects the possible errors from measurement and analysis. The term x_l represents an excitation input of the graph filter, and x_l can be further decomposed as:
𝒙𝒍 = 𝑩𝒛𝒍 ,
where z_l is defined as the latent parameter vector that drives the input excitation signal x_l, and the matrix B sketches the relationship between z_l and x_l while compressing the vector dimension of the input excitation signal x_l. As mentioned above, the model has the "low-rank" feature: we obtain it by taking the latent parameter vector z_l to be of dimension R and assuming the input excitation signal x_l lies in an R-dimensional subspace of the N node dimensions. Another critical concept that helps us complete blind community detection is the spectral clustering method, of which the k-means step is an important part. In spectral clustering, we build a matrix V_K consisting of the first K eigenvectors of the Laplacian matrix L of our graph G, then apply k-means to the "eigenmatrix" V_K to divide the N nodes into K communities, minimizing the distance between each row vector of V_K and its cluster mean so as to optimally retrieve the graph community structure. The k-means objective is:
F(C_1, …, C_K) := Σ_{k=1}^{K} Σ_{i∈C_k} ‖ v_i^{spec} − (1/|C_k|) Σ_{j∈C_k} v_j^{spec} ‖²
With this background knowledge in place, we can move on to the details of our "BlindCD" algorithm.
In "BlindCD", we apply the spectral clustering method to the sampled covariance matrix of the observed graph signals, detecting the community structure of the graph from two inputs: the observed graph signals and the desired number of communities K. The sampled covariance matrix contains the information about the communities. Let us first summarize the steps of the "BlindCD" algorithm:
1. Input: the observed graph signals y_l, l = 1, …, L, and the desired number of communities K.
2. Compute the sampled covariance Ĉ_y as:

Ĉ_y = (1/L) Σ_{l=1}^{L} y_l (y_l)^T,  where  C_y := 𝔼[y_l (y_l)^T] = H(L) B B^T H^T(L) + σ_w² I.

From the previous equations, we have H(L)B = V Diag(ĥ) V^T B as the sketch of the graph filter, where B is the sketch matrix that achieves the "low-rank" mode by decreasing the dimension from N to R.
3. Find the top-K eigenvectors of the sampled covariance Ĉ_y and form a new "eigenmatrix" V̂_K.
4. Apply k-means to the "eigenmatrix" V̂_K, which seeks to optimize

min_{C_1,…,C_K ⊆ N} F(C_1, …, C_K) := Σ_{k=1}^{K} Σ_{i∈C_k} ‖ v̂_i^{spec} − (1/|C_k|) Σ_{j∈C_k} v̂_j^{spec} ‖²,

where v̂_i^{spec} := [V̂_K]_{i,:} ∈ ℝ^K.
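The steps above can be sketched in code. This is our own minimal rendering, with a plain Lloyd's k-means standing in for a production clustering routine, and a toy low-rank excitation (two latent factors, one per community) standing in for real observations:

```python
import numpy as np

def blind_cd(Y, K, iters=50):
    """Sketch of BlindCD: sampled covariance -> top-K eigenvectors
    -> k-means on the rows of the eigenmatrix.
    Y is an N x L matrix whose columns are the observed signals y_l."""
    N, L = Y.shape
    C_y = (Y @ Y.T) / L                       # step 2: sampled covariance
    _, eigvecs = np.linalg.eigh(C_y)          # eigenvalues in ascending order
    V_K = eigvecs[:, -K:]                     # step 3: top-K eigenvectors
    # Step 4: Lloyd's k-means with farthest-point initialization.
    centers = V_K[[0]]
    for _ in range(1, K):
        d2 = ((V_K[:, None, :] - centers[None]) ** 2).sum(-1).min(axis=1)
        centers = np.vstack([centers, V_K[np.argmax(d2)]])
    for _ in range(iters):
        labels = np.argmin(((V_K[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = V_K[labels == k].mean(axis=0)
    return labels

# Toy check: 10 nodes, two communities, low-rank (R = 2) excitations.
rng = np.random.default_rng(0)
B = np.zeros((10, 2))
B[:5, 0] = 1.0
B[5:, 1] = 1.0
Y = B @ rng.standard_normal((2, 500)) + 0.01 * rng.standard_normal((10, 500))
labels = blind_cd(Y, 2)
print(labels)   # nodes 0-4 share one label, nodes 5-9 the other
```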
Just as the spectral clustering method can be applied directly to the Laplacian matrix L, the "BlindCD" algorithm detects the community structure by applying spectral clustering to the sampled covariance Ĉ_y, which also contains the community-structure information, thereby reducing the complexity of the problem compared with conventional methods. We can therefore regard the sampled covariance Ĉ_y as a spectral sketch of the Laplacian matrix L of the graph.
To analyze the performance of the "BlindCD" algorithm, we introduce the concepts of a low-pass graph filter (LPGF) and the low-pass coefficient η. We say a graph filter H(L) is a (K, η)-LPGF if
η := max{|ĥ_{K+1}|, …, |ĥ_N|} / min{|ĥ_1|, …, |ĥ_K|} < 1,
with η bounded as

1 − (1/ĥ_K)( (L'/2) Δλ_K² − ĥ'(λ_{K+1}) Δλ_K ) ≤ η ≤ 1 − (1/ĥ_K)( (μ'/2) Δλ_K² − ĥ'(λ_{K+1}) Δλ_K ).
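The low-pass coefficient η is straightforward to compute from a frequency response. A small sketch with an illustrative, made-up response:

```python
import numpy as np

def low_pass_coeff(h_hat, K):
    """Low-pass coefficient eta of a graph filter, computed from its
    frequency response h_hat (ordered by increasing eigenvalue)."""
    h = np.abs(np.asarray(h_hat, dtype=float))
    return h[K:].max() / h[:K].min()

# Example: a low-pass-like response over N = 6 frequencies with K = 2.
h_hat = np.array([1.0, 0.9, 0.2, 0.1, 0.05, 0.01])
eta = low_pass_coeff(h_hat, K=2)
print(eta)   # 0.2 / 0.9 ≈ 0.222, well below 1
```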
We want the low-pass coefficient η to be nearly ideal, close to 0. To achieve this, the largest of the frequency responses from the (K+1)-th onward should be as small as possible, while the K-th frequency response should be relatively large. This means that only the first K elements of the frequency response of the graph filter carry the most significant information about the graph community structure, making the rank of the graph filter close to K, so that applying the spectral clustering method to its sampled covariance matrix yields the optimal community structure; the low-pass coefficient η is the quantification of this property. Since η is derived from the frequency response of the graph filter, the frequency response ĥ, and especially its K-th value, strongly affects η. From the bounds on η, we can see that one way to reduce η is to reduce the value of ĥ_K. By analyzing the "suboptimality" of the output of "BlindCD" compared to the output obtained by directly applying spectral clustering to the Laplacian matrix L, we can conclude that the number of communities K and the low-pass coefficient η govern the performance of the "BlindCD" algorithm: when K decreases and η is smaller, the algorithm performs better. Making η smaller is thus an important way to boost the performance of "BlindCD". Therefore, in the next section, we introduce "Boosted BlindCD", which boosts the performance of "BlindCD" by reducing the low-pass coefficient η through a sparse decomposition method.
3.3 Algorithm 2: “Boosted BlindCD” & Performance Analysis
The main idea of the "Boosted BlindCD" algorithm is to reduce the low-pass coefficient η by decomposing the original graph filter H(L) to obtain a boosted graph filter H̃(L) with a smaller boosted low-pass coefficient η̃ and a smaller K-th frequency response, ĥ_K − ĥ_N. The decomposition is:

H(L)B = H(L)B − ĥ_N B + ĥ_N B = H(L)B − ĥ_N I B + ĥ_N B = (H(L) − ĥ_N I)B + ĥ_N B.
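The effect of this decomposition can be seen numerically: shifting every frequency response by ĥ_N shrinks the responses above the K-th band (ĥ_N itself goes to zero) and so shrinks the low-pass coefficient. A sketch with a made-up low-pass response:

```python
import numpy as np

# Frequency response of a low-pass filter (sorted by increasing eigenvalue).
h_hat = np.array([1.0, 0.9, 0.3, 0.25, 0.2])
K = 2

# Original low-pass coefficient eta.
eta = np.abs(h_hat[K:]).max() / np.abs(h_hat[:K]).min()

# Boosted filter: subtracting h_hat_N * I shifts every response by h_hat[-1].
h_boost = h_hat - h_hat[-1]
eta_boost = np.abs(h_boost[K:]).max() / np.abs(h_boost[:K]).min()

print(eta, eta_boost)   # eta ≈ 0.333, eta_boost ≈ 0.143: the boost shrinks eta
```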
From the previous section and the "Boosted BlindCD" algorithm, we can see that the boosted graph filter H̃(L)B and the boosted low-pass coefficient η̃ improve the performance of the "BlindCD" algorithm. Analyzing the estimation error between H* and H(L)B shows that the performance of "Boosted BlindCD" improves as the number of observed graph signals L increases. Likewise, analyzing the estimation error between (S̃*, B̃*) and (H̃(L)B, ĥ_N B) shows that increasing both the number of observed graph signals L and the excitation rank R helps improve the performance of "Boosted BlindCD". As in the previous section, a smaller number of communities K and a smaller boosted low-pass coefficient η̃ also improve the performance of the "Boosted BlindCD" algorithm. In the next section, we apply the technique of GSP to another application: blind identification of stochastic block models from dynamical observations.
This part of the report tackles the problem of learning a network and its edges without observing the network structure. The two main ideas of the paper are representing the network as a stochastic block model and taking samples of the graph's dynamical system to reconstruct the network. This section of the project will go through the definition of a stochastic block model (SBM), define the types of systems that Schaub et al. worked on in their paper, and describe the methods and algorithms they used to recreate the system.
In their paper, Schaub et al. limit the stochastic block models considered to those satisfying a condition on the minimum degree of the adjacency matrix. This minimum-degree condition makes it more likely that nodes in the same group also belong to the same connected component, which helps with group recovery; without it, recovering the groups would be very difficult. The dynamical system on the graph evolves as

x_{t+1} = L x_t.  (4.3)
In this application the graph is undirected, and the edge weights are binary in the non-normalized adjacency matrix. The system is initialized with random node values drawn from a Gaussian distribution, for example the standard normal distribution.
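The sampling of the dynamical system can be sketched as follows. We use a row-normalized adjacency matrix as the dynamics operator, which is one common choice of operator, not necessarily the paper's exact one; the tiny 4-node graph and the parameter values are our own illustrations:

```python
import numpy as np

def sample_dynamics(Lop, T, s, rng=None):
    """Run x_{t+1} = Lop @ x_t for T steps from s random standard-normal
    initial conditions; return the s samples x_T as columns (N x s)."""
    rng = np.random.default_rng(rng)
    N = Lop.shape[0]
    X = rng.standard_normal((N, s))   # Gaussian initial node values
    for _ in range(T):
        X = Lop @ X                   # advance every sample one time step
    return X

# Toy example: a tiny 4-node graph with a row-normalized adjacency operator.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Lop = A / A.sum(axis=1, keepdims=True)
samples = sample_dynamics(Lop, T=5, s=100, rng=0)
print(samples.shape)   # (4, 100)
```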
In step one they create a sample covariance matrix from the s samples of the network. An important point is that this sample covariance matrix is an approximation to the graph's actual covariance matrix. The actual covariance matrix is also a simple matrix power of the normalized adjacency matrix; thus the eigenvectors of the two agree, though the eigenvalues do not. This means the following steps yield good information about the actual graph, and it is the basis for why this algorithm should work.
Steps 2 through 4 take the eigenvectors corresponding to the top eigenvalues of the covariance matrix and perform k-means clustering on them to obtain the partition. The authors note in their results that they ran k-means multiple times and kept the best result according to the objective function.
(4.4)
(4.5)
After solving a system of three equations, the algorithm, combined with the partition-recovery result, yields an estimate of the affinity matrix. The algorithm can be seen in appendix A.
One thing to note is the accuracy of the sample covariance matrix. The difference between the sample covariance and the actual covariance depends on the time T at which the samples are taken, on how many samples are taken, and on how large the graph is. Intuitively, more samples make it easier to recover the graph, and a larger graph provides more information to use for recovery as well. The constraint on T, however, cuts both ways: if sampled too early, the graph signal has not had enough time to develop and will not fully represent the graph; if sampled too late, the graph signals become too similar as the dynamics approach convergence, making it too difficult to differentiate nodes.
(5.1)
where z is the fraction of node groups guessed correctly relative to the actual grouping.
We tried to replicate this experiment as closely as possible, starting by creating an SBM with their exact number of groups and nodes per group. We used the Python library networkx, which has a function for creating a stochastic block model. We then took samples of the graph initialized at different starting node values drawn from a standard normal distribution, and wrote a function implementing the partition recovery algorithm. Inside our partition recovery we also ran the k-means clustering multiple times and kept the best result, using 30 runs instead of 10. One thing to note is the minimum-degree condition from equation 4.2: because the graph is produced by the pre-made function, we could not force this condition to hold. We could, however, keep creating random graphs until the condition is met. In our results we compare what happens when the condition met is weaker or stronger. Our code can be seen in appendix B.
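The rejection loop we describe, redrawing graphs until the minimum-degree condition holds, can be sketched without networkx as well. The SBM sampler below is a numpy stand-in for networkx's stochastic block model function, and the group sizes, edge probabilities, and d_min threshold are illustrative values of our own, not the paper's:

```python
import numpy as np

def sbm_adjacency(sizes, p_in, p_out, rng):
    """Sample an undirected SBM adjacency matrix: edge probability p_in
    within a group, p_out across groups (a numpy stand-in for
    networkx's stochastic block model generator)."""
    N = sum(sizes)
    groups = np.repeat(np.arange(len(sizes)), sizes)
    P = np.where(groups[:, None] == groups[None, :], p_in, p_out)
    U = rng.random((N, N))
    A = np.triu((U < P).astype(float), k=1)   # upper triangle, no self-loops
    return A + A.T                            # symmetrize: undirected graph

def sample_until_min_degree(sizes, p_in, p_out, d_min, rng, max_tries=100):
    """Rejection loop: redraw graphs until the minimum degree condition holds."""
    for _ in range(max_tries):
        A = sbm_adjacency(sizes, p_in, p_out, rng)
        if A.sum(axis=1).min() >= d_min:
            return A
    raise RuntimeError("no graph met the minimum-degree condition")

rng = np.random.default_rng(0)
A = sample_until_min_degree([50, 50], p_in=0.5, p_out=0.05, d_min=15, rng=rng)
print(A.shape)   # (100, 100)
```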
[Results shown for ε = 0.01 (d_min = 360) and ε = 0.1 (d_min = 296).]
Our implementation of the algorithm performed significantly worse than theirs. It is difficult to extract a true trend from the results of this experiment, but we can see a slight difference when a stricter minimum-degree condition is set: the highest overlap score is greater when the minimum degree is higher. The difference is not large, but it is possible that setting ε even smaller, and thus obtaining a higher minimum degree, would make a difference. Unfortunately, with a stricter condition it takes too long to find a graph that satisfies it. The paper did not specify the ε value used in their experiment, which could be a large factor in the difference in success. It is also possible that our implementation of their algorithm differs slightly, leading to worse results; however, if we worked with a "more solvable" graph with a higher minimum degree, we could achieve better results. For both results above we used the same graph for every set of samples but varied the initial conditions for each sample. This variation in initial conditions contributes to the randomness of the results. Another possible source of randomness is the k-means clustering: although we ran k-means 30 times and kept the best result, it may have performed better in one trial than another.
Appendix
[3] U. von Luxburg, “A tutorial on spectral clustering,” Statist. Comput., vol. 17, no. 4, pp. 395–
416, Dec. 2007.
[5] M. T. Schaub, S. Segarra, and J. N. Tsitsiklis, "Blind identification of stochastic block models from dynamical observations," SIAM J. Math. Data Sci., vol. 2, no. 2, pp. 335–367, 2020.