
CS168: The Modern Algorithmic Toolbox

Lecture #11: Spectral Graph Theory, I


Tim Roughgarden & Gregory Valiant*
May 2, 2021

*© 2015–2021, Tim Roughgarden and Gregory Valiant. Not to be sold, published, or distributed without the authors' consent.

Spectral graph theory is the powerful and beautiful theory that arises from the following
question:

What properties of a graph are exposed/revealed if we 1) represent the graph as a matrix, and 2) study the eigenvectors/eigenvalues of that matrix?

Most of our lectures thus far have been motivated by concrete problems, and the problems
themselves led us to the algorithms and more widely applicable techniques and insights that
we then went on to discuss. This section has the opposite structure: we will begin with the very basic observation that graphs can be represented as matrices, and then ask “what happens if we apply linear algebraic tools to these matrices?” That is, we will begin with a technique, and in the process of gaining an understanding of what the technique does, we will arrive at several extremely natural problems that can be approached via this technique.
Throughout these lecture notes we will consider undirected, unweighted graphs (i.e. all edges have weight 1) that do not have any self-loops. Most of the definitions and techniques extend to both directed and weighted graphs, though we will not discuss these extensions here.

1 Graphs as Matrices
Given a graph G = (V, E) with |V| = n vertices, there are a number of matrices that we
can associate to the graph, including the adjacency matrix. As we will see, one extremely
natural matrix is the Laplacian of the graph:

Definition 1.1 Given a graph G = (V, E), with |V| = n, the Laplacian matrix associated to G is an n × n matrix $L_G = D - A$, where D is the degree matrix—a diagonal matrix where
$D(i, i)$ is the degree of the ith node in G, and A is the adjacency matrix, with $A(i, j) = 1$ if and only if $(i, j) \in E$. Namely,
$$L_G(i, j) = \begin{cases} \deg(i) & \text{if } i = j, \\ -1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$

For ease of notation, in settings where the underlying graph is clear, we will omit the sub-
script, and simply refer to the Laplacian as L rather than LG .
To get a better sense of the Laplacian, let us consider what happens to a vector when we
multiply it by L: if Lv = w, then
$$w(i) = \deg(i)\, v(i) - \sum_{j:(i,j)\in E} v(j) = \sum_{j:(i,j)\in E} \big(v(i) - v(j)\big).$$

That is, the ith element of the product Lv is the sum of the differences between v(i) and the entries of v corresponding to the neighbors of i in the graph G.
Building off this calculation, we will now interpret the quadratic form $v^T L v$, which will prove to be the main intuition that we come back to when reasoning about the eigenvalues/eigenvectors of L:
$$\begin{aligned}
v^T L v &= \sum_i v(i) \sum_{j:(i,j)\in E} \big(v(i) - v(j)\big) \\
&= \sum_{(i,j)\in E} v(i)\,\big(v(i) - v(j)\big) \\
&= \sum_{i<j:(i,j)\in E} \Big[ v(i)\big(v(i) - v(j)\big) + v(j)\big(v(j) - v(i)\big) \Big] \\
&= \sum_{i<j:(i,j)\in E} \big(v(i) - v(j)\big)^2,
\end{aligned}$$
where the sum on the second line counts each edge once in each orientation.

In other words, if one interprets the vector v as assigning a number to each vertex in the graph G, the quantity $v^T L v$ is exactly the sum of the squares of the differences between the values of neighboring nodes. Phrased alternately, if one were to place the vertices of the graph on the real number line, with the ith node placed at location v(i), then $v^T L v$ is precisely the sum of the squares of the lengths of all the edges of the graph.
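To make this concrete, here is a minimal numpy sketch (not from the original notes; the graph and the vector are arbitrary examples) that builds L = D − A for a small graph and checks that $v^T L v$ matches the sum of squared differences across the edges:

```python
import numpy as np

# A minimal sketch (the graph and vector are arbitrary examples): build
# L = D - A for a small undirected graph and check the quadratic form.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1            # adjacency matrix (undirected, no self-loops)
D = np.diag(A.sum(axis=1))           # degree matrix
L = D - A                            # the Laplacian

v = np.array([0.3, -1.0, 2.0, 0.5])  # an arbitrary assignment of numbers to vertices
quad = v @ L @ v
edge_sum = sum((v[i] - v[j]) ** 2 for i, j in edges)
assert np.isclose(quad, edge_sum)    # v^T L v = sum over edges of (v(i) - v(j))^2
```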

2 The Eigenvalues and Eigenvectors of the Laplacian


What can we say about the eigenvalues of the Laplacian, $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$? Since L is a real-valued symmetric matrix, all of its eigenvalues are real numbers, and its eigenvectors are orthogonal to each other. The calculation above showed that for any vector v, $v^T L v = \sum_{i<j:(i,j)\in E} (v(i) - v(j))^2$. Because this expression is a sum of squares, we conclude that $v^T L v \ge 0$, and hence all the eigenvalues of the Laplacian must be nonnegative.
The eigenvalues of L contain information about the structure of the graph (or, more precisely, information about the extent to which the graph has structure). We will see several illustrations
of this claim; this is a very hot research area, and there continues to be research uncovering
new insights into the types of structural information contained in the eigenvalues of the
Laplacian.

2.1 The zero eigenvalue


The Laplacian always has at least one eigenvalue that is 0. To see this, consider the vector $v = (1/\sqrt{n}, \dots, 1/\sqrt{n})$, and recall that the ith entry of Lv is $\sum_{j:(i,j)\in E} (v(i) - v(j)) = \sum_{j:(i,j)\in E} (1/\sqrt{n} - 1/\sqrt{n}) = 0 = 0 \cdot v(i)$. The following theorem shows that the multiplicity of the zero eigenvalue reveals the number of connected components of the graph.

Theorem 2.1 The number of zero eigenvalues of the Laplacian LG (i.e. the multiplicity of
the 0 eigenvalue) equals the number of connected components of the graph G.
Proof: We first show that the number of zero eigenvalues is at least the number of connected components of G. Indeed, assume that G has k connected components, corresponding to the partition of V into disjoint sets $S_1, \dots, S_k$. Define k vectors $v_1, \dots, v_k$ s.t. $v_i(j) = 1/\sqrt{|S_i|}$ if $j \in S_i$, and 0 otherwise. For $i = 1, \dots, k$, it holds that $\|v_i\| = 1$. Additionally, for $i \ne j$, because the sets $S_i, S_j$ are disjoint, $\langle v_i, v_j \rangle = 0$. Finally, note that $L v_i = 0$. Hence there is a set of k orthonormal vectors that are all eigenvectors of L, with eigenvalue 0.
To see that the number of zero eigenvalues is at most the number of connected components of G, note that since $v^T L v = \sum_{i<j:(i,j)\in E} (v(i) - v(j))^2$, this expression can only be zero if v is constant on every connected component. To see that there is no way of finding a (k+1)st vector v that is a zero eigenvector orthogonal to $v_1, \dots, v_k$, observe that any such eigenvector v must be nonzero in some coordinate, say a coordinate in set $S_i$; then v is nonzero and constant on all indices in set $S_i$, in which case v cannot be orthogonal to $v_i$. Hence there can be no (k+1)st eigenvector with eigenvalue 0. ∎
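The theorem is easy to check numerically. The following sketch (the graph is an arbitrary example, not from the notes) builds a graph with three connected components and counts the Laplacian eigenvalues that are numerically zero:

```python
import numpy as np

# A sketch checking Theorem 2.1 on an example graph with three
# components: a triangle {0,1,2}, an edge {3,4}, and an isolated vertex 5.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
n = 6

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

eigvals = np.linalg.eigvalsh(L)            # real and sorted ascending, since L is symmetric
num_zero = int(np.sum(np.isclose(eigvals, 0)))
print(eigvals.round(3), num_zero)          # num_zero should be 3
```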

2.2 Intuition of lowest and highest eigenvalues/eigenvectors


To think about the smallest eigenvalues and their associated eigenvectors, it is helpful to come back to the fact that $v^T L v = \sum_{i<j:(i,j)\in E} (v(i) - v(j))^2$ is the sum of the squares of the differences between neighbors. Hence the eigenvectors corresponding to the lowest eigenvalues are the vectors for which neighbors have similar values. Eigenvectors with eigenvalue 0 are constant on each connected component, and the eigenvector corresponding to the smallest nonzero eigenvalue will be the vector that minimizes this squared difference between neighbors, subject to being a unit vector that is orthogonal to all zero eigenvectors. That is, the eigenvector corresponding to the smallest nonzero eigenvalue minimizes this sum of squared differences, subject to the constraint that its values are somewhat spread out on each connected component.

Analogous reasoning applies to the maximum eigenvalue: the maximizer of $\max_{\|v\|=1} v^T L v$ will try to maximize the discrepancy between neighbors' values of v.
To get a more concrete intuition for what the small/large eigenvalues and their associated eigenvectors look like, you should try to work out (either in your head, or using a computer) the eigenvectors/eigenvalues for lots of common graphs (e.g. the cycle, the path, the binary tree, the complete graph, random graphs, etc.). See the examples and figure below. [The first part of this week's miniproject will have you examine a number of basic graphs.]

[Four scatter plots appear here; see the caption of Figure 1 below for what each panel depicts.]

Figure 1: Spectral embeddings of the cycle on 20 nodes (top two figures), and the 20 × 20 grid (bottom two figures). Each figure depicts the embedding, with the red lines connecting points that are neighbors in the graph. The left two plots show the embeddings onto the eigenvectors corresponding to the second and third smallest eigenvalues; namely, the ith node is plotted at the point $(v_2(i), v_3(i))$, where $v_k$ is the eigenvector corresponding to the kth smallest eigenvalue. The right two plots show the embeddings onto the eigenvectors corresponding to the largest and second-largest eigenvalues. Note that in the left plots, neighbors in the graph are close to each other. In the right plots, most points end up far from all their neighbors.
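Embeddings like those in Figure 1 take only a few lines to compute. The following sketch (assuming numpy and matplotlib are available; the node count and plotting details are example choices) embeds the 20-node cycle onto the eigenvectors of the second and third smallest eigenvalues:

```python
import numpy as np
import matplotlib.pyplot as plt

# A sketch for reproducing embeddings like the top-left panel of Figure 1.
n = 20
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1   # edges of the cycle

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)   # columns sorted by ascending eigenvalue
v2, v3 = eigvecs[:, 1], eigvecs[:, 2]  # second and third smallest

plt.scatter(v2, v3)
for i in range(n):                     # red lines connect neighbors, as in the figure
    j = (i + 1) % n
    plt.plot([v2[i], v2[j]], [v3[i], v3[j]], "r-")
plt.show()                             # the cycle is laid out as a circle
```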

3 Applications of Spectral Graph Theory
We briefly describe several problems for which considering the eigenvalues/eigenvectors of a
graph Laplacian proves useful.

3.1 Visualizing a graph: Spectral Embeddings


Suppose one is given a list of edges for some graph. What is the right way of visualizing, or drawing, the graph so as to reveal its structure? Ideally, one might hope to find an embedding of the graph into $\mathbb{R}^d$, for some small d = 2, 3, .... That is, we might hope to associate some point, say in 2 dimensions, to each vertex of the graph. Ideally, we would find an embedding that largely respects the structure of the graph. What does this mean? One interpretation is simply that we hope that most vertices end up being close to most of their neighbors.
Given this interpretation, embedding a graph onto the eigenvectors corresponding to the small eigenvalues seems extremely natural. Recall that the small eigenvalues correspond to unit vectors v that try to minimize the quantity $v^T L v = \sum_{i<j:(i,j)\in E} (v(i) - v(j))^2$; namely, these are the vectors that are trying to map neighbors to similar values. Of course, if the graph has a single connected component, the eigenvector corresponding to the smallest eigenvalue is $v_1 = (1/\sqrt{n}, \dots, 1/\sqrt{n})$, which is not helpful for embedding, as all points have the same value. In this case, we should start by considering $v_2$, then $v_3$, etc.
While there are many instances of graphs for which the spectral embedding does not make sense, it is a very natural first thing to try. As Figure 1 illustrates, in some cases the embedding onto the smallest two (nontrivial) eigenvectors really does correspond to the “natural”/“intuitive” way to represent the graph in 2 dimensions.

3.2 Spectral Clustering/Partitioning


How can one find large, insular clusters in a large graph? How can we partition a graph into several large components so as to minimize the number of edges that cross between the components? These questions arise naturally in the study of any large network (social networks, protein interaction networks, semantic/linguistic graphs of word usage, etc.).
In the next lecture, we will discuss some specific metrics for the quality of a given cluster
or partition of a graph, and give some quantitative bounds on these metrics in terms of the
second eigenvalue of the graph Laplacian.
For the time being, just keep in mind the intuition that the eigenvectors corresponding to small eigenvalues are, in some sense, trying to find good partitions of the graph. These low eigenvectors are trying to find ways of assigning different numbers to vertices such that neighbors have similar values. Additionally, since they are all orthogonal, each eigenvector is trying to find a “different”/“new” such partition. This is the intuition for why it is often a good idea to look at the clusters/partitions suggested by a few different small eigenvectors, as in the sketch below. [This point is especially clear on the next homework.]
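As a toy illustration (the planted two-cluster graph and the sign-based split are example choices, not a prescribed algorithm), the following sketch recovers two dense clusters joined by a single edge from the signs of the second eigenvector:

```python
import numpy as np

# A toy sketch of the spectral-clustering intuition: two dense 10-node
# clusters joined by a single edge.
rng = np.random.default_rng(0)
n = 20
A = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        if (i < 10) == (j < 10) and rng.random() < 0.8:  # dense within clusters
            A[i, j] = A[j, i] = 1
A[9, 10] = A[10, 9] = 1               # a single edge crossing between the clusters

L = np.diag(A.sum(axis=1)) - A
_, eigvecs = np.linalg.eigh(L)
v2 = eigvecs[:, 1]                    # eigenvector of the second smallest eigenvalue
labels = (v2 > 0).astype(int)         # split vertices by the sign of v2
print(labels)                         # the first 10 and last 10 should get opposite labels
```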

3.3 Graph Coloring
Consider the motivating example of allocating one of k different radio bandwidths to each radio station. If two radio stations have overlapping broadcast regions, then they cannot be assigned the same bandwidth (otherwise there will be interference for some listeners). On the other hand, if two stations are very far apart, they can broadcast on the same bandwidth without worrying about interference from each other. This problem can be modeled as the problem of k-coloring a graph. In this case, the nodes of the graph represent radio stations, and there is an edge between two stations if their broadcast ranges overlap. The problem, then, is to assign one of k colors (i.e. one of k broadcast frequencies) to each vertex, such that no two adjacent vertices have the same color. This problem arises in several other contexts, including various scheduling tasks (in which, for example, tasks are mapped to processors, and edges connect tasks that cannot be performed simultaneously).
The problem of finding a k-coloring, or even deciding whether a k-coloring of a graph exists, is NP-hard in general. One natural heuristic, motivated by the right column of Figure 1, is to embed the graph onto the eigenvectors corresponding to the highest eigenvalues. Recall that these eigenvectors are trying to make vertices as different from their neighbors as possible. As one would expect, in these embeddings, points that are close together tend not to be neighbors in the original graph. Hence one might imagine a k-coloring heuristic as follows: 1) plot the embedding of the graph onto the top 2 or 3 eigenvectors of the Laplacian, 2) partition the points in this space into k regions, e.g. using k-means or a k-d tree, and 3) assign all the points in each region the same color. Since neighbors in the graph will tend to be far apart in the embedding, this will tend to give a decent coloring of the graph.
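One possible rendering of this heuristic is sketched below (under example choices: an even cycle as the input graph, k = 2, a single top eigenvector, and scipy's kmeans2 for step 2; nothing here guarantees a proper coloring in general):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# A sketch of the coloring heuristic on an even cycle (which is 2-colorable).
n, k = 20, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

L = np.diag(A.sum(axis=1)) - A
_, eigvecs = np.linalg.eigh(L)
top = eigvecs[:, -1:]                  # embed onto the eigenvector of the largest eigenvalue

_, colors = kmeans2(top, k, minit="++", seed=1)   # step 2: k regions via k-means
conflicts = sum(colors[i] == colors[(i + 1) % n] for i in range(n))
print(colors, "monochromatic edges:", conflicts)  # 0 conflicts for the even cycle
```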

CS168: The Modern Algorithmic Toolbox
#12: Spectral Graph Theory, Part 2
Tim Roughgarden & Gregory Valiant*
May 5, 2021

*© 2015–2021, Tim Roughgarden and Gregory Valiant. Not to be sold, published, or distributed without the authors' consent.

In the last lecture, we introduced the notion of the Laplacian matrix, L, associated to a graph. We defined it as $L = D - A$, where D is the diagonal matrix with element $D_{i,i}$ defined as the degree of the ith node, and A is the adjacency matrix of the graph. The key insight, which provided an intuitive understanding of the eigenvectors/eigenvalues of the Laplacian, was the calculation characterizing the associated quadratic form $v^T L v$ as corresponding to assigning value v(i) to the ith node in the graph, and then summing the squares of the differences in entries, across all edges of the graph:
$$v^T L v = \sum_{(i,j)\in E} (v(i) - v(j))^2.$$

This characterization led to our understanding that the eigenvectors of L with small eigenvalue tend to give similar values to neighboring nodes, and the eigenvectors with the largest eigenvalues tend to give different values to neighboring nodes. These observations motivated using spectral embeddings based on the small eigenvectors for applications like 1) visualization, or 2) “spectral clustering”. In contrast, embedding onto the largest eigenvectors is useful for applications like k-coloring, where neighboring vertices should be assigned to different sets/colors.

1 Conductance, isoperimeter, and the second eigenvalue

Last lecture, we leveraged the characterization of the quadratic form $v^T L v$ to prove that the multiplicity of the zero eigenvalue is the number of connected components. Based on this, it seems intuitively clear that if $\lambda_2$ is extremely small, then the graph might be “close” to having two connected components, in the sense that there is a way of partitioning the nodes of the graph into two sets, with very few edges crossing from one set to the other. We now formalize this connection between $\lambda_2$ and the quality of the best such partition.
There are several natural metrics for quantifying the quality of a graph partition. We will state two such metrics. First, it will be helpful to define the boundary of a partition:

Definition 1.1 Given a graph G = (V, E), and a set S ⊂ V, the boundary of S, denoted $\partial(S)$, is defined to be the set of edges of G with exactly one endpoint in set S.

One natural characterization of the quality of a partition (S, V\S) is the isoperimetric ratio of the set, defined to be the ratio of the size of the boundary of S to the minimum of |S| and |V\S|:

Definition 1.2 The isoperimetric ratio of a set S, denoted $\theta(S)$, is defined as
$$\theta(S) = \frac{|\partial(S)|}{\min(|S|, |V \setminus S|)}.$$
The isoperimetric number of a graph G is defined as $\theta_G = \min_{S \subset V} \theta(S)$.
A related notion of the quality of a partition is the conductance, which is the ratio of the size of the boundary $\partial(S)$ to the minimum of the number of edges involved in S and the number involved in V\S (where an edge is double-counted if both its endpoints lie within the set). Formally, this is the following:

Definition 1.3 The conductance of a partition of a graph into two sets, S and V\S, is defined as
$$\text{cond}(S) = \frac{|\partial(S)|}{\min\Big(\sum_{i \in S} \deg(i), \; \sum_{i \notin S} \deg(i)\Big)}.$$
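Both metrics are straightforward to compute for a given set S. The following sketch (the path graph and the set S below are example choices) implements the two definitions directly:

```python
import numpy as np

# A sketch implementing the isoperimetric ratio and conductance directly.
def boundary(A, S):
    """Number of edges with exactly one endpoint in S."""
    n = A.shape[0]
    return sum(A[i, j] for i in S for j in range(n) if j not in S)

def isoperimetric_ratio(A, S):
    n = A.shape[0]
    return boundary(A, S) / min(len(S), n - len(S))

def conductance(A, S):
    deg = A.sum(axis=1)
    vol_S = deg[list(S)].sum()                   # sum of degrees inside S
    return boundary(A, S) / min(vol_S, deg.sum() - vol_S)

# Example: the path on 6 vertices, cut in the middle.
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1
S = {0, 1, 2}
print(isoperimetric_ratio(A, S), conductance(A, S))  # 1/3 and 1/5
```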

1.1 Connections with $\lambda_2$

There are connections between the second eigenvalue of a graph's Laplacian and both the isoperimetric number and the minimum conductance of the graph. The following, surprisingly easy to prove, theorem makes the first connection rigorous:

Theorem 1.4 Given any graph G = (V, E) and any set S ⊂ V,
$$\theta(S) \ge \lambda_2 \left(1 - \frac{|S|}{|V|}\right).$$
In particular, since θ(S) is unchanged when S is replaced by V\S, we may assume $|S| \le |V|/2$, and hence the isoperimetric number of the graph satisfies $\theta_G \ge \lambda_2 / 2$.
Proof: Given a set S, consider the associated vector $v_S$ defined as follows:
$$v_S(i) = \begin{cases} 1 - \frac{|S|}{|V|} & \text{if } i \in S \\ -\frac{|S|}{|V|} & \text{if } i \notin S \end{cases}$$
We now compute $\frac{v_S^T L v_S}{v_S^T v_S}$. First, we consider the numerator:
$$v_S^T L v_S = \sum_{i<j:(i,j)\in E} \big(v_S(i) - v_S(j)\big)^2 = |\partial(S)|,$$
since the only terms in this sum that contribute a nonzero term correspond to edges with exactly one endpoint in set S, namely the boundary edges. We now calculate the denominator:
$$v_S^T v_S = |S|\Big(1 - \frac{|S|}{|V|}\Big)^2 + (|V| - |S|)\Big(\frac{|S|}{|V|}\Big)^2 = |S|\Big(1 - \frac{|S|}{|V|}\Big).$$
Hence we have shown that $\frac{v_S^T L v_S}{v_S^T v_S} = \frac{|\partial(S)|}{|S|(1 - |S|/|V|)}$. On the other hand, we also know that $\sum_i v_S(i) = 0$, hence $\langle v_S, (1, 1, \dots, 1) \rangle = 0$. Recall that
$$\lambda_2 = \min_{v : \langle v, (1,\dots,1) \rangle = 0} \frac{v^T L v}{v^T v} \le \frac{v_S^T L v_S}{v_S^T v_S} = \frac{|\partial(S)|}{|S|\big(1 - \frac{|S|}{|V|}\big)}.$$
Multiplying both sides of this inequality by $(1 - \frac{|S|}{|V|})$, and using $|\partial(S)|/|S| \le \theta(S)$, yields the theorem. ∎

What does Theorem 1.4 actually mean? It says that if $\lambda_2$ is large (say, some constant bounded away from zero), then all small sets S have lots of outgoing edges—linear in |S|. For example, if $\lambda_2 \ge 1/2$, then for all sets S with $|S| \le |V|/4$, we have that $\theta(S) \ge \frac{1}{2}(1 - 1/4) = 3/8$, which implies that $|\partial(S)| \ge \frac{3}{8}|S|$.
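The theorem can also be checked exhaustively on a small graph. The following sketch does so on a random graph (all parameters are example choices; the brute force over subsets is only feasible for tiny n):

```python
import numpy as np
from itertools import combinations

# A sketch verifying Theorem 1.4 over every vertex subset of a small graph.
rng = np.random.default_rng(0)
n = 8
A = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A = A + A.T                                    # a random undirected graph

L = np.diag(A.sum(axis=1)) - A
lam2 = np.linalg.eigvalsh(L)[1]

for size in range(1, n):
    for S in combinations(range(n), size):
        S = set(S)
        cut = sum(A[i, j] for i in S for j in range(n) if j not in S)
        theta = cut / min(size, n - size)
        assert theta >= lam2 * (1 - size / n) - 1e-9   # Theorem 1.4
print("verified for all subsets; lambda_2 =", round(lam2, 3))
```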
There is a partial converse to Theorem 1.4, known as Cheeger's inequality, which argues that if $\lambda_2$ is small, then there exists at least one set S such that the conductance of the set S is also small. Cheeger's inequality is usually formulated in terms of the eigenvalues of the normalized Laplacian matrix, defined by normalizing entry L(i, j) by $1/\sqrt{\deg(i) \cdot \deg(j)}$. Rather than formally defining the normalized Laplacian, we will simply state the theorem for regular graphs (graphs where all nodes have the same degree):

Theorem 1.5 (Cheeger's Inequality) If $\lambda_2$ is the second smallest eigenvalue of the Laplacian of a d-regular graph G = (V, E), then there exists a set S ⊂ V such that
$$\frac{\lambda_2}{2d} \le \text{cond}(S) \le \sqrt{\frac{2\lambda_2}{d}}.$$
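Both sides of Cheeger's inequality can be sanity-checked by brute force on a small regular graph. The following sketch (the 10-node cycle, with d = 2, is an example choice) computes $\lambda_2$ and the minimum conductance over all subsets:

```python
import numpy as np
from itertools import combinations

# A sketch checking Cheeger's inequality on a d-regular example (the cycle).
n, d = 10, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

L = np.diag(A.sum(axis=1)) - A
lam2 = np.linalg.eigvalsh(L)[1]

def cond(S):
    cut = sum(A[i, j] for i in S for j in range(n) if j not in S)
    return cut / (d * min(len(S), n - len(S)))  # degree-volume denominator for regular graphs

min_cond = min(cond(set(S)) for size in range(1, n)
               for S in combinations(range(n), size))
assert lam2 / (2 * d) <= min_cond <= np.sqrt(2 * lam2 / d)
print(lam2 / (2 * d), min_cond, np.sqrt(2 * lam2 / d))
```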

1.2 Beyond $\lambda_2$?

Is there an analog of Theorem 1.4 and Cheeger's inequality that applies to $\lambda_3$, or higher eigenvalues? If $\lambda_k = 0$, we know the graph has at least k connected components, and hence it is tempting to conclude that if $\lambda_k$ is nonzero but small, the graph should have a partition into k pieces with few edges crossing between the pieces. In some sense, this has been well known empirically, though it was only in the past decade that rigorous “higher-order” analogs of Cheeger's inequality were established [1], and this is an area of active research.

2 Random walks and diffusion on graphs

Beyond their use for graph partitioning and visualization, the eigenvalues and eigenvectors of a graph arise naturally when considering random processes on graphs. Below, we describe one concrete model of diffusion over a graph, and give a high-level discussion of how the eigenvalues and eigenvectors connect to this process. Next week, when we discuss Markov Chains, we will expand on some of these connections.
Consider the following basic model of diffusion, which models a variety of biological phenomena, as well as how beliefs, political views, cultural norms, etc. spread in a social network: Let A denote the adjacency matrix of a graph G = (V, E) with |V| = n, and assume that at time t = 1, the vector $v_1$ has entries corresponding to the initial views of each of the n nodes. (For example, let $v_1(i) \in (0, 1)$ represent how strongly person i believes that people should wear masks in public spaces.) Consider the following dynamics: at each time t > 1, the views of each node are replaced by the average of the views that their neighbors had at time t − 1. Formulating this update as a matrix product, we have:
$$v_{t+1} = D^{-1} A v_t,$$
where $D^{-1}$ denotes the diagonal matrix whose ith diagonal entry is $1/\deg(i)$.
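The following sketch simulates these dynamics (the graph and the initial views are example choices; on this particular graph the dynamics happen to converge, a point discussed below):

```python
import numpy as np

# A sketch simulating the diffusion dynamics v_{t+1} = D^{-1} A v_t.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
W = A / A.sum(axis=1, keepdims=True)      # W = D^{-1} A; each row sums to 1

v = np.array([0.9, 0.1, 0.5, 0.2, 0.8])  # initial "views" of the 5 nodes
for _ in range(100):
    v = W @ v                             # each node adopts the average of its neighbors
print(v.round(4))                         # entries converge to a common value
                                          # (a degree-weighted average of the initial views)
```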
These dynamics do not necessarily converge in the limit as t gets large: for a network with just two people with an edge between them, the two people will “swap” their views at each timestep. Nevertheless, if the dynamics do eventually converge to a vector v, we know that vector must satisfy $v = D^{-1} A v$, and hence v will be an eigenvector of $D^{-1} A$ with eigenvalue 1!
In the case where this convergence happens, how quickly should we expect it to occur? Namely, how large a value of t will be necessary for $v_t \approx v$? Recall from our analysis of the power iteration algorithm in Lecture 8 that the diffusion update defined above is simply performing the power iteration algorithm that computes the top eigenvector of $D^{-1} A$—the only difference is that in the power iteration algorithm, we choose a uniformly random unit vector as the initial vector, whereas in the above dynamics, the initial vector $v_1$ corresponds to the initial “views” of the corresponding nodes. In our analysis of the power iteration algorithm, we saw that the ratio of the largest to second largest eigenvalue determined how quickly we would expect the dynamics to converge to the top eigenvector. (If the eigenvalues are very close, then we will need to run for more iterations.)
At a high level, if the dynamics are run on a well-connected graph, then 1) they will converge quickly (if they do actually converge), 2) the ratio of the largest to second largest eigenvalue of $D^{-1} A$ will be large, and 3) the second smallest eigenvalue of the Laplacian will be large. Conversely, if the graph is not well-connected (e.g. a graph consisting of two clusters with only a single edge crossing between them), then the dynamics will converge slowly, the second largest eigenvalue of $D^{-1} A$ will be close to the largest, and the second smallest eigenvalue of the graph Laplacian will be close to zero. These high-level principles will be useful to keep in mind next week, as we discuss Markov Chains and Markov Chain Monte Carlo (MCMC).

References

[1] James R. Lee, Shayan Oveis Gharan, and Luca Trevisan. Multiway spectral partitioning and higher-order Cheeger inequalities. Journal of the ACM, 61(6):37, 2014.
