
The Shannon Capacity of a Graph

Femke Bekius
July 22, 2011

Bachelor Thesis
Supervisor: prof. dr. Alexander Schrijver

KdV Instituut voor wiskunde


Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Universiteit van Amsterdam
Abstract
This thesis focuses on the Shannon capacity of a graph. Suppose we want to send a message across a channel to a receiver. The general question concerns the effective size of an alphabet in a model such that the receiver can recover the original message without errors. To answer this question Shannon defined the Shannon capacity of a graph, Θ(G), and stated Shannon's Theorem. We study an article of Lovász [5] in which he determined the Shannon capacity of the cycle graph C5 and introduced the Lovász Number, an upper bound for the Shannon capacity. We derive several formulas for the Lovász Number and prove a couple of theorems about it. In the last chapter we consider three problems Lovász stated at the end of his article. Determining the Shannon capacity of a graph, even for very simple graphs, is very difficult; because of this, determining Θ(C7) is still an open problem.

Data
Title: The Shannon Capacity of a Graph
Author: Femke Bekius, Femke.Bekius@student.uva.nl, 5823390
Supervisor: prof. dr. Alexander Schrijver
Second assessor: prof. dr. Monique Laurent
End date: July 22, 2011

Korteweg de Vries Instituut voor Wiskunde


Universiteit van Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math
Contents

Introduction

1 Graph Theory, Linear Algebra and Shannon's Theorem
  1.1 Graph Theory
  1.2 Linear Algebra
  1.3 Shannon's Theorem
      1.3.1 Perfect Graphs
      1.3.2 Even and odd Cycle Graphs Cn

2 The capacity of the pentagon

3 Formulas for ϑ(G), an upper bound for Θ(G)

4 Further results on the Shannon capacity

5 Conclusion

Popular summary
Introduction

Suppose we want to send a message across a channel to a receiver. During the transmission it is possible that the message changes because of noise
on the channel. An interesting question is: What is the maximum rate of
transmission such that the receiver may recover the original message with-
out errors? In 1956 Shannon asked himself if it is possible to calculate the
zero-error capacity of a certain communication model such that he could say
something about this maximum rate [1]. For answering this question he de-
fined the Shannon capacity of a graph. It can be seen as an information
theoretical parameter which represents the effective size of an alphabet in a
communication model represented by a graph G(V, E).
The Shannon capacity attracted some interest in the field of Information
Theory and in the scientific community, because of the applications to com-
munication issues. There are also connections with some central combinato-
rial and computational questions in graph theory, like computing the largest
clique and finding the chromatic number of a graph [2]. Unfortunately deter-
mining the Shannon capacity of an arbitrary graph is a very difficult problem.
Even for the simple cycle graph C7 , or more generally, for the cycle graph
Cn with n odd, the Shannon capacity is still unknown.
In this thesis we study the article of Lovász, published in 1979, in which he explained what is known about the Shannon capacity of graphs. He proved that the Shannon capacity of the cycle graph C5 equals √5 and introduced the Lovász Number, about which he proved a number of formulas and theorems.
In the first chapter we give some important definitions and theorems which
we will need for the rest of the thesis. We start with a short introduction
in graph theory and then explain some concepts of linear algebra before we
state Shannon’s Theorem. Since it is not possible to determine the Shannon
capacity of every graph exactly, Shannon’s Theorem gives us an upper and
a lower bound for the Shannon capacity. After that, by using Shannon’s
Theorem, we determine the Shannon capacity of some simple cycle graphs.
Related to this we say something about a special class of graphs, the so-called Perfect Graphs. In chapter 2 we use Lovász's technique to determine the Shannon capacity of C5. For a long time this was an open problem and therefore this is a very important result.
Thereafter we proceed with the Lovász Number, which is another upper
bound for the Shannon capacity, about which we prove a couple of formulas
and properties. At the end we prove that the Lovász Number is indeed a
smaller or equal upper bound for the Shannon capacity than the one Shannon
found. In this chapter we use, among others, the techniques introduced in
chapter 1. In the last chapter we consider the three problems Lovász stated
at the end of his article for which Haemers found a counterexample.

Chapter 1

Graph Theory, Linear Algebra and Shannon's Theorem

In this chapter we give some definitions, theorems and lemmas of the con-
cepts needed for the proofs in the rest of the thesis. If there is no reference to
an article or book we used the definition, theorem, lemma, example or corol-
lary from Lovász [5]. First we start with some definitions in graph theory
and describe some important concepts of linear algebra. In the last section
we state Shannon’s Theorem and define the perfect graphs, for which the
Shannon capacity is easy to determine. At the end we consider the cycle
graphs Cn .

1.1 Graph Theory


Definition 1.1. (Graph) An undirected graph is a pair G = (V, E), where V is a finite set and E is a family of unordered pairs from V; we denote E ⊆ {{i, j} | i, j ∈ V, i ≠ j}. The elements of V are called the vertices and the elements of E are called the edges [6].
Let G be an undirected graph without loops whose vertices carry the letters of the alphabet. Two vertices of G are adjacent if they are either connected by an edge or are equal. The letters on two vertices can be confused if the vertices are adjacent [5].
Definition 1.2. (Complementary graph) The complementary graph Ḡ of G has V(Ḡ) = V(G), and two distinct vertices in Ḡ are connected if they are not connected in G.
Definition 1.3. (Induced and Complete subgraph) Let G = (V, E) and
H = (W, F) be graphs. Then H is a subgraph of G if W ⊆ V and F ⊆ E.
H is an induced subgraph of G if W ⊆ V and F = {e ∈ E | e ⊆ W }.
H is a complete subgraph of G if H is an induced subgraph of G with the
additional feature that every two vertices in H are connected by an edge. A
complete subgraph is a clique, see definition 1.20.

Definition 1.4. (Maximum independent set) An independent set in a graph is a set of pairwise nonadjacent vertices. A maximum independent set consists of the maximum number of pairwise nonadjacent vertices, and its size is denoted by α(G) [7].

The independence number α(G) is the maximum number of 1-letter messages which can be sent without danger of confusion [5]. In other words, it is the largest number of letters such that the receiver knows whether the received message is correct or not.

Definition 1.5. (Strong product) The strong product G1 ⊠ G2 of two graphs G1(V1, E1) and G2(V2, E2) has vertex set V1 × V2 = {(u1, u2) : u1 ∈ V1, u2 ∈ V2}, with (u1, u2) ≠ (v1, v2) adjacent if and only if ui = vi or uivi ∈ Ei for i = 1, 2.
The k-fold strong product of a graph G, Gk = G ⊠ G ⊠ . . . ⊠ G, has vertex set V k = {(u1, . . . , uk) : ui ∈ V }, with (u1, . . . , uk) ≠ (v1, . . . , vk) adjacent if and only if ui = vi or uivi ∈ E for all i [1].

It follows that α(Gk) is the maximum number of k-letter messages which can be sent without danger of confusion. There are at least α(G)k such words. Two k-letter messages are confoundable if for all 1 ≤ i ≤ k their i-th letters are adjacent.

Theorem 1.6. α(Gk) ≥ α(G)k

Proof. Let U ⊆ V(G) be a maximum independent set of vertices in G, so |U| = α(G). The α(G)k vertices in Gk of the form (u1, . . . , uk), ui ∈ U ∀i, clearly form an independent set in Gk. Indeed, if (u1, . . . , uk), (u′1, . . . , u′k) ∈ U × . . . × U are distinct, then ∃i : ui ≠ u′i with ui, u′i ∈ U. Therefore ui, u′i are not adjacent and so (u1, . . . , uk), (u′1, . . . , u′k) are not adjacent. Hence α(Gk) ≥ α(G)k.

Example 1.7. For the cycle graph C5, also called the pentagon, α(C5²) = 5. In fact, if v1, . . . , v5 are the vertices in cyclic order, then the 2-letter messages v1v1, v2v3, v3v5, v4v2 and v5v4 are pairwise nonconfoundable. In Figure 1.1 the red colored vertices form a maximum independent set in C5², witnessing α(C5²) = 5.

Figure 1.1: Strong product C5 ⊠ C5
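The independence number of this small product graph can be confirmed by brute force. The sketch below (an illustration, not part of the thesis) builds C5, forms C5 ⊠ C5 and searches for the largest independent set; it assumes only the Python standard library.

```python
# Brute-force check that alpha(C5 x C5) = 5.
from itertools import combinations, product

n = 5
E = {frozenset((i, (i + 1) % n)) for i in range(n)}   # edges of C5

def adjacent_or_equal(u, v):
    # Lovász's convention: a vertex is also "adjacent" to itself.
    return u == v or frozenset((u, v)) in E

def confoundable(x, y):
    # Edge of the strong product: both coordinates adjacent-or-equal.
    return all(adjacent_or_equal(a, b) for a, b in zip(x, y))

vertices = list(product(range(n), repeat=2))          # 25 vertices of C5^2

def independent(S):
    return all(not confoundable(x, y) for x, y in combinations(S, 2))

alpha = 0
for k in range(1, len(vertices) + 1):
    if any(independent(S) for S in combinations(vertices, k)):
        alpha = k
    else:
        break
print(alpha)   # 5, attained e.g. by (0,0), (1,2), (2,4), (3,1), (4,3)
```

The witness in the final comment is exactly the set v1v1, v2v3, v3v5, v4v2, v5v4 from Example 1.7, written with 0-based indices.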

1.2 Linear Algebra


In this thesis all vectors are column vectors, I denotes the identity matrix,
J denotes the square matrix where all entries equal one and j is the vector
with all entries equal to one. The inner product of vectors v, w is denoted
by v T w.
Definition 1.8. (Tensor product) If v = (v1 , . . . , vn ) and w = (w1 , . . . , wm )
then v ◦ w is the vector (v1 w1 , . . . , v1 wm , v2 w1 , . . . , vn wm )T of length nm.
It follows from the computation below that the definitions above are connected by
$$(x \circ y)^T(v \circ w) = (x^T v)(y^T w). \tag{1.1}$$
Indeed, let $x = (x_i)_{i=1}^n$, $y = (y_j)_{j=1}^m$, $v = (v_i)_{i=1}^n$, $w = (w_j)_{j=1}^m$. Then $x \circ y = (x_i y_j)_{i,j=1}^{n,m}$, $v \circ w = (v_i w_j)_{i,j=1}^{n,m}$ and
$$(x \circ y)^T(v \circ w) = \sum_{i=1}^n \sum_{j=1}^m x_i y_j v_i w_j = \Big(\sum_{i=1}^n x_i v_i\Big)\Big(\sum_{j=1}^m y_j w_j\Big) = (x^T v)(y^T w).$$
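Identity (1.1) is easy to test numerically, since the product ∘ defined above is exactly the Kronecker product. A minimal check (an illustration, not part of the thesis), assuming numpy is available:

```python
# Numerical spot-check of (x∘y)^T (v∘w) = (x^T v)(y^T w).
import numpy as np

rng = np.random.default_rng(0)
x, v = rng.standard_normal(3), rng.standard_normal(3)
y, w = rng.standard_normal(4), rng.standard_normal(4)

lhs = np.kron(x, y) @ np.kron(v, w)   # (x∘y)^T (v∘w)
rhs = (x @ v) * (y @ w)               # (x^T v)(y^T w)
print(np.isclose(lhs, rhs))           # True
```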

Definition 1.9. (Orthonormal representation) An orthonormal representation of G is a collection of unit vectors (u1, . . . , un) in a Euclidean space such that if i and j are nonadjacent vertices, then ui and uj are orthogonal. Every graph has an orthonormal representation; for instance, take n mutually orthogonal unit vectors.
Lemma 1.10. Let (u1, . . . , un) and (v1, . . . , vm) be orthonormal representations of G and H respectively. Then the vectors (ui ∘ vj | i = 1, . . . , n, j = 1, . . . , m) form an orthonormal representation of G ⊠ H.

Proof. Let (i1, i2), (j1, j2) ∈ V(G) × V(H) be nonadjacent in G ⊠ H; that is, i1, j1 are nonadjacent in G or i2, j2 are nonadjacent in H. Then uᵢ₁ᵀuⱼ₁ = 0 or vᵢ₂ᵀvⱼ₂ = 0, so by (1.1), (uᵢ₁ ∘ vᵢ₂)ᵀ(uⱼ₁ ∘ vⱼ₂) = (uᵢ₁ᵀuⱼ₁)(vᵢ₂ᵀvⱼ₂) = 0. Moreover each ui ∘ vj is a unit vector, since (ui ∘ vj)ᵀ(ui ∘ vj) = (uiᵀui)(vjᵀvj) = 1 by (1.1). Hence the vectors ui ∘ vj form an orthonormal representation of G ⊠ H.
The value of an orthonormal representation (u1, . . . , un) is defined to be
$$\min_c \max_{1 \le i \le n} \frac{1}{(c^T u_i)^2},$$
where c ranges over all unit vectors; a vector c attaining the minimum is called the handle of the representation [5].

Definition 1.11. (Lovász Number) ϑ(G) is the minimum value over all the representations mentioned above, i.e.,
$$\vartheta(G) = \min_{u_1,\ldots,u_n} \min_c \max_{1 \le i \le n} \frac{1}{(c^T u_i)^2}.$$
A representation is optimal if it achieves this minimum value.

Definition 1.12. (Positive semidefinite matrix) A matrix M is positive


semidefinite (p.s.d.) if there is a matrix X such that M = X T X or, equiva-
lently, if M is symmetric and all eigenvalues are non-negative.

Another definition of ϑ(G) is
$$\vartheta(G) = \max\Big\{\sum_{i,j=1}^n x_{ij} \;\Big|\; X = (x_{ij})_{i,j=1}^n \text{ p.s.d.},\ \operatorname{Tr}(X) = 1,\ x_{ij} = 0 \text{ if } \{i,j\} \in E(G)\Big\}.$$
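This maximization is a semidefinite program, so for small graphs ϑ(G) can be computed directly from this definition. Below is a sketch using the cvxpy modelling library (an illustration under the assumption that cvxpy and an SDP-capable solver such as SCS are installed; it is not code from the thesis):

```python
# Computing theta(C5) from the semidefinite formulation above.
import numpy as np
import cvxpy as cp

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]        # edges of the pentagon

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.trace(X) == 1]            # p.s.d. and Tr(X) = 1
constraints += [X[i, j] == 0 for (i, j) in edges]   # x_ij = 0 on edges

prob = cp.Problem(cp.Maximize(cp.sum(X)), constraints)
prob.solve()
print(prob.value, np.sqrt(5))   # both approximately 2.2361
```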

Lemma 1.13. ϑ(G ⊠ H) ≤ ϑ(G)ϑ(H).

Proof. Let (u1, . . . , un) and (v1, . . . , vm) be optimal orthonormal representations of G and H, with handles c and d respectively. Then c ∘ d is a unit vector by (1.1), and (ui ∘ vj) is an orthonormal representation of G ⊠ H by Lemma 1.10. Hence,
$$\vartheta(G \boxtimes H) \le \max_{i,j} \frac{1}{((c \circ d)^T(u_i \circ v_j))^2} = \max_i \frac{1}{(c^T u_i)^2} \cdot \max_j \frac{1}{(d^T v_j)^2} = \vartheta(G)\vartheta(H).$$

Remark. In chapter 3 we will prove that equality holds in Lemma 1.13.

Lemma 1.14. α(G) ≤ ϑ(G)

Proof. Let (u1, . . . , un) be an optimal orthonormal representation of G with handle c. Let {1, . . . , k} be a maximum independent set in G. Then u1, . . . , uk are pairwise orthogonal. Extend them to an orthonormal basis b1, . . . , bd of the space, where bi = ui for i = 1, . . . , k. Then, using (cᵀui)² ≥ 1/ϑ(G) for each i (by the definition of ϑ(G)),
$$1 = c^Tc = \sum_{i=1}^d (c^Tb_i)^2 \ge \sum_{i=1}^k (c^Tu_i)^2 \ge \frac{k}{\vartheta(G)} = \frac{\alpha(G)}{\vartheta(G)}.$$
Hence α(G) ≤ ϑ(G).


Definition 1.15. (Adjacency matrix) The adjacency matrix of a graph G with vertex set {1, . . . , k} is the k × k matrix A where the entry (i, j) equals one if i is adjacent to j, and equals 0 when i = j or when i is nonadjacent to j. More generally one writes Au,v := the number of edges connecting u and v, for u, v ∈ V(G) [6]. The adjacency matrix is symmetric if the graph is undirected. In this thesis we consider undirected graphs G; therefore the adjacency matrix of G is always symmetric in the theorems and lemmas below.

Remark. Lovász used another definition of adjacency: when i = j he also called the vertices adjacent. In the following proofs we will use the definition of Lovász, but when we consider an adjacency matrix we will use the definition above. In other cases it will be clear from the context what is meant.

Example 1.16. The adjacency matrix of the 5-cycle C5 is
$$A = \begin{pmatrix} 0&1&0&0&1\\ 1&0&1&0&0\\ 0&1&0&1&0\\ 0&0&1&0&1\\ 1&0&0&1&0 \end{pmatrix}.$$
Lemma 1.17. Let X and Y be n × n matrices. Then Tr(XY) = Tr(YX).

Proof.
$$\operatorname{Tr}(XY) = \sum_{i=1}^n (XY)_{i,i} = \sum_{i=1}^n \sum_{j=1}^n X_{i,j}Y_{j,i} = \sum_{j=1}^n \sum_{i=1}^n Y_{j,i}X_{i,j} = \sum_{j=1}^n (YX)_{j,j} = \operatorname{Tr}(YX).$$

1.3 Shannon’s Theorem
In 1959 Shannon introduced the Shannon capacity of a graph [5].

Definition 1.18. (Shannon capacity) The Shannon capacity of a graph is


defined by p
Θ(G) = sup k α(Gk )
k
p p
By Fekete’s lemma we prove that Θ(G) = supk k
α(Gk ) = limk→∞ k α(Gk ).

Lemma 1.19. [6] Let a1, a2, . . . be a sequence of reals such that an+m ≥ an + am for all positive n, m ∈ Z. Then
$$\lim_{n\to\infty} \frac{a_n}{n} = \sup_n \frac{a_n}{n}.$$

Proof. For all i, j, k ≥ 1 we have $a_{jk+i} \ge a_{jk} + a_i \ge \underbrace{a_k + \ldots + a_k}_{j \text{ times}} + a_i = ja_k + a_i$. So, for all fixed i, k ≥ 1 we have
$$\liminf_{j\to\infty} \frac{a_{jk+i}}{jk+i} \ge \liminf_{j\to\infty} \frac{ja_k + a_i}{jk+i} = \liminf_{j\to\infty} \Big(\frac{a_k}{k}\cdot\frac{jk}{jk+i} + \frac{a_i}{jk+i}\Big) = \frac{a_k}{k}.$$
Since this is true for each i, we have for fixed k ≥ 1,
$$\liminf_{n\to\infty} \frac{a_n}{n} = \inf_{i=1,\ldots,k}\Big(\liminf_{j\to\infty} \frac{a_{jk+i}}{jk+i}\Big) \ge \frac{a_k}{k}.$$
Therefore $\liminf_{n\to\infty} \frac{a_n}{n} \ge \sup_k \frac{a_k}{k}$. Since trivially $\frac{a_n}{n} \le \sup_k \frac{a_k}{k}$ for all n, also $\limsup_{n\to\infty} \frac{a_n}{n} \le \sup_k \frac{a_k}{k}$, and this implies
$$\lim_{n\to\infty} \frac{a_n}{n} = \sup_n \frac{a_n}{n}.$$

Using the multiplicative version, an+m ≥ an am for all positive n, m ∈ Z with an > 0 ∀n, we have
$$\log a_{n+m} \ge \log(a_n a_m) = \log a_n + \log a_m,$$
so bn := log an satisfies bn+m ≥ bn + bm. Applying Lemma 1.19 to (bn) we get
$$\lim_{k\to\infty} \sqrt[k]{a_k} = \sup_k \sqrt[k]{a_k},$$
and since α(G^{p+l}) ≥ α(G^p)α(G^l) (by the argument of Theorem 1.6),
$$\lim_{k\to\infty} \sqrt[k]{\alpha(G^k)} = \sup_k \sqrt[k]{\alpha(G^k)},$$
as in Shannon's definition.

From the previous definition and Theorem 1.6 it is seen that
$$\Theta(G) = \sup_k \sqrt[k]{\alpha(G^k)} \ge \sup_k \sqrt[k]{\alpha(G)^k} = \alpha(G).$$
Hence Θ(G) ≥ α(G), and we conclude that α(G) is a lower bound for the Shannon capacity of G.

Even for simple graphs it is difficult to determine the Shannon capacity. Shannon proved that Θ(G) = α(G) for graphs which can be covered by α(G) cliques. Examples of such graphs are the perfect graphs [5]; in Section 1.3.1 we will explain this.
Definition 1.20. (Clique) A clique in a graph G is a set of pairwise adjacent
vertices [7].
Definition 1.21. (Fractional vertex packing) A fractional vertex packing of G is a function w : V(G) → R₊ such that $\sum_{x \in C} w(x) \le 1$ for every clique C.

A general upper bound on Θ(G) was given by Shannon. This upper bound is denoted by α*(G), which is the maximum of $\sum_{x \in V} w(x)$ taken over all fractional vertex packings w [5].

The Duality Theorem of Linear Programming states, for A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, the following:
$$\max\{c^Tx \mid x \ge 0,\ Ax \le b\} = \min\{y^Tb \mid y \ge 0,\ y^TA \ge c^T\},$$
provided that ∃x ≥ 0 : Ax ≤ b and ∃y ≥ 0 : yᵀA ≥ cᵀ [6].
Consider the matrix A whose rows are indexed by cliques and whose columns are indexed by vertices, with (C, v)-entry equal to 1 if v ∈ C and 0 if v ∉ C, and take for c ∈ R^{V(G)} and b ∈ R^{P(G)} all-one vectors; here P(G) is the collection of cliques of G, C ∈ P(G) and v ∈ V(G).
With this theorem α*(G) can be defined dually as follows: assign nonnegative weights q(C) to the cliques C of G such that $\sum_{C \ni x} q(C) \ge 1$ for each vertex x of G, and minimize $\sum_C q(C)$.

With this notation Shannon's theorem states
$$\alpha(G) \le \Theta(G) \le \alpha^*(G).$$
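Since α*(G) is the optimum of a linear program, it is easy to compute for small graphs. For triangle-free graphs such as C5 and C7 the maximal cliques are just the edges, so the constraints read w(u) + w(v) ≤ 1 for every edge. A sketch with scipy (an illustration, not part of the thesis):

```python
# Computing alpha*(Cn) by linear programming.
import numpy as np
from scipy.optimize import linprog

def alpha_star_cycle(n):
    edges = [(i, (i + 1) % n) for i in range(n)]
    A = np.zeros((len(edges), n))
    for row, (u, v) in enumerate(edges):
        A[row, u] = A[row, v] = 1          # clique (edge) constraints
    # linprog minimizes, so maximize sum(w) by minimizing -sum(w).
    res = linprog(c=-np.ones(n), A_ub=A, b_ub=np.ones(len(edges)),
                  bounds=[(0, None)] * n)
    return -res.fun

print(alpha_star_cycle(5), alpha_star_cycle(7))    # 2.5 and 3.5
```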

1.3.1 Perfect Graphs


As was said before, it is difficult to determine the Shannon capacity of a graph, even for very simple graphs. In the next chapter we will see how Lovász proved that the Shannon capacity of the cycle graph C5 equals √5. However, there is a collection of graphs for which the Shannon capacity is easy to determine, the so-called Perfect Graphs.

Definition 1.22. (Vertex-coloring) A vertex-coloring, or coloring, is a partition of V into independent sets. Each of the independent sets is called a color of the coloring. The vertex-coloring number χ(G) is the minimum number of colors in a vertex-coloring [6].

Definition 1.23. (Perfect Graph) A graph G is perfect if χ(H) = ω(H) for every induced subgraph H of G, where χ(H) is the vertex-coloring number and ω(H) the maximum size of a clique in H [7].

For perfect graphs we have α(G) = α∗ (G) and by Shannon’s Theorem we


conclude that α(G) = Θ(G). To prove this we need another theorem.

Theorem 1.24. G is perfect ⟺ Ḡ is perfect.

Proof. See [4] for a proof.

Theorem 1.25. If G is perfect, then α(G) = α*(G).

Proof. If G is perfect, then Ḡ is perfect. Then ω(Ḡ) = χ(Ḡ) by definition, and since ω(Ḡ) = α(G) it follows that α(G) = χ(Ḡ).
In general α(G) ≤ α*(G) ≤ χ(Ḡ), since
$$\begin{aligned}
\alpha(G) &= \max\{\,|S| \;:\; S \subseteq V(G),\ S \text{ an independent set in } G\,\}\\
&= \max\{c^Tx \mid x \text{ a } (0,1)\text{-vector in } \mathbb{R}^{V(G)},\ Ax \le b\}\\
&\le \max\{c^Tx \mid x \ge 0,\ Ax \le b\} = \alpha^*(G)\\
&= \min\{y^Tb \mid y \ge 0,\ y^TA \ge c^T\}\\
&\le \min\{y^Tb \mid y \text{ a } (0,1)\text{-vector indexed by } P(G),\ y^TA \ge c^T\}\\
&= \chi(\bar G).
\end{aligned}$$
Hence, if α(G) = χ(Ḡ), then also α(G) = α*(G).


We give an example of a perfect graph for which we can determine the Shan-
non capacity.
Example 1.26. Let G = C4 , then G is perfect because χ(G) = ω(G) = 2
and χ(H) = ω(H) for every induced subgraph H of G. Also α(G) = 2 and
hence Θ(G) = 2.

Figure 1.2: Cycle graph C4

1.3.2 Even and odd Cycle Graphs Cn


From the example in the previous section the question arises what is known about the Shannon capacity of the cycles Cn. We will see that the answer depends on whether n is even or odd. First we give α(Cn) and α*(Cn) for the simplest cycles.

• C3: α(C3) = 1, α*(C3) = 1 ⟹ Θ(C3) = 1

• C4: α(C4) = 2, α*(C4) = 2 ⟹ Θ(C4) = 2

• C5: α(C5) = 2, α*(C5) = 5/2 ⟹ 2 ≤ Θ(C5) ≤ 5/2

• C6: α(C6) = 3, α*(C6) = 3 ⟹ Θ(C6) = 3

• C7: α(C7) = 3, α*(C7) = 7/2 ⟹ 3 ≤ Θ(C7) ≤ 7/2

• C8: α(C8) = 4, α*(C8) = 4 ⟹ Θ(C8) = 4

• C9: α(C9) = 4, α*(C9) = 9/2 ⟹ 4 ≤ Θ(C9) ≤ 9/2

Figure 1.3: Cycle graphs C3 to C9

We see that if n is even, then α(Cn) = α*(Cn) = Θ(Cn). This follows from the fact that for even n the cycle Cn is a perfect graph; alternatively one checks directly that Cn can be covered by n/2 cliques (edges). If n is odd we see Θ(C3) = 1, but for odd n > 3 we can only give a lower and an upper bound for Θ(Cn), namely (n−1)/2 ≤ Θ(Cn) ≤ n/2.
We should remark here that in general the exact determination of maximum independent sets in C_n^d seems to be a very hard task [2].
In the next chapter we will see how Lovász proved that the Shannon capacity of C5 equals √5, but for C7, and in general for Cn with n odd and n > 5, the exact value of the Shannon capacity is not known.

Chapter 2

The capacity of the pentagon

In this chapter we explain how to determine the Shannon capacity of the cycle graph C5, also known as the pentagon [5]. From Shannon's theorem we know α(G) ≤ Θ(G) ≤ α*(G), and here
$$\sqrt 5 \le \Theta(C_5) \le \frac{5}{2}.$$
Indeed, $\Theta(C_5) = \sup_k \sqrt[k]{\alpha(C_5^k)} = \sup\{2, \sqrt 5, \ldots\}$, so $\Theta(C_5) \ge \sqrt{\alpha(C_5^2)} = \sqrt 5$.
Also, every clique C must satisfy $\sum_{x \in C} w(x) \le 1$, and since we want to maximize the sum of the weights of the vertices we give every vertex the maximum weight 1/2, so α*(C5) = 5 · (1/2) = 5/2 and hence Θ(C5) ≤ 5/2.

Figure 2.1: C5


Theorem 2.1. Θ(C5) = √5

In the next proof we use the umbrella technique introduced by Lovász, see
also figure 2.2 [5].

Proof. Consider an umbrella whose handle and five ribs have unit length. Open the umbrella to the point where the maximum angle between the ribs is π/2. Let u1, u2, u3, u4, u5 be the ribs and c the handle, as vectors oriented away from their common point. Then u1, . . . , u5 is an orthonormal representation of C5, and by the Spherical Cosine Theorem we can compute cᵀuᵢ = 5^{−1/4}.
To show: Θ(C5) ≤ √5.
Let S be a maximum independent set in C5^k, so S ⊆ V(C5^k) = {(i1, . . . , ik) | i1, . . . , ik ∈ V(C5)}. The set X := {u_{i₁} ∘ . . . ∘ u_{iₖ} | (i1, . . . , ik) ∈ S} is orthonormal, because ui and uj are orthogonal whenever i, j are nonadjacent. Since the k-fold tensor power c ∘ . . . ∘ c is a unit vector, Bessel's inequality gives
$$1 = \langle\underbrace{c \circ \ldots \circ c}_{k}, \underbrace{c \circ \ldots \circ c}_{k}\rangle \ge \sum_{x \in X}\langle c \circ \ldots \circ c, x\rangle^2 = \sum_{(i_1,\ldots,i_k) \in S}\prod_{j=1}^k \langle c, u_{i_j}\rangle^2 = \frac{|S|}{(\sqrt 5)^k},$$
whence $|S| \le (\sqrt 5)^k$. Therefore,
$$\Theta(C_5) = \sup_k \sqrt[k]{|S|} \le \sup_k \sqrt[k]{(\sqrt 5)^k} = \sqrt 5.$$

Figure 2.2: The Lovász umbrella
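The umbrella can be written down explicitly, which makes the values used in the proof easy to verify. In the sketch below (an illustration, not from the thesis) the handle is c = (0, 0, 1)ᵀ and the five ribs are spread evenly around it, with third coordinate 5^{−1/4} as computed above:

```python
# Numerical check of the Lovász umbrella for C5.
import numpy as np

t = 5 ** -0.25                      # the claimed value of c^T u_i
s = np.sqrt(1 - t * t)              # horizontal component of each rib
u = [np.array([s * np.cos(2 * np.pi * i / 5),
               s * np.sin(2 * np.pi * i / 5),
               t]) for i in range(5)]

# Ribs at distance 2 on the cycle (nonadjacent vertices) are orthogonal:
print([round(float(u[i] @ u[(i + 2) % 5]), 10) for i in range(5)])  # all 0.0
print([round(float(u[i] @ u[i]), 10) for i in range(5)])            # all 1.0
```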

Chapter 3

Formulas for ϑ(G), an upper bound for Θ(G)

In this chapter we investigate the upper bound ϑ(G) in more detail and give
several proofs which were stated by Lovász. We first prove that ϑ(G) is an
upper bound for Θ(G). Then we prove a theorem about positive semidefinite
matrices such that we can give a relation between ϑ(G) and eigenvalues of
symmetric matrices. Next we prove a theorem about the value of ϑ(G). After
introducing an orthonormal representation of G we draw some consequences.
Then we give another way of representing the value of ϑ(G) and we prove
equality in Lemma 1.13. We give some properties of ϑ(G) if G is vertex-
or edge-transitive or if G is regular, and we end with proving that ϑ(G) is an upper bound for Θ(G) smaller than or equal to the one Shannon found.

Theorem 3.1. Θ(G) ≤ ϑ(G)

Proof. By Lemmas 1.13 and 1.14 we have
$$\Theta(G) = \sup_k \sqrt[k]{\alpha(G^k)} \le \sup_k \sqrt[k]{\vartheta(G^k)} \le \sup_k \sqrt[k]{\vartheta(G)^k} = \vartheta(G).$$

Theorem 3.2. Let A be a symmetric matrix. Then λI − A is positive semidefinite if and only if the largest eigenvalue of A is at most λ.

Proof. Let λ1, . . . , λn be the eigenvalues of A. Then λ − λ1, . . . , λ − λn are the eigenvalues of λI − A. Therefore λI − A is positive semidefinite ⟺ λ − λ1, . . . , λ − λn ≥ 0 ⟺ λ1 ≤ λ, . . . , λn ≤ λ ⟺ the largest eigenvalue of A is at most λ.

Theorem 3.3. Let G be a graph on vertices {1, . . . , n}. Then ϑ(G) is the minimum of the largest eigenvalue over all symmetric matrices $A = (a_{ij})_{i,j=1}^n$ such that
$$a_{ij} = 1 \quad\text{if } i = j \text{ or if } i \text{ and } j \text{ are nonadjacent}. \tag{3.1}$$

Proof. Let U = (u1, . . . , un) be an optimal orthonormal representation of G with handle c. Define
$$a_{ij} = \begin{cases} 1 - \dfrac{u_i^Tu_j}{(c^Tu_i)(c^Tu_j)} & \text{if } i \ne j,\\[2mm] 1 & \text{if } i = j.\end{cases}$$
Then (3.1) is satisfied and
$$-a_{ij} = \Big(c - \frac{u_i}{c^Tu_i}\Big)^T\Big(c - \frac{u_j}{c^Tu_j}\Big), \qquad i \ne j, \tag{3.2}$$
because
$$\Big(c - \frac{u_i}{c^Tu_i}\Big)^T\Big(c - \frac{u_j}{c^Tu_j}\Big) = c^Tc - \frac{c^Tu_j}{c^Tu_j} - \frac{c^Tu_i}{c^Tu_i} + \frac{u_i^Tu_j}{(c^Tu_i)(c^Tu_j)} = -1 + \frac{u_i^Tu_j}{(c^Tu_i)(c^Tu_j)} = -a_{ij}.$$
And
$$\vartheta(G) - a_{ii} = \Big(c - \frac{u_i}{c^Tu_i}\Big)^2 + \Big(\vartheta(G) - \frac{1}{(c^Tu_i)^2}\Big), \tag{3.3}$$
because
$$\Big(c - \frac{u_i}{c^Tu_i}\Big)^2 + \vartheta(G) - \frac{1}{(c^Tu_i)^2} = c^Tc - 2\frac{c^Tu_i}{c^Tu_i} + \frac{u_i^Tu_i}{(c^Tu_i)^2} + \vartheta(G) - \frac{1}{(c^Tu_i)^2} = 1 - 2 + \vartheta(G) = \vartheta(G) - a_{ii}.$$
Motivated by (3.2), define $D := \big(c - \frac{u_1}{c^Tu_1}, \ldots, c - \frac{u_n}{c^Tu_n}\big)$ and let B = DᵀD. By the definition of ϑ(G) we have ϑ(G) ≥ 1/(cᵀuᵢ)² for all i, so wᵢ := ϑ(G) − 1/(cᵀuᵢ)² ≥ 0; let Δw be the diagonal matrix with w1, . . . , wn on the diagonal and zeros elsewhere. Then by (3.2) and (3.3), ϑ(G)I − A = B + Δw, which is positive semidefinite: B is of the form DᵀD, hence symmetric with non-negative eigenvalues, and Δw is clearly symmetric with non-negative eigenvalues. By Theorem 3.2 the largest eigenvalue of A is at most ϑ(G).
On the other hand, let A = (aij) be a matrix satisfying (3.1) and let λ be its largest eigenvalue. Then by Theorem 3.2, λI − A is positive semidefinite and hence there exist vectors x1, . . . , xn such that
$$(\lambda I - A)_{ij} = \lambda\delta_{ij} - a_{ij} = x_i^Tx_j, \qquad \delta_{ij} = \begin{cases}1 & \text{if } i = j,\\ 0 & \text{if } i \ne j.\end{cases} \tag{3.4}$$
Let c be a unit vector perpendicular to x1, . . . , xn and set
$$u_i = \frac{1}{\sqrt\lambda}(c + x_i).$$
Then uᵢᵀuᵢ = (1/λ)(cᵀc + 2cᵀxᵢ + xᵢᵀxᵢ) = (1/λ)(1 + xᵢᵀxᵢ), and by (3.4) and (3.1) with i = j we have xᵢᵀxᵢ = λ − aᵢᵢ = λ − 1, so uᵢᵀuᵢ = 1: each uᵢ is a unit vector. For i, j nonadjacent with i ≠ j,
$$u_i^Tu_j = \frac{1}{\lambda}(c + x_i)^T(c + x_j) = \frac{1}{\lambda}(c^Tc + c^Tx_j + x_i^Tc + x_i^Tx_j) = \frac{1}{\lambda}(1 + x_i^Tx_j),$$
and by (3.4) and (3.1), xᵢᵀxⱼ = λδᵢⱼ − aᵢⱼ = −1, so uᵢᵀuⱼ = 0.
Therefore (u1, . . . , un) is an orthonormal representation of G. Moreover, cᵀuᵢ = (1/√λ)(cᵀc + cᵀxᵢ) = 1/√λ, and we conclude
$$\max_{1\le i\le n}\frac{1}{(c^Tu_i)^2} = \lambda.$$
Therefore ϑ(G) ≤ λ by the definition of ϑ(G).

A new formulation of the Lovász Number just obtained is:
$$\vartheta(G) = \min\{\lambda \mid \lambda \text{ is the largest eigenvalue of a matrix } A \text{ satisfying (3.1)}\}.$$

Remark. The above proof shows that among the optimal orthonormal representations there is one such that
$$\vartheta(G) = \frac{1}{(c^Tu_1)^2} = \ldots = \frac{1}{(c^Tu_n)^2}. \tag{3.5}$$
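This eigenvalue characterization also suggests a crude numerical method: fix the entries forced to 1 by (3.1) and minimize the largest eigenvalue over the remaining (edge) entries. A sketch for C5, assuming numpy and scipy are available (an illustration, not part of the thesis):

```python
# Minimizing the largest eigenvalue over matrices satisfying (3.1) for C5.
import numpy as np
from scipy.optimize import minimize

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]

def largest_eigenvalue(t):
    A = np.ones((n, n))                      # entries required by (3.1)
    for (u, v), x in zip(edges, t):
        A[u, v] = A[v, u] = x                # free values on the edges
    return np.linalg.eigvalsh(A)[-1]

res = minimize(largest_eigenvalue, np.zeros(len(edges)), method="Nelder-Mead")
print(res.fun, np.sqrt(5))   # both close to 2.236, up to solver tolerance
```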

The next theorem gives a good characterization of the value of ϑ(G).
Theorem 3.4. Let G be a graph on vertices {1, . . . , n} and let $B = (b_{ij})_{i,j=1}^n$ range over all positive semidefinite symmetric matrices such that
$$b_{ij} = 0 \tag{3.6}$$
for every pair (i, j) of distinct adjacent vertices, and
$$\operatorname{Tr}(B) = 1. \tag{3.7}$$
Then ϑ(G) = max_B Tr(BJ).

Remark. Tr(BJ) is the sum of all entries of matrix B.


Proof. First we prove ϑ(G) ≥ max_B Tr(BJ).
Let A = (aij) be the matrix from Theorem 3.3 with largest eigenvalue ϑ(G), and let B be any matrix as described above. Then
$$\operatorname{Tr}(BJ) = \sum_{i=1}^n\sum_{j=1}^n b_{ij} = \sum_{i=1}^n\sum_{j=1}^n a_{ij}b_{ij} = \operatorname{Tr}(AB),$$
because aᵢⱼbᵢⱼ = bᵢⱼ if i = j or i, j are nonadjacent, while aᵢⱼbᵢⱼ = 0 if i, j are adjacent (then bᵢⱼ = 0 by (3.6)). And so
$$\vartheta(G) - \operatorname{Tr}(BJ) = \vartheta(G)\operatorname{Tr}(B) - \operatorname{Tr}(AB) = \operatorname{Tr}(\vartheta(G)IB - AB) = \operatorname{Tr}((\vartheta(G)I - A)B).$$
Here both ϑ(G)I − A and B are positive semidefinite, the former by Theorem 3.2. Let e1, . . . , en be mutually orthonormal eigenvectors of B, with corresponding eigenvalues λ1, . . . , λn ≥ 0. Then
$$\operatorname{Tr}((\vartheta(G)I - A)B) = \sum_{i=1}^n e_i^T(\vartheta(G)I - A)Be_i \tag{3.8}$$
$$= \sum_{i=1}^n \lambda_i\,e_i^T(\vartheta(G)I - A)e_i \ge 0. \tag{3.9}$$
Namely, let U = [e1, . . . , en]; then UᵀU = UUᵀ = I, and for any matrix C we have
$$\operatorname{Tr}(C) = \operatorname{Tr}(CUU^T) = \operatorname{Tr}(U^TCU) = \sum_{i=1}^n (U^TCU)_{i,i} = \sum_{i=1}^n e_i^TCe_i.$$
Now let C = (ϑ(G)I − A)B and (3.8) follows. The inequality in (3.9) holds because ϑ(G)I − A is positive semidefinite: if a matrix C is positive semidefinite then C = MᵀM for some M, so xᵀCx = xᵀMᵀMx = ‖Mx‖² ≥ 0 for all x.
By the derivation above, ϑ(G) − Tr(BJ) ≥ 0 and so ϑ(G) ≥ max_B Tr(BJ).
Second we prove ϑ(G) ≤ max_B Tr(BJ), by constructing a matrix B for which the previous inequality becomes an equality, i.e. Tr((ϑ(G)I − A)B) = 0.
Let (i1, j1), . . . , (im, jm) (with ik < jk) be the edges of G. For every unit vector h = (h1, . . . , hn)ᵀ ∈ Rⁿ consider the (m + 1)-dimensional vector
$$\hat h = \Big(h_{i_1}h_{j_1}, \ldots, h_{i_m}h_{j_m}, \big(\textstyle\sum_i h_i\big)^2\Big)^T,$$
and let z = (0, 0, . . . , 0, ϑ(G))ᵀ.
Claim: z lies in the convex hull of the vectors ĥ. By definition [6],
$$\operatorname{conv}(\{\hat h\}) = \Big\{\sum_{i=1}^t \lambda_if_i \;\Big|\; f_1, \ldots, f_t \in \{\hat h\},\ \lambda_1, \ldots, \lambda_t \in \mathbb{R}_+,\ \sum_{i=1}^t\lambda_i = 1\Big\} \subseteq \mathbb{R}^{m+1}.$$
Suppose the claim is not true. The vectors ĥ form a compact set, since K = {h ∈ Rⁿ : ‖h‖ = 1} is compact and the image of a compact space under a continuous function is compact. Hence there exists a hyperplane separating z from the ĥ: there are a vector a and a real number α such that aᵀĥ ≤ α for all unit vectors h, but aᵀz > α.
Set a = (a1, . . . , am, y)ᵀ. In particular aᵀĥ ≤ α for h = (1, 0, . . . , 0); then ĥ = (0, 0, . . . , 0, 1) and aᵀĥ = y ≤ α. On the other hand, aᵀz > α means ϑ(G)y > α. Hence y > 0 and α > 0: indeed ϑ(G) ≥ 1 by the definition of ϑ(G), so if y ≤ 0 then ϑ(G)y ≤ y ≤ α, contradicting ϑ(G)y > α; thus y > 0, and then α ≥ y > 0.
Up to rescaling a we may suppose y = 1, and hence α < ϑ(G). Now define
$$a_{ij} = \begin{cases}\tfrac{1}{2}a_k + 1 & \text{if } \{i, j\} = \{i_k, j_k\},\\ 1 & \text{otherwise};\end{cases}$$
then aᵀĥ ≤ α can be written as
$$\sum_{i=1}^n\sum_{j=1}^n a_{ij}h_ih_j \le \alpha \qquad\text{for all unit vectors } h.$$
Since the largest eigenvalue (l.e.v.) of A = (aij) equals max{xᵀAx : ‖x‖ = 1}, it follows that l.e.v.(A) ≤ α. To see that l.e.v.(A) = max{xᵀAx : ‖x‖ = 1}: if Ax = λx with ‖x‖ = 1, then xᵀAx = λxᵀx = λ, so max{xᵀAx : ‖x‖ = 1} ≥ l.e.v.(A); conversely, since A is symmetric there are eigenvectors e1, . . . , en with eᵢᵀeⱼ = δᵢⱼ as in (3.4), and writing a unit vector x = Σᵢ αᵢeᵢ we get
$$1 = \|x\|^2 = x^Tx = \sum_{i,j}\alpha_i\alpha_je_i^Te_j = \sum_i\alpha_i^2$$
and
$$x^TAx = \sum_{i,j}\alpha_i\alpha_je_i^TAe_j = \sum_{i,j}\alpha_i\alpha_j\lambda_je_i^Te_j = \sum_i\lambda_i\alpha_i^2 \le \text{l.e.v.}(A)\sum_i\alpha_i^2 = \text{l.e.v.}(A).$$
Since l.e.v.(A) ≤ α and (aij) satisfies (3.1), Theorem 3.3 implies ϑ(G) ≤ α, which is a contradiction and proves the claim.
By the claim there exist finitely many unit vectors h1, . . . , hN and nonnegative reals α1, . . . , αN such that
$$\alpha_1 + \ldots + \alpha_N = 1, \qquad \alpha_1\hat h_1 + \ldots + \alpha_N\hat h_N = z.$$
Set hp = (h_{p,1}, . . . , h_{p,n})ᵀ and
$$b_{ij} = \sum_{p=1}^N\alpha_ph_{p,i}h_{p,j}, \qquad B = (b_{ij}).$$
The matrix B is symmetric and positive semidefinite, being a nonnegative combination of the matrices hₚhₚᵀ. Further, comparing the k-th coordinate (k = 1, . . . , m) of α1ĥ1 + . . . + αNĥN = z gives
$$b_{i_kj_k} = \sum_{p=1}^N\alpha_ph_{p,i_k}h_{p,j_k} = z_k = 0,$$
so B satisfies (3.6). Also, comparing the last coordinate,
$$\operatorname{Tr}(BJ) = \sum_{i,j}b_{ij} = \sum_{p=1}^N\alpha_p\Big(\sum_ih_{p,i}\Big)\Big(\sum_jh_{p,j}\Big) = \sum_{p=1}^N\alpha_p(\hat h_p)_{m+1} = z_{m+1} = \vartheta(G).$$
And α1 + . . . + αN = 1 together with ‖hp‖ = 1 implies Tr(B) = Σp αp Σi h²_{p,i} = 1, so (3.7) holds as well.
Therefore ϑ(G) = max_B Tr(BJ), which completes the proof.
Lemma 3.5. Let (u1, . . . , un) be an orthonormal representation of G and (v1, . . . , vn) an orthonormal representation of Ḡ. Moreover, let c and d be any vectors. Then
$$\sum_{i=1}^n (u_i^Tc)^2(v_i^Td)^2 \le c^2d^2.$$

Proof. By (1.1), (uᵢ ∘ vᵢ)ᵀ(uⱼ ∘ vⱼ) = (uᵢᵀuⱼ)(vᵢᵀvⱼ) = δᵢⱼ: for i ≠ j either i, j are nonadjacent in G (then uᵢᵀuⱼ = 0) or i, j are adjacent in G, hence nonadjacent in Ḡ (then vᵢᵀvⱼ = 0). Hence the vectors uᵢ ∘ vᵢ form an orthonormal system. In general, if b1, . . . , bk are orthonormal, i.e. bᵢᵀbⱼ = δᵢⱼ, then ‖c‖² ≥ Σᵢ(bᵢᵀc)² for every c. Indeed, extend b1, . . . , bk to an orthonormal basis b1, . . . , bn; with B = [b1, . . . , bn] we have BᵀB = BBᵀ = I, and
$$\sum_{i=1}^k (b_i^Tc)^2 \le \sum_{i=1}^n (b_i^Tc)^2 = \sum_{i=1}^n b_i^Tcc^Tb_i = \operatorname{Tr}(B^T(cc^T)B) = \operatorname{Tr}(BB^Tcc^T) = \operatorname{Tr}(cc^T) = \|c\|^2.$$
Applying this to the orthonormal system (uᵢ ∘ vᵢ) and the vector c ∘ d, and using (1.1) again, we conclude
$$\sum_{i=1}^n (u_i^Tc)^2(v_i^Td)^2 = \sum_{i=1}^n ((u_i \circ v_i)^T(c \circ d))^2 \le \|c \circ d\|^2 = (c^Tc)(d^Td) = c^2d^2.$$

Corollary 3.6. If (v1, . . . , vn) is an orthonormal representation of Ḡ and d is any unit vector, then
$$\vartheta(G) \ge \sum_{i=1}^n (v_i^Td)^2.$$

Indeed, take an optimal orthonormal representation (u1, . . . , un) of G with handle c such that ϑ(G) = 1/(cᵀuᵢ)² for all i, which exists by (3.5). Then by Lemma 3.5,
$$1 = c^2d^2 \ge \sum_{i=1}^n (u_i^Tc)^2(v_i^Td)^2 = \frac{1}{\vartheta(G)}\sum_{i=1}^n (v_i^Td)^2.$$
Hence Corollary 3.6 follows.

Corollary 3.7. ϑ(G)ϑ(Ḡ) ≥ n.

Indeed, by (3.5) applied to Ḡ there are an orthonormal representation (v1, . . . , vn) of Ḡ and a unit vector d with ϑ(Ḡ) = 1/(vᵢᵀd)² for all i. By Corollary 3.6,
$$\vartheta(G) \ge \sum_{i=1}^n (v_i^Td)^2 = \sum_{i=1}^n \frac{1}{\vartheta(\bar G)} = \frac{n}{\vartheta(\bar G)}.$$
So Corollary 3.7 follows.

The next theorem gives another formula for the value of the upper bound ϑ(G). It uses orthonormal representations of the complementary graph Ḡ.

Theorem 3.8. Let (v1, . . . , vn) range over all orthonormal representations of Ḡ and d over all unit vectors. Then
$$\vartheta(G) = \max \sum_{i=1}^n (d^Tv_i)^2.$$

Proof. By Corollary 3.6 we have the inequality ≥. We will construct a representation of Ḡ and a unit vector d for which equality holds. Let B = (bij) be a positive semidefinite matrix satisfying (3.6) and (3.7) such that Tr(BJ) = ϑ(G). Since B is positive semidefinite, there are vectors w1, . . . , wn such that
$$b_{ij} = w_i^Tw_j. \tag{3.10}$$
Note that $\sum_{i=1}^n w_i^2 = \sum_{i=1}^n w_i^Tw_i = \operatorname{Tr}(B) = 1$ and $\big(\sum_{i=1}^n w_i\big)^2 = \operatorname{Tr}(BJ) = \vartheta(G)$. Set
$$v_i = \frac{w_i}{|w_i|} \qquad\text{and}\qquad d = \frac{\sum_{i=1}^n w_i}{\big|\sum_{i=1}^n w_i\big|}.$$
Then the vectors vᵢ form an orthonormal representation of Ḡ by (3.10) and (3.6). Moreover, by the Cauchy–Schwarz inequality $|\langle x, y\rangle| \le \sqrt{\langle x, x\rangle}\sqrt{\langle y, y\rangle}$, applied to the vectors $(|w_i|)_i$ and $(d^Tv_i)_i$ in $\mathbb{R}^n$, we have
$$\sum_{i=1}^n (d^Tv_i)^2 = \Big(\sum_{i=1}^n w_i^2\Big)\Big(\sum_{i=1}^n (d^Tv_i)^2\Big) \ge \Big(\sum_{i=1}^n |w_i|(d^Tv_i)\Big)^2 = \Big(\sum_{i=1}^n d^Tw_i\Big)^2 = \Big(d^T\sum_{i=1}^n w_i\Big)^2 = \vartheta(G).$$
So the inequality ≤ also holds, and this proves the theorem.

Remark. Since both ends of the chain above equal ϑ(G), equality holds in the Cauchy–Schwarz inequality, and it follows that
$$(d^Tv_i)^2 = \vartheta(G)w_i^2 = \vartheta(G)b_{ii} \quad \forall i. \tag{3.11}$$
Having derived several formulas for ϑ(G), we can now deduce some consequences, for example when G is vertex-transitive or regular.
Theorem 3.9. ϑ(G ⊠ H) = ϑ(G)ϑ(H)

Proof. In Lemma 1.13 we proved that ϑ(G ⊠ H) ≤ ϑ(G)ϑ(H). To prove the opposite inequality, let (v1, . . . , vn) be an orthonormal representation of Ḡ, let (w1, . . . , wm) be an orthonormal representation of H̄, and let c, d be unit vectors such that $\sum_{i=1}^n (v_i^Tc)^2 = \vartheta(G)$ and $\sum_{j=1}^m (w_j^Td)^2 = \vartheta(H)$, as in Theorem 3.8. By Lemma 1.10, (vᵢ ∘ wⱼ) is an orthonormal representation of Ḡ ⊠ H̄, and if we prove the next claim it is also an orthonormal representation of $\overline{G \boxtimes H}$.
Claim: every edge of Ḡ ⊠ H̄ is an edge of $\overline{G \boxtimes H}$.

Let (u, v), (x, y) ∈ V(G) × V(H) be adjacent or equal in Ḡ ⊠ H̄. Then

((u = x) ∨ (u, x) ∈ E(Ḡ)) ∧ ((v = y) ∨ (v, y) ∈ E(H̄))
⟺ ((u, x) ∉ E(G)) ∧ ((v, y) ∉ E(H))
⟹ ((u = x) ∧ (v = y)) ∨ ((u ≠ x) ∧ (u, x) ∉ E(G)) ∨ ((v ≠ y) ∧ (v, y) ∉ E(H))
⟺ ((u, v) = (x, y)) ∨ ({(u, v), (x, y)} ∉ E(G ⊠ H))
⟺ ((u, v) = (x, y)) ∨ ({(u, v), (x, y)} ∈ E($\overline{G \boxtimes H}$))
⟺ (u, v), (x, y) adjacent or equal in $\overline{G \boxtimes H}$.

Moreover, c ∘ d is a unit vector. So, by Theorem 3.8,
$$\vartheta(G \boxtimes H) \ge \sum_{i=1}^n\sum_{j=1}^m \big((v_i \circ w_j)^T(c \circ d)\big)^2 = \sum_{i=1}^n\sum_{j=1}^m (v_i^Tc)^2(w_j^Td)^2 = \vartheta(G)\vartheta(H).$$

For the next theorem we introduce a new definition.

Definition 3.10. (Vertex- and Edge-transitive) A graph G is vertex-transitive if for every pair i, j ∈ V(G) there is an automorphism that maps i to j. In other words, vertex-transitivity guarantees that the graph looks the same from each vertex. A graph is edge-transitive if for all e, f ∈ E(G) there is an automorphism of G that maps the endpoints of e to the endpoints of f [7].

Theorem 3.11. If G has a vertex-transitive automorphism group, then ϑ(G)ϑ(Ḡ) = n.

Proof. By Corollary 3.7 we have ϑ(G)ϑ(Ḡ) ≥ n; we prove the opposite inequality.
Let Γ be the vertex-transitive automorphism group of G, its elements viewed as n × n permutation matrices. Let B = (bij) be a matrix satisfying (3.6) and (3.7) such that Tr(BJ) = ϑ(G). Consider
$$\bar B = (\bar b_{ij}) = \frac{1}{|\Gamma|}\sum_{P\in\Gamma}P^{-1}BP.$$
Then B̄ satisfies (3.6), because each P⁻¹BP does: automorphisms map edges to edges, so conjugating by P only permutes the entries of B. Also Tr(B̄) = 1: by Lemma 1.17, Tr(P⁻¹BP) = Tr(BPP⁻¹) = Tr(B) = 1 for every P ∈ Γ, so Tr(B̄) = |Γ|/|Γ| = 1. Also,
$$\operatorname{Tr}(\bar BJ) = \frac{1}{|\Gamma|}\sum_{P\in\Gamma}\operatorname{Tr}(P^{-1}BPJ) = \frac{1}{|\Gamma|}\sum_{P\in\Gamma}\operatorname{Tr}(P^{-1}BJP) = \frac{1}{|\Gamma|}\sum_{P\in\Gamma}\operatorname{Tr}(BJ) = \vartheta(G).$$
We used that PJ = JP = J, which holds because P is a permutation matrix: in each row and each column there is exactly one entry equal to 1 and the other entries equal 0.
Besides this, B̄ is symmetric and positive semidefinite, since B is positive semidefinite and P⁻¹BP = PᵀBP for a permutation matrix P.
Moreover, for a fixed P₀ ∈ Γ we have {PP₀ | P ∈ Γ} = Γ, so
$$P_0^{-1}\bar BP_0 = P_0^{-1}\Big(\frac{1}{|\Gamma|}\sum_{P\in\Gamma}P^{-1}BP\Big)P_0 = \frac{1}{|\Gamma|}\sum_{P\in\Gamma}(PP_0)^{-1}B(PP_0) = \bar B.$$
Therefore P⁻¹B̄P = B̄ for all P ∈ Γ.
Since Γ is vertex-transitive and Tr(B̄) = 1, it follows that b̄ᵢᵢ = 1/n for all i. Constructing the orthonormal representation (v1, . . . , vn) of Ḡ and the unit vector d from B̄ as in the proof of Theorem 3.8, we have
$$(d^Tv_i)^2 = \vartheta(G)\,\bar b_{ii} = \frac{\vartheta(G)}{n}$$
by (3.11). From the definition of ϑ(Ḡ) we have
$$\vartheta(\bar G) \le \max_{1\le i\le n}\frac{1}{(v_i^Td)^2} = \frac{n}{\vartheta(G)},$$
and hence ϑ(G)ϑ(Ḡ) ≤ n.

Corollary 3.12. If G has a vertex-transitive automorphism group, then Θ(G)Θ(Ḡ) ≤ n.

This follows from Theorem 3.11 together with Theorem 3.1, where we proved that Θ(G) ≤ ϑ(G).

Remark. Theorem 3.11 and Corollary 3.12 do not hold for arbitrary graphs: there are graphs with α(G)α(Ḡ) > n, for example a star, and for such graphs the statements would contradict Lemma 1.14 together with Θ(G) ≥ α(G).

Definition 3.13. (Regular Graph) A graph G is a regular graph if ∆(G) =
δ(G). Here ∆(G) is the maximum degree and δ(G) is the minimum degree.
The degree of a vertex v in G is the number of edges incident to v [7].
Theorem 3.14. Let G be a regular graph and let λ1 ≥ λ2 ≥ . . . ≥ λn be the eigenvalues of the adjacency matrix A of G. Then
$$\vartheta(G) \le \frac{-n\lambda_n}{\lambda_1 - \lambda_n}.$$
Equality holds if the automorphism group of G is transitive on the edges.

Proof. Consider the matrix J − xA, where x will be chosen later. This matrix satisfies condition (3.1): (J − xA)ij = 1 if i = j or if i and j are nonadjacent, because aij = 0 in those cases. Hence by Theorem 3.3 its largest eigenvalue is at least ϑ(G). Let vᵢ be the eigenvector of A belonging to λᵢ. Since G is regular, say k-regular, we have Aj = kj, therefore v1 = j is an eigenvector with eigenvalue λ1 = k. Since A is symmetric we can choose the eigenvectors pairwise perpendicular: if Avᵢ = λᵢvᵢ and Avⱼ = λⱼvⱼ with λᵢ ≠ λⱼ, then λᵢvⱼᵀvᵢ = vⱼᵀAvᵢ = vᵢᵀAvⱼ = λⱼvᵢᵀvⱼ, which is only possible if vᵢᵀvⱼ = 0. With vᵢᵀj = 0 for i ≠ 1 it follows that
$$Jv_i = \begin{cases} nv_1 & \text{if } i = 1,\\ 0 & \text{if } i \ne 1.\end{cases}$$
So (J − xA)v1 = nv1 − xλ1v1 = (n − xλ1)v1 and (J − xA)vᵢ = −xλᵢvᵢ for i ≠ 1. Hence the eigenvalues of J − xA are n − xλ1, −xλ2, . . . , −xλn. The largest of these is the first or the last one, and the optimal choice of x, which makes them equal, is
$$x = \frac{n}{\lambda_1 - \lambda_n}.$$
Indeed, rewriting n − xλ1 and −xλn with this x gives
$$n - x\lambda_1 = n - \frac{n\lambda_1}{\lambda_1 - \lambda_n} = \frac{-n\lambda_n}{\lambda_1 - \lambda_n} = -x\lambda_n.$$
Therefore we can conclude
$$\vartheta(G) \le \frac{-n\lambda_n}{\lambda_1 - \lambda_n}.$$
Now assume the automorphism group Γ of G is transitive on the edges. Let C = (cij) be a symmetric matrix such that cij = 1 if i = j or if i and j are nonadjacent, having largest eigenvalue ϑ(G). As in the proof of Theorem 3.11 consider
$$\bar C = (\bar c_{ij}) = \frac{1}{|\Gamma|}\sum_{P\in\Gamma}P^{-1}CP.$$
Then C̄ also satisfies the condition that c̄ij = 1 if i = j or if i and j are nonadjacent, and its largest eigenvalue is at most, and by Theorem 3.3 even equal to, ϑ(G). Since Γ is transitive on the edges, C̄ is constant on the edges:
$$\bar c_{ij} = \begin{cases} 1 & \text{if } i = j \text{ or } i, j \text{ nonadjacent},\\ \beta & \text{if } i \ne j \text{ and } i, j \text{ adjacent}.\end{cases}$$
Hence C̄ = J − xA with x = 1 − β, and the second assertion follows.

Corollary 3.15. For odd n,
$$\vartheta(C_n) = \frac{n\cos(\pi/n)}{1 + \cos(\pi/n)}.$$

We explain why this result is true. Let A be the adjacency matrix of Cn; then ai,j = 1 if and only if |j − i| ≡ 1 (mod n). Define $(f_k)_l := e^{2\pi ikl/n}$; then
$$(Af_k)_j = \sum_{l=1}^n a_{j,l}(f_k)_l = (f_k)_{j+1} + (f_k)_{j-1} = e^{\frac{2\pi ik(j+1)}{n}} + e^{\frac{2\pi ik(j-1)}{n}} = e^{\frac{2\pi ikj}{n}}\Big(e^{\frac{2\pi ik}{n}} + e^{-\frac{2\pi ik}{n}}\Big) = 2\cos\Big(\frac{2\pi k}{n}\Big)(f_k)_j.$$
Therefore the eigenvalues of A are $\{2\cos(2\pi k/n) \mid k = 1, \ldots, n\}$. Hence λ1 = 2 and, for odd n, $\lambda_n = 2\cos\big(\tfrac{(n-1)\pi}{n}\big) = 2\cos\big(\pi - \tfrac{\pi}{n}\big) = -2\cos(\pi/n)$. Since Cn is transitive on the edges, Theorem 3.14 gives
$$\vartheta(C_n) = \frac{-n\cdot(-2\cos(\pi/n))}{2 + 2\cos(\pi/n)} = \frac{n\cos(\pi/n)}{1 + \cos(\pi/n)}.$$
Example 3.16. For the cycle graph C7 we have $\vartheta(C_7) = \frac{7\cos(\pi/7)}{1 + \cos(\pi/7)} \approx 3.3177$, and we know α(C7) = 3, thus
$$\alpha(C_7) = 3 \le \Theta(C_7) \le \vartheta(C_7) \approx 3.32 \le \alpha^*(C_7) = \frac{7}{2}.$$
In this example we see that ϑ(C7) is a smaller upper bound for Θ(C7) than α*(C7).
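Both the bound of Theorem 3.14 and the closed formula of Corollary 3.15 are easy to evaluate numerically for odd cycles. A small sketch, assuming numpy is available (an illustration, not part of the thesis):

```python
# Eigenvalue bound of Theorem 3.14 versus the formula of Corollary 3.15.
import numpy as np

def theta_cycle(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1   # adjacency of Cn
    eig = np.linalg.eigvalsh(A)                     # sorted ascending
    lam1, lamn = eig[-1], eig[0]
    return -n * lamn / (lam1 - lamn)

for n in (5, 7, 9):
    closed = n * np.cos(np.pi / n) / (1 + np.cos(np.pi / n))
    print(n, theta_cycle(n), closed)
# n = 5 gives sqrt(5) ≈ 2.2361; n = 7 gives ≈ 3.3177
```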

We have arrived at one of the most important theorems. It says that ϑ(G) really is an upper bound for the Shannon capacity smaller than or equal to the one Shannon found.

Theorem 3.17. ϑ(G) ≤ α*(G)

Proof. Use Theorem 3.8: let (u1, . . . , un) be an orthonormal representation of Ḡ and c a unit vector such that
$$\vartheta(G) = \sum_{i=1}^n (c^Tu_i)^2.$$
Let C be any clique in G. Then {uᵢ | i ∈ C} is an orthonormal set of vectors, since a clique in G is an independent set in Ḡ, and hence
$$\sum_{i\in C}(c^Tu_i)^2 \le c^Tc = c^2 = 1.$$
Hence the weights (cᵀuᵢ)² form a fractional vertex packing, and so by the definition of α*(G),
$$\vartheta(G) = \sum_{i=1}^n (c^Tu_i)^2 \le \alpha^*(G).$$

Another upper bound for Θ(G) is the dimension of an orthonormal representation of G, as we prove in the next theorem.

Theorem 3.18. Assume that G admits an orthonormal representation in dimension d. Then ϑ(G) ≤ d.

Proof. Let (u1, . . . , un) be an orthonormal representation of G in d-dimensional space. Then (u1 ∘ u1, . . . , un ∘ un) is another orthonormal representation of G, since by (1.1), (uᵢ ∘ uᵢ)ᵀ(uⱼ ∘ uⱼ) = (uᵢᵀuⱼ)², which equals 1 for i = j and 0 for nonadjacent i, j. Let {e1, . . . , ed} be an orthonormal basis of the space and
$$b = \frac{1}{\sqrt d}(e_1 \circ e_1 + \ldots + e_d \circ e_d).$$
Then
$$b^Tb = \frac{1}{d}\sum_{i,j}(e_i \circ e_i)^T(e_j \circ e_j) = \frac{1}{d}\sum_{i,j}\delta_{i,j} = \frac{1}{d}\cdot d = 1,$$
so b is a unit vector, and
$$(u_i \circ u_i)^Tb = \frac{1}{\sqrt d}\sum_{k=1}^d (e_k \circ e_k)^T(u_i \circ u_i) = \frac{1}{\sqrt d}\sum_{k=1}^d (e_k^Tu_i)^2 = \frac{1}{\sqrt d}.$$
Here we used $\sum_{k=1}^d (e_k^Tu_i)^2 = \|u_i\|^2 = 1$, which holds because e1, . . . , ed is an orthonormal basis. Therefore, by the definition of ϑ(G), taking the orthonormal representation (uᵢ ∘ uᵢ) and the unit vector b,
$$\vartheta(G) = \min_{u_1,\ldots,u_n}\min_c\max_{1\le i\le n}\frac{1}{(c^Tu_i)^2} \le \max_{1\le i\le n}\frac{1}{((u_i \circ u_i)^Tb)^2} = d.$$

Chapter 4

Further results on the Shannon capacity

In this last chapter we discuss an article by Haemers [3], in which he solved the problems Lovász stated at the end of his article. The problems are:

1. Is ϑ(G) = Θ(G)?

2. Is Θ(G ⊠ H) = Θ(G)Θ(H)?

3. Is it true that Θ(G)Θ(Ḡ) ≥ |V(G)|?

Haemers found a graph, the so-called Schläfli graph, which provides a counterexample to all three problems.

Figure 4.1: Schläfli graph

Let A be a symmetric n × n matrix over a field, with all diagonal entries equal to one. Let G(A) be the graph with vertex set {1, . . . , n}, two distinct vertices i, j being adjacent if and only if Aij ≠ 0. Let A⊗k denote the k-th tensor (Kronecker) power of A. All other notation is the same as in [5]. First we need a theorem.

Theorem 4.1. [3] Θ(G(A)) ≤ rank(A).

Proof. First, rank(A ⊗ B) = rank(A) rank(B).
Second, if B is an m × m matrix with Bi,i = 1 ∀i, then an independent set in G(B) indexes a principal submatrix of B which is an identity matrix, so rank(B) ≥ α(G(B)).
Third, G(A⊗k) = G(A)^k, because A⊗k is an n^k × n^k matrix with
$$(A^{\otimes k})_{(i_1,\ldots,i_k),(j_1,\ldots,j_k)} = A_{i_1,j_1}A_{i_2,j_2}\cdots A_{i_k,j_k}.$$
So,

(i1, . . . , ik) = (j1, . . . , jk) or {(i1, . . . , ik), (j1, . . . , jk)} is an edge of G(A⊗k)
⟺ (A⊗k)_{(i1,...,ik),(j1,...,jk)} ≠ 0 ⟺ ∀h = 1, . . . , k : A_{ih,jh} ≠ 0
⟺ ∀h = 1, . . . , k : ih = jh or {ih, jh} is an edge of G(A)
⟺ (i1, . . . , ik) = (j1, . . . , jk) or {(i1, . . . , ik), (j1, . . . , jk)} is an edge of G(A)^k.

Hence G(A⊗k) = G(A)^k.
Now we can conclude rank(A)^k = rank(A⊗k) ≥ α(G(A⊗k)) = α(G(A)^k). Therefore,
$$\operatorname{rank}(A) \ge \sup_k \sqrt[k]{\alpha(G(A)^k)} = \Theta(G(A)).$$
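The rank facts used in the proof are easy to spot-check numerically over the reals. A minimal illustration (not from Haemers' paper), assuming numpy is available:

```python
# rank(A ⊗ B) = rank(A) rank(B) on a random example.
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 2, (4, 4)).astype(float)
B = rng.integers(0, 2, (3, 3)).astype(float)
print(np.linalg.matrix_rank(np.kron(A, B)) ==
      np.linalg.matrix_rank(A) * np.linalg.matrix_rank(B))   # True
```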

We will construct a matrix B which is the (0,1)-adjacency matrix of the complement of the Schläfli graph. Let B̃ be the (0,1)-adjacency matrix of the Kneser graph K(8,2), which has $\binom{8}{2} = 28$ vertices and is regular of degree $\binom{6}{2} = 15$: the vertices are the 2-element subsets of {1, . . . , 8}, and B̃ equals one on the entries where the two pairs are disjoint. Hence every row and column has $\binom{6}{2} = 15$ entries equal to one, and
$$(\tilde B - I)(\tilde B + 5I) = \tilde B^2 + 4\tilde B - 5I_{28} = 10J_{28}, \qquad \tilde BJ_{28} = 15J_{28}.$$

These identities follow since
$$(\tilde B^2)_{(i,j),(k,l)} = \#\{\text{pairs disjoint from both } (i,j) \text{ and } (k,l)\} = \begin{cases}\binom{4}{2} = 6 & \text{if } (i,j) \cap (k,l) = \emptyset,\\ \binom{5}{2} = 10 & \text{if } |(i,j) \cap (k,l)| = 1,\\ \binom{6}{2} = 15 & \text{if } (i,j) = (k,l),\end{cases}$$
so B̃² = 6B̃ + 10(J − I − B̃) + 15I = −4B̃ + 5I + 10J.
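These identities can be checked directly on the 28 × 28 matrix. A quick sketch, assuming numpy is available (an illustration, not from Haemers' paper):

```python
# Verifying the K(8,2) identities numerically.
import numpy as np
from itertools import combinations

pairs = list(combinations(range(8), 2))               # the 28 vertices
B = np.array([[1.0 if set(p).isdisjoint(q) else 0.0 for q in pairs]
              for p in pairs])                        # adjacency of K(8,2)

m = len(pairs)
I, J = np.eye(m), np.ones((m, m))
print(np.allclose((B - I) @ (B + 5 * I), 10 * J))     # True
print(np.allclose(B @ J, 15 * J))                     # True
```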

Write
$$\tilde B = \begin{pmatrix} 0 & j_{15}^T & 0_{12}^T\\ j_{15} & B_1 & N\\ 0_{12} & N^T & B_2 \end{pmatrix} = \begin{pmatrix} 0 & a^T\\ a & C \end{pmatrix},$$
splitting off one vertex, its 15 neighbours and its 12 non-neighbours; here $a = \binom{j_{15}}{0_{12}}$ and $C = \begin{pmatrix} B_1 & N\\ N^T & B_2 \end{pmatrix}$. Then
$$\tilde B^2 = \begin{pmatrix} a^Ta & a^TC\\ Ca & aa^T + C^2 \end{pmatrix} = -4\begin{pmatrix} 0 & a^T\\ a & C \end{pmatrix} + 5\begin{pmatrix} 1 & 0\\ 0 & I_{27} \end{pmatrix} + 10\begin{pmatrix} 1 & j_{27}^T\\ j_{27} & J_{27} \end{pmatrix}$$
and
$$\tilde Bj_{28} = \begin{pmatrix} 0 & a^T\\ a & C \end{pmatrix}\begin{pmatrix} 1\\ j_{27} \end{pmatrix} = \begin{pmatrix} a^Tj_{27}\\ a + Cj_{27} \end{pmatrix} = 15\begin{pmatrix} 1\\ j_{27} \end{pmatrix},$$
which give
$$C^2 = -aa^T - 4C + 5I_{27} + 10J_{27}, \qquad Ca = -4a + 10j_{27}, \qquad Cj_{27} = 15j_{27} - a.$$
Put $D = \begin{pmatrix} -I_{15} & 0\\ 0 & I_{12} \end{pmatrix}$ and $K = \begin{pmatrix} 0 & J_{15,12}\\ J_{12,15} & 0 \end{pmatrix} = (j_{27} - a)a^T + a(j_{27} - a)^T$, and define
$$B = \begin{pmatrix} B_1 & J_{15,12} - N\\ J_{12,15} - N^T & B_2 \end{pmatrix} = D\,(C - K)\,D.$$
Then
$$B^2 = D(C - K)^2D = D\big(C^2 - CK - KC + K^2\big)D,$$
and substituting the identities above together with $K^2 = \begin{pmatrix} 12J_{15} & 0\\ 0 & 15J_{12} \end{pmatrix}$ and simplifying yields
$$B^2 = D\Big({-4C} + 5I_{27} + \begin{pmatrix} 5J_{15} & -J_{15,12}\\ -J_{12,15} & 5J_{12} \end{pmatrix}\Big)D = D\Big({-4(C - K)} + 5I_{27} + 5\begin{pmatrix} J_{15} & -J_{15,12}\\ -J_{12,15} & J_{12} \end{pmatrix}\Big)D = -4B + 5I_{27} + 5J_{27}.$$
It follows that
$$(B - I)(B + 5I) = 5J.$$
Furthermore, using $Dj_{27} = j_{27} - 2a$, $Kj_{27} = 15j_{27} - 3a$ and $Ka = 15(j_{27} - a)$,
$$Bj_{27} = D(C - K)Dj_{27} = D(C - K)(j_{27} - 2a) = D\big(Cj_{27} - 2Ca - Kj_{27} + 2Ka\big) = D(10j_{27} - 20a) = 10j_{27},$$
so
$$BJ = 10J.$$
This implies that B has the all-one eigenvector with eigenvalue 10, and from (B − I)(B + 5I) = 5J it follows that the other eigenvalues are 1 and −5. Let a, b, c be the multiplicities of the eigenvalues 10, 1 and −5 respectively. Then a = 1 and 10a + b − 5c = Tr(B) = 0, together with a + b + c = 27; this implies b = 20 and c = 6.
Let A = I − B and G = G(A). The complement of G is the Schläfli graph. The eigenvalues of A are 1 − 10 = −9, 1 − 1 = 0 and 1 − (−5) = 6, with multiplicities 1, 20 and 6. So A has eigenvalue 0 with multiplicity 20, hence rank(A) = 27 − 20 = 7. Thus by the previous theorem, Θ(G) ≤ 7.
The adjacency matrix of G itself is B, and that of Ḡ is A(Ḡ) = J − I − B. Let v1, . . . , vn be an orthogonal system of eigenvectors of B with v1 = j; then jᵀvᵢ = 0 for i ≥ 2. Hence (J − I − B)vᵢ = Jvᵢ − vᵢ − λᵢvᵢ = (−1 − λᵢ)vᵢ if i ≥ 2, and (J − I − B)v1 = 27v1 − v1 − λ1v1 = (27 − 1 − λ1)v1. Therefore A(Ḡ) = J − I − B has eigenvalues 27 − 1 − 10 = 16, −1 − 1 = −2 and −1 − (−5) = 4. Applying Theorem 3.14 to G and its complement Ḡ gives
$$\vartheta(G) \le \frac{-27\cdot(-5)}{10 - (-5)} = 9 \qquad\text{and}\qquad \vartheta(\bar G) \le \frac{-27\cdot(-2)}{16 - (-2)} = 3.$$
By Corollary 3.7 we have ϑ(G)ϑ(Ḡ) ≥ 27; hence ϑ(G) = 9 and ϑ(Ḡ) = 3.
Therefore Θ(G) ≤ 7 < 9 = ϑ(G), and Θ(G)Θ(Ḡ) ≤ 7 · 3 = 21 < 27 = |V(G)|. Moreover Θ(G ⊠ Ḡ) ≥ 27 > 21 ≥ Θ(G)Θ(Ḡ), since for every graph α(G ⊠ Ḡ) ≥ |V(G)|: the set {(v, v) | v ∈ V(G)} is an independent set in G ⊠ Ḡ, and α(G ⊠ Ḡ) ≤ Θ(G ⊠ Ḡ).
Whence we conclude that all three problems are answered in the negative.
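The construction above is explicit enough to reproduce and check numerically. The sketch below (an illustration following the block description above; it is not code from Haemers' paper) builds B from the Kneser matrix and confirms the spectrum 10, 1, −5 with multiplicities 1, 20, 6, as well as rank(I − B) = 7:

```python
# Rebuilding Haemers' matrix B and checking its spectrum and rank(I - B).
import numpy as np
from itertools import combinations

pairs = list(combinations(range(8), 2))
Bt = np.array([[1.0 if set(p).isdisjoint(q) else 0.0 for q in pairs]
               for p in pairs])                    # B~, adjacency of K(8,2)

# Split off vertex 0; order the rest as 15 neighbours, then 12 non-neighbours.
nbrs = [k for k in range(1, 28) if Bt[0, k] == 1]
rest = [k for k in range(1, 28) if Bt[0, k] == 0]
C = Bt[np.ix_(nbrs + rest, nbrs + rest)]           # the 27x27 block C

D = np.diag([-1.0] * 15 + [1.0] * 12)
K = np.zeros((27, 27)); K[:15, 15:] = 1; K[15:, :15] = 1
B = D @ (C - K) @ D

eig = np.rint(np.linalg.eigvalsh(B)).astype(int)
print({v: int((eig == v).sum()) for v in (-5, 1, 10)})   # {-5: 6, 1: 20, 10: 1}
print(np.linalg.matrix_rank(np.eye(27) - B))             # 7, hence Θ(G) ≤ 7
```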

Chapter 5

Conclusion

We introduced a couple of definitions and theorems from graph theory and linear algebra before we defined the Shannon capacity and stated Shannon's Theorem. With Shannon's Theorem we determined the Shannon capacity of perfect graphs, but for most other graphs we only have a lower and an upper bound. For example, for the cycle graphs Cn with n odd and n > 5 we only have (n−1)/2 ≤ Θ(Cn) ≤ n/2.
For the cycle graph C5 we proved that the Shannon capacity equals √5, using the umbrella technique introduced by Lovász. He also introduced the Lovász Number, ϑ(G), which is an upper bound for the Shannon capacity. We proved that it is smaller than or equal to the upper bound Shannon found; therefore, with the Lovász Number we can approximate the Shannon capacity of a graph more precisely. We proved a number of formulas and properties of ϑ(G), including different ways of determining its value and its behaviour under the strong product. In the last chapter we considered three problems Lovász stated at the end of his article and presented the counterexample found by Haemers, the so-called Schläfli graph. For this graph Θ(G) ≠ ϑ(G), and so the problems are answered in the negative.
As mentioned in the introduction, it would be valuable to know the Shannon capacity of communication models. Unfortunately, determining the Shannon capacity of an arbitrary graph is very difficult; even for the simple cycle graph C7 it is unknown. We conclude that this theoretical parameter is not very useful at this moment. On the other hand, because of the connections with some central questions in graph theory, further research in this field may contribute to a better understanding and result in more applications of the Shannon capacity.

Popular summary

Suppose we want to send a message to a receiver. During the transmission across a channel, noise may occur and the message may change. In 1956 Shannon posed the following interesting question: What is the maximum rate of transmission such that the receiver may recover the original message without errors?
We model the channel as a graph G(V, E), consisting of a set of vertices V(G) and a set of edges E(G), where two vertices are related if there is an edge between them. The vertices of the graph represent the letters of an alphabet, and two letters may be confused if they are connected by an edge. As an example we consider the graph C5, where the vertices correspond to the numbers 1 to 5 as in Figure 5.1. We see that number 1 can be confused with
Figure 5.1: Cycle graph C5

number 2 or with number 5. The maximum number of 1-letter messages which can be sent without danger of confusion is denoted α(C5). It is the size of a maximum independent set in C5, i.e. the maximum number of vertices such that no two of them are connected by an edge. In this case α(C5) = 2, which says that there are two 1-letter messages which can be sent without danger of confusion; in other words, such that the receiver knows whether the received message is correct. We define α(Gk) as the maximum number of k-letter messages which can be sent without danger of confusion.

Figure 5.2: C5 ⊠ C5

As an example we consider the graph C5² = C5 ⊠ C5 as in Figure 5.2. In this graph two vertices (u1, u2) ≠ (v1, v2) are adjacent if and only if ui = vi or uivi ∈ E(C5) for i = 1, 2. In Figure 5.2 the red vertices form a maximum independent set of this graph. The 2-letter messages which cannot be confused are v1v1, v2v3, v3v5, v4v2 and v5v4, hence α(C5²) = 5.

To answer the question of Shannon we define the Shannon capacity of a graph, using the maximum independence number, as
$$\Theta(G) = \sup_k \sqrt[k]{\alpha(G^k)}.$$

The Shannon capacity attracted some interest in the field of Information Theory and in the scientific community because of the applications to communication issues. Unfortunately, the determination of the Shannon capacity is a very difficult problem, even for very simple small graphs. Therefore Shannon found an upper and a lower bound for the Shannon capacity. He stated
$$\alpha(G) \le \Theta(G) \le \alpha^*(G),$$
which is known as Shannon's Theorem. Here α*(G) is defined as the maximum sum of weights over all vertices, where for every clique the sum of the weights of the vertices in that clique must be smaller than or equal to 1. A clique is a subgraph of G in which every two vertices are connected by an edge. To clarify this we take again the graph C5. In Figure 5.3 one clique is circled; in total we have five such cliques. Give each vertex weight 1/2; then the condition for cliques is satisfied, and taking the sum over all vertices we get α*(C5) = 5 · (1/2) = 5/2.

Figure 5.3: Clique in C5

Therefore, by Shannon's Theorem we have 2 ≤ Θ(C5) ≤ 5/2.

In 1979 Lovász proved that Θ(C5) = √5 using his umbrella technique. For a long time this was an open problem, and therefore this was an important result in the field of mathematics. Up to now the Shannon capacity of C7 is still an open problem, which marks the difficulty of the problem. For the cycle graphs Cn with n even the Shannon capacity is known, because these are perfect graphs, and for these α(G) = Θ(G) holds.

Besides determining the Shannon capacity of C5, Lovász also defined the Lovász Number ϑ(G), an upper bound for the Shannon capacity; most importantly, it is a smaller upper bound than the one Shannon found. At the end of his article Lovász stated three problems, one of them being: is Θ(G) = ϑ(G)? Haemers solved the three problems by giving a counterexample, the so-called Schläfli graph. Therefore the problems are answered in the negative.

Figure 5.4: Schläfli graph

Concluding, it would be valuable to know the Shannon capacity of a given graph, but because determining the Shannon capacity is a very difficult problem, this theoretical parameter is not very useful at this moment. On the other hand, because of the connections with some central questions in graph theory, further research in this field may contribute to a better understanding and result in more applications of the Shannon capacity.

Bibliography

[1] Aigner, M. & Ziegler, G.M. (2010) Proofs from THE BOOK, 4th edition, Springer-Verlag, Berlin Heidelberg, 241-250.

[2] Codenotti, B., Gerace, I. & Resta, G. (2003) Some remarks on the Shan-
non capacity of odd cycles, Ars Combinatoria, 66, 243-257.

[3] Haemers, W. (1979) On Some Problems of Lovász Concerning the Shan-


non Capacity of a Graph, IEEE Transactions on Information Theory, 25,
no 2, 231-232.

[4] Lovász, L. (1972) Normal hypergraphs and the perfect graph conjecture,
Discrete Mathematics, 2, no 3, 253-267.

[5] Lovász, L. (1979) On the Shannon Capacity of a Graph, IEEE Transac-


tions on Information Theory, 25, no 1, 1-7.

[6] Schrijver, A. (2003) Combinatorial Optimization, vol. A, Springer-Verlag, Berlin.

[7] West, D.B. (2001) Introduction to Graph Theory, 2nd edition, Prentice Hall.

