
Index Coding with Side Information

Summer Project Report

by

SHUBHAM GIRDHAR
1311041

to the

School of Mathematical Sciences


National Institute of Science Education and Research
Bhubaneswar

Date
CERTIFICATE

International Institute Of Information Technology, Hyderabad

Certified that the summer project report "Index Coding with Side Information" is the bonafide work of Shubham Girdhar, Roll No. 1311041, School of Mathematical Sciences, National Institute of Science Education and Research, Bhubaneswar, carried out under my supervision during 20-May-2016 to 7-July-2016.

Place
Date

SUPERVISOR
Dr.Prasad Krishnan
Assistant Professor
SPCRC
International Institute Of Information Technology

ACKNOWLEDGEMENTS

To my family, your constant love and support during good times and bad gets me through. Thank you for believing in me, and for standing by me at times. To my friends, I am glad to have every one of you. Thank you for your loyalty and integrity. There are many other people outside mathematics to whom I owe a debt of gratitude, too numerous to mention here.
To all the staff of the School of Mathematical Sciences, NISER, thank you for fostering my interest. I would particularly like to thank the head of the department, Dr. Anil Karn, for providing this opportunity. I am also thankful to Dr. Prasad Krishnan, an extraordinary gentleman who supervised my project. I consider myself very fortunate to have had this experience, and have come to enjoy our meetings immensely. As a chapter closes, I trust that your door will remain open.

Thank you
Shubham Girdhar

Abstract

The index coding problem with side information is studied in a general setting. Tools from information theory and matroid theory are used to understand index coding problems with near-extreme rates, and an attempt is made to translate graph-theoretic results into matroid-theoretic ones. The dual of the problem, which leads to generalized locally repairable codes, is also studied, and a new dual index coding problem is proposed.
Contents

1 Pre-requisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Information theory . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Matroid theory . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Index coding problem Setup . . . . . . . . . . . . . . . . . . . . . . . 3
3 Outer Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
4 A class of Index Coding Problems with rate 1/2 . . . . . . . . . . . . 5
5 Bounding optimal rate of ICSI . . . . . . . . . . . . . . . . . . . . . . 12
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2 Index Coding with side information . . . . . . . . . . . . . . . 13
5.3 Digraphs with min-rank one less than the order . . . . . . . . 15
6 Generalized Locally Repairable Codes (GLRC) . . . . . . . . . . . . . 18
7 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7.1 Relation between matroid theory and GLRC . . . . . . . . . 23
7.2 Results for digraphs with τ(G) = 1 . . . . . . . . . . . . . . 23

1 Pre-requisites
1.1 Information theory
Entropy

Entropy is a measure of the uncertainty of a random variable. Let X be a discrete random variable with alphabet 𝒳 and probability mass function p(x) = Pr{X = x}, x ∈ 𝒳.

Definition 1. The entropy H(X) of a discrete random variable X is defined by

H(X) = −∑_{x∈𝒳} p(x) log p(x)    (1)

Lemma 1. H(X) ≥ 0.

Proof: 0 ≤ p(x) ≤ 1 implies that log(1/p(x)) ≥ 0.
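As a quick numerical illustration of Definition 1 and Lemma 1 (the code and function names are ours, not from the report), entropy can be computed directly from a probability mass function:

```python
import math

def entropy(pmf):
    """Shannon entropy H(X) = -sum p(x) log2 p(x) of a probability mass function.

    Convention: terms with p(x) = 0 contribute 0, so the sum is always >= 0.
    """
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# A fair bit carries one bit of entropy; a deterministic variable carries none.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([1.0]))        # 0.0 (no uncertainty)
print(entropy([0.25] * 4))   # 2.0
```

Base-2 logarithms are used here, so entropy is measured in bits.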

1.2 Matroid theory

Definition 2. (Matroid) A matroid M = (E, I) consists of a ground set E (usually finite) and a collection I of independent sets (a subset of the power set P(E)) satisfying the following conditions:

1. ∅ ∈ I.

2. If I ∈ I, then every subset I′ ⊆ I also satisfies I′ ∈ I.

3. For any I₁, I₂ ∈ I with |I₁| < |I₂|, there exists some e ∈ I₂\I₁ such that I₁ ∪ {e} ∈ I.

Definition 3. A subset of the ground set E that is not independent is called dependent. A maximal independent set is called a basis of the matroid. A circuit in a matroid M is a minimal dependent subset of E.

The dependent sets, the bases, or the circuits of a matroid each characterize the matroid completely. For instance, one may define a matroid as:

Definition 4. A matroid M is a pair (E, B), where E is a finite set as before and B is a collection of subsets of E, called bases, with the following properties:

1. B is non-empty.

2. If A and B are distinct members of B and a ∈ A\B, then there exists an element b ∈ B\A such that (A\{a}) ∪ {b} ∈ B.

It follows from this basis exchange property that no member of B can be a proper subset of another.

Rank function

One of the basic result of matroid theory, directly analogous to a similar theorem of
basis in linear algebra, that any two bases of a matroid M have the same number of

elements. This number is called the rank of matroid M. Let A E , then matroid
on A can be defined by considering a subset of A independent iff it is independent
in matroid M. Thus, we can define the ranks of any subset of E . The rank of A is

given by rank function r(A), which maps subsets of E to positive integers. Now, the
matroid can be defined through rank function as follows:

Definition 5. A matroid M is a pair (E , r) satisfying following conditions:

1. For X E , 0 r(X) |X|.

2. If X Y , then r(X) r(Y ).

3. r(X) + r(Y ) r(X Y ) + r(X Y ), for all X, Y E .

Example
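A standard example is the vector matroid of a matrix, where a subset of columns is independent iff the columns are linearly independent. The sketch below is our own illustration: with three columns (1,0), (0,1), (1,1) over the rationals, every pair of columns is independent but all three together are dependent, giving the uniform matroid U_{2,3}.

```python
from itertools import combinations

# Columns of a matrix over the rationals; the vector matroid has ground
# set {0, 1, 2}, and a subset is independent iff its columns are
# linearly independent.
cols = {0: (1, 0), 1: (0, 1), 2: (1, 1)}

def rank2(vectors):
    """Rank (0, 1 or 2) of a list of 2-dimensional vectors."""
    vs = [v for v in vectors if v != (0, 0)]
    if not vs:
        return 0
    a = vs[0]
    # any vector not parallel to a raises the rank to 2
    for b in vs[1:]:
        if a[0] * b[1] - a[1] * b[0] != 0:
            return 2
    return 1

independent = [S for r in range(4) for S in combinations(cols, r)
               if rank2([cols[i] for i in S]) == len(S)]
bases = [S for S in independent if len(S) == 2]
print(bases)  # all 2-element subsets: this is the uniform matroid U_{2,3}
```

One can check directly that the three axioms of Definition 2 hold for this collection of independent sets.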
Dual of a matroid

If M is a finite matroid, we can define the dual matroid M* by taking the same ground set and calling a set a basis of M* if and only if its complement is a basis of M, i.e.

M* = (E, B*) such that B* = {E\B | B ∈ B}.

It is not difficult to verify that M* is a matroid and that the dual of M* is M.

Theorem 1. Let X ⊆ E and let r, r* be the rank functions of M and M* respectively. Then r*(X) = |X| − r(E) + r(E\X).
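Theorem 1 can be verified by brute force on a small matroid. The sketch below is our own illustration, using the uniform matroid U_{2,4} (rank of a subset is min(|X|, 2)), whose dual happens to be U_{2,4} again:

```python
from itertools import combinations

E = frozenset(range(4))
# Uniform matroid U_{2,4}: the rank of a subset X is min(|X|, 2).
r = lambda X: min(len(X), 2)

# Dual rank via Theorem 1: r*(X) = |X| - r(E) + r(E \ X)
r_dual = lambda X: len(X) - r(E) + r(E - X)

# The dual of U_{k,n} is U_{n-k,n}, so the dual of U_{2,4} is U_{2,4}
# itself and r* should coincide with r on every subset.
for k in range(5):
    for X in combinations(E, k):
        X = frozenset(X)
        assert r_dual(X) == min(len(X), 2)
print("Theorem 1 verified on U_{2,4}")
```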

2 Index coding problem Setup

Consider the following index coding (IC) problem. There are n messages in the system, x₁, x₂, ..., x_n, where x_j ∈ {0,1}^{t_j} for j ∈ [n] and some t_j. There are n receivers, where receiver j wants to obtain message x_j and knows a subset of the messages a priori, denoted by x(A_j) for some A_j ⊆ [n]\{j}. For simplicity, we will refer to j as the wanted message and to A_j as the side information of receiver j. Any instance of this problem can be specified by a side information graph G with n nodes, in which a directed edge i → j represents that receiver j has message i as side information (i ∈ A_j). Here [n] denotes the set {1, 2, ..., n} and the set of all non-empty subsets of [n] is N = {J ⊆ [n] : J ≠ ∅}.

The main difference of this system model from the centralised index coding problem is in the server setup. Instead of a single server which contains all messages, there are 2^n − 1 servers. For each J ∈ N, there is a server that contains all messages x_j, j ∈ J, and the capacity of the broadcast link connecting server J to all receivers is denoted by C_J. Hence, we assume that there are 2^n − 1 ideal bit pipes to the receivers with arbitrary link capacities. If C_J = 1 only for J = [n] and is zero otherwise, we recover the centralised index coding problem. A special normalised symmetric case is where C_J = 1 for all J ∈ N. Server J sends a sequence y_J ∈ {0,1}^{u_J}, for some u_J, to all receivers, which is a function of the messages at that server.
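For n = 3 this server setup can be enumerated explicitly (a small sketch of ours; the 2³ − 1 = 7 servers match the setting of Figure 1):

```python
from itertools import combinations

n = 3
messages = list(range(1, n + 1))

# One server per non-empty subset J of [n]; server J stores the
# messages {x_j : j in J} and broadcasts over a link of capacity C_J.
servers = [frozenset(J) for r in range(1, n + 1)
           for J in combinations(messages, r)]
print(len(servers))  # 2^n - 1 = 7 servers for n = 3
for J in servers:
    print(sorted(J))
```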

Figure 1: Distributed Index Coding for n=3, source: [2]

Based on the side information A_j and the received bits y_J ∈ {0,1}^{u_J} from all servers, receiver j finds an estimate x̂_j of the message x_j. We say that the rate–capacity tuple (R, C) = ((R_j, j ∈ [n]), (C_J, J ∈ N)) is achievable if there exists r such that

R_j ≤ t_j / r,  C_J ≥ u_J / r,  j ∈ [n], J ∈ N.

For a given C, the capacity region 𝒞 of this index coding problem is the closure of the set of achievable rate tuples R = (R₁, R₂, ..., R_n).

3 Outer Bounds

We generalize the polymatroidal outer bound for the centralised index coding problem as done in [2].

Theorem 2. Let B_j be the set of interfering messages at receiver j, i.e., B_j = [n]\(A_j ∪ {j}). If (R, C) is achievable, then for every T ∈ N,

R_j ≤ f_T(B_j ∪ {j}) − f_T(B_j),  j ∈ T,

for some set function f_T(S), S ⊆ T, such that

1. f_T(∅) = 0,

2. f_T(T) = ∑_{J : J∩T ≠ ∅} C_J,

3. f_T(A) ≤ f_T(B) for all A ⊆ B ⊆ T, and

4. f_T(A ∪ B) + f_T(A ∩ B) ≤ f_T(A) + f_T(B), for all A, B ⊆ T.

Corollary 1. If (R, C) is achievable for an index coding problem represented by the directed graph G, then for every T ∈ N it must satisfy

∑_{j∈S} R_j ≤ ∑_{J : J∩T ≠ ∅} C_J,

for all S ⊆ T for which the subgraph of G induced by S does not contain a directed cycle.
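In the centralised special case this corollary recovers the familiar bound via acyclic induced subgraphs: the rates of any acyclic vertex set share one unit of capacity. A brute-force sketch of that computation (the instance and function names below are ours):

```python
from itertools import combinations, permutations

def is_acyclic(vertices, edges):
    """True iff the subgraph induced by `vertices` admits a topological order."""
    for order in permutations(vertices):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[u] < pos[v] for (u, v) in edges if u in pos and v in pos):
            return True
    return not vertices

def mais(n, edges):
    """Order of a maximum acyclic induced subgraph, by exhaustive search."""
    return max(len(S) for k in range(n + 1)
               for S in combinations(range(n), k)
               if is_acyclic(S, edges))

# Directed 3-cycle: receiver j has the previous message as side information.
print(mais(3, [(0, 1), (1, 2), (2, 0)]))  # 2: any two vertices are acyclic
```

This exhaustive search is exponential and only intended for very small instances.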

4 A class of Index Coding Problems with rate 1/2

Throughout this section, we use the following notation. Let [1 : m] denote {1, 2, ..., m}. For a set of vectors A, sp(A) denotes their span. For a vector space V, dim(V) denotes its dimension. An arbitrary finite field is denoted by F. A vector from the m-dimensional vector space F^m is said to be picked at random if it is selected according to the uniform distribution on F^m.

Formally, the index coding problem (over some field F) consists of a broadcast channel which can carry symbols from F, along with the following.

• A set of T receivers.

• A source which has messages W = {W_i, i ∈ [1 : n]}, each of which is modelled as a vector over F.

• For each receiver j, a set D(j) ⊆ W denoting the set of messages demanded by receiver j.

• For each receiver j, a set S(j) ⊆ W\D(j) denoting the set of side-information messages available at the j-th receiver.

Definition 6. (Index code of symmetric rate R). An index code of symmetric rate R for a given index coding problem consists of an encoding function

E : F^{LR} × F^{LR} × ... × F^{LR} (n times) → F^L,

for some L ≥ 1, mapping the n LR-length message vectors (W_i ∈ F^{LR}) to an L-length codeword which is broadcast through the channel, as well as decoding functions

D_j : F^L × (F^{LR})^{|S(j)|} → (F^{LR})^{|D(j)|}

at the receivers j ∈ [1 : T], mapping the received codeword and the side-information messages to the demanded messages D(j), i.e.,

D_j(E(W₁, ..., W_n), S(j)) = D(j),  ∀j ∈ [1 : T].

Remark 1. We could in general have different rates for different messages, but in
this section we restrict our attention to symmetric rates. Therefore any rate referred

to in this section is the symmetric rate.

Definition 7. (Achievable rates and rate R feasibility). For a given index coding
problem, a rate R is said to be achievable if there exists an index code of rate R, and
the index coding problem is said to be rate R feasible.

Definition 8. (Scalar index codes and linear index codes). If a rate R = 1/L is
achievable, the associated index code is a scalar index code of length L. If the encoding

and decoding functions are linear, then we have a linear index code.

If we have a linear index code of rate R, then we can represent the encoding function as follows:

E(W₁, W₂, ..., W_n) = ∑_{i=1}^{n} V_i W_i,

where each V_i is an L × LR matrix with elements from F. In scalar linear index coding, we have LR = 1. Finding a scalar linear index code of length L is equivalent to finding an assignment of L-length vectors V_i to the n messages such that all receivers can decode their demanded messages, i.e.,

D_j(∑_{i=1}^{n} V_i W_i, S(j)) = D(j),  ∀j ∈ [1 : T].
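As a concrete instance (ours, not one from the report): with n = 3 messages over F₂ and every receiver knowing the two messages it does not demand, the assignment V_i = [1] for all i gives a scalar linear index code of length L = 1, namely the single broadcast W₁ + W₂ + W₃:

```python
n = 3
x = [1, 0, 1]              # messages W_1, W_2, W_3 over F_2
y = x[0] ^ x[1] ^ x[2]     # one broadcast symbol: V_i = [1] for every i

# Each receiver j XORs its side information {W_i : i != j} out of y.
for j in range(n):
    decoded = y
    for i in range(n):
        if i != j:
            decoded ^= x[i]
    assert decoded == x[j]
print("all receivers decode; scalar linear index code of length L = 1")
```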

Remark 2. We restrict our attention to scalar linear index codes for the rest of this
section. However we believe that our results can be extended to vector linear index

codes as well.

Definition 9. (Interfering sets and messages, conflicts). For a receiver j and a message W_k ∈ D(j), let Interf_k(j) = W\({W_k} ∪ S(j)) denote the set of messages (other than W_k) not available at receiver j. The sets Interf_k(j), for all k, are called the interfering sets at receiver j. If receiver j does not demand message W_k, then we define Interf_k(j) = ∅. If a message W_i is not available at a receiver j demanding at least one message W_k ≠ W_i, then W_i is said to interfere at receiver j, and W_i and W_k are said to be in conflict.

For a set of vertices A ⊆ W, let V_E(A) denote the vector space spanned by the vectors assigned to the messages in A under a specific encoding function E. If A = ∅, we define V_E(A) to be the zero space.

Definition 10. (Resolved conflicts). For a given assignment of vectors to the messages (or equivalently, for a given encoding function E), we say that the conflicts within a subset W′ ⊆ W are resolved if

V_k ∉ V_E(Interf_k(j) ∩ W′),  ∀W_k ∈ W′, ∀ receivers j ∈ [1 : T],    (2)

where V_k is the vector assigned to W_k under the encoding function E. If (2) holds for W′ = W, then all the conflicts in the given index coding problem are said to be resolved.

Lemma 2. For any encoding function E, successful decoding at the receivers is possible if and only if all the conflicts are resolved.

By Lemma 2, it is clear that if there is an assignment of L-length vectors V_i to the messages W_i such that the condition in Lemma 2 is satisfied, then these vectors naturally define an index code of length L for the given index coding problem.

Definition 11. (Alignment graph and alignment sets). In the alignment graph, the vertices W_i and W_j are connected by an edge (called an alignment edge) when the messages W_i and W_j are not available at a receiver demanding a message other than W_i and W_j. A connected component of the alignment graph is called an alignment set.

It is easy to see that the alignment sets define a partition of the alignment graph. Also, the messages in Interf_k(j), for all messages k at all receivers j, are fully connected in the alignment graph.

Definition 12. (Conflict graph). In the conflict graph, W_i and W_j are connected by an edge (called a conflict edge) if W_i is not available at a receiver demanding W_j, or W_j is not available at a receiver demanding W_i.

Definition 13. (Conflict hypergraph). The conflict hypergraph is an undirected hypergraph with vertex set W (the set of messages), and its hyperedge set is defined as follows: for any receiver j demanding any message W_k, the vertices W_k and Interf_k(j) are connected by a hyperedge, which is denoted by {W_k, Interf_k(j)}.
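Both graphs can be built mechanically from the demand and side-information sets. Below is a small sketch on an instance of our own choosing (a 3-message cyclic side-information pattern); all names are ours:

```python
from itertools import combinations

W = {0, 1, 2}                       # messages
demand = {0: {0}, 1: {1}, 2: {2}}   # D(j): receiver j wants message j
side = {0: {1}, 1: {2}, 2: {0}}     # S(j): cyclic side information

def interf(k, j):
    """Interf_k(j): messages other than W_k unavailable at receiver j."""
    if k not in demand[j]:
        return set()
    return W - ({k} | side[j])

# Alignment edge: two messages lie in a common interfering set.
align = {frozenset(e) for j in demand for k in demand[j]
         for e in combinations(sorted(interf(k, j)), 2)}

# Conflict edge: W_i interferes at a receiver demanding W_k.
conflict = {frozenset({i, k}) for j in demand for k in demand[j]
            for i in interf(k, j)}

print(align)     # empty here: every interfering set is a singleton
print(conflict)  # a triangle on {0, 1, 2}
```

In this instance the alignment graph has no edges (so there can be no internal conflicts), while the conflict graph is a triangle.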

Lemma 3. Suppose two index coding problems, denoted by I1 and I2 , are modelled
by the same conflict hypergraph. Then any index coding solution for I1 is an index
coding solution for I2 .

Definition 14. (Internal conflict). A conflict between two messages within an align-

ment set is called an internal conflict.

Lemma 4. Let U₁, ..., U_N be N sets of vectors such that dim(sp(U_j ∪ U_{j+1})) = K for all j ∈ [1 : N−1]. Then the space spanned by ∪_{j=1}^{N} U_j has dimension K if and only if each U_j spans a vector space of dimension K.

The following is proved in [1].

Theorem 3. An index coding problem is rate 1/2 feasible iff there are no internal

conflicts.

Proof: Corresponding to any vertex W_k in the alignment graph, let Align(k) denote the alignment set it belongs to (this is unique, as the alignment sets partition the alignment graph). We first note that in any (scalar linear) index coding scheme for the given problem, all the vertices must be assigned non-zero vectors (the zero vector cannot be assigned to any message, as this would mean that the message cannot be decoded by any receiver).

If part: Suppose that there are no internal conflicts. We assume a large field F. For each alignment set, we independently generate a random 2 × 1 vector over F and assign it to all the vertices of that alignment set. Because of the random generation, we can assume that any assigned vector is non-zero and that any two assigned vectors are linearly independent with high probability. Let E denote the associated encoding function and V_k the vector assigned to vertex W_k. Since there are no internal conflicts, we only have to check conflicts between alignment sets. For any vertex W_k, the set Interf_k(j), for any receiver j which demands W_k, belongs to an alignment set different from Align(k), because there are no internal conflicts. This alignment set is unique, because all the messages in Interf_k(j) must lie in the same alignment set. Since any two alignment sets get independent vectors with high probability, we have V_k ∉ V_E(Interf_k(j)), and the same argument holds for all receivers j and all messages W_k. Hence this assignment of vectors ensures successful decoding by Lemma 2.

Only if part: Suppose that there is some internal conflict (represented in the conflict graph as an edge between nodes k′ and k) in the alignment set Align(k). Because k′ and k are part of the same alignment set Align(k), there is a path from k′ to k given by an ordered set {k′, i₁, ..., i_{N−1}, k}, such that every adjacent pair of elements belongs to the interfering set of some receiver.

In an assignment corresponding to a rate-1/2 solution, let V_{k′}, V_{i₁}, ..., V_k be the non-zero vectors assigned to the vertices {k′, i₁, ..., i_{N−1}, k}. We define the sets

U₁ = {V_{k′}, V_{i₁}},  U_l = {V_{i_{l−1}}, V_{i_l}} for l ∈ [2 : N−1],  and  U_N = {V_{i_{N−1}}, V_k}.

Suppose dim(sp(U_l)) = 2 for some l. Then a receiver j (at which the messages corresponding to U_l are unavailable) sees an interfering space of dimension 2. Thus we must assign a vector linearly independent of sp(U_l) (which itself has dimension 2) to the corresponding demanded message of receiver j. This is possible only if the assigned vectors have length at least 3, i.e., the rate can be at most 1/3.

Therefore, for a rate-1/2 index coding assignment, every U_l must span a space of dimension 1. Thus we have dim(sp(U_l)) = 1 for all l, and dim(sp(U_l ∪ U_{l+1})) = 1 for all l ∈ [1 : N−1]. By Lemma 4, we should then have dim(sp(∪_{l=1}^{N} U_l)) = 1.

However, k′ and k are in conflict, which means that they must be assigned linearly independent vectors, i.e., dim(sp({V_{k′}, V_k})) = 2, which implies dim(sp(∪_{l=1}^{N} U_l)) > 1. This is a contradiction, and thus any internal conflict forces the rate below 1/2. This concludes the proof.
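The "if" direction of the proof can be mimicked numerically: assign one random 2 × 1 vector per alignment set over a large prime field and check that cross-set conflicts are resolved. A sketch under our own toy instance with two alignment sets (redrawing, as in the proof, until the two vectors are independent):

```python
import random

p = 10007                     # the field F_p for a largish prime p
random.seed(0)

def rand_vec():
    return (random.randrange(p), random.randrange(p))

def independent(u, v):
    """2x2 determinant test over F_p."""
    return (u[0] * v[1] - u[1] * v[0]) % p != 0

# Two alignment sets {0, 1} and {2, 3}; draw one vector per set and
# redraw until independent (this succeeds with high probability anyway).
va, vb = rand_vec(), rand_vec()
while not independent(va, vb):
    va, vb = rand_vec(), rand_vec()

vec = {0: va, 1: va, 2: vb, 3: vb}   # one shared vector per alignment set

# Cross-set conflicts are resolved: V_k lies outside the other set's span.
assert independent(vec[0], vec[2])
print("all cross-set conflicts resolved with 2-dimensional vectors (rate 1/2)")
```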

5 Bounding optimal rate of ICSI


5.1 Introduction

Since its introduction, the problem of index coding has been generalized in a number of directions. It is a problem that has aroused much interest in recent years; from the theoretical perspective, its equivalence to network coding has established it as an important area of network information theory.

In this section we introduce some notation and definitions required for the subsequent subsections. We will assume that q is a power of a prime p, i.e. q = p^l. For any positive integer n, we define [n] := {1, 2, ..., n}. Let F_q denote the finite field with q elements and F_q^{n×t} the set of n × t matrices over F_q.


Given a matrix X ∈ F_q^{n×t}, X_i and X^j denote the i-th row and j-th column, respectively. Also, ⟨X⟩ denotes the row span of the matrix X.

Definition 15. A digraph D is a pair (V, E) where:

• V is the set of vertices;

• E is the set of arcs (or directed edges).

An arc of D is an ordered pair e = (u, v) ∈ E for some u, v ∈ V. The number of vertices of a (di)graph is called its order.

Definition 16. (Paths and circuits). A path in a digraph D is a sequence of vertices (u₁, ..., u_k) such that (u_i, u_{i+1}) ∈ E for all i ∈ [k−1]. If a path is closed, i.e. (u_k, u₁) ∈ E, then it is called a circuit. A (di)graph is called acyclic if it contains no circuits.

Let ν(D) be the circuit packing number of D, namely the maximum number of vertex-disjoint circuits in D. A feedback vertex set of D is a set of vertices whose removal destroys all circuits in D. Let τ(D) denote the minimum size of a feedback vertex set of D. We denote by mais(D) the order of a maximum acyclic induced subgraph of D.

Definition 17. A clique of a (di)graph is a set of vertices that induces a complete subgraph of that (di)graph. A clique cover of a (di)graph is a set of cliques that partitions its vertex set. A minimum clique cover is a clique cover with the minimum number of cliques; the number of cliques in a minimum clique cover is called the clique cover number of the (di)graph.

We denote by cc(D) the clique cover number of D.

Definition 18. Let D = (V, E) be a digraph of order n. A matrix M = (m_{i,j}) ∈ F_q^{n×n} is said to fit D if

m_{i,j} = 1 if i = j,  and  m_{i,j} = 0 if i ≠ j and (i, j) ∉ E.

The min-rank of D over F_q is defined to be

minrk_q(D) = min{rank_q(M) | M fits D}.
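The min-rank of a very small digraph can be computed by exhaustive search over F₂ (a brute-force sketch of ours; the search is exponential in the number of arcs):

```python
from itertools import product

def rank_F2(M):
    """Rank of a 0/1 matrix over F_2, by Gaussian elimination."""
    M = [row[:] for row in M]
    n, r = len(M), 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, n) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(n):
            if i != r and M[i][c]:
                M[i] = [a ^ b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def minrank_F2(n, arcs):
    """minrk_2(D): minimum rank over all matrices fitting the digraph D."""
    arcs = set(arcs)
    free = [(i, j) for i in range(n) for j in range(n)
            if i != j and (i, j) in arcs]
    best = n
    for bits in product([0, 1], repeat=len(free)):
        M = [[int(i == j) for j in range(n)] for i in range(n)]  # diagonal = 1
        for (i, j), b in zip(free, bits):
            M[i][j] = b   # entries on arcs are free; all others are forced to 0
        best = min(best, rank_F2(M))
    return best

# Directed 3-cycle: one circuit, and indeed minrk = n - 1 = 2.
print(minrank_F2(3, [(0, 1), (1, 2), (2, 0)]))  # 2
```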

5.2 Index Coding with side information

Lemma 5. An I(X, f)-IC of length N over F_q has a linear encoding map if and only if there exists a matrix L ∈ F_q^{N×n} such that for each i ∈ [m] there exists a vector u^(i) ∈ F_q^n satisfying

supp(u^(i)) ⊆ X_i    (3)

u^(i) + e_{f(i)} ∈ ⟨L⟩    (4)

The following theorems are proved in [5].

Theorem 4. Let I(X, f) be an instance of the ICSI problem and let H be its side-information hypergraph. Then the optimal length of a linear q-ary I(X, f)-IC is minrk_q(H).

All the users forming a clique in the side information digraph can be simultaneously satisfied by transmitting the sum of their packets. This idea shows that the number of cliques required to cover all the vertices of the graph (the clique cover number) is an achievable upper bound. An acyclic (di)graph has min-rank equal to its order, and for any induced subgraph G′ of a graph G we have

minrk_q(G′) ≤ minrk_q(G).

Indeed, if M is a matrix of minimum rank that fits G, the sub-matrix M′ of M restricted to the rows and columns indexed by the vertices in V(G′) is a matrix that fits G′. These two results are summarized in the following theorem.

Theorem 5. Let G be a digraph. Then mais(G) ≤ minrk_q(G) ≤ cc(G).

Instead of covering with cliques, one can cover the vertices with circuits. This is based on the observation that a circuit of length k in the side-information (di)graph G requires at most k − 1 transmissions to satisfy the demands of the corresponding k users. Therefore every circuit in a collection of vertex-disjoint circuits corresponds to a saving of at least one transmission. The bound is stated as follows: let ν(G) be the circuit-packing number of a graph G of order n. Then

minrk_q(G) ≤ n − ν(G).

There is also the partition multicast scheme, which outperforms the circuit-packing bound.

Theorem 6. Let G be a graph of order n. Then minrk_q(G) ≤ n − min_{v∈V} deg_O(v), for any q > n.

5.3 Digraphs with min-rank one less than the order

Let G be a digraph and recall that τ(G) denotes the minimum number of vertices that must be removed from G in order to obtain an acyclic subgraph. Then n − τ(G) = mais(G). Therefore,

n − τ(G) ≤ minrk_q(G) ≤ cc(G)

over any finite field F_q. The following lemma is proved in [4].
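The quantities τ, ν and mais can all be computed by brute force on small digraphs. The sketch below (our own instance, using the symbols defined above) checks n − τ(G) = mais(G) on a digraph whose two circuits share a vertex:

```python
from itertools import combinations, permutations

edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 1)]   # two circuits sharing vertex 1
n = 4

def acyclic(S):
    """True iff the subgraph induced by vertex set S admits a topological order."""
    for order in permutations(S):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[u] < pos[v] for (u, v) in edges if u in pos and v in pos):
            return True
    return not S

# tau: minimum feedback vertex set;  mais: maximum acyclic induced subgraph.
tau = min(k for k in range(n + 1)
          for F in combinations(range(n), k)
          if acyclic(set(range(n)) - set(F)))
mais = max(len(S) for k in range(n + 1)
           for S in combinations(range(n), k) if acyclic(S))
assert mais == n - tau
print(tau, mais)  # 1 3 : deleting vertex 1 destroys both circuits
```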

Lemma 6. Let G = (V, E) be a directed graph of order n such that there exist i₁, i₂ ∈ V with

1. (i₁, i₂) ∈ E and (i₂, i₁) ∉ E,

2. deg_O(i₁) = 1.

Let G′ = (V′, E′) with V′ = V\{i₁} and E′ = (E ∪ {(j, i₂) | (j, i₁) ∈ E}) \ ({(i₁, i₂)} ∪ {(j, i₁) | (j, i₁) ∈ E}). Then

minrk_q(G) = minrk_q(G′) + 1

for any q.
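The reduction of Lemma 6 can be checked on a small example of our own (over F₂ via brute force; the lemma itself holds for any q): take the directed 3-cycle with i₁ = 0 and i₂ = 1.

```python
from itertools import product

def rank_F2(M):
    """Rank of a 0/1 matrix over F_2, by Gaussian elimination."""
    M = [row[:] for row in M]
    n, r = len(M), 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, n) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(n):
            if i != r and M[i][c]:
                M[i] = [a ^ b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def minrank_F2(n, arcs):
    """minrk_2(D) by exhaustive search over all matrices fitting D."""
    arcs = set(arcs)
    free = [(i, j) for i in range(n) for j in range(n)
            if i != j and (i, j) in arcs]
    best = n
    for bits in product([0, 1], repeat=len(free)):
        M = [[int(i == j) for j in range(n)] for i in range(n)]
        for (i, j), b in zip(free, bits):
            M[i][j] = b
        best = min(best, rank_F2(M))
    return best

# G: 3-cycle 0 -> 1 -> 2 -> 0, with i1 = 0 (out-degree 1) and i2 = 1.
G = [(0, 1), (1, 2), (2, 0)]
# G': delete vertex 0 and redirect the arc (2, 0) to (2, 1); after
# relabelling vertices 1, 2 as 0, 1 this is the 2-cycle below.
G_reduced = [(0, 1), (1, 0)]

assert minrank_F2(3, G) == minrank_F2(2, G_reduced) + 1
print("Lemma 6 reduction verified over F_2")
```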

Proof: Let M = (m_{i,j}) be a matrix that fits G of minimum rank. We may assume that i₁ = 1 and i₂ = 2, so that the first two rows of M are

M₁ = (1, α, 0, ..., 0)  and  M₂ = (0, 1, m_{2,3}, ..., m_{2,n}).

If α = 0, then it is easy to check that by deleting the first row and the first column of M we obtain a matrix M′ of rank rank(M) − 1 that fits G′.

Now suppose that α ≠ 0. We may suppose that the rows M₁, M₂, ..., M_{minrk_q(G)} are linearly independent. For each vertex i ∈ V\{1}, label the corresponding vertex in V′ by i − 1. Then construct the (n−1) × (n−1) matrix M′ whose i-th row is obtained from the (i+1)-th row of M in the following way: for i = 1, ..., minrk_q(G) − 1 let

M′_i = (m_{i+1,1} + α⁻¹ m_{i+1,2}, m_{i+1,3}, ..., m_{i+1,n}),

and for i = minrk_q(G), ..., n − 1 define

M′_i = (m_{i+1,1} + α⁻¹ m_{i+1,2} − λ₁(1 + α⁻¹), m_{i+1,3}, ..., m_{i+1,n}),

where λ₁ ∈ F_q is the first coefficient in the expansion M_{i+1} = ∑_{r=1}^{minrk_q(G)} λ_r M_r for some λ_r ∈ F_q. The matrix M′ fits G′, so

minrk_q(G′) ≤ rank(M′) ≤ minrk_q(G) − 1.

Conversely, let M′ = (m′_{i,j}) be a matrix that fits G′ having rank minrk_q(G′), and suppose the rows M′₁, M′₂, ..., M′_{minrk_q(G′)} are linearly independent. Let I = {j | (j, 1) ∈ E} be the set of vertices of G with outgoing arcs directed to 1. We construct the matrix M such that

M₁ = (1, −1, 0, ..., 0),

M_i = (m′_{i−1,1}, 0, m′_{i−1,2}, ..., m′_{i−1,n−1}) for i ∈ I ∩ {2, ..., minrk_q(G′) + 1}, and

M_i = (0, m′_{i−1,1}, m′_{i−1,2}, ..., m′_{i−1,n−1}) for i ∈ ([n]\I) ∩ {2, ..., minrk_q(G′) + 1}.

For i > minrk_q(G′) + 1 we have that the (i−1)-th row of M′ is given by

M′_{i−1} = ∑_{r=1}^{minrk_q(G′)} λ_r M′_r,

for some λ_r ∈ F_q. If i ∈ I, we put

M_i = (m′_{i−1,1}, 0, m′_{i−1,2}, ..., m′_{i−1,n−1}), and hence obtain

M_i = γ M₁ + ∑_{r=2}^{minrk_q(G′)+1} λ_{r−1} M_r,

where the λ_r are the coefficients in the linear combination of M′_{i−1} with respect to the first minrk_q(G′) rows of M′, and γ = ∑_{r∉I} λ_{r−1} m′_{r−1,1}. If i ∉ I, we set

M_i = (0, m′_{i−1,1}, m′_{i−1,2}, ..., m′_{i−1,n−1}), and hence obtain

M_i = γ M₁ + ∑_{r=2}^{minrk_q(G′)+1} λ_{r−1} M_r,

where γ = −∑_{r∈I} λ_{r−1} m′_{r−1,1}. Then M fits G and minrk_q(G) ≤ rank(M) ≤ minrk_q(G′) + 1.

Lemma 7. ([4]) Let G be a directed graph of order n such that τ(G) = 2. Then minrk_q(G) = n − 2, for any q > n.

Proof: Since n − τ(G) ≤ minrk_q(G), we only need to prove that minrk_q(G) ≤ n − 2. We may suppose, without loss of generality, that there is no vertex i ∈ V with out-degree zero; otherwise we can delete the node i and consider the induced subgraph G′, which satisfies minrk_q(G′) = minrk_q(G) − 1.

Since τ(G) = 2, we have ν(G) ∈ {1, 2}. If ν(G) = 2 then we have our claim immediately, since minrk_q(G) ≤ n − ν(G). So assume ν(G) = 1. We apply Lemma 6 iteratively. Note that each time we reduce the graph by an appropriate arc contraction, we obtain a graph G′ with τ(G′) = 2 and ν(G′) = 1. In fact, each time we reduce the graph we only shorten the circuits that pass through the node that we delete, and we do not create any new circuit, because the out-degree of that node is 1.

At the point where Lemma 6 is no longer applicable, there are two possible cases:

1. the out-degree of each node of the reduced graph G′ is at least 2;

2. there exists i₁ with out-degree 1 and (i₁, i₂), (i₂, i₁) ∈ E′.

The last case is not possible: if we consider the circuit C = (i₁, i₂), then from τ(G′) = 2 we have that there exists a circuit C′ which remains after deleting i₂. Now C′ does not pass through i₁, for otherwise it would have to pass through i₂. Then C and C′ are disjoint, but this is not possible because ν(G′) = 1.

Therefore, reducing G we obtain a graph G′ with k fewer nodes in which all nodes have out-degree at least 2. Then from Theorem 6 and Lemma 6 it follows that

minrk_q(G) = minrk_q(G′) + k ≤ (n − k − 2) + k = n − 2.

Corollary 2. ([4]) Let G be a graph of order n and let q > n. Then minrk_q(G) = n − 1 if and only if τ(G) = 1.

Proof: If τ(G) = 1 then ν(G) = 1, and we have minrk_q(G) = n − 1.

Conversely, towards a contradiction, assume that τ(G) ≥ 2. Then consider a subgraph G′ of G with τ(G′) = 2. From Lemma 7 we have our claim.

6 Generalized Locally Repairable Codes (GLRC)

We will define a vector linear index code (IC) and a vector linear generalized locally repairable code (GLRC), and obtain a duality between GLRCs and ICs.

Definition 19. An index coding problem instance is given by n distinct messages x_i, 1 ≤ i ≤ n, with x_i ∈ Σ^p, each intended for a distinct user among a set of n users. Every user has some side information, described by a set of indices S_i ⊆ {1, 2, 3, ..., n} such that j ∈ S_i implies that user i has packet x_j as side information, and i ∉ S_i. This is represented by a directed side information graph G(V, E), where each vertex represents a user and a directed edge from i to j is present if j ∈ S_i.

For ease of notation, let x = [x₁ᵀ x₂ᵀ ... x_nᵀ]ᵀ. The objective is to design a suitable transmission scheme such that each user decodes its desired packet from the encoded transmission and the side information packets available to it. Formally, a vector linear index code, which represents a linear transmission scheme, is defined as follows:
Definition 20. A valid (Σ, p, n, k) vector linear index code for an index coding problem on G(V, E) is a collection of k linear encoding vectors v_i ∈ Σ^{pn×1}, spanning a subspace C ⊆ Σ^{pn} of dimension k, such that from the k broadcast transmissions v_iᵀx all users are able to decode their respective packets using their side information through linear decoding. In other words, there are decoding functions γ_i with γ_i({v_iᵀx}_{i=1}^{k}, {x_j}_{j∈S_i}) = x_i for all i, which are linear in all the arguments (in all the sub-symbols belonging to Σ).

The broadcast rate of the index code is given by k/p, since every channel use consists of p symbols from the alphabet Σ. The total number of transmissions is k in terms of the alphabet Σ. The total number of transmissions needed if side information is not present is np. The index code C has the following generator matrix, with the encoding vectors v_i as its rows:

V = [v₁ᵀ ; v₂ᵀ ; ... ; v_kᵀ].
y = Vx is the vector containing the k encoded transmissions corresponding to the index code C. The complementary index coding problem is essentially the same as the index coding problem, except that the objective is to maximize the number of transmissions saved. The number of saved transmissions is (np − k), and the complementary index code rate is given by (np − k)/p = n − k/p, since log|Σ| bits are transmitted in every channel use.

Definition 21. A (Σ, p, n, k) vector linear generalized locally repairable code (GLRC) of dimension k is a k-dimensional subspace C ⊆ Σ^{pn} in which each set of p sub-symbols is grouped into one codeword super-symbol. Further, every codeword super-symbol i satisfies the following recoverability condition: every sub-symbol of the i-th super-symbol is a linear combination of the sub-symbols belonging to a set S_i of codeword super-symbols not containing i. These conditions can also be represented in the form of a directed recoverability graph G(V, E), where the vertices correspond to the n super-symbols and the directed out-neighbourhood of a vertex i is the recoverability set S_i.

A GLRC C is said to be valid on the recoverability digraph G if it satisfies the conditions given by the digraph. The generator matrix of the code C, of dimensions k × pn, is given by

G = [g₁₁ g₁₂ ... g₁_p g₂₁ ... g_{n p}].

Here, g_{ij} ∈ Σ^{k×1}, 1 ≤ i ≤ n, 1 ≤ j ≤ p, is the coding vector that determines the j-th sub-symbol of super-symbol i in a codeword through a linear combination of the k message sub-symbols.
The following theorem establishes the relation between the dual IC problem and GLRCs, and is proved in [3].

Theorem 7. Let C be a linear code (or a subspace) of dimension k. Let the dual code (or dual subspace) of C, of dimension np − k, be denoted by C⊥ ⊆ Σ^{pn}. Then C is a valid index code for the side information graph G if and only if C⊥ is a valid GLRC when G is taken as a recoverability graph.
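Theorem 7 can be sanity-checked in the simplest setting (our own example): n = 3, p = 1 over F₂ with the complete side information graph. Then V = [1 1 1] is a valid index code with k = 1 (each user XORs out its two known messages), and its dual, the even-weight code of dimension np − k = 2, should be a valid GLRC on the same graph:

```python
from itertools import product

# Index code: V = [1 1 1] over F_2 for the complete side information
# graph on n = 3 users (each user knows the other two messages), p = 1.
V = [1, 1, 1]

# Dual code: all z in F_2^3 with V . z = 0, i.e. the even-weight code.
dual = [z for z in product([0, 1], repeat=3)
        if sum(v * x for v, x in zip(V, z)) % 2 == 0]
assert len(dual) == 4   # dimension np - k = 2, so 2^2 codewords

# GLRC recoverability: every symbol z_i is a linear function of the
# others, here z_i = z_j + z_k, matching recoverability sets S_i = {j, k}.
for z in dual:
    for i in range(3):
        assert z[i] == (z[(i + 1) % 3] + z[(i + 2) % 3]) % 2
print("dual of the index code is a valid GLRC on the same graph")
```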

Proof: We first show that if C is a valid index code on G, with generator V as described before, then the dual code C⊥ with its generator G is a valid GLRC for G. Consider any user i in the index coding problem, with side information S_i. If C is a valid index code, then there exists a vector linear decoding function γ_i with γ_i(y, {x_j}_{j∈S_i}) = x_i. This is true for all message vectors x with y = Vx. Let w be a vector such that y = Vw, and let x represent the actual message vector (of all n messages), with encoded transmission y. Then x = w + z for some z ∈ C⊥, because C⊥ is the right null space of V.

Given y, the uncertainty about the message vector x is due to the unknown z in the null space. In that sense, given the generator V of the code, one can fix a candidate w for a given y. Because γ_i is linear in all its arguments, we have the following chain of equalities:

γ_i(y, {x_j}_{j∈S_i}) = x_i

γ_i(y, {w_j + z_j}_{j∈S_i}) = w_i + z_i

γ_i(y, {w_j}_{j∈S_i}) + γ_i(0, {z_j}_{j∈S_i}) = w_i + z_i    (5)

The last step uses the linearity of γ_i. The decoding should work even when w is the actual message vector; hence γ_i(y, {w_j}_{j∈S_i}) = w_i. With (5), we have

γ_i(0, {z_j}_{j∈S_i}) = z_i    (6)

Since γ_i is linear, this implies that every sub-symbol of the i-th code super-symbol is a linear combination of the code sub-symbols in the set S_i, for the dual code C⊥, since z ∈ C⊥. Hence the dual code is a valid GLRC, proving one direction.
To prove the other direction, let us assume that for every i, 1 ≤ i ≤ n, there exists a function ψ_i such that:

ψ_i({z_j}_{j∈S_i}) = z_i,   ∀ z ∈ C⊥     (7)

Here, z is a vector of all supersymbols z_i. This means that every supersymbol i of the GLRC C⊥ is recoverable from the set S_i of codeword supersymbols. For the index coding problem, let x be the message vector not known to the users prior to receiving the encoded transmission, and let y = V x. Given y, from the previous part of the proof, we know that x = w + z for some z ∈ C⊥. The vector w is known to all users from y alone, because the code V employed is known to all users.


Since z satisfies the recoverability conditions in (7), we have w_i + ψ_i({x_j − w_j}_{j∈S_i}) = x_i. As w is a function of just y and V, user i can recover x_i from the supersymbols in the side information set S_i and the encoded transmission y, for every message vector x.

We again note that the choice of w is arbitrary. For every y, the users have to pick some w such that y = V w. Since the forward map φ : w ↦ V w is linear, the inverse one-to-one map φ⁻¹(y) determining w can be made linear by fixing φ⁻¹(e_i) for the unit vectors e_1, ..., e_k; linearity of the forward map then determines a candidate pre-image φ⁻¹(y) = Σ_{i=1}^{k} y_i φ⁻¹(e_i) for all vectors y. Therefore, if the ψ_i are all linear in all their subsymbol arguments, then the decoding functions for the index coding problem are also linear. This completes the proof.
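As an illustrative sanity check of this duality (a hypothetical toy instance, not taken from [3]), the following Python sketch takes the scalar index code y = x_1 + x_2 + x_3 over GF(2) for the complete side information digraph on three users (each user knows the other two messages, so k = 1), enumerates the dual code C⊥ as the right null space of V = [1 1 1], and verifies that every dual codeword symbol z_i is recoverable from the symbols indexed by S_i, i.e. that C⊥ is a valid GLRC:

```python
import itertools

# Toy check of Theorem 7 over GF(2): side information graph = complete
# digraph on 3 nodes, index code y = x1 + x2 + x3, so V = [1 1 1], k = 1.
V = [[1, 1, 1]]
n = 3
side_info = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # S_i for each user i

def null_space_gf2(V, n):
    """Enumerate the right null space of V over GF(2) by brute force."""
    return [z for z in itertools.product([0, 1], repeat=n)
            if all(sum(row[j] * z[j] for j in range(n)) % 2 == 0
                   for row in V)]

dual_code = null_space_gf2(V, n)  # C-perp, dimension n - k = 2, size 4

# GLRC check: every dual code symbol z_i must be a linear function of the
# symbols indexed by S_i; here it is simply their GF(2) sum.
glrc_ok = all(z[i] == sum(z[j] for j in side_info[i]) % 2
              for z in dual_code for i in range(n))
print(glrc_ok)  # True: the dual of this valid index code is a valid GLRC
```

The check passes because every z ∈ C⊥ satisfies z_1 + z_2 + z_3 = 0, so each symbol equals the sum of the other two, exactly the recoverability condition (7).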

7 Our contribution

7.1 Relation between matroid theory and GLRC

Let there be one source with n messages and n receivers. Let V = [V_1 ... V_n] be the k × n encoding matrix, where k is the minrank of the IC problem.


At any receiver, y = Σ_{i∈d} x_i V_i + Σ x_interf V_interf, where {V_demands} is linearly independent and ⟨V_interf⟩ ∩ ⟨V_demands⟩ = 0, so

dim(⟨V_interf⟩) + dim(⟨V_demands⟩) = dim(⟨V_interf⟩ + ⟨V_demands⟩).

Now, r(V_I) = r(E \ (SI ∪ d)) = dim(⟨V_interf⟩) and r(V_d) = dim(⟨V_demands⟩). Therefore, for the IC problem,

r(E \ SI) = r(E \ (SI ∪ d)) + r(d) = r(E \ (SI ∪ d)) + |d|.


Let M* be the dual matroid of M. Then (using Theorem 1),

r*(SI) = r*(SI ∪ d)     (8)

This equation can be interpreted as follows: the demand vectors at each receiver can be recovered with the help of the side information present at that receiver. This is the same result we obtained in Section 6; the only difference is that here we used a matroid-theoretic approach.
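The algebra above can be checked numerically on a toy vector matroid, using the standard dual rank identity r*(S) = |S| − r(E) + r(E \ S) (our Theorem 1). The sketch below (a hypothetical instance: the matroid of the columns of V = [1 1 1] over GF(2), with receiver 0 demanding d = {0} and holding SI = {1, 2}) verifies both equation (8) and the rank identity it was derived from:

```python
# Toy vector matroid: ground set E = column indices of V = [1 1 1] over GF(2).
cols = {0: [1], 1: [1], 2: [1]}
E = set(cols)

def rank(S):
    """GF(2) rank of the columns indexed by S (trivially 0 or 1 here,
    since all columns span a 1-dimensional space)."""
    return 1 if any(any(cols[i]) for i in S) else 0

def dual_rank(S):
    """Dual matroid rank via r*(S) = |S| - r(E) + r(E \\ S)  (Theorem 1)."""
    return len(S) - rank(E) + rank(E - set(S))

SI, d = {1, 2}, {0}  # receiver 0: side information SI, demand d

print(dual_rank(SI), dual_rank(SI | d))             # prints: 2 2
print(rank(E - SI) == rank(E - (SI | d)) + len(d))  # prints: True
```

Both dual ranks equal 2, confirming r*(SI) = r*(SI ∪ d), and the primal identity r(E \ SI) = r(E \ (SI ∪ d)) + |d| holds as derived above.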

7.2 Results for digraphs with β(G) = 1

Let G be a digraph. Define β(G) to be the minimum number of vertices that must be removed to make the digraph acyclic.

The dual problem of an IC problem with minrank k requires its minrank to be n − k and the condition r*(SI) = r*(SI ∪ d) to be satisfied at each receiver, where n is the number of messages in the IC problem.

Lemma 8. Let G be a digraph such that β(G) = 1. Then, for a minrank solution, the column vector in the dual matrix of any node whose removal makes G acyclic cannot be zero.

Proof: Let T(G) be the set of nodes of G such that removing any single node in T(G) makes G acyclic, and let v ∈ T(G). Consider an outgoing neighbour of v, say v_1, and its side information set SI_{v_1}. The set SI_{v_1} contains v and possibly some other nodes. All the nodes in SI_{v_1} which are not part of any cycle in G have zero column vectors in the dual matrix and are thus not demanded in the dual problem. Therefore, without loss of generality, SI_{v_1} and the subsequent side information sets can be assumed to contain only nodes which are part of some cycle in G. Let x ∈ SI_{v_1} \ {v} and consider the side information set SI_x of x. This set cannot contain v_1, because if it did, the cycle x → v_1 → x would remain after removing v from G, which is not possible. Now take some x_1 ∈ SI_x \ {v} and consider its side information set SI_{x_1}. This set cannot contain v_1 or x, because if it did, the possible cycles x → x_1 → x and x_1 → v_1 → x → x_1 would avoid v, which is not possible. Now take some x_2 ∈ SI_{x_1} \ {v} and continue the process. At each step the side information set gets smaller, and as the number of nodes is finite, there will be a node x_n ∈ SI_{x_{n−1}} \ {v} such that SI_{x_n} contains only v. It cannot be empty, as x_n is part of some cycle in G. Let us denote x_n by w, which is different from v. Hence, the column vector of w in the dual matrix depends only on the column vector of v, by the condition r*(SI) = r*(SI ∪ d).
Consider an outgoing neighbour of w, say w_1, and its side information set SI_{w_1}. The set SI_{w_1} contains w and possibly some other nodes (it may or may not contain v). As before, all the nodes in SI_{w_1} which are not part of any cycle in G have zero column vectors in the dual matrix and are thus not demanded in the dual problem, so without loss of generality SI_{w_1} and the subsequent side information sets can be assumed to contain only nodes which are part of some cycle in G. Let x ∈ SI_{w_1} \ {w, v} and consider the side information set SI_x of x. This set cannot contain w_1, because if it did, the cycle x → w_1 → x would remain after removing v from G, which is not possible. Now take some x_1 ∈ SI_x \ {w, v} and consider its side information set SI_{x_1}. This set cannot contain w_1 or x, because if it did, the possible cycles x → x_1 → x and x_1 → w_1 → x → x_1 would avoid v, which is not possible. Now take some x_2 ∈ SI_{x_1} \ {w, v} and continue the process. At each step the side information set gets smaller, and as the number of nodes is finite, there will be a node x_n ∈ SI_{x_{n−1}} \ {w, v} such that SI_{x_n} contains only v, only w, or both v and w. It cannot be empty, as x_n is part of some cycle in G. Let us denote x_n by y, which is different from v and w. Hence, the column vector of y in the dual matrix depends on the column vectors of v and w, by the condition r*(SI) = r*(SI ∪ d). But as the column vector of w itself depends only on that of v, the column vector of y also depends only on the column vector of v in the dual matrix.

Continuing this way, we get a set S = {w, y, ...} such that each node in S is distinct, is part of some cycle in G, and is side information for some node. As there are only finitely many nodes, S is a finite set.

The only remaining nodes are those which are part of a cycle and have no side information other than v, i.e. outgoing neighbours of v with no other nodes as side information. Let u be any such node. Then the only possible cycle through u is v → u → v. Here, clearly, the column vector of u in the dual matrix depends on the column vector of v.

Theorem 8. Let G be a digraph such that β(G) = 1. Then there exists at least one cycle C in G such that the column vectors in the dual matrix corresponding to the messages in C are all non-zero for a minrank solution.

Proof: Let v be as in the previous lemma; by that lemma, the column vector corresponding to v in the dual matrix is non-zero. We may assume that every node has some side information, because a node with no side information has a zero column vector in the dual matrix. Now, the column vector of v in the dual matrix depends on the column vectors of the nodes in the side information (SI) of v. Hence there exists some v_1 in the SI of v such that the column vector of v_1 in the dual matrix is non-zero. This column vector of v_1 in turn depends on the column vectors of the nodes in the SI of v_1, so there exists some v_2 in the SI of v_1 whose column vector in the dual matrix is non-zero. Continuing this way, since there are finitely many nodes, this sequence forms a cycle, which we denote by C. This C has to pass through v, or else removing v from G would not make G acyclic. Therefore, for every node in C, the corresponding column vector in the dual matrix is non-zero.

Corollary 3. Let G be a digraph such that β(G) = 1 and let T(G) be the set defined in the proof of Lemma 8. Then, for a minrank solution, the column vector corresponding to every node in T(G) is non-zero in the dual matrix.

Proof: Directly, by application of the previous theorem.

Theorem 9. Suppose G is a digraph. If β(G) = 1, then minrk_q(G) = n − 1 for all q.

Proof: Let ν(G) denote the maximum number of vertex-disjoint cycles in G. We know that n − β(G) ≤ minrk_q(G) and minrk_q(G) ≤ n − ν(G). So n − 1 ≤ minrk_q(G). Since β(G) = 1, G has at least one cycle, so ν(G) ≥ 1. This implies n − 1 ≤ minrk_q(G) ≤ n − ν(G) ≤ n − 1. Hence minrk_q(G) = n − 1 for all q, as q was arbitrary.
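As a quick illustration of Theorem 9 (our own toy example, not from the references), consider the directed n-cycle, for which β(G) = 1. One valid fitting matrix for it over GF(2) is M = I + P, where P is the cyclic shift, since user i has exactly message (i + 1) mod n as side information. The sketch below computes its GF(2) rank by bitmask Gaussian elimination and recovers n − 1:

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks,
    computed by eliminating the lowest set bit of each pivot row."""
    rank = 0
    rows = rows[:]  # work on a copy
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        lsb = pivot & -pivot  # pivot column = lowest set bit
        rows = [r ^ pivot if r & lsb else r for r in rows]
    return rank

n = 5
# Directed n-cycle: user i knows message (i+1) mod n, so a fitting matrix
# is M = I + P with rows having bits i and (i+1) mod n set.
rows = [(1 << i) | (1 << ((i + 1) % n)) for i in range(n)]
print(gf2_rank(rows))  # prints 4, i.e. n - 1, as Theorem 9 predicts
```

The rank is n − 1 for every n, because over GF(2) the circulant I + P corresponds to the polynomial 1 + x, and gcd(1 + x, x^n + 1) has degree 1.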

References

[1] Prasad Krishnan and V. Lalitha, A class of index coding problems with rate 1/3, ArXiv, Jan. 2016, https://arxiv.org/pdf/1601.06689

[2] Parastoo Sadeghi, Fatemeh Arbabjolfaei and Young-Han Kim, Distributed Index Coding, ArXiv, Apr. 2016, https://arxiv.org/abs/1604.03204

[3] Karthikeyan Shanmugam and Alexandros G. Dimakis, Bounding Multiple Unicasts through Index Coding and Locally Repairable Codes, ArXiv, Feb. 2014, http://arxiv.org/abs/1402.3895

[4] Eimear Byrne and Marco Calderini, Bounding the optimal rate of the ICSI and ICCSI problem, ArXiv, Apr. 2016, https://arxiv.org/abs/1604.05991

[5] Son Hoang Dau, Vitaly Skachek and Yeow Meng Chee, On the security of index coding with side information, IEEE Transactions on Information Theory, 58(6):3975–3988, 2012.

[6] Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, 2nd ed. (Wiley, Hoboken, New Jersey, 2006)
